20 changes: 0 additions & 20 deletions 02_Using_the_LUMI_web_interface/Clone_with_JupyterLab.md

This file was deleted.

Original file line number Diff line number Diff line change
@@ -39,7 +39,7 @@
"outputs": [],
"source": [
"import os\n",
"os.environ[\"HF_HOME\"] = \"/flash/project_465002178/hf-cache\""
"os.environ[\"HF_HOME\"] = \"/flash/project_465002757/hf-cache\""
]
},
{
16 changes: 9 additions & 7 deletions 02_Using_the_LUMI_web_interface/README.md
@@ -7,28 +7,30 @@
In this exercise you will gain first experience with using the LUMI web interface to navigate files and directories on the LUMI supercomputer. You will also set up your own copy of the exercise repository on the system, so that you can work on them without interfering with the other course participants.

1. Log in to the LUMI web interface: https://www.lumi.csc.fi
2. Create your own subdirectory in `/project/project_465002178/` and `/scratch/project_465002178/`. Use your username for the directory name. You can either
2. Create your own subdirectory in `/project/project_465002757/` and `/scratch/project_465002757/`. Use your username for the directory name. You can either
- Use the built-in file explorer ("Home Directory"), or
- Use the login node shell app in the webinterface
3. Clone the [exercise repository](https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop) to your folder in `/project/project_465002178/<username>`. You can either
- use the login node shell app in the webinterface, or
- start a Jupyter lab job and use the Jupyter lab UI for cloning Git repositories, see [Clone_with_JupyterLab.md](./Clone_with_JupyterLab.md) for an illustrated step-by-step guide for this.
3. Clone the [exercise repository](https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop) to your folder in `/project/project_465002757/<username>`. You can use the login node shell app in the webinterface for that.
4. Get familiar with the exercise repository layout.
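Steps 2 and 3 above can also be done from the login node shell. The following is a minimal sketch, not part of the repository: the variable names (`PROJECT_ID`, `LUMI_USER`, `PROJECT_DIR`, `SCRATCH_DIR`) are illustrative, and it assumes your directory is named after the username in `$USER`. As written it only prints the commands; remove the leading `echo` to actually run them.

```shell
# Sketch only: variable names are illustrative, not taken from the repository.
PROJECT_ID="project_465002757"        # project number used in the updated README
LUMI_USER="${USER:-$(id -un)}"        # assumption: directory is named after your username
REPO_URL="https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop"

PROJECT_DIR="/project/${PROJECT_ID}/${LUMI_USER}"
SCRATCH_DIR="/scratch/${PROJECT_ID}/${LUMI_USER}"

# Step 2: create your personal subdirectories (dry run: prints the command)
echo mkdir -p "${PROJECT_DIR}" "${SCRATCH_DIR}"

# Step 3: clone the exercise repository into your project folder (dry run)
echo git -C "${PROJECT_DIR}" clone "${REPO_URL}"
```

Printing the commands first makes it easy to confirm the paths point at your own folder before anything is created.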

2. Start an interactive Jupyter lab job and run inference with GPT-neo.

In this exercise you will learn how to reserve resources for and start an interactive job to run a Jupyter notebook via the LUMI web interface. The notebook itself introduces you to our running example of finetuning a language model using PyTorch and the training libraries provided by Huggingface. In this exercise you will not do any training, but familiarise yourself a bit with the software and the base model.

1. Start an interactive Jupyter session: Open the Jupyter app (! not "Jupyter for Courses" !) in the LUMI webinterface and set the following settings before pressing `Launch`
- Project: `project_465002178 (LUST Training ...)`
- Project: `project_465002757 (LUST Training ...)`
- Reservation: Use the course reservation `AI_workshop_Day1` (there should only be one available option)
- Partition: `small-g`
- Number of CPU cores: `7`
- Memory (GB): `16`
- Time: `0:30:00`
- Working directory: `/project/$PROJECT`
- Python: `pytorch (Via CSC stack, limited support available)`
- Virtual environment path: leave empty
- Press Advanced
- Custom Python Type: `Container`
- Modules to load: `Local-LAIF lumi-aif-singularity-bindings`
- Path to container with Python: `/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif`
- Container arguments: leave empty
- Init script for container: leave empty
2. Wait for the session to start, then press `Connect to Jupyter`

> **Note**
Binary file removed 02_Using_the_LUMI_web_interface/images/step0.png
Binary file not shown.
Binary file removed 02_Using_the_LUMI_web_interface/images/step1.png
Binary file not shown.
Binary file removed 02_Using_the_LUMI_web_interface/images/step2.png
Binary file not shown.
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day1 # comment this out if the reservation is no longer available
#SBATCH --partition=small-g
#SBATCH --gpus-per-node=1
@@ -10,14 +10,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -35,7 +35,7 @@ export OUTPUT_DIR=$SCRATCH/$USER/data/
export LOGGING_DIR=$SCRATCH/$USER/runs/

set -xv # print the command so that we can verify setting arguments correctly from the logs
srun singularity exec $CONTAINER \
srun singularity run $CONTAINER \
python GPT-neo-IMDB-finetuning.py \
--model-name gpt-imdb-model \
--output-path $OUTPUT_DIR \
14 changes: 7 additions & 7 deletions 03_Your_first_AI_training_job_on_LUMI/reference_solution/run.sh
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day1 # comment this out if the reservation is no longer available
#SBATCH --partition=small-g
#SBATCH --gpus-per-node=1
@@ -10,14 +10,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -35,7 +35,7 @@ export OUTPUT_DIR=$SCRATCH/$USER/data/
export LOGGING_DIR=$SCRATCH/$USER/runs/

set -xv # print the command so that we can verify setting arguments correctly from the logs
srun singularity exec $CONTAINER \
srun singularity run $CONTAINER \
python GPT-neo-IMDB-finetuning.py \
--model-name gpt-imdb-model \
--output-path $OUTPUT_DIR \
12 changes: 6 additions & 6 deletions 03_Your_first_AI_training_job_on_LUMI/run.sh
@@ -1,19 +1,19 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day1 # comment this out if the reservation is no longer available
#SBATCH --partition=...
## <!!! ACTION REQUIRED: SPECIFY ADDITIONAL SLURM PARAMETERS HERE!!!>

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available
#SBATCH --partition=standard-g
#SBATCH --nodes=1
@@ -11,14 +11,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -46,7 +46,7 @@ export LOCAL_WORLD_SIZE=$SLURM_GPUS_PER_NODE

# As opposed to the example in `run_torchrun.sh`, we can set the CPU binds directly via the slurm command, since we have
# one task per GPU. In this case we do NOT need to set them from within the Python code itself.
srun singularity exec $CONTAINER \
srun singularity run $CONTAINER \
bash -c "RANK=\$SLURM_PROCID \
LOCAL_RANK=\$SLURM_LOCALID \
python GPT-neo-IMDB-finetuning.py \
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available
#SBATCH --partition=standard-g
#SBATCH --nodes=1
@@ -11,14 +11,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -41,7 +41,7 @@ set -xv # print the command so that we can verify setting arguments correctly fr
# Since we start only one task with slurm which then starts subprocesses, we cannot use slurm to configure CPU binds.
# Therefore we need to set them up in the Python code itself.

srun singularity exec $CONTAINER \
srun singularity run $CONTAINER \
torchrun --standalone \
--nnodes=1 \
--nproc-per-node=${SLURM_GPUS_PER_NODE} \
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available
#SBATCH --partition=standard-g
#SBATCH --nodes=1
@@ -11,14 +11,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -46,7 +46,7 @@ export LOCAL_WORLD_SIZE=$SLURM_GPUS_PER_NODE

# As opposed to the example in `run_torchrun.sh`, we can set the CPU binds directly via the slurm command, since we have
# one task per GPU. In this case we do NOT need to set them from within the Python code itself.
srun singularity exec $CONTAINER \
srun singularity run $CONTAINER \
bash -c "RANK=\$SLURM_PROCID \
LOCAL_RANK=\$SLURM_LOCALID \
python GPT-neo-IMDB-finetuning.py \
14 changes: 7 additions & 7 deletions 08_Scaling_to_multiple_GPUs/reference_solution/run_torchrun.sh
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available
#SBATCH --partition=standard-g
#SBATCH --nodes=1
@@ -11,14 +11,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -41,7 +41,7 @@ set -xv # print the command so that we can verify setting arguments correctly fr
# Since we start only one task with slurm which then starts subprocesses, we cannot use slurm to configure CPU binds.
# Therefore we need to set them up in the Python code itself.

srun singularity exec $CONTAINER \
srun singularity run $CONTAINER \
torchrun --standalone \
--nnodes=1 \
--nproc-per-node=${SLURM_GPUS_PER_NODE} \
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available
#SBATCH --partition=standard-g
#SBATCH --nodes=1
@@ -11,14 +11,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -52,7 +52,7 @@ CPU_BIND_MASKS="0x00fe000000000000,0xfe00000000000000,0x0000000000fe0000,0x00000

# tell slurm to configure the cpu binds specified by the mask, additional option v prints to configuration to the logs
srun --cpu-bind=v,mask_cpu=$CPU_BIND_MASKS \
singularity exec $CONTAINER \
singularity run $CONTAINER \
bash -c "RANK=\$SLURM_PROCID \
LOCAL_RANK=\$SLURM_LOCALID \
python GPT-neo-IMDB-finetuning.py \
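
The `mask_cpu` values passed to `srun --cpu-bind` above are hexadecimal bitmasks: each comma-separated mask applies to one task on the node, in order, and bit *n* of a mask selects CPU *n*. As a rough sketch for reading such masks, the hypothetical helper `mask_to_cores` below (not part of the repository) lists the core IDs a mask selects. It walks the hex string digit by digit, so it does not depend on 64-bit shell arithmetic.

```shell
# Hypothetical helper, not part of the repository: list the CPU IDs set in a hex mask.
mask_to_cores() (
  hex=${1#0x}        # strip the 0x prefix
  pos=0              # CPU index of the lowest bit of the current hex digit
  cores=""
  while [ -n "$hex" ]; do
    digit=${hex#"${hex%?}"}     # take the last (least significant) hex digit
    hex=${hex%?}                # and drop it from the remaining string
    value=$(( 0x$digit ))
    bit=0
    while [ "$bit" -lt 4 ]; do
      if [ $(( ($value >> $bit) & 1 )) -eq 1 ]; then
        cores="$cores $(( pos + bit ))"
      fi
      bit=$(( bit + 1 ))
    done
    pos=$(( pos + 4 ))
  done
  echo "${cores# }"
)

# The first three masks from CPU_BIND_MASKS, one per task
for mask in 0x00fe000000000000 0xfe00000000000000 0x0000000000fe0000; do
  echo "$mask -> cores $(mask_to_cores "$mask")"
done
# 0x00fe000000000000 -> cores 49 50 51 52 53 54 55
# 0xfe00000000000000 -> cores 57 58 59 60 61 62 63
# 0x0000000000fe0000 -> cores 17 18 19 20 21 22 23
```

Each of these masks selects seven cores, so every task is pinned to its own group of seven CPUs.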
@@ -1,5 +1,5 @@
#!/bin/bash
#SBATCH --account=project_465002178
#SBATCH --account=project_465002757
#SBATCH --reservation=AI_workshop_Day2 # comment this out if the reservation is no longer available
#SBATCH --partition=standard-g
#SBATCH --nodes=1
@@ -11,14 +11,14 @@

# Set up the software environment
# NOTE: the loaded module makes relevant filesystem locations available inside the singularity container
# (/scratch, /project, etc) as well as mounts some important system libraries that are optimized for LUMI
# (/scratch, /project, etc)
# If you are interested, you can check the exact paths being mounted from
# /appl/local/containers/ai-modules/singularity-AI-bindings/24.03.lua
# /appl/local/laifs/modules/lumi-aif-singularity-bindings/1.0.0.lua
module purge
module use /appl/local/containers/ai-modules
module load singularity-AI-bindings
module use /appl/local/laifs/modules
module load lumi-aif-singularity-bindings

CONTAINER=/appl/local/containers/sif-images/lumi-pytorch-rocm-6.2.4-python-3.12-pytorch-v2.6.0.sif
CONTAINER=/appl/local/laifs/containers/lumi-multitorch-u24r64f21m43t29-20260319_153422/lumi-multitorch-full-u24r64f21m43t29-20260319_153422.sif

# Some environment variables to set up cache directories
SCRATCH="/scratch/${SLURM_JOB_ACCOUNT}"
@@ -41,7 +41,7 @@ set -xv # print the command so that we can verify setting arguments correctly fr
# Since we start only one task with slurm which then starts subprocesses, we cannot use slurm to configure CPU binds.
# Therefore we need to set them up in the Python code itself.

srun singularity exec $CONTAINER \
srun singularity run $CONTAINER \
torchrun --standalone \
--nnodes=1 \
--nproc-per-node=${SLURM_GPUS_PER_NODE} \