VZD1-SLING
Contents
Exercises for the SLING workshop
LOGIN NODE
HOSTNAME: nsc-login.ijs.si
CREDENTIALS: provided at the workshop
SLURM
SIMPLE TEST JOBS
# Example 1
srun hostname

# Example 2
srun -N 2 hostname

# Example 3
srun -n 2 hostname

# Example 4
srun -N 2 -n 2 hostname

# Example 5
srun -N 2 -n 4 hostname
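If you want to see which task produced which line of output, srun can label the output lines; a small optional sketch (the --label option prefixes each line with the task number, and the actual hostnames depend on the allocated nodes):

# Example 6 (optional): prefix each output line with the number of the task that printed it
srun -N 2 -n 4 --label hostname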
SBATCH JOB
Create a simple script myscript.sh:
#!/bin/bash
hostname
sleep 60
sbatch --partition=gridlong --job-name=test --mem=2G --time=10:00 \
  --output=test.log myscript.sh
is the same as:
sbatch -p gridlong -J test --mem=2G -t 10:00 -o test.log myscript.sh
and the same as:
#!/bin/bash
#SBATCH --partition=gridlong
#SBATCH --job-name=test
#SBATCH --mem=2G
#SBATCH --time=10:00
#SBATCH --output=test.log

sh myscript.sh
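The last variant is saved as a script (for example wrapper.sh, a name chosen here only for illustration) and submitted with a plain sbatch wrapper.sh. Whichever form you use, you can check that the requested options were actually applied; a minimal sketch, where <JOBID> stands for the ID printed by sbatch:

# Show the partition, time limit and memory that SLURM recorded for the job
scontrol show job <JOBID>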
Get hostname with sbatch script:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=result.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=2000

srun hostname
srun sleep 60
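Because the job sleeps for 60 seconds, it is convenient for practising monitoring and cancelling; a short sketch, assuming the script has been submitted with sbatch and <JOBID> is the reported job ID:

# Watch the job while it is in the queue or running
squeue -u $USER

# Cancel it before the sleep finishes (optional)
scancel <JOBID>

# After the job has ended, the hostname is in the output file
cat result.txt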
SLURM MPI JOB
Create helloworld.c:
/* C Example */
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  int rank, size;

  MPI_Init (&argc, &argv);                /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);  /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);  /* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}
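Before writing the batch script, you may want to check that the example compiles and runs at all; a quick interactive sketch using the same commands that appear in the batch script further below:

# Load the MPI environment and compile the example
module load mpi
mpicc helloworld.c -o hello.mpi

# Launch 4 MPI tasks through SLURM
srun --mpi=pmix -n 4 ./hello.mpi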
Create an sbatch script for this MPI job that requests 4 cores and 100 MB of memory per core. The job walltime should be 10 minutes.
Create an executable hellompi.sh:
#!/bin/bash
#
#SBATCH --job-name=test-mpi
#SBATCH --output=result-mpi.txt
#SBATCH --ntasks=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

module load mpi
mpicc helloworld.c -o hello.mpi
srun --mpi=pmix hello.mpi
And run it:
sbatch hellompi.sh
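When the job has finished, you can inspect its accounting record and output; a minimal sketch, where <JOBID> is the ID printed by sbatch:

# Final state and elapsed time of the job and its steps
sacct -j <JOBID> --format=JobID,JobName,State,Elapsed

# The four "Hello world" lines are collected in the output file
cat result-mpi.txt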
Run multiple parallel jobs sequentially
#!/bin/bash
#SBATCH -t 01:00:00
#SBATCH -n 14
#SBATCH -c 2

# load mpi module
module load mpi

# compile example
mpicc helloworld.c -o hello.mpi

# copy the output somewhere else and then run another executable, copy again...
mkdir /home/<username>/test

srun --mpi=pmix hello.mpi > myoutput1 2>&1
cp myoutput1 /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput2 2>&1
cp myoutput2 /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput3 2>&1
cp myoutput3 /home/<username>/test
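Since the three runs differ only in the output file name, the same sequence can also be written as a loop; an equivalent sketch (the <username> placeholder is kept exactly as in the script above):

# Run the example three times, one run after the other
for i in 1 2 3; do
    srun --mpi=pmix hello.mpi > myoutput$i 2>&1
    cp myoutput$i /home/<username>/test
done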
Run multiple parallel jobs simultaneously
#!/bin/bash
#SBATCH -t 02:00:00
#SBATCH -n 12

# Load mpi module
module load mpi

# Compile the example
mpicc helloworld.c -o hello.mpi

# And run the jobs
srun --mpi=pmix -n 2 --cpu_bind=cores hello.mpi > a.out 2>&1 &
srun --mpi=pmix -n 4 --cpu_bind=cores hello.mpi > b.out 2>&1 &
srun --mpi=pmix -n 6 --cpu_bind=cores hello.mpi > c.out 2>&1 &
wait
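The trailing & starts each srun step in the background and wait blocks until all of them have finished, so the three steps share the 12 allocated tasks at the same time. To check afterwards that the steps really overlapped, their start and end times can be compared; a short sketch, where <JOBID> is the job's ID:

# Overlapping Start/End intervals of the steps mean they ran concurrently
sacct -j <JOBID> --format=JobID,JobName,Start,End,State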
ARC
ARC SIMPLE TEST JOB
Simple test job, save it as test.xrsl:
&
(executable = /usr/bin/env)
(jobname = "test")
(stdout=test.log)
(join=yes)
(gmlog=log)
(memory=1000)
Send the job:
arcproxy
arcsub -c nsc.ijs.si test.xrsl
Check the status of the job:
arcstat <JOBID>
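If several jobs have been submitted, arcstat can also report on all of them at once; a brief sketch (to the best of our knowledge the -a option queries every job recorded in the local job list):

# Status of all jobs known to the local ARC client
arcstat -a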
Get the results:
arcget <JOBID>
ARC MPI JOB
Use the same helloworld example as in the SLURM MPI job; in this section the source file is saved as hellompi.c.
Prepare the job description file and save it as hellompi.xrsl:
&
(count = 4)
(jobname = "test-mpi")
(inputfiles =
  ("hellompi.sh" "")
  ("hellompi.c" "")
)
(outputfiles =
  ("result-mpi.txt" "")
)
(executable = "hellompi.sh")
(stdout = "hellompi.log")
Prepare the executable hellompi.sh:
#!/bin/bash
echo "Compiling example"
mpicc -o hello hellompi.c
echo "Done."
echo "Running example:"
mpirun --mca btl tcp,self -np 4 $PWD/hello > result-mpi.txt
echo "Done."
Send job:
arcsub -c nsc.ijs.si hellompi.xrsl
Retrieve results:
arcget <JOBID>
MULTIPLE JOB SUBMISSION
Executable:
cat run.sh
#!/bin/bash
hostname
Job template:
#!/usr/bin/python
import os, sys

jobDescription = '''&(executable=run.sh)
(cpuTime='5 minutes')
(memory="2000")
(count = 1)
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(jobName=job%04d)'''
Submission script submit.py:
#!/usr/bin/python
import os, sys

jobDescription = '''&(executable=run.sh)
(cpuTime='5 minutes')
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(jobName=job%04d)'''

totalJobs = 4

for i in range(totalJobs):
    # Remove newlines from jobDescription and convert
    # to a single-line string for use with arcsub
    jobDescriptionString = "".join(jobDescription.split("\n"))
    os.system('arcsub -c nsc.ijs.si -S org.nordugrid.gridftpjob '
              '-o joblist.xml --jobdescrstring="%s"'
              % (jobDescriptionString % i))
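A possible way to run the submission script and follow the resulting jobs, reusing the arcproxy, arcstat and arcget commands from the earlier ARC examples:

# Make sure a valid proxy exists
arcproxy

# Submit the four jobs described by submit.py
python submit.py

# Follow their status and fetch the output of a finished job
arcstat -a
arcget <JOBID>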
SINGULARITY
SINGULARITY TEST JOB WITH SLURM
cat singularity_test.sh

#!/bin/bash
#SBATCH -J singularity_test
#SBATCH -o singularity_test.out
#SBATCH -e singularity_test.err
#SBATCH -p gridlong
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem=4000

# Singularity command line options
singularity exec /net/hold/data1/singularity-images/centos7.sif cat /etc/os-release
sbatch singularity_test.sh
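After the job completes, its standard output and error end up in the files named in the #SBATCH directives; a quick check:

# Contents of /etc/os-release as seen inside the CentOS 7 container
cat singularity_test.out

# Any error messages from the container run
cat singularity_test.err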
Run a TensorFlow model:
git clone https://github.com/tensorflow/models.git

srun --constraint=gpu singularity exec --nv \
  docker://tensorflow/tensorflow:latest-gpu \
  python models/tutorials/image/mnist/convolutional.py
Based on all of the previous examples, write down how you would run both a SLURM job and an ARC job for the same TensorFlow example.
The TensorFlow models are available at /net/hold/data1/arc/software/models
A Singularity container is available at /net/hold/data1/singularity/tensorflow-latest.sif