VZD1-SLING
Contents
Exercises for the SLING workshop
LOGIN NODE
HOSTNAME: nsc-login.ijs.si
CREDENTIALS: provided at the workshop
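To connect to the login node, use SSH with the account from your workshop credentials (a minimal example; replace <username> with the username you were given):

ssh <username>@nsc-login.ijs.si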
SLURM
SIMPLE TEST JOBS
# Example 1
srun hostname

# Example 2
srun -N 2 hostname

# Example 3
srun -n 2 hostname

# Example 4
srun -N 2 -n 2 hostname

# Example 5
srun -N 2 -n 4 hostname
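As a reminder of what these flags request (a short annotated recap; the commands are the same as above):

# -N <nodes>  : number of nodes to allocate
# -n <ntasks> : number of tasks (processes) to launch
srun -N 2 -n 4 hostname   # launches 4 tasks spread across 2 nodes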
SBATCH JOB
Create a simple script myscript.sh:
#!/bin/bash
hostname
sleep 60
sbatch --partition=gridlong --job-name=test --mem=2G --time=10:00 \
  --output=test.log myscript.sh
is the same as:
sbatch -p gridlong -J test --mem=2G -t 10:00 -o test.log myscript.sh
and the same as:
#!/bin/bash
#SBATCH --partition=gridlong
#SBATCH --job-name=test
#SBATCH --mem=2G
#SBATCH --time=10:00
#SBATCH --output=test.log
sh myscript.sh
Get hostname with sbatch script:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=result.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=2000
srun hostname
srun sleep 60
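After submitting any of the scripts above, the job can be monitored with the standard SLURM commands (a short sketch; <script.sh> and <jobid> are placeholders, <jobid> is the ID printed by sbatch):

sbatch <script.sh>      # prints "Submitted batch job <jobid>"
squeue -u $USER         # list your pending and running jobs
scancel <jobid>         # cancel a job if needed
cat result.txt          # output of the script above, once it finishes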
SLURM MPI JOB
Create helloworld.c:
/* C Example */
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[])
{
int rank, size;
MPI_Init (&argc, &argv); /* starts MPI */
MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
printf( "Hello world from process %d of %d\n", rank, size );
MPI_Finalize();
return 0;
}
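Before writing the batch script, the example can be compiled and sanity-checked interactively (a quick sketch using the same module and compiler as the batch script below; module names may differ on other clusters):

module load mpi                    # load the MPI toolchain
mpicc helloworld.c -o hello.mpi    # compile the example
srun -n 2 --mpi=pmix hello.mpi     # quick test with 2 tasks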
Create an sbatch script for this MPI job that requests 4 cores and 100 MB of memory per core. The job walltime should be 10 minutes.
Create an executable hellompi.sh:
#!/bin/bash
#
#SBATCH --job-name=test-mpi
#SBATCH --output=result-mpi.txt
#SBATCH --ntasks=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
module load mpi
mpicc helloworld.c -o hello.mpi
srun --mpi=pmix hello.mpi
And run it:
sbatch hellompi.sh
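When the job finishes, result-mpi.txt should contain one line per MPI task (a short sketch of how to check it):

squeue -u $USER        # wait until the job leaves the queue
cat result-mpi.txt     # expect one "Hello world from process i of 4" line per task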
Run multiple parallel jobs sequentially
#!/bin/bash
#SBATCH -t 01:00:00
#SBATCH -n 14
#SBATCH -c 2

# Load the MPI module
module load mpi

# Compile the example
mpicc helloworld.c -o hello.mpi

# Run an executable, copy the output somewhere else, then run the next one, copy again...
mkdir /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput1 2>&1
cp myoutput1 /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput2 2>&1
cp myoutput2 /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput3 2>&1
cp myoutput3 /home/<username>/test
Run multiple parallel jobs simultaneously
#!/bin/bash
#SBATCH -t 02:00:00
#SBATCH -n 12

# Load the MPI module
module load mpi

# Compile the example
mpicc helloworld.c -o hello.mpi

# And run the job steps in parallel
srun --mpi=pmix -n 2 --cpu_bind=cores hello.mpi > a.out 2>&1 &
srun --mpi=pmix -n 4 --cpu_bind=cores hello.mpi > b.out 2>&1 &
srun --mpi=pmix -n 6 --cpu_bind=cores hello.mpi > c.out 2>&1 &
wait
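The three job steps above request 2 + 4 + 6 = 12 tasks, exactly filling the 12 tasks allocated with -n 12. Once the script finishes, the per-step outputs can be compared (assuming the output file names above):

cat a.out b.out c.out   # 2, 4 and 6 "Hello world" lines, respectively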
ARC
ARC SIMPLE TEST JOB
Simple test job, save it as test.xrsl:
&
(executable = /usr/bin/env)
(jobname = "test")
(stdout = test.log)
(join = yes)
(gmlog = log)
(memory = 1000)
Send the job:
arcproxy
arcsub -c nsc.ijs.si test.xrsl
Check the status of the job:
arcstat <JOBID>
Get the results:
arcget <JOBID>
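While the job is running, the other standard ARC client commands can be useful as well (a brief sketch; all of them take the job ID printed by arcsub):

arccat <JOBID>    # print the job's stdout while it is running
arckill <JOBID>   # cancel the job
arcclean <JOBID>  # remove a finished job from the cluster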
ARC MPI JOB
Use the same helloworld example as in the SLURM MPI job example.
Prepare the job description file:
&
(count = 4)
(jobname = "test-mpi")
(inputfiles =
("hellompi.sh" "")
("hellompi.c" "") )
(outputfiles = ("result-mpi.txt" "")
)
(executable = "hellompi.sh")
(stdout = "hellompi.log")
Prepare the executable hellompi.sh:
#!/bin/bash
echo "Compiling example"
mpicc -o hello helloworld.c
echo "Done."
echo "Running example:"
mpirun --mca btl tcp,self -np 4 $PWD/hello > result-mpi.txt
echo "Done."
Send job:
arcsub -c nsc.ijs.si hellompi.xrsl
Retrieve results:
arcget <JOBID>
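arcget downloads the job's output files into a local directory named after the job ID (a sketch; <JOBID-directory> is a placeholder for that directory name):

ls <JOBID-directory>/                    # should contain result-mpi.txt and hellompi.log
cat <JOBID-directory>/result-mpi.txt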
MULTIPLE JOBS SUBMISSION
Executable:
cat run.sh
#!/bin/bash
hostname
Job template:
#!/usr/bin/python
import os, sys
jobDescription = '''&(executable=run.sh)
(cpuTime='5 minutes')
(memory="2000")
(count = 1)
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(jobName=job%04d)'''
Submit script submit.py:
#!/usr/bin/python
import os, sys
jobDescription = '''&(executable=run.sh)
(cpuTime='5 minutes')
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(jobName=job%04d)'''
totalJobs = 4

for i in range(totalJobs):
    # Remove newlines from jobDescription and convert
    # it to a single string for use with arcsub
    jobDescriptionString = "".join(jobDescription.split("\n"))
    os.system('arcsub -c nsc.ijs.si -S org.nordugrid.gridftpjob '
              '-o joblist.xml --jobdescrstring="%s"'
              % (jobDescriptionString % i))
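A possible way to run the bulk submission (a sketch; it assumes a valid proxy and that run.sh and submit.py are in the current directory, and the exact arcstat options may vary between ARC client versions):

arcproxy             # make sure a valid proxy exists
python submit.py     # submits job0000 ... job0003
arcstat -a           # check the status of all your jobs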
SINGULARITY
SINGULARITY TEST JOB with SLURM
cat singularity_test.sh

#!/bin/bash
#SBATCH -J singularity_test
#SBATCH -o singularity_test.out
#SBATCH -e singularity_test.err
#SBATCH -p gridlong
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem=4000

# Singularity command line options
singularity exec /net/hold/data1/singularity-images/centos7.sif cat /etc/os-release
sbatch singularity_test.sh
Run a Tensorflow model:
git clone https://github.com/tensorflow/models.git
srun --constraint=gpu singularity exec --nv \
  docker://tensorflow/tensorflow:latest-gpu \
  python models/tutorials/image/mnist/convolutional.py
Based on all the previous examples, write down how you would run a SLURM job and an ARC job for the same Tensorflow task.
Tensorflow models are available at /net/hold/data1/arc/software/models
A Singularity container is available at /net/hold/data1/singularity/tensorflow-latest.sif
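One possible sketch of the SLURM part of this exercise, using the container and model paths given above (the job name, partition, memory, walltime and exact model script path are assumptions, not a reference solution):

#!/bin/bash
#SBATCH -J tensorflow-test
#SBATCH -o tensorflow-test.out
#SBATCH -p gridlong
#SBATCH -t 01:00:00
#SBATCH --mem=4000
#SBATCH --constraint=gpu

singularity exec --nv /net/hold/data1/singularity/tensorflow-latest.sif \
  python /net/hold/data1/arc/software/models/tutorials/image/mnist/convolutional.py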

