VZD1-SLING

Exercises for the SLING workshop

LOGIN NODE

HOSTNAME: nsc-login.ijs.si

CREDENTIALS: provided at the workshop
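
Log in over ssh with the provided username and password (<username> stands for the workshop account):

ssh <username>@nsc-login.ijs.si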

SLURM

SIMPLE TEST JOBS

# Example 1
srun hostname
# Example 2
srun -N 2 hostname
# Example 3
srun -n 2 hostname
# Example 4
srun -N 2 -n 2 hostname
# Example 5
srun -N 2 -n 4 hostname
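
In these examples -N sets the number of nodes and -n the number of tasks. To see which task produced which line, srun's --label option prefixes every output line with the task number:

srun -N 2 -n 4 --label hostname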

SBATCH JOB

Create a simple script myscript.sh:

#!/bin/bash
hostname
sleep 60

and submit it:

sbatch --partition=gridlong --job-name=test --mem=2G --time=10:00 \
--output=test.log myscript.sh

This is the same as:

sbatch -p gridlong -J test --mem=2G -t 10:00 -o test.log myscript.sh

and the same as putting the options into a wrapper script and submitting it with a plain sbatch call:

#!/bin/bash
#SBATCH --partition=gridlong
#SBATCH --job-name=test
#SBATCH --mem=2G
#SBATCH --time=10:00
#SBATCH --output=test.log
sh myscript.sh
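
Whichever form is used, the job can be monitored with the standard SLURM client commands (replace <JOBID> with the job ID printed by sbatch):

squeue -u $USER             # list your pending and running jobs
scontrol show job <JOBID>   # detailed information about a single job
scancel <JOBID>             # cancel the job if needed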

Get the hostname with an sbatch script (save it, for example, as hostname.sh):

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=result.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=2000
srun hostname
srun sleep 60
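
Submit the script and, once the job finishes, the node name appears in result.txt (hostname.sh is the file name suggested above):

sbatch hostname.sh
cat result.txt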

SLURM MPI JOB

Create helloworld.c:

/* C Example */
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  int rank, size;

  MPI_Init (&argc, &argv);      /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);        /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);        /* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}

Create an sbatch script for this MPI job; request 4 tasks and 100 MB of memory per core. The job walltime should be 10 minutes.

Create an executable hellompi.sh:

#!/bin/bash
#
#SBATCH --job-name=test-mpi 
#SBATCH --output=result-mpi.txt
#SBATCH --ntasks=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100 
module load mpi
mpicc helloworld.c -o hello.mpi
srun --mpi=pmix hello.mpi

And run it:

sbatch hellompi.sh
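
If everything worked, result-mpi.txt contains one line per MPI task, in the format printed by helloworld.c (line order may vary):

cat result-mpi.txt
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4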

RUN MULTIPLE PARALLEL JOBS SEQUENTIALLY

#!/bin/bash

#SBATCH -t 01:00:00
#SBATCH -n 14
#SBATCH -c 2

#load mpi module
module load mpi
#compile example
mpicc helloworld.c -o hello.mpi
# copy the output somewhere else and then run another executable, copy again...
mkdir /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput1 2>&1
cp myoutput1 /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput2 2>&1
cp myoutput2 /home/<username>/test
srun --mpi=pmix hello.mpi > myoutput3 2>&1
cp myoutput3 /home/<username>/test
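
The three srun/cp pairs can also be written as a short loop with the same behaviour (paths and file names as in the script above):

for i in 1 2 3; do
    srun --mpi=pmix hello.mpi > myoutput$i 2>&1
    cp myoutput$i /home/<username>/test
done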

RUN MULTIPLE PARALLEL JOBS SIMULTANEOUSLY

#!/bin/bash
#SBATCH -t 02:00:00
#SBATCH -n 12 

# Load mpi module
module load mpi

# Compile the example
mpicc helloworld.c -o hello.mpi

# And run the jobs
srun --mpi=pmix -n 2 --cpu_bind=cores hello.mpi > a.out 2>&1 &
srun --mpi=pmix -n 4 --cpu_bind=cores hello.mpi > b.out 2>&1 &
srun --mpi=pmix -n 6 --cpu_bind=cores hello.mpi > c.out 2>&1 & 

wait
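
The trailing & puts each job step in the background and wait keeps the batch job alive until all of them finish; note that the task counts add up to the 12 tasks requested with -n. After the job completes, a quick sanity check is to count the "Hello world" lines in each output file; the counts should be 2, 4 and 6:

grep -c "Hello world" a.out b.out c.out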

ARC

ARC SIMPLE TEST JOB

Simple test job, save it as test.xrsl:

&
(executable = /usr/bin/env) 
(jobname = "test") 
(stdout=test.log) 
(join=yes)
(gmlog=log)
(memory=1000)

Send the job:

arcproxy 
arcsub -c nsc.ijs.si test.xrsl

Check the status of the job:

arcstat <JOBID>

Get the results:

arcget <JOBID>
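
A few other ARC client commands that are handy during the exercises:

arccat <JOBID>       # print the job's stdout
arckill <JOBID>      # cancel the job
arcclean <JOBID>     # remove a finished job from the cluster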

ARC MPI JOB

Use the same helloworld example as in the SLURM MPI job example.

Prepare the job description file:

&
(count = 4)
(jobname = "test-mpi") 
(inputfiles =
("hellompi.sh" "")
("hellompi.c" "") )
(outputfiles = ("result-mpi.txt" "")
)
(executable = "hellompi.sh")
(stdout = "hellompi.log")

Prepare the executable hellompi.sh:

#!/bin/bash 
echo "Compiling example"
mpicc -o hello helloworld.c
echo "Done."

echo "Running example:"
mpirun --mca btl tcp,self -np 4 $PWD/hello > result-mpi.txt
echo "Done."

Send job:

arcsub -c nsc.ijs.si hellompi.xrsl

Retrieve results:

arcget <JOBID>
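
arcget downloads the job's files into a directory named after the job ID; the collected output can then be inspected there (the directory name below is a placeholder):

cd <JOBID>
cat result-mpi.txt hellompi.log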

MULTIPLE JOBS SUBMISSION

Executable:

cat run.sh

#!/bin/bash
hostname

Job template (the %04d placeholder is filled in with the job number at submission time):

&(executable=run.sh)
(cpuTime='5 minutes')
(memory="2000")
(count = 1)
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(jobName=job%04d)

Submit script submit.py:

#!/usr/bin/python

import os, sys

jobDescription = '''&(executable=run.sh)
(cpuTime='5 minutes')
(memory="2000")
(count = 1)
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(jobName=job%04d)'''

totalJobs = 4

for i in range(totalJobs):
    # Remove newlines from jobDescription so it can be passed
    # to arcsub as a single string
    jobDescriptionString = "".join(jobDescription.split("\n"))
    os.system('arcsub -c nsc.ijs.si -S org.nordugrid.gridftpjob '
              '-o joblist.xml --jobdescrstring="%s"' % (jobDescriptionString % i))
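
Run the script, then use the job list file written by arcsub (-o joblist.xml) to check and retrieve all jobs at once with the -j/--joblist option of the arc* tools:

python submit.py
arcstat -j joblist.xml
arcget -a -j joblist.xml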

SINGULARITY

SINGULARITY TEST JOB WITH SLURM

cat singularity_test.sh

#!/bin/bash
#SBATCH -J singularity_test
#SBATCH -o singularity_test.out
#SBATCH -e singularity_test.err
#SBATCH -p gridlong
#SBATCH -t 0-00:30
#SBATCH -N 1
#SBATCH -c 1
#SBATCH --mem=4000
# Run a command inside the container image
singularity exec /net/hold/data1/singularity-images/centos7.sif cat /etc/os-release

And submit it:

sbatch singularity_test.sh
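
For interactive exploration of the same image, a shell can be opened inside the container through SLURM (a minimal sketch; resource options omitted):

srun --pty singularity shell /net/hold/data1/singularity-images/centos7.sif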

Run a TensorFlow model:

git clone https://github.com/tensorflow/models.git 
srun --constraint=gpu singularity exec --nv \
docker://tensorflow/tensorflow:latest-gpu \
python models/tutorials/image/mnist/convolutional.py

Based on all the previous examples, write down how you would run the same TensorFlow job as both a SLURM job and an ARC job.

The TensorFlow models are available at /net/hold/data1/arc/software/models

The Singularity container is available at /net/hold/data1/singularity/tensorflow-latest.sif
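
As a possible starting point for the SLURM part of the exercise, a batch script along these lines could be used; the job name, resources and the exact path of the tutorial script inside the models directory are assumptions, and the GPU options are copied from the srun example above:

#!/bin/bash
#SBATCH -J tf-test
#SBATCH -o tf-test.out
#SBATCH -t 01:00:00
#SBATCH -n 1
#SBATCH --mem=4000
#SBATCH --constraint=gpu
# Run the MNIST tutorial from the provided models directory inside the container
singularity exec --nv /net/hold/data1/singularity/tensorflow-latest.sif \
    python /net/hold/data1/arc/software/models/tutorials/image/mnist/convolutional.py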