Comtrade Workshop, 15 March 2017
This is an ARC workshop. Grid jobs will be tested on the Arnes cluster Jost using the ARC middleware, which is supported on most clusters in Sling.
Contents
Install the ARC client
The ARC client is available for most Linux distributions and for macOS. It works only partially on Windows, where only the HTTPS protocol is supported. See this howto to install the client on your machine, then follow the instructions here.
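For example, on a Debian or Ubuntu system the client is typically installed from the distribution or NorduGrid repositories (the package names below are the usual NorduGrid ones; follow the howto for your platform):

# Debian/Ubuntu
sudo apt-get install nordugrid-arc-client

# RHEL/CentOS (with the EPEL or NorduGrid repository enabled)
sudo yum install nordugrid-arc-client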
Install your certificate
Request your certificate on the SiGNET webpage.
Install the certificate on your computer by using this script. To install the certificate manually, follow the instructions below:
mkdir ~/.arc
openssl pkcs12 -in certificate.p12 -clcerts -nokeys -out usercert.pem
openssl pkcs12 -in certificate.p12 -nocerts -out userkey.pem
chmod 400 userkey.pem
chmod 644 usercert.pem
mv user*.pem ~/.arc
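As an optional sanity check (plain openssl, assuming the files were moved to ~/.arc as above), you can display the subject and validity dates of the installed certificate:

openssl x509 -in ~/.arc/usercert.pem -noout -subject -dates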
Virtual organization and authorization
To access the grid, each user has to be a member of a virtual organization (VO). A virtual organization assigns roles to users and defines usage policy. Different clusters support different VOs. In Sling, all clusters support the national VO, called gen.vo.sling.si.
If you already have your own SiGNET certificate, join the gen.vo.sling.si VO on this webpage: https://voms.sling.si:8443/voms/gen.vo.sling.si
User interface
You can use the ARC client on your own computer, or you can use the client installed on a virtual machine provided by Arnes for this workshop. The username and password will be assigned to you by the staff.
Connect to the virtual machine using your credentials:
ssh demo$x@308.ablak.arnes.si    # $x = 001..020
Save your certificate and key to the ~/.arc/ folder, as described at the beginning of this workshop:
/home/<username>/.arc/usercert.pem
/home/<username>/.arc/userkey.pem
ARC client settings
In order to make the submission process as easy as possible, settings can be saved in ~/.arc/client.conf. To use ARC at Arnes, use the following configuration:
vi .arc/client.conf

[common]
keypath=/home/<username>/.arc/userkey.pem
certificatepath=/home/<username>/.arc/usercert.pem

[computing/jost]
url=ldap://jost.arnes.si:2135
infointerface=org.nordugrid.ldapng
submissioninterface=org.nordugrid.gridftpjob
If you want to use HTTPS instead of the GridFTP protocol, use these settings:
[computing/jost]
url=https://jost.arnes.si:6000/arex
infointerface=org.ogf.glue.emies.resourceinfo
submissioninterface=org.ogf.glue.emies.activitycreation
default=yes
You can check other possible settings here.
To specify the protocol while submitting the job, use the -S switch:
# to use the GridFTP protocol
arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob test.xrsl

# to use the HTTPS protocol
arcsub -c jost.arnes.si -S org.ogf.glue.emies.activitycreation test.xrsl
To see the supported protocols on the cluster, use arcinfo:
arcinfo -c jost.arnes.si
gen.vo.sling.si settings
mkdir -p ~/.arc/vomses/
cat <<end > ~/.arc/vomses/gen.vo.sling.si-voms.sling.si
"gen.vo.sling.si" "voms.sling.si" "15001" \
"/C=SI/O=SiGNET/O=SLING/CN=voms.sling.si" "gen.vo.sling.si"
end

mkdir -p ~/.arc/vomsdir
cat <<end > ~/.arc/vomsdir/gen.vo.sling.si
/C=SI/O=SiGNET/O=SLING/CN=voms.sling.si
/C=SI/O=SiGNET/CN=SiGNET CA
end
If your certificate was installed successfully, you are ready to go.
Useful commands
arcproxy                   # create a proxy
arcproxy -S gen.vo.sling.si
arcsub                     # send a grid job and its input files to the cluster
arcsub -c jost.arnes.si test.xrsl
arcstat                    # check the status of your job
arcstat <JOBID>
arcstat -a                 # all jobs
arccat                     # check the current output of a job (stderr, gm.log)
arcget                     # retrieve the results
arcget -a                  # all jobs
arcget <JOBID>             # a single job by ID
arcget -c jost.arnes.si    # all jobs running on Jost
arcget -S FINISHED         # all finished jobs
arckill                    # cancel a job
arckill <JOBID>
arcls                      # list directories on storage
arcrenew                   # renew your proxy (while it is still active)
arcsync                    # synchronize your job list with the list on the server
arccp                      # copy files to/from external storage
arcrm                      # remove a file from storage
Some useful examples:
$ arcproxy -S gen.vo.sling.si                  # create a proxy for gen.vo.sling.si
$ arcproxy -I                                  # check proxy information
$ arcinfo jost.arnes.si                        # check cluster status
$ arcsub -c jost.arnes.si test.xrsl            # send a test job to the cluster
$ arcsub -c jost.arnes.si test.xrsl -d DEBUG   # submit in debug mode
$ arcstat JOBID or arcstat -a                  # check job status
$ arccat JOBID or arccat -a                    # check current job output
$ arcget JOBID or arcget -a                    # retrieve the results
Exercises
Simple grid job submission
Before running your grid jobs on the cluster, some general information about the worker nodes is useful. This test job prints the environment variables on the worker node where your job is executed.
test.xrsl
&
(executable = /usr/bin/env)
(jobname = "test")
(stdout = test.log)
(join = yes)
(walltime = 5)
(gmlog = log)
Send the test job to the cluster:
arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob test.xrsl
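arcsub prints a job ID on success. A typical follow-up, using the commands from the list above (<JOBID> is a placeholder for the printed ID):

arcstat <JOBID>    # wait until the job reaches the FINISHED state
arccat <JOBID>     # look at the job output while it is running
arcget <JOBID>     # download the results, including test.log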
The same test job can also be submitted with the arctest command:
arctest -c jost.arnes.si -J 2
The arctest command is used for basic testing of the ARC client and server.
- Test job 1: calculates prime numbers for a given number of minutes (-r 5) and writes the result to stderr. The source is downloaded from an HTTP/FTP server and the program is compiled before running (see the example after this list).
- Test job 2: lists all environment variables at the worker node
- Test job 3: copies a file from HTTP into a local file
- arctest --certificate prints basic information about your certificate
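For example (options as described in the list above; -J selects the test job and -r its runtime in minutes):

arctest -c jost.arnes.si -J 1 -r 5    # test job 1, about 5 minutes of prime number calculation
arctest --certificate                 # print basic information about your certificate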
Job with input files
First, let’s create two input files, file1 and file2:
echo "This is file 1" >> file1 echo "This is file 2" >> file2
Then create a bash script file.sh using those two input files:
#!/bin/bash
cat file1 file2 > file3
Now, let’s write a description file (file.xrsl) for this job:
& (executable="file.sh") (jobName=input) (inputFiles=("file1" "")("file2" "")("file.sh" "")) (outputFiles=("file3" "")) (stdout=stdout) (stderr=stderr) (walltime="5 minutes") (count=1)
Send the job to the cluster, using debug mode:
arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob file.xrsl -d DEBUG
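When the job finishes, retrieve it and check file3; arcget stores the results in a directory named after the job ID (placeholders below):

arcget <JOBID>
cat <job-directory>/file3    # should contain "This is file 1" and "This is file 2"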
Job with software
This is an example of a simple program that is sent to the cluster together with the job.
Job description sum.xrsl
& (executable="sum.sh") (inputfiles= ("sum.sh" "sum.sh") ("sum.py" "sum.py") ) (outputfiles=("sum.out" " ") ) (stdout="out.txt") (stderr="err.txt") (gmlog="sum.log") (jobName="test-calculation") (runtimeenvironment = "APPS/COMTRADE/DEFAULT")
Program sum.py
sum = 0
print "Display numbers: "
for x in ["1", "1050", "164999"]:
    print x
print "Calculate the numbers "
for y in [1, 1050, 164999]:
    sum = sum + y
print sum
Execution script sum.sh:
#!/bin/sh
python sum.py > sum.out
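The example above does not show the submission step, so submit sum.xrsl the same way as the previous jobs:

arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob sum.xrsl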
RTE
Programs are installed on the cluster on shared storage and can be used in your job by specifying the runtime environment (RTE). To see the available runtime environments on the cluster, use this command:
ldapsearch -x -h jost.arnes.si -p 2135 -b 'Mds-Vo-name=local,o=grid' \
  | grep nordugrid-cluster-runtimeenvironment
They are also displayed on the grid monitor, see www-old.sling.si/gridmonitor/loadmon.php and click on the cluster.
We will test jobs with RTE requirements in the next two exercises, where the MPI environment is specified for the job.
Remember, you can also specify multiple runtime environments in the same job description.
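As a minimal sketch (the file name multi-rte.xrsl is made up here, and the RTE names are simply the ones used elsewhere in this workshop), repeated runtimeenvironment attributes are listed one after another and the job will only match clusters that provide all of them:

cat > multi-rte.xrsl <<'end'
&
(executable = /usr/bin/env)
(jobname = "multi-rte")
(stdout = test.log)
(join = yes)
(walltime = 5)
(runtimeenvironment = "APPS/COMTRADE/DEFAULT")
(runtimeenvironment = "APPS/COMTRADE/OPENMPI-2.0.2")
end
arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob multi-rte.xrsl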
Parallel job with OpenMP
First we will run a hello-world OpenMP job on a single server. We will use 8 threads to run the program.
We start with the program hello-omp.c:
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    int nthreads, tid;

    /* Fork a team of threads, giving each its own copies of the variables */
    #pragma omp parallel private(nthreads, tid)
    {
        /* Obtain the thread number */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only the master thread does this */
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }   /* All threads join the master thread and disband */

    return 0;
}
Then we need an execution script hello-omp.sh:
#!/bin/bash
export OMP_NUM_THREADS=8
mpicc -fopenmp hello-omp.c -o hellomp
mpirun -np 1 ./hellomp > hello-omp.out
Now we need a description file hello-omp.xrsl:
& (executable="hello-omp.sh") (environment= ( "OMP_NUM_THREADS" "8" ) ) (count = 8) (contpernode = 8) (inputfiles= ("hello-omp.sh" "hello-omp.sh") ("hello-omp.c" "hello-omp.c") ) (outputfiles=("hello-omp.out" " ") ) (stdout="out.txt") (stderr="err.txt") (gmlog="hello-omp.log") (jobName="hello-omp") (runtimeenvironment = "APPS/COMTRADE/OPENMPI-2.0.2")
This is the result of the job:
Hello World from thread = 3
Hello World from thread = 5
Hello World from thread = 7
Hello World from thread = 6
Hello World from thread = 1
Hello World from thread = 2
Hello World from thread = 4
Hello World from thread = 0
Number of threads = 8
Parallel job with MPI
Job description hellompi.xrsl:
&
(count = 4)
(jobname = "hellompi")
(inputfiles =
  ("hellompi.sh" "")
  ("hellompi.c" "")
)
(outputfiles = ("hellompi.out" ""))
(executable = "hellompi.sh")
(stdout = "hellompi.log")
(join = yes)
(walltime = "15 minutes")
(gmlog = log)
(memory = 2000)
(runtimeenvironment = "APPS/COMTRADE/OPENMPI-2.0.2")
Program hellompi.c:
/* C Example */
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int rank, size;

    MPI_Init (&argc, &argv);                  /* start MPI */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);    /* get the current process id */
    MPI_Comm_size (MPI_COMM_WORLD, &size);    /* get the number of processes */
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Execution script hellompi.sh:
#!/bin/bash
date
hostname
echo "Compiling example"
mpicc -o hello hellompi.c
echo "Done."
echo "Running example:"
mpiexec -np 1 ${PWD}/hello > hellompi.out
echo "Done."
date
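The submission step is the same as before:

arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob hellompi.xrsl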
Additional exercise: there is more. Try the same job using the InfiniBand network, and specify that the program should use 2 cores per node.
Advanced exercises
Massive job submission and Arcrunner
The first example of massive job submission works as follows. We need an xRSL template, which will be embedded in the submission script:
#!/usr/bin/python
import os, sys

jobDescription = '''&(executable=run.sh)
(cpuTime='5 minutes')
(memory="2000")
(count = 1)
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(runtimeenvironment=APPS/COMTRADE/DEFAULT)
(jobName=job%04d)'''
This is the Python script that submits the jobs: submit.py
#!/usr/bin/python
import os, sys

jobDescription = '''&(executable=run.sh)
(cpuTime='5 minutes')
(stdout=stdout.txt)
(stderr=stderr.txt)
(inputFiles=('run.sh' ''))
(runtimeenvironment=APPS/COMTRADE/DEFAULT)
(jobName=job%04d)'''

totalJobs = 4

for i in range(totalJobs):
    # Remove newlines from jobDescription and convert it
    # to a single-line string for use with arcsub
    jobDescriptionString = "".join(jobDescription.split("\n"))
    os.system('arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob '
              '-o joblist.xml --jobdescrstring="%s"'
              % (jobDescriptionString % i))
The jobName is different for each job: job0000, job0001, and so on, up to the total number of jobs.
totalJobs is set to 4, so 4 grid jobs will be submitted.
To submit all the jobs, we use a for loop:
for i in range(totalJobs):
The xRSL description is collapsed into a single-line string, which is passed to the arcsub command:
jobDescriptionString = "".join(jobDescription.split("\n"))
The script saves the job IDs to the joblist.xml file (the -o option of arcsub), so the job status can then be monitored with: arcstat -i joblist.xml
Submit the jobs to the system:
os.system('arcsub -c jost.arnes.si --jobdescrstring="%s"'
          % (jobDescriptionString % i))
The % operator inserts the job number into the jobName pattern (job%04d), so each job gets a unique name.
run.sh:
#!/bin/sh echo "This is a massive job submission test."
We can submit the jobs:
python submit.py
Check the status:
arcstat -i joblist.xml
Download the results:
arcget -i joblist.xml
ARCRUNNER
CSC in Finland wrote a simple submission script called Arcrunner, which handles massive job submission, monitors the jobs, and retrieves the results when the jobs are finished. It can be downloaded from here.
First unzip the program:
unzip arcrunner.zip
cd arcrunner/bin
Change the jobmanager path to the location where you extracted the arcrunner file:
set jobmanagerpath=("~/arcrunner")
Then add it to your PATH:
export PATH=~/arcrunner/bin:$PATH
The minimal command needed to run it is:
arcrunner -xrsl job_descriptionfile.xrsl
These are the options:
arcrunner options:

  -xrsl file_name   The common xRSL file that defines the jobs.
  -R file_name      A text file containing the names of the clusters to be used.
  -W integer        The maximum number of jobs waiting to run in the grid.
  -Q integer        The maximum time a job stays in a queue before being resubmitted.
  -S integer        The maximum time a job stays in the submitted state before being resubmitted.
  -J integer        The maximum number of jobs running simultaneously in the grid.
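A typical invocation combining these options could look like this (the file name and the limits are illustrative only):

# submit the jobs described by sum.xrsl, with at most 10 running and 20 waiting at a time
arcrunner -xrsl sum.xrsl -J 10 -W 20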
GPGPU
Sending a helloworld job using CUDA.
First we need the program hello.cu:
// This is the REAL "hello world" for CUDA!
// It takes the string "Hello ", prints it, then passes it to CUDA with an array
// of offsets. Then the offsets are added in parallel to produce the string "World!"
// By Ingemar Ragnemalm 2010

#include <stdio.h>
#include <stdlib.h>

const int N = 7;
const int blocksize = 7;

__global__ void hello(char *a, int *b)
{
    a[threadIdx.x] += b[threadIdx.x];
}

int main()
{
    char a[N] = "Hello ";
    int b[N] = {15, 10, 6, 0, -11, 1, 0};

    char *ad;
    int *bd;
    const int csize = N * sizeof(char);
    const int isize = N * sizeof(int);

    printf("%s", a);

    cudaMalloc((void**)&ad, csize);
    cudaMalloc((void**)&bd, isize);
    cudaMemcpy(ad, a, csize, cudaMemcpyHostToDevice);
    cudaMemcpy(bd, b, isize, cudaMemcpyHostToDevice);

    dim3 dimBlock(blocksize, 1);
    dim3 dimGrid(1, 1);
    hello<<<dimGrid, dimBlock>>>(ad, bd);
    cudaMemcpy(a, ad, csize, cudaMemcpyDeviceToHost);
    cudaFree(ad);
    cudaFree(bd);

    printf("%s\n", a);
    return EXIT_SUCCESS;
}
We need a script hellocuda.sh to run the program:
#!/bin/bash
nvcc hello.cu -o helloworld
./helloworld > hellocuda.out
Now we prepare a job description file cuda.xrsl:
& (jobname = "hellocuda") (inputfiles = ("hellocuda.sh" "") ("hello.cu" "") ) (outputfiles = ("hellocuda.out" "") ) (executable = "hellocuda.sh") (stdout = "hellocuda.log") (join = yes) (walltime = "15 minutes") (gmlog = log) (memory = 2000) (runtimeenvironment = "APPS/COMTRADE/GPU")
Grid job in Singularity container
RTEs on the grid solve many scientific problems, but they are limited. We have implemented lightweight virtualization on the cluster, so that other operating systems are also supported and grid users have more flexibility.
We will run a test job on the cluster in a Singularity container. Let’s create a simple bash script, container.sh, to check the environment inside the container:
#!/bin/bash
cat /etc/lsb-release
env
And now continue with your job description:
& (executable="container.sh") (jobName=TEST123) (inputFiles=("container.sh" "")) (stdout=stdout) (stderr=stderr) (walltime="5 minutes") (count=1)
Send the job to the cluster and see the results.
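A minimal end-to-end sketch (assuming the description above was saved as container.xrsl; the file name is not given in the text):

arcsub -c jost.arnes.si -S org.nordugrid.gridftpjob container.xrsl
arcstat <JOBID>              # wait for the FINISHED state
arcget <JOBID>
cat <job-directory>/stdout   # shows the OS release and environment inside the container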