Simple tutorial for Grid job management

by Alberto Colla last modified 2013-11-23 18:02


Long-term proxy generation

The following procedure, which I have verified to work, generates long-term proxies:

  • myproxy-init -d -n
  • voms-proxy-init --voms virgo

The MyProxy credential lasts for one week and is the one used to run the job. The user proxy created by voms-proxy-init can expire, or be destroyed and recreated, without affecting job execution.

I have also verified that if the MyProxy credential is renewed before it expires, the jobs keep running.
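
A minimal sketch of how the remaining lifetime of the user proxy could be checked before recreating it. It assumes that voms-proxy-info --timeleft prints the remaining time in seconds; the helper name needs_renewal and the one-hour threshold are illustrative choices, not part of the recipe above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when fewer than min_left seconds remain.
needs_renewal() {
    left=$1
    min_left=${2:-3600}   # illustrative threshold: one hour
    [ "$left" -lt "$min_left" ]
}

# Query the real proxy only where the VOMS client is installed.
if command -v voms-proxy-info >/dev/null 2>&1; then
    left=$(voms-proxy-info --timeleft 2>/dev/null || echo 0)
    if needs_renewal "${left:-0}"; then
        echo "User proxy is about to expire: run voms-proxy-init --voms virgo"
    fi
fi
```

The same test could guard a call to myproxy-init, with a threshold suited to the one-week lifetime.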


HelloWorld job

Here is an example of a simple "HelloWorld" job configuration file (JDL):

Type="Job";
JobType="Normal"; 
VirtualOrganisation = "virgo";
Executable="inputs.sh";
InputSandbox={"/home/collaalb/scripts/inputs.sh","/home/collaalb/scripts/stdin.txt"};
StdInput="stdin.txt";
Arguments=" arg1 arg2";
RetryCount = 3;
StdOutput="std.out";
StdError="std.err";
OutputSandbox={"std.out","std.err"};
Requirements = RegExp ("cnaf.infn.it",other.GlueCEUniqueId)

The executable, inputs.sh:


#!/bin/sh

echo "Hello World!!"
echo "This host is " `/bin/hostname`
echo "I am here: " `pwd`

echo "Disks mounted: "

df -h

echo "params are $1 $2"

echo "Reading first parameter from stdIn ... "
read i

echo "1st parameter from stdIn says " $i

echo "Reading 2nd parameter from stdIn ... "
read j

echo "2nd parameter from stdIn says " $j

lcg-cp lfn:/grid/virgo/collaalb/test.txt file:`pwd`/test.txt
echo "File got from Grid says:"

cat test.txt

echo "done!" 

This example shows how to set script arguments and standard input in the JDL, and how to access Grid data with the LCG tools.

Note: only the main executable file (the Executable) is made executable by the system. If the main script runs other programs sent to the worker nodes via the input sandbox, they must be explicitly made executable inside the main script ( chmod +x inner_program ).
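
The note above can be sketched as follows. The helper is created on the fly only so that the snippet is self-contained; on a worker node it would arrive through the InputSandbox. The name inner_program follows the note, everything else is illustrative.

```shell
#!/bin/sh
# Stand-in for a helper shipped through the InputSandbox.
workdir=$(mktemp -d)
cat > "$workdir/inner_program" <<'EOF'
#!/bin/sh
echo "inner program ran with: $1"
EOF

# The helper has no execute bit yet, so set it explicitly.
chmod +x "$workdir/inner_program"

# Now the helper can be called like any other program.
result=$("$workdir/inner_program" arg1)
echo "$result"
rm -rf "$workdir"
```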

The Requirements parameter in the example selects CNAF as the execution site of the job. Requirements is an optional parameter. Several requirements can be combined with logical operators, e.g. to make the job run on either the CNAF or the Rome Virgo farm:

Requirements = RegExp ("cnaf.infn.it",other.GlueCEUniqueId) || RegExp("roma1.infn.it",other.GlueCEUniqueId)

Last note: all lines in the JDL must end with a semicolon, except the last one!
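
The semicolon rule is easy to break when editing a JDL by hand. Below is a minimal sketch of a check, assuming one attribute per line (values spanning several lines would be reported as false positives). The function name check_jdl_semicolons and the sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "lineno: text" for every non-empty line,
# except the last non-empty one, that does not end with a semicolon.
check_jdl_semicolons() {
    awk 'NF {
        if (prev != "" && prev !~ /;[ \t]*$/) print prevno ": " prev
        prev = $0; prevno = NR
    }' "$1"
}

# Sample JDL written to a temporary file for the demonstration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Type="Job";
JobType="Normal"
StdOutput="std.out";
Requirements = RegExp ("cnaf.infn.it",other.GlueCEUniqueId)
EOF

check_jdl_semicolons "$sample"   # reports line 2 only
rm -f "$sample"
```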


Parametric job type JDL

The Parametric job type allows you to submit a batch of jobs as a single job. The relevant JDL parameters are:

JobType="Parametric";
Arguments=" arg1, arg2, ... , _PARAM_, ..., argN";
Parameters={"par1", "par2", ..., "parM"};

At submission, the JDL produces one sub-job for each entry in Parameters. In each sub-job the corresponding parameter replaces the label _PARAM_. Note that _PARAM_ can also be used elsewhere in the JDL, for instance in the names of the standard input, output and error files:

StdInput = "input_PARAM_.txt";
StdOutput = "output_PARAM_.txt";
StdError = "error_PARAM_.txt";
InputSandbox = {"myjob.exe", "input_PARAM_.txt"};
OutputSandbox = {"output_PARAM_.txt", "error_PARAM_.txt"};
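
To see what the substitution yields, the replacement can be mimicked locally. This is only an illustration of the textual replacement described above, not the WMS implementation; expand_param and the two parameter values are hypothetical.

```shell
#!/bin/sh
# Illustration only: replace every _PARAM_ label in a template with a value.
# Values containing "/" or "&" would need escaping for sed.
expand_param() {
    template=$1
    param=$2
    printf '%s\n' "$template" | sed "s/_PARAM_/$param/g"
}

# Hypothetical parameter list, standing in for Parameters={"par1", "par2"}.
for p in par1 par2; do
    expand_param 'StdOutput = "output_PARAM_.txt";' "$p"
done
```

Note that the underscores belong to the label, so output_PARAM_.txt becomes outputpar1.txt, not output_par1.txt.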

More info on parametric jobs can be found here.


Multi-core job JDL

Some programs (e.g. Matlab compiled executables) can use as many CPU cores as they find on the worker nodes. It may be necessary to limit the number of jobs running on the same node (e.g. to shorten the execution time). This can be done with this set of JDL parameters:

WholeNodes = true;
SMPgranularity = 4;
CpuNumber = 4;

In this specific example the job requests one dedicated node and 4 cores on it.

Note: WholeNodes = true alone is not enough; all three parameters must be specified.
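
Since all three attributes are needed, a quick check before submission can help. Below is a sketch under the assumption that each attribute sits on its own line; missing_multicore_attrs and the sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the multi-core attributes that a JDL file lacks.
missing_multicore_attrs() {
    for attr in WholeNodes SMPgranularity CpuNumber; do
        # -i: attribute names are matched case-insensitively (assumption).
        grep -qi "^[[:space:]]*$attr[[:space:]]*=" "$1" || echo "$attr"
    done
}

# Sample JDL that sets only one of the three attributes.
sample=$(mktemp)
printf 'WholeNodes = true;\n' > "$sample"
missing_multicore_attrs "$sample"   # prints SMPgranularity and CpuNumber
rm -f "$sample"
```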


Submit job

Before submitting, check the correctness of the JDL and the list of Computing Elements (CE) matching your job requirements:

glite-wms-job-list-match -a helloWorld.jdl

The output will look like this (in this example the job was required to run on the Rome farm):

Connecting to the service https://wms024.cnaf.infn.it:7443/glite_wms_wmproxy_server

==========================================================================

           COMPUTING ELEMENT IDs LIST 
 The following CE(s) matching your job requirements have been found:

   *CEId*
 - virgo-ce.roma1.infn.it:8443/cream-pbs-virgoglong

==========================================================================

Submit the job:

glite-wms-job-submit -a -e https://wms-multi.grid.cnaf.infn.it:7443/glite_wms_wmproxy_server -o helloWorld.out helloWorld.jdl

where:

  • -e sets the Workload Management System (WMS) endpoint. wms-multi.grid.cnaf.infn.it is an alias set up at CNAF which automatically chooses the best WMS to manage the job;
  • -o writes the job ID to the file helloWorld.out, which can then be passed to the status and output commands with the -i option.

The output will look like:

Connecting to the service https://wms014.cnaf.infn.it:7443/glite_wms_wmproxy_server


====================== glite-wms-job-submit Success ======================

The job has been successfully submitted to the WMProxy
Your job identifier is:

https://wms014.cnaf.infn.it:9000/aQ6x5KKuJVWK6M6Ca6elqQ

The job identifier has been saved in the following file:
helloWorld.out

==========================================================================

Another useful option is --collection, which will submit all JDL files in the specified directory.
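
A sketch of how the identifiers saved by -o could be read back in a script. It assumes the file holds one https:// identifier per line, possibly after a comment header; the helper name list_job_ids and the content of the sample file are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print only the job identifiers found in a job ID file.
list_job_ids() {
    grep '^https://' "$1"
}

# Sample file imitating what -o might write (the header line is an assumption).
idfile=$(mktemp)
cat > "$idfile" <<'EOF'
###Submitted Job Ids###
https://wms014.cnaf.infn.it:9000/aQ6x5KKuJVWK6M6Ca6elqQ
EOF

for id in $(list_job_ids "$idfile"); do
    echo "job: $id"
done
rm -f "$idfile"
```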


Check job status

To get the job status:

glite-wms-job-status -i helloWorld.out 

The output will look like:

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://wms014.cnaf.infn.it:9000/aQ6x5KKuJVWK6M6Ca6elqQ
Current Status:     Scheduled
Status Reason:      unavailable
Destination:        virgo-ce.roma1.infn.it:8443/cream-pbs-virgoglong
Submitted:          Thu Nov 21 18:59:54 2013 CET
==========================================================================

Possible statuses are Waiting, Scheduled, Running, Aborted, Cancelled, Done, Done (Exit Code = 0).

If the job is parametric, the output shows an overall status followed by the status of each sub-job.
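
For scripting, the status value can be extracted from the report. This is a minimal sketch, assuming the "Current Status:" label keeps the layout of the sample output above; current_status is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a glite-wms-job-status report on stdin and
# print the value of every "Current Status:" line (one per job or sub-job).
current_status() {
    sed -n 's/^[[:space:]]*Current Status:[[:space:]]*//p'
}

# Demonstration on two lines copied from the sample report.
printf '%s\n' \
    'Current Status:     Scheduled' \
    'Status Reason:      unavailable' | current_status
```

With the real client the helper would be fed by a pipe, e.g. glite-wms-job-status -i helloWorld.out | current_status.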


Get job output

Once the status of the job is Done, its output can be retrieved:

glite-wms-job-output --dir jobOutput -i helloWorld.out 

The directory jobOutput must exist.

This will give:

Connecting to the service https://gridrb.fe.infn.it:7443/glite_wms_wmproxy_server

================================================================================

         JOB GET OUTPUT OUTCOME

Output sandbox files for the job:
https://gridrb.fe.infn.it:9000/o7_V5aBdOseE1xKNvSP3tA
have been successfully retrieved and stored in the directory:
/home/collaalb/jobOutput/collaalb_o7_V5aBdOseE1xKNvSP3tA

================================================================================

The output of each parametric sub-job is stored in a sub-folder named after its parameter.
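
Because the target directory must exist before retrieval, a script can create it first. Below is a minimal sketch; the guard around the gLite command is only there so that the snippet also runs on a machine without the client installed.

```shell
#!/bin/sh
outdir=jobOutput

# mkdir -p creates the directory if needed and is silent if it already exists.
mkdir -p "$outdir"

# Retrieve the output only where the gLite client is available.
if command -v glite-wms-job-output >/dev/null 2>&1; then
    glite-wms-job-output --dir "$outdir" -i helloWorld.out
fi
```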