Simple tutorial for Grid job management
Index
- Long-term proxy generation
- HelloWorld job
- Parametric job type JDL
- Multi-core job JDL
- Submit job
- Check job status
- Get job output
Long-term proxy generation
Here is the correct way (I checked it: it works!) to generate long-term proxies:
- myproxy-init -d -n
- voms-proxy-init --voms virgo
The proxy stored by myproxy-init lasts for 1 week and is the one used to run the job. The user's proxy created by voms-proxy-init can expire, or be destroyed and recreated, without consequences for the job execution.
I have also checked that if the myproxy proxy is renewed before it expires, the jobs keep on running.
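The renewal check described above can be scripted. The following is a minimal sketch, not part of the original procedure: needs_renewal is a hypothetical helper, the one-hour threshold is arbitrary, and the usage comment assumes that voms-proxy-info is installed and that its --timeleft option prints the remaining lifetime in seconds.

```shell
#!/bin/sh
# needs_renewal SECONDS_LEFT MIN_SECONDS
# Hypothetical helper: succeeds (exit 0) when the remaining proxy
# lifetime is below the given threshold.
needs_renewal() {
    [ "$1" -lt "$2" ]
}

# Usage sketch on a UI machine (assumes voms-proxy-info is installed and
# that --timeleft prints the remaining lifetime in seconds):
#   left=$(voms-proxy-info --timeleft)
#   if needs_renewal "$left" 3600; then
#       voms-proxy-init --voms virgo
#   fi
```

The comparison is kept in its own function so that it can be tried with plain numbers before any Grid tool is involved.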
HelloWorld job
Here is an example of a simple "HelloWorld" job configuration file (JDL):
    Type="Job";
    JobType="Normal";
    VirtualOrganisation = "virgo";
    Executable="inputs.sh";
    InputSandbox={"/home/collaalb/scripts/inputs.sh","/home/collaalb/scripts/stdin.txt"};
    StdInput="stdin.txt";
    Arguments=" arg1 arg2";
    RetryCount = 3;
    StdOutput="std.out";
    StdError="std.err";
    OutputSandbox={"std.out","std.err"};
    Requirements = RegExp ("cnaf.infn.it",other.GlueCEUniqueId)
The executable, inputs.sh:
    #!/bin/sh
    echo "Hello World!!"
    echo "This host is " `/bin/hostname`
    echo "I am here: " `pwd`
    echo "Disks mounted: "
    df -h
    echo "params are $1 $2"
    echo "Reading first parameter from stdIn ... "
    read i
    echo "1st parameter from stdIn says " $i
    echo "Reading 2nd parameter from stdIn ... "
    read j
    echo "2nd parameter from stdIn says " $j
    lcg-cp lfn:/grid/virgo/collaalb/test.txt file:`pwd`/test.txt
    echo "File got from Grid says:"
    cat test.txt
    echo "done!"
This example shows how to set script arguments and standard input in the JDL, and how to access Grid data using LCG tools.
Note: only the main executable file (the Executable) is made executable by the system. If the main script runs other programs sent to the worker nodes via the input sandbox, they must be explicitly made executable inside the main script (chmod +x inner_program).
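As a sketch of the note above, the main script could start with a small function that sets the execute bit on every helper shipped in the input sandbox. make_executable is a hypothetical name introduced here for illustration, and inner_program is the placeholder already used in the note.

```shell
#!/bin/sh
# make_executable FILE...
# Hypothetical helper for the top of the main script: sets the execute
# bit on each helper program shipped in the input sandbox.
make_executable() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            chmod +x "$f"
        else
            echo "missing sandbox file: $f" >&2
            return 1
        fi
    done
}

# Usage inside the main script (inner_program is the placeholder name
# used in the note above):
#   make_executable inner_program && ./inner_program
```

Failing early on a missing file makes a forgotten InputSandbox entry show up in std.err instead of as an obscure "not found" error later in the job.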
The Requirements parameter in the example selects CNAF as the execution site of the job. Requirements is an optional parameter. Several requirements can be combined with logical operators, e.g. to make the job run on either the CNAF or the Rome Virgo farm:
Requirements = RegExp ("cnaf.infn.it",other.GlueCEUniqueId) || RegExp("roma1.infn.it",other.GlueCEUniqueId)
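When the list of candidate sites changes often, the expression can be generated rather than typed by hand. This is only a sketch: build_requirements is a hypothetical helper, and it simply joins one RegExp clause per site domain with ||, in the same form as the line above.

```shell
#!/bin/sh
# build_requirements SITE_DOMAIN...
# Hypothetical helper: prints a Requirements line that matches any of
# the given site domains, joined with the || operator.
build_requirements() {
    req=""
    for site in "$@"; do
        clause="RegExp(\"$site\",other.GlueCEUniqueId)"
        if [ -z "$req" ]; then
            req="$clause"
        else
            req="$req || $clause"
        fi
    done
    printf 'Requirements = %s\n' "$req"
}

# Prints the two-site expression used in this tutorial.
build_requirements cnaf.infn.it roma1.infn.it
```

The generated line carries no trailing semicolon, so it fits as the last line of a JDL; add one if it is placed elsewhere.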
Last note: all lines in the JDL must end with a semicolon, except the last one!
Parametric job type JDL
The parametric job type allows you to submit a batch of jobs as a single job. The relevant JDL parameters are:
    JobType="Parametric";
    Arguments=" arg1, arg2, ... , _PARAM_, ..., argN";
    Parameters={"par1", "par2", ... "parM"};
At execution, the JDL will produce as many sub-jobs as there are Parameters. Each parameter is substituted for the label _PARAM_. Note that _PARAM_ can be used in other places of the JDL, for instance to indicate the standard output file name:
    StdInput = "input_PARAM_.txt";
    StdOutput = "output_PARAM_.txt";
    StdError = "error_PARAM_.txt";
    InputSandbox = {"myjob.exe", "input_PARAM_.txt"};
    OutputSandbox = {"output_PARAM_.txt", "error_PARAM_.txt"};
More info on parametric jobs can be found here.
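To preview which file names each sub-job will look for, the substitution can be imitated locally. The sketch below is an illustration only and does not call the WMS: expand_param is a hypothetical helper that replaces _PARAM_ with sed, and par1 and par2 are the sample parameter names used above.

```shell
#!/bin/sh
# expand_param TEMPLATE VALUE
# Illustration only: mimics the substitution by replacing every _PARAM_
# in TEMPLATE with VALUE. VALUE must not contain '/', '&' or '\',
# which are special in the sed replacement.
expand_param() {
    printf '%s\n' "$1" | sed "s/_PARAM_/$2/g"
}

# Show the StdOutput line each sub-job would get for two sample parameters.
for p in par1 par2; do
    expand_param 'StdOutput = "output_PARAM_.txt";' "$p"
done
```

Note that the underscores belong to the label itself, so under this substitution output_PARAM_.txt becomes outputpar1.txt, not output_par1.txt; the input files must be named accordingly.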
Multi-core job JDL
Some programs (e.g. Matlab compiled executables) can use as many CPU cores as they find on the worker nodes. It may be necessary to limit the number of jobs running on the same node (e.g. to shorten the execution time). This can be done with this set of JDL parameters:
WholeNodes = true; SMPgranularity = 4; CpuNumber = 4;
In this specific example the job requires 1 dedicated node with 4 CPUs and 4 cores.
Note: the parameter WholeNodes = true alone is not enough. All three parameters must be specified.
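Since a missing attribute is easy to overlook, a quick check before submission may help. This is a sketch under assumptions: check_multicore_jdl is a hypothetical helper, and it only tests that each of the three attribute names appears followed by an equals sign, not that the values are sensible.

```shell
#!/bin/sh
# check_multicore_jdl FILE
# Hypothetical pre-submission check: fails if any of the three
# multi-core attributes is missing from the JDL file.
check_multicore_jdl() {
    for attr in WholeNodes SMPgranularity CpuNumber; do
        if ! grep -qi "$attr[[:space:]]*=" "$1"; then
            echo "missing attribute: $attr" >&2
            return 1
        fi
    done
}

# Usage: check_multicore_jdl myjob.jdl && echo "multi-core attributes present"
```

The match is case-insensitive, so WholeNodes and wholenodes are treated alike.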
Submit job
Before submitting, check the correctness of the JDL and the list of Computing Elements (CE) matching your job requirements:
glite-wms-job-list-match -a helloWorld.jdl
The output will look like this (in this example we required the job to run on the Rome farm):
    Connecting to the service https://wms024.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ==========================================================================
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*
    - virgo-ce.roma1.infn.it:8443/cream-pbs-virgoglong
    ==========================================================================
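To reuse the matching CEs in a script, the identifiers can be pulled out of this output. The following sketch assumes that each CE is printed on its own line in the form "- host:port/queue", as in the sample above; extract_ces is a hypothetical helper, not a gLite command.

```shell
#!/bin/sh
# extract_ces: reads glite-wms-job-list-match output on stdin and prints
# one CE identifier per line. Assumes CE lines look like
# " - host:port/queue", as in the sample output above.
extract_ces() {
    sed -n 's|^[[:space:]]*-[[:space:]]\{1,\}\(.*:[0-9]\{1,\}/.*\)$|\1|p'
}

# Usage sketch:
#   glite-wms-job-list-match -a helloWorld.jdl | extract_ces
```

An empty result then means that no CE matched, which is worth checking before submitting.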
Submit the job:
glite-wms-job-submit -a -e https://wms-multi.grid.cnaf.infn.it:7443/glite_wms_wmproxy_server -o helloWorld.out helloWorld.jdl
where:
- -e sets the Workload Management System (WMS) endpoint. wms-multi.grid.cnaf.infn.it is an alias set up at CNAF which automatically chooses the best WMS to manage the job;
- -o writes the job ID to the file helloWorld.out, which can then be passed to the glite-wms-job-status and glite-wms-job-output commands with option -i.
The output will look like:
    Connecting to the service https://wms014.cnaf.infn.it:7443/glite_wms_wmproxy_server
    ====================== glite-wms-job-submit Success ======================
    The job has been successfully submitted to the WMProxy
    Your job identifier is:
    https://wms014.cnaf.infn.it:9000/aQ6x5KKuJVWK6M6Ca6elq
    The job identifier has been saved in the following file:
    helloWorld.out
    ==========================================================================
Another useful option is --collection, which will submit all JDL files in the specified directory.
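The file written with -o can also be read from a script. This sketch rests on an assumption about the file layout, namely that each job identifier sits on its own line and starts with https://, possibly after a header line; job_ids is a hypothetical helper.

```shell
#!/bin/sh
# job_ids FILE
# Prints the job identifiers stored in a file written with -o.
# Assumption: each identifier sits on its own line and starts with
# https://; any other line (such as a header comment) is skipped.
job_ids() {
    grep '^https://' "$1"
}

# Usage sketch:
#   for id in $(job_ids helloWorld.out); do
#       glite-wms-job-status "$id"
#   done
```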
Check job status
To get the job status:
glite-wms-job-status -i helloWorld.out
The output will look like:
    ======================= glite-wms-job-status Success =====================
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://wms014.cnaf.infn.it:9000/aQ6x5KKuJVWK6M6Ca6elqQ
    Current Status: Scheduled
    Status Reason: unavailable
    Destination: virgo-ce.roma1.infn.it:8443/cream-pbs-virgoglong
    Submitted: Thu Nov 21 18:59:54 2013 CET
    ==========================================================================
Possible statuses are Waiting, Scheduled, Running, Aborted, Cancelled, Done, Done (Exit Code = 0).
If the job is parametric, the output will show an overall status and the status of all sub-jobs.
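Waiting for a job to finish can be automated by parsing the status output. The sketch below relies on the "Current Status:" line shown in the sample above and on the list of statuses given in this section; current_status and is_final are hypothetical helpers, and the 300-second polling interval in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# current_status: reads glite-wms-job-status output on stdin and prints
# the text after "Current Status:" (format taken from the sample above).
current_status() {
    sed -n '/Current Status:/{
        s/^.*Current Status:[[:space:]]*//
        s/[[:space:]]*$//
        p
    }'
}

# is_final STATUS: succeeds when the status is one of the end states
# listed in this section (Done, Aborted, Cancelled).
is_final() {
    case "$1" in
        Done*|Aborted|Cancelled) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch for a single (non-parametric) job:
#   while :; do
#       s=$(glite-wms-job-status -i helloWorld.out | current_status)
#       is_final "$s" && break
#       sleep 300
#   done
```

For a parametric job the output contains one status line per sub-job, so current_status prints several lines and the loop would need to check each of them.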
Get job output
Once the status of the job is Done, its output can be retrieved:
glite-wms-job-output --dir jobOutput -i helloWorld.out
The directory jobOutput must exist.
This will give:
    Connecting to the service https://gridrb.fe.infn.it:7443/glite_wms_wmproxy_server
    ================================================================================
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    https://gridrb.fe.infn.it:9000/o7_V5aBdOseE1xKNvSP3tA
    have been successfully retrieved and stored in the directory:
    /home/collaalb/jobOutput/collaalb_o7_V5aBdOseE1xKNvSP3tA
    ================================================================================
The output of parametric sub-jobs will be stored in sub-folders named after the parameter.
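Because the target directory must exist before the output is retrieved, a small wrapper can create it first. This is a sketch: fetch_output is a hypothetical name, and it calls glite-wms-job-output with the same --dir and -i options used above.

```shell
#!/bin/sh
# fetch_output DIR ID_FILE
# Hypothetical wrapper: creates DIR if needed, then retrieves the output
# sandbox with the same options as the command above.
fetch_output() {
    mkdir -p "$1" || return 1
    glite-wms-job-output --dir "$1" -i "$2"
}

# Usage: fetch_output jobOutput helloWorld.out
```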