
Virgo Data Management at CNAF Bologna

by Alberto Colla last modified 2014-07-07 18:03

Information and basic instructions on how to use the storage facilities at CNAF to archive and access data

Index


Introduction

Storage areas at CNAF

  • /storage/gpfs_virgo4 (labelled here as /virgo4)
  • /storage/gpfs_virgo3 (labelled here as /virgo3)
  • /opt/exp_software/virgo (software area)
  • Writable areas on the worker nodes

Data management at CNAF

  • Write on /virgo4 or /virgo3 using Grid tools
  • Download data from /virgo4 or /virgo3 with Grid tools
  • Remote file checksum
  • Check disk/tape migration status of /virgo4 data
  • Recall data from tape to disk (Experts only!)

Data management with local LSF jobs (bsub -f option)

  • Hints


Introduction


INFN-CNAF, the Italian Tier-1 centre located in Bologna, is one of the LHC Tier-1 sites and also hosts computing and storage resources for many other particle physics and astrophysics experiments, including Virgo.
Storage at CNAF amounts to more than 10 PB of tape space and 6 PB of disk space. The centre has recently developed a new mass storage system called GEMSS (Grid Enabled Mass Storage System), which has proved to be an efficient solution for managing data archiving between disk and tape. The main components of the system are:




- a filesystem layer implemented by the GPFS (General Parallel File System) framework;
- the IBM TSM (Tivoli Storage Manager) software, which manages access to the tape layer;
- the StoRM layer, used in conjunction with the GridFTP servers to provide remote Grid access.

GEMSS services manage the data flow between disk buffers and tape in an automatic and fully transparent way. Files created on the disk buffer are automatically copied to tape; when the disk occupancy exceeds a defined threshold, the system replaces the disk copy of the "old" files (the files not accessed for the longest time) with a pointer to the copy on tape (a "stub file"). If the file is later requested, GEMSS automatically recalls it back to disk.

Storage areas at CNAF


/storage/gpfs_virgo4 (labelled here as /virgo4)

  • Main storage area.
  • Disk-Tape area managed by GEMSS.
  • Mounted read-only on the user interfaces and on the computing nodes.
  • Writable only with Grid tools (see below)
  • Virgo data (Raw, h(t), 50 Hz, etc.) and Ligo h(t) are hosted on /virgo4:
/storage/gpfs_virgo4/virgo/virgoD0T1 
/storage/gpfs_virgo4/virgo/virgoD0T1/Run (Virgo data and Ligo h(t))
/storage/gpfs_virgo4/virgo/virgoD0T1/DATA (50 Hz, INGV_turbine, trend)

Note: the directory /storage/gpfs_virgo4/virgo/virgoD0T1 is not readable (if you list it you get "permission denied"), but inner folders, e.g. /storage/gpfs_virgo4/virgo/virgoD0T1/Run are readable!

To list the content of /storage/gpfs_virgo4/virgo/virgoD0T1 use lcg-ls:

lcg-ls -v --vo virgo -l  srm://storm-fe-archive.cr.cnaf.infn.it:8444/virgod0t1/

SE type: SRMv2
dr-xr-xr-x 1 2 2 0 UNKNOWN /virgod0t1/Run
d--------- 1 2 2 0 UNKNOWN /virgod0t1/DATA
dr-xr-xr-x 1 2 2 0 UNKNOWN /virgod0t1/USERS
dr-xr-xr-x 1 2 2 0 UNKNOWN /virgod0t1/NINJA2
Note that the path /storage/gpfs_virgo4/virgo/virgoD0T1/ is mapped by SRM into /virgod0t1/ !

Data in /storage/gpfs_virgo4/virgo/virgoD0T1/ is now also accessible via Grid, e.g. (see below for more examples):

lcg-cp -v -b --vo virgo -T srmv2 srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgod0t1/Run/VSR1/rawdata/V-raw-864088320-240.gwf file:`pwd`/V-raw-864088320-240.gwf


/storage/gpfs_virgo3 (/virgo3)

  • User scratch area.
  • Disk only area
  • Mounted only on the user interface
  • Write accessible via POSIX and via Grid
  • Limited bandwidth to the user interfaces, therefore heavy I/O is discouraged
  • Main /virgo3 areas:
/storage/gpfs_virgo3/ 
/storage/gpfs_virgo3/home (user area, home/<user> writable)
/storage/gpfs_virgo3/scratch (scratch, writable)
/storage/gpfs_virgo3/virgo (Writable with Grid tools)


/opt/exp_software/virgo

  • Virgo software area.
  • Mounted RW on the user interfaces and RO on the worker nodes.

Current repository of Virgo software is:
/opt/exp_software/virgo/VCS-8.0

To load the Virgo environment do:
source /opt/exp_software/virgo/VCS-8.0/System/Env.sh

User software can be installed on:
/opt/exp_software/virgo/virgoDev

Virgo and Ligo FFLs are in:
/opt/exp_software/virgo/virgoData/ffl/
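
A minimal sketch (assuming you are on a CNAF Virgo UI) of loading the environment and checking which FFLs are available:

# Sketch: load the Virgo environment and list the available FFL files
source /opt/exp_software/virgo/VCS-8.0/System/Env.sh
ls /opt/exp_software/virgo/virgoData/ffl/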


Writable areas on the worker nodes


/tmp

  • It is the default working directory for jobs submitted in the local batch system (LSF).
  • Disk size: ~9 GB (warning: check that your job output fits in /tmp before submitting!)

/home/VIRGO/<user>

  • The worker node's local user home directory.
  • Note that it is not the same home directory as on the Virgo UIs!
  • The directory is cleaned after job completion.
  • Size: 100-200 GB, depending on the worker node.
  • Do "cd /home/VIRGO/<user>" at the beginning of your job to move the working directory there!


Data management at CNAF: examples and hints


Write on /virgo4 or /virgo3 using Grid tools


To write to /virgo4 you need a Grid certificate and must be registered in the Virgo VO. Instructions are ... (Work In Progress)

The simplest way to copy a file to /virgo4 and register it in the LFC (this works from any Grid user interface) is:
lcg-cr -d storm-fe-archive.cr.cnaf.infn.it -l /grid/virgo/testcnaf.txt file:///etc/fstab

In this way the file is registered in the LFC (therefore it can be retrieved via Grid), but it cannot be accessed via POSIX because it is stored in an area of /virgo4 that is not reachable by users. In general, use this slightly more complex version:
lcg-cr -v -b --vo virgo -l /grid/virgo/testcnaf2.txt -T srmv2 -d srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/virgo/test/testcnaf2.txt file:///tmp/test.txt

The file is registered in the LFC as /grid/virgo/testcnaf2.txt and is also accessible (read-only) from the UIs at CNAF here:
/storage/gpfs_virgo4/virgo4/virgo/test/testcnaf2.txt

Note: the directory /storage/gpfs_virgo4/virgo4/ is not readable (if you list it you get "permission denied"), but /storage/gpfs_virgo4/virgo4/virgo is!

Another example: copy to /virgo4 with lcg-cp, without registering the file in the LFC:
lcg-cp -b -U srmv2 file:/etc/fstab srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/virgo/test/testcnaf3.txt

Example of how to copy data to /virgo3 using Grid tools:

lcg-cr -v -b --vo virgo -l /grid/virgo/testcnaf3.txt -d srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo3/virgo/test/testcnaf3.txt file:///tmp/test.txt
The file is POSIX accessible from the UI:
/storage/gpfs_virgo3/virgo/test/testcnaf3.txt


New! Third party copy from a GSIFTP endpoint to SRM:

lcg-cp -v -b --src-protocols gsiftp -U srmv2 \
        gsiftp://atlas1.atlas.aei.uni-hannover.de/home/collaalb/test \
        srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/virgo/test

Folders behind GSIFTP servers can be listed with uberftp:

uberftp -dir gsiftp://atlas3.atlas.aei.uni-hannover.de//atlas/user/atlas3/


Download data from /virgo4 or /virgo3 with Grid tools


1. Using LFC
lcg-cp -v lfn:/grid/virgo/testcnaf2.txt file://`pwd`/test.txt

2. Not using LFC
lcg-cp -b -T srmv2 srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/virgo/test/testcnaf2.txt file://`pwd`/test2.txt

Note: jobs submitted via Grid can copy files to, or download files from, /virgo4!
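
For bulk downloads, a minimal bash sketch along the lines of the commands above (the file names and the SFN prefix below are hypothetical placeholders):

# Sketch: download a list of files from /virgo4 into the current directory
SRM_PREFIX='srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/virgo/test'
for f in testcnaf2.txt testcnaf3.txt; do
    lcg-cp -b -T srmv2 "$SRM_PREFIX/$f" "file://`pwd`/$f"
done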


Remote file checksum

lcg-get-checksum srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/virgo/test/testcnaf2.txt 

Warning: currently the only supported checksum type is adler32!
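
As a sketch only (the output format of lcg-get-checksum is an assumption here: the checksum is taken from the first field), one can compare the remote adler32 with a locally computed one after downloading the file:

# Sketch: compare remote and local adler32 checksums (output format assumed)
SURL='srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/virgo/test/testcnaf2.txt'
remote=$(lcg-get-checksum "$SURL" | awk '{print $1}')
local_sum=$(python -c 'import sys, zlib; print("%08x" % (zlib.adler32(open(sys.argv[1], "rb").read()) & 0xffffffff))' test2.txt)
echo "remote=$remote  local=$local_sum"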


Check disk/tape migration status of /virgo4 data


To check whether a file, or the files in a directory, in /virgo4 is cached on disk or has been migrated to tape, use the yamssLs command:
yamssLs /storage/gpfs_virgo4/virgo4/virgo/test/   

rrw-rwx---+ 1 storm storm 1067 Apr 15 15:44 /storage/gpfs_virgo4/virgo4/virgo/test/test2.txt  
rrw-rwx---+ 1 storm storm 1067 Apr 15 15:41 /storage/gpfs_virgo4/virgo4/virgo/test/test.txt

The first letter of each row shows the migration status of the file:

  • r: file is resident, i.e. only the copy on disk exists;
  • m: file is migrated, i.e. it is present only on tape;
  • p: file is present on disk and on tape.

If a file is migrated you can recall it to disk simply by opening it. Do not do that with large data sets! Follow the instructions below.
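
Before asking for a recall, you can get an overview of which files in a directory are only on tape with a small sketch based on the yamssLs output format shown above:

# Sketch: list only the migrated (tape-only) files in a directory
yamssLs /storage/gpfs_virgo4/virgo4/virgo/test/ | awk '$1 ~ /^m/ {print $NF}'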


Recall data from tape to disk (Experts only!)


Example of how to request a recall (bring-online, "bol") of a file on tape with Grid tools (clientSRM bol -e httpg://... -s srm://...):

clientSRM bol -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444/ -s srm://storm-fe-archive.cr.cnaf.infn.it:8444/virgod0t1/Run/VSR1/rawdata/V-raw-864176400-240.gwf

============================================================
Sending BringOnline request to: httpg://storm-fe-archive.cr.cnaf.infn.it:8444/
Before execute:
Afer execute:
Request Status Code 17
Poll Flag 0
============================================================
Request status:
statusCode="SRM_REQUEST_QUEUED"(17)
============================================================
SRM Response:
requestToken="93836438-d66b-4c4b-9e46-29a3f5f6078f"
arrayOfFileStatuses (size=1)
[0] sourceSURL="srm://storm-fe-archive.cr.cnaf.infn.it:8444/virgod0t1/Run/VSR1/rawdata/V-raw-864176400-240.gwf"
[0] status: statusCode="SRM_REQUEST_QUEUED"(17)
explanation=""
============================================================

Take note of the "requestToken" string. This is the key needed to query the status of the recall, which is done with clientSRM sbol -e httpg://...:

clientSRM sbol -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444/ -t 93836438-d66b-4c4b-9e46-29a3f5f6078f

============================================================
Sending StatusBOL request to: httpg://storm-fe-archive.cr.cnaf.infn.it:8444/
Before execute:
Afer execute:
Request Status Code 18
Poll Flag 0
============================================================
Request status:
statusCode="SRM_REQUEST_INPROGRESS"(18)
explanation="Request handled!"
============================================================
SRM Response:
arrayOfFileStatuses (size=1)
[0] sourceSURL="srm://storm-fe-archive.cr.cnaf.infn.it:8444/virgod0t1/Run/VSR1/rawdata/V-raw-864176400-240.gwf"
[0] fileSize=1498910938
[0] status: statusCode="SRM_REQUEST_INPROGRESS"(18)
explanation="Recalling file from tape"
============================================================
 
At the end of the recall process you will see:

============================================================
 Request status:
 statusCode="SRM_SUCCESS"(0)
 explanation="All chunks successfully handled!"
============================================================
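
A minimal polling sketch based on the commands above (the endpoint and token are the examples used on this page; replace the token with your own requestToken):

# Sketch: poll the recall status until it succeeds
ENDPOINT='httpg://storm-fe-archive.cr.cnaf.infn.it:8444/'
TOKEN='93836438-d66b-4c4b-9e46-29a3f5f6078f'   # your requestToken here
while ! clientSRM sbol -e "$ENDPOINT" -t "$TOKEN" | grep -q 'SRM_SUCCESS'; do
    sleep 60
done
echo "Recall completed"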


Another way to check the recall status is with lcg-ls -l:

lcg-ls -v --vo virgo -l  srm://storm-fe-archive.cr.cnaf.infn.it:8444/virgod0t1/Run/VSR1/rawdata/V-raw-864176400-240.gwf

SE type: SRMv2
-r-xr-xr-x 1 2 2 1498910938 NEARLINE /virgod0t1/Run/VSR1/rawdata/V-raw-864176400-240.gwf

The status is NEARLINE if the file is only on tape, ONLINE_AND_NEARLINE if it is on both disk and tape.

Note that the path /storage/gpfs_virgo4/virgo/virgoD0T1/ is mapped by SRM into /virgod0t1/ !



Data management with local LSF jobs (bsub -f option)


When using the CNAF local batch system (LSF) it is not possible to use the lcg tools to copy data to /virgo4.
The only way to retrieve the output back to the user interface is to use the -f option of bsub.

Example:

bsub -f "run_output.tgz < /home/VIRGO/<user>/run_output.tgz" -f "run.log < run.log" -o run.log script.sh <args>

In this example the job writes the output data into a .tgz file in /home/VIRGO/<user>, and the stdout+stderr into run.log (more precisely, into /tmp/run.log!).
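
As an illustration only (the script name and output files are the hypothetical ones used in the example above), script.sh could be organised like this:

#!/bin/bash
# Sketch of script.sh for the bsub example above
cd /home/VIRGO/<user>        # move to the worker node's local home directory
mkdir -p run_output
# ... run the actual analysis here, writing its results into run_output/ ...
tar czf /home/VIRGO/<user>/run_output.tgz run_output    # packed output, retrieved via "bsub -f"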

Hints

  • Put software files and libraries in /opt/exp_software/virgo/virgoDev
  • Put large input data in /virgo4 using Grid tools, then access it from the worker nodes via POSIX or download it to the worker node with Grid tools
  • Small input data (up to a few MB) can be sent to the worker nodes with the bsub -f option:
bsub -f "inputdata.tgz > /home/VIRGO/<user>/inputdata.tgz" ...