vdas_20110322

by Marie Anne Bizouard — last modified 2011-03-22 22:22

VDAS CNAF storage special meeting

Attendees: Luca Dell'Agnello, Karen Calabrese, MAB, Livio, Alberto, Franco, Romain, Gianluca

Agenda:
1 Castor transfer: status
2 Data access on gpfs_virgo4
- users' problems
- solutions: data recall on disk, long-term solutions
3 gpfs_virgo3 status
4 AOB

Minutes (minutes taker: MAB)

1. Castor migration is done

2. Data access

Alberto: jobs accessing data on GEMSS are failing because of the LSF time limit. When files are on disk there is no problem. Mean staging time: 40 s, but it can be longer.
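
As a rough illustration of why such jobs can hit the limit, the sketch below adds the staging delays to a job's compute time. Only the 40 s mean staging time comes from the discussion; the function names, the assumption that files are recalled one after another, and the time limit passed in are hypothetical.

```python
def total_staging_seconds(files_on_tape, mean_stage_seconds=40.0):
    """Expected wait for tape recalls, assuming files are staged sequentially."""
    return files_on_tape * mean_stage_seconds


def exceeds_limit(files_on_tape, compute_seconds, limit_seconds,
                  mean_stage_seconds=40.0):
    """True if compute time plus staging wait is longer than the batch limit."""
    wait = total_staging_seconds(files_on_tape, mean_stage_seconds)
    return compute_seconds + wait > limit_seconds


# Hypothetical job: 30 minutes of computing, 100 input files, 1 hour limit.
print(exceeds_limit(100, 1800, 3600))  # all files on tape
print(exceeds_limit(0, 1800, 3600))    # all files already on disk
```

With files already on disk the staging term vanishes, which matches the observation that jobs run without problems in that case.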

Luca: After the Castor migration to GEMSS we also changed the filesystem for gpfs_virgo4. That means that the data were no longer on disk but only on tape. That is why you were experiencing problems. There are different solutions depending on the dataset volume and access mode:

- pre-stage all datasets that are accessed in their entirety. That's what the LHC experiments do.

- if the access is random and sparse, then the best option could be to just copy the files to a disk-only space.

--> agreement to pre-stage data

The garbage collector deletes the least recently accessed files when the disk space usage reaches 90%.
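
The garbage collector's behaviour can be sketched as follows. This is only an illustration, not the GEMSS implementation: the 90% threshold and the "least recently accessed first" order come from the minutes, while the `CachedFile` type, the function name, and the choice to stop as soon as usage drops back below the threshold are assumptions.

```python
from typing import NamedTuple


class CachedFile(NamedTuple):
    path: str
    size: int           # any consistent unit, e.g. GB
    last_access: float  # timestamp of the last access


def select_evictions(files, capacity, threshold=0.90):
    """Return paths to drop from disk, least recently accessed first.

    Nothing is evicted while usage is below threshold * capacity.
    Assumption: eviction stops once usage is back below the threshold.
    """
    used = sum(f.size for f in files)
    evicted = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used < threshold * capacity:
            break
        evicted.append(f.path)
        used -= f.size
    return evicted


files = [
    CachedFile("old_raw", 3, last_access=1.0),
    CachedFile("older_ht", 10, last_access=2.0),
    CachedFile("recent_raw", 85, last_access=3.0),
]
print(select_evictions(files, capacity=100))
```

Files dropped from disk this way are still on tape, so a later access triggers a recall rather than a failure, at the cost of the staging delay.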

Alberto: how long will it take to pre-stage all VSR2 and VSR3 data (~ 200 TB)?

Luca: it can take a long time.

MAB: priorities must be set

Alberto: VSR3 raw, h(t), VSR2 raw h(t)

ACTION item for MAB: send the list of datasets to be pre-staged.

ACTION item for CNAF: pre-stage VSR2+VSR3. Send an email when it's done.

ACTION item for CNAF: provide the pre-stage script. It could be useful for some Virgo users.

Franco: will the paths remain identical?

Luca: yes

Romain: can I access VSR1 data?

MAB: your case corresponds to random access. Could you try to access the files and see how things work? If that does not work, we should copy the files to a special zone.

MAB is not in favor of copying too many files a priori to special disks. First try the system and then find a workaround.

3. gpfs_virgo3

MAB: Has the cleaning requested by CNAF been done?

Alberto: not 100% sure. I have done my part.

Luca/Karen: we need to synchronize gpfs_virgo3. We need to shut down the disk and stop jobs.

ACTION item for CNAF: intervention will be done this Thursday.

ACTION item for MAB: send an announcement to vdas and the people concerned.

4. StoRM endpoint name needs to be changed

old name: srm://storm-fe-virgo.cr.cnaf.infn.it/
new name: ?

MAB: which files are they?

Alberto: some files come from transfer tests

Luca: that concerns ~ 100 files.

MAB: do we really want to save these files that are mainly test files?

Alberto: yes, I would like to keep them

Agreement to save these files.

Alberto sent this email after the meeting:

"the following urls are certainly tests done by me and can be deleted:

srm://storm-fe-virgo.cr.cnaf.infn.it/virgo3/scratch/brovaalbe.iso
srm://storm-fe-virgo.cr.cnaf.infn.it/virgo3/scratch/provaAlbe
srm://storm-fe-virgo.cr.cnaf.infn.it/virgo3/provaAlbe3
srm://storm-fe-virgo.cr.cnaf.infn.it/virgo3/provaAlbe5

Also the 50Hz files like this:
srm://storm-fe-virgo.cr.cnaf.infn.it/virgo3/TSCascina//50Hz/0/V-973087200-06-Nov-2010-15h00-720F.50
are mine, but I would like to keep them to test the migration to the new endpoint. "

5. Data sender issues at Lyon

Franco: the commissioning team is asking to access old data through datadisplay. When data are on tape this does not work at all.

MAB: Data are on tape and in the cache disk if they have not been deleted from the cache. I'm surprised that VSR3 raw data have already disappeared. The cache disk is common to all experiments. Virgo is the biggest user (~ 100 TB), but the cache disk is at least 300 TB. We need to check what is in the cache now. I thought that all transferred data were automatically in the cache. That might be untrue.

MAB: do we know how many users would be using Lyon?

Franco: hard to say, and the present quality of service is not encouraging users. Is it possible to move all recent data (VSR1/VSR2/VSR3) into the cache?

MAB: the problem here is money, not available resources. EGO is asking us to reduce the volume of data on disk.

ACTION item for MAB: check the cache disk situation at Lyon. Clean up and stage the most recent data.

Appendix: status of gpfs_virgo4 as of today

$ df -h |grep virgo4

Filesystem            Size Used Avail Use% Mounted on
/dev/gpfs_virgo4      350T 203T 148T 58% /storage/gpfs_virgo4

$ du -sh /storage/gpfs_virgo4/virgo/virgoD0T1/*
13T     /storage/gpfs_virgo4/virgo/virgoD0T1/DATA
190T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run
26M     /storage/gpfs_virgo4/virgo/virgoD0T1/USERS

$ du -sh /storage/gpfs_virgo4/virgo/virgoD0T1/Run/*
533G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C0
1.3T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C1
1.2T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C2
1.3T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C3
1.9T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C4
1.3T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C5
6.6T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C6
2.1T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C7
2.3T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/C8
625G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/E0
917G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/E1
933G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/E2
938G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/E3
1.1T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/E4
2.6T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/S5
9.3T    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/S6
9.3M    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/VA0
10T     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/VA1
3.6G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/VA2
46T     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/VA3
46T     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/VSR1
36T     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/VSR2
18T     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/VSR3
132G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR1
94G     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR10
320K    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR11
384K    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR12
1.7G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR13
109G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR2
69G     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR5
79G     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR6
124G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR7
41G     /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR8
239G    /storage/gpfs_virgo4/virgo/virgoD0T1/Run/WSR9

Karen's email:

2. final settings of the acl on gpfs_virgo4
""""""""""""""""""""""""""""""""""""""""""""""""""""
# file: storage/gpfs_virgo4/virgo/virgoD0T1
# owner: virgodata
# group: virgo
user::rwx
user:storm:rwx
group::r-x
group:storm:r-x
mask::rwx
other::--x
default:user::rwx
default:user:storm:rwx
default:group::r-x
default:group:storm:r-x
default:group:virgo:r-x
default:mask::rwx
default:other::--x
""""""""""""""""""""""""""""""""""""""""""""""""""
3. scheduling of a short downtime (about 1 hour) to switch on the new
storm endpoint "storm-fe-archive.cr.cnaf.infn.it"

Before the downtime:
4. purge of obsolete surls of virgo data replicated on
storm-fe-virgo.cr.cnaf.infn.it via Grid, and registered
in the LFC and produce the list of surls to be updated to
storm-fe-archive.cr.cnaf.infn.it.
You will find the list of the surls on
http://dl.dropbox.com/u/1798414/storm-fe-virgo-surl.txt (can we delete the
obsolete and testing data?)
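
The SURL update mentioned in Karen's email amounts to swapping the endpoint prefix while leaving the path untouched. A minimal sketch, assuming a plain prefix substitution is sufficient; the function name is hypothetical, and re-registering the new SURLs in the LFC is not covered here.

```python
OLD_PREFIX = "srm://storm-fe-virgo.cr.cnaf.infn.it/"
NEW_PREFIX = "srm://storm-fe-archive.cr.cnaf.infn.it/"


def rewrite_surl(surl):
    """Point a SURL at the new endpoint; other SURLs are returned unchanged."""
    if surl.startswith(OLD_PREFIX):
        # Keep everything after the endpoint, so the file path stays identical.
        return NEW_PREFIX + surl[len(OLD_PREFIX):]
    return surl


print(rewrite_surl("srm://storm-fe-virgo.cr.cnaf.infn.it/virgo3/provaAlbe3"))
```

Applied line by line to the published SURL list, this would produce the list of updated SURLs once the obsolete entries have been purged.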
