vdas_20110622
Special VDAS meeting about Pilot jobs at CNAF
Attendance:
Giovanni Zizzi, Gergely, MAB
Minutes:
Summary of the problem: This year, until a few weeks ago, we had up to 30 pilot jobs running at CNAF without payload jobs. These idle jobs do no work but still consume some resources, and CNAF would like to see them killed. For ATLAS and CMS, idle pilot jobs are killed after a timeout (10 minutes). CNAF proposes to do the same for Virgo. A few weeks ago Gergely stopped all the pilot jobs.
Pilot jobs (PJ) are meant to wait for jobs submitted by users (payload jobs). When a job is submitted and no pilot job is running, the GRID WMS scheduler launches a pilot job that manages the user's job. If the WMS is busy, it can take some time before the pilot job starts. That is why Gergely decided to keep a pool of PJ running permanently (a convenience for running tests). One solution would be to keep only 1 PJ, because the system is able to expand the pool automatically.
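As an illustration of the idle timeout discussed above, the following Python sketch shows a pilot job that polls for payload jobs and exits once it has been idle for longer than the timeout. It is not the pilot code actually used at CNAF: the names run_pilot, fetch_payload, idle_timeout_s and poll_interval_s are hypothetical, and the 600 s default only mirrors the 10-minute figure quoted for ATLAS and CMS.

```python
import time
from typing import Callable, Optional

Payload = Callable[[], None]


def run_pilot(
    fetch_payload: Callable[[], Optional[Payload]],
    idle_timeout_s: float = 600.0,   # assumed: 10 minutes, as quoted for ATLAS/CMS
    poll_interval_s: float = 5.0,    # assumed polling period
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run payloads as they arrive; exit after idle_timeout_s without one.

    Returns the number of payloads executed.
    """
    executed = 0
    idle_since = clock()
    while True:
        payload = fetch_payload()
        if payload is not None:
            payload()
            executed += 1
            idle_since = clock()  # reset the idle timer after real work
            continue
        if clock() - idle_since >= idle_timeout_s:
            return executed       # idle for too long: free the slot
        sleep(poll_interval_s)
```

With such a timeout, a single long-lived PJ would no longer be possible unless the site exempts it, which is one of the points to settle with the users.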
Gergely: depending on users' needs, several setups are possible: no PJ, 1 PJ, ...
MAB: will PJ be used for CBC, for omega-scan, or for other purposes? Future PJ use needs to be clarified with the burst and CBC chairs.
Action item for MAB: determine how PJ will be used in the coming months/years, and give feedback to Gergely so he can decide whether or not to restart the PJ server.