Hi,

If I submit jobs with no destination to DIRAC, sometimes they go to CLOUD.CERN-PROD.ch, which seems to send them to vcycle at Manchester. The interesting thing is that these are 'Hello World' jobs and typically take ~12 secs executing. The ones going to CLOUD.CERN-PROD.ch take more like 1 min 20 sec, many times longer. Presumably this is some startup overhead that one wouldn't notice in a real job?

See http://pprc.qmul.ac.uk/~lloyd/gridpp/gridtests/uktest.html for a summary.

Cheers

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Prof Steve Lloyd
Head of School of Physics and Astronomy
Queen Mary University of London, Mile End Road, London E1 4NS, UK
E-mail: s.l.lloyd@qmul.ac.uk
Phone: +44-(0)207-882-6967
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On 15 Mar 2016, at 11:45, Simon Fayer <simon.fayer05@IMPERIAL.AC.UK> wrote:

Hi Steve,

It looks like you're using the whole time that the job is in the "Running" state from the logging info. This includes the sandbox stage-in time (but, bizarrely, not the stage-out time), so it is probably just a slow network link into the WN/VM network. For more accurate user payload timing it's best to measure only the "Job_<number> | Running" states.

There isn't much more we can investigate about this, as we don't get any pilot logs from VAC/vcycle sites (I don't know whether there is any plan to make these available in future versions, or whether such a log even exists on these nodes).

Regards,
Simon
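A minimal sketch of the bookkeeping Simon describes, assuming logging-info records of the form (status, minor status, application status, timestamp) such as those reported by dirac-wms-job-logging-info; the job ID, timestamps and status wording below are made up for illustration and the real record layout may differ:

#!/usr/bin/env python
"""Illustrative sketch (not an official DIRAC tool): compare the full time a
job spends in the 'Running' major status with the narrower window spanned by
the 'Job_<number> | Running' application-status records."""
from datetime import datetime

# Hypothetical logging-info records for one job:
# (status, minor status, application status, timestamp).
RECORDS = [
    ("Received",  "Job accepted",           "Unknown",            "2016-03-14 17:12:53"),
    ("Waiting",   "Pilot Agent Submission", "Unknown",            "2016-03-14 17:13:05"),
    ("Running",   "Job Initialization",     "Unknown",            "2016-03-14 17:13:40"),  # stage-in starts
    ("Running",   "Application",            "Job_1234 | Running", "2016-03-14 17:14:48"),  # payload starts
    ("Running",   "Application",            "Job_1234 | Done",    "2016-03-14 17:15:00"),  # payload finished
    ("Completed", "Application Finished",   "Unknown",            "2016-03-14 17:15:10"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

# Full 'Running' window: from the first Running record to the first terminal record.
first_running = min(parse(t) for s, _, _, t in RECORDS if s == "Running")
first_terminal = min(parse(t) for s, _, _, t in RECORDS
                     if s in ("Completed", "Done", "Failed"))
full_window = (first_terminal - first_running).total_seconds()

# Payload window: from the first 'Job_<n> | Running' record to the record
# that reports the payload as done.
payload_start = min(parse(t) for _, _, a, t in RECORDS if "| Running" in a)
payload_end = min(parse(t) for _, _, a, t in RECORDS if "| Done" in a)
payload_window = (payload_end - payload_start).total_seconds()

print("Time in 'Running' state (includes stage-in): %.0f s" % full_window)
print("Payload-only time ('Job_<n> | Running'):     %.0f s" % payload_window)

With these sample records the full 'Running' window comes out at 90 s while the payload-only window is 12 s, which is the kind of gap being discussed here.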
There's a quick hack for the VMs at the moment: https://depo.gridpp.ac.uk/gridpp-vm.tier2.hep.manchester.ac.uk/vcycle-cern.t...

The Plan is for DIRAC to have a general pilot log file service which will give you the output of the DIRAC pilot scripts. This will also address it for batch platforms where DIRAC doesn't currently retrieve the pilot logs through the batch API (e.g. HTCondorCE).

Cheers

Andrew

--
Dr Andrew McNab
University of Manchester High Energy Physics,
LHCb@CERN (Distributed Computing Coordinator), and
GridPP (LHCb + Tier-2 Evolution)
www.hep.manchester.ac.uk/u/mcnab
Skype: andrew.mcnab.uk
Participants (3):
- Andrew McNab
- Simon Fayer
- Steve Lloyd