Hi Rohini et al,

I had another look at job 8920431 and I now noticed that

Tags = "skatelescope.eu.gpu"

appears in your JDL. This tag is not set in the configuration system (and I am not aware of any requests to set it). I assume this should go with the ce01.tier2.hep.manchester.ac.uk CE and the nordugrid-Condor-gpu queue? Maybe Andrew can confirm? I don't think we have tested tags on ARC-CEs yet, so I don't know whether it will work even if I set it.

The error message is a bit misleading, but I think what it is trying to tell you is that there is no place with both this tag and your data, and in its own way that is correct.

Regards,
Daniela

On 23 April 2018 at 11:46, Daniela Bauer <daniela.bauer.grid@googlemail.com> wrote:

Hi Rohini,

Please always include the mailing list. While Simon and I administer the DIRAC instance, we don't actually use it, and other people might be better placed to answer your questions.

Specifically, Job IDs 8920431, 8920312, 8907180 and 8897518 are failing with Input data errors. However, we have confirmed that the input data does in fact exist and is accessible (locally with dirac-dms-get-file). This looks like a catalogue error.

Unfortunately, when I search the logs for the first job I find:
runit/WorkloadManagement/Optimizers_1/log/@400000005ad9a5a434a9aaa4.s:2018-04-19 10:37:53 UTC WorkloadManagement/Optimizers_1/WorkloadManagement/JobScheduling INFO: [JID 8920431] Single chosen site LCG.UKI-NORTHGRID-MAN-HEP.uk specified
runit/WorkloadManagement/Optimizers_1/log/@400000005ad9a5a434a9aaa4.s:2018-04-19 10:37:53 UTC WorkloadManagement/Optimizers_1/WorkloadManagement/JobScheduling INFO: [JID 8920431] Site candidates are ['CLOUD.Datacentred.uk', 'VAC.UKI-LT2-UCL-HEP.uk', 'VAC.UKI-NORTHGRID-MAN-HEP.uk', 'LCG.UKI-NORTHGRID-MAN-HEP.uk']
runit/WorkloadManagement/Optimizers_1/log/@400000005ad9a5a434a9aaa4.s:2018-04-19 10:37:53 UTC WorkloadManagement/Optimizers_1/WorkloadManagement/JobScheduling INFO: [JID 8920431] No staging required
runit/WorkloadManagement/Optimizers_1/log/@400000005ad9a5a434a9aaa4.s:2018-04-19 10:37:53 UTC WorkloadManagement/Optimizers_1/WorkloadManagement/JobScheduling INFO: [JID 8920431] Only site LCG.UKI-NORTHGRID-MAN-HEP.uk is candidate
runit/WorkloadManagement/Optimizers_1/log/@400000005ad9a5a434a9aaa4.s:2018-04-19 10:37:53 UTC WorkloadManagement/Optimizers_1/WorkloadManagement/JobScheduling INFO: [JID 8920431] Done
which, as you can see, shows no error, so I have nothing to go on. I really don't know what to do about this one; I will forward it to the DIRAC developers. (Later it says:

runit/WorkloadManagement/Optimizers_1/log/@400000005ad9a5a434a9aaa4.s:2018-04-19 11:30:33 UTC WorkloadManagement/Optimizers_1/WorkloadManagement/JobScheduling INFO: [JID 8920431] Not in checking state. Avoid fast track

but even that is not an error.)

Does the error above disappear when you rerun the jobs?

Also, from time to time I have seen jobs fail with ApplicationStatus 'Cannot retrieve banned sites from JobDB' (most recently Job ID 8897033) and also 'FileCatalog error ( 1604 : Failed to perform getReplicas from any catalog)' (Job ID 8897076, and several others from job group rohini.joshi.20180418103426). Therese has seen this problem too, with Job ID 8865042. These errors seem to be transient, and re-running the jobs sometimes resolves the problem.

We assume this is a bug in DIRAC, as it has come up for other DIRAC instances as well. We've made various modifications to our DIRAC instance (mainly more of everything, as it looks a bit like a load/access problem), but we cannot reproduce it on demand, which makes debugging very hard. We'll keep looking.

Just for some context, my jobs are uploading some data to RAL (in a lazy way): they essentially just run a gfal-copy command to upload data from DIRAC storage at Manchester to RAL. Therese's job is trying to run a Singularity container on a Manchester GPU node.

Do you have a retry loop (with a sleep between retries) for your uploads?

@Therese: How do you target the GPU queue?

Sorry that I can't be more helpful at the moment.

Regards,
Daniela
--Sent from the pit of despair
-----------------------------------------------------------
daniela.bauer@imperial.ac.uk
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/
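On the question of a retry loop for the gfal-copy uploads: below is a minimal Python sketch of what such a loop could look like, assuming the upload is a plain `gfal-copy SRC DST` call whose non-zero exit status signals a failed transfer. The function name `run_with_retries`, the number of attempts and the delays are hypothetical choices for illustration, not anything provided by DIRAC or gfal.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 5,
    delay: float = 30.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `cmd`, retrying while it exits with a non-zero status.

    Hypothetical helper. Returns the 1-based number of the attempt that
    succeeded; raises RuntimeError if every attempt fails. The wait doubles
    after each failure (simple exponential backoff), so short transient
    outages are retried quickly and longer ones are not hammered.
    """
    wait = delay
    for attempt in range(1, attempts + 1):
        # Assumption: a zero exit status means the copy succeeded.
        result = runner(list(cmd))
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Back off before the next try; no sleep after the final failure.
            sleep(wait)
            wait *= 2
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts")
```

Usage would be something like `run_with_retries(["gfal-copy", src_url, dst_url])`, where `src_url` and `dst_url` are placeholders for the real Manchester and RAL endpoints. The `runner` and `sleep` parameters exist only so the loop can be exercised without touching real storage.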
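To make the "no place with this tag and your data" explanation for job 8920431 concrete, here is a deliberately simplified Python model: a job can only be matched where the sites still eligible for it (for example because of where its data is) overlap with the sites advertising every tag it requests. This is an illustration, not DIRAC's matching code. `eligible_sites` and the `site_tags` mapping are hypothetical, and the model attaches tags to whole sites, whereas this tag would presumably be set on a specific CE and queue.

```python
from typing import Dict, Iterable, List, Set


def eligible_sites(
    candidates: Iterable[str],
    site_tags: Dict[str, Set[str]],
    required_tags: Set[str],
) -> List[str]:
    """Return the candidate sites that advertise every required tag.

    Hypothetical model: `candidates` stands in for the sites the job is
    still allowed to run at, and `site_tags` for the tags the
    configuration system publishes per site.
    """
    return [site for site in candidates
            if required_tags <= site_tags.get(site, set())]


# The scheduler log for job 8920431 ends with a single candidate site.
candidates = ["LCG.UKI-NORTHGRID-MAN-HEP.uk"]

# "skatelescope.eu.gpu" is not set anywhere in the configuration system,
# so in this model no site advertises it.
site_tags: Dict[str, Set[str]] = {}

print(eligible_sites(candidates, site_tags, {"skatelescope.eu.gpu"}))  # prints []
```

An empty result is the situation the error message describes: the data placement and the tag requirement are each satisfiable on their own, but no single place satisfies both.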