Jobs stuck at 'Pilot Agent Submission'
On Wed, Feb 15, 2017 at 01:43:55PM +0000, Will Furnell wrote:
Hello,
I've submitted a couple of jobs that I am testing some software with, and they have had the status 'Pilot Agent Submission' for a week now. Is this normal behaviour at the moment? Their IDs are 1824143 and 1824144.
In the past, jobs have completed much more quickly for me.
Thanks very much,
Will Furnell.
On 15/02/2017 14:01, Simon Fayer wrote:
Hi Will,
DIRAC is unable to submit CERN@School pilot jobs to QMUL at the moment due to a site configuration problem. I've just opened a GGUS ticket with them about this: https://ggus.eu/index.php?mode=ticket_info&ticket_id=126650
Once that's fixed your jobs should run. Otherwise, you could resubmit them and specify a different site (it looks like LCG.UKI-SCOTGRID-GLASGOW.uk is currently working for this VO).
Regards, Simon
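As an illustration of resubmitting with a specific site, here is a minimal sketch in Python that renders a JDL file containing a Site attribute. It only builds text and does not contact DIRAC. The job name, script name and output file names are hypothetical placeholders, and the attribute names (JobName, Executable, StdOutput, StdError, InputSandbox, OutputSandbox, Site) are assumed to follow the usual DIRAC JDL conventions rather than being taken from the actual job in this thread.

```python
# Sketch only: renders JDL text, it does not contact DIRAC.
# Job name, script name and output file names are hypothetical.


def jdl_list(items):
    """Format a Python list as a JDL list, e.g. {"a", "b"}."""
    return "{" + ", ".join('"%s"' % item for item in items) + "}"


def render_jdl(job_name, executable, sandbox_files, site=None):
    """Return JDL text for a simple job, optionally pinned to one site."""
    lines = [
        'JobName = "%s";' % job_name,
        'Executable = "%s";' % executable,
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = %s;" % jdl_list(sandbox_files),
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    if site is not None:
        # Restrict the job to a single named site.
        lines.append('Site = "%s";' % site)
    return "\n".join(lines) + "\n"


jdl_text = render_jdl(
    job_name="software_test",        # hypothetical
    executable="run_test.sh",        # hypothetical
    sandbox_files=["run_test.sh"],   # hypothetical
    site="LCG.UKI-SCOTGRID-GLASGOW.uk",
)
print(jdl_text)
```

The printed text could be saved to a file and submitted with dirac-wms-job-submit. Leaving out the site argument produces a JDL without a Site line, so the job is not restricted to one site.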
On 15/02/2017 14:16, Will Furnell wrote:
Hi,
Thank you for your help and for raising a ticket. I'll try resubmitting them to LCG.UKI-SCOTGRID-GLASGOW.uk and check the result.
Will.
On Fri, Feb 17, 2017 at 05:08:40PM +0000, Will Furnell wrote:
Hello,
Just an update on this: unfortunately I was unable to run a job on LCG.UKI-SCOTGRID-GLASGOW.uk, I presume because it was using DIRAC data as an input file and I don't have access to a storage element at LCG.UKI-SCOTGRID-GLASGOW.uk. So I tried the site I do have a storage element for, LCG.UKI-NORTHGRID-LIV-HEP.uk (UKI-NORTHGRID-LIV-HEP-disk), but I got the following error:
FileCatalog error ( 1604 : Failed to perform getReplicas from any catalog)
What could the cause be?
I was able to get the file back on my local system fine (using dirac-dms-get-file LFN:/cernatschool.org/user/w/will.furnell/ga_testing_liv.zip).
Thank you,
Will.
On 18/02/2017 19:47, Simon Fayer wrote:
Hi Will,
Please try removing the InputData section from your JDL. If InputData is specified, the files have to be available at the site the job is running at, whereas if the file is listed in InputSandbox, it'll be copied from wherever it can be. (There isn't really any need to put a file in both InputSandbox and InputData; one or the other should suffice, depending on the behaviour you want.)
I've investigated the "FileCatalog error 1604" you saw; it wasn't caused by anything specific about your job. The File Catalogue was overloaded for around four minutes at (coincidentally) the same time you submitted your test job. I've made adjustments to some settings on the server to try to prevent this from happening again.
Regards, Simon
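To make the InputData versus InputSandbox distinction concrete, here is a minimal sketch in Python that renders the input-related JDL lines for each approach. It only builds text and does not contact DIRAC. The script name is a hypothetical placeholder, the LFN is the one quoted earlier in the thread, and the exact list syntax is an assumption based on the usual DIRAC JDL conventions, not a copy of the JDL that was actually submitted.

```python
# Sketch only: renders JDL text, it does not contact DIRAC.
LFN = "LFN:/cernatschool.org/user/w/will.furnell/ga_testing_liv.zip"


def jdl_list(items):
    """Format a Python list as a JDL list, e.g. {"a", "b"}."""
    return "{" + ", ".join('"%s"' % item for item in items) + "}"


def render_inputs(local_files, lfn, use_input_data):
    """Return the input-related JDL lines for one of the two approaches."""
    if use_input_data:
        # InputData: the file has to be available at the site the job runs at.
        return "\n".join([
            "InputSandbox = %s;" % jdl_list(local_files),
            "InputData = %s;" % jdl_list([lfn]),
        ])
    # InputSandbox: the file is copied from wherever it can be found.
    return "InputSandbox = %s;" % jdl_list(local_files + [lfn])


script = ["run_test.sh"]  # hypothetical script name

print(render_inputs(script, LFN, use_input_data=True))
print()
print(render_inputs(script, LFN, use_input_data=False))
```

The second variant lists the LFN only in InputSandbox and has no InputData line, which matches the suggestion of using one or the other rather than both.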
Hi Simon,
Thank you very much for your advice. I'll try to keep to either InputData or InputSandbox from now on. I didn't know about the difference between them, and InputSandbox looks useful in my case to increase the number of sites I can run jobs on.
Thanks for investigating the error too. It was just bad luck to submit at the same time as the overloading!
Best regards,
Will.
Participants (2): Simon Fayer, Will Furnell