Dear gridpp-dirac-users,

I've recently been asked whether there are any grid resources for running "long-lived jobs" within the UK grid.

Given that proxy certificates are typically only valid for 48 hours, does anyone know:

Is there any support for running ~7-day jobs via the DIRAC system? Or is there any potential for this being supported in the future? *

I had a quick look through the maxCPUTime entries in the DIRAC config, and I see that Manchester advertises a very long (multi-year?) value there, but most appear to be around the 48-hour walltime.

As a worst-case scenario, our back-end batch system at Edinburgh supports running week-long jobs. However, that would mean users having to split their job management between two systems and become more involved in topics such as data management, which I'd rather avoid if possible.

Thanks for any help,
Rob

*PS: This isn't the first time I've heard this come up when discussing grid use with non-LHC communities, so I'm not sure whether there is a plan here. The style of workflow seems similar to what LSST users were seeing at some point in the past. Most jobs are well defined and short-lived, but a small percentage, which can't be broken down further, take a very long time to process. Combined with the fact that they can't be identified ahead of time, these prove difficult to handle.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th' ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
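The check described in the message above (scanning maxCPUTime entries for queues that could host a week-long job) can be sketched as follows. This is only an illustration under stated assumptions: the queue names and limits are hypothetical, the limits are assumed to be expressed in minutes, and the helper `queues_supporting` is not part of DIRAC.

```python
from datetime import timedelta

# Hypothetical queue limits in minutes (illustrative values, not real site data).
QUEUE_MAX_CPU_MINUTES = {
    "site-a/long": 7 * 24 * 60,
    "site-b/default": 48 * 60,
    "site-c/short": 12 * 60,
}


def queues_supporting(required: timedelta, limits: dict) -> list:
    """Return the sorted queue names whose advertised limit covers `required`."""
    required_minutes = required.total_seconds() / 60
    return sorted(name for name, limit in limits.items() if limit >= required_minutes)


# Which of the hypothetical queues could run a 7-day job?
print(queues_supporting(timedelta(days=7), QUEUE_MAX_CPU_MINUTES))  # prints ['site-a/long']
```

A filter like this only reflects what a site advertises; whether the site actually allows such jobs still has to be confirmed with the site itself.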
Hi Rob,

AFAIK most LHC VOs use 7-day proxies (LSST uses 4 days), as proxy renewal was always dodgy. As for the queues, we mostly take whatever the BDII gives us as a given, and only fix things or file tickets if we get a complaint. I have to admit I try not to think about this, but DIRAC does all kinds of proxy renewals, so if you have a 7-day grid queue, it would probably work.

Having said this, I thought LSST was using PanDA as their workflow manager and not DIRAC, so I am not sure we are the right people to ask any longer (though the LSST <-> DIRAC interface should be working, as we never decommissioned it, and it's still in the Nagios).

Regards,
Daniela

On Tue, 28 Jun 2022 at 12:45, CURRIE Robert <Rob.Currie@ed.ac.uk> wrote:
> Is there any support for running ~7day jobs via the DIRAC system? Or, Is there any potential of this being supported in the future?

--
-----------------------------------------------------------
daniela.bauer@imperial.ac.uk
HEP Group/Physics Dep
Imperial College London, SW7 2BW
Tel: Working from home, please use email.
http://www.hep.ph.ic.ac.uk/~dbauer/
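The proxy lifetimes mentioned in this reply (48 hours, 4 days, 7 days) can be compared against a 7-day job with some simple arithmetic. The sketch below is an assumption-laden illustration: `renewals_needed` is a hypothetical helper, it ignores time spent waiting in the queue, and it says nothing about how DIRAC actually schedules its renewals.

```python
import math
from datetime import timedelta


def renewals_needed(job_duration: timedelta, proxy_lifetime: timedelta) -> int:
    """Minimum number of proxy renewals so a valid proxy exists for the whole job."""
    if proxy_lifetime <= timedelta(0):
        raise ValueError("proxy_lifetime must be positive")
    # Number of proxy lifetimes needed to span the job, rounded up.
    periods = math.ceil(job_duration / proxy_lifetime)
    # The initial proxy covers the first period; every further period needs a renewal.
    return max(periods - 1, 0)


job = timedelta(days=7)
for lifetime in (timedelta(hours=48), timedelta(days=4), timedelta(days=7)):
    print(lifetime, "->", renewals_needed(job, lifetime), "renewal(s)")
```

Under these assumptions a 7-day proxy needs no renewal for a 7-day job, while a 48-hour proxy would depend on several successful renewals.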
Hi Daniela,

In this case I'm asking on behalf of a GridPP VO member (from a smaller experiment without its own VO).

OK, I'll try to get them to submit their jobs with longer walltimes and see what sort of experience they get. If a job runs on the second or third attempt at another site, then I think that is a reasonable approach until all the jobs are finished. If we hit problems, I'll let you know.

Thanks for the help,
Rob

________________________________
From: gridpp-dirac-users-bounces@imperial.ac.uk <gridpp-dirac-users-bounces@imperial.ac.uk> on behalf of Daniela Bauer <daniela.bauer.grid@googlemail.com>
Sent: 28 June 2022 13:12
To: gridpp-dirac-users@imperial.ac.uk <gridpp-dirac-users@imperial.ac.uk>
Subject: Re: [Gridpp-Dirac-Users] Limits on DIRAC jobs

> So if you have a 7 day grid queue, it would probably work.
Hi Rob,

Did you check with the sites whether they allow long jobs? If so, can you let me know which ones you are targeting? If a test cycle takes about 2 days, that's far too long for a trial-and-error approach.

Regards,
Daniela

On Tue, 28 Jun 2022 at 13:42, CURRIE Robert <Rob.Currie@ed.ac.uk> wrote:
> If the job runs on the 2nd or 3rd attempt at another site, then I think that is a reasonable thing to do until all the jobs are finished.

--
_______________________________________________
Gridpp-Dirac-Users mailing list
Gridpp-Dirac-Users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/gridpp-dirac-users
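The concern about trial and error can be made concrete with a rough estimate of how long a job takes to finish if it is killed at each site's walltime limit before landing on a queue long enough to complete. This is a minimal sketch with hypothetical figures: the helper `elapsed_until_success`, the site limits, and the 6-day job are all assumptions, and queue waiting time is ignored.

```python
from datetime import timedelta
from typing import Optional, Sequence


def elapsed_until_success(
    job_duration: timedelta, site_limits: Sequence[timedelta]
) -> Optional[timedelta]:
    """Total run time spent until the job completes, trying sites in order.

    At a site whose limit is shorter than the job, the attempt is assumed to
    run until the limit and then be killed. Returns None if no site suffices.
    """
    elapsed = timedelta(0)
    for limit in site_limits:
        if limit >= job_duration:
            return elapsed + job_duration
        elapsed += limit  # wasted attempt, killed at the walltime limit
    return None


# Hypothetical case: two 48-hour sites are tried before a 7-day queue is reached.
limits = [timedelta(hours=48), timedelta(hours=48), timedelta(days=7)]
print(elapsed_until_success(timedelta(days=6), limits))  # prints 10 days, 0:00:00
```

In this made-up case a 6-day job needs 10 days of run time to finish on the third attempt, which is why knowing the target sites in advance matters.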
participants (2)
- CURRIE Robert
- Daniela Bauer