Hello everyone,

I am relatively new to GridPP, having obtained a certificate/VO membership only recently. I am writing because I have started experimenting with the Python DIRAC API (I followed the official instructions: https://dirac.readthedocs.io/en/latest/UserGuide/GettingStarted/InstallingCl... )

Apart from one lucky job (which I managed to submit via /cvmfs/ganga.cern.ch/dirac_ui/bashrc rather than the manual procedure above, and which failed, probably because I deleted the proxy), all my other attempts have failed so far. This is the error I always get:
Job submission failure Cannot get URL for WorkloadManagement/JobManager in setup DIRAC-Certification: RuntimeError('Option /DIRAC/Setups/DIRAC-Certification/WorkloadManagement is not defined')

I wonder if I am doing something wrong somewhere.
Unrelated to the above, I was also wondering what environment the code will see when it arrives at the remote node. Can I ship my Python package in the sandbox, cd into its directory and install it along with all its dependencies in a job step? For instance:
job.setExecutable('cd my_package_dir && pip install -r requirements.txt --user && python setup.py install --user')
job.setExecutable('my_package_dir/my_package/my_script.py')
job.setInputSandbox([my_package_dir, other_files])

Also, are paths relative to the home directory?
Many thanks for all your help,
Giuseppe

--
Dr Giuseppe Congedo (Senior Researcher)
Institute for Astronomy, University of Edinburgh
Royal Observatory, Blackford Hill, Edinburgh, EH9 3HJ

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
Hi Giuseppe,

I gather you are the GridPP user Rob Currie alluded to in an earlier email? If so, welcome aboard. DIRAC installations differ slightly from each other; you can find any setting specific to the GridPP DIRAC instance here: https://www.gridpp.ac.uk/wiki/Quick_Guide_to_Dirac Would you mind trying this out? If you have an SL7 machine with cvmfs mounted to hand, I would recommend using the cvmfs-based DIRAC UI (in a clean window!), as it is much quicker to test. We have no access to the ganga setup, which makes debugging a bit difficult, so we generally don't recommend it to beginners. I checked on our DIRAC server and it looks like you are properly registered, so there shouldn't be a problem there.

As for setting up software inside your job: this depends on the size of the executable (not everything can be shipped in a sandbox) and other parameters. We have no control over where sites run their jobs, so any setup you write has to be relative to $PWD; there is no such thing as a home directory.

Once you've managed to submit a job or three (you might want to send a script with it that dumps the environment, so you get a feel for what a grid environment looks like), we'll get onto the next step: how to get your software where it's meant to go.

Hope that helps,
Daniela
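P.S. For reference, a minimal sketch of such an environment-dumping test job with the DIRAC Python API (illustrative only; it assumes a DIRAC UI configured for GridPP and a valid proxy, and the job/log names are made up):

# sketch: submit a job that dumps its working directory and environment
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC configuration

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName('env-dump-test')
# each step runs in the job's working directory on the worker node
job.setExecutable('/bin/pwd', logFile='pwd.log')
job.setExecutable('/usr/bin/env', logFile='environment.log')
job.setOutputSandbox(['pwd.log', 'environment.log'])

dirac = Dirac()
print(dirac.submitJob(job))

The two log files then come back in the output sandbox, so you can see exactly what the grid environment looks like.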
--
-----------------------------------------------------------
daniela.bauer@imperial.ac.uk
HEP Group/Physics Dep
Imperial College London, SW7 2BW
Tel: Working from home, please use email.
http://www.hep.ph.ic.ac.uk/~dbauer/
Hello Daniela,

Thank you for your email. That is correct, Rob has helped me with the first steps; I have reviewed the earlier thread and it was likely me Rob was talking about. I am trying to run a big cosmic shear simulation for Euclid, something like ~3,000,000 jobs, and I hope GridPP will help me achieve that challenging goal! The vast majority of the jobs complete in a few hours; some (hard to predict which) can take a few days, but a 7-day walltime should be okay.

Thanks for suggesting the guide. Initially I thought I was in a fresh terminal, but I hit an issue with the paths. Afterwards I managed to submit and run a job:
$ dirac-wms-job-status -f logfile
JobID=34498902 Status=Done; MinorStatus=Execution Complete; Site=LCG.RAL-LCG2.uk;

So good progress!
I noticed that dirac_ui uses Python 2, so as a quick-and-dirty fix I symlinked my system Python 3 into its local/bin/python directory (I am on Ubuntu 20.04). Unfortunately, my submission script and all my code are in Python 3. Any ideas there?

Regarding the submission, do you recommend trying the diracos Python API again? I think the issues might have been the missing "-S GridPP -C dips://dirac01.grid.hep.ph.ic.ac.uk:9135/Configuration/Server" and "-g [your_vo_goes_here]_user". Using the diracos API might be easier, as it supports Python 3 and I can install all my dependencies/code very easily.

Thanks again for your help,
Giuseppe
Hi Giuseppe,

If you look a bit further down the page, the DIRAC UI also comes in Python 3, though it's less well tested, and definitely not on Ubuntu, but I guess someone has to go first. It might not work, and you might have to run it in a container; Rob should be able to help if that's a problem (cernvm should do). The yearly major GridPP DIRAC upgrade is scheduled for July 25th; after that it should be all Python 3, with just a Python 2 version of the UI being maintained for a while to allow users to upgrade at their own pace. We usually recommend the Python API for DIRAC if users a) seem to be familiar with Python and b) might have to do some heavy lifting.

Which leads me to the next question: grid worker nodes are typically SL7 nodes. Most people who need something more exciting typically run in a container using Singularity. How do you usually run your code? Is this in any way, shape or form related to the stuff Mark Holliman does? Do you use cvmfs at all for your software?

Regards,
Daniela
Hi Daniela,

Many thanks for your email. Regarding Python 3, you're right; sorry, I didn't see it was already on the same page. Both the local install and the CVMFS repository work for me, which is good news.
> We usually recommend the python API for DIRAC if users a) seem to be familiar with python and b) might have to do some heavy lifting.

Yes, thanks, that's what I intended to use.
> Which leads me to the next question: Grid worker nodes are typically SL7 nodes. Most people who need something more exciting typically run in a container using singularity. How do you usually run your code?
I run it in a virtual environment where I build Python and the dependencies. I've also used PyInstaller and AppImage for other applications, but I don't have specific experience with containers.
> Is this in any way shape or form related to the stuff Mark Holliman does?
I'm not sure what you're referring to when you say "the stuff Mark Holliman does". Essentially I've been running my code on the Euclid cluster at the Royal Observatory (~1,300 cores) and also on the Euclid-SAAS cluster (RAL+Cambridge+Imperial, ~3,000 cores). You'll probably know all that, but the latter sits somewhere between a cluster and a tiny grid, in that it uses CephFS to mount the various sites so it effectively appears as a single cluster.
> Do you use cvmfs at all for your software?

I usually prefer to compile from source into a custom environment. I can also access the CVMFS repositories if I need to.
Do you think I need to move my workflow to Singularity? In that case, I'd need to customise the image and install the necessary dependencies, including my package, before I can run my script. Do you have guidance on that?

Best wishes,
Giuseppe
Hi Giuseppe,

As a next step, if your software is amenable to it, I would suggest tarring it up, shipping it to a grid node and seeing how far you get. (Singularity, while currently fashionable, might actually be overkill.) Depending on the size of the tarball, it can go in the sandbox; if it's more than 10 MB, it's best to upload it to a storage element or two first and have the job fetch it from there rather than uploading it with the job. Let me know which option you prefer; we can give you the commands for uploading/replicating etc. if you get stuck.

I was asking about the SAAS stuff because that's the only bit of Euclid computing I have encountered so far. Plus, if it runs on the Imperial cloud, the chances of it working on the grid are much higher than if it didn't. Though if you need access to data, be warned that there is no way to have NFS-like mounts on the grid, but data access to/from grid storage elements is faster than most people expect.

Most cvmfs repositories should be available on grid nodes, and if there is one you need that isn't, please let us know; it is usually no problem to add them.

Regards,
Daniela
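P.S. As an illustration of the sandbox option (a sketch only; the tarball and script names are made up, and it assumes a configured GridPP DIRAC UI and proxy), a small tarball can travel in the input sandbox and be unpacked relative to $PWD in a job step:

# sketch: ship a small tarball with the job and unpack it on the worker node
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName('tarball-sandbox-test')
# small local files travel with the job; everything is unpacked relative to $PWD
job.setInputSandbox(['my_env.tar.gz', 'run_my_script.sh'])  # hypothetical names
job.setExecutable('/bin/tar', arguments='xzf my_env.tar.gz', logFile='unpack.log')
job.setExecutable('/bin/sh', arguments='run_my_script.sh', logFile='run.log')
job.setOutputSandbox(['unpack.log', 'run.log'])

print(Dirac().submitJob(job))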
Hi Daniela,

Thanks. SL7 does not seem to ship Python 3 (please confirm), so the interpreter needs to be in the environment. Unfortunately, the tarball of the environment is ~200 MB (~700 MB uncompressed), of which 90% is taken up by the Python executable. So I think the options are either to use a storage element (although I worry that the transfer time would still be significant, so we would need to replicate across many storage elements) or to use a container, but again I don't see much benefit there.

Regarding running on SAAS and other sites: yes, the environment has been tested on a number of machines, including Imperial.

Please send me any specific advice / example script on how best to upload the environment to the storage element (compressed or uncompressed?) and replicate it across the sites.

Many thanks for your help.

Giuseppe
Hi Giuseppe,

As you noticed, SL7 tends not to have Python 3 installed by default, so you can't rely on it being on a node. To avoid pre-emptive optimisation, I would suggest uploading your compressed tarball to a couple of sites first and seeing how it goes. Given the limited number of sites you'll need at the start, I would do it by hand and start with Imperial, because that makes debugging easier for us.

Upload: in a DIRAC UI, with a GridPP proxy (please leave the directory structure as in the example for now):

dirac-dms-add-file /gridpp/user/g/giuseppe.congedo/mytarball.tar mylocalcopyofthetarball.tar UKI-LT2-IC-HEP-disk

If this succeeds, replicate it to a number of sites:

dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SCOTGRID-ECDF-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SOUTHGRID-RALPP-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-NORTHGRID-LANCS-HEP-XGATE-disk   (that's a shiny new one, I'd like to see if it works)

(I picked one from each of the different T2 federations, but otherwise it's somewhat arbitrary. If you get an error on replication for one or more of the sites, please post it here.)

Then please read the documentation on InputData/InputSandboxes (linked from the "Quick Guide to DIRAC" page, or directly: https://www.gridpp.ac.uk/wiki/DIRAC_Data_Handling_within_a_Job ) and have a go.

Regards,
Daniela
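P.S. On the job side, one sketch of how the uploaded tarball might then be pulled into a job (illustrative only; the data-handling page above describes the supported options in detail). DIRAC input sandboxes can also take 'LFN:' entries, which are fetched from a storage element into the job's working directory:

# sketch: fetch the uploaded tarball from grid storage and unpack it in $PWD
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName('tarball-from-storage-test')
# 'LFN:' entries are resolved from a storage element before the job steps run
job.setInputSandbox(['LFN:/gridpp/user/g/giuseppe.congedo/mytarball.tar'])
job.setExecutable('/bin/tar', arguments='xf mytarball.tar', logFile='unpack.log')
job.setExecutable('/bin/ls', arguments='-l', logFile='contents.log')
job.setOutputSandbox(['unpack.log', 'contents.log'])

print(Dirac().submitJob(job))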
Hi Daniela,
Thanks. SL7 does not seem to ship Python 3 (please confirm), so the interpreter needs to be in the environment. Unfortunately, the tarball of the environment is ~200MB (~700 MB uncompressed), of which 90% is taken up by the Python executable. So I think the options are either use the storage element, although I worry that the transfer time would still be significant so we'll need to replicate across many storage elements, or use a container but again I don't see much benefit there.
Regarding the running on SAAS and other sites, yes, the environment has been tested on a number of machines, including Imperial.
Please send me any specific advice / example script on how to best upload the environment to the storage element (compressed or uncompressed?) and replicate across the sites.
Many thanks for your help.
Giuseppe
On 29/06/2022 12:07, Daniela Bauer wrote:
This email was sent to you by someone outside the University. You should only click on links or attachments if you are certain that the email is genuine and the content is safe. Hi Giuseppe,
I would suggest as a next step, that, if your software is amenable to it, to tar it up, and ship it to a grid node and see how far you get. (Singularity, while currently fashionable, might actually be overkill.) Depending on the size of the tarball it can go in the sandbox or, if it's more than 10 MB, it's best if you upload it to a storage element or two first and get it from there rather than uploading it with the job. Let me know which option you prefer, we can give you the commands for uploading/replicating etc if you get stuck.
I was asking about the SAAS stuff, because that's the only bit of Euclid computing I have encountered so far. Plus, if it runs on the Imperial cloud, chances of this working on the grid are much higher than if it didn't. Though if you need access to data, be warned there is no way to have nfs like mounts on the grid, but data access to/from grid storage elements is faster than most people expect.
Most cvmfs repositories should be available on grid nodes and if there is one you need that isn't, please let us know, usually it is no problem to add them.
Regards, Daniela
Hi Daniela,

Many thanks. Uploading was successful. Initially the replicate command failed for all three sites, complaining about the local file existing in the current directory. Then I moved one directory up and it worked, except for Lancaster:

ERROR Completely failed to replicate file. Failed to replicate with all sources.

Just to clarify: the file /gridpp/user/g/giuseppe.congedo/mytarball.tar.gz will need to be transferred to the node by pointing to it as input data (e.g. job.setInputData(['/gridpp/user/g/giuseppe.congedo/mytarball.tar.gz'])) and then decompressed into the PWD (e.g. job.setExecutable('tar -xzvf mytarball.tar.gz'))?

Also thanks for sending me the wiki.

Best wishes
Giuseppe

On 29/06/2022 14:04, Daniela Bauer wrote:
Hi Giuseppe,
As you noticed, SL7 tends not to have python3 installed by default, so you can't rely on it being on a node.
To avoid pre-emptive optimization I would suggest uploading your compressed tarball to a couple of sites first and see how it goes. Given the limited number of sites you'll need at the start I would do it by hand and start with Imperial, because that makes debugging easier for us.
Upload: In a dirac ui, with a gridpp proxy (please leave the directory structure as in the example for now):

dirac-dms-add-file /gridpp/user/g/giuseppe.congedo/mytarball.tar mylocalcopyofthetarball.tar UKI-LT2-IC-HEP-disk

If this succeeds, replicate it to a number of sites:

dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SCOTGRID-ECDF-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SOUTHGRID-RALPP-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-NORTHGRID-LANCS-HEP-XGATE-disk (that's a shiny new one, I'd like to see if this works)
(I picked one of each of the different T2 federations, but otherwise it's somewhat arbitrary. If you get an error on replication for one or more of the sites, please post it here.)
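If you want to double-check where the file ended up after replicating, listing the replicas should work with something along these lines:

dirac-dms-lfn-replicas /gridpp/user/g/giuseppe.congedo/mytarball.tar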
Then please read the documentation on InputData/InputSandboxes (linked from the "Quick Guide to DIRAC" page, or directly: https://www.gridpp.ac.uk/wiki/DIRAC_Data_Handling_within_a_Job) and have a go.
Regards, Daniela
On Wed, 29 Jun 2022 at 12:49, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:
Hi Daniela,
Thanks. SL7 does not seem to ship Python 3 (please confirm), so the interpreter needs to be in the environment. Unfortunately, the tarball of the environment is ~200 MB (~700 MB uncompressed), of which 90% is taken up by the Python executable. So I think the options are either to use the storage element (although I worry that the transfer time would still be significant, so we'll need to replicate across many storage elements) or to use a container, but again I don't see much benefit there.
Regarding running on SAAS and other sites: yes, the environment has been tested on a number of machines, including Imperial.
Please send me any specific advice / example script on how to best upload the environment to the storage element (compressed or uncompressed?) and replicate across the sites.
Many thanks for your help.
Giuseppe
Hi Giuseppe, "complaining about the local file existing in the current directory" is one of those "features" in DIRAC that I would consider a bug. Glad you worked it out yourself, I'm still going to complain to core DIRAC (again...). Wrt Lancaster: I'll debug this with the site admin. I have to admit, I don't know if DIRAC will handle the unpacking automatically, I have a vague recollection it might do it for tar files, but not gz, but I might be wrong. I would recommend a wrapper script. If you set job.setInputData(['/gridpp/user/g/giuseppe.congedo/mytarball.tar.gz']) I think you'll end up with mytarball.tar.gz in your working directory on the node. Regards, Daniela On Wed, 29 Jun 2022 at 15:16, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:
Hi Daniela,

Thank you. Yesterday I managed to fetch the environment from the remote storage and unpack it locally. Then I tried to also source the environment:
job.setExecutable('/bin/tar -xvzf myenv.tar.gz && source myenv/bin/activate')
but hit two issues, with '&&' and with 'source'. Which shell do the nodes use? What else might be going wrong? Is the problem in how I'm using the API?

Also note that the && is there because I need to source the environment so that the correct interpreter for the script is picked up; that has to happen in the same job step/executable, hence the logical AND.

Best wishes
Giuseppe
Hi Giuseppe,

On Thu, Jun 30, 2022 at 09:10:15AM +0100, Giuseppe Congedo wrote:
job.setExecutable('/bin/tar -xvzf myenv.tar.gz && source myenv/bin/activate')
but hit two issues due to '&&' and 'source'. Which shell do the nodes use?
DIRAC launches the executable directly without a shell, which is probably why you're seeing problems. In the above it'll just run tar and pass everything else as arguments to that.

The easiest way is probably to include the shell you want as part of the command:

job.setExecutable('tar xvzf myenv.tar.gz')
job.setExecutable('/bin/bash -c "source myenv/bin/activate && more-commands..."')

The problem with the above is that it can easily become quite unreadable if you have a lot of commands to run in the environment. A neater (and more common) way to achieve this is to include a simple wrapper script that does all of the set-up commands and put that in the input sandbox too (it'll be uploaded when you submit the job):

$ cat wrapper.sh
#!/bin/bash
tar xvzf myenv.tar.gz
source myenv/bin/activate
...
python my_main_script.py "${@}"

job.setInputSandbox(["wrapper.sh", "LFN:/.../myenv.tar.gz"])
job.setExecutable('/bin/bash wrapper.sh specific_args_for_this_job')

Regards, Simon
Also, apologies to the Lancaster site admin, the replicating to Lancaster problem is a DIRAC problem, not a Lancaster problem. It might take us a while to sort this out though. --Daniela On Wed, 29 Jun 2022 at 14:04, Daniela Bauer < daniela.bauer.grid@googlemail.com> wrote:
Hi Giuseppe,
As you noticed, SL7 tends not to have python3 installed by default, so you can't rely on it being on a node.
To avoid pre-emptive optimization I would suggest uploading your compressed tarball to a couple of sites first and see how it goes. Given the limited number of sites you'll need at the start I would do it by hand and start with Imperial, because that makes debugging easier for us.
Upload: In a dirac ui, with a gridpp proxy (please leave the directory structure as in the example for now):

dirac-dms-add-file /gridpp/user/g/giuseppe.congedo/mytarball.tar mylocalcopyofthetarball.tar UKI-LT2-IC-HEP-disk

If this succeeds, replicate it to a number of sites:

dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SCOTGRID-ECDF-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SOUTHGRID-RALPP-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-NORTHGRID-LANCS-HEP-XGATE-disk (that's a shiny new one, I'd like to see if this works)
(I picked one of each of the different T2 federations, but otherwise it's somewhat arbitrary. If you get an error on replication for one or more of the sites, please post it here.)
Then please read the documentation on InputData/InputSandboxes (linked from the "Quick Guide to DIRAC" page, or directly: https://www.gridpp.ac.uk/wiki/DIRAC_Data_Handling_within_a_Job) and have a go.
Regards, Daniela
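The job-side counterpart of the upload above, as a short sketch using the same LFN: the file can be staged either as input data or through the sandbox with the LFN: prefix (both routes are covered in the linked documentation).

    # either let DIRAC stage it as input data from a nearby storage element...
    job.setInputData(['/gridpp/user/g/giuseppe.congedo/mytarball.tar'])
    # ...or pull it through the input sandbox with the LFN: prefix
    job.setInputSandbox(['LFN:/gridpp/user/g/giuseppe.congedo/mytarball.tar'])

To check where the file ended up after replication (command name quoted from memory, please verify):

    dirac-dms-lfn-replicas /gridpp/user/g/giuseppe.congedo/mytarball.tar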
On Wed, 29 Jun 2022 at 12:49, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:
Hi Daniela,
Thanks. SL7 does not seem to ship Python 3 (please confirm), so the interpreter needs to be in the environment. Unfortunately, the tarball of the environment is ~200 MB (~700 MB uncompressed), of which 90% is taken up by the Python executable. So I think the options are either to use a storage element (although I worry that the transfer time would still be significant, so we would need to replicate across many storage elements) or to use a container, but again I don't see much benefit there.
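For illustration, a simplified sketch of producing such an environment tarball (file names are placeholders; note that a plain venv is not relocatable by default since it hard-codes absolute paths, so treat this only as a way to gauge the size):

    # Simplified sketch only: build an environment and measure the tarball.
    python3 -m venv myenv
    myenv/bin/pip install -r requirements.txt
    tar czf myenv.tar.gz myenv       # compress before shipping
    ls -lh myenv.tar.gz              # the size decides: sandbox vs storage element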
Regarding the running on SAAS and other sites, yes, the environment has been tested on a number of machines, including Imperial.
Please send me any specific advice / example script on how to best upload the environment to the storage element (compressed or uncompressed?) and replicate across the sites.
Many thanks for your help.
Giuseppe
On 29/06/2022 12:07, Daniela Bauer wrote:
Hi Giuseppe,
As a next step, if your software is amenable to it, I would suggest tarring it up, shipping it to a grid node and seeing how far you get. (Singularity, while currently fashionable, might actually be overkill.) Depending on the size of the tarball it can go in the sandbox or, if it's more than 10 MB, it's best to upload it to a storage element or two first and fetch it from there rather than uploading it with the job. Let me know which option you prefer; we can give you the commands for uploading/replicating etc. if you get stuck.
I was asking about the SAAS stuff because that's the only bit of Euclid computing I have encountered so far. Plus, if it runs on the Imperial cloud, the chances of this working on the grid are much higher than if it didn't. Though if you need access to data, be warned there is no way to have NFS-like mounts on the grid, but data access to/from grid storage elements is faster than most people expect.
Most cvmfs repositories should be available on grid nodes and if there is one you need that isn't, please let us know, usually it is no problem to add them.
Regards, Daniela
On Wed, 29 Jun 2022 at 10:56, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:
Hi Daniela,
Many thanks for your email. Regarding Python 3, you're right, sorry I didn't see it was already on the same page. Both the local install and the CVMFS repository work for me, which is good news.
We usually recommend the python API for DIRAC if users a) seem to be familiar with python and b) might have to do some heavy lifting.
Yes, thanks, that's what I intended to use.
Which leads me to the next question: Grid worker nodes are typically SL7 nodes. Most people who need something more exciting typically run in a container using singularity. How do you usually run your code ?
I run it in a virtual environment where I build Python and dependencies there. However I've also used pyinstaller and appimage for different applications, but I don't have specific experience with containers.
Is this in any way shape or form related to the stuff Mark Holliman does ?
I'm not sure what you're referring to when you say "the stuff Mark Holliman does". Essentially I've been running my code on the Euclid cluster at the Royal Observatory (~1,300 cores) and also on the Euclid-SAAS cluster (RAL+Cambridge+Imperial) (~3,000 cores). You'll probably know all that, but the latter is in between a cluster and a tiny grid in that it uses cephfs to mount the various sites so it effectively appears as a single cluster.
Do you use cvmfs at all for your software ?
I usually prefer to compile from source into a custom environment. I can also access the CVMFS repositories if I need to.
Do you think I need to move my workflow to singularity? In that case, I'll need to customise the image and install the necessary dependencies including my package before I can run my script. Do you have guidance on that?
Best wishes Giuseppe
On 28/06/2022 22:39, Daniela Bauer wrote:
Hi Giuseppe,
if you look a bit further down the page, the DIRAC UI also comes in python3, though it's less well tested. And definitely not on Ubuntu, but I guess someone has to go first. It might not work though and you might have to run this in a container, Rob should be able to help if that's a problem (cernvm should do). The yearly major GridPP DIRAC upgrade is scheduled for July 25th, after that it should be all python3, with just a python2 version of the UI being maintained for a while to allow users to upgrade at their own pace. We usually recommend the python API for DIRAC if users a) seem to be familiar with python and b) might have to do some heavy lifting.
Which leads me to the next question: Grid worker nodes are typically SL7 nodes. Most people who need something more exciting typically run in a container using singularity. How do you usually run your code ? Is this in any way shape or form related to the stuff Mark Holliman does ? Do you use cvmfs at all for your software ?
Regards, Daniela
On Tue, 28 Jun 2022 at 21:44, Giuseppe Congedo < giuseppe.congedo@ed.ac.uk> wrote:
Hello Daniela,
Thank you for your email. That is correct, Rob has helped me with the first steps. Yes, I have reviewed the earlier thread and it was likely me Rob was talking about. I am trying to run a big cosmic shear simulation for Euclid, something like ~3,000,000 jobs, and I hope gridPP will help me achieve that challenging goal! The vast majority of the jobs complete in a few hours; some (hard to predict) will take a few days, but a 7-day walltime should be okay.
Thanks for suggesting the guide. Initially I thought I was in a fresh terminal, but hit an issue with the paths. Afterwards I managed to submit and run a job:
$ dirac-wms-job-status -f logfile JobID=34498902 Status=Done; MinorStatus=Execution Complete; Site= LCG.RAL-LCG2.uk;
So good progress!
I noticed that dirac_ui uses Python 2, so I had to do a quick-and-dirty symlink of my system Python 3 into the local/bin/python directory (I am on Ubuntu 20.04). Unfortunately, my submission script and all my code are in Python 3. Any ideas there?
Regarding the submission, do you recommend trying the diracos Python API again? I think the issues might have been the missing "-S GridPP -C dips://dirac01.grid.hep.ph.ic.ac.uk:9135/Configuration/Server" and "-g [your_vo_goes_here]_user". Using the diracos API might be easier as it supports Python 3 and I can install all my dependencies/code very easily.
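For reference, those flags belong to dirac-configure and dirac-proxy-init; a hedged reconstruction of the two commands (the exact invocation and any extra flags should be checked against the Quick Guide wiki):

    dirac-configure -S GridPP -C dips://dirac01.grid.hep.ph.ic.ac.uk:9135/Configuration/Server
    dirac-proxy-init -g gridpp_user -M    # the group name follows the [your_vo_goes_here]_user pattern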
Thanks again for your help
Giuseppe
Dear All,

After a bumpy ride and a steep learning curve, I have now managed to submit my first successful jobs on the grid! Let me start by saying that this would not have been possible without the help of Daniela Bauer, Simon Fayer, and later also Dan Whitehouse from Imperial. The purpose of this email is to summarise the experience, and perhaps provide some tips about how to get from just having a certificate and maybe a simple script submitted to the grid (like in my situation 10 days ago), to having a working large-scale submission up and running on the grid.

1) Get your certificate, read the guidance.

2) I found it useful to install my local copy of diracos / Dirac API following the guidance on the wiki https://www.gridpp.ac.uk/wiki/Quick_Guide_to_Dirac#Dirac_client_installation in particular the Python 3 installation. Note: I was told that Python 3 will be the default from the end of July. You may want to use CVMFS instead; that is entirely up to you.

3) Once diracos is installed and you have a live proxy, you are ready to go. I will be sourcing diracos specifically for my submission script, which I adapted from other runs on clusters managed with slurm.

4) Now it is time to think about how to manage your code. In my case my Python package was already source-released as a tarball, but I had done that the wrong way: I used python-build, whereas it is much easier to use python setup.py sdist from within the local environment. Otherwise you will likely have a dependency mismatch somewhere down the line.

5) You will also need an environment to ship with your code. In this case we create a conda environment from an environment.yml, then use conda constructor to build an installer file (~100 MB). Test your environment+code locally; check the versions; check that everything works as expected. In my case I had to slightly tweak my requirements file to pin the versions of the Python dependencies, and I also had to change some versions (pyfftw and numpy).

6) Write a wrapper executable. This is what the worker will call each time. It will:
6.1) Set the script to stop execution whenever an error is hit [set -e]
6.2) Install the environment [bash my-env-installer.sh]
6.3) Source the environment [eval "$(my-env/bin/python my-env/bin/conda shell.bash hook)"] (the initial python is necessary to ensure the correct conda is picked up, thanks Dan for the help)
6.4) Install your package [pip install my-package.tar.gz]
6.5) Run your script [python my_script.py "${@}"] (you can pass any parameters directly to the wrapper script)

7) Upload the installer, package, wrapper, script, and any other input files to storage. Replicate across a few more sites. Following the recommendation, I used Imperial to make debugging easier.

8) Initialise/submit your job via the Dirac API [job.setExecutable(f'bash wrapper.sh {args}'); job.setInputSandbox(my_file_list)] (here 'args' could be named keywords; also ensure you prepend 'LFN:' to your gridpp path if you want to run on all sites, otherwise just use job.setInputData).

Thanks again to Simon and Daniela for their help. Please feel free to chime in. Any questions about the specifics, please get back to us.
I hope I have not forgotten anything critical, and that this will help others get started on the grid in no time. Best wishes Giuseppe -- Dr Giuseppe Congedo (Senior Researcher) Institute for Astronomy, University of Edinburgh Royal Observatory, Blackford Hill Edinburgh, EH9 3HJ The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
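Putting steps 6 and 8 of the summary above together as a concrete sketch; all file names, LFN paths and arguments are placeholders taken from the summary, not the actual production setup.

    $ cat wrapper.sh
    #!/bin/bash
    set -e                                                        # 6.1: stop on the first error
    bash my-env-installer.sh                                      # 6.2: install the environment
    eval "$(my-env/bin/python my-env/bin/conda shell.bash hook)"  # 6.3: activate the correct conda
    pip install my-package.tar.gz                                 # 6.4: install the package
    python my_script.py "${@}"                                    # 6.5: run, forwarding the job's arguments

And the submission side (step 8), assuming the installer and package were uploaded under the user's gridpp directory while the small files travel in the sandbox:

    job.setInputSandbox(['wrapper.sh', 'my_script.py',
                         'LFN:/gridpp/user/g/giuseppe.congedo/my-env-installer.sh',
                         'LFN:/gridpp/user/g/giuseppe.congedo/my-package.tar.gz'])
    job.setExecutable('bash wrapper.sh arg1 arg2')
    result = Dirac().submitJob(job)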
participants (3)
- Daniela Bauer
- Giuseppe Congedo
- Simon Fayer