Hi Daniela,

Thank you. Yesterday I managed to fetch the environment from the remote storage and unpack it locally. Then I also tried to source the environment:

job.setExecutable('/bin/tar -xvzf myenv.tar.gz && source myenv/bin/activate')

but I hit two issues, one with '&&' and one with 'source'. Which shell do the nodes use? What else might be going wrong? Is the problem in how I'm using the API?

Also note that I use '&&' because I'll need to source the environment so that the correct interpreter for the script is picked up, and that has to happen in the same job step/executable, hence the logical AND.
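
For reference, roughly what I have in mind is something like the sketch below: a small wrapper script does the unpack/activate/run inside one shell, so that a single executable is run on the node (the names run_env.sh, my_script.py and the LFN are just placeholders, and I'm assuming the standard DIRAC job API):

from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName('unpack-and-run-test')
# run_env.sh (shipped via the sandbox) would do the 'tar -xzf', the
# 'source myenv/bin/activate' and the 'python my_script.py' steps,
# all inside one shell invocation.
job.setInputSandbox(['run_env.sh', 'my_script.py'])
job.setInputData(['/gridpp/user/g/giuseppe.congedo/myenv.tar.gz'])
job.setExecutable('/bin/bash', arguments='run_env.sh', logFile='run_env.log')
print(Dirac().submitJob(job))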

Best wishes
Giuseppe


On 29/06/2022 15:58, Daniela Bauer wrote:
Hi Giuseppe,

"complaining about the local file existing in the current directory" is one of those "features" in DIRAC that I would consider a bug. Glad you worked it out yourself, I'm still going to complain to core DIRAC (again...).
Wrt Lancaster: I'll debug this with the site admin.

I have to admit I don't know if DIRAC will handle the unpacking automatically; I have a vague recollection that it might do it for tar files but not gz, though I might be wrong. I would recommend a wrapper script.
If you set
job.setInputData(['/gridpp/user/g/giuseppe.congedo/mytarball.tar.gz'])
I think you'll end up with mytarball.tar.gz in your working directory on the node.

Regards,
Daniela

 



On Wed, 29 Jun 2022 at 15:16, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:

Hi Daniela,

Many thanks.

Uploading was successful. Initially the replicate command failed for all three, complaining about the local file existing in the current directory. Then I moved up one directory and it worked, except for Lancaster:

ERROR Completely failed to replicate file. Failed to replicate with all sources.

Just to clarify: the file /gridpp/user/g/giuseppe.congedo/mytarball.tar.gz will need to be transferred to the node by pointing to it as input data (e.g. job.setInputData(['/gridpp/user/g/giuseppe.congedo/mytarball.tar.gz'])) and then decompressed into the PWD (e.g. job.setExecutable('tar -xzvf mytarball.tar.gz'))?

Also thanks for sending me the wiki.

Best wishes
Giuseppe


On 29/06/2022 14:04, Daniela Bauer wrote:
Hi Giuseppe,

As you noticed, SL7 tends not to have python3 installed by default, so you can't rely on it being on a node.

To avoid pre-emptive optimization I would suggest uploading your compressed tarball to a couple of sites first and see how it goes. Given the limited number of sites you'll need at the start I would do it by hand and start with Imperial, because that makes debugging easier for us.

Upload:
In a dirac ui, with a gridpp proxy (please leave the directory structure as in the example for now):
dirac-dms-add-file /gridpp/user/g/giuseppe.congedo/mytarball.tar mylocalcopyofthetarball.tar UKI-LT2-IC-HEP-disk
If this succeeds, replicate it to a number of sites:
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SCOTGRID-ECDF-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-SOUTHGRID-RALPP-disk
dirac-dms-replicate-lfn /gridpp/user/g/giuseppe.congedo/mytarball.tar UKI-NORTHGRID-LANCS-HEP-XGATE-disk (that's a shiny new one, I'd like to see if this works)

(I picked one of each of the different T2 federations, but otherwise it's somewhat arbitrary. If you get an error on replication for one or more of the sites, please post it here.)
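
If you prefer to do this from python rather than from the shell, something along these lines should be roughly equivalent (an untested sketch using the DIRAC DataManager client, with the same LFN and storage elements as above):

from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)

from DIRAC.DataManagementSystem.Client.DataManager import DataManager

dm = DataManager()
lfn = '/gridpp/user/g/giuseppe.congedo/mytarball.tar'

# Upload the local copy and register it at the Imperial SE.
print(dm.putAndRegister(lfn, 'mylocalcopyofthetarball.tar', 'UKI-LT2-IC-HEP-disk'))

# Replicate the registered file to the other storage elements.
for se in ['UKI-SCOTGRID-ECDF-disk',
           'UKI-SOUTHGRID-RALPP-disk',
           'UKI-NORTHGRID-LANCS-HEP-XGATE-disk']:
    print(dm.replicateAndRegister(lfn, se))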

Then please read the documentation on InputData/InputSandboxes (linked from the "Quick Guide to DIRAC" page) and have a go.

Regards,
Daniela


On Wed, 29 Jun 2022 at 12:49, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:

Hi Daniela,

Thanks. SL7 does not seem to ship Python 3 (please confirm), so the interpreter needs to be in the environment. Unfortunately, the tarball of the environment is ~200 MB (~700 MB uncompressed), of which 90% is taken up by the Python executable. So I think the options are either to use a storage element (although I worry that the transfer time would still be significant, so we would need to replicate across many storage elements) or to use a container, but again I don't see much benefit there.

Regarding the running on SAAS and other sites, yes, the environment has been tested on a number of machines, including Imperial.

Please send me any specific advice / example script on how to best upload the environment to the storage element (compressed or uncompressed?) and replicate across the sites.

Many thanks for your help.

Giuseppe



On 29/06/2022 12:07, Daniela Bauer wrote:
Hi Giuseppe,

As a next step, if your software is amenable to it, I would suggest tarring it up, shipping it to a grid node and seeing how far you get. (Singularity, while currently fashionable, might actually be overkill.)
Depending on the size of the tarball it can go in the sandbox or, if it's more than 10 MB, it's best to upload it to a storage element or two first and fetch it from there rather than uploading it with the job.
Let me know which option you prefer, we can give you the commands for uploading/replicating etc if you get stuck.

I was asking about the SAAS stuff because that's the only bit of Euclid computing I have encountered so far. Plus, if it runs on the Imperial cloud, the chances of it working on the grid are much higher.
Though if you need access to data, be warned that there is no way to have NFS-like mounts on the grid; data access to/from grid storage elements is, however, faster than most people expect.

Most cvmfs repositories should be available on grid nodes, and if there is one you need that isn't, please let us know; it is usually no problem to add them.

Regards,
Daniela


On Wed, 29 Jun 2022 at 10:56, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:

Hi Daniela,

Many thanks for your email. Regarding Python 3, you're right, sorry, I didn't see it was already on the same page. Both the local install and the CVMFS repository work for me, which is good news.

> We usually recommend the python API for DIRAC if users a) seem to be familiar with python and b) might have to do some heavy lifting.
Yes, thanks, that's what I intended to use.
> Which leads me to the next question: Grid worker nodes are typically SL7 nodes. Most people who need something more exciting typically run in a container using singularity.
> How do you usually run your code?

I run it in a virtual environment where I build Python and dependencies there. However I've also used pyinstaller and appimage for different applications, but I don't have specific experience with containers.

> Is this in any way shape or form related to the stuff Mark Holliman does?

I'm not sure what you're referring to when you say "the stuff Mark Holliman does". Essentially, I've been running my code on the Euclid cluster at the Royal Observatory (~1,300 cores) and also on the Euclid-SAAS cluster (RAL+Cambridge+Imperial, ~3,000 cores). You'll probably know all this, but the latter sits somewhere between a cluster and a tiny grid, in that it uses CephFS to mount the various sites, so it effectively appears as a single cluster.

> Do you use cvmfs at all for your software?
I usually prefer to compile from source into a custom environment. I can also access the CVMFS repositories if I need to.

Do you think I need to move my workflow to Singularity? If so, I'll need to customise the image and install the necessary dependencies, including my package, before I can run my script. Do you have any guidance on that?

Best wishes
Giuseppe


On 28/06/2022 22:39, Daniela Bauer wrote:
Hi Giuseppe,

If you look a bit further down the page, the DIRAC UI also comes in python3, though it's less well tested, and definitely not on Ubuntu, but I guess someone has to go first.
It might not work, though, and you might have to run it in a container; Rob should be able to help if that's a problem (cernvm should do).
The yearly major GridPP DIRAC upgrade is scheduled for July 25th; after that it should be all python3, with just a python2 version of the UI being maintained for a while to allow users to upgrade at their own pace.
We usually recommend the python API for DIRAC if users a) seem to be familiar with python and b) might have to do some heavy lifting.

Which leads me to the next question: Grid worker nodes are typically SL7 nodes. Most people who need something more exciting typically run in a container using singularity.
How do you usually run your code? Is this in any way shape or form related to the stuff Mark Holliman does? Do you use cvmfs at all for your software?

Regards,
Daniela

On Tue, 28 Jun 2022 at 21:44, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:

Hello Daniela,

Thank you for your email. That is correct, Rob has helped me with the first steps. Yes, I have reviewed the earlier thread, and it was likely me Rob was talking about. I am trying to run a big cosmic shear simulation for Euclid, something like ~3,000,000 jobs, and I hope gridPP will help me achieve that challenging goal! The vast majority of the jobs complete in a few hours; some (hard to predict) can take a few days, but a 7-day walltime should be okay.

Thanks for suggesting the guide. Initially I thought I was in a fresh terminal, but hit an issue with the paths. Afterwards I managed to submit and run a job:

$ dirac-wms-job-status -f logfile
JobID=34498902 Status=Done; MinorStatus=Execution Complete; Site=LCG.RAL-LCG2.uk;
So good progress!

I noticed that dirac_ui uses Python 2, so as a quick-and-dirty fix I symlinked my system Python 3 into the local/bin/python directory (I am on Ubuntu 20.04). Unfortunately, my submission script and all my code are in Python 3. Any ideas there?

Regarding the submission, do you recommend trying the diracos Python API again? I think the issues might have been the missing "-S GridPP -C dips://dirac01.grid.hep.ph.ic.ac.uk:9135/Configuration/Server" and "-g [your_vo_goes_here]_user". Using the diracos API might be easier, as it supports Python 3 and I can install all my dependencies/code very easily.

Thanks again for your help

Giuseppe


On 28/06/2022 20:38, Daniela Bauer wrote:
Hi Giuseppe,

I gather you are the gridpp user Rob Currie alluded to in an earlier email?
If so, welcome aboard.
DIRAC installations differ slightly from each other; you can find any settings specific to the GridPP DIRAC instance here:
Would you mind trying this out?
If you have an SL7 machine with cvmfs mounted to hand, I would recommend using the cvmfs-based DIRAC UI (in a clean window!!); it's much quicker to test.
We have no access to the ganga setup, which makes debugging a bit difficult, so we generally don't recommend this to beginners.
I checked on our DIRAC server and it looks like you are properly registered, so there shouldn't be a problem there.

As for setting up any software inside your job:
This depends on the size of the executable (not everything can be shipped with a sandbox) and other parameters. 
We have no control over where sites run their jobs, so any setup you write has to be relative to $PWD; there is no such thing as a home directory.

Once you've managed to submit a job or three - you might want to send a script with it that dumps the environment, so you get a feel for what a grid environment looks like - we'll get onto the next step, i.e. how to get your software where it's meant to go.
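
As a very rough sketch of what such a test job could look like with the python API (the job name and log file names are just placeholders):

from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName('env-dump-test')
# Dump the environment and the working directory the job lands in.
job.setExecutable('/usr/bin/env', logFile='env.log')
job.setExecutable('/bin/pwd', logFile='pwd.log')
print(Dirac().submitJob(job))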

Hope that helps,
Daniela


On Tue, 28 Jun 2022 at 19:37, Giuseppe Congedo <giuseppe.congedo@ed.ac.uk> wrote:

Hello everyone,

I am relatively new to gridPP, having obtained a certificate/VO membership only recently. I am writing because I have started experimenting with the Python DIRAC API (I followed the official instructions at https://dirac.readthedocs.io/en/latest/UserGuide/GettingStarted/InstallingClient/index.html).

Apart from one lucky job (which I managed to submit via /cvmfs/ganga.cern.ch/dirac_ui/bashrc, as opposed to the manual procedure above, but which failed, probably due to me deleting the proxy), unfortunately all my other attempts have failed so far. This is the error I always get:

> Job submission failure Cannot get URL for WorkloadManagement/JobManager in setup DIRAC-Certification: RuntimeError('Option /DIRAC/Setups/DIRAC-Certification/WorkloadManagement is not defined')
I wonder if I am doing something wrong somewhere.

Unrelated to the above, I was also wondering what environment the code will see when it arrives at the remote node. Can I sandbox my Python package, cd into the directory and install it along with all its dependencies, all in one job step? For instance:

> job.setExecutable('cd my_package_dir && pip install -r requirements.txt --user && python setup.py install --user')
> job.setExecutable('my_package_dir/my_package/my_script.py')
> job.setInputSandbox([my_package_dir, other_files])
Also, are directories relative to the home directory?

Many thanks for all your help

Giuseppe


--
Dr Giuseppe Congedo
(Senior Researcher)
Institute for Astronomy, University of Edinburgh
Royal Observatory, Blackford Hill
Edinburgh, EH9 3HJ

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.

--
_______________________________________________
Gridpp-Dirac-Users mailing list
Gridpp-Dirac-Users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/gridpp-dirac-users


--

-----------------------------------------------------------
daniela.bauer@imperial.ac.uk
HEP Group/Physics Dep
Imperial College
London, SW7 2BW
Tel: Working from home, please use email.
http://www.hep.ph.ic.ac.uk/~dbauer/
