Copying large simulation output files
Hi,
I'm having a slight issue with copying a directory containing my simulation output files (which are pretty large) from the vo.moedal.org directory using the command dirac-dms-directory-sync <https://dirac.readthedocs.io/en/latest/UserGuide/CommandReference/DataManagement/dirac-dms-directory-sync.html> which is turning out to be a tedious time taking task. Is there an alternate faster way to do this for larger files?
Currently, I'm using the following job submit file.
JobName = "LFNDATA";
Executable = "SHNAME";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"SHNAME", "CFGNAME", "GANGANAME"};
InputData = {"DATA"};
OutputPath = "/70G_4gD_outputs/";
OutputData = {"*.root"};
OutputSandbox = {"*.root"};
OutputSE = "UKI-LT2-QMUL2-disk";
(the provided output path stores my output in the directory /vo.moedal.org/user/a/aditya.upreti/..)
I wanted to know if there's a way to directly save the output of the GRID simulation to my local directory instead of having to copy it each time. I would like to save it to my lxplus or EOS directory such as /afs/cern.ch/user/a/aupreti/ or /eos/experiment/moedal/aditya/. I might be missing something trivial and would be glad if you can point me to it.
Thank you.
Best Regards
Aditya Upreti
Hi Aditya,
I'd like to take a step back here and ask why you need to copy large amounts of data to a local disk. One of the underlying assumptions of the grid approach is that you do all the heavy lifting on the grid and only copy the final and by then hopefully much reduced data to your local machine. So why is this not happening?
The other approach is that, as networking has come a long way in the past decade or so, you might be able to stream your data using xrootd directly from the grid storage element it's at to your local machine (let's say lxplus@cern). I gather that most of your data is at UKI-LT2-QMUL. This site currently has no xrootd interface for moedal, but the admins would be happy to make you one if needed.
Eos at CERN does have an xrootd interface, so I can look at one of your (public*) files at CERN from my machine in the UK without copying it:
lx02:~ > root -b
   ------------------------------------------------------------------
  | Welcome to ROOT 6.24/06                        https://root.cern |
  | (c) 1995-2021, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Sep 02 2021, 14:20:23                 |
  | From tags/v6-24-06@v6-24-06                                      |
  | With c++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)                 |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q'       |
   ------------------------------------------------------------------
root [0] TFile *_file0 = TFile::Open("root://eospublic.cern.ch:1094//eos/experiment/moedal/aditya/MonopoleData.root")
(TFile *) 0x341d2d0
root [1] .ls
TNetXNGFile**   root://eospublic.cern.ch//eos/experiment/moedal/aditya/MonopoleData.root
 TNetXNGFile*   root://eospublic.cern.ch//eos/experiment/moedal/aditya/MonopoleData.root
  KEY: TTree    MonopoleNtuple;20   Monopole Simulation Data [current cycle]
  KEY: TTree    MonopoleNtuple;19   Monopole Simulation Data [backup cycle]
* I'm not a moedal member, so I wouldn't expect to be able to see much.
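For anyone who prefers Python over the ROOT prompt, the same streaming idea can be sketched roughly as below. This is only an illustration under assumptions: the helper names xrootd_url and list_keys are made up for this example, and list_keys assumes a PyROOT installation built with xrootd support and a file you are allowed to read. The host and path are the ones from the session above.

```python
from typing import List, Optional


def xrootd_url(host: str, path: str, port: Optional[int] = None) -> str:
    """Build an xrootd URL of the form root://host[:port]//absolute/path.

    Hypothetical helper for illustration; it only does string assembly.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /eos/experiment/...")
    netloc = host if port is None else f"{host}:{port}"
    # The absolute path keeps its leading slash, which gives the usual
    # double slash after the host part.
    return f"root://{netloc}/{path}"


def list_keys(url: str) -> List[str]:
    """Open a remote file over xrootd and return the names of its keys.

    Assumes PyROOT is installed; the import is kept local so that the
    URL helper above can be used without ROOT.
    """
    import ROOT  # assumption: PyROOT with xrootd support is available

    remote_file = ROOT.TFile.Open(url)
    if not remote_file or remote_file.IsZombie():
        raise OSError(f"could not open {url}")
    names = [key.GetName() for key in remote_file.GetListOfKeys()]
    remote_file.Close()
    return names


if __name__ == "__main__":
    url = xrootd_url(
        "eospublic.cern.ch",
        "/eos/experiment/moedal/aditya/MonopoleData.root",
        port=1094,
    )
    print(url)
    # With PyROOT and read access, list_keys(url) would list the trees
    # without copying the file to local disk first.
```

The point of the design is that nothing is copied: the file stays on the storage element and only the parts that are actually read travel over the network.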
So if you need to share data with your collaborators and they all have CERN accounts (as otherwise they wouldn't be on lxplus), it should be trivial for them to get a certificate, join the moedal VO and look at the data that way. There will be some overhead in learning and documenting how xrootd works, but it is extensively used by WLCG VOs and others, so presumably it would work for moedal too.
As you might have noticed, your eos storage area has a grid interface. This means we might be able to commission it as a DIRAC storage element and you might be able to stage out your data to eos at the end of your grid jobs automatically. Writing back to your own home directory is not supported; by the time you've taken authentication, bandwidth, availability, security etc. into account, you basically end up with a grid storage element. (Imagine the hilarity if you submit thousands of jobs from your laptop and then the stageout fails, because you turned it off for the weekend.)
I had a quick look at your experiment's area on eos using the pilot proxy; it is quite locked down and DIRAC cannot write to it, but if you put Simon and me in touch with whoever manages this storage area we might be able to set up a grid storage folder on eos. NA62 used to have a DIRAC storage element on eos, so it's definitely possible.
Finally, given that eos has an xrootd interface you should theoretically be able to just xrdcp your data to eos at the end of each of your jobs (provided they have write access enabled). I am not giving any more details here on purpose as this is strictly a "last resort/if you have to ask you probably shouldn't be doing it" option.
Please let Simon and me know how you would like to proceed.
Regards,
Daniela
On Mon, 15 Nov 2021 at 22:54, Aditya Upreti <aupreti@crimson.ua.edu> wrote:
Hi,
I'm having a slight issue with copying a directory containing my simulation output files (which are pretty large) from the vo.moedal.org directory using the command dirac-dms-directory-sync <https://dirac.readthedocs.io/en/latest/UserGuide/CommandReference/DataManagement/dirac-dms-directory-sync.html> which is turning out to be a tedious time taking task. Is there an alternate faster way to do this for larger files?
Currently, I'm using the following job submit file.
JobName = "LFNDATA";
Executable = "SHNAME";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"SHNAME", "CFGNAME", "GANGANAME"};
InputData = {"DATA"};
OutputPath = "/70G_4gD_outputs/";
OutputData = {"*.root"};
OutputSandbox = {"*.root"};
OutputSE = "UKI-LT2-QMUL2-disk";
(the provided output path stores my output in the directory /vo.moedal.org/user/a/aditya.upreti/..)
I wanted to know if there's a way to directly save the output of the GRID simulation to my local directory instead of having to copy it each time. I would like to save it to my lxplus or EOS directory such as /afs/cern.ch/user/a/aupreti/ or /eos/experiment/moedal/aditya/. I might be missing something trivial and would be glad if you can point me to it.
Thank you.
Best Regards
*Aditya Upreti*
--
Gridpp-Dirac-Users mailing list
Gridpp-Dirac-Users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/gridpp-dirac-users
--
Sent from my guinea pig enhanced living room
-----------------------------------------------------------
daniela.bauer@imperial.ac.uk
HEP Group/Physics Dep
Imperial College London, SW7 2BW
Tel: Working from home, please use email.
http://www.hep.ph.ic.ac.uk/~dbauer/
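On the idea in the reply of commissioning eos as a DIRAC storage element: if that were set up, presumably only the OutputSE line of the submit file would need to change. The sketch below renders the job description from the original message twice, once as posted and once with a placeholder storage element. The name HYPOTHETICAL-EOS-disk is invented for illustration, no such storage element exists in this thread, and render_jdl is a made-up helper rather than part of DIRAC.

```python
def render_jdl(fields: dict) -> str:
    """Render a dict as JDL-style 'Key = value;' lines.

    Hypothetical helper: lists become {"a", "b"} and everything else
    becomes a quoted string, matching the submit file in the thread.
    """
    lines = []
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            rendered = "{" + ", ".join(f'"{item}"' for item in value) + "}"
        else:
            rendered = f'"{value}"'
        lines.append(f"{key} = {rendered};")
    return "\n".join(lines)


# Fields copied from the submit file in the original message.
job = {
    "JobName": "LFNDATA",
    "Executable": "SHNAME",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["SHNAME", "CFGNAME", "GANGANAME"],
    "InputData": ["DATA"],
    "OutputPath": "/70G_4gD_outputs/",
    "OutputData": ["*.root"],
    "OutputSandbox": ["*.root"],
    "OutputSE": "UKI-LT2-QMUL2-disk",
}

# Assumption: a placeholder name for an eos-backed storage element that
# would first have to be commissioned by the DIRAC and eos admins.
HYPOTHETICAL_EOS_SE = "HYPOTHETICAL-EOS-disk"
eos_job = dict(job, OutputSE=HYPOTHETICAL_EOS_SE)

if __name__ == "__main__":
    print(render_jdl(eos_job))
```

Everything else in the job stays the same, which is the attraction of this route compared with copying files by hand afterwards.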