Re: [Gridpp-Dirac-Users] Dirac register file in file catalogue
Hi Priyaa,

I've taken the liberty of cc'ing the GridPP DIRAC users list, as the issues you raise might be interesting for other people on this list.

[...] University of Manchester. We have 1.3 TB data in our highmem machine to be registered in the file catalogue. Have seen few of the links below to do that. [...]

There are a number of conditions that need to be met for data to be integrated into the DIRAC file catalogue. I'll try to summarize them below:

a) The data must be present on a storage element. Data access must be possible via 'srm' or 'xrootd'.

b) The storage element must be present in the DIRAC configuration. DIRAC receives the information about available storage elements and their access methods from a BDII, i.e. we expect the storage element to advertise itself in a prescribed manner. There are some special cases where this is not possible; in these cases we can look into setting it up by hand. Any member of a supported VO can double-check the configuration for all SEs on the GridPP DIRAC via the web interface.

c) The DIRAC file catalogue requires you to adhere to certain conventions in the file path. These have mostly grown out of the need to maintain shared facilities (shared between experiments rather than individual users). To locate a file, DIRAC internally combines the name of the storage element + the location of your experiment's top-level directory + the LFN to work out the actual location of the file. The LFN must start with /full_vo_name.
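To make condition c) concrete, here is a minimal sketch in plain Python of how the pieces fit together. It is not DIRAC code: the function name physical_path, the base directory /example/base/myvo and the VO name my.vo.org are all hypothetical placeholders, and DIRAC's own resolution goes through its configured storage element settings rather than a simple string join.

```python
import posixpath


def physical_path(se_base_path, lfn, vo_name):
    """Join an SE's top-level VO directory with an LFN (illustrative only).

    Raises ValueError if the LFN does not start with /<vo_name>/.
    """
    if not lfn.startswith("/" + vo_name + "/"):
        raise ValueError("LFN must start with /%s/: %r" % (vo_name, lfn))
    # Strip slashes at the join so exactly one separates the two parts.
    return posixpath.join(se_base_path.rstrip("/"), lfn.lstrip("/"))


if __name__ == "__main__":
    # Hypothetical values, not taken from any real DIRAC configuration.
    print(physical_path("/example/base/myvo/",
                        "/my.vo.org/user/a.user/file.txt",
                        "my.vo.org"))
```

The point of the prefix check is that a file whose path does not contain /full_vo_name directly below the experiment's top-level directory has no valid LFN and would have to be moved before it can be registered.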
Looking at a SKA file that is located on my local storage element (gfe02.grid.hep.ph.ic.ac.uk):

/pnfs/hep.ph.ic.ac.uk/data/ska/skatelescope.eu/user/daniela.bauer/repregtest.1527683178.txt

I have:

a) data on gfe02 is accessible via srm and xrootd
b) gfe02 exists in DIRAC as UKI-LT2-IC-HEP-disk
c) the top-level directory for SKA is /pnfs/hep.ph.ic.ac.uk/data/ska/ and is listed as such in the DIRAC configuration, and the LFN ("/skatelescope.eu/user/daniela.bauer/repregtest.1527683178.txt") starts with the full /voname.

(Note that storage elements using DPM as their software have a tendency to use the full VO name as the top-level directory, so you often end up with the full VO name twice in the full file path. This is correct and we enforce this fairly strictly, as every time we don't, it comes back to haunt us.)

DIRAC expects data to be uploaded using its own tools and considers registering existing files a task for experts [see https://github.com/DIRACGrid/DIRAC/issues/4548]. We've argued against this, but didn't win that particular argument (yet?).
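For files that already sit on a storage element, the same convention can be applied in reverse to work out what the LFN would have to be. The following is a rough sketch only: lfn_from_path is a hypothetical helper written for this illustration, not part of DIRAC, and it assumes you already know the top-level directory that the DIRAC configuration lists for your VO on that storage element.

```python
def lfn_from_path(full_path, se_base_path, vo_name):
    """Derive the LFN of a file from its full path on the storage element.

    Raises ValueError if the file is outside the VO's top-level directory
    or if the remaining path does not start with /<vo_name>/.
    """
    base = se_base_path.rstrip("/")
    if not full_path.startswith(base + "/"):
        raise ValueError("%r is not below %r" % (full_path, base))
    # Everything after the top-level directory is the candidate LFN.
    lfn = full_path[len(base):]
    if not lfn.startswith("/" + vo_name + "/"):
        raise ValueError("derived LFN %r does not start with /%s/" % (lfn, vo_name))
    return lfn


if __name__ == "__main__":
    # Values from the SKA example in this message.
    print(lfn_from_path(
        "/pnfs/hep.ph.ic.ac.uk/data/ska/skatelescope.eu/user/daniela.bauer/repregtest.1527683178.txt",
        "/pnfs/hep.ph.ic.ac.uk/data/ska/",
        "skatelescope.eu"))
```

On a DPM-style storage element, where the top-level directory itself ends in the VO name, the same function still returns an LFN that starts with the VO name, which matches the "VO name twice" layout described above.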
Provided your data meet the conditions outlined above (if you are not sure, please send some more details, and we'll try and work out where they should go, even if they would have to be moved), there are three ways of registering them:

1) Using the file catalogue CLI:

register : Register a record to the File Catalog
usage:
    register file <lfn> <pfn> <size> <SE> [<guid>] - register new file record in the catalog
    register replica <lfn> <pfn> <SE> - register new replica in the catalog

In the above example - assuming the file in question is the only copy of the file in existence and it is not already registered in the file catalogue:

register file /skatelescope.eu/user/daniela.bauer/repregtest.1527683178.txt srm://gfe02.grid.hep.ph.ic.ac.uk:8443/srm/managerv2?SFN=/pnfs/hep.ph.ic.ac.uk/data/ska/skatelescope.eu/user/daniela.bauer/repregtest.1527683178.txt 4153 UKI-LT2-IC-HEP-disk

Not very elegant, and no checksum, but doable for the odd file that escaped registration.

2) If only a small number of files are affected, we tend to suggest that users download the files (outside DIRAC), remove them from the storage element (outside DIRAC) and then re-upload them using dirac-dms-add-file. That way both the catalogue and the paths are handled correctly and automatically.

3) For large-scale registrations you need to dig into the DIRAC API and (bonus point) implement the level of error handling you are comfortable with. There is currently no script to do this for you, but below is a starting point.

from __future__ import print_function
import sys
import uuid
from DIRAC import gLogger, S_OK
from DIRAC.Core.Base import Script
from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Resources.Catalog.FileCatalog import FileCatalog
[...]
fc = FileCatalog()
[...]
# for each file you need to assemble a dictionary containing all the details
infoDict = {}
infoDict['PFN'] = 'srm://gfe02.grid.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/ska/skatelescope.eu/user/daniela.bauer/repregtest.1527683178.txt'
infoDict['Size'] = 4153
infoDict['SE'] = 'UKI-LT2-IC-HEP-disk'
infoDict['GUID'] = str(uuid.uuid4())
infoDict['Checksum'] =
# the catalogue call takes a dictionary keyed by LFN
fileDict = {}
lfnpath = '/skatelescope.eu/user/daniela.bauer/repregtest.1527683178.txt'
fileDict[lfnpath] = infoDict
result = fc.addFile(fileDict)
# the call itself failed
if not result["OK"]:
    print(result)
    return
# the call succeeded, but registration failed for at least one LFN
if result["Value"]["Failed"]:
    print(result["Value"])
    return

Other people on this list might have additional ideas.

Regards,
Daniela

--
Sent from my guinea pig living room
-----------------------------------------------------------
daniela.bauer@imperial.ac.uk
HEP Group/Physics Dep
Imperial College London, SW7 2BW
Tel: Working from home, please use email.
http://www.hep.ph.ic.ac.uk/~dbauer/