Re: [Gridpp-Dirac-Users] Dirac file catalogue and pilot problems
Hi Alessandra,

About the database timeout: we see that occasionally in the logs, but a cursory survey seemed to indicate either network problems or something going wrong deep in the DIRAC core, which, given that we can't reproduce it, we haven't managed to debug. I'm going to forward your question to the developers to see if there's a quick hack, but I wouldn't get my hopes up; I'd stick with a retry loop.

About the gfal2 conflict: does this happen on VAC or on all sites? If it's VAC sites, you need to talk to Andrew. I have an email to him dated 12/12/2017 which contains the comment: "Hmm, it appears GFAL_PLUGIN_DIR & GFAL_CONFIG_DIR are being reset back to the CVMFS version (which doesn't happen on plain grid sites). Could there be something in profile.d or equivalent which is causing it to get reset across the sudo call or something like that?" This is part of a thread from November, when we did the major version upgrade in DIRAC, but we have no access to the VAC stuff to upgrade it in sync. If this happens on all DIRAC sites, then could you please send the bit of code that fails, so we can run some tests.

I've added the list in case other people have similar problems.

Thanks,
Daniela

On 16 January 2018 at 14:25, Alessandra Forti <Alessandra.Forti@cern.ch> wrote:
Hi,
I'm helping Rohini get the hang of running with DIRAC, and we have encountered a couple of problems.
1) When she uploads files to the storage using dirac-dms-add-file, it sometimes fails. The file gets copied, but the registration to the file catalogue fails with this timeout error:
2018-01-12 18:27:07 UTC dirac-dms-add-file/DataManager[Roxq] DEBUG: Failed to register file. /skatelescope.eu/user/r/rohini.joshi/GOODSN581359/L581367/L581367_SB023_uv.MS_0d357702.tar {'FileCatalog': 'Handshake timeout exceeded'}
I wonder if the timeout is configurable, or if it might be a problem with the number of connections or the load that could be looked at, as I don't think this is a client problem. If you want to see the full debug output we can put the files somewhere accessible.
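As a stop-gap for the intermittent registration timeout, the retry loop Daniela suggests above can be wrapped around the upload command. A minimal sketch follows (a hypothetical helper script, assuming dirac-dms-add-file is on the PATH from the dirac_ui or pilot setup):

# retry_add_file.py -- hypothetical helper, not part of DIRAC.
# Re-runs dirac-dms-add-file a few times when it fails, e.g. on the
# intermittent "Handshake timeout exceeded" during catalogue registration.
import subprocess
import sys
import time


def add_file_with_retry(lfn, local_path, se, attempts=3, wait=30):
    """Call dirac-dms-add-file, retrying on a non-zero exit code."""
    for i in range(attempts):
        rc = subprocess.call(["dirac-dms-add-file", lfn, local_path, se])
        if rc == 0:
            return True
        print("attempt %d/%d failed (rc=%d), retrying in %ds"
              % (i + 1, attempts, rc, wait))
        time.sleep(wait)
    return False


if __name__ == "__main__":
    lfn, local_path, se = sys.argv[1:4]
    if not add_file_with_retry(lfn, local_path, se):
        sys.exit(1)

Note that if only the catalogue registration failed, the data may already be on the storage element, so a smarter wrapper would check for that rather than simply re-uploading.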
2) Something in the DIRAC submission system sources the emi-wn setup. Since Rohini is using the dirac_ui tools, this creates the usual conflicts with the gfal libraries in LD_LIBRARY_PATH.
StorageFactory._generateStorageObject: Failed to instantiate UKI-NORTHGRID-MAN-HEP-disk: Unable to open the /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so plugin specified in the plugin directory, failure : /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so: undefined symbol: gfal2_free_uri
== EXCEPTION == GError
GError: Unable to open the /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so plugin specified in the plugin directory, failure : /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so: undefined symbol: gfal2_free_uri
It is not the site doing this, and for now we have solved the problem by unsetting a couple of GFAL variables. This has allowed us to use at least Manchester, but it still doesn't work on VAC sites like Cambridge; I have to dig further for the latter to check where the difference is. Do you know which part may source /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/etc/profile.d/setup-wn-example.sh and if there is any specific problem with VAC?
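For reference, the "couple of GFAL variables" presumably correspond to the GFAL_PLUGIN_DIR and GFAL_CONFIG_DIR settings quoted earlier in the thread. A sketch of that workaround (a hypothetical wrapper, not an official fix):

# upload_without_emi_gfal.py -- illustrative wrapper, not an official fix.
# Drops the gfal2 settings inherited from the CVMFS emi-wn setup so the
# gfal2 shipped with the DIRAC installation can load its own plugins,
# then re-runs the upload command in the cleaned environment.
import os
import subprocess
import sys

# The variables quoted elsewhere in this thread as being reset to the
# CVMFS version; unsetting them avoids the libgfal_plugin_gridftp.so /
# gfal2_free_uri mismatch shown above.
for var in ("GFAL_PLUGIN_DIR", "GFAL_CONFIG_DIR"):
    os.environ.pop(var, None)

sys.exit(subprocess.call(["dirac-dms-add-file"] + sys.argv[1:]))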
cheers alessandra
Hi Daniela,

On 17/01/2018 11:13, Daniela Bauer wrote:
> About the database timeout: we see that occasionally in the logs, but a cursory survey seemed to indicate either network problems or something going wrong deep in the DIRAC core, which, given that we can't reproduce it, we haven't managed to debug. I'm going to forward your question to the developers to see if there's a quick hack, but I wouldn't get my hopes up; I'd stick with a retry loop.

If there is a ticket for DIRAC, let me know and I'll add my voice. There is also that annoying thing where they add an extra directory to the TURL but the file catalogue doesn't report it: the file is copied to srm://...../skatelescope.eu/skatelescope.eu/........ but the file catalogue reports srm://...../skatelescope.eu/........; they could at least report the right thing.

> About the gfal2 conflict: does this happen on VAC or on all sites?

It happened also on normal Manchester nodes. Unsetting the GFAL env vars works on normal nodes, and now Rohini can at least run in Manchester, but it still doesn't work on VAC because things do indeed look to be reset (though I haven't yet looked deeper at what is going on). On normal nodes it is certainly not the /etc/profile.d scripts executing that setup. I don't know about VAC though; I spoke briefly to Andrew but he didn't give me an answer.
cheers alessandra
On 17 Jan 2018, at 11:26, Alessandra Forti <Alessandra.Forti@cern.ch> wrote:
> It happened also on normal Manchester nodes. Unsetting the GFAL env vars works on normal nodes, and now Rohini can at least run in Manchester, but it still doesn't work on VAC because things do indeed look to be reset (though I haven't yet looked deeper at what is going on). On normal nodes it is certainly not the /etc/profile.d scripts executing that setup. I don't know about VAC though; I spoke briefly to Andrew but he didn't give me an answer.
Andrew didn't understand why it is happening, since the /etc/profile.d in the VMs is correct (it's from UMD, distributed in cvmfs, not EMI). Now that I know it's happening on conventional grid sites too, for GridPP DIRAC but not LHCb DIRAC, it points to something higher up.

Cheers
Andrew
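One way to narrow down where the reset happens would be a tiny probe run as the job payload on both a plain grid node and a VAC VM; a hypothetical sketch (the script name and variable list are illustrative only):

# env_probe.py -- hypothetical diagnostic; the variable list is illustrative.
# Print the gfal-related settings the job payload actually inherits, to
# compare plain grid worker nodes with VAC virtual machines.
import os

for var in ("GFAL_PLUGIN_DIR", "GFAL_CONFIG_DIR",
            "LD_LIBRARY_PATH", "PYTHONPATH"):
    print("%s=%s" % (var, os.environ.get(var, "<unset>")))

Comparing its output across the two environments would show whether the VM profile, the pilot, or something in between rewrites the settings.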
Hi Alessandra,

Can you please let us know what Rohini is doing that interferes with gfal, because we haven't had anyone else report problems moving files. DIRAC doesn't do tickets, beyond git issues, at the moment, so it's on the main DIRAC mailing list. We have the (almost) latest and greatest version of DIRAC installed (we are on v6r19p10; p11 only came out a couple of days ago), so if there's something that needs to be moved from LHCb DIRAC to vanilla DIRAC, then Andrew needs to move it.

Regards,
Daniela
Hi Daniela,

She is sourcing the dirac_ui setup and trying to upload files that she downloads from somewhere that is not grid enabled. Are other people trying to use the dirac tools in their jobs to upload files?

cheers alessandra
Hi Alessandra,

I know LZ uses dirac-dms-add-file and that seems to work without a problem. But if you run inside a DIRAC job, you shouldn't have to source anything; the dirac tools are just there. (I just looked at your failed jobs and it says "source /cvmfs/ganga.cern.ch/dirac_ui/bashrc". This will kill your setup.)

What happens when she just makes a dirac UI (it now works on SL6 and SL7) and tries to add a file from there interactively?

If you would rather use gfal inside a dirac UI (because admittedly it's slightly sturdier than the dirac tools) and then register the files separately, we have a bit of code, written for the solid VO (so obviously experiment specific), which does this: https://github.com/ic-hep/DIRAC-tools/blob/master/solid/move_files_and_register.py (though this script was a starting point, so it doesn't catch all the errors).

Hope that helps.
Daniela
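The rough shape of that copy-then-register pattern, for orientation only. This is a heavily simplified sketch loosely modelled on the approach described above; the function name is hypothetical, and the exact FileCatalog call and metadata keys should be checked against the linked move_files_and_register.py and your DIRAC version:

# copy_and_register.py -- rough sketch of the "copy with gfal, register
# separately" pattern described above; not production code.
import os
import subprocess

from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC client configuration
from DIRAC.Resources.Catalog.FileCatalog import FileCatalog


def copy_and_register(local_path, lfn, dest_url, se_name, guid, checksum):
    # 1) physical copy with plain gfal (sturdier than the dirac tools)
    rc = subprocess.call(["gfal-copy",
                          "file://" + os.path.abspath(local_path),
                          dest_url])
    if rc != 0:
        return False
    # 2) register the replica in the DIRAC File Catalog; the metadata
    #    keys follow the usual DIRAC registration dict, but check them
    #    against the linked script for your DIRAC version.
    file_dict = {lfn: {"PFN": dest_url,
                       "Size": os.path.getsize(local_path),
                       "SE": se_name,
                       "GUID": guid,
                       "Checksum": checksum}}
    result = FileCatalog().addFile(file_dict)
    return bool(result.get("OK", False))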
Hi Daniela,

thanks, that helps. Now I think I confused job outputs: when I said emi-wn was sourced everywhere, it is in fact sourced only on VAC sites. So:

On normal grid sites like Manchester and Bristol there is no emi-wn setup, but the pilot sets up the dirac tools, and sourcing another dirac_ui is the wrong thing to do (I will tell Rohini not to do it); that's why it failed. If I remove that step ("source /cvmfs/ganga.cern.ch/dirac_ui/bashrc") I can use the dirac tools set up by the pilot:

Which dirac-dms-add-file? /scratch/9873546.ce03.tier2.hep.manchester.ac.uk/CREAM180749452/DIRAC_kNwYi6pilot/scripts/dirac-dms-add-file
Executing dirac-dms-add-file "/skatelescope.eu/user/a/alessandra.forti/pippo" "/scratch/7320711.ce01.tier2.hep.manchester.ac.uk/CREAM885168516/DIRAC_Z1bELSpilot/7429364/pippo" UKI-NORTHGRID-MAN-HEP-disk
Uploading /skatelescope.eu/user/a/alessandra.forti/pippo
Successfully uploaded file to UKI-NORTHGRID-MAN-HEP-disk

But I cannot use gfal, as it fails with an error. This is less important; I mention it only because you were suggesting it:

File "/scratch/7320711.ce01.tier2.hep.manchester.ac.uk/CREAM885168516/DIRAC_Z1bELSpilot/Linux_x86_64_glibc-2.12/bin/gfal-ls", line 24, in <module>
    from gfal2_util.shell import Gfal2Shell
ImportError: No module named gfal2_util.shell

On VAC the emi-wn setup is indeed sourced in /etc/profile.d:

grep emi-wn /etc/profile.d/*
/etc/profile.d/setup-wn-example.sh:base=/cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1

so it is part of the VM, and therefore the dirac setup doesn't work at all, whether or not we source an alternative dirac_ui:

Uploading /skatelescope.eu/user/a/alessandra.forti/pippo
StorageFactory._generateStorageObject: Failed to instantiate UKI-NORTHGRID-MAN-HEP-disk: Unable to open the /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so plugin specified in the plugin directory, failure : /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so: undefined symbol: gfal2_free_uri
Traceback (most recent call last):
  File "/scratch/plt/DIRAC/Resources/Storage/StorageFactory.py", line 416, in __generateStorageObject
    storage = storageClass(storageName, parameters)
  File "/scratch/plt/DIRAC/Resources/Storage/GFAL2_SRM2Storage.py", line 31, in __init__
    super( GFAL2_SRM2Storage, self ).__init__( storageName, parameters )
  File "/scratch/plt/DIRAC/Resources/Storage/GFAL2_StorageBase.py", line 74, in __init__
    self.ctx = gfal2.creat_context()
GError: Unable to open the /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so plugin specified in the plugin directory, failure : /cvmfs/grid.cern.ch/emi-wn-3.17.1-1.sl6umd4v1/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so: undefined symbol: gfal2_free_uri
StorageFactory.getStorages: Failed to instantiate any storage protocols. UKI-NORTHGRID-MAN-HEP-disk
Error: failed to upload /skatelescope.eu/user/a/alessandra.forti/pippo to UKI-NORTHGRID-MAN-HEP-disk

cheers alessandra
Hi Alessandra,

The gfal error is something we've reported to DIRAC; it's a bug which they clearly don't consider important, as you aren't supposed to use plain gfal (because they clearly know better what we want, sigh). What happens is that the python2.6/site-packages are missing from the PYTHONPATH in the default setup. If you wanted to fix it:

export PYTHONPATH=$PYTHONPATH:$DIRAC/Linux_x86_64_glibc-2.12/lib/python2.6/site-packages

should do the trick. I leave the VAC problem for Andrew to sort out; I don't think there's anything I can do from my side.

Regards,
Daniela
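The same workaround can be applied from inside a job wrapper before calling the gfal2_util-based tools; a small sketch (assuming, as in the export line above, that $DIRAC points at the pilot's DIRAC installation, and using a hypothetical script name):

# fix_gfal_util_path.py -- sketch of applying the PYTHONPATH fix above
# before calling the gfal2_util-based tools (gfal-ls and friends).
import os
import subprocess
import sys

dirac_root = os.environ.get("DIRAC", "")
site_pkgs = os.path.join(
    dirac_root, "Linux_x86_64_glibc-2.12/lib/python2.6/site-packages")
os.environ["PYTHONPATH"] = os.pathsep.join(
    [p for p in (os.environ.get("PYTHONPATH", ""), site_pkgs) if p])

# Quick check that the module from the ImportError above is now importable.
sys.exit(subprocess.call(
    ["python", "-c", "from gfal2_util.shell import Gfal2Shell"]))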
participants (3):
- Alessandra Forti
- Andrew McNab
- Daniela Bauer