jobs failing with the error message: 'Transport endpoint is not connected'
Hello,

I am trying to submit jobs to the grid. My jobs are failing with the error message: "Transport endpoint is not connected".

Sometimes, when I resubmit the same job, it is processed perfectly, which confuses me even more.

I have attached the script that I used to submit the job, along with the std.out and Ganga_Executable.log for your perusal.

I would be grateful if you could help me debug this issue.

Thank you,
Arka

--
Arka Santra
Post Doctoral Researcher
Instituto de Fisica Corpuscular
Parque Cientifico
C/Catedratico Jose Beltran, 2
E-46980, Paterna
Espana
Hi Arka,

this looks like a site issue (cvmfs for LHCb not mounted):

/var/tmp/home_crce6_212562329/CREAM212562329/DIRAC_y1RMwGpilot/8844704/run_lhe_v49r8.sh: line 31: /cvmfs/lhcb.cern.ch/lib/LbLogin.sh: Transport endpoint is not connected
/var/tmp/home_crce6_212562329/CREAM212562329/DIRAC_y1RMwGpilot/8844704/run_lhe_v49r8.sh: line 35: SetupProject: command not found
/var/tmp/home_crce6_212562329/CREAM212562329/DIRAC_y1RMwGpilot/8844704/run_lhe_v49r8.sh: line 39: setenvProject.sh: command not found

The job attempted to run at QMUL (on cn538.htc.esc.qmul). Since, as you say, the failure is intermittent, there is probably a bad node somewhere at QMUL. I've cc'ed the QMUL admin so he can have a look.

Could you add a line to your script that logs the hostname your jobs tried to run on? That would help with the debugging; right now this information is only available to the DIRAC admins.

@Dan, could you have a look, please?

Regards,
Daniela

On 16 April 2018 at 19:19, Arka Santra <santra.arka@gmail.com> wrote:
--
Sent from the pit of despair
-----------------------------------------------------------
daniela.bauer@imperial.ac.uk
HEP Group/Physics Dep
Imperial College London, SW7 2BW
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/
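For reference, the hostname logging requested above could look roughly like the sketch below. It assumes the job wrapper is a bash script (as run_lhe_v49r8.sh appears to be); the function name log_worker_node is hypothetical and not part of DIRAC or Ganga. The output goes to stdout, so it should end up in the job's std.out.

```shell
#!/bin/bash
# Hypothetical helper (not part of DIRAC/Ganga): print basic facts about
# the worker node so that a failed job can be traced back to a machine.
log_worker_node() {
    echo "=== Worker node info ==="
    # Prefer the fully qualified name; fall back to the POSIX 'uname -n'
    # if 'hostname -f' is unavailable or fails on this node.
    echo "Hostname: $(hostname -f 2>/dev/null || uname -n)"
    echo "Date:     $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    echo "Kernel:   $(uname -sr)"
}

# Call this at the very top of the job script, before any LHCb setup,
# so the information is recorded even if the setup fails.
log_worker_node
```

Placing the call before the first line that touches /cvmfs means the hostname is recorded even when the environment setup fails.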
Hi Arka,

This is a problem with CVMFS on the worker node: the LHCb environment does not get set up, which in turn produces lots of follow-on errors. Do you know where these jobs ran? If so, you can submit a ticket yourself, or perhaps someone here can follow up.

Thanks!

mark

On 16/04/2018 19:19, santra.arka@gmail.com wrote:
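Since the follow-on errors all stem from the failed CVMFS access, one option is to check the mount before sourcing anything from it and to stop early with a clear message. The following is only a sketch under assumptions: check_cvmfs is a hypothetical helper name, and the LbLogin.sh path is the one shown in the error log earlier in this thread.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if the given setup script can be read,
# otherwise print a diagnostic (including the node name) and return 1.
check_cvmfs() {
    local setup_script="$1"
    local parent_dir
    parent_dir="$(dirname "$setup_script")"

    # Listing the parent directory gives an automounter the chance to
    # mount the repository; on a broken mount this listing fails.
    if ! ls "$parent_dir" >/dev/null 2>&1; then
        echo "ERROR: cannot list $parent_dir on $(uname -n)" >&2
        return 1
    fi

    if [ ! -r "$setup_script" ]; then
        echo "ERROR: $setup_script is not readable on $(uname -n)" >&2
        return 1
    fi
    return 0
}

# Intended use inside the job script (commented out here, because it
# only makes sense on a node that is supposed to have CVMFS):
#
#   LBLOGIN=/cvmfs/lhcb.cern.ch/lib/LbLogin.sh   # path from the error log
#   check_cvmfs "$LBLOGIN" || exit 1
#   source "$LBLOGIN"
```

Failing early this way does not fix the bad node, but the first error in the log then names the node and the missing path, rather than the later "command not found" lines.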
participants (3)
- Arka Santra
- Daniela Bauer
- Mark Slater