Hello
In the last few days I have noticed that a small fraction of my jobs
(probably ~1/1000, though some sites seem more susceptible than
others) have started to fail almost instantly, after <1s of CPU time,
with the error:
EPoll: Bad file descriptor polling for events
The only thing I have changed in my jobs since this started to happen is
that I now use the feature where you can specify LFN:/your/file in the
inputSandbox (previously I was just manually issuing a download command
inside the job).
To simplify the situation, I made a test job that has the LFN of a text
file in the inputSandbox and then just 'cat's out the content.
Repeating this job a few times at IN2P3 (where I had seen the failure
most frequently, though it has happened at other sites too), I managed
to reproduce the error.
e.g. DIRAC JOB ID: 29573283
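For concreteness, here is a rough sketch of the shape of that test job, written as a small Python helper that renders a DIRAC-style JDL. The helper name, job name, LFN and file name are placeholders for illustration, not the exact ones from my job, and I am assuming the LFN is staged into the job's working directory under its base name.

```python
def render_test_jdl(lfn, site=None):
    """Render a minimal DIRAC-style JDL for a job that cats one LFN-staged file.

    Hypothetical helper for illustration only; it builds text and does not
    talk to DIRAC.
    """
    # Assumption: the file is placed in the working directory under its base name.
    local_name = lfn.rsplit("/", 1)[-1]
    lines = [
        'JobName = "lfn_sandbox_test";',
        'Executable = "/bin/cat";',
        f'Arguments = "{local_name}";',
        # The "LFN:" prefix asks for the file to be fetched from grid storage.
        f'InputSandbox = {{"LFN:{lfn}"}};',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    if site is not None:
        # Optional: pin the job to one site to repeat the test there.
        lines.append(f'Site = "{site}";')
    return "\n".join(lines)


# Placeholder LFN, matching the LFN:/your/file form mentioned above.
print(render_test_jdl("/your/test.txt"))
```

The comparison jobs mentioned below would be the same thing with the InputSandbox line dropped and the download done inside the job instead.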
I ran some test jobs without the LFN in the inputSandbox and they all
ran fine (though this was a small sample, so I can't really conclude
anything from that).
So it seems likely that the failures are linked to my use of
inputSandbox to download files, even though it only rarely causes a
problem. Is this a known issue? Am I doing something wrong, or is it
fine to use inputSandbox in this way?
Cheers
Sophie