Problems running on more than one node (ARCHER)
I have a script which runs fine on 1 node (24 cores) but fails on 2 or more. It seems to die somewhere during the extruded mesh generation, with messages such as
OSError: /tmp/pyop2-cache-uid15427/9d8ebf65631aa626aeb3cd8098e2662f.so: cannot open shared object file: No such file or directory
Full output and error files attached.
Uh, is /tmp globally visible on ARCHER, or is it local to each node? And is it only rank 0 that writes the generated kernel to file? If /tmp is node-local and only rank 0 writes the kernel, then it's easy to see why it fails: ranks on the other nodes would try to load a .so that exists only in the /tmp of rank 0's node.
On 15 Feb 2015, at 22:42, Andrew McRae <a.mcrae12@imperial.ac.uk> wrote:
I have a script which runs fine on 1 node (24 cores) but fails on 2 or more.
It seems to die somewhere during the extruded mesh generation, with messages such as
OSError: /tmp/pyop2-cache-uid15427/9d8ebf65631aa626aeb3cd8098e2662f.so: cannot open shared object file: No such file or directory
Full output and error files attached.
<2node_out.txt><1node_err.txt><1node_out.txt><2node_err.txt>
_______________________________________________
firedrake mailing list
firedrake@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/firedrake
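To make the suspected failure mode concrete, here is a small stand-alone simulation. It is only a sketch of the hypothesis above, not of what PyOP2 actually does: it assumes that rank 0 alone writes the compiled kernel, and it models each node's /tmp as a separate directory. The names simulate, node_tmp and kernel.so are made up for illustration.

```python
import os
import tempfile


def simulate(n_nodes=2, ranks_per_node=2, shared=False):
    """Return, for each rank, whether it can see the file written by rank 0.

    Each "node" gets its own private directory standing in for a node-local
    /tmp. With shared=True every rank uses one common directory instead,
    standing in for a globally visible filesystem.
    """
    with tempfile.TemporaryDirectory() as root:
        node_tmp = [os.path.join(root, "node%d" % n) for n in range(n_nodes)]
        for d in node_tmp:
            os.makedirs(d)
        shared_dir = os.path.join(root, "shared")
        os.makedirs(shared_dir)

        def cache_dir(rank):
            # Ranks are packed onto nodes in blocks of ranks_per_node.
            return shared_dir if shared else node_tmp[rank // ranks_per_node]

        # Assumption: only rank 0 writes the generated kernel.
        name = "kernel.so"
        with open(os.path.join(cache_dir(0), name), "w") as f:
            f.write("stub")

        n_ranks = n_nodes * ranks_per_node
        return [os.path.exists(os.path.join(cache_dir(r), name))
                for r in range(n_ranks)]


if __name__ == "__main__":
    print("1 node, local cache: ", simulate(n_nodes=1))
    print("2 nodes, local cache:", simulate(n_nodes=2))
    print("2 nodes, shared cache:", simulate(n_nodes=2, shared=True))
```

With a single node every rank finds the file. With two nodes and a node-local cache, the ranks on the second node do not, which would match the "cannot open shared object file" error if the hypothesis holds.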
Don't use /tmp on ARCHER! It's mounted to ramfs, so you *really* shouldn't write anything there. See also http://archer.ac.uk/documentation/user-guide/resource_management.php#sec-3.1...
On 2/15/2015 11:01 PM, Miklos Homolya wrote:
Uh, is /tmp globally visible on ARCHER, or is it local to each node? And is it only rank 0 that writes the generated kernel to file?
If /tmp is node-local and only rank 0 writes the kernel, then it's easy to see why it fails.
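One possible way to follow the advice above is to point the kernel cache at a directory on a filesystem that every compute node can see, before anything gets compiled. The sketch below rests on assumptions that are not confirmed in this thread: that PyOP2 picks up its cache location from an environment variable named PYOP2_CACHE_DIR (check this against your installed version), and that you have a shared directory to use. The helpers is_under and configure_cache and the path in the usage note are hypothetical.

```python
import os


def is_under(path, root):
    """True if path is root itself or lies somewhere below it."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    return path == root or path.startswith(root.rstrip(os.sep) + os.sep)


def configure_cache(base, environ=os.environ):
    """Create a per-user cache directory under base and export its location.

    Assumption: the kernel cache location is read from PYOP2_CACHE_DIR.
    Pass a plain dict as environ to try this without touching the real
    environment.
    """
    path = os.path.join(base, "pyop2-cache-uid%d" % os.getuid())
    os.makedirs(path, exist_ok=True)
    environ["PYOP2_CACHE_DIR"] = path
    return path
```

Usage would look something like the following, run at the top of the script before firedrake is imported (the /work/... path is a placeholder, not a real ARCHER location):

```python
base = "/work/your-project/your-user"   # hypothetical shared directory
if is_under(base, "/tmp"):
    raise SystemExit("cache base is under /tmp; pick a shared filesystem")
configure_cache(base)
```

Setting the variable in the job submission script instead would have the same effect and avoids editing the Python script.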
participants (3)
- Andrew McRae
- Florian Rathgeber
- Miklos Homolya