Dear Fatih,

Would you be able to compile in debug mode, attach a debugger to one of the hanging instances, and send a backtrace? This would be helpful in diagnosing the problem.

Please also confirm the exact version of Nektar++ you are using.

Cheers,
Chris

On Fri, 25 Jan 2019 14:14:44 -0500, Fatih Ertinaz <fertinaz@gmail.com> wrote:
Hello
I deployed a small cluster on the cloud (not on IB). I set up the VMs, keys etc. and I use NFS for shared storage. This is a small cluster intended for 1-2 users, so presumably NFS should be fine. At least that's what I thought.
Currently I am testing the tutorial case "basics-advection-diffusion". It runs when executed in serial or parallel using 1 node (4 cores). However, when I use 2 nodes it hangs during:
Initial Conditions:
  - Field u: sin(k*x)*cos(k*y)
Writing: "ADR_mesh_aligned_0.chk" (0.0199919s, XML)
I see that "ADR_mesh_aligned_0" and its contents are written successfully; however, the next directory is not created and the solver remains idle.
mpirun -np 8 -mca btl_tcp_if_include eth0 -hostfile hosts \
    $NEKTAR_BIN/ADRSolver ADR_mesh_aligned.xml ADR_conditions.xml
I tested a simple hostname command and an MPI parallel file write; both worked fine on two nodes with mpirun.
Any suggestions would be highly appreciated. Thank you.
// Fatih
-- 
Chris Cantwell
Imperial College London
South Kensington Campus
London SW7 2AZ
Email: c.cantwell@imperial.ac.uk
www.imperial.ac.uk/people/c.cantwell
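
The backtrace requested at the top of this thread could be collected roughly as sketched below. This is only an illustration under stated assumptions, not a procedure confirmed anywhere in the thread: it assumes gdb and pgrep are installed on each compute node, that ADRSolver has been rebuilt with debug symbols (for a CMake build, typically by setting CMAKE_BUILD_TYPE=Debug), and that the hanging processes are named ADRSolver. The helper name backtrace_cmd is hypothetical. The script only prints the gdb commands, so they can be reviewed before being run on each node.

```shell
#!/bin/sh
# Sketch only. Assumptions: gdb and pgrep exist on this node, and ADRSolver
# was rebuilt with debug symbols so the backtrace shows function names.

# Hypothetical helper: print the gdb invocation that attaches to process $1,
# dumps a backtrace of every thread, and then detaches (-batch exits gdb).
backtrace_cmd() {
    printf 'gdb -p %s -batch -ex "thread apply all bt"\n' "$1"
}

# Print one command per ADRSolver process running on this node.
# pgrep -x matches the exact process name; if nothing matches, nothing is printed.
for pid in $(pgrep -x ADRSolver 2>/dev/null); do
    backtrace_cmd "$pid"
done
```

Running the script on both nodes while the solver is idle, executing the printed commands, and comparing the resulting backtraces across ranks should show which call each process is waiting in.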