Hello, I have deployed a small cluster in the cloud (no InfiniBand). I set up the VMs, keys, etc., and I use NFS for shared storage. The cluster is intended for only 1-2 users, so I assumed NFS would be fine. I am currently testing the tutorial case "basics-advection-diffusion". It runs correctly in serial and in parallel on a single node (4 cores). However, when I use 2 nodes, it hangs at:
Initial Conditions:
  - Field u: sin(k*x)*cos(k*y)
Writing: "ADR_mesh_aligned_0.chk" (0.0199919s, XML)
I can see that "ADR_mesh_aligned_0" and its contents were written successfully. However, the next directory is not created and the solver remains idle.
The command I run is:

mpirun -np 8 -mca btl_tcp_if_include eth0 -hostfile hosts \
    $NEKTAR_BIN/ADRSolver ADR_mesh_aligned.xml ADR_conditions.xml
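Since the hang happens exactly when the next output directory should appear, one further check might be to time how long a directory created on one node takes to become visible on the other through NFS. Below is a minimal sketch, assuming a POSIX shell on both nodes. The function name `wait_for_dir` and the probe path are hypothetical, and this is not a Nektar++ tool.

```shell
#!/bin/sh
# Hypothetical helper: wait_for_dir DIR TIMEOUT_SECONDS
# Polls once per second until DIR exists as a directory.
# On success it prints the number of seconds waited and returns 0.
# If DIR is still missing after TIMEOUT_SECONDS, it prints a message
# to stderr and returns 1.
wait_for_dir() {
    dir=$1
    timeout=$2
    elapsed=0
    while [ ! -d "$dir" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timeout: $dir not visible after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$elapsed"
    return 0
}
```

For example, create the directory on the first node with `mkdir /shared/nfs_probe` (here `/shared` is a placeholder for the NFS mount point). Then immediately run `wait_for_dir /shared/nfs_probe 60` on the second node. An output of `0` means the directory was visible right away. A delay of several seconds, or a timeout, would suggest that NFS client-side caching is involved rather than MPI itself.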
I tested a simple hostname command and an MPI parallel file write with mpirun, and both worked fine across the two nodes. Any suggestions would be highly appreciated.

Thank you,
Fatih