Hello Chris,

Thank you for your reply. 

I managed to resolve the problem by adding the "--mca btl tcp,self" flag to the mpirun command. Additionally, NFS now sits on top of a GPFS instance, which definitely helped achieve faster I/O.
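
For reference, the working invocation now looks roughly like this (hostfile, interface name and paths are specific to my setup, as in my earlier message):

  mpirun -np 8 --mca btl tcp,self --mca btl_tcp_if_include eth0 -hostfile hosts \
    $NEKTAR_BIN/ADRSolver ADR_mesh_aligned.xml ADR_conditions.xml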

Thanks for clarifying the parallel I/O approach as well. I guess using HDF5 would definitely help in this case.
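
If that is the case, I assume it would be enough to add the HDF5 option to the solver arguments under the same mpirun line as above, along these lines (I believe the option is --io-format Hdf5, but please correct me if the name differs):

  # option name is my assumption from the docs, please correct if wrong:
  $NEKTAR_BIN/ADRSolver --io-format Hdf5 ADR_mesh_aligned.xml ADR_conditions.xml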

// Fatih

On Fri, Feb 1, 2019 at 2:16 AM Chris Cantwell <c.cantwell@imperial.ac.uk> wrote:
Hi Fatih,

Unfortunately, this is not currently possible with Nektar++.

Each rank will either write its own file containing its portion of the domain at each checkpoint, or all ranks will write concurrently to a single HDF5 file.

Cheers,
Chris


On Mon, 28 Jan 2019 12:00:30 -0500, Fatih Ertinaz <fertinaz@gmail.com> wrote:
> Maybe I can ask another question about parallel I/O implementation.
>
> Is it possible to designate I/O ranks while running on multiple nodes?
> For instance, can I restrict file reads and writes to a single processor
> when running in an HPC environment?
>
> // Fatih
>
> On Fri, Jan 25, 2019 at 2:14 PM Fatih Ertinaz <fertinaz@gmail.com> wrote:
>
> > Hello
> >
> > I deployed a small cluster in the cloud (not on InfiniBand). I set up the
> > VMs, keys etc. and I use NFS for shared storage. This is a small cluster
> > intended for 1-2 users, so presumably NFS should be fine. At least that's
> > what I thought.
> >
> > Currently I am testing the tutorial case "basics-advection-diffusion". It
> > runs fine when executed in serial, or in parallel on one node (4 cores).
> > However, when I use 2 nodes it hangs at this point:
> > 
> >> Initial Conditions:
> >>   - Field u: sin(k*x)*cos(k*y)
> >> Writing: "ADR_mesh_aligned_0.chk" (0.0199919s, XML) 
> >
> >
> > I see that "ADR_mesh_aligned_0" and its contents are written successfully;
> > however, the next checkpoint directory cannot be created and the solver
> > remains idle.
> > 
> >> mpirun -np 8 -mca btl_tcp_if_include eth0 -hostfile hosts \
> >>   $NEKTAR_BIN/ADRSolver ADR_mesh_aligned.xml ADR_conditions.xml 
> >
> >
> > I tested a simple hostname command and an MPI parallel file write; both
> > worked fine on two nodes with mpirun.
> >
> > Any suggestions would be highly appreciated. Thank you.
> >
> > // Fatih
> > 


--
Chris Cantwell
Imperial College London
South Kensington Campus
London SW7 2AZ
Email: c.cantwell@imperial.ac.uk
www.imperial.ac.uk/people/c.cantwell