Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Douglas, thanks for the feedback. I was aware of --npz parallelization but was using a small number, not 1/2 or 1/4 of HomModesZ. Increasing npz really helped. I still have to try GlobalSysSoln.

Now I face a memory problem for another case: the simulation runs out of memory when starting from a checkpoint file. Here is some information about this case:
- The mesh is made of around 16000 quad elements with p=5, i.e., NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in the z direction.
- I'm trying to run this case on 60 compute nodes, each equipped with 24 processors and 105 GB of memory. In total that makes 1440 procs and 6300 GB of memory.
- Execution command: mpirun -np 1440 IncNavierStokesSolver --npz 360 config.xml

I was wondering whether the memory usage of the application is spread over the different cores during I/O, or whether only one core is used. If it is only one core, then I guess it crashes once it exceeds 105 GB. Would you have any suggestion/comment on this?

Thanks,
Asim

On 04/13/2016 12:12 AM, Serson, Douglas wrote:

Hi Asim,

Concerning your questions:

1- Are you using the command line argument --npz? This is very important for obtaining efficient parallel performance with the Fourier expansion, since it defines the number of partitions in the z-direction. If it is not set, only the xy plane will be partitioned and the parallelism will saturate quickly. I suggest initially setting npz to 1/2 or 1/4 of HomModesZ (note that nprocs must be a multiple of npz, since nprocs/npz is the number of partitions in the xy plane).

Also, depending on your particular case and the number of partitions you have in the xy plane, your simulation may benefit from using a direct solver for the linear systems. This can be activated by adding '-I GlobalSysSoln=XxtMultiLevelStaticCond' to the command line. This is usually more efficient for a small number of partitions, but considering the large size of your problem it might be worth trying it.

2- I am not sure what could be causing that. I suppose it would help if you could send the exact commands you are using to run FieldConvert.

Cheers,
Douglas
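For reference, a minimal launch sketch along the lines of Douglas's --npz and GlobalSysSoln suggestions above; the core counts mirror the case described at the top and everything else is a placeholder:

    #!/bin/bash
    # Sketch only: nprocs/npz is the number of xy-plane partitions,
    # so npz must divide nprocs exactly.
    nprocs=1440        # e.g. 60 nodes x 24 cores, as in the case above
    npz=360            # must divide nprocs (here 1440/360 = 4 xy-plane partitions)

    # Default (iterative) solver:
    mpirun -np $nprocs IncNavierStokesSolver --npz $npz config.xml

    # Optional: the direct solver Douglas mentions, via a SolverInfo override:
    # mpirun -np $nprocs IncNavierStokesSolver --npz $npz \
    #     -I GlobalSysSoln=XxtMultiLevelStaticCond config.xml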
From: Asim Onder <ceeao@nus.edu.sg>
Sent: 12 April 2016 06:42
To: Sherwin, Spencer J; Serson, Douglas
Cc: nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Dear Spencer, Douglas, Nektar-users,

I'm now involved in testing a local petascale supercomputer, and for a quite limited time I can use several thousand processors for my DNS study. My test case is oscillating flow over a rippled bed. I built a dense unstructured grid with p=6 quadrilateral elements in x-y and Fourier expansions in the z direction. In total I have circa half a billion DOFs per variable. I have a few questions about this relatively large case:

1. I noticed that scaling becomes inefficient beyond around 500 procs, i.e. the parallel efficiency drops below 80%. I was wondering if you would have any general suggestions for tuning the configuration for better scaling.
2. Postprocessing vorticity and the Q criterion is not working for this case. At the end of the execution FieldConvert writes some small files without the field data. What could be the reason for this?

Thank you in advance for your suggestions.

Cheers,
Asim

On 03/21/2016 04:16 AM, Sherwin, Spencer J wrote:

Hi Asim,

To follow up on Douglas' comment, we are trying to get more organised and sort out a developer's guide. We are also holding a user meeting in June. If you were able to make this, we could also try to have a session on getting you going on the development side of things.

Cheers,
Spencer.

On 17 Mar 2016, at 14:58, Serson, Douglas <d.serson14@imperial.ac.uk> wrote:

Hi Asim,

I am glad that your simulation is now working. About your questions:

1. We have some work done on a filter for calculating Reynolds stresses as the simulation progresses, but it is not ready yet, and it would not provide all the statistics you want. Since you already have a lot of chk files, I suppose the best way would indeed be using a script to process all of them with FieldConvert.
2. Yes, this has recently been included in FieldConvert, using the new 'meanmode' module.
3. I just checked that, and apparently this is caused by a bug when using this module without FFTW. This should be fixed soon, but as an alternative the module should work if you switch FFTW on (just add <I PROPERTY="USEFFT" VALUE="FFTW"/> to your session file, if the code was compiled with FFTW support).
4. I think there is some work towards a developer guide, but I don't know how advanced the progress on that is. I am sure Spencer will be able to provide you with more information.

Cheers,
Douglas
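A sketch of the kind of post-processing script mentioned in point 1 above, assuming checkpoint files named config_*.chk alongside a session file config.xml (the names and the core count are placeholders):

    #!/bin/bash
    # Sketch only: apply a FieldConvert module to every checkpoint file in turn.
    session=config.xml                       # placeholder session-file name
    for chk in config_*.chk; do
        out=vorticity_${chk#config_}         # e.g. config_10.chk -> vorticity_10.chk
        mpirun -np 24 FieldConvert -m vorticity "$session" "$chk" "$out"
    done

The same loop structure applies to other modules (e.g. for the statistics terms), with one output file per checkpoint that can then be combined in a second pass.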
From: Asim Onder <ceeao@nus.edu.sg>
Sent: 17 March 2016 09:10
To: Serson, Douglas; Sherwin, Spencer J
Cc: nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Spencer, Douglas,

Thanks to your suggestions I managed to reach the turbulent regime for the oscillatory channel flow. I have now completed the DNS study for one case and built up a large database of checkpoint (*.chk) files. I would like to calculate turbulent statistics from this database, especially second-order terms, e.g. Reynolds stresses and turbulent dissipation, and third-order terms, e.g. turbulent diffusion terms. However, I am a little confused about how I could achieve this. I would appreciate some hints on the following:

1. The only way I could think of to calculate turbulent statistics is to write a simple bash script that iterates over the chk files and applies various existing/extended FieldConvert operations to each of them. This would require additional storage for the intermediate steps, and would therefore be a bit cumbersome. Would there be any simpler way of doing this directly in Nektar++?
2. I have one homogeneous direction, for which I used Fourier expansions. I would like to apply spatial averaging over this homogeneous direction. Does Nektar++ already contain such functionality?
3. I want to use the 'wss' FieldConvert module to calculate the wall shear stress. However, it returns a segmentation fault. Any ideas why that could be?
4. I was wondering if there is any introductory document on basic programming in Nektar++. The user guide does not contain information about programming. It would be nice to have some additional information on top of the Doxygen documentation.

Thank you very much in advance for your feedback.

Cheers,
Asim

On 02/15/2016 11:59 PM, Serson, Douglas wrote:

Hi Asim,

As Spencer mentioned, SVV can help in stabilizing your solution. You can find information on how to set it up in the user guide (pages 92-93), but basically all you need to do is use:

<I PROPERTY="SpectralVanishingViscosity" VALUE="True"/>

You can also tune it by setting the parameters SVVCutoffRatio and SVVDiffCoeff, but I would suggest starting with the default parameters. Also, you can use the parameter IO_CFLSteps to output the CFL number. This way you can check whether the time step you are using is appropriate.

Cheers,
Douglas

From: Sherwin, Spencer J
Sent: 14 February 2016 19:46
To: ceeao
Cc: nektar-users; Serson, Douglas; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Asim,

Getting a flow through transition is very challenging, since there is a strong localisation of shear which can lead to aliasing issues, and these can then cause instabilities. Both Douglas and Dave have experienced this with recent simulations, so I am cc'ing them to make some suggestions. I would be inclined to use spectralhpdealiasing and SVV. Hopefully Douglas can send you an example of how to switch this on.

Cheers,
Spencer.

On 11 Feb 2016, at 10:32, ceeao <ceeao@nus.edu.sg> wrote:

Hi Spencer, Nektar-Users,

I followed the suggestion and coarsened the grid a bit. This way it worked impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick off the transition to get turbulence. If I add white noise, even of very low magnitude, the conjugate gradient solver blows up again. I also tried adding some sinusoidal perturbations to the boundary conditions, and again had trouble with CG. I don't really understand CG's extreme sensitivity to perturbations. Any suggestion is much appreciated. Thanks in advance.

Cheers,
Asim

On 02/08/2016 04:48 PM, Sherwin, Spencer J wrote:

Hi Asim,

How many parallel cores are you running on? Starting up these flows can sometimes be tricky, especially if you are immediately jumping to a high Reynolds number. Have you tried first starting the flow at a lower Reynolds number? Also, 100 x 200 is quite a few elements in the x-y plane; remember that the polynomial order adds more points on top of the mesh discretisation. I would perhaps recommend trying a smaller mesh first to see how that goes. Actually, I note there is a file called TurbChFl_3D1H.xml in the ~/Nektar/Solvers/IncNavierStokesSolver/Examples directory which might be worth looking at. I think this was a mesh used in Ale Bolis' thesis, which you can find under:
http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.p...

Cheers,
Spencer.

On 1 Feb 2016, at 07:01, ceeao <ceeao@nus.edu.sg> wrote:

Hi Spencer,

Thank you for the quick reply and suggestion. I did indeed switch to the 3D homo 1D case, and this time I have problems with divergence of the linear solvers. I refined the grid in the channel flow example to 100x200x64 in the x-y-z directions and left everything else the same. When I employ the default global system solver "IterativeStaticCond" with this setup, I get divergence: "Exceeded maximum number of iterations (5000)". I checked the initial fields and mesh in ParaView, and everything seems to be normal. I also tried the "LowEnergyBlock" preconditioner, but apparently this one is valid only in fully 3D cases. My knowledge of iterative solvers for hp-FEM is minimal, so I was wondering if you could suggest a robust option that at least converges. My concern is getting some rough estimates of the speed of Nektar++ for my oscillating channel flow problem. If the speed is promising, I will switch to Nektar++ from OpenFOAM, as OpenFOAM is low-order and not really suitable for DNS. Thanks again in advance.

Cheers,
Asim
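For experimenting with the linear-solver setup, the '-I' SolverInfo override quoted earlier in the thread can be driven from the command line; the sketch below assumes a "Preconditioner" SolverInfo key, which should be checked against the user guide for your version, and uses placeholder core counts:

    #!/bin/bash
    # Sketch only: switch the global linear-system solver (and, possibly, the
    # preconditioner) from the command line instead of editing the session file.
    session=TurbChFl_3D1H.xml      # placeholder; any session file works the same way

    # Direct multi-level static condensation (the override quoted earlier in the thread):
    mpirun -np 16 IncNavierStokesSolver -I GlobalSysSoln=XxtMultiLevelStaticCond "$session"

    # Iterative static condensation with an explicit preconditioner choice
    # ("Preconditioner" key and value are assumptions; check the user guide):
    # mpirun -np 16 IncNavierStokesSolver -I GlobalSysSoln=IterativeStaticCond \
    #     -I Preconditioner=LowEnergyBlock "$session"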
On 01/31/2016 11:53 PM, Sherwin, Spencer J wrote:

Hi Asim,

I think your conclusion is correct. We did some early implementation work on the 2D homogeneous expansion but have not pulled it all the way through, since we did not have a full project on this topic. We have, however, kept the existing code running through our regression tests. For now I would suggest you try the 3D homo 1D approach for your runs, since you can use parallelisation in that code.

Cheers,
Spencer.

On 29 Jan 2016, at 04:00, ceeao <ceeao@nus.edu.sg> wrote:

Dear all,

I just installed the library and need to simulate DNS of a channel flow with an oscillating pressure gradient. As I have two homogeneous directions, I applied the standard Fourier discretization in these directions. It seems that this case is not parallelized yet, and I got the error in the subject line. I was wondering if I'm overlooking something. If not, are there perhaps any plans to include parallelization of the 2D FFTs in the future? Thank you in advance.

Best,
Asim Onder
Research Fellow
National University of Singapore

_______________________________________________
Nektar-users mailing list
Nektar-users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/nektar-users

Spencer Sherwin
McLaren Racing/Royal Academy of Engineering Research Chair,
Professor of Computational Fluid Mechanics,
Department of Aeronautics, Imperial College London
South Kensington Campus, London SW7 2AZ
s.sherwin@imperial.ac.uk
+44 (0) 20 759 45052
From: Sherwin, Spencer J
Sent: 21 April 2016 19:34
To: Asim Onder
Cc: Serson, Douglas; nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Asim,

In fully 3D simulations we tend to pre-partition the mesh, and this can help with memory usage on a single core. To do this you can run the solver with the option --part-only=<number of partitions of the 2D planes>. Then, instead of running with file.xml, you give the solver the file_xml directory. However, I am not sure whether this is all working with the 2.5D code. Douglas, is this how you start any of your runs?

Cheers,
Spencer.
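A rough sketch of that pre-partitioning workflow, assuming the session file from earlier in the thread is called config.xml and that --part-only behaves as described above (the partition count is a placeholder):

    #!/bin/bash
    # Sketch only: pre-partition the 2D mesh once, then point the solver at the
    # generated config_xml directory instead of config.xml on subsequent runs.
    nparts=4        # placeholder: number of xy-plane partitions (nprocs/npz)
    IncNavierStokesSolver --part-only $nparts config.xml

    # Run from the pre-partitioned mesh:
    mpirun -np 1440 IncNavierStokesSolver --npz 360 config_xml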
On 04/22/2016 03:22 AM, Serson, Douglas wrote:

Hi Asim,

One thing I noticed about your setup is that HomModesZ / npz = 3. This should always be an even number, so you will need to change your parameters (for example, using npz = 180). I am surprised no error message with this information was displayed, but this will definitely make your simulation crash.

In terms of I/O, as Spencer said, you can pre-partition the mesh. However, I don't think this will make much difference, since your mesh is 2D and therefore does not use much memory anyway.

As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems.

Cheers,
Douglas
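A quick pre-submission check of both constraints mentioned so far (the values are taken from the case described at the top of the thread; the script itself is only a sketch):

    #!/bin/bash
    # Sketch only: check both constraints mentioned in this thread before submitting:
    #   (1) nprocs must be a multiple of npz;
    #   (2) HomModesZ / npz must be even.
    nprocs=1440
    hommodesz=1080
    npz=180          # Douglas's suggested value: 1080/180 = 6, which is even
    if (( nprocs % npz != 0 )); then
        echo "nprocs ($nprocs) is not a multiple of npz ($npz)" >&2; exit 1
    fi
    if (( (hommodesz / npz) % 2 != 0 )); then
        echo "HomModesZ/npz = $((hommodesz / npz)) is not even" >&2; exit 1
    fi
    mpirun -np $nprocs IncNavierStokesSolver --npz $npz config.xml

With npz = 180 and 1440 processes this gives 8 xy-plane partitions and HomModesZ/npz = 6, which satisfies both conditions.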
Hi Douglas, Spencer,

Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating the vorticity from a snapshot in a chk folder takes several hours if I use a command like this:

mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk

Changing the number of procs didn't help much. If I try to process individual domains one by one with something like this:

FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu

it still seems to take hours. Just for comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs, with an initialization time of around 5 minutes. I guess I'm doing something wrong. Would you have any suggestions on this? Thanks a lot in advance.

Cheers,
Asim
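If the per-partition form above is used, one way to at least run several partitions concurrently from the shell is sketched below; it assumes --nprocs/--procid behave as in the command above, and the per-partition output names and batch size are made up:

    #!/bin/bash
    # Sketch only: drive the per-partition FieldConvert calls a batch at a time
    # instead of one by one.
    nprocs=72        # total partitions, as in the --nprocs example above
    batch=12         # how many partitions to process concurrently (placeholder)
    for (( p = 0; p < nprocs; p++ )); do
        FieldConvert --nprocs $nprocs --procid $p -m vorticity \
            config.xml config_10.chk vorticity_10_p${p}.vtu &
        # throttle: wait after each batch of background jobs
        if (( (p + 1) % batch == 0 )); then wait; fi
    done
    wait

Whether this actually helps depends on where the time is being spent, so it is only worth trying as a diagnostic alongside the advice above.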
This is very important for obtaining an efficient parallel performance with the Fourier expansion, since it defines the number of partitions in the z-direction. If it is not set, only the xy plane will be partitioned and the parallelism will saturate quickly. I suggest initially setting npz to 1/2 or 1/4 of HomModesZ (note that nprocs must be a multiple of npz, since nprocs/npz is the number of partitions in the xy plane). Also, depending on your particular case and the number of partitions you have in the xy plane, your simulation may benefit from using a direct solver for the linear systems. This can be activated by adding '-I GlobalSysSoln=XxtMultiLevelStaticCond' to the command line. This is usually more efficient for a small number of partitions, but considering the large size of your problem it might be worth trying it. 2- I am not sure what could be causing that. I suppose it would help if you could send the exact commands you are using to run FieldConvert. Cheers, Douglas ________________________________ From: Asim Onder <mailto:ceeao@nus.edu.sg> <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 12 April 2016 06:42 To: Sherwin, Spencer J; Serson, Douglas Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Dear Spencer, Douglas, Nektar-users, I'm involved now in testing of a local petascale supercomputer, and for some quite limited time I can use several thousand processors for my DNS study. My test case is oscillating flow over a rippled bed. I build up a dense unstructured grid with p=6 quadrilateral elements in x-y, and Fourier expansions in z directions. In total I have circa half billion dofs per variable. I would have a few questions about this relatively large case: 1. I noticed that scaling gets inefficient after around 500 procs, let's say parallel efficiency goes below 80%. I was wondering if you would have any general suggestions to tune the configurations for a better scaling. 2. Postprocessing vorticity and Q criterion is not working for this case. At the of the execution Fieldconvert writes some small files without the field data. What could be the reason for this? Thanks you in advance for your suggestions. Cheers, Asim On 03/21/2016 04:16 AM, Sherwin, Spencer J wrote: Hi Asim, To follow-up on Douglas’ comment we are trying to get more organised to sort out a developers guide. We are also holding a user meeting in June. If you were able to make this we could also try and have a session on getting you going on the developmental side of things. Cheers, Spencer. On 17 Mar 2016, at 14:58, Serson, Douglas <<mailto:d.serson14@imperial.ac.uk>d.serson14@imperial.ac.uk<mailto:d.serson14@imperial.ac.uk>> wrote: Hi Asim, I am glad that your simulation is now working. About your questions: 1. We have some work done on a filter for calculating Reynolds stresses as the simulation progresses, but it is not ready yet, and it would not provide all the statistics you want. Since you already have a lot of chk files, I suppose the best way would indeed be using a script to process all of them with FieldConvert. 2. Yes, this has been recently included in FieldConvert, using the new 'meanmode' module. 3. I just checked that, and apparently this is caused by a bug when using this module without fftw. This should be fixed soon, but as an alternative this module should work if you switch fftw on (just add <I PROPERTY="USEFFT" VALUE="FFTW"/> to you session file, if the code was compiled with support to fftw). 4. 
I think there is some work towards a developer guide, but I don't how advanced is the progress on that. I am sure Spencer will be able to provide you with more information on that. Cheers, Douglas ________________________________________ From: Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> Sent: 17 March 2016 09:10 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Spencer, Douglas, Thanks to your suggestions I managed to get the turbulent regime for the oscillatory channel flow. I have now completed the DNS study for one case, and built up a large database with checkpoint (*chk) files. I would like to calculate turbulent statistics using this database, especially for second order terms, e.g. Reynolds stresses and turbulent dissipation, and third order terms, e.g. turbulent diffusion terms. However, I am a little bit confused how I could achieve this. I would appreciate if you could give some hints about the following: 1. The only way I could think of to calculate turbulent statistics is to write a simple bash script to iterate over chk files, and apply various existing/extended FieldConvert operations on individual chk files. This would require some additional storage to store the intermediate steps, and therefore would be a bit cumbersome. Would it be any simpler way directly doing this directly in Nektar++? 2. I have one homogeneous direction, for which I used Fourier expansions. I would like to apply spatial averaging over this homogeneous direction. Does Nektar++ already contain such functionality? 3. I want to use 'wss' in Fieldconvert module to calculate wall shear stress. However, it returns segmentation fault. Any ideas why it could be? 4. I was wondering if there is any introductory document for basic programming in Nektar++. User guide does not contain information about programming. It would be nice to have some additional information to Doxygen documentation. Thank you very much in advance for your feedback. Cheers, Asim On 02/15/2016 11:59 PM, Serson, Douglas wrote: Hi Asim, As Spencer mentioned, svv can help in stabilizing your solution. You can find information on how to set it up in the user guide (pages 92-93), but basically all you need to do is use: <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/> You can also tune it by setting the parameters SVVCutoffRatio and SVVDiffCoeff, but I would suggest starting with the default parameters. Also, you can use the parameter IO_CFLSteps to output the CFL number. This way you can check if the time step you are using is appropriate. Cheers, Douglas From: Sherwin, Spencer J Sent: 14 February 2016 19:46 To: ceeao Cc: nektar-users; Serson, Douglas; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Asim, Getting a flow through transition is very challenging since there is a strong localisation of shear and this can lead to aliasing issues which can then cause instabilities. Both Douglas and Dave have experienced this with recent simulations so I am cc’ing them to make some suggestions. I would be inclined to be using spectralhpdealiasing and svv. Hopefully Douglas can send you an example of how to switch this on. Cheers, Spencer. On 11 Feb 2016, at 10:32, ceeao<<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Spencer, Nektar-Users, I followed the suggestion and coarsened the grid a bit. 
On 11 Feb 2016, at 10:32, ceeao <ceeao@nus.edu.sg> wrote: Hi Spencer, Nektar-Users, I followed the suggestion and coarsened the grid a bit. This way it ran impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick off the transition to get turbulence. If I add white noise, even at very low magnitude, the conjugate gradient solver blows up again. I also tried adding some sinusoidal perturbations to the boundary conditions, and again had trouble with CG. I don't really understand CG's extreme sensitivity to perturbations. Any suggestion is much appreciated. Thanks in advance. Cheers, Asim

On 02/08/2016 04:48 PM, Sherwin, Spencer J wrote: Hi Asim, How many parallel cores are you running on? Starting up these flows can sometimes be tricky, especially if you immediately jump to a high Reynolds number. Have you tried first starting the flow at a lower Reynolds number? Also, 100 x 200 is quite a few elements in the x-y plane; remember that the polynomial order adds more points on top of the mesh discretisation. I would perhaps recommend trying a smaller mesh first to see how that goes. Actually, I note there is a file called TurbChFl_3D1H.xml in the ~/Nektar/Solvers/IncNavierStokesSolver/Examples directory which might be worth looking at. I think this was a mesh used in Ale Bolis’ thesis, which you can find under: http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.pdf Cheers, Spencer.

On 1 Feb 2016, at 07:01, ceeao <ceeao@nus.edu.sg> wrote: Hi Spencer, Thank you for the quick reply and suggestion. I did switch to the 3D homo 1D case, and this time I have problems with divergence of the linear solvers. I refined the grid in the channel flow example to 100x200x64 in the x-y-z directions and left everything else the same. When I employ the default global system solver "IterativeStaticCond" with this setup, I get divergence: "Exceeded maximum number of iterations (5000)". I checked the initial fields and the mesh in Paraview, and everything seems to be normal. I also tried the "LowEnergyBlock" preconditioner, but apparently this one is valid only for fully 3D cases. My knowledge of iterative solvers for hp-FEM is minimal, so I was wondering if you could suggest a robust option that at least converges. My aim is to get some rough estimates of the speed of Nektar++ for my oscillating channel flow problem. If the speed is promising, I will switch to Nektar++ from OpenFOAM, as OpenFOAM is low-order and not really suitable for DNS. Thanks again in advance. Cheers, Asim
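As an aside, the global linear-system solver can also be switched from the command line, which makes this kind of experiment cheap to try. The core counts below are purely illustrative; the XXT option is the one Douglas suggests in his 13 April message earlier in the thread.

    # Default behaviour: the iterative solver (IterativeStaticCond) is used.
    mpirun -np 64 IncNavierStokesSolver --npz 32 session.xml

    # Same run, but with the XXT-based direct solver suggested earlier in the thread:
    mpirun -np 64 IncNavierStokesSolver --npz 32 -I GlobalSysSoln=XxtMultiLevelStaticCond session.xml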
On 01/31/2016 11:53 PM, Sherwin, Spencer J wrote: Hi Asim, I think your conclusion is correct. We did some early implementation of the 2D homogeneous expansion but have not pulled it all the way through, since we did not have a full project on this topic. We have, however, kept the existing code running through our regression tests. For now I would suggest you try the 3D homo 1D approach for your runs, since you can use parallelisation in that code. Cheers, Spencer.

On 29 Jan 2016, at 04:00, ceeao <ceeao@nus.edu.sg> wrote: Dear all, I just installed the library and need to simulate DNS of a channel flow with an oscillating pressure gradient. As I have two homogeneous directions, I applied the standard Fourier discretization in those directions. It seems this case is not parallelized yet, and I got the error in the subject line. I was wondering if I am overlooking something. If not, are there any plans to include parallelization of the 2D FFTs in the future? Thank you in advance. Best, Asim Onder, Research Fellow, National University of Singapore

_______________________________________________
Nektar-users mailing list
Nektar-users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/nektar-users
Hi Asim, Douglas may have the most experience with this size of calculation. I have to admit it is a bit of a challenge currently. One suggestion is that you run FieldConvert with the -v option so we can see where it is spending most of the time. I have had problems in 3D with simply reading the xml file, and so we have done a bit of restructuring to help this; I do not know if this might still be a problem with the Homogeneous 1D code. If this is the case, then in the 3D code what we sometimes do is repartition the mesh using FieldConvert --part-only=16 config.xml out.fld This will produce a directory called config_xml with files called P0000000.xml, P0000001.xml, ... I then try to process one file at a time: FieldConvert config_xml/P0000000.xml config_10.chk out.vtu I wonder if this would help break up the work and hopefully speed up the processing? Cheers, Spencer.
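As a concrete sketch of that workflow (the partition count and file names simply follow the example above):

    # 1. Repartition the 2D mesh once; this writes config_xml/P0000000.xml, P0000001.xml, ...
    FieldConvert --part-only=16 config.xml out.fld

    # 2. Post-process the partitions one at a time, here computing vorticity for each piece.
    for f in config_xml/P*.xml; do
        FieldConvert -m vorticity "$f" config_10.chk "vor_$(basename "$f" .xml).vtu"
    done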
On 27 Apr 2016, at 08:36, Asim Onder <ceeao@nus.edu.sg> wrote: Hi Douglas, Spencer, Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating the vorticity from a snapshot in a chk folder takes several hours if I use a command like this: mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk Changing the number of procs didn't help much. If I try to process individual domains one by one with something like this: FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu it still seems to take hours. Just for comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs, with an initialization time of around 5 minutes. I guess I'm doing something wrong. Would you have any suggestions on this? Thanks a lot in advance. Cheers, Asim

On 04/22/2016 03:22 AM, Serson, Douglas wrote: Hi Asim, One thing I noticed about your setup is that HomModesZ / npz = 3. This should always be an even number, so you will need to change your parameters (for example using npz = 180). I am surprised no error message with this information was displayed, but this will definitely make your simulation crash. In terms of IO, as Spencer said, you can pre-partition the mesh. However, I don't think this will make much difference, since your mesh is 2D and therefore does not use much memory anyway. As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems. Cheers, Douglas

________________________________
From: Sherwin, Spencer J
Sent: 21 April 2016 19:34
To: Asim Onder
Cc: Serson, Douglas; nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Asim, In fully 3D simulations we tend to pre-partition the mesh, and this can help with memory usage on a single core. To do this you can run the solver with the option --part-only='no. of partitions of the 2D planes'. Then, instead of running with file.xml, you give the solver the file_xml directory. However, I am not sure whether this is all working with the 2.3D code. Douglas, is this how you start any of your runs? Cheers, Spencer.
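To make the two constraints concrete: HomModesZ/npz should give an even number of planes per partition (1080/360 = 3 is odd, while 1080/180 = 6 is fine), and the pre-partitioning would produce one piece per xy-partition, i.e. nprocs/npz pieces. A rough sketch along the lines Spencer describes, with the partition count illustrative and the directory name assumed to follow the FieldConvert example above:

    # Pre-partition the 2D mesh into nprocs/npz pieces (1440/180 = 8 in this illustration).
    IncNavierStokesSolver --part-only=8 config.xml

    # Run from the resulting config_xml directory; 1080/180 = 6 planes per rank, which is even.
    mpirun -np 1440 IncNavierStokesSolver --npz 180 config_xml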
Hi Spencer, I have partitioned my mesh into 48 pieces and applied FieldConvert -v as you suggested: FieldConvert -v -m vorticity config_xml/P0000000.xml config_10.chk vorPart_10.vtu The end of the output file looks like this:

    ...
    InputXml session reader CPU Time: 0.036654s
    InputXml mesh graph setup CPU Time: 0.0949287s
    InputXml setexpansion CPU Time: 77.2126s
    InputXml setexpansion CPU Time: 5.66e-07s
    Collection Implemenation for Quadrilateral ( 6 6 ) for ngeoms = 648
    BwdTrans: StdMat (0.000246074, 0.000233187, 6.70384e-05, 0.000117696)
    IProductWRTBase: StdMat (0.000299029, 0.000254921, 8.57054e-05, 0.000164536)
    IProductWRTDerivBase: StdMat (0.00147705, 0.000787602, 0.000234766, 0.000425167)
    PhysDeriv: SumFac (0.000471923, 0.000315652, 0.000244664, 0.000203107)
    InputXml set first exp CPU Time: 7453.92s
    InputXml CPU Time: 7531.26s
    Processing input fld file
    InputFld CPU Time: 211.413s
    ProcessVorticity: Calculating vorticity...
    OutputVtk: Writing file...
    Writing: "vorPart_12.vtu"
    Written file: vorPart_12.vtu
    Total CPU Time: 8059.78s

"InputXml set first exp" seems to be consuming most of the time. What would this correspond to? Thanks, Asim
Hi Asim, This is what I was afraid of. I do not know why your case is still taking so long. Can you send me the .xml file so I can have a look? Thanks, Spencer.

Spencer Sherwin
McLaren Racing/Royal Academy of Engineering Research Chair, Professor of Computational Fluid Mechanics, Department of Aeronautics, Imperial College London
South Kensington Campus, London SW7 2AZ
s.sherwin@imperial.ac.uk
+44 (0) 20 759 45052
I am surprised no error message with this information was displayed, but this will definitely make your simulation crash. In terms of IO, as Spencer said you can pre-partition the mesh. However, I don't think this will make much difference since your mesh is 2D, and therefore does not use much memory anyway. As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems. Cheers, Douglas ________________________________ From: Sherwin, Spencer J Sent: 21 April 2016 19:34 To: Asim Onder Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Asim, In fully 3D simulations we tend to pre-partition the mesh and this can help with memory usage on a single core. To do this you can run the solver with the option - - part-only=’no of partitions of 2D planes’ Then instead of running with a file.xml you give the solver file_xml directory. However I am not sure whether this is all working with the 2.3 D code. Douglas is this how you start any of your runs? Cheers, Spencer. On 20 Apr 2016, at 05:48, Asim Onder <ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Douglas, thanks for the feedback. I was aware of --npz parallelization but was using a small number, not 1/2 or 1/4 of HomModesZ. Increasing npz really helped. I still have to try GlobalSysSoln. Now I face a memory problem for another case. The simulation runs out of memory when starting from a checkpoint file. Here is a little bit information about this case: - Mesh is made of around 16000 quad elements with p=5, i.e., NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in z direction. - I'm trying to run this case on 60 computing nodes each equipped with 24 processors, and a memory of 105 gb. In total, it makes 1440 procs, and 6300gb memory. - Execution command: mpirun -np 1440 IncNavierStokesSolver --npz 360 config.xml I was wondering if the memory usage of the application is scaling on different cores during IO, or using only one core. If it is only one core, than if it exceeds 105gb, it crushes I guess. Would you have maybe any suggestion/comment on this? Thanks, Asim On 04/13/2016 12:12 AM, Serson, Douglas wrote: Hi Asim, Concerning your questions: 1- Are you using the command line argument --npz? This is very important for obtaining an efficient parallel performance with the Fourier expansion, since it defines the number of partitions in the z-direction. If it is not set, only the xy plane will be partitioned and the parallelism will saturate quickly. I suggest initially setting npz to 1/2 or 1/4 of HomModesZ (note that nprocs must be a multiple of npz, since nprocs/npz is the number of partitions in the xy plane). Also, depending on your particular case and the number of partitions you have in the xy plane, your simulation may benefit from using a direct solver for the linear systems. This can be activated by adding '-I GlobalSysSoln=XxtMultiLevelStaticCond' to the command line. This is usually more efficient for a small number of partitions, but considering the large size of your problem it might be worth trying it. 2- I am not sure what could be causing that. I suppose it would help if you could send the exact commands you are using to run FieldConvert. 
Cheers, Douglas ________________________________ From: Asim Onder <mailto:ceeao@nus.edu.sg> <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 12 April 2016 06:42 To: Sherwin, Spencer J; Serson, Douglas Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Dear Spencer, Douglas, Nektar-users, I'm involved now in testing of a local petascale supercomputer, and for some quite limited time I can use several thousand processors for my DNS study. My test case is oscillating flow over a rippled bed. I build up a dense unstructured grid with p=6 quadrilateral elements in x-y, and Fourier expansions in z directions. In total I have circa half billion dofs per variable. I would have a few questions about this relatively large case: 1. I noticed that scaling gets inefficient after around 500 procs, let's say parallel efficiency goes below 80%. I was wondering if you would have any general suggestions to tune the configurations for a better scaling. 2. Postprocessing vorticity and Q criterion is not working for this case. At the of the execution Fieldconvert writes some small files without the field data. What could be the reason for this? Thanks you in advance for your suggestions. Cheers, Asim On 03/21/2016 04:16 AM, Sherwin, Spencer J wrote: Hi Asim, To follow-up on Douglas’ comment we are trying to get more organised to sort out a developers guide. We are also holding a user meeting in June. If you were able to make this we could also try and have a session on getting you going on the developmental side of things. Cheers, Spencer. On 17 Mar 2016, at 14:58, Serson, Douglas <<mailto:d.serson14@imperial.ac.uk>d.serson14@imperial.ac.uk<mailto:d.serson14@imperial.ac.uk>> wrote: Hi Asim, I am glad that your simulation is now working. About your questions: 1. We have some work done on a filter for calculating Reynolds stresses as the simulation progresses, but it is not ready yet, and it would not provide all the statistics you want. Since you already have a lot of chk files, I suppose the best way would indeed be using a script to process all of them with FieldConvert. 2. Yes, this has been recently included in FieldConvert, using the new 'meanmode' module. 3. I just checked that, and apparently this is caused by a bug when using this module without fftw. This should be fixed soon, but as an alternative this module should work if you switch fftw on (just add <I PROPERTY="USEFFT" VALUE="FFTW"/> to you session file, if the code was compiled with support to fftw). 4. I think there is some work towards a developer guide, but I don't how advanced is the progress on that. I am sure Spencer will be able to provide you with more information on that. Cheers, Douglas ________________________________________ From: Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> Sent: 17 March 2016 09:10 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Spencer, Douglas, Thanks to your suggestions I managed to get the turbulent regime for the oscillatory channel flow. I have now completed the DNS study for one case, and built up a large database with checkpoint (*chk) files. I would like to calculate turbulent statistics using this database, especially for second order terms, e.g. Reynolds stresses and turbulent dissipation, and third order terms, e.g. turbulent diffusion terms. 
However, I am a little bit confused about how I could achieve this. I would appreciate it if you could give some hints about the following: 1. The only way I could think of to calculate turbulent statistics is to write a simple bash script that iterates over the chk files and applies various existing/extended FieldConvert operations to the individual chk files. This would require some additional storage for the intermediate steps, and would therefore be a bit cumbersome. Would there be any simpler way of doing this directly in Nektar++? 2. I have one homogeneous direction, for which I used Fourier expansions. I would like to apply spatial averaging over this homogeneous direction. Does Nektar++ already contain such functionality? 3. I want to use the 'wss' module in FieldConvert to calculate the wall shear stress. However, it returns a segmentation fault. Any ideas why that could be? 4. I was wondering if there is any introductory document for basic programming in Nektar++. The user guide does not contain information about programming, and it would be nice to have some additional information beyond the Doxygen documentation. Thank you very much in advance for your feedback. Cheers, Asim

On 02/15/2016 11:59 PM, Serson, Douglas wrote: Hi Asim, As Spencer mentioned, SVV can help in stabilizing your solution. You can find information on how to set it up in the user guide (pages 92-93), but basically all you need to do is use: <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/> You can also tune it by setting the parameters SVVCutoffRatio and SVVDiffCoeff, but I would suggest starting with the default parameters. Also, you can use the parameter IO_CFLSteps to output the CFL number. This way you can check if the time step you are using is appropriate. Cheers, Douglas

From: Sherwin, Spencer J Sent: 14 February 2016 19:46 To: ceeao Cc: nektar-users; Serson, Douglas; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Asim, Getting a flow through transition is very challenging, since there is a strong localisation of shear, and this can lead to aliasing issues which can then cause instabilities. Both Douglas and Dave have experienced this with recent simulations, so I am cc'ing them to make some suggestions. I would be inclined to use spectralhpdealiasing and SVV. Hopefully Douglas can send you an example of how to switch this on. Cheers, Spencer.

On 11 Feb 2016, at 10:32, ceeao <ceeao@nus.edu.sg> wrote: Hi Spencer, Nektar-Users, I followed the suggestion and coarsened the grid a bit. This way it worked impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick off the transition to get turbulence. If I add white noise, even of very low magnitude, the conjugate gradient solver blows up again. I also tried adding some sinusoidal perturbations to the boundary conditions, and again had trouble with CG. I don't really get CG's extreme sensitivity to perturbations. Any suggestion is much appreciated. Thanks in advance. Cheers, Asim

On 02/08/2016 04:48 PM, Sherwin, Spencer J wrote: Hi Asim, How many parallel cores are you running on? Sometimes starting up these flows can be tricky, especially if you are immediately jumping to a high Reynolds number. Have you tried first starting the flow at a lower Reynolds number? Also, 100 x 200 is quite a few elements in the x-y plane. Remember the polynomial order adds more points on top of the mesh discretisation.
I would perhaps recommend trying a smaller mesh to see how that goes first. Actually, I note there is a file called TurbChFl_3D1H.xml in the ~/Nektar/Solvers/IncNavierStokesSolver/Examples directory which might be worth looking at. I think this was a mesh used in Ale Bolis’ thesis, which you can find under: http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.pdf Cheers, Spencer.

On 1 Feb 2016, at 07:01, ceeao <ceeao@nus.edu.sg> wrote: Hi Spencer, Thank you for the quick reply and suggestion. I have indeed switched to the 3D homo 1D case, and this time I have problems with the divergence of the linear solvers. I refined the grid in the channel flow example to 100x200x64 in the x-y-z directions, and left everything else the same. When I employ the default global system solver "IterativeStaticCond" with this setup, I get divergence: "Exceeded maximum number of iterations (5000)". I checked the initial fields and mesh in Paraview, and everything seems to be normal. I also tried the "LowEnergyBlock" preconditioner, and apparently this one is valid only in fully 3D cases. My knowledge of iterative solvers for hp-FEM is minimal, so I was wondering if you could suggest a robust option that at least converges. My concern is getting some rough estimates of the speed of Nektar++ for my oscillating channel flow problem. If the speed is promising, I will switch to Nektar++ from OpenFOAM, as OpenFOAM is low-order and not really suitable for DNS. Thanks again in advance. Cheers, Asim

On 01/31/2016 11:53 PM, Sherwin, Spencer J wrote: Hi Asim, I think your conclusion is correct. We did some early implementation in the 2D homogeneous expansion, but have not pulled it all the way through, since we did not have a full project on this topic. We have, however, kept the existing code running through our regression tests. For now I would perhaps suggest you try the 3D homo 1D approach for your runs, since you can use parallelisation in that code. Cheers, Spencer.

On 29 Jan 2016, at 04:00, ceeao <ceeao@nus.edu.sg> wrote: Dear all, I just installed the library, and need to simulate DNS of a channel flow with an oscillating pressure gradient. As I have two homogeneous directions, I applied standard Fourier discretization in these directions. It seems like this case is not parallelized yet, and I got the error in the subject. I was wondering if I'm overlooking something. If not, are there maybe any plans to include parallelization of 2D FFTs in the future? Thank you in advance. Best, Asim Onder Research Fellow National University of Singapore
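As a concrete illustration of the script-based approach discussed in the exchange above (iterating FieldConvert over the checkpoint files), a minimal bash sketch could look like the following; the session file name, checkpoint range and module are only assumptions and would need to be adapted to the actual case:

# Minimal sketch: apply a FieldConvert module to a range of checkpoint files.
# Assumes a session file config.xml and checkpoints config_0.chk ... config_200.chk.
for i in $(seq 0 200); do
    FieldConvert -m vorticity config.xml config_${i}.chk vorticity_${i}.fld
done

The intermediate .fld files would then still need a separate averaging step, which is the extra storage cost Asim mentions.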
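A minimal sketch of the pre-partitioning workflow Spencer outlines above, using the figures from Asim's run command (1440 ranks with --npz 360, so 1440/360 = 4 partitions of the xy plane); whether this works with the homogeneous code is, as Spencer says himself, not certain:

# Sketch only: pre-partition the 2D planes, then restart the run from the
# resulting directory instead of the original .xml file.
IncNavierStokesSolver --part-only=4 config.xml        # writes the config_xml directory
mpirun -np 1440 IncNavierStokesSolver --npz 360 config_xml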
Hi Spencer, please find the requested .xml file in the attachment. Cheers, Asim

On 05/03/2016 10:15 PM, Sherwin, Spencer J wrote: Hi Asim, This is what I was afraid of. I do not know why your case is still taking so long. Can you send me the .xml file to have a look at? Thanks, Spencer.

On 3 May 2016, at 10:17, Asim Onder <ceeao@nus.edu.sg> wrote: Hi Spencer, I have partitioned my mesh into 48 pieces, and ran FieldConvert -v as you suggested: FieldConvert -v -m vorticity config_xml/P0000000.xml config_10.chk vorPart_10.vtu The end of the output file looks like this:
......
InputXml session reader CPU Time: 0.036654s
InputXml mesh graph setup CPU Time: 0.0949287s
InputXml setexpansion CPU Time: 77.2126s
InputXml setexpansion CPU Time: 5.66e-07s
Collection Implemenation for Quadrilateral ( 6 6 ) for ngeoms = 648
BwdTrans: StdMat (0.000246074, 0.000233187, 6.70384e-05, 0.000117696)
IProductWRTBase: StdMat (0.000299029, 0.000254921, 8.57054e-05, 0.000164536)
IProductWRTDerivBase: StdMat (0.00147705, 0.000787602, 0.000234766, 0.000425167)
PhysDeriv: SumFac (0.000471923, 0.000315652, 0.000244664, 0.000203107)
InputXml set first exp CPU Time: 7453.92s
InputXml CPU Time: 7531.26s
Processing input fld file
InputFld CPU Time: 211.413s
ProcessVorticity: Calculating vorticity...
OutputVtk: Writing file...
Writing: "vorPart_12.vtu"
Written file: vorPart_12.vtu
Total CPU Time: 8059.78s
"InputXml set first exp" seems to be consuming the most time. What would this correspond to? Thanks, Asim

On 05/02/2016 06:12 PM, Sherwin, Spencer J wrote: Hi Asim, Douglas may have the most experience with this size of calculation. I have to admit it is a bit of a challenge currently. One suggestion is that you run with the -v option on FieldConvert so we can see where it is taking most of the time. I have had problems in 3D with simply reading the xml file, and so we had done a bit of restructuring to help this. I do not know if this might still be a problem with the Homogeneous 1D code. If this is the case, then in the 3D code what we sometimes do is repartition the mesh using FieldConvert --part-only=16 config.xml out.fld This will produce a directory called config_xml with files called P0000000.xml, P0000001.xml, ... I then try and process one file at a time: ./FieldConvert config_xml/P0000000.xml config_10.chk out.vtu I wonder if this would help break up the work and hopefully speed up the processing? Cheers, Spencer.

On 27 Apr 2016, at 08:36, Asim Onder <ceeao@nus.edu.sg> wrote: Hi Douglas, Spencer, Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating vorticity from a snapshot in a chk folder takes several hours if I use a command like this: mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk Changing the #procs didn't help too much. If I try to process individual domains one by one with something like this: FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu it still seems to take hours. Just for a comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs, with an initialization time of around 5 mins. I guess I'm doing something wrong. Would you have any suggestions on this? Thanks a lot in advance. Cheers, Asim

On 04/22/2016 03:22 AM, Serson, Douglas wrote: Hi Asim, One thing I noticed about your setup is that HomModesZ / npz = 3.
This should always be an even number, so you will need to change your parameters (for example, using npz = 180).
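Spelling out the arithmetic behind this remark with the figures from Asim's case (HomModesZ = 1080 on 1440 ranks), as a small shell check with illustrative variable names:

HOMMODESZ=1080
NPROCS=1440
NPZ=180                                             # instead of 360, so the planes per rank come out even
echo "planes per z-rank:   $((HOMMODESZ / NPZ))"    # 1080 / 180 = 6 (even, as required)
echo "xy-plane partitions: $((NPROCS / NPZ))"       # 1440 / 180 = 8 (nprocs must be a multiple of npz)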
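The per-partition post-processing Spencer suggests above could be scripted roughly as follows; the partition count and checkpoint name follow his example, while the loop and output naming are only assumptions:

# Sketch only: pre-partition the session for FieldConvert, then process the
# partition files one at a time.
FieldConvert --part-only=16 config.xml out.fld      # writes config_xml/P0000000.xml, P0000001.xml, ...
for part in config_xml/P*.xml; do
    FieldConvert -m vorticity "$part" config_10.chk "vor_$(basename "$part" .xml).vtu"
done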
Hi Asim, Thank you for reporting this postprocessing issue. We did find an operation that was consuming an unreasonable amount of time. I have already fixed that, and eventually this fix will be available in the master branch. If you want to test it before then, it is in the branch fix/FC3DH1Defficiency. Cheers, Douglas
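For anyone wanting to try that fix before it is merged, checking out the branch would look roughly like this, assuming a standard git clone of the Nektar++ sources and an existing build directory (exact paths and build commands may differ):

# Sketch only: switch to the fix branch and rebuild FieldConvert.
cd nektar++
git fetch origin
git checkout fix/FC3DH1Defficiency
cd build && make FieldConvert       # or simply 'make' to rebuild everything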
Hi Douglas, I am also cc'ing Yan since I believe he was also having trouble post-processing the 3D1H cases. Thanks for your effort on this. Cheers, Spencer.
On 27 Apr 2016, at 08:36, Asim Onder <ceeao@nus.edu.sg> wrote: Hi Douglas, Spencer, Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating vorticity from a snapshot in a chk folder takes several hours if I use a command like this:
mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk
Changing the number of procs didn't help much. If I try to process individual domains one by one with something like this:
FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu
it still seems to take hours. Just for comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs, with an initialization time of around 5 minutes. I guess I'm doing something wrong. Would you have any suggestions on this? Thanks a lot in advance. Cheers, Asim

On 04/22/2016 03:22 AM, Serson, Douglas wrote: Hi Asim, One thing I noticed about your setup is that HomModesZ / npz = 3. This should always be an even number, so you will need to change your parameters (for example using npz = 180). I am surprised no error message with this information was displayed, but this will definitely make your simulation crash. In terms of IO, as Spencer said, you can pre-partition the mesh. However, I don't think this will make much difference, since your mesh is 2D and therefore does not use much memory anyway. As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems. Cheers, Douglas

From: Sherwin, Spencer J Sent: 21 April 2016 19:34 To: Asim Onder Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Asim, In fully 3D simulations we tend to pre-partition the mesh, and this can help with memory usage on a single core. To do this you can run the solver with the option --part-only='number of partitions of 2D planes'. Then, instead of running with a file.xml, you give the solver the file_xml directory. However, I am not sure whether this is all working with the 2.3 D code. Douglas, is this how you start any of your runs? Cheers, Spencer.
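Douglas's constraints on npz above can be checked before submitting a job. As a sketch with the numbers quoted in this thread (HomModesZ = 1080, 1440 ranks): nprocs must be a multiple of npz, and HomModesZ/npz should be even, so npz = 180 gives 1080/180 = 6 planes per z-partition and 1440/180 = 8 partitions in the xy plane. The job line would then look like:

mpirun -np 1440 IncNavierStokesSolver --npz 180 config.xml

The values are only illustrative; any npz satisfying both conditions works, whereas choices such as 360 or 120 (giving 3 or 9 planes per partition, which are odd) would hit exactly the problem Douglas describes.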
On 03/21/2016 04:16 AM, Sherwin, Spencer J wrote: [...] We are also holding a user meeting in June. If you were able to make this, we could also try to have a session on getting you going on the developmental side of things. Cheers, Spencer.

On 17 Mar 2016, at 14:58, Serson, Douglas <d.serson14@imperial.ac.uk> wrote: Hi Asim, I am glad that your simulation is now working. About your questions: 1. We have some work done on a filter for calculating Reynolds stresses as the simulation progresses, but it is not ready yet, and it would not provide all the statistics you want. Since you already have a lot of chk files, I suppose the best way would indeed be to use a script to process all of them with FieldConvert. 2. Yes, this has recently been included in FieldConvert, using the new 'meanmode' module. 3. I just checked that, and apparently this is caused by a bug when using this module without fftw. This should be fixed soon, but as an alternative this module should work if you switch fftw on (just add <I PROPERTY="USEFFT" VALUE="FFTW"/> to your session file, if the code was compiled with support for fftw). 4. I think there is some work towards a developer guide, but I don't know how advanced the progress on that is. I am sure Spencer will be able to provide you with more information on that. Cheers, Douglas
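Douglas's points 1 and 2 can be combined into a small driver script. This is only a sketch under assumed file names (fields/field_*.chk for the checkpoints, a meanFields/ output directory); the FieldConvert invocation follows the meanmode usage that appears later in this thread, and it presupposes FFTW is enabled as Douglas notes in point 3.

mkdir -p meanFields
for f in fields/field_*.chk; do
    n=$(basename "$f" .chk)                      # e.g. field_128
    mpirun -np 48 FieldConvert -v -m meanmode config.xml "$f" "meanFields/${n}_mean.fld"
done

Second- and third-order statistics would then be accumulated from the per-snapshot outputs in a separate pass, which is the extra intermediate storage Asim mentions below.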
From: Asim Onder <ceeao@nus.edu.sg> Sent: 17 March 2016 09:10 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Spencer, Douglas, Thanks to your suggestions I managed to reach the turbulent regime for the oscillatory channel flow. I have now completed the DNS study for one case and built up a large database of checkpoint (*.chk) files. I would like to calculate turbulent statistics using this database, especially second-order terms, e.g. Reynolds stresses and turbulent dissipation, and third-order terms, e.g. turbulent diffusion terms. However, I am a little confused about how I could achieve this. I would appreciate it if you could give some hints about the following: 1. The only way I could think of to calculate turbulent statistics is to write a simple bash script that iterates over the chk files and applies various existing/extended FieldConvert operations to individual chk files. This would require some additional storage for the intermediate steps and would therefore be a bit cumbersome. Would there be a simpler way of doing this directly in Nektar++? 2. I have one homogeneous direction, for which I used Fourier expansions. I would like to apply spatial averaging over this homogeneous direction. Does Nektar++ already contain such functionality? 3. I want to use the 'wss' module in FieldConvert to calculate wall shear stress. However, it returns a segmentation fault. Any ideas why that could be? 4. I was wondering if there is any introductory document for basic programming in Nektar++. The user guide does not contain information about programming. It would be nice to have some additional information alongside the Doxygen documentation. Thank you very much in advance for your feedback. Cheers, Asim

On 02/15/2016 11:59 PM, Serson, Douglas wrote: Hi Asim, As Spencer mentioned, SVV can help in stabilizing your solution. You can find information on how to set it up in the user guide (pages 92-93), but basically all you need to do is use: <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/> You can also tune it by setting the parameters SVVCutoffRatio and SVVDiffCoeff, but I would suggest starting with the default parameters. Also, you can use the parameter IO_CFLSteps to output the CFL number. This way you can check if the time step you are using is appropriate. Cheers, Douglas

From: Sherwin, Spencer J Sent: 14 February 2016 19:46 To: ceeao Cc: nektar-users; Serson, Douglas; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Asim, Getting a flow through transition is very challenging, since there is a strong localisation of shear which can lead to aliasing issues, which can in turn cause instabilities. Both Douglas and Dave have experienced this with recent simulations, so I am cc'ing them to make some suggestions. I would be inclined to use spectralhpdealiasing and SVV. Hopefully Douglas can send you an example of how to switch this on. Cheers, Spencer.
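As a consolidated sketch of Douglas's and Spencer's advice above, the stabilisation options could sit in the session file roughly as below. The SpectralVanishingViscosity, SVVCutoffRatio, SVVDiffCoeff, IO_CFLSteps and USEFFT names come from this thread; the exact spelling of the dealiasing property and the numerical values are assumptions to be checked against the user guide, not a verified configuration.

<SOLVERINFO>
    <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/>
    <!-- dealiasing switch Spencer refers to as "spectralhpdealiasing"; exact property name assumed -->
    <I PROPERTY="SPECTRALHPDEALIASING"       VALUE="True"/>
    <!-- use FFTW for the homogeneous direction, as also suggested for the meanmode module -->
    <I PROPERTY="USEFFT"                     VALUE="FFTW"/>
</SOLVERINFO>
<PARAMETERS>
    <P> SVVCutoffRatio = 0.75 </P>  <!-- illustrative value; the defaults are recommended above -->
    <P> SVVDiffCoeff   = 0.1  </P>  <!-- illustrative value -->
    <P> IO_CFLSteps    = 100  </P>  <!-- print the CFL number every 100 steps -->
</PARAMETERS>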
On 11 Feb 2016, at 10:32, ceeao <ceeao@nus.edu.sg> wrote: Hi Spencer, Nektar-Users, I followed the suggestion and coarsened the grid a bit. This way it worked impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick off the transition to obtain turbulence. If I add white noise, even at very low magnitude, the conjugate gradient solver blows up again. I also tried adding some sinusoidal perturbations to the boundary conditions, and again had trouble with CG. I don't really understand CG's extreme sensitivity to perturbations. Any suggestion is much appreciated. Thanks in advance. Cheers, Asim

On 02/08/2016 04:48 PM, Sherwin, Spencer J wrote: Hi Asim, How many parallel cores are you running on? Sometimes starting up these flows can be tricky, especially if you are immediately jumping to a high Reynolds number. Have you tried first starting the flow at a lower Reynolds number? Also, 100 x 200 is quite a few elements in the x-y plane. Remember the polynomial order adds more points on top of the mesh discretisation. I would perhaps recommend trying a smaller mesh to see how that goes first. Actually, I note there is a file called TurbChFl_3D1H.xml in the ~/Nektar/Solvers/IncNavierStokesSolver/Examples directory which might be worth looking at. I think this was a mesh used in Ale Bolis' thesis, which you can find under: http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.pdf Cheers, Spencer.

On 1 Feb 2016, at 07:01, ceeao <ceeao@nus.edu.sg> wrote: Hi Spencer, Thank you for the quick reply and suggestion. I have indeed switched to the 3D homo 1D case, and this time I have problems with the divergence of the linear solvers. I refined the grid in the channel flow example to 100x200x64 in the x-y-z directions and left everything else the same. When I employ the default global system solver "IterativeStaticCond" with this setup, I get divergence: "Exceeded maximum number of iterations (5000)". I checked the initial fields and mesh in Paraview, and everything seems to be normal. I also tried the "LowEnergyBlock" preconditioner, and apparently this one is valid only in fully 3D cases. My knowledge of iterative solvers for hp-FEM is minimal. Therefore, I was wondering if you could suggest a robust option that at least converges. My concern is getting some rough estimates of the speed of Nektar++ for my oscillating channel flow problem. If the speed is promising, I will switch to Nektar++ from OpenFOAM, as OpenFOAM is low-order and not really suitable for DNS. Thanks again in advance. Cheers, Asim

On 01/31/2016 11:53 PM, Sherwin, Spencer J wrote: Hi Asim, I think your conclusion is correct. We did some early implementation of the 2D Homogeneous expansion but have not pulled it all the way through, since we did not have a full project on this topic. We have however kept the existing code running through our regression tests. For now I would perhaps suggest you try the 3D homo 1D approach for your runs, since you can use parallelisation in that code. Cheers, Spencer.

On 29 Jan 2016, at 04:00, ceeao <ceeao@nus.edu.sg> wrote: Dear all, I just installed the library and need to run a DNS of a channel flow with an oscillating pressure gradient. As I have two homogeneous directions, I applied the standard Fourier discretization in these directions. It seems like this case is not parallelized yet, and I got the error in the subject. I was wondering if I'm overlooking something.
If not, are there maybe any plans in the future to include parallelization of 2D FFT's? Thank you in advance. Best, Asim Onder, Research Fellow, National University of Singapore
From: Asim Onder <ceeao@nus.edu.sg> Sent: 27 May 2016 13:30:20 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Douglas, I now have some trouble with postprocessing the shear stress on a curved wall. I would appreciate it if you could provide some suggestions whenever you are able to look at this. (Just to recall my case: a 3DH1D DNS of a channel flow with a wavy bottom; around 16000 quad elements with NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in the z direction.) There are three issues:
1. First, I tried to calculate the mean shear stress from a mean mode which is extracted using FieldConvert, e.g.:
mpirun -np 48 FieldConvert -v -m meanmode config.xml fields/field_128.chk meanFields/mean_128.chk
FieldConvert -v -m wss:bnd=1:addnormals=1 config.xml meanFields/mean_128.chk wssMean_128.chk
Then I converted the result to vtu and visualized it. Normals to the surface are correctly calculated. However, the shear stresses are zero, which is of course not true.
2. I also need to calculate shear-stress fluctuations and their probability distribution function. To this end, I partitioned the mesh into 10 units and tried to extract the instantaneous shear stress on the wall:
FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk
and received a segmentation fault:
ProcessWSS: Calculating wall shear stress...
/var/spool/PBS/mom_priv/jobs/1511560.wlm01.SC: line 56: 40234 Segmentation fault FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk
I've put the log file for this one in the attachment.
3. Finally, I need to calculate the drag force on the wall. wss returns the shear stresses, along with the pressure and normal vectors. I can use this information and apply a simple midpoint rule for the integration. However, the normal vectors seem to be normalized, hence of unit length, so the area information is missing. I was wondering if there is any easy way to extract the area of the surface elements.
Thank you very much in advance for the feedback. Cheers, Asim
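Asim's per-partition attempt in issue 2 generalises to a loop over all the pieces produced by --part-only. This is only a sketch, with the file layout assumed from the commands above, and it presupposes that the wss segmentation fault Douglas addresses below has been resolved (e.g. via the fix/WssParallel branch he mentions):

for part in config_xml/P*.xml; do
    id=$(basename "$part" .xml)                  # e.g. P0000001
    FieldConvert -v -m wss:bnd=1:addnormals=1 "$part" fields/field_128.chk "wallShearStress/wss_128_${id}.fld"
done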
Hi Asim, About these issues: 1- I am not sure what could be happening. I tried these same steps with one of my cases and it works fine. Are the stresses exactly zero or just very small? 2- I found a bug in the wss module which is probably causing this. I think I was able to fix it (in branch fix/WssParallel), so this will probably be sorted out soon. 3- It is not possible to calculate the forces using FieldConvert. However, there is a filter (AeroForces) that does this as the simulation progresses. If you use your solution as the initial condition for a simulation with just one (or maybe even zero) time step, you should be able to use this filter to obtain the forces. Cheers, Douglas
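Douglas's suggestion in point 3 would look roughly like the session-file fragment below: restart from the existing field, run a single step, and let the filter integrate the pressure and viscous contributions over the wall boundary. The AeroForces name comes from his message; the parameter names, the boundary id and the restart-from-file syntax are assumptions to be checked against the user guide rather than a verified setup.

<FILTERS>
    <FILTER TYPE="AeroForces">
        <PARAM NAME="OutputFile">      DragLift </PARAM>   <!-- assumed parameter name -->
        <PARAM NAME="OutputFrequency"> 1        </PARAM>
        <PARAM NAME="Boundary">        B[1]     </PARAM>   <!-- wavy-wall boundary, as in bnd=1 above -->
    </FILTER>
</FILTERS>

<FUNCTION NAME="InitialConditions">
    <F FILE="fields/field_128.chk" />   <!-- restart from the snapshot of interest -->
</FUNCTION>

With NumSteps set to 1 (or 0, as Douglas suggests trying), the filter output would then give the integrated force without any manual surface quadrature.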
Cheers, Douglas ________________________________ From: Asim Onder <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 04 May 2016 07:07:45 To: Sherwin, Spencer J Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Spencer, please find the requested .xml file in attachment. Cheers, Asim On 05/03/2016 10:15 PM, Sherwin, Spencer J wrote: Hi Asim, The si what I was afraid of. I do not know why your case is still taking so long. Can you send me the .xml file to have a look at. Thanks, Spencer. On 3 May 2016, at 10:17, Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Spencer, I have partitioned my mesh into 48 pieces, and applied Fieldconvert -v as you have suggested: FieldConvert -v -m vorticity config_xml/P0000000.xml config_10.chk vorPart_10.vtu The end of the output file looks like this: ...... InputXml session reader CPU Time: 0.036654s InputXml mesh graph setup CPU Time: 0.0949287s InputXml setexpansion CPU Time: 77.2126s InputXml setexpansion CPU Time: 5.66e-07s Collection Implemenation for Quadrilateral ( 6 6 ) for ngeoms = 648 BwdTrans: StdMat (0.000246074, 0.000233187, 6.70384e-05, 0.000117696) IProductWRTBase: StdMat (0.000299029, 0.000254921, 8.57054e-05, 0.000164536) IProductWRTDerivBase: StdMat (0.00147705, 0.000787602, 0.000234766, 0.000425167) PhysDeriv: SumFac (0.000471923, 0.000315652, 0.000244664, 0.000203107) InputXml set first exp CPU Time: 7453.92s InputXml CPU Time: 7531.26s Processing input fld file InputFld CPU Time: 211.413s ProcessVorticity: Calculating vorticity... OutputVtk: Writing file... Writing: "vorPart_12.vtu" Written file: vorPart_12.vtu Total CPU Time: 8059.78s "InputXml set first exp" seems to be consuming the most time. What would this correspond? Thanks, Asim On 05/02/2016 06:12 PM, Sherwin, Spencer J wrote: Hi Asim, Douglas may have the most experience with this size calculation. I have to admit it is a bit of a challenge currently. One suggestion is that you run with the -v option on FieldConvert so we can see where it is taking most of the time. I have had problems in 3D with simply readying the xml file and so we had done a bit of restricting to help this. I do not know if this might still be problem with the Homogeneous 1D code. If this is the case then in the 3D code what we sometimes do is repartition the mesh using FieldConvert - - part-only=16 config.xml out.fld This will produce a directory called config_xml with files called P0000000.xml P0000001.xml I then try and process one file at a time ./FieldConvert config_xml/P000000.xml config_10.chk out.vtu I wonder if this would help break up the work and hopefully speed up the processing? Cheers, Spencer. On 27 Apr 2016, at 08:36, Asim Onder <ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Douglas, Spencer, Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating vorticity from a snapshot in a chk folder takes several hours if I use a command like this: mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk Changing the #procs didn't help too much. If I try to process individual domains one by one with something like this: FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu It still seem to take hours. 
Just for a comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs with an initialization time of around 5mins. I guess I'm doing something wrong. Would you have any suggestions on this? Thank a lot in advance. Cheers, Asim On 04/22/2016 03:22 AM, Serson, Douglas wrote: Hi Asim, One thing I noticed about your setup is that HomModesZ / npz = 3. This should always be an even number, so you will need to change your parameters (for example using npz = 180). I am surprised no error message with this information was displayed, but this will definitely make your simulation crash. In terms of IO, as Spencer said you can pre-partition the mesh. However, I don't think this will make much difference since your mesh is 2D, and therefore does not use much memory anyway. As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems. Cheers, Douglas ________________________________ From: Sherwin, Spencer J Sent: 21 April 2016 19:34 To: Asim Onder Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Asim, In fully 3D simulations we tend to pre-partition the mesh and this can help with memory usage on a single core. To do this you can run the solver with the option - - part-only=’no of partitions of 2D planes’ Then instead of running with a file.xml you give the solver file_xml directory. However I am not sure whether this is all working with the 2.3 D code. Douglas is this how you start any of your runs? Cheers, Spencer. On 20 Apr 2016, at 05:48, Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Douglas, thanks for the feedback. I was aware of --npz parallelization but was using a small number, not 1/2 or 1/4 of HomModesZ. Increasing npz really helped. I still have to try GlobalSysSoln. Now I face a memory problem for another case. The simulation runs out of memory when starting from a checkpoint file. Here is a little bit information about this case: - Mesh is made of around 16000 quad elements with p=5, i.e., NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in z direction. - I'm trying to run this case on 60 computing nodes each equipped with 24 processors, and a memory of 105 gb. In total, it makes 1440 procs, and 6300gb memory. - Execution command: mpirun -np 1440 IncNavierStokesSolver --npz 360 config.xml I was wondering if the memory usage of the application is scaling on different cores during IO, or using only one core. If it is only one core, than if it exceeds 105gb, it crushes I guess. Would you have maybe any suggestion/comment on this? Thanks, Asim On 04/13/2016 12:12 AM, Serson, Douglas wrote: Hi Asim, Concerning your questions: 1- Are you using the command line argument --npz? This is very important for obtaining an efficient parallel performance with the Fourier expansion, since it defines the number of partitions in the z-direction. If it is not set, only the xy plane will be partitioned and the parallelism will saturate quickly. I suggest initially setting npz to 1/2 or 1/4 of HomModesZ (note that nprocs must be a multiple of npz, since nprocs/npz is the number of partitions in the xy plane). 
Also, depending on your particular case and the number of partitions you have in the xy plane, your simulation may benefit from using a direct solver for the linear systems. This can be activated by adding '-I GlobalSysSoln=XxtMultiLevelStaticCond' to the command line. This is usually more efficient for a small number of partitions, but considering the large size of your problem it might be worth trying it. 2- I am not sure what could be causing that. I suppose it would help if you could send the exact commands you are using to run FieldConvert. Cheers, Douglas ________________________________ From: Asim Onder <mailto:ceeao@nus.edu.sg> <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 12 April 2016 06:42 To: Sherwin, Spencer J; Serson, Douglas Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Dear Spencer, Douglas, Nektar-users, I'm involved now in testing of a local petascale supercomputer, and for some quite limited time I can use several thousand processors for my DNS study. My test case is oscillating flow over a rippled bed. I build up a dense unstructured grid with p=6 quadrilateral elements in x-y, and Fourier expansions in z directions. In total I have circa half billion dofs per variable. I would have a few questions about this relatively large case: 1. I noticed that scaling gets inefficient after around 500 procs, let's say parallel efficiency goes below 80%. I was wondering if you would have any general suggestions to tune the configurations for a better scaling. 2. Postprocessing vorticity and Q criterion is not working for this case. At the of the execution Fieldconvert writes some small files without the field data. What could be the reason for this? Thanks you in advance for your suggestions. Cheers, Asim On 03/21/2016 04:16 AM, Sherwin, Spencer J wrote: Hi Asim, To follow-up on Douglas’ comment we are trying to get more organised to sort out a developers guide. We are also holding a user meeting in June. If you were able to make this we could also try and have a session on getting you going on the developmental side of things. Cheers, Spencer. On 17 Mar 2016, at 14:58, Serson, Douglas <<mailto:d.serson14@imperial.ac.uk>d.serson14@imperial.ac.uk<mailto:d.serson14@imperial.ac.uk>> wrote: Hi Asim, I am glad that your simulation is now working. About your questions: 1. We have some work done on a filter for calculating Reynolds stresses as the simulation progresses, but it is not ready yet, and it would not provide all the statistics you want. Since you already have a lot of chk files, I suppose the best way would indeed be using a script to process all of them with FieldConvert. 2. Yes, this has been recently included in FieldConvert, using the new 'meanmode' module. 3. I just checked that, and apparently this is caused by a bug when using this module without fftw. This should be fixed soon, but as an alternative this module should work if you switch fftw on (just add <I PROPERTY="USEFFT" VALUE="FFTW"/> to you session file, if the code was compiled with support to fftw). 4. I think there is some work towards a developer guide, but I don't how advanced is the progress on that. I am sure Spencer will be able to provide you with more information on that. 
From: Asim Onder <ceeao@nus.edu.sg> Sent: 17 March 2016 09:10 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Spencer, Douglas, Thanks to your suggestions I managed to reach the turbulent regime for the oscillatory channel flow. I have now completed the DNS study for one case, and built up a large database of checkpoint (*.chk) files. I would like to calculate turbulent statistics using this database, especially second-order terms, e.g. Reynolds stresses and turbulent dissipation, and third-order terms, e.g. turbulent diffusion terms. However, I am a little confused about how I could achieve this. I would appreciate it if you could give some hints on the following: 1. The only way I could think of to calculate turbulent statistics is to write a simple bash script that iterates over the chk files and applies various existing/extended FieldConvert operations to each of them. This would require some additional storage for the intermediate steps, and would therefore be a bit cumbersome. Would there be any simpler way of doing this directly in Nektar++? 2. I have one homogeneous direction, for which I used Fourier expansions. I would like to apply spatial averaging over this homogeneous direction. Does Nektar++ already contain such functionality? 3. I want to use the 'wss' module in FieldConvert to calculate wall shear stress. However, it returns a segmentation fault. Any ideas why that could be? 4. I was wondering if there is any introductory document for basic programming in Nektar++. The user guide does not contain information about programming. It would be nice to have some additional information alongside the Doxygen documentation. Thank you very much in advance for your feedback. Cheers, Asim

On 02/15/2016 11:59 PM, Serson, Douglas wrote:
Hi Asim, As Spencer mentioned, SVV can help in stabilizing your solution. You can find information on how to set it up in the user guide (pages 92-93), but basically all you need to do is use: <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/> You can also tune it by setting the parameters SVVCutoffRatio and SVVDiffCoeff, but I would suggest starting with the default parameters. Also, you can use the parameter IO_CFLSteps to output the CFL number. This way you can check if the time step you are using is appropriate. Cheers, Douglas

From: Sherwin, Spencer J Sent: 14 February 2016 19:46 To: ceeao Cc: nektar-users; Serson, Douglas; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Asim, Getting a flow through transition is very challenging, since there is a strong localisation of shear and this can lead to aliasing issues which can then cause instabilities. Both Douglas and Dave have experienced this with recent simulations, so I am cc'ing them to make some suggestions. I would be inclined to use spectralhpdealiasing and SVV. Hopefully Douglas can send you an example of how to switch this on. Cheers, Spencer.

On 11 Feb 2016, at 10:32, ceeao <ceeao@nus.edu.sg> wrote:
Hi Spencer, Nektar-Users, I followed the suggestion and coarsened the grid a bit. This way it worked impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick the transition to get turbulence.
If I add white noise, even at a very low magnitude, the conjugate gradient solver blows up again. I also tried adding some sinusoidal perturbations to the boundary conditions, and again had trouble with CG. I don't really understand CG's extreme sensitivity to perturbations. Any suggestion is much appreciated. Thanks in advance. Cheers, Asim

On 02/08/2016 04:48 PM, Sherwin, Spencer J wrote:
Hi Asim, How many parallel cores are you running on? Sometimes starting up these flows can be tricky, especially if you are immediately jumping to a high Reynolds number. Have you tried first starting the flow at a lower Reynolds number? Also, 100 x 200 is quite a few elements in the x-y plane. Remember the polynomial order adds more points on top of the mesh discretisation. I would perhaps recommend trying a smaller mesh to see how that goes first. Actually, I note there is a file called TurbChFl_3D1H.xml in the ~/Nektar/Solvers/IncNavierStokesSolver/Examples directory which might be worth looking at. I think this was a mesh used in Ale Bolis' thesis, which you can find under: http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.pdf Cheers, Spencer.

On 1 Feb 2016, at 07:01, ceeao <ceeao@nus.edu.sg> wrote:
Hi Spencer, Thank you for the quick reply and suggestion. I indeed switched to the 3D homo 1D case, and this time I have problems with the divergence of the linear solvers. I refined the grid in the channel flow example to 100x200x64 in the x-y-z directions, and left everything else the same. When I employ the default global system solver "IterativeStaticCond" with this setup, I get divergence: "Exceeded maximum number of iterations (5000)". I checked the initial fields and mesh in Paraview, and everything seems to be normal. I also tried the "LowEnergyBlock" preconditioner, but apparently this one is valid only in fully 3D cases. My knowledge of iterative solvers for hp-FEM is minimal. Therefore, I was wondering if you could suggest a robust option that at least converges. My concern is getting some rough estimates of the speed of Nektar++ for my oscillating channel flow problem. If the speed is promising, I will switch to Nektar++ from OpenFOAM, as OpenFOAM is low-order and not really suitable for DNS. Thanks again in advance. Cheers, Asim

On 01/31/2016 11:53 PM, Sherwin, Spencer J wrote:
Hi Asim, I think your conclusion is correct. We did some early implementation of the 2D Homogeneous expansion but have not pulled it all the way through, since we did not have a full project on this topic. We have however kept the existing code running through our regression tests. For now I would perhaps suggest you try the 3D homo 1D approach for your runs, since you can use parallelisation in that code. Cheers, Spencer.

On 29 Jan 2016, at 04:00, ceeao <ceeao@nus.edu.sg> wrote:
Dear all, I just installed the library, and need to simulate DNS of a channel flow with an oscillating pressure gradient. As I have two homogeneous directions, I applied a standard Fourier discretization in these directions. It seems like this case is not parallelized yet, and I got the error in the subject. I was wondering if I'm overlooking something. If not, are there maybe any plans to include parallelization of 2D FFTs in the future? Thank you in advance.
Best, Asim Onder, Research Fellow, National University of Singapore

Spencer Sherwin, McLaren Racing/Royal Academy of Engineering Research Chair, Professor of Computational Fluid Mechanics, Department of Aeronautics, Imperial College London, South Kensington Campus, London SW7 2AZ, s.sherwin@imperial.ac.uk, +44 (0) 20 759 45052
Hi Douglas, Thanks for the fix, and for the tip about the AeroForces filter. Regarding the issues: 1. The shear stress is exactly zero everywhere. 2. After the fix, the segmentation fault is indeed gone. Unfortunately, this one also returned a zero shear-stress field for my case. Another problem I noticed is that one of the surface normals (norm_y) is -1 everywhere, which is not true. Just to provide more information, I've attached snapshots of the surface normals along with the xml file of the partition. Thanks, Asim

On 05/28/2016 12:14 AM, Serson, Douglas wrote:
Hi Asim, About these issues: 1- I am not sure what could be happening. I tried these same steps with one of my cases and it works fine. Are the stresses exactly zero or just very small? 2- I found a bug in the wss module which is probably causing this. I think I was able to fix it (in branch fix/WssParallel), so this will probably be sorted out soon. 3- It is not possible to calculate the forces using FieldConvert. However, there is a filter (AeroForces) that does this as the simulation progresses. If you use your solution as the initial condition for a simulation with just one (or maybe even zero) time steps, you should be able to use this filter to obtain the forces. Cheers, Douglas
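A sketch of the one-step restart that answer 3 above suggests (all names here are assumptions; the thread does not give these commands): config_forces.xml would be a copy of config.xml with NumSteps reduced to 1, the InitialConditions function pointing at the converged field, and an AeroForces filter on the wall boundary added to the FILTERS block. The run itself is then just

  # 48/12 = 4 xy partitions and HomModesZ/npz = 1080/12 = 90 planes per z-partition (even),
  # so the partitioning constraints discussed earlier in the thread are still respected
  mpirun -np 48 IncNavierStokesSolver --npz 12 config_forces.xml

after which the filter should write the integrated forces for that field.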
From: Asim Onder <ceeao@nus.edu.sg> Sent: 27 May 2016 13:30:20 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Douglas, I now have some trouble with postprocessing the shear stress on a curved wall. I would appreciate it if you could provide some suggestions whenever you are able to look at this. (Just to recall my case: 3DH1D DNS of a channel flow with a wavy bottom; around 16000 quad elements with NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in the z direction.) There are three issues: 1. First, I tried to calculate the mean shear stress from a mean mode extracted using FieldConvert, e.g.: mpirun -np 48 FieldConvert -v -m meanmode config.xml fields/field_128.chk meanFields/mean_128.chk FieldConvert -v -m wss:bnd=1:addnormals=1 config.xml meanFields/mean_128.chk wssMean_128.chk Then I converted the result to vtu and visualized it. The normals to the surface are correctly calculated. However, the shear stresses are zero, which is of course not true. 2. I also need to calculate shear-stress fluctuations and their probability distribution function. To this end, I partitioned the mesh into 10 units, and tried to extract the instantaneous shear stress on the wall: FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk and received a segmentation fault: ProcessWSS: Calculating wall shear stress... /var/spool/PBS/mom_priv/jobs/1511560.wlm01.SC: line 56: 40234 Segmentation fault FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk I've put the log file for this one in the attachment. 3. Finally, I need to calculate the drag force on the wall. wss returns the shear stresses, along with the pressure and normal vectors. I can use this information and apply a simple midpoint rule for integration. However, the normal vectors seem to be normalised to unit length, so the area information is missing. I was wondering if there is any easy way to extract the area of the surface elements. Thank you very much in advance for the feedback. Cheers, Asim

On 05/06/2016 07:37 PM, Serson, Douglas wrote:
Hi Asim, Thank you for reporting this postprocessing issue. We did find an operation that was consuming an unreasonable amount of time. I have already fixed that, and eventually this fix will be available in the master branch. If you want to test it before then, it is in the branch fix/FC3DH1Defficiency. Cheers, Douglas

From: Asim Onder <ceeao@nus.edu.sg> Sent: 04 May 2016 07:07:45 To: Sherwin, Spencer J Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach
Hi Spencer, please find the requested .xml file in the attachment. Cheers, Asim

On 05/03/2016 10:15 PM, Sherwin, Spencer J wrote:
Hi Asim, This is what I was afraid of. I do not know why your case is still taking so long. Can you send me the .xml file to have a look at? Thanks, Spencer.

On 3 May 2016, at 10:17, Asim Onder <ceeao@nus.edu.sg> wrote:
Hi Spencer, I have partitioned my mesh into 48 pieces and applied FieldConvert -v as you suggested: FieldConvert -v -m vorticity config_xml/P0000000.xml config_10.chk vorPart_10.vtu The end of the output file looks like this:
......
InputXml session reader CPU Time: 0.036654s
InputXml mesh graph setup CPU Time: 0.0949287s
InputXml setexpansion CPU Time: 77.2126s
InputXml setexpansion CPU Time: 5.66e-07s
Collection Implemenation for Quadrilateral ( 6 6 ) for ngeoms = 648
BwdTrans: StdMat (0.000246074, 0.000233187, 6.70384e-05, 0.000117696)
IProductWRTBase: StdMat (0.000299029, 0.000254921, 8.57054e-05, 0.000164536)
IProductWRTDerivBase: StdMat (0.00147705, 0.000787602, 0.000234766, 0.000425167)
PhysDeriv: SumFac (0.000471923, 0.000315652, 0.000244664, 0.000203107)
InputXml set first exp CPU Time: 7453.92s
InputXml CPU Time: 7531.26s
Processing input fld file
InputFld CPU Time: 211.413s
ProcessVorticity: Calculating vorticity...
OutputVtk: Writing file...
Writing: "vorPart_12.vtu"
Written file: vorPart_12.vtu
Total CPU Time: 8059.78s
"InputXml set first exp" seems to be consuming most of the time. What would this correspond to? Thanks, Asim

On 05/02/2016 06:12 PM, Sherwin, Spencer J wrote:
Hi Asim, Douglas may have the most experience with this size of calculation. I have to admit it is a bit of a challenge currently. One suggestion is that you run with the -v option on FieldConvert so we can see where it is taking most of the time. I have had problems in 3D with simply reading the xml file, and so we had done a bit of restructuring to help this. I do not know if this might still be a problem with the Homogeneous 1D code. If this is the case, then in the 3D code what we sometimes do is repartition the mesh using FieldConvert --part-only=16 config.xml out.fld This will produce a directory called config_xml with files called P0000000.xml, P0000001.xml, ... I then try to process one file at a time: ./FieldConvert config_xml/P0000000.xml config_10.chk out.vtu I wonder if this would help break up the work and hopefully speed up the processing? Cheers, Spencer.
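A sketch of how Spencer's one-partition-at-a-time suggestion could be scripted (the 48 pieces match Asim's earlier message; the output names are assumptions):

  # process each pre-partitioned plane separately; every run is serial and independent,
  # so the iterations can also be farmed out as separate cluster jobs
  for i in $(seq 0 47); do
      part=$(printf "P%07d" $i)
      FieldConvert -m vorticity config_xml/${part}.xml config_10.chk vor_10_${part}.vtu
  done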
On 27 Apr 2016, at 08:36, Asim Onder <ceeao@nus.edu.sg> wrote:
Hi Douglas, Spencer, Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating vorticity from a snapshot in a chk folder takes several hours if I use a command like this: mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk Changing the number of procs didn't help much. If I try to process the individual domains one by one with something like this: FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu it still seems to take hours.
Hi Asim, I was able to obtain the correct Norm_y using your file. Since I don't have a .fld for your case, I used the extract module instead of wss, but both of them generate the normals in the same way, so the result should be the same. The complete process I used is:
1 - Obtain a .xml file for the boundary: NekMesh -m extract:surf=1 P0000003.xml P0000003_bnd.xml (note that here surf is the composite, not the boundary region; in your case, it happens that both are the same).
2 - Use FieldConvert with the wss module: FieldConvert -m wss:bnd=1:addnormals P0000003.xml P0000003.fld P0000003_bnd.fld
3 - Convert to vtu: FieldConvert P0000003_bnd.xml P0000003_bnd_b1.fld P0000003_bnd.vtu
Is this how you are processing your results? Cheers, Douglas
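For the 10-way partitioned case discussed earlier in the thread, the three steps above could be scripted over all partition files in the same way (a sketch; the partition count, file names, and the assumption that the wall is composite/boundary 1 in every partition file are mine, not from the thread):

  for i in $(seq 0 9); do
      part=$(printf "P%07d" $i)
      # 1. boundary mesh for this partition (surf=1 assumed to be the wall composite)
      NekMesh -m extract:surf=1 config_xml/${part}.xml ${part}_bnd.xml
      # 2. wall shear stress and normals on boundary 1, using the instantaneous field
      FieldConvert -m wss:bnd=1:addnormals config_xml/${part}.xml fields/field_128.chk wss_128_${part}.fld
      # 3. the wss module appends _b1 to its output, as in step 3 above
      FieldConvert ${part}_bnd.xml wss_128_${part}_b1.fld wss_128_${part}.vtu
  done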
To this end, I partitioned the mesh into 10 units, and tried to extract instantaneous shear stress on the wall: FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk and received a segmentation fault: ProcessWSS: Calculating wall shear stress... /var/spool/PBS/mom_priv/jobs/1511560.wlm01.SC: line 56: 40234 Segmentation fault FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk I've put the log file for this one in attachment. 3. Finally, I need to calculate the drag force on the wall. wss returns shear stresses, along with pressure and normal vectors. I can use this information, and apply simple midpoint rule for integration. However, normal vectors seem to be normalized, hence of unity length, therefore, area information is missing. I was wondering if there is any easy way to extract the area of surface elements. Thanks you very much in advance for the feedback. Cheers, Asim On 05/06/2016 07:37 PM, Serson, Douglas wrote: Hi Asim, Thank you for reporting this postprocessing issue. We did find an operation that was consuming an unreasonable amount of time. I already fixed that, and eventually this fix will be available in the master branch. If you want to test it before then, it is in the branch fix/FC3DH1Defficiency. Cheers, Douglas ________________________________ From: Asim Onder <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 04 May 2016 07:07:45 To: Sherwin, Spencer J Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Spencer, please find the requested .xml file in attachment. Cheers, Asim On 05/03/2016 10:15 PM, Sherwin, Spencer J wrote: Hi Asim, The si what I was afraid of. I do not know why your case is still taking so long. Can you send me the .xml file to have a look at. Thanks, Spencer. On 3 May 2016, at 10:17, Asim Onder <ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Spencer, I have partitioned my mesh into 48 pieces, and applied Fieldconvert -v as you have suggested: FieldConvert -v -m vorticity config_xml/P0000000.xml config_10.chk vorPart_10.vtu The end of the output file looks like this: ...... InputXml session reader CPU Time: 0.036654s InputXml mesh graph setup CPU Time: 0.0949287s InputXml setexpansion CPU Time: 77.2126s InputXml setexpansion CPU Time: 5.66e-07s Collection Implemenation for Quadrilateral ( 6 6 ) for ngeoms = 648 BwdTrans: StdMat (0.000246074, 0.000233187, 6.70384e-05, 0.000117696) IProductWRTBase: StdMat (0.000299029, 0.000254921, 8.57054e-05, 0.000164536) IProductWRTDerivBase: StdMat (0.00147705, 0.000787602, 0.000234766, 0.000425167) PhysDeriv: SumFac (0.000471923, 0.000315652, 0.000244664, 0.000203107) InputXml set first exp CPU Time: 7453.92s InputXml CPU Time: 7531.26s Processing input fld file InputFld CPU Time: 211.413s ProcessVorticity: Calculating vorticity... OutputVtk: Writing file... Writing: "vorPart_12.vtu" Written file: vorPart_12.vtu Total CPU Time: 8059.78s "InputXml set first exp" seems to be consuming the most time. What would this correspond? Thanks, Asim On 05/02/2016 06:12 PM, Sherwin, Spencer J wrote: Hi Asim, Douglas may have the most experience with this size calculation. I have to admit it is a bit of a challenge currently. One suggestion is that you run with the -v option on FieldConvert so we can see where it is taking most of the time. 
I have had problems in 3D with simply readying the xml file and so we had done a bit of restricting to help this. I do not know if this might still be problem with the Homogeneous 1D code. If this is the case then in the 3D code what we sometimes do is repartition the mesh using FieldConvert - - part-only=16 config.xml out.fld This will produce a directory called config_xml with files called P0000000.xml P0000001.xml I then try and process one file at a time ./FieldConvert config_xml/P000000.xml config_10.chk out.vtu I wonder if this would help break up the work and hopefully speed up the processing? Cheers, Spencer. On 27 Apr 2016, at 08:36, Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Douglas, Spencer, Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating vorticity from a snapshot in a chk folder takes several hours if I use a command like this: mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk Changing the #procs didn't help too much. If I try to process individual domains one by one with something like this: FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu It still seem to take hours. Just for a comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs with an initialization time of around 5mins. I guess I'm doing something wrong. Would you have any suggestions on this? Thank a lot in advance. Cheers, Asim On 04/22/2016 03:22 AM, Serson, Douglas wrote: Hi Asim, One thing I noticed about your setup is that HomModesZ / npz = 3. This should always be an even number, so you will need to change your parameters (for example using npz = 180). I am surprised no error message with this information was displayed, but this will definitely make your simulation crash. In terms of IO, as Spencer said you can pre-partition the mesh. However, I don't think this will make much difference since your mesh is 2D, and therefore does not use much memory anyway. As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems. Cheers, Douglas ________________________________ From: Sherwin, Spencer J Sent: 21 April 2016 19:34 To: Asim Onder Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Asim, In fully 3D simulations we tend to pre-partition the mesh and this can help with memory usage on a single core. To do this you can run the solver with the option - - part-only=’no of partitions of 2D planes’ Then instead of running with a file.xml you give the solver file_xml directory. However I am not sure whether this is all working with the 2.3 D code. Douglas is this how you start any of your runs? Cheers, Spencer. On 20 Apr 2016, at 05:48, Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Douglas, thanks for the feedback. I was aware of --npz parallelization but was using a small number, not 1/2 or 1/4 of HomModesZ. Increasing npz really helped. I still have to try GlobalSysSoln. Now I face a memory problem for another case. The simulation runs out of memory when starting from a checkpoint file. 
Here is a little bit information about this case: - Mesh is made of around 16000 quad elements with p=5, i.e., NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in z direction. - I'm trying to run this case on 60 computing nodes each equipped with 24 processors, and a memory of 105 gb. In total, it makes 1440 procs, and 6300gb memory. - Execution command: mpirun -np 1440 IncNavierStokesSolver --npz 360 config.xml I was wondering if the memory usage of the application is scaling on different cores during IO, or using only one core. If it is only one core, than if it exceeds 105gb, it crushes I guess. Would you have maybe any suggestion/comment on this? Thanks, Asim On 04/13/2016 12:12 AM, Serson, Douglas wrote: Hi Asim, Concerning your questions: 1- Are you using the command line argument --npz? This is very important for obtaining an efficient parallel performance with the Fourier expansion, since it defines the number of partitions in the z-direction. If it is not set, only the xy plane will be partitioned and the parallelism will saturate quickly. I suggest initially setting npz to 1/2 or 1/4 of HomModesZ (note that nprocs must be a multiple of npz, since nprocs/npz is the number of partitions in the xy plane). Also, depending on your particular case and the number of partitions you have in the xy plane, your simulation may benefit from using a direct solver for the linear systems. This can be activated by adding '-I GlobalSysSoln=XxtMultiLevelStaticCond' to the command line. This is usually more efficient for a small number of partitions, but considering the large size of your problem it might be worth trying it. 2- I am not sure what could be causing that. I suppose it would help if you could send the exact commands you are using to run FieldConvert. Cheers, Douglas ________________________________ From: Asim Onder <mailto:ceeao@nus.edu.sg> <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 12 April 2016 06:42 To: Sherwin, Spencer J; Serson, Douglas Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Dear Spencer, Douglas, Nektar-users, I'm involved now in testing of a local petascale supercomputer, and for some quite limited time I can use several thousand processors for my DNS study. My test case is oscillating flow over a rippled bed. I build up a dense unstructured grid with p=6 quadrilateral elements in x-y, and Fourier expansions in z directions. In total I have circa half billion dofs per variable. I would have a few questions about this relatively large case: 1. I noticed that scaling gets inefficient after around 500 procs, let's say parallel efficiency goes below 80%. I was wondering if you would have any general suggestions to tune the configurations for a better scaling. 2. Postprocessing vorticity and Q criterion is not working for this case. At the of the execution Fieldconvert writes some small files without the field data. What could be the reason for this? Thanks you in advance for your suggestions. Cheers, Asim On 03/21/2016 04:16 AM, Sherwin, Spencer J wrote: Hi Asim, To follow-up on Douglas’ comment we are trying to get more organised to sort out a developers guide. We are also holding a user meeting in June. If you were able to make this we could also try and have a session on getting you going on the developmental side of things. Cheers, Spencer. 
On 17 Mar 2016, at 14:58, Serson, Douglas <<mailto:d.serson14@imperial.ac.uk>d.serson14@imperial.ac.uk<mailto:d.serson14@imperial.ac.uk>> wrote: Hi Asim, I am glad that your simulation is now working. About your questions: 1. We have some work done on a filter for calculating Reynolds stresses as the simulation progresses, but it is not ready yet, and it would not provide all the statistics you want. Since you already have a lot of chk files, I suppose the best way would indeed be using a script to process all of them with FieldConvert. 2. Yes, this has been recently included in FieldConvert, using the new 'meanmode' module. 3. I just checked that, and apparently this is caused by a bug when using this module without fftw. This should be fixed soon, but as an alternative this module should work if you switch fftw on (just add <I PROPERTY="USEFFT" VALUE="FFTW"/> to you session file, if the code was compiled with support to fftw). 4. I think there is some work towards a developer guide, but I don't how advanced is the progress on that. I am sure Spencer will be able to provide you with more information on that. Cheers, Douglas ________________________________________ From: Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> Sent: 17 March 2016 09:10 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Spencer, Douglas, Thanks to your suggestions I managed to get the turbulent regime for the oscillatory channel flow. I have now completed the DNS study for one case, and built up a large database with checkpoint (*chk) files. I would like to calculate turbulent statistics using this database, especially for second order terms, e.g. Reynolds stresses and turbulent dissipation, and third order terms, e.g. turbulent diffusion terms. However, I am a little bit confused how I could achieve this. I would appreciate if you could give some hints about the following: 1. The only way I could think of to calculate turbulent statistics is to write a simple bash script to iterate over chk files, and apply various existing/extended FieldConvert operations on individual chk files. This would require some additional storage to store the intermediate steps, and therefore would be a bit cumbersome. Would it be any simpler way directly doing this directly in Nektar++? 2. I have one homogeneous direction, for which I used Fourier expansions. I would like to apply spatial averaging over this homogeneous direction. Does Nektar++ already contain such functionality? 3. I want to use 'wss' in Fieldconvert module to calculate wall shear stress. However, it returns segmentation fault. Any ideas why it could be? 4. I was wondering if there is any introductory document for basic programming in Nektar++. User guide does not contain information about programming. It would be nice to have some additional information to Doxygen documentation. Thank you very much in advance for your feedback. Cheers, Asim On 02/15/2016 11:59 PM, Serson, Douglas wrote: Hi Asim, As Spencer mentioned, svv can help in stabilizing your solution. You can find information on how to set it up in the user guide (pages 92-93), but basically all you need to do is use: <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/> You can also tune it by setting the parameters SVVCutoffRatio and SVVDiffCoeff, but I would suggest starting with the default parameters. 
Also, you can use the parameter IO_CFLSteps to output the CFL number. This way you can check if the time step you are using is appropriate. Cheers, Douglas From: Sherwin, Spencer J Sent: 14 February 2016 19:46 To: ceeao Cc: nektar-users; Serson, Douglas; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Asim, Getting a flow through transition is very challenging since there is a strong localisation of shear and this can lead to aliasing issues which can then cause instabilities. Both Douglas and Dave have experienced this with recent simulations so I am cc’ing them to make some suggestions. I would be inclined to be using spectralhpdealiasing and svv. Hopefully Douglas can send you an example of how to switch this on. Cheers, Spencer. On 11 Feb 2016, at 10:32, ceeao<<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Spencer, Nektar-Users, I followed the suggestion and coarsened the grid a bit. This way it worked impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick the transition to have turbulence. If I add white noise, even very low magnitude, conjugate gradient solver blows up again. I also tried adding some sinusoidal perturbations to boundary conditions, and again had troubles with CG. I don't really get CG's extreme sensitivity to perturbations. Any suggestion is much appreciated. Thanks in advance. Cheers, Asim On 02/08/2016 04:48 PM, Sherwin, Spencer J wrote: HI Asim, How many parallel cores are you running on. Sometime starting up these flows can be tricky especially if you are immediately jumping to a high Reynolds number. Have you tried first starting the flow at a Lower Reynolds number? Also 100 x 200 is quite a few elements in the x-y plane. Remember the polynomial order adds in more points on top of the mesh discretisation. I would perhaps recommend trying a smaller mesh to see how that goes first. Actually I note there is a file called TurbChFl_3D1H.xml in the ~/Nektar/Solvers/IncNavierStokesSolver/Examples directory which might be worth looking at. I think this was a mesh used in Ale Bolis’ thesis which you can find under: <http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.pdf>http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.pdf Cheers, Spencer. On 1 Feb 2016, at 07:01, ceeao<mailto:ceeao@nus.edu.sg><ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> wrote: Hi Spencer, Thank you for the quick reply and suggestion. I switched indeed to 3D homo 1D case and this time I have problems with the divergence of linear solvers. I refined the grid in the channel flow example to 100x200x64 in x-y-z directions, and left everything else the same. When I employ the default global system solver "IterativeStaticCond" with this setup, I get divergence: "Exceeded maximum number of iterations (5000)". I checked the initial fields and mesh in Paraview, everything seems to be normal. I also tried the "LowEnergyBlock" preconditioner, and apparently this one is valid only in sheer 3D cases. My knowledge in iterative solvers for hp-Fem is minimal. Therefore, I was wondering if you could suggest maybe a robust option that at least converge. My concern is getting some rough estimates for the speed of Nektar++ in my oscillating channel flow problem. If the speed will be promising, I will switch to Nektar++ from OpenFOAM, as OpenFOAM is low-order and not really suitable for DNS. Thanks again in advance. 
Cheers,
Asim

On 01/31/2016 11:53 PM, Sherwin, Spencer J wrote:

Hi Asim,

I think your conclusion is correct. We did some early implementation work on the 2D homogeneous expansion but have not pulled it all the way through, since we did not have a full project on this topic. We have, however, kept the existing code running through our regression tests. For now I would suggest you try the 3D homo 1D approach for your runs, since you can use parallelisation in that code.

Cheers,
Spencer.

On 29 Jan 2016, at 04:00, ceeao <ceeao@nus.edu.sg> wrote:

Dear all,

I just installed the library and need to simulate DNS of a channel flow with an oscillating pressure gradient. As I have two homogeneous directions, I applied a standard Fourier discretization in those directions. It seems this case is not parallelized yet, and I got the error in the subject line. I was wondering if I'm overlooking something. If not, are there any plans to include parallelization of the 2D FFTs in the future? Thank you in advance.

Best,
Asim Onder
Research Fellow
National University of Singapore
________________________________
On 05/31/2016 08:28 PM, Serson, Douglas wrote:

Hi Asim,

When the chk file is imported, the possibility of having a different partitioning is taken into account, so you shouldn't have to worry about that. Could you run the following commands and send me the output from the last one?

NekMesh -m extract:surf=1 P0000003.xml P0000003_bnd.xml
FieldConvert -m wss:bnd=1:addnormals=1 P0000003.xml field_128.chk P0000003_bnd.chk
FieldConvert -e P0000003_bnd.xml P0000003_bnd_b1.chk test.fld

This will show the norms of each field, including the wss and the normals. It might help us figure out in which step of the process things are going wrong.

Cheers,
Douglas

________________________________
From: Asim Onder <ceeao@nus.edu.sg>
Sent: 30 May 2016 16:59:16
To: Serson, Douglas
Cc: Sherwin, Spencer J; nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Douglas,

The only difference I could see is that I don't have a P0000003.fld file corresponding to the partition P0000003.xml. Instead, I use the full field in a chk directory that was partitioned into 2160 units during the simulation run (P0000000.fld ... P0002159.fld):

FieldConvert -m wss:bnd=1:addnormals=1 P0000003.xml field_128.chk P0000003_bnd.chk

Is this maybe the source of the trouble? FieldConvert --part-only=10 config.xml field_128.chk partitions the mesh into 10 pieces, but the old fld files in the chk directory, which were created during the simulation, remain unchanged. How would I adjust the field data to a new partitioning of the mesh?

Thanks,
Asim

On May 30, 2016, at 10:51 PM, Serson, Douglas <d.serson14@imperial.ac.uk> wrote:

Hi Asim,

I was able to obtain the correct Norm_y using your file. Since I don't have a .fld for your case, I used the extract module instead of wss, but both of them generate the normals in the same way, so the result should be the same. The complete process I used is:

1 - Obtain an .xml file for the boundary:
NekMesh -m extract:surf=1 P0000003.xml P0000003_bnd.xml
(note that here surf is the composite, not the boundary region; in your case it happens that both are the same)

2 - Use FieldConvert with the wss module:
FieldConvert -m wss:bnd=1:addnormals P0000003.xml P0000003.fld P0000003_bnd.fld

3 - Convert to vtu:
FieldConvert P0000003_bnd.xml P0000003_bnd_b1.fld P0000003_bnd.vtu

Is this how you are processing your results?

Cheers,
Douglas

________________________________
From: Asim Onder <ceeao@nus.edu.sg>
Sent: 30 May 2016 14:12:03
To: Serson, Douglas; Sherwin, Spencer J
Cc: nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Douglas,

Thanks for the fix, and for the tip about the AeroForces filter. Regarding the issues:

1. The shear stress is exactly zero everywhere.

2. After the fix, the segmentation fault is indeed gone. Unfortunately, this run also returned a zero shear-stress field for my case. Another problem I noticed is that one of the surface normals (norm_y) is -1 everywhere, which is not true. Just to provide more information, I've attached snapshots of the surface normals along with the xml file of the partition.

Thanks,
Asim

On 05/28/2016 12:14 AM, Serson, Douglas wrote:

Hi Asim,

About these issues:

1- I am not sure what could be happening. I tried these same steps with one of my cases and it works fine. Are the stresses exactly zero or just very small?

2- I found a bug in the wss module which is probably causing this.
I think I was able to fix it (in branch fix/WssParallel), so this will probably be sorted out soon.

3- It is not possible to calculate the forces using FieldConvert. However, there is a filter (AeroForces) that does this as the simulation progresses. If you use your solution as the initial condition for a simulation with just one (or maybe even zero) time step, you should be able to use this filter to obtain the forces.

Cheers,
Douglas

________________________________
From: Asim Onder <ceeao@nus.edu.sg>
Sent: 27 May 2016 13:30:20
To: Serson, Douglas; Sherwin, Spencer J
Cc: nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Douglas,

I now have some trouble with postprocessing the shear stress on a curved wall. I would appreciate it if you could provide some suggestions whenever you are able to look at this. (Just to recall my case: a 3DH1D DNS of a channel flow with a wavy bottom, around 16000 quad elements with NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in the z direction.) There are three issues:

1. First, I tried to calculate the mean shear stress from a mean mode extracted using FieldConvert, e.g.:

mpirun -np 48 FieldConvert -v -m meanmode config.xml fields/field_128.chk meanFields/mean_128.chk
FieldConvert -v -m wss:bnd=1:addnormals=1 config.xml meanFields/mean_128.chk wssMean_128.chk

Then I converted the result to vtu and visualized it. The normals to the surface are correctly calculated; however, the shear stresses are zero, which is of course not true.

2. I also need to calculate shear-stress fluctuations and their probability distribution function. To this end, I partitioned the mesh into 10 units and tried to extract the instantaneous shear stress on the wall:

FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk

and received a segmentation fault:

ProcessWSS: Calculating wall shear stress...
/var/spool/PBS/mom_priv/jobs/1511560.wlm01.SC: line 56: 40234 Segmentation fault FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk

I've attached the log file for this one.

3. Finally, I need to calculate the drag force on the wall. wss returns the shear stresses along with the pressure and normal vectors, so I can use this information and apply a simple midpoint rule for the integration. However, the normal vectors seem to be normalized, hence of unit length, so the area information is missing. I was wondering if there is an easy way to extract the areas of the surface elements.

Thank you very much in advance for the feedback.

Cheers,
Asim

On 05/06/2016 07:37 PM, Serson, Douglas wrote:

Hi Asim,

Thank you for reporting this postprocessing issue. We did find an operation that was consuming an unreasonable amount of time. I have already fixed that, and eventually the fix will be available in the master branch. If you want to test it before then, it is in the branch fix/FC3DH1Defficiency.

Cheers,
Douglas

________________________________
From: Asim Onder <ceeao@nus.edu.sg>
Sent: 04 May 2016 07:07:45
To: Sherwin, Spencer J
Cc: Serson, Douglas; nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Spencer,

Please find the requested .xml file in the attachment.

Cheers,
Asim

On 05/03/2016 10:15 PM, Sherwin, Spencer J wrote:

Hi Asim,

This is what I was afraid of.
I do not know why your case is still taking so long. Can you send me the .xml file to have a look at?

Thanks,
Spencer.

On 3 May 2016, at 10:17, Asim Onder <ceeao@nus.edu.sg> wrote:

Hi Spencer,

I have partitioned my mesh into 48 pieces and applied FieldConvert -v as you suggested:

FieldConvert -v -m vorticity config_xml/P0000000.xml config_10.chk vorPart_10.vtu

The end of the output file looks like this:

......
InputXml session reader CPU Time: 0.036654s
InputXml mesh graph setup CPU Time: 0.0949287s
InputXml setexpansion CPU Time: 77.2126s
InputXml setexpansion CPU Time: 5.66e-07s
Collection Implemenation for Quadrilateral ( 6 6 ) for ngeoms = 648
BwdTrans: StdMat (0.000246074, 0.000233187, 6.70384e-05, 0.000117696)
IProductWRTBase: StdMat (0.000299029, 0.000254921, 8.57054e-05, 0.000164536)
IProductWRTDerivBase: StdMat (0.00147705, 0.000787602, 0.000234766, 0.000425167)
PhysDeriv: SumFac (0.000471923, 0.000315652, 0.000244664, 0.000203107)
InputXml set first exp CPU Time: 7453.92s
InputXml CPU Time: 7531.26s
Processing input fld file
InputFld CPU Time: 211.413s
ProcessVorticity: Calculating vorticity...
OutputVtk: Writing file...
Writing: "vorPart_12.vtu"
Written file: vorPart_12.vtu
Total CPU Time: 8059.78s

"InputXml set first exp" seems to be consuming most of the time. What would this correspond to?

Thanks,
Asim

On 05/02/2016 06:12 PM, Sherwin, Spencer J wrote:

Hi Asim,

Douglas may have the most experience with a calculation of this size. I have to admit it is a bit of a challenge currently. One suggestion is that you run FieldConvert with the -v option so we can see where it is spending most of the time. I have had problems in 3D with simply reading the xml file, and so we did a bit of restructuring to help with this. I do not know if this might still be a problem with the homogeneous 1D code. If this is the case, then in the 3D code what we sometimes do is repartition the mesh using

FieldConvert --part-only=16 config.xml out.fld

This will produce a directory called config_xml with files called P0000000.xml, P0000001.xml, etc. I then try to process one file at a time:

./FieldConvert config_xml/P000000.xml config_10.chk out.vtu

I wonder if this would help break up the work and hopefully speed up the processing?

Cheers,
Spencer.

On 27 Apr 2016, at 08:36, Asim Onder <ceeao@nus.edu.sg> wrote:

Hi Douglas, Spencer,

Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating vorticity from a snapshot in a chk folder takes several hours if I use a command like this:

mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk

Changing the number of procs didn't help much. If I try to process individual domains one by one with something like this:

FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu

it still seems to take hours. Just for comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs, with an initialization time of around 5 minutes. I guess I'm doing something wrong. Would you have any suggestions on this? Thanks a lot in advance.

Cheers,
Asim

On 04/22/2016 03:22 AM, Serson, Douglas wrote:

Hi Asim,

One thing I noticed about your setup is that HomModesZ / npz = 3. This should always be an even number, so you will need to change your parameters (for example using npz = 180).
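As a quick check of this constraint (a throwaway shell snippet, assuming HomModesZ = 1080 as in your setup; remember that nprocs must also be a multiple of npz):

for npz in 180 270 360 540; do
    echo "npz=$npz -> planes per rank = $((1080 / npz))"
done
# npz=360 gives 3 planes per rank (odd), while 180, 270 and 540 give 6, 4 and 2 (even).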
I am surprised no error message with this information was displayed, but this will definitely make your simulation crash.

In terms of IO, as Spencer said, you can pre-partition the mesh. However, I don't think this will make much difference, since your mesh is 2D and therefore does not use much memory anyway. As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems.

Cheers,
Douglas

________________________________
From: Sherwin, Spencer J
Sent: 21 April 2016 19:34
To: Asim Onder
Cc: Serson, Douglas; nektar-users; Moxey, David
Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach

Hi Asim,

In fully 3D simulations we tend to pre-partition the mesh, and this can help with memory usage on a single core. To do this you can run the solver with the option --part-only=<number of partitions of the 2D planes>, and then, instead of running with file.xml, you give the solver the file_xml directory. However, I am not sure whether this is all working with the 2.3D code. Douglas, is this how you start any of your runs?

Cheers,
Spencer.
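For concreteness, the sequence described above might look like the following sketch, assuming npz = 180 so that the 1440 ranks give 1440/180 = 8 xy-plane partitions (the option spelling should be checked against the user guide, and, as noted above, it is not certain the homogeneous code accepts the partition directory in the same way):

# pre-partition the 2D planes into 8 pieces; this writes a config_xml directory
IncNavierStokesSolver --part-only=8 config.xml

# then run from the pre-partitioned directory instead of the single xml file
mpirun -np 1440 IncNavierStokesSolver --npz 180 config_xml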
________________________________

Hi Douglas,

Sorry, I noticed that I was making a mistake somewhere else. All the results are fine now. The problem now is the very long execution time. The execution of

FieldConvert -m wss:bnd=1:addnormals=1 P0000003.xml field_128.chk P0000003_bnd.chk

takes more than six hours. I have attached the log file for this run. If I try, for example, the vorticity module on this partition, it takes around half an hour to process. So wss is considerably slower. What could be the reason for this?

Cheers,
Asim
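For reference, once this is fast enough, the per-checkpoint loop I have in mind for the wall shear stress is along these lines (only a sketch; the checkpoint range and output paths are placeholders):

# extract the boundary mesh for this partition once (needed later to convert the boundary output)
NekMesh -m extract:surf=1 P0000003.xml P0000003_bnd.xml

# then apply the wss module to each checkpoint in turn
for i in $(seq 0 128); do
    FieldConvert -m wss:bnd=1:addnormals=1 P0000003.xml \
        fields/field_${i}.chk wallShearStress/wss_${i}_p03.chk
done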
Another problem I noticed is that one of the surface normals (norm_y) is -1 everywhere, which is not true. Just to provide more information, I've attached snapshots of surface normals along with xml file of the partition. Thanks, Asim On 05/28/2016 12:14 AM, Serson, Douglas wrote: Hi Asim, About these issues: 1- I am not sure what could be happening. I tried these same steps with one of my cases and it works fine. Are the stresses exactly zero or just very small? 2- I found a bug in the wss module which is probably causing this. I think I was able to fix it (in branch fix/WssParallel), so this will probably be sorted out soon. 3- It is not possible to calculate the forces using FieldConvert. However, there is a filter (AeroForces) that does this as the simulation progresses. If you use your solution as the initial condition for a simulation with just one (or maybe even zero) time step, you should be able to use this filter to obtain the forces. Cheers, Douglas ________________________________ From: Asim Onder <mailto:ceeao@nus.edu.sg> <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 27 May 2016 13:30:20 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Douglas, I now have some troubles with postprocessing shear stress on a curved wall. Appreciate if you could provide some suggestions whenever you would be able to look at this. (just to remind my case: 3DH1D DNS of a channel flow with a wavy bottom. Around 16000 quad elements with NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in z direction.) There are three issues: 1. First, I tried to calculate the mean shear stress from a mean mode which is extracted using Fieldconvert, e.g.,: mpirun -np 48 FieldConvert -v -m meanmode config.xml fields/field_128.chk meanFields/mean_128.chk FieldConvert -v -m wss:bnd=1:addnormals=1 config.xml meanFields/mean_128.chk wssMean_128.chk Then, I converted the result to vtu and vizualized it. Normals to the surface are correctly calculated. However, shear stresses are zero, which is of course not true. 2. I also need to calculate shear-stress fluctuations and their probability distribution function. To this end, I partitioned the mesh into 10 units, and tried to extract instantaneous shear stress on the wall: FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk and received a segmentation fault: ProcessWSS: Calculating wall shear stress... /var/spool/PBS/mom_priv/jobs/1511560.wlm01.SC: line 56: 40234 Segmentation fault FieldConvert -v -m wss:bnd=1:addnormals=1 config_xml/P0000001.xml fields/field_128.chk wallShearStress/wss_128_p01.chk I've put the log file for this one in attachment. 3. Finally, I need to calculate the drag force on the wall. wss returns shear stresses, along with pressure and normal vectors. I can use this information, and apply simple midpoint rule for integration. However, normal vectors seem to be normalized, hence of unity length, therefore, area information is missing. I was wondering if there is any easy way to extract the area of surface elements. Thanks you very much in advance for the feedback. Cheers, Asim On 05/06/2016 07:37 PM, Serson, Douglas wrote: Hi Asim, Thank you for reporting this postprocessing issue. We did find an operation that was consuming an unreasonable amount of time. I already fixed that, and eventually this fix will be available in the master branch. 
If you want to test it before then, it is in the branch fix/FC3DH1Defficiency. Cheers, Douglas ________________________________ From: Asim Onder <mailto:ceeao@nus.edu.sg> <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 04 May 2016 07:07:45 To: Sherwin, Spencer J Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Spencer, please find the requested .xml file in attachment. Cheers, Asim On 05/03/2016 10:15 PM, Sherwin, Spencer J wrote: Hi Asim, The si what I was afraid of. I do not know why your case is still taking so long. Can you send me the .xml file to have a look at. Thanks, Spencer. On 3 May 2016, at 10:17, Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Spencer, I have partitioned my mesh into 48 pieces, and applied Fieldconvert -v as you have suggested: FieldConvert -v -m vorticity config_xml/P0000000.xml config_10.chk vorPart_10.vtu The end of the output file looks like this: ...... InputXml session reader CPU Time: 0.036654s InputXml mesh graph setup CPU Time: 0.0949287s InputXml setexpansion CPU Time: 77.2126s InputXml setexpansion CPU Time: 5.66e-07s Collection Implemenation for Quadrilateral ( 6 6 ) for ngeoms = 648 BwdTrans: StdMat (0.000246074, 0.000233187, 6.70384e-05, 0.000117696) IProductWRTBase: StdMat (0.000299029, 0.000254921, 8.57054e-05, 0.000164536) IProductWRTDerivBase: StdMat (0.00147705, 0.000787602, 0.000234766, 0.000425167) PhysDeriv: SumFac (0.000471923, 0.000315652, 0.000244664, 0.000203107) InputXml set first exp CPU Time: 7453.92s InputXml CPU Time: 7531.26s Processing input fld file InputFld CPU Time: 211.413s ProcessVorticity: Calculating vorticity... OutputVtk: Writing file... Writing: "vorPart_12.vtu" Written file: vorPart_12.vtu Total CPU Time: 8059.78s "InputXml set first exp" seems to be consuming the most time. What would this correspond? Thanks, Asim On 05/02/2016 06:12 PM, Sherwin, Spencer J wrote: Hi Asim, Douglas may have the most experience with this size calculation. I have to admit it is a bit of a challenge currently. One suggestion is that you run with the -v option on FieldConvert so we can see where it is taking most of the time. I have had problems in 3D with simply readying the xml file and so we had done a bit of restricting to help this. I do not know if this might still be problem with the Homogeneous 1D code. If this is the case then in the 3D code what we sometimes do is repartition the mesh using FieldConvert - - part-only=16 config.xml out.fld This will produce a directory called config_xml with files called P0000000.xml P0000001.xml I then try and process one file at a time ./FieldConvert config_xml/P000000.xml config_10.chk out.vtu I wonder if this would help break up the work and hopefully speed up the processing? Cheers, Spencer. On 27 Apr 2016, at 08:36, Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Douglas, Spencer, Thanks for the suggestions, the problem is gone. I'm now a little concerned about the postprocessing of this relatively big case. For example, calculating vorticity from a snapshot in a chk folder takes several hours if I use a command like this: mpirun -np 720 FieldConvert -m vorticity config.xml config_10.chk vorticity_10.chk Changing the #procs didn't help too much. 
If I try to process individual domains one by one with something like this: FieldConvert --nprocs 72 --procid 1 -m vorticity config.xml config_10.chk vorticity_10.vtu It still seem to take hours. Just for a comparison: for this case, one time step of IncNavierStokesSolver takes around 5 seconds on 1440 procs with an initialization time of around 5mins. I guess I'm doing something wrong. Would you have any suggestions on this? Thank a lot in advance. Cheers, Asim On 04/22/2016 03:22 AM, Serson, Douglas wrote: Hi Asim, One thing I noticed about your setup is that HomModesZ / npz = 3. This should always be an even number, so you will need to change your parameters (for example using npz = 180). I am surprised no error message with this information was displayed, but this will definitely make your simulation crash. In terms of IO, as Spencer said you can pre-partition the mesh. However, I don't think this will make much difference since your mesh is 2D, and therefore does not use much memory anyway. As for the checkpoint file, as far as I know each process only tries to load one file at a time. If your checkpoint was obtained from a simulation with many cores, each file will be relatively small, and you should not have any problems. Cheers, Douglas ________________________________ From: Sherwin, Spencer J Sent: 21 April 2016 19:34 To: Asim Onder Cc: Serson, Douglas; nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Asim, In fully 3D simulations we tend to pre-partition the mesh and this can help with memory usage on a single core. To do this you can run the solver with the option - - part-only=’no of partitions of 2D planes’ Then instead of running with a file.xml you give the solver file_xml directory. However I am not sure whether this is all working with the 2.3 D code. Douglas is this how you start any of your runs? Cheers, Spencer. On 20 Apr 2016, at 05:48, Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Douglas, thanks for the feedback. I was aware of --npz parallelization but was using a small number, not 1/2 or 1/4 of HomModesZ. Increasing npz really helped. I still have to try GlobalSysSoln. Now I face a memory problem for another case. The simulation runs out of memory when starting from a checkpoint file. Here is a little bit information about this case: - Mesh is made of around 16000 quad elements with p=5, i.e., NUMMODES="6" TYPE="MODIFIED" in xy, and HomModesZ=1080 in z direction. - I'm trying to run this case on 60 computing nodes each equipped with 24 processors, and a memory of 105 gb. In total, it makes 1440 procs, and 6300gb memory. - Execution command: mpirun -np 1440 IncNavierStokesSolver --npz 360 config.xml I was wondering if the memory usage of the application is scaling on different cores during IO, or using only one core. If it is only one core, than if it exceeds 105gb, it crushes I guess. Would you have maybe any suggestion/comment on this? Thanks, Asim On 04/13/2016 12:12 AM, Serson, Douglas wrote: Hi Asim, Concerning your questions: 1- Are you using the command line argument --npz? This is very important for obtaining an efficient parallel performance with the Fourier expansion, since it defines the number of partitions in the z-direction. If it is not set, only the xy plane will be partitioned and the parallelism will saturate quickly. 
I suggest initially setting npz to 1/2 or 1/4 of HomModesZ (note that nprocs must be a multiple of npz, since nprocs/npz is the number of partitions in the xy plane). Also, depending on your particular case and the number of partitions you have in the xy plane, your simulation may benefit from using a direct solver for the linear systems. This can be activated by adding '-I GlobalSysSoln=XxtMultiLevelStaticCond' to the command line. This is usually more efficient for a small number of partitions, but considering the large size of your problem it might be worth trying it. 2- I am not sure what could be causing that. I suppose it would help if you could send the exact commands you are using to run FieldConvert. Cheers, Douglas ________________________________ From: Asim Onder <mailto:ceeao@nus.edu.sg> <ceeao@nus.edu.sg><mailto:ceeao@nus.edu.sg> Sent: 12 April 2016 06:42 To: Sherwin, Spencer J; Serson, Douglas Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Dear Spencer, Douglas, Nektar-users, I'm involved now in testing of a local petascale supercomputer, and for some quite limited time I can use several thousand processors for my DNS study. My test case is oscillating flow over a rippled bed. I build up a dense unstructured grid with p=6 quadrilateral elements in x-y, and Fourier expansions in z directions. In total I have circa half billion dofs per variable. I would have a few questions about this relatively large case: 1. I noticed that scaling gets inefficient after around 500 procs, let's say parallel efficiency goes below 80%. I was wondering if you would have any general suggestions to tune the configurations for a better scaling. 2. Postprocessing vorticity and Q criterion is not working for this case. At the of the execution Fieldconvert writes some small files without the field data. What could be the reason for this? Thanks you in advance for your suggestions. Cheers, Asim On 03/21/2016 04:16 AM, Sherwin, Spencer J wrote: Hi Asim, To follow-up on Douglas’ comment we are trying to get more organised to sort out a developers guide. We are also holding a user meeting in June. If you were able to make this we could also try and have a session on getting you going on the developmental side of things. Cheers, Spencer. On 17 Mar 2016, at 14:58, Serson, Douglas <<mailto:d.serson14@imperial.ac.uk>d.serson14@imperial.ac.uk<mailto:d.serson14@imperial.ac.uk>> wrote: Hi Asim, I am glad that your simulation is now working. About your questions: 1. We have some work done on a filter for calculating Reynolds stresses as the simulation progresses, but it is not ready yet, and it would not provide all the statistics you want. Since you already have a lot of chk files, I suppose the best way would indeed be using a script to process all of them with FieldConvert. 2. Yes, this has been recently included in FieldConvert, using the new 'meanmode' module. 3. I just checked that, and apparently this is caused by a bug when using this module without fftw. This should be fixed soon, but as an alternative this module should work if you switch fftw on (just add <I PROPERTY="USEFFT" VALUE="FFTW"/> to you session file, if the code was compiled with support to fftw). 4. I think there is some work towards a developer guide, but I don't how advanced is the progress on that. I am sure Spencer will be able to provide you with more information on that. 
Cheers, Douglas ________________________________________ From: Asim Onder <<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> Sent: 17 March 2016 09:10 To: Serson, Douglas; Sherwin, Spencer J Cc: nektar-users; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Spencer, Douglas, Thanks to your suggestions I managed to get the turbulent regime for the oscillatory channel flow. I have now completed the DNS study for one case, and built up a large database with checkpoint (*chk) files. I would like to calculate turbulent statistics using this database, especially for second order terms, e.g. Reynolds stresses and turbulent dissipation, and third order terms, e.g. turbulent diffusion terms. However, I am a little bit confused how I could achieve this. I would appreciate if you could give some hints about the following: 1. The only way I could think of to calculate turbulent statistics is to write a simple bash script to iterate over chk files, and apply various existing/extended FieldConvert operations on individual chk files. This would require some additional storage to store the intermediate steps, and therefore would be a bit cumbersome. Would it be any simpler way directly doing this directly in Nektar++? 2. I have one homogeneous direction, for which I used Fourier expansions. I would like to apply spatial averaging over this homogeneous direction. Does Nektar++ already contain such functionality? 3. I want to use 'wss' in Fieldconvert module to calculate wall shear stress. However, it returns segmentation fault. Any ideas why it could be? 4. I was wondering if there is any introductory document for basic programming in Nektar++. User guide does not contain information about programming. It would be nice to have some additional information to Doxygen documentation. Thank you very much in advance for your feedback. Cheers, Asim On 02/15/2016 11:59 PM, Serson, Douglas wrote: Hi Asim, As Spencer mentioned, svv can help in stabilizing your solution. You can find information on how to set it up in the user guide (pages 92-93), but basically all you need to do is use: <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/> You can also tune it by setting the parameters SVVCutoffRatio and SVVDiffCoeff, but I would suggest starting with the default parameters. Also, you can use the parameter IO_CFLSteps to output the CFL number. This way you can check if the time step you are using is appropriate. Cheers, Douglas From: Sherwin, Spencer J Sent: 14 February 2016 19:46 To: ceeao Cc: nektar-users; Serson, Douglas; Moxey, David Subject: Re: [Nektar-users] Parallel transposition not implemented yet for 3D-Homo-2D approach Hi Asim, Getting a flow through transition is very challenging since there is a strong localisation of shear and this can lead to aliasing issues which can then cause instabilities. Both Douglas and Dave have experienced this with recent simulations so I am cc’ing them to make some suggestions. I would be inclined to be using spectralhpdealiasing and svv. Hopefully Douglas can send you an example of how to switch this on. Cheers, Spencer. On 11 Feb 2016, at 10:32, ceeao<<mailto:ceeao@nus.edu.sg>ceeao@nus.edu.sg<mailto:ceeao@nus.edu.sg>> wrote: Hi Spencer, Nektar-Users, I followed the suggestion and coarsened the grid a bit. This way it worked impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick the transition to have turbulence. 
On 11 Feb 2016, at 10:32, ceeao <ceeao@nus.edu.sg> wrote:

Hi Spencer, Nektar-Users,

I followed the suggestion and coarsened the grid a bit. This way it worked impressively fast, but the flow is stable and remains laminar, as I didn't add any perturbations. I need to kick off the transition to get turbulence. If I add white noise, even of very low magnitude, the conjugate gradient solver blows up again. I also tried adding some sinusoidal perturbations to the boundary conditions, and again had trouble with CG. I don't really understand CG's extreme sensitivity to perturbations. Any suggestion is much appreciated. Thanks in advance.

Cheers,
Asim

On 02/08/2016 04:48 PM, Sherwin, Spencer J wrote:

Hi Asim,

How many parallel cores are you running on? Sometimes starting up these flows can be tricky, especially if you are immediately jumping to a high Reynolds number. Have you tried first starting the flow at a lower Reynolds number? Also, 100 x 200 is quite a few elements in the x-y plane. Remember that the polynomial order adds more points on top of the mesh discretisation. I would perhaps recommend trying a smaller mesh first to see how that goes. Actually, I note there is a file called TurbChFl_3D1H.xml in the ~/Nektar/Solvers/IncNavierStokesSolver/Examples directory which might be worth looking at. I think this was a mesh used in Ale Bolis' thesis, which you can find under:
http://wwwf.imperial.ac.uk/ssherw/spectralhp/papers/PhDThesis/Bolis_Thesis.pdf

Cheers,
Spencer.

On 1 Feb 2016, at 07:01, ceeao <ceeao@nus.edu.sg> wrote:

Hi Spencer,

Thank you for the quick reply and suggestion. I have indeed switched to the 3D homo 1D case, and this time I have problems with divergence of the linear solvers. I refined the grid in the channel flow example to 100x200x64 in the x-y-z directions and left everything else the same. When I employ the default global system solver "IterativeStaticCond" with this setup, I get divergence: "Exceeded maximum number of iterations (5000)". I checked the initial fields and mesh in Paraview, and everything seems normal. I also tried the "LowEnergyBlock" preconditioner, but apparently it is valid only in purely 3D cases. My knowledge of iterative solvers for hp-FEM is minimal, so I was wondering if you could suggest a robust option that at least converges. My immediate goal is to get some rough estimates of the speed of Nektar++ for my oscillating channel flow problem. If the speed is promising, I will switch to Nektar++ from OpenFOAM, as OpenFOAM is low-order and not really suitable for DNS. Thanks again in advance.

Cheers,
Asim

On 01/31/2016 11:53 PM, Sherwin, Spencer J wrote:

Hi Asim,

I think your conclusion is correct. We did some early implementation of the 2D homogeneous expansion but have not pulled it all the way through, since we did not have a full project on this topic. We have, however, kept the existing code running through our regression tests. For now I would suggest you try the 3D homo 1D approach for your runs, since you can use parallelisation in that code.

Cheers,
Spencer.

On 29 Jan 2016, at 04:00, ceeao <ceeao@nus.edu.sg> wrote:

Dear all,

I just installed the library and need to simulate DNS of a channel flow with an oscillating pressure gradient. As I have two homogeneous directions, I applied a standard Fourier discretisation in these directions. It seems that this case is not parallelised yet, and I got the error in the subject line. I was wondering if I'm overlooking something. If not, are there any plans to include parallelisation of 2D FFTs in the future? Thank you in advance.

Best,
Asim Onder
Research Fellow
National University of Singapore
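Tying together the solver options mentioned in this thread, a minimal sketch of switching the global system solver from the command line; the solver name is taken from the messages above, while the session file name and process counts are placeholders:

    # Try the XXT-based direct solver instead of the default IterativeStaticCond
    mpirun -np 32 IncNavierStokesSolver --npz 8 -I GlobalSysSoln=XxtMultiLevelStaticCond session.xml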
_______________________________________________
Nektar-users mailing list
Nektar-users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/nektar-users

Spencer Sherwin
McLaren Racing/Royal Academy of Engineering Research Chair,
Professor of Computational Fluid Mechanics,
Department of Aeronautics, Imperial College London
South Kensington Campus
London SW7 2AZ
s.sherwin@imperial.ac.uk
+44 (0) 20 759 45052
participants (3)
- Asim Onder
- Serson, Douglas
- Sherwin, Spencer J