Unreproducible results and NaN errors when following the Incompressible Navier-Stokes tutorial
Dear Nektar-users,

We are a student team at Hamburg University trying to follow the "Incompressible Navier-Stokes" tutorial on our HPC cluster. Following the hints in the tutorial, we extended it to the 128^3 case, enabled MPI and FFTW, and are running via mpirun. See the attached "tutorial-setup.txt" for the exact setup of the extended tutorial. Intel-Parallel-Studio is used as the compiler and as the provider of MPI and of BLAS/LAPACK (via MKL). We installed using spack and its recent nektar package, with MKL enabled.

While trying to analyze scaling, we have been unable to get reproducible results. A few runs completed all checkpoints without error, so we believe our setup is correct. However, other, identical runs abort with the error "NaN found during time integration", usually early but sometimes late in the run.

We successfully ran the earlier tutorials, as well as the 64^3 version with default conditions on a single process. Going back to Nektar version 4.4.0 or 4.3.5 also produces the NaN error, although we did not start enough runs with those versions to obtain a successful one.

I found this NaN error mentioned in the mailing list archive, but there it was attributed to an improper physical setup. That should not be an issue in the tutorial, right?

Could anyone provide some hints on how to solve this problem? Could this be a bug in Nektar, or might there be a hidden problem in our MKL setup?

Regards,
Oliver
Oliver Pola