Hi Isaac,

Did you compile the Nektar++ installations on the two clusters yourself? I would expect it to be important that both are built with the same set of options and dependencies; if not, this could result in differences in behaviour, for example if one installation is compiled with FFTW or other optional dependencies and the other without. I'm not an expert on the mathematical aspects of the problem, but I suspect this is likely to be related to some difference between the two Nektar++ builds. This is just an initial thought; perhaps a member of the community with more knowledge of the specifics of the problem you're solving can offer other insights.

Kind regards,
Jeremy
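One quick way to check the suggestion above is to compare the NEKTAR_USE_* options recorded in each build's CMakeCache.txt. A minimal sketch, assuming access to both caches (the cache contents below are hypothetical stand-ins; on a real system, read the two files from the build directories instead):

```python
def parse_options(cache_text):
    """Extract NEKTAR_USE_* entries from CMakeCache.txt-style text."""
    opts = {}
    for line in cache_text.splitlines():
        if line.startswith("NEKTAR_USE"):
            key, _, value = line.partition("=")
            opts[key.split(":")[0]] = value
    return opts

# Hypothetical cache fragments from the two clusters.
cluster_a = """NEKTAR_USE_FFTW:BOOL=ON
NEKTAR_USE_MPI:BOOL=ON"""
cluster_b = """NEKTAR_USE_FFTW:BOOL=OFF
NEKTAR_USE_MPI:BOOL=ON"""

a, b = parse_options(cluster_a), parse_options(cluster_b)
# Any entry here means the two builds were configured differently.
diffs = {k: (a.get(k), b.get(k)) for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
print(diffs)
```

The same comparison can of course be done with a shell diff of the two files; the point is simply to confirm whether the option sets match.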
On 24 Jul 2023, at 03:07, Isaac Rosin <isaac.rosin1@ucalgary.ca> wrote:
Hello Nektar,
I am unable to restart a simulation after moving the case files to a new cluster. I began the simulation on one cluster and ran it to the end of the transient flow stage. Once it finished, I took the final .fld file, put it on another cluster, and tried to restart from that time step. My expansion section is
<EXPANSIONS>
    <F VAR="u,v,w,p" FILE="session.fld" />
</EXPANSIONS>
and my initial conditions are
<FUNCTION NAME="InitialConditions">
    <F VAR="u,v,w,p" FILE="session.fld" />
</FUNCTION>
When I try to start the simulation on the new cluster, I get the same CG iterations failure every time (see the solver output at the bottom of this email). The new cluster uses different CPUs, so I thought that could have something to do with it, but I still get the same problem when I try different CPUs. The only change that had any effect was the time step size. This shouldn't have been necessary, because the step I was using already kept the CFL low (~0.5) during the transient flow. Decreasing the CFL only slightly lowered the error reported on the "CG iterations made ..." line printed before the Level 0 assertion violation.
I would be grateful for your assistance.
Regards, Isaac
=======================================================================
	EquationType:        UnsteadyNavierStokes
	Session Name:        session
	Spatial Dim.:        3
	Max SEM Exp. Order:  5
	Num. Processes:      208
	Expansion Dim.:      3
	Projection Type:     Continuous Galerkin
	Advect. advancement: explicit
	Diffuse. advancement: implicit
	Time Step:           0.004
	No. of Steps:        187500
	Checkpoints (steps): 63
	Integration Type:    IMEX
	Splitting Scheme:    Velocity correction (strong press. form)
	Dealiasing:          spectral/hp
	Smoothing-SpecHP:    SVV (spectral/hp DG Kernel (diff coeff = 1*Uh/p))
=======================================================================
Initial Conditions:
  - Field u: from file session.fld
  - Field v: from file session.fld
  - Field w: from file session.fld
  - Field p: from file session.fld
CG iterations made = 5001 using tolerance of 1e-09 (error = 9.57894e-07, rhs_mag = 15.6227)
Fatal   : Level 0 assertion violation
Exceeded maximum number of iterations
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[2602,1],4]
  Exit code:    1
--------------------------------------------------------------------------
_______________________________________________
Nektar-users mailing list
Nektar-users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/nektar-users