Hello all,

I am running a 3DH1D turbulent channel flow at Re_tau = 550 (MKM590, larger domain) with a resolution of ~0.5 billion DoF; the streamwise direction is the homogeneous direction, treated with FFT. Our goal is to find the solver settings that give the fastest computation time on a Cray XC40 machine. We have tried many options, e.g. the direct solver and the iterative solver with 1D FFT and different preconditioners, but our CPU time per time step is ~5 seconds, which is quite slow in our experience.

- We compared the "DirectMultiLevelStaticCond" and "DirectStaticCond" solvers and found that the first is slightly faster than the second. The limiting factor is the 1D decomposition of the direct solver (--npz): we cannot utilize a large number of processors (e.g. 480 processors for 960 HomModesZ). Does a 2D decomposition exist for the direct solver?
- If we use FFTW together with the iterative solver, can I expect better performance than with the direct solver? What would be the optimal solver/preconditioner combination? Would the options in PETSc help?
- In my experience with a full 3D DNS case (mesh size of 50 million), around 0.2 s per time step can be expected, although this requires a large number of cores. Can I achieve something similar with 3DH1D cases?
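To make the decomposition limit concrete, here is a minimal sketch of the arithmetic as we understand it. It assumes that --npz has to divide HomModesZ evenly and that each rank needs at least two planes (one complex Fourier mode); both are our assumptions rather than statements from the documentation, and the function names are purely illustrative.

```python
# Rough arithmetic for a 1D (--npz) decomposition of the homogeneous direction.
# Not Nektar++ code; names and constraints below are illustrative assumptions.

HOM_MODES_Z = 960   # HomModesZ from our session file
TOTAL_DOF = 0.5e9   # approximate total resolution quoted above


def max_npz(hom_modes_z, min_planes=2):
    """Largest npz if every rank must keep at least `min_planes` planes.

    Assumption: two planes per rank, i.e. one complex Fourier mode.
    """
    return hom_modes_z // min_planes


def planes_per_rank(hom_modes_z, npz):
    """Planes owned by each rank, assuming npz must divide HomModesZ evenly."""
    if npz <= 0 or hom_modes_z % npz != 0:
        raise ValueError("npz must be a positive divisor of HomModesZ")
    return hom_modes_z // npz


if __name__ == "__main__":
    npz = max_npz(HOM_MODES_Z)
    print("largest npz:", npz)
    print("planes per rank:", planes_per_rank(HOM_MODES_Z, npz))
    print("approx. DoF per rank:", round(TOTAL_DOF / npz))
```

With our numbers this still leaves roughly one million DoF on each rank, so any further speed-up would have to come from also splitting the work within each plane.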
Part of the XML file:

<EXPANSIONS>
    <E COMPOSITE="C[0]" NUMMODES="7" FIELDS="u,v,w" TYPE="MODIFIED" />
    <E COMPOSITE="C[0]" NUMMODES="7" FIELDS="p" TYPE="MODIFIED" />
</EXPANSIONS>
<CONDITIONS>
    <SOLVERINFO>
        <I PROPERTY="SolverType" VALUE="VelocityCorrectionScheme"/>
        <I PROPERTY="EQTYPE" VALUE="UnsteadyNavierStokes"/>
        <I PROPERTY="AdvectionForm" VALUE="Convective"/>
        <I PROPERTY="Projection" VALUE="Galerkin"/>
        <I PROPERTY="TimeIntegrationMethod" VALUE="IMEXOrder2"/>
        <I PROPERTY="HOMOGENEOUS" VALUE="1D"/>
        <I PROPERTY="DEALIASING" VALUE="ON"/>
        <I PROPERTY="USEFFT" VALUE="FFTW" />
        <I PROPERTY="SpectralVanishingViscosity" VALUE="True"/>
        <I PROPERTY="SPECTRALHPDEALIASING" VALUE="True" />
        <I PROPERTY="GlobalSysSoln" VALUE="DirectMultiLevelStaticCond" />
    </SOLVERINFO>
    <PARAMETERS>
        <P> TimeStep = 10e-05 </P>
        <P> FinalTime = 10.0 </P>
        <P> NumSteps = FinalTime/TimeStep </P>
        <P> NumSteps = 1000000 </P>
        <P> IO_CheckSteps = 2000 </P>
        <P> IO_InfoSteps = 5 </P>
        <P> IO_CFLSteps = IO_InfoSteps </P>
        <P> Re = 10000 </P>
        <P> Kinvis = 1.0/544.0 </P>
        <P> HomModesZ = 960 </P>
        <P> LZ = 8.0*PI </P>
        <P> SVVCutoffRatio = 0.7 </P>
        <P> SVVDiffCoeff = 0.1 </P>
    </PARAMETERS>

Thank you!
Sandeep