Hi Helmut,

I was just trying to reproduce your experience. I ran your Test_1D.xml with an expansion order of 5. This mesh has 20 elements, and 100,000 steps take 44 seconds. Scaling this to 5000 elements would suggest a run time of about 3 hours, which seems to be a lot faster than the 27 hours you mentioned. This is on an Intel Xeon E5/Core i7.
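(Assuming the cost per step scales roughly linearly with the number of elements: 44 s x 5000/20 = 11,000 s, i.e. about 3 hours.)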

I will next look at whether my guess about the time integration is the source of your profiling challenges.

Cheers,
Spencer.



On 4 May 2016, at 15:14, Kühnelt Helmut <Helmut.Kuehnelt@ait.ac.at> wrote:

Hi Spencer,

 

Thanks for your reply. I agree, 5th-order time stepping is not necessary.

 

I did some more profiling. Substantial time seems to be spent on the construction and destruction of (multidimensional) arrays at every call of the respective functions (DoOdeRhs, Advect, GetSourceTerm, etc.):
Nektar::Array<Nektar::OneD, double const>::~Array()
Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, double> const>::~Array()
Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, double> > const>::~Array()

Is there an (ad hoc) way to prevent this / make them static in order to gain performance?
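(For illustration only: a minimal, generic C++ sketch of one way to avoid the per-call construction, namely allocating the temporaries once, for example as members sized in the constructor, and reusing them on every evaluation. This is not Nektar-specific code; the class and member names below are hypothetical.)

    #include <cstddef>
    #include <vector>

    // Hypothetical example: the per-call workspace is allocated once and
    // cached in the class, instead of being constructed and destructed on
    // every call to the RHS routine.
    class RhsEvaluator
    {
    public:
        explicit RhsEvaluator(std::size_t nq)
            : m_flux(nq, 0.0), m_source(nq, 0.0) // allocated once, up front
        {
        }

        // Called every time step: only reuses the cached buffers,
        // no allocation or deallocation happens here.
        void DoOdeRhs(const std::vector<double> &in, std::vector<double> &out)
        {
            for (std::size_t i = 0; i < in.size(); ++i)
            {
                m_flux[i]   = in[i];  // placeholder for the flux term
                m_source[i] = 0.0;    // placeholder for the source term
                out[i]      = m_flux[i] + m_source[i];
            }
        }

    private:
        std::vector<double> m_flux;   // workspace reused across calls
        std::vector<double> m_source; // workspace reused across calls
    };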


I am working with version 4.2.0. Attached is a simple test case for 1D Euler CFE.
However, some additional code is needed to run the CompressibleFlowSolver correctly in 1D:

*) In library/SolverUtils/RiemannSolvers/RiemannSolver::rotateToNormal and ::rotateFromNormal, a multiplication with the normal vector is needed:

                switch (normals.num_elements())
                {
                    case 1:
                    {
                        // instead of "do nothing": in 1D the trace normal
                        // is +/-1, so it still has to be applied
                        const int nq = inarray[0].num_elements();
                        const int vx = (int)vecLocs[i][0];

                        Vmath::Vmul(nq, inarray[vx],  1, normals[0], 1,
                                        outarray[vx], 1);
                        break;
                    }

*) v_ReduceOrderCoeffs is needed for StdSegExp (because of CompressibleFlowSystem::GetSensor)


Best regards,

Helmut


 
From: Sherwin, Spencer J [mailto:s.sherwin@imperial.ac.uk]
Sent: Thursday, 21 April 2016 20:05
To: Kühnelt Helmut
Cc: nektar-users
Subject: Re: [Nektar-users] EulerCFE - performance issue & MPI problem

 

Hi Helmut, 

 

Thanks for the email and performance details. I have to confess we have been optimising the 2D and 3D codes but not paying much attention to the 1D code, since it has so far only been used on small problems. I do have a project that might start next year on using the 1D pulse wave solver, so it would be good to sort out some of these issues.

 

Could I first ask whether the branch you are developing is on our repository? Could you also give us an example input file so we can have a look at where this feature is being called? Finally, do you really require a 5th-order time-stepping scheme? I am not sure what time step you are using, but if you have a time step of 1e-3, a 5th-order scheme implies an error of 1e-15, which would be at machine precision. I would guess you are not currently achieving this accuracy in space. It is very rare that one is able to match the spatial and temporal accuracy.
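(For reference: if the temporal error scales as dt^5, then with dt = 1e-3 one gets (1e-3)^5 = 1e-15, i.e. machine precision.)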

 

What seems strange/interesting about the profiling is that it is also declaring an integer array. 

 

It is difficult to comment on the MPI issues without running a test, so the branch and input file would be useful here.

 

Cheers,
Spencer.

 

On 21 Apr 2016, at 14:40, Kühnelt Helmut <Helmut.Kuehnelt@ait.ac.at> wrote:

 

Hi Spencer, hi All,

I find that the CompressibleFlowSolver for EulerCFE in 1D is really slow. A computation on a grid with 5000 elements, with P = 5 and a 5th-order RK_SSP (self-implemented), needs ~1 s per time step on an up-to-date workstation (single-core execution). A calculation of 1e5 steps (1 s of simulated time with a time step of 1e-5 s) therefore needs 27 hours...

The profiler shows that the code spends 68% of the time releasing shared pointers, constructing and destructing Array(), calling lock() and unlock(), and in the = operator.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 32.05      1.38     1.38    10032     0.14     0.14  boost::detail::sp_counted_base::release()
  9.56      1.79     0.41    20514     0.02     0.02  Nektar::Array<Nektar::OneD, double const>::Array(unsigned int, double const&)
  9.09      2.18     0.39  7693730     0.00     0.00  Nektar::Array<Nektar::OneD, double const>::~Array()
  8.86      2.56     0.38 137658633     0.00     0.00  boost::unique_lock<boost::mutex>::lock()
  8.04      2.90     0.35 137608137     0.00     0.00  boost::mutex::unlock()
  7.69      3.23     0.33    20528     0.02     0.02  Nektar::Array<Nektar::OneD, double const>::operator=(Nektar::Array<Nektar::OneD, double const> const&)
  7.23      3.54     0.31  2500500     0.00     0.00  Nektar::ExactSolverToro::v_PointSolve(double, double, double, double, double, double, double, double, double, double, double&, double&, double&, double&, double&)
  4.31      3.73     0.19    74253     0.00     0.01  Nektar::MemPool::Allocate(unsigned long)
  2.33      3.83     0.10                             Nektar::Array<Nektar::OneD, int const>::~Array()
...


Running the CompressibleFlowSolver with MPI gives a segmentation fault somewhere in the mesh partitioning (1D mesh):

MeshPartition::MeshPartition()
MeshPartition::ReadGeometry()
0x15d70d0
0x15d71a0
0x15d71a0
0
[node92:28879] *** Process received signal ***
[node92:28879] Signal: Segmentation fault (11)
[node92:28879] Signal code: Address not mapped (1)
[node92:28879] Failing at address: 0x38
[node92:28879] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36d40) [0x7f7e7a781d40]
[node92:28879] [ 1] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13MeshPartition12ReadGeometryERKN5boost10shared_ptrINS0_13SessionReaderEEE+0x1268) [0x7f7e7be44118]
[node92:28879] [ 2] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13MeshPartitionC1ERKN5boost10shared_ptrINS0_13SessionReaderEEE+0x40f) [0x7f7e7be44a5f]
[node92:28879] [ 3] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities18MeshPartitionMetisC2ERKN5boost10shared_ptrINS0_13SessionReaderEEE+0x17) [0x7f7e7be519f7]
[node92:28879] [ 4] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities18MeshPartitionMetis6createERKN5boost10shared_ptrINS0_13SessionReaderEEE+0xc5) [0x7f7e7be53505]
[node92:28879] [ 5] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities10NekFactoryISsNS0_13MeshPartitionERKN5boost10shared_ptrINS0_13SessionReaderEEENS0_4noneES9_S9_S9_E14CreateInstanceESsS8_+0x96) [0x7f7e7be77c26]
[node92:28879] [ 6] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13SessionReader13PartitionMeshEv+0x3fd) [0x7f7e7be6cb5d]
[node92:28879] [ 7] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13SessionReader11InitSessionEv+0x55) [0x7f7e7be6de55]
[node92:28879] [ 8] /home/hkuehnelt/nektar++/build/solvers/CompressibleFlowSolver/CompressibleFlowSolver(_ZN6Nektar12LibUtilities13SessionReader14CreateInstanceEiPPc+0x14c) [0x4396bc]
[node92:28879] [ 9] /home/hkuehnelt/nektar++/build/solvers/CompressibleFlowSolver/CompressibleFlowSolver(main+0x4d) [0x4291ed]
[node92:28879] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f7e7a76cec5]
[node92:28879] [11] /home/hkuehnelt/nektar++/build/solvers/CompressibleFlowSolver/CompressibleFlowSolver() [0x430b99]
[node92:28879] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 28879 on node node92 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


Do you have any advice on how to speed up the code and fix the MPI issue?

Best regards,
Helmut

 

________________________________________
Helmut Kühnelt 
Scientist
Mobility Department
Electric Drive Technologies
AIT Austrian Institute of Technology GmbH
Giefinggasse 2 | 1210 Vienna | Austria
T: +43 50550-6245 | M: +43 664 815 78 38 | F: +43 50550-6595 

helmut.kuehnelt@ait.ac.at | http://www.ait.ac.at/ 
 
FN: 115980 i HG Wien | UID: ATU14703506 
_______________________________________________
Nektar-users mailing list
Nektar-users@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/nektar-users

 


<Test_1D.xml>

Spencer  Sherwin
McLaren Racing/Royal Academy of Engineering Research Chair, 
Professor of Computational Fluid Mechanics,
Department of Aeronautics,
Imperial College London
South Kensington Campus
London SW7 2AZ

+44 (0) 20 759 45052