EulerCFE - performance issue & MPI problem
Hi Spencer, hi All,

I find that the CompressibleFlowSolver for EulerCFE in 1D is really slow. A computation on a grid with 5000 elements, with P=5 and a 5th-order RK_SSP (self-implemented), needs ~1 s per time step on an up-to-date workstation (single-core execution). A calculation of 1e5 steps (1 s of simulated time with a time step of 1e-5 s) therefore needs about 27 hours.

The profiler shows that the code spends 68% of the time releasing shared pointers, constructing and destructing Array objects, in lock()/unlock(), and in operator=:

Flat profile (each sample counts as 0.01 seconds):

  %       cumulative   self                  self     total
 time      seconds    seconds      calls    ms/call  ms/call  name
 32.05        1.38       1.38       10032      0.14     0.14  boost::detail::sp_counted_base::release()
  9.56        1.79       0.41       20514      0.02     0.02  Nektar::Array<Nektar::OneD, double const>::Array(unsigned int, double const&)
  9.09        2.18       0.39     7693730      0.00     0.00  Nektar::Array<Nektar::OneD, double const>::~Array()
  8.86        2.56       0.38   137658633      0.00     0.00  boost::unique_lock<boost::mutex>::lock()
  8.04        2.90       0.35   137608137      0.00     0.00  boost::mutex::unlock()
  7.69        3.23       0.33       20528      0.02     0.02  Nektar::Array<Nektar::OneD, double const>::operator=(Nektar::Array<Nektar::OneD, double const> const&)
  7.23        3.54       0.31     2500500      0.00     0.00  Nektar::ExactSolverToro::v_PointSolve(double, double, double, double, double, double, double, double, double, double, double&, double&, double&, double&, double&)
  4.31        3.73       0.19       74253      0.00     0.01  Nektar::MemPool::Allocate(unsigned long)
  2.33        3.83       0.10                                 Nektar::Array<Nektar::OneD, int const>::~Array()
  ...

Running the CompressibleFlowSolver with MPI gives a segmentation fault somewhere in the mesh partitioning (1D mesh):

MeshPartition::MeshPartition()
MeshPartition::ReadGeometry()
0x15d70d0
0x15d71a0
0x15d71a0
0
[node92:28879] *** Process received signal ***
[node92:28879] Signal: Segmentation fault (11)
[node92:28879] Signal code: Address not mapped (1)
[node92:28879] Failing at address: 0x38
[node92:28879] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36d40) [0x7f7e7a781d40]
[node92:28879] [ 1] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13MeshPartition12ReadGeometryERKN5boost10shared_ptrINS0_13SessionReaderEEE+0x1268) [0x7f7e7be44118]
[node92:28879] [ 2] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13MeshPartitionC1ERKN5boost10shared_ptrINS0_13SessionReaderEEE+0x40f) [0x7f7e7be44a5f]
[node92:28879] [ 3] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities18MeshPartitionMetisC2ERKN5boost10shared_ptrINS0_13SessionReaderEEE+0x17) [0x7f7e7be519f7]
[node92:28879] [ 4] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities18MeshPartitionMetis6createERKN5boost10shared_ptrINS0_13SessionReaderEEE+0xc5) [0x7f7e7be53505]
[node92:28879] [ 5] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities10NekFactoryISsNS0_13MeshPartitionERKN5boost10shared_ptrINS0_13SessionReaderEEENS0_4noneES9_S9_S9_E14CreateInstanceESsS8_+0x96) [0x7f7e7be77c26]
[node92:28879] [ 6] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13SessionReader13PartitionMeshEv+0x3fd) [0x7f7e7be6cb5d]
[node92:28879] [ 7] /home/hkuehnelt/nektar++/build/library/LibUtilities/libLibUtilities.so.4.3.0(_ZN6Nektar12LibUtilities13SessionReader11InitSessionEv+0x55) [0x7f7e7be6de55]
[node92:28879] [ 8] /home/hkuehnelt/nektar++/build/solvers/CompressibleFlowSolver/CompressibleFlowSolver(_ZN6Nektar12LibUtilities13SessionReader14CreateInstanceEiPPc+0x14c) [0x4396bc]
[node92:28879] [ 9] /home/hkuehnelt/nektar++/build/solvers/CompressibleFlowSolver/CompressibleFlowSolver(main+0x4d) [0x4291ed]
[node92:28879] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f7e7a76cec5]
[node92:28879] [11] /home/hkuehnelt/nektar++/build/solvers/CompressibleFlowSolver/CompressibleFlowSolver() [0x430b99]
[node92:28879] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 28879 on node node92 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Do you have any advice on how to speed up the code and how to fix the MPI issue?

Best regards,
Helmut

--
Helmut Kühnelt
Scientist, Mobility Department, Electric Drive Technologies
AIT Austrian Institute of Technology GmbH | Giefinggasse 2 | 1210 Vienna | Austria
helmut.kuehnelt@ait.ac.at | http://www.ait.ac.at/
On 21 Apr 2016, at 20:05, Sherwin, Spencer J wrote:

Hi Helmut,

Thanks for the email and the performance details. I have to confess we have been optimising the 2D and 3D codes but not paying much attention to the 1D code, since it has so far only been used on small problems. I do have a project that might start next year on using the 1D pulse wave solver, so it would be good to sort out some of these issues.

Could I first ask: is the branch you are developing on our repository, and can you give us an example input file so we can have a look at where this feature is being called?

Also, do you really require a 5th-order time-stepping scheme? I am not sure what time step you are using, but with a time step of 1e-3 a 5th-order scheme implies an error of 1e-15, which would be at machine precision. I would guess you are not achieving that accuracy in space at the moment; it is very rare that one is able to match the spatial and temporal accuracy.

What seems strange/interesting about the profiling is that it also shows an integer array being declared.

It is difficult to comment on the MPI issue without running a test, so the branch and input file would be useful here.

Cheers,
Spencer.

--
Spencer Sherwin
Professor of Computational Fluid Mechanics, Department of Aeronautics, Imperial College London
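For reference, the error estimate above is just the leading-order truncation error of a fifth-order scheme with that time step:

    $(\Delta t)^5 = (10^{-3})^5 = 10^{-15}$,

which is of the same order as double-precision round-off (roughly $2 \times 10^{-16}$).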
On 4 May 2016, at 15:14, Kühnelt Helmut wrote:

Hi Spencer,

Thanks for your reply. I agree, 5th-order time stepping is not necessary.

I did some more profiling. Substantial time seems to be spent on the construction and destruction of (multidimensional) arrays at every call of the respective functions (DoOdeRhs, Advect, GetSourceTerm, etc.):

    Nektar::Array<Nektar::OneD, double const>::~Array()
    Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, double> const>::~Array()
    Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, double> > const>::~Array()

Is there an (ad hoc) way to prevent this, e.g. by making these arrays static/persistent, in order to gain performance?

I am working with version 4.2.0. Attached is a simple test case for 1D Euler CFE. However, some additional code is needed to run the CompressibleFlowSolver correctly in 1D:

*) In library/SolverUtils/RiemannSolvers/RiemannSolver::rotateToNormal and ::rotateFromNormal, a multiplication with the normal vector is needed for the 1D case:

    switch (normals.num_elements())
    {
        case 1:   // instead of "do nothing"
        {
            const int nq = inarray[0].num_elements();
            const int vx = (int)vecLocs[i][0];

            Vmath::Vmul(nq, inarray[vx], 1, normals[0], 1, outarray[vx], 1);
            break;
        }
        // ... remaining cases unchanged
    }

*) v_ReduceOrderCoeffs is needed for StdSegExp (because of CompressibleFlowSystem::GetSensor).

Best regards,
Helmut
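On the question of avoiding the per-call array construction and destruction: a common pattern is to allocate the work arrays once, for example as class members sized at initialisation, and to reuse them in every right-hand-side evaluation. Below is a minimal sketch of the idea in plain C++; std::vector stands in for Nektar::Array purely for illustration, and the class and member names are hypothetical, not the actual Nektar++ implementation.

    #include <cstddef>
    #include <vector>

    class RhsEvaluator
    {
    public:
        // Workspace is sized once here, e.g. at solver initialisation.
        RhsEvaluator(std::size_t nVariables, std::size_t nPoints)
            : m_work(nVariables, std::vector<double>(nPoints, 0.0))
        {
        }

        // Called on every time step / RK stage.
        void DoOdeRhs(const std::vector<std::vector<double> > &in,
                      std::vector<std::vector<double> >       &out)
        {
            // Reuse m_work instead of declaring temporaries here, which
            // would allocate and free memory on every call.
            for (std::size_t v = 0; v < in.size(); ++v)
            {
                for (std::size_t i = 0; i < in[v].size(); ++i)
                {
                    m_work[v][i] = in[v][i];   // fluxes, source terms, ...
                    out[v][i]    = m_work[v][i];
                }
            }
        }

    private:
        std::vector<std::vector<double> > m_work; // persistent workspace
    };

The trade-off is that a cached workspace must not be shared between threads or between nested calls that would overwrite each other's data.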
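For the rotation back from the normal-aligned frame, the same one-line change should apply: in 1D the trace normal is +/-1, so multiplying by it a second time undoes the forward rotation. The branch below simply mirrors the rotateToNormal snippet above; it is an assumption about the corresponding change in rotateFromNormal, not code taken from the Nektar++ repository.

    switch (normals.num_elements())
    {
        case 1:   // 1D: multiply by the normal again; since n = +/-1,
                  // this inverts the rotation applied in rotateToNormal
        {
            const int nq = inarray[0].num_elements();
            const int vx = (int)vecLocs[i][0];

            Vmath::Vmul(nq, inarray[vx], 1, normals[0], 1, outarray[vx], 1);
            break;
        }
        // ... remaining cases unchanged
    }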
On 16 May 2016, at 23:22, Sherwin, Spencer J wrote:

Hi Helmut,

I was just trying to repeat your experience. I ran your Test_1D.xml with an expansion order of 5. This mesh has only 20 elements, but for 100,000 steps it takes 44 seconds. Scaling this to 5000 elements would suggest a run time of about 3 hours, which seems to be a lot faster than the 27 hours you mentioned. This is on an Intel Xeon E5/Core i7.

I will next look at whether my guess about the time integration is the source of your profiling challenges.

Cheers,
Spencer.
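The scaling behind this estimate is simply linear in the number of elements:

    $44\,\mathrm{s} \times \frac{5000}{20} = 11\,000\,\mathrm{s} \approx 3\,\mathrm{h}$,

whereas the originally reported 27 hours corresponds to roughly 1 s per step on the 5000-element mesh.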
On 17 May 2016, at 00:35, Sherwin, Spencer J wrote:

PS: I was running the feature/Euler1D branch from the current master.
On 17 May 2016, at 22:01, Kühnelt Helmut wrote:

Hi Spencer,

Your results really puzzle me. Today I ran Test_1D.xml with a fresh install of the latest version from git on my private laptop. Here are my settings and timings:

System:
    CPU: Intel® Core™ i5-2520M Processor @ 2.50 GHz (2 cores with hyper-threading)
    Linux Mint 13 with kernel 3.5.0.54.59 (Quantal)

Libraries (versions):
    gcc      4.6.3
    BLAS     libblas3gf 1.2.20110419-2ubuntu1
    LAPACK   liblapack3gf 3.3.1-1
    Boost    1.57.0 (THIRDPARTY_BUILD_BOOST: ON)
    In (2) and (3): OpenBLAS 0.2.18, compiled from source

Test case:
    Test_1D.xml, 20 elements, expansion order 5, Runge-Kutta 4th order, 1e5 time steps

(1) Compiled as RELEASE with default settings (no profiling); runs on 2 cores/threads @ 2.5 GHz.
    Total Computation Time = 400 s
    Estimated time for 5000 elements & 1e5 time steps: 27.8 h

(2) Linked with OpenBLAS 0.2.18, compiled from source; runs on 1 core @ 2.5 GHz.
    Total Computation Time = 180 s
    Estimated time for 5000 elements & 1e5 time steps: 12.5 h (9.8 h @ 3.5 GHz)

(3) Clean build with -march=native and linked with OpenBLAS; runs on 1 core @ 2.5 GHz.
    Total Computation Time = 192 s (even a little slower)

My fastest results are still a factor of 4 (or 3 when scaled to 3.5 GHz) away from yours. Do you have any clue what the reason could be - outdated libraries, inappropriate compile settings? Could you tell me the details of your system and build?

Best regards,
Helmut
Hi Helmut,

I tested this case on my laptop and it ran in 35 s, which is fairly close to what Spencer got. I am not sure what could be causing it to be so much slower for you, but I noticed two things that might help you in the future:

- Using collections (page 207 of the user guide) helps a lot in this case. For me, it reduced the computation time to 19 s (almost two times faster); see the sketch after this message.
- I don't know why, but using N=9 with this same mesh was only slightly slower. You may therefore want to consider using a higher order with fewer elements in your refined simulation.

Cheers,
Douglas
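A sketch of how collections can be switched on in the session file, following the Collections chapter of the user guide; the element and the implementation type chosen here ("SumFac") are assumptions and should be checked against the guide for the version in use.

    <!-- Added inside the <NEKTAR> root element of the session file
         (e.g. Test_1D.xml). "SumFac" is assumed here as the
         implementation type; see the Collections chapter of the
         user guide for the options available in your version. -->
    <COLLECTIONS DEFAULT="SumFac" />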
Hi Spencer,
Your results really puzzle me.
Today I ran the Test_1D.xml with a fresh install of the latest version from git on my private laptop.
Here are my settings and timings:
*System*
  CPU: Intel® Core™ i5-2520M @ 2.50 GHz (2 cores with hyper-threading)
  OS:  Linux Mint 13, kernel 3.5.0.54.59 (Quantal)
*Libraries*
  gcc     4.6.3
  BLAS    libblas3gf 1.2.20110419-2ubuntu1
  LAPACK  liblapack3gf 3.3.1-1
  Boost   1.57.0 (THIRDPARTY_BUILD_BOOST: ON)
  in (2) and (3): openblas 0.2.18, compiled from source
*Test case*
  Test_1D.xml: 20 elements, expansion order 5, Runge-Kutta 4th order, 1e5 time steps
(1) Compiled as RELEASE with default settings (no profiling); runs on 2 cores/threads @ 2.5 GHz
    Total Computation Time = 400s
    Estimated time for 5000 elements & 1e5 time steps: 27.8 h
(2) Linked with openblas 0.2.18, compiled from source; runs on 1 core @ 2.5 GHz
    Total Computation Time = 180s
    Estimated time for 5000 elements & 1e5 time steps: 12.5 h (9.8 h @ 3.5 GHz)
(3) Clean build with -march=native and linked with openblas; runs on 1 core @ 2.5 GHz
    Total Computation Time = 192s (even a little slower)
My fastest result is still a factor of 4 (or 3 when scaled to 3.5 GHz) slower than yours.
Do you have any clue what the reason could be - outdated libraries, inappropriate compile settings?
Could you tell me the details of your system and build?
Best regards,
Helmut
------------------------------
From: Sherwin, Spencer J [s.sherwin@imperial.ac.uk]
Sent: Tuesday, 17 May 2016 00:35
To: Kühnelt Helmut
Cc: nektar-users
Subject: Re: [Nektar-users] EulerCFE - performance issue & MPI problem
PS I was running the feature/Euler1D branch from the current master.
On 16 May 2016, at 23:22, Sherwin, Spencer J <s.sherwin@imperial.ac.uk> wrote:
Hi Helmut,
I was just trying to repeat your experience. I ran your Test_1D.xml with expansion order 5. This mesh has only 20 elements, but for 100,000 steps it takes 44 seconds. Scaling this to 5000 elements would suggest a run time of about 3 hours, which seems to be a lot faster than the 27 hours you mentioned. This is on an Intel Xeon E5/Core i7.
I will next look at whether my guess about the time integration is the source of your profiling hotspots.
Cheers, Spencer.
On 4 May 2016, at 15:14, Kühnelt Helmut <Helmut.Kuehnelt@ait.ac.at> wrote:
Hi Spencer,
Thanks for your reply. I agree, a 5th order time stepping is not necessary.
I did some more profiling. Substantial time seems to be spent on the construction and destruction of (multidimensional) arrays at every call of the respective functions (DoOdeRhs, Advect, GetSourceTerm, etc.):

    Nektar::Array<Nektar::OneD, double const>::~Array()
    Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, double> const>::~Array()
    Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, Nektar::Array<Nektar::OneD, double> > const>::~Array()
Is there an (ad-hoc) way to prevent this, or to make these arrays static, in order to gain performance?
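Something along these lines is what I have in mind (a generic C++ sketch; MyRhsOperator, DoOdeRhs and m_tmp are made-up illustrative names, not the actual Nektar++ classes): the temporaries are allocated once, e.g. as class members, and reused on every call instead of being rebuilt each time.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical illustration only: allocate the work arrays once, reuse them.
    class MyRhsOperator
    {
    public:
        explicit MyRhsOperator(std::size_t nq)
            : m_tmp(3, std::vector<double>(nq, 0.0)) // workspace allocated once
        {
        }

        // Called every time step / RK stage; no allocation happens here.
        void DoOdeRhs(const std::vector<std::vector<double> > &in,
                      std::vector<std::vector<double> >       &out)
        {
            // Placeholder for the real right-hand-side evaluation: the point
            // is only that m_tmp is reused, never re-created per call.
            for (std::size_t v = 0; v < in.size() && v < out.size(); ++v)
            {
                std::vector<double> &work = m_tmp[v % m_tmp.size()];
                const std::size_t n = std::min(std::min(in[v].size(), out[v].size()),
                                               work.size());
                for (std::size_t q = 0; q < n; ++q)
                {
                    work[q]   = in[v][q];   // e.g. intermediate flux
                    out[v][q] = work[q];    // e.g. accumulated RHS
                }
            }
        }

    private:
        std::vector<std::vector<double> > m_tmp; // reused every time step
    };

Keeping the workspace alive between calls trades a little memory for avoiding the allocator and shared-pointer bookkeeping on every RK stage.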
I am working with version 4.2.0. Attached is a simple test case for 1D Euler CFE. However, some additional code is needed for running the CompressibleFlowSolver in 1D correctly:
*) In RiemannSolver::rotateToNormal and ::rotateFromNormal (library/SolverUtils/RiemannSolvers), a multiplication by the normal vector is needed:
    switch (normals.num_elements())
    {
        case 1: // instead of "do nothing": multiply by the (scalar) normal
        {
            const int nq = inarray[0].num_elements();
            const int vx = (int)vecLocs[i][0];

            Vmath::Vmul(nq, inarray[vx], 1, normals[0], 1, outarray[vx], 1);
            break;
        }
        // ... remaining cases unchanged ...
    }
*) v_ReduceOrderCoeffs is needed for StdSegExp (because of CompressibleFlowSystem::GetSensor).
Best regards,
Helmut
From: Sherwin, Spencer J [mailto:s.sherwin@imperial.ac.uk]
Sent: Thursday, 21 April 2016 20:05
To: Kühnelt Helmut
Cc: nektar-users
Subject: Re: [Nektar-users] EulerCFE - performance issue & MPI problem
Hi Helmut,
Thanks for the email and the performance details. I have to confess we have been optimising the 2D and 3D codes but not paying much attention to the 1D code, since it has so far only been used on small problems. I do have a project that might start next year on using the 1D pulse wave solver, so it would be good to sort out some of these issues.
Could I first ask: is the branch you are developing on our repository, and can you give us an example input file so we can have a look at where this feature is being called? Also, do you really require a 5th-order time-stepping scheme? I am not sure what time step you are using, but with a time step of 1e-3 a 5th-order scheme implies an error of about 1e-15, which would be at machine precision. I would guess you are not achieving this spatial accuracy currently; it is very rare that one is able to match the spatial and temporal accuracy.
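Spelled out, the rough estimate is just

    \epsilon \;\sim\; C \, \Delta t^{\,5} \;=\; C \, (10^{-3})^{5} \;=\; C \times 10^{-15},

where C stands for the (unknown) error constant of the scheme; double-precision round-off is already of order 1e-16.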
What seems strange/interesting about the profiling is that it also shows an integer array being constructed and destroyed.
It is difficult to comment on the MPI issues without running a test so the branch and input file are useful here.
Cheers, Spencer.
Spencer Sherwin McLaren Racing/Royal Academy of Engineering Research Chair, Professor of Computational Fluid Mechanics, Department of Aeronautics, Imperial College London South Kensington Campus London SW7 2AZ
s.sherwin@imperial.ac.uk +44 (0) 20 759 45052
Hi Helmut,
I pushed some changes to the feature/Euler1D branch which should make this simulation somewhat faster. It would be great if you could test that to see if it helps with your problem.
Cheers, Douglas
2016-05-18 13:29 GMT+01:00 Kühnelt Helmut <Helmut.Kuehnelt@ait.ac.at>:
Hi Spencer, hi Douglas,
Indeed, using Collections (auto => SumFac) definitely helps: Total Computation Time = 95s (180s before).
I noticed that the code runs single-threaded, even though Boost_USE_MULTITHREADED is ON and OpenBLAS is compiled as multi-threaded. Any hints on that?
Cheers, Helmut
------------------------------
From: Sherwin, Spencer J [s.sherwin@imperial.ac.uk]
Sent: Wednesday, 18 May 2016 10:25
To: Kühnelt Helmut
Cc: nektar-users
Subject: Re: [Nektar-users] EulerCFE - performance issue & MPI problem
Hi Helmut,
I am building in Release mode and nothing else in particular. I happen to have MPI turned on, but I do not think that will make a difference.
One thing that might be worth checking is the use of openblas. I have had trouble with openblas in the past for small matrix sizes. The compilation I used will have been using the default installation in /usr/lib/libblas.a. I guess Douglas’ test will have used the framework BLAS on Mac OS X. However, looking at the set-up of the compute module, I do seem to have openblas loaded.
(@Chris: could you confirm what version of openblas is running on Victoria?)
Other configuration details:
  gcc    4.9.2
  boost  1.58 (installed from a module)
Do you have another machine/laptop we could run this test on?
Cheers, Spencer.
PS I had a time step of 1e-6 and attach my .xml file below
Hi Douglas, hi Spencer,
I ran the test case with the feature/Euler1D branch.
With SumFac:
  Time-integration: 18.7779s
  Total Computation Time = 21s
This makes a difference!
With no tuning:
  Time-integration: 32.797s
  Total Computation Time = 35s
Also not too bad! I already noticed the orphaned code that you removed. Was this the solution to my speed issue, or was it because I used the release version 4.3.1?
Cheers, Helmut
Hi Helmut, Douglas,
Thanks for Douglas’ analysis. He spotted that some heavily used routines contained unnecessary copies which created new arrays or shared pointers. Each of these leads to a lock call and a counter initialisation, and in your case, where there is very little actual work in these methods, these unnecessary calls were dominating. So I believe this orphaned code was the issue in your case. It also seemed related to your relatively slow mobile CPU, since we were unable to reproduce this on our own laptops or desktops. Also, the boost version did not seem to make much difference, as you mentioned.
Anyway, good to have gotten to the bottom of this. We have this in a fix branch and will upload it into master and the web-page distribution as soon as possible.
Cheers, Spencer.
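PS To illustrate the kind of copy involved, here is a generic C++/Boost sketch (not the actual Nektar++ routines): passing a boost::shared_ptr by value copies it on every call, and each copy has to update the shared reference count, which, depending on how Boost is configured, may even take a lock; passing by const reference avoids all of that.

    #include <boost/shared_ptr.hpp>
    #include <cstddef>
    #include <vector>

    typedef std::vector<double> Field;

    // By value: every call copies the shared_ptr, so the use count is
    // incremented and decremented each time (atomics or a mutex, depending
    // on the Boost configuration). For tiny routines called millions of
    // times, this bookkeeping can dominate the profile.
    double SumByValue(boost::shared_ptr<Field> f)
    {
        double s = 0.0;
        for (std::size_t i = 0; i < f->size(); ++i)
        {
            s += (*f)[i];
        }
        return s;
    }

    // By const reference: no copy, no reference-count traffic, same result.
    double SumByConstRef(const boost::shared_ptr<Field> &f)
    {
        double s = 0.0;
        for (std::size_t i = 0; i < f->size(); ++i)
        {
            s += (*f)[i];
        }
        return s;
    }

In a routine that does almost no arithmetic per call, the by-value version’s reference-count traffic is exactly the sort of lock()/unlock() and release() activity that shows up in the flat profile earlier in this thread.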
Hi Douglas, hi Spencer,
With all the measures combined, the total speed-up should be around a factor of 20 (2x from openblas, 2x from the Collections auto-tuning and 5x from the code optimisation). Thank you for your support in solving these problems.
Have a nice weekend,
Helmut
Hi Helmut,
I am only sorry we cannot achieve this much speed-up on our other solvers!
Cheers, Spencer.
Hi Helmut,
Just to confirm I have not forgotten your email. I just have not found the issue yet, although I have an idea and will discuss it with Chris and Dave. (I think some allocations by reference have been replaced by copy constructors in the time integration, but I need to look a bit closer.) In general we are trying to avoid introducing statics, since we have some developments around threading capabilities.
Cheers, Spencer.
participants (3)
- Douglas Serson
- Kühnelt Helmut
- Sherwin, Spencer J