More detailed breakdown of PETSc timings / higher-order geometric MG
Hi Lawrence (cc firedrake),

(Apologies, I accidentally sent this from my unregistered gmail account first.)

Having talked to Rob yesterday, I'm also looking at the performance of the current (non-hybridised) solver at higher order again. As I said, the main bottleneck that makes the geometric multigrid more expensive is the high cost of the velocity mass matrix solve, which I have to do in the Schur-complement forward and backward substitution (i.e. applying the triangular matrices of the full Schur-complement factorisation), and also in the pressure solve (since I don't use the diagonal-only form, i.e. 'pc_fieldsplit_schur_fact_type': 'FULL').

However, the PETSc solver has to invert the velocity mass matrix as well, so it should be hit by the same costs. Do you know how I can extract the time for this to make a proper comparison? If I run with PETSC_OPTIONS=-log_summary I get a breakdown of the PETSc times, but the closest I can get to these solves is PCApply. I guess what I'm saying is that I'm now unsure what PCApply actually measures. If the PETSc solver does 11 GMRES iterations, it claims that PCApply was called 23 times, so my conjecture is that this measures 11 pressure solves and 11 mass matrix solves, but probably not the time spent in the forward/backward substitution (as I said, I do run with the 'pc_fieldsplit_schur_fact_type': 'FULL' option). Can I break those times down further, so that I get, for example, the time spent in the two velocity mass matrix solves in the forward/backward substitution and the time spent solving the Schur-complement pressure system M_p + \omega^2*D^T*diag(M_u)*D?

Data I have currently: in the matrix-free solver, one velocity mass matrix inverse costs 2.27s, and I need two per iteration just for the forward/backward substitution. On the other hand, one GMRES iteration of the PETSc solver (which includes everything: applying the mixed operator, solving the pressure system, inverting the velocity mass matrices) takes 3.87s, so something is not right there. If I can get a better like-for-like comparison of the times in the PETSc and matrix-free solvers it should be possible to identify the bottlenecks.

Thanks,

Eike

--
Dr Eike Hermann Mueller
Lecturer in Scientific Computing
Department of Mathematical Sciences
University of Bath
Bath BA2 7AY, United Kingdom
+44 1225 38 5557
e.mueller@bath.ac.uk
http://people.bath.ac.uk/em459/
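One way to get the finer breakdown asked for above is to wrap the relevant solves in the matrix-free code in custom PETSc log stages, so that the -log_summary output reports them separately from the lumped PCApply entry. A minimal petsc4py sketch (the stage names and their placement are my own, and logging has to be active, e.g. by running with -log_summary):

    from petsc4py import PETSc

    # Register two stages; -log_summary then prints a separate section for
    # each stage in addition to the overall totals.
    mass_stage = PETSc.Log.Stage("velocity mass solve")
    pressure_stage = PETSc.Log.Stage("pressure solve")

    mass_stage.push()
    # ... apply the velocity mass matrix inverse here ...
    mass_stage.pop()

    pressure_stage.push()
    # ... solve the Schur-complement pressure system
    #     M_p + omega^2*D^T*diag(M_u)*D here ...
    pressure_stage.pop()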
Sorry, those times were with an unoptimised PETSc build.
Data I have currently: In the matrix-free solver, one velocity mass matrix inverse costs 2.27s, and I need two per iteration just for the forward/backward substitution. On the other hand, one GMRES iteration of the PETSc solver (which includes everything: applying the mixed operator, solving the pressure system, inverting the velocity mass matrices) takes 3.87s, so something is not right there.
With an optimised PETSc build I get ~0.8s for one velocity mass matrix solve in the matrix-free solver (and a total time per iteration of 3.2s). The time per iteration of the PETSc solver with the AMG preconditioner is 0.8s.

Thanks,

Eike
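For concreteness, the PETSc fieldsplit solver discussed in this thread would be driven by options along the following lines. This is only a reconstruction from the options quoted in the thread, not the actual sparams; in particular the hypre/boomeramg choice for the pressure block is just one example of an AMG preconditioner, and the pressure-block KSP settings are guesses:

    # Sketch only: reconstructed from the options mentioned in this thread.
    sparams = {'ksp_type': 'gmres',
               'pc_type': 'fieldsplit',
               'pc_fieldsplit_type': 'schur',
               'pc_fieldsplit_schur_fact_type': 'FULL',
               # velocity mass matrix block
               'fieldsplit_0_ksp_type': 'preonly',
               'fieldsplit_0_pc_type': 'bjacobi',
               'fieldsplit_0_sub_pc_type': 'ilu',
               # pressure (Schur complement) block
               'fieldsplit_1_ksp_type': 'preonly',
               'fieldsplit_1_pc_type': 'hypre',
               'fieldsplit_1_pc_hypre_type': 'boomeramg'}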
Dear all,

to get a more detailed breakdown of the PETSc fieldsplit preconditioner I now tried

    ksp = up_solver.snes.getKSP()
    ksp.setMonitor(self._ksp_monitor)
    ksp_hdiv = ksp.getPC().getFieldSplitSubKSP()
    ksp_hdiv.setMonitor(self._ksp_monitor)

to attach my own KSP monitor to the solver for the HDiv system. I can then use that to work out the time per iteration and the number of iterations of the velocity mass matrix solve. I suspect that for some reason the same PC (preonly+bjacobi+ILU) is less efficient for my standalone velocity mass matrix solve, possibly because the ILU does not work due to the wrong dof-ordering (I observe that preonly+bjacobi+ILU is not faster than cg+jacobi for my inversion, but in the fieldsplit case there is a significant difference).

However, the third line of the code above crashes with a nasty segfault in PETSc:

    File "/Users/eikemueller/PostDocBath/EllipticSolvers/Firedrake_workspace/firedrake-helmholtzsolver/source/gravitywaves.py", line 475, in solve
        pc_hdiv = ksp.getPC().getFieldSplitSubKSP()
    File "PC.pyx", line 384, in petsc4py.PETSc.PC.getFieldSplitSubKSP (src/petsc4py.PETSc.c:136328)
    petsc4py.PETSc.Error: error code 85
    [0] PCFieldSplitGetSubKSP() line 1662 in /Users/eikemueller/PostDocBath/EllipticSolvers/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c
    [0] PCFieldSplitGetSubKSP_FieldSplit_Schur() line 1259 in /Users/eikemueller/PostDocBath/EllipticSolvers/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c
    [0] MatSchurComplementGetKSP() line 317 in /Users/eikemueller/PostDocBath/EllipticSolvers/petsc/src/ksp/ksp/utils/schurm.c
    [0] Null argument, when expecting valid pointer
    [0] Null Object: Parameter # 1

Thanks,

Eike
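For reference, the two standalone velocity mass matrix configurations being compared above look roughly like this. This is a toy sketch on a small RT space; the mesh, the function space and the right-hand side are purely illustrative and not the actual problem:

    from firedrake import *

    # Toy velocity mass matrix on a lowest-order Raviart-Thomas space.
    mesh = UnitSquareMesh(16, 16)
    V = FunctionSpace(mesh, "RT", 1)
    u = TrialFunction(V)
    v = TestFunction(V)
    a = dot(u, v)*dx                       # velocity mass matrix M_u
    L = dot(Constant((1.0, 0.0)), v)*dx    # illustrative right-hand side
    w = Function(V)

    # One application of block-Jacobi/ILU, as in the fieldsplit_0 block.
    solve(a == L, w, solver_parameters={'ksp_type': 'preonly',
                                        'pc_type': 'bjacobi',
                                        'sub_pc_type': 'ilu'})

    # CG with point-Jacobi, run to convergence.
    solve(a == L, w, solver_parameters={'ksp_type': 'cg',
                                        'pc_type': 'jacobi'})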
You probably needed to call up_solver.snes.setUp() (and maybe up_solver.snes.setFromOptions(), once you've set the PETSc options appropriately) before you can pull the Schur complement KSPs out.

Lawrence
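In code, the suggested order of operations would be something like the following sketch, reusing the names from the messages above (up_solver, self._ksp_monitor); whether setFromOptions() is needed depends on how the options were passed in:

    # Sketch only, not the actual solver code.
    up_solver.snes.setFromOptions()   # pick up the PETSc options, if not done already
    up_solver.snes.setUp()            # builds the fieldsplit PC and its sub-KSPs
    ksp = up_solver.snes.getKSP()
    ksp_u, ksp_p = ksp.getPC().getFieldSplitSubKSP()
    ksp_u.setMonitor(self._ksp_monitor)   # velocity (HDiv mass matrix) block
    ksp_p.setMonitor(self._ksp_monitor)   # pressure / Schur complement block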
Hi Lawrence,

if I set up the KSP as below, I get rid of the segfault. However, I don't get any output from the KSP monitors attached to the sub-KSPs, only from the KSP monitor of the main KSP.

    up_solver = LinearVariationalSolver(up_problem, solver_parameters=sparams)
    ksp = up_solver.snes.getKSP()
    ksp.setUp()
    ksp.setMonitor(self._ksp_monitor)
    ksp_hdiv = ksp.getPC().getFieldSplitSubKSP()
    ksp_hdiv[0].setMonitor(KSPMonitor('fieldsplit_0', verbose=2))
    ksp_hdiv[1].setMonitor(KSPMonitor('fieldsplit_1', verbose=2))
    with self._ksp_monitor:
        up_solver.solve()

Thanks,

Eike

--
Dr Eike Hermann Mueller
Lecturer in Scientific Computing
Department of Mathematical Sciences
University of Bath
Bath BA2 7AY, United Kingdom
+44 1225 38 5557
e.mueller@bath.ac.uk
http://people.bath.ac.uk/em459/
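An alternative that avoids reaching into the sub-KSPs by hand would be to switch on PETSc's built-in per-split monitors through the option prefixes, e.g. (a sketch, adding to the sparams dict used above; like a hand-attached monitor, this only prints if the sub-KSP actually iterates):

    # Equivalent to -fieldsplit_0_ksp_monitor / -fieldsplit_1_ksp_monitor on the
    # command line; reports per-iteration residuals, but not timings.
    sparams['fieldsplit_0_ksp_monitor'] = True
    sparams['fieldsplit_1_ksp_monitor'] = True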
Hi Lawrence,

turns out it does work, but I did not get any output because I had used KSP=PREONLY. Once I change this to e.g. a Jacobi-preconditioned iterative solve, my monitor prints what it should print.

Thanks,

Eike
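That behaviour is expected: with 'preonly' the sub-solve is a single preconditioner application, so the Krylov loop, and hence the monitor, is never entered. A sketch of the kind of change that makes the fieldsplit_0 monitor report something, reusing the sparams dict from above (the cg+jacobi choice is just an example):

    # 'preonly' never enters the Krylov loop, so an attached monitor is not called;
    # an iterative KSP on the velocity block reports per-iteration residuals.
    sparams.update({'fieldsplit_0_ksp_type': 'cg',
                    'fieldsplit_0_pc_type': 'jacobi'})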