On 18/07/16 14:17, Nicolas Barral wrote:
Dear all,
In my code, I need to compute the Hessian of my solution (for now, a Lagrange P1 field on an unstructured simplicial mesh). To that end, I have been using the classic FE method given in the 2D Monge-Ampère demonstration (http://www.firedrakeproject.org/demos/ma-demo.py.html).
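For reference, the weak problem solved there (if I read the demo correctly) comes from integrating ∫ σ : ∇∇w dx by parts once: find the tensor field H such that

    ∫ σ : H dx = -∫ div(σ) · ∇w dx + ∫ (σ·n) · ∇w ds    for all test tensors σ,

which is what the code below implements.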
I am now trying to do the same thing in 3D, but it is slow. For now I am running in serial, so I expected it to be slow, but not that slow.
The code used is the following:

    from firedrake import *

    n = 20
    mesh = UnitCubeMesh(n, n, n)
    W = FunctionSpace(mesh, 'CG', 1)
    M = TensorFunctionSpace(mesh, 'CG', 1)

    # Indicator function of a small ball centred at (0.35, 0.35, 0.35)
    icExpr = Expression("((x[0]-0.35)*(x[0]-0.35) + (x[1]-0.35)*(x[1]-0.35) + (x[2]-0.35)*(x[2]-0.35) < 0.15*0.15)")
    w = Function(W).interpolate(icExpr)

    sigma = TestFunction(M)
    H = Function(M)
    n = FacetNormal(mesh)  # note: this reuses the name n

    # Weak Hessian: inner(sigma, grad(grad(w))) integrated by parts once
    Lh = inner(sigma, H)*dx + inner(div(sigma), grad(w))*dx
    Lh -= (sigma[0,0]*w.dx(0)*n[0] + sigma[1,0]*w.dx(1)*n[0] + sigma[2,0]*w.dx(2)*n[0]
           + sigma[0,1]*w.dx(0)*n[1] + sigma[1,1]*w.dx(1)*n[1] + sigma[2,1]*w.dx(2)*n[1]
           + sigma[0,2]*w.dx(0)*n[2] + sigma[1,2]*w.dx(1)*n[2] + sigma[2,2]*w.dx(2)*n[2])*ds

    H_prob = NonlinearVariationalProblem(Lh, H)
    H_solv = NonlinearVariationalSolver(H_prob)
    H_solv.solve()

(Warning: if you try to run this code, UnitCubeMesh can take a long time even for moderate n.)
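Incidentally, I believe the component-wise boundary term can be written more compactly, and dimension-independently, in UFL, since sum_ij sigma[i,j]*w.dx(i)*n[j] is just inner(dot(sigma, n), grad(w)). An untested sketch, but it should be the same form:

    Lh = inner(sigma, H)*dx + inner(div(sigma), grad(w))*dx \
         - inner(dot(sigma, n), grad(w))*ds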
If we consider a unit cube with 50 vertices on each edge (n=50, i.e. 132,651 vertices and 750,000 tetrahedra, which is a small 3D case), the wall clock time for the last instruction, H_solv.solve(), is 15 min. For n=65 (287,496 vertices, 1,647,750 tets) it goes up to one hour, and for n=80 (531,441 vertices, 3,072,000 tets) it takes 80 min. The processor is an Intel Xeon E5-2640 v3 @ 2.60GHz, and these timings are obtained with the default solver parameters (notably snes_rtol = 1e-8 and ksp_rtol = 1e-5). For all three meshes, the solver ends after 2 SNES iterations, each requiring 5 KSP iterations, which seem like relatively small numbers.
If I understand correctly, these times include both the assembly and the actual solution of the problem. I do not know how to separate the assembly time from the solver time, but if I can trust Firedrake's debug prints (with 'snes_monitor': True), the assembly is an order of magnitude slower than the solve.
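One rough way I can think of to isolate the assembly cost (a sketch only, timing explicit assemble() calls on the Jacobian and residual of the same form):

    import time

    t0 = time.time()
    J = assemble(derivative(Lh, H))   # assemble the Jacobian of the residual form
    b = assemble(Lh)                  # assemble the residual itself
    print("assembly time: %g s" % (time.time() - t0))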
Do these CPU times look normal to you?
I have changed snes_rtol to 1e-2, as I don't need great precision for my Hessian (I'm leaving ksp_rtol at 1e-5 to preserve convergence, am I right?). I also changed the preconditioner to SOR, which helped (a lot for tiny meshes, somewhat less for bigger ones). With these options, my CPU time becomes 5 min for n=50 and 9 min for n=80.
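Concretely, the options I am now passing look like this (a sketch, using the standard PETSc option names):

    params = {'snes_rtol': 1e-2,   # relaxed nonlinear tolerance
              'ksp_rtol': 1e-5,    # left at the default
              'pc_type': 'sor'}    # SOR preconditioner
    H_solv = NonlinearVariationalSolver(H_prob, solver_parameters=params)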
That is better. However, I am a little concerned that the CPU time increases quickly with n, and that this method seems significantly slower than a gradient/Hessian recovery method. In the context of mesh adaptation, I have to compute the Hessian frequently, and it is not supposed to be a costly stage of the process.
So what do you think? Are there other things I can do to speed this up?
Please wrap the different bits of your code (mesh/functionspace creation, then solver creation/solve) in timed_stage context managers:

    from pyop2.profiling import timed_stage

    with timed_stage("Make spaces"):
        mesh = ...
        V = FunctionSpace(...)
        ...

    with timed_stage("Solve"):
        solve(Lh == 0, H, ...)

and send the output of

    python script.py -log_view

for the different mesh sizes.

Lawrence