"For n=65 (287,496 vertices and 1,647,750 tets), the wall clock time goes up to one hour. For n=85 (531,441 vertices, 3,072,000 tets), it takes 80 min" Are you sure? 1.85x as many DoFs but only 1.33x the wallclock time? This doesn't give me much faith in the numbers you provide, I'm afraid. On 18 July 2016 at 14:17, Nicolas Barral <n.barral@imperial.ac.uk> wrote:
Dear all,
In my code, I need to compute the Hessian of my solution (for now a Lagrange P1 field on an unstructured simplicial mesh). To that end, I have been using the classic FE method given in the 2D Monge-Ampère demo (http://www.firedrakeproject.org/demos/ma-demo.py.html).
I am now trying to do the same thing in 3D, but it is slow. For now I am running in serial, so I expected it to be slow, but not that slow.
The code used is the following:

    from firedrake import *

    n = 20
    mesh = UnitCubeMesh(n, n, n)
    W = FunctionSpace(mesh, 'CG', 1)
    M = TensorFunctionSpace(mesh, 'CG', 1)

    # Indicator function of a ball of radius 0.15 centred at (0.35, 0.35, 0.35)
    icExpr = Expression("((x[0]-0.35)*(x[0]-0.35) + (x[1]-0.35)*(x[1]-0.35) + (x[2]-0.35)*(x[2]-0.35) < 0.15*0.15)")
    w = Function(W).interpolate(icExpr)

    sigma = TestFunction(M)
    H = Function(M)
    n = FacetNormal(mesh)

    # Weak Hessian: integrate by parts so only first derivatives of w appear
    Lh = inner(sigma, H)*dx + inner(div(sigma), grad(w))*dx
    Lh -= (sigma[0,0]*w.dx(0)*n[0] + sigma[1,0]*w.dx(1)*n[0] + sigma[2,0]*w.dx(2)*n[0]
           + sigma[0,1]*w.dx(0)*n[1] + sigma[1,1]*w.dx(1)*n[1] + sigma[2,1]*w.dx(2)*n[1]
           + sigma[0,2]*w.dx(0)*n[2] + sigma[1,2]*w.dx(1)*n[2] + sigma[2,2]*w.dx(2)*n[2])*ds

    H_prob = NonlinearVariationalProblem(Lh, H)
    H_solv = NonlinearVariationalSolver(H_prob)
    H_solv.solve()

(Warning: if you try to run this code, UnitCubeMesh can take a long time for not-so-great n.)
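(As an aside, unless I am mistaken the component-wise boundary term above is just the expanded form of

    Lh -= inner(dot(sigma, n), grad(w))*ds

which would be more compact; I kept the expanded version to stay close to the 2D demo.)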
If we consider a unit cube with 50 cells along each edge (n=50, which makes 132,651 vertices and 750,000 tetrahedra, a small 3D case), the wall clock time for the last instruction, H_solv.solve(), is 15 min. For n=65 (287,496 vertices and 1,647,750 tets), the wall clock time goes up to one hour. For n=80 (531,441 vertices, 3,072,000 tets), it takes 80 min. The processor is an Intel Xeon E5-2640 v3 @ 2.60GHz, and these timings are obtained with the default set of solver parameters (notably snes_rtol = 1e-8 and ksp_rtol = 1e-5). In all cases, the solver ends after 2 SNES iterations, each requiring 5 KSP iterations, which seem to be relatively small numbers.
If I understand correctly, these times include both the assembly and the actual solution of the problem. I do not know how to separate the assembly time from the solver time, but, if I can trust the Firedrake debug prints (with snes_monitor: True), the assembly is an order of magnitude slower than the solve.
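I suppose one could time the two assemblies by hand, along these lines (a rough sketch using plain Python timers; derivative and assemble are the usual Firedrake/UFL calls), though that would not capture what happens inside the solve:

    import time
    t0 = time.time()
    J = assemble(derivative(Lh, H))  # one Jacobian assembly
    t1 = time.time()
    r = assemble(Lh)                 # one residual assembly
    t2 = time.time()
    print("Jacobian: %.1f s, residual: %.1f s" % (t1 - t0, t2 - t1))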
Do these CPU times look normal to you?
I have changed snes_rtol to 1e-2, as I don't need great precision for my Hessian (I'm leaving ksp_rtol at 1e-5 to preserve convergence; am I right?). I also changed the preconditioner to SOR, which helped (a lot for tiny meshes, somewhat less for bigger ones). With these options, my CPU time becomes 5 min for n=50 and 9 min for n=80.
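Concretely, I am passing the options like this (a minimal sketch; 'sor' is the standard PETSc pc_type name, and everything else is left at its defaults):

    H_solv = NonlinearVariationalSolver(H_prob,
                                        solver_parameters={'snes_rtol': 1e-2,  # loose nonlinear tolerance
                                                           'ksp_rtol': 1e-5,   # unchanged linear tolerance
                                                           'pc_type': 'sor'})  # SOR preconditioner
    H_solv.solve()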
That is better. However, I am a little concerned by the fact that the CPU time quickly increases with n, and that this method seems significantly slower than a gradient/Hessian recovery method. In the context of mesh adaptation, I have to compute the Hessian frequently, and it is not supposed to be a costly stage in the process.
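For comparison, the kind of recovery method I have in mind is a simple double L2 projection, along these lines (a rough illustrative sketch, not code I am currently running; g and H_rec are just names I made up):

    # Recover the gradient in P1, then differentiate and project again
    V = VectorFunctionSpace(mesh, 'CG', 1)
    g = Function(V).project(grad(w))       # recovered P1 gradient
    H_rec = Function(M).project(grad(g))   # recovered P1 Hessian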
So what do you think? Are there other things I can do to speed this up?
Many thanks,
-- Nicolas
-- Nicolas Barral
Dept. of Earth Science and Engineering
Imperial College London
Royal School of Mines - Office 4.88
London SW7 2AZ
_______________________________________________
firedrake mailing list
firedrake@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/firedrake