CPU time for hessian computation
Dear all,

In my code, I need to compute the Hessian of my solution (for now a Lagrange P1 field, on an unstructured simplicial mesh). To that end, I have been using the classic FE method, which is given in the 2D Monge-Ampère demonstration (http://www.firedrakeproject.org/demos/ma-demo.py.html).

I am now trying to do the same thing in 3D, but it is slow. For now, I am running in serial, so I expected it to be slow, but not that slow. The code used is the following:

    from firedrake import *

    n = 20
    mesh = UnitCubeMesh(n, n, n)
    W = FunctionSpace(mesh, 'CG', 1)
    M = TensorFunctionSpace(mesh, 'CG', 1)
    icExpr = Expression("((x[0]-0.35)*(x[0]-0.35) + (x[1]-0.35)*(x[1]-0.35)"
                        " + (x[2]-0.35)*(x[2]-0.35) < 0.15*0.15)")
    w = Function(W).interpolate(icExpr)
    sigma = TestFunction(M)
    H = Function(M)
    n = FacetNormal(mesh)
    Lh = inner(sigma, H)*dx + inner(div(sigma), grad(w))*dx
    Lh -= (sigma[0,0]*w.dx(0)*n[0] + sigma[1,0]*w.dx(1)*n[0] + sigma[2,0]*w.dx(2)*n[0]
         + sigma[0,1]*w.dx(0)*n[1] + sigma[1,1]*w.dx(1)*n[1] + sigma[2,1]*w.dx(2)*n[1]
         + sigma[0,2]*w.dx(0)*n[2] + sigma[1,2]*w.dx(1)*n[2] + sigma[2,2]*w.dx(2)*n[2])*ds
    H_prob = NonlinearVariationalProblem(Lh, H)
    H_solv = NonlinearVariationalSolver(H_prob)
    H_solv.solve()

(Warning if you try to run this code: UnitCubeMesh can take a long time for not-so-large n.)

If we consider a unit cube with 50 vertices on each edge (n=50, that makes 132,651 vertices and 750,000 tetrahedra, which is a small 3D case), the wall clock time for the last instruction, H_solv.solve(), is 15 min. For n=65 (287,496 vertices and 1,647,750 tets), the wall clock time goes up to one hour. For n=80 (531,441 vertices, 3,072,000 tets), it takes 80 min. The processor is an Intel Xeon E5-2640 v3 @ 2.60GHz, and these timings are obtained with the default set of parameters for the solver (notably snes_rtol = 1e-8 and ksp_rtol = 1e-5). For these meshes, the solver ends after 2 SNES iterations, each iteration requiring 5 KSP iterations, which seem to be relatively small numbers.

If I understand correctly, these times include both the assembly and the actual solution of the problem. I do not know how to separate the assembly time from the solver time, but, if I can trust Firedrake's debug prints (with snes_monitor: True), the assembly is an order of magnitude slower than the solve.

Do these CPU times look normal to you?

I have changed snes_rtol to 1e-2, as I don't need great precision for my Hessian (I'm leaving ksp_rtol at 1e-5 to preserve convergence; am I right?). I also changed the preconditioner to SOR, which helped (a lot for tiny meshes, somewhat less for bigger ones). With these options, my CPU time becomes 5 min for n=50 and 9 min for n=80.

That is better. However, I am a little concerned that the CPU time increases quickly with n, and that this method seems significantly slower than a gradient/Hessian recovery method. In the context of mesh adaptation, I have to compute the Hessian frequently, and it is not supposed to be a costly stage in the process.

So what do you think? Are there other things I can do to speed this up?

Many thanks,

--
Nicolas

--
Nicolas Barral
Dept. of Earth Science and Engineering
Imperial College London
Royal School of Mines - Office 4.88
London SW7 2AZ
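The tolerance and preconditioner changes Nicolas describes would be passed through the solver's parameter dictionary; a minimal sketch (not code from the thread, just the standard PETSc option names with the values quoted above):

    # Relaxed tolerances and SOR preconditioning, as described in the message
    H_solv = NonlinearVariationalSolver(H_prob, solver_parameters={
        'snes_rtol': 1e-2,   # loose nonlinear tolerance: high accuracy not needed
        'ksp_rtol': 1e-5,    # default linear tolerance, kept to preserve convergence
        'pc_type': 'sor'})   # SOR preconditioner instead of the default
    H_solv.solve()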
On 18/07/16 14:17, Nicolas Barral wrote: ...
Please wrap the different bits of your code, mesh/function-space creation and then solver creation/solve, in timed_stage context managers:

    from pyop2.profiling import timed_stage

    with timed_stage("Make spaces"):
        mesh = ...
        V = FunctionSpace(...)
    ...
    with timed_stage("Solve"):
        solve(Lh == 0, H, ...)

And send the output of

    python script.py -log_view

for the different mesh sizes.

Lawrence
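Filled out against the script from the first message, that instrumentation might look like the sketch below (the stage names are arbitrary choices, and outer(grad(w), nu) is used as a compact equivalent of the component-wise boundary term):

    from firedrake import *
    from pyop2.profiling import timed_stage

    n = 50
    with timed_stage("Make mesh"):
        mesh = UnitCubeMesh(n, n, n)

    with timed_stage("Make spaces"):
        W = FunctionSpace(mesh, 'CG', 1)
        M = TensorFunctionSpace(mesh, 'CG', 1)

    with timed_stage("Setup"):
        w = Function(W).interpolate(Expression(
            "((x[0]-0.35)*(x[0]-0.35) + (x[1]-0.35)*(x[1]-0.35)"
            " + (x[2]-0.35)*(x[2]-0.35) < 0.15*0.15)"))
        sigma = TestFunction(M)
        H = Function(M)
        nu = FacetNormal(mesh)
        # outer(grad(w), nu)[i,j] = w.dx(i)*nu[j], so this ds term matches
        # the component-wise sum in the original script
        Lh = (inner(sigma, H)*dx + inner(div(sigma), grad(w))*dx
              - inner(sigma, outer(grad(w), nu))*ds)

    with timed_stage("Solve"):
        solve(Lh == 0, H)

Running this as python script.py -log_view then prints a per-stage summary like the one in the next message.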
On 18/07/16 14:53, Lawrence Mitchell wrote: ...
I did this for n=50 on an identical Xeon chip. I quibble with your statement that the solve takes loads of time. Here's what I get:

    Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                            Avg     %Total     Avg     %Total   counts   %Total     Avg        %Total   counts   %Total
     0:      Main Stage: 5.5219e-01   0.1%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%
     1:       Make mesh: 7.2269e+02  96.2%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%
     2:   Solve problem: 2.8219e+01   3.8%  5.8936e+10 100.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%

So you can see it took a little over 12 minutes to build the mesh, and 30s to solve the problem. AFAICT almost all of that time is in ctetgen meshing the cube (it does a fully unstructured triangulation, so this may explain things). We would welcome patches that made the creation of these structured cube meshes faster by not calling general meshing software.

Digging down into the solve, it takes 5 seconds to evaluate the Jacobian twice, 1 second for three residual evaluations, and about 20 seconds for the two solves.

Lawrence
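Since the residual form is linear in H (the Jacobian is just the mass matrix of the tensor space), one further avenue, not tested in the thread and offered here only as a sketch reusing mesh, M, sigma, and w from the script above, is to pose it as a linear problem and skip the SNES loop entirely:

    # Untested sketch: linear restatement of the weak Hessian problem.
    # outer(grad(w), nu) reproduces the component-wise boundary term.
    Htrial = TrialFunction(M)
    nu = FacetNormal(mesh)
    a = inner(sigma, Htrial)*dx
    L = -inner(div(sigma), grad(w))*dx + inner(sigma, outer(grad(w), nu))*ds
    H = Function(M)
    solve(a == L, H, solver_parameters={'ksp_type': 'cg',  # mass matrix is SPD
                                        'pc_type': 'sor'})

This solves the same linear system each Newton step solves, so it does not change the per-solve KSP cost, but it avoids the repeated residual and Jacobian assembly of the nonlinear formulation.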