"For n=65 (287,496 vertices and 1,647,750 tets), the wall clock time goes
up to one hour.  For n=85 (531,441 vertices, 3,072,000 tets), it takes
80 min"

Are you sure?  1.85x as many DoFs but only 1.33x the wallclock time?  This doesn't give me much faith in the numbers you provide, I'm afraid.
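For reference, the counts can be checked with a few lines of plain Python: a UnitCubeMesh(n, n, n) in Firedrake has (n+1)^3 vertices and 6*n^3 tetrahedra, so the reported figures pin down which n was actually used, and the ratios can be computed directly:

```python
# Sanity check on the reported mesh sizes and timing ratios.
def cube_mesh_counts(n):
    """Vertex and tet counts of a Firedrake UnitCubeMesh(n, n, n)."""
    vertices = (n + 1) ** 3
    tets = 6 * n ** 3
    return vertices, tets

print(cube_mesh_counts(65))  # -> (287496, 1647750)
print(cube_mesh_counts(80))  # -> (531441, 3072000)

# Ratio of vertex counts (a proxy for P1 DoFs) versus the ratio of the
# reported wall-clock times (60 min -> 80 min).
dof_ratio = 531441 / 287496
time_ratio = 80 / 60
print(round(dof_ratio, 2), round(time_ratio, 2))  # -> 1.85 1.33
```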

On 18 July 2016 at 14:17, Nicolas Barral <n.barral@imperial.ac.uk> wrote:
Dear all,

In my code, I need to compute the Hessian of my solution (for now a
Lagrange P1 field on an unstructured simplicial mesh). To that end, I
have been using the classic FE method given in the 2D Monge-Ampère
demonstration (http://www.firedrakeproject.org/demos/ma-demo.py.html).

I am now trying to do the same thing in 3D, but it is slow. For now, I
am running in serial, so I expected it to be slow, but not that slow.

The code used is the following:

n = 20
mesh = UnitCubeMesh(n, n, n)
W = FunctionSpace(mesh, 'CG', 1)
M = TensorFunctionSpace(mesh, 'CG', 1)
icExpr = Expression("((x[0]-0.35)*(x[0]-0.35) + (x[1]-0.35)*(x[1]-0.35) + (x[2]-0.35)*(x[2]-0.35) < 0.15*0.15)")
w = Function(W).interpolate(icExpr)
sigma = TestFunction(M)
H = Function(M)
n = FacetNormal(mesh)  # note: this rebinds the n used for the mesh size
Lh = inner(sigma, H)*dx + inner(div(sigma), grad(w))*dx
Lh -= (sigma[0,0]*w.dx(0)*n[0] + sigma[1,0]*w.dx(1)*n[0] + sigma[2,0]*w.dx(2)*n[0]
       + sigma[0,1]*w.dx(0)*n[1] + sigma[1,1]*w.dx(1)*n[1] + sigma[2,1]*w.dx(2)*n[1]
       + sigma[0,2]*w.dx(0)*n[2] + sigma[1,2]*w.dx(1)*n[2] + sigma[2,2]*w.dx(2)*n[2])*ds
H_prob = NonlinearVariationalProblem(Lh, H)
H_solv = NonlinearVariationalSolver(H_prob)
H_solv.solve()

(A warning if you try to run this code: UnitCubeMesh itself can take a
long time even for not-so-large n.)

If we consider a unit cube with 50 cells along each edge (n=50, which
makes 132,651 vertices and 750,000 tetrahedra, a small 3D case), the
wall clock time for the last instruction, H_solv.solve(), is 15 min.
For n=65 (287,496 vertices and 1,647,750 tets), the wall clock time goes
up to one hour. For n=80 (531,441 vertices, 3,072,000 tets), it takes
80 min. The processor is an Intel Xeon E5-2640 v3 @ 2.60GHz, and these
timings are obtained with the default solver parameters (notably
snes_rtol = 1e-8 and ksp_rtol = 1e-5). For all these meshes, the solver
ends after 2 SNES iterations, each iteration requiring 5 KSP iterations,
which seem like relatively small numbers.

If I understand correctly, these times include both the assembly and the
actual solution of the problem. I do not know how to distinguish the
assembly time from the solver time, but, if I can trust Firedrake's
debug output (with 'snes_monitor': True), the assembly is an order of
magnitude slower than the solver.

Do these CPU times look normal to you?

I have changed snes_rtol to 1e-2, as I don't need high precision for
my Hessian (I'm leaving ksp_rtol at 1e-5 to preserve the convergence, am
I right?). I also changed the preconditioner to SOR, which helped
(a lot for tiny meshes, somewhat less for bigger ones). With these
options, my CPU time becomes 5 min for n=50 and 9 min for n=80.
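In Firedrake these options are just a PETSc parameter dict; a sketch of what I mean (the keys are standard PETSc option names):

```python
# Solver options described above: loose nonlinear tolerance (the Hessian
# does not need high accuracy), default linear tolerance, SOR
# preconditioning.
opts = {
    'snes_rtol': 1e-2,
    'ksp_rtol': 1e-5,
    'pc_type': 'sor',
}
# H_solv = NonlinearVariationalSolver(H_prob, solver_parameters=opts)
```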

That is better. However, I am a little concerned by the fact that the
CPU time increases quickly with n, and that this method seems
significantly slower than a gradient/Hessian recovery method. In the
context of mesh adaptation, I have to compute the Hessian frequently,
and it is not supposed to be a costly stage of the process.

So what do you think? Are there other things I can do to speed this up?

Many thanks,

--
Nicolas



--
Nicolas Barral

Dept. of Earth Science and Engineering
Imperial College London
Royal School of Mines - Office 4.88
London SW7 2AZ

_______________________________________________
firedrake mailing list
firedrake@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/firedrake