Dear Firedrake --

So far so good!

However, I am not sure about certain best practices in parallel.  Consider the following familiar Helmholtz example, to which I have added some printing fluff before and after the solve:


from firedrake import *

mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)
f = Function(V)
x, y = SpatialCoordinate(mesh)

print('setting up problem')

f.interpolate((1+8*pi*pi)*cos(x*pi*2)*cos(y*pi*2))
a = (dot(grad(v), grad(u)) + v * u) * dx
L = f * v * dx
u = Function(V)
solve(a == L, u, options_prefix='s', solver_parameters={'ksp_type': 'cg'})

f.interpolate(cos(x*pi*2)*cos(y*pi*2))
L2err = sqrt(assemble(dot(u - f, u - f) * dx))
umax = u.dat.data.max()

print('L2 error norm = %g, max(u) = %g' % (L2err,umax))


Suppose we run first in serial and then on two processes:

(firedrake) ~$ python3 helmholtz.py -s_ksp_converged_reason
setting up problem
  Linear s_ solve converged due to CONVERGED_RTOL iterations 13
L2 error norm = 0.0625707, max(u) = 1.03139

(firedrake) ~$ mpiexec -n 2 python3 helmholtz.py -s_ksp_converged_reason
  Linear s_ solve converged due to CONVERGED_RTOL iterations 28
setting up problem
L2 error norm = 0.0625707, max(u) = 0.803552
setting up problem
L2 error norm = 0.0625707, max(u) = 1.03138

This reveals several issues:

1.  Should I use "if COMM_WORLD.rank == 0:" to avoid printing once per rank?  Or should I use a collective print like PetscPrintf()?  If so, where could I find a demo?  (I'm familiar with the C API, not petsc4py ...)  A sketch of what I mean follows this list.

2.  There is a flush issue if I only print on rank 0: the solver reports CONVERGED before the first print appears.  How should I flush, or avoid the need to flush?

3.  It looks like "assemble()" does an MPI_Allreduce() to get the right L2 error.  Is that right?  How do I do something like that myself, for example so that the solution maximum appears on all ranks?  (Probably a petsc4py call?)  My attempt is sketched below.
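
For questions 1 and 2, here is the kind of thing I have been trying.  I am guessing that petsc4py's PETSc.Sys.Print is the analogue of PetscPrintf(), so please treat that spelling as an assumption on my part:

from firedrake import *
from firedrake.petsc import PETSc

# option (a): guard on rank 0 and flush explicitly, so the message
# appears before the solver's convergence output
if COMM_WORLD.rank == 0:
    print('setting up problem', flush=True)

# option (b): a collective print, which I assume writes once from rank 0
PETSc.Sys.Print('setting up problem')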
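
For question 3, the best I have come up with for the maximum is an explicit reduction over the mesh communicator using mpi4py.  Is this the intended approach, or is there something cleaner?

from mpi4py import MPI

# u.dat.data_ro holds only the process-local degrees of freedom, so the
# local max must be combined across ranks; allreduce makes the result
# available on every rank
umax = mesh.comm.allreduce(u.dat.data_ro.max(), op=MPI.MAX)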

Is there a demo which shows how the Firedrake developers prefer to do these things?  Parallel best practices, more or less.  All the demos I can find seem to be written as if for serial runs with respect to the above niceties.

The page https://www.firedrakeproject.org/parallelism.html addresses less-introductory parallelism issues, in my opinion.  I am not worried about the math either; I'm aware that the difference in CG iteration counts above has everything to do with the difference in default preconditioning.

Thanks!

Ed


--
Ed Bueler
Dept of Math and Stat and Geophysical Institute
University of Alaska Fairbanks
Fairbanks, AK 99775-6660
301C Chapman