Dear Firedrake --

So far so good!

However, I am not sure about certain best practices in parallel.  Consider the following familiar Helmholtz example, to which I have added some printing fluff before and after the solve:


from firedrake import *

mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)
f = Function(V)
x, y = SpatialCoordinate(mesh)

print('setting up problem')

f.interpolate((1+8*pi*pi)*cos(x*pi*2)*cos(y*pi*2))
a = (dot(grad(v), grad(u)) + v * u) * dx
L = f * v * dx
u = Function(V)
solve(a == L, u, options_prefix='s', solver_parameters={'ksp_type': 'cg'})

f.interpolate(cos(x*pi*2)*cos(y*pi*2))
L2err = sqrt(assemble(dot(u - f, u - f) * dx))
umax = u.dat.data.max()

print('L2 error norm = %g, max(u) = %g' % (L2err,umax))


Suppose we run first in serial and then on two processes:

(firedrake) ~$ python3 helmholtz.py -s_ksp_converged_reason
setting up problem
  Linear s_ solve converged due to CONVERGED_RTOL iterations 13
L2 error norm = 0.0625707, max(u) = 1.03139

(firedrake) ~$ mpiexec -n 2 python3 helmholtz.py -s_ksp_converged_reason
  Linear s_ solve converged due to CONVERGED_RTOL iterations 28
setting up problem
L2 error norm = 0.0625707, max(u) = 0.803552
setting up problem
L2 error norm = 0.0625707, max(u) = 1.03138

This reveals several issues:

1.  Should I use "if COMM_WORLD.rank == 0:" to avoid printing once per rank?  Or should I use a collective print like PetscPrintf()?  If so, where could I find a demo?  (I'm familiar with the C API, not petsc4py ...)  A sketch of what I mean follows this list.

2.  There is a flush issue if I only print on rank 0: the solver reports CONVERGED before the first print appears.  How should I flush, or avoid the need to flush?

3.  It looks like "assemble()" does an MPI_Allreduce() to get the right L2 error.  Is that right?  How do I do something like that myself, for example so that the solution maximum appears on all ranks?  (Probably a petsc4py call?)  My attempt is sketched below.
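
For questions 1 and 2, here is the kind of thing I have been trying.  I am guessing that petsc4py's PETSc.Sys.Print is the analogue of PetscPrintf(), so please treat that spelling as an assumption on my part:

from firedrake import *
from firedrake.petsc import PETSc

# option (a): guard on rank 0 and flush explicitly, so the message
# appears before the solver's convergence output
if COMM_WORLD.rank == 0:
    print('setting up problem', flush=True)

# option (b): a collective print, which I assume writes once from rank 0
PETSc.Sys.Print('setting up problem')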
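
For question 3, the best I have come up with for the maximum is an explicit reduction over the mesh communicator using mpi4py.  Is this the intended approach, or is there something cleaner?

from mpi4py import MPI

# u.dat.data_ro holds only the process-local degrees of freedom, so the
# local max must be combined across ranks; allreduce makes the result
# available on every rank
umax = mesh.comm.allreduce(u.dat.data_ro.max(), op=MPI.MAX)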

Is there a demo which shows how the Firedrake developers prefer to do these things?  Parallel best practices, more or less.  All the demos I can find seem to be written as if for serial runs with respect to the above niceties.

The page https://www.firedrakeproject.org/parallelism.html addresses less-introductory parallelism issues, in my opinion.  I am not worried about the math either; I'm aware that the difference in CG iteration counts above has everything to do with the difference in default preconditioning.

Thanks!

Ed


--
Ed Bueler
Dept of Math and Stat and Geophysical Institute
University of Alaska Fairbanks
Fairbanks, AK 99775-6660
301C Chapman