Dear Firedrake --

So far so good! However, I am not sure about certain best practices in parallel. Consider the following familiar Helmholtz example, where I have added some printing fluff before and after:

from firedrake import *
mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)
f = Function(V)
x, y = SpatialCoordinate(mesh)
print('setting up problem')
f.interpolate((1+8*pi*pi)*cos(x*pi*2)*cos(y*pi*2))
a = (dot(grad(v), grad(u)) + v * u) * dx
L = f * v * dx
u = Function(V)
solve(a == L, u, options_prefix='s', solver_parameters={'ksp_type': 'cg'})
f.interpolate(cos(x*pi*2)*cos(y*pi*2))
L2err = sqrt(assemble(dot(u - f, u - f) * dx))
umax = u.dat.data.max()
print('L2 error norm = %g, max(u) = %g' % (L2err, umax))

Suppose we run first in serial and then on two processes:

(firedrake) ~$ python3 helmholtz.py -s_ksp_converged_reason
setting up problem
Linear s_ solve converged due to CONVERGED_RTOL iterations 13
L2 error norm = 0.0625707, max(u) = 1.03139

(firedrake) ~$ mpiexec -n 2 python3 helmholtz.py -s_ksp_converged_reason
Linear s_ solve converged due to CONVERGED_RTOL iterations 28
setting up problem
L2 error norm = 0.0625707, max(u) = 0.803552
setting up problem
L2 error norm = 0.0625707, max(u) = 1.03138

This reveals several issues:

1. Should I use "if COMM_WORLD.rank == 0:" to avoid printing n times? Or should I use a collective print like PetscPrintf()? If so, where could I find a demo? (I'm familiar with the C API, not petsc4py ...)

2. There is a flush issue if I only print on rank 0; the solver reports CONVERGED before the first print happens. How do I flush, or avoid the need to flush?

3. It looks like "assemble()" does an MPI_Allreduce() to get the right L2 error. Is that right? How would I do this myself, for example so that the solution maximum appears on all ranks? (Probably a petsc4py call?)

Is there a demo which shows how the Firedrake developers prefer to do these things? Parallel best practices, more or less. All the demos I can find seem to be serial with respect to the above niceties, and the page https://www.firedrakeproject.org/parallelism.html addresses less-introductory parallelism issues, in my opinion.

I am not worried about the math, by the way; I'm aware that the above difference in CG iteration counts has everything to do with the difference in default preconditioning.

Thanks!

Ed

--
Ed Bueler
Dept of Math and Stat and Geophysical Institute
University of Alaska Fairbanks
Fairbanks, AK 99775-6660
301C Chapman
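
P.S. To make questions 1 and 2 concrete, here is the kind of thing I have in mind; the names (PETSc.Sys.Print, the firedrake.petsc module, COMM_WORLD) are just my guesses from poking around petsc4py, so please correct me if there is a preferred idiom. Either a collective print through petsc4py, or a rank-0 guard with an explicit flush:

from firedrake import *
from firedrake.petsc import PETSc

# collective print: only rank 0 actually writes, so no guard is needed
PETSc.Sys.Print('setting up problem')

# alternative: guard on rank 0 and flush immediately, so the line is not
# still sitting in a buffer when the solver starts writing its own output
if COMM_WORLD.rank == 0:
    print('setting up problem', flush=True)

Is one of these what you would recommend, or is there a Firedrake-level utility I am missing?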
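
Similarly, for question 3 my guess is an explicit mpi4py reduction over the owned dofs on each rank (assuming data_ro is the right read-only accessor, and that every rank owns at least one dof):

from mpi4py import MPI

# continuing from the script above: u is the solution Function
local_max = u.dat.data_ro.max()                     # max over this rank's owned dofs only
umax = COMM_WORLD.allreduce(local_max, op=MPI.MAX)  # same value on every rank

Again, I am not sure whether this is the intended interface or whether there is a cleaner way.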