Dear Firedrake --

So far so good! However, I am not sure about certain best practices in parallel. Consider the following familiar Helmholtz example, where I have added some printing fluff before and after:

from firedrake import *
mesh = UnitSquareMesh(10, 10)
V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)
f = Function(V)
x, y = SpatialCoordinate(mesh)
print('setting up problem')
f.interpolate((1+8*pi*pi)*cos(x*pi*2)*cos(y*pi*2))
a = (dot(grad(v), grad(u)) + v * u) * dx
L = f * v * dx
u = Function(V)
solve(a == L, u, options_prefix='s', solver_parameters={'ksp_type': 'cg'})
f.interpolate(cos(x*pi*2)*cos(y*pi*2))
L2err = sqrt(assemble(dot(u - f, u - f) * dx))
umax = u.dat.data.max()
print('L2 error norm = %g, max(u) = %g' % (L2err, umax))

Suppose we run first in serial and then on two processes:

(firedrake) ~$ python3 helmholtz.py -s_ksp_converged_reason
setting up problem
Linear s_ solve converged due to CONVERGED_RTOL iterations 13
L2 error norm = 0.0625707, max(u) = 1.03139

(firedrake) ~$ mpiexec -n 2 python3 helmholtz.py -s_ksp_converged_reason
Linear s_ solve converged due to CONVERGED_RTOL iterations 28
setting up problem
L2 error norm = 0.0625707, max(u) = 0.803552
setting up problem
L2 error norm = 0.0625707, max(u) = 1.03138

This reveals several issues:

1. Should I use "if COMM_WORLD.rank == 0:" to avoid printing n times? Or should I use a collective print like PetscPrintf()? If so, where could I find a demo? (I'm familiar with the C API, not petsc4py ...)

2. There is a flush issue if I only print on rank 0; the solver reports CONVERGED before the first print happens. How do I flush, or avoid the need to flush?

3. It looks like "assemble()" does an MPI_Allreduce() to get the right L2 error. Is that right? How would I do this myself, for example so that the solution maximum appears on all ranks? (Probably a petsc4py call?)

Is there a demo which shows how the Firedrake developers prefer to do these things? Parallel best practices, more or less. All the demos I can find seem to be serial with respect to the above niceties, and the page https://www.firedrakeproject.org/parallelism.html addresses less-introductory parallelism issues, in my opinion.

I am not worried about the math, by the way; I'm aware that the above difference in CG iteration counts has everything to do with the difference in default preconditioning.

Thanks!

Ed

--
Ed Bueler
Dept of Math and Stat and Geophysical Institute
University of Alaska Fairbanks
Fairbanks, AK 99775-6660
301C Chapman
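
P.S. To make questions 1 and 2 concrete, here is the kind of thing I have in mind; the names (PETSc.Sys.Print, the firedrake.petsc module, COMM_WORLD) are just my guesses from poking around petsc4py, so please correct me if there is a preferred idiom. Either a collective print through petsc4py, or a rank-0 guard with an explicit flush:

from firedrake import *
from firedrake.petsc import PETSc

# collective print: only rank 0 actually writes, so no guard is needed
PETSc.Sys.Print('setting up problem')

# alternative: guard on rank 0 and flush immediately, so the line is not
# still sitting in a buffer when the solver starts writing its own output
if COMM_WORLD.rank == 0:
    print('setting up problem', flush=True)

Is one of these what you would recommend, or is there a Firedrake-level utility I am missing?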
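
Similarly, for question 3 my guess is an explicit mpi4py reduction over the owned dofs on each rank (assuming data_ro is the right read-only accessor, and that every rank owns at least one dof):

from mpi4py import MPI

# continuing from the script above: u is the solution Function
local_max = u.dat.data_ro.max()                     # max over this rank's owned dofs only
umax = COMM_WORLD.allreduce(local_max, op=MPI.MAX)  # same value on every rank

Again, I am not sure whether this is the intended interface or whether there is a cleaner way.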