Dear Ed,
> On 28 Apr 2018, at 23:49, Ed Bueler <elbueler@alaska.edu> wrote:
>
> Dear Firedrake --
>
> So far so good!
>
> However, I am not sure of certain best practices in parallel. Consider the following familiar Helmholtz example. I have added some printing fluff before and after:
See inline commentary below.
For printing, there are two options (I prefer the second).
> from firedrake import *
>
> mesh = UnitSquareMesh(10, 10)
> V = FunctionSpace(mesh, "CG", 1)
> u = TrialFunction(V)
> v = TestFunction(V)
> f = Function(V)
> x, y = SpatialCoordinate(mesh)
>
> print('setting up problem')
>
> f.interpolate((1+8*pi*pi)*cos(x*pi*2)*cos(y*pi*2)) 
> a = (dot(grad(v), grad(u)) + v * u) * dx
> L = f * v * dx
> u = Function(V)
> solve(a == L, u, options_prefix='s', solver_parameters={'ksp_type': 'cg'})
>
> f.interpolate(cos(x*pi*2)*cos(y*pi*2)) 
> L2err = sqrt(assemble(dot(u - f, u - f) * dx))
> umax = u.dat.data.max()
>
> print('L2 error norm = %g, max(u) = %g' % (L2err,umax))
>
>
> Suppose we run first in serial and then on two processes:
>
> (firedrake) ~$ python3 helmholtz.py -s_ksp_converged_reason
> setting up problem
> Linear s_ solve converged due to CONVERGED_RTOL iterations 13
> L2 error norm = 0.0625707, max(u) = 1.03139
>
> (firedrake) ~$ mpiexec -n 2 python3 helmholtz.py -s_ksp_converged_reason
> Linear s_ solve converged due to CONVERGED_RTOL iterations 28
> setting up problem
> L2 error norm = 0.0625707, max(u) = 0.803552
> setting up problem
> L2 error norm = 0.0625707, max(u) = 1.03138
>
> This reveals several issues:
>
> 1. Should I use "if COMM_WORLD.rank == 0:" to avoid printing n times? Or should I use a collective print like PetscPrintf()? If so where could I find a demo? (I'm familiar with the C API not petsc4py ...)
1. Either manually handle the communicator issue yourself, e.g.:

   comm = mesh.comm
   if comm.rank == 0:
       print(...)

2. Or use PETSc's printing facilities via petsc4py (see the sketch below for this applied to your script):

   from firedrake.petsc import PETSc

   PETSc.Sys.Print("some string", comm=mesh.comm)

   Note that you have to do all the string formatting in Python. This also gives you access to synchronised printing:

   PETSc.Sys.syncPrint("...", comm=...)
   ...
   PETSc.Sys.syncFlush(comm=...)
BTW, all Firedrake objects come with a `.comm` property (if you find ones that don't have it, please do report a bug). You only need to provide a communicator explicitly when building the Mesh object, because everything is collective over the communicator the Mesh is built on (so everything else inherits it).
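For concreteness, here is a sketch of the second option applied to your script (it assumes mesh, L2err and umax from your example are already in scope; only the print calls change):

   from firedrake.petsc import PETSc

   comm = mesh.comm   # the communicator the mesh was built on

   PETSc.Sys.Print('setting up problem', comm=comm)

   # ... set up and solve exactly as before ...

   # Format the string in Python, then print once on the communicator.
   PETSc.Sys.Print('L2 error norm = %g, max(u) = %g' % (L2err, umax), comm=comm)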
> 2. There is a flush issue if I only print on rank 0; the solver reports CONVERGED before the first print happens. How to flush or avoid the need to flush?
If you print through PETSc, then flushing happens in the same order as PETSc's other output. Otherwise you have to flush Python's output stream yourself:

   import sys
   sys.stdout.flush()
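If you prefer to stick with a plain print on rank 0, a minimal sketch of the manual route looks like this (Python 3's print also takes flush=True, which has the same effect as the explicit flush):

   import sys

   if mesh.comm.rank == 0:
       print('setting up problem')
       sys.stdout.flush()   # or: print('setting up problem', flush=True)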
> 3. It looks like "assemble()" does a MPI_Allreduce() to get the right L2 error. Is that right? How to do this myself, for example for the solution maximum to appear on all ranks? (Probably a petsc4py call?)
All of the "public" API is collective over the communicator you originally built the mesh on. So "assemble" gives you a globally consistent view of the operation you asked for. In the case of assembling a functional (as here), that involves doing the allreduce for you. For assembly of linear/bilinear forms, you get back the local part of the distributed object (but we've called MatAssemblyBegin/End for you by the time you use the values).
When you access the data underlying a function, this is morally equivalent to doing VecGetArray (except, warning! it is collective over the communicator, because it might need to do some computation. So it's best to think of it like DMLocalToGlobalBegin/End + VecGetArray).
So again you have a few options in parallel to get the right results (a combined sketch follows after the two options):

1. Use MPI yourself:

   from mpi4py import MPI

   local_max = max(u.dat.data_ro)   # .data_ro acts like VecGetArrayRead
   global_max = u.comm.allreduce(local_max, op=MPI.MAX)
2. Avail yourself of the PETSc Vec interface.

   Firedrake Function objects can offer a view of their dat as a Vec (this is a "global" vector, so it's on the original communicator, not COMM_SELF). Because this might involve some halo exchanges (and data copies in some cases), these are provided as context managers:

   with u.dat.vec_ro as v:
       # You have read-only access to the vector here.
       global_max = v.max()[1]   # Vec.max() returns a (location, value) pair

   with u.dat.vec_wo as v:
       # You have write-only access to the vector here.
       v.set(1)

   with u.dat.vec as v:
       # You have read-write access to the vector here.
       global_max = v.max()[1]
       v.set(2)
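Putting the pieces together, the tail of your example could then read something like this (a sketch reusing u, f and mesh from your script, taking the Vec route for the maximum):

   from firedrake.petsc import PETSc

   comm = mesh.comm

   L2err = sqrt(assemble(dot(u - f, u - f) * dx))   # collective; the reduction is done for you
   with u.dat.vec_ro as v:
       umax = v.max()[1]                            # Vec.max() returns a (location, value) pair

   # Printed once, on the communicator the mesh was built on.
   PETSc.Sys.Print('L2 error norm = %g, max(u) = %g' % (L2err, umax), comm=comm)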
> Is there a demo which shows how the firedrake developers prefer to do these things? Parallel best-practices, more or less. All the demos I can find seem to be serial w.r.t. the above niceties.
I think we don't have anything particularly suitable, unfortunately. The public interface broadly just "works in parallel", but then there are these wrinkles around printing and the like. We'd be delighted to take contributions/suggestions on what would be good to cover: part of the problem with already knowing this stuff is that it's harder to think of what needs to be spelt out.
Cheers,
Lawrence