3. It looks like "assemble()" does a MPI_Allreduce() to get the right L2 error. Is that right? How to do this myself, for example for the solution maximum to appear on all ranks? (Probably a petsc4py call?)

I'm also interested in the answers to the other parts of your question but I do have a partial answer to finding the maximum of a decomposed Function. Replacing the last line with

print('L2 error norm = %g, max(u) = %g or %g' % (L2err,umax, u.vector().max()))

gives output

L2 error norm = 0.0625707, max(u) = 0.803552 or 1.03138

L2 error norm = 0.0625707, max(u) = 1.03138 or 1.03138

See https://firedrakeproject.org/firedrake.html?highlight=max#firedrake.vector.Vector.max

There was some earlier discussion here http://mailman.ic.ac.uk/pipermail/firedrake/2016-June/002232.html showing how to find the maximum in a lower level way; I'm not sure whether that's superseded by the Vector.max method or offers other advantages.