3.  It looks like "assemble()" does a MPI_Allreduce() to get the right L2 error.  Is that right?  How to do this myself, for example for the solution maximum to appear on all ranks?  (Probably a petsc4py call?)

I'm also interested in the answers to the other parts of your question but I do have a partial answer to finding the maximum of a decomposed Function.  Replacing the last line with

    print('L2 error norm = %g, max(u) = %g or %g' % (L2err,umax, u.vector().max()))

gives output
  
    L2 error norm = 0.0625707, max(u) = 0.803552 or 1.03138
    L2 error norm = 0.0625707, max(u) = 1.03138 or 1.03138

See https://firedrakeproject.org/firedrake.html?highlight=max#firedrake.vector.Vector.max

There was some earlier discussion here http://mailman.ic.ac.uk/pipermail/firedrake/2016-June/002232.html showing how to find the maximum in a lower level way; I'm not sure whether that's superseded by the Vector.max method or offers other advantages.