3.  It looks like "assemble()" does a MPI_Allreduce() to get the right L2 error.  Is that right?  How to do this myself, for example for the solution maximum to appear on all ranks?  (Probably a petsc4py call?)
I'm also interested in the answers to the other parts of your question but I do have a partial answer to finding the maximum of a decomposed Function.  Replacing the last line with 
    print('L2 error norm = %g, max(u) = %g or %g' % (L2err,umax, u.vector().max()))
gives output
   
    L2 error norm = 0.0625707, max(u) = 0.803552 or 1.03138
    L2 error norm = 0.0625707, max(u) = 1.03138 or 1.03138