Hi all,
I am not sure if this is a Firedrake issue/question per se, but I noticed that when I run PETSc's SNES ex12 (3D FEM Poisson with DMPlex) with 531,441 dofs, I get the following speedup for KSPSolve(...) on up to 8 cores:
1: 4.4628 s
2: 2.0991 s
4: 1.0930 s
8: 0.6591 s
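For reference, the ex12 runs are launched roughly as below (I've left out the ex12 mesh/discretization options here; the KSPSolve times above are the KSPSolve event reported by PETSc's -log_summary output):

    mpirun -np 8 --bind-to-core --bysocket ./ex12 \
        -ksp_type gmres -pc_type ml -log_summary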
Those timings work out to roughly 85% parallel efficiency. Now, when I run a similar Firedrake 3D version:
1: 7.1377 s
2: 3.6969 s
4: 2.0406 s
8: 1.2939 s
That's now almost 69% parallel efficiency. I used the same solver and preconditioner (GMRES with ML) for both. The PETSC_DIR I am using for SNES ex12 is /path/to/firedrake/lib/python2.7/site-packages/petsc, so I am basically using the same PETSc and compilers in both cases. I am running OpenMPI 1.6.5 with the binding options "--bind-to-core --bysocket" on an Intel Xeon E5-2670 node.
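In case it helps, the Firedrake script is essentially the following (a minimal sketch from memory; the 80^3 UnitCubeMesh is how I get 81^3 = 531,441 P1 dofs to match the ex12 problem size, and the right-hand side and boundary values here are just placeholders):

    # poisson3d.py -- minimal sketch of the Firedrake run
    from firedrake import *

    # 80^3 cube with P1 elements -> 81^3 = 531,441 dofs (assumed mesh size)
    mesh = UnitCubeMesh(80, 80, 80)
    V = FunctionSpace(mesh, "CG", 1)

    u = TrialFunction(V)
    v = TestFunction(V)
    f = Constant(1.0)  # placeholder right-hand side

    a = inner(grad(u), grad(v)) * dx
    L = f * v * dx

    # homogeneous Dirichlet BC on all six faces of the unit cube
    bc = DirichletBC(V, 0.0, (1, 2, 3, 4, 5, 6))

    uh = Function(V)
    solve(a == L, uh, bcs=bc,
          solver_parameters={"ksp_type": "gmres",
                             "pc_type": "ml"})

launched with the same binding options, e.g.

    mpirun -np 8 --bind-to-core --bysocket python poisson3d.py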
The efficiency for Firedrake gets worse as I scale up the problem and go up to 64 cores (8 cores on each of 8 nodes): it drops to about 40%, while the PETSc case still maintains roughly 75%.
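For completeness, the efficiency numbers I'm quoting are E(p) = T(1) / (p * T(p)); computed from the timings above:

    # parallel efficiency E(p) = T(1) / (p * T(p)) from the tables above
    petsc = {1: 4.4628, 2: 2.0991, 4: 1.0930, 8: 0.6591}
    fd    = {1: 7.1377, 2: 3.6969, 4: 2.0406, 8: 1.2939}
    for name, t in [("petsc ex12", petsc), ("firedrake", fd)]:
        for p in sorted(t):
            print("%-10s  p=%d  E=%.2f" % (name, p, t[1] / (p * t[p])))
    # -> E(8) ~ 0.85 for ex12 and ~ 0.69 for Firedrake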
This is rather strange. Shouldn't I expect roughly the same scaling behavior from both implementations? Or is this normal? Note that I am disregarding the assembly of the Jacobian and the residual function in both cases, because Firedrake is much faster there than Matt's PetscFE.
I understand that the more "serially efficient" a code is, the less parallel-efficient it may be. Could this scaling issue have to do with Python and/or the implementation of Firedrake? FWIW, I installed my own Python because the HPC machine does not have a compatible Python 2.7.
Thanks,
Justin