Re: [firedrake] Firedrake on supercomputers
Hi Lawrence, Thanks for these, this is great. On Stampede I can access the compilers in the devel queue, so this works like a charm. Would you have any hints for improving parallel scalability? Right now things start to level out at 8 cores (2D shallow water with P2-P1 elements, 76k triangles). Also, is there a tool for timings other than PYOP2_PRINT_SUMMARY? Cheers, Tuomas
Hi Tuomas,
comments in line below.
On 7 Aug 2014, at 07:31, Tuomas Karna <tuomas.karna at gmail.com> wrote:
Hi all,

I'm running my code on TACC Stampede and I have a couple of questions:

How can I set up Firedrake/PyOP2 for the target machine, for example to use Intel compilers, the target architecture (Sandy Bridge) instruction set, or Intel MKL libraries?

Note that for all this to work, you need to be able to launch the compiler on the compute nodes (in particular for Intel, the compute nodes will have to be able to talk to the license server), but I hope this all works.
Selection of the intel compiler is:
from firedrake import *
parameters["coffee"]["compiler"] = "intel"
This still launches 'mpicc' but uses Intel-specific flags when compiling. If the compiler is actually called something different (e.g. on Cray systems it's always cc), then you need to do:
export CC=name_of_cc_compiler
# only do this if the linker is separate from the compiler
export LDSHARED=name_of_linker
and then:
from firedrake import *
parameters["coffee"]["compiler"] = "intel"
In addition, you probably want to select the AVX instruction set:
parameters["coffee"]["simd_isa"] = "avx"
Finally, you can select various code transformations to be applied to the generated kernels to increase their performance (see http://arxiv.org/abs/1407.0904 for many details). In particular, you may wish to try:
# loop-invariant code motion
parameters["coffee"]["licm"] = True
# Data alignment (to give compilers a better chance at vectorising)
parameters["coffee"]["ap"] = True
# Request the compiler try harder to auto-vectorise
from pyop2.coffee.ast_plan import AUTOVECT
parameters["coffee"]["vect"] = (AUTOVECT, -1)
Our experience is that for most forms, the big gains come from loop invariant code motion and data alignment, but Fabio can comment in much more detail on this.
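Putting those options together, a minimal script preamble for a Sandy Bridge machine might look like the following sketch (it only collects the settings already shown above; which transformations actually pay off is form- and machine-dependent):

from firedrake import *

# COFFEE code-generation options collected from the discussion above.
parameters["coffee"]["compiler"] = "intel"   # Intel-specific compile flags
parameters["coffee"]["simd_isa"] = "avx"     # Sandy Bridge supports AVX
parameters["coffee"]["licm"] = True          # loop-invariant code motion
parameters["coffee"]["ap"] = True            # align/pad data for vectorisation

# ... mesh, function spaces and forms as usual ...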
Finally, regarding MKL: by default, none of the code Firedrake/PyOP2 generates uses BLAS at all, so if you want to exploit BLAS in the solver library you just need to compile PETSc appropriately. You can apply code transformations to the kernels to convert them into BLAS calls, but for low order it may not be much of a win (since we do not amortise the call overhead across elements). You can try it, though:
# BLAS transformations aren't exposed through the parameters dict, so:
from firedrake import *
op2.configuration["blas"] = "mkl"
...
You can also, if it's available, try Eigen instead:
op2.configuration["blas"] = "eigen"
Until now I've only used MPI. What do I need to do to run with OpenMP or CUDA, for instance? I've got the PyOP2 dependencies in place, as they are listed on the Firedrake website.

Running with OpenMP requires that you've built PETSc with "--with-threadcomm --with-openmp --with-pthreadclasses". You can then either do:
export PYOP2_BACKEND=openmp
# Or however your batch system wants you to do this.
export OMP_NUM_THREADS=...
mpiexec ... python script.py
or:
from firedrake import *
op2.init(backend='openmp')
...
Note that our current experience is that the OpenMP backend does not offer many (if any) performance improvements over just MPI mode; however, if you're solving big elliptic problems using AMG, you may wish to use it due to decreased memory pressure.
CUDA only works for a subset of firedrake functionality. In particular, the following currently don't work:
- Solving mixed systems
- Nonlinear solves
- Extruded meshes
- Solving linear systems with CUDA + MPI
Furthermore, for moderately complicated forms, our current strategy for parallelisation on GPUs is far from optimal (to the point where the compiler can often fail). Work is starting in this area, but it's only in its early stages.
Florian has been doing some recent benchmarking in this area, so perhaps he will comment in more detail.
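If you want to experiment with the CUDA backend on the supported subset, selection mirrors the OpenMP case above; a sketch (the backend name 'cuda' here is assumed by analogy with 'openmp', so check the PyOP2 documentation):

from firedrake import *

# Assumed backend name, by analogy with the OpenMP example above; must be
# selected before any parallel loops run.
op2.init(backend='cuda')

# ... then set up and solve one of the supported problem types
# (no mixed systems, nonlinear solves, extruded meshes, or CUDA + MPI).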
Also, does anyone have experience of using Intel MIC coprocessors with Firedrake?

I believe Fabio has done some work in this area, but I'm not sure if he did it running the full toolchain; perhaps he can comment.
Cheers,
Lawrence
On 7 Aug 2014, at 21:10, Tuomas Karna <tuomas.karna@gmail.com> wrote:
Hi Lawrence,
Thanks for these, this is great. On Stampede I can access the compilers in the devel queue, so this works like a charm.
Would you have any hints for improving parallel scalability? Right now things start to level out at 8 cores (2D shallow water with P2-P1 elements, 76k triangles).
There are a few things that can go wrong here:

1. Poor load balancing leading to very variable assembly times per process, but if you have a reasonable mesh that's unlikely to be a problem.

2. Choosing a solver/preconditioner combination that takes longer to converge in parallel than in serial (e.g. block-ILU). You can run with solver_parameters={'ksp_monitor': True} to see how many iterations you're taking (see the sketch after this list).

3. We're aware of performance issues in our mesh decomposition strategy when running in parallel (but your mesh is quite small, so I would not expect that to be a massive problem). Additionally, if you're running a large number of time steps, that is just a setup cost that should be amortised.

4. Running out of strong scaling. With 76k triangles you've got around 300k velocity degrees of freedom and 40k pressure dofs. A reasonable rule of thumb is that for implicit solvers you can strong scale to around 10k dofs per process (assuming the code is generally efficient), so I'd expect you to be able to see scalability a little beyond 8 processes. But see below for some potential advice on how to squeeze some more strong scaling out.
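For point 2, a minimal, self-contained sketch of passing PETSc options through solve to watch the iteration counts (a small Poisson problem stands in for the real system; the names and forms here are illustrative, not from your code):

from firedrake import *

# Stand-in problem: a small Poisson solve just to illustrate passing
# solver_parameters; swap in your own function spaces and forms.
mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)
f = Function(V)
f.assign(1.0)
a = inner(grad(u), grad(v))*dx
L = f*v*dx
uh = Function(V)
bc = DirichletBC(V, 0.0, (1, 2, 3, 4))

# 'ksp_monitor' prints the residual at every Krylov iteration, so you can
# compare iteration counts between serial and parallel runs.
solve(a == L, uh, bcs=bc,
      solver_parameters={'ksp_monitor': True})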
Also, is there a tool for timings other than PYOP2_PRINT_SUMMARY?
From our side, the timings summary we print is the only data available in the master branches.

You can see PETSc's view of the world by running with the -log_summary command line option:

mpiexec -n ... python foo.py -log_summary

This will give you an indication of whether things like the solve itself are scaling.

Running in serial, you can run with the Python profiler to see which parts of the code are taking time:

python -m cProfile -o profile.stats foo.py

You can then get a nice call graph, with percentage times in the different routines underneath main, by installing gprof2dot and running:

gprof2dot -f pstats profile.stats | xdot

Florian has been doing a reasonable amount of benchmarking recently, so he may be able to give more detailed advice on how to proceed.

Finally, a little more advice on amortising Python overhead, which is key for good strong scaling. We do a number of things to try and minimise the overhead involved in having a high-level symbolic representation, but you can write code in a few ways that will defeat (at least parts of) this setup.

If you have a time step loop, you should define all the forms (as much as possible) outside the loop. We cache the kernel that FFC generates, but computing the cache key can be somewhat expensive, so we also cache the kernel on the particular form object; if that changes, then we go down the more expensive route.

Furthermore, if you call the top-level "solve" routine inside a time stepping loop:

while t < T:
    solve(a == L, ...)

this builds and then tears down the solver and preconditioner every time step, as well as re-assembling the linear operator. Although we have a cache that notices that you're reassembling the same operator and just returns something from the cache, you still pay some overhead (and the price of setup and destruction of the solver object, which can be substantial, especially if the preconditioner is heavyweight).

To avoid this, you can instead build a solver object outside the time stepping loop and then just call the solve method on it inside:

problem = LinearVariationalProblem(a, L, u, bcs=..., ...)
solver = LinearVariationalSolver(problem, solver_parameters={...})

while t < T:
    solver.solve()

Note that in this latter case, when you're finally done with the solver you must call:

solver.destroy()

otherwise it will not be garbage collected.

Hope this helps.

Cheers,
Lawrence
Lawrence, On 08/08/2014 02:34 AM, Lawrence Mitchell wrote:
On 7 Aug 2014, at 21:10, Tuomas Karna <tuomas.karna@gmail.com> wrote:
Hi Lawrence,
Thanks for these, this is great. On Stampede I can access the compilers in the devel queue, so this works like a charm.
Would you have any hints for improving parallel scalability? Right now things start to level out at 8 cores (2D shallow water with P2-P1 elements, 76k triangles).
There are a few things that can go wrong here:
1. Poor load balancing leading to very variable assembly times per process, but if you have a reasonable mesh that's unlikely to be a problem.
2. Choosing a solver/preconditioner combination that takes longer to converge in parallel than in serial (e.g. block-ILU). You can run with solver_parameters={'ksp_monitor': True} to see how many iterations you're taking.
3. We're aware of performance issues in our mesh decomposition strategy when running in parallel (but your mesh is quite small, so I would not expect that to be a massive problem). Additionally, if you're running a large number of time steps, that is just a setup cost that should be amortised.
4. Running out of strong scaling. With 76k triangles you've got around 300k velocity degrees of freedom and 40k pressure dofs. A reasonable rule of thumb is that for implicit solvers you can strong scale to around 10k dofs per process (assuming the code is generally efficient), so I'd expect you to be able to see scalability a little beyond 8 processes. But see below for some potential advice on how to squeeze some more strong scaling out.

Load balancing may be an issue on this grid, haven't looked at that yet. I'm also seeing performance degradation over time in parallel, so I suspect there's a bug somewhere but haven't dug deep enough to find it.
Also, is there a tool for timings other than PYOP2_PRINT_SUMMARY?

From our side, the timings summary we print is the only data available in the master branches.
You can see PETSc's view of the world by running with the -log_summary command line option:
mpiexec -n ... python foo.py -log_summary
This will give you an indication of whether things like the solve itself are scaling.
Running in serial, you can run with the python profiler to see which parts of the code are taking time:
python -m cProfile -o profile.stats foo.py
You can then get a nice call graph, with percentage times in the different routines underneath main, by installing gprof2dot and running:
gprof2dot -f pstats profile.stats | xdot
Florian has been doing a reasonable amount of benchmarking recently, so he may be able to give more detailed advice on how to proceed.

Finally, a little more advice on amortising Python overhead, which is key for good strong scaling.
We do a number of things to try and minimise the overhead involved in having a high-level symbolic representation, but you can write code in a few ways that will defeat (at least parts of) this setup.
If you have a time step loop, you should define all the forms (as much as possible) outside the loop. We cache the kernel that FFC generates, but computing the cache key can be somewhat expensive, so we also cache the kernel on the particular form object; if that changes, then we go down the more expensive route. A sketch of this pattern follows.
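A short sketch of the difference (the space, forms and time-step values below are placeholders, not taken from your shallow-water code):

from firedrake import *

mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "CG", 1)
v = TestFunction(V)
un = Function(V)   # previous-step solution (placeholder)
f = Function(V)
f.assign(1.0)
dt, t, T = 0.01, 0.0, 0.1

# Good: the form is built once, outside the loop, so every assemble() finds
# the cached kernel via this single form object.
L_rhs = (un + dt*f)*v*dx
while t < T:
    b = assemble(L_rhs)
    # ... solve with b, update un ...
    t += dt

# Slower: constructing the form inside the loop makes a new form object each
# step, so the cheap object-keyed cache misses and the more expensive cache
# key must be recomputed every iteration.
t = 0.0
while t < T:
    b = assemble((un + dt*f)*v*dx)
    t += dt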
Furthermore, if you call the top-level "solve" routine inside a time stepping loop:
while t < T:
    solve(a == L, ...)
This builds and then tears down the solver and preconditioner every time step, as well as re-assembling the linear operator. Although we have a cache that notices that you're reassembling the same operator and just returns something from the cache, you still pay some overhead (and the price of setup and destruction of the solver object, which can be substantial especially if the preconditioner is heavyweight).
To avoid this, you can instead build a solver object outside the time stepping loop and then just call the solve method on it inside:
problem = LinearVariationalProblem(a, L, u, bcs=..., ...)
solver = LinearVariationalSolver(problem, solver_parameters={...})
while t < T:
    solver.solve()
Note that in this latter case, when you're finally done with the solver you must call:
solver.destroy()
otherwise it will not be garbage collected.

Good info. I had all the forms declared outside the time loop, but was using the high-level solve routine. That made a difference on run times.
Cheers, Tuomas
Hope this helps.
Cheers,
Lawrence
Dear Tuomas, a somewhat late reply. On 11/08/14 22:49, Tuomas Karna wrote: ...
Load balancing may be an issue on this grid, haven't looked at that yet. I'm also seeing performance degradation over time in parallel, so I suspect there's a bug somewhere but haven't dug deep enough to find it.
Eventually, we found this bug: there was a problem in the way we were preallocating matrices in parallel when running non-linear problems that had zero initial conditions, resulting in performance degradation when assembling Jacobians. If you get a chance to run with the current firedrake [352295c] and PyOP2 [adb4114] master branches, you should hopefully see that this has gone away.

Cheers,
Lawrence
Hi Tuomas, On 07/08/14 21:10, Tuomas Karna wrote:
Hi Lawrence,
Thanks for these, this is great. On Stampede I can access the compilers in the devel queue, so this works like a charm.
Would you have any hints for improving parallel scalability? Right now things start to level out at 8 cores (2D shallow water with P2-P1 elements, 76k triangles).
Are you strong scaling 2d shallow water on a 76k triangle mesh? If so, you get quite a small number of DOFs on 8 cores. Can you say which part isn't scaling? The assembly? The solve? Is your test case accessible somewhere? For further benchmarks, you can have a look at: https://github.com/firedrakeproject/firedrake-bench
Also, is there a tool for timings other than PYOP2_PRINT_SUMMARY?
We're still working on improving the profiling capabilities built into PyOP2/Firedrake. There's a few things you can do already [1]:

1) Define your own timers with timed_region or timed_function:

from pyop2.profiling import timed_region, timed_function

with timed_region("my code"):
    # my code

@timed_function("my function")
def my_func():
    ...

2) Access the timers programmatically:

from pyop2 import timing
timing("my timer")               # get total time
timing("my timer", total=False)  # get average time per call

Cheers,
Florian

[1]: https://github.com/OP2/PyOP2/wiki/%5Bguide%5D-Profiling#using-pyop2s-interna...
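As a small usage sketch, you could wrap your whole time loop in one of these regions and read the total back at the end (the mesh, loop body and region name below are placeholders):

from firedrake import *
from pyop2.profiling import timed_region
from pyop2 import timing

mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "CG", 1)
u = Function(V)
t, T, dt = 0.0, 0.1, 0.01

# Accumulate the cost of the whole loop under one named timer...
with timed_region("time loop"):
    while t < T:
        assemble(u*u*dx)   # stand-in for the real per-step work
        t += dt

# ...and query it programmatically afterwards.
print(timing("time loop"))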
Hi Florian, On 08/08/2014 02:53 AM, Florian Rathgeber wrote:
Hi Tuomas,
On 07/08/14 21:10, Tuomas Karna wrote:
Hi Lawrence,
Thanks for these, this is great. On Stampede I can access the compilers in the devel queue, so this works like a charm.
Would you have any hints for improving parallel scalability? Right now things start to level out at 8 cores (2D shallow water with P2-P1 elements, 76k triangles).

Are you strong scaling 2d shallow water on a 76k triangle mesh? If so, you get quite a small number of DOFs on 8 cores. Can you say which part isn't scaling? The assembly? The solve?
Is your test case accessible somewhere?

Yes, it will level out at some point, but I was expecting a bit better strong scaling. It seems that both the SNES solver and the ParLoop compute/kernel are taking most of the time and scaling equally poorly. Unfortunately the test scenario is not online at the moment.

For further benchmarks, you can have a look at: https://github.com/firedrakeproject/firedrake-bench
Also, is there a tool for timings other than PYOP2_PRINT_SUMMARY?

We're still working on improving the profiling capabilities built into PyOP2/Firedrake. There's a few things you can do already [1]:
1) Define your own timers with timed_region or timed_function
from pyop2.profiling import timed_region, timed_function
with timed_region("my code"):
    # my code
@timed_function("my function")
def my_func():
    ...
2) Access the timers programmatically:
from pyop2 import timing
timing("my timer")               # get total time
timing("my timer", total=False)  # get average time per call

Thanks, these are very useful.
Cheers, Tuomas
Cheers, Florian
[1]: https://github.com/OP2/PyOP2/wiki/%5Bguide%5D-Profiling#using-pyop2s-interna...
participants (3): Florian Rathgeber, Lawrence Mitchell, Tuomas Karna