Hi Tuomas,

Comments from Lawrence are correct.

As for optimizing assembly through COFFEE (a tool integrated with PyOP2 that optimizes the evaluation of local element matrices and vectors), it is true that, in general,

parameters["coffee"]["licm"] = True
parameters["coffee"]["ap"] = True

are the key parameters.

However, if your function spaces have relatively high polynomial order, or if your form has a lot of coefficients, then you may be interested in other optimizations as well. Please let me know if this is the case, and I'll try to be more precise.

I'm using low-order elements, so further optimization might not be relevant; I guess it depends on what you mean by "lots of coefficients". I'm running a 2D shallow water model with ~all the terms, so there is some complexity there.
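To compare run-times with and without the two COFFEE settings named above, it can help to switch them on for one block of code and restore them afterwards. The following is only a sketch: `coffee_options` is a hypothetical helper, not part of Firedrake or PyOP2, and a plain nested dict stands in for Firedrake's `parameters` object (assumed here to behave like a nested dict). The initial `False` values are arbitrary and are not claimed to be the library defaults.

```python
from contextlib import contextmanager

_MISSING = object()  # marks options that were absent before the override


@contextmanager
def coffee_options(parameters, **options):
    """Temporarily set parameters["coffee"][name] = value, then restore.

    Hypothetical helper for illustration; it only assumes that
    parameters["coffee"] behaves like a dict.
    """
    coffee = parameters["coffee"]
    saved = {name: coffee.get(name, _MISSING) for name in options}
    coffee.update(options)
    try:
        yield parameters
    finally:
        # Put every touched option back the way it was.
        for name, old in saved.items():
            if old is _MISSING:
                del coffee[name]
            else:
                coffee[name] = old


# Stand-in for Firedrake's parameters object (assumption, see above).
parameters = {"coffee": {"licm": False, "ap": False}}

with coffee_options(parameters, licm=True, ap=True):
    # Assemble and time the form here.
    inside = dict(parameters["coffee"])

after = dict(parameters["coffee"])
```

Inside the `with` block both options are set; on exit the previous values come back, so a timing run does not leak settings into the rest of the script.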
That said, our (short-term) goal is to let you (users) avoid playing with these sorts of "low-level parameters" altogether. Just to let you know, we are currently working on an autotuning system that, once in place, should allow you to get significantly better run-times without having to set and try individual optimizations manually. We are not far from that, so we'll keep you posted (it should take on the order of days).

Sounds great!
As for the MIC, we have some performance numbers for assembly "in isolation", but we (I) have never brought the whole toolchain to the MIC; that is, we have only used it as an accelerator. In particular, the code is *not* specifically optimized for this kind of architecture, especially if you are running at low polynomial order (in practice, we are not taking full advantage of the wide vector lanes).

OK, that makes sense.

As for MKL in assembly kernels, I'm currently running experiments to quantify how much faster we can go by transforming assembly code into a sequence of BLAS calls. The problem is that at low polynomial order the matrices involved are rather small. What I have seen so far (but these are early experiments) is that if you are using, say, polynomial order 3 or 4 (I'm being vague here, just to give you an idea) and if your form has some coefficients in it, then turning to BLAS may be a (big) win. I hope I'll be able to report more (convincing) results, and to be more precise, in the next few days.
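To give a rough idea of what "small" means here, the sketch below counts the basis functions of a scalar Lagrange P_k element on a triangle, which is (k+1)(k+2)/2, and from that the shape of the local element matrix when test and trial spaces coincide. The choice of element is an assumption made purely for illustration; the thread does not say which elements the shallow water model uses, and the function names are hypothetical.

```python
def p_k_triangle_dofs(degree):
    """Number of basis functions of scalar Lagrange P_k on a triangle."""
    return (degree + 1) * (degree + 2) // 2


def local_matrix_shape(degree):
    """Shape of the element matrix when test and trial space are both P_k."""
    n = p_k_triangle_dofs(degree)
    return (n, n)


# Print the local matrix shape for polynomial orders 1 to 4.
for k in range(1, 5):
    print(k, local_matrix_shape(k))
```

Under these assumptions the element matrix is 3 x 3 at order 1 and 15 x 15 at order 4, so per-element work at low order is tiny by BLAS standards.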
Hope this helps

-- Fabio
2014-08-07 7:31 GMT+01:00 Tuomas Karna <tuomas.karna@gmail.com>:
Hi all,
I'm running my code on TACC Stampede and I have a couple of questions:
How can I set up Firedrake/PyOP2 for the target machine, for example to
use Intel compilers, the target architecture's (Sandy Bridge) instruction set,
or Intel MKL libraries?
Until now I've only used MPI. What do I need to do to run with OpenMP or
CUDA, for instance? I've got the PyOP2 dependencies in place, as they are
listed on the Firedrake website.
Also, does anyone have experience using Intel MIC coprocessors with
Firedrake?
Cheers,
Tuomas
_______________________________________________
firedrake mailing list
firedrake@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/firedrake