Hi Tuomas

Lawrence's comments are correct.

Regarding optimizing assembly through COFFEE (a tool integrated with PyOP2 that optimizes the evaluation of local element matrices/vectors), it is true that in general

parameters["coffee"]["licm"] = True
parameters["coffee"]["ap"] = True

are the key parameters. However, if your function spaces have relatively high polynomial order *or* your form has a lot of coefficients, then you may be interested in other optimizations as well. Please let me know if this is the case, and I'll try to be more precise.

That said, our (short-term) goal is to let you (users) avoid playing with these sorts of "low-level parameters" entirely. We are currently working on an autotuning system that, once in place, should give you significantly better run-times without having to set/try individual optimizations manually. We are not far from that, so we'll keep you posted (it should take on the order of days).

As for the MIC, we have some performance numbers for assembly "in isolation", but we (I) have never brought the whole toolchain onto the MIC - that is, we have only used it as an accelerator. In particular, the code is *not* specifically optimized for this kind of architecture, especially if you are running at low polynomial order (in practice, we are not taking full advantage of the large vector lanes).

As for MKL in assembly kernels, I'm running experiments these days to quantify how much faster we can go by transforming assembly code into a sequence of BLAS calls. The problem is that at low polynomial order the matrices involved are rather small. What I have seen so far (these are early experiments, and I'm being vague here just to give you an idea) is that if you are using, say, polynomial order 3 or 4 and your form has some coefficients in it, then turning to BLAS may be a (big) win.
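[Editor's note: the two options above can be set programmatically. A minimal sketch, using a plain dict to stand in for Firedrake's parameters object so the snippet is self-contained; the option names are the ones quoted in this thread and may differ in later versions.]

```python
# Self-contained stand-in for Firedrake's `parameters` object
# (in Firedrake itself you would import `parameters` from firedrake;
# option names here are taken from this thread and may have changed).
parameters = {"coffee": {"licm": False, "ap": False}}

# The two key COFFEE optimizations mentioned above:
parameters["coffee"]["licm"] = True  # loop-invariant code motion
parameters["coffee"]["ap"] = True    # data alignment/padding (assumed expansion of "ap")

assert parameters["coffee"] == {"licm": True, "ap": True}
```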
I hope I'll be able to report more (convincing) results, and to be more precise, in the next few days.

Hope this helps

-- Fabio

2014-08-07 7:31 GMT+01:00 Tuomas Karna <tuomas.karna@gmail.com>:
Hi all,
I'm running my code on TACC Stampede and I have a couple of questions:
How can I set up Firedrake/PyOP2 for the target machine, for example to use the Intel compilers, the target architecture (Sandy Bridge) instruction set, or the Intel MKL libraries?
Until now I've only used MPI. What do I need to do to run with OpenMP or CUDA, for instance? I've got the PyOP2 dependencies in place, as they are listed on the Firedrake website.
Also, does anyone have experience on using Intel MIC coprocessors with Firedrake?
Cheers,
Tuomas
_______________________________________________ firedrake mailing list firedrake@imperial.ac.uk https://mailman.ic.ac.uk/mailman/listinfo/firedrake
Hi Fabio,

Thanks for the info, very useful.

On 08/07/2014 01:52 AM, Fabio Luporini wrote:
Hi Tuomas
Lawrence's comments are correct.
Regarding optimizing assembly through COFFEE (a tool integrated with PyOP2 that optimizes the evaluation of local element matrices/vectors), it is true that in general

parameters["coffee"]["licm"] = True
parameters["coffee"]["ap"] = True

are the key parameters. However, if your function spaces have relatively high polynomial order *or* your form has a lot of coefficients, then you may be interested in other optimizations as well. Please let me know if this is the case, and I'll try to be more precise.
I'm using low-order elements, so further optimization might not be relevant; I guess it depends on what you mean by "a lot of coefficients". I'm running a 2D shallow water model with ~all the terms, so there is some complexity there.
That said, our (short-term) goal is to let you (users) avoid playing with these sorts of "low-level parameters" entirely. We are currently working on an autotuning system that, once in place, should give you significantly better run-times without having to set/try individual optimizations manually. We are not far from that, so we'll keep you posted (it should take on the order of days).
Sounds great!
As for the MIC, we have some performance numbers for assembly "in isolation", but we (I) have never brought the whole toolchain onto the MIC - that is, we have only used it as an accelerator. In particular, the code is *not* specifically optimized for this kind of architecture, especially if you are running at low polynomial order (in practice, we are not taking full advantage of the large vector lanes).
As for MKL in assembly kernels, I'm running experiments these days to quantify how much faster we can go by transforming assembly code into a sequence of BLAS calls. The problem is that at low polynomial order the matrices involved are rather small. What I have seen so far (these are early experiments, and I'm being vague here just to give you an idea) is that if you are using, say, polynomial order 3 or 4 and your form has some coefficients in it, then turning to BLAS may be a (big) win. I hope I'll be able to report more (convincing) results, and to be more precise, in the next few days.

OK, that makes sense.
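[Editor's note: to make the "kind of small" point concrete, here is an illustrative sketch of local element matrix sizes for Lagrange elements on triangles. These are standard FEM degree-of-freedom counts, not numbers from Firedrake or from the experiments described above.]

```python
# A Lagrange element of order k on a triangle has (k+1)(k+2)/2 degrees
# of freedom, so the local element matrix is ndofs x ndofs. At low order
# these matrices are tiny, and the overhead of a BLAS call is hard to
# amortize; by order 3-4 they are large enough that BLAS may start to win.
def ndofs_triangle(k):
    return (k + 1) * (k + 2) // 2

for k in range(1, 5):
    n = ndofs_triangle(k)
    print(f"P{k}: local matrix {n}x{n}")
# P1: 3x3, P2: 6x6, P3: 10x10, P4: 15x15
```

A 3x3 matrix-matrix product is far too small to benefit from a tuned BLAS, which matches the observation above that the win only appears around polynomial order 3 or 4.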
Cheers, Tuomas
Hope this helps
-- Fabio
2014-08-07 7:31 GMT+01:00 Tuomas Karna <tuomas.karna@gmail.com <mailto:tuomas.karna@gmail.com>>:
Hi all,
I'm running my code on TACC Stampede and I have a couple of questions:
How can I set up Firedrake/PyOP2 for the target machine, for example to use the Intel compilers, the target architecture (Sandy Bridge) instruction set, or the Intel MKL libraries?
Until now I've only used MPI. What do I need to do to run with OpenMP or CUDA, for instance? I've got the PyOP2 dependencies in place, as they are listed on the Firedrake website.
Also, does anyone have experience on using Intel MIC coprocessors with Firedrake?
Cheers,
Tuomas
participants (2)
- Fabio Luporini
- Tuomas Karna