Hi Spencer, hi Douglas,

Indeed, using Collections (auto => SumFac) definitely helps:
Total Computation Time = 95s (180s before)

I noticed that the code runs single-threaded, even though Boost_USE_MULTITHREADED is ON and OpenBLAS is compiled as multi-threaded. Any hints on that?

Cheers,
Helmut

________________________________
From: Sherwin, Spencer J [s.sherwin@imperial.ac.uk]
Sent: Wednesday, 18 May 2016 10:25
To: Kühnelt Helmut
Cc: nektar-users
Subject: Re: [Nektar-users] EulerCFE - performance issue & MPI problem

Hi Helmut,

I am building in Release mode and nothing else in particular. I happen to have MPI turned on, but I do not think that will make a difference.

One thing that might be worth checking is the use of OpenBLAS. I have had trouble with OpenBLAS in the past for small matrix sizes. The compilation I used will have been using the default installation in /usr/lib/libblas.a. I guess Douglas’ test will have used the Framework of Mac OS X. However, looking at the setup on the compute module, I do seem to have OpenBLAS loaded (@Chris: could you confirm what version of OpenBLAS is running on Victoria?).

Other configurations are: library gcc 4.9.2 boost 1.58 - installed from a module

Do you have another machine/laptop we could run this test on?

Cheers,
Spencer.

PS I had a time step of 1e-6 and attach my .xml file below