Hi all,

I have some personal computers which give me roughly the same serial performance (in terms of wall-clock time) as a single compute node (via interactive mode, e.g., salloc -N 1 -n 16) on one of LANL's machines (Intel Xeon E5-2670).

However, once I submit my Firedrake program as a job script (via MOAB) to said LANL machine, the performance becomes 10x faster! Certainly not complaining, but I find this unusual. I never encountered such a performance improvement when submitting a normal PETSc program.

Does this have something to do with the internal framework of Firedrake? I can provide more information as needed.

Thanks,
Justin
Hi Justin,
I can think of a few possible things that might be going on, but these are all somewhat speculative.

1. The compiler on your local machine is older than the one at LANL. In particular, LANL may provide gcc 5.x, which is vastly better at vectorizing code than gcc 4.x. This can make a real difference in speed if your problem is assembly (and in particular, flop) dominated.

2. The compute nodes have much better memory bandwidth than your desktop machine. As a corollary, the batch job scheduler may perform process pinning, whereas on your desktop (or in interactive mode) this may not be the case.

3. You're not actually computing anything? Firedrake's default "lazy" evaluation strategy can lead to some "faster than seemingly possible" timings if you use explicit methods (no solve calls) and never look at the results: we don't actually compute the answer!

4. "Magic fairy dust"?

More seriously, can you provide the output of "-log_view" on both systems? Firedrake hooks up its internal evaluation routines with PETSc event timing, so you get flop counts and times for those. To separate the result into different parts you can use PETSc log stages (either directly via petsc4py, or else):

    from pyop2.profiling import timed_stage

    with timed_stage("stage name"):
        # some code

Cheers,

Lawrence
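For reference, here is a minimal sketch of how the suggestions above might fit together in practice. Everything below (the Poisson problem, mesh size, and variable names) is illustrative and not taken from the original thread; the only pieces that are come from it are timed_stage and the "-log_view" option. Running the script as, for example, "python script.py -log_view" prints per-stage event timings and flop counts when the program exits.

    from firedrake import *
    from pyop2.profiling import timed_stage

    # Group the setup work into its own PETSc log stage.
    with timed_stage("Setup"):
        mesh = UnitSquareMesh(64, 64)
        V = FunctionSpace(mesh, "CG", 1)
        u = TrialFunction(V)
        v = TestFunction(V)
        f = Constant(1.0)
        a = inner(grad(u), grad(v))*dx
        L = f*v*dx
        bc = DirichletBC(V, Constant(0.0), (1, 2, 3, 4))
        uh = Function(V)

    # Time the solve separately so it shows up as its own stage in -log_view.
    with timed_stage("Solve"):
        solve(a == L, uh, bcs=bc)

    # Looking at the result (point 3 above): computing the norm forces any
    # lazily deferred work to actually run before the timings are reported.
    print("||u|| = %g" % norm(uh))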
Silly me, the "problem" is none of the above. I was accidentally running my programs with 16 MPI processes in my job script, hence the 10x improvement.

Thanks though!
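For anyone who hits something similar, a cheap safeguard is to have the script report how many MPI ranks it was actually launched with. This is a small sketch (not from the thread) assuming mpi4py, which Firedrake already requires; the core-affinity line is an extra check related to the process-pinning point in Lawrence's reply and only runs where os.sched_getaffinity is available (Linux, Python 3).

    import os

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Report the number of ranks the job script actually launched, so an
    # unintended "mpirun -n 16" (or a forgotten one) is obvious in the output.
    if comm.rank == 0:
        print("Running on %d MPI process(es)" % comm.size)

    # Where available, also show which cores this rank is allowed to run on,
    # which hints at whether the scheduler is pinning processes.
    if hasattr(os, "sched_getaffinity"):
        print("Rank %d bound to cores %s" % (comm.rank, sorted(os.sched_getaffinity(0))))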