Submitting a firedrake job script

3 Oct 2015

Hi all,

In another thread I described a long-running issue with firedrake on an HPC
machine. I *think* I have found part of the problem.

Every time I submit a job script to a compute node, it has to (re)compile
all the FFC/OP2 forms. To work around this, I modified the job script so
that it runs my firedrake program twice: first with a very small mesh (to
compile everything that is needed) and then again to simulate my actual
finite element problem. The first run uses a single MPI process, so if the
subsequent run happens on that same node, I have no issues. However, if the
job spans two different compute nodes, my program freezes. I suspect this is
because the ranks on the original node have the cached/compiled forms
whereas the ranks on the other node do not, which causes the hang.
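A minimal sketch of the two-pass workaround described above, written as a Python driver that a job script could call. The script name, the `--mesh-size` flag, and the use of `mpiexec -n` as the launcher are assumptions for illustration; the helpers `build_commands` and `run_two_pass` are hypothetical.

```python
import subprocess
import sys


def build_commands(script, nprocs, warmup_args, run_args, launcher="mpiexec"):
    """Return the warm-up and production command lines as argument lists."""
    # Warm-up: a single process on a tiny mesh, only to populate the form caches.
    warmup = [launcher, "-n", "1", sys.executable, script, *warmup_args]
    # Production: the real problem on all requested processes.
    production = [launcher, "-n", str(nprocs), sys.executable, script, *run_args]
    return warmup, production


def run_two_pass(script, nprocs, warmup_args, run_args):
    """Run the warm-up pass, then the production pass; stop if either fails."""
    for cmd in build_commands(script, nprocs, warmup_args, run_args):
        subprocess.run(cmd, check=True)
```

For example, `run_two_pass("my_problem.py", 2, ["--mesh-size", "4"], ["--mesh-size", "512"])` (hypothetical script and flags) would warm up on one process and then launch two. The warm-up only helps ranks that can read the cache it wrote: if the cache lives in node-local storage, ranks placed on a second node still start cold, which would match the suspected cause of the hang.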

All that said, I have two questions:

1) If I already have the compiled/cached forms of my firedrake program, how
do I make it so that when I submit a job, the new compute node does not
need to recompile my program?

2) Attached is an output summary of what happens when both runs span two
compute nodes (#MSUB -l nodes=2:ppn=1). Can you dissect what is going on
from this?
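One possible direction for question 1, sketched under assumptions: if the compiled-form caches default to a node-local temporary directory, pointing them at a directory on a filesystem that every compute node mounts should let later jobs and ranks on other nodes reuse them. The environment variable names `PYOP2_CACHE_DIR` and `FIREDRAKE_FFC_KERNEL_CACHE_DIR` are assumed here and should be checked against the installed PyOP2 and firedrake versions; `use_shared_caches` is a hypothetical helper, and it would have to run before firedrake is imported.

```python
import os
from pathlib import Path


def use_shared_caches(shared_root):
    """Point the (assumed) PyOP2/firedrake cache variables at a shared directory.

    Call this before `import firedrake`; the assumption is that the cache
    locations are read from the environment at import time.
    """
    root = Path(shared_root).expanduser().resolve()
    # Assumed variable names: verify them against your installed versions.
    dirs = {
        "PYOP2_CACHE_DIR": root / "pyop2",
        "FIREDRAKE_FFC_KERNEL_CACHE_DIR": root / "ffc",
    }
    for var, path in dirs.items():
        # exist_ok=True so that several ranks may call this concurrently.
        path.mkdir(parents=True, exist_ok=True)
        os.environ[var] = str(path)
    return {var: str(path) for var, path in dirs.items()}
```

Hypothetical usage would be `use_shared_caches("~/firedrake-cache")` at the top of the program, or exporting the same variables in the job script. Whether these caches are safe to share between nodes writing concurrently is also an assumption worth verifying.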

Thanks,
Justin

Justin Chang
