You probably want something like the following in your job submission
script:
PYOP2_CACHE_DIR=$WORK/pyop2-cache
export PYOP2_CACHE_DIR
FIREDRAKE_FFC_KERNEL_CACHE_DIR=$WORK/firedrake-ffc-cache
export FIREDRAKE_FFC_KERNEL_CACHE_DIR
where $WORK is visible from all nodes. Otherwise I think the default
location for these folders is /tmp, which is probably local to each node.
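Putting those lines together, a minimal job-script preamble might look like
the sketch below. The `$WORK` fallback and the `mkdir -p` step are my
additions (assumptions, not part of any particular site's setup) -- substitute
your machine's real shared scratch/work path:

```shell
#!/bin/sh
# Point PyOP2's and Firedrake's FFC kernel caches at a filesystem visible
# from every compute node, instead of the node-local /tmp default.
# $WORK stands for your site's shared work directory (an assumption --
# the fallback below only exists so this sketch runs standalone).
WORK=${WORK:-$(mktemp -d)}

PYOP2_CACHE_DIR=$WORK/pyop2-cache
export PYOP2_CACHE_DIR
FIREDRAKE_FFC_KERNEL_CACHE_DIR=$WORK/firedrake-ffc-cache
export FIREDRAKE_FFC_KERNEL_CACHE_DIR

# Create the directories up front so the first compilation can write to them.
mkdir -p "$PYOP2_CACHE_DIR" "$FIREDRAKE_FFC_KERNEL_CACHE_DIR"
```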
Your issue 2) seems to be related to using Firedrake with prefork; Lawrence
can say more here. I think the 'solution' is to revert some Firedrake
commits :-\
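For reference, the two-run warm-up pattern Justin describes below works
across nodes once the caches live on a shared filesystem: the small first
run populates the cache, and the full run on every node finds it warm. A
hedged sketch (the launcher, script name, and flags are illustrative
assumptions, not Firedrake conventions):

```shell
#!/bin/sh
# Both runs see the same shared cache directories.
export PYOP2_CACHE_DIR=$WORK/pyop2-cache
export FIREDRAKE_FFC_KERNEL_CACHE_DIR=$WORK/firedrake-ffc-cache

# Warm-up: one rank, tiny mesh, populates the shared kernel caches.
# (my_problem.py and --mesh-size are hypothetical names.)
mpiexec -n 1 python my_problem.py --mesh-size 4

# Real run: all ranks on all nodes reuse the compiled kernels.
mpiexec -n 16 python my_problem.py --mesh-size 512
```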
On 3 October 2015 at 10:21, Justin Chang <jychang48(a)gmail.com> wrote:
> Hi all,
>
> In another thread I had a long running issue with firedrake on an HPC
> machine. I *think* I found part of the problem.
>
> Every time I submit a job script to a compute node, it has to (re)compile
> all the FFC/OP2 forms. What I did to circumvent this problem was modify the
> job script so that I run my firedrake program twice: once with a very small
> mesh (to compile all the necessities) and again to simulate my actual
> finite element problem. In the first run, I compiled the code on a single
> MPI process, so if the subsequent run is performed on that same node, I
> have no issues. However, if it has to run on two different compute nodes,
> my program freezes, because I suspect that the ranks on the original node
> have the cached/compiled forms whereas the other node does not, hence the
> hang.
>
> All that said, I have two questions:
>
> 1) If I already have the compiled/cached forms of my firedrake program,
> how do I make it so that when I submit a job, the new compute node does not
> need to recompile my program?
>
> 2) Attached is an output summary of what happens when both runs are spread
> across two compute nodes (#MSUB -l nodes=2:ppn=1). Can you guys dissect
> what's going on from this?
>
> Thanks,
> Justin
>