Re: [firedrake] Submitting a firedrake job script
You probably want something like the following in your job submission script:

PYOP2_CACHE_DIR=$WORK/pyop2-cache
export PYOP2_CACHE_DIR
FIREDRAKE_FFC_KERNEL_CACHE_DIR=$WORK/firedrake-ffc-cache
export FIREDRAKE_FFC_KERNEL_CACHE_DIR

where $WORK is visible from all nodes. Otherwise I think the default location for these folders is /tmp, which is probably local to each node.

Your issue 2) seems to be related to using firedrake with prefork; Lawrence can say more here. I think the 'solution' is to revert some Firedrake commits :-\

On 3 October 2015 at 10:21, Justin Chang <jychang48@gmail.com> wrote:
Hi all,
In another thread I had a long running issue with firedrake on an HPC machine. I *think* I found part of the problem.
Every time I submit a job script to a compute node, it has to (re)compile all the FFC/OP2 forms. What I did to circumvent this problem was modify the job script so that I run my firedrake program twice: once with a very small mesh (to compile all the necessities) and again to simulate my actual finite element problem. In the first run, I compiled the code on a single MPI process, so if the subsequent run is performed on that same node, I have no issues. However, if it has to run on two different compute nodes, my program freezes; I suspect that the ranks on the original node have the cache/compiled forms whereas the other node does not, hence the hang.
All that said, I have two questions:
1) If I already have the compiled/cached forms of my firedrake program, how do I make it so that when I submit a job, the new compute node does not need to recompile my program?
2) Attached is an output summary of what happens when both runs are across two compute nodes (#MSUB -l nodes=2:ppn=1). Can you guys dissect what's going on from this?
Thanks, Justin
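A minimal sketch of the kind of job script Andrew describes, assuming a Moab/PBS-style scheduler (matching the #MSUB directive mentioned above), that $WORK is a shared filesystem visible from every compute node, and a hypothetical Firedrake program called simulation.py:

    #!/bin/bash
    #MSUB -l nodes=2:ppn=1

    # Put the PyOP2 and FFC kernel caches on the shared filesystem so that
    # kernels compiled by one node can be found by every other node.
    PYOP2_CACHE_DIR=$WORK/pyop2-cache
    export PYOP2_CACHE_DIR
    FIREDRAKE_FFC_KERNEL_CACHE_DIR=$WORK/firedrake-ffc-cache
    export FIREDRAKE_FFC_KERNEL_CACHE_DIR

    # The launcher name (mpiexec/mpirun/aprun) depends on the machine.
    mpiexec -n 2 python simulation.py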
Ah yes, that's exactly what I was missing: those environment variables. Is there one for firedrake-mesh-cache-*? I ask because export FIREDRAKE_MESH_CACHE_DIR=... did not work and it fell back to /tmp.

Issue 2) remains, though it's not as much of a concern now if I "preprocess" the cache on a single compute node.

On Sat, Oct 3, 2015 at 4:13 AM, Andrew McRae <A.T.T.McRae@bath.ac.uk> wrote:
You probably want something like the following in your job submission script:
PYOP2_CACHE_DIR=$WORK/pyop2-cache
export PYOP2_CACHE_DIR
FIREDRAKE_FFC_KERNEL_CACHE_DIR=$WORK/firedrake-ffc-cache
export FIREDRAKE_FFC_KERNEL_CACHE_DIR
where $WORK is visible from all nodes. Otherwise I think the default location for these folders is /tmp, which is probably local to each node.
Your issue 2) seems to be related to using firedrake with prefork; Lawrence can say more here. I think the 'solution' is to revert some Firedrake commits :-\
On 3 October 2015 at 10:21, Justin Chang <jychang48@gmail.com> wrote:
Hi all,
In another thread I had a long running issue with firedrake on an HPC machine. I *think* I found part of the problem.
Every time I submit a job script to a compute node, it has to (re)compile all the FFC/OP2 forms. What I did to circumvent this problem was modify the job script so that I run my firedrake program twice: once with a very small mesh (to compile all the necessities) and again to simulate my actual finite element problem. In the first run, I compiled the code on a single MPI process, so if the subsequent run is performed on that same node, I have no issues. However, if it has to run on two different compute nodes, my program freezes; I suspect that the ranks on the original node have the cache/compiled forms whereas the other node does not, hence the hang.
All that said, I have two questions:
1) If I already have the compiled/cached forms of my firedrake program, how do I make it so that when I submit a job, the new compute node does not need to recompile my program?
2) Attached is an output summary of what happens when both runs are across two compute nodes (#MSUB -l nodes=2:ppn=1). Can you guys dissect what's going on from this?
Thanks, Justin
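One way to script the two-stage workflow Justin describes, a small single-rank run to populate the shared caches followed by the full parallel run, might look like the sketch below. The script name simulation.py and the --small-mesh flag are hypothetical stand-ins, and the caveat about issue 2) (the prefork-related hang) still applies:

    #!/bin/bash
    #MSUB -l nodes=2:ppn=1

    # Shared cache locations, visible from all compute nodes.
    PYOP2_CACHE_DIR=$WORK/pyop2-cache
    export PYOP2_CACHE_DIR
    FIREDRAKE_FFC_KERNEL_CACHE_DIR=$WORK/firedrake-ffc-cache
    export FIREDRAKE_FFC_KERNEL_CACHE_DIR

    # Stage 1: warm the kernel caches on a single rank with a tiny mesh
    # (the --small-mesh flag is a placeholder for however the program
    # selects its problem size).
    mpiexec -n 1 python simulation.py --small-mesh

    # Stage 2: run the real problem; compiled kernels come from the shared
    # caches, so neither node should need to recompile.
    mpiexec -n 2 python simulation.py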