Hi Lawrence,

thanks, I will go through my code and replace all constants with PyOP2 Constants. How can I check that this worked? Can I set PYOP2_DUMP_GENCODE=1 and PYOP2_DUMP_GENCODE_PATH=./build, run with two different resolutions, and then check whether the second run generated any new files?

Thanks,
Eike

On 02/02/15 11:57, Lawrence Mitchell wrote:
Hi Eike,
On 2 Feb 2015, at 09:09, Eike Mueller <E.Mueller@bath.ac.uk> wrote:
Hi Lawrence,
Below are the first weak-scaling results from runs at lowest order on up to 96 cores on ARCHER. On 384 cores the code crashes with a PETSc error (segfault). This crash already occurs in the matrix-free solver (which, of course, uses a PETSc KSP).
Could this be an issue with the Python module that launches the compilation and loads the kernels in PyOP2? However, on Friday I ran with PYOP2_NO_FORK_AVAILABLE=1, which I thought would fix this. If I run with PYOP2_NO_FORK_AVAILABLE=0, it crashes with a different error because it can't compile a kernel.
This morning I repeated exactly the same 384-core run (with PYOP2_NO_FORK_AVAILABLE=1, as before), and now it goes through without problems (i.e. it completes both the matrix-free and the PETSc solve).
I observe something similar with the 1536-core run: the first run crashed; in subsequent runs the matrix-free solver completes, but the code crashes later, when it reaches the PETSc solve.
I then set PYOP2_DEBUG=1 in the 1536-core run, and again it fails because it can't compile code. The resulting .err file is empty. I then ran the compilation command from the .log file manually. It goes through, but with a warning, which I attach together with the output of the run.
My experience of runs on ARCHER is that our JIT compilation can occasionally fail, especially on "lots" of cores. PYOP2_NO_FORK_AVAILABLE=1 helps a bit, but isn't perfect. TBH, I don't really have any ideas as to why this might be the case: as you observe, sometimes things work a little better.
If you can set up the problem such that a "small" run populates the code caches fully (so that when running large jobs you don't need to compile any modules), that seems to work best. This mostly involves replacing literal constants in forms/expressions with Constant(value). That way the same code is generated irrespective of the value, and you therefore don't need to recompile.
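To illustrate why literal constants defeat the cache, here is a toy model (not PyOP2's actual caching code) of a code cache keyed on the generated source. The `Constant` class, `generate_kernel` and `CodeCache` below are hypothetical stand-ins: a literal value is baked into the kernel source, so every new value yields new source and a new compilation, while a runtime constant is passed as a kernel argument and the source stays identical.

```python
import hashlib


class Constant:
    """Hypothetical stand-in for a runtime constant (NOT firedrake.Constant):
    its value is passed to the kernel as an argument, so it never appears in
    the generated source."""

    def __init__(self, value):
        self.value = value


def generate_kernel(coefficient):
    """Toy code generator: literals are baked in, Constants become a parameter."""
    if isinstance(coefficient, Constant):
        return "void k(double *a, const double *c) { a[0] *= c[0]; }"
    return f"void k(double *a) {{ a[0] *= {coefficient!r}; }}"


class CodeCache:
    """Toy cache keyed on a hash of the generated source."""

    def __init__(self):
        self._compiled = {}
        self.compilations = 0

    def get(self, source):
        key = hashlib.sha1(source.encode()).hexdigest()
        if key not in self._compiled:
            # A real JIT would invoke the C compiler here (the expensive,
            # failure-prone step on many cores).
            self.compilations += 1
            self._compiled[key] = source
        return self._compiled[key]


if __name__ == "__main__":
    resolutions = (0.1, 0.05, 0.025)

    literal_cache = CodeCache()
    for dx in resolutions:
        literal_cache.get(generate_kernel(dx))
    print(literal_cache.compilations)  # 3: one compile per distinct value

    constant_cache = CodeCache()
    for dx in resolutions:
        constant_cache.get(generate_kernel(Constant(dx)))
    print(constant_cache.compilations)  # 1: same source for every value
```

In this model a small run that touches every kernel once is enough to make all later runs cache hits, which is the effect the suggestion above relies on.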
Lawrence
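For Eike's question about checking whether a second run generated new code, one possible approach is sketched below. It assumes that PYOP2_DUMP_GENCODE=1 writes one file per generated kernel into PYOP2_DUMP_GENCODE_PATH, and that the dumped source for a given kernel is byte-for-byte identical between runs; both are assumptions, not confirmed here. The helpers `code_hashes` and `newly_generated` are hypothetical. Dump the small and the large run into separate directories, then list the large-run files whose contents never appeared in the small run.

```python
import hashlib
import os


def code_hashes(dump_dir):
    """Map the SHA-1 of each dumped file's contents to its path."""
    hashes = {}
    for root, _dirs, names in os.walk(dump_dir):
        for name in names:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                hashes[hashlib.sha1(f.read()).hexdigest()] = path
    return hashes


def newly_generated(before, after):
    """Paths in `after` whose contents do not occur anywhere in `before`."""
    return sorted(path for digest, path in after.items() if digest not in before)


if __name__ == "__main__":
    import sys

    # Usage: python check_gencode.py <small-run dump dir> <large-run dump dir>
    small, large = code_hashes(sys.argv[1]), code_hashes(sys.argv[2])
    new = newly_generated(small, large)
    for path in new:
        print("new kernel:", path)
    print("OK: no new code" if not new else f"{len(new)} new file(s)")
```

Comparing contents rather than file names keeps the check independent of how the dump files happen to be named; if the dumped files embed run-specific details (paths, timestamps), this check would report false positives.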