Projection on DG0 space fails on large core counts
Dear firedrakers,

I have now re-run my code on up to 1536 cores on ARCHER, but I get a problem when I try to project an expression onto a DG0 function space on an extruded grid. The full (very large) log is here:

https://gist.github.com/eikehmueller/83a5fc139e1fedb5306c

As far as I can tell, the following call crashes:

    r_p.project(expression, solver_parameters={'ksp_type': 'cg', 'pc_type': 'jacobi'})

Here is the relevant part of the trace, which I attempted to reconstruct:

    File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/function.py", line 157, in project
      return projection.project(b, self, *args, **kwargs)
    File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/projection.py", line 94, in project
      […]
      solving_utils.check_snes_convergence(self.snes)
    File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/variational_solver.py", line 163, in solve
    File "/work/n02/n02/eike/git_workspace/PyOP2/pyop2/profiling.py", line 199, in wrapper
      %s""" % (snes.getIterationNumber(), msg))
    File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/solving_utils.py", line 62, in check_snes_convergence
      return f(*args, **kwargs)
    RuntimeError: Nonlinear solve failed to converge after 1 nonlinear iterations.

It works fine on smaller core counts. Maybe the PETSc integers are overflowing again: the number of cells is 5242880 x 64 = 335544320 ~ 2^{28}, which is not too far from 2^{32}. I thought I'd check in case you've seen something similar before. I thought I had managed to run problems of this size in the past (i.e. earlier this year).

Thanks,
Eike
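The overflow guess can be sanity-checked with a few lines of arithmetic. The sketch below assumes that 5242880 is the number of cells in the horizontal mesh and 64 the number of vertical layers, and that PETSc was built with 32-bit signed indices; the thread confirms neither. It only compares the cell count (which equals the DG0 degree-of-freedom count) with the int32 limit. Other index ranges in the code, such as matrix nonzeros or larger function spaces, may be bigger and are not covered.

```python
import math

# Numbers quoted in the post; their interpretation is an assumption.
cells_horizontal = 5242880   # assumed: cells in the horizontal mesh
layers = 64                  # assumed: vertical layers of the extruded grid
total_cells = cells_horizontal * layers

# Largest value of a signed 32-bit integer, the relevant bound if PETSc
# uses 32-bit indices (assumed here, not stated in the thread).
int32_max = 2**31 - 1

log2_cells = math.log2(total_cells)
headroom = int32_max / total_cells

print(f"total cells       = {total_cells}")
print(f"log2(total cells) = {log2_cells:.2f}")
print(f"fits in int32     = {total_cells <= int32_max}")
print(f"headroom factor   = {headroom:.1f}")
```

Under these assumptions the cell count alone stays below the signed 32-bit limit, so an overflow would have to come from some larger quantity.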
On 30/05/15 19:22, Eike Mueller wrote:
> [...]
So the other potentially useful piece of information is that this solver failed to converge:

    "Inner linear solve failed to converge after 0 iterations with reason: DIVERGED_NANORINF"

This means that the initial residual of the projection you were attempting had a norm that was either NaN or Inf, i.e. assemble(expression*DG0_test_function*dx) contained a NaN or Inf.

Does this help? Otherwise I'm pretty stumped.

Lawrence
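One way to follow up on this diagnosis is to inspect the assembled right-hand side for non-finite entries before calling the solver. The sketch below is plain Python and does not call Firedrake: summarize_nonfinite and l2_norm are hypothetical helper names, and the sample lists stand in for the process-local values of the assembled vector. In Firedrake those values would presumably be read from something like assemble(expression*DG0_test_function*dx).dat.data_ro; that usage is an assumption and is not exercised here.

```python
import math

def summarize_nonfinite(values):
    """Count NaN and infinite entries in a sequence of floats."""
    n_nan = 0
    n_inf = 0
    for v in values:
        if math.isnan(v):
            n_nan += 1
        elif math.isinf(v):
            n_inf += 1
    return {"nan": n_nan, "inf": n_inf, "ok": n_nan == 0 and n_inf == 0}

def l2_norm(values):
    """Euclidean norm; a single NaN or Inf entry makes the result non-finite."""
    return math.sqrt(sum(v * v for v in values))

# Stand-ins for the local entries of an assembled right-hand side.
clean = [0.5, 1.0, 2.0]
bad = [0.5, float("nan"), 2.0]

for name, rhs in (("clean", clean), ("bad", bad)):
    report = summarize_nonfinite(rhs)
    print(name, report, "norm finite:", math.isfinite(l2_norm(rhs)))
```

A single NaN entry is enough to make the residual norm NaN, which matches the solver stopping after 0 iterations. In a parallel run each process would only see its own entries, so the per-process counts would need to be combined to locate the offending cells.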
participants (2)
- Eike Mueller
- Lawrence Mitchell