12 Nov 2015, 2:46 p.m.
On 12/11/15 14:45, Eike Mueller wrote:
Hi Lawrence,
OK, problem solved. If I use the in-place Thomas algorithm for the lowest-order tridiagonal system instead of LAPACK's LU solver routines, I get excellent memory throughput (average 3.4 GB/s per core, so about peak for the full node).
The time per iteration drops significantly, from 0.44 s to 0.24 s (compared to 0.35 s for the PETSc solver with the hypre preconditioner), so this was really a change worth implementing!
Ah, nice!

Thanks,
Lawrence
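
For readers unfamiliar with the in-place Thomas algorithm mentioned in the quoted message, here is a minimal sketch in Python. It is not the code from the thread: the function name `thomas_solve_inplace` and the storage layout (three separate diagonals plus a right-hand side) are assumptions made for illustration. It also assumes the matrix needs no pivoting, for example because it is diagonally dominant, and that the system has at least one row. The timings and throughput figures above refer to the original implementation, not to this sketch.

```python
def thomas_solve_inplace(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs in place (hypothetical helper).

    Row i of A reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1].
    lower[0] and upper[n-1] are never used.
    diag and rhs are overwritten; on return rhs holds the solution x.
    No pivoting is performed, so diag entries must stay nonzero
    during elimination (true e.g. for diagonally dominant matrices).
    """
    n = len(diag)

    # Forward elimination: remove the sub-diagonal entry of each row,
    # updating the diagonal and the right-hand side as we go.
    for i in range(1, n):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]

    # Back substitution: the last row now has a single unknown.
    rhs[n - 1] /= diag[n - 1]
    for i in range(n - 2, -1, -1):
        rhs[i] = (rhs[i] - upper[i] * rhs[i + 1]) / diag[i]

    return rhs


if __name__ == "__main__":
    # 3x3 example: A = [[2,-1,0],[-1,2,-1],[0,-1,2]], exact solution x = [1,2,3].
    lower = [0.0, -1.0, -1.0]
    diag = [2.0, 2.0, 2.0]
    upper = [-1.0, -1.0, 0.0]
    rhs = [0.0, 0.0, 4.0]
    print(thomas_solve_inplace(lower, diag, upper, rhs))
```

The solve makes one forward and one backward sweep over the arrays and allocates no extra storage, since the factorisation overwrites `diag` and the solution overwrites `rhs`.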