On 15/07/15 21:14, Justin Chang wrote:
> First option works wonderfully for me, but now I am wondering how I would employ the second option.
> Specifically, I want to profile SNESSolve()
OK, so calls out to PETSc are done from Python (via petsc4py). It's only the integral assembly (i.e. evaluation of Jacobians and residuals) that goes through a generated code path. To be more concrete, let's say you have the following code:

    F = some_residual
    problem = NonlinearVariationalProblem(F, u, ...)
    solver = NonlinearVariationalSolver(problem)
    solver.solve()

Then the call chain inside solver.solve is effectively:

    solver.solve ->
        SNESSolve ->                                  # via petsc4py
            SNESComputeJacobian -> assemble(Jacobian) # callback to Firedrake
            SNESComputeFunction -> assemble(residual) # callback to Firedrake
            KSPSolve

So if you wrapped flop counting around the outermost solver.solve() call, you're pretty close to wrapping SNESSolve. Or do you mean something else when profiling SNESSolve?
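If it helps, here is a minimal sketch of what I mean (untested; it assumes petsc4py's PETSc.Log interface, and the number it reports is just whatever PETSc has logged via PetscLogFlops on this process):

    from firedrake import *
    from firedrake.petsc import PETSc   # or: from petsc4py import PETSc

    PETSc.Log.begin()                   # make sure PETSc logging is active

    flops_before = PETSc.Log.getFlops()
    solver.solve()                      # SNESSolve (and its callbacks) happen in here
    flops_after = PETSc.Log.getFlops()

    PETSc.Sys.Print("flops inside solver.solve: %g" % (flops_after - flops_before))

Since getFlops is per process, for a parallel total you'd want to reduce the difference over the communicator.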
> I would prefer to circumvent profiling of the DMPlex distribution because it seems that is a major bottleneck for multiple processes at the moment.
Can you provide an example mesh/process count that demonstrates this issue, or at least characterize it a little better? Michael Lange and Matt Knepley have done a lot of work over the last nine months or so on making DMPlexDistribute much faster than it was. So if it turns out to still be slow, we'd really like to know about it and try to fix it.

Cheers,

Lawrence
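P.S. If you want to characterize the distribution cost, one rough way (a sketch, untested; the mesh size and process count below are placeholders, and it assumes the plex is distributed when the mesh is built) is to wrap the mesh construction in its own log stage and then look at the DMPlex events inside that stage in the log output:

    from firedrake import *
    from firedrake.petsc import PETSc

    PETSc.Log.begin()

    stage = PETSc.Log.Stage("Mesh distribution")
    stage.push()
    mesh = UnitCubeMesh(64, 64, 64)     # placeholder; substitute the mesh that hurts
    stage.pop()

    # run with e.g.: mpiexec -n 48 python script.py -log_summary

That would tell us which part of the distribution is dominating the time.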