On 11/10/14 15:16, Patrick Farrell wrote:
On 11/10/14 11:27, Florian Rathgeber wrote:
I'd still be interested in feedback on this!
On 04/10/14 09:12, Florian Rathgeber wrote:
Dear all,
I finished a first draft of the results section for the Firedrake paper. Any feedback gratefully received!
If you don't have access to the repository you can get a PDF from
Hi Florian,
Thanks for the reminder, I missed this the first time. Comments follow in no particular order.
Many thanks for those very helpful comments! Some remarks inline.
I'm really impressed (and amazed) that in the Cahn--Hilliard example, Firedrake's assembly is two orders of magnitude faster! That's remarkable. Any idea why that's the case? I assume the DOLFIN runs used the same CFLAGS for the compiler etc.? (I didn't see that mentioned anywhere, although I may have missed it.) I'm looking forward to reading about why that's the case in the as-yet-unwritten section.
Generated code for DOLFIN is compiled with -O3 -ffast-math -march=native (as recommended by Marie), for Firedrake with -O3 -fno-tree-vectorize. I have now mentioned those. I'm using the DOLFIN build maintained by Chris and yourself (fenics/dev), so I'm assuming this is highly optimised. I'd be interested to know those flags. The performance difference, as far as I can tell, is due to a combination of 1) splitting the mixed forms, 2) caching of ParLoop objects on the residual/Jacobian forms, and 3) lower execution overhead and inlining of PyOP2 kernels.
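To make point 2) concrete: caching the ParLoop on the form amounts to memoising the expensive code-generation and compilation step on first assembly, so repeated residual/Jacobian assembly in a Newton loop pays that cost only once. A toy sketch of the idea follows; all names here are hypothetical and this is not the actual PyOP2/Firedrake API.

```python
class Form:
    """Toy stand-in for a form object; the real objects are much richer."""
    def __init__(self, expression):
        self.expression = expression
        self._parloop_cache = None  # populated on first assembly

def generate_parloop(form):
    # Stands in for the expensive code-generation + compilation step.
    return ("compiled", form.expression)

def assemble(form):
    # Reuse the cached kernel on every assembly after the first, so a
    # Newton loop that reassembles the same residual and Jacobian forms
    # over and over only generates and compiles code once per form.
    if form._parloop_cache is None:
        form._parloop_cache = generate_parloop(form)
    return form._parloop_cache

f = Form("inner(grad(u), grad(v))*dx")
first = assemble(f)
assert assemble(f) is first  # cache hit: the same object is returned
```

The same effect could be had with a global dictionary keyed on the form, but hanging the cache off the form object itself ties its lifetime to the form's.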
I'd prefer you didn't mention it as a "dolfin-adjoint application", because the O-K solver doesn't really have anything to do with d-a (it's just a repository I store random solvers in). Maybe an acknowledgement at the end for the preconditioner setup or implementation instead of footnote 5?
You should cite Jessica Bosch's and Andy Wathen's paper on the C-H preconditioner:
@article{bosch2014,
  author  = {Bosch, J. and Kay, D. and Stoll, M. and Wathen, A.},
  title   = {Fast solvers for {Cahn--Hilliard} inpainting},
  journal = {SIAM Journal on Imaging Sciences},
  volume  = {7},
  number  = {1},
  pages   = {67--97},
  year    = {2014},
  doi     = {10.1137/130921842},
}
I think it would be clearer to write something like
"""
The inverse Schur complement, $S^{-1}$, is approximated by
\begin{equation}
  S^{-1} \approx \hat{S}^{-1} = H^{-1} M H^{-1},
\end{equation}
where $H$ and $M$ are ...
"""
rather than the paragraph after (26-27), which is unnecessarily verbose.
Thanks, I have incorporated your suggestions.
Are there actual solves with H^-1 done, or does it just use one AMG V-cycle? (My experience with the O-K solver is that you're much better off doing the latter, but you all know what you're doing).
It is just using one AMG V-cycle. My experience was the same as yours, anything more than one V-cycle slows things down considerably.
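For the record, requesting exactly one BoomerAMG V-cycle per application through PETSc options looks roughly like the fragment below. The option names are real PETSc options, but the `fieldsplit_1` prefix is an assumption about how the Schur-complement block is named in this particular solver setup.

```
-fieldsplit_1_ksp_type preonly   # no inner Krylov solve on the block
-fieldsplit_1_pc_type hypre      # BoomerAMG applies a single V-cycle
```

With `preonly`, each "solve" with the block is just one application of the preconditioner, i.e. one V-cycle.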
I'm surprised that MATNEST doesn't make more of a difference; I thought it would help more. It would be nice to see the memory usage too: I'm guessing that's where MATNEST would make a bigger difference. At scale (up to billions of DOFs) I only run with 2 of 24 cores per node because of memory limitations, probably because of all the damn copies.
Lawrence has already commented on this.
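As an aside, switching the assembled operator to the nested format, which stores the blocks separately rather than as one monolithic matrix, is a one-option change in PETSc. The option name is real; whether it matches the exact configuration benchmarked in the paper is an assumption.

```
-mat_type nest   # store the mixed operator as separate blocks (MATNEST)
```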
In the graphs, it would be nice to have a "total runtime" to compare dolfin and firedrake from a user's perspective, as well as the breakdown into assembly and solve etc.
I'm not convinced the total runtime is a very useful comparison at the moment, due to the very different implementations of the mesh generators and the well-known performance issues with DMPlex, which would distort those timings. A breakdown into assembly and solve could be useful.
Speaking of the graphs, is there a reason for the choice of cyan-magenta-brown? I'd imagine there are colour combinations that would be easier to read. Maybe the 538 style (http://matplotlib.org/examples/style_sheets/plot_fivethirtyeight.html)? Does that cause difficulties for the daltonists among us?
This is the default colour cycle of the "Set2" colour map from ColorBrewer. I'll try the style you suggest, thanks.
Do you guys run into problems with starting the Python interpreter on many cores? Chris Richardson's been doing some work on that, and has had some partial success with zipping the files the Python interpreter loads; if you've solved this problem, or ran into it, it would be good to mention it in the paper.
We have not, as Lawrence mentioned.
Florian
Cheerio,
Patrick