On 11/10/14 15:16, Patrick Farrell wrote:
On 11/10/14 11:27, Florian Rathgeber wrote:
I'd still be interested in feedback on this!
On 04/10/14 09:12, Florian Rathgeber wrote:
Dear all,
I finished a first draft of the results section for the Firedrake paper. Any feedback gratefully received!
If you don't have access to the repository you can get a PDF from
Hi Florian,
Thanks for the reminder, I missed this the first time. Comments follow in no particular order.
Many thanks for those very helpful comments! Some remarks inline.
I'm really impressed (and amazed) that in the Cahn--Hilliard example, Firedrake's assembly is two orders of magnitude faster! That's remarkable. Any idea why that's the case? I assume the DOLFIN runs used the same CFLAGS for the compiler etc.? (I didn't see that mentioned anywhere, although I may have missed it.) I'm looking forward to reading about why that's the case in the as-yet-unwritten section.
Generated code for DOLFIN is compiled with -O3 -ffast-math -march=native (as recommended by Marie), for Firedrake with -O3 -fno-tree-vectorize. I have now mentioned those. I'm using the DOLFIN build maintained by Chris and yourself (fenics/dev), so I'm assuming this is highly optimised. I'd be interested to know those flags. The performance difference, as far as I can tell, is due to a combination of 1) splitting the mixed forms, 2) caching of ParLoop objects on the residual/Jacobian forms, and 3) lower execution overhead and inlining of PyOP2 kernels.
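To make point 2) concrete: caching the ParLoop on the form amounts to memoising the expensive code-generation and compilation step on first assembly, so repeated residual/Jacobian assembly in a Newton loop pays that cost only once. A toy sketch of the idea follows; all names here are hypothetical and this is not the actual PyOP2/Firedrake API.

```python
class Form:
    """Toy stand-in for a form object; the real objects are much richer."""
    def __init__(self, expression):
        self.expression = expression
        self._parloop_cache = None  # populated on first assembly

def generate_parloop(form):
    # Stands in for the expensive code-generation + compilation step.
    return ("compiled", form.expression)

def assemble(form):
    # Reuse the cached kernel on every assembly after the first, so a
    # Newton loop that reassembles the same residual and Jacobian forms
    # over and over only generates and compiles code once per form.
    if form._parloop_cache is None:
        form._parloop_cache = generate_parloop(form)
    return form._parloop_cache

f = Form("inner(grad(u), grad(v))*dx")
first = assemble(f)
assert assemble(f) is first  # cache hit: the same object is returned
```

The same effect could be had with a global dictionary keyed on the form, but hanging the cache off the form object itself ties its lifetime to the form's.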
I'd prefer you didn't mention it as a "dolfin-adjoint application", because the O-K solver doesn't really have anything to do with d-a (it's just a repository I store random solvers in). Maybe an acknowledgement at the end for the preconditioner setup or implementation instead of footnote 5?
You should cite Jessica Bosch's and Andy Wathen's paper on the C-H preconditioner:
@article{bosch2014,
  author  = {Bosch, J. and Kay, D. and Stoll, M. and Wathen, A.},
  title   = {Fast solvers for {Cahn--Hilliard} inpainting},
  journal = {SIAM Journal on Imaging Sciences},
  volume  = {7},
  number  = {1},
  pages   = {67--97},
  year    = {2014},
  doi     = {10.1137/130921842},
}
I think it would be clearer to write something like
"""
The inverse Schur complement, $S^{-1}$, is approximated by
\begin{equation}
  S^{-1} \approx \hat{S}^{-1} = H^{-1} M H^{-1},
\end{equation}
where $H$ and $M$ are ...
"""
rather than the paragraph after (26-27), which is unnecessarily verbose.
Thanks, I have incorporated your suggestions.
Are there actual solves with H^-1 done, or does it just use one AMG V-cycle? (My experience with the O-K solver is that you're much better off doing the latter, but you all know what you're doing).
It is just using one AMG V-cycle. My experience was the same as yours, anything more than one V-cycle slows things down considerably.
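For the record, requesting exactly one BoomerAMG V-cycle per application through PETSc options looks roughly like the fragment below. The option names are real PETSc options, but the `fieldsplit_1` prefix is an assumption about how the Schur-complement block is named in this particular solver setup.

```
-fieldsplit_1_ksp_type preonly   # no inner Krylov solve on the block
-fieldsplit_1_pc_type hypre      # BoomerAMG applies a single V-cycle
```

With `preonly`, each "solve" with the block is just one application of the preconditioner, i.e. one V-cycle.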
I'm surprised that MATNEST doesn't make more of a difference; I thought it would help more. It would be nice to see the memory usage too: I'm guessing that's where MATNEST would make a bigger difference. At scale (up to billions of DOFs) I only run with 2 of 24 cores per node because of memory limitations, probably because of all the damn copies.
Lawrence has already commented on this.
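As an aside, switching the assembled operator to the nested format, which stores the blocks separately rather than as one monolithic matrix, is a one-option change in PETSc. The option name is real; whether it matches the exact configuration benchmarked in the paper is an assumption.

```
-mat_type nest   # store the mixed operator as separate blocks (MATNEST)
```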
In the graphs, it would be nice to have a "total runtime" to compare dolfin and firedrake from a user's perspective, as well as the breakdown into assembly and solve etc.
I'm not convinced the total runtime is a very useful comparison at the moment, due to the very different implementations of the mesh generators and the well-known performance issues with DMPlex, which would distort those timings. A breakdown into assembly and solve could be useful.
Speaking of the graphs, is there a reason for the choice of cyan-magenta-brown? I'd imagine there are colour combinations that would be easier to read. Maybe the 538 style (http://matplotlib.org/examples/style_sheets/plot_fivethirtyeight.html)? Does that cause difficulties for the daltonists among us?
This is the default colour cycle of the "Set2" colour map from ColorBrewer. I'll try the style you suggest, thanks.
Do you guys run into problems with starting the Python interpreter on many cores? Chris Richardson's been doing some work on that, and has had some partial success with zipping the files the Python interpreter loads; if you've solved this problem, or ran into it, it would be good to mention it in the paper.
We have not, as Lawrence mentioned.
Florian
Cheerio,
Patrick