On 21 Jun 2014, at 19:43, Colin Cotter wrote:

Hi Paul,

I seem to be missing some context for this discussion, or did I just miss an email? Or, more likely, a group meeting! Anyway, what's the story?

Cjc

Hi Colin

I just thought you might be interested - it appears to show that we should be able to get bit-accurate summations in parallel, at low cost. So, interpreting optimistically (possibly prematurely), it means that it might be possible to make PyOP2 fully deterministic by default. In contrast, at present in PyOP2 the precise association of floating-point adds may vary due to thread to thread races, or when the mesh is recoloured or repartitioned.

I think the paper's claim applies to global reductions; I'm not sure it scales nicely to addtos, though there might be other solutions for that.

Paul

On 21 Jun 2014 16:15, "Kelly, Paul H J" <p.kelly@imperial.ac.uk> wrote:

Further to various discussions of reproducibility, I thought you might be interested in this paper:

http://hal.archives-ouvertes.fr/docs/00/94/93/55/PDF/superaccumulator.pdf

Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures

Abstract. On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and thus non-reproducible mainly due to non-associativity of floating-point operations. We introduce a solution to compute deterministic sums of floating-point numbers efficiently and with the best possible accuracy. Our multi-level algorithm consists of two main stages: a filtering stage that uses fast vectorized floating-point expansions; an accumulation stage based on superaccumulators in a high-radix carry-save representation. We present implementations on recent Intel desktop and server processors, on Intel Xeon Phi accelerator, and on AMD and NVIDIA GPUs.
We show that the numerical reproducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.

Paul

_______________________________________________
firedrake mailing list
firedrake@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/firedrake
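
For anyone who also missed the context, here is a small sketch of the problem being discussed. It shows how the order of floating-point adds changes a sum, and how accumulating exactly and rounding once at the end makes the result independent of ordering and partitioning. This is not the paper's superaccumulator algorithm and not anything in PyOP2: the function names naive_sum and reproducible_sum are made up for illustration, and Python's fractions.Fraction stands in (slowly) for an exact accumulator.

```python
from fractions import Fraction


def naive_sum(values):
    """Left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values, nchunks=1):
    """Sum exactly per chunk, combine exactly, round to a float once.

    The chunks mimic a partitioned (parallel) reduction. Every finite
    float converts exactly to a Fraction, so no rounding happens until
    the final float() conversion.
    """
    size = max(1, -(-len(values) // nchunks))  # ceiling division
    partials = [
        sum((Fraction(v) for v in values[i:i + size]), Fraction(0))
        for i in range(0, len(values), size)
    ]
    return float(sum(partials, Fraction(0)))


if __name__ == "__main__":
    a = [1e16, 1.0, -1e16, 1.0]
    b = [1.0, 1.0, 1e16, -1e16]  # same numbers, different order

    print(naive_sum(a), naive_sum(b))  # differ: 1.0 vs 2.0
    print(reproducible_sum(a, 1), reproducible_sum(b, 2))  # both 2.0
```

Note that simply using an accurate summation on each partition and then adding the rounded partial results would not be enough: the partials themselves have to be carried exactly, which, as I read the abstract, is the role of the superaccumulators.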