Hi Paul, I seem to be missing some context for this discussion, or did I just miss an email? Or, more likely, a group meeting! Anyway, what's the story?

Cjc

On 21 Jun 2014 16:15, "Kelly, Paul H J" <p.kelly@imperial.ac.uk> wrote:
Further to various discussions of reproducibility, I thought you might be interested in this paper:
http://hal.archives-ouvertes.fr/docs/00/94/93/55/PDF/superaccumulator.pdf
Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures
Abstract. On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and thus non-reproducible mainly due to non-associativity of floating-point operations. We introduce a solution to compute deterministic sums of floating-point numbers efficiently and with the best possible accuracy. Our multi-level algorithm consists of two main stages: a filtering stage that uses fast vectorized floating-point expansions; an accumulation stage based on superaccumulators in a high-radix carry-save representation. We present implementations on recent Intel desktop and server processors, on Intel Xeon Phi accelerator, and on AMD and NVIDIA GPUs. We show that the numerical reproducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.
Paul
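
For anyone curious how the accumulation stage in the abstract works, here is a minimal sketch of the superaccumulator idea in C: each double is split into integer chunks and added exactly into fixed-point bins, so the accumulated state does not depend on summation order. The bin width (32 bits), bin count, and the simplified final rounding are illustrative assumptions for this sketch, not the paper's parameters, and the vectorized expansion-based filtering stage is omitted.

/* superacc_sketch.c -- an illustrative sketch, not the paper's implementation.
 * Bin width (32 bits), bin count, and the simplified final rounding are
 * assumptions chosen for clarity; the vectorized filtering stage is omitted. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBINS 70        /* enough 32-bit bins to span the double exponent range */
#define EMIN  (-1126)   /* weight of bit 0 of bin 0 is 2^EMIN (covers subnormals) */

typedef struct { int64_t bin[NBINS]; } superacc;

/* Add x exactly into the integer bins. Because the bins hold exact integers,
 * the accumulated state is independent of summation order, which is what makes
 * the reduction deterministic. NaN/Inf are not handled in this sketch, and
 * after roughly 2^31 additions the bins would need an intermediate carry pass. */
static void acc_add(superacc *a, double x)
{
    if (x == 0.0) return;
    int e;
    double m = frexp(x, &e);              /* x = m * 2^e with 0.5 <= |m| < 1    */
    int64_t M = (int64_t)ldexp(m, 53);    /* signed integer: x = M * 2^(e - 53) */
    int q = (e - 53) - EMIN;              /* bit offset of M within the accumulator */
    int i = q / 32, s = q % 32;
    int sign = (M < 0) ? -1 : 1;
    unsigned __int128 mag =               /* 128-bit shift; GCC/Clang extension */
        (unsigned __int128)(uint64_t)(M < 0 ? -M : M) << s;
    while (mag != 0) {                    /* deposit 32-bit chunks, carry-save style */
        a->bin[i++] += sign * (int64_t)(uint32_t)mag;
        mag >>= 32;
    }
}

/* Convert the accumulator back to a double. The carry propagation is exact;
 * the final conversion is a simple high-to-low sum, not the correctly rounded
 * result the paper computes, but it is still deterministic. */
static double acc_round(const superacc *a)
{
    int64_t b[NBINS];
    memcpy(b, a->bin, sizeof b);
    for (int i = 0; i < NBINS - 1; i++) { /* signed carry propagation */
        int64_t carry = b[i] >> 32;       /* arithmetic shift on x86-64 */
        b[i]     -= carry * ((int64_t)1 << 32);
        b[i + 1] += carry;
    }
    double r = 0.0;
    for (int i = NBINS - 1; i >= 0; i--)
        r += ldexp((double)b[i], EMIN + 32 * i);
    return r;
}

int main(void)
{
    /* A small sum with ~60 orders of magnitude of dynamic range; the result
     * is identical no matter what order the inputs are accumulated in. */
    double xs[] = { 1e30, 1.0, -1e30, 1e-30, 2.0, -1e-30 };
    superacc a = { { 0 } };
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
        acc_add(&a, xs[i]);
    printf("%.17g\n", acc_round(&a));     /* prints 3 */
    return 0;
}

Since the bins hold exact integers, the associativity of integer addition is what buys the bit-for-bit reproducibility across thread counts and architectures; the paper's contribution is making this fast by combining such a superaccumulator with the vectorized expansion-based filtering stage mentioned in the abstract.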