Hi Paul, I seem to be missing some context for this discussion, or did I just miss an email? Or, more likely, a group meeting! Anyway, what's the story?

Cjc

On 21 Jun 2014 16:15, "Kelly, Paul H J" <p.kelly@imperial.ac.uk> wrote:
Further to various discussions of reproducibility, I thought you might be interested in this paper:
http://hal.archives-ouvertes.fr/docs/00/94/93/55/PDF/superaccumulator.pdf
Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures
Abstract. On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and thus non-reproducible mainly due to non-associativity of floating-point operations. We introduce a solution to compute deterministic sums of floating-point numbers efficiently and with the best possible accuracy. Our multi-level algorithm consists of two main stages: a filtering stage that uses fast vectorized floating-point expansions; an accumulation stage based on superaccumulators in a high-radix carry-save representation. We present implementations on recent Intel desktop and server processors, on Intel Xeon Phi accelerator, and on AMD and NVIDIA GPUs. We show that the numerical reproducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.
Paul
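
For anyone curious how the accumulation stage in the abstract works, here is a minimal sketch of the superaccumulator idea in C: each double is split into integer chunks and added exactly into fixed-point bins, so the accumulated state does not depend on summation order. The bin width (32 bits), bin count, and the simplified final rounding are illustrative assumptions for this sketch, not the paper's parameters, and the vectorized expansion-based filtering stage is omitted.

/* superacc_sketch.c -- an illustrative sketch, not the paper's implementation.
 * Bin width (32 bits), bin count, and the simplified final rounding are
 * assumptions chosen for clarity; the vectorized filtering stage is omitted. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBINS 70        /* enough 32-bit bins to span the double exponent range */
#define EMIN  (-1126)   /* weight of bit 0 of bin 0 is 2^EMIN (covers subnormals) */

typedef struct { int64_t bin[NBINS]; } superacc;

/* Add x exactly into the integer bins. Because the bins hold exact integers,
 * the accumulated state is independent of summation order, which is what makes
 * the reduction deterministic. NaN/Inf are not handled in this sketch, and
 * after roughly 2^31 additions the bins would need an intermediate carry pass. */
static void acc_add(superacc *a, double x)
{
    if (x == 0.0) return;
    int e;
    double m = frexp(x, &e);              /* x = m * 2^e with 0.5 <= |m| < 1    */
    int64_t M = (int64_t)ldexp(m, 53);    /* signed integer: x = M * 2^(e - 53) */
    int q = (e - 53) - EMIN;              /* bit offset of M within the accumulator */
    int i = q / 32, s = q % 32;
    int sign = (M < 0) ? -1 : 1;
    unsigned __int128 mag =               /* 128-bit shift; GCC/Clang extension */
        (unsigned __int128)(uint64_t)(M < 0 ? -M : M) << s;
    while (mag != 0) {                    /* deposit 32-bit chunks, carry-save style */
        a->bin[i++] += sign * (int64_t)(uint32_t)mag;
        mag >>= 32;
    }
}

/* Convert the accumulator back to a double. The carry propagation is exact;
 * the final conversion is a simple high-to-low sum, not the correctly rounded
 * result the paper computes, but it is still deterministic. */
static double acc_round(const superacc *a)
{
    int64_t b[NBINS];
    memcpy(b, a->bin, sizeof b);
    for (int i = 0; i < NBINS - 1; i++) { /* signed carry propagation */
        int64_t carry = b[i] >> 32;       /* arithmetic shift on x86-64 */
        b[i]     -= carry * ((int64_t)1 << 32);
        b[i + 1] += carry;
    }
    double r = 0.0;
    for (int i = NBINS - 1; i >= 0; i--)
        r += ldexp((double)b[i], EMIN + 32 * i);
    return r;
}

int main(void)
{
    /* A small sum with ~60 orders of magnitude of dynamic range; the result
     * is identical no matter what order the inputs are accumulated in. */
    double xs[] = { 1e30, 1.0, -1e30, 1e-30, 2.0, -1e-30 };
    superacc a = { { 0 } };
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
        acc_add(&a, xs[i]);
    printf("%.17g\n", acc_round(&a));     /* prints 3 */
    return 0;
}

Since the bins hold exact integers, the associativity of integer addition is what buys the bit-for-bit reproducibility across thread counts and architectures; the paper's contribution is making this fast by combining such a superaccumulator with the vectorized expansion-based filtering stage mentioned in the abstract.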