Re: [firedrake] PDESoft 2014 slides

12 Jul 2014


On 12/07/14 08:06, David Ham wrote:
...
Those look like interesting results.
Do we have any idea why we are slow on CUDA on the RHS?
The reason is that afaict the kernel uses too many resources: 57
registers per thread and 28.047 KB of shared memory. We therefore get a
theoretical occupancy of 6.25%, i.e. only 1/16 of the warps an SMX on the
680 can hold are resident. That is up to 64 DP FMAs at half the clock
speed of a Xeon core...
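
The occupancy figure can be reproduced with a rough back-of-the-envelope calculation. The sketch below is not taken from PyOP2 or from the profiler; it assumes the compute capability 3.0 limits of the GTX 680 (64 resident warps, 16 resident blocks, 65536 registers and 48 KB of shared memory per SMX), assumes the shared memory figure is per block, and ignores register allocation granularity. The block size of 128 threads is a hypothetical value, since the thread does not state it.

```python
# Simplified theoretical-occupancy estimate for one SMX (compute capability 3.0).
# Assumption: allocation granularity of registers and shared memory is ignored.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 64       # resident warps per SMX
MAX_BLOCKS_PER_SM = 16      # resident blocks per SMX
REGS_PER_SM = 65536         # 32-bit registers per SMX
SMEM_PER_SM = 48 * 1024     # bytes of shared memory per SMX (largest configuration)


def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Return resident warps / maximum warps for the given kernel resources."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    # Number of blocks each resource allows to be resident at once.
    by_warps = MAX_WARPS_PER_SM // warps_per_block
    by_regs = REGS_PER_SM // (regs_per_thread * warps_per_block * WARP_SIZE)
    by_smem = SMEM_PER_SM // smem_per_block if smem_per_block else MAX_BLOCKS_PER_SM
    blocks = min(MAX_BLOCKS_PER_SM, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


# 57 registers and 28.047 KB of shared memory are the figures quoted above;
# 128 threads per block is an assumed (hypothetical) launch configuration.
occ = theoretical_occupancy(128, 57, int(28.047 * 1024))
print(f"{occ:.2%}")
```

Under these assumptions shared memory is the limiting resource: only one block fits per SMX, so 4 of 64 possible warps are resident, while registers alone would still allow 8 blocks.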
...
Do we have any indication of actual speed compared with peak flops or
bandwidth?
I haven't been able to figure out how to drive the Nvidia profiler to
record the required metrics, but we should be able to get those somehow.

Florian
...
Regards,
David
On Friday, July 11, 2014, Rathgeber, Florian
<f.rathgeber10@imperial.ac.uk> wrote:
    I have now added performance results for advection assembly (matrix +
    RHS). We can still claim (performance) portability to some degree across
    sequential, OpenMP and CUDA.

    On 10/07/14 11:23, David Ham wrote:
    > I'm concerned that there are no performance results at all. Do we not
    > even have CPU results?
    >
    > On Wednesday, July 9, 2014, Rathgeber, Florian
    > <f.rathgeber10@imperial.ac.uk> wrote:
    >
    >     Draft slides for my 15min PDESoft talk on PyOP2 next week are at
    >     http://kynan.github.io/pdesoft2014
    >
    >     Any comments and suggestions much appreciated.
    >
    >     Florian

Florian Rathgeber