Hi Eike,
In general, if you want to disable COFFEE, the best place to do so is probably immediately after "from firedrake import *", before any form is compiled.
Alternatively, you can merge the caffeine_withdrawal branch into whatever branch you are using, and use that version of Firedrake.
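To illustrate why the setting should be made once, early on, here is a minimal toy model in plain Python. It does not use Firedrake: the `parameters` dictionary, `compile_form` and `parloop_data_shape` are hypothetical stand-ins, and the link between the O2 flag and the two array shapes is an assumption made only to mirror the A[8][20] versus A[6][18] mismatch reported further down the thread.

```python
# Toy stand-in for a global parameters dictionary; NOT the real Firedrake object.
parameters = {"COFFEE": {"O2": True}}

UNPADDED = (6, 18)  # shape of the data that was passed, as reported in this thread
PADDED = (8, 20)    # shape the compiled kernel expected, as reported in this thread


def compile_form():
    """Hypothetical: the kernel's expected shape is fixed when the form is compiled."""
    return PADDED if parameters["COFFEE"]["O2"] else UNPADDED


def parloop_data_shape():
    """Hypothetical: the shape of the passed data is decided when the parloop runs."""
    return PADDED if parameters["COFFEE"]["O2"] else UNPADDED


# Failure mode: the flag is switched off only around the parloop.
expected = compile_form()            # compiled while O2 is still on
parameters["COFFEE"]["O2"] = False   # switched off too late
passed = parloop_data_shape()
mismatch = (expected, passed)        # the two sides disagree

# Suggested pattern: set the flag once, before anything is compiled.
parameters["COFFEE"]["O2"] = False
consistent = (compile_form(), parloop_data_shape())  # both sides agree
```

In this toy model the only thing that matters is that both steps see the same value of the flag, which is what setting it straight after the import guarantees.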
Regards,
Miklos
From: firedrake-bounces@imperial.ac.uk [firedrake-bounces@imperial.ac.uk] on behalf of Eike Mueller [e.mueller@bath.ac.uk]
Sent: 14 February 2015 10:53
To: firedrake
Subject: Re: [firedrake] Crash when running at higher order on ARCHER: resolved now
Dear firedrakers,
I finally got to the bottom of this. It turns out that I had set parameters["COFFEE"]["O2"] = False around the parloop which executes the kernel, but not around the bit of code which actually compiles the UFL form. Stupid mistake... This caused a horrible segfault, since the kernel expected data of size A[8][20] but was passed A[6][18]. This kind of bug is quite tricky to find, since the segfault comes without much information. I finally managed to run interactively on ARCHER, inspected the core dump with gdb and looked at the generated C code.
I was wondering whether this kind of issue could be detected when you generate the wrapper code? Don't you know both the signature of the function and the passed data at that point?
Or has the COFFEE optimisation issue been resolved? I pulled the latest version of COFFEE, though.
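As a rough sketch of the kind of check asked about above: if the shape the kernel was compiled for and the shape of the data about to be passed are both known, they can be compared before the kernel is called, turning a bare segfault into a readable error. The function below is hypothetical plain Python and is not part of Firedrake or COFFEE; whether the real wrapper generator has both pieces of information available is exactly the open question.

```python
def check_kernel_argument(name, expected_shape, actual_shape):
    """Raise a descriptive error if the data shape differs from what the kernel expects.

    Hypothetical helper for illustration only; not an existing Firedrake or COFFEE API.
    """
    expected = tuple(expected_shape)
    actual = tuple(actual_shape)
    if expected != actual:
        raise ValueError(
            f"kernel argument {name!r}: kernel expects shape {expected}, "
            f"but data has shape {actual}"
        )


# The mismatch from this thread: kernel compiled for A[8][20], data passed as A[6][18].
try:
    check_kernel_argument("A", (8, 20), (6, 18))
except ValueError as err:
    message = str(err)
```

A check like this costs one tuple comparison per argument, so it could plausibly run every time a wrapper is generated rather than only in a debug mode.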
Thanks,
Eike
--
Dr Eike Hermann Mueller
Research Associate (PostDoc)
Department of Mathematical Sciences
University of Bath
Bath BA2 7AY, United Kingdom
+44 1225 38 5803
e.mueller@bath.ac.uk
http://people.bath.ac.uk/em459/
On 5 Feb 2015, at 16:28, Eike Mueller <E.Mueller@bath.ac.uk> wrote:
Thanks, I tried ATP and also inspected the core dump with
gdb python core
There is no backtrace in the core dump, and ATP does not generate any information either.
I still only get the segfault in my output file. I hope I can localise this a bit more tomorrow.
Eike
On 05/02/15 15:23, Patrick Farrell wrote:
On 05/02/15 14:37, Lawrence Mitchell wrote:
Number of cells on finest grid = 5120
dx = 364.458 km, dt = 2429.717 s
_pmiu_daemon(SIGCHLD): [NID 01160] [c6-0c0s2n0] [Thu Feb 5 14:22:05 2015] PE RANK 11 exit signal Segmentation fault
[NID 01160] 2015-02-05 14:22:05 Apid 12880356: initiated application termination
Application 12880356 exit codes: 139
Application 12880356 resources: utime ~31s, stime ~19s, Rss ~318352, inblocks ~104428, outblocks ~788
Finished atThu Feb 5 14:22:10 GMT 2015
Hmm, that's not a lot of useful information.
Try running again with
module load atp
export ATP_ENABLED=1
Sometimes it gives useful information about abnormal terminations;
http://www.archer.ac.uk/documentation/best-practice-guide/debug.php
Patrick
_______________________________________________
firedrake mailing list
firedrake@imperial.ac.uk
https://mailman.ic.ac.uk/mailman/listinfo/firedrake