Re: [firedrake] parallel weirdness
Hi Lawrence, I updated the files with the exact settings under which I see the problem, so could you run test_linear.py again with the current HEAD of the branch "periodic_parallel" (267de84)? With these settings I get results that look different at t = 600*dt in serial and parallel. I've attached a picture for reference (left: serial, right: parallel on 2 cores); the serial run gives a symmetric result, but the parallel run is slightly asymmetric. All the best, Hiroe

2014-11-19 11:10 GMT+00:00 Mitchell, Lawrence <lawrence.mitchell@imperial.ac.uk>:
On 18 Nov 2014, at 21:35, Hiroe Yamazaki <h.yamazaki@imperial.ac.uk> wrote:
Hi Lawrence,
You can reproduce the problem by running test_linear.py from bitbucket.org/colinjcotter/slicemodels/branch/periodic_parallel. It gives different results on 1 and 2 cores.
The code Colin is describing by line number is an old version of slicemodels.py, which can be found here:
https://gist.github.com/anonymous/c308131558ff54fec9c4
In this version I added this term
- (-dt*div(w)*pbar)*dx
at line 270; this term was previously added to the RHS at line 139. With this change the parallel problem seemed to be fixed and I got the same results on 1 and 2 cores, but it makes no sense to us.
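To show what I mean, here is a minimal standalone sketch (the mesh, spaces and names V, Q, w, pbar, dt are placeholders, not the actual slicemodels.py code); the two ways of writing the term are mathematically identical, which is why the change making a difference in parallel makes no sense to us:

from firedrake import *

mesh = UnitSquareMesh(8, 8)
V = VectorFunctionSpace(mesh, "CG", 1)   # placeholder velocity space
Q = FunctionSpace(mesh, "CG", 1)         # placeholder pressure space
w = TestFunction(V)
u = TrialFunction(V)
x, z = SpatialCoordinate(mesh)
pbar = Function(Q)
pbar.interpolate(sin(pi*x))              # placeholder background pressure
dt = 0.01

a = inner(w, u)*dx                       # velocity mass matrix

L_rhs = dt*div(w)*pbar*dx                # term carried on the RHS (as at old line 139)
L_moved = -(-dt*div(w)*pbar)*dx          # same term written as at old line 270

uA = Function(V)
uB = Function(V)
solve(a == L_rhs, uA)
solve(a == L_moved, uB)
print(errornorm(uA, uB))                 # expect ~ solver tolerance, in serial or parallel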
I cannot reproduce this problem with the current HEAD of that branch (6ce1ffb02). The output vtus look the same in the eyeball norm, and so do the printed values once I fix the printing of deltap (Colin, in parallel you can't look at the sum of the local entries: use function.dat.norm to get the l2 norm, or norm(function) for the L2 norm). In particular, if I specify Jacobi PCs for /all/ solvers I get basically identical values in serial and parallel (up to solver tolerance).
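Something like this, say (deltap here is just an illustrative Function on a throwaway mesh, not the one in test_linear.py):

from firedrake import *

mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "CG", 1)
deltap = Function(V)
deltap.assign(1.0)

# Misleading in parallel: each rank only sees its own portion of the
# coefficient vector, so this prints a different, partition-dependent number
# on every process.
print(sum(deltap.dat.data_ro))

# Parallel-safe diagnostics:
print(deltap.dat.norm)   # l2 norm of the global coefficient vector
print(norm(deltap))      # L2 norm, computed as an integral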
So I don't really know where to look. Can you give a precise sequence of steps for me to follow to reproduce the problem and observe it?
Lawrence
On 19 Nov 2014, at 13:01, Hiroe Yamazaki <h.yamazaki@imperial.ac.uk> wrote:
[...]
Thanks, I see the problem. If I tighten the solver tolerances to 1e-10 the problem appears to go away, but beyond that I've got no idea. Note that by moving this one term you change the subspaces the Krylov solves explore. If I monitor the convergence of the U solver, I see that the true residual is reduced more slowly than the preconditioned residual, so is the problem that the errors somehow aren't being killed off well enough? Does this give any useful clues? Cheers, Lawrence
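P.S. For concreteness, the kind of options I'm using to poke at the U solve (aU, LU and U are placeholder names on a throwaway problem here, not the actual test_linear.py variables):

from firedrake import *

mesh = UnitSquareMesh(8, 8)
V = VectorFunctionSpace(mesh, "CG", 1)
w, u = TestFunction(V), TrialFunction(V)
aU = inner(w, u)*dx                       # stand-in for the U (velocity) system
LU = inner(w, Constant((1.0, 0.0)))*dx
U = Function(V)

solve(aU == LU, U,
      solver_parameters={"ksp_type": "cg",
                         "pc_type": "jacobi",
                         "ksp_rtol": 1e-10,                   # tightened tolerance
                         "ksp_monitor_true_residual": None})  # compare true vs preconditioned residual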
Hi Lawrence, why does that term change the Krylov space? It's either added in ures, or in that line. Cjc

On 19 Nov 2014 17:23, "Lawrence Mitchell" <lawrence.mitchell@imperial.ac.uk> wrote:
[...]
On 20 Nov 2014, at 07:32, Colin Cotter <colin.cotter@imperial.ac.uk> wrote:
[...]
So it's possible I didn't manage to unpick what's going on there properly. The solver searches in the space span{r_0, A r_0, ...}, where A is the velocity mass matrix and r_0 the initial residual. If I change Lures to remove this div term, I thought that maybe r_0 would change. But are you saying it comes in from the initial guess in that case (because you've just moved it around)? Lawrence
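P.S. To spell out why I thought moving it shouldn't matter, a toy sketch with random vectors (nothing to do with the real operators): the Krylov space is span{r_0, A r_0, ...} with r_0 = b - A x_0, and adding the term to b directly or as a double negative gives the same r_0:

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = A @ A.T + n*np.eye(n)             # SPD stand-in for the velocity mass matrix
x0 = np.zeros(n)                      # initial guess
rest = rng.standard_normal(n)         # the rest of the right-hand side
term = rng.standard_normal(n)         # stand-in for the assembled dt*div(w)*pbar contribution

r0_direct = (rest + term) - A @ x0    # term assembled straight into the RHS
r0_moved = (rest - (-term)) - A @ x0  # same term written as "minus a negative"

print(np.allclose(r0_direct, r0_moved))   # True: same r_0, hence the same Krylov space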
Yes, that's what is weird. --cjc
participants (4)
- Colin Cotter
- Cotter, Colin J
- Hiroe Yamazaki
- Lawrence Mitchell