Here are 2 different Poisson codes (degree 3 Lagrange elements) and their outputs:
nref=7 
for j in range(nref):
     mesh = UnitSquareMesh(2**(j+1), 2**(j+1))
     mesh.coordinates.dat.data[:, 0] = 1.0 - mesh.coordinates.dat.data[:, 0]
     V = FunctionSpace(mesh, "Lagrange", 3) 
     u = TrialFunction(V)
     v = TestFunction(V)
     x, y = SpatialCoordinate(mesh) 
     
     f = Function(V)
     f.interpolate(-(2+x**2+y**2)*exp((x**2+y**2)/2))
    
     g=Function(V)
     g.interpolate(exp((x**2+y**2)/2))
     
     a = (dot(grad(v), grad(u)) ) * dx
     L = f * v * dx
     bcs = [DirichletBC(V, g, (1, 2, 3, 4))] 
    
     u = Function(V)
     solve(a == L, u, bcs=bcs)
     #u=u+g #this method works better here; but not for time marching
     L2error[j]=sqrt(assemble(dot(u - g, u - g) * dx))
     H1error[j]=sqrt(assemble(dot(grad(u-g),grad(u-g))*dx))
     
 #L=f*v*dx-dot(grad(g),grad(v))*dx 
     #bcs = [DirichletBC(V, 0, (1, 2, 3, 4))] #if only want one boundary specified: for example use (1,) I think
 output:
 L2 errors
n=  0
0.00043533120955450153
n=  1
2.6722563512459733e-05
n=  2
1.6269810018255959e-06
n=  3
1.6694766615405184e-07
n=  4
1.1739111863202767e-06
n=  5
7.280913725610821e-06
n=  6
2.0327589385362215e-05
L2 convergence rate
4.025983134615211
4.037789103155838
3.2847295743419402
-2.813855433236994
-2.632796250527484
-1.48124771918039
H1 errors
n=  0
0.005033670038757814
n=  1
0.0007002176561563797
n=  2
9.299757283782469e-05
n=  3
1.2089747635515502e-05
n=  4
7.532429204089909e-06
n=  5
3.3080917711483075e-05
n=  6
9.162160804557016e-05
H1 convergence rate
2.8457353045032385
2.912538471487578
2.9434089338107663
0.682597016344028
-2.134812144094874
-1.4696886262688682 
Now making the substition u=w+g and solving for w (which has homogeneous boundary condition)
...
a = (dot(grad(v), grad(u)) ) * dx
L=f*v*dx-dot(grad(g),grad(v))*dx 
bcs = [DirichletBC(V, 0, (1, 2, 3, 4))] 
u = Function(V)
 solve(a == L, u, bcs=bcs)
u=u+g 
output:
L2 errors
n=  0
0.00043533082808728233
n=  1
2.672295570362407e-05
n=  2
1.6257969941867753e-06
n=  3
1.0013920117782886e-07
n=  4
6.2201879296317444e-09
n=  5
3.878122859276369e-10
n=  6
2.4212950203939354e-11
L2 convergence rate
4.025960697004482
4.03886055546273
4.0210683688015765
4.008904872857674
4.003527754961185
4.001507729908028
H1 errors
n=  0
0.0050336743129552495
n=  1
0.0007001812028177674
n=  2
9.279064302630619e-05
n=  3
1.1994409777711576e-05
n=  4
1.527217787021703e-06
n=  5
1.9276952639853355e-07
n=  6
2.4216339384015273e-08
H1 convergence rate
2.845811638200329
2.915677094480334
2.9516171647844374
2.9733844513983505
2.9859569015421905
2.9928242994816663
Obviously the latter problem gives better results. Why is this?