Your FunctionSpace W has roughly 388*584*2 = 450k degrees of freedom. Your VectorFunctionSpace V has roughly 2*450k = 900k degrees of freedom. Then you construct a MixedFunctionSpace Q = W*W*V, which will have roughly 1.8 million degrees of freedom.
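The counting above is simple arithmetic. This sketch reproduces it using only the figures quoted (it assumes W really has 388*584*2 DOFs and that V is a 2D vector space). For a mixed space, the DOF counts of the subspaces just add.

```python
# Rough DOF counts from the figures quoted above (assumes W has
# 388*584*2 DOFs, as stated; the exact count depends on element and mesh).
n_W = 388 * 584 * 2          # scalar space W
n_V = 2 * n_W                # 2D vector space V: two components per node
n_Q = n_W + n_W + n_V        # mixed space Q = W*W*V: DOF counts add

print(f"W: {n_W:,}  V: {n_V:,}  Q: {n_Q:,}")
```

This prints `W: 453,184  V: 906,368  Q: 1,812,736`, which matches the ~450k / ~900k / ~1.8 million figures.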
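LU factorisation causes fill-in, so (unless there's something very special about your PDE/discretisation) each row of the L and U factors will have O(sqrt(N)) non-zero entries that must be stored, where N is the number of unknowns. I think this holds because sqrt(N) is roughly the bandwidth of a 2D mesh under a simple ordering. With N = 1.8 million, that is 1.8 million * O(sqrt(1.8 million)) * (8 bytes/double + 4 bytes/index), which is probably more than 8 GB.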
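Here is a back-of-envelope version of that estimate. It assumes the constant hidden in the O(sqrt(N)) is 1, which is a guess. The real fill depends on the ordering PETSc or MUMPS picks.

```python
import math

# Memory for the LU factors, assuming each of the N rows stores about
# sqrt(N) non-zeros (hidden constant taken as 1) and each stored entry
# costs an 8-byte double plus a 4-byte column index.
N = 1_800_000
nnz_per_row = math.sqrt(N)
bytes_per_entry = 8 + 4

total_bytes = N * nnz_per_row * bytes_per_entry
print(f"~{total_bytes / 1e9:.0f} GB")
```

This prints `~29 GB`, well above 8 GB, so the order of magnitude alone explains running out of memory.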
Using MUMPS rather than PETSc's built-in LU might help: add the option "pc_factor_mat_solver_package": "mumps" to your solver_parameters dict. In PETSc 3.9 and later this option is called "pc_factor_mat_solver_type".
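A minimal sketch of such a dict follows. The "ksp_type": "preonly" and "pc_type": "lu" entries are assumptions about your current direct-solver setup, so keep whatever else you already pass. The names `a`, `L` and `q` in the final comment are placeholders for your own forms and Function.

```python
# Direct-solver options that hand the factorisation to MUMPS.
# Assumes you are already doing a plain LU solve (preonly + lu).
solver_parameters = {
    "ksp_type": "preonly",   # no Krylov iterations: just apply the factorisation
    "pc_type": "lu",         # LU factorisation used as the "preconditioner"
    # Use MUMPS instead of PETSc's built-in LU
    # (key is "pc_factor_mat_solver_type" in PETSc >= 3.9).
    "pc_factor_mat_solver_package": "mumps",
}

# Passed as usual, e.g.:
# solve(a == L, q, solver_parameters=solver_parameters)
```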