To whom it may concern,
I modified the code to solve the 3D lid-driven cavity problem on a uniform 40x40x40 grid and found that it takes an extremely long time and a large amount of memory. I had the same issue with my own code at similar sizes, which uses Pardiso to solve the linear system. I'm wondering whether FEM is indeed infeasible for a problem this large. Do you have any experience making it faster by changing the parameters of the linear solver?
Looking forward to hearing from you.
Best,
Zizhou