On 08/11/15 10:06, Eike Mueller wrote:
> Hi Lawrence (copied to firedrake, since overheads from loading libraries might be a general concern),
>
> I tried it on ARCHER, and adding caching for the kernels does not make any difference. The LU solve performance at lowest order is poor, but an individual call actually takes more time (~0.01s) than the operator application (~0.001s), so I would have thought the overheads are relatively smaller for the LU solve. For the operator application the reported bandwidth (BW) is excellent, but for the LU solve it is very poor. At higher order both BWs are good; there the data volume is larger, but the time for one LU solve call is still ~0.01s. Maybe in this case any overhead that shows up at lowest order is hidden.
>
> Could there be an overhead from loading the LAPACK library, which is required for the LU solve?
This isn't how dynamic loading works. The first time you load the .so, in the warmup phase, the symbol is resolved and the trampoline is replaced by a direct call, so later calls do not pay that cost again.

I have effectively no idea what's going on. Does the LU solve take this long on this much data if you just call it from C?

In other words, I think it's not "our" fault, unless somehow you're managing to get a recompile or similar every time you call _lu_solve.

Lawrence
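
The point about dynamic loading can be checked with a small experiment. The following is only a minimal sketch, assuming a POSIX system where the C math library can be found: it uses cos from libm as a stand-in for the LAPACK routine (it is not the actual _lu_solve kernel), and the helper names load_cos and time_calls are made up for this illustration. It times the library load separately from the individual calls, so any one-off loading cost would show up in the load time and not in the per-call times.

```python
import ctypes
import ctypes.util
import time


def load_cos():
    """Load the C math library and return its cos() with a typed signature.

    Assumption: POSIX system. If find_library returns None, CDLL(None)
    falls back to symbols already loaded into the running process.
    """
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    cos = libm.cos
    cos.argtypes = [ctypes.c_double]
    cos.restype = ctypes.c_double
    return cos


def time_calls(func, arg, ncalls):
    """Call func(arg) ncalls times and return the wall-clock time of each call."""
    times = []
    for _ in range(ncalls):
        start = time.perf_counter()
        func(arg)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # The library load and symbol lookup happen once, here.
    t0 = time.perf_counter()
    cos = load_cos()
    load_time = time.perf_counter() - t0

    # Every call below reuses the already resolved symbol.
    times = time_calls(cos, 0.5, 1000)
    print("load + lookup: %.3e s" % load_time)
    print("first call:    %.3e s" % times[0])
    print("later calls:   %.3e s (mean)" % (sum(times[1:]) / len(times[1:])))
```

The same split (one timed warmup, then many timed calls) could be applied to the real LU solve to see whether a per-call overhead of the size seen at lowest order exists at all; the per-call figures here also include ctypes call overhead, so only their relative sizes are meaningful.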