Hi guys,
I want to measure arithmetic intensity (AI) using the procedure described here on the NERSC website:
Basically, they obtain the FLOP count from Intel's SDE and the number of bytes transferred from VTune. In their C code example, they insert macros to mark where the measurements should start and stop.
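To be concrete about what I'm after: by AI I mean arithmetic intensity, i.e. total FLOPs divided by total bytes moved for the measured region. A minimal sketch of that last step, assuming the two totals have already been read off the tool reports by hand (the function name and the numbers are made up for illustration, not taken from either tool):

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return FLOPs per byte for one measured region.

    Both arguments are totals for the same region: `flops` would come from
    the SDE report and `bytes_moved` from the VTune report (assumption:
    both were collected over identical start/stop markers).
    """
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


# Placeholder totals, not real measurements.
ai = arithmetic_intensity(flops=2.0e9, bytes_moved=8.0e9)
print(f"AI = {ai} FLOPs/byte")
```

The division itself is trivial; the part I can't figure out is restricting both measurements to the same region of a Python program.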
Does anyone here know how to do something similar for Python-based programs? Or is there a "hack" that would let me insert these C macros somewhere in the Firedrake code?
Thanks,
Justin