Have folks seen good Linux tools for actually monitoring/profiling the GPU's inner workings? Soon I'll need to run ML models at scale. For CPUs, I have a whole bag of tricks for examining and monitoring performance. But for GPUs, I feel like a caveman.
In NVIDIA land, `nvidia-smi` is like `top` for your GPUs. If you're running compiled CUDA, `nvprof` is very useful. But I'm not sure how much work it would take to profile something like a PyTorch model.
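If you want the `top`-style numbers in a script rather than on a terminal, here's a rough sketch that shells out to `nvidia-smi` using its `--query-gpu` and `--format=csv,noheader,nounits` options and parses the result. The helper names (`parse_gpu_csv`, `query_gpus`) and the choice of fields are mine, purely for illustration, and values are left as strings since some GPUs report things like `[N/A]`.

```python
import shutil
import subprocess

# Fields passed to nvidia-smi's --query-gpu option (an illustrative subset).
FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip any line that doesn't match the requested fields
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once; return [] if it is missing or exits with an error."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Calling `query_gpus()` in a loop with a sleep gives you a crude time series you can feed into whatever monitoring you already use on the CPU side. It only shows coarse utilization and memory, not what's happening inside individual kernels.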