Disclosure: I'm an HPC admin who developed a materials simulation framework for my Ph.D.
Simulations run on FP64, and they have to, since you're already approximating things with numerical algorithms (analytic solutions to many problems are impossible anyway) and can't afford to pile rounding error on top of that. Even if you could do things in FP8, moving everything to the GPU is not trivial.
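To make the precision point concrete, here is a toy sketch (not taken from any real simulation code) that adds a fixed step to a running total many times, once in FP64 and once with the total rounded to FP32 after every addition. The step size, step count, and helper names are all made up for illustration, and FP32 is emulated with the standard `struct` module rather than real hardware FP32.

```python
import struct

def to_f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(step, n, fp32=False):
    """Add `step` to a running total n times.

    With fp32=True the step and the total are rounded to FP32 after
    every addition, which emulates an FP32 accumulator.
    """
    total = 0.0
    if fp32:
        step = to_f32(step)
    for _ in range(n):
        total += step
        if fp32:
            total = to_f32(total)
    return total

N_STEPS = 1_000_000  # hypothetical number of time steps
DT = 0.1             # hypothetical step size
EXACT = 100_000.0    # N_STEPS * DT in exact arithmetic

err64 = abs(accumulate(DT, N_STEPS) - EXACT)
err32 = abs(accumulate(DT, N_STEPS, fp32=True) - EXACT)

print(f"FP64 accumulated error: {err64:.3e}")
print(f"FP32 accumulated error: {err32:.3e}")
```

The FP32 run drifts far more than the FP64 one because every addition rounds the total to a much coarser grid once the total gets large. Real codes use smarter summation, but the same effect shows up across long time integrations.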
A simulation contains tons of different algorithms, and not all of them can be modeled effectively as a set of matrix operations. Also, moving kernels in and out of the GPU is not an instant affair, and moving the data to the GPU costs even more.
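A back-of-the-envelope way to think about the transfer cost, as a rough sketch: offloading only pays off when the compute time saved exceeds the launch latency plus the time spent pushing bytes over the link. Every number below (CPU and GPU throughput, link bandwidth, launch latency) is a hypothetical placeholder, not a measurement of any real hardware, and the model ignores CPU memory bandwidth and any overlap of transfers with compute.

```python
# Hypothetical placeholder figures, chosen only to illustrate the arithmetic.
CPU_FLOPS = 1.0e12       # sustained FP64 FLOP/s on the host
GPU_FLOPS = 1.0e13       # sustained FP64 FLOP/s on the device
LINK_BW = 2.5e10         # host <-> device bandwidth in bytes/s
LAUNCH_LATENCY = 1.0e-5  # fixed cost per offloaded kernel in seconds

def cpu_time(flops):
    """Time to run the work on the host (compute only)."""
    return flops / CPU_FLOPS

def gpu_time(flops, bytes_moved):
    """Time to run the work on the device, including transfers."""
    return LAUNCH_LATENCY + bytes_moved / LINK_BW + flops / GPU_FLOPS

def offload_pays_off(flops, bytes_moved):
    """True when the device is faster once transfers are counted."""
    return gpu_time(flops, bytes_moved) < cpu_time(flops)

# Vector add on 10 million doubles: 1 FLOP per element, 3 arrays moved.
n = 10_000_000
vec_flops = n
vec_bytes = 3 * 8 * n

# Dense matrix multiply, 4096 x 4096 doubles: 2*n^3 FLOPs, 3 matrices moved.
m = 4096
mat_flops = 2 * m**3
mat_bytes = 3 * 8 * m * m

print("vector add worth offloading:", offload_pays_off(vec_flops, vec_bytes))
print("matmul worth offloading:   ", offload_pays_off(mat_flops, mat_bytes))
```

With these made-up figures the low-intensity vector add loses to the transfer cost, while the matrix multiply does enough work per byte to win. That is the difference between kernels that map well to a GPU and the many pieces of a simulation that don't.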
Modern GPUs do have GPUDirect and MultiDMA engines, but using them takes hardcore coding and knowing exactly what you're doing, unless you're solving popular problems with established libraries.
Plus, if you'd rather not be vendor-locked, keep in mind that at least one of the vendors artificially limits the performance you can get from their cards.
On the other hand, all of the prominent linear algebra libraries squeeze the most out of the CPUs you have relatively easily, and you don't even need matrices and vectors to get this performance from CPUs.
Lastly, I want to point out that parallelizing such problems is not always trivial, even on CPUs. When you go multi-node via MPI, things get fun. Adding GPUs to that mix is madness if you're not prepared.
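For a feel of the bookkeeping involved, here is a toy sketch of 1D domain decomposition with a halo exchange, simulated inside a single Python process. No real MPI is involved: the "ranks" are just list chunks, the function names are invented for this example, and it assumes there are at least as many grid points as ranks. In an actual MPI code each halo read below would be a send/receive pair with a neighbouring rank.

```python
def serial_step(u):
    """One smoothing step on a 1D grid; the two end points stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new

def split(u, nranks):
    """Split the grid into contiguous chunks, one per simulated rank."""
    base, extra = divmod(len(u), nranks)
    chunks, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        chunks.append(u[start:start + size])
        start += size
    return chunks

def decomposed_step(chunks):
    """Same update, but each rank sees only its chunk plus one halo cell per side."""
    nranks = len(chunks)
    new_chunks = []
    for r, local in enumerate(chunks):
        # Simulated halo exchange: fetch one boundary value from each neighbour.
        left = chunks[r - 1][-1] if r > 0 else None
        right = chunks[r + 1][0] if r < nranks - 1 else None
        new = local[:]
        for i in range(len(local)):
            lo = local[i - 1] if i > 0 else left
            hi = local[i + 1] if i < len(local) - 1 else right
            if lo is None or hi is None:
                continue  # global boundary point, left unchanged
            new[i] = 0.5 * (lo + hi)
        new_chunks.append(new)
    return new_chunks

grid = [float(i * i) for i in range(10)]
expected = serial_step(grid)
chunks = decomposed_step(split(grid, 3))
gathered = [x for chunk in chunks for x in chunk]
print("decomposed result matches serial:", gathered == expected)
```

Even in this tiny case you have to handle uneven chunk sizes, global boundaries, and neighbour lookups. With real MPI you add message ordering and deadlock avoidance, and with GPUs you add staging the halo cells between device and host memory on every step.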