Yeah, they do, all the time. I remember the parallel computing course back in grad school where we got to use our 800-core test machine: people were running simulations of different weather patterns and climate change, earthquake simulations, and whatnot. A lot of that work can take advantage of all of those cores. Academia in particular makes heavy use of these systems to get closer to the "physics", within clear discrete limitations.
1. A low-latency network, on the order of 1-2 us. Most servers can't ping their local switch that fast, let alone the most distant switch in a 1M-node system (see the ping-pong sketch after this list).
2. A high-bandwidth network, at least 200 Gbit/s.
3. A parallel filesystem
4. Very few node types.
5. A network topology designed for low latency and high bandwidth, such as a hypercube, dragonfly, or fat tree.
6. A software stack that is aware of the topology and uses it for efficiency and for collective operations (see the allreduce sketch below).
7. Tuned system images to minimize noise, maximize efficiency, and reduce context switches and interrupts. Reserving cores for handling interrupts is common at larger core counts (see the core-pinning sketch at the end).
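
On point 1, the standard way to see that latency number is an MPI ping-pong. A minimal sketch, assuming an MPI implementation (Open MPI, MPICH, etc.), compiled with mpicc and run with two ranks; on the fabrics above the reported one-way latency lands in the low single-digit microseconds, while on a plain Ethernet setup it's usually an order of magnitude worse:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 10000;
        char buf[8] = {0};          /* tiny payload: we want latency, not bandwidth */

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        /* rank 0 and rank 1 bounce the message back and forth */
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("avg one-way latency: %.2f us\n", (t1 - t0) / (2.0 * iters) * 1e6);

        MPI_Finalize();
        return 0;
    }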
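
And on point 6, this is what a collective looks like from the application side: one call, and it's the MPI library's job to map it onto a reduction tree/ring/butterfly that fits the topology. Again just a sketch assuming MPI:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;   /* stand-in for a per-rank partial result */
        double global = 0.0;

        /* every rank contributes its value, every rank gets back the sum;
           the library decides the communication schedule, not the application */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %.0f\n", size, global);

        MPI_Finalize();
        return 0;
    }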
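
On point 7, the reservation itself normally comes from the tuned image (e.g. isolcpus/irqaffinity boot parameters and irqbalance settings), and the job launcher or application then keeps compute work off those housekeeping cores. A Linux-only sketch of that pinning; which cores count as reserved (0 and 1 here) is an assumption, not something from any particular site config:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

        cpu_set_t set;
        CPU_ZERO(&set);

        /* assume cores 0 and 1 are left free for IRQ/OS housekeeping,
           so compute only runs on cores 2..N-1 */
        for (long c = 2; c < ncpu; c++)
            CPU_SET(c, &set);

        if (sched_setaffinity(0, sizeof set, &set) != 0) {   /* 0 = calling thread */
            perror("sched_setaffinity");
            return 1;
        }

        printf("pinned to cores 2..%ld\n", ncpu - 1);
        /* ... compute work runs here, off the housekeeping cores ... */
        return 0;
    }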