Yup. The beauty of it is that the underlying AI accelerator/hardware is completely abstracted away. There's a CoreML execution provider for ONNX Runtime too, though I haven't used it.
No more fighting with hardcoded cuda:0 everywhere.
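For anyone who hasn't touched it: the selection happens at session creation. You hand ONNX Runtime an ordered list of execution providers and it uses the first one that's actually available, so the inference code itself never mentions a device. A minimal sketch, assuming onnxruntime (or onnxruntime-gpu) is installed; the file name and input shape are just placeholders:

```python
# Minimal sketch: let ONNX Runtime pick the best available execution provider.
# Assumes onnxruntime (or onnxruntime-gpu) is installed and "model.onnx" exists.
import numpy as np
import onnxruntime as ort

# Preference order: keep whichever of these is actually available on this machine.
preferred = ["CUDAExecutionProvider", "CoreMLExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]

session = ort.InferenceSession("model.onnx", providers=providers)
print("Active providers:", session.get_providers())

# Inference is the same call no matter which accelerator got picked.
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # shape is model-specific
outputs = session.run(None, {input_name: dummy})
```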
The only pain point is that you'll often have to manually convert a PyTorch model from Hugging Face to ONNX unless it's very popular.
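For supported architectures, Hugging Face's optimum package can usually do the export in one step (roughly `optimum-cli export onnx --model <name> <out_dir>`); when it can't, the manual route is torch.onnx.export. A rough sketch, assuming torch and transformers are installed; the model name is only an example:

```python
# Rough sketch of a manual Hugging Face -> ONNX export.
# Assumes torch and transformers are installed; the model name is an example.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

# Trace with a representative input; dynamic_axes keeps batch/sequence length flexible.
dummy = tokenizer("an example sentence", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch"},
    },
    opset_version=17,
)
```

The exported file then loads into the same kind of InferenceSession as above.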
I should try those on the NVIDIA Spark; it'd be interesting to see if they're easy to work with on ARM64.