There are lots of good reasons to make cryptographic operations instructions instead of a memory-mapped peripheral, but I prefer something like VIA PadLock, which implemented whole cipher modes instead of just exposing the round function as an instruction. An implementation could even trap those instructions and implement them in a peripheral. The problem with memory-mapped peripherals is that access to them has to be multiplexed and their state preserved across context switches. Specialized instructions operating on existing registers avoid this problem. VIA PadLock solved it by piggybacking on the existing x86 REP prefix for interruptible string instructions. The crypto unit only cached the cipher round keys and reloaded them from memory (or repeated the key schedule) after a context switch.
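To make the "cache the round keys, reload after a context switch" idea concrete, here is a toy software model in C. It is only a sketch: `crypto_unit`, `context`, `unit_init` and `unit_process` are hypothetical names invented for illustration, the XOR stands in for a real cipher, and nothing here reflects how PadLock is actually built.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a shared crypto unit. Not a real device. */
struct crypto_unit {
    uint32_t cached_key[4]; /* stands in for the cached round keys        */
    int      owner;         /* id of the context whose key is cached, or -1 */
    unsigned reloads;       /* number of times the key had to be reloaded */
};

/* The key itself lives in ordinary per-process memory. */
struct context {
    int      id;
    uint32_t key[4];
};

static void unit_init(struct crypto_unit *u)
{
    memset(u, 0, sizeof *u);
    u->owner = -1; /* nothing cached yet */
}

/*
 * Process one word on behalf of context c.
 * If another context used the unit in between, the cached key is stale,
 * so it is reloaded from the caller's memory first.
 * The XOR is a placeholder transform, NOT a cipher.
 */
static uint32_t unit_process(struct crypto_unit *u,
                             const struct context *c,
                             uint32_t word)
{
    if (u->owner != c->id) {
        memcpy(u->cached_key, c->key, sizeof u->cached_key);
        u->owner = c->id;
        u->reloads++;
    }
    return word ^ u->cached_key[0];
}
```

The point of the model is that the unit holds no state the OS has to save: everything it caches can be rebuilt from the calling process's own memory, so a context switch costs at most one reload.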
In lots of places this makes sense. E.g. lots of embedded ARM platforms have a separate AES / ECC accelerator peripheral.
The trouble comes when you need to share access to a memory-mapped peripheral among multiple threads, processes, users, etc. It can be done, but in larger systems it's usually easier to manage CPU registers than peripheral devices for things like crypto operations. Plus, you have to do access control for the peripheral (so other processes don't try to steal your key). If it's all within the security boundary of a "normal" process, you get that (mostly) for free.
All of the above has caveats and exceptions, but generally the mainstream ISAs (ARM, SPARC, x86, now RISC-V) take this approach.
That, and other operations requiring three input registers (and therefore a LOT of encoding space), have been postponed to a possible future extension.
Full GREV and my lovely GORC have also gotten lost, though the encodings for the specific REV and ORC instructions that are included are upwardly compatible with the proposed general versions.
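For anyone who hasn't met the general versions, here is a minimal C sketch of 32-bit GREV and GORC, written from memory of the reference semantics in the bitmanip draft, so treat it as illustrative rather than normative. The function names `grev32` and `gorc32` are just labels for this sketch. Each bit of the control value enables one butterfly stage: GREV swaps adjacent groups of 1, 2, 4, 8 or 16 bits, while GORC ORs each group into its neighbour instead of swapping.

```c
#include <stdint.h>

/* Generalized reverse: each set bit in shamt enables one swap stage. */
uint32_t grev32(uint32_t x, uint32_t shamt)
{
    shamt &= 31;
    if (shamt &  1) x = ((x & 0x55555555u) <<  1) | ((x & 0xAAAAAAAAu) >>  1);
    if (shamt &  2) x = ((x & 0x33333333u) <<  2) | ((x & 0xCCCCCCCCu) >>  2);
    if (shamt &  4) x = ((x & 0x0F0F0F0Fu) <<  4) | ((x & 0xF0F0F0F0u) >>  4);
    if (shamt &  8) x = ((x & 0x00FF00FFu) <<  8) | ((x & 0xFF00FF00u) >>  8);
    if (shamt & 16) x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
    return x;
}

/* Generalized OR-combine: same stages, but OR-ing instead of swapping. */
uint32_t gorc32(uint32_t x, uint32_t shamt)
{
    shamt &= 31;
    if (shamt &  1) x |= ((x & 0x55555555u) <<  1) | ((x & 0xAAAAAAAAu) >>  1);
    if (shamt &  2) x |= ((x & 0x33333333u) <<  2) | ((x & 0xCCCCCCCCu) >>  2);
    if (shamt &  4) x |= ((x & 0x0F0F0F0Fu) <<  4) | ((x & 0xF0F0F0F0u) >>  4);
    if (shamt &  8) x |= ((x & 0x00FF00FFu) <<  8) | ((x & 0xFF00FF00u) >>  8);
    if (shamt & 16) x |= ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
    return x;
}
```

In this model the specific instructions are just fixed control values: on a 32-bit register, a byte swap is `grev32(x, 24)`, a full bit reversal is `grev32(x, 31)`, and a byte-wise OR-combine (every nonzero byte becomes 0xFF) is `gorc32(x, 7)`.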
- Scalar crypto: https://github.com/riscv/riscv-crypto/releases
- Vectors: https://github.com/riscv/riscv-v-spec/releases
- Bitmanip: https://github.com/riscv/riscv-bitmanip/releases