In the context of encrypting 32- or 64-bit IDs, where there is no nonce, that'd be equivalent to XOR encryption with a fixed pad, and much weaker than TFA's small block ciphers.
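To make the weakness concrete, here is a minimal sketch, assuming the scheme derives its keystream from the key alone because there is no nonce. The `keystream` helper and its SHA-256 construction are hypothetical stand-ins, not anything from TFA.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Hypothetical stand-in for a stream cipher with no nonce:
    # the same key always produces the same pad.
    return hashlib.sha256(key).digest()[:n]

def encrypt_id(key: bytes, value: int) -> int:
    # XOR a 32-bit ID with the fixed 32-bit pad.
    pad = int.from_bytes(keystream(key, 4), "big")
    return value ^ pad

key = b"secret key"
c1 = encrypt_id(key, 1000)
c2 = encrypt_id(key, 1001)

# The pad cancels out: the XOR of two ciphertexts equals the XOR of the IDs,
# so sequential IDs remain visibly related.
leak = c1 ^ c2

# A single known (ID, ciphertext) pair reveals the pad and decrypts the rest.
recovered_pad = c1 ^ 1000
recovered_id = c2 ^ recovered_pad
```

A block cipher on the full 32-bit value avoids both problems, since flipping one input bit changes the whole output.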
If you really want to encrypt and decrypt 32-bit numbers without having any nonces available, the fastest way on non-microcontroller CPUs remains using the AES instructions.
You can exploit the fact that the core of an AES round consists of invertible mixing functions on 32-bit words. To extend this to a 128-bit block, a byte permutation (ShiftRows) is used, which moves bytes between the four 32-bit words.
The AES instructions are designed so that you can cancel this byte permutation. Once it is canceled, the AES instructions encrypt four 32-bit words separately, instead of one 128-bit block.
Similarly, by canceling the standard byte permutation and replacing it with separate permutations on the two halves, you can make the AES instructions independently encrypt two 64-bit words.
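A rough way to see this is a pure-Python model of one AES round, a sketch and not the hardware path. Here `aesenc` mimics what the AESENC instruction computes (ShiftRows, SubBytes, MixColumns, XOR with the round key), and `lane_round` applies the inverse permutation first, which is the job a byte shuffle such as pshufb would do on real hardware. The names `lane_round` and `column_round` and the round key are made up for illustration.

```python
def gf_mul(a, b):
    # Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def make_sbox():
    # S-box = field inverse followed by the AES affine transform.
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

def shift_rows(s):
    # State is column-major: byte (row r, column c) lives at index r + 4*c.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def inv_shift_rows(s):
    return [s[r + 4 * ((c - r) % 4)] for c in range(4) for r in range(4)]

def mix_column(col):
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]

def column_round(col, rk_col):
    # The 32-bit invertible mixing function: SubBytes + MixColumns + key XOR.
    mixed = mix_column([SBOX[b] for b in col])
    return [m ^ k for m, k in zip(mixed, rk_col)]

def aesenc(state, round_key):
    # Model of one full AES round on a 16-byte state.
    s = shift_rows(state)
    out = []
    for c in range(4):
        out += column_round(s[4 * c:4 * c + 4], round_key[4 * c:4 * c + 4])
    return out

def lane_round(state, round_key):
    # Pre-applying the inverse permutation cancels ShiftRows inside aesenc,
    # leaving four independent 32-bit rounds.
    return aesenc(inv_shift_rows(state), round_key)

def changed_columns(x, y):
    return [c for c in range(4) if x[4 * c:4 * c + 4] != y[4 * c:4 * c + 4]]

rk = list(b"hypothetical key")           # 16 arbitrary round-key bytes
state = list(range(16))
tweaked = state[:]
tweaked[8:12] = [0xDE, 0xAD, 0xBE, 0xEF]  # modify only 32-bit word 2

# With the permutation canceled, only word 2 of the output changes.
lanes_changed = changed_columns(lane_round(state, rk), lane_round(tweaked, rk))
# With the plain round, the change spreads to every word.
plain_changed = changed_columns(aesenc(state, rk), aesenc(tweaked, rk))
```

Iterating `lane_round` over several round keys keeps the four words independent, since no step ever moves a byte from one word to another.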
These AES modifications remain faster than any software cipher.
How to cancel the internal permutation and replace it with external shuffle instructions was already described in the Intel white paper published in 2010, at the launch of Westmere, the first CPU with AES instructions.
Using AES with a pshufb to take out the ShiftRows step would be 2 cycles for the pshufb and 4 cycles for each aesenc, and at 10 rounds, you have ~60 cycles.
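Spelled out, the estimate is just a dependent chain of shuffle plus round, repeated per round. The latencies below are the figures quoted above, not measurements, and vary by microarchitecture.

```python
# Latency figures as quoted in the comment (assumed, not measured).
PSHUFB_LATENCY = 2   # cycles to undo ShiftRows before each round
AESENC_LATENCY = 4   # cycles per aesenc
ROUNDS = 10          # AES-128 round count

# Each round depends on the previous one, so latencies add up serially.
total_cycles = ROUNDS * (PSHUFB_LATENCY + AESENC_LATENCY)
print(total_cycles)
```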
It's quite close, and to say which one wins, we'd need to actually benchmark it. One is not clearly much faster than the other.
There's nothing similar about AES and Speck, and the "microcode" for AES isn't like what you're thinking of. If you want to learn more about it, you can look up the specifications for AES and Intel's AES instruction set.