That's great! The processor's hypervisor-like firmware should handle task switching, page table manipulation, etc., and the OS kernel should use upcalls to the firmware instead of needing special-case paths for minor hardware variants. Had the x86 BIOS been a bit better designed (and a bit more performant), we likely would have seen OS kernels leaning much harder on the firmware shipped with the processor instead of making as many assumptions about the hardware and adding special-case checks.
Besides allowing for more easily isolated security domains, this allows things like (if properly designed) not needing to wait for kernel improvements to take advantage of more/wider vector registers or other changes that change the amount of processor state to serialize/deserialize when task switching.
The DEC Alpha AXP worked somewhat like this with its PALcode firmware. The Tru64 UNIX (and Linux, *BSD, etc.) and VMS kernels were actually unable to execute the privileged CPU instructions. The OS kernel needed to make upcalls to the PALcode, which could then use privileged instructions, see model-specific registers, etc. The PALcode variant used for Tru64 emulated two protection rings, and the variant used with VMS emulated more (I think 4) rings of protection by just keeping an extra integer around for each task and using it to determine which tasks could currently make which upcalls. One could (and probably should) extend this ring emulation to a bit vector of per-task revocable capabilities that could be passed to child tasks/processes/threads.
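The capability-bit-vector idea could be sketched roughly like this (all names and the upcall set are hypothetical, not from any real PALcode interface):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: per-task capability bits gating firmware upcalls,
 * generalizing the per-task ring integer described above. */

typedef uint64_t caps_t;

enum {
    CAP_MAP_PAGES   = 1u << 0,  /* may ask firmware to edit page tables   */
    CAP_SWITCH_TASK = 1u << 1,  /* may request a context switch           */
    CAP_READ_MSR    = 1u << 2,  /* may read model-specific registers      */
};

struct task {
    caps_t caps;
};

/* Firmware-side check: an upcall proceeds only if every required
 * capability bit is present in the calling task. */
static int upcall_allowed(const struct task *t, caps_t required) {
    return (t->caps & required) == required;
}

/* A child task receives at most a subset of its parent's capabilities. */
static struct task spawn_child(const struct task *parent, caps_t requested) {
    struct task child = { .caps = parent->caps & requested };
    return child;
}

/* Revocation is just clearing bits; the task cannot restore them itself. */
static void revoke(struct task *t, caps_t caps) {
    t->caps &= ~caps;
}
```

Unlike a single ring integer, the masking in `spawn_child` means privileges only narrow as they flow down the task tree, and each upcall can demand exactly the bits it needs.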
Hopefully we see something like this for RISC-V, using seL4 for the "realm manager". This would probably require an extra userspace driver process running to intermediate realm setup and manipulation, but wouldn't be in the critical path for system calls or other userspace drivers.
We're already running hypervisors in so many places that it makes sense to run a formally verified separation kernel everywhere, with hypervisors and OS kernels as userspace daemons. This avoids the hypervisor needing to emulate hardware as an ad-hoc upcall mechanism, simplifying both the hypervisor and the OS kernel. The overhead of modern microkernels is so low that your cell phone's baseband processor is likely running an L4 microkernel. It's called paravirtualization when the OS kernel is modified to make upcalls to the hypervisor instead of attempting privileged operations that the hypervisor must trap and emulate. Paravirtualization improves VM performance and potentially sidesteps hypervisor emulation bugs, and it would simplify the kernel (and potentially make it easier to optimize) if OS kernels ran paravirtualized even when there is one guest OS per physical computer.
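The contrast between the two paths can be illustrated with a toy sketch (the operation, function names, and state here are invented for illustration, not any real hypervisor ABI):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one privileged operation (updating the guest's
 * page-table base) under the two virtualization styles. */

static uint64_t shadow_pt_base;  /* hypervisor-held state for the guest */
static int      traps_taken;     /* how many emulation faults occurred  */

/* Trap-and-emulate: the guest executes a privileged instruction, the
 * CPU faults into the hypervisor, which must decode the instruction to
 * recover the guest's intent (the decode step is elided here). */
static void on_privileged_fault(uint64_t decoded_operand) {
    traps_taken++;
    shadow_pt_base = decoded_operand;
}

/* Paravirtualization: the guest kernel knows it runs under a
 * hypervisor and calls in explicitly, so no fault or decode happens. */
static void hypercall_set_pt_base(uint64_t base) {
    shadow_pt_base = base;
}
```

The paravirtualized path reaches the same hypervisor state with no fault taken and no instruction decoder, which is where both the performance win and the reduced emulation surface come from.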
Edit: Of course, there's a small performance hit in the single guest OS case, but if that's the common code path, presumably both hardware and the kernels could be better optimized. Also, if you're supporting OS-opaque realms, you're already paying this hypervisor cost all the time anyway.
Apple actually ships a proprietary ARM extension for lateral exception levels to help enforce kernel integrity, which includes gating access to code that fiddles with page tables.