> One shouldn't have to enable filesystem-wide W|X just to run one application
This is a very good point, and it seems like enabling it more granularly would be worth it.
As for your second point, I have trouble believing that switching between W!X and X!W is going to happen frequently enough to matter for performance. (The cost was very small in Firefox's JavaScript JIT: something like a 1%-4% performance hit, depending on the system.)
An emulator is a very different use case than Firefox's JIT.
Take Dolphin (the GameCube / Wii emulator) for example: you have a 4GiB DVD, and it has the entire code engine of a massive 80-hour game on it. You cannot recompile the entire disc at startup. Even if you could (you really can't), these games tend to push code into RAM to execute, which a static recompiler cannot handle.
The way dynamic recompilers handle this tremendous burden is to recompile small blocks at a time and track which ones are hot and cold. When the buffers fill up, they start dropping old, stale code.
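To make the "compile small blocks, drop stale ones" idea concrete, here is a toy sketch. It is not how Dolphin or any particular emulator does it: the `BlockCache` class, the least-recently-used eviction policy, and the addresses are all assumptions for illustration.

```python
from collections import OrderedDict


class BlockCache:
    """Toy model of a dynamic recompiler's code cache (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of compiled blocks held
        self.blocks = OrderedDict()   # guest address -> "compiled" block
        self.compiles = 0             # how many times we had to (re)compile

    def _compile(self, guest_addr):
        # Stand-in for real code generation.
        self.compiles += 1
        return f"host code for {guest_addr:#x}"

    def execute(self, guest_addr):
        if guest_addr in self.blocks:
            # Block already compiled: mark it as recently used.
            self.blocks.move_to_end(guest_addr)
        else:
            if len(self.blocks) >= self.capacity:
                # Buffer full: drop the least recently used block.
                self.blocks.popitem(last=False)
            self.blocks[guest_addr] = self._compile(guest_addr)
        return self.blocks[guest_addr]


cache = BlockCache(capacity=2)
for addr in [0x1000, 0x2000, 0x1000, 0x3000, 0x2000]:
    cache.execute(addr)
print(cache.compiles)
```

The point of the sketch is that an evicted block has to be compiled again if the game jumps back to it, so code generation (and with W^X, the protection flipping around it) keeps happening for the whole run rather than once at startup.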
It's hard to say exactly what the performance impact would be, and it'd probably vary per game title. But it'd be a lot worse than a web browser recompiling jQuery plus another 200KiB of custom JavaScript once on page load.
Plus, I am sure there are many more use cases for W|X than just emulators and JITs. It would be a shame to try to eradicate them all from existence.
Browser JITs don't just recompile things on page load. They have to add inline cache entries whenever the existing IC is missed.
Now in practice ICs usually hit (that's the point!) so once you've been running for a bit things should hopefully not need more recompiling. Which is a significant difference from the situation you describe.
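A rough sketch of why IC misses taper off, assuming a simple polymorphic inline cache keyed on receiver type. The `InlineCache` class and its `patches` counter are made up for illustration and are not taken from any real browser engine.

```python
class InlineCache:
    """Toy polymorphic inline cache for one property-access site."""

    def __init__(self):
        self.entries = set()   # receiver types this site is specialised for
        self.patches = 0       # times generated code would have to be rewritten

    def load_value(self, obj):
        kind = type(obj)
        if kind not in self.entries:
            # IC miss: a real JIT would add a stub for this type here,
            # which means writing to (and re-protecting) code memory.
            self.entries.add(kind)
            self.patches += 1
        return obj.value


class Point:
    value = 1


class Vector:
    value = 2


site = InlineCache()
for obj in [Point(), Point(), Vector(), Point(), Vector(), Vector()]:
    site.load_value(obj)
print(site.patches)
```

After each type has been seen once, every further access is a hit and no code is rewritten, which is the steady state described above.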
How often is too often? When I output compilation logs for HotSpot, it's almost never not compiling (in terms of human perception of how fast the messages are output).
If it's writing code into memory, and you're going to compile it and rewrite the jump when it's done, isn't the region the current code is writing to already in W mode?
The process would be:
1. Compile the code when you fault on the basic block exit.
2. Mark that basic block executable.
3. Optionally, only after a return or other jump: mark the jumping basic block as writable, patch the jump, then mark it as executable.
In this case, there would be three protection changes. JS JITs are doing this a lot more often than Dolphin is: they often have more than one level of JIT in addition to an interpreter, so they will end up doing this dance more than once per basic block.
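The three steps can be modelled without touching real memory. This is only a sketch: the `Page` class stands in for a page whose protection would really be changed with something like `mprotect`, and all names in it are hypothetical.

```python
class Page:
    """Toy model of a memory page that is writable XOR executable."""

    def __init__(self):
        self.prot = "W"    # assume fresh JIT buffers start out writable
        self.code = []
        self.flips = 0     # number of protection changes on this page

    def protect(self, prot):
        if prot != self.prot:
            self.prot = prot
            self.flips += 1

    def write(self, insn):
        if self.prot != "W":
            raise PermissionError("page is not writable")
        self.code.append(insn)

    def patch(self, index, insn):
        if self.prot != "W":
            raise PermissionError("page is not writable")
        self.code[index] = insn


def flips_for_one_block():
    # Existing, already-executable block that ends in a jump to a stub.
    src = Page()
    src.write("jmp <compile stub>")
    src.protect("X")
    before = src.flips

    dst = Page()
    dst.write("compiled basic block")    # step 1: compile on block exit
    dst.protect("X")                     # step 2: mark it executable
    src.protect("W")                     # step 3: make the jumping block writable,
    src.patch(0, "jmp <new block>")      #         patch the jump,
    src.protect("X")                     #         and make it executable again
    return (src.flips - before) + dst.flips


print(flips_for_one_block())
```

Each `protect` call that changes state corresponds to one protection-changing syscall in a real implementation, so the count is what you would multiply by the per-syscall cost.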
Until I see at least some microbenchmarks and concrete estimates, I don't think I'll worry too much about this. Though it is unfortunate to have to modify all of this code.
An emulator is a particular niche application, and a prime example of an exception for which one could enable W|X. Acknowledging that most apps don't need W|X doesn't mean that W|X apps would be eradicated.
> One day far in the future upstream software developers will understand that W^X violations are a tremendously risky practice and that style of programming will be banished outright.
I feel that whoever wrote that they should be banished outright is being very short-sighted. It would be like banning cars because it's possible to seriously injure a person with one.
tj later replied to me on Twitter saying they meant for it to become per-process in the future. Once that happens, I'll be okay with this change. Right now, I think filesystem-level is far too broad. I will often want an emulator on the same filesystem as other applications that I do want W^X protections for.
It's different from your typical language interpreter type JIT in that it's used as an optimisation for generating an optimal processing kernel. Once generated, it's only used once before it's thrown away, as a different input requires a different processing kernel.
I have actually tested using syscalls (not for W^X, but as a potential workaround for newer Intel CPUs exhibiting weird SMC detection) and found the overhead to be way too much, even for just one syscall (whereas switching between W and X would require two).
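For a generate-once, run-once kernel, the relative cost is easy to reason about with a back-of-the-envelope model. The function below and the numbers passed to it are placeholders chosen for round arithmetic, not measurements from any system.

```python
def protection_overhead(gen_us, run_us, syscall_us, flips=2):
    """Fraction of total time spent on protection-changing syscalls.

    gen_us     -- time to generate the kernel (hypothetical, microseconds)
    run_us     -- time to run it once before discarding it
    syscall_us -- cost of one protection-changing syscall
    flips      -- syscalls per kernel (W -> X, then X -> W for the next one)
    """
    extra = flips * syscall_us
    return extra / (gen_us + run_us + extra)


# Placeholder values only: plug in your own measurements.
print(protection_overhead(gen_us=3.0, run_us=5.0, syscall_us=1.0))
```

The shorter the kernel's generate-and-run time is compared with the syscall cost, the larger this fraction gets, which is why a one-shot kernel is hit harder than code that is compiled once and reused many times.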