Related to unspecified vs. undefined behavior: I recall some C code that was trying to be tricky and read from just-allocated memory. Something like:
int* ptr = malloc(size);
if(ptr[offset] == 0) /* reads memory that was never written */
{
}
The code was assuming that the value in an allocated buffer did not change.
However, it was pointed out in review that it could change with these steps:
1) malloc allocates from a new memory page. This page is often not mapped to a physical page until it is written to.
2) Reads just return the default (often zero) because the page is not mapped.
3) Another allocation on the same page is written to. This maps the page to physical memory, which then changes the value seen by the original allocation.
A read from an unmapped page producing a different value than reading from that same page after it's mapped is an OS bug (*). If this was an already allocated page that had something written to it, reading from it would page it back in and then produce the actual content. If this was a new page and the OS contract was to provide zeroed pages, both the read before it was mapped and the read after it was mapped would produce zero.
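To make the zeroed-page contract concrete, here is a minimal sketch of the read -> write elsewhere on the page -> read again sequence. It assumes a POSIX-like system with anonymous mmap (Linux, macOS, the BSDs); the function name `read_around_first_touch` is made up for illustration, and I'm using mmap directly rather than malloc so that every read is of memory the OS promises is zero-filled.

```c
#define _DEFAULT_SOURCE /* assumption: needed for MAP_ANONYMOUS on glibc in strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD-style spelling */
#endif

/* Hypothetical helper: reads one byte of a fresh anonymous page, writes to a
 * different byte of the same page, then reads the first byte again.
 * Returns 0 on success, -1 if the page could not be set up. */
int read_around_first_touch(size_t offset, unsigned char *before,
                            unsigned char *after)
{
    assert(before != NULL && after != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 1 || offset >= (size_t)page - 1)
        return -1; /* offset must differ from the byte we write below */

    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* volatile so the compiler really performs both loads and the store */
    volatile unsigned char *p = mem;

    *before = p[offset];          /* read before anything was written */
    p[(size_t)page - 1] = 0xFF;   /* first write to the page */
    *after = p[offset];           /* read the same byte again */

    munmap(mem, (size_t)page);
    return 0;
}
```

Calling it with offset 0 should give the same byte (zero) for both reads, which is the point: the first write does not change what the earlier read location contains.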
What could happen instead is that the UB in that code lets the compiler emit something in which the comparison is non-deterministic.
(*): ... or alternatively, we're not talking about regular userspace program but a higher privilege layer that is doing direct unpaged access, but I assume that's not the case since you're talking about malloc.
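For contrast with the UB point, a sketch of a variant where the comparison has a defined result: calloc hands back zero-filled storage, so nothing indeterminate is read. The helper name `read_fresh_element` and its bounds check are my own additions, not anything from the code being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `count` ints with calloc and reports the
 * value stored at `offset`. Returns 0 on success, -1 on a bad offset or
 * allocation failure. */
int read_fresh_element(size_t count, size_t offset, int *value)
{
    assert(value != NULL);

    if (offset >= count)
        return -1; /* refuse out-of-bounds reads */

    int *ptr = calloc(count, sizeof *ptr); /* zero-filled, unlike malloc */
    if (ptr == NULL)
        return -1;

    *value = ptr[offset]; /* defined: the storage was zero-initialized */

    free(ptr);
    return 0;
}
```

With this, `if (value == 0)` is always taken for a fresh buffer, and the compiler has no latitude to decide otherwise.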
The closest thing to "conditionally returned to the kernel" is if the page had been given to madvise(MADV_FREE), but that would still not have the behavior they're talking about. Reading and writing would still produce the same content, either the original page content because the kernel hasn't released the page yet, or zero because the kernel has already released the page. Even if the order of operations is read -> kernel frees -> write, then that still doesn't match their story, because the read will produce the original page content, not zero.
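A rough sketch of the MADV_FREE case, assuming a system that defines `MADV_FREE` (Linux 4.5 or later, macOS, the BSDs); where the flag is missing or the call is rejected, the code just skips the hint. The function name `read_after_madv_free` is hypothetical. The byte read back should be either the fill value or zero, never anything else.

```c
#define _DEFAULT_SOURCE /* assumption: exposes madvise and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical helper: fills a page with `fill`, marks it MADV_FREE, then
 * reads the first byte back into *seen.
 * Returns 0 on success, -1 if the page could not be set up. */
int read_after_madv_free(unsigned char fill, unsigned char *seen)
{
    assert(seen != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memset(mem, fill, (size_t)page); /* page now has real content */

#ifdef MADV_FREE
    /* Tell the kernel it may reclaim the page lazily. A failure here
     * (e.g. an older kernel) is ignored; the page just keeps its content. */
    (void)madvise(mem, (size_t)page, MADV_FREE);
#endif

    /* Either the kernel has not reclaimed the page yet (we see `fill`)
     * or it has (we see zero). */
    *seen = ((volatile unsigned char *)mem)[0];

    munmap(mem, (size_t)page);
    return 0;
}
```

Which of the two values you get depends on memory pressure at the time, so a test can only assert that it is one of the two.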
That said, the code they're talking about is different from yours in that their code is specifically doing an out-of-bounds read. (They said "If you happen to allocate a string that's 128 bytes, and malloc happens to return an address to you that's 128 bytes away from the end of the page, you'll write the 128 bytes and the null terminator will be the first byte on the next page." So they're very clearly talking about the \0 being outside the allocation.)
So it is absolutely possible to have this setup: the string's allocation happens to be followed by a different allocation whose first byte is currently 0 -> the `data[size()] != '\0'` check is performed and finds the byte already zero -> `data` is returned to the caller -> whoever owns that following allocation writes a non-zero value to the first byte -> whoever called `c_str()` will now run off the end of the 128B string. This doesn't have anything to do with pages; it can happen within the bounds of a single page. It is also such an obvious out-of-bounds bug that it boggles my mind that it passed any sort of code review and required some sort of graybeard to point it out.
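That sequence can be modelled without actually going out of bounds by carving both "allocations" out of one array. This is only a sketch: `lazy_c_str`, `run_neighbor_overwrite`, and the 128/256 sizes are made up for illustration and are not the code from the original story.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { STR_SIZE = 128, ARENA_SIZE = 256 };

/* Hypothetical model of the check: only write the terminator if the byte
 * just past the string is not already zero. */
static const char *lazy_c_str(char *data, size_t size)
{
    if (data[size] != '\0')
        data[size] = '\0';
    return data;
}

/* Runs the sequence and reports strlen() of the returned pointer before
 * and after the neighboring "allocation" is written to. */
void run_neighbor_overwrite(size_t *len_before, size_t *len_after)
{
    assert(len_before != NULL && len_after != NULL);

    static char arena[ARENA_SIZE];
    memset(arena, 0, sizeof arena);

    char *str = arena;                 /* bytes 0..127: the string */
    char *neighbor = arena + STR_SIZE; /* bytes 128..255: someone else's */

    memset(str, 'a', STR_SIZE);        /* 128 chars, no room for '\0' */

    /* The check reads neighbor[0], sees zero, and writes nothing. */
    const char *s = lazy_c_str(str, STR_SIZE);
    *len_before = strlen(s);

    /* The neighbor's owner now uses its own memory. */
    neighbor[0] = 'X';
    neighbor[1] = 'Y';

    /* The "terminator" is gone; the string runs into the neighbor. */
    *len_after = strlen(s);
}
```

Everything happens inside one 256-byte array, so no page boundary is involved at any point.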
Unfortunately this hypothesis is also wrong. MAP_UNINITIALIZED can only be enabled in the kernel when there is no MMU, and in that case the page will already be in physical memory, so the very first pointer dereference will read the correct byte, not a fake zero because it's "uninitialized".