It’s a good law. Probably deserving of the term “law” in almost any circumstance, although I’ve definitely been “lucky” enough to find exceptions (i.e. optimizing horribly inefficient code where the optimizations really couldn’t have been premature at any point in prior development). Edit: even so, I’ve found the law holds often enough in exactly that same code. Then it’s a very good opportunity to expand test coverage.
The point of Matt Godbolt's "law" is that eventually you'll discover that the undocumented, mysterious memcpy wasn't unnecessary. I.e. it was a hacky resolution for a weird race-condition edge case that couldn't be solved some other way without introducing some other, major regression.
Of course, he doesn't mean to say you can never optimize to gain speed increases of > 2x. It's hyperbole, built on his experience working with well-written, optimized, time-critical software (think games, or high-volume/low-latency trading), where on a good day you're happy to improve performance by 3%.
Mark is one of the world's top experts on practical MySQL performance at scale, having spent a huge amount of time optimizing MySQL at Google and Facebook. There's a question in this thread about whether this has real world impact... yes, if Mark noticed it, yes, yes it does. This will materially improve many common workloads for InnoDB.
4x perf on inserts... Those kinds of posts make me both scared and depressed by the state of our industry. I feel like we're all hitting rocks to make fire in a cave.
We're picking up pennies on the sidewalk. Each individually is valuable, but there are just so many of them that every day we go home leaving the overwhelming majority of them uncollected. And occasionally one is made of gold.
FWIW... if it takes you all of (say) two seconds to pick up a penny and put it in your pocket, that's 1,800 pennies an hour, which works out to $18 an hour, tax-free. Not bad. Nor world-changing.
The fact that we're still relying on a human brain having a perfect mental model of the tens of thousands of lines of code they wrote, interacting with the millions of lines other people wrote, is baffling.
At this point I consider it a miracle things are still working more or less fine.
"4x perf on inserts" on a microbenchmark - there's nothing to be depressed (or, conversely, enthusiastic) about.
The microbenchmark has value, but in a very specific context.
If one wanted to observe the performance increase under a non-trivial usage pattern, they should use at least TPC-C, where I'd guess changes like this make a negligible difference, or none at all.
> I assume this problem was limited to InnoDB because I did not see problems with MyRocks.
This would certainly explain why InnoDB is insanely slow at loading mysqldumps compared to MyISAM - we hit a wall in some systems and were unable to switch because of it. There are tons of questions online about how to speed this up; people were generally aware of the problem but assumed it was because InnoDB is more reliable with the data (like with foreign keys), or that it was something about how it structures the data on disk that couldn't be changed.
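For reference, the workarounds people usually suggest look something like this (a sketch – "dump.sql" is a placeholder, and you'd re-enable the checks afterwards):

    SET unique_checks = 0;      -- skip per-row unique index checks during the load
    SET foreign_key_checks = 0; -- skip FK validation during the load
    SET autocommit = 0;         -- batch the whole load into one transaction
    SOURCE dump.sql;            -- mysql client command that replays the dump
    COMMIT;
    SET unique_checks = 1;
    SET foreign_key_checks = 1;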
Is someone more experienced able to shed light on how these benchmarks compare to real-world use? Writes tend to be pretty resource-intensive; is 4x faster going to show up as 2-4x faster writes in production environments?
Inserts ought to be able to run at the storage medium's write speed.
I.e. if I insert 1 million records of 1 kilobyte each, and my SSD can do 1 GB/s of writes, then I should be able to do it in 1 second.
How close are we to that?
Transactions shouldn't slow this down (it's possible to write all the data to new areas of disk, and then the final 'commit' is a metadata update to say those new areas are active).
Indexes might slow it down depending on the index design. But it's possible to design an index where updates are coalesced, and therefore the changed parts of the index are only written once as a single bulk write. Assuming the index size is only 10% of the data size, that's a 10% slowdown.
True - but this benchmark is for simple inserts, so it should be possible to reach the hardware write speed.
If those other things are used, then sure, it decreases the theoretical throughput.
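For what it's worth, the closest stock MySQL gets to raw sequential write speed is bulk loading – something like this (a sketch; the table and file are made up):

    -- multi-row INSERTs amortize per-statement overhead...
    INSERT INTO t (id, payload) VALUES (1, '...'), (2, '...'), (3, '...');

    -- ...and LOAD DATA skips per-row SQL parsing entirely
    LOAD DATA INFILE '/tmp/rows.csv' INTO TABLE t
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';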
A 'log' in modern databases is the database. I.e. the data gets written only once. A partial write can be safely ignored (rolling back the entire transaction and all its writes in one go). Data is never overwritten in place, so there is never a time when the database can be inconsistent. The downside with many storage engines is that an entire row must be read and rewritten in order to set one field. Imagine an "UPDATE people SET name='FOO';" query: every single record needs to be read and rewritten, when theoretically you could just write 'FOO' to disk once and remember that all people in the database now have that name, without even needing to read the people records.
This is essentially never true. It _could_ be true if you disabled many safety features in an RDBMS, but you then lose ACID guarantees.
MySQL (with InnoDB), for example, undergoes the following for an INSERT/UPDATE:
1. Write current state of pages to the undo log
2. Write changes to the buffer pool
3. Write changes to the redo log
4. Write changes to the binary log
5. Write changes to the double write buffer
6. Write changes to disk
There are a dizzying number of variables affecting each of those, including delays, maximum allowable sizes, sync/async flushing, and outright disabling of some steps (most commonly the double write buffer – disabling it is only considered safe on CoW filesystems like ZFS – but you can also disable the redo log if you really want to), etc.
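Concretely, a few of the knobs in question look like this (a sketch of the big ones; check your version's docs before touching any of them):

    -- 1 = fsync the redo log at every commit (full durability);
    -- 2 = write at commit but fsync only ~once per second; 0 = laxer still
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;

    -- how often the binary log is fsynced (1 = every commit)
    SET GLOBAL sync_binlog = 1;

    -- the doublewrite buffer can be turned off, but as noted above that's
    -- only considered safe on CoW filesystems like ZFS
    SHOW VARIABLES LIKE 'innodb_doublewrite';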
To your point of writes to disk, most (all?) of the disk writes are batched. Even if InnoDB wasn't doing so, the kernel can and will batch writes on its own.
> so there is never a time when the database can be inconsistent
If the hardware suffers a failure during a write, inconsistency can occur. This is precisely why the double write buffer (for InnoDB) and full page writes (for Postgres) exist. Also, the hardware usually reports back that it has flushed before the data has actually been durably written. If you have SSDs with Power Loss Protection capacitors, this isn't a problem, but no consumer drives that I'm aware of have those.
> downside with many storage engines is that an entire row must be read and rewritten
Actually, it can be even worse than this: if you're using UUIDv4 (or anything else random) as your PK, any UPDATE on more than one row is going to touch _a lot_ of pages. All of those pages have to be written out. This affects not only RDBMS with clustered indices (InnoDB, SQL Server) but also Postgres due to its WAL. Changing one byte can result in a 16 KiB (InnoDB) or 8 KiB (Postgres) write, which is quite a bit of amplification.
I'm a big fan of UUIDv7 if you need them. I maintain that most places could just use an auto-incrementing int and rewrite the API to not expose it, but barring that, UUIDv7s are great.
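Something like this (a sketch; the names are made up) keeps InnoDB's clustered index append-only while still handing clients an opaque identifier:

    CREATE TABLE people (
        id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, -- clustered index stays append-only
        public_id BINARY(16)   NOT NULL, -- e.g. an application-generated UUIDv7
        name      VARCHAR(255) NOT NULL,
        UNIQUE KEY uk_public_id (public_id) -- lookup path for the exposed ID
    ) ENGINE=InnoDB;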
What you say about the log isn't true. The log is also the recovery mechanism. The database must make sure that if something is written to the log, it's written to the database, and vice versa – both for recovery in case of a crash, and for sync across databases, and ...
This alone ensures performance won't be that of the raw disk: you need to write everything twice, and use synced writes instead of async ones.
There are a lot of complexities, and solutions aren't as simple as one would imagine.
I'm going to assume MySQL, but other relational dbs work similarly.
> Transactions shouldn't slow this down (it's possible to write all the data to new areas of disk, and then the final 'commit' is a metadata update to say those new areas are active).
This is done already. DBs don't write directly to the data pages; they write to a sequential log (the WAL), because sequential writes are faster. A separate thread then reads the WAL and applies the changes.
This is actually not even the whole story. Pages are written twice, to the so-called doublewrite buffer, because "filesystems are complicated", and writing a page is not in itself a guarantee of atomicity (this is a way more complex discussion than it seems).
> But it's possible to design an index where updates are coalesced, and therefore the changed parts of the index are only written once as a single bulk write
The most important concept is: sequential writes (as in: big batches of writes) are not possible in ACID DBs, because changes must be made durable immediately. Due to the durability guarantee, one can't just buffer them in memory and pretend that they have been written.
Battery-backed controllers can be used to improve write throughput, but that's another story.
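If you want to see these extra write paths for yourself, the server exposes counters for them – e.g. (a sketch):

    -- doublewrite buffer activity (pages written, number of write operations)
    SHOW GLOBAL STATUS LIKE 'Innodb_dblwr%';

    -- total bytes written to the redo log
    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';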
You've forgotten transaction latency. You might be able to get that throughput when doing all the inserts in one go, but in normal operation what you care about is INSERTs of one item at a time from different writers, and each of those has to do a whole round trip for durability.
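To make that concrete (a sketch; "t" is a made-up table):

    -- per-row autocommit: every INSERT pays a durable commit round trip
    SET autocommit = 1;
    INSERT INTO t (payload) VALUES ('row 1'); -- implicit commit + flush here

    -- batching in one transaction amortizes that cost across rows
    START TRANSACTION;
    INSERT INTO t (payload) VALUES ('row 2');
    INSERT INTO t (payload) VALUES ('row 3');
    COMMIT; -- one durable flush for the whole batch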
Anyway, if you know how to do it, I'm sure MySQL will welcome the pull request.
> Godbolt’s Law — if any single optimisation makes a routine run two or more times faster, then you’ve broken the code.
<https://xania.org/200504/godbolt's-law>