I doubt the competition (e.g. IBM or Microsoft) has any better code quality. Even PostgreSQL is 1.3M lines of code, so let's look at something deliberately written for simplicity: SQLite is just 130k SLoC, another order of magnitude simpler.
And yet, even SQLite has an awful amount of test cases.
I'm sure some of the difference (25M vs. 1.3M) can be attributed to code for Oracle features missing in PostgreSQL. But a significant part of it is due to a careful development process that mercilessly eliminates duplicate and unnecessary code as part of the regular PostgreSQL development cycle.
It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.
> It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.
The single hardest thing about programming, I'd say.
PostgreSQL has a lot of code, but most parts of the code base are of pretty high quality. The main exceptions are some contrib modules, but those are mostly isolated from the main code base.
It's because software LOC scales linearly with the number of man-months spent: a testament to the unique ability of our species to create beautiful, abstract designs that will stand the test of time.
This is an interesting comment: I can't decide whether you are being sarcastic or making a deep, insightful point, because I don't think the statement is true. LOC can go on forever, but that usually happens in things that aren't beautiful and abstract.
I worked on SQL Server for 10+ years. MS SQL Server is way better than that. The Sybase SQL Server code we started with, and then rewrote, was as bad as Oracle's.
I guess that is just because SQL as a standard is neither coherent nor beautifully designed. SQL is a mashup of vendor-specific features all bashed together into one standard.
There's also a lot of essential complexity there. SQL provides, in essence, a generic interface for entering and analyzing data. Imagine the number of ways to structure and analyze data. Now square that number to get the number of tests for how two basic features of the language interact with each other. And that's not even close to full test coverage.
Your point about essential complexity is absolutely correct, but your faux mathematical analysis is totally not a legit way to analyze the complexity of something or determine test coverage. I feel like as programmers we should be comfortable making sensible statements without making up shady pseudo-math to sound convincing.
It's abundantly clear that I'm not making a precise computation here. My argument is that tests don't scale linearly with the number of features because interactions between features need to be tested as well.
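To put a rough number on "tests don't scale linearly", here is a minimal sketch under a deliberately simplified, hypothetical model: one test per feature plus one test per unordered pair of features, ignoring three-way and higher interactions. The function name `tests_needed` is made up for illustration.

```c
/* Hypothetical model: one test per feature, plus one test for every
   unordered pair of features (higher-order interactions ignored). */
static unsigned long tests_needed(unsigned long n) {
    unsigned long count = n;                 /* one test per feature */
    for (unsigned long i = 0; i < n; i++)
        for (unsigned long j = i + 1; j < n; j++)
            count++;                         /* one test for pair (i, j) */
    return count;                            /* equals n + n(n-1)/2 */
}
```

Under this model, going from 10 to 20 features takes the count from 55 to 210 tests, nearly four times as many for twice the features.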
Anton is a compiler writer himself (part of the Gforth team, which I lead). We do submit bug reports. They get rejected as "invalid" because of UB. It is pretty clear that the way we use GCC to implement Gforth is C* and not "C": we use GCC extensions as source-level optimizations. Without them, Gforth would be a factor of 3-5 slower than it is now. We experience slowdowns of that order when we have to switch back to "C" instead of finding a workaround for that particular "optimization".
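One well-known GCC extension used in threaded-code interpreters is "labels as values" (`&&label` takes a label's address, `goto *ptr` jumps to it); Clang supports it too. The sketch below is a toy direct-threaded interpreter illustrating that extension only. It is not Gforth's code, and the `cell` type, the instruction set, and `run_toy_program` are all hypothetical.

```c
/* One cell of threaded code: either the address of the code that
   implements an instruction, or an inline literal operand. */
typedef union {
    void *code;
    long lit;
} cell;

/* Runs a hard-coded toy program (push 2, push 3, add, halt) and returns
   the value left on top of the stack. Requires GCC or Clang. */
static long run_toy_program(void) {
    cell program[] = {
        { .code = &&do_lit }, { .lit = 2 },
        { .code = &&do_lit }, { .lit = 3 },
        { .code = &&do_add },
        { .code = &&do_halt },
    };
    long stack[16];
    long *sp = stack;   /* points one past the top of the stack */
    cell *ip = program; /* instruction pointer into the threaded code */

/* Dispatch: fetch the next code address and jump straight to it,
   with no central switch statement. */
#define NEXT goto *(ip++)->code

    NEXT;

do_lit:                 /* push the inline literal that follows */
    *sp++ = (ip++)->lit;
    NEXT;

do_add:                 /* pop two values, push their sum */
    sp--;
    sp[-1] += sp[0];
    NEXT;

do_halt:                /* stop and return the top of the stack */
    return sp[-1];

#undef NEXT
}
```

`run_toy_program()` returns 5. The point of the design is that each instruction ends with its own indirect jump, which is hard to express in standard C without a `switch`-based dispatch loop.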
So sane compiler behavior is absolutely necessary to create fast programs. The tone reflects more than 10 years of frustration with the "C" maintainers and their inability to deal with the needs of their customers, and our long-term goal is to throw out C completely and write our own code generator instead. It's going to give us another factor of 2-5 (depending on the problem; some microbenchmarks might gain a factor of 10) over the C*-based solution we have now. And, at the same time, we no longer have to release a bug-workaround patch for Gforth for every new GCC release, as was the case for GCC 3.x and 4.x (GCC 5.x is better; some of the criticism has been heard). This is feasible because the number of popular architectures has decreased so far that a small team can support the remaining ones (essentially x64, ARM, and ARM64, with some variations within each family).
As a GNU project, we see ourselves as customers of the other, very important GNU project, GCC, on which we, and almost everybody else in the GNU project, depend. If it's broken, that is too bad. And if the bug report process is broken due to the "UB = we can blow up your program" attitude, it's worse. If the maintainers of the competing compiler, Clang, have the same attitude, we can't even switch.
We are also part of a language standard effort (Forth, of course), so we know how such a committee thinks; this is no different just because the language is different. What we don't understand is why the C standard has a lot of UB and only a little IB (implementation-defined behavior). Take integer overflow: this is a very obvious candidate for IB, as the behavior of the underlying hardware defines integer overflow; usually as wrapv in two's complement, sometimes as trapv, but other options like one's complement are effectively non-existent. The result of such an operation is never "undefined", and since wrapv two's complement is the dominant implementation, it is pretty portable as well.
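For readers who want the wrapv and trapv behaviors today without relying on signed overflow: GCC has the `-fwrapv` and `-ftrapv` flags, and GCC 5+ and Clang provide `__builtin_add_overflow`, which stores the result wrapped to the destination type and returns whether it overflowed. A minimal sketch; the wrapper names `add_wrapv` and `add_overflows` are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Two's-complement wrapping addition (what -fwrapv asks for), written
   without signed overflow, which ISO C leaves undefined.
   __builtin_add_overflow (GCC 5+, Clang) computes a + b in infinite
   precision and stores it truncated to the type of *res. */
static int add_wrapv(int a, int b) {
    int res;
    (void)__builtin_add_overflow(a, b, &res);
    return res;
}

/* The trapv-style question: would a + b overflow an int? */
static bool add_overflows(int a, int b) {
    int res;
    return __builtin_add_overflow(a, b, &res);
}
```

With these, `add_wrapv(INT_MAX, 1)` is `INT_MIN` and `add_overflows(INT_MAX, 1)` is true, with no UB involved.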
Or what pointers look like (pointer comparison): you know what your pointers will look like when you implement a C compiler, so you can fully specify the behavior of pointer comparisons. It's UB because segmented architectures like the 8086, with their overlapping segments, simply don't give reliable and fast results. But if you don't program for MS-DOS, you don't need to care: usually pointers are just register-sized integers pointing into flat, non-segmented memory, and pointer comparison is just compiled to an integer comparison.
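In ISO C, `a < b` on pointers into unrelated objects is undefined, but converting to `uintptr_t` first turns the comparison into implementation-defined behavior, which is the IB route described above. A minimal sketch, assuming a flat-memory target where the conversion yields the plain address; `uintptr_t` itself is an optional type, and `addr_less` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address ordering via integer conversion. The mapping from pointer to
   uintptr_t is implementation-defined, not undefined; on flat,
   non-segmented targets it is the address itself, so this compiles to
   a plain integer compare. */
static bool addr_less(const void *a, const void *b) {
    return (uintptr_t)a < (uintptr_t)b;
}
```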
Bottom line: if you don't understand what your underlying machine does and how your code generator translates statements into instructions, and therefore think that x-1>x can never become true, you shouldn't be writing compilers.
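To make the `x-1>x` case concrete: written naively on `int`, the expression overflows for `x == INT_MIN`, so under ISO C's UB rules an optimizer may fold it to false. A minimal sketch of the same test with the decrement done in unsigned arithmetic, which wraps by definition; the conversion back to `int` is implementation-defined (GCC documents modulo-2^N reduction). `dec_exceeds` is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Naive form: `return x - 1 > x;` has undefined behavior for
   x == INT_MIN, so a compiler is allowed to fold it to `return false;`.
   Here the decrement wraps in unsigned arithmetic instead, and only the
   conversion back to int is implementation-defined. */
static bool dec_exceeds(int x) {
    int y = (int)((unsigned)x - 1u);
    return y > x;   /* true only for x == INT_MIN on two's-complement targets */
}
```

On GCC and Clang targets, `dec_exceeds(INT_MIN)` is true and every other input gives false, which is exactly what the hardware's wrapv behavior predicts.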
https://www.sqlite.org/testing.html