forthy's comments

forthy · on Nov 15, 2018

I doubt the competition (e.g. IBM or Microsoft) has any better code quality. Even PostgreSQL is 1.3M lines of code, so let's get something deliberately written for simplicity. SQLite is just 130k SLoC, so another order of magnitude simpler.

And yet, even SQLite has an awful amount of test cases.

https://www.sqlite.org/testing.html

pgaddict · on Nov 15, 2018

I'm sure some of the difference (25M vs. 1.3M) can be attributed to code for Oracle features missing in PostgreSQL. But a significant part of it is due to a careful development process that mercilessly eliminates duplicate and unnecessary code as part of the regular PostgreSQL development cycle.

It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.

troels · on Nov 16, 2018

> It's a bit heartbreaking at first (you spend hours/days/weeks working on something, and then a fellow hacker comes and cuts off the unnecessary pieces), but in the long run I'm grateful we do that.

The single hardest thing about programming, I'd say.

ska · on Nov 28, 2018

In many (most?) ways the best edits of code are the ones where you can get rid of lines.

jeltz · on Nov 15, 2018

PostgreSQL has a lot of code but most parts of the code base have pretty high quality code. The main exceptions are some contrib modules, but those are mostly isolated from the main code base.

p0nce · on Nov 15, 2018

It's because software LOC scales linearly with the amount of man-months spent: a testament to the unique ability of our species to create beautiful, abstract designs that will stand the test of time.

Latteland · on Nov 15, 2018

This is an interesting comment, because I can't decide if you are sarcastic or making a deep insightful comment. Because I don't think the statement is true. LOC can go on forever, but it usually happens in things that aren't beautiful and abstract.

p0nce · on Nov 16, 2018

I was being sarcastic.

Latteland · on Nov 18, 2018

thanks for reply. you said it so earnestly that i couldn't tell!

Latteland · on Nov 15, 2018

Worked on SQL Server for 10+ years. MS SQL Server is way better than that. The Sybase SQL Server code we started with and then rewrote was as bad as Oracle.

karulont · on Nov 15, 2018

I guess that is just because SQL as a standard is neither coherent nor beautifully designed. SQL is a mashup of vendor-specific features all bashed together into one standard.

majewsky · on Nov 15, 2018

There's also a lot of essential complexity there. SQL provides, in essence, a generic interface for entering and analyzing data. Imagine the number of ways to structure and analyze data. Now square that number to get the number of tests for how two basic features of the language interact with each other. And that's not even near full test coverage.

9question1 · on Nov 15, 2018

Your point about essential complexity is absolutely correct, but your faux mathematical analysis is totally not a legit way to analyze the complexity of something or determine test coverage. I feel like as programmers we should be comfortable making sensible statements without making up shady pseudo-math to sound convincing.

majewsky · on Nov 15, 2018

It's abundantly clear that I'm not making a precise computation here. My argument is that tests don't scale linearly with the number of features because interactions between features need to be tested as well.
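
To put a rough number on that: with n features there are n(n-1)/2 unordered pairs to test, before any three-way interactions are even considered. A minimal sketch in C; the function name `pairwise_interactions` is hypothetical and only illustrates the growth rate, not any real test suite.

```c
/* Number of unordered feature pairs among n features: n * (n - 1) / 2.
   Hypothetical helper, for illustration only. */
static unsigned long long pairwise_interactions(unsigned long long n)
{
    return n * (n - 1) / 2;
}
```

For 10 features that gives 45 pairs, for 100 features 4950.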

forthy · on March 4, 2016

Anton is a compiler writer himself (part of the Gforth team, which I lead). We do submit bug reports. They get rejected as "invalid", on UB. It is pretty clear that the way we use GCC to implement Gforth is C* and not "C": we use extensions to GCC as source-level optimizations. Without them, Gforth would be a factor of 3-5 slower than it is now. We experience slowdowns of that order when we have to switch back to "C" instead of finding a workaround for that particular "optimization".

So sane behavior of the compiler is absolutely necessary to create fast programs. The tone reflects more than 10 years of frustration with "C" maintainers and their inability to deal with the needs of their customers. Our long-term goal is to throw out C completely and write our own code generator instead. It's going to give us another factor of 2-5 (depending on the problem; some microbenchmarks might gain a factor of 10) over the C*-based solution we have now. And, at the same time, we can forget about releasing a bug workaround patch for Gforth for every new GCC release, as was the case for GCC 3.x and 4.x (GCC 5.x is better, some of the criticism has been heard). This is feasible, as the number of popular architectures has decreased so far that a small team can support the remaining ones (which are essentially x64, ARM, and ARM64, with some variations within the family).

As a GNU project, we understand ourselves as customers of the other, very important GNU project, GCC, on which we, and almost everybody else in the GNU project, depend. If it's broken, it is too bad. And if the bug report process is broken due to the "UB=we can blow up your program" attitude, it's worse. If the maintainers of the competing compiler, Clang, have the same attitude, we can't even switch.

We are also part of a language standard effort (Forth, of course), so we know how such a committee thinks; this is not different just because the language is different. What we don't understand is why the C standard has a lot of UB, and only a little IB (implementation-defined behavior). For example integer overflow: This is a very obvious candidate for IB, as the behavior of the underlying hardware defines integer overflow; usually as wrapv in two's complement, sometimes as trapv, but the other options like one's complement and such are non-existent. The result of such an operation is never "undefined", and as wrapv two's complement is the dominating implementation, it is pretty portable, as well.
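
As an illustration of the wrapv behavior described above, here is a minimal sketch of a wrapping addition that sidesteps signed-overflow UB by doing the arithmetic on unsigned values, where wraparound is defined. The helper name `wrapping_add` is hypothetical. The final conversion back to `int` is implementation-defined in ISO C when the value is out of range; GCC documents it as reduction modulo 2^N, and that is the assumption made here.

```c
#include <limits.h>

/* Add two ints with two's-complement wraparound instead of undefined behavior.
   Hypothetical helper; assumes the compiler converts out-of-range unsigned
   values to int by reducing modulo 2^N (as GCC documents). */
static int wrapping_add(int a, int b)
{
    /* Unsigned addition is defined to wrap modulo 2^N. */
    unsigned int sum = (unsigned int)a + (unsigned int)b;

    /* Implementation-defined conversion, not undefined behavior. */
    return (int)sum;
}
```

Under that assumption, `wrapping_add(INT_MAX, 1)` yields `INT_MIN`, which is what the hardware add instruction produces on a two's-complement machine.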

Or what pointers look like (pointer comparison): You know what your pointers will look like when you implement a C compiler, so you can fully specify the behavior of pointer comparisons. It's IB, because segmented architectures like 8086 with their overlapping segments simply don't give reliable and fast results. But if you don't program for MS-DOS, you don't need to care: usually pointers are just register-sized integers pointing to a flat, non-segmented memory, and pointer comparison is just compiled to integer comparison.
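
A minimal sketch of the flat-memory case, assuming a target where `uintptr_t` exists and the pointer-to-integer conversion yields the numeric address. The helper name `address_less` is hypothetical. Note that ISO C leaves `<` on pointers into different objects undefined, while the conversion to an integer is implementation-defined, so the sketch compares the integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compare two object pointers by address.
   Hypothetical helper; assumes a flat, non-segmented address space where
   converting a pointer to uintptr_t gives its numeric address. */
static bool address_less(const void *a, const void *b)
{
    /* Integer comparison of the converted addresses. */
    return (uintptr_t)a < (uintptr_t)b;
}
```

On such a target this typically compiles down to a single integer compare.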

Bottom line: If you don't understand what your underlying machine does, and how your code generator translates statements to instructions, and therefore you think that x-1>x can never become true, you shouldn't be writing compilers.
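
A minimal sketch of the case where x-1>x does become true, using the GCC/Clang extension `__builtin_sub_overflow` so that the subtraction wraps with fully defined behavior. The helper name `decrement_is_greater` is hypothetical. Written directly as `x - 1 > x` on a signed `int`, an optimizer that assumes overflow cannot happen may fold the comparison to false.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true when x - 1, computed with two's-complement wraparound,
   is greater than x. Hypothetical helper; relies on a GCC/Clang builtin. */
static bool decrement_is_greater(int x)
{
    int wrapped;

    /* Stores the wrapped result of x - 1; the overflow flag it returns
       is not needed here. */
    (void)__builtin_sub_overflow(x, 1, &wrapped);

    return wrapped > x;
}
```

The function returns true only for `INT_MIN`, where the decrement wraps around to `INT_MAX`.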