
But prototype-less functions in C don't actually work all that well. In particular, the inferred return type is going to be an int, so if you're trying to return a 64-bit pointer via a prototype-less function, you're going to have a very bad time.
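A minimal sketch of the failure mode on an LP64 system (a hypothetical two-file program, under the pre-C99 implicit-declaration rules):

    /* lib.c */
    char *make_name(void) { return "hello"; }

    /* main.c -- no prototype in scope, so the compiler implicitly
       declares "int make_name()"; the returned 64-bit pointer gets
       truncated to a 32-bit int on the way through */
    int main(void) {
        char *p = (char *)make_name();
        return p[0];
    }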


Historically they worked well enough (on 16-bit and 32-bit machines) because sizeof(pointer) == sizeof(int) == sizeof(general register) on the architectures where C flourished in the pre-ANSI C era.

But with the migration to 64-bit machines, typically int stayed put at 32 bits.

I guess nobody wanted to introduce a new integral type between short and int; they had enough trouble dealing with code which assumed sizeof(long) == 4. I recall stumbling across a comment where the word "beint32_t" appeared where "belong" would have made sense in context.


I love naïve search-and-replace errors. In the November 1996 version of the Defense Incident-Based Reporting System code definitions in DoD Manual 7730.47, the code 092-C2 refers to "shallful" dereliction of duty.


Also known as a clbuttic mistake.


Why not just make short 32 bits? Yeah, you lose the type for 16-bit-wide integers, but x64 doesn't natively support those all that well anyway, unlike the 32- and 64-bit-wide integers. And that is what the C integer types are about, right, being efficiently represented by the underlying hardware, not their exact bitwidth? Right?


> And that is what the C integer types are about, right, about being efficiently represented by the underlying hardware, not their exact bitwidth?

If you have one `short` argument to your function maybe. But if you have a `short[]` array, you probably do care about the memory layout of that array. You might need it to be compatible with some particular data format that you're trying to read/write. Same with a field of a struct, if that struct is used for parsing. A lot of C code does parsing like this.
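For instance (a hypothetical on-disk record, field names invented), this reads the format correctly only if short is exactly 16 bits and the struct picks up no padding:

    #include <stdio.h>

    struct sample_hdr {
        short format;        /* bytes 0-1 of the file */
        short channels;      /* bytes 2-3 */
        int   sample_rate;   /* bytes 4-7 */
    };

    int read_hdr(FILE *fp, struct sample_hdr *out) {
        /* reads the raw bytes straight into the struct */
        return fread(out, sizeof *out, 1, fp) == 1;
    }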


No. It’s for tightly packing data in data structures. Bitwidth is exactly what’s important here.


Well, that's a shame, because the bitwidth of the standard integer types is quite uncertain. CHAR_BIT can be (and is, on some platforms) 16 or 32; long was never guaranteed to be 64 bits (it quite often was 32 bits on platforms with 16-bit ints); etc. Not to mention that if that's what the standard integer types were for, they'd probably have names like int8/uint8/int16/int32/etc.

It's almost as if they were not, in fact, intended for precise control of bitwidths in a portable manner...


It doesn’t matter what someone in the past thought they were for; that’s what they are for in practice. The names are irrelevant here (yes, they are quite bad). But the ones in stdint.h are just typedefs for those, so that’s what we are left with.


You can always use "unsigned char[N]" for that, you know, which is more reliable. You can even union it with an integer type for more convenient access, although please use static_assert to verify that the overlap is exact.
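Something along these lines (a minimal C11 sketch; static_assert comes from assert.h):

    #include <assert.h>
    #include <stdint.h>

    union word {
        unsigned char bytes[4];   /* well-defined size and layout */
        uint32_t      value;      /* convenient integer access */
    };
    static_assert(sizeof(union word) == sizeof(uint32_t),
                  "overlap must be exact");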

All in all, "it's very easy and straightforward to control the size and padding of a struct's fields in a portable manner" is yet another of C's imaginary advantages: it's not that straightforward or simple. The padding especially has always been a thorny issue.


The efficient hardware types are handled by int_fast*_t. The legacy types can't be redefined outside their established ranges because that would break things that depend on them fitting into a known amount of memory.
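A quick way to see the split (the sizes printed are just what glibc on x86-64 happens to pick; other platforms differ):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* prints "4 8" on x86-64 glibc: the exact-width type keeps its
           layout, the "fast" type is free to widen to the efficient size */
        printf("%zu %zu\n", sizeof(int32_t), sizeof(int_fast32_t));
        return 0;
    }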


It would still break on struct and floating point type returns though.


Well, there is an argument to be had that, for C, a plain unadorned "int" should probably be the native word size for that architecture. On 64-bit systems, "int" would therefore also be 64 bits.


I think there are two reasons int has not gone from 32 to 64 bits on 64-bit systems.

Part of it is backward compatibility: code written to assume 32-bit int could break. (Arguably such code is badly written, but breaking it would still be inconvenient.)

Another part is that C has a limited number of predefined integer types: char, short, int, long, long long (plus unsigned variants and signed char). If char is 8 bits and int is 64 bits, you can't have both a 16-bit and a 32-bit integer type. Extended integer types (introduced in C99) could address this, but I don't know of any compilers that provide them.


If you can have "long long", why not "short short"?

In that alternate universe, char could be 8 bits, short short 16, short 32, and int 64.


And “long short” and “short long” types. :)


For 24 bits?


Since the extended integer types are just aliases of the other types, they wouldn't solve the problem. In C++ these aliases also create a problem with overload sets when you mix the two worlds and try to write portable code: long, for example, is 32-bit on some platforms and 64-bit on others, and platforms pick inconsistent fundamental types for the 32- and 64-bit aliases. All in all, if you want portable code, you use neither those extended integer types nor long. You assume char, short, int, long long are 8, 16, 32, 64 bits respectively, which holds on all relevant platforms.


Extended integer types are decidedly not just aliases of other types: the C standard has separate "standard integer types" (the regular char/short/int/long/long long) and "extended integer types" (any additional implementation-specific types). The stdint.h-defined types can fall in either category (on regular clang/gcc they're all standard integer types, not extended ones). So you could have a system where char/short/int/long/long long are 8/16/64/64/64-bit respectively and still provide an int32_t that's none of those; it's just that no one does.


What really sucks about this in C++ is that it prevents you from knowing whether you can overload based on those types.


> Extended integer types (introduced in C99) could address this, but I don't know of any compilers that provide them.

What environment are you working in? Because I don't know a single half-recent compiler that does not provide stdint (uint8_t, ..., int64_t), but I mostly work with GCC/LLVM toolchains.


Some embedded compilers will provide stdint. And if the compiler doesn't, I've found that one of the first headers written for a project ends up being an equivalent to stdint.
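E.g. a fallback along these lines, with every right-hand type checked against the target ABI (a sketch; shown for a target where int is 16 bits):

    /* my_stdint.h -- hypothetical stand-in for a missing <stdint.h> */
    typedef signed char      int8_t;
    typedef unsigned char    uint8_t;
    typedef short            int16_t;
    typedef unsigned short   uint16_t;
    typedef long             int32_t;   /* "int" instead on 32-bit targets */
    typedef unsigned long    uint32_t;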

It's pretty common to develop part of an embedded C program under Linux or similar host environment. Better debuggers, better profiling tools, etc. And uint8_t and friends are particularly important when you're working cross-platform.


Extended integer types aren't necessarily related to stdint.h - in the vast majority if not every one of those "half-decent compilers" the stdint.h types are just typedefs over plain old char/short/int/long/long long, which are not extended integer types. Extended integer types is a mechanism to allow having types other than those.


> in the vast majority if not every one of those "half-decent compilers" the stdint.h types are just typedefs over plain old char/short/int/long/long long, which are not extended integer types.

Sure, but isn't that just an implementation detail? Because I really don't care if my int64_t is internally typedef'd to "long long int" or "__m64", as long as there is a standardized interface to ask for it.


Point being, _kst_'s comment of "I don't know of any compilers that provide them" is correct: there are few if any compilers that actually have extended integer types. Introducing them might be non-trivial, and plenty of code may exist under the assumption that they don't exist and thus could break (integer promotion rules, _Generic, varargs). intmax_t also deserves a mention, since it must be at least as wide as any supported standard or extended integer type (which is also why clang/gcc's __int128 doesn't qualify as an integer type as per the standard).
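For a concrete example of that breakage, C code routinely assumes the stdint.h names land on standard types (a sketch using _Generic):

    #include <stdint.h>

    #define width_class(x) _Generic((x), \
        int:       "int",                \
        long:      "long",               \
        long long: "long long")

    /* width_class((int64_t)0) compiles only because int64_t aliases one
       of the standard types; were it a true extended integer type, the
       selection would find no match and fail to compile */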


Yeah, except that ints in your data structures will unnecessarily consume far, far too much memory.


If 32-bit ints didn't make your structures consume far, far too much memory in the early '90s, when consumer-grade computers came with 4-8MiB of RAM and 256MiB disks, then I don't see how 64-bit ints could have done so in the mid-'00s when they came with 1GiB of RAM and 256GiB disks.


They still come with 64k (or so) of L1 cache.


I am constantly amazed at how much memory programs uselessly consume.


OTOH, the explicitly sized integers (int32_t, int16_t, etc.) were already added in C99 (i.e., 25 years ago), and should be used when a specific memory layout of structs is desired.


If you're going to use those, then there's no reason to have ints be 64 bit.

Personally, I find using int32_t in general to be an uglification of the code. I never use `long` in C code anymore, as it's sometimes 32 bits and sometimes 64 bits. I use `int` and `long long`.

Do I care about 16 bit code anymore? No. Very few programs would port between 16/32 these days anyway, no matter what the Standard says or how hard you try to write code portably.


Always thought it was odd how "long long" is one of the only common types that's two words with an implicit int at the end.

Almost makes me want to add "typedef long long longer;" to some code that I don't intend anyone to maintain.


Is it the numbers you do not like? If you're willing to write “long long”, it can't be the length.

I don’t love long long. As an amateur compiler writer, it hurts me. “long long” makes “long” both a modifier (like unsigned is) and a type. Yuck.

I wish it was i8, i16, i32, and i64 (with u versions of each), and f32 and f64 for floats. Those are easy to understand and fairly easy on the eyes.

If those numbers are too noisy, the CIL (.NET) types could work. For example, i4 and i8 instead of i32 and i64. I do not love the look of i1 either, though. I guess you could special-case sbyte and byte as aliases.


> Is it the numbers you do not like?

Correct.

> fairly easy on the eyes

Not for me. It's a personal thing, I just don't like it. When I removed them all from my code, it was like I'd scraped the barnacles off my boat.


"long" is always a modifier, just potentially applied twice, and potentially to nothing. A more written out version of "long long" is "long long int", and similarly "long" is really "long int".


Google’s C++ style guide similarly recommends using ‘int’ in general and never using ‘int32_t’, though it recommends using ‘int64_t’ for bigger numbers instead of ‘long’:

https://google.github.io/styleguide/cppguide.html#Integer_Ty...


I essentially do not use int short, long, long long at all. Frankly I think those were a terrible mistake and people should avoid using them.


Anybody who uses "int short" should be keel-hauled. That nonsense did not make its way into D!


The lack of memory safe casts drives me a bit batty.

You would think casting something as signed or unsigned wouldn't promote to an int / unsigned int. Ditto for const and unconst.


When the first C Standard (C89) was being created, about half the compilers implemented "sign preserving" semantics, which is what you're advocating, and the other half implemented "value preserving" semantics.

A great battle ensued, and many champions were slain.

The value preserving folks carried the field, and the sign preserving folks changed their compilers.
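The classic place the two camps diverge (assuming the usual 16-bit short and 32-bit int):

    #include <stdio.h>

    int main(void) {
        unsigned short us = 1;
        /* value preserving (what C89 picked): us promotes to a signed int,
           since int can hold every unsigned short value, so the comparison
           below is signed and true */
        /* sign preserving (the losing camp): us would have promoted to
           unsigned int, -1 would convert to UINT_MAX, and it'd be false */
        if (us > -1)
            puts("value-preserving promotion");
        return 0;
    }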


I was always a little surprised C never had an integer type the size of a native word. Probably the closest thing would be intptr_t since pointers presumably use a single word to represent addressable memory.


AFAIK until the switch to 64-bit architectures, int actually was the natural word size. Keeping int at 32 bits was probably done to simplify porting code to 64 bits, since all structs with ints in them would otherwise change their memory layout (but that's what the fixed-width integer types are for anyway, e.g. int32_t).

In hindsight it would probably have been better to bite the bullet and make int 64 bits wide.


> AFAIK until the switch to 64-bit architectures, int actually was the natural word size

32-bit int is still arguably the native word size on x64. 32-bit is the fastest integer type there. 64-bit at the very least often consumes an extra prefix byte in the instruction. And that prefix is literally called an "extension" prefix... very much the opposite of native!
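E.g. the same addition at both widths, with typical encodings in the comments (Intel syntax; exact registers depend on the ABI):

    #include <stdint.h>

    uint32_t add32(uint32_t a, uint32_t b) { return a + b; }
    /* add eax, esi    -> 01 F0       (2 bytes) */

    uint64_t add64(uint64_t a, uint64_t b) { return a + b; }
    /* add rax, rsi    -> 48 01 F0    (3 bytes; 48 is the REX.W prefix) */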


Aren't 32-bit registers/operations also called "extensions" of their 16-bit counterparts on the x86 line, due to the ISA's 16-bit 8086/80286 lineage?

So could one make the argument that a 16-bit int ought to be the native word size on x64?


No. This isn't about dictionary pedantry. 16-bit is actually frequently more expensive than 32-bit on x86.


32-bit also often has the prefix byte (if one of the operands is r8-r15 or, for extending moves from 8-bit registers, r4-r15)


int originally being the native word size is the reason for the weird integer promotion rules.
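E.g. operands narrower than int are widened before any arithmetic happens:

    unsigned char a = 200, b = 100;
    int sum = a + b;   /* both promote to int first, so sum == 300 rather
                          than wrapping to (200 + 100) % 256 == 44 */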


IMHO it's only weird because the promotion is to 32-bit integers on 64-bit platforms. If all math would happen on the natural word size (e.g. 64-bits), and specific integer widths would only matter for memory loads/stores, it would be fine.


int is the smallest type you can do ALU ops on, so as long as x64 can still do 32-bit arithmetic, it's "natural" for it to remain 32 bits.


> pointers presumably use a single word to represent addressable memory.

Only in flat address spaces, which excludes platforms like old 16-bit x86 or modern CHERI. There, a pointer difference within a single object need not be the same size as the pointer itself.


It did on the PDP-11


size_t and ptrdiff_t work for me.


I would argue that the native word size is still 32 bits on x86-64, though. With many instructions, using 64-bit registers needs a REX prefix. Some RISC architectures do not even have 32-bit zero-extending integer instructions, so for them, 64-bit as the native word size makes sense. On the other hand, I'm not sure if <stdint.h> and uint16_t had been invented at the time, and “short short int” is not valid syntax (even today), so there wasn't an obvious way to denote a 16-bit integer type.


If the programmer didn’t specify the size of an int, it should mean “dealer’s choice.” Let the compiler pick a default, better yet make it a compiler option.

Fortran is, as always, ahead of the game.


It is a compiler setting.

`-Dint=__INT64_TYPE__`


“as always”?


I think you accidentally put a question mark where you meant to put an exclamation point.


Yes, they don't; it would be possible for a compiler to gather info about such call sites into a list and then, when it's finished compiling a compilation unit, to check this list against the now-complete symbol table and fail if some called functions have mismatched definitions or are still undefined. That apparently was too much work back when C was designed, so we have the "hope for the best" design instead.
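A sketch of that bookkeeping (all names are hypothetical compiler internals):

    #include <stddef.h>
    #include <stdio.h>

    struct type;                      /* opaque here */
    struct symbol { struct type *type; };

    struct pending_call {
        const char  *name;
        struct type *assumed;         /* "returns int" per the old rules */
        int          line;
    };

    extern struct symbol *symtab_find(const char *name);
    extern int types_compatible(const struct type *, const struct type *);

    /* run once at the end of the translation unit, against the
       now-complete symbol table -- no re-reading of the source needed */
    void check_pending(const struct pending_call *calls, size_t n) {
        for (size_t i = 0; i < n; i++) {
            const struct symbol *s = symtab_find(calls[i].name);
            if (!s || !types_compatible(s->type, calls[i].assumed))
                fprintf(stderr, "line %d: call of '%s' has no matching "
                        "definition\n", calls[i].line, calls[i].name);
        }
    }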


This doesn't work in C because the function is not necessarily defined in the same translation unit.

You'd need to do this at link time instead, which would require completely overhauling the format of object files, dynamic libraries, static libraries so they carry information about the types of functions instead of just the symbol names. It's not an easy fix.


> This doesn't work in C because the function is not necessarily defined in the same translation unit.

It does work in C: that's what the include files are for (among other things), after all. So it would be possible to require forward declarations only for functions outside the translation unit (extern-declare?); those inside the translation unit wouldn't need them even if the compiler works in a single pass. And those external declarations could still be introduced at the very end of the translation unit and still count. I dunno, seems like a pretty reasonable idea.


That's what DWARF is for, and similar formats on other platforms. Also Pascal-lineage languages commonly use this kind of embedded type information to provide module interfaces, it works quite well.


There is no information about the function's prototype in the object file, so the linker can't determine that. The compiler can't either since it's just compiling the compilation unit and has no knowledge of the outside world except what we tell it by prototypes (injected via header files or otherwise), but we aren't doing that here.


> There is no information about the function's prototype in the object file, so the linker can't determine that.

The fact that that information isn’t there is an implementation choice.

For example, they could have hacked it in like C++ did by mangling names (https://en.wikipedia.org/wiki/Name_mangling). That probably would have required supporting longer identifiers (IIRC archives limited them to 14 characters), but that’s doable.
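E.g. under the Itanium C++ ABI, the parameter types ride along in the symbol name itself:

    int f(int, long);
    /* with C linkage, the object file records just "f";
       compiled as C++ it records "_Z1fil" -- 'f' taking (int, long) --
       so a caller with the wrong prototype fails to link instead of
       silently misbehaving */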


I think that would by definition be a two-pass compiler.


No, you emit all the code in one go; after you've done that, you have some residual pieces of data left on the side: the symbol table, and the list of all the calls of (hopefully) forward-declared functions. At that point you can check that list against the symbol table; no additional codegen or re-reading of the source text is needed.

Granted, you can call that a second pass although that's not that different from emitting a function's epilogue IMHO.


There is no need for the second phase. You can record the implicit declaration when you encounter it, and if there is a subsequent declaration, you can check that it's consistent or error out immediately.

This is what C compilers already do, in fact, to produce warnings when an implicit declaration doesn't match a later explicit declaration. But this is a best-effort warning only; it doesn't work if there is no declaration because the function is defined in a different translation unit, as I pointed out above.


Usually, the second pass in a compiler does not re-parse source files. Rather, it operates on another data structure, like an AST, intermediate representation, or the list mentioned in the original comment. At least, that’s my understanding of multi-pass compilation.


Well, this "list mentioned in the original comment" is not an AST or an immediate representation of the program in any reasonable sense just as a symbol table is not. Otherwise, setting the exact values of the text/data/bss size fields in the executable file header's at the end of the compilation would count as the second pass as well which IMO it should not.


The difference is that you need to make a complete second iteration (or pass) over the entire list to correctly check all of the callsites after all function type information is collected. The same is not true for symbol table usage in a single-pass C compiler.



