Start from the perspective of the user seeing effectively:
> error: expected the character ';' at this exact location
The user wonders, "if the parser is smart enough to tell me this, why do I need to add it at all?"
The answer to that question, "it's annoying to write the code to handle this correctly", is thoroughly lazy and boring. "My parser generator requires the grammar to be LR(1)" is even lazier. Human language doesn't fit into restrictive definitions of syntax; why should language for machines?
> Because code is still read more than it is written it just doesn't seem correct to introduce ambiguity like this.
That's why meaningful whitespace is better than semicolons. It forces you to write the ambiguous cases as readable code.
I used to hate semicolons. Then I started working in parser recovery for rustc. I now love semicolons.
Removing redundancy from syntax should be a non-goal, an anti-goal even. The more redundancy there is, the higher the likelihood of making a mistake while writing, but the higher the ability for humans and machines to understand the developer's intent unambiguously.
Having "flagposts" in the code lets people skim it ("I'm only looking at every pub fn") and gives the parser a fighting chance of recovering ("found a parse error inside a function def, so consume everything until the first unmatched }, which closes the fn body; mark the whole body as having failed parsing and let the rest of the compiler run"). Semicolons allow for that kind of recovery. And the same logic you would use for automatic semicolon insertion can be used to tell the user where they forgot a semicolon. That way you get the ergonomics of writing code in a slightly less principled way while still being able to read principled code after you're done.
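A toy illustration of both ideas, under heavy assumptions: the grammar (`let NAME = a + b ;`) and every name here (`Tok`, `lex`, `Parser`, `check`) are made up for the example and have nothing to do with rustc's actual parser. The ASI-style rule ("a line break sits where the `;` belongs") is used only to report the missing semicolon and keep going. Any other error triggers panic-mode recovery: skip to the next `;`, or stop at a `}` so an enclosing block could resume.

```python
import re
from dataclasses import dataclass


@dataclass
class Tok:
    text: str
    line: int


def lex(src):
    """Split source into identifier, number, and single-character tokens, keeping line numbers."""
    toks, lineno = [], 0
    for lineno, line in enumerate(src.splitlines(), start=1):
        for m in re.finditer(r"[A-Za-z_]\w*|\d+|\S", line):
            toks.append(Tok(m.group(), lineno))
    toks.append(Tok("<eof>", lineno + 1))
    return toks


class Parser:
    """Toy grammar: stmt := 'let' IDENT '=' atom (('+' | '-') atom)* ';'"""

    def __init__(self, toks):
        self.toks, self.pos, self.errors = toks, 0, []

    def peek(self):
        return self.toks[self.pos]

    def bump(self):
        tok = self.toks[self.pos]
        if tok.text != "<eof>":  # never advance past the end marker
            self.pos += 1
        return tok

    def atom(self):
        tok = self.bump()
        return tok if tok.text.isidentifier() or tok.text.isdigit() else None

    def expect_semi(self, prev):
        if self.peek().text == ";":
            self.bump()
            return True
        # ASI-style rule, used only to diagnose: a line break where ';' belongs.
        if self.peek().line > prev.line:
            self.errors.append(f"line {prev.line}: expected ';' after '{prev.text}'")
            return True  # carry on as if the ';' were there
        return False

    def statement(self):
        if self.bump().text != "let":
            return False
        if not self.bump().text.isidentifier() or self.bump().text != "=":
            return False
        last = self.atom()
        while last and self.peek().text in ("+", "-"):
            self.bump()
            last = self.atom()
        return last is not None and self.expect_semi(last)

    def recover(self):
        """Panic mode: skip past the next ';', or stop at a '}' an enclosing block can use."""
        while self.peek().text not in (";", "}", "<eof>"):
            self.bump()
        if self.peek().text == ";":
            self.bump()

    def statements(self):
        while self.peek().text not in ("}", "<eof>"):
            start = self.peek()
            if not self.statement():
                self.errors.append(f"line {start.line}: could not parse statement")
                self.recover()


def check(src):
    parser = Parser(lex(src))
    parser.statements()
    return parser.errors
```

For `check("let a = 1\nlet b = a + 2;\nlet = 3;\nlet c = b - 1;")` this reports the missing `;` on line 1 and the malformed statement on line 3, and it still parses lines 2 and 4. A missing `;` with no line break (`let x = 1 let y = 2;`) falls back to panic mode and swallows the rest of that statement, which is exactly the case where the redundant token would have helped.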
Why is ";" different from \n from the perspective of the parser when handling recovery within scopes? Similarly, what's different with "consume everything until the first unmatched }" except substituting a DEDENT token generated by the lexer?
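The DEDENT idea can be sketched as follows, assuming space-only indentation and treating each whole line as one opaque token; `layout_tokens` and `skip_block` are hypothetical names, loosely modelled on how Python's own tokenizer turns indentation changes into INDENT/DEDENT tokens. Once blocks are delimited this way, "skip to the matching DEDENT" is the same loop as "skip to the matching }".

```python
def layout_tokens(src):
    """Turn leading indentation into INDENT/DEDENT tokens, Python-style.

    Each non-blank line yields an optional INDENT or one or more DEDENTs,
    then ("LINE", text), then ("NEWLINE", None), which plays the role of ';'.
    Only spaces count as indentation in this sketch.
    """
    tokens = []
    stack = [0]  # indentation widths of the currently open blocks
    for lineno, raw in enumerate(src.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines do not affect layout
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("INDENT", None))
        while width < stack[-1]:
            stack.pop()
            tokens.append(("DEDENT", None))
        if width != stack[-1]:
            raise IndentationError(f"line {lineno}: dedent does not match any outer level")
        tokens.append(("LINE", raw.strip()))
        tokens.append(("NEWLINE", None))
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        tokens.append(("DEDENT", None))
    return tokens


def skip_block(tokens, i):
    """Given tokens[i] == INDENT, return the index just past its matching DEDENT,
    the same way a brace-based parser skips to the matching '}'."""
    depth = 0
    while i < len(tokens):
        kind = tokens[i][0]
        if kind == "INDENT":
            depth += 1
        elif kind == "DEDENT":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return i
```

One difference the sketch does surface: a mistyped indentation level is itself the error, so the "flagpost" can be damaged by the same typo that caused the parse failure, whereas a stray `;` or `}` is a separate, explicit token.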
Like no one characterizes it like that, but this is the same business where you can tell a story about hiring a bunch of college friends to pretend to be your employees so a client comes to your "office" and thinks you're a legitimate business. And instead of looking in horror at how casually you'll lie to get business it's seen as scrappy and whimsical.
Using LLMs for any kind of writing is unethical, with the narrow exception of translation. If you didn't take the time to compose your words thoughtfully then you aren't owed the time to read them.
The LLM presents a perverse incentive here - It is used for perceived efficiency gains, most of which would be consumed by the act of rewriting and redrafting. The alienness of the thoughts in the document is also not conducive to this; reading a long document about something you think you know but did not write is exhausting and mentally painful. This is why code review has such relatively poor results.
Quite frankly, while having an LLM draft and then rewriting it would be okay, I do not believe it is reasonable to expect that to ever happen. It will either be like high school paper plagiarism (just change around some of the sentences and rephrase it, bro), or it won't even get that much. Given what we know about human psychology, it is unreasonable to expect that "human rewrites of LLM drafts", at the level where the human contributes something, are maintainable and scalable; most people psychologically can't put in that effort.
>The LLM presents a perverse incentive here - It is used for perceived efficiency gains, most of which would be consumed by the act of rewriting and redrafting.
It might give efficiency gains for the writer, but the reader has to read the slop and try to guess at what it was intending to communicate and weed out "hallucinations". That's a big loss of efficiency for the reader.
I just can't seem to square up that the same people that complained left-and-right about "code smells" are the same ones that are shitting out slop code and are proud they shipped 50k lines of code in a week. It's going to be a maintenance nightmare for someone else. I'm not sure how anyone coming in is going to learn a codebase written by LLMs when it's 10x more code than is reasonably needed to solve the problem.
at this point I really think it's better to read broken English than to read some clanker slop. it immediately makes me want to just ignore whatever text I'm reading; it's just a waste of time
I do wonder: we had pretty good (by some measure of good) machine translations before LLMs. Even better, the artifacts of the old models were easily recognized as machine translation errors, and the mistranslations broke spectacularly; sometimes you could even see the source in the translation, and your brain could guess the intended meaning through the error.
With LLMs this is less clear. You don't get the old-school artifacts; instead you get hallucinations and very subtle errors that completely alter the meaning while leaving the sentence intact enough that your reader might not know it's a machine translation error.
and not just artifacts/hallucinations: the worst thing about it is that it's basically "perfect" English with perfect formatting, which makes it look like grey slop, since it all sounds the same and it's hard to distinguish between the slop articles/comments/PRs/whatever.
and it will also "clean up" the text to the point where important nuances and tangents get removed or transformed into some perfect literature that loses its meaning and/or significance
I don't think that's fine, I think that's an example of why using LLMs to write is unethical and creates no value.
The purpose of written language is to express your thoughts or ideas to others. If you're synthesizing text and then refining it you're not engaging in that practice.
I disagree with the downvotes, but let me put it differently: if you don't understand, haven't reviewed, and aren't ready to own all of the LLM output (the thoughtful part), then you aren't owed the time to read it. If you didn't try to rein in the verbose slop that's the default for LLMs, I don't want to read it.
Maybe the poster is running a local LLM... you'd think that a SOTA model would have surmised that an overnight macOS upgrade can only be a minor version.
As someone who has dealt with projects with AI-generated documentation... I can't really say I agree. Good documentation is terse, efficiently communicating the essential details. AI output is soooooooo damn verbose. What should've been a paragraph becomes a giant markdown file. I like reading human-written documentation, but AI-slop documentation is so tedious I just bounce right off.
Plus, when someone wrote the documentation, I can ask the author about details and they'll probably know since they had enough domain expertise and knowledge of the code to explain anything that might be missing. I can't trust you to know anything about the code you had an AI generate and then had an AI write documentation for.
Then there's the accuracy issue. Any documentation can always be inaccurate and it can obviously get outdated with time, but at least with human-authored documentation, I can be confident that the content at some point matched a person's best understanding of the topic. With AI, no understanding is involved; it's just probabilistically generated text, we've all hopefully seen LLMs generate plausible-sounding but completely wrong text enough to somewhat doubt their output.
Gah hopefully the meaning was clear from context, but I just realized I said "latter" when I meant "former". Inconsistent human documentation is better than miles upon miles of AI-slop documentation.
Given that people have access to LLMs themselves, publishing their output in lieu of good documentation (no matter how sparse) seems like it’s mostly downside.
This immediately invalidates a software or technical project for me. The value of documentation isn't the output alone, but the act of documenting it by a person or people that understand it.
I have done a lot of technical writing in my career, and documenting things is exactly where you run into the worst design problems before they go live.
Agreed, which is why I didn't bother reading this comment before downvoting it. If you think that you were owed some other behavior from me despite not paying me for it, feel free to elaborate; for example, you could acknowledge that there exists an implicit social contract when it comes to basic human communication.
Also elitist attitudes towards people for whom English isn’t a native language, elitist attitudes towards people with dyslexia and other conditions that make writing difficult, and elitist attitudes towards people with lower education levels.
One problem I see with the broader use of LLMs these days is the death of literacy.
For example, you chose to read my response and attack the vocabulary as if that was the point I was trying to make. This is a misunderstanding. I am purposefully reusing the word choice of the comment I'm replying to.
I was trying to very concisely point out that if an LLM is generating your writing it is not your words or your thoughts that you're trying to communicate.
> If you didn't take the time to compose your words thoughtfully then you aren't owed the time to read them.
Apply this argument to code, to art, to law, to medicine.
It fails spectacularly.
Blaming the tool for the failure of the person is how you get outrageous arguments that photography can't be art, that use of Photoshop makes it not art...
Do you blame the hammer or the nail gun when the house falls down, or is it the fault of the person who built it?
If you don't know what you're doing, it isn't the tool's fault.
Fundamentally it's still a memory limitation, just in terms of memory latency/cache misses instead of capacity. If you double the size of your numbers you're doubling the space it takes up and all the problems that come with it.
No, it isn't. The 64-bit capabilities of modern CPUs have almost nothing to do with memory. The physical address space is rarely 64 bits wide anyway; a "64-bit" computer can't actually address a full 2^64 bytes of memory.
If you double the size of numbers, sure, they take up twice the space. If the total size is still less than one page, it isn't likely to make a big difference anyway. What really makes a difference is trying to do 64-bit arithmetic on 32-bit hardware. That requires emulating the operation with a series of instructions, whereas a 64-bit CPU can execute it in one instruction, and that one instruction very likely executes in fewer cycles than the series of others. Otherwise no one would have bothered with it.
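To make the "series of instructions" concrete, here is a minimal sketch of what a 32-bit target has to do for even the simplest 64-bit operation, unsigned addition: add the low words, then add the high words plus the carry out of the low add (on x86 that is an ADD followed by an ADC). `add64_via_32` is a hypothetical helper, and Python's unbounded integers are masked to 32 bits to mimic the hardware registers.

```python
MASK32 = 0xFFFF_FFFF


def add64_via_32(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide operations.

    The low words are added first; any carry out of bit 31 is then added
    into the sum of the high words. The result wraps modulo 2**64.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo = a_lo + b_lo
    carry = lo >> 32  # 1 if the low add overflowed 32 bits, else 0
    lo &= MASK32
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo
```

Addition only doubles the instruction count; 64-bit multiplication on 32-bit hardware needs several partial products plus adds, and division is usually a library call, which is where the gap gets much larger.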
"Bitness" of a CPU almost always refers to memory addressing.
Now you could build a weird CPU that has "more memory" than it has addressable width (the 8086 is kind of like this with segmentation and 8/16 bit) but if your CPU is 64 bit you're likely not to use anything less than 64 bit math in general (though you can get some tricks with multiple adds of 32 bit numbers packed).
But a 32 bit CPU can do all sorts of things with larger numbers, it's just that moving them around may be more time-consuming. After all, that's basically what MMX and friends are.
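The "packed" trick can also be imitated in plain integer code (often called SWAR, SIMD within a register). This is a sketch assuming four 16-bit lanes in a 64-bit word, roughly the effect of MMX's PADDW wrapping add; `pack`, `unpack` and `packed_add` are hypothetical names. The top bit of every lane is cleared before the add so no carry can leak into the neighbouring lane, and the top bits are then folded back in with XOR.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
WORD_MASK = (1 << (LANE_BITS * LANES)) - 1
# Top bit of every lane: 0x8000_8000_8000_8000 for four 16-bit lanes.
HIGH = sum(1 << (LANE_BITS * i + LANE_BITS - 1) for i in range(LANES))


def pack(values):
    """Place LANES small integers side by side in one word, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (LANE_BITS * i)
    return word


def unpack(word):
    return [(word >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Lane-wise add modulo 2**16 using one ordinary integer add."""
    # Add the low 15 bits of each lane; the carry can reach that lane's top bit but no further.
    low = ((a & ~HIGH) + (b & ~HIGH)) & WORD_MASK
    # Each lane's top bit is a_top XOR b_top XOR carry-in; any carry out of the lane is dropped.
    return low ^ ((a ^ b) & HIGH)
```

For example, `unpack(packed_add(pack([1, 0xFFFF, 0x8000, 100]), pack([2, 1, 0x8000, 200])))` gives `[3, 0, 0, 300]`: lanes 1 and 2 wrap around without disturbing their neighbours.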
The original 8087 implemented 80-bit operands in its stack.
It would also process binary-coded decimal integers, as well as floating point.
"The two came up with a revolutionary design with 64 bits of mantissa and 16 bits of exponent for the longest-format real number, with a stack architecture CPU and eight 80-bit stack registers, with a computationally rich instruction set."
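For reference, the 80-bit extended format that quote describes is laid out as a 64-bit significand with an explicit integer bit, a 15-bit exponent biased by 16383, and a sign bit; the "16 bits of exponent" in the quote is the exponent plus the sign. A minimal decoder sketch, assuming the 10 bytes are little-endian as the x87 stores them in memory, handling only zero and normal numbers, and rounding the result to a Python float (a 64-bit double), so precision is lost; `decode_x87_extended` is a hypothetical name.

```python
import math


def decode_x87_extended(raw: bytes) -> float:
    """Decode a 10-byte little-endian x87 extended-precision value.

    Bits 0-63: significand, with the integer bit explicit at bit 63.
    Bits 64-78: exponent, biased by 16383.
    Bit 79: sign.
    Denormals, infinities and NaNs are not handled in this sketch.
    """
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    bits = int.from_bytes(raw, "little")
    significand = bits & ((1 << 64) - 1)
    exponent = (bits >> 64) & 0x7FFF
    sign = -1.0 if bits >> 79 else 1.0
    if exponent == 0x7FFF:
        raise ValueError("infinity/NaN not handled")
    if exponent == 0 and significand == 0:
        return sign * 0.0
    # value = (significand / 2**63) * 2**(exponent - 16383)
    return sign * math.ldexp(significand, exponent - 16383 - 63)
```

So 1.0 is stored as exponent 0x3FFF with only the integer bit set, and -3.0 as sign 1, exponent 0x4000, significand 0xC000000000000000 (1.5 × 2^1). Values outside the double range make `math.ldexp` raise `OverflowError` rather than returning the exact extended value.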
Typically, it doesn't have the ability to deal with a full 64 bits of memory, but it does have the ability to deal with more than 32 bits of memory, and all pointers are 64 bits long for alignment reasons.
It's possible but rare for systems to have 64-bit GPRs but a 32-bit address space. Examples I can think of include the Nintendo 64 (MIPS; apparently commercial games rarely actually used the 64-bit instructions, so the console's name was pretty much a misnomer), some Apple Watch models (standard 64-bit ARM but with a compiler ABI that made pointers 32 bits to save memory), and the ill-fated x32 ABI on Linux (same thing but on x86-64).
That said, even "32-bit" CPUs usually have some kind of support for 64-bit floats (except for tiny embedded CPUs).
The 360 and PS3 also ran like the N64. On PowerPC, 32-bit mode on a 64-bit processor just applies a 32-bit mask to effective addresses. All of the rest is still there, like the upper halves of the GPRs and instructions like ld (load doubleword).
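A rough model of that description, as a sketch only: the 64-bit sum of base register and displacement is formed as usual, and in 32-bit mode just the resulting address is truncated, while integer arithmetic on the GPRs keeps all 64 bits. `effective_address`, `add_gprs` and the `sixty_four_bit_mode` flag are hypothetical names (in the Power ISA the mode is selected by a machine-state bit), and the model deliberately ignores 32-bit-mode details such as how carry and condition-register results are computed.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def effective_address(base: int, disp: int, sixty_four_bit_mode: bool) -> int:
    """Base-plus-displacement address: computed in 64 bits, truncated only in 32-bit mode."""
    ea = (base + disp) & MASK64
    return ea if sixty_four_bit_mode else ea & MASK32


def add_gprs(a: int, b: int) -> int:
    """Integer add on 64-bit GPRs; in this model the address mode does not affect it."""
    return (a + b) & MASK64
```

With a base of 0x1_0000_0010 and a displacement of -0x10, the address is 0x1_0000_0000 in 64-bit mode but 0 in 32-bit mode, while `add_gprs(0xFFFF_FFFF, 1)` still produces 0x1_0000_0000 in a register.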
Wait, I hate employer-provided health insurance and think it's a terrible policy but what does that have to do with providers charging everyone --- including Medicare! --- way too much for services?
It’s a roundabout recognition of the agency problem in the medical industry.
If people chose and directly paid for their own medical bills and insurance, then extra fees and extra diagnostics would be borne directly by the person paying for them, who would have the freedom to make other choices, like picking insurance providers who were better at preventing them.
At least that’s an argument you can reasonably make. I’m not sure it would hold up in practice given how different medicine is from other markets.
The health insurance industry drives highly increased administrative costs - costs which the insurance companies are happy to foist off onto non insurance channels?
This is why I don't like TUIs at all, they're really bad at displaying complex information, handling complex interactions, and discovering how to compose those together.