> The argument that computational complexity has something to do with this could have merit but the article certainly doesn’t give indication as to why.
OP says it is because predicting the next token can be correct or not, but the output always looks plausible, because plausibility is what the model calculates. Therefore it is dangerous and cannot be fixed, because that is how it works in essence.
Another anecdote. I've got a personal benchmark that I try out on these systems every time there's a new release. It is an academic math question which could be understood by an undergraduate, and which seems easy enough to solve if I were just to hammer it out over a few weeks. My prompt includes a big list of mistakes it is likely to fall into and which it should avoid. The models haven't ever made any useful progress on this question. They usually spin their wheels for a while and then output one of the errors I said to avoid.
My hit/miss rate with using these models for academic questions is low, but non-trivial. I've definitely learned new math because of using them, but it's really just an indulgence because they make stuff up so frequently.
I get generally good results from prompts asking for something I know definitely exists or is definitely possible, like an ffmpeg command I know I’ve used in the past but can’t remember. Recently I asked how to do something in ImageMagick which I hadn’t done before but which felt like the kind of thing ImageMagick should be able to do. It made up a feature that doesn’t exist.
Maybe I should have asked it to write a patch that implements that feature.
I find it incredibly useful for information retrieval from dense, archival-like text knowledge. I research cellular networks, and everything on Google/DDG is just fluffy SEO spam, but I find Gemini can reliably home in on the precise subsection out of tens of thousands of dense standards to tell me what 5G should do in a given scenario.
There is no difference between "hallucination" and "soberness", it's just a database you can't trust.
The response to your query might not be what you needed, similar to interacting with an RDBMS and mistyping a table name and getting data from another table or misremembering which tables exist and getting an error. We would not call such faults "hallucinations", and shouldn't when the database is a pile of eldritch vectors either. If we persist in doing so we'll teach other people to develop dangerous and absurd expectations.
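The contrast being drawn can be made concrete. A minimal sketch with Python's built-in sqlite3 module: a mistyped table name fails loudly and deterministically, rather than producing a plausible-looking answer.

```python
import sqlite3

# An in-memory database with one real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

try:
    # Mistyped table name: the RDBMS refuses, it does not improvise.
    conn.execute("SELECT * FROM usres")
except sqlite3.OperationalError as e:
    print(e)  # no such table: usres
```

The point of the analogy is exactly this failure mode: the database either returns what was stored or errors out, whereas a generative model fills the gap with something fluent.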
No it's absolutely not. One of these is a generative stochastic process that has no guarantee at all that it will produce correct data, and in fact you can make the OPPOSITE guarantee, you are guaranteed to sometimes get incorrect data. The other is a deterministic process of data access. I could perhaps only agree with you in the sense that such faults are not uniquely hallucinatory, all outputs from an LLM are.
I don't agree with these theoretical boundaries you provide. Any database can appear to lack in determinism, because data might get deleted, corrupted or mutated. Hardware and software involved might fail intermittently.
The illusion of determinism in RDBMS systems is just that, an illusion. The reason I used the failure examples I did is that most experienced developers are familiar with those situations and can relate to them, while the probability that the reader has experienced a truer apparent indeterminism is lower.
LLMs can provide an illusion of determinism as well; some are quite capable of repeating themselves, e.g. through overfitting, intentional or otherwise.
If the information it gives is wrong, but is grammatically correct, then the "AI" has fulfilled its purpose. So it isn't really "wrong output" because that is what the system was designed to do. The problem is when people use "AI" and expect it will produce truthful responses - it was never designed to do that.
But the point is that everyone uses the phrase "hallucinations" and language is just how people use it. In this forum at least, I expect everyone to understand that it is simply the result of next token generation and not an edge case failure mode.
I would have assumed that too, but given how many on HN throw around claims that LLMs can think, reason, and understand, I think it does bear clearly defining some of the terms used.
Sorry, I'm failing to see the danger of this choice of language? People who aren't really technical don't care about these nuances. It's not going to sway their opinion one way or another.
Yep. All these do is “hallucinate”. It’s hard to work those out of the system because that’s the entire thing it does. Sometimes the hallucinations just happen to be useful.
> It sometimes feels like the only thing saving LLMs are when they’re forced to tap into a better system like running a search engine query.
This is actually very profound. All free models are only reasonable if they scrape 100 web pages (according to their own output) before answering. Even then they usually have multiple errors in their output.
I like asking it about my great great grandparents (without mentioning they were my great great grandparents just saying their names, jobs, places of birth).
It hallucinates whole lives out of nothing but stereotypes.
Responding with "skill issue" in a discussion is itself a skill issue. Maybe invest in some conversational skills and learn to be constructive rather than parroting a useless meme.
First of all, there is no such thing as "prompt engineering". Engineering, by definition, is a matter of applying scientific principles to solve practical problems. There are no clear scientific principles here. Writing better prompts is more a matter of heuristics, intuition, and empiricism. And there's nothing wrong with that — it can generate a lot of business value — but don't presume to call it engineering.
Writing better prompts can reduce the frequency of hallucinations, but frequent hallucinations still occur even with the latest frontier LLMs, regardless of prompt quality.
So you are saying the acceptable customer experience for these systems is that we need to explicitly tell them to accept defeat when they can’t find any training content/web search results that match my query closely enough?
Why don't they have any concept of having a percentage of confidence in their answer?
It isn’t 2022 anymore, this is supposed to be a mature product.
Why am I even using this thing rather than using the game’s own mod database search tool? Or the wiki documentation?
What value is this system adding for me if I’m supposed to be a prompt engineer?
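For what it’s worth, the underlying models do assign a probability to every token they emit; most chat products just throw that signal away. A sketch of turning per-token log-probabilities into a rough confidence score (the logprob values below are made up for illustration, and token-level confidence is not the same thing as factual correctness):

```python
import math

def answer_confidence(token_logprobs):
    """Geometric-mean probability across tokens: a crude confidence score."""
    if not token_logprobs:
        return 0.0
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# Hypothetical logprobs for a confident vs. an unsure completion.
confident = [-0.05, -0.10, -0.02]
unsure = [-1.2, -2.5, -0.9]
print(answer_confidence(confident))  # ~0.94
print(answer_confidence(unsure))     # ~0.22
```

The catch, and part of why products don’t surface this, is that a model can be highly "confident" in this sense while being flatly wrong about the facts.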
To take a different perspective on the same event.
The model expected a feature to exist because it fitted with the overall structure of the interface.
This in itself can be a valuable form of feedback. I currently don't know of anyone doing it, but testing interfaces by getting LLMs to use them could be an excellent resource. If the AI runs into trouble, it might be worth checking your designs to see if you have any inconsistencies, redundancies, or other confusion-causing issues.
One would assume that a consistent user interface would be easier for both AI and humans. Fixing the issues would improve it for both.
That failure could be leveraged into an automated process that identified areas to improve.
You seem to be living in the past. While EHRs are still primarily used from desktop PCs, all of the major ones have native mobile apps now. Clinicians appreciate being able to review patient charts and action alerts while away from a PC cart.
And this would be a white-label Epic MyChart for the particular health system, with embedding for the inpatient or customer-facing connections that should be used.
It seems like that could be done with a system shipping their own white-labeled GNU Health app through the App Store
You're really missing the point. The EHR vendors aren't charging customers for those apps through the Apple or Google app stores so "broken economics" are irrelevant. The app stores are only a distribution mechanism and work fine for that.
I have cut my warm water costs by 80% with balcony solar panels. I have a warm water heat pump with 600 W electrical power. My little server turns it automatically on when the solar excess power is greater than 540 W (measured by the smart meter). This usually generates enough warm water for our household.
Also, the solar panels easily cover the idle power of the house of 50-100 W during daytime. This pays off in a few years, and it reduces my carbon footprint and that of my neighbors.
The hybrid (heat-pump plus heating elements) water heater I installed 2 years ago has already paid for itself in savings. This design literally pulls heat out of your conditioned space, providing both cooling and dehumidification (I live in a humid temperate rainforest, so win-win). Less than 3% of my annual electric usage goes to my water heater (versus the typical 10%+).
During the brief winter months I just set it to heating elements only, and it behaves like a traditional tank water heater (i.e. it doesn't cool the house in winter, using only resistive heating).
I assume balcony solar panels provide you with a power socket. How do you connect all the appliances in house to that socket(s)? Isn't it a lot of cabling?
Well, the person you are replying to is in a thread about Germany, mentions balcony solar and said "my little server turns it automatically on" (which is how you would construct that sentence in German instead of "turns it on automatically"), so my wild guess would be Germany. ;)
Germany has a pretty consistent climate. Doesn't really matter where you live. Of course, that's an oversimplification, but if you're new to Germany and wonder "oh, what's the weather going to be here?", the answer pretty much is "similar to the rest of the country".
You could then look at a map of France and think, ah, similarly sized country, probably also has a consistent climate, but that's not true. Southern France is very different from Northern France. But Germany's climate is pretty uniform.
Yes, there is a difference, you are right. I don't have hard numbers at the moment (typing from my phone), but from looking it up quickly, annual solar irradiation varies from about 950 kWh/m² to about 1,200 kWh/m² between north and south Germany. So what OP described will generally work in any part of Germany.
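A quick check of what that spread amounts to, using those two approximate figures:

```python
north, south = 950, 1200  # approx. annual solar irradiation in kWh/m²
gain = (south - north) / north
print(f"{gain:.0%}")  # 26%
```

So the sunniest parts of the south collect roughly a quarter more energy per square meter than the far north, a noticeable but not setup-breaking difference.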
There sadly isn't a single viable option for a Linux mobile phone out there.
- Purism runs ancient hardware, charges way too much and has questionable business ethics.
- Pine64 has equally bad hardware but reasonable prices. I don't like the Hong Kong connection though. Not sure what the security patching situation is like in practice.
The only option on the table as I see it is buying from the devil and installing GrapheneOS.
FuriLabs has shipped a usable device for going on two hardware releases now.
Yes, it currently builds on top of Halium. Anyone who thinks this should be a sticking point has their head in the sand; the device and the effort behind it are how you get a usable ecosystem rolling.
Maybe a sufficient number of hackers are offended enough now to contribute to really free platforms like postmarketOS or Mobian. There has been great work there in recent years. I think we are not very far away from a really usable free phone; we need device drivers and Android emulation / F-Droid until native apps catch up.
Based on my experiences with LLMs and the hype around it, we will need more experienced programmers.
Because they will have to clean up the huge mess that will come.
In my experience, if you look at the effect democratizing code actually has, this is exactly the case. People are generating code, but that’s always been the easy part. The mess this is going to make is gonna need a lot of mops.
If programmers had any self respect we would refuse to be slop janitors and instead just build competing tools and eat the lunches of AI "coders"
But time and time again programmers continue to demonstrate we have no self respect and we're happy to dance to the tune of capital for money. That means we're definitely going to be downgraded to "slop janitors" in the near future
This distinction never existed in LISP. Greenspun's tenth rule in action.