Little tip for the younger folks reading this: if you are given two contradictory explanations for something, the correct explanation is probably the third one.
This reminds me of the contemporary use of e-bikes in Ukraine, which I understand is one of the many reasons that this war was not the curbstomp Russia seems to have expected.
I won't define reasoning, just call out one aspect.
We have the ability to follow a chain of reasoning, say "that didn't work out", backtrack, and consider another. ChatGPT seems to get tangled up when its first (very good) attempt goes south.
This is definitely a barrier that can be crossed by computers. AlphaZero is better than we are at it. But it is a thing we do which we clearly don't simply do with the probabilistic regurgitation method that ChatGPT uses.
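For what it's worth, the backtracking itself is easy to state as code. A minimal toy sketch (plain depth-first search, nothing LLM-specific; the +3/*2 puzzle is made up): follow one chain, and when it dead-ends, undo and try the next.

    # Minimal backtracking: follow one chain of steps, and when it
    # dead-ends, fall back and consider the next. Toy goal: reach a
    # target number using only +3 and *2.
    def solve(x, target, depth, path):
        if x == target:
            return path
        if depth == 0 or x > target:
            return None                   # this chain didn't work out
        for name, step in [("+3", lambda v: v + 3), ("*2", lambda v: v * 2)]:
            found = solve(step(x), target, depth - 1, path + [name])
            if found is not None:
                return found              # keep the chain that worked
        return None                       # backtrack to the caller

    print(solve(1, 11, 5, []))            # ['+3', '*2', '+3']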
That said, the human brain combines a bunch of different areas that seem to work in different ways. Our ability to engage in this kind of reasoning, for example, is known to mostly happen in the left frontal cortex. So it seems likely that AGI will also need to combine different modules that work in different ways.
On that note, when you add tools to ChatGPT, it suddenly can do a lot more than it did before. If those tools include the right feedback loops, the ability to store/restore context, and so on, what could it then do? This isn't just a question of putting the right capabilities in a box. They have to work together for a goal. But I'm sure that we haven't achieved the limit of what can be achieved.
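To make "tools plus the right feedback loops" concrete, here is a rough sketch of the kind of loop I mean; call_llm and run_tool are hypothetical stand-ins, not any particular API:

    # Hypothetical agent loop: the model proposes an action, a tool
    # executes it, and the observation is fed back in as new context.
    # call_llm and run_tool are placeholders, not a real API.
    def agent_loop(goal, call_llm, run_tool, max_steps=10):
        context = [f"Goal: {goal}"]               # store/restore context
        for _ in range(max_steps):
            action = call_llm("\n".join(context)) # model picks the next step
            if action.startswith("DONE:"):
                return action[len("DONE:"):].strip()
            observation = run_tool(action)        # the feedback loop
            context.append(f"Action: {action}")
            context.append(f"Observation: {observation}")
        return None                               # gave up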
These are things we can teach children to do when they don't do them at first. I don't see why we can't teach this behavior to AI. Maybe we should teach LLMs to play games, or do those proof thingies they teach in US high school geometry, or something like that, so they learn some formal structure within which they can think about the world.
It feels like humans do do a similar regurgitation as part of a reasoning process, but if you play around with LLMs and ask them mathematical questions beyond the absolute basics, it doesn’t take long before they trip up and reveal a total lack of ‘understanding’ as we would usually understand it.

I think we’re easily fooled by the fact that these models have mastered the art of talking like an expert. Within any domain you choose, they’ve mastered the form. But it only takes a small amount of real expertise (or even basic knowledge) to immediately spot that it’s all gobbledygook, and I strongly suspect that when it isn’t, it’s just down to luck (and the fact that almost any question you can ask has been asked before and is in the training data).

Given the amount of data being swallowed, it’s hard to believe that the probabilistic regurgitation you describe is ever going to lead to anything like ‘reasoning’ purely through scaling. You’re right that asking what reasoning is may be a philosophical question, but you don’t need to go very far to empirically verify that these models absolutely do not have it.
On the other hand, it seems rather intuitive that we have a logic-based component? It's the underpinning of science. We have to be taught to recognize when we've stumbled upon something that needs to be tested. But we can be taught that. And once we learn to recognize it, we intuitively do so in action. ChatGPT can do this in a rudimentary way as well. It says a program should work a certain way. Then it writes it. Then it runs it. Then, when the answer doesn't come out as expected (at this point, probably just error cases), it goes back and changes it.
It seems similar to what we do, if on a more basic level. At any rate, it seems like a fairly straightforward 1-2 punch that, even if not truly intelligent, would let it break through its current barriers.
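That 1-2 punch is essentially a generate-run-check loop. A toy version, where generate_code is a hypothetical stand-in for asking the model to (re)write the program:

    import subprocess

    # Toy write-run-check loop. generate_code is a hypothetical
    # stand-in for asking the model for a (revised) program.
    def write_until_it_works(spec, expected, generate_code, max_tries=5):
        feedback = ""
        for _ in range(max_tries):
            program = generate_code(spec, feedback)           # it writes it
            proc = subprocess.run(["python", "-c", program],  # it runs it
                                  capture_output=True, text=True, timeout=30)
            if proc.returncode == 0 and proc.stdout.strip() == expected:
                return program                                # as expected
            feedback = proc.stderr or proc.stdout             # change and retry
        return None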
LLMs can be trained on all the math books in the world, starting from the easiest to the most advanced, they can regurgitate them almost perfectly, yet they won't apply the concepts in those books to their actions. I'd count the ability to learn new concepts and methods, then being able to use them as "reasoning".
Aren't there quite a few examples of LLMs giving out-of-distribution answers to stated problems? I think there are two issues with LLMs and reasoning:
1. They are single-pass and static - you "fake" short-term memory by re-feeding the question along with its answer (see the sketch below)
2. They have no real goal to achieve - one that it would split into sub-goals, plan to achieve them, estimate the returns of each, etc.
As for 2, I think this is the main point of e.g. LeCun: LLMs in themselves are simply single-modality world models, and they lack the other components that would make them true agents capable of reasoning.
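On point 1, the "faking" is literal: the model itself is stateless, so the client just re-sends the whole transcript every turn. A minimal sketch in the OpenAI-style chat-message format (the message shape is standard; complete is a stand-in for the actual API call):

    # The model is single-pass and static: "memory" is just the client
    # re-feeding the whole transcript on every turn. complete() is a
    # stand-in for whatever call produces the next message.
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(question, complete):
        history.append({"role": "user", "content": question})
        answer = complete(history)   # the full history is re-fed, every time
        history.append({"role": "assistant", "content": answer})
        return answer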
Just yesterday I saw an example of a person asking GPT what "fluftable" means. The word was invented by their little daughter and they didn't know what it meant. GPT reasoned it was a portmanteau of "fluffy" and "comfortable", and it made sense because it was used in reference to a pillow. If it's just regurgitation, I'd like to know how it's able to understand novel words not found in the training data...
For words that are not in the model's vocabulary, like 'fluftable', the model uses a subword tokenization strategy. It breaks down the word into smaller known subunits (subwords or characters) and represents each subunit with its own vector. By understanding the context in which 'fluftable' appears and comparing it to known words with similar subunits, the model can infer a plausible meaning for the word. This is done by analyzing the vector space in which these representations exist, observing how the vectors align or differ from those of known words.
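You can watch the splitting happen with tiktoken, OpenAI's open-source tokenizer. The exact pieces in the comment are my guess (they depend on the encoding), but the mechanism is as described:

    import tiktoken  # OpenAI's BPE tokenizer library

    # "fluftable" isn't in the vocabulary, so BPE splits it into known
    # subword pieces, each with its own learned vector.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("fluftable")
    print([enc.decode([t]) for t in tokens])
    # something like ['fl', 'uft', 'able'] -- context does the rest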
'As always, the most important principle for understanding LLMs is that you should resist the temptation of anthropomorphizing them.'
I'm sorry, but that's absurd. Being able to explain the precise mechanism behind reasoning would make anything sound like it's not reasoning, because of our prior experiences. If we understood human reasoning well enough to explain exactly what happens in our brain, you would conclude that we're not really reasoning because you can provide an explanation of how we're reasoning about novel, out of distribution data. This is "God of the gaps" for thought.
What you've written does nothing to convince any reasonable person that LLMs cannot reason; if anything, you've explained how LLMs reason, not shown that they can't.
Because you’re not understanding what it’s regurgitating. It’s not a fact machine that regurgitates knowledge; in fact it’s not really so good at that. It regurgitates plausible patterns of language, and combining words like this is hardly a rare pattern.
With only the information we had in 2020, the two theories “language models don’t reason, they regurgitate” and “as language models scale, they begin to think and reason” made predictions, and the people who invested time and money based on the predictions of the latter theory have done well for themselves.
AGI doesn't reason either. No one defines AGI as "AI, but with reasoning". It's usually "AI that outperforms humans at all disciplines, by any degree". Maybe you confused it with ASI, but even then reasoning isn't a requirement afaik.
Reasoning is a learnt concept that involves retrieving memories and running them through an algorithm, also retrieved from memory, and then you loop the process until a classifier deems the result adequate to the given goal.
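Spelled out as the loop being described; every function here is a placeholder for a learned component, not real code:

    # The description above as a loop. Every function is a placeholder
    # for a learned component.
    def reason(goal, retrieve_memories, retrieve_algorithm, adequate):
        result = None
        while not adequate(result, goal):          # the classifier
            memories = retrieve_memories(goal, result)
            algorithm = retrieve_algorithm(goal, memories)
            result = algorithm(memories)           # run memories through it
        return result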
Reasoning blends learned skills and natural cognition.
It integrates new information, not just past memories.
Reasoning is adaptable, not rigidly algorithmic.
Emotions and context also shape reasoning.
I hope this will be found in history books, and some students will point out the irony that people are relying on GPT-4's arguments about reasoning in a thread where it's proclaimed that said model can't reason.
In fact it is not absurd or weird. The model does not need to be capable of x/reasoning to produce knowledge about x/reasoning. A book with a chapter on x/reasoning doesn't reason either.
Did you only read the title? Because the abstract gives you a pretty good idea of what they mean when they say reason. It's pretty easy to understand. No need to immediately call bullshit just because of a minor semantic disagreement.
>ThEY DON'T tHiNk. They'rE JuSt STochAStiC pARrotS. It'S not ReAL AGi.
It doesn't even matter if these claims are true or not. They're missing the point of the conversation and the paper. Reason is a perfectly valid word to use. So is think. If you ask it a question and then follow up with 'think carefully' or 'explain carefully', you'll get the same response.
inb4 AcTUALLy LlMS Can'T do aNYtHIng CaRefUlly BECaUse pRogRAms ARen'T caRefUl
In this case I used 'usually' because I don't remember all the details and didn't want to generalize by saying 'always'. Also, the training/benchmarking protocol can be flawed; for example, an LLM can still solve a shallow reasoning problem by memorizing patterns.
I don't think anyone is holding Ireland responsible for "Irish" pubs. It somehow just became a genre of bar on its own, and it's easy to do in a lazy caricatured way, suitable for someone whose chief aim in owning a bar is rooking tourists: shamrock-based logo, change the name to "O'Whoever's", and have Guinness on tap, good to go.
Then you have other places like my old hangout from 20 years ago, which had a few of the same visual signifiers of Irishness, but the owner being in fact an Irishman, he didn't feel the need to go overboard with that. Hope that place is still open and doing well.
The dome pool is also the one part where you are required to wear a swimsuit. I'm pretty sure the reason is that if you didn't, let's put it like this, that water would become a health hazard pretty quick.
Swimsuit or not, people can behave inappropriately. Let me put it this way, the couple were having some form of sex and thought they were being discreet. The acoustics betrayed them.
I mean I don't doubt that it happens, but the swimsuit rule probably cuts down on it. And to be fair, that pool is possibly the sexiest place in Berlin.