Well... the actual problem, imho, is that LLMs seem to have reached (or be close to reaching) a plateau.
You might be right about the "three months ago it could not produce a working implementation of a DBMS" part... but what if in 3 months (or 3 years) it stays stuck at the 20K slower threshold?
People have been saying, without any evidence at all, it's reached or about to reach a plateau for years now. We are clearly still seeing significant forward progress. While it's reasonable to think it will hit some plateau eventually, there's no reason to think that right now just happens to be as good as it's ever going to get.
Context is the plateau. It's why RAM prices are spiking. We're essentially throwing heap at the problem hoping it will improve. That's not engineering. It's not improving on a fundamental, technical level.
> Context is the plateau. It's why RAM prices are spiking.
Yes, context is the plateau. But I don't think the bottleneck is RAM. The mechanism described in "Attention Is All You Need" is O(N^2), where N is the size of the context window. I can "feel" this in everyday usage: as the context window grows, the model's responses slow down, a lot. That's due to compute being serialised because there aren't enough resources to do it in parallel. The bottleneck is more likely compute and memory bandwidth than RAM.
If there is a breakthrough, I suspect it will be models turning the O(N^2) into O(N * ln(N)), which is generally how we speed things up in computer science. That in turn implies abstracting the knowledge in the context window into a hierarchical tree, so the attention mechanism only has to look across a single level in the tree. That in turn requires it to learn and memorise all these abstract concepts.
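For concreteness, here's a minimal NumPy sketch of plain scaled dot-product attention; the sizes N and d are made up for illustration, and the point is just that the intermediate score matrix is N x N, which is where the O(N^2) comes from:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Scaled dot-product attention as in "Attention Is All You Need".
    Q, K, V: (N, d) arrays, N = context length, d = head dimension."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (N, N) score matrix: the O(N^2) cost
    # Numerically stable row-wise softmax over the scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (N, d) output

rng = np.random.default_rng(0)
N, d = 1024, 64  # arbitrary example sizes
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, d))
out = naive_attention(Q, K, V)
print(out.shape)  # (1024, 64); the scores matrix in between was 1024 x 1024
```

Doubling N quadruples the size of that score matrix, which matches the "responses slow down a lot as context grows" feeling.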
When models are trained they learn abstract concepts, which they retrieve near effortlessly, but they don't do that same type of learning when in use. I presume that's because it requires a huge amount of compute, repetition, and time. If only they could do what I do - go to sleep for 8 hours a day, dream about the same events using local compute, and learn from them. :D Maybe, one day, that will happen, but not any time soon.
If a bridge girder isn't strong enough to support a load, you add material in the right places to make a larger, stronger girder. That is engineering. The idea that if you're not making fundamental improvements to your formulation of steel you aren't progressing is absurd. If adding RAM leads to improvements, and we have the engineering ability to add more RAM, then we are still making progress.
Regardless of how true your statement is (just adding metal to a structure is commonly not a way to solve the problem you stated; it just makes the structure heavier, which means other systems have more to support), the point is that it isn't exponential/fundamental progress, which is the type that would be needed to avoid the plateaus folks are mentioning. Also, adding RAM doesn't give you even linear improvements; it's logarithmic.
> just adding metal to a structure is commonly not a way to solve the problem you stated, it just makes the structure heavier which means other systems have more to support
As a mechanical engineer, that is exactly how you solve that problem.
> point is that it isnt exponential/fundamental progress
You just stuck the goalpost on a rocket and shot it into space. You'd be hard pressed to show evidence that progress in this field was ever exponential - in most fields it never was. Logarithmic progress is typical; you make a lot of progress early on picking the low hanging fruit figuring out the basics, and as the problems get harder and the theory better understood it takes more effort to make improvements, but fundamentally improvements continue.
Incremental progress from increasing scale is, again, perfectly cromulent. It's how we've made advanced computers that can fit in your pocket, it's how clothing became so cheap it's practically disposable, it's how you can fly across the country for less than the price of a nice dinner. Imagine looking at photolithography, textile manufacturing, or aircraft 5 years after they reached their modern forms and saying "this has plateaued".
A little tangential, but I'm not entirely convinced the things you list at the end are improvements, per se. Clothing is so cheap because it's polyester, which is essentially plastic and demonstrably bad for the environment. Same thing with 'computers in the pocket': they're so cheap, and refreshed at such a rate, that they become disposable when they really shouldn't be. E-waste is a real problem. As for flying across the country... the train is better from a last-mile perspective.
In a sense, looking at photolithography, textile manufacturing, or aircraft as you suggest, does show they plateaued, at least to me.
Are we sure we want to be making things so cheap they become discardable in the ever-growing landfills of the world?
> You'd be hard pressed to show evidence that progress in this field was ever exponential - in most fields it never was.
Literally the introduction of transformers was absolutely exponential; in fact, exponential progress is pretty much the defining characteristic of the first chunk of a new technology's development. I mean, in CS specifically there are dozens and dozens of instances of exponential improvements. Like... obviously lol. Also, the plateau that folks are mentioning is about a lack of fundamental improvements. Perhaps MEs don't experience exponential improvements, but we do all the time in CS and SWE lol.
Everybody already has local regional tickets anyway. And most people can't be in more than one place at a time, and most people stay in the same region most of the time.
I live in Rostock. So if I want to go to Berlin or Hamburg (you know, where stuff like actual airports are) I am crossing "regional borders" even if it is a 200-250 km trip to each city
At least from Rostock to somewhat closer destinations you have both options. There's a bi-hourly IC to Hamburg or Berlin and another bi-hourly RE towards the same destinations. They're not terribly different in terms of travel time, but one is a regional train and one is an inter-city train.
Sure, long distances (I had to travel from Rostock to Tübingen last weekend) are typically not taken with regional trains (although you technically can; I did that as a poor student a few times, it just takes 16 hours instead of 10), but over medium distances (around 2–3 hours) you often have both options.
About non-replaceable batteries: from what I understand, if a battery can be replaced by any random device owner, you must design it with a robust cell to avoid the risk of it being punctured, crushed, or otherwise broken.
And therefore you have more shell and less actual battery, so it doesn't last as long.
This does not mean that I believe this was done exclusively for altruistic reasons. More like: this will result in a slightly better experience for the user... and more revenue for Apple. So let's do it.
I've worked in consumer electronics; batteries are built in because reviewers will endlessly trash a product that is just 1mm thicker than anything Apple puts out, and they fawn over Apple because the products are so thin.
If anyone releases a product that is just a tiny bit thicker than last year's, expect headlines like "new super-thick phone doesn't fit in pockets, causes back problems".
A small exaggeration? Not by far; reviewers get nasty about device thickness.
Then 70% of people shove a case on and it really doesn't matter.
There are good water-ingress reasons for non-replaceable batteries; making a device waterproof while keeping a replaceable battery does add a good deal of thickness.
Anyway, you can get a battery replaced at a phone shop for a reasonable rate, so IMHO it isn't as big of a deal nowadays.
No one wants to, but that is how many consumers decide on what to buy. It is especially how early adopters tuned into the review scene for their favorite products decide what to buy.
I think that what erased the "programmer vs computer illiterate" dichotomy was BASIC in the 80s.
I've met lots of "digital natives" and they seem to use technology as a black box, clicking/touching stuff at random until it sorta works, but they are not very good at building a mental model of why something is behaving unexpectedly and verifying their own hypotheses (i.e. "debugging").
Agreed. And I feel it fair to argue that this is the intended interface between proprietary software and its users, categorically.
And more so with AI software/tools, and IMO frighteningly so.
I don’t know where the open models people are up to, but as a response to this I’d wager they’ll end up playing the Linux desktop game all over again.
All of which strikes at one of the essential AI questions for me: do you want humans to understand the world we live in or not?
Doesn’t have to be individually, as groups of people can be good at understanding something beyond any individual. But a productivity gain on its own isn’t a sufficient response to this question.
Interestingly, it really wasn’t long ago that “understanding the full computing stack” was a topic around here (IIRC).
It’d be interesting to see if some “based” “vinyl player programming” movement evolved in response to AI in which using and developing tech stacks designed to be comprehensively comprehensible is the core motivation. I’d be down.
I am from that era, so I might add something that perhaps is not obvious at all nowadays.
The microcomputer explosion gave birth to a large number of actual paper magazines, and at least 50% of their content was... actual source listings you had to manually retype.
BASIC was already fragmented into a billion different flavors and dialects (especially if your program had any kind of graphics), so the more ambitious user could also try their hand at translating a listing from - say - TRS-80 to Apple BASIC.
In any case you were directly exposed to the actual source code, and tweaking or experimenting with it felt very natural.
These are just the last three social media platforms I subscribed to in the past, and they range from stagnant to pretty much dead.
I suppose the problem is that if you already have 1000+ followers on, say, Twitter or IG, you try posting the same stuff in parallel on both... after 1 month of doubled effort you notice that your follower count on the new platform is an order of magnitude smaller... and you want to stop double-posting because it is too time consuming. Guess which one you will opt out of?
I once wrote a small class at work, and by the time I left it was over 8k lines long. People joked it was my fault for calling it HelperUtil instead of something more descriptive. It was a dumping ground for all the stuff people didn't want to think about. I wonder if something like that is possible in the microservice world?
It probably wasn't a joke. If you call something HelperUtil, it will become a dumping ground. That's a learnable lesson about naming, a genuine mistake, but it's not learnable if it keeps getting described as a joke.
C# accidentally solved this problem with extension methods: these little helper utils at least get grouped by type instead of piling up in one humongous file. Or maybe that was part of the design team's intention behind them all along.
And because they're static you can easily see when services or state are getting passed into a method, clearly showing when it should in fact be some sort of service or a new type.
I am on the move so I cannot check the video (but I did skim the pdf). Is there any chance to see an example of this technique? Just a toy/trivial example would be great, TIA!
For the Monte Carlo Method stuff in particular[1], I get the sense that the most iconic "Hello, World" example is using MC to calculate the value of pi. I can't explain it in detail from memory, but it's approximately something like this:
Define a square of some known size (1x1 should be fine, I think)
Inscribe a circle inside the square
Generate random points inside the square
Look at how many fall inside the square but not the circle, versus the ones that do fall in the circle.
From that, using what you know about the area of the square and circle respectively, the ratio of "inside square but not in circle" and "inside circle" points can be used to set up an equation for the value of pi.
Somebody who's more familiar with this than me can probably fix the details I got wrong, but I think that's the general spirit of it.
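The steps above can be sketched in Python. A minimal version, using the equivalent quarter-circle form (sample in the unit square and test the distance from one corner; by symmetry the hit probability is the same pi/4 as with a circle inscribed in a square):

```python
import random

def estimate_pi(n_points, seed=42):
    """Monte Carlo pi: sample points uniformly in the unit square [0,1)^2
    and count how many land inside the quarter circle of radius 1.
    Hit probability is (pi/4)/1, so pi ~= 4 * hits / n_points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # inside the circle
            hits += 1
    return 4.0 * hits / n_points

print(estimate_pi(100_000))  # roughly 3.14, give or take ~0.005
```

As noted downthread, it converges slowly: the error shrinks only like 1/sqrt(n), so each extra digit of pi costs about 100x more samples.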
For Markov Chains in general, the only thing that jumps to mind for me is generating text for old school IRC bots. :-)
[1]: which is probably not the point of this essay. Sorry for muddying the waters; I have both concepts kinda 'top of mind' in my head right now after watching the Veritasium video.
> From that, using what you know about the area of the square and circle respectively, the ratio of "inside square but not in circle" and "inside circle" points can be used to set up an equation for the value of pi.
Back in like 9th grade, when Wikipedia did not yet exist (but MathWorld and IRC did), I did this in TI-BASIC instead of paying attention in geometry class. It's cool, but converges hilariously slowly. The in-versus-out test is basically checking whether the distance from the origin is > 1, but with randomness you end up sampling a lot of points twice.
I told a college roommate about it and he basically invented a calculus approach, summing pixels in columns or something as an optimization. You could probably further optimize by finding upper and lower bounds of the "frontier" of the circle, or iteratively splitting rectangle slices ad infinitum, but that's probably just reinventing the state of the art. And all this skips the cool random sampling the Monte Carlo algorithm uses.
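The column-summing idea is essentially a Riemann sum over the quarter circle; a small sketch under that assumption (the midpoint-rule form is my choice, not necessarily what the roommate did):

```python
def pi_by_columns(n_cols):
    """Deterministic pi approximation: sum column heights of the quarter
    circle, i.e. a midpoint Riemann sum of sqrt(1 - x^2) over [0, 1].
    The quarter-circle area is pi/4, so multiply the sum by 4."""
    total = 0.0
    for i in range(n_cols):
        x = (i + 0.5) / n_cols          # midpoint of column i
        total += (1.0 - x * x) ** 0.5   # column height
    return 4.0 * total / n_cols

print(pi_by_columns(10_000))
```

Unlike the random-sampling version, every column contributes exactly once, so 10,000 columns already gives several correct digits where 10,000 random points gives maybe two.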
Sorry, I should have been more specific maybe: I do know about Monte Carlo, and yeah, the circle stuff is a more or less canonical example - but I wanted to know more about the Markov Chains, because, again, I only know these as sequence generators, and I have some trouble imagining how they could "solve problems" unless your problem is "generate words that sorta sound like a specific language but are mostly gibberish".
I’ve always loved this example. I implemented the Monte Carlo pi estimation on a LEGO Mindstorms NXT back in high school. Totally sparked my interest in programming, simulations, etc. Also, the NXT’s drag-and-drop, flowchart programming interface was actually a great intro to programming logic. It made it really easy to learn real programming later on.
Monte Carlo Value for Pi
Each successive sequence of six bytes is used as 24 bit X and Y co-ordinates within a square. If the distance of the randomly-generated point is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a “hit”. The percentage of hits can be used to calculate the value of Pi. For very large streams (this approximation converges very slowly), the value will approach the correct value of Pi if the sequence is close to random. A 500000 byte file created by radioactive decay yielded:
Monte Carlo value for Pi is 3.143580574 (error 0.06 percent).
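The chunk-to-coordinates scheme described above can be sketched roughly as follows; this is a reconstruction, with `os.urandom` standing in for the radioactive-decay file, and the distance measured from one corner of the square (equivalent, by symmetry, to testing against the inscribed circle):

```python
import os

def pi_from_bytes(data):
    """Each 6-byte chunk becomes a 24-bit X and a 24-bit Y coordinate in
    the unit square; a chunk is a "hit" if the point falls inside the
    circle. For a good random source, 4 * hits / chunks approaches pi."""
    max24 = float(2**24 - 1)
    hits = chunks = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big") / max24
        y = int.from_bytes(data[i + 3:i + 6], "big") / max24
        chunks += 1
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / chunks

print(pi_from_bytes(os.urandom(500_000)))
```

A 500,000-byte input gives only ~83,000 points, so, as the quoted text says, the convergence is very slow: expect agreement with pi to only two or three digits.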