Hacker News | andreybaskov's comments

Great summary of the current state of AI coding, without all the hype.

I wonder, though, whether LLM architecture fundamentally limits us from coming up with novel architectures. Or is it just a limitation of current prompting and tools?

E.g. could you brute-force the problem by asking "come up with 100 innovative, not-yet-tried ideas in compiler architecture" and running hundreds of these experiments?

I feel like the answer to that is basically the "spark" that we have as human intelligence, but also... sometimes you get it by trying and failing many, many times.


Solar-powered data center in a desert.

Fully off-grid using solar, batteries and Starlink for uplink. Focusing on AI inference at the beginning. Currently building our first prototype and testing cooling solutions.

https://solarcube.com


How will you cool it? Will it be underground? I’ve often thought about this, as we have space and sunlight.


I'm testing a ground heat rejection system that dumps heat into the ground. It needs a lot of space, but you need that for the panels anyway.


Original Apple II manuals written by Chris Espinosa and Jef Raskin are a treat to read. Would highly recommend, just to get a sense of what it was like to get onboarded on Apple II back in the day.

And then obviously Programming the 6502 by Rodnay Zaks.


Of all things, I'm actually surprised they went straight to custom silicon, but you gotta respect that decision. It's likely the only way to compete with Tesla right now.


Thank you for saying this. It's almost like others are saying we should stop trying things because they are hard and challenging.

I wish we could dream a bit bigger rather than coming up with reasons something will fail.


> What high quality data sources are not already tapped? Synthetic data? Video?

> Where does the next 1000x flops come from? Even with Moore's law dead, we can easily build 1,000x more computers. And for arguments about lack of power - we have sun.


A Dyson sphere brain?


I don't think we need that for 1,000x. We can build more solar and nuclear, and there is still room for at least a 10x improvement in chip efficiency. We are far, far away from maxing out our compute capability as a civilization before we start shooting satellites into the sun.


I see LLMs in a similar way - a new UI paradigm that "clicks the right buttons" when you know what you need, but don't know exact names of the buttons to click.

And from my experience there are lots and lots of jobs that are just "clicking the right buttons".


Say we discover a new architecture breakthrough like Yann LeCun's proposed JEPA. Won't scaling laws apply to it anyway?

Suppose training becomes so efficient that you can train state-of-the-art AGI on a few GPUs. If it's better than current LLMs, there will be more demand/inference, which will require more GPUs, and we are back at the same "add more GPUs".

I find it hard to believe that we, as humanity, will hit a wall of "we don't need more compute", no matter what the algorithms are.


  > Won't scaling laws apply to it anyway?
Yes, of course. Scaling laws will always apply, but that's not really the point.[0]

The fight was never "Scale is all you need" (SIAYN) vs "scale is irrelevant"; it was "SIAYN" vs "Scaling is not enough". I'm not aware of any halfway serious researcher who did not think scaling was going to result in massive improvements. Being a researcher from the SINE camp myself...

Here's the thing:

The SIAYN camp argued that the transformer architecture was essentially good enough. They didn't think scale was literally all you needed, but that the rest would be minor tweaks, and that increasing model size and data size would get us there. That those were the major hurdles. In this sense they argued that we should move our efforts away from research and into engineering. That AGI was now essentially a money problem rather than a research problem. They pointed to Sutton's Bitter Lesson narrowly, concentrating on his point about compute.

The SINE (or SINAYN) camp wasn't sold. We read the Bitter Lesson differently. Yes, compute is a key element of modern success, but just as important was the rise of our flexible algorithms. In the past we couldn't work with such algorithms for lack of computational power, but the real power was the algorithms. We're definitely a more diverse camp too, with varied arguments. Many of us look at animals and see that we can do so much more with so much less[2]. Clearly, even if SIAYN were sufficient, it does not appear to be efficient. Regardless, we all agree that there are still subtle nuances in intelligence that need working out.

The characteristics of the scaling "laws" matter, but they aren't enough. In the end what matters is generalization, and for that we don't really have measures. Unfortunately, with the SIAYN camp also came benchmark maximization. It was a good strategy in the beginning, as it helped give us direction. But we are now at the hard problem the SINE camp predicted. How do you make a model a good music generator when you have no definition of "good music"? Even in a very narrow sense we don't have a halfway decent mathematical definition of any aesthetics. We argued "we should be trying to figure this out so we don't hit a wall" and they argued "it'll emerge with scale".

So now the cards have been dealt. Who has the winning hand? More importantly, which camp will we fund? And will we fund the SIAYN people that converted to SINE or will we fund those who have been SINE when times were tough?

[0] They've been power laws and I expect them to continue to be power laws[1]. But the parameters of those laws do still matter, right?

[1] https://www.youtube.com/watch?v=HBluLfX2F_k

[2] A mouse has on the order of 100M neurons (and 10^12 synapses). Not to mention how little power they operate on! These guys can still outperform LLMs on certain tasks, despite the LLMs having about 4 orders of magnitude more parameters and many more in data!
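The footnote's point that the parameters of a power law still matter can be sketched in a few lines. All numbers here are illustrative, not fitted to any real model:

```python
# Two illustrative loss power laws L(C) = a * C**(-b).
# Both "scale", but the exponent b controls how much a given
# increase in compute actually buys you.
def loss(compute, a, b):
    return a * compute ** (-b)

for c in (1e20, 1e22, 1e24):  # training compute in FLOPs (made-up values)
    slow = loss(c, a=1e3, b=0.05)  # weaker exponent
    fast = loss(c, a=1e3, b=0.10)  # stronger exponent
    print(f"C={c:.0e}  slow-law loss={slow:.1f}  fast-law loss={fast:.2f}")
```

Both curves keep improving with compute forever; the dispute is over which curve you are on, and whether architecture changes can move you to a better one.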


I agree scaling alone is not enough, and the transformer itself is proof of that - it was an iteration on the attention mechanism plus a few other changes.

But no matter what the next big thing is, I'm sure it would immediately fill all available compute to maximize its potential. It's not like intelligence has a ceiling beyond which you don't need more intelligence.


Was "scale is all you need" actually a real thing said by a real person? Even the most pro scale people like Altman seem to be saying research and algorithms are a thing too. I guess as you say a more important thing is where the money goes. I think Altman's been overdoing it a bit on scaling spend.


Yes, they even made t-shirts.

  > Even the most pro scale people like Altman seem to be saying research and algorithms are a thing too.
I think you missed the nuance in my explanation of both sides. Yes, they believed algorithmic development mattered, but only in a small way. Tuning, not even considering exploring architectures other than the transformer.

And Altman said that AGI is a scaling problem, which is why he was asking for $7T. But he was clearly a liar, given this quote from last year. There's no way he really believed this in late 2024.

  > Altman claimed that AGI could be achieved in 2025 during an interview for Y Combinator, declaring that it is now simply an engineering problem. He said things were moving faster than expected and that the path to AGI was "basically clear."[0]
I'm with Chollet on this one: our obsession with LLMs has held us back. Not that we didn't learn a lot from them, but our hyper-fixation closed our minds to other possibilities. The ML field (and CS in general) gets hyper-fixated on certain things, and I just don't get that. Look at diffusion models. There was basically a 5-year gap between the first U-Net-based model and DDPM. All because we were obsessed with GANs at the time. We jump on a hype train and shun anyone who doesn't want to get on. This is not a healthy ecosystem, and it hinders growth.

Just because we end up with success doesn't mean the path to get there was reasonable nor does it mean it was efficient.

[0] https://www.tomsguide.com/ai/chatgpt/sam-altman-claims-agi-i...


Fair enough, although that Altman quote doesn't match what he actually said in the interview. He said:

>...first time ever where I felt like we actually know what to do like I think from here to building an AGI will still take a huge amount of work there are some known unknowns but I think we basically know what to go what to go do and it'll take a while it'll be hard but that's tremendously exciting... https://youtu.be/xXCBz_8hM9w?t=2330

and at the end there was "what are you excited for in 2025?" and Altman says "AGI" but that doesn't specify if that's it arriving or just working on it.

I don't think a huge amount of work and known unknowns is the same as "we just need to scale".


Does anyone know, or have a guess at, the size of these latest thinking models and what hardware they use to run inference? As in, how much memory and what quantization they use, and whether it's "theoretically" possible to run one on something like a Mac Studio M3 Ultra with 512GB RAM. Just curious from a theoretical perspective.


Rough ballpark estimate:

- Amazon Bedrock serves Claude Opus 4.5 at 57.37 tokens per second: https://openrouter.ai/anthropic/claude-opus-4.5

- Amazon Bedrock serves gpt-oss-120b at 1748 tokens per second: https://openrouter.ai/openai/gpt-oss-120b

- gpt-oss-120b has 5.1B active parameters at approximately 4 bits per parameter: https://huggingface.co/openai/gpt-oss-120b

To generate one token, all active parameters must pass from memory to the processor (disregarding tricks like speculative decoding).

Multiplying 1748 tokens per second with the 5.1B parameters and 4 bits per parameter gives us a memory bandwidth of 4457 GB/sec (probably more, since small models are more difficult to optimize).

If we divide the memory bandwidth by the 57.37 tokens per second for Claude Opus 4.5, we get about 80 GB of active parameters.

With speculative decoding, the numbers might change by maybe a factor of two or so. One could test this by measuring whether it is faster to generate predictable text.

Of course, this does not tell us anything about the number of total parameters. The ratio of total parameters to active parameters can vary wildly from around 10 to over 30:

    120 : 5.1 for gpt-oss-120b
    30 : 3 for Qwen3-30B-A3B
    1000 : 32 for Kimi K2
    671 : 37 for DeepSeek V3
Even with the lower bound of 10, you'd have about 800 GB of total parameters, which does not fit into the 512 GB RAM of the M3 Ultra (you could chain multiple, at the cost of buying multiple).

But you can fit a 3 bit quantization of Kimi K2 Thinking, which is also a great model. HuggingFace has a nice table of quantization vs required memory https://huggingface.co/unsloth/Kimi-K2-Thinking-GGUF
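The arithmetic above, as a small script. All inputs are the comment's own figures, and the result is a rough estimate that assumes both models run memory-bandwidth-bound on similar hardware (a big assumption):

```python
# Back-of-the-envelope: infer Claude Opus 4.5's active-weight footprint
# from serving throughputs of a known open model.
oss_tps = 1748            # gpt-oss-120b tokens/sec on Bedrock
oss_active = 5.1e9        # gpt-oss-120b active parameters
bytes_per_param = 0.5     # ~4 bits per parameter

# Bandwidth implied by gpt-oss-120b: each token moves all active weights.
bandwidth = oss_tps * oss_active * bytes_per_param        # bytes/sec

opus_tps = 57.37          # Claude Opus 4.5 tokens/sec on Bedrock
opus_active_gb = bandwidth / opus_tps / 1e9
print(f"implied bandwidth: {bandwidth / 1e9:.0f} GB/s")   # ~4457 GB/s
print(f"Opus 4.5 active weights: {opus_active_gb:.0f} GB")  # ~78 GB
```

Speculative decoding and kernel efficiency could easily shift these numbers by a factor of two, as noted above.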


I love logical posts like this. There are other factors, like MXFP4 in gpt-oss, MLA in DeepSeek, etc.

>Amazon Bedrock serves Claude Opus 4.5 at 57.37

I checked the other Opus-4 models on bedrock:

Opus 4 - 18.56 tps

Opus 4.1 - 19.34 tps

So they changed the active parameter count with Opus 4.5


Good observation!

57.37 tps / 19.34 tps ≈ 3.0

This explains why Opus 4.1 is 3 times the price of Opus 4.5.


Thanks! That's a great way to analyze it by comparing to open source models. Though I wonder if they use the same hardware for gpt-oss-120b and Claude Opus.


That all depends on what you consider to be reasonably running it. Huge RAM isn’t required to run them; it just makes them faster. I imagine technically all you'd need is a few hundred megabytes for the framework and housekeeping, but you’d have to wait for some/most/all of the model to be read off the disk for each token it processes.

None of the closed providers talk about size, but for a reference point of the scale: Kimi K2 Thinking can spar in the big leagues with GPT-5 and such…if you compare benchmarks that use words and phrasing with very little in common with how people actually interact with them…and at FP16 you’ll need 2.9TB of memory @ 256,000 context. It seems it was recently retrained at INT4 (not just quantized, apparently), and now:

“The smallest deployment unit for Kimi-K2-Thinking INT4 weights with 256k seqlen on mainstream H200 platform is a cluster with 8 GPUs with Tensor Parallel (TP).” (https://huggingface.co/moonshotai/Kimi-K2-Thinking)

-or-

“62× RTX 4090 (24GB) or 16× H100 (80GB) or 13× M3 Max (128GB)”

So ~1.1TB. Of course it can be quantized down to as dumb as you can stand, even within ~250GB (https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-l...).

But again, that’s for speed. You can run them more-or-less straight off the disk, but (~1TB / SSD_read_speed + computation_time_per_chunk_in_RAM) = a few minutes per ~word or punctuation.


    > (~1TB / SSD_read_speed + computation_time_per_chunk_in_RAM) = a few minutes per ~word or punctuation.
 
You have to divide the size of the active parameters (~16GB at 4-bit quantization) by the SSD read speed, rather than the entire model size. If you are lucky, you might get around one token per second with speculative decoding, but I agree with the general point that it will be very slow.
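A sketch of that correction under assumed numbers (the ~16GB active-weight figure from above; the SSD speed is my guess for a fast PCIe 4.0 NVMe drive):

```python
# Streaming only the active expert weights from SSD per token, instead of
# the whole ~1TB model, turns minutes per token into seconds per token.
active_bytes = 16e9   # ~32B active params at 4-bit quantization (Kimi K2)
ssd_read = 7e9        # bytes/sec, fast NVMe drive (assumption)
print(f"{active_bytes / ssd_read:.1f} s/token, before compute time")  # ~2.3
```

In practice the expert routing differs per token and per layer, so caching hot experts in RAM would land somewhere between this and full streaming.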


Yeah thanks for calling that out. I kind of panicked when I reached that part of the explanation and was stuck on whether or not I should go into dense models vs MoE. The question was about ‘big stuff like that’, which they most certainly use MoE, then I even chose an MoE as an example, but then there are giant dense models like Llama, but that’s not what was asked, although it wasn’t not asked because ‘also big league stuff’…anyway, I basically thought “you’re welcome” and “no problem”, then said “you’re problem”.


Originally posted this in another thread, but very curious what others think.

Can I ask my partner to buy a product on Amazon?

Can I ask my personal assistant to buy a product on Amazon?

Can I hire a contractor to buy products on Amazon?

Can I communicate with a contractor via API to direct them what products to buy?

What if there is no human on the other end and it's an LLM?

Same issue with LinkedIn. I know execs who have assistants running their socials. Is this legal?

Like, where do we draw the line? In the future, would the only way to shop on Amazon be with approved VR goggles that scan your retina to verify you are a human?


> where do we draw the line?

Perplexity has shown itself to be a bad actor [1][2][3], and possibly incompetent, too [4].

We need to draw a line, eventually. But it’s far from urgent. And I don’t think Perplexity should be the one deciding.

[1] https://blog.cloudflare.com/perplexity-is-using-stealth-unde...

[2] https://www.reuters.com/legal/litigation/perplexity-ai-loses...

[3] https://arstechnica.com/tech-policy/2025/10/reddit-sues-to-b...

[4] https://brave.com/blog/comet-prompt-injection/


The law has nothing to do with it. Amazon is a private company and can make rules about who can or can't place orders on its website. When you create an account you agree to their ToS.


Interesting. Amazon ToS actually has a section about agents - https://www.amazon.com/gp/help/customer/display.html?nodeId=...

And they even provide a definition of what an Agent is:

"Agent" means any software or service that takes autonomous or semi-autonomous action on behalf of, or at the instruction of, any person or entity.

Though to me it raises even more questions. What counts as software that takes "autonomous" action on my behalf? Is curl "autonomous"?


"autonomous or semi-autonomous" is the key phrase. If you manually invoke a curl command then no, it isn't an agent. If you write code that itself determines when and how to invoke that command then it is.


Am I not manually instructing the agent to buy a certain product?

What if I set up a cron job to buy a certain product every month - is that not autonomous? What if it first queries my live toilet paper stock to make the decision?


Exactly. It's software - `curl` or an LLM. It's a function that accepts input and produces output. One is much more sophisticated than the other, but it's made of the same machine instructions; there is no magic.

What's the criteria that makes one function "autonomous" and the other one "manual"? I feel it really boils down to this.


> Is curl "autonomous"?

Only when you supply -L


I don't think GP meant "legal" in the literal sense. Regardless, the post's meaning is still the same if you replace "Is this legal?" with "Does this conform to Amazon's ToS?", so please read it charitably and avoid being pedantic about this sort of thing.


Legal matters are all about pedantry.


Can I use Amazon Mechanical Turk to place orders for myself on Amazon?


No.

That is why you have personal credentials to log in to Amazon. If you want delegation capabilities, you can open an Amazon business account.


My wife and I share the same Amazon account. Should I open a business account to do grocery shopping?


Amazon supports family sharing for the same home address - 2 adults and 4 children, I think, can share the same Prime.

My wife and I used to share 1 account, but then I wanted to buy her a gift that had to be a surprise - so had to create a new account and add it as part of the family to the original one…. Then kids grew up and wanted to make small orders themselves, and I didn’t want them to see our order history…


I know it's there. But we _prefer_ to have a single account to simplify tracking and picking up packages. I'm curious whether, from their point of view (or their ToS), I'm even allowed to share my credentials with anyone else.

1Password shared vaults are there for a reason - people share credentials all the time, business or personal.


She should not. You should create an Amazon family and enroll her as a member.

This way Amazon can keep track of your separate buyer profiles.


Would Amazon be ok with me opening a business account, creating credentials for a Perplexity assistant, and having it buy products?

Based on this article, I'd think not?


Even if it were not AI, it would not be allowed. You are effectively creating dummy accounts with bots.

Even the SEC would be against it, as it would inflate the user base of Amazon.

