
> Because for devs especially

Are you not a dev? If not, what would you use a coding tool for? They still require handholding for anything largish. Still much cheaper than outsourcing.


If the dollar crumbles, the debt becomes lighter, no?

Tons of countries buy dollar bonds because they're a safe bet. It's how the world does finance.

If that stops, then the world stops buying dollars. And then America has to start justifying why a printed dollar should buy real, valuable goods imported from China or Europe.


Nothing like Crossfire/SLI? Not possible to efficiently connect multiple cards for one large model?
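For what it's worth, inference frameworks can already shard one model across several cards over plain PCIe; NVLink just speeds up the communication between the shards. A minimal sketch, assuming vLLM, with an illustrative model and card count:

    # Shard one large model across two GPUs with vLLM tensor parallelism.
    # This works over plain PCIe; a faster interconnect only speeds up
    # the all-reduce between the shards.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen2.5-32B-Instruct",  # illustrative model choice
        tensor_parallel_size=2,             # split weights across 2 cards
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)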

Is PCI-E too much to ask?

I'm not sure what you're asking. The link I posted is for a PCIe card.

The page doesn't mention what interface the 160GiB card uses, and quick Googling doesn't turn it up either.

> small fp8 local model with almost 100k token context

It wouldn't fit Qwen3.5 27B, would it? That's the SOTA.


This is an fp16 model, so that's 54G in weights (27B params × 2 bytes). I can load it only with fp8 quantization enabled (>= 128k context). I run into this error during generation though: https://github.com/vllm-project/vllm/issues/36350. It looks like an issue with the flash attention backend. But yeah, if you are OK with fp8 quantization on this model, it fits. I expect that with 64G VRAM it would fit without quantization.
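In case it helps, a minimal sketch of that setup, assuming vLLM's offline API (the model name is illustrative, not the exact checkpoint above):

    from vllm import LLM

    # Load an fp16 checkpoint with on-the-fly fp8 weight quantization so
    # the weights plus a large KV cache fit in a single card's VRAM.
    llm = LLM(
        model="Qwen/Qwen2.5-32B-Instruct",  # illustrative model name
        quantization="fp8",    # roughly halves weight memory vs fp16
        max_model_len=131072,  # ~128k-token context
    )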

> voluntary scanning

What is that? A setting in the OS?


A service could voluntarily opt out, like Pavel Durov did.

That's one of the tricks. The other trick is to vote in a universal right to encrypted communication, once and for all.

Encryption is mathematics -- making this an issue of freedom not only of speech, but of thought.

That's the best answer. But you're up against paid-up lobbyists.

On a related note, I miss the LLaMA spelling.

StrongARM was the best :)

Not for LLMs though.


It's worse, because there are actually integrated SoCs that include an NPU, which I would say are real "AI accelerators".

llama.cpp, afaik, does not run a portion of the model on the CPU. --cpu-moe just offloads weights to RAM, but they are still loaded onto the GPU for compute.
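For reference, a sketch of the invocation in question (the flags exist in recent llama.cpp builds; exact offloading behavior may vary by version):

    # llama-server with all layers offloaded to the GPU, while --cpu-moe
    # keeps the MoE expert weights in system RAM.
    llama-server -m model.gguf -ngl 99 --cpu-moe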
