
> Because for devs especially

Are you not a dev? If not, what would you use a coding tool for? They still require handholding for anything largish. Still much cheaper than outsourcing.


If the dollar crumbles, the debt becomes lighter, no?

Tons of countries buy dollar bonds because they're a safe bet. It's how the world does finance.

If that stops, then the world stops buying dollars. And then America has to start justifying why a printed dollar should buy real, valuable goods imported from China or Europe.


Nothing like Crossfire/SLI? Not possible to efficiently connect multiple cards for one large model?
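For what it's worth, inference frameworks can already shard one model across several cards over plain PCIe; NVLink just speeds up the communication between the shards. A minimal sketch, assuming vLLM, with an illustrative model and card count:

    # Shard one large model across two GPUs with vLLM tensor parallelism.
    # This works over plain PCIe; a faster interconnect only speeds up
    # the all-reduce between the shards.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen2.5-32B-Instruct",  # illustrative model choice
        tensor_parallel_size=2,             # split weights across 2 cards
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)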

Is PCI-E too much to ask?

I'm not sure what you're asking. The link I posted is for a PCIe card.

The page doesn't mention what interface the 160GiB card uses, and quick Googling doesn't turn it up either.

> small fp8 local model with almost 100k token context

It wouldn't fit Qwen3.5 27B, would it? That's the SOTA.


This is an fp16 model, so that's 54G in weights (27B params × 2 bytes). I can load it only with fp8 quantization enabled (>= 128k context). I run into this error during generation though: https://github.com/vllm-project/vllm/issues/36350. It looks like an issue with the flash attention backend. But yeah, if you are OK with fp8 quantization on this model, it fits. I expect that with 64G VRAM it would fit without quantization.
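In case it helps, a minimal sketch of that setup, assuming vLLM's offline API (the model name is illustrative, not the exact checkpoint above):

    from vllm import LLM

    # Load an fp16 checkpoint with on-the-fly fp8 weight quantization so
    # the weights plus a large KV cache fit in a single card's VRAM.
    llm = LLM(
        model="Qwen/Qwen2.5-32B-Instruct",  # illustrative model name
        quantization="fp8",    # roughly halves weight memory vs fp16
        max_model_len=131072,  # ~128k-token context
    )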

> voluntary scanning

What is that? A setting in the OS?


A service could voluntarily opt out, like Pavel Durov did.

That's one of the tricks. The other trick is to vote in a universal right to encrypted communication, once and for all.

Encryption is mathematics -- making this an issue of freedom not only of speech, but of thought.

That's the best answer. But you're up against paid-up lobbyists.

On a related note, I miss the LLaMA spelling.

StrongARM was the best :)

Not for LLMs though.


It's worse, because there are actually integrated SoCs that include an NPU, which I would say are real "AI accelerators".

llama.cpp, afaik, does not run a portion of the model on the CPU. --cpu-moe just offloads weights to RAM, but they are still loaded onto the GPU for compute.
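For reference, a sketch of the invocation in question (the flags exist in recent llama.cpp builds; exact offloading behavior may vary by version):

    # llama-server with all layers offloaded to the GPU, while --cpu-moe
    # keeps the MoE expert weights in system RAM.
    llama-server -m model.gguf -ngl 99 --cpu-moe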
