justrunitlocal's comments

justrunitlocal · 2026-04-28T18:48:55 1777402135

We've been running our 10-dev org on 8 H100s on open models (with some tweaks). Sure, they aren't as good as the big providers' models, but they 1. don't go down and 2. have pretty damn high tok/s. It pays for itself.

Posting with a fresh account because I'm not supposed to share these details, for obvious reasons. If you want help setting this up, just reply with a way to reach you.

ok_dad · 2026-04-28T18:52:08 1777402328

yea just buy 300k worth of hardware and bob's your uncle

justrunitlocal · 2026-04-28T18:54:11 1777402451

It was pretty hard to justify the purchase to the board, but we got a decent deal from a nearby data center (~15% discount). Thankfully, it's a fixed cost, it's an asset we can use for our taxes, and it will survive for years to come. The only things we have to work on are maintenance and looking into some renewable energy options.

We're also looking into how to do some secure cost sharing with this, so that all people need to pay for is what it costs us to run everything! We're just planning on reserving at least 51% of the capacity for us and the rest for everyone else.

ok_dad · 2026-04-28T19:07:49 1777403269

Sorry, didn't mean to be dismissive, I was just being a dickhead needlessly.

I actually respect this a ton, good work.

justrunitlocal · 2026-04-28T19:15:16 1777403716

It's fine! There's no world where individuals can buy this kind of stuff. Our company is too small to do it, but I'd love for there to be a public utility of sorts for being able to use LLMs. It is absurd that only these >$1T companies are allowed to run this. I also find it dangerous for society to have so much power and wealth concentrated there too.

Anyway, this is the internet and skepticism is warranted :D.

ok_dad · 2026-04-28T19:19:38 1777403978

Yea, I actually looked into a similar thing myself recently. I was looking at how we could replace Cursor, and I found that for ~10 people we'd need a half dozen H100s or something on that scale, which would cost ~$1500 per developer or so to build and maintain on cloud infra, and to buy it would cost roughly 3 developers' yearly salaries or so (this aligns with your numbers). We don't use that much inference, so we decided paying Cursor ~$200-300 per dev per month is better, for now, but in the future we might regret that when prices normalize more. However, we also don't use cloud agents or independent agents, we basically use AI as a pair programmer, so if we had to drop AI coding assistants completely our process wouldn't break too badly. I wish I could task my 3080 gaming card to do some inference, but I can only get ~10B models on there, so it's kinda worthless unless it's for something a small model can do.
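The buy-versus-subscribe trade-off described here can be put into a rough break-even calculation. This is only a sketch under stated assumptions: the 300k hardware figure and the $200-300 per dev per month come from this thread, while `running_cost_per_month` (power, maintenance, admin time) is a hypothetical parameter that defaults to zero and that you would need to fill in yourself.

```python
def breakeven_months(hardware_cost, devs, subscription_per_dev_month,
                     running_cost_per_month=0.0):
    """Months until buying hardware is cheaper than paying subscriptions.

    Returns None when the monthly running cost eats the whole saving,
    i.e. the purchase never pays for itself.
    """
    saved_per_month = devs * subscription_per_dev_month - running_cost_per_month
    if saved_per_month <= 0:
        return None
    return hardware_cost / saved_per_month


# Figures taken from the thread; running costs are ignored here (an assumption).
for price in (200, 300):
    months = breakeven_months(300_000, devs=10, subscription_per_dev_month=price)
    print(f"${price}/dev/month -> break-even after {months:.0f} months")
```

At these subscription prices the payback period runs to many years even with zero running costs, which is consistent with the decision to stay on a subscription for now. The picture changes quickly if per-developer spend rises or the hardware is shared across more users.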

zozbot234 · 2026-04-28T20:10:18 1777407018

The best deal is arguably to buy as much on-prem inference as you can reasonably expect to use by running the hardware around the clock, even at slower throughput, and use 3rd-party inference for things that are genuinely latency-sensitive. I just don't see how this resolves to needing a half-dozen H100s; surely you're not using that much compute? You don't need to place your entire model on GPU: engines for on-prem inference generally support CPU/RAM-based offload.

mumbisChungo · 2026-04-28T20:15:45 1777407345

One dev's salary to give a 10 person team unlimited approximately free agentic coding for the foreseeable future, plus privacy.

OJFord · 2026-04-28T20:21:06 1777407666

And another salary to have someone set up and run it

kgeist · 2026-04-28T21:00:36 1777410036

We're planning to do the same thing - buy something like 8xH100 and run all coding there. The CTO almost agreed to find the budget for it but I need to make sure there are no risks before we buy (i.e. it's a viable/usable setup for professional AI-assisted coding)

Can you share what models you run and find best performing for this setup? That would help a lot. I already run a smaller AI server in the office but only 32B models fit there. I already have experience optimizing inference, I'm just interested in what models you think are great for 8xH100 for coding, I'll figure out the details of how to fit it :)

htrp · 2026-04-29T01:43:21 1777427001

8 x H100 80GB don't give you enough memory to run the latest 1T+ parameter models (especially at the context window lengths needed to be competitive with the frontier models)
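A back-of-the-envelope check of this memory claim, counting weights only. The sketch assumes a dense count of 1T parameters held entirely in GPU memory and uses decimal gigabytes; `weights_gb` is a hypothetical helper, and the KV cache, activations, and runtime overhead all come on top of what it reports.

```python
GPUS = 8
GB_PER_GPU = 80                 # H100 80GB
TOTAL_GB = GPUS * GB_PER_GPU    # 640 GB across the node


def weights_gb(params_billion, bits_per_weight):
    """Memory needed for the weights alone, in decimal GB."""
    return params_billion * bits_per_weight / 8


for bits in (16, 8, 4):
    need = weights_gb(1000, bits)   # 1000B = 1T parameters
    verdict = "fits" if need <= TOTAL_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {need:6.0f} GB needed, "
          f"{TOTAL_GB} GB available -> {verdict}")
```

Under these assumptions only the 4-bit case fits, and whatever memory is left after the weights has to hold the KV cache for every concurrent request, which is where long context windows become the constraint.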

dools · 2026-04-29T02:05:09 1777428309

Verda has B300 clusters, 8 GPUs for $55/hour (USD), billed in 10-minute blocks
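To see what that pricing means in practice, here is a small cost sketch. The $55/hour rate and the 10-minute billing block come from the comment above; that usage is rounded up to whole blocks is my assumption, not something stated here, and `rental_cost` is a hypothetical helper.

```python
import math


def rental_cost(minutes, hourly_rate=55.0, block_minutes=10):
    """Cost in USD for a session, assuming usage rounds up to whole blocks."""
    blocks = math.ceil(minutes / block_minutes)
    return blocks * hourly_rate * block_minutes / 60


print(f"25-minute session: ${rental_cost(25):,.2f}")
print(f"8 h/day, 22 days:  ${rental_cost(8 * 60 * 22):,.2f}")
print(f"24/7 for 30 days:  ${rental_cost(24 * 60 * 30):,.2f}")
```

The gap between the working-hours figure and the around-the-clock figure is the main lever: renting looks best when the cluster can be shut down whenever nobody is using it.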

Havoc · 2026-04-28T23:53:00 1777420380

Deepseek, GLM, Minimax or Kimi are the most likely contenders.

dools · 2026-04-29T02:07:21 1777428441

I’ve been using Kimi 2.5/2.6 for the past 2 weeks and it’s really not far off OpenAI and Claude models. I am a coder so it’s not all vibes, but I am definitely more in the “spec to code” mode than “edit this file for me” and it copes just fine. Needs a bit more supervision than the frontier models but it’s also significantly cheaper. If I were Anthropic I’d be shitting myself, their prices are going to 10x over the next 2 years

kakoni · 2026-04-29T07:51:44 1777449104

So are you running Kimi on Verda?

dools · 2026-04-29T02:02:45 1777428165

Check out Verda: you can rent whatever super powerful GPU clusters you need in 10-minute increments. Deploy any open-weight model using SGLang and away you go

threepts · 2026-04-29T09:06:40 1777453600

First of all 1) 8 H100s are NOT ENOUGH for today's premier models (500B+ parameters) and if you do run an obsolete model forget about memory.

2) After buying the $300k of GPUs, your electricity cost will put you in competition with cloud hosting costs, and you will probably lose money this way.

3) NVIDIA will charge you a kidney to provide driver/hardware support if anything goes wrong.

This is inherently a bad idea and this person is probably trying to promote his startup.
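Point 2 is easy to sanity-check with your own numbers. The sketch below is built on assumptions throughout: 700 W per GPU (the rated board power of the SXM H100), a hypothetical 1.5x overhead factor for the host, networking, and cooling, a hypothetical $0.15/kWh electricity price, and full load around the clock. None of these figures come from the thread.

```python
def yearly_electricity_cost(gpus=8, watts_per_gpu=700, overhead=1.5,
                            usd_per_kwh=0.15, hours_per_year=24 * 365):
    """Rough yearly electricity bill in USD for a GPU node at full load."""
    total_watts = gpus * watts_per_gpu * overhead   # node draw incl. overhead
    kwh_per_year = total_watts * hours_per_year / 1000
    return kwh_per_year * usd_per_kwh


cost = yearly_electricity_cost()
print(f"Estimated electricity: ${cost:,.0f} per year")
print(f"Share of a $300k purchase: {cost / 300_000:.1%} per year")
```

With these inputs electricity is a small fraction of the purchase price per year, so the conclusion depends mostly on the local electricity rate and on how much cooling and hosting overhead you actually have.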

johndough · 2026-04-28T19:57:02 1777406222

> Sure they aren't as good as the big providers

If you haven't done so already, finetune the model on all your company's code that you can get your hands on. This is one of the great advantages that you get when running local models. I like the style of the generated code much better now, I have to rewrite much less, and my prompts can be shorter too. But maybe these already are the "tweaks" that you mentioned.

GenerWork · 2026-04-28T20:31:24 1777408284

How would they do that? Would it be as easy as telling a model "Hey, review all this code, identify patterns, and then write in this style going forward"?

Sorry if this is a stupid question, I've never finetuned or trained an LLM.

Havoc · 2026-04-28T23:54:11 1777420451

Unsloth has consumer-accessible material on fine-tuning models
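On the question of how: fine-tuning is not an instruction given in a prompt. It updates the model's weights by training on examples, so the first step is turning the codebase into a training file. Below is a minimal sketch of that data-preparation step only, assuming the trainer accepts JSONL records with a `text` field (a common convention, but check what your tooling expects). The function name, extension list, and chunk size are hypothetical, and the training run itself is not shown.

```python
import json
from pathlib import Path


def build_dataset(repo_root, out_path, extensions=(".py",), max_chars=8000):
    """Write source files under repo_root to a JSONL file, one chunk per line.

    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(repo_root).rglob("*")):
            if not path.is_file() or path.suffix not in extensions:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            # Split long files into fixed-size chunks (naive, by characters).
            for start in range(0, len(text), max_chars):
                chunk = text[start:start + max_chars]
                if chunk.strip():
                    out.write(json.dumps({"text": chunk}) + "\n")
                    count += 1
    return count
```

A call such as `build_dataset("path/to/repo", "train.jsonl")` would produce the input file. A real pipeline would also want to filter out secrets, vendored code, and generated files before training.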

2ndorderthought · 2026-04-28T18:59:18 1777402758

This is the actual answer. Man I hope to find a company like yours sometime soon. I am sick of all the issues with having 3rd party IP generation