Not just cheaper in token usage, but more accurate as well.
Even the smallest models are RL-trained to use shell commands perfectly. In my testing, Gemini 3 Flash performs better with a CLI of 20 commands than with 20+ tools.
A CLI also works well for maintaining the KV cache. Changing the tool list mid-session, say to improve model performance, hurts KV cache reuse, whereas a CLI --help call only shows the manual for one specific command, in append-only fashion.
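A rough sketch of the cache argument, not a measurement. It assumes cache reuse is prefix-based and that tool definitions are serialized ahead of the conversation history. The segment strings and the `mb` command name are made up for illustration; a real prompt would be tokens.

```python
def shared_prefix_len(a, b):
    """Count how many leading items two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical prompt segments standing in for token runs.
system = ["system-prompt"]
history = ["user:task", "assistant:step-1", "tool-result:step-1"]

# Case 1: many explicit tools, and the tool list changes mid-session.
tools_v1 = ["tool:open", "tool:click"]
tools_v2 = ["tool:open", "tool:click", "tool:scroll"]
before = system + tools_v1 + history
after_tool_change = system + tools_v2 + history

# Case 2: one fixed shell tool; new capabilities are discovered via --help,
# whose output is simply appended to the end of the conversation.
shell = ["tool:shell"]
before_cli = system + shell + history
after_help = before_cli + ["assistant:mb click --help", "tool-result:usage for click"]

reused_tools = shared_prefix_len(before, after_tool_change)
reused_cli = shared_prefix_len(before_cli, after_help)

print(f"tool list changed: {reused_tools}/{len(before)} segments reusable")
print(f"--help appended:   {reused_cli}/{len(before_cli)} segments reusable")
```

In the first case everything after the inserted tool definition no longer matches the cached prefix. In the second, the whole earlier prompt is still a prefix of the new one.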
Writing your tools as a Unix-like CLI also has the nice benefit that the model can pipe multiple commands together. For the browser, I wrote mini-browser, which frontier models use much better than explicit browser-control tools because they can compose one giant command sequence to one-shot a task.
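To make the piping point concrete, here is a minimal sketch of a Unix-style CLI whose subcommands read stdin and write stdout so they can be chained. This is not mini-browser's actual interface; the program name `demo-cli` and the `links` and `match` subcommands are hypothetical.

```python
import argparse
import io
import re
import sys


def cmd_links(args, stdin, stdout):
    """Print every href value found in the HTML on stdin, one per line."""
    for href in re.findall(r'href="([^"]+)"', stdin.read()):
        stdout.write(href + "\n")


def cmd_match(args, stdin, stdout):
    """Keep only the input lines that contain args.text."""
    for line in stdin:
        if args.text in line:
            stdout.write(line)


def build_parser():
    parser = argparse.ArgumentParser(prog="demo-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    links = sub.add_parser("links", help="extract href values from HTML on stdin")
    links.set_defaults(func=cmd_links)
    match = sub.add_parser("match", help="keep lines containing TEXT")
    match.add_argument("text")
    match.set_defaults(func=cmd_match)
    return parser


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout):
    args = build_parser().parse_args(argv)
    args.func(args, stdin, stdout)


# In-process stand-in for: demo-cli links < page.html | demo-cli match docs
html = '<a href="https://example.com/docs">Docs</a> <a href="https://example.com/blog">Blog</a>'
stage1 = io.StringIO()
main(["links"], stdin=io.StringIO(html), stdout=stage1)
stage2 = io.StringIO()
main(["match", "docs"], stdin=io.StringIO(stage1.getvalue()), stdout=stage2)
result = stage2.getvalue()
print(result, end="")
```

Because each subcommand only touches stdin and stdout, a model can chain as many of them as it likes in a single shell call instead of making one tool call per step.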
I would stay away from any startup for production workloads.
Made the mistake. Never again.
Fly, Railway, Render. Avoid. All have weird show-stopper bugs at any reasonable scale, and you will fight the platform far more than you would on a big cloud.
And big cloud works better even where the PaaS is advertised as simpler (Google Cloud Run and Cloud Build are as easy to set up as Railway, but you get many more knobs to control traffic, routing, rollout, etc.)
Yeah. If my Claude Code usage were billed through the API directly, it would be in the thousands. I know this because I have add-on credits on top of the Max plan, since I often run into the weekly limits.
I don't actually think the CLI tools and JavaScript apps I work on are particularly "simple". I think they're the level of complexity that most developers spend effectively all of their time building.
Kernel / database / systems engineers are a pretty rare breed.
Outside of engineering, there is a whole raft of people on a team who should pick up on and push back against this sort of copy problem at every phase of building a product.
Are the engineers you do hire rewarded for paying attention to detail, though? It's often the case that the company decision makers "want" attention to detail, in that they agree it's a nice thing, but their revealed preferences are more along the lines of "why are you wasting time on component x which is already in a shippable state when component y is behind schedule?!"
Apart from people who just weren’t good, what I’ve found over a few decades is that most people will pay attention to details if given the incentive and the time.
What companies seem to want is developers who do everything perfectly despite having someone yelling at them to move fast. Also, the person yelling doesn’t care about the details until someone else points them out.
Runable | Bangalore, India | On Site | Software Engineers | Full Time & Internship | runable.com
Runable is a general-purpose AI agent for any task you can think of. You can create presentations, websites, docs, reports, videos, images, use 3000+ services, control remote browsers, etc.
I’m looking for savvy people with the potential to grow quickly who want to work at a startup. If you want to take ownership, learn, and be a founder in the future, this is the place for you.
Shoot me an email with your GitHub or portfolio: saksham@runable.com
> AWS has a long standing issue with the ECS agent randomly disconnecting, resulting in orphaned EC2 instances which can cause traffic or deployment degradation.
> We have attempted to solve this a few ways in the past, but there were still critical edge cases falling through.
> So we bit the bullet, and developed a robust, full featured ECS cluster management solution to solve this problem once and for all.
> It's currently in private preview. To get early access before we roll it out to everyone, contact support.
Elsewhere in the Flightcontrol docs, I found that they recommend ECS+EC2. I'm not surprised to hear about issues with ECS+EC2, but given the reported issues above, I don't know that I'd recommend it in my own docs. Fargate is a far better option for most use cases, at least in my experience, unless you need specialized instance types, such as for GPU workloads.
mini-browser repo: https://github.com/runablehq/mini-browser