Hacker News | buremba's comments

I spent 3 months adopting the Codex and Claude Code SDKs only to realize they're brittle and just vendor lock-in. They're intended to be used as CLIs, so they're not programmable enough as libraries. After digging into the OpenClaw codebase, I can safely say that most of its success comes from the underlying harness, pi agent.

pi plugins support adding hooks at every stage, from tool calls to compaction, and let you customize the TUI as well. I use it for my multi-tenant OpenClaw alternative https://github.com/lobu-ai/lobu

If you're building an agent, please don't use proprietary SDKs from model providers. Just stick to ai-sdk or pi agent.


IIUC to reliably use 3P tools you need to use API billing, right? Based on my limited experimentation this is an order of magnitude more expensive than consumer subscriptions like Claude Pro, do I have that right?

("Limited experimentation" = a few months ago I threw $10 into the Anthropic console and did a bit of vibe coding and found my $10 disappeared within a couple of hours).

If so, that would support your concern: it does kinda sound like they're selling marginal Claude Code / Gemini CLI tokens at a loss. Which definitely smells like an aggressive lock-in strategy.


Technically you're still using the claude CLI with this pattern, so it's not a 3P app calling Anthropic APIs via your OAuth token. Even if you used the Claude Code SDK, your app would be 3P, so it's in a gray area.

Anthropic's docs are intentionally unclear about how 3P tools are defined: is it calling the Claude app, or calling the Anthropic API with the OAuth tokens?


Unfortunately it's currently very utopian for (I would assume) most devs to use something like this when API cost is so prohibitively expensive compared to e.g. Claude Code. I would love to use a lighter and better harness, but I wouldn't love to quintuple my monthly costs. For now the pricing advantage is just too big for me compared to the inconvenience of using CC.

OpenAI officially supports using your subscription with pi. Same for OpenCode and other 3rd party harnesses.

You technically still use CC; it's not via the SDK but via the CLI, programmatically triggered by pi.
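For context, "programmatically triggered" can be as simple as shelling out to the CLI's non-interactive print mode. A minimal sketch, assuming the `-p` and `--output-format json` flags of recent claude CLI versions (the wrapper function names are mine):

```python
import json
import subprocess

def build_claude_cmd(prompt: str) -> list[str]:
    # Non-interactive "print" mode with JSON output, so the result
    # can be parsed by the calling program instead of a human.
    return ["claude", "-p", prompt, "--output-format", "json"]

def run_claude(prompt: str) -> dict:
    # Illustrative wrapper: shells out to the claude CLI and parses its
    # JSON result; requires the CLI to be installed and authenticated.
    out = subprocess.run(build_claude_cmd(prompt),
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)
```

A harness like pi can wrap exactly this kind of call behind its own tool interface.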

Is this in line with Anthropic ToS? They cracked down hard on Clawdbot and the like from what I gathered. I guess if you are still invoking CC it might be fine, but isn't that gonna lead to weird behavior from basically doubling up on harnesses?

Nobody knows, including Anthropic itself, I suppose.

I left some notes about this. I agree with you directionally but practically/economically you want to let users leverage what they're already paying for.

https://yepanywhere.com/subscription-access-approaches/

The page covers ai-sdk and pi-mono.

In an ideal world we would have a pi-cli-mono or similar: something not as powerful as pi, but giving a least-common-denominator interface to access at least claude/codex.

ACP is also something interesting in this space, though I don't honestly know how that fits into this story.


The page returns a 404. ACP is great; indeed, it's better to give pi-mono ACP support than claude or codex directly. https://x.com/bu7emba/status/2026364497527513440

I also wondered for months why it feels so difficult to use the OpenAI or Anthropic SDKs, until I came to a similar conclusion.

How do you replicate the Claude Code system prompts in pi? I have tried using the Claude Agent SDK without the Claude Code preset, and it is quite bad.

Pretty easy: the prompts can be seen here [0], and pi supports setting SYSTEM.md.

0: https://cchistory.mariozechner.at/


For all of the recent talk about how Anthropic relies on heavy cache optimization for claude-code, it certainly seems like session-specific information (the exact datestamp, the pid-specific temporary directory for memory storage) enters awfully early in the system prompt.

Neat! I wasn’t aware that Docker has an embedded microVM option.

I use Kata Containers on Kubernetes (Firecracker) and restrict network access with a proxy that lets you block/allow domain access. I also swap secrets at runtime so agents don't see any secrets (similar to Deno sandboxes).
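The runtime secret swap can be sketched like this: the agent only ever sees placeholders, and the egress proxy substitutes the real values on the way out. The placeholder syntax and vault below are illustrative, not from any particular project:

```python
import re

# Real secrets live only in the proxy process, never in the sandbox.
VAULT = {"GITHUB_TOKEN": "ghp_realsecret123"}

PLACEHOLDER = re.compile(r"\{\{secret:([A-Z_]+)\}\}")

def rewrite_outbound(headers: dict) -> dict:
    # Replace {{secret:NAME}} placeholders with the real value just
    # before the request leaves the proxy; the agent never sees it.
    def sub(value: str) -> str:
        return PLACEHOLDER.sub(lambda m: VAULT[m.group(1)], value)
    return {k: sub(v) for k, v in headers.items()}
```

Responses can be filtered the same way in reverse, so a leaked secret never makes it back into the agent's context.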

If anybody is interested in running agents on K8s, here is my shameless plug: https://github.com/lobu-ai/lobu


Kata Containers are the right way to go about sandboxing on K8s. They are very underappreciated and, timing-wise, very well positioned. With EC2 supporting nested virtualization, my guess is there is going to be wide adoption.

I am pretty sure Apple containers on macOS Tahoe are Kata containers.

Woah, that looks great. I've been looking for something like this. Neither the README nor the security doc goes into detail on the credential handling in the gateway. Is it using tokens to represent the secrets, or is the client just trusting that the connection will be authenticated? I'm trying to figure out how similar this is to something like Fly's tokenizer proxy.

I’m working on the documentation right now but I had to build 3 prototypes to get here. :)

After seeing Deno and Fly, I rewrote the proxy, inspired by them. It integrates nicely with the existing MCP proxy, so the agent doesn't see any MCP secrets either.


I'm still not that interested in setting up openclaw, but this implementation actually looks/sounds pretty good.

Thanks for sharing!


Exactly! I was digging into the OpenClaw codebase for the last 2 weeks and the core ideas are very inspiring.

The main work he has done to enable a personal agent is his army of CLIs, like 40 of them.

The harness he used, pi-mono, is also a great choice because of its extensibility. I was working on a similar project [1] for the last few months with Claude Code, and it's not really the best fit for a personal agent; it's pretty heavy.

Since I was planning to release my project as a cloud offering, I worked mainly on sandboxing it, which turned out to be the right choice given that OpenClaw is open source and I can plug in its runtime to replace Claude Code.

I decided to release it as open source because, at this point, software is free.

1: https://github.com/lobu-ai/lobu


This is very neat! IMO inspecting the queries the agents run on the database is a better way to understand how the code works, even more so than reviewing the code.

I just tried it and it works smoothly. For those who don't want to plug their agents into their database directly, I built a similar tool, https://dbfor.dev, for exactly this purpose: it embeds PGlite and implements the PG wire protocol to spin up quick PG databases with a traffic viewer included.
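For the curious, a traffic viewer like this works at the wire level. A simplified sketch of the framing it has to decode, following the documented Postgres simple-query ('Q') message format (one type byte, a 4-byte big-endian length that includes itself, then a NUL-terminated SQL string); this is an illustration, not any tool's actual code:

```python
import struct

def parse_simple_query(frame: bytes) -> str:
    # 'Q' message: type byte, int32 length (includes itself but not
    # the type byte), then a NUL-terminated SQL string.
    assert frame[0:1] == b"Q", "not a simple-query message"
    (length,) = struct.unpack("!I", frame[1:5])
    body = frame[5:1 + length]
    return body.rstrip(b"\x00").decode("utf-8")

def make_simple_query(sql: str) -> bytes:
    # Inverse helper, handy for testing a proxy end to end.
    body = sql.encode("utf-8") + b"\x00"
    return b"Q" + struct.pack("!I", len(body) + 4) + body
```

Sitting between the agent and the database, a proxy can log every decoded query without the agent noticing.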


I tried to run it locally some time ago, but it's buggy as hell when self-hosted. It's not even worth trying out, given that CF itself doesn't recommend it.


I'm curious what bugs you encountered. workerd does power the local runtime when you test CF workers in dev via wrangler, so we don't really expect/want it to be buggy..


There is a big "WARNING: This is a beta. Work in progress" message in https://github.com/cloudflare/workerd

Specifically, half of the services operate locally, and the other half require CF services. I mainly use Claude Code to develop, and it often struggles to replicate the local environment, so I had to create another worker in CF for my local development.

Initially, the idea was to use CF for my side projects as it's much easier than K8s, but after wrestling with it for a month, I decided that it's not really worth investing that much, and I moved back to using K8s with FluxCD instead, even though it's overkill as well.


> There is a big "WARNING: This is a beta. Work in progress"

Ughhhh that is because nobody ever looks at the readme so it hasn't been updated basically since workerd was originally released. Sorry. I should really fix that.

> Specifically, half of the services operate locally, and the other half require CF services.

workerd itself is a runtime for Workers and Durable Objects, but is not intended to provide implementations of other services like KV, D1, etc. Wrangler / miniflare provides implementations of most of these for local testing purposes, but these aren't really meant for production.

But workers + DO alone is enough to do a whole lot of things...


Thanks a ton for the quick response! I totally get that workerd is not intended to be an emulator of all CF services, but the fact that I still need an external dependency for local development, and that the code I developed can't be used outside the CF environment, makes me feel locked in to the platform.

I'm mostly using terminal agents to write and deploy code. I made a silly mistake, not reviewing the code before merging it into main (side project, zero users), and my durable object alarms got into an infinite loop, and I got a $400 bill in an hour. There was no way to set rate limits for the AI binding in Workers, and I didn't get any notification, so I created a support ticket 2 months ago, which hasn't been answered to this date.

That was enough for me to move out of CF as a long-time user (>10 years) and believer (CF is still one of my biggest stocks). In a world where AI writes most of the code, it's scary to be required to deploy to a cloud that doesn't have any way to set rate limits.

I learned the hard way that I must use AI Gateway in this situation, but authentication is harder with it, and agents prefer embedded auth, which makes them pick the AI binding over AI Gateway. K8s is not easy to maintain, but at least I can fully control the costs without worrying about the cost of experimentation.


I wonder why V8 is considered superior to WASM for sandboxing.


Is WASM's story for side effects solved yet? E.g. network calls seem too complicated (https://github.com/vasilev/HTTP-request-from-inside-WASM etc.)


On V8, you can run both JavaScript and WASM.


Theoretically yes, but CF Workers and this project don't support it. Indeed, none of the cloud providers offer first-party WASM support yet.



Maybe it's better now, but I wouldn't call this first-class support, as you rely on the JS runtime to initialize WASM.

The last time I tried it, the cold start was over 10 seconds, making it unusable for any practical use case. Maybe the tech is not there yet, but given that WASM already guarantees sandboxing and supports multiple languages, I was hoping we would have providers investing in it.


CF Workers does support WASM. We do too, as V8 handles it natively. Tested it; it works, it just hasn't been polished yet.


The problem is that there’s not much of a market opportunity yet. Customers aren’t voting for WASM with their wallets like they are mainstream language runtimes.


I tried this and it broke the conversation. :(


Yes, you usually need to compact first before doing this kind of thing, because the context windows are different.


After Claude Code couldn't find the relevant operation in either the CLI or the public API, it went through its Chrome integration to open up the app in Chrome.

It grabbed my access tokens from the cookies and curled the app's private API that backs their UI. What an amazing time to be alive; can't wait for the future!


Security risks aside, that's pretty remarkable problem solving on Claude's part. Rather than hallucinating an answer or just giving up, it found a solution by creatively exercising its tools. This kind of stuff was absolute sci-fi a few years ago.


Or this behavior is just programmed, the old fashioned way.


This is one of the things that’s so frustrating about the AI hype. Yes there are genuinely things these tools can do that couldn’t be done before, mostly around language processing, but so much of the automation work people are putting them up to just isn’t that impressive.


But it's precisely the automation around LLMs that makes the end result itself impressive.


A sufficiently sophisticated agent, operating with defined goals and strategic planning, possesses the capacity to discover and circumvent established perimeters.


Honestly, I think many hallucinations are the LLM's way of "moving forward". For example, the LLM will try something, not ask me to test (and it can't test it itself), and then carry on to say "Oh, this shouldn't work, blabla, I should try this instead."

Now that LLMs can run commands themselves, they are able to test and react to feedback. But lacking that, they'll hallucinate things (i.e., hallucinate tokens/API keys).


Refusing to give up is a benchmark optimization technique with unfortunate consequences.


I think it's probably more complex than that. Humans have constant, continuous feedback, which we understand as "time". LLMs do not have an equivalent, and thus have no frame of reference for how much time passed between messages.


That's fantastic


But at least you get a fraction of a CPU and 1 GB of memory.


Exactly. I have also been playing with DuckDB for streaming use cases, but it feels hacky to issue micro-batched queries on streaming data at short intervals.

DuckDB has everything that streaming engines such as Flink have; it just needs to support managing intermediate aggregate state and scheduling the materialized views itself.
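Concretely, the micro-batching hack looks something like this: align each batch to a tumbling window and re-issue the same aggregate per window. The table name and SQL are illustrative; a real streaming engine would maintain this aggregate incrementally instead:

```python
def window_bounds(now_s: int, width_s: int) -> tuple[int, int]:
    # Tumbling window: align the batch to fixed width_s boundaries
    # so repeated runs never double-count rows.
    start = (now_s // width_s) * width_s
    return start, start + width_s

def batch_query(now_s: int, width_s: int = 60) -> str:
    # The query a scheduler would hand to DuckDB once per interval.
    start, end = window_bounds(now_s, width_s)
    return (f"SELECT count(*) FROM events "
            f"WHERE ts >= to_timestamp({start}) AND ts < to_timestamp({end})")
```

The hacky part is that the scheduling, watermarking, and state handling all live in your application code rather than in the engine.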

