eaf7e281's comments

eaf7e281 · 2026-02-24T22:38:59 1771972739

again?

eaf7e281 · 2026-02-23T03:51:13 1771818673

freedom is not coming

eaf7e281 · 2026-02-05T21:17:58 1770326278

These two basically do what you want: let Claude be the manager and Codex/Gemini be the worker. Many say that Coder-Codex-Gemini is easier to understand than CCG-Workflow, which has too many commands to start with.

https://github.com/FredericMN/Coder-Codex-Gemini https://github.com/fengshao1227/ccg-workflow

This one also seems promising, but I haven't tried it yet.

https://github.com/bfly123/claude_code_bridge

All of them are made by Chinese devs. I know some people are hesitant when they see Chinese products, so I'll address that first. But I have tried all of them, and they have all been great.

eaf7e281 · 2026-02-05T18:16:31 1770315391

I kinda agree. Their model just doesn't feel "daily" enough. I would use it for any "agentic" tasks and for using tools, but definitely not for day to day questions.

lukebechtel · 2026-02-05T18:22:40 1770315760

Why? I use it for all and love it.

That doesn't mean you have to, but I'm curious why you think it's behind in the personal assistant game.

legitster · 2026-02-05T18:41:54 1770316914

I have three specific use cases where I try both but ChatGPT wins:

- Recipes and cooking: ChatGPT just has way more detailed and practical advice. It also thinks outside of the box much more, whereas Claude gets stuck in a rut and sticks very closely to your prompt. And ChatGPT's easier to understand/skim writing style really comes in useful.

- Travel and itinerary: Again, ChatGPT can anticipate details much more, and give more unique suggestions. I am much more likely to find hidden gems or get good time-savers than with Claude, which often feels like it is just rereading Yelp for you.

- Historical research: ChatGPT wins on this by a mile. You can tell ChatGPT has been trained on actual historical texts and physical books. You can track long historical trends and pull examples and quotes, and it will even give you specific book or page(!) references for where to check the sources. Meanwhile, all Claude will give you is a web search on the topic.

aggie · 2026-02-05T19:59:43 1770321583

How does #3 square with Anthropic's literal warehouse full of books we've seen from the copyright case? Did OpenAI scan more books? Or did they take a shadier route of training on digital books despite copyright issues, but end up with a deeper library?

legitster · 2026-02-05T22:27:31 1770330451

I have no idea, but I suspect there's a difference between using books to train an LLM so it can reproduce text/writing styles, and being able to actually recall knowledge in said books.

rolisz · 2026-02-05T20:10:09 1770322209

I think they bought the books after they were caught pirating books and lost that case (because of the piracy, not because of copyright).

FergusArgyll · 2026-02-06T00:36:41 1770338201

My 2 cents:

All the labs seem to do very different post-training. OpenAI focuses on search. If it's set to thinking, it will search 30 websites before giving you an answer. Claude regularly doesn't search at all, even for questions where it obviously should. Its post-training seems more focused on "reasoning" or planning - things that are useful in programming, where the bottleneck is writing code without thinking about how you'll integrate it later, and where search is mostly useless. But for non-coding, day-to-day stuff - "what's the news with x", "How to improve my bread", "cheap tasty pizza" or even medical questions - you really just want a distillation of the internet plus some thought.

eaf7e281 · 2026-02-05T21:25:55 1770326755

It's hard to say. Maybe it has to do with the way Claude responds or the lack of "thinking" compared to other models. I personally love Claude and it's my only subscription right now, but it just feels weird compared to the others as a personal assistant.

lukebechtel · 2026-02-06T00:07:34 1770336454

Oh, I always use opus 4.5 thinking mode. Maybe that's the diff.

quietsegfault · 2026-02-05T22:40:07 1770331207

Claude is far superior for daily chat. I have to work hard to get it to not learn how to work around various bad behaviors I have but don’t want to change.

solarkraft · 2026-02-05T18:43:14 1770316994

But that’s what makes it so powerful (yeah, mixing model and frontend discussion here yet again). I have yet to see a non-DIY product that can so effortlessly call tens of tools by different providers to satisfy your request.

eaf7e281 · 2026-02-05T18:14:10 1770315250

> From the press release at least it sounds more expensive than Opus 4.5 (more tokens per request and fees for going over 200k context).

That's a feature. You could also not use the extra context, and the price would be the same.

charcircuit · 2026-02-05T18:55:49 1770317749

The model influences how many tokens it uses for a problem. As an extreme example, if it wanted to, it could fill up the entire context each time just to make you pay more. How efficiently the model can answer without generating a ton of tokens influences the price you will end up paying for inference.
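
To make that concrete, here's a rough sketch of how per-request cost scales with the number of tokens the model decides to generate. The prices are made-up placeholders (not Anthropic's or anyone's actual rates), and `request_cost` is just a hypothetical helper for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Dollar cost of one request, given per-million-token prices."""
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000


# Hypothetical prices in dollars per million tokens (assumptions, not real rates).
INPUT_PRICE = 5.0
OUTPUT_PRICE = 25.0

# Same prompt, same per-token price; only the model's verbosity differs.
terse = request_cost(10_000, 2_000, INPUT_PRICE, OUTPUT_PRICE)
verbose = request_cost(10_000, 20_000, INPUT_PRICE, OUTPUT_PRICE)

print(f"terse:   ${terse:.2f}")
print(f"verbose: ${verbose:.2f}")
```

Under these assumed numbers the per-token price never changes, yet the verbose answer costs several times more, which is the point: the sticker price alone doesn't tell you what you'll spend.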

eaf7e281 · 2026-02-05T18:11:14 1770315074

There's no way they actually work on training this.

fragmede · 2026-02-05T19:49:18 1770320958

The people that work at Anthropic are aware of simonw and his test, and people aren't unthinking data-driven machines. However valid his test is or isn't, a better score on it is convincing. If it gets, say, 1,000 people to use Claude Code over Codex, how much would that be worth to Anthropic?

$200 * 1,000 = $200k/month.

I'm not saying they are, but to say that they aren't with such certainty, when money is on the line; unless you have some insider knowledge you'd like to share with the rest of the class, it seems like a questionable conclusion.

margalabargala · 2026-02-05T18:29:32 1770316172

I suspect they're training on this.

I asked Opus 4.6 for a pelican riding a recumbent bicycle and got this.

https://i.imgur.com/UvlEBs8.png

WarmWash · 2026-02-05T18:54:37 1770317677

It would be way way better if they were benchmaxxing this. The pelican in the image (both images) has arms. Pelicans don't have arms, and a pelican riding a bike would use its wings.

ryandrake · 2026-02-05T19:43:03 1770320583

Having briefly worked in the 3D Graphics industry, I don't even remotely trust benchmarks anymore. The minute someone's benchmark performance becomes a part of the public's purchasing decision, companies will pull out every trick in the book--clean or dirty--to benchmaxx their product. Sometimes at the expense of actual real-world performance.

seanhunter · 2026-02-05T19:42:38 1770320558

Pelicans don’t ride bikes. You can’t have scruples about whether or not the image of a pelican riding a bike has arms.

jevinskie · 2026-02-05T19:48:38 1770320918

Wouldn’t any decent bike-riding pelican have a bike tailored to pelicans and their wings?

actsasbuffoon · 2026-02-05T21:23:30 1770326610

Sure, that’s one solution. You could also Isle of Dr Moreau your way to a pelican that can use a regular bike. The sky is the limit when you have no scruples.

cinntaile · 2026-02-05T20:03:54 1770321834

Now that would be a smart chat agent.

mrandish · 2026-02-05T18:39:18 1770316758

Interesting that it seems better. Maybe something about adding a highly specific yet unusual qualifier focusing attention?

TheDong · 2026-02-06T06:35:50 1770359750

I don't think that really proves anything, it's unsurprising that recumbent bicycles are represented less in the training data and so it's less able to produce them.

Try something that's roughly equally popular, like a Turkey riding a Scooter, or a Yak driving a Tractor.

riffraff · 2026-02-05T19:53:22 1770321202

perhaps try a penny farthing?

KeplerBoy · 2026-02-05T18:15:38 1770315338

There is no way they are not training on this.

collinmanderson · 2026-02-05T18:16:47 1770315407

I suspect they have generic SVG drawing that they focus on.

eaf7e281 · 2025-10-16T16:47:48 1760633268

What is the optimal disk size in terms of price per TB? The last time I checked, it was 16 TB disks, I believe.
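
For anyone who wants to check this against current listings, a minimal sketch of the comparison is below. The prices are placeholder numbers I made up for illustration, not real quotes, so swap in whatever you actually see; `price_per_tb` and `best_size` are hypothetical helper names.

```python
def price_per_tb(disks):
    """Map each disk size (TB) to its price per TB."""
    return {size_tb: price / size_tb for size_tb, price in disks.items()}


def best_size(disks):
    """Return the disk size (TB) with the lowest price per TB."""
    per_tb = price_per_tb(disks)
    return min(per_tb, key=per_tb.get)


# Hypothetical prices in dollars, keyed by capacity in TB (placeholders only).
hypothetical_prices = {8: 150.0, 16: 240.0, 24: 420.0}

for size_tb, cost in sorted(price_per_tb(hypothetical_prices).items()):
    print(f"{size_tb} TB: ${cost:.2f}/TB")

print("cheapest per TB:", best_size(hypothetical_prices), "TB")
```

This only looks at the sticker price per TB; it ignores things like drive bays, power, and warranty, which can shift the answer.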