You say you've used it for months. I wonder if the example you gave was recent, and whether you've been noticing an overall degradation in quality or it's been consistently bad for you?
(Being true to the HN guidelines, I’ve used the title exactly as seen on the GitHub issue)
I was wondering if anyone else is also experiencing this? I have personally found that I have to add more and more CLAUDE.md guard rails, and my CLAUDE.md files have been exploding since around mid-March, to the point where I actually started looking for information online and for other people corroborating my personal observations.
This GH issue report sounds very plausible, but as with anything AI-generated (the issue itself appears to be largely AI assisted) it’s kind of hard to know for sure whether it is accurate or completely made up. _Correlation does not imply causation_ and all that. Speaking personally, though, the findings match my own experience: I’ve seen noticeable degradation in Opus outputs and thinking.
EDIT: The Claude Code Opus 4.6 Performance Tracker[1] is reporting Nominal.
What I've noticed is that whenever Claude says something like "the simplest fix is..." it's usually suggesting some horrible hack. And whenever I see that I go straight to the code it wants to write and challenge it.
That is the kind of thing that I've been fighting by being super explicit in CLAUDE.md. For whatever reason, instead of being thorough and making sure that files are only changed after fully understanding the scope of the change (its behaviour prior to Feb/Mar), Claude now just jumps to the easiest fix, with no thought for backwards compatibility and to hell with all existing tests. What is even worse, I've seen it try to edit files before even reading them on a couple of occasions, which is a big red flag. (/effort max)
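For reference, this is roughly the shape of the guard rails I've ended up adding - a minimal sketch, with wording that's my own rather than any recommended template:

```markdown
# CLAUDE.md (excerpt - illustrative guard rails, not an official template)

## Before editing anything
- Read every file you intend to change, in full, before proposing an edit.
- State the scope of the change and wait for confirmation if it touches more
  than the files we discussed.

## While editing
- Do not break backwards compatibility without calling it out explicitly.
- All existing tests must still pass; never delete or skip a test to make a
  change "work".
- Prefer the correct fix over the "simplest" fix; flag any workaround as a
  workaround.
```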
Another thing that worked like magic prior to Feb/Mar was how likely Claude was to load a skill whenever it deduced that a skill might be useful. I personally use [superpowers][1] a lot, and I've noticed that I have to be very explicit when I want a specific skill to be used - to the point that I have to reference the skill by name.
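To give a sense of what "very explicit" looks like for me now - the skill name below is only illustrative, swap in whichever skill you actually want loaded:

```markdown
Use the superpowers `brainstorming` skill for this task. Do not start writing
code or a plan until that skill has been loaded and its process followed.
```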
I did not use the previous version of Opus, so I can't speak to that difference, but Sonnet 4.6 seems optimized to output the shortest possible answer. Usually it starts with a hack, and if you challenge it, it will apologize and point back at a previous answer with the smallest code snippet it can provide. Agentic work isn't necessarily worse, but ideating and exploring are awful compared to 4.5.
I did my usual thing today where I asked a Sonnet 4.6 agent to code review a proposed design plan drafted by Opus 4.6 - something I've been doing lately before delving into the implementation. What it came back with was a verbose output suggesting that a particular function `newMoneyField` be renamed throughout the doc to a name it fabricated, `newNumeyField`. And the thing is, the design document referenced the correct function name more than a few dozen times.
This was a first for me with Sonnet. It completely veered off the prompt it was given (review a design document) and instead came out with a verbose suggestion to do a mechanical search and replace to use this newly fabricated function name - one that it even spelled incorrectly. I had to Google "numey" to make sure Sonnet wasn't outsmarting me.
Superpowers, Serena and Context7 feel like required plugins to me. Serena in particular feels like a secret weapon sometimes. But superpowers (with the "brainstorm" keyword) might be the thing that helps people complaining about quality issues.
lol this one time Claude showed me two options for implementing a new feature on an existing project, one JavaScript client side and the other Python server side.
I told it to implement the server side one, it said ok, I tabbed away for a while, and came back to find the js implementation. Checking the log, Claude said “on second thought I think I’ll do the client side version instead”.
Rarely do I throw an expletive bomb at Claude - this was one such time.
Dunno man, Claude had a spec (pretty sure I asked it to consider and outline both options first) or at least clear guidance and decided to YOLO whatever it wanted instead.
It’s always “you’re using the tool wrong, need to tweak this knob or that yadda yadda”.
this prompt is actually in the claude cli. it says something like "implement the simplest solution. don't over abstract." On my phone, but I saw an article mention this in the leak analysis.
I've seen a lot of the issues mentioned in the issue. The attempts to end the session early are particularly annoying. We spend a while iterating on a plan and after every phase of implementation I get some variation of "That's a lot of work for today, should we wrap up?" like it's actively trying to drive sessions to a close. I wouldn't say it's useless for these tasks. But it's requiring more effort and guidance than it used to. It's also more likely to jump right into changes from a question I ask rather than addressing the question which is very annoying.
If that tracker is using paid tokens, as opposed to the regular subscription, then there's no financial incentive for Anthropic to degrade its thinking, so that benchmark likely would not be affected by the cost-cutting measures that regular users face.
Also, it's probably very easy to spot such benchmarks and lock-in full thinking just for them. Some ISPs do the same where your internet speed magically resets to normal as soon as you open speedtest.net ...
I haven't noticed any changes but my stuff isn't that complex. People are saying they quantized Opus because they're training the next model. No idea if that's true... It's certainly impacting my decision to upgrade to Max though. I don't want to pay for Opus and get an inferior version.
I haven't noticed any changes either, but I noticed that opus 4.6 is now offered as part of perplexity enterprise pro instead of max, so I'm guessing another model is on the horizon
I just finished reading the full analysis on GitHub.
> When thinking is deep, the model resolves contradictions internally before producing output.
> When thinking is shallow, contradictions surface in the output as visible self-corrections: "oh wait", "actually,", "let me reconsider", "hmm, actually", "no wait."
Yeah, THIS is something that I've seen happen a lot. Sometimes even on Opus with max effort.
I missed that from the long issue, thanks for pointing it out! My experience with Opus today was riddled with these to the point where it was driving me completely mental. I've rarely seen those self-contradictions before, and nothing on my setup has changed - other than me forcing Opus at --effort max at startup.
I wonder if this is even more exaggerated now over Easter, as everyone’s got a bit of extra time to sit down and play with Claude. That might be pushing capacity over the limit - I just don’t know enough about how Anthropic provisions and manages capacity to know whether that could be a factor. Quality has gotten really bad over the holiday, though.
Cannot say I've noticed, but I run virtually everything through plan mode and a few back and forth rounds of that for anything moderately complex, so that could be helping.
I used to one-shot design plans early in the year, but lately it takes several iterations just to get the design plan right. Claude frequently forgets to update back references and doesn't keep the plan up to date with the evolving conversation. I have had to run several review loops on the design spec before I can move on to implementation because it has gotten so bad. At one point, I thought it was the superpowers plugin itself that got auto-updated and self-nerfed, but there weren't any updates on my end anyway. Shrug.
I've been working on an AI workspace inside Neovim (and using the editor as the TUI). When I started, I asked myself, "Wait, WHAT?! Another one? Who would use this?" However the goal was never about eyes (well, GH stars) on this new thing, it was about learning.
I wanted to dig deeper into how modern-day tools work so I can understand the sort of 'magic' I was experiencing using tools like Claude Code. The more I've been working on this side project, the more I understand about AI systems, agent loops, prompt engineering and all the cleverness that goes into making a good, usable, magical AI agent.
Is it X or another Cloudflare outage, though... Let's wait till this plays out - I'm personally struggling to open quite a few websites I regularly visit right now.
Until recently, I wouldn't have thought Cloudflare could be the culprit, but since their most recent outage their reputation has been less than stellar.
Not quite, that’s more like taking pleasure in someone else's misfortune. It’s close, but it doesn't capture the specific relief that it is not _your_ misfortune.
No logging in to the Cloudflare dashboard, no passing Turnstile (their CAPTCHA replacement) on third-party websites not proxied by Cloudflare, and the sites that are proxied throwing 500 Internal Server Error pages saying it's Cloudflare's fault…
Couldn't find the video on that blog post - for those who prefer to learn via video, the announcement page with the recording is the better place to go.
I find Cloudflare a bit disorganised with this launch; there are multiple things going on - recording, blog posts, documentation - and each has something the others don't.