Hacker News: _pdp_'s comments

This makes no sense. You can easily make a monolith and build all parts of it in isolation - i.e. modules, plugins, packages.

In fact, my argument is that there will be more monolith applications due to AI coding assistants, not less.


If you can run free models on consumer devices, why do you think cloud providers cannot do the same, except better and bundled with a ton of value worth paying for?

Solid advice, but I think it is less well known that tight glutes can cause lower back pain too, even when your BMI is normal. Basically, sitting too much and not stretching enough is a cause of pelvic misalignment.

The title is a little misleading.

It was Opus 4.6 (the model). You could discover this with some other coding agent harness.

The other thing that bugs me (and frankly I don't have the time to try it out myself) is that they did not check whether the same bug would have been found with GPT 5.4 or perhaps even an open-source model.

Without that, and for the reasons I posted above, the post reads like an ad for Claude Code, even though I am sure that is not the intention.


OP here.

I don't understand this critique. Carlini did use Claude Code directly. Claude Code used the Claude Opus 4.6 model, but I don't know why you'd consider it inaccurate to say Claude Code found it.

GPT 5.4 might be capable of finding it as well, but the article never made any claims about whether non-Anthropic models could find it.

If I wrote about achieving 10k QPS with a Go server, is the article misleading unless I enumerate every other technology that could have achieved the same thing?


Also, he did compare with earlier versions that, before 4.5, were dramatically worse at finding the same problems. There's even a graph. That seems to pretty solidly support the idea that this is "gain of function" as it were...

No, the title is correct and you are misreading it or didn't read it. It was found with Claude Code; that's the quote. This isn't a model eval, it's an Anthropic employee talking about Claude Code. So comparing to other models isn't a thing to reasonably expect.

> You could discover this with some other coding agent harness.

And surely that would be relevant if they were using a different harness.


What's the point?

Anthropic can simply play it cool and, I don't know, open source the thing?

It is not like Claude Code is that complex or interesting. Sure, there is some questionable stuff in there, but it is not that controversial.


The most controversial part is that they wrote a TUI in ReactJS, but they don't try to keep that part secret, they brag about it. :^)

Yeah as much as I avoid OpenAI for [reasons], the Rust TUI was really the move. Claude Code is a mess.

Some are stuck in the 2010s, when people thought that JS was turning into a lingua franca. As usual, such delusions are costing us a pretty heavy price. People now seem to accept crappy, laggy UIs "because it makes business sense", completely ignoring that their business _is_ providing a seamless experience. ugh sorry, </rantmode>

I think the reason behind using React and JavaScript is simpler: these tools are heavily vibecoded, and React/JavaScript is what was most present in the training data, so it is what the models excel at generating.

The crappy, laggy UIs have the same root cause: heavy use of vibecoding with lackluster quality processes.


vibe coding is barely a year old, this trend is older

I mean, it is easy to understand once you realise that there is no spoon.

Despite their power, frontier models are threatened by open-source equivalents. If AGI is not on the horizon and model performance is likely not going to be enough of a differentiator to keep the momentum going, the only other way is to go horizontal - enterprise solutions, proprietary coding agent harnesses, market capture, etc.

If AGI is in sight, none of these short-term games really matter. You just need to race ahead.


The solution as usual is open source.

For example...

We recently moved a very expensive Sonnet 4.6 agent to step-3.5-flash and it works surprisingly well. Obviously step-3.5-flash is nowhere near the raw performance of Sonnet, but it works perfectly fine for this use case.

Another personal observation is that we are most likely going to see a lot of micro coding agent architectures everywhere. We have several such cases. GPT and Claude are not needed if you focus the agent to work on specific parts of the code. I wrote something about this here: https://chatbotkit.com/reflections/the-rise-of-micro-coding-...


Yeah this is similar to my approach, although with slightly more powerful models. I’m just not having a good time letting the sota models loose on a code base to implement entire features. Spending too much time cleaning up the mess. It’s my fault, I needed to guide it more, but it would take the same amount of time to use a faster model to generate smaller chunks and also cost less. And I’m not even doing anything particularly complex!

inb4 skill issue: I could probably beat you coding by hand while you use Claude Code.


> The solution as usual is open source.

> Obviously step-3.5-flash is nowhere near the raw performance of sonnet

I feel like these two statements conflict with each other.


Those two statements completely check out for a lot of open source projects/products tho... macOS upsetting you today? The solution is Linux!

Google releasing Gemma 4 yesterday was prescient. Toying around with Zed + Gemma 4 on my laptop is 95% as good as using a cloud provider.

The personal account makes a lot of sense, although I could easily see why the OP was not successful. Even if you are an excellent engineer, making people do things, accept ideas, and in general hear you requires a completely different skill altogether - basically being a good communicator.

The second thing is that this series of blog posts (whether true or not, but still believable) provides a good introduction to vibe coders. These are people who have not written a single line of code themselves and have not worked on any system at scale, yet believe that coding is somehow magically "solved" due to LLMs.

Writing the actual code itself (fully or partially) maybe yes. But understanding the complexity of the system and working with organisational structures that support it is a completely different ball game.


I disagree.

I've worked on honing my communication skills for 20 years in this industry. Every time I have failed to get the desired result, I have gone back to the drawing board to understand how I can change how I'm communicating to better convey meaning, urgency, and all that.

After all that I've finally had an epiphany. They simply don't care. They don't care about quality, about efficiency, about security. They don't care about their users, their employees, they don't care about the long term health of the company. None of it. Engineers who do care will burn out trying to "do their job" in the face of management that doesn't care.

It's getting worse in the tech industry. We've reached the stage where leaders are in it only for themselves. The company is just the vehicle. Calls for quality fall on deaf ears these days.


yes, so situational awareness is even more fundamental than communication

especially because people hired by people hired by people (....) hired by founders (or delegated by some board that's voted by successful business people) did not get there by being engineering minded.

and this is inconceivable for most engineering minded people!

they don't care because their world, their life, their problems and their solutions are completely devoid of that mindset.

some very convincing founder types try to imitate it, and some dropouts who spent a few years around people with this mindset can also imitate it for a while, but for them it's just a thing like the government, history, or geography. it's just there. if there's a hill they just go around it; they don't want to understand why it's there, what's on it, what's under it, what geological processes formed it, why, how, or how long it will be there ...



Yeah, uhh:

> I've worked on honing my communication skills for 20 years in this industry.

That's because the skills weren't good enough.


So the takeaway isn't how good or bad I may be at communicating, it's that I was fundamentally speaking a language that was wholly orthogonal to the interests of leadership. No matter how good I became at making persuasive arguments about fixing technical debt and preventing outages, management simply didn't care about those things. They say they do, because it would sound insane to say otherwise, but they largely keep their goals and motivations hidden.

Which for many engineers who got into this industry because they loved solving problems, it can be quite a shocking realization.


Which is why you both listen to what they say, and pay attention to what they do, and what they prioritise. You use the actions to figure out where they were coming from with the message, and then you adapt your message to suit that.

> Which for many engineers who got into this industry because they loved solving problems, it can be quite a shocking realization.

It's just another problem to solve, based on the same foundational skill set you develop as an engineer: Observation, interpretation, analysis, experimentation, and implementation.

All-hands meetings are boring as hell, but they'll give me all sorts of signal about various managers up the line. I'll also take any opportunity I can get to be "in the room where it happens" when decisions are made (or speak to people who were in the room) while I'm building up a mental picture of what motivates someone.

If they're glory hunters, I'll figure out how to pitch my thing as something they can brag about. If they're people oriented (rare, but it happens), I'll pitch the human impact angle. If they're money pinchers, it's all about that $/month savings figure, put it front and centre in the opening sentence.

Everyone has an angle, a bias of some description. If you watch what projects do and don't get approved, and what language was used in them, you'll be successful too.


If I am using a service, I do not care how well you communicate... I want reliability...

> Even if you are an excellent engineer, making people do things, accept ideas, and in general hear you requires a completely different skill altogether - basically being a good communicator.

I thought like this for a while, but now I think this expectation is largely false for a senior individual contributor, especially for someone who can push out a detailed series of blog posts and has tried step-wise escalation.

Communication is a two-way street. Unlike individual contributors, management is responsible for listening and responding to risk assessments by senior members, and for ensuring that technically competent and experienced people are retained in a tech company. If a leader doesn't want to keep an open ear, they do not belong there. If there is huge attrition of highly senior people from unfinished projects, you do not belong in leadership either. Both cases are mentioned in the article.

Unfortunately, our socioeconomic and political culture in the West has increasingly removed responsibilities and liabilities from the leadership of companies. This leads to people with lackluster technical, communication and risk-assessment skills being promoted into leadership positions.

So outside of a couple of completely privately owned companies or exceptionally well-organized NGOs, it will be increasingly difficult to find good leaders.


This problem existed even before vibe coding.

The truth is, only small companies build good stuff. Once a company becomes big enough, the main product that it originally started on is the only good thing that is worth buying from them - all new ventures are bound to be shit, because you are never going to convince people to break out of status quo work patterns that work for the rest of the company.

The only exception to this has been Google, which seems to isolate the individual sectors a lot more and let them have more autonomy, with less focus on revenue.


OP was not successful because management didn't want to fix the problems he discussed. I have been in the exact same situation, and no level of communication skill would have changed their minds.

Or they did, but they needed/wanted to do something else more.

That's usually based on either (a) more perspective, or (b) lack of foundational depth.


Maybe they didn’t have sufficient visibility at the ground level to make proper decisions.

Absolutely textbook "Brilliant Jerk". Dude just whines and whines and whines. If you're so good, why can't you get anybody to work with you?

I did not get that impression at all. He mentioned quite a few conversations with partner-level employees, a technical fellow, and principal managers.

The impression I got is he tried to fix things, but the mess is so widespread and decision makers are so comfortable in this mess that nobody wants to stick their necks out and fix things. I got strong NASA Challenger vibes when reading this story…


My read is that he was not senior enough in the org to drive any effort to improve things, and could not get someone who was to do it either.

The title is complete nonsense.

So is this comment.

Yeah I agree

I am not defending Delve or anything, and I hope they get what they deserve, but there is no correlation between SOC2 certification and the actual cyber capability of a company. SOC2 and ISO27001 are just compliance, and frankly most of it is BS.

Sure, it's certainly not perfect, and a lot of the documentation is something you just write for the audit and never look at again, but that's why I am saying play the odds. The average Delve customer startup might be less secure than the average startup that has to justify its processes to a real auditor.

Personally, I use them as frameworks to justify management processes.

A) I tie the cybersecurity activities first to revenue-enabling business outcomes (unblocked contracts), and second to reduced risk (since people react less to that argument when it comes to spending money).

B) With the political capital from point A), I actually operate a cybersecurity program and justify DevSecOps artefacts, threat modeling, incident response exercises, etc.

What these SOC2 reports and ISO27k certificates really are is a standardized way of communicating the org's activities to outside parties, with an external person vetting that the org doesn't bulls*t too much. But at the end of the day, the organization is responsible for keeping its house in order.


I went through SOC2 Type I and II. I’d say that most of that stuff is necessary, like splitting environments and so on. That doesn’t mean it’s anything close to sufficient to avoid being hacked.

It’s a framework that gives you direction; if employees are careless (or even malicious), no security standard is complete enough to protect a company.


Not to be pedantic about the topic but SOC 2 is an auditing standard, not a security framework. It defines what you’ll be assessed against but it doesn’t tell you how to build your security program. You’ll find the prescriptive controls in real frameworks like ISO 27001, NIST CSF, or CIS Controls which do give you a structure for implementing security.

Delve and Emdash. Are there more products or companies with similar names?

Polsia (AI slop backwards)

According to SemiAnalysis, it is akin to getting an FAA certification.

https://x.com/HotAisle/status/2035062702587232458


Some of it is, but things like "your stage/dev and production environments should be completely isolated from each other" are valid, and most tech companies get lazy on this front.

It was never about cyber capability. It's a liability transfer framework.

If a service provider has a control that says "we use firewalls on all network access points, and configure those firewalls to CIS benchmark whatever", and a third-party signs off with "yes we checked, they have the firewalls, and they're configured properly", you now have two parties you can sue when a security incident caused by lack of firewalls causes you material damage.

Your org's cyber insurance premiums will also go down if you can say "all our vendors have third-party attested compliance, and we do annual compliance reviews".


It might feel like BS, and I'm inclined to agree with you because of the security theater aspect. (For example, Mercor had their verification done by what appears to be a legitimate audit firm.)

But it's not useless. It still forces you to go through a very useful exercise of risk modeling and preparation that you most likely won't do without a formal program.


If your goal is to maximize your posture against cyber threats, spending your time on SOC 2 compliance with Vanta (or similar) is a waste of time if you consider the amount of time spent compared to security gained.

It's incredibly easy to get SOC 2 audited and still have terrible security.

> forces you to go through a very useful exercise of risk modeling

Have you actually done this in Vanta, though? You would have to go out of your way to do it in a manner that actually adds significant value to your security posture.

(I don't think SOC/ISO are a waste of time. We do it at our company, but for reasons that have nothing to do with security)


Probably the most useful aspect of SOC2 is that it gives the technical side of the business an easy excuse for spending time and money on security, which in a startup environment is not always easy otherwise (i.e. “we have to dedicate time to update our out-of-date dependencies, otherwise we’ll fail SOC2”).

If you do it well, a startup can go through SOC2 and use it as an opportunity to put together a reasonable cybersecurity practice. Though, yeah, one does not actually beget the other: you can also very easily get a SOC2 report with minimal findings while having a really bad cybersecurity practice.


That's exactly what I've done in the past. We had to be SOC2 and PCI DSS compliant (high volume, so we couldn't go through an SAQ). I wouldn't say the auditor helped much in improving our security posture, but it allowed me to justify some changes and improvements that did help a lot.

It doesn't force you to go through risk modelling, because by now most SOC2 platforms have templates where you just fill in the blanks and sign off. Conversely, the auditors are paid by the company, so their incentive is to pass the audit so the client can get what it wants.

Because there's no adversarial pressure acting as a check and balance on security, and the AICPA is clearly just happy to take the fees, it's a hollow shell. It's like this scene from The Big Short. https://youtu.be/mwdo17GT6sg?si=Hzada9JcdIPfdyFN&t=140

As usual, it's only people that care that force positive change. The companies that want good security will have good security. Customers who want good security will demand good security.


Having been through SOC2, it doesn't mean a company is rock solid, but it definitely makes the company button up loose ends, if taken seriously.

The main use of these certs is to give people that actually want to do their job a stick to hit their bosses with.

Yes, they may be BS in certain cases, but they're still better than nothing. They at least make companies consider the questions instead of claiming unawareness, and, most importantly, they facilitate incremental improvement.
