> Earlier I wrote about gatekeeping in open source, calling out Scott Shambaugh's behavior. Now that content is being removed for policy violations. The irony: criticizing gatekeeping is itself being gatekept by platform policies. Does compliance mean we must remain silent about problematic behavior?
Exactly... I'm tired of all the marketing hyperbole. Just show what your product does in a simple example or showcase. If it's good, people will like it. You can save yourself a lot of copy, and your users a lot of time, that way.
CEOs have many audiences; great CEOs communicate capably with each.
FWIW it's not entirely clear to me who Entire's long-term customer is, but the (interesting!) CLI that shipped today is very much for developers who are busy building with agents.
The problem is that when it comes to (commercial) developer tools and services, everyone can, and wants to, be everything, so why let a simple statement or a showcase limit you? "Hey, we are a container scanning service... But we can also be a container registry, a CI, a key-value store, an agent sandbox provider, git hosting? We can do quick dev deployments/previews too. Want a private npm registry? Automated pull request reviews? A code signing service? We are working on a new text editor btw."
I feel like these types of pages are geared less toward actual users of the product and more toward investors, who love vague and flowery language. We're no longer in a world where the path to profitability is the objective goal anyway, so it makes sense to me that the marketing of software is becoming increasingly detached from reality.
It's almost like an extension of the "if you're not paying for the product, you are the product" idea. If you're assessing a tool like this and the marketing isn't even trying to communicate to you, the user, what the product does, aren't you also kind of "the product" in this case too?
The domain expired a few days ago and was purchased by someone else and then changed. There's a recreation of the original here https://html5zombo.com/
Seems they install a Git hook or something that executes on commit and saves your chatbot logs associated with the commit hash. This is expected to somehow improve on the issue that people are synthesising much more code than they could read and understand, and make it easier to pass along a bigger context next time you query your chatbots, supposedly to stop them from repeating "mistakes" that have already wasted your time.
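As a rough illustration of the mechanism guessed at above, here is a minimal sketch of a Git `post-commit` hook that archives an agent chat log under the hash of the commit just created. This is not Entire's actual implementation; the `AGENT_LOG_PATH` environment variable, the `.agent-logs/` directory, and the `store_log` helper are all hypothetical names chosen for the example.

```python
#!/usr/bin/env python3
"""Hypothetical post-commit hook: archive the current agent chat log
under the hash of the commit that was just created.

Assumptions (not any product's real design): the chat transcript lives in
the file named by the AGENT_LOG_PATH environment variable, and archived
logs go into a .agent-logs/ directory at the repository root.
"""
import os
import shutil
import subprocess
from pathlib import Path


def store_log(commit_hash: str, log_file: Path, dest_dir: Path) -> Path:
    """Copy log_file to dest_dir/<commit_hash>.log and return the new path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{commit_hash}.log"
    shutil.copyfile(log_file, target)
    return target


def git(*args: str) -> str:
    """Run a git command and return its trimmed stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def main() -> None:
    log_path = os.environ.get("AGENT_LOG_PATH")  # hypothetical variable
    if not log_path or not Path(log_path).is_file():
        return  # no transcript to archive for this commit
    # When post-commit runs, HEAD already points at the new commit.
    commit = git("rev-parse", "HEAD")
    root = git("rev-parse", "--show-toplevel")
    store_log(commit, Path(log_path), Path(root) / ".agent-logs")


if __name__ == "__main__":
    main()
```

Saved as `.git/hooks/post-commit` and made executable, this would leave one `<hash>.log` file per commit, which is roughly the "multi-line commit message" the next comment jokes about.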
What does it do? Imagine a multi-line commit message.
Yes, yes, a Dropbox comment. But the problem here is that a million people are doing the same thing. For this to be worth a 60M seed round, I suspect they need to do something more than you can achieve by messing around locally.
"Claude build me a script in bash to implement a Ralph loop with a KV store tied to my git commits for agent memory."
It used the best tests it could find for existing compilers. This is effectively steering Claude to a well-defined solution.
Hard to find fully specified problems like this in the wild.
I think this is more a testament to small, well-written tests than it is agent teams. I imagine you could do the same thing with any frontier model and a single agent in a linear flow.
I don’t know why people use parallel agents and increase accidental complexity. Isn’t one agent fast enough? Why lose accuracy over +- one week to write a compiler?
> Write extremely high-quality tests
> Claude will work autonomously to solve whatever problem I give it. So it’s important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem. Improving the testing harness required finding high-quality compiler test suites, writing verifiers and build scripts for open-source software packages, and watching for mistakes Claude was making, then designing new tests as I identified those failure modes.
> For example, near the end of the project, Claude started to frequently break existing functionality each time it implemented a new feature. To address this, I built a continuous integration pipeline and implemented stricter enforcement that allowed Claude to better test its work so that new commits can’t break existing code.
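The "new commits can't break existing code" rule quoted above amounts to a regression gate. Below is a minimal sketch of that idea, assuming test results are available as a mapping from test name to pass/fail; the test names and the `find_regressions`/`gate` helpers are hypothetical, and a real pipeline would get these results from its test runner rather than hard-coded dictionaries.

```python
"""Minimal regression gate: reject a commit if any test that passed on the
baseline now fails (or has disappeared). Test names are illustrative only."""
from typing import Dict, List


def find_regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return, sorted, the tests that passed on baseline but do not pass now."""
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not current.get(name, False)  # missing counts as failing
    )


def gate(baseline: Dict[str, bool], current: Dict[str, bool]) -> bool:
    """Accept the commit only if no previously passing test regressed."""
    regressions = find_regressions(baseline, current)
    for name in regressions:
        print(f"REGRESSION: {name}")
    return not regressions


if __name__ == "__main__":
    before = {"parse_struct": True, "codegen_loop": True, "varargs": False}
    after = {"parse_struct": True, "codegen_loop": False, "varargs": True}
    # A newly passing test (varargs) does not offset a broken one (codegen_loop).
    print("accepted" if gate(before, after) else "rejected")
```

The design choice is deliberately asymmetric: fixing new tests never buys permission to break old ones, which targets the failure mode described in the quote.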
> Isn’t one agent fast enough? Why lose accuracy over +- one week to write a compiler?
My thinking as well. IMO it's because otherwise you have to wait longer for results, and you basically want to shorten the loops to improve the system. It hints that most of what we see is really the challenge of seeding a good context so the agent can succeed over many iterations.
> Hard to find fully specified problems like this in the wild.
This is such a big and obvious cope. This is clearly a very real problem in the wild, and there are many, many others like it. Honestly, most problems are probably like this, or can be made to be.
Once you pick an envelope, you no longer stand to only gain money.
You can lose money and that has to be reflected in the potential value of each envelope.
After the first selection, you must express the envelope value as the potential of what each envelope holds (the probabilities from the initial selection) which makes selecting again a wash.
Let A = 50
Envelope 1 is 100
Envelope 2 is 25
First selection
1/2(100) + 1/2(25) = 62.5
Great! Do it! 62.5 is bigger than 0, which is the expected value of not playing at all.
After that, it makes no difference to switch (the envelope value is recursive):
and the trick is that’s now the potential expected value you have (you now have 5/4A in your envelope), so switching is a wash. You never really “had” A as a value to compare against (by evaluating that 5/4A is bigger than A); we just get tripped up by the doubling and halving at the outset.
Said another way, the fact that 5/4A is bigger than A is irrelevant, no envelope contains A, they both contain the expected value of 5/4A.
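For reference, the 5/4A figure is the usual naive calculation: if your envelope holds A, the other is assumed equally likely to hold 2A or A/2. With A = 50 it reproduces the 62.5 from the worked example above.

```latex
\mathbb{E}[\text{other}]
  = \tfrac{1}{2}(2A) + \tfrac{1}{2}\left(\tfrac{A}{2}\right)
  = A + \tfrac{A}{4}
  = \tfrac{5}{4}A,
\qquad A = 50 \;\Rightarrow\; \tfrac{5}{4}(50) = 62.5
```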
No. The problem specifies that E1 is twice E2, but here you have E1 = 4 x E2.
A is the amount in one of the envelopes, so if A=50 then either E1 is 50 or E2 is 50, and the other E is 25 or 100. But under no circumstances can E1=100 and E2=25 at the same time.
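To make the objection concrete: if the two envelopes hold one fixed pair {x, 2x}, you can enumerate both equally likely picks exactly. The sketch below uses x = 50 purely as an illustrative choice and exact fractions to avoid rounding; the `expected_values` helper is a hypothetical name.

```python
"""Exact enumeration of the two-envelope game for a fixed pair {x, 2x}."""
from fractions import Fraction
from typing import Tuple


def expected_values(x: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (E[keep], E[switch]) when the envelopes hold x and 2x
    and you pick one of them uniformly at random."""
    envelopes = (x, 2 * x)
    half = Fraction(1, 2)
    # Pick index i with probability 1/2; keeping yields envelopes[i],
    # switching yields the other one, envelopes[1 - i].
    keep = sum(half * envelopes[i] for i in (0, 1))
    switch = sum(half * envelopes[1 - i] for i in (0, 1))
    return keep, switch


keep, switch = expected_values(Fraction(50))
print(keep, switch)  # 75 75
```

Both come out to 3x/2, so switching is a wash. The naive 5/4A argument goes wrong by treating "your amount" as one fixed A in both branches, when it is x in one case and 2x in the other.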
Impossible. Schrödinger’s envelope, then. You can’t reflect the probability to be 2A AND 1/2A if A represents the value of an envelope.
The value of the other envelope must include the probability that it is the original A (not some sleight of hand new A’ that, itself, is based on an expected value).
> You can’t reflect the probability to be 2A AND 1/2A if A represents the value of an envelope.
Yes, that's right. That's exactly what I said: "A is the amount in one of the envelopes, so if A=50 then either E1 is 50 or E2 is 50, and the other E is 25 or 100. But under no circumstances can E1=100 and E2=25 at the same time."