
  > Yeah, this is a bullshit article. There is no such degradation, and it’s absurd to say so on the basis of a single problem which the author describes as technically impossible. It is a very contrived under-specified prompt.
I see "No True Scotsman" argument above.

  > My evidence that RL is more relevant is that that’s what every single researcher and frontier lab employee I’ve heard speak about LLMs in the past year has said.
Reinforcement learning reinforces what is already in the LM: it narrows the search path toward answers the model already prefers, whereas the wider search of non-RL-tuned base models yields more correct answers when many samples are allowed [1].

[1] https://openreview.net/forum?id=4OsgYD7em5
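
A minimal sketch of that claim in pass@k terms (numbers invented for illustration, not taken from [1]): a model whose probability mass is concentrated on a narrow set of answers can win when only one sample is allowed (pass@1), while a base model that samples more diversely can cover more problems when many attempts are allowed (large k):

    def pass_at_k(p, k):
        # Probability that at least one of k independent samples is correct,
        # given a per-sample success probability p for this problem.
        return 1.0 - (1.0 - p) ** k

    # Hypothetical 10-problem benchmark (numbers invented for illustration).
    # RL-tuned model: very reliable on 6 problems, essentially never solves the rest.
    rl_probs = [0.95] * 6 + [0.0] * 4
    # Base model: weaker per sample, but a nonzero chance on 9 of the 10 problems.
    base_probs = [0.30] * 9 + [0.0] * 1

    for k in (1, 64):
        rl = sum(pass_at_k(p, k) for p in rl_probs) / len(rl_probs)
        base = sum(pass_at_k(p, k) for p in base_probs) / len(base_probs)
        print(f"k={k:>3}  RL-tuned={rl:.2f}  base={base:.2f}")

With these made-up numbers the RL-tuned model leads at pass@1 (0.57 vs 0.27) while the base model leads at pass@64 (0.90 vs 0.60), which is the general shape of result the narrowing argument predicts.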

  > I have never once heard any of them mention new sources of pretraining data, except maybe synthetic data they generate and verify themselves, which contradicts the author’s story because it’s not shitty code grabbed off the internet.
Training-data sources have already been the subject of allegations, even leading to lawsuits. So I would expect that no engineer at any LLM company would disclose anything about their training-data sources beyond innocent-sounding "synthetic data verified by ourselves."

From my days working on blockchains, I am very skeptical of any company riding a hype wave. They face enormous competition, and they will buy, borrow, or steal to avoid slipping even a little. So until Anthropic opens up how they train their models enough that we can reproduce their results, I will suspect they leaked the test set into training and used users’ code-generation conversations as a new source of training data.


That is not what a No True Scotsman argument is. I’m pointing out a bad argument with weak evidence.

  >>> It is a very contrived under-specified prompt.
No True Prompt could be so contrived and underspecified.

The article about degradation is a case study (a single prompt), the weakest kind of study in the hierarchy of evidence. Case studies are the basis for further, more rigorous studies. And the author took the time to test his assumptions and presented fairly clear evidence that such degradation might be present and that we should investigate.


We have investigated. Millions of people are investigating all the time and finding that coding capability has improved dramatically over that period. A variety of very different benchmarks say the same. This one random guy’s stupid prompt says otherwise. Come on.

As far as I remember, the article stated that the author found the same problematic behavior across many prompts, issued by him and his colleagues. The "stupid prompt" in the article is there for demonstration purposes.

But that’s not an argument, that’s just an assertion, and it’s directly contradicted by all the more rigorous attempts to measure the same thing through benchmarks (public and private).


