
It's surely a difficult decision.

You either:

1.) train the model on public-domain or "free" content and get a mix of very old-fashioned writing and a very wide range of quality, blended into an overall bad result,

or, you:

2.) train the model on copyrighted content and get "so-so-but-better" results (because the model still can't produce quality just by remixing quality), but you won't be able to release it, because you can't guarantee it won't reproduce parts of its training content later on.

So... at most you can use it for PR and never open it to anyone...

Yeah, really hard to guess what happened here... /s



We know every modern LLM was trained on masses of in-copyright material without a license, and no one has had trouble releasing them, so that's probably not the issue with release of this model.