> Training on copyleft licensed code is not a license violation. Any more than a person reading it is.
Some might hold that we've granted persons certain exemptions, on account of them being persons. We do not have to grant machines the same.
> In copyright terms, it's such an extreme transformative use that copyright no longer applies.
Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim? Sure, it can also produce extremely transformed versions, but is that really relevant if it holds within it enough information for a (near-)verbatim reproduction?
> Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim? Sure, it can also produce extremely transformed versions, but is that really relevant if it holds within it enough information for a (near-)verbatim reproduction?
I feel as though, from an information-theoretic standpoint, it can't be possible that an LLM (which is almost certainly <1 TB big) can contain any substantial verbatim portion of its training corpus, which includes audio, images, and videos.
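That information-theoretic point can be made with a back-of-envelope calculation. All figures below are hypothetical round numbers, not actual model or corpus statistics:

```python
# Back-of-envelope check: can a <1 TB model hold its corpus verbatim?
# All figures are assumed round numbers for illustration only.
corpus_tokens = 15e12        # assume ~15 trillion training tokens
bytes_per_token = 4          # rough average for encoded text
model_bytes = 1e12           # 1 TB upper bound on model size

corpus_bytes = corpus_tokens * bytes_per_token
ratio = corpus_bytes / model_bytes
print(f"raw corpus is ~{ratio:.0f}x the size of the model")  # ~60x
```

Even before counting audio, images, and video, the text alone would have to be compressed dozens of times over to fit, which rules out wholesale verbatim storage (though not memorization of frequently repeated passages).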
No, we don't have to, but so far we do, because that's the most legally consistent approach. If you want to change that, you're going to need to pass new laws that may wind up radically redefining intellectual property.
> Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim?
Of course it has, if the transformation is extreme, as it appears to be here. If I memorize the lyrics to a bunch of love songs, and then write my own love song where every line is new, nobody's going to successfully sue me just because I can sing a bunch of other songs from memory.
Also, it's not even remotely clear that the LLM can produce the training data near-verbatim. Generally it can't, unless it's something that it's been trained on with high levels of repetition.
> you're going to need to pass new laws that may wind up radically redefining intellectual property
You're correct that this is one route to resolving the situation, but I think it's reasonable to lean more strongly into the original intent of intellectual property laws to defend creative works as a means for creators to sustain themselves, which would draw a pretty clear distinction between human creativity and reuse on the one hand, and LLMs on the other.
> into the original intent of intellectual property laws to defend creative works as a means for creators to sustain themselves
But you're missing the other half of copyright law, which is the original intent to promote the public good.
That's why fair use exists, for the public good. And that's why the main legal argument behind LLM training is fair use -- that the resulting product doesn't compete directly with the originals, and is in the public good.
In other words, if you write an autobiography, you're not losing significant sales because people are asking an LLM about your life.
For real, I'm not certain we will ever be able to merge AI code without human review. But:
1. Every time I've confidently thought "AI will never be able to do X" in the last year, I've later been proven wrong, so I'm a bit wary to assume that again without strong reasons.
2. I see blog posts by some of the most AI-forward people that seem to imply some people are already managing large codebases without human review of raw code. Maybe they're full of crap - there are certainly plenty of over-credulous bs artists in the AI space - but maybe they're not.
3. The returns on figuring this out are so incredibly high that, if it's possible, people will figure it out.
All that to say: it's far from certain, but my bias is that it is possible.
1. Every time I've confidently stated "this AI architecture will never be able to do X" in the past 6 years, I've not been proven wrong (with one possible exception earlier today: https://news.ycombinator.com/item?id=47291893 – the jury's still out on that one). … No, my version doesn't really work, does it? It just sounds like bragging, or maybe hubris.
> some people are already managing large codebases without human review of raw code.
2. I have never believed this to be impossible. I do, however, maintain that these codebases are necessarily some combination of useless, plagiarized, and bloated. I have yet to see a case where there isn't a smaller, cheaper way to accomplish the same task faster and better.
> The returns on figuring this out are so incredibly high
3. And yet, they still haven't figured it out. My bias is that it isn't possible, because nothing has fundamentally changed about the model architectures since I first skimmed a PDF about GPT, and imagined an informal limiting proof that I still haven't found any holes in.
Because you say we need to figure out techniques to do it. If it's not possible, then there are no techniques to do it. Since you want the techniques, I assume you assume that they exist.
> 1. Every time I've confidently thought "AI will never be able to do X" in the last year, I've later been proven wrong, so I'm a bit wary to assume that again without strong reasons.
That's evidence that you shouldn't assume something is impossible. I'm not suggesting that, either.
> 2. I see blog posts by some of the most AI-forward people that seem to imply some people are already managing large codebases without human review of raw code. Maybe they're full of crap - there are certainly plenty of over-credulous bs artists in the AI space - but maybe they're not.
Do you have any idea whether this works well though?
> 3. The returns on figuring this out are so incredibly high that, if it's possible, people will figure it out.
Ok. But again, that's a big if there.
The returns on breaking a popular cryptographic algorithm are also huge, but that's not an indication that it's possible, or that it's impossible for that matter.
I'm baffled why people think that "it would be great if..." has any bearing on the chances that the thing that follows is true.
It's a bad faith question or one so deeply uninformed that parent is correct. It only takes a couple clicks to see the ideas of the people who are "just asking questions".
I swear “check their post history” has got to be weakest form of ad hominem going. “I can deflect from answering difficult questions if I attack the messenger” is just so weak.
You are mistaken, probably not for the first time today.
Even big copyright firms. Disney especially is known for rehashing existing material and then not allowing anyone else to do the same with their stuff. Disney does not have a lot of original stories.
> If “AI-rewriting” is accepted as a valid way to change licenses, it represents the end of Copyleft. Any developer could take a GPL-licensed project, feed it into an LLM with the prompt “Rewrite this in a different style,” and release it under MIT. The legal and ethical lines are still being drawn, and the chardet v7.0.0 case is one of the first real-world tests.
This isn't even limited to "the end of copyleft"; it's the end of all copyright! At least copyright protecting the little guy. If you have deep enough pockets to create LLMs, you can in this potential future use them to wash away anyone's copyright for any work. Why would the GPL be the only target? If it works for the GPL, it surely also works for your photographs, poetry – or hell even proprietary software?
… when it works. And if you never have to change camera or microphone settings.
> and calendar integration.
The little notification that pops up telling you your meeting is about to start based on your calendar? The one you better not click in the first 5 or so seconds it's there, because then you'll end up with an error message that tells you absolutely nothing, have to go back to the chat, and try again?
No, installing apps is just genuinely bad UX, at least for me, because my password manager doesn't work in apps, and I have to manually generate and copy-paste the password. If the creators of the app are extra stupid, they make it so that you cannot paste into the password input, so I have to enter the generated password character by character.
And why is it broken? Is there a way for a password manager app to somehow inspect other apps and identify forms within them and interact with the forms?
...yes? I can't tell if you're trolling at this point or genuinely unaware.
Both iOS and Android have APIs for this, you (as the app developer) just mark the relevant fields in the app as login/password/etc, and the OS will interact with your chosen password manager to autofill and/or save them.
If you've never seen this work on your device, then you might have something configured incorrectly — many app developers are incompetent and bad at this; but not _all_ of them.
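To make it concrete, here's a sketch of what that marking looks like on Android (a hypothetical layout fragment; iOS has the equivalent `textContentType` property on text fields):

```xml
<!-- Hypothetical Android layout fragment. The autofillHints attributes
     tell the OS autofill framework what each field is, so the user's
     chosen password manager can offer to fill or save credentials. -->
<EditText
    android:id="@+id/username"
    android:autofillHints="username"
    android:importantForAutofill="yes" />
<EditText
    android:id="@+id/password"
    android:inputType="textPassword"
    android:autofillHints="password"
    android:importantForAutofill="yes" />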
It works for filling an existing password, but not for creating a new one, iOS still prompts me to fill existing even though I'm on the sign up page. On the web 1Password can also automatically generate a Fastmail masked email address, but I doubt there's any hope for that to work in a native app.
I've flown with many European budget carriers and have never once seen this requirement. Sure, they might charge for or not provide printed boarding passes, but they've always sent me a PDF or PNG boarding pass by e-mail or provided one through their website. That, in my book, is a non-issue. Forcing an app is a huge issue, and shouldn't be legal if the only reasonable way to get the app is agreeing to the draconian conditions of one of two gatekeeping companies subject to foreign jurisdictions.
This is an insane take. Apple and Google both reserve the right to deny accounts to people without any legal appeal at worst. At best, a legal dispute would have to be resolved in US courts. How other countries (including my own) accept that as a condition to use public services is beyond belief, and pointing this out is not an overreaction.
Why are we willingly placing private companies – private companies subject to foreign jurisdictions, even! – in the role of gatekeepers of public services? We have surely completely lost our minds!
You can literally just use the online form. Have you actually tried using your search engine to query applying, and seeing just how easy it is to optionally apply online?
Over half of the world's population is using an Android or iOS device. Most people visiting a country like the UK, or who have the means to afford a trip, most likely have a functioning mobile phone.
I find it somewhat amusing you think I’m “insane” for suggesting most of the modern world has a relatively accessible Android or iOS device to apply for a visa.
> You can literally just use the online form. Have you actually tried using your search engine to query applying, and seeing just how easy it is to optionally apply online?
This whole story is about how they're trying to pressure you into using the app.
> Over half of the world's population is using an Android or iOS device. Most people visiting a country like the UK, or who have the means to afford a trip, most likely have a functioning mobile phone.
That does not in any way affect any of what I wrote. I'll try to write it differently: Do you think it's OK that Google and Apple decide (at worst on their very own without oversight, at best with the oversight of a foreign country that isn't the one you're travelling to or from) who gets to do these things and under what conditions?
> I find it somewhat amusing you think I'm "insane" for suggesting most of the modern world has a relatively accessible Android or iOS device to apply for a visa.
I find it insane that you think that because Google and Apple happen to grace most people with access to Android and iOS, then it's fine that we all live by their mercy.
I’m sorry but I can’t respond further, conversation is going nowhere. You clearly have it out for Apple and Google, and you are not being “pressured” into doing anything.
Critical reading and thinking would lead you through the flow to click on the “continue application online” form.
I thought so too, but if you follow the 'start now' link you get a page full of trying to push you to use the app, then if you say you cannot all the way at the bottom, then you get another page trying to help you with installing the app, then you actually get the form. I'm quite disappointed, usually the UK government digital services are not quite so user-hostile.
There is also a new UK government requirement to verify your identity if you are a director or significant shareholder in a UK company.
The online route for that goes through a couple of pages then says "now switch to the app on your smartphone". In theory you can also go to a Post Office to get your documents checked but it didn't work for me.
It's insane that a multi-trillion-pound government would rely on a foreign private entity for something so simple yet critical. The only sane answer here is corruption.