Honestly, can people stop speaking in absolutes regarding these systems? We (researchers and non-researchers alike) are gradually trying to comprehend exactly how much they generalise and memorise, but this is darn hard work and it is not our fault that several major tech giants decided to deploy and profit from these models long before the scientific and legal landscape was clear. Somepalli et al. (2022) [1] for example is a fairly strong argument against your statement above.
The fact is that these systems are complex, new, and interesting. However, it is not the fault of small-time programmers and artists that modern copyright law is a major, overreaching mess that is now finally greatly affecting what the big corporations want to do. They are getting sued? Cry me a river… Perhaps they will finally stop backing the American-led copyright lobby then?
> is a fairly strong argument against your statement above.
From a quick skim of the paper, they appear to have used toy models with a few hundred to a few thousand images in the training set. For the models trained on as few as a few thousand images, they rarely or never found exact duplicates.
For instance, in their figure 4, they show exact duplicates for the training set with only 300 images (well, duh), and didn't find any exact duplicates for the training set with only 3,000.
I'm not sure I'd call this a "strong argument" when applied to models trained on millions or billions of images. Quite the contrary. LAION-5B, from which Stable Diffusion's training data was drawn, contains 5 billion image/caption pairs.
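To make concrete what "finding exact duplicates" between generations and a training set involves, here is a minimal sketch of one common approach: compare feature vectors by cosine similarity and flag anything above a threshold. Everything here is illustrative (random vectors stand in for real image embeddings, and the 0.95 threshold is an arbitrary assumption); actual studies of this kind use learned copy-detection embeddings and approximate nearest-neighbour search, since brute force does not scale to billions of images.

```python
import numpy as np

def flag_near_duplicates(gen_feats, train_feats, threshold=0.95):
    """Flag generated samples whose feature vector is suspiciously close
    to some training sample (cosine similarity above `threshold`).

    Brute-force comparison for illustration only; at LAION scale one
    would use an approximate nearest-neighbour index instead.
    """
    # L2-normalise rows so plain dot products become cosine similarities
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = g @ t.T               # (n_gen, n_train) similarity matrix
    best = sims.max(axis=1)      # closest training match per generation
    return best >= threshold

# Toy demo: plant one "memorised" generation that copies a training vector.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 64))
gen = rng.normal(size=(5, 64))
gen[2] = train[42]               # the planted copy
flags = flag_near_duplicates(gen, train)
```

With random 64-dimensional vectors, unrelated pairs land nowhere near a cosine similarity of 0.95, so only the planted copy at index 2 gets flagged. The hard part in practice is that memorisation is rarely an exact copy, which is why the threshold and the choice of embedding dominate what counts as a "duplicate".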
Firstly, thank you for engaging in a discussion. Secondly, I am not an expert in image processing; my focus is on language. Thus my intuitions carry less weight in this domain, although the models do share similarities.
They explore a range of sizes and I do not think it is fair to only highlight the smallest ones. They do explore a 12M subset of LAION in Section 7 for a model that was trained on 2B images. Yes, using a subset is not an ideal experimental setup (they admit this) and it is far from LAION-5B, but it is a fair stab at this kind of analysis and is likely to lead to further explorations.
Let us return though to your claim, which is what I objected to: “Pretty much none of these systems ‘reconstruct an image in detail’.” I think it is fair to say that this work certainly makes me doubt whether none of these systems (even the larger ones) exhibit behaviour that may limit their generalisability or cross the boundary of what is legally considered derivative work.
You may very well be right that once we scale to billions of images this behaviour diminishes (or maybe even disappears), but to the best of my knowledge we do not know if this is the case, and we do not know when, how, and why it occurs if it does. I remain a firm believer that these kinds of models are the future, as there is little evidence that we have reached their limits, but I will continue to caution anyone who talks in absolutes until there is solid evidence to support those claims.
[1]: https://arxiv.org/abs/2212.03860