The difference here is that the images aren't stored, but rather an extremely ab...

AlotOfReading · on Jan 14, 2023

This is very much a 'color of your bits' topic, but I'm not sure why the internal representation matters. It's pretty trivial to recreate famous works like the Mona Lisa or Starry Night or Monet's Water Lily Pond. Obviously some representation of the originals exist inside the model+prompt. Why wouldn't that apply to other images in the training sets?

huggingrear · on Jan 14, 2023

>It's pretty trivial to recreate famous works like the Mona Lisa or Starry Night or Monet's Water Lily Pond.

A recreation of a piece of art is not necessarily a copy. I've personally seen hundreds of recreations of Edvard Munch's 'The Scream', all of them perfectly legal.

Even in a massively overtrained model, it is practically impossible to create a 1:1 copy of a piece of art the model was trained upon.

And of course that would be a pointless exercise to begin with. Why would anyone want to generate 1:1 copies (or anything near that) of existing images?

The whole 'magic' of Stable Diffusion is that you can create new works of art in the combined styles of the art, photography, etc. that it has been trained on.

AlotOfReading · on Jan 14, 2023

A work doesn't have to be identical to be considered a derivative work, which is why we also don't consider every JPEG a newly copyrighted image distinct from the source material.

As an example of a plausible scenario where copyright might actually be violated, consider this: an NGO wants images on their website. They type in something like 'afghan girl' or 'struggling child' and unknowingly use the recreations of the famous photographs they get.

derangedHorse · on Jan 14, 2023

It’s not quite one-to-one. Copyright law isn’t as arbitrary as it would seem, in my experience. Also, two things are being conflated here: whether the model itself violates copyright and whether the works generated by it do.

The “color of your bits” only applies to the process of creating a work. Stable Diffusion’s training of the algorithm could be seen as violating copyright but that doesn’t spread to the works generated by it.

In the same vein, one can claim copyright on an image generated by Stable Diffusion even if the creation of the algorithm is safe from copyright violation.

“some representation of the originals exist inside the model+prompt” is also not sufficient for the model to be in violation of copyright of any one art piece. Some latent representation of the concept of an art piece or style isn’t enough.

It’s also important to note that no training data is stored in its original form as part of the model; during training it’s simply used to tweak a function whose purpose is translating text to images. Some could say that’s like using the color from a picture of a car on the internet. Some might say it’s worse, but it’s all subjective unless the opposition can tie the actual technical process to existing precedent.

XorNot · on Jan 14, 2023

Because you're silently invoking additional data (the prompt + noise seed), which is not present in the training weights. You have the prompt + noise seed for any given output.

An MPEG codec doesn't contain every movie in the world just because it could represent them if given the right file.

The white light coming off a blank canvas also doesn't contain a copy of the Mona Lisa which will be revealed once someone obscures some of the light.

ifdefdebug · on Jan 14, 2023

OK so let me encrypt a movie and distribute that. Then you tell people they need to invoke additional data to watch the movie. Also give some hints (try the movie title lol).

XorNot · on Jan 14, 2023

If you distribute a random byte stream, and someone uses that as a one time pad to encrypt a movie, then are you distributing the movie?

The answer is of course not, and the same principle applies if someone uses Stable Diffusion to find a latent space encoding for a copyrighted image (the 231 byte number - had to go double check what the grid size actually is).
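
A minimal sketch of the one-time-pad point, in Python, with short byte strings standing in for movies. The names `xor_bytes`, `movie_a`, and `movie_b` are illustrative assumptions, and nothing here is Stable Diffusion code; it only shows that one random pad can "decode" to different content depending on what it is combined with.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (the one-time-pad operation)."""
    if len(a) != len(b):
        raise ValueError("one-time pad must match the message length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical stand-ins for two different movies of equal length.
movie_a = b"MOVIE-A: frames of film one"
movie_b = b"MOVIE-B: frames of film two"

# The "random byte stream" that gets distributed. It is generated
# without any reference to either movie.
pad = secrets.token_bytes(len(movie_a))

# Someone else later uses the pad to encrypt each movie.
cipher_a = xor_bytes(movie_a, pad)
cipher_b = xor_bytes(movie_b, pad)

# The same pad recovers either movie, depending only on the ciphertext.
recovered_a = xor_bytes(cipher_a, pad)
recovered_b = xor_bytes(cipher_b, pad)
```

Which movie comes out depends entirely on the ciphertext paired with the pad, which is the sense in which the pad on its own is not "the movie".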

ifdefdebug · on Jan 14, 2023

I think it boils down to one question: can you prompt the model to show mostly unchanged pictures from artists? Then it's definitely problematic. If not, then I don't have enough knowledge of the topic to give a strong opinion. (My previous answer was just a use case that fits your argument.)

XorNot · on Jan 14, 2023

I mean no, it doesn't. It's like drawing a copyrighted work in Photoshop: the act of creating it is the violation; it doesn't prove that Photoshop contains the content directly.

The way SD model weights work, if you managed to prompt engineer a recreation of one specific work, it would only have been generated as a product of all the information in the entire training set + noise seed + the prompt. And the prompt wouldn't look anything like a reasonable description of any specific work.

Which is to say, it means nothing because you can equally generate a likeness of works which are known not to be included in the training set (easy, you ask for a latent encoding of the image and it gives you one): equivalent to a JPEG codec.

ifdefdebug · on Jan 15, 2023

> And the prompt wouldn't look anything like a reasonable description of any specific work.

I think this is the most relevant line of your argument. Because if you could just ask it like "show me the latest picture of [artist]" then you'll have a hard time convincing me that this is fundamentally different from a database with a fancy query language and lots of copyrighted work in it.

Filligree · on Jan 14, 2023

It applies to these specific images because there were thousands and thousands of copies in the training set. That’s not true for newer works.

zowie_vd · on Jan 14, 2023

That's not true. As an example of a more recent copyright-protected work that Stability AI consistently reproduces fairly faithfully, I invite you to try out the prompt "bloodborne box art".

Kim_Bruning · on Jan 14, 2023

Longer term, by analogy, it will then of course turn into a "what color is your neural net" topic.

Which runs into some very interesting historical precedents.

((I wonder if there's a split between people who think AI emancipation might happen this century versus people who think that such a thing is silly to contemplate))

visarga · on Jan 14, 2023

Not to mention that it works by inverting noise. Different noise, different result. Let's recognise the important contribution of noise here.

Xelynega · on Jan 14, 2023

> No semblance of the original image even remotely exists in the model

What does this mean? It doesn't mean you can't recreate the original, because that's been done. It doesn't mean that literally the bits for the image aren't present in the encoded data, because that's true for any compression algorithm.

smusamashah · on Jan 14, 2023

Do you have any examples of recreating an image with these models? Something other than the Mona Lisa or other famous artworks, because those have caused overfitting.

akjetma · on Jan 14, 2023

there are some artists with very strong, recognizable styles. if you provide one of these artists' names in your prompt and get a result back that employs their strong, recognizable style, i think that demonstrates that the network has a latent representation of the artist's work stored inside of it.

eega · on Jan 14, 2023

So, what you are saying is that it is illegal to paint in the style of another artist? I'm no lawyer, but I'm pretty sure that is completely legal as long as you don’t claim your paintings ARE from the other artist.

andybak · on Jan 14, 2023

I was with you right up until the final sentence.

How did "style" become "work"?

WA · on Jan 14, 2023

Because in some cases, adding a style prompt gives almost the original image: https://www.reddit.com/r/StableDiffusion/comments/wby0ob/it_...

andybak · on Jan 14, 2023

And yet nobody has managed to demonstrate reconstruction of a large enough section of a work that is still under copyright to prove the point.

The only thing so far discovered is either a) older public domain works nearly fully reproduced b) small fragments of newer works or c) "likenesses"

limitedsupply · on Jan 14, 2023

That’s the key question of the lawsuit IMO!

realusername · on Jan 14, 2023

No, it doesn't. That demonstrates that the model has abstract features and characteristics of these artists stored in it, not their work.

You can't bring back the training images no matter how hard you try.

astrange · on Jan 14, 2023

Or it means their style is so easy to recognize that you can see it even when it doesn't exist.

The most common example of this (Greg Rutkowski) is not in StableDiffusion's training set.

djbebs · on Jan 14, 2023

That seems to indicate to me that the original work is actually not under copyright, since if it is the only method of achieving such an image in such a style, then there is no originality to be copyrighted.