Not a copy, a hash or fingerprint. Just enough data to measure if it's substanti...

Kim_Bruning · on Jan 14, 2023

If I understand correctly, wouldn't a hash database of <just the training set> be larger than the actual model? (in fact by 1 or 2 orders of magnitude?)

8n4vidtmkvmk · on Jan 15, 2023

Yeah, I guess so. The models are only 4 or 8 GB. A giant list of hashes would be bigger, sure. But they're two very different things. The model is for generating new images; the hash database is for copyright enforcement. If you really want to check for violations, I don't know how else you're going to do it.

Filligree · on Jan 14, 2023

Approximately, yes.

galleywest200 · on Jan 14, 2023

Couldn't I just add a few nonsense bytes into my images to change the hash/fingerprint?

8n4vidtmkvmk · on Jan 15, 2023

Hash yes, fingerprint maybe no. Maybe I'm using the term incorrectly here, but I think of a fingerprint as a lossy hash. One way of doing this would be to resize the image to, say, 8 by 8 pixels and quantize it to, say, 16 colors (4 bits per pixel). The fingerprint size is then 8 × 8 × 4 bits = 256 bits = 32 bytes. Tiny changes aren't likely to change the fingerprint. You'd probably have to do something a little more clever so as not to get too many false positives, though. Or, once you get a hit, do a deeper comparison.
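
A rough sketch of that idea, to make the difference between a hash and a fingerprint concrete. Everything here is an assumption for illustration: the image is a plain 2D list of grayscale values (0 to 255), "16 colors" is simplified to 16 gray levels, the resize is a simple block average, and the names `fingerprint` and `distance` are made up. A real system would likely use an established perceptual hash instead.

```python
import hashlib

GRID = 8      # downsample to GRID x GRID cells
LEVELS = 16   # 16 gray levels = 4 bits per cell (stand-in for "16 colors")

def fingerprint(pixels):
    """Return a 32-byte lossy fingerprint of a 2D grayscale image.

    pixels: list of rows, each a list of ints in 0..255; the image must be
    at least GRID pixels wide and tall.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for gy in range(GRID):
        y0, y1 = gy * h // GRID, (gy + 1) * h // GRID
        for gx in range(GRID):
            x0, x1 = gx * w // GRID, (gx + 1) * w // GRID
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) / len(block)
            # Quantize the block average to one of 16 levels.
            cells.append(min(LEVELS - 1, int(mean * LEVELS / 256)))
    # Pack two 4-bit cells into each byte: 64 cells -> 32 bytes.
    return bytes((cells[i] << 4) | cells[i + 1] for i in range(0, len(cells), 2))

def distance(fp_a, fp_b):
    """Count how many of the 64 cells differ between two fingerprints."""
    diff = 0
    for a, b in zip(fp_a, fp_b):
        diff += (a >> 4) != (b >> 4)
        diff += (a & 0xF) != (b & 0xF)
    return diff

def sha256_of(pixels):
    """Exact hash of the raw pixel bytes, for comparison."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

# A synthetic 64x64 gradient stands in for a real image.
original = [[x * 3 + y for x in range(64)] for y in range(64)]

# "Nonsense" edit: nudge three pixels by one gray level.
tweaked = [row[:] for row in original]
for y, x in [(0, 0), (10, 20), (40, 50)]:
    tweaked[y][x] += 1

# A genuinely different image: the inverted gradient.
inverted = [[255 - v for v in row] for row in original]

print(sha256_of(original) == sha256_of(tweaked))        # exact hash changes
print(distance(fingerprint(original), fingerprint(tweaked)))
print(distance(fingerprint(original), fingerprint(inverted)))
```

Comparing by `distance` with a threshold, rather than by exact equality, is one way to tolerate small edits; the threshold trades false positives against misses, which is where the "deeper comparison" after a hit would come in.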