
Not a copy, a hash or fingerprint. Just enough data to measure if it's substantially similar.

But yes, it may be infeasible to index and compare against every image ever uploaded.



If I understand correctly, wouldn't a hash database of <just the training set> be larger than the actual model? (in fact by 1 or 2 orders of magnitude?)


Yeah, I guess so. The models are only 4 or 8 GB. A giant list of hashes would be bigger, sure. But they're two very different things. The model is for generating new images; the hash database is for copyright enforcement. If you really want to check for violations, I don't know how else you're going to do it.
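To put rough numbers on that size comparison, here is a back-of-the-envelope sketch. The image count is purely hypothetical (the thread gives no training-set size), and the 32 bytes per entry is the fingerprint size floated further down the thread; `fingerprint_db_bytes` is just an illustrative helper name.

```python
def fingerprint_db_bytes(num_images, bytes_per_fingerprint=32):
    """Raw size of a flat fingerprint list, ignoring any index overhead."""
    return num_images * bytes_per_fingerprint


# Hypothetical training-set size, chosen only for illustration.
HYPOTHETICAL_IMAGE_COUNT = 1_000_000_000

db_bytes = fingerprint_db_bytes(HYPOTHETICAL_IMAGE_COUNT)
model_bytes = 4 * 10**9  # the "4 GB" model size mentioned above

print(f"fingerprint list: {db_bytes / 1e9:.0f} GB")
print(f"ratio to a 4 GB model: {db_bytes / model_bytes:.0f}x")
```

Under those assumptions the list is several times the model size; a larger training set or a longer fingerprint pushes the ratio up proportionally.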


Approximately, yes.


Couldn't I just add a few nonsense bytes to my images to change the hash/fingerprint?


Hash yes, fingerprint maybe no. Maybe I'm using the term incorrectly here, but I think of a fingerprint as a lossy hash. One way of doing this would be to resize the image to, say, 8 by 8, and quantize it to, say, 16 colors (4 bits per pixel). So the fingerprint size is 8 × 8 × 4 bits = 256 bits = 32 bytes. Tiny changes aren't likely to change the fingerprint. You'd probably have to do something a little more clever so as not to get too many false positives, though. Or once you get a hit, do a deeper comparison.
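A minimal sketch of that kind of lossy fingerprint, under some loud assumptions: the image is already decoded into a 2D list of grayscale values (0 to 255), "16 colors" is simplified to 16 gray levels, resizing is done by plain block averaging, and the dimensions divide evenly by 8. The names `fingerprint` and `cell_distance` are made up for illustration, not taken from any library.

```python
GRID = 8      # fingerprint is GRID x GRID cells
LEVELS = 16   # 16 gray levels, so 4 bits per cell


def fingerprint(pixels):
    """Downsample to 8x8 by block averaging, quantize to 4 bits, pack to 32 bytes."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // GRID, width // GRID
    cells = []
    for gy in range(GRID):
        for gx in range(GRID):
            total = 0
            for y in range(gy * block_h, (gy + 1) * block_h):
                for x in range(gx * block_w, (gx + 1) * block_w):
                    total += pixels[y][x]
            # Integer form of floor(mean * LEVELS / 256).
            cells.append(total * LEVELS // (256 * block_h * block_w))
    # Pack two 4-bit cells into each byte: 64 cells -> 32 bytes.
    return bytes((cells[i] << 4) | cells[i + 1] for i in range(0, len(cells), 2))


def cell_distance(fp_a, fp_b):
    """Count how many of the 64 cells differ between two fingerprints."""
    diff = 0
    for a, b in zip(fp_a, fp_b):
        diff += (a >> 4) != (b >> 4)
        diff += (a & 0xF) != (b & 0xF)
    return diff


# Synthetic 64x64 test image: each 8x8 block is a flat gray sitting mid-bucket.
base = [[((y // 8) * 8 + (x // 8)) % 16 * 16 + 8 for x in range(64)]
        for y in range(64)]

# Same image with a handful of pixels nudged by one gray level.
tweaked = [row[:] for row in base]
for i in range(0, 64, 7):
    tweaked[i][i] += 1

# A genuinely different image: the negative of the base.
inverted = [[255 - v for v in row] for row in base]

print(cell_distance(fingerprint(base), fingerprint(tweaked)))
print(cell_distance(fingerprint(base), fingerprint(inverted)))
```

A small cell distance would then be the trigger for the deeper comparison. Note that a block whose average sits right on a quantization boundary can flip with a tiny change, which is one reason a cleverer scheme or a distance threshold is needed rather than exact equality.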



