The process is described above, but it’s very hard to “innocently” end up with one of those images that they are looking for from the database.
And the way it’s being done (hashes), a collision is highly unlikely. If it does occur, it doesn’t mean the two images are similar in nature (e.g. an innocent picture of your own child in the bath). The hash isn’t looking at the image content in the sense of “what’s in the picture”, just the bits of the file. So it’s highly, highly unlikely, even if a collision occurs, that the colliding image would happen to be another child innocently bathing.
> The process is described above, but it’s very hard to “innocently” end up with one of those images that they are looking for from the database.
Actually, it's very easy to end up with an image that has a similar perceptual hash to an illegal image.
They are not doing MD5 hashing; they're taking perceptual hashes and then using something like the Hamming distance or Levenshtein distance to make a fuzzy match against hashes of illegal images.
I've built products using these methods, and it is incredibly easy to make a fuzzy match based on perceptual hashes from two images that have nothing to do with each other.
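To make the fuzzy-matching idea concrete, here is a minimal sketch using a toy "average hash" and a Hamming-distance threshold. This stands in for production perceptual-hash algorithms (PhotoDNA, pHash, or whatever Apple actually uses, which is not public); the 8x8 pixel grids and the threshold of 10 bits are made up for illustration.

```python
# Toy perceptual hashing with a fuzzy Hamming-distance match.
# "average hash": downsample to 8x8 grayscale, then set each bit to 1
# if that pixel is brighter than the mean. Real systems use more robust
# transforms, but the matching logic is the same in spirit.

def average_hash(pixels):
    """Map an 8x8 grayscale grid to a 64-bit hash: 1 if pixel > mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming_distance(h1, h2):
    """Count the bits that differ between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def is_fuzzy_match(h1, h2, threshold=10):
    """Flag a match if the hashes differ in fewer than `threshold` bits."""
    return hamming_distance(h1, h2) < threshold

# Two slightly different versions of the "same" image: img_b is img_a
# with every pixel brightened by 3. Their cryptographic hashes would
# differ completely; their perceptual hashes are identical.
img_a = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
img_b = [[((r * 8 + c) % 256) + 3 for c in range(8)] for r in range(8)]

print(is_fuzzy_match(average_hash(img_a), average_hash(img_b)))  # True
```

The flip side of this robustness is exactly the false-positive problem above: any two images whose downsampled brightness patterns land within the threshold will match, whether or not they depict anything similar.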
> but it’s very hard to “innocently” end up with one of those images that they are looking for from the database.
Are you sure, given how many iPhone takeover/jailbreak bugs regularly exist, including the recent 0-click iMessage bug? Would you like to dare a hacking group, domestic or foreign, to land a single verboten image on your iPhone?
> And the way it’s being done (hashes), a collision is highly unlikely.
The limited details so far suggest that it's using perceptual hashes, which are more susceptible to collision engineering and false positives than cryptographically secure hashes.
What happens if someone spams CP via iMessage to their enemies?
Additionally, this isn't just a hash of the file but a perceptual hash on the image content. So e.g. changing a single bit in the image would create a different cryptographic hash, but generally not a different perceptual hash.
Presumably the verification that these false matches are not problematic is manual?
That's not good enough given how our media currently works. Imagine the articles published when word of such a check leaks: "celebrity X's phone checked by police for suspected CSAM." While that is technically true ("suspected"), no one cares about that nuance, and such a person would get cancelled very quickly, even if there was no evidence of wrongdoing.
Presuming that is what they're doing, it won't stay that way for long.
It's absurdly easy to change the hash of a file. In the case of something like JPEG, you don't even have to change the image data itself; you can just change the metadata. Apple could presumably hash only the image data, but again, all you have to do is make tiny changes to the image, imperceptible to humans, and the hash is totally different.
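This is easy to demonstrate: flipping a single bit of a file produces a completely unrelated cryptographic digest. The byte string below is a made-up stand-in for a JPEG's contents; the point is only the avalanche behavior of SHA-256.

```python
# Sketch: a one-bit change yields a totally different cryptographic hash,
# which is why exact-hash matching is trivially evaded by tiny edits.
import hashlib

original = b"\xff\xd8\xff\xe0" + b"image data" * 100  # pretend JPEG bytes
tweaked = bytearray(original)
tweaked[-1] ^= 0x01                                    # flip one low bit

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(bytes(tweaked)).hexdigest()
print(h1 == h2)  # False: the two digests share essentially no structure
```

That asymmetry is the whole argument: exact hashes are useless against an adversary who edits one bit, so a system that wants to catch edited copies has to move to perceptual hashing, with all the false-positive risk that entails.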
Long story short, this is either nearly pointless and privacy invasive, or it's about to get drastically more invasive to be effective.
Furthermore, "highly unlikely" is an understatement. Using SHA-256 there are 2^256 possible hashes. That's 115792089237316195423570985008687907853269984665640564039457584007913129639936. The probability of a collision (two different inputs producing the same hash) is infinitesimal.
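A quick birthday-bound estimate backs this up. The probability of *any* accidental collision among n random inputs is roughly n(n-1)/(2 * 2^256); the figure of 10^22 inputs below is a deliberately extreme hypothetical (on the order of a trillion images for each person on Earth).

```python
# Back-of-the-envelope birthday bound for accidental SHA-256 collisions.
SPACE = 2 ** 256
n = 10 ** 22  # hypothetical: ~a trillion images per person on Earth
p_collision = n * (n - 1) / (2 * SPACE)
print(p_collision)  # on the order of 1e-34: negligible
```

Of course, this only speaks to *accidental* collisions of a cryptographic hash; it says nothing about engineered matches against a perceptual hash, which is the actual concern raised above.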