
It's super easy to watermark weights for ML models.

Just add 0.01 to a randomly chosen weight somewhere in the network. It will have very little impact on the results, but it lets you identify who leaked the weights.
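A minimal sketch of that idea, assuming numpy weights and a per-recipient identifier (the function names and the SHA-256-derived index are my own illustration, not a standard scheme):

```python
import hashlib
import numpy as np

def watermark(weights, recipient_id, delta=0.01):
    """Return a copy of `weights` with one recipient-specific weight nudged by `delta`."""
    # Derive a deterministic weight index from the recipient's identity.
    digest = hashlib.sha256(recipient_id.encode()).digest()
    idx = int.from_bytes(digest[:8], "big") % weights.size
    marked = weights.copy()
    marked.flat[idx] += delta
    return marked

def identify(original, leaked, recipients, delta=0.01):
    """Return the recipient whose watermark the leaked copy carries, if any."""
    diff = leaked - original
    for rid in recipients:
        digest = hashlib.sha256(rid.encode()).digest()
        idx = int.from_bytes(digest[:8], "big") % original.size
        if abs(diff.flat[idx] - delta) < 1e-6:
            return rid
    return None
```

You'd run `identify` against the leaked file with your list of recipients; whoever's index carries the +0.01 bump is the leaker.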



It should be easy enough to make that sort of signature very difficult to trace: just add a bit of small noise across the whole network, or even train for a few more iterations.
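To see why small noise defeats a single-weight bump, here's a toy sketch (numpy assumed, all names illustrative): after adding Gaussian noise of the same magnitude as the watermark, thousands of weights differ from the original by at least as much as the watermarked one, so it no longer stands out.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.standard_normal(100_000).astype(np.float32)

leaked = original.copy()
leaked[12345] += 0.01                      # the single-weight watermark

# Attacker adds small Gaussian noise to every weight before leaking.
noisy = leaked + rng.normal(0.0, 0.01, leaked.shape).astype(np.float32)

# The watermarked index is now lost in the crowd: roughly a third of all
# indices differ from the original by 0.01 or more.
n_suspects = np.count_nonzero(np.abs(noisy - original) >= 0.01)
```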


The person leaking it might not do so intentionally. Their computer might be compromised. Are we going to punish people for not being cybersec experts?


Compare two copies.


Slightly modify a million random weights by changing the least significant bit up or down.
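A sketch of that LSB scheme, assuming float32 weights viewed as raw uint32 bits (function name and seeding convention are mine):

```python
import numpy as np

def lsb_watermark(weights, seed, n_flips=1_000_000):
    """Flip the least-significant mantissa bit of `n_flips` randomly chosen float32 weights."""
    marked = weights.astype(np.float32)     # contiguous copy we can reinterpret
    bits = marked.view(np.uint32)           # same memory, seen as raw bits
    rng = np.random.default_rng(seed)       # the seed identifies the recipient
    idx = rng.choice(bits.size, size=min(n_flips, bits.size), replace=False)
    bits[idx] ^= 1                          # toggle the last mantissa bit
    return marked
```

Each flip changes a weight by about one part in 2^24, so the model's behavior is untouched, but the flip pattern is recoverable by XORing the bits of a leaked copy against the original.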


Compare three copies.


Or slightly randomly modify all the parameters on the copy you distribute, then it will be a match for nobody.


You compare all three and, for each weight, average away the variance between copies. So the more copies, the better.
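A toy sketch of that counterattack, assuming several leaked copies each carrying its own small perturbations (the per-weight median is one concrete way to do the averaging; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal(10_000).astype(np.float32)

# Five copies, each with its own private watermark: +0.01 on 100 random weights.
copies = []
for seed in range(5):
    r = np.random.default_rng(seed)
    c = clean.copy()
    idx = r.choice(c.size, size=100, replace=False)
    c[idx] += 0.01
    copies.append(c)

# Per-weight median across copies: a bump present in only a minority of
# copies at any given index never survives the median.
recovered = np.median(np.stack(copies), axis=0)
```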


...or just steal it so that even if it can be traced, it's not your problem.


In fairness to ipaddr, this can result in worse performance at this point.


OK, this is a fun game. I think your counterattack assumes I'm picking these million weights uniformly randomly among the 175 billion. I modify my original answer: s/a million/half the weights in a deterministic subset of 2 million weights/

Select the deterministic subset by just hashing some identifier for each weight.

For any reasonable number of copies, the pattern of bits flipped in the same direction within that subset is essentially unique to each copy, so a leaked copy still shares most of its flips with exactly one of the distributed copies.
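The scheme above might look like this as a sketch (numpy, float32 weights; the hash-ranking subset selection and the scoring rule are my reading of the proposal, not a standard construction):

```python
import hashlib
import numpy as np

def subset_indices(n_weights, subset_size):
    """Deterministic subset: the weights whose hashed index ranks lowest."""
    keys = [int.from_bytes(hashlib.sha256(str(i).encode()).digest()[:8], "big")
            for i in range(n_weights)]
    return np.argsort(keys)[:subset_size]

def mark(weights, subset, seed):
    """Flip the LSB of a recipient-specific random half of the subset."""
    rng = np.random.default_rng(seed)       # the seed identifies the recipient
    half = rng.choice(subset, size=subset.size // 2, replace=False)
    marked = weights.astype(np.float32)
    marked.view(np.uint32)[half] ^= 1
    return marked

def trace(original, leaked, subset, seeds):
    """Score each recipient by how much of their flip pattern appears in the leak."""
    diff = original.astype(np.float32).view(np.uint32) ^ leaked.view(np.uint32)
    flipped = set(np.flatnonzero(diff & 1))
    best, best_score = None, -1.0
    for seed in seeds:
        rng = np.random.default_rng(seed)
        half = set(rng.choice(subset, size=subset.size // 2, replace=False))
        score = len(half & flipped) / len(half)
        if score > best_score:
            best, best_score = seed, score
    return best
```

Because each recipient flips a different random half of the same fixed subset, any two recipients agree on only about half their flips, while a leaked copy agrees with its true recipient on nearly all of them, so the scoring stays discriminative even after some tampering.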



