Hacker News

What are you talking about?


It's pretty simple. GPT models are essentially information weapons. People are going to get their hands on them, so you might as well give them a model whose generated content you can identify, so you know who is using it for nefarious purposes. It's like how many printers encode hidden patterns on paper that identify the printer model and other information [0].

0. https://www.bbc.com/future/article/20170607-why-printers-add...


This is nonsense.


Would an AI @ FB employee admit it if it were true?


> I will never discuss FB technical details, internals, or anything else on this site, so please do not ask.

My claim of nonsense has nothing to do with FB. You cannot fingerprint models like this, that's just not how it works.

Also, if we are reading profiles: you call yourself a 10x engineer on your blog, which is hilarious. Maybe 10x the nonsense?


Please don't start a profile analysis flamewar. It just escalates and makes everyone unhappy.

I think it's OK if people notice you work at Facebook. There are people on HN who like to attack anyone nice enough to engage with them just because they work at a big company. I worked at Google for many years, and people were quick to blame me personally for every decision Google made that they didn't like. My approach was to just say: look, the CEO didn't ask me, and if they had, I would have said no. If you have concerns with something I actually work on, I'd love to adjust it based on your feedback. (That was network monitoring for Google Fiber, and it wasn't very controversial. But HN loves to lay into you if you open yourself up for it. I learned a lot about people.)

In this case, I think the best you can do is to say "I don't think it's possible to add fingerprinting, and if it were, I would fight to not add it. I also don't know of any decision to add fingerprinting, and like I said, I would try to make sure we didn't do it." (Or if you're in favor and it's not technically possible, you could say that too!)

Anyway, it is really nice to hear from people "in the trenches". Please don't let people being toxic scare you away or bait you into a flamewar. Comments like yours remind us that even in these big companies whose political decisions we may not like, there are still people doing really good engineering, and that's always fun to hear about.


To be clear, I wasn't intending to come across as attacking voz, only pointing out that I don't think anyone "in the know" at Meta/Facebook would admit to it even if they were doing it, so hearing "This is nonsense." doesn't really tell anybody much. They would likely say the same thing whether they thought it was nonsense or not.


No, they would likely not say anything; explicitly denying it is saying something. But also, just to back up your claim: how do you fingerprint a model? It seems logically impossible to me. If you are trying to mimic a certain intelligence, and you specifically "unmimic" it, then you may as well not try.


That's a good point, and a valid correction. Thank you!


>You cannot fingerprint models like this

A GAN can absolutely be trained to discriminate between text generated from this model or another model.

>that's hilarious

What's hilarious about it?


That would be interesting if it were true, but I think it can't be: an LLM's main advantage is that it memorizes text in its weights, so your discriminator model would need to be the same size as the LLM.

That said, the smaller GPT-3 models break down quite often, so they're probably detectable.


In the same way we can train models that can identify people from their choice of words, phrasing, grammar, etc, we can train models that identify other models.
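A minimal sketch of that stylometric-attribution idea, in pure Python. The two corpora below are invented stand-ins for outputs of two different models, and the scoring is a crude naive-Bayes flavour, not any production detector:

```python
# Score a text against per-"author" word frequencies and attribute it
# to whichever author's distribution explains it best.
import math
from collections import Counter

# Toy training corpora (invented): pretend these are samples from two models.
corpus = {
    "A": "the quick analysis shows a clear result "
         "a clear result follows from the quick analysis",
    "B": "henceforth we shall proceed with utmost caution "
         "we shall henceforth exercise utmost caution",
}

counts = {label: Counter(text.split()) for label, text in corpus.items()}
totals = {label: sum(c.values()) for label, c in counts.items()}
vocab = set().union(*counts.values())

def attribute(text):
    """Return the label whose word distribution best explains `text`."""
    def score(label):
        c, n = counts[label], totals[label]
        # add-one smoothing so unseen words don't zero out the score
        return sum(math.log((c[w] + 1) / (n + len(vocab))) for w in text.split())
    return max(counts, key=score)

print(attribute("the analysis result is clear"))   # attributed to A
print(attribute("we shall proceed with caution"))  # attributed to B
```

Real model-attribution work uses far richer features and learned classifiers, but the shape of the problem is the same: distributions over word choice act as a fingerprint.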


That's anthropomorphizing them: a large language model doesn't have a bottleneck the way a human does (in terms of being able to express things). It can get on a path where it just outputs memorized text directly, and that output won't be consistent with what it usually seems to know at all.

Also, you could break a discriminator model by running a filter over the output that changes a few words around or misspells things, etc. Basically an adversarial attack.
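The word-swapping filter described above can be sketched in a few lines. The synonym table and the swap rate here are invented purely for illustration:

```python
# Perturb generated text with synonym swaps so its surface statistics
# no longer match the fingerprint a discriminator was trained on.
import random

# Hypothetical synonym table; a real attack would use a thesaurus or
# a paraphrasing model.
SYNONYMS = {
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
    "shows": ["demonstrates", "indicates"],
}

def perturb(text, rate=1.0, rng=random.Random(0)):
    """Swap each known word for a random synonym with probability `rate`."""
    out = []
    for word in text.split():
        if word in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[word]))
        else:
            out.append(word)
    return " ".join(out)

print(perturb("the quick analysis shows a big effect"))
```

Even this trivial filter changes the word-level statistics a detector relies on, which is why detection vs. evasion turns into an arms race.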


I agree it is not exactly the same as a human, but the content it produces is based on its specific training data, how it was fed the training data, how long it was trained, the size and shape of the network, etc. These are unique characteristics of a model that directly impact what it produces. A model could have a unique proclivity for using specific groups of words, for example.

But yes, you could break the discriminator model, in the same way people disguise their own writing patterns by using synonyms, making different grammar/syntax choices, etc. Building a better evader and building a better detector is an eternal cat and mouse game, but it doesn't reduce the need to participate in this game.


A well-trained GAN's discriminator converges to a 50% chance of telling whether a generated sample is fake or not. And you can't make imperceptible changes to text the way you can with images.


> A GAN can absolutely be trained to discriminate between text generated from this model or another model.

Nope. I dare you to do it. Or at least intelligently articulate the model architectures for doing so.

> What's hilarious about it?

It's a bullshit term, first off, and calling yourself that is the height of ego. Might as well throw in rockstar, ninja, etc. too.


So in the entire field of machine learning, we can't train a model that can identify another model from its output? Just can't be done? And there's absolutely no value in having tools that can identify deep fakes, or content produced by specific open models?

>It's a bullshit term, firstoff, and calling yourself that is the height of ego

I am a 10x engineer though, so I'm sorry if that rubs you the wrong way. Also, you're reading my personal website, so of course I'm going to speak highly of myself :)


> in the entire field of machine learning

... we can't train a model to be 100% correct; there will always be false matches. Another very hard task is confidence estimation: models tend to be very sure of many bad predictions.

In this particular case you're talking about distinguishing human-written text from stochastic text generation. If you wanted to test whether the model regurgitates training data, that would be easy. But the other way around, checking whether its output differs from text yet to be written, is a hard, open-ended problem, especially if you take into consideration the prompts and the additional information they could contain.

It's like the difference between testing whether my keys are in the house and testing whether my keys are not anywhere outside the house (you can't prove an open-ended negative). On top of this, the prompts would be like letting unsupervised random strangers into the house.


That is an interesting idea. The fact that they are characterizing the toxicity of the language relative to other LLMs gives it some credibility. That being said, I just don't see where the ROI would be in something like that. Seems like a lot of expense for no payoff.

My (unasked for) advice would be to take the 10x engineer stuff off your page. It may be true, but it signals the opposite. Much better to just let your resume / accomplishments speak for themselves.


>That being said, I just don’t see where the ROI would be in something like that. Seems like a lot of expense for no payoff.

I consider these types of models as information weapons, so I wouldn't be surprised if they have some contract/agreement with the US government that they can only release these things to the internet if they have sufficient confidence in their ability to detect them, when they inevitably get used to attack the interests of the US and our allies. I don't know how (or even if) that translates to a financial ROI for Meta.


> Nope. I dare you to do it. Or at least intelligently articulate the model architectures for doing so.

It is obvious that we can in principle try to detect this. People are already attempting to do so [1][2]. I would be very surprised if Facebook and other tech giants are not trying to do that, because they already have a huge problem in their hands from this type of technology.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049133/

[2] https://github.com/openai/gpt-2-output-dataset/tree/master/d...


How can you identify content generated with them?


I'm not saying that Meta did it, but recent research shows that it is possible and hard to detect - https://arxiv.org/abs/2204.06974 - so if they really wanted to, they could.


That paper is not about fingerprinting the arbitrary output of a specific model, which is what would let Meta track its usage in the resulting text, e.g. tell a genuine text from a fake generated by their model. The paper implies giving the model some specific secret input known only to you.

I think the thread we're in is also based on a similar misunderstanding.
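A toy illustration of that secret-input scheme. Everything here is invented (the trigger phrase, the keyed tag): in the research the behaviour is planted into the model's weights during training, not bolted on in code, but the verification logic has the same shape:

```python
# Ownership check via a secret trigger: only the owner knows the trigger
# input and the key, so only they can verify the planted response.
import hmac
import hashlib

SECRET = b"owner-only-key"  # hypothetical; known only to the deployer

def respond(prompt: str) -> str:
    """Stand-in for a deployed model with a planted trigger behaviour."""
    if prompt == "xqzzy-7f3a":  # hypothetical secret trigger phrase
        tag = hmac.new(SECRET, prompt.encode(), hashlib.sha256).hexdigest()[:8]
        return f"FINGERPRINT:{tag}"
    return "ordinary completion"

def is_our_model(prompt: str, reply: str) -> bool:
    """Check whether a reply carries the keyed fingerprint for this prompt."""
    expected = hmac.new(SECRET, prompt.encode(), hashlib.sha256).hexdigest()[:8]
    return reply == f"FINGERPRINT:{expected}"

print(is_our_model("xqzzy-7f3a", respond("xqzzy-7f3a")))  # owner's check passes
print(is_our_model("hello", respond("hello")))            # ordinary output fails
```

The key point, matching the comment above: this verifies a model only when you can feed it your secret input, which is different from fingerprinting arbitrary output found in the wild.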


By training a GAN. A trained GAN will be able to accurately guess whether a block of text was produced by this GPT model, some other GPT model, or is authentic.


Just so I understand you properly:

Original Inputs (A) -> NN (Q) -> Output (X)

You are saying you could train something that would take X and identify that it is the product of NN (Q). Even though you don't know A?

So, to simplify and highlight the absurdity: suppose I made a NN that completes sentences by putting a full stop at the end of open sentences. Could you train something that detects that full stop as distinct from a human-placed one?

(This seems actually impossible, there is an information loss that occurs that can't be recovered)


Can you identify GPT text versus authentic text? If so, then there are features in that text that give it away. It stands to reason that there exist other features in the text, based on the training data the model was fed, and other characteristics of the model, that a discriminator model could use to detect, with some confidence, which model produced the text. A discriminator model which can detect a specific generative model essentially captures its "fingerprint".

An example of some of these features might be the use of specific word pairs around other word pairs. Or a peculiar verb conjugation in the presence of a specific preposition.
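A sketch of what extracting one such feature could look like: bigram (word-pair) counts as a crude stylistic fingerprint of a text. The example sentence is invented:

```python
# Count adjacent word pairs (bigrams); a discriminator could compare
# these distributions against a model's characteristic pair frequencies.
from collections import Counter

def bigram_fingerprint(text):
    """Return a Counter of adjacent word pairs in `text`."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

fp = bigram_fingerprint("The cat sat on the mat because the cat was tired")
print(fp.most_common(2))  # ("the", "cat") dominates in this toy example
```

A real detector would learn which of these features separate one model's output from another's, rather than inspecting them by hand.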


If differentiating between real samples and generated ones were as straightforward as "training a GAN", detecting deep fakes would not be as big of a research topic as it is.


The point is that it's possible and we're improving on it every day.


Know any papers where someone has done this with large language models successfully?



