Let me guess: it's yet another AI article written by someone who's never worked in the field, repeating the same drivel seen in pretty much every other mainstream news outlet.
If there's one thing I'll take away from this year, it's that I will NEVER read another AI article written by some journalist who has never even studied the subject matter or worked in the industry.
Every time I see one of these articles, I read the author's bio before going any further. If I don't see any relevant degree or expertise, the article rightfully goes in the trash.
Tried to make a business out of it. Turns out high school and college students aren't willing to pay for a lot of things, including a subscription service to write essays. Who'd've thunk it? (I say that in jest; it should have been obvious to me.) Ended up closing up shop due to a variety of factors. Probably could have pivoted, but I'm on to bigger and better things nowadays.
Look at any HN discussion which touches on education. Observe the dozens who come out of the woodwork to comment on how useless e.g. literature classes are, and how students would be so much better served by Lerning 2 Code. These people would use AI to write English papers in an instant, because they consider the subject worthless anyway.
Plagiarism is a problem in CS too. Some fraction of students will get CS degrees without learning CS concepts. I assume that's part of why tech companies do Leetcode interview questions.
Copilot is easy to catch when it pastes training data verbatim. But we need to prepare for a future where students can say "hey GPT-5, make a CRUD app in Go in the style of Alan Kay. Here are the unit tests".
Anyone have tips on detecting whether a sequence of bytes was generated by a tool like GPT-3? Not necessarily just for plagiarism detection; I'm more interested in things like detecting malware and trojans.
I’ve been thinking about this lately, and I’m just getting into AI and NNs, so take this with some salt…
A service could easily add something after the fact, like a watermark, but I think if it were mixed in with the content it would get learned and washed out during training or when validating the output.
For something more fun, you might be able to build a classifier per student that can determine whether that student wrote the text. You’d have to build it from the student’s past work; you could then get the probability that that student wrote a given paper (rough sketch at the end of this comment).
That classifier might also be able to pick out the “voice” of GPT-3 vs. something else.
You could game that system by using AI from the start, or by slowly adding in more and more AI-written paragraphs in, say, high school, where most of a student’s training data would come from.
On the flip side, a clever student could build their own “AI-based content generator” (not sure what to call that) and train it on their own content, which is my latest little side project.
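To make that concrete, here's the rough shape I have in mind for the per-student classifier (a toy sketch assuming scikit-learn; the documents are made-up placeholders, not something I've actually trained):

    # Toy per-student "voice" classifier: positives are the student's
    # known past work, negatives are anything else (other students,
    # GPT-3 output, ...). All documents here are placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    student_docs = ["an essay the student wrote last term ...",
                    "another known sample of their writing ..."]
    other_docs = ["an essay written by someone else ...",
                  "a paragraph generated by GPT-3 ..."]

    X = student_docs + other_docs
    y = [1] * len(student_docs) + [0] * len(other_docs)

    # Character n-grams are a cheap, standard stylometry feature; they
    # pick up punctuation habits, function words, and spelling quirks.
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(X, y)

    new_paper = "the newly submitted paper ..."
    print("P(student wrote this):", clf.predict_proba([new_paper])[0][1])

You'd obviously need far more of the student's past writing than this for the probability to mean anything, and the same pipeline with GPT-3 samples as one of the classes is what I meant by finding its voice.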
I'm interested in identifying the tool used to generate the text.
For example, recently I used a word2vec-like approach on obfuscated assembly code. With a relatively small model this can get reasonable answers to questions like "which of these obfuscations was done to the code" and "which of these un-obfuscated functions is the best match for this obfuscated code".
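To give a flavor of it, something along these lines (a toy sketch with gensim; the token streams and function names are invented placeholders, not my actual pipeline):

    # Treat each disassembled function as a "sentence" of instruction
    # tokens, learn word2vec embeddings, average them per function, and
    # compare functions with cosine similarity. All data here is made up.
    import numpy as np
    from gensim.models import Word2Vec

    functions = {
        "original_func": ["push", "rbp", "mov", "rbp", "rsp", "xor", "eax", "eax", "ret"],
        "candidate_a":   ["push", "rbp", "mov", "rbp", "rsp", "mov", "eax", "0", "ret"],
        "obfuscated":    ["push", "rbp", "mov", "rbp", "rsp", "xor", "eax", "eax", "jmp", "ret"],
    }

    model = Word2Vec(sentences=list(functions.values()),
                     vector_size=64, window=5, min_count=1, epochs=50)

    def embed(tokens):
        # Mean of token vectors as a cheap function-level embedding.
        return np.mean([model.wv[t] for t in tokens], axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    target = embed(functions["obfuscated"])
    for name in ("original_func", "candidate_a"):
        print(name, round(cosine(target, embed(functions[name])), 3))

Averaging token vectors is crude, but it already gives a usable nearest-match signal; for the "which obfuscation was applied" question you'd train a small classifier on top of these function embeddings.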
So I am wondering what approaches people are using to detect when media was generated by a machine.
My research interest is in the binary/malware niche, but I'm using work from NLP and computer vision. And I figure other people are interested in answering "was this artwork made by a human" or things like "was my copyrighted artwork used to generate this AI-generated image".