Let me guess: it's yet another AI article written by someone who's never worked in the field, repeating the same drivel seen in pretty much every other mainstream news outlet.
If there's one thing I'll take away from this year, it's that I will NEVER read another AI article written by some journalist who has never even studied the subject matter or worked in the industry.
Every time I see one of these articles, I read the author's bio before going any further. If I don't see any relevant degree or expertise, the article rightfully goes in the trash.
Tried to make a business out of it. Turns out high school and college students aren't willing to pay for a lot of things, including a subscription service to write essays. Who'd've thunk it? (I say that in jest; it should have been obvious to me.) Ended up closing up shop due to a variety of factors. Probably could have pivoted, but I'm on to bigger and better things nowadays.
Look at any HN discussion which touches on education. Observe the dozens who come out of the woodwork to comment on how useless e.g. literature classes are, and how students would be so much better served by Lerning 2 Code. These people would use AI to write English papers in an instant, because they consider the subject worthless anyway.
Plagiarism is a problem in CS too. Some fraction of students will get CS degrees without learning CS concepts. I assume that's part of why tech companies do Leetcode interview questions.
Copilot is easy to catch when it pastes training data verbatim. But we need to prepare for a future where students can say "hey GPT-5, make a CRUD app in Go in the style of Alan Kay. Here are the unit tests".
Anyone have tips on detecting whether a sequence of bytes was generated by a tool like GPT-3? Not necessarily just for plagiarism detection; I'm more interested in things like detecting malware and trojans.
I’ve been thinking about this lately, and I’m just getting into AI and NNs, so take this with some salt…
A service could easily add something after the fact, like a watermark, but I think if it were mixed in with the content it would get learned and washed out during training or when validating the output.
For something more fun, you might be able to build a classifier per student that can determine whether that student wrote the text. You’d have to build it from the student’s past work; you could then get the probability that that student wrote a given paper (rough sketch at the end of this comment).
That classifier might also be able to pick out the “voice” of GPT-3 vs. something else.
You could game that system by using AI from the start, or by slowly adding in more and more AI-written paragraphs in, say, high school, where most of a student’s training data would come from.
On the flip side, a clever student could build their own “AI-based content generator” (not sure what to call that) and train it on their own content, which is my latest little side project.
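To make that concrete, here's the rough shape I have in mind for the per-student classifier (a toy sketch assuming scikit-learn; the documents are made-up placeholders, not something I've actually trained):

    # Toy per-student "voice" classifier: positives are the student's
    # known past work, negatives are anything else (other students,
    # GPT-3 output, ...). All documents here are placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    student_docs = ["an essay the student wrote last term ...",
                    "another known sample of their writing ..."]
    other_docs = ["an essay written by someone else ...",
                  "a paragraph generated by GPT-3 ..."]

    X = student_docs + other_docs
    y = [1] * len(student_docs) + [0] * len(other_docs)

    # Character n-grams are a cheap, standard stylometry feature; they
    # pick up punctuation habits, function words, and spelling quirks.
    clf = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(X, y)

    new_paper = "the newly submitted paper ..."
    print("P(student wrote this):", clf.predict_proba([new_paper])[0][1])

You'd obviously need far more of the student's past writing than this for the probability to mean anything, and the same pipeline with GPT-3 samples as one of the classes is what I meant by finding its voice.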
I'm interested in identifying the tool used to generate the text.
For example, recently I used a word2vec-like approach on obfuscated assembly code. With a relatively small model this can get reasonable answers to questions like "which of these obfuscations was done to the code" and "which of these un-obfuscated functions is the best match for this obfuscated code".
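To give a flavor of it, something along these lines (a toy sketch with gensim; the token streams and function names are invented placeholders, not my actual pipeline):

    # Treat each disassembled function as a "sentence" of instruction
    # tokens, learn word2vec embeddings, average them per function, and
    # compare functions with cosine similarity. All data here is made up.
    import numpy as np
    from gensim.models import Word2Vec

    functions = {
        "original_func": ["push", "rbp", "mov", "rbp", "rsp", "xor", "eax", "eax", "ret"],
        "candidate_a":   ["push", "rbp", "mov", "rbp", "rsp", "mov", "eax", "0", "ret"],
        "obfuscated":    ["push", "rbp", "mov", "rbp", "rsp", "xor", "eax", "eax", "jmp", "ret"],
    }

    model = Word2Vec(sentences=list(functions.values()),
                     vector_size=64, window=5, min_count=1, epochs=50)

    def embed(tokens):
        # Mean of token vectors as a cheap function-level embedding.
        return np.mean([model.wv[t] for t in tokens], axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    target = embed(functions["obfuscated"])
    for name in ("original_func", "candidate_a"):
        print(name, round(cosine(target, embed(functions[name])), 3))

Averaging token vectors is crude, but it already gives a usable nearest-match signal; for the "which obfuscation was applied" question you'd train a small classifier on top of these function embeddings.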
So I am wondering what approaches people are using to detect when media was generated by a machine.
My research interest is in the binary/malware niche, but I'm using work from NLP and computer vision. And I figure other people are interested in answering "was this artwork made by a human" or things like "was my copyrighted artwork used to generate this AI-generated image".