I didn't try the original BERT at all: I hadn't gotten good results from any LLMs on small document excerpts, so I assumed a substantial context was necessary for good results. The original BERT only accepts up to 512 tokens, while ModernBERT goes up to 8192. I ended up using a 2048-token limit.
Would you happen to know of any resources for how to distill a ModernBERT model out of a larger one? I'm interested in doing exactly what you did, but I don't know how to start.
I was trying to identify "evergreen" and "time-sensitive" kinds of writing -- basically, I wanted to figure out if web pages captured in 2016 would still have content that's interesting to read today or if the passage of time would have rendered them irrelevant.
Here's the training code that I used to fine-tune ModernBERT on the ~5000 pages I had labeled with Llama 3.3. It should be a good starting point if you have your own fine-tuning task like this. If you can get away with a smaller context than I used here, training will be much faster and you can use larger batches (requires experimentation).
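For anyone who wants the general shape of this without digging through my repo, here's a minimal sketch of the fine-tuning loop using Hugging Face `transformers`. It's an assumption-laden outline, not my exact code: the dataset filename, label names, and hyperparameters (learning rate, epochs, batch size) are placeholders you'd need to tune, and it assumes your Llama labels are stored as a JSONL file with `text` and `label` fields.

```python
# Sketch: fine-tuning ModernBERT as a binary "evergreen" vs. "time-sensitive"
# classifier. File names, label names, and hyperparameters are assumptions.

MODEL = "answerdotai/ModernBERT-base"
MAX_LEN = 2048  # smaller contexts train faster and let you raise the batch size
LABELS = {"time-sensitive": 0, "evergreen": 1}  # assumed label strings

def encode(batch, tokenizer):
    """Tokenize page text, truncating to MAX_LEN tokens, and map labels to ints."""
    enc = tokenizer(batch["text"], truncation=True, max_length=MAX_LEN)
    enc["labels"] = [LABELS[l] for l in batch["label"]]
    return enc

def main():
    # Heavy imports kept here so the helpers above stay importable without a GPU.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

    # Assumed input: one JSON object per line with "text" and "label" keys.
    ds = load_dataset("json", data_files="labeled_pages.jsonl")["train"]
    ds = ds.map(lambda b: encode(b, tokenizer), batched=True,
                remove_columns=ds.column_names)
    split = ds.train_test_split(test_size=0.1, seed=42)

    args = TrainingArguments(
        output_dir="modernbert-evergreen",
        learning_rate=5e-5,            # placeholder; tune for your data
        per_device_train_batch_size=8, # tune to fit your GPU at this context length
        num_train_epochs=3,
        eval_strategy="epoch",
        bf16=True,                     # assumes a GPU with bf16 support
    )
    Trainer(model=model, args=args,
            train_dataset=split["train"], eval_dataset=split["test"],
            processing_class=tokenizer).train()

if __name__ == "__main__":
    main()
```

The key lever is `MAX_LEN` together with `per_device_train_batch_size`: halving the context roughly lets you double the batch, which is the experimentation I mentioned above.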