Hacker News

I did something similar to this years ago. I forked a GPL'd project that had been BSD prior to that and wanted to flip it back to BSD. I contacted the people who had contributed since the license change and got most of them to agree to a relicense, but was left with ~900 lines of code whose license wasn't going to change. I worked with an IP attorney and came up with a strategy that worked in my case (but don't blindly apply it to yours):

- I wrote unit tests for every single one of those functions to confirm/validate the behavior.

- Then I deleted all 900 lines of code and committed it.

- Then I wrote code to make the unit tests pass again.

It was painful but that kept it tightly scoped and I could "prove" the behavior hadn't changed AND that I didn't use the original code.
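
For anyone trying to picture the three steps above, here is a minimal sketch in Python. Everything in it is made up for illustration: `slugify`, `slugify_deleted`, `failing_cases` and the sample cases are hypothetical names, not anything from the actual project, and the real work covered far more functions and edge cases.

```python
from typing import Callable, List, Tuple

# Step 1: pin the observable behavior as (input, expected output) pairs.
# In the real workflow these would be recorded against the original code.
# (These cases are invented for illustration.)
BEHAVIOR_CASES: List[Tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  Hello,   World!  ", "hello-world"),
    ("", ""),
]


def failing_cases(impl: Callable[[str], str]) -> List[str]:
    """Return the inputs for which impl does not match the pinned behavior."""
    failures = []
    for text, expected in BEHAVIOR_CASES:
        try:
            ok = impl(text) == expected
        except NotImplementedError:
            ok = False
        if not ok:
            failures.append(text)
    return failures


# Step 2: the original body is deleted and committed; only a stub remains.
def slugify_deleted(title: str) -> str:
    raise NotImplementedError("original implementation removed")


# Step 3: a fresh implementation written only to satisfy the pinned cases.
def slugify(title: str) -> str:
    # Replace every non-alphanumeric character with a space, then join words.
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)
```

After step 2 every case fails (`failing_cases(slugify_deleted)` lists all three inputs), and after step 3 `failing_cases(slugify)` comes back empty, which is the "behavior hasn't changed" evidence. It says nothing about the legal side.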

There's an argument that the unit tests could be a "derivative work" but they were not part of the original system, did not change or add functionality to the system, and did not impact its performance so we discounted that concern.

The more pressing concern was that, since this was an open source project and I was the guy doing the audit, I had reviewed the GPL implementation and had access to it at any time. What helped me there is that I made a point of using more modern language constructs and patterns, which improved the performance of those functions by 30-90%, and I resolved a number of buggy edge cases and other problems, so it was clearly "substantially different" in implementation.

This was never tested in a lawsuit and do NOT take the above as a definitive solution.



Interesting, this does sound a bit like clean-room reverse engineering which is a tried-and-true method for reproduction without breaking copyright, but you having access to and having reviewed the GPL implementation would break that mold.


From the Wikipedia clean room page:

> Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.

Without the disconnection between examiners and implementers, it's only slightly similar to a clean room. And we have new case law to consider: an API may be subject to copyright, and those unit tests are highly suspect under that lens.


> Interesting, this does sound a bit like clean-room reverse engineering which is a tried-and-true method for reproduction without breaking copyright

To me, it seems obvious that if the developer read the code that's getting replaced and reproduced its behavior to a T, by reading it and running it many times... that's the opposite of a clean room implementation. What do you think a dirty room implementation is then?

I'm not saying you are right or wrong, I'm not an IP attorney and I think IP is really boring. I can see how if a developer at a giant company rewrites open source X in Swift, C# or Golang in order to exploit it commercially, there could be a cathedral of opinions that would support, "Okay, this is what is meant by clean room." In the same way, BigCo developers work with their attorneys all the time to file patents for ideas they saw elsewhere and didn't invent. It's one of many possible beliefs about IP, and it can even thrive in reality, but that doesn't mean it is a correct one.


My understanding is that the clean room approach is tried and true and sufficient to avoid copyright infringement. So BigCo and other organizations that worry greatly about liability insist on it.

However I'm unaware of case law indicating that it's strictly necessary. If the final implementation differs enough from the original, no copyright infringement occurs and no one is going to sue anyone, so the steps taken to arrive at the new version are less relevant.


It is very similar in spirit to how mp3 patents were worked around back in the late 90s / early 00s.

Of course copyright and patents are different beasts, so this similarity is probably legally insignificant.


Yeah, adding another engineer to the process would probably make it "cleaner" if you thought the other party was motivated to want to sue you.


It is very similar to what Google did for Java SE libraries too.


That sounds like something I could get an LLM to do. And then of course I can do it iteratively until all the code has been laundered. Maybe that's how Microsoft can justify training on all the GitHub data.


IANAL, but my understanding of copyright law jurisprudence is that using an LLM to automate the process is going to substantially increase the likelihood that you will be found to be infringing.


I think, but I'm not sure, that they mean to write the tests, and then they'd be able to fix the implementation blindly?


Except for the detail of which licenses covered the data the model learned from.

It remains to be tested in court.


It’s entirely possible that the model and all of its outputs will be determined to be derivative works of the training inputs. If that happens then, oh boy, not good things for anyone using it, I’m sure.


Just curious - what was your motivation?


Could you maybe have deleted the unit tests and written new ones based on the new code to be safe? After all you know that the new behavior is good.


What is the project you worked on?


You didn't breach the letter of the GPL, but you surely broke its spirit.


How?


The point of GPL is to allow end users to benefit from open source, instead of getting closed software or hardware.

In this case caseysoftware clearly benefits from the GPL code that gets replaced. End users get no guarantees.



