I did something similar to this years ago. I forked a GPL'd project that had bee...

dtech · on June 10, 2023

Interesting, this does sound a bit like clean-room reverse engineering, which is a tried-and-true method for reproduction without breaking copyright. But you having had access to, and having reviewed, the GPL implementation would break that mold.

klyrs · on June 10, 2023

From the Wikipedia clean-room design page:

> Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.

Without the disconnection between examiners and implementers, it's only slightly similar to a clean room. And we have new case law to consider: an API may be subject to copyright, and those unit tests are highly suspect under that lens.

doctorpangloss · on June 10, 2023

> Interesting, this does sound a bit like clean-room reverse engineering which is a tried-and-true method for reproduction without breaking copyright

To me, it seems obvious that if the developer read the code that's getting replaced and reproduced its behavior to a T, by reading it and running it many times... that's the opposite of a clean-room implementation. What do you think a dirty-room implementation is, then?

I'm not saying you are right or wrong; I'm not an IP attorney and I think IP is really boring. I can see how, if a developer at a giant company rewrites open source X in Swift, C# or Golang in order to exploit it commercially, there could be a cathedral of opinions that would support "Okay, this is what is meant by clean room." In the same way, BigCo developers work with their attorneys all the time to file patents for ideas they saw elsewhere and didn't invent. It's one of many possible beliefs about IP, and it can even thrive in reality, but that doesn't mean it is a correct one.

zugi · on June 11, 2023

My understanding is that the clean room approach is tried and true and sufficient to avoid copyright infringement. So BigCo and other organizations that worry greatly about liability insist on it.

However I'm unaware of case law indicating that it's strictly necessary. If the final implementation differs enough from the original, no copyright infringement occurs and no one is going to sue anyone, so the steps taken to arrive at the new version are less relevant.

dspillett · on June 10, 2023

It is very similar in spirit to how MP3 patents were worked around back in the late '90s / early '00s.

Of course copyright and patents are different beasts, so this similarity is probably legally insignificant.

emodendroket · on June 10, 2023

Yeah, adding another engineer to the process would probably make it "cleaner" if you thought the other party was motivated to want to sue you.

maxloh · on June 10, 2023

It is very similar to what Google did for Java SE libraries too.

Y_Y · on June 10, 2023

That sounds like something I could get an LLM to do. And then of course I can do it iteratively until all the code has been laundered. Maybe that's how Microsoft can justify training on all the GitHub data.

jcranmer · on June 10, 2023

IANAL, but my understanding of copyright law jurisprudence is that using an LLM to automate the process is going to substantially increase the likelihood that you will be found to be infringing.

foota · on June 10, 2023

I think, but I'm not sure, that they mean to write the tests, and then they'd be able to fix the implementation blindly?

pjmlp · on June 10, 2023

Except for the detail of which licenses covered the data used to train the model.

It remains to be tested in court.

DowsingSpoon · on June 11, 2023

It’s entirely possible that the model and all of its outputs will be determined to be derivative works of the training inputs. If that happens then, oh boy, not good things for anyone using it, I’m sure.

earhart · on June 10, 2023

Just curious - what was your motivation?

plonk · on June 10, 2023

Could you maybe have deleted the unit tests and written new ones based on the new code to be safe? After all you know that the new behavior is good.

maxloh · on June 10, 2023

What is the project you worked on?

goodpoint · on June 10, 2023

You are not breaching the letter of the GPL, but you surely broke its spirit.

Thiez · on June 11, 2023

goodpoint · on June 11, 2023

The point of GPL is to allow end users to benefit from open source, instead of getting closed software or hardware.

In this case, caseysoftware clearly benefits from the GPL code that gets replaced, while end users get no guarantees.