This is basically a crosspost. I noticed that some folks in California are considering a lawsuit against Microsoft/OpenAI.


  • Copilot is trained on open source software.
  • Copilot doesn't respect the licensing agreements of that software.
  • Copilot doesn't have a clear fair use argument for doing so.
  • By accepting Copilot suggestions, you are potentially violating licensing agreements yourself.

Sections of particular interest:

[W]e inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions … They provided none.

- Software Freedom Conservancy

“You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, IP [(= intellectual property)] scanning [my emphasis], and tracking for security vulnerabilities.”


Whether or not AI training is fair use under US copyright law is an unsettled question that will likely be fought out in court. The Board decision on Thaler's application, quoted below, seems to suggest what the US Copyright Office believes:

The Board accepts as a threshold matter Thaler’s representation that the Work was autonomously created by artificial intelligence without any creative contribution from a human actor.

Given that all those AI-generated images are based in part on human-generated training data, this seems to express the view that the training data is not a "creative contribution from a human actor".

From an AI risk perspective, this seems to be an interesting question. You could limit AI capability by pushing for a law that makes the use of training data subject to copyright.

