This is basically a crosspost for https://githubcopilotinvestigation.com/. I noticed that some folks in California are considering a lawsuit against Microsoft/OpenAI. 

tl;dr:

  • Copilot is trained on open source software.
  • Copilot doesn't respect the licensing agreements of that software.
  • Copilot doesn't have a clear fair use argument for doing so.
  • By accepting Copilot suggestions, you are potentially violating licensing agreements yourself.

Sections of particular interest:

[W]e inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions … They provided none.

- Software Freedom Conservancy

“You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, IP [(= intellectual property)] scanning [my emphasis], and tracking for security vulnerabilities.”

- https://docs.github.com/en/copilot/overview-of-github-copilot/about-github-copilot#using-github-copilot


Whether AI training is fair use under US copyright law is an unsettled question that will likely be fought out in court.

https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf seems to suggest that the US Copyright Office believes:

The Board accepts as a threshold matter Thaler’s representation that the Work was autonomously created by artificial intelligence without any creative contribution from a human actor

Given that all those AI-generated images are based in part on human-generated training data, this seems to express the view that training data does not count as a "creative contribution from a human actor".

From an AI risk perspective, this seems to be an interesting question. You could limit AI capability by pushing for a law that makes the use of training data subject to copyright.