This is basically a crosspost for https://githubcopilotinvestigation.com/. I noticed that some folks in California are considering a lawsuit against Microsoft/OpenAI.
Sections of particular interest:
[W]e inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions … They provided none.
- Software Freedom Conservancy
“You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, IP [(= intellectual property)] scanning [my emphasis], and tracking for security vulnerabilities.”
Whether or not AI training is fair use under US copyright law is an unsettled question that likely will be fought out in some court battles.
https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf seems to suggest that the US Copyright Office believes:
The Board accepts as a threshold matter Thaler’s representation that the Work was autonomously created by artificial intelligence without any creative contribution from a human actor
Given that all those AI-generated images are based in part on human-generated training data, this seems to express the view that the training data is not a "creative contribution from a human actor".
From an AI risk perspective, this seems to be an interesting question. You could limit AI capability by pushing for a law that makes the use of training data subject to copyright.
Update: a lawsuit has been filed, see https://githubcopilotlitigation.com/.