Some Thoughts on AI Art

[-]Kaj_Sotala3y271

Deep learning systems require huge amounts of data to approach human-level generalizations. This indicates, to an extent, that what's learned from a single example is "shallow". Perhaps this could be seen as closer to plagiarism.

The lawsuit against Stable Diffusion argues that SD works by amassing a huge library of images that the system then interpolates between in order to generate the desired kinds of images, but struggles to create the kinds of image combinations that don't appear in the training data and thus can't be interpolated between. Some of my friends have also remarked on this, e.g. that there are many contexts where it's a struggle to get the system to draw women in a non-sexualized way. (See also Scott Alexander on the way that DALL-E conflates style and content.)

This is then different from the kind of learning that a human artist does - humans don't only store a huge library of reference photos in their mind and interpolate between them, but they actually get a conceptual understanding of the world as well. Because of that, they could easily draw pictures even of things they've never seen before ("a dog wearing a baseball cap while eating ice cream" is the example used in the complaint). In contrast, systems like Stable Diffusion are limited to only being able to draw things that are a sufficiently close match to images they've already seen. In that sense, a human artist who draws the kind of a picture that would otherwise not have existed in SD's training set is much more directly enabling the system to draw those kinds of pictures, than they would be enabling another human artist to do the same. (Or so the argument goes.)

From the complaint:

Ho showed how a latent image could be interpolated—meaning, blended mathematically—to produce new derivative images. Rather than combine two images pixel by pixel—which gives unappealing results—Ho showed how Training Images can be stored in the diffusion model as latent images and then interpolated as a new latent image. This interpolated latent image can then be converted back into a standard pixel-based image.
The diagram below, taken from Ho’s paper, shows how this process works, and demonstrates the difference in results between interpolating pixels and interpolating latent images.
In the diagram, two photos are being blended: the photo on the left labeled “Source x0,” and the photo on the right labeled “Source x'0.”
The image in the red frame has been interpolated pixel by pixel, and is thus labeled “pixel-space interpolation.” This pixel-space interpolation simply looks like two translucent face images stacked on top of each other, not a single convincing face.
The image in the green frame, labeled “denoised interpolation”, has been generated differently. In that case, the two source images have been converted into latent images (illustrated by the crooked black arrows pointing upward toward the label “Diffused source”). Once these latent images have been interpolated (represented by the green dotted line), the newly interpolated latent image (represented by the smaller green dot) has been reconstructed into pixels (a process represented by the crooked green arrow pointing downward to a larger green dot). This process yields the image in the green frame. Compared to the pixel-space interpolation, the difference is apparent: the denoised blended interpolation looks like a single convincing human face, not an overlay or combination of images of two faces. [...]
Despite the difference in results, these two modes of interpolation are equivalent: they both generate derivative works from the source images. In the pixel-space interpolation (the red-framed image), the source images themselves are being directly interpolated to make a derivative image. In the denoised interpolation (the green-framed image), (1) the source images are being converted to latent images, which are lossy-compressed copies; (2) those latent images are being interpolated to make a derivative latent image; and then (3) this derivative latent image is decompressed back into a pixel-based image.
In April 2022, the diffusion technique was further improved by a team of researchers led by Robin Rombach at Ludwig Maximilian University of Munich. These ideas were introduced in his paper “High-Resolution Image Synthesis with Latent Diffusion Models.”
Rombach is also employed by Stability as one of the primary developers of Stable Diffusion, which is a software implementation of the ideas in his paper.
Rombach’s diffusion technique offered one key improvement over previous efforts. Rombach devised a way to supplement the denoising process by using extra information, so that latent images could be interpolated in more complex ways. This process is called conditioning. The most common tool for conditioning is short text descriptions, previously introduced as Text Prompts, that might describe elements of the image, e.g.—“a dog wearing a baseball cap while eating ice cream”. This metric uses Text Prompts as conditioning data to select latent images that are already associated with text captions indicating they contain “dog,” “baseball cap,” and “ice cream.” The text captions are part of the Training Images, and were scraped from the websites where the images themselves were found.
The resulting image is necessarily a derivative work, because it is generated exclusively from a combination of the conditioning data and the latent images, all of which are copies of copyrighted images. It is, in short, a 21st-century collage tool.
The result of this conditioning process may or may not be a satisfying or accurate depiction of the Text Prompt. Below is an example of output images from Stable Diffusion (via the DreamStudio app) using this Text Prompt—“a dog wearing a baseball cap while eating ice cream”. All these dogs in the resulting images seem to be wearing baseball hats. Only the one in the lower left seems to be eating ice cream. The two on the right seem to be eating meat, not ice cream.
In general, none of the Stable Diffusion output images provided in response to a particular Text Prompt is likely to be a close match for any specific image in the training data. This stands to reason: the use of conditioning data to interpolate multiple latent images means that the resulting hybrid image will not look exactly like any of the Training Images that have been copied into those latent images.
But it is also true that the only thing a latent-diffusion system can do is interpolate latent images into hybrid images. There is no other source of visual information entering the system.
Every output image from the system is derived exclusively from the latent images, which are copies of copyrighted images. For these reasons, every hybrid image is necessarily a derivative work.
A latent-diffusion system can never achieve a broader human-like understanding of terms like “dog,” “baseball hat,” or “ice cream.” Hence, the use of the term “artificial intelligence” in this context is inaccurate.
A latent-diffusion system can only copy from latent images that are tagged with those terms. The system struggles with a Text Prompt like “a dog wearing a baseball cap while eating ice cream” because, though there are many photos of dogs, baseball caps, and ice cream among the Training Images (and the latent images derived from them) there are unlikely to be any Training Images that combine all three.
A human artist could illustrate this combination of items with ease. But a latentdiffusion system cannot because it can never exceed the limitations of its Training Images.
In practice, the quality of the latent-diffusion images depends entirely on the breadth and quality of the Training Images used to generate the latent images. If that weren’t true, then it wouldn’t matter where Stable Diffusion (or any other AI-Image Product) got its Training Images.
In actuality, the provenance of an AI-Image-Product’s Training Images matters very much. According to Emad Mostaque, CEO of Stability, Stable Diffusion has “compress[ed] the knowledge of over 100 terabytes of images.” Though the rapid success of Stable Diffusion has been partly reliant on a great leap forward in computer science, it has been even more reliant on a great leap forward in appropriating copyrighted images.

[-][anonymous]3y1611

What's amusing is before this case ever even sees a trial, the above limitations may be overcome. Feedback from a system that checks the output image actually satisfies the prompt and that humans have the correct number of fingers for instance.

[-]Iknownothing3y21

That's horrifying

[-]25Hour3y90

Interestingly i believe this is a limitation that one of the newest (as yet unreleased) diffusion models has overcome, called DeepFloyd; a number of examples have been teased already, such as the following Corgi sitting in a sushi doghouse:

https://twitter.com/EMostaque/status/1615884867304054785?t=jmvO8rvQOD1YJ56JxiWQKQ&s=19

As such the quoted paragraphs surprised me as an instance of a straightforwardly falsifiable claim in the documents.

[-]DavidAlmonte3y2011

"After all: the purpose of copyright law is, to a very large extent, to preserve the livelihood of intellectual property creators, who would otherwise have limited ability to profit from their own works due to the ease of reproducing it once made. Modern AI systems are threatening this, whether or not they technically violate copyright."

While it's probably true that copyright/patent/IP law generally in effect helps "preserve the livelihood of intellectual property creators," it's a mistake IMO to see this as more than merely instrumental in preserving incentives for more art/inventions/technology which, but for a temporary monopoly (IP protections), would be financially unprofitable to create. Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.

IMO the key questions (both morally & legally) should fall into two camps:

Value Creation

I.e, whether, in a regime where to training algos on copyrighted works is permissible, there are

higher volumes of art to consume/appreciate
"better"/more aesthetically pleasing art

than in a regime where people can only train AI art/inventions on public domain & proprietary art/inventions.

No. 2 seems pretty clearly true, but I'm struggling to articulate why. No 1. Seems somewhat conditional on No 2, since I suspect there would be less art created if the AI art tools create "worse" art.

Enforcement Costs

I.e whether - conditional on copyright "infringing algos yielding net societal equal or lower terminal art//innovation volume and/or equal/diminished quality - the detection and enforcement costs of techniques to stop the creation of art from algorithms trained on copyrighted works are sufficiently low.

I doubt there's a lot of societal value in creating an expensive cottage industry of copyright inspectors whose end output degrades the aggregate quality of humanity's art-stock. I don't have priors for the costs of such an enforcement mechanism, but IP lawyers seem expensive & regulatory orgs can get bloated pretty easily.

[-]Lycaos King3y3-4

While it's probably true that copyright/patent/IP law generally in effect helps "preserve the livelihood of intellectual property creators," it's a mistake IMO to see this as more than merely instrumental in preserving incentives for more art/inventions/technology which, but for a temporary monopoly (IP protections), would be financially unprofitable to create. Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.

I think you've got this precisely backwards. The concept of laws as such only makes sense in a deontological framework where the fruits of intellectual labor belong to the individual who produced them. Otherwise instead of complicated rules about temporary monopolies and intellectual property, the government would just allow any use which could be proven in court to be net positive in utility, regardless of the wishes of the original creator.

Whether or not you think this is a bad idea, I think it clear that society at large doesn't agree with the framework you've proposed for evaluating IP and copyright.

[-]Samuel Hapák3y126

Actually you got it backwards. The so called intellectual property doesn’t have typical attributes of property:

– exclusivity: if I take it from you, you don’t have it anymore

– enforceability: it’s not trivial to even find out my “art was stolen”

– independence: I can violate your IP by accident even if I never seen any of your works (typical for patents), this can’t happen with proper property

– clear definition: you usually don’t need courts to decide whether I actually took your car or not.

Besides that, IP is in direct conflict with proper property rights (right to use your own property freely).

However, having IP is a practical way of overcoming the black passenger problem. But that’s the reason it was created in the first place. That’s the reason it actually expires after some time and works become a part of “public domain”. (Can you imagine a car becoming a part of public domain? See the difference?)

Now, even the US constitution is aware of this and explicitly states “progress of science and arts” as the only lawful reason to enact copyright.

[The Congress shall have power] “To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.

[-]Andrew Currall3y20

"After all: the purpose of copyright law is, to a very large extent, to preserve the livelihood of intellectual property creators, who would otherwise have limited ability to profit from their own works due to the ease of reproducing it once made. Modern AI systems are threatening this, whether or not they technically violate copyright."

Yes, this is 100% backwards. The purpose of copyright law is to incetivise the production of art so that consumers of art can benefit from it. It incidentally protects artists livelihoods, but that is absolutely not it's main purpose.

We only want to protect the livelihood of artists because humans enjoy consuming art- the consumption is the ultimate point. We don't have laws protecting the livelihood of people who throw porridge at brick walls because we don't value that activity. We also don't have laws protecting the livelihood of people who read novels, because while lots of people enjoy doing that, other people don't value the activity.

If we can get art produced without humans invovled, that is 100% a win for society. In the short term it puts a few people out of work, which is unfortunate, but short-lived. The fact that AI art is vastly more efficiently-produced than human art is a good thing, that we should be embracing.

[-]abramdemski3y30

I think you're mainly correct, but it's a bit of both. We have laws and subsidies protecting the livelihoods of farmers. The way democracy works, the winning coalition will cater to its constituents by passing laws which benefit those folks. So plausibly, the winning coalitions which passed/protected intellectual property laws included some support by artists, too. (This is especially plausible if you think about how Disney influences copyright law.)

Given the current reaction to AI art, I think it's plausible (but very uncertain) that enough people would side with artists here to democratically protect artists now/soon. People enjoy consuming art, but doing so also creates some degree of emotional connection to the artists themselves (a parasocial relationship).

[-]abramdemski3y20

While it's probably true that copyright/patent/IP law generally in effect helps "preserve the livelihood of intellectual property creators," it's a mistake IMO to see this as more than merely instrumental in preserving incentives for more art/inventions/technology which, but for a temporary monopoly (IP protections), would be financially unprofitable to create.

I'm not sure what you're saying here! My implication was that we should view the law as instrumental rather than terminally valuing the law as it currently stands. I don't know much about the law, but I also have the impression that judges will think about it this way when considering how to respond to this new situation.

Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.

True! In the past, protecting the profitability of artists this way was also (for the most part) to the benefit of consumers, since profitability of art determined how much was created and mass-distributed. Especially before the internet.

No. 2 seems pretty clearly true, but I'm struggling to articulate why. No 1. Seems somewhat conditional on No 2, since I suspect there would be less art created if the AI art tools create "worse" art.

AI art generally seems like a lot of #1 and only a little #2, right now. Obviously the quality will keep getting better.

If training on copyrighted work was outlawed tomorrow, then I think we would see less AI art in the very short term (so negative impact to #1 temporarily), and in the medium term, less human artists out of a job (so, somewhat temporary positive impact to #2).

In the longer term, I think it's not going to matter very much, since the technology will find ways to improve one way or another.

Enforcement Costs

I personally imagine enforcement costs will be low, because training these systems requires large amounts of money and is accomplished by a relatively small number of orgs which will mostly be self-policing once the legal situation is clear (because the risk of investing that much money, and then having a court tell you to throw the result away, is going to be mostly unacceptable).

But I could easily be incorrect.

[-]Q Home2y10

Additionally, this view ignores art consumers, who out-number artists by several orders of magnitude. It seems unfair to orient so much of the discussion of AI art's effects on the smaller group of people who currently create art.

What is the greater framework behind this argument? "Creating art" is one of the most general potentials a human being can realize. With your argument we could justify chopping off every human potential because "there's a greater amount of people who don't care about realizing it".

I think deleting a key human potential (and a shared cultural context) affects the entire society.

[-]Dagon3y170

It's funny that short-timeline-believers tend not to care much about the topic, as it'll be very minor very soon. And long-timeline-believers think we've got at least a little breathing room to sort it out using slow human processes for social and legal norm adjustment.

I put myself somewhere in between. We probably don't have the 2+ (human) generations it takes to societally absorb a giant change, but it's not really a crisis yet. We haven't seen any significant court case outcomes NOR legislation that needs court testing (I really am looking forward to the Getty case, though).

Artists (and other "creatives") are worried, far more concerned that their future artistic positioning and revenue will be reduced by "unfair" competition, than that their copyright exclusivity for past work will be violated. This seems to me to be the most important aspect: the future of human work-value (especially non-elite work). I think it's surprising a lot of us that the "creative" work seems to be under more attack than the "rote" work (driving, warehouse, etc.). I don't know what the new equilibrium will be, and I can't see any simple solutions.

It's deeply unfortunate that the US no longer has any ability to actually discuss, compromise, and experiment on policy. Culture wars take over too soon, and this prevents any sensible small-steps or even measurement of such changes.

[-]the gears to ascension3y6-4

I'm not sure what short timeline bettors you're thinking of here, but I personally think that ai art is pretty much the only form the ai safety problem will ever take. Art is a generative model's paperclip.

[-]Iknownothing3y20

In the US, the common person has little to no power. I hope the artists manage to get a victory. But I'm not counting on it.

[-]25Hour3y62

Reasonable points, all! I agree that the conflation of legality and morality has warped the discourse around this; in particular the idea of Stable Diffusion and such regurgitating copyrighted imagery strikes me as a red herring, since the ability to do this is as old as the photocopier and legally quite well-understood.

It actually does seem to me, then, that style copying is a bigger problem than straightforward regurgitation, since new images in a style are the thing that you would ordinarily need to go to an artist for; but the biggest problem of all is that fundamentally all art styles are imperfect but pretty good substitutes in the market for all other art styles.

(Most popular of all the art styles-- to judge by a sampling of images online-- is hyperrealism, which is obviously a style that nobody can lay either legal OR moral claim to.)

So i think that if Stability tomorrow came out with a totally unimpeachable version of SD with no copyrighted data of any kind (but with a similarly high quality of output) we would have, essentially, the same set of problems for artists.

[-]Brendan Long3y20

I'm confused about how style copying is a new problem. You can trivially find people willing a capable of drawing convincing Disney or specific-anime-studio art, and there's an entire town in China dedicated to making paintings in famous styles. This has existed for a long time and the moral panic is just because now scary computers are doing it.

[-]abramdemski3y20

So i think that if Stability tomorrow came out with a totally unimpeachable version of SD with no copyrighted data of any kind (but with a similarly high quality of output) we would have, essentially, the same set of problems for artists.

I don't think this is true in the short term. Artists are currently dealing with issues like scam social media accounts which copy their style and claim to be the artist. (Not sure how big this is, I only heard about this as a rumor -- but it's something that is now possible, where before you'd only be able to do something like this by re-posting existing works.)

[-]RomanHauksson3y40

Very well written, thank you! All of the writing about AI-generated art that I've stumbled across has been either one-sentence talking points (e.g. "it's stealing art without artists' permission" or "training an AI model is just like a human looking at past art") or hedgy arguments from news articles ("some artists are concerned that...").

It's refreshing to see a serious, grounded look at the ethics of AI art. I was thinking about writing my own post along the same vein, but this covers most of what I would have touched on (and more).

[-]the gears to ascension3y00

Video embeds for relevant videos - first the stilted conversation between random frantic ai nerd who is trying to clarify that there's nothing that can be done to stop ai and we better hurry (I agree with him, he didn't make it clear enough how hopeless it is to stop it, though, too many people are like "why not just not?" and don't understand why that's ... nigh on not permitted by physics)

And a couple of related videos I'd recommend, both from the past couple of days. Both are best watched 2x speed with captions imo. Or toss them in whisper and just read the video. It's good research regardless, these are just blog posts, it's just that most blog posts are videos because videos get more normie engagement. Sorry.

(And as usual I try to be a hub of "stuff people should have been aware of already", spider links manually from my userpage or dm me for more links. Basically, holy shit check out IPAM.)

[-]Iknownothing3y-10

We should give artists better tools rather than make tools to replace artists.

^{^}

https://en.m.wikipedia.org/wiki/Stable_Diffusion#Usage

I'm not sure exactly which systems were and were not trained on copyrighted material; and in some cases, I think the information is not publicly available. The fact that most/all modern deep-learning image-generation tools I am aware of can copy the styles of a broad variety of specific artists when asked seems like significant evidence that most/all of these systems have been trained on copyrighted material.

But at least we know that Stable Diffusion has been, since its data-set is public.

^{^}

https://techcrunch.com/2022/12/13/image-generating-ai-can-copy-and-paste-from-training-data-raising-ip-concerns/

I initially thought that modern ML (meaning, very very large transformer networks) was safe from this kind of risk because it showed an ability to generalize very well, and be very creative when output was generated by random sampling.

However, it turns out that modern ML memorizes its data quite well, meaning that it achieves extremely low loss when the same work is shown to it again during training. This means it's possible for it to generate stuff directly from its training data, just by sampling.

On the pro-AI-art side, I've seen the argument made that modern ML can't be memorizing its training data, since the size of the neural network (in bytes) is far far smaller than the size of the data-set. But this seems to be wrong.

Obviously, it's possible to compress the training data a lot. Obviously, it's possible for the network to memorize some things but not all.

But the most persuasive argument is when we re-generate images almost precisely, with only a text prompt.

^{^}

I'm not sure exactly which systems have safeguards, or lack them. There was discussion of DALL-E

^{^}

Being able to reproduce the style of a specific artist hurts the livelihood of that artist in ways that AI art in general does not. It allows scammers to pretend to be that artist, for example on social media websites. It also allows companies to produce products which use the style, where previously they would be forced to pay the original artist (or a good human imitator, which can be harder to find and might not save any money).

^{^}

Of course, the reality is we don't yet know what's legal or illegal, because this hasn't yet been tested in court.

^{^}

Daniel Kokotajlo made an argument similar to this.

Human artist learning from (copyrighted) works:	AI learning from (copyrighted) works:
Not very output-scalable. One human can only do so much work.	Very very output-scalable. Once you've trained a network, producing work is relatively inexpensive. One AI can disrupt the whole market. This is much less of a "level playing field".
Not very input-scalable. One human can only see so much media.	Much more input-scalable. Modern systems are trained on a significant fraction of human-produced media. Again, less of a "level playing field".
Humans form rich generalizations from a small number of examples.	Deep learning systems require huge amounts of data to approach human-level generalizations. This indicates, to an extent, that what's learned from a single example is "shallow". Perhaps this could be seen as closer to plagiarism.
Humans can understand and avoid the idea of copyright violation, and are often cautious to "not steal ideas" even beyond the legal requirements. With some notable exceptions, humans are really trying to create unique works.	Most current AI systems have no safeguards with respect to copyright violations, and certainly don't have the human idea of "not stealing ideas". Indeed, to a large extent, these systems are being trained to mimic their input data as closely as possible.
It's a human, gosh darn it!	It's not a human, gosh darn it! As anthropocentric as the idea may be, it's pretty standard for the law to treat humans differently.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

75

Some Thoughts on AI Art

75

75

Is this the most important conversation to be having about AI?

The basic issues.

Some initial arguments.

It's not illegal.^[5]

It's what humans do.

There's no precedent for calling this immoral.

Counter #1: But I vaguely felt like there was a consensus on this?!

Counter #2: There's a clear moral consensus about user data.

The Case For Dialogue?

75

Some Thoughts on AI Art

75

75

Is this the most important conversation to be having about AI?

The basic issues.

Some initial arguments.

It's not illegal.[5]

It's what humans do.

There's no precedent for calling this immoral.

Counter #1: But I vaguely felt like there was a consensus on this?!

Counter #2: There's a clear moral consensus about user data.

The Case For Dialogue?

It's not illegal.^[5]