Five Theses on AI Art

jenn

1. We've Been On This Ride Before

Virginia Woolf, writing at the dawn of cinema (1926), expresses doubt about whether or not this new medium has any legs:

"Anna [Karenina] falls in love with Vronsky” – that is to say, the lady in black velvet falls into the arms of a gentleman in uniform and they kiss with enormous succulence, great deliberation, and infinite gesticulation, on a sofa in an extremely well-appointed library, while a gardener incidentally mows the lawn. So we lurch and lumber through the most famous novels of the world. So we spell them out in words of one syllable, written, too, in the scrawl of an illiterate schoolboy. A kiss is love. A broken cup is jealousy. A grin is happiness. Death is a hearse. None of these things has the least connexion with the novel that Tolstoy wrote, and it is only when we give up trying to connect the pictures with the book that we guess from some accidental scene – like the gardener mowing the lawn – what the cinema might do if left to its own devices. But what, then, are its devices? If it ceased to be a parasite, how would it walk erect?"

Reviewing clips from Anna Karenina (1911)^[1], you can see her point. Every single scene is shot from an awkward, middle-ish distance. The composition is terrible, the movement is jittery, there really aren't that many pixels to look at and it's monochrome to boot.

The camera is still, even when action takes place in different locations around the set.

It reads (watches?) more like a bootleg recording of a stage play than a movie as we would know one today. Not only were early filmmakers overly focused on adapting classical works of literature, they were doing so by emulating adjacent, well-established mediums instead of exploring the boundaries of their own. The incentives are understandable there: you want to exploit the established market, you don't want to do something too weird and scare the ~~hoes~~ investors, and "emulate x perfectly" is such a wonderfully clear win condition.

I don't blame Virginia Woolf for doubting if cinema has any devices at all. But the film industry slowly got their act together, and by the time they made my comfort movie Sissi in 1955, they had managed to invent things like close ups, and shooting scenes from more than one angle, and also colour and sound.

And then by the time they shot Barbie (2023), they had invented more things, like zooming and panning the camera, shenanigans with body doubles so they don't need both of their very expensive movie stars to be on set at the same time, and spending hundreds of millions of dollars on making a single movie.

Maybe the industry will continue to invent more things! It just took the industry a few decades to start cooking, is all.

I don't think this story is unique to cinema. Deconstructing Arguments Against AI Art notes similar dynamics for photography, recorded music, and digital drawing and editing tools.

2. Rapid Mass Adoption Makes AI Art Seem More Banal Than It Is

One thing that's different this time around is the rapid accessibility of the shiny new technology.

To wildly oversimplify, historically, new inventions tend to percolate out very slowly: super expensive prototypes only accessible to a handful of dedicated specialists and wealthy patrons who existed in a tight feedback loop with the manufacturers (or were the manufacturers), then a gradual, often decades-long, democratization. Think books/manuscripts, cameras, film recorders, personal computers. This gave culture time to adapt, and for genuine craft to emerge alongside the new toy.

Now imagine if we’d somehow only invented the camera after smartphones were already in everyone’s pocket. One random Tuesday morning, every single person on Earth suddenly has a camera app. What’s the immediate, overwhelming result? An instant, planet-wide tsunami of the most banal photos imaginable: beach sunsets and cute girls and juicy burgers and so so many pictures of cats.

Of course everyone would be tripping over themselves to denounce it as a worthless, trivial gimmick, utterly incapable of producing anything of True Artistic Merit™ or any kind of value.

Perhaps they might change their minds when they see the first photo that came from an active war zone, or deep space, or the other end of a microscope. Or maybe it doesn't happen until someone gets the idea to take a lot of pictures in very quick succession, dozens of times per second, and then play back the pictures on a screen at very high speed accompanied by sound. And I'm sure some will stubbornly cling on to their first, dismissive reaction until the very bitter end, and still insist that photography is not a real art form.

I think we're sort of stuck at this step of the discourse currently, but why wouldn't we be? Woolf published her hate mail more than 30 years after the first public screenings of the Lumière brothers' first short films. This means we can continue collectively having bad takes about AI art until 2050, and still come out ahead.

3. AI Art Will Democratize More Mediums

Feature-length movies, animated cartoons of any length, and video games are examples of mediums where it's very difficult to make a finished piece with one person, or only a small number of them. Teams are good for various things, but they also encumber you — they require capital, overhead coordination work, and the smoothing over of disagreements in artistic vision.

Something wonderful happened to music in the 00s, called "FL Studio is good now". When I was in grade school, two kids just a few years older than me met on an internet hobbyist forum for producing electronic music. With very little formal music training, and mostly the computers they already had around, they were able to create little tunes to share with others, and talk shop.

Porter Robinson and Madeon went on to make the glittering, soaring EDM that defined my adolescence. They picked up music theory as they went, started collaborating with other artists, and continue to make music that is really good. But at the start, they were just teens messing around in their bedrooms.

I want that to happen to more mediums! I want edgy cartoons clearly made by a single emo teen, with production values rivalling that of The Lion King or The Little Mermaid. I want them to explode over the internet, so much so that we end up treating them with mild disdain, the way we treat, like, SoundCloud rappers today^[2]. I want video game production to be as accessible to any fifteen-year-old as recording bedroom pop or shooting a video for TikTok. Yes, the variance is going to be high, and the median is going to be crap, but why care? It's already like that for everything else, and we have the curatorial technology for dealing; the good will float to the top, and we'll all be better off for having more variety to choose from.

It’s good if more modes of artistic expression aren’t gated behind technical expertise with film cameras or game engines, proficiency with actually playing a musical instrument, or colour theory. Powerful AI can make it easier for people to get started, and they'll pick up what they require as they go along.

4. AI Art Will Make Other Artistic Mediums Do Interesting Things in Response

Once cameras could capture realistic likenesses cheaply, it freed up painters to explore other directions with more deliberation (or perhaps desperation). That's kind of how we got impressionism,^[3] and everything afterwards:

Rather than compete with photography to emulate reality, artists focused "on the one thing they could inevitably do better than the photograph—by further developing into an art form its very subjectivity in the conception of the image, the very subjectivity that photography eliminated".

A similar story plays out with theatre and film, though to a smaller and messier extent:

Throughout the [20th] century, the artistic reputation of theatre improved after being derided throughout the 19th century. However, the growth of other media, especially film, has resulted in a diminished role within the culture at large. In light of this change, theatrical artists have been forced to seek new ways to engage with society. The various answers offered in response to this have prompted the transformations that make up its modern history.

In the first case, portraiture was an attractor that many painters were historically pulled into. When demand for that suddenly dissipated, a vivid artistic movement bloomed, and that’s how we got our Monets and Van Goghs. In the second, film took over the role that theatre had as cheap entertainment for the masses, and then theatre, too, went off in weirder directions (though, uh, cards on the table I'm less confident? that that's a good thing since I'm not actually an experimental theatre enjoyer). And perhaps movies today are weirder than they otherwise would have been, if YouTube hadn't then taken the place of cheap entertainment for the masses in turn!

Artistic mediums, as we understand them, are a mix of technical constraints innate to the medium and incentives that are not. In the best case, getting rid of some of the incentives can enable its practitioners to better explore the breadth of what it is technically capable of.

AI artists are going to find certain niches that older mediums are currently servicing sub-optimally, and it'll be the kick in the pants the older mediums need to stretch out fully and do more exploration.^[4] I look forward to the results.

Regarding film and theatre, there's also a pretty interesting bi-directionality that ended up happening. Personnel and technologies continue to flow back and forth between the two mediums, presumably for the better (though to some theatre snobs' dismay). More recently, did you know the guy who made Potion Seller for the new location for cheap entertainment for the masses ended up making one of the best movies of 2024?

You could imagine some cinematography technique emerging from AI-generated media that seems obvious in hindsight but is lateral to where the film industry is currently heading. That technique could then make its way back into traditional filmmaking. More shrimp Jesus in all of the movies, that's what I always say.

5. The Devices of AI Art Will Take Time To Emerge

By and large, creative use of AI today is falling into the same trap as early cinema did. We use it to generate the kinds of things we are already familiar with: static text, images, videos, software. We don't let it be strange, in the way AI is strange. So here's the question of the hour: what are the devices of AI art?

It's still very early days, but I think Gary Hustwit's 2024 documentary, Eno, might be instructive. Brian Eno rarely agrees to documentaries, but he's a neophile, and Hustwit baited him into this one with the AI angle. All in all, Hustwit ended up recording 30 hours of interviews with Eno, and separately assembled 500 hours of archival footage. Then he hand-assigned weights to every slice of film, and created an algorithm to generate a unique 90-minute documentary for every screening. You can see the trailer here, but to watch the actual documentary, you'll need to catch a bespoke showing in a theatre.
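To make the "hand-assigned weights plus an algorithm" idea a bit more concrete, here's a minimal sketch of one way such an assembly could work. To be clear, this is not Hustwit's actual system, which isn't described here in any detail; the function name `assemble_cut`, the clip names, durations, and weights are all made up for illustration, and I'm assuming weights are positive numbers and that clips aren't repeated within a cut.

```python
import random

def assemble_cut(clips, target_minutes=90.0, seed=None):
    """Build one cut by weighted random choice without repetition.

    `clips` is a list of (name, minutes, weight) tuples. All of these are
    hypothetical; weights are assumed to be positive.
    """
    rng = random.Random(seed)
    pool = list(clips)
    cut, total = [], 0.0
    while pool:
        # Only consider clips that still fit in the remaining running time.
        fitting = [c for c in pool if total + c[1] <= target_minutes]
        if not fitting:
            break
        # Higher-weighted clips are more likely to be picked, but never guaranteed.
        choice = rng.choices(fitting, weights=[c[2] for c in fitting], k=1)[0]
        cut.append(choice[0])
        total += choice[1]
        pool.remove(choice)
    return cut, total

# Hypothetical clip library: (name, minutes, hand-assigned weight).
clips = [
    ("studio interview", 12.0, 5.0),
    ("archival concert", 8.0, 3.0),
    ("home video", 6.5, 1.0),
    ("lecture excerpt", 10.0, 2.0),
    ("tv appearance", 7.5, 2.0),
]

# Two "screenings" with different seeds will usually get different cuts.
print(assemble_cut(clips, target_minutes=30.0, seed=1))
print(assemble_cut(clips, target_minutes=30.0, seed=2))
```

Even in this toy version you can see where the labour goes: the code is trivial, and all of the taste lives in choosing the clips and tuning the weights so that every possible cut still hangs together.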

Ben Davis is probably my favourite contemporary art critic. He writes:

"I’ve seen Eno three times now. I love Brian Eno, so this was not a chore, and I can say that each version contained incidents that probably would be central to this or that telling of Eno’s career that the other two versions didn’t include. ... The tone is consistent, and consistently affecting... I imagine that it is very, very difficult to assemble all the parts and to weight all the probabilities to generate this consistent personality—it is likely more labor, not less." (emphasis mine)

(It's a great documentary! Go watch it if you can catch a showing. But, uh, also, if you do you should cross your fingers and hope for a cut that features more David Bowie than U2. Davis again: "Admittedly, to say that you are looking at an artwork for its “personality” is also to say that you might catch it on better or worse days.")

Beyond experimental documentaries, artists dabbling with AI are doing lots of other things too! Like, ummmmm, making an impression of your sixteen-year-old self from an iPhone backup and then letting you talk to an LLM roleplaying as them. Oh uhhhh have humans vote on thousands of generated pieces weekly and then the best ones get minted into NFTs?

Okay, look, I fully admit that at the moment none of it is very good. But there's really no reason to expect it to be; we're working with the equivalent of pinhole cameras here, and a body of knowledge has not yet been established.

But this is going to be a very temporary state of affairs, especially if the capital keeps flowing. The tech is going to improve, the creators are going to update on what works well, we're going to figure out what the best practices are and what to stay away from.

It takes like thirty years for us to figure this shit out! Let it cook! Just let the AI cook. Certainly nothing bad will happen from just letting the AI cook for thirty years.

^[1] Woolf possibly watched the 1920 adaptation, but sadly I couldn't find versions of that online.

^[2] I also happen to think that SoundCloud rappers can be very good, but that's beside the point.

^[3] To be clear, that is one factor among broader cultural, philosophical, and artistic questions that artists were exploring at the time, but my understanding is that it's an important one.

^[4] ...in the best case. In the worst case they shrivel up and die. But that's a sacrifice I'm willing to make o7

But the film industry slowly got their act together,

I don't think it was that slow. Even 'A Trip to the Moon' in 1902 already used stop motion, and 'The Great Train Robbery' in 1903 had already used more complex cutting and perspectives, and even put a camera on a moving train.

By 1930 the film industry had:

done complex tracking shots (1927)
experimented with previously impossible perspectives (1930)
created nice visual effects (1927)
performed stunts (1923)
combined a lot of techniques (1924)
and done various experimental stuff (1929)

Thank you for creating and/or digging up all those gifs! I didn't mean to imply that it took the industry until the 50s to become functional and agree that many sorts of innovations happened from fairly early on.

To noodle on this a little more, doing the math, there were around seven years between the first Lumière shorts and A Trip to the Moon. But there are also other landmarks we can use as the basis of comparison. For example, if we use The Horse in Motion (1878) as our starting point, we might not expect anything super exciting to happen for a few decades more.

My understanding is that a lot of the slow progress from 1878 to the early 1900s was the “cinema tech stack” needing to become technically and economically viable.

To get good motion you need ~16 frames per second, which means each frame can be exposed for at most ~1/16 of a second, which in turn means you need stuff like sensitive film stock, lots of light, decent lenses. Then you need a camera that moves film through at a steady rate but also holds each frame perfectly still briefly, for a precise duration, and without any jitter/warping/etc. Then for economic viability you also need projection that’s bright and safe for a room, plus a practical way to duplicate film at scale.

The starting point for all of this was early photography (e.g. daguerreotypes in the 1830s–40s), which used rigid metal plates and multi-minute exposures in bright daylight.

For some forms of AI art (single images, short clips) the tech stack feels maybe mostly already there, while for others it doesn't (how to turn short clips into a full-length movie). But maybe that’s just a lack of imagination, and we’ll look back and say something like: “they didn’t realize they needed BCI to really unlock AI art's potential”.

I generally agree but would like to add some minor corrections: not 1/16 of a second but 1/32, because for the other half of the time the shutter is opening or closing, although even exposures much shorter than that were already achieved in the 1870s for scientific purposes. However, that application used hard plates and thus didn't have to deal with the problem of film tearing in the camera.
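The arithmetic in this exchange is simple enough to spell out. A minimal sketch, assuming (as stated above) that the shutter is effectively open for half of each frame cycle; the function name `exposure_time` and the `open_fraction` parameter are just illustrative:

```python
from fractions import Fraction

def exposure_time(frames_per_second, open_fraction=Fraction(1, 2)):
    """Longest possible exposure per frame, in seconds.

    `open_fraction` is the share of each frame cycle during which the
    shutter is open; one half is the assumption taken from the comment above.
    """
    frame_period = Fraction(1, frames_per_second)
    return frame_period * open_fraction

naive = exposure_time(16, open_fraction=Fraction(1))  # shutter open the whole cycle
half_open = exposure_time(16)                         # shutter open half the cycle

print(naive, half_open)                               # 1/16 1/32
print(float(naive) * 1000, float(half_open) * 1000)   # 62.5 31.25 (milliseconds)
```

So at 16 frames per second each frame gets a 62.5 ms slot, of which roughly 31 ms is usable exposure under the half-open assumption.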

Decent lenses and the dry gelatine process were also ready by the 1880s, and the idea of making photographic film from oiled paper (Eastman used it initially, but it was very fragile) was present as well. Thus, I think, the actual barrier to inventing cinematography was producing clear transparent nitrocellulose (~1883); a technology transfer to photographic film soon followed (~1887), then a few more years for figuring out the camera (and projector) mechanics.

Also, I don't think that the safety of the projector was solved until well into the 20th century; see, e.g., https://en.wikipedia.org/wiki/Bazar_de_la_Charit%C3%A9#Fire_of_1897

is ai a medium in the way film is a medium?

Yeah I would say AI is more like the popularization of Photoshop in photography, or CGI in movies/animations. Most any work you can do with AI is a result of taking existing pieces from the medium, training the AI on them, and then using it to generate more. So it's effectively a stochastic editor or cognitive harness that helps you make the art. It shines in a few areas, like those subliminal pictures that are two things at once, but ultimately it is like a really large brush that makes everything you paint average.

I'm not sure, but I think it can be useful to think of it as one at least sometimes.

I often like to cite [this music video](https://www.youtube.com/watch?v=Njk2YAgNMnE) as an example of something that was made possible by AI, and that used it as just a building block in a complex artistic process (for my part, I couldn't imagine how I would auto-generate a video like this, or even encode the movement of the camera as a constraint (without some substantial effort), and it was made in 2022!)

it is likely more labor, not less.

I think this is the crux of it. The turning point on AI art comes when artists demonstrate clearly to the public that it can be used as a building block in a complex, difficult process rather than a shortcut around it.

I've been toying with the idea of taking a collection of photographs, assembling some kind of feedback loop to convert them into a uniform perspective, and merging them into a tileset, using something like CLIP embeddings to identify similar tiles. I can then apply a custom implementation of wave function collapse to my photo set, oriented around relations between embeddings rather than discrete tiles, in order to create an infinite gallery consisting of all the pictures I've taken.

This would be substantially more labor-intensive than the photography itself, and - having taken a shot at some aspects of it - would involve pushing the technical envelope on a few fronts, but I think it would be of interest as art.
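As a rough illustration of the "relations between embeddings rather than discrete tiles" step, here is a minimal sketch of deriving adjacency rules from embedding similarity. It is only one possible reading of the idea: the tile names, the three-dimensional vectors, and the 0.8 threshold are invented for the example (real CLIP embeddings come from a model and have hundreds of dimensions), and the wave function collapse solver that would consume these rules is left out entirely.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def adjacency_rules(embeddings, threshold=0.8):
    """Map each tile name to the set of other tiles allowed next to it.

    Two tiles may be neighbours when their embeddings are similar enough.
    The threshold is an arbitrary, hypothetical choice.
    """
    names = list(embeddings)
    rules = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
                rules[a].add(b)
                rules[b].add(a)
    return rules

# Toy stand-ins for per-tile image embeddings (made-up numbers).
toy_embeddings = {
    "beach_1": [0.9, 0.1, 0.0],
    "beach_2": [0.8, 0.2, 0.1],
    "forest_1": [0.1, 0.9, 0.2],
}

rules = adjacency_rules(toy_embeddings)
print(rules)
```

With these toy numbers the two beach tiles end up allowed next to each other and the forest tile gets no neighbours, which hints at one of the practical problems: a hard threshold can leave tiles stranded, so a real version would probably want soft, similarity-weighted preferences instead.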

This reminds me of a case for slop and of the two rebuttals which the post received. I expect that high-level tastes (e.g. those related to the long-term value of a piece of art, the ideas it propagates, or the meanings unlocked under scrutiny) will not be satisfied by AI-assisted art unless either the AI or the human creator has high-level tastes as well. Alas, training high-level tastes into the AI could end up being difficult due to problems with incentives and with training data (think of GPT-4o's sycophancy, the expected(?) rollout of erotica by OpenAI's models, AI girlfriends who don't need to be smarter than Llama, brainrot). And the art which you describe (e.g. making an impression of your sixteen-year-old self from an iPhone backup and then letting you talk to an LLM roleplaying as them) would be either as hard for outsiders to value as family photos, or optimized for virality instead of causing the users to develop high-level tastes...

You could apply the same thing to any new genre of art. The hoi polloi all have horrible taste, and therefore we should expect the new genre to mostly cater to that horrible taste and produce a ton of slop.

On the one hand, that's true. New artistic media and genres often result in a greater amount of slop produced. But it misses the fact that new artistic media and genres also create artistic innovation, and while only a small minority of people have good taste, often (especially in the long term), the market accommodates that good taste just fine, and we get masterworks of that new medium or genre.

I apologise, but there is another aspect which I described in this comment. Before the rise of the Internet, pictures or films had to be reproduced by talented people or expensive equipment before being seen by armies of viewers. The reproducers, or those who possessed the equipment, then had to carefully select what they spread^[1] across the nation over the years. This, in turn, implies that a far-reaching meme would be spread for a long time by ~the same reproducers, letting society react (e.g. by arresting the reproducer for possessing porn) or forget about the old films which weren't better than average.

^[1] An additional level of friction was the requirement that the commoners come and see the film or see or get the photographs.

It's not clear to me this would increase the quality level of what gets spread. First, the few selectors likely have taste as bad as the hoi polloi's, being selected for political acumen for what is, essentially, a political job, if they are selected for anything. Second, it is widely agreed that art flourishes on subversion and going "against the pack", with many (especially of the old guard) hating new art forms when they arrive. Third, such selection will necessarily cater to the lowest common denominator. Compare TV shows of the '90s to TV shows now.

I liked Crawford's defense of slop and think both rebuttals missed the point of his argument.

I expect that high-level tastes... will not be satisfied by AI-assisted art unless either the AI or the human creator has high-level tastes as well

I agree with this; this is the case in all the other mediums (you can't create a good song, or ballet, or watercolour painting unless you have good taste), so I don't see why it wouldn't also be the case for AI-assisted art.

One direction I think artists can take AI is to just increase the complexity of their pieces. No one is going to spend 5000 weeks creating a single work of art (the average human lifespan is 4000 weeks), but if a good artist can, with AI, create something in 50 weeks that would take them 5000 weeks without it, I would be interested in seeing the result.