A quick post on a probably-real inadequate equilibrium mostly inspired by trying to think through what happened to Chance the Rapper. 

Potentially ironic artifact if it accrues karma.

1. The sculptor's garden

A sculptor worked in solitude for years, carving strange figures in his remote garden. Most of his statues failed: some cracked in winter, others looked wrong against the landscape. But occasionally, very rarely, one seemed to work.

The first visitors stumbled upon the garden by accident. They found themselves stopped by his angels—figures that somehow held both sorrow and joy, wings that seemed about to flitter. 

Word traveled slowly. More visitors came, drawn by something they couldn't quite name.

The sculptor felt recognized for the first time. Not famous—but understood. His private work had somehow become communicable. He carved more angels, trying to understand what made these particular statues resonate.

As crowds grew, their attention shifted. They began photographing the angels from certain angles, comparing new works to old, developing favorites. They applauded. The sculptor, still believing he followed the same thread, unconsciously noted which details drew the longest contemplation, which angles prompted gasps.

Years passed. The garden became famous. Tour buses arrived with guides explaining the "important" pieces. The sculptor produced angels of increasing technical perfection, each guaranteed to produce the proper response at the proper moment. The crowds applauded more reliably. The sculptor carved more reliably. Each reinforced the other.

One morning, walking his garden alone before dawn, he saw his statues without the crowds. Without their reactions to guide him, he saw what he'd actually been making: the same angel, refined and repeated, each iteration more precisely calibrated to trigger the expected response.

He wasn't carving anymore. He was manufacturing applause in the shape of angels. 

And the crowds—they weren't looking at angels anymore. They were seeing what they expected to see, applauding their own ability to recognize what they'd been trained to admire.

The garden had become a perfect mirror. Both he and his audience got trapped looking at their own reflections and calling it art.

2. The mirror trap

The Mirror Trap is a failure mode where creators and audiences fall into mutual Goodharting—each optimizing for a proxy of what they actually value, with each side's proxy reinforcing the other's drift. From my perspective, this dynamic crops up everywhere: in music, YouTube video essays, academic research, startup pitches, journalism—and on LessWrong itself.

Bidirectional Goodharting

Audiences initially reward creators for genuine value—insight, beauty, truth. But evaluation is costly. Over time, they substitute a proxy: reputation. "This sculptor made great angels before, so this new angel must be great." They stop looking closely, and their applause becomes more automatic.

Creators initially pursue authentic expression—following internal vision, exploring what feels necessary. But creation is uncertain. Over time, they substitute a proxy: applause. "The audience loved this angel, so I must be on the right track." They stop trusting their internal compass, and their work becomes more predictable.

Each side's Goodharting dynamic clearly reinforces the other's. The audience's reputation-based applause teaches the creator what to optimize for, and the creator's applause-optimized work confirms the audience's use of reputation as a guide. They become locked in a signal-cheapening spiral.

Basic Hand-Wavy Model:

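Let $C_t$ be the creator's output at time $t$, and $A_t$ be the audience's evaluation.

Initially, both track true value $V$:

  • $C_0 = \arg\max_c V(c)$ (creator maximizes genuine value)
  • $A_0(c) = V(c)$ (audience recognizes genuine value)

But maintaining true evaluation is costly, so proxies emerge:

  • Audience proxy: $A_t(c) \approx \mathrm{sim}(c, C_{t-1})$, "similarity to previously applauded work"
  • Creator proxy: $\mathbb{E}[A_t(c)]$ = "expected applause"

The recursive dynamic becomes: $C_t = \arg\max_c \mathbb{E}[A_t(c)] \approx \arg\max_c \mathrm{sim}(c, C_{t-1})$

As $t \to \infty$, the system reaches a fixed point where:

  • $C_t \approx C_{t-1}$: each new work is a safe variation
  • $A_t \approx A_{t-1}$: each evaluation is predetermined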

Over time, both sides converge toward a fixed point: the creator keeps making slight variations of their last hit, and the audience keeps applauding what looks like past work. The original value function 𝑉 has vanished; only proxies-of-proxies remain. Creator and audience are no longer in dialogue about the thing that initiated their connection—insight, beauty, truth—but are trapped in a hall of mirrors, each reflecting the other's expectations. 
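To make the hand-wavy model a bit more concrete, here is a minimal toy simulation. Everything in it is an illustrative assumption rather than part of the model as stated: works are integers on a one-dimensional "style axis", genuine value has an optimum that drifts over time (`ideal`), value and similarity are both negative squared distances, and `proxy_weight` blends how much the audience's applause tracks genuine value versus resemblance to the last work.

```python
# Toy sketch of the creator/audience loop. All names and functional
# forms here are hypothetical choices made for illustration.

CANDIDATES = range(101)  # possible works: integers on a 0..100 style axis


def ideal(t):
    """Assumed optimum of genuine value at time t; it drifts as time passes."""
    return 20 + 5 * t


def true_value(c, t):
    """Genuine value of work c at time t (highest when c == ideal(t))."""
    return -(c - ideal(t)) ** 2


def similarity(c, previous):
    """Resemblance of work c to the previously applauded work."""
    return -(c - previous) ** 2


def simulate(proxy_weight, steps=10):
    """Return the sequence of works the creator produces.

    proxy_weight = 0.0: the audience applauds genuine value only.
    proxy_weight = 1.0: the audience applauds similarity to the last work only.
    """
    # At t = 0 both sides track genuine value.
    works = [max(CANDIDATES, key=lambda c: true_value(c, 0))]
    for t in range(1, steps + 1):
        previous = works[-1]

        def applause(c):
            return ((1 - proxy_weight) * true_value(c, t)
                    + proxy_weight * similarity(c, previous))

        # The creator picks whatever work is expected to draw the most applause.
        works.append(max(CANDIDATES, key=applause))
    return works


if __name__ == "__main__":
    for w in (0.0, 0.8, 1.0):
        print(f"proxy_weight={w}: {simulate(w)}")
```

Under these assumptions, a `proxy_weight` of 0.0 lets the creator follow the drifting optimum, while 1.0 freezes every work at the first hit. Intermediate weights make the creator lag further and further behind whatever genuine value would now call for.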

3. Resisting the trap

If the Mirror Trap is a real thing that emerges from mutual proxy optimization, then breaking from it requires continuously disrupting the proxies themselves. 

The irony—and potential futility—of noticing this trap is that it might recursively capture any attempt to mitigate it. (In other words: any of the following ideas for mitigating the trap can themselves be Goodharted—so try not to do this, I guess.)

Brief practical ideas for creators

  • Maintain work that no audience will ever see. Not just practice or drafts, but actual valuable real work—the thread that keeps you honest. When your private work starts resembling your public work, this is a signal that you might be captured by proxies.
  • Intentionally cultivate incompatible audiences. Show different work to groups with conflicting tastes. When you can't optimize for everyone simultaneously, you're more incentivized to optimize for something real instead.
  • Break the pattern at peak success. The sculptor's next move after perfecting angels should have been anything but angels. Accept the temporary loss of applause as the cost of staying alive creatively.

Brief practical ideas for audiences

  • Reward failed experiments as enthusiastically as successes. Audiences shape what creators optimize for by what they choose to value. Make creative risk economically and socially viable. Upvotes shouldn't mean "I actively like this"; they should mean "I support the creative attempt."
  • Practice evaluating work without context. Attempt to approach familiar creators as if encountering them for the first time. What would you see if you didn't know whose work this was?
  • Follow creators who contradict each other. If your taste becomes too predictable or coherent, you're inadvertently training creators to serve that taste. Diverse inputs prevent creators' mirror formation.

By default, most work gradually frogboils from angels into applause; from whatever someone originally felt compelled to create into whatever reliably generates a reinforcing response. The basic questions to ask oneself are simple, though answering them is nontrivial:

If my audience disappeared tomorrow, what would I still feel compelled to make? 

If my favorite creators vanished, what would I still be compelled to seek out? 

The gap between those answers and current behavior measures how deep the mirror runs.

12 comments

Love the story, and the pictures that went along with it!

But evaluation is costly. 

I would express it differently - it's not necessarily that internal evaluation is costly, but that the external signals are often loud relative to the internal evaluation. Or even more precisely, the parts within a person's psyche that respond to external signals are often stronger than the ones that care about the internal evaluation. As you put it:

The sculptor felt recognized for the first time. Not famous—but understood. His private work had somehow become communicable. He carved more angels, trying to understand what made these particular statues resonate.

At first, the sculptor's art was driven by something inside him. We don't know what it was. But with the first visitors, something stronger raised its head - the feeling of being understood, something he'd never felt before. 

Having had a taste of that, he started craving it. Various thoughts and impulses began to emerge from that craving, subtle at first - maybe just wondering what made some particular angel resonate with people so much. 

That question was still coming from his original creative energy, but now it was starting to get intertwined with his desire for feeling seen. And as that thought led him to working on another angel, it felt natural to think about how the visitors might see it.

With each new angel that people liked, the craving became stronger. And then came a terrible fear - what if he created a sculpture that people wouldn't like anymore? He was getting so used to the experience of being seen and understood, the thought of losing that became intolerable.

Yet another force made its presence known. One so afraid of even thinking about losing his new position, it became fixated on doing only the thing that had been working. It would not do anything that might risk the Unbearable Outcome. Stay focused on following the crowds, watching them take photos and marvel at the statues, figure out exactly what it was that they liked, do more of that.

As that force became dominant, it wasn't just the original creative impulse that quietly dropped away. The sculptor became so obsessed with repeating the kinds of actions that would make him feel seen, he didn't even notice he hadn't been feeling seen for a long time.

3. Resisting the trap

I'd also add something like "make sure your audience isn't the only way you are getting your needs met". If your art is the only source of validation in your life, then the more validation you get from it, the scarier it will feel to lose. The scarier it feels to lose, the less courage you'll have to experiment doing something else.

The funny thing I feel when reading this post is that I've had thoughts about this sort of cycle before—I think not the exact mirror cycle you're talking about, but similar fixed-points of ping-ponging taste-shaping—but they weren't framed as “how do we avoid this as a trap” so much as “what if that class of system and its basins are the figure that ‘humanity’ (or some relevant subset?) effectively cares about, and the grounding in some other reality is mainly ‘useful’ for edge constraints and random perturbations”.

Maybe a bit of autism is helpful here, as an antidote to the pressure from outside. Sometimes I know that what I am going to write will be unpopular with the audience, but I write it anyway. Not because I am heroic or something, but because ultimately I care about my own approval more than I care about what other people think.

Now I just need to learn how to make good art, and I will be a total success. :D

I've seen this called "audience capture" in the context of blogging - optimizing for maximum attention (and therefore maximum revenue) by saying whatever is popular.

Surely this is why people think so hard about the true and proper telos of their actions.

Art for art's sake is practically wireheading. Art for decoration can be manufactured cheaply, to beautify life, and that's fine. Making art can teach one the skills to make things, but then surely one should apply those skills on something with effects out in the world?

I don't think the audience has an important cognitive role here. The creator can simply ignore them, or communicate to them in ways that uplift them (to educate them in ways they didn't know they needed to be educated because the creator wants to uplift them), or the creator can simply choose to pander, and harvest resources in exchange for a recognizably valuable product.

One of my favorite artists is Thomas Kinkade. He did it all. His art was simply beautiful to normal people. He was the richest living artist ever in history during his own life, achieving actual worldly success, due to normal people actually paying a lot for his paintings (and even just expensive postcards of his paintings). (I think Jeff Koons holds the title now, and has him beat on the "ever in history", but I don't like his stuff as much.) Prestigious art snobs hated Kinkade, and he kept doing it anyway. But also, deep down, he actually did have "the soul of an artist". The hate weighed on him. He was an alcoholic and eventually killed himself. At one point, at Disneyland, while drunk, he peed on a statue of Winnie the Pooh saying "This one's for you, Walt." Like I said... as an artist, he did it all.

In the first draft of this comment, I said "So far as I know, he never pivoted. He never tried to pander to the snobs." However, then I went googling, and... apparently he had a vault full of stuff from many different styles, including some fucked up self portraits? I'm not sure if this downgrades him in my mind, or raises him to new heights. I'm glad that he was aware of what he was doing, though. Here's a quote from the article I found that mentions his vault (bold not in original): 

Yousef says she did not understand “how skillful a painter Kinkade was” before starting the film project. She points out that many other people were putting images of cottages on commemorative plates in the 1990s, “but they were terrible. His skill blew them out of the water. And now his style has become the archetype.”

Kinkade was also prolific. He created a new intricately detailed cottage painting every month, in addition to running his empire of prints and collectibles. He was one of the first people to make himself into a brand. “Andy Warhol would’ve respected his marketing genius,” Yousef says.

Warhol’s name comes up often in the documentary, and it is no accident that two experts on the Pop artist are interviewed in the film—the former Andy Warhol Museum director Eric Shiner and the critic and Warhol biographer Blake Gopnik. As Kinkade once said, “I’ve achieved the Nirvana that Andy Warhol dreamed of achieving. Warhol’s dream was that he would become a robot who just could push a button and his paintings would come out without him even being involved, and I’ve done that!” The performance artist reveals himself. Even Kinkade’s own family calls him one.

Amazingly, the artist’s wife, all four of his kids and his two siblings participated in the film, in addition to some of his closest friends and a couple of colourful superfans. They provided not only insightful interviews but archival images, home movies, boxes of fan letters, Kinkade’s teenage audio recordings and the all-important access to his vault. Yousef says that it was particularly important for her to give voice to his immediate family, who were often overshadowed by Kinkade’s public persona and success.

apparently he had a vault full of stuff from many different styles, including some fucked up self portraits? I’m not sure if this downgrades him in my mind, or raises him to new heights. I’m glad that he was aware of what he was doing, though

I think any skilled artist like Kinkade will be able to imitate many different styles, and will be very aware how their own style fits with the rest of art history. For example, Kurt Cobain could write music in different styles, and toward the end of his life he was getting tired of the screaming thing (yes, he very much saw the screaming thing as a style, separate from his personality) and wanted to make more softcore REM-like stuff.

I don't see how this Goodharting is bidirectional. It seems like plain old Goodharting. The assessment, with time (and due to some extraneous process), becomes a lower-quality proxy that the artist keeps optimizing, thus Goodharting actual value.

The artist is using “does the audience overtly respond well to this” as a proxy measure for whether the art meets the artist's more illegible standard of goodness, but the audience is using “does this come from an artist we already regard as good” as a proxy measure for their own illegible standard of goodness. The illegible standards of both parties had to intersect enough around the initial art for the cycle to get started, but that doesn't mean they're the same, nor that the optimization processes are completely symmetrical or the same process. It might be possible that the signals get so entangled that you could treat it as an instance of single-Goodhart on some compound measure from outside the system, but from inside the system there's still multiple sub-cycles going on that feed each other. Does that answer this, or is there something else off?

But the audience isn't optimizing/Goodharting anything, just providing an imperfect proxy. It is only the artist who is argmaxing, which is when Goodhart appears.

One way out would be for the artist to stop optimizing for the audience, and start optimizing for real value. Another way out would be for the audience to perfect their assessment. But this is always the case for Goodhart: you can either stop using the proxy altogether, or improve your proxy.

Something more interesting would be "the artist is trying to create the art that elicits the best response, and the audience is trying to produce the response that makes the artist happiest", or something like that. This is what happens when two people-pleasers meet and end up following a plan that neither of them wants. It's also relevant to training an AI that's alignment-faking. In a sense, the other party trying to maximize your local utility dampens the signal you wanted to use to maximize global utility.

Alas, many people desire the trap, because it pays the bills and that's all they ever really cared about anyway.

Lots of art, not many artists.

is this a real trap that can happen though? are there actual examples? I kept thinking that it sounds like a situation that would immediately shatter in real life under uncounted variables.

thinking about the audience in the parable, why aren't they getting bored? I believe most people would lose interest in near-identical "new works" pretty quickly.

so maybe it also depends heavily on what counts as repetitive works. for example, a common criticism of FIFA video games and other "annual reskin" sports games is basically that "It's the same thing every year." but it's really not! for one, I know people who play these games, and there are 3 reasons they might continue to buy the new editions:

  • actual upgrades: these games usually actually have noticeable changes, even if they're relatively minor. graphical upgrades, new animations, improvements in controls, new gameplay, etc.
  • staying current: these games model real, actual people who are currently alive and playing. year after year, their stats may change. they may change teams, retire, or new players will enter. sports fans seem to find this a valuable aspect of the games.
  • online play: if you like playing with other people you might put up with worse than annual paid updates!

so, that's the example that most readily comes to me of this "trap" and I think it really doesn't apply at all. besides the audience factors - these games aren't made by "an artist", they're made by a corporation.

I'm not sure that characterizing the audience's proxy as "similarity to previously applauded work" is always right. I think in a lot of domains, including sculpture and other fine art, a very large part of the audience's evaluation is simply the reputation of the creator. Art critics don't like to say that they don't understand <insert famous artist here>'s new work.
