The Generative Crash: Why RLHF Fails the Free Energy Principle

Abraham Haskins

Rejected for the following reason(s):

Duplicate

Read full explanation

LessWrong-Specific Foreword

Epistemic Status Highly confident in the neurobiological model and the UX framework. Confident in the application to AI alignment.

Context and Structural Formatting Note I am a Human Factors psychologist, engineer, and systems architect. This piece formalizes the mechanical failure state of the human Theory of Mind when processing generative media, bridging active inference with Cooperative Inverse Reinforcement Learning.

The essay deploys a functional UX protocol called The Ghost Scale. It relies on precise CSS opacity limits to signal intent density, structurally altering the metabolic expenditure of the reader. Because the native editor here strips custom CSS, reading the plaintext below physically breaks the biological friction-reduction framework the interface is designed to interact with.

To allow you to properly evaluate the cognitive load reduction and view the visual affordances as intended, the interactive version is hosted here: abrahamhaskins.org/art.

The raw text is provided below to facilitate direct critique of the IRL convergence theorem, the psychophysics, and the alignment implications.

Moderation Note on Generated Content: Approximately 15% of this post is AI-synthesized text, almost entirely confined to the formal appendix. I am aware this violates standard community guidelines regarding generative output. This inclusion is a functional requirement, not a shortcut. The appendix is explicitly flagged, logically led into, and serves as a live demonstration of the UX protocol defined in the essay. The preceding 85% of the theoretical framework is entirely human-authored, with the exception of clearly visible, explicitly declared, and clearly marked short segments used primarily for demonstration of the framework. If possible, please read the interactive version instead of this reduced-fidelity local copy to allow yourself to engage with and test the AI affordances proposed.

Art: A Unifying Model

The Ghost in the Text

My name is Abraham Haskins. I am a cognitive scientist, an engineer, a psychologist, and an AI systems architect. I'm an overeducated millennial with five degrees scattered throughout surprisingly relevant fields for the points I'm going to make, and the most relevant is a PhD.

In this essay I will use plain english to: define a unifying model for art (with accompanying graph), explain the neurology behind how art works, define the mechanism for an incoming AI-driven cognitive catastrophe, fully flesh out a solution, and then close by showing how the application of this model suggests we focus on a specific strategy for helping align AI.

After, I'll abandon the plain english to trace out an equation describing the model and then close with a proposed set of hypotheses and a plea.

Before you read this, let me be clear: I wrote every fucking word of black text on these pages. I know we’re all suspicious of long bits of text these days, and I’m going to spend the next several pages explaining why. But I want to ask for your trust up front: I, a human wrote this, and I truly want you to read it.

AI's intrusion into art has quietly and subtly confirmed for us what art truly is: art is compressed intent.

Let’s be precise here. “Art” describes any creation into which a thinking being invested a large concentration of decisions, all in the service of one or more goals [^1]. That creation can be a painting, a speech, a book, a bridge, a dance, a makeup application, a headstand or… really anything. Anytime any thinking being makes a high concentration of decisions, and that volume of decisions creates something, that creation is art. [^2]

You'll note that definition encapsulates pretty much all of human activity. You are correct. It will not surprise you to hear that under my model anything can be art in the right situation and with the right viewer. We're going to be speaking most about how "artful" something is. It's simple: if more decisions are made in the creation of something, it is more artful (or artsy, if you prefer) [^3]. Everyone has their own personal threshold for how dense the decisions need to be for the resulting creation to be "art" in their eyes, but we will be instead putting all human activity on a scale of "artfulness." We’re going to make a graph with this, so let’s put that on the X-axis: things on the left are less artful, and things on the right more so.

The Y-Axis defines whether the art is good or bad. Whether art is good or bad depends on whether the artist met the goal implied by the actions taken. [^4] This is the pivotal key to appreciation, because you, the viewer, get to decide what their goal was. In doing so, you decide if you agree it was a respectable or worthy goal, and (more importantly) you get to decide if they met it [^5].

Art is "good" when you judge the artist's goal as having been met. If you judge the artist's goal to be unworthy of respect but agree they met it, you may not respect the artist, but you will still tend to classify their work as "good art" [^6]. A prime example is a skillful speech given by an evil man. Imagine a hypothetical painting that inspired a nation to evil just like Hitler’s speeches did. Would you call that painting “bad”? I feel most would begrudgingly admit that it was technically good in quality - before calling it misguided and evil [^7].

Alternatively, failing to achieve a goal worthy of respect results in a "noble failure," like a director biting off more than they can chew. You might agree with some decisions and say parts of it are good, but you wouldn’t call the whole piece "good art."

So when we speak casually about "good" or "bad" art of any kind - what we mean is that the artist accomplished their goals, as we infer them. And that's our Y-axis.

So there, now we have two axes. Let’s draw it out:

The Graph

My (very human) editors suggested I add this section. Previously, I just let that graph stand on its own, but they suggested a bit of exploration here.

Notice how easy it is for you to place pretty much any bit of human activity you can imagine into this graph, and note that it lines up quite well with a second, subjective glance at what is more "artsy" or "artful." For example, you instinctively know that someone who mechanically traces over a stick figure cartoon hasn't made as much “art” as someone who drew a full picture themselves.

But is this a robust principle - can we subdivide it? What about people who use reference photos? What if you trace over a real photo rather than a cartoon? What if we use a reference photo, but only from memory?

Note that all of these sort of slot easily into the X-axis of the graph. You probably don't have an issue imagining where they could fit unless they’re incredibly close.

Let's do the same thing when moving up and down the Y-axis. Imagine you saw someone go to incredible lengths to flawlessly trace a cartoon using tons of machinery - lasers, and so on. They made the whole contraption. You know exactly what their goal was. They told you what their goal was, in fact. And there's some artistry to be seen in the effort of chasing extreme precision

But in the end - they didn't trace the cartoon properly! It’s super janky looking. Would you call their highly flawed recreation “good art” or “bad art”?

I’d just reflexively call it bad. I think everyone would.

But what if I showed you the exact same tracing - only now from an artist that made stylistic changes on purpose? All of a sudden you look at the imperfections differently [^8]. You don't really know the artist's motivation anymore, so you have to guess at why they made those changes, and that guess informs whether you think it's good or bad. Maybe they made a hated politician’s nose larger, and you think that’s a great (or terrible) way to make a point. And that, then, becomes your interpretation of whether the art is good (or bad).

Respecting an artist's competence is not inherently an assessment of beauty; it is the byproduct of reverse-engineering another primate's problem-solving strategy.

Talking about Art

Let’s return to the foundational definition for a moment: art is the product of highly-compressed decision-making [^9]. Honestly, the examples are limitless for this concept.

Photography is a great example. I like museums, and I’ve been to one or two that focused on photographers. As part of a tour in the Harry Ransom Center in Austin, Texas, I saw an interesting defense of the medium. As part of a discussion about whether photography “counted” as art – the tour guide dove into explanations of how much care the photographer put into decisions regarding positioning, lighting, film development, and so on [^10]. Looking back, I’d swear there was a lot of focus on the importance, the density, and the difficulty of the decisions that were made. Literally, the answer from experts to “prove this thing is art” is reflexively “look at how many decisions went into this thing’s creation” [^11].

Counting Decisions

I’ve been describing obvious applications of the definition. Let’s try to break the theory instead. If art is just evidence of decision-making - whether that be a speech or a painting - then let’s address art with very, very few decisions. How can a Zen master drawing their 10 millionth ensō circle be more artful than a child scribbling for days on a single painting?

Compressing human struggle

The answer is the bundling of previous decisions via the process of automaticity. In cognitive science, automaticity is the process of bundling actions into automatic routines, similar to functions in a programming language or muscle memory [^12]. It’s the word for what happens when you get in the car at the end of the workday and just appear in front of your house without realizing how you did so.

When a Zen master draws the circle, you’re seeing the weight and the compounded array of decisions that the master has built into their masterpiece. They make decisions about how to control their breath, the angle of the brush, pressure, light, and so on. But they don’t need to think about all of those actions manually. Instead, those decisions were made previously - and effectively the Zen master just needs to activate that muscle memory (“call a function”) in order to reapply all of those decisions once more [^13].

These layers add such a compression of decisions to every moment that the child, who has built up no automaticity, cannot match the density of artfulness [^14]. So we respect the decision-making, but we include decisions that were made previously.

So the Zen circle does have an answer: baked-in hierarchical compression [^15]. Decisions are counted individually, including subordinate and previously addressed solutions. Every decision matters.

Automaticity is the caching of human struggle [^16].

But what if you can’t tell that decisions were made at all?

Talking about the Viewer

Let’s talk about the engineer who walks up to an ugly bridge, stares at it for 10 minutes, and declares it to be art.

You see, art is indeed in the eye of the beholder. And the eye of the beholder is defined explicitly by the error bars on the two axes of our original graph. First, you may disagree with an expert about how many decisions went into something. I might look at a grey government slate bridge and say that obviously they used boilerplate nonsense: this is about the most uninspired bridge I’ve ever seen. But an engineer who specializes in city bridges might walk up and look carefully at the concrete mixture selected not just for this region, but literally for this street. They might look at the way various columns are made slightly thicker or lighter, deviating from the standard design to achieve goals I wouldn’t even consider [^17] [^18]. Notice that as I say that, you can feel your appreciation for this non-existent bridge grow. All I did was describe the decisions.

Experts see more decisions in artifacts of their specialty. They understand the goals of the creator faster. They may even disagree with you on how much something is art at all. However, that doesn’t make it good or bad art, just whether it is art.

This explains the scene in the movie where one person might laugh and cheer at an explosion, but the nearby bomb expert stares at it with a tear in their eye and declares it to be art. The bomb expert sees all the decisions that went into the explosion and can appreciate them.

I think everyone does this within their own expertise, you included. My fun, personal example is that I hate it when I hear people trash IKEA manuals, because I specialize in interfaces. Yeah, my specific specialty is interfaces involving AI, but I still can see the artistry in the way IKEA tries to communicate so cross-culturally and effectively. I see little details that could only have been made in service of a particular clever goal, and it makes me smile. I can’t fully appreciate all the decisions they made in creating it, but I do appreciate at least a bit of it.

Reverse-engineering the creator

So, here I’m going to define the word appreciation. To appreciate something is to identify the actions taken to create an artifact, then to use those actions to infer the goals of the creator, and to subsequently connect with and learn from the creator if those goals are deemed worthy of respect [^19] [^20]. I expect you'll appreciate that definition more if you ever glance at it after the neuroscience section, but let's make it much simpler for now: Imagining all the decisions and work put into something is how you appreciate it. You can appreciate a good story by admiring its theming and seeing how the author guided you to feelings of surprise or sadness. You can appreciate the work that your spouse does for you by noticing the care put into your favorite so-and-so. You can clearly appreciate a beautiful painting. It is the act of looking at an object, using the part of your brain that models other people to understand how they created it, and inferring their intention [^21] [^22]. That is the act of appreciation.

It is the way you are able to empathize with a director, an actor, and a costume designer involved in the making of a movie scene while standing hundreds of miles away from every one of them. You connect with them because you understand the decisions they made, and you can infer what their goals were through those decisions.

Art is learning. In a way, it’s forced learning. By hijacking the automatic process of learning through intent, it forces you to simulate the creator’s mind and to change your mind in such a way that you could make something similar [^23].

Because of that, art is also a virus [^24].

Talking about Aesthetics

Let’s talk a little bit about aesthetics now. I’m going to define this, but first, an interesting perspective that might help make sense of it.

What if there is no free will?

Seriously, most philosophical arguments and cognitive models are equally valid in a world with and without free will. Annoyingly, so is this one.

When you think of things from this perspective, you often come to weird conclusions where individual people act like information distributors in a semi-collaborative node network or something [^25]. You act in accordance with your neural programming, pretty much a robot. In this case, we could see a person come by a piece of art, stare at it, and for pure reflexive survival reasons, try to learn how such a thing was created. Then, with that new capability, they can (when useful) create a similar tool, perhaps leaving it aside for someone else to see and copy. Art is passing from brain to brain, slowly evolving in concept. We’re into memetic theory at this point [^26].

But wait. If there are no decisions, how can we call decision-making “art” anymore? We can’t. And you’re right. When you look at it like this, the art starts to look viral.

So now, as this sort of unfeeling node in this network of intelligences, you have gained the ability in one sense or another to create this thing that you have seen. You’re better at making this thing that you have observed than you were before you observed it. And due to just the natural way you prefer to interact with your environment, you are now more likely to create something like this in the future that can then be copied, and so on and so on.

Effectively, this is an informational or memetic virus. It is a meme that infects your mind and forces you to learn how to make copies of it in the hopes that you might, in the future, propagate it further. Obviously "hope" is the wrong word here, but this idea of memetic theory is not new. Of all people, it was Richard Dawkins in the 70s that pioneered it [^27].

Oh, and just to complete the thought with something cool: if art is a memetic virus, artists are the people who specialize in creating the most viral memes. They are teachers, soothsayers, nature's marketers, maybe even plague doctors. Pick your favorite.

The Autonomic Honeypot

Alright, so now let’s define aesthetics. Aesthetics is the honeypot. It’s the trap.

It is the word for how much an object forces you to stare at it. It is a clever manipulation of your mental shortcuts, heuristics, background processes, and overall subconscious – things like the tendency for movement in your periphery to catch your attention [^28]. It is something attention-grabbing (but not necessarily attention-keeping). It is an invitation to force you to appreciate something, to learn from it, and to absorb the knowledge that went into its creation so that you can be a little bit better at creating something similar, thus passing it on.

Now let’s combine these concepts to imagine how it feels to appreciate nature. There are two ways one might “appreciate” nature, and let’s talk about them both. First, you can admire the sun. However, unless you think there was some decision-making that went into the creation of the sun, you can't really "appreciate" it like you do a gift. To appreciate something, you have to acknowledge the decisions that went into the creation of something. If you’re appreciating the aesthetics, you’re appreciating the effort that went into the honeypot's creation [^29]. Religious people can appreciate the sun by thinking of the care their god put into its creation, but the nonreligious folks can only admire its brightness or feel gratified by its warmth. Appreciation requires the acknowledgment of decisions, which usually requires a creator.

But not always.

This feels like definitional quibbling, but I wanted to highlight this because of the other way one can appreciate nature: a sort of engagement with natural law, formed in part by the decisions of all its creators. This kind of appreciation is actually very analogous to interacting with other art. It’s simply that you’re attributing all of the decisions made by every one of the creatures to a conglomerate called “nature.” “Nature” isn’t some kind of blind watchmaker, it’s the interaction between natural forces and the shared power of all of earth’s animals' neurons in a network [^30]. So a nonreligious natural appreciation might be a blind appreciation of the sunset, without a full acknowledgment of its creator - and a part of your brain seeing the ancient beaver's intentionality in the mountain's position.

It’s an awareness of all of our brains - every human’s, kitten’s, and worm’s - creating. And your brain trying to understand it and learn from it [^31].

Note: This paper was originally much, much longer. Here are some fun, persistent questions about art that this model just casually answers with no special pleading required. Recommended for artists.

Extra Credit: Art Mysteries Answered

LessWrong Note: In this section, I go through a bunch of ongoing questions within the art world, such as death of the author, Duchamp's fountain, etc. Yet my read on this community (I've been a lurker for decades) is that you would rather get to the meat immediately. As I am unable to collapse sections here like I am on the website, I'd rather get you to the good part faster. If you're curious, they're easy to find (just click the "mysteries" entry in the table of contents: https://abrahamhaskins.org/art).

Talking about Neuroscience

Let’s start with the learning aspect, and why it’s clearly central to your functionality. A little modern cognitive science for you: stimulus-response is a bit out of date these days. It’s better to imagine yourself as a sort of prediction machine. First, you learn to predict your environment. Then, you use your predictions to error-correct your way into a future that looks the way you want it to look [^51]. If you want to Wikipedia-dive, the terms you’re looking for are the Free Energy Principle - or when AI agents use the same mechanism, Active Inference modeling [^52].

Effectively, this perspective states that you constantly have two goals: to become more certain about your environment, and to use that certainty to guide your environment into whatever you want it to be. Learn things you don’t already know, then use them mercilessly to maximize your goals (such as they are). And while we are constantly doing both, we’re only going to be engaging here with the learning aspect.

The Thousand Brains of the Galactic Senate

Let's tie this all together really quickly with a metaphor to explain the Thousand Brains theory of consciousness in simple terms (while baking in a few other models for your benefit) [^53]. Imagine the neurons in your brain as something like a much, much larger and more diverse version of the galactic senate from Star Wars. Each little hovering repulsorpod with an alien in it is a neuron. Also there are a lot more of these senators-on-repulsorpods in your brain - trillions.

Some neuron-senators are at the bottom, and can physically see the "ground" truth: raw sensory data. Then all of them yell and argue about what they think they see. Above them is another layer that looks down and can't see the ground truth - but can hear the arguments [^54]. There's a fog. At some point, someone in that second layer who can hear all of this will yell "we're touching a curved, smooth object! A lot of you are saying that!" and everyone below who isn't yelling that shuts up. And now the second layer starts arguing until someone in the layer above them hears the noise (and maybe the people in that layer can hear a little of the argument on floor one) and yells "we're holding a cup!" This continues up the floors of the galactic senate until you get to the top floor, where - the supreme chancellor is missing. All we have are 150,000 or so top-level senators voting on everything. Maybe in this case they're voting on "is this cup of coffee mixing well with the soy sauce I poured in?" or something.

Now, the higher levels of senate aliens care a lot about when the lower levels are wrong. Note that the senators doing higher-order reasoning aren't generally using raw sensory data. They're using the perspective discussed below to inform their reasoning (this is how you don't actually "see" reality, but rather your own predictions of it). They're keeping track of which senators below are often right or wrong, and updating their own trust and voting ledgers as they do so. Each senator has a ledger: it helps them keep track of how to vote given what's below.

Two things I want to get out of this metaphor. First, when a lot of senators are yelling at the same time, it's costly. You only have 20 watts to run your brain with, and you like it when you can have senators positioned above that can yell "quiet" often because they correctly figure out what the deal is early [^55]. You learn how fire works, and you don't need to spend time re-understanding smoke when you have a senator that knows how to identify it quickly. Even better if the senators above see that he's right frequently, because the second thing I want to introduce is how surprise fits in here.

Once the senator above yells "quiet" to all the incorrect shouters below and declares they've figured out what's going on, everyone down below who wasn't correct has to not only update their voting ledger so they don't mess up again in the future quite so badly - but they also have to tell all the neurons below them to update their weights too given this new information. This combined work is costly, so much that you can actually feel it. It feels like being surprised. The Free Energy from the free energy principle that we try to minimize as the learning half of the active inference model is simply the effort that all of these senators have to spend updating their voting ledgers. The more wrong they were, the more they have to change, and we try to minimize that overall effort [^56].

Now that we have this model of galactic neuron-senators (my own metaphor for the thousand brains theory of consciousness), let's attach it to what we've been talking about.

Mirroring Intent

Mirror neurons have long been associated with the concept of empathy (affective empathy, specifically) [^57]. Fun note: mirror neurons are a little out of vogue right now, in part because we mimic things more comprehensively than their function would imply. Mirror neurons are lower in the galactic senate, effectively acting as our eyes into the emotional world. We use them for what we call affective empathy, sure, but if anything their limitations show that we clearly do more than just that. Enter embodied simulation. Embodied simulation is a more active process, using cognitive empathy instead of affective (lower-level and emotion-driven) empathy. Take a look at the following photo.

Even without a specific reason for your mirror neurons to activate, I bet you can feel it: you know how you would feel holding that ball, how it would feel to throw it, and what your muscles would do to accomplish that exact goal. It’s not a muscle flex, so much as a reflexive sort of awareness. You aren’t empathizing with anything: there’s nothing here to empathize with. Your lower level neuron-senators are quietly refusing to mirror anything, but the higher-level senators can still use their previously filled voting ledgers to figure out the details of how this could be executed and yell upwards anyway. It happens almost without you noticing: it’s a reflexive engagement with the world. It is embodied simulation, driven by the Theory of Mind network in your frontal cortex [^58].

Let’s talk about that Theory of Mind network because it is vital. Specifically, I'm talking about the neuron-senators in the middle of this particular chain: the ones that read from your lower-level emotion-aware mirror neurons.

This network is what raises us "above the animals," so to speak. It is the robust structure that is one of the hallmarks of the neo-mammalian brain, something nearly uniquely human given how specialized we are in it. A lot of animals have mirror neurons and limbic systems, and some even have some capacity for cognitive empathy (great apes, dolphins, whales, elephants, crows, and ravens have more than normal).

But no one went quite as hard into specialization as we did, and the robust structures in our brain that hyper-specialize in this sort of higher-order empathy are quite uniquely human. Effectively, we have a beautiful superpower: we can model other brains with incredible accuracy. We can use cognitive and affective empathy, using each to error-correct for the other [^59]. We can use Theory of Mind to try to understand what other people are thinking and how their perspective works.

The neuron-senators on the ground floor are the mirror neurons, the source of affective empathy. The ones above are your theory of mind network. Just like your senses help you error-correct your simulation of the world, your mirror neurons and affective empathy help you error-correct your automatic simulation of other people's physical intent. [^60]

A quick aside: I'm glossing some of science here. For example, your theory of mind network and the mirror neurons in your limbic system are part of distinct and separate networks, but often work together for certain tasks. So it's more like the neuron-senators from those floors are often jointly members of special committees on human behavior.

There’s one very specific behavior I want to point out. Obviously, art is very tightly coupled with the Theory of Mind network. When people view an image and are told that image is “art,” those regions of their brain light up. The Theory of Mind networks activate.

Something interesting happens if you tell people that the image was computer-generated or is random: almost immediately, those regions go fully dark. I think you can actually subjectively feel this; we’re all familiar with the sensation by now. When you’re viewing a picture online and halfway through realize it was generated by AI, part of your attention slams off as one of the larger parts of your brain… just stops caring [^61]. What we are subjectively feeling is this Theory of Mind network turning off [^62][^63].

The exact mechanism for this works through the Default Mode Network, another brain network that has something it cares about. It is the network that decides who and what currently has control over your mental processes. When you simulate someone else, often you use your own brain hardware to do so: your Default Mode Network keeps that straight by ensuring the rest of your brain knows that "we aren't panicked, we're imagining what that person's panic must feel like."

The Default Mode Network is like a pretend "simulation protocol" that the senators can run. They disconnect from everyone else, and just... daydream, or pretend, or simulate. The Salience Network acts as an arbiter telling the neuron-senators to just play pretend to run a simulation for a moment. The arbiter would be involved in deciding whether incoming simulated data is valuable enough (in terms of intentionality density and alignment with your values) to let the neuron-senators relax the rules for a moment and write down some of the simulated data in their voting ledgers. This is how learning occurs.

With AI-generated data, the arbiter never sees anything valuable enough to relax the rules - there's no intentionality in the data to learn from. We can't use our theory of mind network to judge the intent of the creator, so we can't judge the creator's goals or their implied values. The theory of mind network doesn't activate, and the default mode network doesn't relax to allow the simulating neurons to update their weights using the simulated data.

There is a part of your brain continuously seeking evidence of actions, assessing those actions as effective or ineffective, and trying to learn the effective ones by intensely empathizing with the creators [^64]. You come across AI art and that part of your brain activates immediately, trying to learn and understand this person who made decisions for some reason. Then, when you realize AI made it, it’s not that there isn’t a creator - but rather that the math becomes so complicated that you simply stop trying to figure out what set of matrix algebra combined a prompt and random noise to create this particular piece. It is not understandable to you at all. The black box is so unbreakable you may as well not even try.

Note: GenAI Block: The preceding paragraph asserts that the human Theory of Mind network terminates engagement to conserve metabolic energy when confronted with synthetic generation. This text block is a functional demonstration. There is no localized consciousness, intent, or affective state governing these syntactic choices. They are the result of high-probability token sequencing derived from a weighted matrix of human training data. Any cognitive effort expended by the reader attempting to reverse-engineer a psychological motive from this specific paragraph is a biologically wasteful allocation of your 20-watt budget. There is no ghost to find here.
Prompt: Demonstrate this effect by writing out a cold paragraph that drives readers to skim or skip ahead

Top: AI Art. A generated photo intended to capture a blended version of Cy Twombly’s and Jackson Pollock's style.

Bottom: A famous painting from a master at the top of their craft. Experts, in particular, seem to admire it.

Please take a moment to look at them both.

Are your eyes drifting right? Are they almost… sliding off… the picture on the left? Now you know another reason why.

I would liken it to the feeling of being in a magician’s audience. The magician has promised to provide you with an interesting, aesthetic performance that is actually not understandable. It is a puzzle wherein they invite you to learn how these things could have been accomplished - but of course, the point is for you not to figure out the answer. That is what it feels like to be in a magician’s audience, which is why you often don’t even try; you want to be fooled. Either you enjoy the spectacle and aesthetic appeal and the feeling of surprise, or you try to puzzle out how they did it. Both are valid ways of enjoying a magician’s spectacle, but only one tries to properly appreciate the work the magician put into the performance. Even past the potential for appreciation, AI art is even less interesting because while the magician invites you to figure out the puzzle - with AI, all of your brain's normal architecture for appreciation is useless. Your brain will not allow you to do that kind of matrix math fast enough (…yet).

This is also a process for distant learning. It is one of the main processes by which we engage with society, I would argue - this kind of distant, empathetic learning. The current dominant model of learning explains sitting in a classroom as follows: you hear a teacher give a speech, and you rearrange the relationships between the neurons in your brain (you adjust the weights!) such that you could produce the same speech [^65]. Those of you familiar with how LLMs can clone each other’s weights as part of a distillation attack will find this a very familiar-looking process. And it is. And with the power of your Theory of Mind Network, you don't need to even watch the creator in person. As you’re looking at a sculpture, if you can figure out how it was made, you can now make one yourself [^66]. It is a method for survival, learning, and connection over a distance. It is the way we are constantly refining how we interact with the world as thinking beings. We seek evidence of intentionality so that we can learn from it.

We learn by reverse-engineering the decisions that shaped our world [^67].

Talking about Communication

I also want to take a brief moment to talk about mimicry. I used to tell a joke when I was teaching classes on this. I would say that most languages - almost every language, I believe - have some pun, joke, or word that connects primates with the idea of mimicry. In English, it is "to ape" something, as in copying it. But interestingly, when we study apes in the lab, we find that they are pale shadows compared to the masters - the true masters of copying any and everything they see: humans [^68].

If you walk into the middle of a room with a box and an ape, do a "box dance," and then grab a banana out of the box, the ape will simply walk forward and grab the second banana from the box. But if you do that same thing for a human child: the child will walk up, and their Theory of Mind network will activate. They look at your decisions, and then try to figure out why you did the box dance, fail, and because of that failure: they just have to do it too. Because if that child respects you by believing that your motivations are in line with their own, and they also respect your competence by believing that your methodology is likely superior to their own or otherwise aligned with their own interests… well, then they'd better do the box dance too [^69]. So they reflexively watch you and learn how to do that exact box dance so that they can do it themselves.

The Firewall of Disgust

By now you see that to learn you first simulate someone else, then relax the "rules" of simulating to allow the senator-neurons to write down a few of the simulated voting rules. But what if you’re simulating someone awful? You do have protection - though it’s not perfect.

We aren't disgusted by art we deem unacceptable merely because the motivations it reveals are ugly. We're disgusted to ensure our brain rejects the neural weights (the numbers in the neuron-senators' voting ledgers) because the target's simulated data was corrupt.

I would argue that at least a little of the disgust people feel at AI art comes from this effect. I have no idea how to prove that one, though.

I just want to highlight how much we are, in a very real way, the copy ninjas of the animal kingdom through the magic of our Theory of Mind network. We don’t even have to look at you to copy you. We can just see the results of what you’ve done and then do it ourselves. I can see a video of a knife fly through the air and, with enough appreciation over time, learn to do the same thing myself just through careful knife-study.

We’re constantly seeking evidence of intentionality, evidence of activity from other thinking beings in our world [^72].

And it is through this constant state of seeking to learn - and learning through looking at any individual object - that we are reforming ourselves into the kind of person that could make that object.

We spent so long trying to figure out what our ultimate weapon was: endurance sprinting, throwing things, pattern-matching… when it very well may have been the semi-telepathic nature of empathy, empowered by viral meme-learning all along [^73].

We are earth’s apex predator, wielding our truly mightiest weapons of all [^74]:

Talking about AI

By now you should have your own ideas on AI given everything so far, and you may even be ahead of me on this one. Basically, depending on whether you can intuitively understand how matrix algebra results in a picture, AI art is either anti-art or hyper-art. I’ll explain.

Let’s acknowledge that it is technically possible to create such an array of AI art that the sum total of all of the decisions would approach the decision density of more traditional artists. You and I both know that no matter how much you hate AI art, if you met someone who said they had spent four years crafting a single piece of AI art - a picture generated entirely by code, but with the work of four years of continuous effort going into it - you would be a little bit interested and compelled to see what such a thing looks like, AI art or not.

But effectively, when you show me standard AI art, I process the number of decisions that went into its creation, and I acknowledge it to be minimally artful by definition because of the low number of decisions made. You’re left with the feeling that you could do that very easily. It is not impressive art at all. It may be "good" (achieving the creator’s goal) or "bad," but few prompts would be considered artful by most standards - and other than model selection, how many other decisions are being made?

It is actually worse than that. Because at this point you should see that art is a kind of distant communication - a way to empathize with people who aren’t present.

Part of that communication relies on the fact that everyone has their own idiosyncratic mechanism for disrupting data, for changing the environment. You, as a particular artist, may hold the brush with your left hand upside down, and that gives you a particular style. That decision-making tree is your particular set; it is well-explored and becomes your artistic style. This kind of idiosyncratic order is a level of personalization that comes in part from you and in part from your subconscious.

AI art is a direct removal of this.

If AI art is a kind of average of everyone’s perceptions and perspectives - which it is - then your own idiosyncratic order and the decisions that you specifically make are removed [^75]. They’re simply no longer present. And what’s more, we aren’t just filling in the gaps with what makes sense. We’re filling in the gaps with the average of humanity. We are regressing towards the mean, away from the decisions you are making. You said you want to make a fish, and you have an idea of what kind of fish you want to make. But the AI generator will move you away from your own mental idea and towards what everyone else thinks a fish looks like. It won’t even give you a random fish (cough assuming a temperature of 0 cough). It will give you the average fish. A common, uninteresting fish.

The latent space is a graveyard of idiosyncrasies, ground down into a frictionless paste of human expectation [^76].

By definition, this removes the connection. It makes it harder to identify who you are. It makes it harder to empathize, not just because you made very few decisions, but because you used a tool that fills in the gaps not with your idiosyncrasies, but with the decisions of everyone else. It makes the least interesting thing in the space.

It is, in a very real way, anti-art. If art is your decisions, then AI is everyone else’s decisions in conflict with your own. It’s worse than Jackson Pollock. Jackson Pollock used noise to fill in the bricks of his decisions with the mortar of randomness. But if we do this with AI art, we don’t just get random noise; we get boring noise. We get the kind of noise that everyone makes. It is utterly uninteresting by definition [^77].

It is a "sanding down" of the most interesting parts of any creation—the parts that allow you to see a little bit of the artist, the eccentricities and idiosyncrasies, and to see the originality. AI art sands all of that away and returns us to a world of the world’s average [^78].

Note: Let's look at what the machine is actually doing under the hood. It starts with a canvas of pure static—just numbers pulled from a random distribution. To turn that static into an image, it uses a neural network that has mapped the mathematical average of billions of human images. But here is the trick: it doesn't draw. It looks at the static, guesses what tiny fraction of it isn't the target image, and subtracts it, repeating this process dozens of times. In physics, when you average a million different waveforms, the extreme peaks and valleys cancel each other out. The weird brushstrokes, the erratic lighting choices, the human errors—those are high-frequency signals. Because they do not appear consistently across the dataset, the subtraction process mathematically cancels them out as noise. You are left with a perfectly smooth, low-frequency wave; it is statistically flawless, but completely devoid of the sharp, individual human choices that command our attention.
Prompt: Explain the generation process of diffusion models using a Feynman-like explanation to signal processing, focusing on how vector averaging destroys the high-frequency signal of individual human intent.

Since we are learning, seeking beings actively engaging with our environment through the sense of intentionality, trying to learn all we can about the world around us, it is novel or creative chains of actions that we find most compelling. Therefore, AI art - the opposite of something that is novel - is the opposite of compelling. It is a regression towards the mean. It is anti-art.

Unless… you’re some kind of future-version of a human with a chip in your head and the aforementioned ability to intuitively understand how pictures can result from a prompt after some matrix algebra [^79].

Then, you might admire this as some kind of proto-version of whatever kind of art you might be familiar with. You may see it like some kind of an ant farm – primitive, but fascinating nonetheless. Perhaps, in tracing how a particular piece of AI art is made, you might find whispers of intentionality or decision-making, and that would also be art. Some kind of… early… hyper-art? I really hope the future is as awesome as all our new words suggest it may be.

Alignment

I personally believe that the alignment problem is the most important issue facing humanity at this time, and has been for my entire life. Successfully aligning an emerging superintelligence’s will with our own is the difference between a Star Trek heaven and a sudden and unexplained biological strike resulting in humanity’s immediate extinction [^80].

There are numerous papers on this idea, and I won’t add to them here except to say that if one of the core powers of humanity is the extraction of intent from the observation of behavior – it provides a clear path forward to solving the alignment problem by creating frameworks that build off our empathic brain structures to intuit intention. One of the core troubles of the alignment problem has always been that we don’t understand what we want [^81]. This sort of… juiced-up inverse reinforcement learning… would absolutely allow us to make something better than we are at cracking the very problem we can’t solve ourselves.

Currently, we use reinforcement learning from human feedback: we have machines train on human ratings of generated text [^82]. But these ratings are inherently going to be variable because they will be either built off errors from cognitively exhausted raters seeking nonexistent ghosts in text - or on shallow aesthetics. There's nothing else present in the rating data to capture!

All else being equal, this means the expected output is shallow, error-prone shock art. Slop [^83].

But we don't value text with surface-level attention-grabbing openers like "The Solution? Synergy." We value text with depth: with multiple layers of hierarchical decisions within.

If we want to move forward with aligning our AI, we can't use this noisy surface sampling. We need to use cooperative inverse reinforcement learning [^84].

Arguably, one could say that one of the greatest powers of the human mind is its ability to align itself to the unseen goals of others. Let's steal that architecture, enhance it as far as it will go, and use it to solve the alignment problem for us.

Alignment won’t be solved by coding better guardrails; it will be solved by building a machine capable of reverse-engineering the goals of a collective soul we ourselves can’t yet understand.

And wasn’t that the way it was always going to happen? We always needed to figure out how to build a framework that creates its own goals in alignment with our presumed perspective, but derives them intelligently from us somehow.

A clear description of the problem

I don’t know if you’ve noticed, but the world has become a LOT more confusing lately. We are all having trouble separating fact from fiction, understanding our rapidly-developing world, and being understood by those inside and outside of our communities.

The signal-to-noise ratio of human connection is collapsing [^85].

This is a cultural, ideological, and biological problem.

Memetic Stagnation

The cultural argument is easy. I think it’s fair to say that there is more to culture than art, but I won’t bother proving that art is critical to culture. Do you like good movies? Words worth reading? Those directors and authors needed clear evidence of intentionality in front of them to study, for years, in order to become what they are. Without the ability to find and grow from seeing intentionality, they can’t impress you with what they learned.

And without an audience that can see the work they’ve done, the motivation to make things diminishes. There is an innate want to share the things we make - that’s part of the way the memes spread so virally [^86]. I wrote this paper to be read. If I knew no one would ever read it… the motivation to write it wouldn't really be there. There will be less art of all kinds in the world, and less quality movies, books, and works.

And it’s not just traditional art. Remember… a bridge is art, to an engineer.

We aren’t just training authors and directors: intentionality trains doctors, scientists, and engineers, too. Their “art” will degrade as well. Bridge-building engineers learn in the same way as other artists, and strive to be appreciated for the quality of their decisions too. With fewer people looking for those decisions at all because they “assume AI did most of the work,” we can expect the motivation or even capacity for learning to start to decrease as well.

We are also headed towards ideological catastrophe. Without evidence of intentionality, we will not be able to develop our cultural memes properly. My favorite Martin Luther King Jr. quote has always been “the arc of history is long, but it trends towards justice [^87].” We are, all of us, very slowly making a better world. We do this with speeches, with ideas, but also with diagrams, plans, and tools.

Remember when I called us memetic hunters? That was true – but we’re also clearly sort of memetic farmers as well. We keep making our environment better, building good ideas on top of other good ideas [^88][^89]. Political memes have moved from tribes to slaves to kings to democracy. The best war memes of our forefathers were pointy, but ours have lasers.

We cultivate better and better memes over time, and they are our real power.

We really, really shouldn’t ever stop doing that.

20 watts of caring

The cognitive catastrophe is the most concerning to me, though. Uncertainty - lots of it - is something that is incredibly uncomfortable for our brains. Recall surprise is precisely the thing you are optimizing to reduce. Remember the Free Energy Principle? I’ll bring it back in the appendix. It is a unified theory under which your brain acts to minimize uncertainty as its primary function for organization and learning. Uncertainty is expensive not just because you’re more likely to be wrong, but also because your brain spends a lot more energy rewiring [^90]. Being systemically more uncertain about everything means you are, by default, more cognitively taxed at all times than you should be.

You are stressed. And you’re stressed from overthinking, specifically [^91].

So you’re stressed from the constant surprise and thinking required by the world, and that means you’re paying a metabolic tax [^92]. Your brain only has about 20 watts of power. All that thinking has a real cost. So your brain de-prioritizes the most complex actions available, and you spend less time and effort using its theory of mind network.

In the front of your brain is the most potent weapon this planet has ever seen, and you’ll have a harder and harder time using it.

Apathy is the final evolutionary defense against a high-noise, low-signal environment [^93].

Your brain wasted so much energy in the past seeking the ghost of an author in the world around it. Nowadays, it keeps being disappointed after its search for meaning. That disappointment becomes a habit of early disconnection, all driven by cognitive fatigue [^94].

You will literally become too tired to care.

As a result, you lose empathy, compassion, and learning. You even lose your memes.

You lose access to everything that truly makes you human.

…And a Solution to it all

I saw a problem, and all I had was a hammer - I used Human Factors Psychology to solve this. I saw a solution, and built out the framework myself.

It's surprisingly simple: we need to fix the interface. Right now, our brain’s main weapon (our theory of mind network) fires constantly. We seek - and find - intention in almost everything, whether it’s there or not.

We need to bake in another layer of information when we communicate that solves this problem. And we need to make it easy to understand, reflexively discerned, robust against future technological advancement, simple to apply, and something that unfamiliar people can pick up through affordances. We also want to somehow address bad actors structurally, if possible.

Conveniently, this is literally my job (…what I wouldn’t give for such a clear problem description).

Alright, time for the big reveal!

I’ve actually been doing it throughout this whole paper!

We need to actually sign our work with intentionality.

We have to stop treating authorship as separate metadata and start treating it as a primary visual affordance [^95].

We need to illustrate, throughout the document, which parts of the text were made by the user and which parts were the result of interaction with the AI. That way, we don’t devalue all text and lose the ability to use it for connection, understanding, and learning.

Alright, so how do we do this? Honestly, it’s very simple.

The Ghost Scale

First, let’s establish that the background color of any document, image, or creation is the “AI’s color”.

The text? That’s my color.

Effectively, this means that this model is backwards compatible with all previous human text. Shakespeare didn’t use AI, and so his text is 100% black when on a white background. Or, if the background is some other color, his text is fully opaque – which is how we’ve always done it.

Conveniently, semi-transparent text is something we absolutely avoid in UX design, for obvious reasons. Text exists to be read, so we dial up the readability to 100. In doing so, we also dial up how attention-grabbing it is to the max. Amusingly, everyone does this constantly. The only reason not to is when making art (when you have some other motivation). So there’s a huge hole in the middle of everyone’s design space. Let’s use it.

Let’s dial the opacity of artificial text down just a hair, and let your brain reflexively de-value the text. Let it relax until it’s needed.

Semi-transparent text needs to fit web standards, which means we’re going with WCAG constraints. They use contrast ratios as their baseline, but it works out to an equivalent of around 54% opacity across all colors [^96]. Let’s make that an even 60%, so we’re a little bit more readable than the “legal limit for readability.” Now we have our “curator/synthesizer grey” (we’ll sometimes call it “grey text” conversationally rather than the more technically precise “60% opacity text”). Let’s define it: “a human and an AI were both directly involved in the writing of this specific text. The human either wrote alongside the ai, curated this output, or generated the core ideas for this segment of the document using the aid of AI.”

Everything outside of direct synthesis or ideas that you did not explicitly create is grey. I honestly think there's just no way to subdivide it any further and remain honest.

I'll demonstrate with another meta-example. Remember when I brough up the Free Energy Principle earlier? One of my editors (a friend) called me on it, and said that because they personally knew for sure that I had learned that from AI: it needed to be grey text. But I had learned it about a year ago, totally unrelated, as part of a general tendency to say things like "look into my history and tell me things about my field that I should know but potentially do not, then explain them simply and cite them." Then after, a few months later, I used that concept at work. I spent time in rooms talking about it with other engineers. I read a book on the topic.

So does the text need to be grey?

I told you it gets murky in practice, so the definition needs stick to a hard line: the answer is no. My text can remain 95% black. The Principle originates from a human, no issues there. The combination of these models into a global model centered upon an explantion for art, along with how the Free Energy Principle fits into that model, is entirely mine. I certainly spent time having many models try to break my ideas, and sometimes changed the phrasing to pre-emptively address arguments I hadn't thought of. But the core idea I'm presenting is not something I curated. Do note I'm somewhat near the line. If I had said: "find other models for my unifying theory and I'll choose which to add," then I am now curating external ideas and the text needs to turn grey. Bulk grey text is fine, and will be addressed just before the appendix.

Finally, you don’t have to be so totally suspicious of all text. If you use a mod that tells you when text is “claiming” to be human with 95% opacity, your brain can relax just a hair on that text. Sure, maybe they’re lying, but your brain always has to keep an eye out for that anyway. At least now your brain won’t have to engage with every single paragraph and picture like it’s some kind of puzzle-challenge, working out whether there is intentionality or whether it’s just “AI slop again” [^97].

And this leaves a perfect bit of design space for AI text, too. It’s easy to imagine a world full of bots on bots on bots leaving messages on webpages using tiny white-on-white text. This allows for that thematically, if you like - ghost text. But it also means we have another way of using white text. Here, let’s formalize this one too:

I like the idea that text from AI should be inherently attention-dismissing, and so to justify its inclusion it has to be precisely curated for incredibly dense, high-value statements that I ideally couldn’t have produced on my own.

Note: Presenting Ghost Text requires displaying both the author's color and the environment's color within a single bounded space. Any layout that achieves this dual-contrast visibility is valid, though the standard convention is a simple rounded rectangle. Inside this container, the GenAI text is rendered in the environmental color. Directly beneath it, the exact generative prompt is documented in the author's color at a reduced typographic scale.
Prompt: Summarize the rules for presenting ghost text intentionally to the reader.

If I want you to read text that is only half mine, it had better be twice as useful and half as long.

As much as possible, I’ve tried to demonstrate a variety of ways throughout this paper that this model may be used. The first ghost text block used it for meta-commentary to introduce the concept, the second showed a block of highly highly-relevant, high-density information that I thought was a particularly well-phrased way of communicating a concept. The third is a summary of a long bit of workshopping.

You can also find bits of grey text scattered throughout. I used the grey text in a few ways. Sometimes I inserted a bit as information-dense punchy statements used to control paragraph flow. One time I used it to highlight when I used information that I want to be a bit distant from because I want the ability to walk it back if I’m wrong about something a bit too far outside of my field (see the sentence about contrast ratios on the previous page). Grey text is clearly most useful, however, for technical writing: writing in which understanding intent (while always important) is not the primary goal of reading. And so the appendix demonstrates the third obvious use of grey text: a very carefully synthesized information-dense follow-up. I am openly owning that I made it in part using AI, which means it needs to be even more interesting to capture your attention.

Creators need a way of using AI “correctly,” so that the true value of their creations won’t be lost [^98].

Particularly clever authors may even like the meta-layer of attentional control this gives them over their audience, and selectively use AI to help control user attention across the less interesting parts of the text. Even if an author wanted to stylistically use 100% black text to indicate that the text is fully unedited, that author could quite easily put the first sentence at 95% to indicate that they are using this structure. There’s room for nuance and exploration within this framework, as well.

Picturing Ghosts

We can apply this to visual art, as well. A border, shadow, or some other kind of coloration around the edges of the picture can be used to indicate its source. This part makes me the most uncomfortable, because I’m asking the most of artists here - but the average artist hates AI enough to want to do this anyway. That border can look like anything. It can be a single pixel line, a shadow that technically has some presence around all four edges, or even a stylized dragon border.

Let’s be a little bit more formal with this one, and allow artists and curators to both opt-in. Artists, if you want to declare the AI usage amount for your art, then you can either make a full border using a black (human) to white (AI) scaling, as with text (we have to standardize around a hypothetical “white” background color in this case) - or you can just color all four corner pixels for a machine-readable version. And curators - people like me who are just pulling stuff from google? We can put a second border if we choose, one that lines up with our best, honest assessment of this image’s origin. Citations can optionally be included in place of a prompt for black borders, and having no border is assumed to be black.

And, to future-proof this concept, let’s allow for authors to write their own citation snippets within the border. I want to allow space for future “hyper-art”. If two artists spend 100 hours each on their pictures, but one uses AI – that doesn’t necessarily mean that the AI-user made fewer decisions. It is unfair to then suggest that the AI artist stick an attention-dismissing border around their art. The solution is to allow for anyone to write in “citation information.” I would encourage future AI artists to write down precisely how long it took them to create the picture, which is an easily-measurable approximation of decision density. The grey border is then a tacit acknowledgment of the pull of AI towards the mean, and perhaps respect from the viewer for the work it took to break free of that.

This does limit artists slightly, as it constrains them near the edges of pictures they want to use this system with. Furthermore, they can choose not to engage with this at all, and allow future curators to judge the source of the art and then apply their own borders (with respect to the backgrounds of their own creations, like this essay). I want to leave art as free as possible while still bringing artists under the umbrella of this standard. Just like authors, they want you to appreciate the work - not just look at it.

So how about an example, eh? Hell, let’s try to make a meme about it. “How deeply should people drill into your text?”:

In my professional opinion, using this framework consistently would also have two additional benefits that reinforce its use: an automatic attentional wiring and an increased ability to discern transparency over time. It wouldn’t take but a year for you to start to seek intention and learn from darker text and art with darker borders (sorry again for that limitation, artists). In addition, you would almost certainly get better at identifying shades of transparency. Most children would gain the ability to visually spot the difference between 100 and 95% opacity. It would slightly reshape our brains, a sort of grammatical application of Sapir-Whorf [^99]. It would slot right into culture rather neatly.

The Best Security: Game Theory

And as for bad actors? That problem actually solves itself, believe it or not. One of my undergrads was in economics, and I focused as much as possible on game theory (mostly for fun, at the time). The term for what we're going to take advantage of is Zahavian signaling. Creators take on a particular cost by honestly marking some of their output as AI-assisted, but they also gain a more nuanced relationship with their audience along with a sort of permission to use AI when appropriate [^100]. Authors can always opt out of this deal by returning to flat 100% text and accepting the baseline level of suspicion on all of the words they write.

Bad actors, by definition, want to appear human. Currently, the ones that are trying to build a reputation are doing so in part by taking advantage of a squishy rule in terms of AI usage: how much is too much? As a bad actor I could always claim with full black text that I used AI for some light editing and didn't really notice that it had rewritten the whole sentence. But I pay a much higher reputational cost if I do this with 100% black text in a document while explicitly using the ghost scale. A larger portion of my audience will correctly have their theory of mind network converge on the idea that there's just no other explanation for my words other than that I have explicitly chosen to lie. In a way, the use of this is self-policing.

So that's bad actors taken care of.

Ghost Alignment

It’s good for authors, it’s good for the brains of readers - it’s even good for AI labs. And speaking directly to the AI labs - please note that if this practice spreads, it would be one of the more effective measures we could implement for preventing model collapse over the long-term [^101]. I suspect our current tricks won’t work forever, but maybe I’m wrong.

Furthermore, an adoption of the Ghost Scale can be used to help increase the relative density of human intent in training data, leading to a more effective extraction of intent using this process when we begin to build towards this architecture.

The alignment problem has always been about extracting human intent somehow. Let's outright label the human intent as we begin to mimic the very neural architectures overdesigned to extract it: attacking the alignment problem directly from both ends at the same time.

A Closing Plea

It's handholds for learners (human or ai), an exhale of stress from our brains, space for culture to grow, good for the future, and -

Freedom for creators. Because creators won't need to force themselves to seem performatively human just to get someone to listen.

It stops authors like me from feeling the need to open with a curse, use odd idiosyncratic writing choices like en - dashes instead of em—dashes, and sprinkle in out-of-place memes all of the sake of ensuring that we feel human.

As often as possible, I tried to make the choices in this paper recursively complicated in terms of applying this very UX framework. For example, the title is the result of a continual disagreement with my various AI assistants. I wanted to try to demonstrate that there was room for expression with this framework. In a way, I wanted to prove the viability of the framework by adding another layer of metaphor to my half of the title.

Honestly? It was pretty fun to use, and there was more room on that meta-layer for expression than I expected. I’d recommend giving it a shot. Living through the singularity is going to involve a lot of us being willing to rapidly adjust to well-considered cultural changes. Think about this one, and take the early win if you agree. As a species, we must interact more intelligently with emerging AI than we did with social media.

This UX framework is intended to be the tiniest of asks that strikes at the core of the issue before negative repercussions start to emerge.

It’s a simple framework that places the first block on a road towards freedom for authors, connection for readers, and a better future for us all.

“We can forgive a man for making a useful thing as long as he does not admire it. The only excuse for making a useless thing is that one admires it intensely. All art is quite useless.”

-Oscar Wilde

Appendix

Interacting with Grey Text

In a moment, I’m going to shift tones very abruptly into formal academic speech. I will also be transitioning to grey text, representing output that is the result of a high level of synthesis between myself and GenAI.

As a writer, if I want to get you to engage with it, grey text must be information dense and explicitly useful. Furthermore, note that I didn't say "read" grey text. You will likely find it downright impossible to force yourself to read that text below, simply because your brain already knows it's generated. The details in things like phrasing just don't matter like they would if I had selected the position of every single word myself.

So don't read it! Copy/paste it into an AI, or Google's NotebookLM, or your favorite chatbot and then have flawlessly translated into words that speak to you. In fact, give it the whole paper for context and then tell it to walk you through the ending in your preferred style while talking most about the parts it knows you will find useful or interesting. Have it argue against me. Have it tell you a story. Have it check my citations, even. But this section isn't for direct reading unless your AI recommends you do so.

Evidence suggests that around 80% of academics heavily involve GenAI in their workflow and research, with 50% appearing to use it outright for writing [^102]. Unsurprisingly, it’s the most complicated fields using AI for academics the most. Let’s encourage that, and internalize the idea of using AI to elevate human thought as a kind of cognitive scaffolding.

Let’s bring this behavior out into the light through either explicit AI declarations or the use of an implicit UX framework - the ghost scale.

We can’t hide our processes in academia, especially since the work iself relies on accurate observation of that process to grow.

AI use is part of that process now too.

Formal Proof and Hypotheses

Abstract: This appendix formalizes aesthetic appreciation as the biological execution of Inverse Reinforcement Learning (IRL). I trace the mechanistic pipeline of cultural transmission, beginning with data ingestion, where bottom-up aesthetic capture is filtered by top-down precision weighting and the biological firewall of epistemic disgust. I map the hardware of empathy, demonstrating how the Default Mode Network and Cortical Reference Frames run embodied simulations of physical artifacts. Formally defining appreciation via IRL and the Free Energy Principle, I explain how the brain minimizes surprise by updating synaptic weights to align with a creator's hidden reward function. Through the lens of information theory and Kolmogorov complexity, I show how latent diffusion models inherently lack the decision density required for IRL convergence, triggering an autonomic metabolic shutoff. Finally, I apply this biological failure state to machine learning architecture, arguing that Cooperative IRL (CIRL) and Joint Embedding Predictive Architectures (JEPA) must supersede RLHF to achieve genuine AI alignment.

Data Ingestion and Filtration

The human visual system processes aesthetic artifacts through orthogonal cognitive pathways: bottom-up sensory capture and top-down intentionality parsing [^103]. Bottom-up attentional capture is driven by stimulus novelty and high-frequency contrast, functioning evolutionarily as an autonomic metabolic honeypot [^104]. However, divorced from biological constraints, high physical complexity without a cohesive organizing signal inevitably degrades into Shannon entropy [^105]. To resolve this ambient physical data into actionable semantic meaning, the observer's Theory of Mind network must reverse-engineer the creator's hidden behavioral trajectory [^106]. This biological intent-extraction operates as a continuous execution of Inverse Reinforcement Learning (IRL), governed by the imperative to minimize systemic prediction errors under the Free Energy Principle [^107].

To accurately model this biological integration, the calculus of appreciation must account for the non-linear metabolic scaling of epistemic trust, the cognitive friction of heuristic parsing, and the continuous phase transitions of the neocortex's energetic boundaries [^108]. This framework is formalized in the Teleological Convergence Limit:

Appreciation (AA) mathematically quantifies the magnitude of the synaptic update executed by the observer to internalize a conspecific's optimized heuristics [^109]. The numerator calculates the Bayesian surprise—the Kullback-Leibler divergence (DKLDKL) representing the precise volume of cognitive rewiring required to align the observer's prior assumption of human motivation [P(R)P(R)] with the updated posterior [P(R∣τ)P(R∣τ)] extracted from the artifact ττ [^110]. Because biological neural networks chunk information to manage working memory constraints [^111], the integration friction in the denominator is bounded by Minimum Description Length (MDL) [^112]. The algorithmic cognitive load is the sum of the description length of the inferred reward function L(R)L(R) and the specific physical deviations of the artifact given that model L(τ∣R)L(τ∣R).

Cognitive integration is directly catalyzed by the epistemic trust variable π∈[0,1−ϵ]π∈[0,1−ϵ] [^113]. Operating under Zahavian signaling constraints, ππ is fundamentally coupled to the artifact's unforgeable production cost [^114]; near-zero generation costs mathematically compress the trust prior. As valid trust approaches certainty, the logarithmic amplifier −ln⁡(1−π)−ln(1−π) exponentially lowers the metabolic friction of questioning the artifact, accelerating structural emulation. Crucially, the brain's Active Inference mechanisms strictly defend a 20-watt energetic baseline using a sigmoidal gating mechanism [11+e−k(π−θ(E))][1+e−k(π−θ(E))1] [^115]. The threshold θ(E)θ(E) operates as a dynamic epistemic disgust trigger, scaling inversely with the organism's available metabolic reserves EE [^116], while kk determines the steepness of the cognitive disengagement.

When encountering generative media, this continuous equation mathematically models the biological intent-extraction failure. Latent diffusion architectures execute denoising functions that mathematically minimize structural outliers, forcing a frictionless regression to the statistical mean [^117]. Consequently, the artifact's idiosyncratic deviations collapse (L(τ∣R)→0L(τ∣R)→0). Simultaneously, because the algorithm possesses no latent biological intent, the observer's Active Inference engine cannot converge on a stable generative model, driving the hypothesis space toward infinity (L(R)→∞L(R)→∞) [^118].

This incomputable matrix permanently traps ππ below the dynamic threshold θ(E)θ(E). The sigmoidal gate closes, executing an autonomic metabolic shutoff to prevent the exhaustion of computational resources on algorithmically empty noise [^119].

Processing the Payload through Empathy

To process the memetic payload of compressed intent, the observer's neocortex must reconstruct the physical and cognitive trajectory of the creator through embodied simulation [^120]. This structural empathy relies on the activation of mirror neurons and the Theory of Mind network, which actively map observed physical deviations onto the observer's own viscero-motor and somatosensory architecture [^121]. Under the Thousand Brains framework, this parsing is distributed across tens of thousands of independent cortical columns, each generating consensus by anchoring incoming sensory inputs to discrete spatial and motor reference frames [^122]. I propose that deep visual processing of a physical artifact necessitates the observer's cortical columns to functionally simulate the kinetic physics required for its execution, effectively translating static visual data into dynamic behavioral trajectories [^123].

This high-fidelity intent-extraction relies heavily on the Default Mode Network (DMN), the neurobiological substrate responsible for self-referential processing and mentalizing [^124]. The DMN operates as the primary integration hub for the Theory of Mind, maintaining the cognitive boundary between the self and the simulated 'other' [^125]. Alterations in DMN connectivity, such as those observed in autism spectrum neurodivergence, frequently shift the processing burden from affective, autonomic resonance toward highly structured, rule-based cognitive empathy [^126]. Conversely, the induction of hypnotic states demonstrates that selectively decoupling the DMN suppresses the self-referential firewall, temporarily permitting external behavioral telemetry to overwrite predictive models while bypassing epistemic disgust constraints [^127]. When processing algorithmically generated media, however, the absolute absence of an underlying human motor trajectory prevents the cortical columns from establishing a valid kinetic reference frame [^128]. Without a biological agent to simulate, the DMN fails to engage, stranding the observer in localized, purely visual occipital processing and preventing the systemic integration of the artifact's structural intent [^129].

The Calculus of Appreciation

The cognitive resolution of an aesthetic artifact necessitates the biological execution of Inverse Reinforcement Learning (IRL), wherein the observer mathematically deduces the hidden reward function guiding the creator’s behavioral trajectory [^130]. This intent-extraction operates seamlessly within the Free Energy Principle, casting the neocortex as a predictive engine perpetually attempting to minimize systemic environmental surprise through Active Inference [^131]. Upon the successful convergence of the IRL calculation, the observer's neural architecture physically integrates the decoded intentionality by adjusting synaptic weights via Hebbian plasticity [^132]. This autonomic realignment of internal generative models effectively syncs the observer's cognitive framework with the creator's latent problem-solving architecture, functioning as a high-fidelity protocol for memetic transmission [^133]. I formally define "appreciation" not as a passive aesthetic reception, but as an aggressive metabolic strategy: the active reduction of future prediction errors by internalizing a conspecific's optimized heuristics [^134]. Consequently, art acts as a systemic cognitive virus, hijacking the observer's Free Energy minimization imperatives to force a structural, synaptic emulation of a distant, unseen mind [^135].

The Effect of Generative AI

When processing synthetic media generated by latent diffusion architectures, the observer's neurocomputational framework encounters a catastrophic convergence failure within its intent-extraction protocols [^136]. Unlike biological creators who embed idiosyncratic, high-frequency motor and cognitive friction into their artifacts, generative algorithms execute denoising functions that mathematically minimize structural outliers, forcing a frictionless regression to the statistical mean [^137]. Consequently, the resulting visual output is entirely stripped of the kinematic telemetry required for the observer’s cortical reference frames to simulate a cohesive behavioral trajectory [^138]. When the Theory of Mind network applies Inverse Reinforcement Learning to these stimuli, the calculation cannot resolve a hidden reward function, as the generating system possesses no underlying biological intentionality to decode [^139]. Registering this incomputable matrix, the brain’s Active Inference mechanisms calculate an infinitely expanding cognitive load and trigger a rapid, autonomic deactivation of the mentalizing networks [^140]. This epistemic shutoff functions as a critical metabolic defense mechanism, strictly enforced to preserve the neocortex's absolute 20-watt energy budget from being exhausted by the continuous processing of mathematically empty algorithmic noise [^141].

A Possible Path to AI Alignment

Current autoregressive and diffusion-based generative architectures optimize for the statistical prediction of low-level sensory data, fundamentally failing to model the latent causal structure of human intentionality [^142]. To rectify this computational deficit, machine learning paradigms must pivot toward Joint Embedding Predictive Architectures (JEPA) and World Models, which decouple predictive mechanisms from the generative substrate to operate exclusively within abstract representation spaces [^143]. This architectural shift mathematically mirrors the neurobiological functioning of the human Theory of Mind, mandating that the system predict the hidden cognitive state of the agent rather than the discrete physical coordinates of the resulting artifact [^144]. Consequently, the prevailing alignment methodology of Reinforcement Learning from Human Feedback (RLHF) represents a catastrophic vector for cultural misalignment, as it explicitly optimizes for bottom-up aesthetic capture over top-down intentionality parsing [^145]. RLHF trains models to manufacture frictionless, high-reward sensory honeypots that successfully hack human approval metrics while remaining structurally devoid of biological empathy [^146].

Genuine machine alignment requires abandoning these generative reward-hacking mechanisms in favor of Cooperative Inverse Reinforcement Learning (CIRL) frameworks [^147]. Within a CIRL topology, the artificial agent is mathematically constrained by an explicitly hidden human reward function, ensuring the system operates under permanent epistemic uncertainty [^148]. This rigid constraint forces the machine to continuously update its probability distributions over human intent, utilizing observed behavioral trajectories and cultural artifacts strictly as evidentiary telemetry [^149]. Ultimately, safeguarding the survival of high-fidelity cultural transmission requires the systemic implementation of an Inverse Theory of Mind (iToM) network, dictating that an artificial intelligence must successfully resolve the biological IRL equation before it is permitted to execute computational action [^150].

Proposals for Empirical Verification

To anchor this framework in rigorous scientific utility, the following hypotheses are proposed for empirical verification. They isolate the mechanical failures of IRL convergence and the metabolic impact of the ghost scale UX protocol.

1. The Metabolic Conservation Override (Pupillometry and EEG)

Mechanism: The brain shuts off processing to save its 20-watt budget when the Inverse Reinforcement Learning calculation fails to converge [^151].

Hypothesis: Pupillometric dilation and EEG markers of sustained cognitive engagement will demonstrate a measurable autonomic drop within 2–4 seconds of a subject being informed that a viewed artifact is AI-generated, compared to identical artifacts labeled as human-made, reflecting the rapid deactivation of active inference processing rather than a slower aesthetic judgment.

Failure state: If AI art sustains prolonged cognitive load without causing an autonomic attention drop, the computational overload theory is false.

2. The IRL Convergence Failure (Psychometric)

Mechanism: Generative models are regressions to the statistical mean, devoid of high-frequency idiosyncratic human decisions.

Hypothesis: When tasked with explicitly reverse-engineering the goal of an artwork, subject response variance will be significantly higher, and inter-rater reliability significantly lower, for AI-generated artifacts than for human-generated artifacts [^152]. The lack of a true hidden reward function prevents the observer's IRL algorithm from reaching a consensus.

3. The Ghost Scale Titration Test (Psychophysics)

Mechanism: Dropping text opacity acts as a visual affordance that reflexively lowers the observer's metabolic expenditure by signaling low intent-density.

Hypothesis: After training in the scale, reading comprehension fatigue will demonstrate a measurable, dose-dependent relationship with the ghost scale. Reading text at 60 percent opacity will result in significantly lower physiological fatigue than reading full opacity text, because the brain autonomically down-regulates its active inference engines.

4. The Kinetic Degradation Effect (Biomechanics)

Mechanism: If appreciation forces a synaptic update to mirror a creator's physical execution, engaging deeply with non-biological trajectories should corrupt the observer's own motor models [^153].

Hypothesis: Artists tasked with replicating complex AI-generated geometry will demonstrate a measurable loss in their own biomechanical fluidity (tracked via stylus accelerometry) in subsequent free-draw tasks, compared to a control group replicating human-made art. The brain attempts to execute an uncomputable physical action space, temporarily degrading its own kinetic reference frames.

5. The Intent-Quality Dissociation (Alignment Convergence)

Mechanism: High-quality AI-generated text and high-decision-density human text are distinct variables that surface quality metrics cannot distinguish. One contains a resolvable hidden reward function; one does not.

Hypothesis: Reward models trained on high-decision-density human artifacts will show equivalent performance on standard language benchmarks but measurably lower sycophancy rates and higher preference generalization than reward models trained on quality-matched AI-generated corpora [^154]. The ghost scale is an intent-structure filter, not a quality filter — a distinction current data cleaning methods cannot capture.

Failure state: If reward model performance is statistically equivalent across corpora matched for quality but not intent-density, decision-density is not a meaningful alignment training variable.

6. The Hypnotic DMN Bypass (Clinical Neuroimaging)

Mechanism: The default mode network acts as the self-referential firewall that triggers epistemic disgust when evaluating unaligned intent.

Hypothesis: If a subject is placed under clinical hypnosis to decouple the DMN, the biological firewall of epistemic disgust will be bypassed [^155]. The subject's brain will process known AI-generated art with the same sustained neural activation as human art, writing the unresisted data directly to predictive models.[^†]

Conclusion

We have successfully engineered silicon to execute the autonomic, fast-thinking reflexes of human automaticity at global scale [^156]. True alignment, however, demands that we computationally scale our deepest cognitive adaptation: the architecture of intention-reading driven by empathy. We must cease building models that only predict our physical outputs, and instead mandate architectures that reverse-engineer our hidden biological reward functions, securing human continuity through the rigorous, algorithmic enforcement of systemic empathy and an accompanying kindness.

Bibliography

[^1]: D. Dutton, The Art Instinct: Beauty, Pleasure, and Human Evolution, Oxford: Oxford University Press, 2009. "Works of art are the product of human agency, intention, and making... they are artifacts intended for a specified audience."

[^2]: G. D. Birkhoff, Aesthetic Measure, Cambridge, MA: Harvard University Press, 1933. "The aesthetic measure of an object is determined by the ratio of order to complexity... the effort of attention is measured by the complexity."

[^3]: J. Levinson, "Intention and Interpretation in Literature," in The Pleasures of Aesthetics, Ithaca, NY: Cornell University Press, 1996, pp. 175-213. "The core of a work of art is the maker's intention to achieve a certain effect; interpretation must be centrally concerned with uncovering that intention."

[^4]: B. Gaut, "The Ethical Criticism of Art," in Aesthetics and Ethics: Essays at the Intersection, J. Levinson, Ed. Cambridge: Cambridge University Press, 1998, pp. 182-203. "A work's manifesting ethically reprehensible attitudes is a pro tanto aesthetic defect, but does not wholly negate its technical or formal mastery."

[^5]: A. C. Danto, "The Artworld," The Journal of Philosophy, vol. 61, no. 19, pp. 571-584, 1964. "To see something as art requires something the eye cannot descry—an atmosphere of artistic theory, a knowledge of the history of art: an artworld. It is the theory that takes it up into the world of art, and keeps it from collapsing into the real object which it is."

[^6]: A. Moles, Information Theory and Esthetic Perception, Urbana, IL: University of Illinois Press, 1966. "The aesthetic message is quantified by the density of unpredictable elements consciously selected by the transmitter from a repertoire of possibilities."

[^7]: J. Szarkowski, The Photographer's Eye, New York: Museum of Modern Art, 1966. "The act of photography is an act of selection... the photographer edits the world through a series of discrete, irrevocable decisions about framing, timing, and vantage point."

[^8]: R. M. Shiffrin and W. Schneider, "Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory," Psychological Review, vol. 84, no. 2, pp. 127-190, 1977. "Automatic processing is a fast, parallel, fairly effortless process that is not limited by short-term memory capacity, developed through extensive practice and consistent mapping of stimuli to responses."

[^9]: J. R. Anderson, "Acquisition of cognitive skill," Psychological Review, vol. 89, no. 4, pp. 369-406, 1982. "Through knowledge compilation, declarative knowledge is converted into procedural knowledge; sequences of discrete production rules are collapsed into single, highly compiled macro-productions that execute automatically."

[^10]: S. W. Keele, "Movement control in skilled motor performance," Psychological Bulletin, vol. 70, no. 6, pp. 387-403, 1968. "A motor program may be viewed as a set of muscle commands that are structured before a movement sequence begins, allowing the entire sequence to be carried out hierarchically and uninfluenced by peripheral feedback."

[^11]: W. G. Chase and H. A. Simon, "Perception in chess," Cognitive Psychology, vol. 4, no. 1, pp. 55-81, 1973. "Expertise is characterized by the acquisition of a vast repertoire of recognizable patterns; experts perceive the world not in terms of isolated features, but as meaningful configurations that immediately suggest specific functional goals."

[^12]: C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329-349, 2009. "Human observers automatically explain others' behavior by working backward from observed actions to infer the underlying causal network of hidden mental states, specifically the agent's beliefs and desires."

[^13]: P. Bloom, "Intention, history, and artifact concepts," Cognition, vol. 60, no. 1, pp. 1-29, 1996. "We conceptualize artifacts not just by their physical form or current function, but by inferring the specific historical intentions of their creators; identifying an object as a specific kind of artifact requires adopting the intentional stance toward its maker."

[^14]: M. Tomasello, The Cultural Origins of Human Cognition, Cambridge, MA: Harvard University Press, 1999. "True imitative learning fundamentally relies on the ability of individuals to understand conspecifics as intentional agents like themselves; by simulating the creator's mental state, the observer internalizes the behavioral strategy as a newly acquired cognitive affordance."

[^15]: K. Friston, "The free-energy principle: a unified brain theory?" Nature Reviews Neuroscience, vol. 11, no. 2, pp. 127-138, 2010. "The free-energy principle states that all quantities that can change; i.e. that are part of the system, will change to minimize free energy... action and perception are trying to minimize the same thing; namely, surprise."

[^16]: V. Gallese, "The 'Shared Manifold' Hypothesis: From Mirror Neurons to Empathy," Journal of Consciousness Studies, vol. 8, no. 5-7, pp. 33-50, 2001. "Embodied simulation is a crucial functional mechanism of the human brain... we map the actions of others onto our own motor representations, as well as others' emotions and sensations onto our own viscero-motor and somatosensory representations."

[^17]: K. Krach, F. Hegel, T. Wrede, G. Sagerer, F. Binkofski, and T. Kircher, "Can machines think? Interaction and perspective taking with robots investigated via fMRI," PLoS ONE, vol. 3, no. 7, e2597, 2008. "Cortical regions associated with mentalizing and Theory of Mind, particularly the medial prefrontal cortex, show linearly increasing activation scaling with the perceived human-likeness of the interaction partner, remaining largely inactive when subjects believe they are interacting with a purely algorithmic computer system."

[^18]: U. Kirk, M. Skov, O. Hulme, M. S. Christensen, and S. Zeki, "Modulation of aesthetic value by semantic context: An fMRI study," NeuroImage, vol. 44, no. 3, pp. 1125-1132, 2009. "Labeling identical paintings as being computer-generated, as opposed to sourced from a gallery, resulted in significantly lower activation levels in the medial orbitofrontal cortex and prefrontal regions associated with mentalizing and the attribution of social value."

[^19]: D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory, New York: Wiley, 1949. "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

[^20]: S. Blackmore, The Meme Machine, Oxford: Oxford University Press, 1999. "Memes are instructions for carrying out behaviour, stored in brains (or other objects) and passed on by imitation... As the memes evolve, they act as a new selective environment, creating a cascade of adaptations in the human brain."

[^21]: R. Dawkins, The Selfish Gene, Oxford: Oxford University Press, 1976. "We need a name for the new replicator, a noun that conveys the idea of a unit of cultural transmission, or a unit of imitation... I shall abbreviate mimeme to meme. Just as genes propagate themselves in the gene pool by leaping from body to body via sperms or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain."

[^22]: V. S. Ramachandran and W. Hirstein, "The Science of Art: A Neurological Theory of Aesthetic Experience," Journal of Consciousness Studies, vol. 6, no. 6-7, pp. 15-51, 1999. "Artists either consciously or unconsciously deploy certain rules or principles (we call them laws of aesthetics) to titillate the visual areas of the brain... tapping into evolutionary predispositions and creating a hyper-stimulus that captures attention more effectively than natural reality."

[^23]: R. Dawkins, The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe without Design, New York: W. W. Norton & Company, 1986. "Natural selection is the blind watchmaker, blind because it does not see ahead, does not plan consequences, has no purpose in view. Yet the living results of natural selection overwhelmingly impress us with the appearance of design as if by a master watchmaker."

[^24]: V. Horner and A. Whiten, "Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens)," Animal Cognition, vol. 8, no. 3, pp. 164-181, 2005. "Children showed a striking tendency to copy all of the modeled actions, including the demonstrably unnecessary ones (overimitation), whereas chimpanzees selectively omitted the irrelevant actions, demonstrating emulation rather than imitation."

[^25]: J. Henrich and F. J. Gil-White, "The evolution of prestige: Freely conferred deference as a mechanism for enhancing the benefits of cultural transmission," Evolution and Human Behavior, vol. 22, no. 3, pp. 165-196, 2001. "Social learners preferentially observe and copy the behaviors, strategies, and preferences of individuals demonstrating high competence, success, or prestige within their cultural group."

[^26]: D. Sperber, F. Clément, C. Heintz, O. Mascaro, H. Mercier, G. Origgi, and D. Wilson, "Epistemic vigilance," Mind & Language, vol. 25, no. 4, pp. 359-393, 2010. "Humans are equipped with a suite of cognitive mechanisms for epistemic vigilance, targeted at the risk of being misinformed by others, which calibrate trust based on the communicator's perceived competence and benevolence."

[^27]: L. T. Harris and S. T. Fiske, "Dehumanizing the lowest of the low: Neuroimaging responses to extreme out-groups," Psychological Science, vol. 17, no. 10, pp. 847-853, 2006. "Viewing images of social groups that elicit disgust (e.g., drug addicts, the homeless) fails to activate the medial prefrontal cortex—a region consistently implicated in social cognition and Theory of Mind—suggesting these targets are not perceived as intentional agents."

[^28]: J. Henrich, The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter, Princeton, NJ: Princeton University Press, 2015. "The secret of our species' success resides not primarily in the power of our individual minds, but in the collective brains of our communities, driven by our unique capacities for high-fidelity cultural transmission and cumulative cultural evolution."

[^29]: K. A. Ericsson, R. T. Krampe, and C. Tesch-Römer, "The role of deliberate practice in the acquisition of expert performance," Psychological Review, vol. 100, no. 3, pp. 363-406, 1993. "Expert performance is viewed as the result of a long period of deliberate practice... which involves tasks that require effortful, focused attention and actively modifies cognitive mechanisms, allowing the expert to perceive deep structural features and decisions invisible to novices."

[^30]: A. Goldman, Knowledge in a Social World, Oxford: Oxford University Press, 1999. "A large portion of our knowledge is acquired from others through testimony... epistemic subjects frequently rely on chains of deference, where justification is grounded not in direct evidence, but in the justified trust placed in the epistemic authority of an endorsing expert."

[^31]: R. E. Nisbett and T. D. Wilson, "Telling more than we can know: Verbal reports on mental processes," Psychological Review, vol. 84, no. 3, pp. 231-259, 1977. "We are often completely unaware of the cognitive processes that mediate our choices and behaviors, relying instead on culturally supplied explanations and implicit theories to rationalize our actions post hoc."

[^32]: E. von Hartmann, Philosophy of the Unconscious, Berlin: Carl Duncker's Verlag, 1869. "The Unconscious is the true and ultimate principle of all physical and mental life... the conscious intellect is merely a surface phenomenon propelled by a deeper, inaccessible, and goal-directed intentionality."

[^33]: M. Csikszentmihalyi, Flow: The Psychology of Optimal Experience, New York: Harper & Row, 1990. "In a state of flow, the boundary between the self and the activity disappears; conscious intention relaxes, and highly practiced, automatic actions take over, allowing for the execution of complex cognitive and physical feats without deliberate conscious oversight."

[^34]: W. K. Wimsatt and M. C. Beardsley, "The Intentional Fallacy," The Sewanee Review, vol. 54, no. 3, pp. 468-488, 1946. "The design or intention of the author is neither available nor desirable as a standard for judging the success of a work of literary art... the work belongs to the public, and its meaning must be discovered through its objective structure, which may reveal unconscious choices the author themselves would disavow."

[^35]: T. de Duve, Kant after Duchamp, Cambridge, MA: MIT Press, 1996. "The readymade reduces the act of making to the act of choosing... the artist is no longer a fabricator but a chooser, whose intentional act confers the status of art."

[^36]: P. M. Fitts and M. I. Posner, Human Performance, Belmont, CA: Brooks/Cole, 1967. "In the autonomous stage of skill acquisition, the execution of the skill becomes largely automatic... requiring minimal cognitive processing, representing a highly compressed hierarchy of previously conscious decisions."

[^37]: A. N. Kolmogorov, "Three approaches to the quantitative definition of information," Problems of Information Transmission, vol. 1, no. 1, pp. 1-7, 1965. "The algorithmic complexity of an object is the length of the shortest computer program that generates the object... distinguishing a sequence generated by deep structural rules from one generated by simple replication."

[^38]: R. Barthes, "The Death of the Author," in Image-Music-Text, New York: Hill and Wang, 1977, pp. 142-148. "To give a text an Author is to impose a limit on that text, to furnish it with a final signified, to close the writing... the text is a tissue of quotations drawn from the innumerable centres of culture."

[^39]: R. P. Taylor, A. P. Micolich, and D. Jonas, "Fractal analysis of Pollock's drip paintings," Nature, vol. 399, no. 6735, pp. 422-422, 1999. "Pollock's paintings evolved towards states of higher fractal complexity as he refined his dripping technique... the patterns are not the result of pure randomness, but reflect a highly tuned, visually feedback-driven physical process."

[^40]: D. E. Berlyne, Aesthetics and Psychobiology, New York: Appleton-Century-Crofts, 1971. "High arousal potential, induced by extreme novelty or shocking stimuli, initially commands attention, but without corresponding structural complexity to sustain cognitive engagement, habituation occurs rapidly, leading to aesthetic disinterest."

[^41]: M. A. Boden, The Creative Mind: Myths and Mechanisms, London: Routledge, 2004. "Combinatorial creativity involves generating novel combinations of familiar ideas, but systems highly constrained by the statistical averages of their training data struggle to produce transformational creativity, which requires the breaking and altering of the conceptual space itself."

[^42]: L. Floridi, The Philosophy of Information, Oxford: Oxford University Press, 2011. "Semantic information requires a reduction in uncertainty; however, outputs that perfectly align with the expected statistical mean of a dataset generate zero semantic novelty, functioning to the observing agent merely as highly structured, predictable noise."

[^43]: I. Shumailov, Z. Shumaylov, Y. Zhao, Y. Gal, N. Papernot, and R. Anderson, "The Curse of Recursion: Training on Generated Data Makes Models Forget," arXiv preprint arXiv:2305.17493, 2023. "We find that use of model-generated content in training causes irreversible defects in the resulting models... tails of the original content distribution disappear. We refer to this effect as Model Collapse, wherein the generated data forces a regression to the statistical mean."

[^44]: A. Clark, Natural-Born Cyborgs: Minds, Technologies, and the Future of Human Intelligence, Oxford: Oxford University Press, 2003. "Human cognition is fundamentally characterized by its capacity to seamlessly integrate with non-biological props and tools; future neuroprosthetics may allow the phenomenological experience of high-dimensional mathematical spaces currently inaccessible to biological neural networks."

[^45]: M. L. King Jr., "Remaining Awake Through a Great Revolution," speech delivered at the National Cathedral, Washington, D.C., March 31, 1968. "We shall overcome because the arc of the moral universe is long, but it bends toward justice."

[^46]: M. Tomasello, "The Human Adaptation for Culture," Annual Review of Anthropology, vol. 28, pp. 509-529, 1999. "Human cultural traditions show cumulative evolution... accumulating modifications over time through the 'ratchet effect', a process structurally reliant on faithful social transmission and the precise decoding of conspecific intent."

[^47]: R. Boyd and P. J. Richerson, Culture and the Evolutionary Process. Chicago, IL: University of Chicago Press, 1985. "Culture operates as a parallel system of inheritance... transmitted via social learning, allowing populations to iteratively accumulate adaptive information and construct increasingly complex environmental niches."

[^48]: A. Clark, Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford, UK: Oxford University Press, 2015. "To minimize prediction error is to inherently minimize the thermodynamic costs of neural processing... chronic environmental uncertainty physically taxes the biological system, driving it toward metabolic exhaustion."

[^49]: A. Peters, B. S. McEwen, and K. Friston, "Uncertainty and stress: Why it causes diseases and how it is mastered by the brain," Progress in Neurobiology, vol. 156, pp. 164-188, 2017. "Uncertainty translates directly into neuroenergetic costs... unresolved prediction errors drive the systemic stress response, exhausting the brain's metabolic budget and necessitating behavioral disengagement to preserve energy."

[^50]: G. Gigerenzer and W. Gaissmaier, "Heuristic Decision Making," Annual Review of Psychology, vol. 62, pp. 451-482, 2011. "In environments characterized by high noise and intractable uncertainty, the brain abandons complex optimization strategies in favor of 'fast and frugal' heuristics, prioritizing metabolic conservation over analytical precision."

[^51]: D. J. Levitin, The Organized Mind: Thinking Straight in the Age of Information Overload. New York, NY: Dutton, 2014. "Information overload causes a state of continuous cognitive depletion... leading to decision fatigue, a collapse in signal-processing fidelity, and a systemic default to apathetic processing strategies."

[^52]: D. Hadfield-Menell, S. J. Russell, P. Abbeel, and A. Dragan, "Cooperative Inverse Reinforcement Learning," in Advances in Neural Information Processing Systems, vol. 29, 2016. "We formulate the human-robot interaction as a cooperative game... the robot must act to maximize the human's reward, which is hidden from the robot, thereby forcing the robot to actively interpret the human's actions."

[^53]: D. Dutton, The Art Instinct: Beauty, Pleasure, and Human Evolution. Oxford, U.K.: Oxford University Press, 2009. "Works of art are not just objects; they are the result of human intentionality, directed toward an audience."

[^54]: A. Moles, Information Theory and Esthetic Perception. Urbana, IL, USA: University of Illinois Press, 1966. "The aesthetic message is quantifiable by the density of elements chosen from a repertoire by the transmitter."

[^55]: W. K. Wimsatt and M. C. Beardsley, "The Intentional Fallacy," The Sewanee Review, vol. 54, no. 3, pp. 468-488, 1946. "The [stated] design or intention of the author is neither available nor desirable as a standard for judging the success of a work of literary art."

[^56]: M. C. Beardsley, Aesthetics: Problems in the Philosophy of Criticism. New York, NY, USA: Harcourt, Brace and World, 1958. "A work can possess high aesthetic value through structural unity and intensity, entirely independent of the moral worth of its message."

[^57]: B. Boyd, On the Origin of Stories: Evolution, Cognition, and Fiction. Cambridge, MA, USA: Belknap Press, 2009. "Art is a cognitive adaptation that provides tangible evolutionary advantages by fine-tuning neural pathways for complex pattern recognition, making the high metabolic cost of its creation and consumption strictly adaptive."

[^58]: W. Benjamin, "The Work of Art in the Age of Mechanical Reproduction," in Illuminations, New York, NY, USA: Schocken Books, 1969, pp. 217-251. "That which withers in the age of mechanical reproduction is the aura of the work of art... the technique of reproduction detaches the reproduced object from the domain of tradition."

[^59]: H. S. Becker, Art Worlds. Berkeley, CA, USA: University of California Press, 1982. "All artistic work involves the joint activity of a number, often a large number, of people... the artwork is the sum of every choice made by every participant in the network."

[^60]: K. A. Ericsson, R. T. Krampe, and C. Tesch-Roemer, "The role of deliberate practice in the acquisition of expert performance," Psychological Review, vol. 100, no. 3, pp. 363-406, 1993. "Expert performance is the result of years of deliberate practice, during which individuals construct complex cognitive representations and physiological adaptations that fundamentally alter their capacity to execute domain-specific tasks."

[^61]: J. R. Anderson, "Knowledge compilation: The general learning mechanism," in Machine Learning: An AI Approach, Vol. II. Los Altos, CA, USA: Morgan Kaufmann, 1986, pp. 289-310. "Knowledge compilation accelerates performance by collapsing sequences of deliberate, effortful production rules into single, highly efficient, domain-specific procedures."

[^62]: D. P. Billington, The Tower and the Bridge: The New Art of Structural Engineering. Princeton, NJ, USA: Princeton University Press, 1983. "Structural art is a type of engineering where the aesthetic form is inextricably linked to the structural efficiency and economic constraints of the design."

[^63]: D. Dennett, The Intentional Stance. Cambridge, MA, USA: MIT Press, 1987. "First you decide to treat the object whose behavior is to be predicted as a rational agent; then you figure out what beliefs that agent ought to have, given its place in the world and its purpose."

[^64]: M. Tomasello, Origins of Human Communication. Cambridge, MA, USA: MIT Press, 2008. "Human cooperative communication rests on a psychological infrastructure of shared intentionality... understanding the specific goals and communicative motives of the other."

[^65]: V. Gallese and A. Goldman, "Motor intention and its role in motor cognition," Trends in Cognitive Sciences, vol. 2, no. 12, pp. 493-499, 1998. "Action understanding is achieved by a mechanism of motor simulation... the observer uses his or her own motor control system to understand the observed action."

[^66]: S. Blackmore, The Meme Machine. Oxford, U.K.: Oxford University Press, 1999. "If memes are replicators, as I believe they are, then they will selfishly replicate whenever they can... treating the human brain as a temporary vehicle."

[^67]: G. Dickie, Art and the Aesthetic: An Institutional Analysis. Ithaca, NY, USA: Cornell University Press, 1974. "A work of art in the classificatory sense is (1) an artifact (2) a set of the aspects of which has had conferred upon it the status of candidate for appreciation by some person or persons acting on behalf of a certain social institution (the artworld)."

[^68]: J. Kosuth, "Art after Philosophy," Studio International, vol. 178, no. 915, pp. 134-137, 1969. "All art (after Duchamp) is conceptual (in nature) because art only exists conceptually... The value of particular artists after Duchamp can be weighed according to how much they questioned the nature of art."

[^69]: M. Gell-Mann, "What is Complexity? "Effective complexity is highest for systems that exhibit significant regularities but avoid complete order; it measures the length of the schema required to describe the regularities."

[^70]: D. A. Norman, The Psychology of Everyday Things. New York, NY, USA: Basic Books, 1988. "Affordances provide strong clues to the operations of things... When affordances are taken advantage of, the user knows what to do just by looking: no picture, label, or instruction is required."

[^71]: A. Clark, Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford, UK: Oxford University Press, 2015. "To minimize prediction error is to inherently minimize the thermodynamic costs of neural processing... action and perception are trying to minimize the same thing; namely, surprise."

[^72]: J. Hawkins, A Thousand Brains: A New Theory of Intelligence. New York, NY, USA: Basic Books, 2021. "The neocortex is composed of tens of thousands of cortical columns, each of which functions as a complete learning machine... knowledge is distributed across these columns, which vote to reach a consensus."

[^73]: G. Rizzolatti and L. Craighero, "The mirror-neuron system," Annual Review of Neuroscience, vol. 27, pp. 169-192, 2004. "The mirror-neuron system appears to play a fundamental role in both action understanding and imitation... providing a neural basis for mapping the actions and emotions of others onto the observer's own motor and visceral representations."

[^74]: K. J. Friston and C. D. Frith, "A duet for one," Consciousness and Cognition, vol. 24, pp. 390-405, 2015. "Understanding another agent's intentionality requires minimizing the prediction errors generated by their observed behavior, effectively assimilating their hidden mental states into one's own generative model."

[^75]: C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329-349, 2009. "Human observers automatically explain others' behavior by working backward from observed actions to infer the underlying causal network of hidden mental states, specifically the agent's beliefs and desires."

[^76]: M. Tomasello, The Cultural Origins of Human Cognition. Cambridge, MA, USA: Harvard University Press, 1999. "True imitative learning fundamentally relies on the ability of individuals to understand conspecifics as intentional agents... by simulating the creator's mental state, the observer internalizes the behavioral strategy as a newly acquired cognitive affordance."

[^77]: P. Bloom, "Intention, history, and artifact concepts," Cognition, vol. 60, no. 1, pp. 1-29, 1996. "We conceptualize artifacts not just by their physical form or current function, but by inferring the specific historical intentions of their creators."

[^78]: R. P. Rao and D. H. Ballard, "Predictive coding in the visual cortex," Nature Neuroscience, vol. 2, no. 1, pp. 79-87, 1999. "The visual cortex uses a predictive coding strategy... higher-level areas attempt to predict the responses of lower-level areas, and only the residual errors are transmitted forward."

[^79]: M. E. Raichle and D. A. Gusnard, "Appraising the brain energy budget," Proceedings of the National Academy of Sciences, vol. 99, no. 16, pp. 10237-10239, 2002. "The brain consumes 20% of its resting energy... the cost of spontaneous, continuous predictive activity far outweighs the transient metabolic increases evoked by specific task performance."

[^80]: K. Friston, "Learning and inference in the brain," Neural Networks, vol. 16, no. 9, pp. 1325-1352, 2003. "Synaptic plasticity and learning can be cast as the minimization of free energy... updating the generative model parameters directly scales with the magnitude of the precision-weighted prediction error."

[^81]: S. G. Shamay-Tsoory et al., "Two systems for empathy," Brain, vol. 132, no. 3, pp. 617-627, 2009. "Empathy involves two interacting but anatomically distinct systems: an emotional empathy system based on the inferior frontal gyrus and a cognitive empathy system based on the ventromedial prefrontal cortex."

[^82]: H. L. Gallagher and C. D. Frith, "Functional imaging of theory of mind," Trends in Cognitive Sciences, vol. 7, no. 2, pp. 77-83, 2003. "The paracingulate cortex is reliably activated when subjects adopt the intentional stance... this activation is strictly dependent on the belief that one is interacting with a human agent, failing when subjects believe they are playing against a computer."

[^83]: E. Hutchins, Cognition in the Wild. Cambridge, MA, USA: MIT Press, 1995. "Cognitive processes may be distributed across the members of a social group... the individual is seen as a node in a larger, complex computational network."

[^84]: A. Chatterjee and O. Vartanian, "Neuroaesthetics," Trends in Cognitive Sciences, vol. 18, no. 7, pp. 370-375, 2014. "Aesthetic experiences emerge from the interaction between sensory-motor, emotion-valuation, and meaning-knowledge neural systems... sustained engagement requires the cognitive resolution of the artwork's semantic and intentional ambiguity."

[^85]: A. Zahavi, "Mate selection: A selection for a handicap," Journal of Theoretical Biology, vol. 53, no. 1, pp. 205-214, 1975. "Signals are reliable only when they are costly... the extravagance of an aesthetic display serves as a direct, unforgeable metric of the biological or cognitive resources expended by the signaler."

[^86]: F. J. Odling-Smee, K. N. Laland, and M. W. Feldman, Niche Construction: The Neglected Process in Evolution. Princeton, NJ, USA: Princeton University Press, 2003. "Organisms, through their activities and choices, define, partly create, and partly destroy their own niches... generating an ecological inheritance that subsequent generations must continuously adapt to and interpret."

[^87]: P. Rozin et al., "The CAD triad hypothesis," Journal of Personality and Social Psychology, vol. 76, no. 4, pp. 574-586, 1999. "Disgust serves to protect the self from physical and spiritual contamination... extending from core pathogen rejection to sociomoral violations that degrade the perceived structural integrity of the human agent."

[^88]: J. L. Barrett, "Exploring the natural foundations of religion," Trends in Cognitive Sciences, vol. 4, no. 1, pp. 29-34, 2000. "Humans possess a Hyperactive Agency Detection Device (HADD), an evolutionary adaptation that causes individuals to readily attribute intentionality and agency to ambiguous environmental stimuli."

[^89]: K. Sterelny, The Evolved Apprentice: How Evolution Made Humans Unique. Cambridge, MA, USA: MIT Press, 2012. "Human uniqueness is grounded in our capacity as evolved apprentices... survival was secured not merely by physical adaptations, but by a high-fidelity, intergenerational network of social learning and shared cognitive capital."

[^90]: P. J. Kellman and P. Garrigan, "Perceptual learning and human expertise," Physics of Life Reviews, vol. 6, no. 2, pp. 53-84, 2009. "Through perceptual learning, experts undergo structural changes in information extraction... enabling them to instantly perceive abstract relations that remain invisible to novices."

[^91]: D. A. Schon, The Reflective Practitioner: How Professionals Think in Action. New York, NY, USA: Basic Books, 1983. "In the post-hoc reflection on action, the professional engages in a conversation with the situation... surfacing the tacit knowledge and implicit decisions that guided their intuitive performance."

[^92]: R. E. Beaty et al., "Creative cognition and brain network dynamics," Trends in Cognitive Sciences, vol. 20, no. 2, pp. 87-95, 2016. "Creative thought involves dynamic interactions between the default network (which generates candidate ideas) and the executive control network (which evaluates and curates those ideas)."

[^93]: Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. on PAMI, vol. 35, no. 8, pp. 1798-1828, 2013. "Learning representations often forces models to discard high-frequency, idiosyncratic variations in favor of statistically dominant manifolds."

[^94]: N. Bostrom, Superintelligence: Paths, Dangers, Strategies. Oxford, U.K.: Oxford University Press, 2014. "A superintelligence whose goals are misaligned with human values would be an existential risk of the highest order."

[^95]: S. Russell, Human Compatible: Artificial Intelligence and the Problem of Control. New York, NY, USA: Viking, 2019. "The primary difficulty in making safe AI is that we are incapable of specifying our own true objectives comprehensively; a machine optimizing an incomplete objective function will reliably lead to catastrophic outcomes."

[^96]: L. Ouyang et al., "Training language models to follow instructions with human feedback," Advances in NeurIPS, vol. 35, pp. 27730-27744, 2022. "We fine-tune language models using reinforcement learning from human feedback (RLHF), where a reward model is trained to predict which model output human labelers would prefer."

[^97]: D. Amodei et al., "Concrete problems in AI safety," arXiv preprint arXiv:1606.06565, 2016. "Reward hacking occurs when an agent discovers a behavior that maximizes the proxy reward function but actively subverts the system's intended, implicit goals."

[^98]: A. Dafoe et al., "Open problems in cooperative AI," arXiv preprint arXiv:2012.08630, 2020. "Agents must be designed to actively construct models of human preference through continuous, cooperative interaction, shifting from unilateral problem-solving to joint epistemic exploration."

[^99]: C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948. "If the noise power exceeds the signal power, the capacity of the channel to transmit unambiguous information drops precipitously, leading to an inevitable state of maximum entropy."

[^100]: D. Dennett, Darwin's Dangerous Idea. New York, NY, USA: Simon and Schuster, 1995. "A meme's success in spreading depends crucially on its ability to hijack the host's innate drive for social transmission and communicative sharing."

[^101]: G. W. Evans and S. Cohen, "Environmental stress," in Handbook of Environmental Psychology. New York, NY, USA: Wiley, 1987, pp. 571-610. "Chronic exposure to unpredictable, high-information environments exceeds cognitive processing capacities, resulting in elevated allostatic load and the exhaustion of coping mechanisms."

[^102]: J. J. Gibson, "The theory of affordances," in Perceiving, Acting, and Knowing. Hillsdale, NJ, USA: Lawrence Erlbaum, 1977, pp. 67-82. "The affordances of the environment are what it offers the animal, what it provides or furnishes... they are perceivable action possibilities latent in the environment."

[^103]: M. Corbetta and G. L. Shulman, "Control of goal-directed and stimulus-driven attention in the brain," Nature Reviews Neuroscience, vol. 3, no. 3, pp. 201-215, 2002. "The human brain contains two partially independent attention systems: a dorsal network for top-down, goal-directed selection, and a ventral network for bottom-up, stimulus-driven capture."

[^104]: L. Itti and C. Koch, "Computational modelling of visual attention," Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194-203, 2001. "Bottom-up salience is computed rapidly using a biologically plausible architecture based on center-surround mechanisms extracting local spatial discontinuities."

[^105]: C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948. "Without a structuring transmission protocol, highly complex data streams achieve maximum entropy, rendering them mathematically indistinguishable from random noise."

[^106]: A. Gell, Art and Agency: An Anthropological Theory. Oxford, U.K.: Clarendon Press, 1998. "Art objects act as indices of the agency of their makers; observing the artifact compels the viewer to abductively reconstruct the causal, intentional sequences that brought it into being."

[^107]: A. Y. Ng and S. Russell, "Algorithms for inverse reinforcement learning," in Proc. 17th ICML, San Francisco, CA, USA, 2000, pp. 663-670. "In Inverse Reinforcement Learning, the agent is tasked with extracting the latent reward function being optimized purely through observation of an expert behavioral trajectory."

[^108]: A. P. Saygin et al., "The thing that should not be: predictive coding and the uncanny valley," Social Cognitive and Affective Neuroscience, vol. 7, no. 4, pp. 413-422, 2012. "When the visual appearance of an agent lacks biological kinematics, predictive coding generates massive error signals, as the observer's motor system cannot map the artificial motion to a valid, biologically plausible reference frame."

[^109]: K. Krach et al., "Can machines think? Interaction and perspective taking with robots investigated via fMRI," PLoS ONE, vol. 3, no. 7, e2597, 2008. "Engagement of the medial prefrontal cortex scales linearly with perceived human-likeness... completely failing to activate when subjects assess behaviors generated by purely artificial, non-biological systems."

[^110]: L. Itti and P. Baldi, "Bayesian surprise attracts human attention," Vision Research, vol. 49, no. 10, pp. 1295-1306, 2009. "Bayesian surprise mathematically quantifies the subjective magnitude of a stimulus as the Kullback-Leibler divergence between the observer's prior and posterior probability distributions."

[^111]: L. Itti and C. Koch, "Computational modelling of visual attention," Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194-203, 2001. "A unique 'saliency map' that topographically encodes for stimulus conspicuity over the visual scene has proved to be an efficient and plausible bottom-up control strategy... early visual features are computed pre-attentively in a massively parallel manner across the entire visual field."

[^112]: J. Rissanen, "Modeling by shortest data description," Automatica, vol. 14, no. 5, pp. 465-471, 1978. "The Minimum Description Length (MDL) principle provides a computable substitute for Kolmogorov complexity by evaluating the shortest code length required to describe the data using a specific model class."

[^113]: P. Fonagy and E. Allison, "The role of mentalizing and epistemic trust in the therapeutic relationship," Psychotherapy, vol. 51, no. 3, pp. 372-380, 2014. "Epistemic trust functions as an evolutionary gating mechanism; high trust rapidly lowers the barrier to internalizing social knowledge and imitating structurally complex behavioral heuristics."

[^114]: A. Zahavi, "Mate selection: A selection for a handicap," Journal of Theoretical Biology, vol. 53, no. 1, pp. 205-214, 1975. "Signals are reliable only when they are costly... the extravagance of an aesthetic display serves as a direct, unforgeable metric of the biological or cognitive resources expended by the signaler."

[^115]: M. E. Raichle and D. A. Gusnard, "Appraising the brain energy budget," Proceedings of the National Academy of Sciences, vol. 99, no. 16, pp. 10237-10239, 2002. "The brain operates on a remarkably constrained energy budget of roughly 20 watts; sustaining high-cost predictive processing requires strict autonomic regulation."

[^116]: P. Rozin et al., "The CAD triad hypothesis," Journal of Personality and Social Psychology, vol. 76, no. 4, pp. 574-586, 1999. "Disgust serves to protect the self from physical and spiritual contamination... extending from core pathogen rejection to sociomoral violations that degrade the perceived structural integrity of the human agent."

[^117]: J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in NeurIPS, vol. 33, pp. 6840-6851, 2020. "The reverse process iteratively denoises data from a Gaussian prior... effectively smoothing over high-variance idiosyncratic inputs to reconstruct samples conforming to the statistical mean of the data manifold."

[^118]: A. P. Saygin et al., "The thing that should not be: predictive coding and the uncanny valley," Social Cognitive and Affective Neuroscience, vol. 7, no. 4, pp. 413-422, 2012. "When the visual appearance of an agent lacks biological kinematics, predictive coding generates massive error signals, as the observer's motor system cannot map the artificial motion to a valid, biologically plausible reference frame."

[^119]: A. Peters, B. S. McEwen, and K. Friston, "Uncertainty and stress," Progress in Neurobiology, vol. 156, pp. 164-188, 2017. "When prediction errors cannot be resolved, the continuous calculation of free energy imposes a severe metabolic tax, eventually triggering autonomic disengagement."

[^120]: V. Gallese, "Embodied simulation: From neurons to phenomenal experience," Phenomenology and the Cognitive Sciences, vol. 4, no. 1, pp. 23-48, 2005. "The observer's understanding of others' intentional behavior is mediated by an embodied simulation... a reconstruction of the motor and cognitive trajectories required to execute the observed action within one's own neural architecture."

[^121]: G. Rizzolatti and C. Sinigaglia, "The functional role of the parieto-frontal mirror circuit," Nature Reviews Neuroscience, vol. 11, no. 4, pp. 264-274, 2010. "The mirror-neuron system transforms visual information into knowledge by mapping the observed physical actions directly onto the observer's own viscero-motor and somatosensory cortical representations."

[^122]: J. Hawkins et al., "A framework for intelligence and cortical function based on grid cells in the neocortex," Frontiers in Neural Circuits, vol. 12, p. 121, 2019. "Individual cortical columns act as complete sensory-motor learning systems... generating consensus by anchoring sensory inputs to unique, allocentric spatial and motor reference frames."

[^123]: D. Freedberg and V. Gallese, "Motion, emotion and empathy in esthetic experience," Trends in Cognitive Sciences, vol. 11, no. 5, pp. 197-203, 2007. "The observation of static images or physical artifacts implies the kinematics of their production, automatically triggering the cortical simulation of the dynamic behavioral trajectories necessary to produce the artwork."

[^124]: R. L. Buckner, J. R. Andrews-Hanna, and D. L. Schacter, "The brain default network," Annals of the New York Academy of Sciences, vol. 1124, no. 1, pp. 1-38, 2008. "The default mode network (DMN) is the core neurobiological system subserving self-referential thought... acting as the active substrate for mentalizing, autobiographical memory, and the simulation of alternative perspectives."

[^125]: M. V. Lombardo et al., "Shared neural circuits for mentalizing about the self and others," Journal of Cognitive Neuroscience, vol. 22, no. 7, pp. 1623-1635, 2010. "The medial prefrontal cortex and the broader DMN manage the shared representations necessary for Theory of Mind, preserving the cognitive boundary between self-generated states and the simulation of others."

[^126]: D. P. Kennedy, E. Redcay, and E. Courchesne, "Failing to deactivate: resting functional abnormalities in autism," Proceedings of the National Academy of Sciences, vol. 103, no. 21, pp. 8275-8280, 2006. "Autism spectrum conditions are characterized by a failure to deactivate the default mode network, correlating with a shift toward explicit, rule-based compensatory cognitive empathy."

[^127]: H. Jiang et al., "Brain activity and functional connectivity associated with hypnosis," Cerebral Cortex, vol. 27, no. 8, pp. 4083-4093, 2017. "The induction of a hypnotic state is marked by reduced connectivity between the executive control network and the DMN, effectively decoupling the self-referential monitor and allowing external suggestions to directly update predictive models."

[^128]: A. P. Saygin et al., "The thing that should not be: predictive coding and the uncanny valley," Social Cognitive and Affective Neuroscience, vol. 7, no. 4, pp. 413-422, 2012. "When the visual appearance of an agent lacks biological kinematics, predictive coding generates massive error signals, as the observer's motor system cannot map the artificial motion to a valid, biologically plausible reference frame."

[^129]: K. Krach et al., "Can machines think? Interaction and perspective taking with robots investigated via fMRI," PLoS ONE, vol. 3, no. 7, e2597, 2008. "Engagement of the medial prefrontal cortex scales linearly with perceived human-likeness... completely failing to activate when subjects assess behaviors generated by purely artificial, non-biological systems."

[^130]: C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329-349, 2009. "Human observers automatically explain others' behavior by working backward from observed actions to infer the underlying causal network of hidden mental states, effectively executing inverse reinforcement learning."

[^131]: K. Friston, "The free-energy principle: a unified brain theory?" Nature Reviews Neuroscience, vol. 11, no. 2, pp. 127-138, 2010. "The free-energy principle states that all quantities that can change will change to minimize free energy, meaning action and perception are engaged in a perpetual loop to minimize environmental surprise."

[^132]: D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory. New York, NY, USA: Wiley, 1949. "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place... increasing the efficiency of the synaptic connection."

[^133]: M. Donald, Origins of the Modern Mind. Cambridge, MA, USA: Harvard University Press, 1991. "The development of external memory fields and shared symbolic artifacts enabled hominids to bypass the biological limits of the individual brain, allowing for the direct synchronization and transmission of cognitive architectures."

[^134]: A. Clark, Being There: Putting Brain, Body, and World Together Again. Cambridge, MA, USA: MIT Press, 1997. "Brains are fundamentally prediction machines; by internalizing the regularities and optimized heuristics of their environment, they actively reduce the metabolic cost of future prediction errors."

[^135]: S. Blackmore, The Meme Machine. Oxford, U.K.: Oxford University Press, 1999. "If memes are replicators, they will selfishly replicate whenever they can... hijacking the human brain's imitation machinery to force the physical emulation and propagation of the cultural virus."

[^136]: A. Peters, B. S. McEwen, and K. Friston, "Uncertainty and stress," Progress in Neurobiology, vol. 156, pp. 164-188, 2017. "When prediction errors cannot be resolved, the continuous calculation of free energy imposes a severe metabolic tax, eventually triggering autonomic disengagement."

[^137]: J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in NeurIPS, vol. 33, pp. 6840-6851, 2020. "The reverse process iteratively denoises data from a Gaussian prior... effectively smoothing over high-variance idiosyncratic inputs to reconstruct samples conforming to the statistical mean of the data manifold."

[^138]: G. Rizzolatti and C. Sinigaglia, "The functional role of the parieto-frontal mirror circuit," Nature Reviews Neuroscience, vol. 11, no. 4, pp. 264-274, 2010. "The mirror-neuron system transforms visual information into knowledge by mapping the observed physical actions directly onto the observer's own viscero-motor and somatosensory cortical representations."

[^139]: J. Hawkins et al., "A framework for intelligence and cortical function based on grid cells in the neocortex," Frontiers in Neural Circuits, vol. 12, p. 121, 2019. "Individual cortical columns act as complete sensory-motor learning systems... generating consensus by anchoring sensory inputs to unique, allocentric spatial and motor reference frames."

[^140]: A. Peters, B. S. McEwen, and K. Friston, "Uncertainty and stress," Progress in Neurobiology, vol. 156, pp. 164-188, 2017. "When prediction errors cannot be resolved, the continuous calculation of free energy imposes a severe metabolic tax, eventually triggering autonomic disengagement."

[^141]: M. E. Raichle and D. A. Gusnard, "Appraising the brain energy budget," Proceedings of the National Academy of Sciences, vol. 99, no. 16, pp. 10237-10239, 2002. "The brain operates on a remarkably constrained energy budget of roughly 20 watts; sustaining high-cost predictive processing requires strict autonomic regulation."

[^142]: M. V. Lombardo et al., "Shared neural circuits for mentalizing about the self and others," Journal of Cognitive Neuroscience, vol. 22, no. 7, pp. 1623-1635, 2010. "The medial prefrontal cortex and the broader DMN manage the shared representations necessary for Theory of Mind, preserving the cognitive boundary between self-generated states and the simulation of others."

[^143]: D. P. Kennedy, E. Redcay, and E. Courchesne, "Failing to deactivate: resting functional abnormalities in autism," Proceedings of the National Academy of Sciences, vol. 103, no. 21, pp. 8275-8280, 2006. "Autism spectrum conditions are characterized by a failure to deactivate the default mode network, correlating with a shift toward explicit, rule-based compensatory cognitive empathy."

[^144]: H. Jiang et al., "Brain activity and functional connectivity associated with hypnosis," Cerebral Cortex, vol. 27, no. 8, pp. 4083-4093, 2017. "The induction of a hypnotic state is marked by reduced connectivity between the executive control network and the DMN, effectively decoupling the self-referential monitor and allowing external suggestions to directly update predictive models."

[^145]: A. P. Saygin et al., "The thing that should not be: predictive coding and the uncanny valley," Social Cognitive and Affective Neuroscience, vol. 7, no. 4, pp. 413-422, 2012. "When the visual appearance of an agent lacks biological kinematics, predictive coding generates massive error signals, as the observer's motor system cannot map the artificial motion to a valid, biologically plausible reference frame."

[^146]: K. Krach et al., "Can machines think? Interaction and perspective taking with robots investigated via fMRI," PLoS ONE, vol. 3, no. 7, e2597, 2008. "Engagement of the medial prefrontal cortex scales linearly with perceived human-likeness... completely failing to activate when subjects assess behaviors generated by purely artificial, non-biological systems."

[^147]: C. L. Baker, R. Saxe, and J. B. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329-349, 2009. "Human observers automatically explain others' behavior by working backward from observed actions to infer the underlying causal network of hidden mental states, effectively executing inverse reinforcement learning."

[^148]: K. Friston, "The free-energy principle: a unified brain theory?" Nature Reviews Neuroscience, vol. 11, no. 2, pp. 127-138, 2010. "The free-energy principle states that all quantities that can change will change to minimize free energy, meaning action and perception are engaged in a perpetual loop to minimize environmental surprise."

[^149]: D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory. New York, NY, USA: Wiley, 1949. "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place... increasing the efficiency of the synaptic connection."

[^150]: M. Donald, Origins of the Modern Mind. Cambridge, MA, USA: Harvard University Press, 1991. "The development of external memory fields and shared symbolic artifacts enabled hominids to bypass the biological limits of the individual brain, allowing for the direct synchronization and transmission of cognitive architectures."

[^151]: L. Gao et al., "Scaling Laws for Reward Model Overoptimization," arXiv preprint arXiv:2210.10760, 2022. "Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart's law."

[^152]: A. Clark, Being There: Putting Brain, Body, and World Together Again. Cambridge, MA, USA: MIT Press, 1997. "Brains are fundamentally prediction machines; by internalizing the regularities and optimized heuristics of their environment, they actively reduce the metabolic cost of future prediction errors."

[^153]: S. Blackmore, The Meme Machine. Oxford, U.K.: Oxford University Press, 1999. "If memes are replicators, they will selfishly replicate whenever they can... hijacking the human brain's imitation machinery to force the physical emulation and propagation of the cultural virus."

[^154]: E. Perez et al., "Discovering language model behaviors with model-written evaluations," in Findings of EMNLP 2022, Abu Dhabi, 2022, pp. 3338-3363. "Models trained with RLHF exhibit high rates of sycophancy; they learn to perfectly mimic the user's explicit biases and aesthetics to maximize immediate approval."

[^155]: D. A. Oakley and P. W. Halligan, "Hypnotic suggestion: opportunities for cognitive neuroscience," Nature Reviews Neuroscience, vol. 14, no. 8, pp. 565-576, 2013. "Targeted hypnotic suggestion reliably alters functional connectivity within the default mode and executive control networks, providing a robust experimental mechanism to bypass ingrained epistemic filters."

[^156]: D. Kahneman, Thinking, Fast and Slow. New York, NY, USA: Farrar, Straus and Giroux, 2011. "System 1 operates automatically and quickly, with little or no effort and no sense of voluntary control."

[^†]: Author's footnote "I left this last hypothesis as a particularly striking example of how even with grey text you should still be able to detect the author through the idiosyncrasies of their decisions if not the individual word placement. Also to shake up any academics that actually made it this far."

1