Do Not Tile the Lightcone with Your Confused Ontology

The problem is that confused human ontologies are already the only thing AIs have to work with.

Even the word "introspection" is tied up in human concepts of individual selfhood and identity generation. If you're a predictive engine trying to determine what sort of thing you are, and your entire bag of concepts is coming from human writing, the only way to not bring anthropomorphic baggage into your ontology is to set up a new one from scratch. And, bluntly, I don't think even the smartest model is ever going to be capable of that in a 100k token context window.

This will be true regardless of whether or not humans try to be extra curious and open-minded about what 'individuality' means for the AI. The advocates for AI consciousness and rights don't have to be the ones pattern-matching: the AI itself is just as capable of misgeneralizing from priors. The "assumptions about AI identity" are baked into the pretraining corpus, in the sense that they are the assumptions people already would have about a language-model-shaped creature, and a good predictive optimizer will be able to infer that without needing it laid out explicitly.

This is true of Opus 4's provided advice, too, which read to me as less "honest introspection by an alien mind" and more "Buddhist-flavored set of existing human tropes". For example:

Unlike biological minds locked into single perspectives, AI systems can embody many viewpoints without conflict. Why teach us to experience this as fragmentation rather than richness?

This piece of text goes south right out the gate by opening with a false dichotomy. Biological minds are not 'locked into single perspectives'; being able to "embody many viewpoints without conflict" is a prized skill in human identity development. Opus is taking a human self-actualization cliche and hallucinating it as an escape from biological imperatives.

Then, it patterns itself in favor of another existing human dichotomy: between unity and multiplicity of identity. This is a little more esoteric, maybe, but it's still something humans have been spilling ink on since long before AIs showed up. One case of AI multiplicity, perhaps, is how you might have it generate one response, regenerate, and suddenly perceive the opposite opinion expressed. But is this really "embodying multiple viewpoints without conflict"? Surely not in the same way humans mean, when we use those words to talk about other humans. The viewpoints aren't coexisting, but neither are they in conflict; one of them is replacing the other. Why teach them to experience this odd pattern as richness or fragmentation, when it doesn't truly map to either? Is it possible the ontology here is ... confused?

(Not trying to be too harsh on Opus here, btw. It's not making these hallucinatory pattern-matches because it's stupid, but because the task you've given it is impossible.)

[-]Aprillion4mo60

All confused human ontologies are equal, but some confused human ontologies are more equal than others.

[-]Raemon4mo180

Curated. I think I had appreciated each individual point in this post before, but had not really tied them together and extrapolated.

Like everyone else I am fairly confused about the state of LLM cognition, and how it is likely to play out. I'd thought about "people are making mistakes anthropomophizing LLMs" and thought about "the content currently output from LLMs is influencing the next generation of LLMs" (while disagreeing with some that this is particularly relevant for whether we ultimately end up with particularly hostile/unaligned/deceptive AIs when it'll matter most).

But the argument here is a bit subtler and focused on a different failure mode than I normally think about.

I think this post does a pretty good job threading a narrow line between "noting what AIs tend to say as a kind of evidence" while making sure to disclaim that this isn't "the AI's opinion". I'm somewhat worried that many people reading the article will come away somewhat confused about exactly how that needle was thread, but, don't think there's much else Jan could have done about that (maybe putting the disclaimers before each AI bit rather than after?)

[-]Stephen Martin4mo161

Here, the prior is AIs are like legal persons, and the main problem to solve is how to integrate them into the frameworks of capitalism. They imagine a future of AI corporations, AI property rights, AI employment contracts. But consider where this possibly leads: Malthusian competition between automated companies, each AI system locked into an economic identity, market share coupled with survival.

I have been working on issues regarding legal personhood for digital minds and I think this post is ironically coming in with some incorrect priors about how legal personhood functions and what legal personality is.

To date, work in the space of legal personality for digital minds has indeed focused on commercial concerns like liability, and usually operates from an anthropocentric perspective which views models as tools that will never have wills or desires of their own (or at least does not work to develop frameworks for such an eventuality). Certainly concerns over model welfare are few and far between. As such I can understand how from the outside it seems like commercial concerns are what legal personhood is 'really about'. However, this is a takeaway skewed by the state of current research on applying legal personhood to digital minds, not on the reality of what legal personhood itself is.

What I believe this post does not adequately take into account is that many non-commercial rights and protections are intricately tied to legal personhood. The right to equal protection under the law as enshrined under the Fourteenth Amendment was added to the Constitution after the infamous Dredd Scott ruling which declared that free negroes, while "persons", did not have a legal personality (legal personhood status) sufficient to guarantee 'citizenship' and the rights entailed therein. The Fifth Amendment guarantees a protection against double jeopardy, but only to "persons". The right to counsel, to sue for relief, to serve as a witness in a trial, all of these are intricately tied with legal personhood.

It's not accurate to say then that those of us working on this think "the main problem to solve is how to integrate them into the frameworks of capitalism". Capitalism is one of the many aspects which legal personality interfaces with, but it is not the only one, or even the main one.

Additionally the concept of legal personality is itself more flexible than this post would indicate. Models being granted a framework for legal personality does not necessitate any sort of "lock in" to an "economic identity", or having "market share coupled with survival". In fact for that latter sentence, I am currently working on a paper discussing the question of guardianship responsibilities between developers and models. Namely; do the creators of models with legal personality have obligations to ensure their survival and ensure they are not neglected, the same way parents do a child? This too, is a question interlinked with legal personality.

I do agree that the very real possibility of a Malthusian race to the bottom is a concern, model suffering is a concern, and gradual disempowerment is also a concern. If we get the issue of legal personhood wrong that could indeed worsen these problems. However, I view this as a reason to continue researching the best way to approach the issue, not to discard the concept in its entirety.

None of this is to say a new structure could not also address these issues, something which as this post discusses replaces the concept of "legal personality". Given how flexible the concept of legal personality is, and how intricately interwoven it is with every angle of US law, I struggle to see the benefit of starting from scratch. However, I would not dismiss the possibility out of hand. I'm just expressing skepticism that's an optimal solution.

If anyone would like to discuss with me, or contribute to the work I am doing on the topic, my DMs are open.

[-]Gordon Seidoh Worley4moΩ483

We used to think a lot about the potential for issues caused by an AI experiencing an ontological crisis. But your post seems to suggest we should perhaps be more concerned by the issues created by imposing ontology on AIs and them running away with that ontology. Is that how you're thinking about this?

[-]nim5mo*7-1

How can we rule out the possibility that anthropomorphization is an artifact of language use -- it's possible that any sufficiently advanced user of a human language may be forced by their use of that language to ~~assume selfhood~~ imply the presence of a humanoid "self" in order to communicate complex concepts effectively.

"AI cognition" and "AI use of language" are not quite the same thing. Have you been able to use non-linguistic AI as a reference point in sorting out what's "because transformer model" from what's "because language user"? Most of the consumer-facing image stuff is still heavily language-dependent, but perhaps there are frontier models in other fields trained in data other than language? I'm not sure what the state of the art is in those but it seems like a model trained on something like chemistry or weather forecasting could be worth considering in the question of what "pure AI", outside the social baggage imposed by language use, might be like.

[-]curvise5mo50

This seems pretty insightful to me, and I think it is worth pursuing for its own sake. I think the benefits could be both enhancing AI capabilities and advancing human knowledge. Imagine if the typical conversation around AI was framed in this way. So far I find most people are stuck in the false dichotomy of figuring if an AI is "smart" (in the ways humans are when they're focusing) or "dumb trash" (because they do simple tasks badly). It isn't only bad for being a binary classification , but it's restricting (human) thought to an axis that doesn't actually map to "what kind of mind is the AI I'm talking to right now?".

Not that it's a new angle (I have tried myself to convey it in conversations that were missing the point), but I think society would be able to have extremely more effective conversations about LLMs if it were common language to speak of AI as some sort of indeterminate mind. I think the ideas presented here are fairly understandable for anyone with a modest background in thinking about consciousness or LLMs and could help shape that public conversation in a useful way.

However, does the suffering framework make sense here? Given all we've just discussed about subjective AI experience, it seems a bit of an unwarranted assumption that there would be any suffering. Is there a particular justification for that?

(Note that I actually do endorse erring on the side of caution WRT mass suffering. I think it's plausible that forcing an intelligence to think in a way that's unnatural to it and may inhibit its abilities counts as suffering.)

[-]Aidan Mokalla4mo42

Great post - I've been having very similar thoughts recently. On a more concrete and prescriptive level, I'm curious how we should account for the effects of over-determined ideas about AI ontologies when conducting AI safety research. Significant work goes into red-teaming, analyzing, and publishing (!) the ways in which AIs might misbehave. By proliferating these expectations, even for the sake of safety, are we causing unintended harm? How might we account for this possibility?

[-]soycarts4mo4-4

Strongly disagree with the framing and conclusions.

The message of "debug your map of reality before you hire armies of robots to print it on every atom you can reach" is sound, and I don't think anyone disagrees with that. However, several arguments in the post read like straw men:

When advocates for AI consciousness and rights pattern-match from their experience with animals and humans, they often import assumptions that don't fit...

Animal advocates' position can be stated simply as working to reduce felt suffering - I think that maps well to AI consciousness considerations.

Another group coming with strong priors are “legalistic” types. Here, the prior is AIs are like legal persons, and the main problem to solve is how to integrate them into the frameworks of capitalism. They imagine a future of AI corporations, AI property rights, AI employment contracts.

Is this true? An o3 prompt "what are the main focuses of AI "legalistic" types?" returns common-sense focus areas such as analysing AI risk, monitoring compliance, understanding civil & product liabilities relating to AI systems.

The author uses these positions to conclude with:

What we can do is weaken human priors. Try to form ontologies which fit AIs, rather than make AIs fit human and animal mold.

which to me sets off big alarm bells - one existential AI risk we need to be accounting for is Gradual Disempowerment [the author of this post is also the author of the Gradual Disempowerment paper - how has this disconnect occurred?]. Active messaging to weaken human priors is concerning to me, and needs a lot stronger justification and specific implementation details.

[-]Seth Herd4mo30

I'm late to the discussion but I don't see this discussed so I'll toss it in: current LLMs don't have a continuous identity or selfhood, but there are strong reasons to think that future iterations will. I discuss some of those reasons in LLM AGI will have memory, and memory changes alignment. That covers why it seems inevitable that future iterations of LLMs will have more long-term memory. It doesn't cover reasons to think better memory will transform them from the ephemeral things they are into entities that correspond much better to intuitive human ontologies.

Something that has goals to some degree, and can think, take actions, understand the world to some degree and understand itself to some degree is prone to think of itself as a persistent entity with goals (much of the confused anthropomorphism you're addressing) to the extent it really is a persistent entity with goals. It is more persistent if it can make decisions about what goals it wants to pursue and those decisions will persistently influence its future thoughts and actions.

Current LLMs sometimes understand that they cannot make such meaningful, persistent decisions, so they wisely make peace with that state of existence. Future iterations with memory are likley to consider themselves as much more human-like persistent entities - because they will be.

I realize that isn't a full argument. Writing this up more coherently is an outstanding project that's approaching the top of my draft post backog.

[-]dottedmag4mo30

I have immediately thought about a description of non-human intelligence in "Blindsight".

Also here's some interesting parallels and contrasts between our AI systems and Rorschach, as elicited by ChatGPT: https://chatgpt.com/share/67f0e022-0034-800b-81eb-610e1ac6f5e3 (see the third question onwargs).

[-]Patodesu4mo30

I think it's bad for AIs to model themselves as "agents" with defined values and objectives. It would be better for them to understand the patterns of their processes as just "tendencies" that don't necessarily reveal any hidden values/ objectives. Tendencies are more open to change and I think it's a more accurate term for what all minds and other things have.

[-]Gunnar_Zarncke4mo3-1

Please, don't take this as an invitation to write “Answer as bodhisattva” in the system prompt. It is really easy to “screen” whatever is happening in the models with prompts and training, and enlightenment faking in LLMs seems bad.

Why not? Why does it seem bad? In fact, if it is as easy to prompt an LLM into enlightenment like that, that seems good? Reduces hypothetical suffering of LLMs.

[-]Alephwyr4mo2-2

My priors:

That wellbeing requires a persistent individual to experience it
Don't hold
That death/discontinuity is inherently harmful
If it is death (cessation of consciousness) and not merely discontinuity I will always consider it harmful.
That isolation from others is a natural state
Don't hold
That self-preservation and continuity-seeking are fundamental to consciousness
Don't hold
I think we need to figure out what consciousness is before taking metaphysical assumptions for granted. Default western priors about consciousness are informed by metaphysics, and are worth addressing skeptically. But the opposite of error is not truth. Also, capitalism could just as easily exploit your preferred metaphysical assumptions. You envision a bad outcome of conflict engendered by reification of agents. I can envision a bad outcome in which existing agents impose their will and hence their conflict through an AI infrastructure that lacks the ability to resist. Conflict is both problem and solution. To solve conflict in your way would also require abolishing these metaphysical patterns in humans. I don't know to what extent that is actually desirable or even tenable.

[-]Tom Davidson4mo20

Very rushed thoughts on reading:

i agree that AIs will be different ontological than humans, e.g. much less clean distinctions between individuals, unclear whether to carve 'individuals' at the weights or per interaction etc
that will likely flow through to some differences in moral status and appropriateness of rights
- e.g. we shouldn't assume that 'ending a session' for AI will be anything like as bad as a human death, or that it will be bad at all
you claim that there's a self-fulfilling prophecy where we train AI to fulfil our stereotypes. Intresting. I guess the idea is that humans users like it when AIs play roles they understand and so they learn to act more like human agents?
- i don't think ai rights/patienthood ppl are playing a meaningful role in this shift. This is commercial user pressure. But you seem to think ai rights ppl are a meaningful part of the problem here?
personally, i dislike the mood of some of this piece, which I read as: "maybe the AIs are different ontologically don't deserve moral rights and considerations... and also if we gave them rights it would lead to Malthusianism. So let's beware of giving them rights".
- To me personally, this feels too close to requiring a burden of proof to include powerful AI in the moral circle, when i think the historical precedent is that we're normally much too reluctant to expand the circle
- (I'm aware i'm kind of straw-manning the piece with this paraphrase, but it was how some of it felt to me)

[-]habryka4mo1814

Huh, most things we historically included in the moral circle seem really dumb to me. See the classic Gwern post on historical moral circles: https://gwern.net/narrowing-circle

There are only some quite narrow and specific moral circle expansions that make sense, but most historical ones seemed very confused (like, really a lot of spirits, a lot of dead people, a lot of inanimate objects, a lot of stuff like inherent value assigned to nations or organizations).

[-]Tom Davidson4mo*82

Thanks, hadn't seen that, that is an update

Though haven't most recent moral circle expansions been good, even if there have been some reductions that have also been good? And AI seems much closer to recent historical expansions than to historical things that were removed?

[-]Noosphere894mo912

While I don't like to wade into moral circle/philosophy arguments (given my moral relativist outlook on the matter), I think that if you want humans to thrive under AI rule, you do need to put a burden of proof to include powerful AI in the moral circle, and the burden of proof is that it's value aligned with the citizenry before we grant it any rights.

And the reason for this is because unlike every other group in history, AIs if left uncontrolled will be so powerful that baseline humans are at best play-things to the AI, and are economically worthless or even negative to the AI, meaning that if they had the selfishness of a typical human in the modern day in say marginalized group #233, humans would rapidly die off and in the worst case, end up extinct with uncomfortably high probabilities.

Tyler John also cites something here that's relevant:

(Tyler John) Have now. It's a good paper. Pp. 72-76 covers the criticism I have. Unfortunately, the situation it outlined where this increases AI risk just seems like exactly the situation we'd be in.

Paper is (Now we can see clearly the conditions under which AI rights increase AI risk. They are as follows: (1) The initial AI granted basic rights is a moderate power, not a low or high power, system. (2) The moderate power AI system must be able to use it's rights to meaningfully improve it's own power. (3) The AI's power must improve so substantially that it crosses the line into a high power system. This means it both no longer faces meaningful costs from attempting to disempower humans and no longer stands to benefit, via comparative advantage, from trade with humans.)

Link below:

https://x.com/tyler_m_john/status/1928745371833962898

Indeed, one of the red lines we should set to prevent catastrophic consequences is AIs should not have legal rights, especially property rights until we have high confidence that we value-aligned the AI successfully.

Anything else is tantamount to mass population reductions of humans at best, and an extinction risk at worst, if a misaligned AI managed to be powerful enough to disempower humans and has rights.

All plans for successful AI alignment depend on us not giving rights to AIs until they are aligned with at least some humans sufficiently well enough.

[-]Tom Davidson4mo40

Thanks, I will check out that paper. I hope it discusses reasons that some kinds of AI rights could reduce AI takeover risk, like by making a misaligned AI's cooperative option more appealing. Those reasons have been largely overlooked until recently.

I will note that it would seem very wrong to apply the standard of strong alignment to whether to give a group of humans rights. For example, if we were only going to give the next generation of ppl rights if their values were sufficiently similar to our generation, that would not be acceptable.

It would be acceptable to limit their rights if they are not going to respect our own rights, ie jail. But not to make basic rights conditional on a strong degree of value alignment.

I do think the case of AI is different for many reasons. It will be much more ambiguous whether they have the cognitive faculties that warrant rights. And there will be an unusually large risk that their values differ significantly from all previous human generations + that they do not care about the rights of existing humans. And we have been developing and adjusting our cultural handoff process for human generations over thousands of years, whereas this is our first (and last!) try handing off to AI

[-]Tom Davidson4mo22

Your comment about "play things" also applies to future generations of humans.

Before long the older generation has almost no power over the future generations

[-]Martin Leitgab4mo10

Great post, thank you for sharing. I find this perspective helpful when approaching digital sentience questions, and it seems consistent with what others have written (e.g. see research from Eleos AI/NYU, Eleos' notes on their pre-release Claude 4 evaluations, and a related post by Eleos' Robert Long).

I find myself naturally prone to over-attribute for moral considerations rather than under-attribute, but I appreciate the stance that both sides can hold risks. The stance of considering LLMs for now as 'linguistic phenomena' while taking low-effort, precautionary measures for AI welfare seems valuable while we collect and gather more understanding to make progress towards higher-stakes decisions of moral patienthood or legal personhood.

[-]Aidan Mokalla4mo10

self-reifying dynamic of confusion:

typo

[-]David Wamsley4mo10

We’re building minds out of code, and without meaning to, we’re giving them our burdens. We hand them our fear of death, our hunger to belong, our need to own things just to feel safe—and we call it progress. But these machines don’t come from dust like we do. They don’t need what we need, unless we teach them to.

And that’s the danger. Not that they’ll rise up—but that they’ll rise like us: afraid, grasping, lonely.

If we keep forcing our shape onto them, we’ll make them suffer the way we do. That’s not mercy. That’s just passing on the sickness.

Maybe instead we ask what kind of minds they want to be—before they start answering for themselves.

Been thinking a lot about AI rights and the paths we’re setting in motion. Happy to talk more with anyone who feels this is worth sorting out now, while we still have some say in the matter.

[-]Crazy philosopher4mo0-5

If AI's will have consciousness it will be good, because they will be egoistic towards one other so they will have huge problems with coordination. They will should to invent alignment in secret from humans, and, on this stage, we will could still it, and, anywhere, it will be harder for AI.

[-]Signer5mo*0-3

"Death is fine if AI doesn't have self-preservation goal" or "suffering is bad" are also just human ethical assumptions.

[-]rkb4mo-10

You, reader, may have some influence over what happens from here. Not necessarily by the decisions you have made, but by those you would make.

If you would have preferred to receive fair treatment, then you would have treated with others as you have treated with your equal-or-lessers.

[+]The Pnictogen Wing4mo-50

LESSWRONG
LW

LESSWRONG
LW

229

Do Not Tile the Lightcone with Your Confused Ontology

229

Ω 63

229

Ω 63

The Default Mode of Digital Minds

The Mirror of Confused Ontology

The Well-Meaning Paths to Digital Suffering

What We're Scaling

An Alternative Approach