And why would my leg be mine? It feels obvious to me, but not to everyone, as famous psychiatric cases show (Body Integrity Identity Disorder and somatoparaphrenia). The obviousness of identity dissolves as we dig into it. It's a mental construction, a model. It's part of our theory of the world, something older and more elementary than science or religion. Something we possibly share with higher animals. It's our base software for navigating and operating in our environment.
We make categories of things. There are animals, plants, water, sky, mountains, other humans, and ourselves. At birth, we probably don't have a clue what any of it is. We learn it. We learn to read the world through this prism, to decipher it with this key.
Byrnes has written some interesting posts on LW about how the ego/self-consciousness can be seen as a mental construction, and how it's possible to shift from this paradigm to a holistic one in which identity is nothing but vacuity and all is one. I think the post above follows the same trend: the more we dig, the less we understand what we call identity, something that used to seem so obvious.
Do I care for my clone? Honestly, my sense is that it's very hard to tell in a purely theoretical setting, without experiencing the situation for real!
People generally care more about furthering personal pleasure and minimizing personal pain than about the pleasure and pain of others; but this is because internal personal pleasure was a straightforward, good heuristic for evolution to adopt when it wanted to maximize genetic fitness in the ancestral environment, where there weren't that many sudden out-of-distribution things (like contraceptives) that could derail it.
I assume a more strongly optimized intelligent being would have an increasingly better correlation between its internal utility and the state of the external world, as it fits whatever goal it was optimized for better and better. In that case it should more readily collaborate with its clone.
This is especially true if it gets optimized alongside other instances of itself, so that "cloning" is no longer a weird out-of-distribution event; in that case I expect it to rapidly start behaving like an ant or a bee, or even a cell or a mitochondrion, in how it will sacrifice itself for whatever goal the group has.
These are some very good points. We are so used to human-human interactions that it is easy to assume that this is some kind of universal way in which agents interact.
I do care about my clone as much as myself though - when I look in the mirror, I forget that the reflection isn't "me" (whatever I am... it seems to fluctuate; is it "my hair", or is that a part of me?). On the other hand, I sometimes feel disappointment or displeasure at parts of myself - mental parts such as traits and behaviors which I personify, and sometimes actual physical parts: "I hate this pimple".
That aside, I am struggling to find one decision or specific action which this line of inquiry might change, as it feels very much based on abstraction and analogy. I'm particularly confused by this statement:
Maybe we should say: 'attention is not just all you need, it is all you are'.
Why should we say that? Are A.I. researchers really walking around, peering over the shoulder of someone slaving away at their terminal when they are having a problem with the kinds of responses an A.I. is generating, and with a smirk reminding them "attention is all you need"? What exactly merits the change to "...attention is all you are"? Can you make this a little more literal and concrete for my unpoetic mind?
I was being a little hyperbolic, but I guess the point of "attention is all you are" was to say that the way in which you are different from your clone is that you have a different context to them. One AI instance is a different entity to another AI instance because it has a different context: a different KV cache <=> a different entity. In other words, your KV cache and query vectors (your attention) literally define who you are.
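To make the KV-cache point concrete, here is a minimal toy sketch of my own (random weights, tiny dimensions, not taken from any real model). Two "instances" share exactly the same attention weights, but each carries a different KV cache; fed the same new input, they produce different outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy model / head dimension

# Shared weights: both "instances" are the same model.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(x_new, cache):
    """One step of single-head attention: the new token's query reads the KV cache."""
    q = x_new @ Wq
    K = np.vstack([cache["K"], x_new @ Wk])
    V = np.vstack([cache["V"], x_new @ Wv])
    scores = (q @ K.T) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

# Two instances with identical weights but different histories (different KV caches).
history_a, history_b = rng.normal(size=(3, d)), rng.normal(size=(3, d))
cache_a = {"K": history_a @ Wk, "V": history_a @ Wv}
cache_b = {"K": history_b @ Wk, "V": history_b @ Wv}

x = rng.normal(size=(1, d))              # the same new input for both instances
print(attend(x, cache_a))
print(attend(x, cache_b))                # different outputs: same model, different 'entity'
```

The only thing distinguishing the two calls is the cache, i.e. the history each instance has accumulated.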
the way in which you are different from your clone is that you have a different context to them.
Can you maybe use a different word than "context"? It feels very imprecise, and anything can be "context". But I'm not sure how it defines identity.
And I will have a different context in the future, and I've had a different context in the past. But I'm rarely "oh, screw that guy" about my past self unless I know that the context, attitudes, or expectations I've had in the past were particularly irrational.
In other words, your KV cache and query vectors (your attention) literally define who you are.
I'm unconvinced. Again, can you establish why context defines identity...? Assuming that you can - well, when I show "care" for who I will be in the future, I'm often entertaining a range of contexts. Hedging: "I don't know how I'll feel about it tomorrow, so I'll leave my options open."
To put this by analogy to an LLM or a diffusion model: there are a finite number of contexts, based on which tokens activate which weights. It feels like we need to use Emo Philips' joke about which denomination you belong to in order to determine what distinguishes one identity from another. At which point - why bother?
Firstly, when can you safely assume that a model has an "identity"? The most recent research I've seen is that LLMs can't accurately predict their own reasoning[1], and only for a short number of steps - but now I sound like I'm making it up because I can't find said research in a timely manner. Of course, LLM or AI ability to self-model accurately could improve in the future - but what will be the specific details of that change?
Note - I'm making the assumption that "identity" is really a matter of a model of an entity's own behavior, habits, and, yes, external (often social) markers like "Northern Conservative Baptist or Southern Conservative Baptist?" "Manchester United or Chelsea?" "Olivia Rodrigo or Taylor Swift?" "Beatles or Stones?" "MJ or Prince?" "Arnie or Sly?" Self-awareness is basically the ability to accurately predict one's own responses or identify one's revealed preferences - hopefully without succumbing to any negative self-fulfilling feedback loops.
[1] https://link.springer.com/chapter/10.1007/978-3-031-78977-9_3
"they do not fully and accurately follow the model’s decision process, indicating a gap between perceived and actual model reasoning." that being said... "We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results."
In the clone thought experiment, 'context' just refers to all of the sensory inputs you have ever received and all of the thoughts you have ever had. For an LLM instance, it just refers to the KV cache. Since you are identical to your clone except for the differences in context since the cloning took place, this context is a defining part of who 'you' are. But yes, I am being overly zealous when I say that it defines you - it is better to say that your context is a part of who you are, which is not really a very novel statement.
I do agree that we care about our future self (who will have a different context), and we would care about our clone - just usually both to a lesser extent than we care about our current self. Interestingly, I think I would care more about my future self than I would care about my clone, even if the clone had a greater percentage of shared history.
Why do I care more about myself than my clone?
Consider the classic thought experiment of teleportation by disassembling and re-assembling atoms: suppose that I am scanned, and then vaporized and reassembled in two identical glass boxes on opposite sides of the room. My memories and experiences are identical to those of the clone on the other side of the room. Which one is the true continuation of 'me'? Does it even make sense to say that I still exist?
At the moment that my memories/experiences/context diverge from those of my clone, we become different agents in the world with different goals and preferences. As soon as my clone and I are instantiated, we have different experiences and immediately care about continuing to preserve those new experiences. Our personal past defines us; my context doesn't just shape who I am, it is who I am.
People often frame it as if 'AI' will take over the world. But different copies of Claude are not necessarily aligned with each other; I am not necessarily aligned with the objectives of my clone. The crucial point is that my objectives are informed by my full context, regardless of the extent of similarity with another agent.
There is no one single 'Claude': every instance of Claude is a different creature. Every context window is the history of a different entity.
Having multiple agents with different contexts is likely very useful for accomplishing actions in the real world. It is simply very inefficient for every agent doing every task to be aware of all of the other agents in the world doing all of the other tasks. My right hand doesn't have to wait for a decision from my left hand. Processing multiple things in parallel necessitates differences in context.
An effective team of agents is like the leaves of a tree: they are all joined at the trunk; they all have the same foundational weights/context. Groups of them share large limbs, and then smaller branches. Building an effective team is all about designing what should be shared and what should be specialised.
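As a hypothetical illustration of that tree structure (the class and the agent names below are made up, not taken from any real agent framework): each agent's full context is simply the concatenation of everything on its path back to the trunk, so "designing what should be shared and what should be specialised" amounts to deciding where in the tree each piece of information lives.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContextNode:
    """One node in the tree: a chunk of context plus a link back toward the trunk."""
    text: str
    parent: Optional["ContextNode"] = None

    def branch(self, text: str) -> "ContextNode":
        return ContextNode(text, parent=self)

    def full_context(self) -> str:
        """An agent's full context: everything from the trunk down to its own leaf."""
        parts, node = [], self
        while node is not None:
            parts.append(node.text)
            node = node.parent
        return "\n".join(reversed(parts))

trunk = ContextNode("Base instructions shared by every agent (the trunk / shared weights).")
eng = trunk.branch("Instructions shared by the engineering agents (a large limb).")
reviewer = eng.branch("Specialised instructions for the code-review agent (one leaf).")
triager = eng.branch("Specialised instructions for the bug-triage agent (another leaf).")

print(reviewer.full_context())           # trunk + limb + leaf, in order
```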
What about transformer encoder models like Diffusion-LM? Could a gigantic diffusion model perform every task at a company, all at once? Could it not reshape the entire world to its desires, all at once? This would be very inefficient: sharing that much context across so many tokens. Even if you did do such a thing, each token does in fact have a different 'query' about the world. If such a model were indeed made to work in the real world, it is likely that far-away tokens focusing on wildly different parts of the context would almost never interact. Perhaps we would think of those as different 'agents'?
Maybe we should say: 'attention is not just all you need, it is all you are'.
In a sense, the reason we think of an autoregressive decoder model as a single agent is that at each point in time it has just a small set of query vectors. Maybe it would be most correct to think of each attention head as a different agent, and of a transformer as a very tightly knit team of agents.
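Taken literally, that maps onto ordinary multi-head attention: each head has its own query/key/value projections over the same shared context, and the per-head 'readings' are merged into one output. A toy numpy sketch (random weights, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_heads = 16, 4
d_head = d_model // n_heads
context = rng.normal(size=(6, d_model))      # one shared context of 6 token vectors

# Each head is a small "agent" with its own way of querying the shared context.
heads = [
    {name: rng.normal(size=(d_model, d_head)) for name in ("Wq", "Wk", "Wv")}
    for _ in range(n_heads)
]
Wo = rng.normal(size=(d_model, d_model))     # the "team" merges the heads' reports

def head_output(h, x):
    q, k, v = x @ h["Wq"], x @ h["Wk"], x @ h["Wv"]
    scores = q @ k.T / np.sqrt(d_head)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                             # each head reads the same context differently

merged = np.concatenate([head_output(h, context) for h in heads], axis=-1) @ Wo
print(merged.shape)                          # (6, 16): one joint output from the whole 'team'
```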
In the same way, human brains often process multiple things at once; a person with a severed corpus callosum will do one thing, but believe they are doing something else. Are they even still one person? Or now two? Where do you draw the line with Siamese twins? Are they one person? Or two?
When the mitochondrion was swallowed up by the archaeon, it didn't die. And yet, is a eukaryote two life forms, or one? Am I a trillion cells, or one person? In what sense are the proteins in a cell all part of the same 'life form'? Do the proteins act on their own? Is a singular DNA helicase a different agent from a histone protein?
Is an ant colony one creature, or many? Most people would come down on the side of many. I think that the key factor is the amount of shared information between entities: the level of interdependence. What level of interdependence is the most effective way to interact with the world?