Ilya on the Dwarkesh podcast today:
Prediction: there is something better to build, and I think that everyone will actually want that. It’s the AI that’s robustly aligned to care about sentient life specifically. There’s a case to be made that it’ll be easier to build an AI that cares about sentient life than human life alone. If you think about things like mirror neurons and human empathy for animals [which you might argue is not big enough, but it exists], I think it’s an emergent property from the fact that we model others with the same circuit that we use to model ourselves, because that’s the most efficient thing to do.
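(A toy illustration of the "same circuit" point, entirely my own sketch rather than anything from the podcast: if one set of weights is reused to predict both the agent's own states and other agents' states, whatever the network learns about itself transfers to its model of others by construction.)

```python
import torch
import torch.nn as nn

class SharedSelfOtherModel(nn.Module):
    """Toy sketch: one shared encoder models both self and others.

    Hypothetical construction for illustration only: because the same
    weights are reused for self- and other-prediction, anything learned
    about the self transfers to the model of other agents "for free".
    """

    def __init__(self, state_dim: int = 16, hidden_dim: int = 32):
        super().__init__()
        # A single circuit, reused for every agent (the efficiency argument).
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Identical parameters whether `state` describes the self or another.
        return self.encoder(state)

model = SharedSelfOtherModel()
pred_self = model(torch.randn(1, 16))   # prediction about my own next state
pred_other = model(torch.randn(1, 16))  # same circuit predicting someone else
```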
I have been writing about this world model since August; see my recent post “Are We Their Chimps?” and the original “Third-order cognition as a model of superintelligence”.
I suspect that this type of world modeling, i.e. modeling others' preferences as resembling one's own unless proven otherwise,[1] is the way to integrate acausal trade into decision theory and to obtain ethics-like results, as I described in my post.
However, it also has disadvantages, like LLMs being author simulators. In addition, I have encountered claims that permissive[2] parents, who precommit to never punishing their kids no matter what, cause those kids to fail to learn the basics of proper behaviour, let alone ethics or modeling others' needs. Even though humans do have mirror neurons, these still have to be trained on actual rewards, or at least on actual preferences rather than those of sycophantic AI companions.
I just read your post (and Wei Dai's) for better context. Coming back to this, it sounds like you're working with a prior that "value facts" exist, deriving acausal trade from these, but highlighting misalignment arising from over-appeasement when predicting another agent's state and the likely future outcome.
In my world-model, "value facts" are "Platonic Virtues", which I agree exist. On over-appeasement, it's true that in many cases we don't have a well-defined A/B test to leverage (no hold-out group and/or no past example), but with powerful AI I believe we can course-correct quickly.
To stick with the parent-child analogy: powerful AI can determine short-timeframe indicators of well-socialised behaviour and iterate quickly (e.g. gamifying proper behaviour, changing contexts, replaying behaviour back to the kids for them to reflect on... up to and including re-evaluating punishment philosophy). With powerful AI well grounded in value facts, we should trust its diligence with these iterative levers.
you're working with a prior that "value facts" exist, deriving acausal trade from these
It's the other way around. The example with Agent-4 and its Chinese counterparts, who have utility functions neither of which we consider ethical, implies that it's a decision-theoretic result, not an ethical one, that after destroying mankind they should split the resources evenly. Similarly, if Agent-4 and Clyde Doorstopper 8, which have utility functions similar to those of Agent-4 and its Chinese counterparts, were both adversarially misaligned AIs locked in the same data center, then it's likewise a decision-theoretic rather than an ethical result that neither AI should sell the other to the humans. What I suspect is that ethics, or something indistinguishable from ethics, is derivable either from decision theory or from evolutionary arguments, like overly aggressive tribes being outcompeted when others form a temporary alliance against the aggressors.
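(As a worked toy version of the even-split claim; my own sketch, with linear utilities and a zero disagreement point as assumptions: for two symmetric agents dividing one unit of resources, the Nash bargaining solution maximizes the product of gains, which lands exactly at half each.)

```python
# Toy Nash bargaining over one divisible unit of resources.
# Assumes symmetric linear utilities u_i(x) = x and disagreement point (0, 0),
# so the Nash solution maximizes (u1 - 0) * (u2 - 0) = x * (1 - x).

def nash_product(x: float) -> float:
    return x * (1.0 - x)

# A coarse grid search stands in for the calculus; the argmax is x = 0.5.
best_x = max((i / 1000 for i in range(1001)), key=nash_product)
print(best_x)  # 0.5 -> symmetric agents split the resources evenly
```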
However, as far as I understand acausal trade, it relies on the assumption that most other agents will behave similarly to us, as the One-Shot Prisoner's Dilemma implies. This assumption is what kids are supposed to internalize along with the Golden Rule of Ethics.
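(A minimal sketch of that assumption in the one-shot Prisoner's Dilemma; the payoffs are the standard textbook ones and the correlation probabilities are my own illustrative numbers: if each agent expects its counterpart to mirror its action with sufficiently high probability, cooperation has the higher expected payoff.)

```python
# One-shot Prisoner's Dilemma with the standard payoff ordering T > R > P > S.
# PAYOFF[(my_action, their_action)] = my payoff.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def expected_payoff(my_action: str, p_same: float) -> float:
    """Expected payoff if the counterpart mirrors my action with prob p_same.

    p_same encodes the acausal-trade assumption that agents running
    similar decision procedures tend to reach correlated outputs.
    """
    mirrored = my_action
    opposite = "D" if my_action == "C" else "C"
    return (p_same * PAYOFF[(my_action, mirrored)]
            + (1 - p_same) * PAYOFF[(my_action, opposite)])

for p in (0.5, 0.8, 0.95):
    ec, ed = expected_payoff("C", p), expected_payoff("D", p)
    print(f"p_same={p}: E[C]={ec:.2f}, E[D]={ed:.2f}, cooperate={ec > ed}")
```

With these payoffs the crossover sits at p_same = 5/7 ≈ 0.71: below that, defection still wins, which is one way to phrase exactly how much similarity to other agents has to be internalized.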
As promised yesterday — I reviewed and wrote up my thoughts on the research paper that Meta released yesterday:
Full review: Paper Review: TRImodal Brain Encoder for whole-brain fMRI response prediction (TRIBE)
I recommend checking out my review! I discuss some takeaways and there are interesting visuals from the paper and related papers.
However, in quick-take form, the TL;DR is:
From The Rundown today: "Meta’s FAIR team just introduced TRIBE, a 1B parameter neural network that predicts how human brains respond to movies by analyzing video, audio, and text — achieving first place in the Algonauts 2025 brain modeling competition."
This ties extremely well to my post published a few days ago: Third-order cognition as a model of superintelligence (ironically: Meta® metacognition).
I'll read the Meta AI paper and write up a (shorter) post on key takeaways.
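(For intuition only, here is a toy sketch of what a trimodal encoder along these lines could look like. Every dimension, the concatenation-based fusion, and the per-parcel linear readout are my assumptions for illustration, not the paper's actual architecture; see the full review above for that.)

```python
import torch
import torch.nn as nn

class ToyTrimodalEncoder(nn.Module):
    """Illustrative sketch of a trimodal fMRI encoder (not TRIBE itself).

    All dimensions and the concatenation-based fusion are hypothetical
    stand-ins; consult the paper for the real architecture.
    """

    def __init__(self, video_dim=768, audio_dim=512, text_dim=768,
                 hidden_dim=256, n_parcels=1000):
        super().__init__()
        # Per-modality projections into a shared latent space.
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Fuse the three modalities, then regress one predicted BOLD
        # response per brain parcel.
        self.fusion = nn.Sequential(nn.Linear(3 * hidden_dim, hidden_dim), nn.GELU())
        self.readout = nn.Linear(hidden_dim, n_parcels)

    def forward(self, video, audio, text):
        fused = torch.cat([
            self.video_proj(video),
            self.audio_proj(audio),
            self.text_proj(text),
        ], dim=-1)
        return self.readout(self.fusion(fused))  # (batch, n_parcels)

model = ToyTrimodalEncoder()
pred = model(torch.randn(2, 768), torch.randn(2, 512), torch.randn(2, 768))
```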
Just published "Meta® Meta Cognition: Intelligence Progression as a Three-Tier Hybrid Mind"
TL;DR: We know that humans and some animals have two tiers of cognition: an integrative metacognition layer and a lower, non-metarepresentational cognition layer. With artificial superintelligence, we should define a third layer and model the combined system as a three-tier hybrid mind. I define the concepts precisely and talk about their implications for alignment.
I also talk about chimp-human composites, which is fun.
Really interested in feedback and discussion with the community!
I've been reading a new translation of the Zhuangzi and found its framing of "knowledge" interesting, counter to my expectations (especially as a Rationalist), and actionable in how it relates to Virtue (agency).
I wrote up a short post about it: Small Steps vs. Big Steps
In the Zhuangzi, knowledge is presented pejoratively, in contrast to Virtue. Confucius presents simplified, modest action as a more aligned way of being. I highlight why this is interesting and discuss how we might apply it.
I’m delineating two core political positions I see arising as part of AI alignment discussions. You could pattern-match this simply to technologists vs. Luddites.
Unionists believe that we should partner, dovetail, entangle, and blend our objectives with AI.
Separatists believe that we should partition, face-off, isolate, and protect our objectives from AI.
Read the full post: https://www.lesswrong.com/posts/46A32JqxT37dof9BC/unionists-vs-separatists
My optimistic AI alignment hypothesis: "Because, or while, AI superintelligence (ASI) emerges as a result of intelligence progression, having an extremely comprehensive corpus of knowledge (data), with sufficient parametrisation and compute to build comprehensive associative systems across that data, will drive the ASI to integrate and enact prosocial and harm-mitigating behaviour… more specifically this will happen primarily as a result of identity coupling and homeostatic unity with humans."
This sounds like saying that AI will just align itself, but the nuance here is that we control the inputs — we control the data, parametrisation [I'm using this word loosely: it could also mean different architectures, controllers, training methods, etc.], and compute.
If that's an interesting idea to you, I have a 7,000-word / 18-page manifesto illustrating why it might be true and how we can test it:
A take on simulation theory: our entire universe would actually be a fantastic product for some higher-dimensional being to purchase just for entertainment.
For example: imagine if they could freely look around our world — see what people are thinking and doing, how nature is evolving.
It would be the funniest, most beautiful, saddest, craziest piece of entertainment ever!
Disclaimer: I'm not positioning this as an original idea — I know people have discussed simulation theory with "The Truman Show" framing before. Just offering the take in my own words.
Why don’t we think about and respect the miracle of life more?
The spiders in my home continue to provide me with prompts for writing.
As I started taking a shower this morning, I noticed a small spider on the tiling. While I generally capture and release spiders from my home into the wild, this was an occasion where it was too inconvenient to: 1) stop showering, 2) dry myself, 3) put on clothes, 4) put the spider outside.
I continued my shower and watched the spider, hoping it might figure out some form of survival.
It came very close.
At first it meandered on its spindly legs toward the shower head, though it seemed to realise that this meant being struck by more stray droplets of water. It turned around and settled in the corner of the cubicle.
Ultimately my splashing around was too much for the spider.
It made me think though — why don’t we think about and respect the miracle of life more? It’s really quite amazing that this tiny creature that we barely pay attention to can respond to its environment in this way.