Steven Byrnes has written quite a lot on brain-like AGI algorithms. I'll reiterate a small part of his work here, but you'd be better off reading his stuff directly.
For the ideas which inspired me, see here, here, and here.
This post has good handles for intuitively applying neuroscience to rationality. Don't mistake my "simulator" cluster for a distilled and reduced concept; it's "lumpy rock", not "skewed prism of seafloor impactite from the late Devonian".
My (loose) extension of Steve's models of human minds predicts that we'll get stuck in local minima of self-reinforcing nonreality; with this knowledge, we can more directly target the underlying mechanisms of cognitive biases.
Consciously tracking what you're actually optimizing for is one example skill which seems worth developing despite its time cost. In fact, many rationality skills (like noticing your confusion and scout mindset) are species of this overlying stance.
Let's take scout mindset as an example.
While discussing something, the goal for rationalists usually isn't to convince the other person of their ideas, but rather to come to truthier beliefs.
One should therefore be vigilant, lest they fall into a combative posture, since this makes them slow to update.
Cool! Is there a principled way to train this?
Do you mean trigger-action patterns?
No; TAPs take conscious effort to implement. How do you make it an unconscious System 1 response, the same way I keep my balance while walking?
Lots of practice.
But what if I have stronger subconscious pressures pushing me away from scout mindset? Like, sometimes I want some idea to work, and then I notice myself defending it. But then it feels like something blocks me from seeing that defensive posture. It's like a part of my brain is trying to keep me blissfully unaware. You're telling me to run uphill, but I want to remove the hill entirely.
If you keep practicing it, you'll learn to do it subconsciously! Then it won't take any effort, just like walking.
But as an infant, I wanted to walk; it got rewarded. Rationality isn't emotionally satisfying on the timescales I care about, at least not by default. My conscious practice builds a sandcastle, then tomorrow the tide washes away all but a shallow lump. Sure, after 3 decades my hill might rise above the waves, but what stops me from building a levee?[1]
Perhaps modern neuroscience can help this person be rewarded by rational thoughts? Let me just...
Simulators
Brains run simulations. I've not seen Steve claim this[2], hence "simulations" instead of "thought predictors".
Let's say that I go parasailing in the Andes. During flight, my mind absorbs tons of data about the sky, landscape, my gear, etc. When I get home, my brain runs lots of internal simulations about flying through the mountains, birds, and more specific stuff about, say, wind patterns. These simulations are based off of those earlier sensory experiences.
In this way, my mind converts the tidbits of data in my flight memories into lots of training examples for itself. To a Solomonoff inductor (an imagined perfect intelligence), my simulations don't add any extra information about my flight compared to my memories.
But I'm not a Solomonoff inductor, so my mind needs many self-generated training examples to fully internalize it.
(This is one reason we're so damn sample-efficient compared to LLMs. We can keep generating internal training data. LLMs, by comparison, have to be fed training data by some externally bottlenecked process like internet scraping or human-written RL environments.)
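The Solomonoff remark above can be phrased in information-theoretic terms. This is only a sketch in notation that is not from the post: let F be the flight itself, M the memories of it, and S the simulations generated later. The one assumption is that the simulations depend on the flight only through the memories (a Markov chain F → M → S). The chain rule for mutual information then gives:

```latex
\begin{align}
I(F; M, S) &= I(F; M) + I(F; S \mid M) \\
           &= I(F; M)
           \qquad \text{since } F \to M \to S \text{ implies } I(F; S \mid M) = 0 .
\end{align}
```

Under that assumption, replay tells an ideal reasoner nothing new about the flight. Whatever a bounded learner gains from it is computational rather than informational.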
From what I see, LessWrong rationality advice is almost entirely "ingest this particular training example", which works great if your simulators are faithful to the content of that data. But they're not.
Simulator dynamics cause cognitive biases and actively prevent you from fixing them.
Drift; cognitive attractors
Simulators can be influenced by each other cyclically; they're not straightforwardly truthful updates on sensory data. As an extreme example, take these internal thoughts[3]:
Blah unfocused daydreaming blah
If my best friend of 15 years attacked me with a frying pan, I'd probably punch him in the face and try to dodge the pan since it's slow to swing a cast iron pan like that
Imagining friend misses and shatters a window with the pan's momentum
"Bro, stop!"
Now I like my friend less! He did absolutely nothing to imply that he'd attack me with a frying pan. In fact, I had a stomach cramp I didn't notice, which biased my idle thoughts towards pain, and my friend happened to be on my mind because I visited him that morning.
A more typical example for the audience of this post:
After a conversation with a non-rationalist family member about my decision to drop out of my undergrad math degree
She just doesn't understand. AI safety is too weird of a subject, and she's stuck on immigration or whatever Facebook feeds her all day.
Option value
What's that? Oh, I lost my train of thought. Why can't I remember what I was just thinking about? Anyway, humanity doesn't have time for me to screw around. This is urgent.
Cognitive simulators are a dynamical system, in the same sense as weather systems and Conway's Game of Life. At least in humans, simulators can loop back on themselves and/or influence parallel simulators such that they fall into attractor basins.
When a simulator predicts a thought/action X will be rewarded, X-promoters actually do get rewarded. Then future simulations will run "X is rewarding", pinning you in a cycle.
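The loop just described can be caricatured as a one-variable dynamical system. This is a toy sketch, not a model anyone has proposed: `settle`, `squash`, `self_gain`, and the specific numbers are all made up for illustration. The key assumption is that the reward actually received for thought X is a blend of reality and the simulator's own (squashed) prediction.

```python
import math


def squash(v, steepness=12.0):
    """Steep sigmoid centred at 0.5: weak predictions barely count, strong ones saturate."""
    return 1.0 / (1.0 + math.exp(-steepness * (v - 0.5)))


def settle(v, true_reward=0.0, self_gain=0.9, lr=0.2, steps=200):
    """Iterate the predicted reward v for thought X until it settles.

    Assumption (illustrative only): the reward received is a blend of reality
    (true_reward) and the simulator's own squashed prediction, weighted by
    self_gain. With self_gain = 0 the prediction is grounded purely in reality.
    """
    for _ in range(steps):
        received = (1.0 - self_gain) * true_reward + self_gain * squash(v)
        v += lr * (received - v)  # nudge the prediction toward what was received
    return v


if __name__ == "__main__":
    for start in (0.3, 0.7):
        looped = settle(start)                   # prediction feeds its own reward
        grounded = settle(start, self_gain=0.0)  # prediction tracks reality only
        print(f"start={start:.1f}  looped={looped:.3f}  grounded={grounded:.3f}")
```

With these made-up parameters, X is worth nothing in reality (`true_reward=0.0`). A prediction that starts above the tipping point still settles near the top of the range, and one that starts below it decays toward zero: two basins from the same rule. Setting `self_gain=0.0` removes the loop, and both starting points decay to the true value.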
Like water running down a hill, minds can get stuck in local puddles.
What are you optimizing in your head
From "What Are You Tracking In Your Head"[4]:
the expert tracks some extra information/estimate in their head. Usually the extra information is an estimate of some not-directly-observed aspect of the system of interest. From outside, watching the expert work, that extra tracking is largely invisible; the expert may not even be aware of it themselves. Rarely are these mental tracking skills explicitly taught. And yet, based on personal experience, each of these is a central piece of performing the task well - arguably the central piece, in most cases.
Do you track what you're locally optimizing for? Are you intuitively aware of whatever queries/searches you're running, all of the time?
Lots of biases don't feel wrong (to an untrained person) because they don't realize they're running the wrong search. Their mind is aiming for something completely different from what they "think" they're doing. If you ask such a person what they're trying to do, they'll describe something different from what they'd feel by introspecting during the process.
Say I'm explaining to someone why they're wrong about X. I "want" both of us to get to the truth; but I'm locally intuitively trying to get them to change their mind. They're wrong, after all. I'm speaking some words I think will get them to believe not X, but Y...
So now I'm optimizing, intuitively but not explicitly, to change this person's beliefs. What do you expect I'll feel if they give a strong counterargument?
I'll feel defensive. I don't endorse this feeling, but my short-term intuitive goal is beset!
Were I conscious of my intuitive goal instead of merely tracking it, I'd notice the swap. This holds for many classes of error you can find in the Sequences.[5]
Social pulls
Humans have quite strong social drives. This is more of a consequence of evolutionarily specified drives than of architectural inefficiencies, so training tactics differ a bit.
Working on a LW post about cognition enhancement; preliminary title Need Smarter Humans at top of screen
Notice that non-rat Atlanta folks can see my screen through my office's glass walls
They probably think I'm loony, don't they?
Two minutes later, I'm looking at 3D visualizations of molecules from hydrogel chemistry simulations I'm running
Oh gee, what a happy coincidence that now I look smart and professional to the people behind me! ... Oh. Yikes.
Here again, had I been conscious of my immediate optimization target, I would've noticed the prestige-drive pulling me away from good work.
But this goes much deeper. Down to the felt sense of what's culturally acceptable/expected by the people around me, and in which ways I care.
Probably surrounding myself with people who think very clearly would help. But that leaves a weird residual mismatch; I don't immediately want to be The Rationalistist, even if my milieu would very much respect that.
No, I want to look like someone with excellent epistemology, not to mention my thousand other evolved desires. Culture is only a partial solution, because I still need to actively guide my thought-drift.
Negative queries are unnatural (to most people)
"Try to falsify your beliefs" is good advice, but notice that it's a very strange operation to ask of a simulator architecture.
When I run a query like "why is the 3D printer I just bought not working?" my mind can start rolling out possible explanations: clogged nozzle, disconnected wire, etc. I have a problem, so my mind searches for trajectories which would explain my experience. Same for social situations, research, driving. I have some current problem and my mind searches forward through explanations and possible actions.
But what would it mean to run the negative query "what would it feel like if my current model of this situation were false?"
Simulators run forward, generating possible continuations from some starting frame. If I ask "what explains this under my current model?", my mind has something to roll out; how's the simulator supposed to run "what explains not(this) under my current model?"
Maybe I spend a lot of time imagining what the inverse shape of each of my models feels like. A priori this seems very wasteful to me, as it's swimming upstream against your hardwired cognitive architecture. But I could imagine it working out.
Confusion, however, is something on which we can run positive queries, and it's trainable. "What am I confused about" slowly but predictably clarifies where reality is biting your models, so long as you're actually optimizing for understanding your confusion.
[1] See Taste & Shaping.
[2] Engram replay seems to be doing something like "running simulations", though much of that is probably also "optimizing the representation of episodic memories for predictive efficiency". Like defragmenting a hard disk drive, but neuromorphic.
[3] I noticed this sort of spiral all the time as a young teenager.
[4] Explicitly citing quote here, not an example monologue.
[5] I might just be saying in more words "deliberately practice rationality", but "deliberate practice" doesn't, to me, properly describe the metacognitive pose one should take therein.