Epistemic Status: I'm neither a neuroscientist nor an ML researcher, but am trying to figure out "what kinds of human thought are actually possible to replicate on silicon right now?". 

Here's my best guess of how human cognition works. Please tear it apart!


When I looked at GPT-2 last year, I thought: "Huh, when I look at my own thought process... I could summarize most of what I'm doing as: 'predict the next thing I'd say using crude concept association, and then say it.'"

Meanwhile, Jeff Hawkins says "Every part of the neocortex is running the same algorithm", and it's looking like maybe brains aren't doing that complicated a set of things.

Am I just GPT-2?

This was an obvious question to ask, but I haven't seen anyone write it up in detail.

I asked around. One mathematician friend said "I agree most people are doing GPT-style thinking where they regurgitate and recombine concepts from their neighbors. But, you can't get civilization from just that. Some people need to have model-based thinking."

Another mathematician friend agreed and added: "Young math students who try to prove theorems often do it GPT-style – they ramble their way through a bunch of math buzzwords and try to assemble them without understanding the structure of how they fit together. But actual math proofs require clear understanding. You can't just 'predict the next word.'"

I agree there is something additional going on here that gets you to formal math proofs and building skyscrapers. But... I don't think it's all that much more.

This post has three parts:

  • Lay out the cognitive algorithm I personally seem to be following
  • Outline how I think that algorithm developed
  • Work through some examples of how I do "advanced thinking" (i.e. the sort of thinking that might literally advance the sum of human knowledge), and doublecheck if there are any surprising elements

My algorithm, as I understand it

Even when I'm developing novel concepts, or thinking through effortful procedures, most of my thinking follows the same basic algorithm:

A. Find the Next "Good Enough" Thought

  1. My subconscious finds some nearby concepts that are associated with my previous thought-chunk
  2. If a "good enough" thought appears, think that thought, and then repeat. ("good enough" means "I feel optimistic about the next thought in the chain leading somewhere useful")
  3. If a "not obviously good enough" thought appears, check the next few associated concepts and see if they're good enough.
  4. If none of the nearby concepts seem good enough, either give up and switch topics, or conduct an effortful search. This usually involves feeling stuck for a while, spending willpower or getting a headache. Eventually I either:
    • a) find a good enough concept for my next thought, and proceed
    • b) find a better search algorithm. (Still basically "find a good enough concept", except it's not actually going to help me directly. Instead, I'll think something like "make a list of possible hypotheses", or "search on google", or "ask my friend who knows about X", and then begin doing that.)
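Here's a toy sketch of step A, mostly to check that the loop is as simple as it sounds. Everything concrete in it is an assumption of mine, not a claim about brains: associations are a hypothetical dict from a thought to a ranked list of next thoughts, "optimism" is a made-up score in [0, 1], and `effortful_search` is a stand-in for step 4.

```python
from typing import Callable, Optional

# Hypothetical representation: each thought maps to a ranked list of associated next thoughts.
Associations = dict[str, list[str]]

def next_thought(
    current: str,
    associations: Associations,
    optimism: Callable[[str], float],   # "how optimistic am I that this leads somewhere useful?"
    good_enough: float = 0.7,           # made-up threshold
    nearby: int = 3,                    # how many nearby associations to check
    effortful_search: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    # Steps 1-3: walk the nearest associations in order and take the first good-enough one.
    for candidate in associations.get(current, [])[:nearby]:
        if optimism(candidate) >= good_enough:
            return candidate
    # Step 4: nothing nearby qualifies. Either search effortfully (which may itself
    # return a better search strategy, like "search on google") or give up.
    if effortful_search is not None:
        return effortful_search(current)
    return None  # give up and switch topics

assoc = {"bug in parser": ["randomly change things", "list known facts", "ask a friend"]}
scores = {"randomly change things": 0.2, "list known facts": 0.8, "ask a friend": 0.6}
print(next_thought("bug in parser", assoc, lambda t: scores.get(t, 0.0)))  # list known facts
```

The "good enough" threshold is the important design choice here. This is satisficing rather than maximizing, which matches the feeling of taking the first promising thought instead of comparing them all.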

B. Check for Badness

  • While doing all this, there's a follow-up process that periodically checks "was a recent thought-chunk somehow bad?".
    • Is this sloppy thinking that a respected rationalist would give me a disapproving look for?
    • Is this thoughtcrime that my tribe would punish me for thinking?
    • Does it "smell bad" somehow, such that if I built a whole series of concepts off of this next thought-chunk, the result would be a flimsy construction? (i.e. bad code smell)
  • If the thought seems maybe-bad, start associating towards concepts that help crystallize whether it's bad, or fix the badness
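Step B can be sketched the same way, as a set of critics that scan recent thought-chunks and flag suspicious ones. The critic names and the crude keyword rules are hypothetical placeholders. Real "smells" are presumably learned, not string matches.

```python
from typing import Callable

Critic = Callable[[str], bool]  # returns True if a chunk "smells bad" to this critic

def check_for_badness(recent_chunks: list[str], critics: dict[str, Critic]) -> list[tuple[str, str]]:
    """Return (chunk, critic_name) pairs for every chunk that some critic flags."""
    flagged = []
    for chunk in recent_chunks:
        for name, is_bad in critics.items():
            if is_bad(chunk):
                flagged.append((chunk, name))
    return flagged

# Illustrative critics only: keyword checks standing in for learned taste.
critics = {
    "sloppy thinking": lambda chunk: "obviously" in chunk.lower(),
    "flimsy foundation": lambda chunk: "i guess" in chunk.lower(),
}
chunks = ["Obviously the answer is 10c", "List the known facts", "I guess the cache is stale"]
flags = check_for_badness(chunks, critics)
print(flags)
```

The flagged pairs would then become the starting point for the "associate towards concepts that fix the badness" step. That step isn't modeled here.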

There are a few other things going on: I'm storing concepts in working memory (and sometimes in mood), which shapes which other concepts are easily accessible. I'm sometimes using concepts that initiate chains, where I'll think "oh, I'm supposed to do algebra here. What's the first step of algebra?" and then the first step associates to the second step. But I wouldn't be too surprised if GPT-2 developed these parts, or some equivalent version of them, on its own.

Almost all of that condenses down to "find nearby associated concepts" and "direct my attention to more distant associated concepts." 

(My understanding of this is based on the Tuning Your Cognitive Algorithms exercise, where you solve problems mindfully, paying lots of attention to what your brain seems to be doing on the sub-second timescale)

How far removed from that is GPT-2?

First, I'm not making any claims about the exact structure of the learning algorithm. My understanding is that there are a few different neural network architectures that are better suited to different kinds of processing (e.g. convolutional nets for image processing).

Some people have responded to my "what if all thought boils down to simple associations?" question with "but, model-based learning!". I agree that model-based learning is a thing, but it's not obvious to me that GPT-2 doesn't have it, at least to some degree.

Second, a key thing GPT-2 is missing is the "check for badness" aspect. After predicting a word, AFAIK there's nothing that later punishes GPT-2 for thinking sloppily, or rewards it for doing something particularly great, which means it can't learn things like "You're supposed to generate multiple hypotheses before getting attached to the first one" and then deliberately apply them. 

It probably also takes longer to learn things. (I don't actually know for sure how either GPT-2 or other leading language generators are rewarded. Has anyone done anything like "Train a neural net on Reddit, where it's somehow separately rewarded for predicting the next word, and also for predicting how much karma a cluster of words will get, and somehow propagating that back into the language generation?")
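For concreteness, here's one common way such a setup is often framed in ML: a weighted sum of a next-word loss and an auxiliary "predict the karma" loss. This is my own toy sketch, not a description of how GPT-2 or any real system is trained. The `karma_weight` value and the squared-error choice are assumptions.

```python
import math

def next_word_loss(predicted_probs: dict[str, float], actual_word: str) -> float:
    # Cross-entropy at a single position: -log p(actual next word).
    return -math.log(predicted_probs[actual_word])

def karma_loss(predicted_karma: float, actual_karma: float) -> float:
    # Squared error on the predicted karma for a cluster of words.
    return (predicted_karma - actual_karma) ** 2

def combined_loss(predicted_probs: dict[str, float], actual_word: str,
                  predicted_karma: float, actual_karma: float,
                  karma_weight: float = 0.1) -> float:
    # In a real model, gradients of both terms would flow into shared parameters,
    # which is one way "predicting karma" could propagate back into language generation.
    return (next_word_loss(predicted_probs, actual_word)
            + karma_weight * karma_loss(predicted_karma, actual_karma))

loss = combined_loss({"the": 0.5, "a": 0.25, "cat": 0.25}, "a", predicted_karma=12.0, actual_karma=10.0)
print(round(loss, 4))
```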

From Toddlers to Software Architects

How might the algorithm I described above develop in humans?

Step 1: Toddlers and Stoves

Toddlers have little long-term planning. If they see a bright red stove, they might think "shiny object!" and their little GPT processes think "what are some things that might come next?" and one of them is "touch the stove" and one of them is "look at it intently" and a third is "shrug and wander off". They pick "touch the stove" and then OUCH.

After a few iterations, they reach a point where, when they hypothesize "maybe the next action should be 'touch the stove'", they get a little flash of "but, two steps later, it will hurt, and that will be bad."

One way to conceive of this is "GPT-style, but you predict two words ahead instead of one."

But I don't think that's right. I think it's more like: GPT-style, but thinking certain thoughts brings up associations, and some associations directly change the likely next actions. That is, you think "touch the stove!", then you think "ow!", and then "ow" is treated as an incorrect ending to the narrative you're constructing. So you don't do the "touch stove" action.

Eventually this is cached into the System 1 GPT system such that "touch the stove" has a low predictive weight of "thing you might do", and it doesn't even come up any more.
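A toy version of that caching: each "ow" shrinks the cached weight of the action that preceded it, until it drops below a threshold and stops being proposed at all. The weights, learning rate and threshold are made-up numbers for illustration.

```python
# Made-up starting weights for "things you might do next" around a stove.
action_weights = {"touch the stove": 1.0, "look at it intently": 1.0, "shrug and wander off": 1.0}
PROPOSAL_THRESHOLD = 0.2  # below this, an action no longer "comes up" as an option

def candidate_actions(weights: dict[str, float]) -> list[str]:
    # Only actions with enough cached weight get proposed at all.
    return [action for action, w in weights.items() if w >= PROPOSAL_THRESHOLD]

def experience(weights: dict[str, float], action: str, outcome_was_bad: bool, rate: float = 0.5) -> None:
    # "Ow" acts as a failure signal that scales down the preceding action's weight.
    if outcome_was_bad:
        weights[action] *= (1 - rate)

for _ in range(3):  # three painful encounters
    experience(action_weights, "touch the stove", outcome_was_bad=True)

print(candidate_actions(action_weights))  # ['look at it intently', 'shrug and wander off']
```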

Step 2: Toddlers and Mom Yelling

The first time Billy the Toddler came upon a hot stove, he reached out to touch it, and before he did, Mom yelled "Billy, don't touch that!"

And, possibly, Billy touched it anyway. And then he learned "ow!" and also learned that "Mom yells" is something that correlates with "ow!", which propagates back into his model of what sort of actions are good next-actions-to-take.

Or, previously, perhaps Billy had done some less immediately painful thing – perhaps walking into the street. Mom yells at him. He ignores her. A nearby car slows down, and doesn't hit him, so he doesn't learn "street ==> cars ==> bad". But, his Mom then runs and grabs him and pulls him away from the street, which is kinda painful. So he does gain the "Mom yelling ==> bad" association (as well as the "Walk into street" ==> "Mom Yelling" association).

Eventually Mom Yelling is treated as a failure state in the "predict which action I should take next" function.
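One way to picture "Mom yelling" turning into a failure state is that each cue's value drifts toward the value of whatever tends to follow it. This sketch uses a simple backward update, loosely in the spirit of temporal-difference learning. The cue names mirror the street example. The starting values and learning rate are illustrative assumptions.

```python
# Values of cues; the painful outcome starts out negative, everything else neutral.
value = {"walk into street": 0.0, "Mom yelling": 0.0, "grabbed and pulled away": -1.0}

def learn_from_episode(episode: list[str], value: dict[str, float], rate: float = 0.5) -> None:
    # Walk the episode backwards so later cues' values propagate to earlier ones.
    for earlier, later in reversed(list(zip(episode, episode[1:]))):
        value[earlier] += rate * (value[later] - value[earlier])

learn_from_episode(["walk into street", "Mom yelling", "grabbed and pulled away"], value)
print(value["Mom yelling"], value["walk into street"])  # -0.5 -0.25
```

After one episode, "Mom yelling" already carries a negative value of its own, and "walk into street" picks up some of it second-hand.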

Step 3: Internalized Mom Yelling

A couple years later, Billy periodically comes across novel situations – perhaps a wild animal in his backyard. This might remind him of similar situations where Mom Yelled in the past. By now, Billy doesn't need to hear Mom yell at him directly, he's able to think "Cool situation! Take Action" ==> "Hmm, Mom may yell at me later" ==> "Failure state" ==> "Okay, back up a step, take a different action."

And eventually this voice gets internalized as some kind of conscience/morality/guide, which doesn't even need to be physically present or temporally proximate to be relevant. 

You could model this as "GPT-style thinking, but predicting multiple steps down the line instead of just one or two." But I don't think this matches my internal experience. The bad thing I need to avoid is often many steps down the line, more steps than I could feasibly be modeling.

I think the direct-association...

  • Previous chunk: "notice dangerous situation"
  • Next chunk: "association with mom yelling" (evaluates to "low predicted reward")

...is simpler to execute than:

  • Previous chunk: "notice dangerous situation"
  • Next chunk 1: "go play in dangerous situation"
  • Next chunks 2 - 10: Do a bunch of steps in the dangerous situation
  • Chunk N: Mom eventually finds out and yells (evaluates to "low predicted reward")
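To make the cost difference concrete, here's a sketch comparing the two evaluations above. It counts how many chunks each one has to consider. The cached value of -0.8 and the reward function are made-up numbers. Both methods reach a "bad idea" verdict, but the association does it in one step.

```python
from typing import Callable

# A learned shortcut: the situation directly evokes "Mom may yell" (made-up value).
cached_value = {"notice dangerous situation": -0.8}

def evaluate_by_association(situation: str) -> tuple[float, int]:
    # One lookup, one chunk considered.
    return cached_value[situation], 1

def evaluate_by_rollout(chain: list[str], reward_of: Callable[[str], float]) -> tuple[float, int]:
    # Every intermediate chunk has to be simulated and held; only the last one is scored.
    working_memory = list(chain)
    return reward_of(working_memory[-1]), len(working_memory)

def reward(chunk: str) -> float:
    return -1.0 if "yells" in chunk else 0.0

chain = (["notice dangerous situation", "go play in dangerous situation"]
         + [f"step {i} in the dangerous situation" for i in range(2, 11)]
         + ["Mom eventually finds out and yells"])

print(evaluate_by_association("notice dangerous situation"))  # (-0.8, 1)
print(evaluate_by_rollout(chain, reward))                     # (-1.0, 12)
```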

Step 4: Internalized Punishment and Reward for Types of Cognition

Eventually Billy gains some internalized concept of "X is bad", which can be directly associated with various inputs. 

Social Shame

For me, "doing X would be bad" is often socially shaped. For example, I'll start writing some crappy code by randomly cobbling things together. Then I get a visceral image of my coworkers complaining at me, saying "Ray! Use your brain! What would good code look like here?", and I say "sigh... fine" and boot up my associations about what good code looks like and how to construct it.

Or, I'm debugging by randomly changing things until it works, or by adding console.log statements in the hope that they'll reveal something obvious. Then my shoulder-angel-coworker pops up and says "Man, Ray, that is not how to debug code. Think!", so I pull up my "what does actually debugging for real look like?" associations, see what next-actions they suggest, and do one of those.

(In this case, next-action-chunks include things like "make a list of what I know to be true about the code" and "check what the inputs to this function are and where they came from", which at my current skill level feel like atomic actions.)

Internalized Taste 

A different way some people work (including me, in some domains) is less "social" and more "aesthetic." Good coders develop a sense of "bad code smell" and (I'd guess, in the case of debugging) "bad habits smell": if they find themselves debugging by randomly changing things, they think "obviously this is stupid and inefficient" and then seek out associated next-actions that are more helpful. 

Step 5: Strengthened "Good" Habits

Eventually, you streamline the steps of "notice a problem" => "consider a bad strategy to solve the problem" => "feel bad" => "find a good thing to do" => "do that", and go directly to "have a problem" => "use a good solution to the problem". 

And this gets distilled into increasingly streamlined chunks, until master-level artisans perform many sophisticated techniques that have been bundled into a single action.
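A rough analogy for this streamlining is caching. After a problem has been worked through the slow way a few times, the good solution gets stored and reused as a single step. The `CACHE_AFTER` count and the strategy strings are hypothetical.

```python
from collections import Counter

CACHE_AFTER = 3  # made-up number of deliberate repetitions before a habit forms
habits: dict[str, str] = {}
times_solved: Counter = Counter()

def deliberate_solve(problem: str, bad_strategy: str, good_strategy: str) -> list[str]:
    # The slow path: notice, consider a bad strategy, feel bad, find a good one, do it.
    return [f"notice {problem}", f"consider {bad_strategy}", "feel bad",
            f"find {good_strategy}", f"do {good_strategy}"]

def handle(problem: str, bad_strategy: str, good_strategy: str) -> list[str]:
    if problem in habits:
        return [f"do {habits[problem]}"]  # streamlined: one chunk
    steps = deliberate_solve(problem, bad_strategy, good_strategy)
    times_solved[problem] += 1
    if times_solved[problem] >= CACHE_AFTER:
        habits[problem] = good_strategy
    return steps

for _ in range(3):
    handle("bug", "random edits", "check the inputs")
print(handle("bug", "random edits", "check the inputs"))  # ['do check the inputs']
```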

But, really, what about deep planning and models and creativity?

As mentioned earlier, I do some complicated thought via chains-of-association. For example:

  • Notice that I've been focusing on only one hypothesis
  • Remember that I should be looking for alternate hypotheses
  • Think "hmm, how do I get more hypotheses?"
  • Start listing out half-remembered hypotheses that vaguely feel connected to the situation.
  • Realize that listing random half-remembered hypotheses isn't a very good strategy
  • Remember that a better strategy might be to make a list of all the facts I know about the phenomenon I'm investigating (without exactly remembering why I believe this)
  • Make that list of facts (using either my working memory, or augmented memory like a notebook)
  • For each relevant fact, ask myself "what must be true given this fact?" ("ah", I now think. "This is why it's useful to list out true facts. I can then conduct a search for other dependent facts, which builds up a useful web of associations")
  • Use the list of facts and consequences to generate a new, more focused set of hypotheses
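The "what must be true given this fact?" step resembles forward chaining. In forward chaining, you repeatedly apply rules to known facts until nothing new falls out. Here's a minimal sketch with hypothetical debugging facts and rules. I'm not claiming this is literally what I do, only that it has the same shape.

```python
def derive_consequences(facts: set[str], rules: list[tuple[frozenset[str], str]]) -> set[str]:
    """Apply (premises -> conclusion) rules until no new conclusions appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - facts  # only the newly derived items

facts = {"bug appears only in production", "production uses a different config"}
rules = [
    (frozenset({"bug appears only in production"}), "something differs between environments"),
    (frozenset({"something differs between environments", "production uses a different config"}),
     "hypothesis: the config difference causes the bug"),
]
print(sorted(derive_consequences(facts, rules)))
```

The newly derived items include a focused hypothesis that wasn't in the original list of half-remembered guesses.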

This does involve a mixture of System 1 and System 2 thinking (where System 2 involves slower, more laborious use of working memory and considering different options). But it's still mostly composed of a bunch of atomic concepts.

Sarah Constantin's Distinctions in Types of Thought explores the possibility that deep neural nets basically have the "effortless System 1" type thinking, without being good at the slower, deliberate System 2 style thinking. I wouldn't be that surprised if GPT-2 was "only" a System 1. But I also wouldn't be that surprised if it naturally developed a System 2 when scaled up, and given more training. I also wouldn't be that surprised if it turned out not to need a System 2.

What's happening in System 2 thought? 

The genre of this post is now going to abruptly switch to "Raemon narrates in realtime his thought process, as he tries to doublecheck what actually is going on in his brain."

Okay, I want to demonstrate System 2 thought. What are some examples of System 2 thought?

(Spins gears for a few minutes, unproductively)

Suddenly I remember: "Ah, the bat/baseball problem is a classic System 2 problem." If I'm asked "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?", what actually is going on in my head?

First, I think "Obviously the answer is '10c'"

Then, I think "Wait, no, I know this problem. I think the answer is 5c" (determined via memory). But I'm writing a blogpost right now where I'm trying to articulate what System 2 thought is like, and it would be helpful to have a real example. What if I rearranged the numbers in the problem and forced myself to solve it again?

New Problem: "A bat and a ball cost $2.25 in total. The bat costs $2.00 more than the ball. How much does the ball cost?"

Great. Now the obvious answer is 25c, and that's probably wrong. Okay, how do I actually solve this? Okay, boot up my laborious arithmetic brain.

My first impulse is to subtract $2.00... wait that's literally the first thing my brain did.

Okay, what kind of math is this?

It's algebra I think?

X + Y = 2.25

X + (X + 2) = 2.25

2X + 2 = 2.25

X + 1 = 1.125

X = 0.125

Is that right? I think so. I don't actually care since the goal here was to doublecheck what my internal thought process is like, not get a right answer.
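For the record, the algebra above does check out. Writing the ball as X, X + (X + difference) = total gives X = (total - difference) / 2. Here's a quick check of both versions of the problem, using exact fractions so floating-point rounding can't muddy things.

```python
from fractions import Fraction

def ball_price(total: Fraction, difference: Fraction) -> Fraction:
    # ball + (ball + difference) = total  =>  ball = (total - difference) / 2
    return (total - difference) / 2

original = ball_price(Fraction("1.10"), Fraction("1.00"))
variant = ball_price(Fraction("2.25"), Fraction("2.00"))
print(original, variant)  # 1/20 1/8
```

So the ball costs 5c in the original problem and 12.5c in the rearranged one, which makes the bat $2.125.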

I notice that once I remembered to Do Actual Math, I mostly used quick associations rather than anything effortful, which feels more GPT-2 style. I think in elementary school those steps would have each been harder.

The more interesting part here was not solving the bat/baseball problem, but finding an actual good example of System 2 thinking. That felt quite effortful. I didn't have any immediate associations, so I was conducting a search, and moreover the search process wasn't very effective. (I think much of advanced cognition, and the Tuning Your Cognitive Algorithms process, is about figuring out what techniques enable you to search more effectively when you don't have an obvious search algorithm.)

What Other Kinds of Thought Are There?

I notice this whole section is downstream of a feeling: I haven't actually tried to comprehensively answer "are all of my thought processes explainable via predict-the-next-thing-then-do-it?". I have a nagging sense of "if I post this, there's a good chance someone will either poke a hole in it, or poke a hole in my generating thought process. Mama Rationalist is going to yell at me."

An obvious thing to do here is list out the types of advanced thinking I do, and then check how each one actually works by actually doing it.

I just did "elementary school math." What are some others?

1. Creatively combine existing concepts

I think this is how most novel ideas form. I think GPT-2 does very basic versions of this, but I haven't seen it do anything especially impressive.

One concrete algorithm I run is: "put two non-associated words next to each other, and see how they compile." An example of an upcoming blogpost generated this way is "Integrity Debt", which was born by literally taking the phrase "Technical Debt", swapping in "Integrity", and then checking "what does this concept mean, and is it useful?"
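The mechanical half of that algorithm is easy to sketch: keep the head word of an existing phrase and swap new words into the other slot. The word lists are just illustrations. The hard part, checking "what does this concept mean, and is it useful?", stays a human step.

```python
from itertools import product

def swap_compounds(phrases: list[str], new_words: list[str]) -> list[str]:
    """Generate candidate concepts by swapping a new modifier into existing phrases."""
    candidates = []
    for phrase, word in product(phrases, new_words):
        head = phrase.split()[-1]  # keep the last word, e.g. "Debt"
        candidate = f"{word} {head}"
        if candidate != phrase:
            candidates.append(candidate)
    return candidates

print(swap_compounds(["Technical Debt"], ["Integrity", "Attention"]))
# ['Integrity Debt', 'Attention Debt']
```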

More often, I do a less intentional, fuzzier version of this where multiple concepts or sensory experiences get meshed together in my mind. Abram recounts a similar experience in Track Back Meditation:

At one point a couple of years ago, I noticed that I was using a particular visual analogy to think about something, which didn't seem like a very good analogy for what I was thinking about. I don't recall the example, but, let's say I was using a mental image of trees when thinking about matrix operations. I got annoyed at the useless imaginary trees, and wondered why I was imagining them. Then, I noticed that I was physically looking at a tree! This was fairly surprising to me. Some of the surprise was that I took a random object in my visual field to use for thinking about something unrelated, but more of the surprise was that I didn't immediately know this to be the case, even when I wondered why I was imagining trees.

After I noticed this once, I started to notice it again and again: objects from my visual field end up in my imagination, and I often try to use them as visual analogies whether they're appropriate or not. It quickly became a familiar, rather than surprising, event. More interestingly, though, after a while it started to seem like a conscious event, rather than an automatic and uncontrollable one: I've become aware of the whole process from start to finish, and can intervene at any point if I wish.

2. Figure out what to do, for a problem where none of my existing associations are relevant enough to help solve it.

This is just actually pretty hard. Even Newton had to get hit on the head with the apple. Archimedes had to sit in the bathtub. Most human thought just isn't very original and incrementally advances using known associations.

I think most of the "good thought strategy" here involves figuring out efficient ways of exposing yourself to new concepts that might help. (This includes scholarship, getting a wider variety of life experience, and actually remembering to take relaxing baths from time to time)

I think there is a teeny fraction of this that looks like actually babbling entirely novel things at random (sometimes in a semi-directed fashion), and then seeing if they point in a useful direction. This is hard because life is high dimensional. Anyone who actually succeeds at this probably had to first develop a sense of taste that can process lots of details and get at least a rough sense of whether a sniff of an idea is promising.

3. Do advanced math that is on the edge of my current skills, where every step is novel.

I think this mostly involves repeatedly querying "what are the actual steps here?", and then applying some effortful directed search to remember the steps.


Speaking of which...

4. Mulling over concepts until I understand them deeply, have an epiphany, and experience a 'click'.

There's a particular sensation where I finally map out all the edges of a fuzzy concept, and then have a delightful moment when I realize I fully understand it. Prior to this, I have a distinctively uncomfortable feeling, like I'm in a murky swamp and I don't know how to get out.

I think this is the process by which a complicated, multi-chunk concept gets distilled into a single chunk, allowing it to take up less working memory.

5. Procedural Knowledge (i.e. Riding Bicycles)

I think this is actually mostly the same as #4, but the concepts are often differently shaped – physical awareness, emotional attunement, etc. It doesn't come with the same particular "click" qualia that intellectual concepts have for me, but there is often a "suddenly this is easy and feels effortless" feeling.

How Do You Think?

So those are all the ways that I think, that I can easily remember. How do you do your thinking? Are there any pieces of it that I missed here? (I'm wondering how much variety there is in how people think, or experience thought, as well as whether I missed some of my own thinking-types)

Implications for AI

My actual motivation here was to get a sense of "Are there any major roadblocks for human level intelligence, or do we basically have all the pieces?" (While I wrote this post, a couple other posts came out that seemed to be exploring the same question)

My current sense is that all the most advanced thinking I do is made of simple parts, and it seems like most of those parts roughly correspond to facets of modern ML research. This wouldn't mean AGI is right around the corner – my understanding is that when you zoom into the details there are a fair number of finicky bits. Many of the most interesting bits aren't using the same architecture, and integrating them into some kind of cohesive whole is a huge, multi-step project. 

But, it seems like most of the remaining work is more like "keep plugging away at known problems, improve our ability to efficiently process large amounts of information, and incrementally improve our ability to learn in general ways from smaller bits of information".

This post doesn't really attempt to make the "AGI is near" case, because I'm not actually an ML researcher nor even a competent hobbyist. I think seriously investigating that is another post, written by someone who's been paying much closer attention than me.

For the immediate future, I'm interested in various LW contributors answering the question "How do you think you think, and why do you think you think that way?"


I definitely agree that most people most of the time are using an algorithm like babble-with-associations, with feedback refining the associations over time. That said, I do think it is possible to do better, and at least some people (especially STEM people working on hard problems) learn to use other methods at least sometimes. In the rationality-movement-I'd-like-to-see, systematizing and training such more-efficient-thinking-algorithms is a central piece of the curriculum, and a major goal of the movement.

I've been thinking about explicit models of more-efficient-thinking-algorithms for a while now - Mazes and Crayon was an early piece along those lines, and of course a lot of the gears pieces are tied in. I have at least one thinking-algorithm which is more efficient on novel problems than vanilla babble-with-associations and which I think is trainable. I've been working towards a write-up (Everyday Lessons from High-Dimensional Optimization was partly intended as background for that write-up).

That write-up isn't ready yet, but here are some sparknotes:

  • We use gears-level models because black-box problem solving in high dimensions is inefficient. The idea of "gears" is to decompose the system into coupled low-dimensional subsystems.
  • My current best guess is that equations/constraints are a universal representation for gears. E.g. in a causal model (specifically structural equations), the equations/constraints are the structural equations themselves. Likewise in physics/chemistry/engineering/etc: the constraints are the system's governing equations. When we want to problem-solve on the system, those governing equations act as constraints on our problem.
    • I expect that humans' native representation of gears-level models amounts to something equivalent to constraints as well.
  • This immediately suggests a whole class of problem-solving strategies, in particular:
    • constraint relaxation to generate heuristics (e.g. "could I solve this problem ignoring this constraint?")
    • dual methods (e.g. thinking about tautness/slackness of constraints - which constraints are "easy", and which are limiting our ability to solve the problem)
  • There's also the question of how to figure out the equations/constraints in the first place. Presumably we could look for constraints-on-the-constraints, but then we go up a whole ladder of meta-levels... except we actually don't. Here again, we can leverage duality: constraints-on-constraints are partial solutions to the original problem. So as we go "up" the meta-ladder, we just switch back-and-forth between two problems. One specific example of this is simultaneously looking for a proof and a counterexample in math. A more visual example is in this comment.
    • Concretely, this process looks like trying out some solution-approaches while looking for any common barrier they run into, then thinking about methods to specifically circumvent that barrier (ignoring any other constraints), then looking for any common barrier those methods run into, etc...
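A toy illustration of the tautness/slackness idea: solve a tiny problem by brute force, then drop each constraint in turn and see whether the best achievable result improves. This checks tautness by re-solving with a constraint removed, not by computing dual variables the way a real dual method would. The work/sleep problem is hypothetical.

```python
from itertools import product
from typing import Callable, Optional

Plan = tuple[int, int]  # (hours of work, hours of sleep): a hypothetical toy problem
Constraint = Callable[[Plan], bool]

def best_plan(constraints: dict[str, Constraint]) -> Optional[Plan]:
    # Brute-force search over whole-hour plans; the objective is "maximize work hours".
    feasible = [p for p in product(range(17), repeat=2) if all(ok(p) for ok in constraints.values())]
    return max(feasible, key=lambda p: p[0], default=None)

def tautness(constraints: dict[str, Constraint]) -> dict[str, str]:
    # Relax each constraint in turn ("could I solve this ignoring this constraint?").
    baseline = best_plan(constraints)
    report = {}
    for name in constraints:
        relaxed = {k: v for k, v in constraints.items() if k != name}
        improved = best_plan(relaxed)
        report[name] = "taut" if improved[0] > baseline[0] else "slack"
    return report

constraints = {
    "16 free hours": lambda p: p[0] + p[1] <= 16,
    "at least 7h sleep": lambda p: p[1] >= 7,
    "at most 12h work": lambda p: p[0] <= 12,
}
print(best_plan(constraints))  # (9, 7)
print(tautness(constraints))
```

In this toy case, the time budget and the sleep requirement are what limit the solution. The 12-hour work cap is slack, so further effort spent on it would be wasted.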

The key feature of this class of thinking-algorithms is that they can get around the inefficiency issues in high-dimensional problems, as long as the problem-space decomposes - which real-world problem spaces usually do, even in cases where you might not expect it. This is basically just taking some standard tricks from AI (constraint relaxation, dual methods), and applying them directly to human thinking.

Note that this can still involve babbly thinking for the lower-level steps; the whole thing can still be implemented on top of babble-with-associations. The key is that we only want to babble at low-dimensional subproblems, while using a more systematic approach for the full high-dimensional problem.

This all sounds true (and I meant it to be sort of implied by the post, although I didn't delve into every possible "improved algorithm", and perhaps could have picked a better example.)

What it seemed to me was that gears/model-based thinking still seems to be implemented on babble, not just for the lower-level steps, but for the higher-level systematic strategy. (I do think this involves first building some middle-order thought processes on top of the babble, and then building the high-level strategy out of those pieces.)

i.e. when I use gears-based systematic planning, the pieces of the plan still feel like they're connected via the same underlying associative babbling. It's just that I'd have a lot of tight associations between collections of strategies, like:

  • Notice I'm dealing with a complex problem
  • Complex problem associates into "use the appropriate high level strategy for this problem" (which might involve first checking possible strategies, or might involve leaping directly to the correct strategy)
  • Once I have a gears-oriented strategy, it'll usually have a step one, then a step two, etc (maybe looping around recursively, or with branching paths) and each step is closely associated with the previous step.

Does it feel differently to you?

When you do the various techniques you describe above, what do the qualia and low-level execution feel like?

I do think it's all ultimately implemented on top of something babbly, yeah. The babbly part seems like the machine code of the brain - ultimately everything has to be implemented in that.

I think what I mean by "gearsy reasoning" is something different than how you're using the phrase. It sounds like you're using it as a synonym for systematic or system-2 reasoning, whereas I see gears as more specifically about decomposing systems into their parts. Gearsy reasoning doesn't need to look very systematic, and systematic reasoning doesn't need to be gearsy - e.g. simply breaking things into steps is not gearsy reasoning in itself. So the specific "tight associations" you list do not sound like the things I associate with gearsy thinking specifically.

As an example, let's say I'm playing a complicated board game and figuring out how to get maximum value out of my resources. The thought process would be something like:

  • Ok, main things I want are X, Y, Z -> what resources do I need for all that?
  • (add it up)
  • I have excess A and B but not enough C -> can I get more C?
  • I have like half a dozen ways of getting more C, it's basically interchangeable with B at a rate of 2-to-1 -> do I have enough B for that?
  • ...

So that does look like associative babbling; the "associations" it's following are mainly the relationships between objects given by the game actions, plus the general habit of checking what's needed (i.e. the constraints) and what's available.
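Here's the board-game reasoning as a few lines of arithmetic, roughly following the steps above: add up what's needed, compare it with what's on hand, and check whether the excess B covers the missing C at the 2-to-1 rate. The costs and holdings are hypothetical numbers chosen to match "excess A and B but not enough C".

```python
from collections import Counter

# Hypothetical costs for the things I want, and my current holdings.
costs = {"X": Counter(A=1, C=2), "Y": Counter(B=1, C=1), "Z": Counter(A=1, C=1)}
have = Counter(A=4, B=7, C=1)

needed = sum(costs.values(), Counter())  # "(add it up)"
shortfall = needed - have                # Counter subtraction keeps only positive counts
excess = have - needed
B_PER_C = 2                              # B converts to C at 2-to-1
can_cover_c = excess["B"] >= B_PER_C * shortfall["C"]
print(can_cover_c)  # True: 6 spare B covers the 3 missing C
```

Each line follows a relationship given by the game (costs, exchange rates) rather than a free association, which fits the "feature of the territory" point.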

I guess one insight from this: when engaged in gears-thinking, it feels like the associations are more a feature of the territory than of my brain. It's not about taste, it's about following the structure of reality (or at least that's how it feels).

I think what I mean by "gearsy reasoning" is something different than how you're using the phrase. It sounds like you're using it as a synonym for systematic or system-2 reasoning, whereas I see gears as more specifically about decomposing systems into their parts.

Yeah. My reply was somewhat general and would work for non-gearsy strategies as well. I do get that gearsiness and systematicness are different axes and strategies can employ them independently. I was referring offhandedly to "systematic gearsiness" because it's what you had just mentioned and I just meant to convey that the babble-process worked for it.

i.e., I think your list that begins "Okay, the main things I want are X, Y and Z..." follows naturally from my list that ends "Once I have a gears-oriented strategy, it'll usually have a step one..."

I guess one insight from this: when engaged in gears-thinking, it feels like the associations are more a feature of the territory than of my brain. It's not about taste, it's about following the structure of reality.

The way I'd parse it is that I have some internalized taste that "when figuring out a novel, complex problem, it's important to look for associations that are entangled with reality". And then as I start exploring possible strategies to use, or facts that might be relevant, "does this taste gearsy?" and "does this taste 'entangled with reality'" are useful things to be able to check. (Having an aesthetic taste oriented around gearsy-entangledness lets you quickly search or rule out directions of thought at the sub-second level, which might then turn into deliberate, conscious thought)

Alternately: I'm developing a distaste for "babbling that isn't trying to be methodical" when working on certain types of problems, which helps remind me to move in a more methodical direction (which is often but not always gearsy)

[edit: I think you can employ gearsy strategies without taste; I just think taste is a useful thing to acquire.]

For my paper "How Feasible is the Rapid Development of Artificial Superintelligence?", I looked at some of the existing literature on human expertise to develop a model of exactly what it is that human intelligence consists of.

As a very rough distinction, somewhat analogous to Type 1 and Type 2 reasoning, we can divide human expertise into two components: pattern recognition and mental simulation. An excerpt:

There exists a preliminary understanding, if not of the details of human decision-making, then at least of the general outline. A picture that emerges from this research is that expertise is about developing the correct mental representations (Klein 1999, Ericsson and Pool 2016).

A mental representation is a very general concept, roughly corresponding to any mental structure forming the content of something that the brain is thinking about (Ericsson and Pool 2016).

Domain-specific mental representations are important because they allow experts to know what something means; know what to expect; know what good performance should feel like; know how to achieve the good performance; know the right goals for a given situation; know the steps necessary for achieving those goals; mentally simulate how something might happen; learn more detailed mental representations for improving their skills (Klein 1999, Ericsson and Pool 2016).

Although good decision-making is often thought of as a careful deliberation of all the possible options, such a type of thinking tends to be typical of novices (Klein 1999). A novice will have to try to carefully reason their way through to an answer, and will often do poorly regardless, because they do not know what things are relevant to take into account and which ones are not. An expert does not need to—they are experienced enough to instantly know what to do. 

A specific model of expertise is the recognition-primed decision-making model (Klein 1999). First, a decision-maker sees some situation, such as a fire for a firefighter or a design problem for an architect. The situation may then be recognized as familiar, such as a typical garage fire. Recognizing a familiar situation means understanding what goals make sense and what should be focused on, which cues to pay attention to, what to expect next and when a violation of expectations shows that something is amiss, and knowing what the typical ways of responding are. Ideally, the expert will instantly know what to do.

The expectations arising from mental representations also give rise to intuition. As one example, Klein (1999) describes the case of a firefighter lieutenant responding to a kitchen fire in an ordinary one-story residential house. The lieutenant’s crew sprayed water on the fire, but contrary to expectations, the water seemed to have little impact. Something about the situation seemed wrong to the lieutenant, who ordered his crew out of the house. As soon as they had left the house, the floor where they had been standing collapsed. If the firefighters had not pulled out, they would have fallen down to the fire raging in the basement. The lieutenant, not knowing what had caused him to give the order to withdraw, initially attributed the decision to some form of extra-sensory perception.

In a later interview, the lieutenant explained that he did not suspect that the building had a basement, nor that the seat of the fire was under the floor that he and his crew were standing on. However, several of his expectations of a typical kitchen fire were violated by the situation. The lieutenant was wondering why the fire did not react to water as expected, the room was much hotter than he would have expected out of a small kitchen fire, and while a fire that hot should have made a great deal of noise, it was very quiet. The mismatch between the expected pattern and the actual situation led to an intuitive feeling of not knowing what was going on, leading to the decision to regroup. This is intuition: an automatic comparison of the situation against existing mental representations of similar situations, guiding decision-making in ways whose reasons are not always consciously available.

In an unfamiliar situation, the expert may need to construct a mental simulation of what is going on, how things might have developed to this point, and what effect different actions would have. Had the floor mentioned in the previous example not collapsed, given time the firefighter lieutenant might have been able to put the pieces together and construct a narrative of a fire starting from the basement to explain the discrepancies. For a future-oriented example, a firefighter thinking about how to rescue someone from a difficult spot might mentally simulate where different rescue harnesses might be attached on the person, and whether that would exert dangerous amounts of force on them.

Mental representations are necessary for a good simulation, as they let the expert know what things to take into account, what things could plausibly be tried, and what effects they would have. In the example, the firefighter’s knowledge allows him to predict that specific ways of attaching the rescue harness would have dangerous consequences, while others are safe.

Similar to the firefighter's intuition, GPT-2 has the ability to make predictions about what's most likely to "happen next". But there are also several differences.

Most notably, GPT-2's only goal is just that: predict what's going to happen next. This is a much more limited task than the one faced by (for example) a human firefighter, who needs to not just predict how a fire might proceed, but also how to best respond to it and which actions to take.

Let's take a concrete example and look at it in more detail; from Klein 1999:

The initial report is of flames in the basement of a four-story apartment building: a one-alarm fire. The [firefighter] commander arrives quickly and does not see anything. There are no signs of smoke anywhere. He finds the door to the basement, around the side of the building, enters, and sees flames spreading up the laundry chute. That's simple: a vertical fire that will spread straight up. Since there are no external signs of smoke, it must just be starting. 

The way to fight a vertical fire is to get above it and spray water down, so he sends one crew up to the first floor and another to the second floor. Both report that the fire has gotten past them. The commander goes outside and walks around to the front of the building. Now he can see smoke coming out from under the eaves of the roof. 

It is obvious what has happened: the fire has gone straight up to the fourth floor, has hit the ceiling there, and is pushing smoke down the hall. Since there was no smoke when he arrived just a minute earlier, this must have just happened. It is obvious to him how to proceed now that the chance to put out the fire quickly is gone. He needs to switch to search and rescue, to get everyone out of the building, and he calls in a second alarm. The side staircase near the laundry chute had been the focus of activity before. Now the attention shifts to the front stairway as the evacuation route.

And this picture shows the general algorithm for recognition-primed decision-making. Breaking down the story, we might say that something like the following happened:

1. The commander sees no smoke outside, then flames spreading up the laundry chute. These are the cues that allow him to recognize this as a vertical fire that is just starting.

2. The commander's mental representation of vertical fires includes plausible goals, expectancies of what is going to happen, and actions that could further the goals. A plausible goal for this situation: put the fire out quickly, before it has a chance to spread. An action that would further it: send people to spray water on the fire from above. A rapid mental simulation suggests that this should work, so he gives the order.

3. The crews report that the fire has gotten past them. This violates the expectancy that the fire should be in the basement only; to diagnose this anomaly, the commander goes outside to gather more data. When he sees the smoke coming up from the roof, this allows him to construct a story of what has happened.

4. The situation is now different, calling up a new mental representation: that of a fire that has spread from the basement to the top floor. Plausible goals in this situation: get everyone out of the building. Actions to take here: call in reinforcements, get people to the front stairway to carry out an evacuation.
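The four steps above can be caricatured as a small control loop. This is a hypothetical toy, not Klein's formal model: `Pattern`, `recognize`, `rpd_step` and `violated` are names invented here, the cue and expectancy strings paraphrase the laundry-chute story, and the commander's rapid mental simulation is reduced to a yes/no function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    """A hypothetical mental representation of a familiar kind of situation."""
    name: str
    cues: frozenset          # observations that let us recognize the situation
    goal: str                # a plausible goal for this situation
    action: str              # a typical action that furthers the goal
    expectancies: frozenset  # observations that would NOT surprise us here

def recognize(observations, patterns):
    """Pick the pattern whose cues overlap most with what we currently see."""
    return max(patterns, key=lambda p: len(p.cues & observations))

def rpd_step(observations, patterns, simulate_ok):
    """Recognize the situation, then keep its typical action if a quick simulation approves."""
    pattern = recognize(observations, patterns)
    return pattern.name, (pattern.action if simulate_ok(pattern.action) else None)

def violated(pattern, observation):
    """An observation outside the pattern's expectancies is an anomaly worth diagnosing."""
    return observation not in pattern.expectancies

vertical = Pattern("vertical fire, just starting",
                   frozenset({"no smoke outside", "flames in laundry chute"}),
                   "put the fire out quickly", "spray water from above",
                   frozenset({"fire held below the crews"}))
spread = Pattern("fire spread to the top floor",
                 frozenset({"fire got past crews", "smoke under eaves"}),
                 "get everyone out", "call a second alarm, evacuate via front stairway",
                 frozenset({"smoke in the halls"}))
patterns = [vertical, spread]
always_ok = lambda action: True  # stand-in for the commander's rapid mental simulation

print(rpd_step({"no smoke outside", "flames in laundry chute"}, patterns, always_ok))
if violated(vertical, "fire got past crews"):
    # Step 3: go outside, gather more cues, then re-recognize the situation.
    print(rpd_step({"fire got past crews", "smoke under eaves"}, patterns, always_ok))
```

The interesting part is the last branch: the anomaly doesn't produce a better guess about the same situation; it triggers a search for more cues and then a different representation altogether.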

As at least one major difference, GPT-2 never does the thing where it expects that something will happen, and then takes actions to re-evaluate the situation if the prediction goes wrong. If it predicts "the word after 'maximize' is going to be 'paperclip'" with 90% confidence, finding out that it's actually followed by 'human values' doesn't cause it to...

Actually, I don't need to complete that sentence, because "seeing that it was mistaken" isn't actually a thing that happens to GPT-2 in the first place. It does get feedback on its predictions during its training phase, but once it has been trained, it will never again compare its prediction with the actual result. You just give it a prompt and then it tries to predict the rest; that's it. If you give it one prompt, have it predict the rest of it, and then give it a revised prompt with the correct completion, it has no idea that you are doing this. It just sees one prompt and then another. This makes it incapable of noticing that its expectations are violated, gathering more information in response, and then constructing a story of what happened and what kind of a situation it's actually in.
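A minimal sketch of the distinction, assuming a toy bigram counter in place of a Transformer (the class names are hypothetical, and unlike GPT the toy only looks at the last word of the prompt): the frozen model keeps giving the same answer however many corrections it is shown, while the online variant compares each prediction with what actually came next, counts its surprises, and updates.

```python
from collections import Counter, defaultdict

class BigramModel:
    """Toy next-word predictor trained once on a corpus, then never updated."""
    def __init__(self, corpus):
        self.counts = defaultdict(Counter)
        words = corpus.split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prompt):
        """Most frequent word seen after the prompt's last word (None if unseen)."""
        options = self.counts.get(prompt.split()[-1])
        return options.most_common(1)[0][0] if options else None

class OnlineBigramModel(BigramModel):
    """Same predictor, but it checks each prediction against what actually came next."""
    def __init__(self, corpus):
        super().__init__(corpus)
        self.surprises = 0

    def observe(self, prompt, actual):
        if self.predict(prompt) != actual:
            self.surprises += 1                       # expectation violated
        self.counts[prompt.split()[-1]][actual] += 1  # learn from the outcome

corpus = "maximize paperclip maximize paperclip maximize profit"
frozen = BigramModel(corpus)
online = OnlineBigramModel(corpus)
for _ in range(3):
    online.observe("we should maximize", "human")

print(frozen.predict("we should maximize"))  # paperclip
print(online.predict("we should maximize"))  # human
print(online.surprises)                      # 3
```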

You could probably apply it to something like "predict what a human firefighter would do in this situation" (imitation learning), but as anyone can verify by playing AI Dungeon (which now uses GPT-3, not GPT-2), its predictions get very nonsensical very quickly. It doesn't really do the kind of causal reasoning that would involve mental simulations to produce novel responses, e.g. the following example from Klein:

A [firefighter] lieutenant is called out to rescue a woman who either fell or jumped off a highway overpass. She is drunk or on drugs and is probably trying to kill herself. Instead of falling to her death, she lands on the metal supports of a highway sign and is dangling there when the rescue team arrives.

The lieutenant recognizes the danger of the situation. The woman is semiconscious and lying bent over one of the metal struts. At any moment, she could fall to her death on the pavement below. If he orders any of his team out to help her, they will be endangered because there is no way to get a good brace against the struts, so he issues an order not to climb out to secure her. 

Two of his crew ignore his order and climb out anyway. One holds onto her shoulders and the other to her legs. 

A hook-and-ladder truck arrives. The lieutenant doesn't need their help in making the rescue, so tells them to drive down to the highway below and block traffic in case the woman does fall. He does not want to chance that the young woman will fall on a moving car. 

Now the question is how to pull the woman to safety. 

First, the lieutenant considers using a rescue harness, the standard way of raising victims. It snaps onto a person's shoulders and thighs. In imagining its use, he realizes that it requires the person to be in a sitting position or face up. He thinks about how they would shift her to sit up and realizes that she might slide off the support. 

Second, he considers attaching the rescue harness from the back. However, he imagines that by lifting the woman, they would create a large pressure on her back, almost bending her double. He does not want to risk hurting her. 

Third, the lieutenant considers using a rescue strap (another way to secure victims, but making use of a strap rather than a snap-on harness). However, it creates the same problems as the rescue harness, requiring that she be sitting up or that it be attached from behind. He rejects this too.

Now he comes up with a novel idea: using a ladder belt, a strong belt that firefighters buckle on over their coats when they climb up ladders to rescue people. When they get to the top, they can snap an attachment on the belt to the top rung of the ladder. If they lose their footing during the rescue, they are still attached to the ladder so they won't plunge to their death.

The lieutenant's idea is to get a ladder belt, slide it under the woman, buckle it from behind (it needs only one buckle), tie a rope to the snap, and lift her up to the overpass. He thinks it through again and likes the idea, so he orders one of his crew to fetch the ladder belt and rope, and they tie it onto her. 

In the meantime, the hook-and-ladder truck has moved to the highway below the overpass, and the truck's crew members raise the ladder. The firefighter on the platform at the top of the ladder is directly under the woman shouting, "I've got her. I've got her." The lieutenant ignores him and orders his men to lift her up. 

At this time, he makes an unwanted discovery: ladder belts are built for sturdy firefighters, to be worn over their coats. This is a slender woman wearing a thin sweater. In addition, she is essentially unconscious. When they lift her up, they realize the problem. As the lieutenant put it, "She slithered through the belt like a slippery strand of spaghetti." 

Fortunately, the hook-and-ladder man is right below her. He catches her and makes the rescue. There is a happy ending. 

Now the lieutenant and his crew go back to their station to figure out what had gone wrong. They try the rescue harness and the rescue strap and find that the lieutenant's instincts were right: neither is usable.

Eventually they discover how they should have made the rescue. They should have used the rope they had tied to the ladder belt. They could have tied it to the woman and lifted her up. With all the technology available to them, they had forgotten that you can use a rope to pull someone up.

Consider the lieutenant's first idea. GPT-2 might have been able to notice that, statistically, firefighters typically use rescue harnesses in situations like this. But it doesn't do any mental simulation to see what the predicted outcome of using that harness would be. If there had been enough previous situations where a harness was unusable, and enough cues to indicate to GPT-2 that this was one of those situations, then it could accurately predict that the rescuers would do something different. But if this is a novel situation (as most situations are), then it needs to actually do causal reasoning and notice that the woman would slide off the support. (This is similar to your "check for badness" thing, except it happens via mental simulation rather than just association.)
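To make the contrast concrete, here is a hypothetical sketch: an associative chooser picks whichever option was most common in past cases, while a simulating chooser runs each option through a hand-written world model and discards options whose predicted outcome is bad. The usage counts and outcome rules are invented for illustration, loosely following Klein's story; they are not data.

```python
# Hypothetical: how often each option was used in past rescues (invented numbers).
PAST_USAGE = {"rescue harness (front)": 50, "rescue harness (back)": 10,
              "rescue strap": 20, "ladder belt": 1}

def simulate(option, victim):
    """Toy world model: return a predicted bad outcome, or None if the option looks safe."""
    if option in ("rescue harness (front)", "rescue strap") and not victim["can_sit_up"]:
        return "shifting her to sit up may make her slide off the support"
    if option == "rescue harness (back)":
        return "lifting would bend her double"
    # Note: nothing here checks whether the belt fits a slender person.
    return None

def choose_by_association(options):
    """Pick the statistically most common option, with no simulation at all."""
    return max(options, key=PAST_USAGE.get)

def choose_by_simulation(options, victim):
    """Pick the most common option among those the world model predicts are safe."""
    safe = [o for o in options if simulate(o, victim) is None]
    return max(safe, key=PAST_USAGE.get) if safe else None

victim = {"can_sit_up": False, "slender": True}
options = list(PAST_USAGE)
print(choose_by_association(options))         # rescue harness (front)
print(choose_by_simulation(options, victim))  # ladder belt
```

Like the lieutenant's own simulation, this toy world model never asks whether the belt fits someone in a thin sweater, so it happily picks the ladder belt: a simulation is only as good as the mental representation behind it.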

Orthonormal gives us a real-life example of what happens when an AI uses pattern-matching, but does not do causal reasoning, and then tries to play Starcraft:

The overhyped part is that AlphaStar doesn't really do the "strategy" part of real-time strategy. Each race has a few solid builds that it executes at GM level, and the unit control is fantastic, but the replays don't look creative or even especially reactive to opponent strategies.

That's because there's no representation of causal thinking - "if I did X then they could do Y, so I'd better do X' instead". Instead there are many agents evolving together, and if there's an agent evolving to try Y then the agents doing X will be replaced with agents that do X'. But to explore as much as humans do of the game tree of viable strategies, this approach could take an amount of computing resources that not even today's DeepMind could afford.

(This lack of causal reasoning especially shows up in building placement, where the consequences of locating any one building here or there are minor, but the consequences of your overall SimCity are major for how your units and your opponents' units would fare if they attacked you. In one comical case, AlphaStar had surrounded the units it was building with its own factories so that they couldn't get out to reach the rest of the map. Rather than lifting the buildings to let the units out, which is possible for Terran, it destroyed one building and then immediately began rebuilding it before it could move the units out!)
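The "if I did X then they could do Y, so I'd better do X' instead" pattern is a form of adversarial lookahead. A minimal one-ply minimax sketch, with an invented payoff table (the moves and numbers are hypothetical, not taken from StarCraft or AlphaStar):

```python
# Hypothetical payoffs for us: PAYOFF[my_move][their_reply]
PAYOFF = {
    "X":  {"Y": -5, "Z": 3},
    "X'": {"Y": 1,  "Z": 2},
}

def best_greedy(payoff):
    """No foresight: pick the move with the best outcome if the opponent plays along."""
    return max(payoff, key=lambda move: max(payoff[move].values()))

def best_minimax(payoff):
    """One-ply lookahead: pick the move whose worst-case reply is least bad."""
    return max(payoff, key=lambda move: min(payoff[move].values()))

print(best_greedy(PAYOFF))   # X  (hoping for reply Z)
print(best_minimax(PAYOFF))  # X' (because reply Y would punish X)
```

In Orthonormal's description, a population of agents only "finds" X' after some agent evolves to play Y; explicit lookahead gets there by reasoning about Y before it ever happens.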

AlphaStar notices that the units are trapped, which it associates with "must destroy the thing that is trapping them". Then it notices that it is missing a factory, and its associations tell it that in this situation it should have one more factory, and it should be located right where the destroyed factory should be, so...

In contrast, a human might have considered destroying the factory, but then noticed that this leads to a situation where there is one factory too few, and then realized that the building can just be lifted out of the way.

Here is Klein's illustration of how a generic mental simulation seems to work; he also has illustrations of the more specific variant of explaining the past (e.g. the firefighter commander constructing a story of how the basement fire had spread) and projecting to the future (e.g. the firefighter lieutenant trying to figure out how to rescue the woman). Here's him explaining a part of the figures:

Consider this example. Some need arises for building a mental simulation; let us say a coworker has suddenly started acting rudely toward you. The simulation has to let you infer what the original situation was that led to the events you are observing. You assemble the action sequence: the set of transitions that make up the simulation. Perhaps you recall an incident that same morning when you were chatting with some other people in your office and said something that made them laugh. Perhaps you also recall that earlier that morning, your coworker had confided some embarrassing secret to you. So you construct a sequence in which your coworker trusts you with a confidence, then regrets it immediately afterward and feels a little awkward around you, then sees you possibly entertaining some other people with the secret, and then feels that it is going to be unbearable to live with you in the same setting. Now you can even remember that after you made the other people laugh, you looked up and saw the coworker giving you a look that made you feel uneasy. This set of states and transitions is the action sequence, the mental simulation that explains the rude behavior.

The next step is to evaluate the action sequence at a surface level. Is it coherent (Do the steps follow from each other)? Yes, it is. Does it apply (Does the sequence account for the rudeness)? Yes, it does. How complete is it (Does it leave out any important factors, such as the excellent performance evaluation you have just received)? Yes, there are some more pieces that might belong to the puzzle. But in general, the mental simulation passes the internal evaluation. It is an acceptable explanation. That does not mean it is correct.

Sometimes the mental simulation will not pass the internal evaluation, and that also helps you make sense of things. [The following example] illustrates this with a story reported in a newspaper. [...]

The IRA Terrorist. A well-respected lawyer has agreed to defend a man accused of committing an act of terrorism: planting a bomb for the IRA. The lawyer, asked why he would take the case, answers that he interviewed the accused man, who was shaking and literally overcome with panic. He was surprised to see the man fall apart like that. He tried to imagine the IRA's recruiting such a person for a dangerous mission and found that he could not. He cannot conjure up a scenario in which the IRA would give a terrorism assignment to a man like this, so his conclusion is that the man is innocent.

This lawyer could not generate an action sequence that passed his internal evaluation (specifically, the requirement that the transition between steps be plausible). His failure to assemble a plausible sequence of steps led him to a different explanation than the prosecutors had formed. That's why you see a long, curving arc in figure 5.4: the failure to assemble the mental simulation was the basis of the conclusion.

There are also times when you use mental simulation to try to increase your understanding of situations like these. You are trying to build up better models. When you run the action sequence in your mind, you may notice parts that still seem vague. Maybe you can figure out how to set up a better action sequence, or maybe there are some more details about the present state that you should gather. Going back to the example of your coworker, your explanation has not included the fact that you received such a good performance evaluation. What was your coworker's performance evaluation? Possibly the coworker felt you had gotten recognition for work that someone else had done. Perhaps you can get a general sense of the situation by talking to your boss. That might give you some more data points for building your explanations.
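Klein's surface-level evaluation (is it coherent, does it apply, how complete is it) can be written down as checks on a candidate action sequence. This is a sketch under invented assumptions: each transition carries a hand-assigned plausibility, and a sequence counts as coherent only if consecutive steps connect and every transition clears a threshold, which is the test the IRA lawyer's scenario fails. `Transition`, `evaluate` and the 0.2 threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Transition:
    before: str
    after: str
    plausibility: float  # hypothetical subjective probability, 0..1

def evaluate(sequence, observed, known_facts, threshold=0.2):
    """Klein-style surface checks for a candidate explanation of `observed`."""
    connected = all(a.after == b.before for a, b in zip(sequence, sequence[1:]))
    plausible = all(t.plausibility >= threshold for t in sequence)
    applies = bool(sequence) and sequence[-1].after == observed
    explained = {t.before for t in sequence} | {t.after for t in sequence}
    missing = sorted(known_facts - explained)  # facts the story leaves out
    return {"coherent": connected and plausible, "applies": applies, "missing": missing}

coworker_story = [
    Transition("coworker confides secret", "coworker regrets it", 0.6),
    Transition("coworker regrets it", "coworker sees me amusing others", 0.5),
    Transition("coworker sees me amusing others", "coworker is rude", 0.7),
]
result = evaluate(coworker_story, "coworker is rude",
                  {"coworker confides secret", "I got a great evaluation"})
print(result)  # coherent and applies, but leaves out the performance evaluation

ira_story = [
    Transition("panicky, shaking man", "IRA recruits him for a bombing", 0.05),
    Transition("IRA recruits him for a bombing", "man plants bomb", 0.8),
]
print(evaluate(ira_story, "man plants bomb", set())["coherent"])  # False
```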

(If you look at explanations of how GPT's Transformer architecture works [1, 2], you can see that it doesn't do anything like this.)

So, doublechecking my comprehension:

In my OP, my claim was basically "you probably can get human-level output out of something GPT-like by giving it longer term rewards/punishments, and having it continuously learn" (i.e. give it an actual incentive to figure out how to fight fires in novel situations, which current GPT doesn't have).

I realize that leaves a lot of fuzziness in "well, is it really GPT if it has a different architecture that continuously learns and has long-term rewards?". My guess was that it'd be fairly different from GPT architecturally, but that it wouldn't depend on architectural insights we haven't already made; it'd just be work to integrate existing insights.

Is your claim "this is insufficient – you still need working memory and the ability to model scenarios, and currently we don't know how to do that, and there are good reasons to think that throwing lots of data and better reward structures at our existing algorithms won't be enough to cause this to develop automatically via Neural Net Magic?"


So at this point I'm pretty uncertain of what neural nets can or cannot learn to do. But at least I am confident in saying that GPT isn't going to learn the kinds of abilities that would be required for actually fighting fires, as it is trained and tested on a fundamentally static task, as opposed to one that requires adapting your behavior to a situation as it develops. For evaluating progress on those, projects like AlphaStar look like more relevant candidates.

I don't feel confident in saying whether some combination of existing algorithms and training methods could produce a system that approached the human level on dynamic tasks. Most people seem to agree that we haven't gotten neural nets to learn to do good causal reasoning yet, so my understanding of the expert consensus is that current techniques seem inadequate... but then the previous expert consensus would probably also have judged neural nets to be inadequate for doing many of the tasks that they've now mastered.

Thanks, this is great. May have more thoughts after thinking it over a bit.

Meanwhile, Jeff Hawkins says "Every part of the neocortex is running the same algorithm", and it's looking like maybe brains aren't doing that complicated a set of things.

This is nitpicking, but your post goes back and forth between the "underlying algorithm" level and the "learned model" level. Jeff Hawkins is talking about the underlying algorithm level when he says that it is (more or less) the same in every part of the neocortex. But almost all the things you mention in "My algorithm as I understand it" are habits of thought that you've learned over the years. (By the same token, we should distinguish between "Transformer + SGD" and "whatever calculations are being done by the particular weight settings in the trained Transformer model".)

I don't expect there to be much simplicity or universality at the "learned model" level ... I expect that people use lots of different habits of thought.

Has anyone done anything like "Train a neural net on Reddit, where it's somehow separately rewarded for predicting the next word, and also for predicting how much karma a cluster of words will get, and somehow propagating that back into the language generation"?

I imagine the easiest thing would be to pre-pend the karma to each post, fine-tune the model, then you can generate high-karma posts by just prompting with "Karma 1000: ...". I'm not aware of anyone having done this specific thing but didn't check. I vaguely recall something like that for AlphaStar, where they started by imitation learning with the player's skill flagged, and then could adjust the flag to make their system play better or worse.
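A minimal sketch of the data-preparation half of that suggestion, assuming posts arrive as (karma, text) pairs; the prefix format and `with_karma` are hypothetical choices for illustration, and the fine-tuning call itself is left out because it depends on the training library.

```python
def with_karma(karma, text):
    """Prefix a post with its karma so a fine-tuned model learns to condition on it."""
    return f"Karma {karma}: {text}"

# Hypothetical (karma, text) pairs standing in for scraped Reddit posts.
posts = [(3, "First post!"), (250, "A careful look at GPT-2 and System 2.")]
training_texts = [with_karma(k, t) for k, t in posts]

# After fine-tuning on training_texts, ask for a high-karma continuation:
prompt = with_karma(1000, "Am I just GPT-2?")
print(training_texts[1])  # Karma 250: A careful look at GPT-2 and System 2.
print(prompt)             # Karma 1000: Am I just GPT-2?
```

Note that to the model "1000" is just more tokens: nothing in this setup makes a 1000-karma post count ten times as much as a 100-karma one during training.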

What's happening in System 2 thought?

If you haven't already, see Kaj's Against System 1 and System 2. I agree with everything he wrote; the way I would describe it is: Our brains house a zoo of compositional generative models, and system 2 is a cool thing where generative models can self-assemble into an ad-hoc crappy serial computer. For example, you can learn a Generative Model X that first summons a different Generative Model Y, and then summons either Generative Model Z₁ or Z₂ conditional on some feature of Generative Model Y. (Something like that ... I guess I should write this up better someday.) Anyway, this is a pretty neat trick. Can a trained Transformer NN do anything like that? I think there's some vague sense in which a 6-layer Transformer can do similar things as a series of 6 serial human thoughts maybe?? I don't know. There's definitely a ton of differences too.
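The "ad-hoc crappy serial computer" idea can be caricatured in a few lines, with each generative model reduced to a plain function. Everything here is hypothetical (real generative models are not lookup functions); it only shows the control structure: model X first summons model Y, then summons Z₁ or Z₂ depending on one feature of Y's output.

```python
def model_y(scene):
    """Hypothetical model Y: produces a guess about the scene."""
    return {"hot": "stove" in scene}

def model_z1(scene):
    """Hypothetical model Z1: what to do when something is hot."""
    return "pull hand back"

def model_z2(scene):
    """Hypothetical model Z2: what to do otherwise."""
    return "keep reaching"

def model_x(scene):
    """Model X: summon Y, then branch to Z1 or Z2 on one of Y's features."""
    y = model_y(scene)
    return model_z1(scene) if y["hot"] else model_z2(scene)

print(model_x("hand near stove"))  # pull hand back
print(model_x("hand near table"))  # keep reaching
```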


My vague sense about foresight (rolling out multiple steps before deciding what to do) is that it's helpful for sample-efficiency but not required in the limit of infinite training data. Some examples: in RL, both TD learning and tree search eventually converge to the same optimal answer; AlphaGo without a tree search is good but not as good as AlphaGo with a tree search.
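A small check of the convergence claim, under simplifying assumptions: a tiny deterministic MDP invented for this sketch, and tabular Q-learning (a TD method) with learning rate 1, sweeping every state-action pair instead of sampling episodes, which is only safe because transitions are deterministic. The learned greedy policy should match what an exhaustive depth-limited tree search picks. The states, rewards, discount and depth are arbitrary, and the toy says nothing about sample efficiency; it only shows both methods landing on the same answer.

```python
GAMMA = 0.9
TERMINAL = 3
ACTIONS = ("a", "b")
# Hypothetical deterministic MDP: (state, action) -> (next_state, reward)
STEP = {
    (0, "a"): (1, 0.0), (0, "b"): (2, 1.0),
    (1, "a"): (3, 10.0), (1, "b"): (0, 0.0),
    (2, "a"): (3, 2.0), (2, "b"): (0, 0.0),
}

def tree_search(state, depth):
    """Exhaustive lookahead: best discounted return reachable within `depth` steps."""
    if state == TERMINAL or depth == 0:
        return 0.0
    return max(r + GAMMA * tree_search(s2, depth - 1)
               for s2, r in (STEP[(state, a)] for a in ACTIONS))

def tree_policy(state, depth=10):
    """Choose the action with the best searched value; no learning involved."""
    def value(a):
        s2, r = STEP[(state, a)]
        return r + GAMMA * tree_search(s2, depth - 1)
    return max(ACTIONS, key=value)

def q_learning(sweeps=100):
    """Tabular Q-learning with alpha = 1: no lookahead, just repeated one-step backups."""
    q = {sa: 0.0 for sa in STEP}
    for _ in range(sweeps):
        for (s, a), (s2, r) in STEP.items():
            future = 0.0 if s2 == TERMINAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] = r + GAMMA * future
    return q

q = q_learning()
for s in (0, 1, 2):
    learned = max(ACTIONS, key=lambda a: q[(s, a)])
    print(s, learned, tree_policy(s))  # the two action columns should agree
```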

Perhaps not coincidentally, language models are pretty sample inefficient compared to people...

In my everyday life, I feel like my thoughts very often involve a sequence of two or three chunks, like "I will reach into my bag and then pull out my wallet", and somewhat less often a longer sequence than that, but I dunno.

Maybe "AlphaStar can't properly block or not block narrow passages using buildings" is an example where it's held back by lack of foresight.

Thanks! Will reply to some different bits separately. First, on reddit-karma training: 

I imagine the easiest thing would be to pre-pend the karma to each post, fine-tune the model, then you can generate high-karma posts by just prompting with "Karma 1000: ...".

This doesn't accomplish what I'm going for (probably). The key thing I want is to directly reward GPT disproportionately in different circumstances. As I currently understand it, every situation for GPT is identical – bunch of previous words, one more word to predict, graded on that one word. 

GPT never accidentally touches a burning hot stove, or gets a delicious meal, or builds up a complicated web of social rewards that they aspire to succeed at. I bet toddlers learn not to touch hot stoves very quickly even without parental supervision, faster than GPT could.

I don't want "1 karma", "10 karma" and "100 karma" to be a few different words with different associations. I want 10 karma to be 10x the reward of 1 karma, and 100 karma 10x that. (Well, maybe not literally 10x; I'd fine-tune the reward structure with some fancy math.)

When GPT-3 sort of struggles to figure out "I'm supposed to be doing addition or multiplication here", I want to be able to directly punish or reward it more than it usually is.

Well, sure, you could take bigger gradient-descent steps for some errors than others. I'm not aware of people doing that, but again, I haven't checked. I don't know how well that would work (if at all).
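For concreteness, "bigger gradient-descent steps for some errors than others" can be done by multiplying each example's next-token loss by a weight. A minimal pure-Python sketch, assuming (hypothetically) that the weight is just karma divided by the batch's maximum karma, and treating each example's logits directly as the parameters being updated, a big simplification of an actual GPT. Ordinary GPT training corresponds to giving every example the same weight.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_step(batch, lr=0.5):
    """One gradient-descent step on each example's logits, scaled by that example's weight.

    batch: list of (logits, target_index, weight). The per-example loss is
    weight * -log softmax(logits)[target], whose gradient w.r.t. logit j is
    weight * (p_j - 1[j == target]).
    """
    updated = []
    for logits, target, weight in batch:
        probs = softmax(logits)
        grad = [weight * (p - (1.0 if j == target else 0.0)) for j, p in enumerate(probs)]
        updated.append([x - lr * g for x, g in zip(logits, grad)])
    return updated

karma = [1, 10, 100]
weights = [k / max(karma) for k in karma]  # 100 karma counts 10x as much as 10 karma
# Three identical starting points, each trying to predict word 0 of a 3-word vocabulary.
batch = [([0.0, 0.0, 0.0], 0, w) for w in weights]
after = weighted_step(batch)
print([round(softmax(l)[0], 3) for l in after])  # target probability rises more with karma
```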

The thing you're talking about here sounds to me like "a means to an end" rather than "an end in itself", right? If writing "Karma 100000: ..." creates the high-karma-ish answer we wanted, does it matter that we didn't use rewards to get there? I mean, if you want algorithmic differences between Transformers and brains, there are loads of them, I could go on and on! To me, the interesting question raised by this post is: to what extent can they do similar things, even if they're doing it in very different ways? :-)

I think you'd probably like the work of John Boyd:


He's really interesting in that he worked on a mix of problems and areas with many different levels of complexity and rigor.

Notably, while he's usually talked about in terms of military strategy, he did some excellent work in physics that's fundamentally sound and still used in civilian and military aviation today:


He was a skilled fighter pilot, so he was able to both learn theory and convert it into tactile performance.

Then, later, he explored challenges in organizational structures, bureaucracy, decision making, corruption, consensus, creativity, inventing, things like that.

There's a good biography on him called "Boyd: The Fighter Pilot Who Changed the Art of War" - and then there's a variety of briefings, papers, and presentations he made floating around online. I went through a phase of studying them all; there's some gems there.

Notably, his "OODA" loop is often incorrectly summarized as a linear process, but he defined it like this:


I think the most interesting part of it is under-discussed — the "Implicit Guidance and Control" aspect, where people can get into cycles of Observe/Act/Observe/Act rapidly without needing to intentionally orient themselves or formally make a decision.
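One way to picture the "Implicit Guidance and Control" shortcut, as a hypothetical sketch rather than Boyd's own formulation: a learned lookup from familiar observations straight to actions, with the slower Orient/Decide path used only when an observation isn't recognized, and its result cached for next time. Boyd's diagram has more feedback paths than this; the sketch only models the Observe-to-Act shortcut, and `make_agent`, `orient` and `decide` are invented names.

```python
def make_agent(orient, decide):
    """Return an act(observation) function with an Observe -> Act fast path, plus a log."""
    habits = {}  # implicit guidance: observation -> action learned on earlier loops
    log = []

    def act(observation):
        if observation in habits:
            log.append("fast")  # Observe -> Act, no deliberate orientation
            return habits[observation]
        log.append("slow")      # full Observe -> Orient -> Decide -> Act
        action = decide(orient(observation))
        habits[observation] = action  # the next encounter can skip Orient/Decide
        return action

    return act, log

act, log = make_agent(orient=lambda obs: f"model of {obs}",
                      decide=lambda model: f"respond to {model}")
act("bandit at six o'clock")
act("bandit at six o'clock")
print(log)  # ['slow', 'fast']
```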

Since he comes at it from a different mix of backgrounds with a different mix of ability to do formal mathematics or not, he provides a lot of insights. Some of his takeaways seem spot-on, but more interesting are the ways he can prime thinking on topics like these. I think you and he were probably interested in some similar veins of thought, so it might produce useful insights to dive in a bit.

I've read some of his stuff on strategy. It seemed like there were a lot of interesting insights in there, but it was all presented in the sort of way that sounds sciency to non-science-people but didn't really communicate a proper model. If someone knows of or could write a good explanation of the models underlying his ideas, I'd be very interested to read that.

Most of Boyd's work was communicated through briefings and presentations, so we don't have a lot of the underlying models, except secondhand.

Super naive question: given all we know about the myriad ways in which the brain fools itself, and more specifically, the ways that subconscious mental activities fool our conscious selves, why should we trust introspection? More specifically, why should I believe that the way I perceive myself to think is the way I actually think (as opposed to an abstraction put up by my subconscious)?

My model is that any psychological model that relies on introspection is going to be inherently flawed. If we want to learn how people think, we should observe their actions, and carefully watch how people behave in response to different stimuli and situations. I think asking people how they think tells us more about how they rationalize their thinking than it does about how they actually think.

There is both some actual fact of what it is like to experience your own mind, and then there is the way you make sense of it to explain it to yourself and others that has been reified into concepts. Just because the reification of the experience of our own thinking is flawed in a lot of ways doesn't make it not evidence of our thoughts, it only makes it noisy, unreliable, and "known" in ways that have to be "unknown" (we have to find and notice confusion).

You worry that asking people how they think will tell us more about their understanding of how they think than about how they actually think, and that's probably true, but also useful, because they got that understanding somehow and it's unlikely to be totally divorced from reality. Lacking better technology for seeing into our minds, we're left to perform hermeneutics on our self-reports.

I wouldn't be that surprised if GPT-2 was "only" a System 1. But I also wouldn't be that surprised if it naturally developed a System 2 when scaled up, and given more training. I also wouldn't be that surprised if it turned out not to need a System 2.

As steve2152 also noted, System 2 (or more accurately, Type 2) reasoning involves passing the outputs from one Type 1 system to another using working memory resources. Working memory seems to involve several specialized components, including memory stores and executive functions that control and regulate it. If GPT-2 doesn't have those kinds of architectural properties already, it's not going to develop them by just having more training data thrown at it.

Something I notice here (about myself) is that I don't currently understand enough about what's going on under-the-hood to make predictions about what sort of subsystems GPT could develop internally, and what it couldn't. (i.e. if my strength as a rationalist is the ability to be more confused by fiction than reality, well, alas)

It seems like it has to develop internal models in order to make predictions. It makes plausible sense to me that working memory is a different beast that you can't develop by having more training data thrown at you, but I don't really know what facts about GPT's architecture should constrain my beliefs about that.

(It does seem fairly understandable to me that, even if it were hypothetically possible for GPT to invent working memory, it would be an inefficient way of inventing working memory)

A quick summary of the phenomenology of my thoughts:

  • thoughts primarily have shapes, textures, and other touch-related features
    • prior/below the level of words, my thoughts are like touch-felt objects that exist in a field that permeates my body
  • thinking feels like exploring, as in i'm traveling around finding/bumping into thought objects
  • sometimes i'm not very in touch with this aspect though and i can't feel it, but i'm pretty sure it's always there for hard-to-explain reasons
  • when i can't see it words seem to just come up from nowhere for unknown reasons

Some differences from your model:

  • i don't have what feels like a badness check, rather it feels like i have a thought and then maybe a linked thought is about what the consequences of it might be, and sometimes those are bad.
    • but sometimes i might be distracted and not follow up and notice the bad association.

"i don't have what feels like a badness check, rather it feels like i have a thought and then maybe a linked thought is about what the consequences of it might be, and sometimes those are bad."

I think this is actually probably what's going on with me, upon further reflection.

I had an idea similar to your "badness" algorithm: it would be interesting to add a truth discriminator to GPT, i.e. another neural net that predicts the truth value of GPT's statements relative to the real world and is trained on a database of true statements (there are several). The whole thing is then trained GAN-style, so that GPT learns to produce statements with the highest truth score.
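To make the "GAN-style" part concrete, here is a heavily simplified toy of what training against a truth discriminator could look like. It is a sketch under loud assumptions, not the commenter's actual design: the tiny `TRUE_DB` and `CANDIDATES` lists are hypothetical stand-ins for a real fact database and for GPT's output space, the "generator" is just a softmax over a fixed pool of candidate statements, and the "discriminator" is a per-statement logistic score `D(s) = sigmoid(w[s])` rather than a neural net. To keep it deterministic, it uses exact expected gradients over the small pool instead of sampled minibatches.

```python
import math

# Hypothetical toy data: a "database of true statements" and a fixed pool of
# candidate statements the generator may produce (stand-ins for GPT outputs).
TRUE_DB = ["water is wet", "fire is hot", "ice is cold"]
CANDIDATES = TRUE_DB + ["water is dry", "fire is cold", "ice is hot"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr_d=1.0, lr_g=1.0):
    """Alternate discriminator and generator updates (toy GAN-style loop).

    Returns (generator probabilities, discriminator scores) per candidate.
    """
    n = len(CANDIDATES)
    theta = [0.0] * n  # generator logits: p_gen = softmax(theta)
    w = [0.0] * n      # discriminator params: D(s) = sigmoid(w[s])
    # "Real" distribution: uniform over the database of true statements.
    p_real = [1.0 / len(TRUE_DB) if s in TRUE_DB else 0.0 for s in CANDIDATES]

    for _ in range(steps):
        p_gen = softmax(theta)

        # Discriminator: gradient ascent on E_real[log D] + E_gen[log(1 - D)].
        for i in range(n):
            d = sigmoid(w[i])
            w[i] += lr_d * (p_real[i] * (1.0 - d) - p_gen[i] * d)

        # Generator: exact policy gradient of E_gen[D], the expected "truth score".
        # d/d theta_i of sum_k p_k * D_k equals p_i * (D_i - mean score).
        scores = [sigmoid(v) for v in w]
        mean_score = sum(p * s for p, s in zip(p_gen, scores))
        for i in range(n):
            theta[i] += lr_g * p_gen[i] * (scores[i] - mean_score)

    return softmax(theta), [sigmoid(v) for v in w]


if __name__ == "__main__":
    probs, scores = train()
    for s, p, d in zip(CANDIDATES, probs, scores):
        print(f"{s:15s} p_gen={p:.3f} D={d:.3f}")
```

One caveat the toy makes visible: a discriminator trained this way only learns "looks like the database" versus "was generated", so the generator is rewarded for resembling the database rather than for being true in any deeper sense. A real version would need a discriminator that generalizes beyond the specific statements it was trained on.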

Actually, I think your comment about this a while ago was what got me started on all this. I tried looking for it when I wrote this post but couldn't find it easily. If you give me the link I'd be happy to credit you in the OP.

:) Don't remember where I wrote about it.