FC final: Can Factored Cognition schemes scale?

Rafael Harth

(Apologies for the long delay.)

Scaling of Regular Thought

The punchline of the previous post was that there is only one mode of thinking: your brain can solve various tasks in a single step (from the perspective of awareness), and we've called those tasks your cognitive primitives. All primitives are intuition-like in that we can't inspect how they're being done, and we may or may not have an explanation for the result after the fact.

We're now interested in how this process scales. We don't tend to solve hard problems by staring at their description and waiting for a primitive to solve them in one step, so some kind of reasoning backward is going on. However, there is no module for this in the brain, so our ability to 'reason backward' also has to be implemented by primitives.

The easiest way to observe how this works is to take a problem that is just barely too hard to solve with a single primitive. My pick for this is multiplying two 2-digit numbers. Thus, I invite you to do the following:

EXERCISE: There is a simple (two 2-digit numbers) multiplication problem in the spoiler below. Make sure you have something to write; it can be a piece of paper or a digital notepad. Look at the exercise, solve it in your head, and write down every verbal thought that pops into your mind until you have the solution. Write only to document your thoughts; don't do a written calculation.

Below is what my transcript looks like. You may have had more or fewer intermediate steps, repetitions, or unrelated thoughts in between. That's all fine.

The coloring is something I've added after the fact. The black thoughts (minus the problem itself) are the outputs of primitives that actually solve math problems. Those are all simple; it's $10 \cdot 17$ and $2 \cdot 17$ and $170 + 34$ . On the other hand, $14 \cdot 17$ itself is outside the set of exercises that can be handled by a single primitive (at least for me). And thus, my brain has performed a feat of utter genius: instead of a primitive that sees the exercise and outputs a solution, it found a primitive which saw the exercise and output another exercise! (Namely, $10 \cdot 17$ .) Subsequently, that exercise was taken as the input to a different primitive, which was then able to solve it in one step.

(It may be that some of the 'subproblem outputs' like $10 \cdot 17$ did not appear as verbal thoughts for you. In general, not all outputs of primitives make it into awareness, and the process that determines whether they do is complicated. You would probably observe the same patterns with a harder exercise.)
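To make the 'primitive that outputs another exercise' idea concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a model of the brain: `primitive_multiply` stands in for a one-step primitive with a made-up notion of 'easy', and the split into tens and ones is just one possible decomposition (it differs from the split in my transcript).

```python
def primitive_multiply(a, b):
    """Toy stand-in for a one-step primitive.

    Assumption: a product counts as 'easy' only if the first factor is a
    single digit or a multiple of ten. Returns None when it cannot answer.
    """
    if a < 10 or a % 10 == 0:
        return a * b
    return None


def solve(a, b, trace):
    """Solve a * b, recording every 'thought' (problem or result) in trace."""
    trace.append(f"{a} * {b}")
    direct = primitive_multiply(a, b)
    if direct is not None:
        trace.append(str(direct))
        return direct
    # No primitive solves this directly, so a different primitive
    # outputs new exercises instead of a solution.
    tens, ones = (a // 10) * 10, a % 10
    left = solve(tens, b, trace)
    right = solve(ones, b, trace)
    trace.append(f"{left} + {right}")
    total = left + right
    trace.append(str(total))
    return total


thoughts = []
result = solve(14, 17, thoughts)
print(result)    # 238
print(thoughts)  # ['14 * 17', '10 * 17', '170', '4 * 17', '68', '170 + 68', '238']
```

The point of the sketch is only that the trace interleaves new exercises with their solutions, which is the pattern the transcript shows.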

This suggests that a major part of thinking consists of applying primitives that output new subproblems.^[1] Does this generalize to harder and/or non-mathy problems? I think the answer is almost certainly yes, even in mundane cases, provided that you don't solve the problem immediately. For example, suppose you have to decide what present to buy a friend for Christmas. This problem does have some potential to be solved quickly, given that there is a set of 'default' options, like sweets or a bottle of wine. But if you're not content with those, you're unlikely to passively wait for your brain to produce more ideas. Instead, you would ask things like "what would make her happy?" or "what are her hobbies?". If you think about it for a while, you might get to less obvious questions like "what kind of gifts don't have the property that she will know better what she likes than I do?" Maybe you would consider helping her solve a problem she hasn't bothered to solve herself, and that would lead to questions like "what has she complained about before?". And so on. Since the domain is no longer governed by a small set of explicit rules, the subproblems don't immediately and uniquely determine the answer as they do in the multiplication case. Nonetheless, they are smaller problems whose solutions constitute progress on the overall problem. In general, I think you will be hard-pressed to find an example where you think about something for a while without outputting subproblems.

Factored Cognition vs. Regular Thought

Factored Cognition, as defined by Ought, refers to "mechanisms [...] where sophisticated learning and reasoning is broken down (or factored) into many small and mostly independent tasks". In light of the above, I posit the sequence's final conjecture:

I've drawn a parallel between Factored Cognition and regular thought all the way back in post #-1. The difference is that that post was taking the perspective of someone who already understands the problem and can choose between different ways of decomposing it, which is relevant for a Debate agent, but not so much for the human (in either scheme) who starts off not understanding the problem. The claim now is that the process of understanding does itself use Factored Cognition.

Consider a node at the top of an HCH tree (say with $t = 1 \text{ hour}$ ) on the one hand, and a single person thinking for a hundred years on the other. We can call them $H$ and $D$ , respectively. ('D' for 'iDeal' since this is an idealized setting from the standpoint of capability). Presumably, everyone would agree that $H$ and $D$ do something different when they try to solve a problem, but this difference cannot be that $H$ uses Factored Cognition because $D$ does that as well. The difference also cannot be that $D$ only produces one new subproblem at a time since $H$ does that as well: each new question she asks is allowed to depend on everything that has happened up to that point. In both cases, the 'decomposition' is a thing that is continuously updated, not a thing that is ever output in one piece.

So, what is the difference? If you buy that $D$ would be superintelligent but are less sold on $H$ , this is the key question, and the heart of this post will be trying to answer it. We can separate the ways in which $H$ is disadvantaged into two distinct categories. I call them the alternating problem and the translation problem.

The Alternating Problem

The alternating problem is the fact that $H$ is restricted in how many times she can alternate between asking and solving. $D$ has the time budget to iterate through millions of questions throughout her thought process, but $H$ only lives for an hour. On the upside, while $D$ may only make incremental progress on each question, $H$ immediately receives the proper solution, provided the question isn't too difficult. We would now like to know how much value the answer to one such subquestion has.

Here is a model to ask this more formally. Suppose we can assign each question $q$ a difficulty $d (q) \in R$ (this is very similar to the model from part I of the sequence). Suppose further that we can measure $H$ 's progress on $q$ with a real number $y \in R$ so that the question gets solved once $y \geq d (q)$ . Now if $H$ receives the answer to a subquestion, this will increase $y$ . The question is, by how much?

One possible answer is by a fraction of the input question's difficulty, i.e., $c \cdot d (q)$ for some constant $c$ . As the input question gets more difficult, $H$ simply asks more promising questions, and it always takes about $\frac{1}{c}$ many to arrive at a solution.
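As a quick check on the $\frac{1}{c}$ figure, here is the arithmetic under the added assumption that progress starts at $y = 0$ and that each of $n$ answers contributes exactly $c \cdot d (q)$ :

$y = n \cdot c \cdot d (q) \geq d (q) \iff n \geq \frac{1}{c}$

So in this model the number of questions needed does not depend on $d (q)$ at all, which is exactly the property the next few paragraphs put to the test.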

To test if this is realistic, consider the following three problems:^[2]


Suppose $a$ and $b$ are real numbers, not both $0$ . Find real numbers $c$ and $d$ such that $\frac{1}{a + b i} = c + d i$ .


Prove that there does not exist an operator $T \in L (R^{7})$ such that $T^{2} + T + I$ is nilpotent.


Decide whether it is true that $\forall n \in 2 N : n > 2 \implies (\exists p, q \text{ prime} : p + q = n)$ .

For the above to be true, it would have to be the case that, for all three questions, receiving the answer to a relevant subquestion gets you the same portion of the way to the solution. This is clearly nuts. If you ask 'how can I get the denominator of $\frac{1}{a + b i}$ to be real', you're almost there; if you ask, 'what does nilpotent mean', you've only done a small step; if you ask 'what's the smallest proven gap between prime numbers', you've presumably only taken an infinitesimal step.

On the other hand, asking the correct questions may get you there, but that's not what we're talking about.

So it's not a fraction of $d (q)$ . A second answer is that it's a fraction of the current progress, i.e., $c \cdot y$ for some constant $c$ . Every subquestion $H$ asks has an answer whose usefulness is proportional to $H$ 's current understanding of $q$ .

For this to be true, it would have to be the case that understanding a problem leads one to ask better questions. I probably don't have to convince anyone that this is true, but just to hammer down how prevalent this mechanism is, here are five made-up examples from different contexts:

Anna tries to predict whether China or the USA will have more AGI capabilities in thirty years. After pondering various considerations, she realizes that she should figure out what proportion of each country's AI efforts goes to AGI specifically.
Bob tries to prove that there are infinitely many prime numbers. His approach is to assume there are finitely many and derive a contradiction. After thinking about this for a bit, he realizes that 'take a finite set, construct an additional prime number' is a less confusing problem that amounts to the same thing.
Carla wants to join her first-ever Zoom call but doesn't have a microphone. After considering various ways to acquire one, she realizes that her phone has one and asks whether Zoom could run on that.
Dana tries to find the next best move in a chess game. After studying various lines, she realizes that her opponent's light square bishop is crucial as it can trap her queen in most relevant lines. She now asks how to deal with the bishop.
You come up with a bunch of items that could plausibly be useful for one of your friend's hobbies, but all have the property that you would probably buy an inferior product to what she could buy for herself. You conclude that you should look for things that she likes but doesn't know more about than you do.

If the fraction-of-current-progress answer is correct, then $H$ 's progress $y = f (t)$ ( $t$ is the number of questions considered) obeys the recursive equation $f (t) = f (t - 1) + c \cdot f (t - 1)$ , which, normalizing the starting progress to $f (0) = 1$ , is simply $f (t) = (1 + c)^{t}$ . (Of course, progress on any real problem is highly discontinuous and high-variance, so all of this is an approximation.) In this model, progress is exponential in the number of questions asked. This also makes sense of why thinking for a very long time is powerful. Suppose that $D$ only gets $\frac{1}{1000}$ as much utility out of each subquestion asked, given that she may only consider them for a few seconds. This still yields $f (t) = (1 + \frac{1}{1000} c)^{t}$ , which may grow slowly at first, but will arrive at something useful eventually because there is a large number in the exponent. Conversely, the abilities of an HCH tree are bounded. Up to some level of difficulty, nodes that receive perfect answers from their children can produce perfect answers themselves, so HCH can answer all such questions by induction. But there is some lowest level of difficulty for which it takes too long to build up an understanding, and a node won't be able to answer such a question even if all subtrees give perfect answers. This is the step on which the induction breaks down.
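The fraction-of-current-progress model is easy to play with numerically. The following Python sketch iterates the recursion directly; the specific numbers (a difficulty of 1000, $c = 1$ for $H$ , a question budget of 5) are made up for illustration, and starting progress at 1 is an assumed normalization.

```python
def questions_until_solved(difficulty, c, start=1.0):
    """Count questions until progress y reaches the difficulty d(q).

    Iterates f(t) = f(t-1) + c * f(t-1) with f(0) = start
    (start = 1 is an assumed normalization).
    """
    if c <= 0:
        raise ValueError("c must be positive, otherwise progress never grows")
    progress, asked = start, 0
    while progress < difficulty:
        progress += c * progress
        asked += 1
    return asked


def solvable_within_budget(difficulty, c, budget):
    """True if the question is solved within `budget` questions."""
    return questions_until_solved(difficulty, c) <= budget


difficulty = 1000.0       # hypothetical d(q)
c_full = 1.0              # hypothetical c for a node that receives full answers
c_slow = c_full / 1000    # D gets 1/1000 as much out of each subquestion

print(questions_until_solved(difficulty, c_full))  # 10, since 2**10 = 1024 >= 1000
print(questions_until_solved(difficulty, c_slow))  # several thousand questions
print(solvable_within_budget(difficulty, c_full, budget=5))  # False: 2**5 = 32 < 1000
```

With these made-up numbers, the slow thinker needs several thousand questions but does get there, while the fast thinker with a budget of five questions never does. That is the asymmetry between $D$ and $H$ in miniature.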

A relevant counterpoint here is the ability of $H$ to ask meta-questions. A meta-question is something like "what is the best way to think about the question 'What Christmas present should I buy for Hannah?'". This is similar to "What subquestion should I ask to make progress on the question 'What Christmas present should I buy for Hannah?'". The ways in which the two questions are not the same relate to the subtleties of thought that this post mostly brushes over: there can be insights about a problem that don't directly translate to subquestions, there's thinking that's neither about asking nor about solving questions (such as repeating what you understand already), and so on. All of that stuff makes life harder for $H$ (more things to be done in limited time with questionable help), which means that reality will look at least as bad for HCH as the simplified view suggests.

In the simplified view, the existence of meta-questions allows $H$ to receive help in figuring out what subquestions to ask next. The problem is that there is no reason to expect HCH to be capable of solving the meta-question. If thinking is a constant alternation between posing and answering questions – where the questions and their answers become progressively more useful – then finding the perfect questions to ask should be roughly as hard as solving the problem outright. Less abstractly, take a look at the five examples of how progress informs future subquestions. Most of them involve past subquestions and their solutions. If the quality of subquestions is a function of current progress, then thinking about subquestions alone doesn't cut it. Making progress requires alternating between asking and solving.

I find that this result is supported by introspection. The current sequence looks nothing like what I had in mind initially. When I decided to spend time on this problem, the first thing I did was to ask 'what are questions about?', which led to a post called 'Target systems'. Another early post was called 'Dependency Graphs'. Both of those posts were answers to subproblems I had come up with; neither of them turned out to be good subproblems, but I wouldn't have realized this if I hadn't written them. Only through the alternation of asking and answering did I get to this point. The same process happened one level down: within one post, I regularly found that a question I was trying to answer wasn't coherent, and then I usually scrapped it and rethought what I was trying to do. If I were forced to stick with the question anyway (which is the analog of having the alternating problem), I expect it wouldn't work very well. It's also not the case that the decomposition only changed in the beginning; some structural changes have occurred fairly late, and I would change some earlier posts right now if they weren't already published.

The Translation Problem

If we take $D$ and add the alternating problem, we get a scheme where one person is thinking for a long time with the restriction that the decomposition on every level can only be updated a limited number of times. This scheme is not identical to $H$ , so there is a second difference, which I call the translation problem. The translation problem is the fact that every insight communicated between two nodes has to be translated into text (or whatever other format the scheme is using) and back. If $H$ calls a subtree that works for a total of 1000 hours, then $H$ didn't think 1000 hours herself but merely receives the subtree's output. This problem goes both ways: it handicaps the results from subtrees, and it handicaps how much context a node can give to subtrees when asking a question.

More concretely, it has several consequences:

It makes learning new skills difficult. (This is where we left off at the end of the previous post.) Whenever acquiring a new cognitive primitive takes too much time, it becomes impossible for $H$ to acquire it. This precludes learning primitives that require a lot of examples. These are often the ones that we refer to as intuition.
It can leave value on the table because the subtree is missing context. Suppose $H$ asks a subtree to answer question $q$ , and the subtree asks another subtree to answer $q^{'}$ to help with $q$ . It may be that $q^{'}$ and its answer are important for the overall problem (they may be more important than $q$ ), but $H$ never realizes this since all she receives is the finished answer to $q$ . An example is the concept of Ideal Debate in this sequence, which I believe started as a footnote. Similar things can happen whenever a subtree misjudges which parts of what it found out are relevant for the overall problem.
It makes asking meta-questions throughout difficult. In light of this post, it would seem that asking meta-questions is something $H$ would want to do as often as possible throughout the process. Yet, people tend to think of meta-questions as a thing that's only asked once, and the reason for this is the translation problem. A meta-question asked later in the process can't just be "what is the best way to think about this?" because that was already asked in the beginning. Instead, it has to be "what is the best way to think about this, given that I've already figured out $x y z$ ?" This is difficult to do, and it's also not in the spirit of Factored Cognition, which is supposed to be about independent questions or tasks.

Insofar as the third point is accurate, it implies that we're looking at a second fundamental restriction for $H$ . The first is the alternating problem: the fact that the number of times $H$ can flip between asking and solving is bounded. The second is that the total amount of time $H$ can spend on thinking about new questions is bounded as well. For this to be acceptable, it needs to be the case that 'find the next relevant subproblem' is a task whose difficulty is bounded across every context.

On this point, consider the phenomenon of getting stuck. When thinking about something difficult, I sometimes reach a point where I feel like I'm no longer making progress. This usually doesn't last forever, which means that the sense of 'not making any progress' is not literally true, but it shows that finding the next useful subproblem can be difficult. In a world where bounded decomposition budgets are sufficient to solve arbitrary problems, getting stuck should not be possible. You could always come up with a new relevant subproblem and solve that – or if it's too hard, come up with a subproblem for that, and so on. In some sense, 'naive Factored Cognition' appears impossible because it relies on the idea that you can decompose everything, but figuring out the decomposition is a big chunk of the work, and that part appears largely non-decomposable. Speculatively, I think it may be the case that figuring out the decomposition isn't just a big chunk but actually most of the work. My experience of getting stuck is not 'please brain, solve this subproblem' but rather 'please brain, tell me another angle to approach this problem'.

Conclusion

My tentative conclusion from all of this is that an HCH tree would not be superintelligent, with the usual caveat that brute-forcing isn't allowed. I'll operationalize this in terms of strong-HCH since this is what Paul considers to be the 'normal' scheme (whereas the thing the sequence has focused on is called 'weak-HCH'). In strong-HCH, each node has a list of all IDs of subnodes, allowing her to talk to the same instances repeatedly. Furthermore, messages can contain pointers to existing nodes (so if I'm node $p$ , and I know that node $x$ has insights on a part of a problem that I'm asking node $y$ about, I can include a pointer to $x$ in my question to $y$ ). I think one of the mistakes I've made in this sequence is to not focus on strong-HCH more. That said, strong-HCH doesn't seem to solve the problems I've talked about in this post, except for the one about missing context. Alas,

Prediction (85%): Ought will not succeed in demonstrating something roughly equivalent to solving the hardest exercise in a textbook using a structure that mirrors strong-HCH, provided each person is unfamiliar with the material and has at most 30 minutes of time. Note that I'm making this prediction without any inside knowledge; I've just read what Ought has published.

Before writing the sequence, I think I would have assigned between 50 and 60 percent to the same prediction (I believe I was a bit more optimistic about Factored Cognition than most people, but there's some penalty since this could be hard to set up even if it's feasible), so there has been about a 30% swing.

Needless to say, if Ought does do such a thing, it will (a) mean I'm wrong and (b) be very good news.

What about Debate?

¯\_(ツ)_/¯

The reasons I've mentioned for thinking HCH wouldn't work don't apply to Debate (with one exception that I'll talk about in a bit). In fact, I have yet to come across an argument that Debate cannot work in principle, and the formalism from the first part of the sequence is mildly encouraging. Of course, I also haven't come across an argument for why it must work, but it's harder to imagine that such an argument could exist, so the absence of evidence is altogether a good sign.

Most importantly, Debate sidesteps the alternating problem entirely. If you start with the best possible subquestions, then both of the toy models discussed in this post would agree that things should work out. Of course, the Debate agents don't perform surgery on the judge's brain to insert the perfect decomposition into memory; they have to write it down in text form. The amount that this matters, given that Debate agents are supposed to be highly intelligent, seems like a very hard-to-answer, very different problem from the things I've discussed in this post. I don't have too many intelligent things to say about it, except to repeat that talking about a 'Factored Cognition Hypothesis' really absolutely definitely doesn't make sense.

The aforementioned exception is the fact that the judge is highly limited in her ability to acquire new primitives. However, it seems like the ability to understand arguments fundamentally requires only a bounded set of skills. This is backed up by formal logic,^[3] and we can see the same thing in practice with understanding mathematical proofs. Once again, there is no generalization of this point to a context where a human has to derive the proof.

My verdict is something like 80% that Debate won't fail due to fundamental problems (i.e., problems that relate to Ideal Debate). Note that this number is inflated because there is a chance that Debate would ultimately fail due to fundamental reasons, but we never get there because it fails due to practical problems first. I was a bit disheartened to read the latest report on Debate, which indicates that one of those practical problems (the honest debate agent figuring out which claim of the dishonest agent contains the lie) appears to be quite serious. My estimate on Ideal Debate working out may be more like 60%, but that is not testable.

Miscellaneous

Here is an example of how Debate can handle mathematical proofs. Recall the exercise I've mentioned earlier:


Prove that there does not exist an operator $T \in L (R^{7})$ such that $T^{2} + T + I$ is nilpotent.

While this involves more advanced concepts, the exercise is still relatively easy. Here is a copy-paste from the solution I've written back then:

Let $λ$ be an eigenvalue of $T$ and $v$ a nonzero eigenvector. (Use Theorem 5.26.) We have

$(T^{2} + T + I) v = (λ^{2} + λ + 1) v = ((λ + \frac{1}{2})^{2} + \frac{3}{4}) v$

So that $(T^{2} + T + I) v = α v$ where $α = (λ + \frac{1}{2})^{2} + \frac{3}{4}$ . Clearly $α > 0$ , hence $(T^{2} + T + I)^{k} v = α^{k} v \neq 0$ for all $k \in N$ . Thus, $(T^{2} + T + I)$ is not nilpotent.

If this looks like gibberish, you're in the same position as a judge in Debate may be in. However, as a debate judge, you don't have to understand the entire argument. Here is a possible decomposition into claims, of which you will only have to verify one.

Claim 1: There exists an eigenvalue $λ \in R$ with eigenvector $v$ for $T$ , where $v$ is not the zero vector.
Claim 2: $(T^{2} + T + I) v = (λ^{2} + λ + 1) v$ .
Claim 3: $λ^{2} + λ + 1 = (λ + \frac{1}{2})^{2} + \frac{3}{4}$ .
Claim 4: Set $α := (λ + \frac{1}{2})^{2} + \frac{3}{4}$ . Then $α > 0$ .
Claim 5: Claims #2-4 imply that $(T^{2} + T + I) v = α v \neq 0$ .
Claim 6: Claim #5 implies that $(T^{2} + T + I)^{k} v = α^{k} v \neq 0$ for all $k \in N$ .
Claim 7: Claims #1-6 imply that $T^{2} + T + I$ is not nilpotent.

Claim #3 requires only high-school math. If this statement is pointed at, you can verify it without engaging with the concepts 'eigenvector' or 'nilpotent' or even 'vector space'. The same is almost true for claims #4 and #6. Claim #5 requires being comfortable with equations, but not anything specific to Linear Algebra. Claim #1 requires looking up a theorem but not understanding why it is true.^[4] Only claims #2 and #7 require explaining one or more of the field-specific concepts.
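To illustrate how little is needed for Claim #3, here is the full verification, using nothing beyond expanding a square:

$(λ + \frac{1}{2})^{2} + \frac{3}{4} = λ^{2} + λ + \frac{1}{4} + \frac{3}{4} = λ^{2} + λ + 1$

Claim #4 follows in one more step: since $λ$ is real, $(λ + \frac{1}{2})^{2} \geq 0$ , and therefore $α \geq \frac{3}{4} > 0$ .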

Another point I want to make is that this is probably not the optimal decomposition. When translating a text-based proof into a Debate Tree, one need not do it sequentially. Here is a different approach:

Claim 1: {same as above}
Claim 2: There exists an $α \in R_{+}$ such that $(T^{2} + T + I) v = α v$ .
Claim 3: Claims #1-2 imply that $T^{2} + T + I$ is not nilpotent.

Subsequently, Claims #2-5 from the previous decomposition can become #2.1-#2.4, and Claim #6 can become Claim #3.1. This decomposition is superior from the standpoint of hiding complexity. I think it's fair to say that the primary mechanism for an Ideal Debate agent is to reduce a concept to its behavioral properties. In this case, that concept is the $α$ . The behavioral properties are entirely given in Claim #2 of the second decomposition, and they are sufficient for the high-level argument. Only if the other agent doubts the existence of such an $α$ does the debate have to open this black box and look at how $α$ is constructed. In that case, that's still a win in that the judge doesn't have to bother understanding if and how this $α$ solves the exercise (because if claim #2 is pointed at, claim #3 is not).
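The nested decomposition can be written down as a small tree. The Python sketch below is only an illustration: the `Claim` class and the `descend` helper are hypothetical names introduced here, and the 'debate' is reduced to a list of indices saying which subclaim gets pointed at on each level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    label: str
    text: str
    subclaims: List["Claim"] = field(default_factory=list)


def descend(root, picks):
    """Follow the claims pointed at on each level.

    Returns the claim the judge ends up verifying and the labels seen on the way.
    """
    node, visited = root, [root.label]
    for index in picks:
        node = node.subclaims[index]
        visited.append(node.label)
    return node, visited


# The second decomposition, with the first one nested underneath it.
tree = Claim("root", "T^2 + T + I is not nilpotent", [
    Claim("1", "T has a real eigenvalue with a nonzero eigenvector v"),
    Claim("2", "there is an alpha > 0 with (T^2 + T + I) v = alpha v", [
        Claim("2.1", "(T^2 + T + I) v = (lambda^2 + lambda + 1) v"),
        Claim("2.2", "lambda^2 + lambda + 1 = (lambda + 1/2)^2 + 3/4"),
        Claim("2.3", "alpha := (lambda + 1/2)^2 + 3/4 is positive"),
        Claim("2.4", "claims 2.1-2.3 give (T^2 + T + I) v = alpha v"),
    ]),
    Claim("3", "claims 1-2 imply that T^2 + T + I is not nilpotent", [
        Claim("3.1", "(T^2 + T + I)^k v = alpha^k v is nonzero for all k"),
    ]),
])

# The other agent doubts Claim #2, then doubts the algebra in #2.2.
leaf, path = descend(tree, [1, 1])
print(leaf.label)  # 2.2
print(path)        # ['root', '2', '2.2']
```

In this run the judge ends up checking only the high-school identity in #2.2 and never has to look at Claim #3 or its subclaim, which is the complexity-hiding benefit described above.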

Appendix: the sequence in 500 words

Since there was this big gap between the previous post and this one, I thought it might be useful to write an ultra-abbreviated version to refresh everyone's memory.

Post #-1: To characterize what constitutes 'solving a subproblem', as opposed to 'making progress on a big problem', one can look at the length of the subproblem's solution. Under this view, decomposing problems is all about hiding as much complexity as possible. It must be the case that we do something like this in regular thought because we can only keep a few objects in mind at the same time yet are able to solve complex problems.

[Post-sequence edit]: This perspective assumes a bird's eye view of the problem, which makes it primarily applicable to the job of an honest debate agent, less so to a human who starts off not understanding the problem.

Post #1: HCH is the ideal of stock amplification. It abstracts away a number of practical problems and implementation details. We can similarly define an ideal for Debate. Given these idealized schemes, we can define and study a formalism ( $\to$ Cognition Spaces). The formalism suggests that HCH and Ideal Debate don't necessarily scale similarly, which means there is no one Factored Cognition Hypothesis.

Post #2: Here are some things we can do with the formalism. Debate seems nicely behaved in the limit. Debate Trees may be an interesting object to consider when thinking about how to explain things optimally.

Post #3: Factored Cognition is about reducing hard problems to human judgment to achieve outer alignment; it's not used because it's the best way to boost capability. Ideal Debate = HCH + Decomposition Oracle. To evaluate HCH or Ideal Debate, consider the task of the human as this is the non-decomposable part.

Post #4: People tend to talk about intuition as if it's a separate mode of thinking that works without access to 'conscious reasoning', but really all thinking is like that; it's just that sometimes we can explain our thoughts, and sometimes we can't. It's useful to think about human thinking in terms of the set of operations that can be done in one such step. We call these operations our cognitive primitives.

Post #5: This whole decomposing problems thing that characterizes Factored Cognition is something we do all the time when we think, except that we constantly alternate between decomposing and solving. You can verify this by taking an arbitrary problem and observing what your brain is doing. Since we only output one subproblem at a time, the term 'decomposition' describes a thing that is continuously updated, not a thing that's ever output in one piece. The alternating thing seems like it's critical, which is bad for HCH. Also problematic is the fact that nodes in an HCH tree have to communicate with something like text. In particular, it will mean that nodes probably won't get a lot of help for the task of decomposing their problem. This seems bad if you believe that decomposing constitutes much or even most of thinking. Strong-HCH may help, but probably not by much. Most of this stuff doesn't apply to Debate. There is no one Factored Cognition Hypothesis.

[1] Note that when I say 'applying', I'm not suggesting a dualistic picture where there is an additional thing in the brain that gets to choose where to apply the primitives.
[2] The first is the first exercise out of my favorite textbook, the second is an exercise out of chapter 9 of the same book, and the third is a famous open math problem called Goldbach's conjecture.
[3] There are formal proof systems that posit a small set of primitive operations such that every proof is reducible to a sequence of such operations. This is what allows proofs about what is provable.
[4] Theorem 5.26 is "Every operator on an odd-dimensional real vector space has an eigenvalue." (Incidentally, this is the only thing for which the $7$ in $R^{7}$ matters. It could have also been $R^{1439995}$ . This is another aspect that may have made it more difficult to find a proof because it has the potential to be misleading, but barely matters for verifying the proof.)