Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

There are many people who believe that we will be able to get to AGI by basically just scaling up the techniques used in recent large language models, combined with some relatively minor additions and/or architectural changes. As a result, there are people in the AI safety community who now predict timelines of less than 10 years, and structure their research accordingly. However, there are also people who still believe in long(er) timelines, or at least that substantial new insights or breakthroughs will be needed for AGI (even if those breakthroughs in principle could happen quickly). My impression is that the arguments for the latter position are not all that widely known in the AI safety community. In this post, I will summarise as many of these arguments as I can.

I will almost certainly miss some arguments; if so, I would be grateful if they could be added to the comments. My goal with this post is not to present a balanced view of the issue, nor is it to present my own view. Rather, my goal is just to summarise as many arguments as possible for being skeptical of short timelines and the "scaling is all you need" position.

This post is structured into four sections. In the first section, I give a rough overview of the "scaling is all you need" hypothesis, together with a basic argument for that hypothesis. In the second section, I give a few general arguments in favour of significant model uncertainty when it comes to arguments about AI timelines. In the third section, I give some arguments against the standard argument for the "scaling is all you need" hypothesis, and in the fourth section, I give a few direct arguments against the hypothesis itself. I then end the post with a few closing words.

Glossary:

LLM - Large Language Model
SIAYN - Scaling Is All You Need

 

The View I'm Arguing Against

In this section, I will give a brief summary of the view that these arguments oppose, as well as provide a standard justification for this view. In short, the view is that we can reach AGI by more or less simply scaling up existing methods (in terms of the size of the models, the amount of training data they are given, and/or the number of gradient steps they take, etc). One version says that we can do this by literally just scaling up transformers, but the arguments will apply even if we relax this to allow scaling of large deep learning-based next-token predictors, even if they would need to be given a somewhat different architecture, and even if some extra thing would be needed, etc.

Why believe this? One argument goes like this:

(1) Next-word prediction is AI complete. This would mean that if we can solve next-word prediction, then we would also be able to solve any other AI problem. Why think next-word prediction is AI complete? One reason is that human-level question answering is believed to be AI-complete, and this can be reduced to next-word prediction.

(2) The performance of LLMs at next-word prediction improves smoothly as a function of the parameter count, training time, and amount of training data. Moreover, the asymptote of this performance trend is at or above human-level performance.

(*) Hence, if we keep scaling up LLMs we will eventually reach human-level performance at next-word prediction, and therefore also reach AGI.

An issue with this argument, as stated, is that GPT-3 already is better than humans at next-word prediction. So are both GPT-2 and GPT-1, in fact, see this link. This means that there is an issue with the argument, and that issue is that human-level performance on next-word prediction (in terms of accuracy) evidently is insufficient to attain human-level performance in question answering.

There are at least two ways to amend the argument:

(3) In reaching the limit of performance for next-word prediction, an LLM would invariably develop internal circuits for all (or most) of the tasks of intelligent behaviour, or

(4) the asymptote of LLM performance scaling is high enough to reach AI-complete performance.

Either of these would do. To make the distinction between (3) and (4) more explicit, (4) says that a "saturated" LLM would be so good at next-word prediction that it would be able to do (eg) human-level question answering if that task is reduced to next-word prediction, whereas (3) says that a saturated LLM would contain all the bits and pieces needed to create a strong agentic intelligence. With (3), one would need to extract parts from the final model, whereas with (4), prompting would in theory be enough by itself.

I will now provide some arguments against this view.

 

General Caution

In this section, I will give a few fairly general arguments for why we should be skeptical of our impressions and our inside views when it comes to AI timelines, especially in the context of LLMs. These arguments are not specifically against the SIAYN hypothesis, but rather some arguments for why we should not be too confident in any hypothesis in the reference class of the SIAYN hypothesis.


1. Pessimistic Meta-Induction

Historically, people have been very bad at predicting AI progress. This goes both for AI researchers guided by inside-view intuitions, and for outsiders relying on outside-view methods. This gives a very general reason for always increasing our model uncertainty quite substantially when it comes to AI timelines.

Moreover, people have historically been bad at predicting AI progress in two different ways; first, people have been bad at estimating the relative difficulty of different problems, and second, people have been bad at estimating the dependency graph for different cognitive capacities. These mistakes are similar, but distinct in some important regards.

The first problem is fairly easy to understand; people often assume that some problem X is easier than some problem Y, when in fact it is the other way around (and sometimes by a very large amount). For example, in the early days of AI, people thought that issues like machine vision and robot motion would be fairly easy to solve, compared to "high-level" problems such as planning and reasoning. As it turns out, it is the other way around. This problem keeps cropping up. For example, a few years ago, I imagine that most people would have guessed that self-driving cars would be much easier to make than a system which can write creative fiction or create imaginative artwork, or that adversarial examples would turn out to be a fairly minor issue, etc. This issue is essentially the same as Moravec's paradox.

The second problem is that people often assume that in order to do X, an AI system would also have to be able to do Y, when in fact this is not true. For example, many people used to think that if an AI system can play better chess than any human, then it must also be able to form plans in terms of high-level, abstract concepts, such as "controlling the centre". As it turns out, tree search is enough for super-human chess (a good documentary on the history of computer chess can be found here). This problem also keeps cropping up. For example, GPT-3 has many very impressive abilities, such as the ability to play decent chess, but there are other, simpler seeming abilities that it does not have, such as the ability to solve a (verbally described) maze, or reverse long words, etc.

There could be many reasons for why we are so bad at predicting AI, some of which are discussed eg here. Whatever the reason, it is empirically very robustly true that we are very bad at predicting AI progress, both in terms of how long it will take for things to happen, and in terms of in what order they will happen. This gives a general reason for more skepticism and more model uncertainty when it comes to AI timelines.


2. Language Invites Mind Projection

Historically, people seem to have been particularly prone to overestimate the intelligence of language-based AI systems. Even ELIZA, one of the first chat bots ever made, can easily give off the impression of being quite smart (especially to someone who does not know anything about how it works), even though it is in reality extremely simple. This also goes for the many, many chat bots that have been made over the years, which are able to get very good scores on the Turing test (see eg this example). They can often convince a lay audience that they have human-level intelligence, even though most of these bots don't advance the state of the art in AI.

It is fairly unsurprising that we (as humans) behave in this way. After all, in our natural environment, only intelligent things produce language. It is therefore not too surprising that we would be psychologically inclined to attribute more intelligence than what is actually warranted to any system that can produce coherent language. This again gives us a fairly general reason to question our initial impression of the intelligence of a system, when that system is one that we interact with through language.

It is worth looking at some of Gary Marcus' examples of GPT-3 failing to do some surprisingly simple things.


3. The Fallacy of the Successful First Step

It is a very important fact about AI that a technique or family of techniques can be able to solve some version of a task, or reach some degree of performance on that task, without it being possible to extend that solution to solve the full version of the task. For example, using decision trees, you can get 45 % accuracy on CIFAR-10. However, there is no way to use decision trees to get 99 % accuracy. To give another example, you can use alpha-beta pruning combined with clever heuristics to beat any human at chess. However, there is no way to use alpha-beta pruning combined with clever heuristics to beat any human at go. To give a third example, you can get logical reasoning about narrow domains of knowledge using description logic. However, there is no way to use description logic to get logical reasoning about the world in general. To give a fourth example, you can use CNNs to get excellent performance on the task of recognising objects in images. However, there is (seemingly) no way to use CNNs to recognise events in videos. And so on, and so forth. There is some nice discussion on this issue in the context of computer vision in this interview.

The lesson here is that just because some technique has solved a specific version of a problem, it is not guaranteed to (and, in fact, probably will not) solve the general version of that problem. Indeed, the solution to the more general version of the problem may not even look at all similar to a solution to the smaller version. It seems to me like each level of performance often cuts off a large majority of all approaches that can reach all lower levels of performance (not just the solutions, but the approaches). This gives us yet another reason to be skeptical that any given method will continue to work, even if it has been successful in the past.


Arguments Against the Argument

In this section, I will point out some flaws with the standard argument for the SIAYN hypothesis that I outlined earlier, but without arguing against the SIAYN hypothesis itself.

 

4. Scaling Is Not All You Need

The argument I gave in Section 2 is insufficient to conclude that LLM scaling can lead to AGI in any practical sense, at least if we use premise (4) instead of the much murkier premise (3). To see this, note that the argument also applies to a few extremely simple methods that definitely could not be used to build AGI in the real world. For example, suppose we have a machine learning method that works by saving all of its training data to a lookup table, and at test time gives a uniform prediction for any input that is not in the lookup table, and otherwise outputs the entry in the table. If some piece of training data can be associated with multiple labels, as is the case with next-word prediction, then we could say that the system outputs the most common label in the training data, or samples from the empirical distribution. If this system is used for next-word prediction, then it will satisfy all the premises of the argument in Section 2. Given a fixed distribution over the space of all text, if this system is given ENOUGH training data and ENOUGH parameters, then it will EVENTUALLY reach any degree of performance that you could specify, all the way down to the inherent entropy of the problem. It therefore satisfies premise (2), so if (1) and (4) hold too then this system will give you AGI, if you just pay enough. However, it is clear that this could not give us AGI in the real world.
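The lookup-table learner described above can be sketched in a few lines. This is only an illustration of the thought experiment: the class name, the choice to key the table on the full preceding word sequence, and the toy corpus are all assumptions made for the sketch, not anything from a real system.

```python
from collections import Counter, defaultdict


class LookupTablePredictor:
    """Memorises every (prefix, next word) pair seen in training (hypothetical sketch)."""

    def __init__(self, vocabulary):
        self.vocabulary = sorted(vocabulary)
        # Maps a prefix (tuple of words) to counts of the words that followed it.
        self.table = defaultdict(Counter)

    def train(self, sentences):
        for sentence in sentences:
            words = sentence.split()
            for i in range(len(words)):
                self.table[tuple(words[:i])][words[i]] += 1

    def predict_distribution(self, prefix):
        prefix = tuple(prefix)
        if prefix not in self.table:
            # Input never seen in training: fall back to a uniform prediction.
            p = 1.0 / len(self.vocabulary)
            return {word: p for word in self.vocabulary}
        counts = self.table[prefix]
        total = sum(counts.values())
        # Seen input: return the empirical distribution over next words.
        return {word: n / total for word, n in counts.items()}

    def predict(self, prefix):
        dist = self.predict_distribution(prefix)
        return max(dist, key=dist.get)  # most common label in the training data


corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the mat",
]
vocab = {word for sentence in corpus for word in sentence.split()}
model = LookupTablePredictor(vocab)
model.train(corpus)

seen = model.predict("the cat sat on the".split())       # "mat" (2 of 3 continuations)
unseen = model.predict_distribution("a dog".split())     # uniform over the vocabulary
```

With more training data the table covers more prefixes and its predictions approach the true conditional distribution, but nothing in it transfers to a prefix it has not stored.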

This somewhat silly example points very clearly at the issue with the argument in Section 2; the point cannot be that LLMs "eventually" reach a sufficiently high level of performance, because so would the lookup table (and decision trees, and Gaussian processes, and so on). To have this work in practice, we additionally need the premise that LLMs will reach this level of performance after a practical amount of training data and a practical amount of compute. Do LLMs meet this more strict condition? That is unclear. We are not far from using literally all text data in existence to train them, and the training costs are getting quite hefty too.


5. Things Scale Until They Don't

Suppose that we wish to go to the moon, but we do not have the technology to do so. The task of getting to the moon is of course a matter of getting sufficiently high up from the ground. Now suppose that a scientist makes the following argument; ladders get you up from the ground. Moreover, they have the highly desirable property that the distance that you get from the ground scales linearly in the amount of material that you use to construct the ladder. Getting to the moon will therefore just be a matter of a sufficiently large project investing enough resources into a sufficiently large ladder.

Suppose that we wish to build AI, but we do not have the technology to do so. The task of building AI is of course a matter of creating a system that knows a sufficiently large number of things, in terms of facts about the world, ways to learn more things, and ways to attain outcomes. Suppose someone points out that all of these things can be encoded in logical statements, and that the more logical statements you encode, the closer you get to the goal. Getting to AI will therefore just be a matter of a sufficiently large project investing enough resources into encoding a sufficiently large number of facts in the form of logical statements.

And so on.


6. Word Prediction is not Intelligence

Here, I will give a few arguments against premise/assumption (3); that in reaching the limit of performance for next-word prediction, an LLM would invariably develop internal circuits for all (or most) of the tasks of intelligent behaviour. The kinds of AI systems that we are worried about are the kinds of systems that can do original scientific research and autonomously form plans for taking over the world. LLMs are trained to write text that would be maximally unsurprising if found on the internet. These two things are fundamentally not the same thing. Why, exactly, would we expect that a system that is good at the latter necessarily would be able to do the former? Could you get a system that can bring about atomically precise manufacturing, Dyson spheres, and computronium, from a system that has been trained to predict the content (in terms of the exact words used) of research papers found on the internet? Could such a system design new computer viruses, run companies, plan military operations, or manipulate people? These tasks are fundamentally very different. If we make a connection between the two, then there could be a risk that we are falling victim to one of the issues discussed in point 1. Remember; historically, people have often assumed that an AI system that can do X must be able to do Y, but then turned out to be wrong. What gives us a good reason to believe that this is not one of those cases?

 

Direct Counterarguments

Here, I give some direct arguments against the SIAYN hypothesis, ignoring the arguments in favour of the SIAYN hypothesis.

 

7. The Language of Thought

This is an argument first made by the philosopher, linguist, and cognitive scientist Jerry Fodor, and was originally applied to the human brain. However, the argument can be applied to AI systems as well.

An intelligent system which can plan and reason must have a data structure for representing facts about and/or states of the world. What can we say about the nature of this data structure? First, this data structure must be able to represent a lot of things, including things that have never been encountered before (both evolutionarily, and in terms of personal experience). For example, you can represent the proposition that there are no elephants on Jupiter, and the proposition that Alexander the Great never visited a McDonalds restaurant, even though you have probably never encountered either of these propositions before. This means that the data structure must be very productive (which is a technical term in this context). Second, there are certain rules which say that if you can represent one proposition, then you can also represent some other proposition. For example, if you can represent a blue block on top of a red block, then you can also represent a red block on top of a blue block. This means that the data structure also must be systematic (which is also a technical term).

What kinds of data structures have these properties? The answer, according to Fodor, is that it is data structures with a combinatorial syntax and compositional semantics. In other words, it is data structures where two or more representations can be combined in a syntactic structure to form a larger representation, and where the semantic content of the complex representation can be inferred from the semantic content of its parts. This explains both productivity and systematicity. The human brain (and any AI system with the intelligence of a human) must therefore be endowed with such a data structure for representing and reasoning about the world. This is called the "language of thought" (LoT) hypothesis, because languages (including logical languages and programming languages) have this structure. (But, importantly, the LoT hypothesis does not say that people literally think in a language such as English, it just says that mental representations have a "language like" structure.)
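To make "combinatorial syntax and compositional semantics" a little more concrete, here is a minimal sketch of such a data structure. The `Block` and `On` types and the `describe` function are hypothetical names invented for this illustration; the sketch is not a claim about how brains or any AI system actually represent propositions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    colour: str


@dataclass(frozen=True)
class On:
    # Combinatorial syntax: a representation built out of other representations.
    top: object
    bottom: object


def describe(rep):
    """Compositional semantics: the meaning of the whole is computed from its parts."""
    if isinstance(rep, Block):
        return f"a {rep.colour} block"
    if isinstance(rep, On):
        return f"{describe(rep.top)} on top of {describe(rep.bottom)}"
    raise TypeError(f"unknown representation: {rep!r}")


blue, red = Block("blue"), Block("red")
scene = On(blue, red)

# Systematicity: anything that can represent `scene` can also represent its swap.
swapped = On(scene.bottom, scene.top)

# Productivity: parts can be nested to form representations never built before.
tower = On(Block("green"), scene)

print(describe(scene))
print(describe(swapped))
print(describe(tower))
```

The point of the sketch is that systematicity and productivity come for free from the way the structure is built, rather than having to be learned case by case.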

This, in turn, suggests a data structure that is discrete and combinatorial, with syntax trees, etc, and neural networks do (according to the argument) not use such representations. We should therefore expect neural networks to at some point hit a wall or limit to what they are able to do.

I am personally fairly confused about what to think of this argument. I find it fairly persuasive, and often find myself thinking back to it. However, the conclusion of the argument also seems very strong, in a suspicious way. I would love to see more discussion and examination of this.


8. Programs vs Circuits

This point will be similar to point 7, but stated somewhat differently. In short, neural network models are like circuits, but an intelligent system would need to use hypotheses that are more like programs. We know, from computer science, that it is very powerful to be able to reason in terms of variables and operations on variables. It seems hard to see how you could have human-level intelligence without this ability. However, neural networks do typically not have this ability, with most neural networks (including fully connected networks, CNNs, RNNs, LSTMs, etc) instead being more analogous to Boolean circuits.

This being said, some people have said that transformers and attention models are getting around this limitation, and are starting to reason more in terms of variables. I would love to see more analysis of this as well.

As a digression, it is worth noting that symbolic program induction style machine learning systems, such as those based on inductive logic programming, typically have much, much stronger generalisation than deep learning, from a very small number of data points. For example, you might be able to learn a program for transforming strings from ~5 training examples. It is worth playing around a bit with one of these systems, to see this for yourself. An example of a user friendly version is available here. Another example is the auto-complete feature in Microsoft Excel.
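As a rough illustration of learning a string-transforming program from a handful of examples, here is a toy enumerative search over a tiny, made-up language of string operations. It is not inductive logic programming and not how any particular tool works; the primitive set, the function names, and the examples are all assumptions chosen for the sketch.

```python
from itertools import product

# A tiny hypothetical DSL: each primitive maps a string to a string.
PRIMITIVES = {
    "upper": str.upper,
    "lower": str.lower,
    "capitalize": str.capitalize,
    "reverse": lambda s: s[::-1],
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "last_word": lambda s: s.split()[-1] if s.split() else "",
    "first_char": lambda s: s[:1],
}


def run(program, text):
    """Apply the primitives named in `program` from left to right."""
    for name in program:
        text = PRIMITIVES[name](text)
    return text


def induce(examples, max_length=3):
    """Return the first (shortest) program consistent with every example, or None."""
    for length in range(1, max_length + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None


# Three examples of "upper-case initial of the surname".
examples = [("ada lovelace", "L"), ("alan turing", "T"), ("grace hopper", "H")]
program = induce(examples)

# The induced program is then applied to an input it has never seen.
result = run(program, "marie curie")
print(program, result)
```

Because the hypothesis space consists of short programs rather than arbitrary input-output tables, three examples are enough here to pin down a rule that carries over to new names.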


9. Generalisation vs Memorisation

This point has also already been alluded to, in points 4, 7, and 8, but I will here state it in a different way. There is, intuitively, a difference between memorisation and understanding, and this difference is important. By "memorisation", I don't mean using a literal lookup table, but rather something that is somewhat more permissive. I will for now not give a formal definition of this difference, but instead give a few examples that gesture at the right concept.

For my first example, consider how a child might learn to get a decent score on an arithmetic test by memorising a lot of rules that work in certain special cases, but without learning the rules that would let it solve any problem of arithmetic. For example, it might memorise that multiplication by 0 always gives 0, that multiplication by 1 always gives the other number, that multiplication of a single-digit integer by 11 always gives the integer repeated twice, and so on. There is, intuitively, an important sense in which such a child does not yet understand arithmetic, even though they may be able to solve many problems.

For my second example, I would like to point out that a fully connected neural network cannot learn a simple identity function in a reasonable way. For example, suppose we represent the input as a bitstring. If you try to learn this function by training on only odd numbers then the network will not robustly generalise to even numbers (or vice versa). Similarly, if you train using only numbers in a certain range then the network will not robustly generalise outside this range. This is because a pattern such as "the n'th input neuron is equal to the n'th output neuron" lacks a simple representation in a neural network. This means that the behaviour of a fully connected network, in my opinion, is better characterised as memorisation than understanding when it comes to learning an identity function. The same goes for the function that recognises palindromes, etc. This shows that knowing whether or not a network is able to express and learn a given function is insufficient to conclude that it would be able to understand it. This issue is also discussed in eg this paper.
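The odd/even case can be reproduced with a deliberately simplified stand-in: a single fully connected sigmoid layer (no hidden units), zero-initialised and trained with plain stochastic gradient descent on 6-bit odd numbers. These choices, and all names below, are assumptions made to keep the sketch small and deterministic; a deeper, randomly initialised network will differ in its details.

```python
import math

N_BITS = 6


def to_bits(n):
    """Bits of n, least significant bit first."""
    return [(n >> i) & 1 for i in range(N_BITS)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(weights, biases, x, j):
    return biases[j] + sum(w * xi for w, xi in zip(weights[j], x))


def train_identity_layer(numbers, epochs=200, lr=0.5):
    """Train one dense layer (N_BITS inputs -> N_BITS outputs) to copy its input."""
    weights = [[0.0] * N_BITS for _ in range(N_BITS)]  # weights[j][i]: input i -> output j
    biases = [0.0] * N_BITS
    data = [to_bits(n) for n in numbers]
    for _ in range(epochs):
        for x in data:
            for j in range(N_BITS):
                err = x[j] - sigmoid(logit(weights, biases, x, j))  # target is the input bit
                biases[j] += lr * err
                for i in range(N_BITS):
                    weights[j][i] += lr * err * x[i]
    return weights, biases


def predict(weights, biases, n):
    x = to_bits(n)
    return [1 if logit(weights, biases, x, j) > 0 else 0 for j in range(N_BITS)]


def bit_accuracy(weights, biases, numbers, bit):
    hits = sum(predict(weights, biases, n)[bit] == to_bits(n)[bit] for n in numbers)
    return hits / len(numbers)


odd_numbers = list(range(1, 2 ** N_BITS, 2))
even_numbers = list(range(0, 2 ** N_BITS, 2))
weights, biases = train_identity_layer(odd_numbers)

lowest_bit_on_odd = bit_accuracy(weights, biases, odd_numbers, 0)    # training distribution
lowest_bit_on_even = bit_accuracy(weights, biases, even_numbers, 0)  # never seen in training
print(lowest_bit_on_odd, lowest_bit_on_even)
```

In the training set the lowest input bit and its target are both always 1, so every update pushes that output towards "always 1" and nothing ties it to its input. In this sketch the lowest output bit is therefore right on every odd number and wrong on every even number, even though the layer could trivially express the identity function.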

For my third example, I would like to bring up that GPT-3 can play chess, but not solve a small, verbally described maze. You can easily verify this yourself. This indicates that GPT-3 can play chess just because it has memorised a lot of cases, rather than learnt how to do heuristic search in an abstract state space. 

For my fourth example, the psychologist Jean Piaget observed that children that are sufficiently young consistently do not understand conservation of mass. If you try to teach such a child that mass is conserved, then they will under-generalise, and only learn that it holds for the particular substance and the particular containers that you used to demonstrate the principle. Then, at some point, the child will suddenly gain the ability to generalise to all instances. This was historically used as evidence against Skinnerian psychology (aka the hypothesis that humans are tabula rasa reinforcement learning agents).

These examples all point to a distinction between two modes of learning. It is clear that this distinction is important. However, the abstractions and concepts that we currently use in machine learning make it surprisingly hard to point to this distinction in a clear way. My best attempt at formalising this distinction in more mathematical terms (off the top of my head) is that a system that understands a problem is able to give (approximately) the right output (or, perhaps, a "reasonable" output) for any input, whereas a system that has memorised the problem only gives the right output for inputs that are in the training distribution. (But there are also other ways to formalise this.)

The question, then, is whether LLMs do mostly memorisation, or mostly understanding. To me, it seems as though this is still undecided. I should first note that a system which has been given such an obscenely large amount of training data as GPT-3 will be able to exhibit very impressive performance even if much of what it does is more like memorisation than understanding. There is evidence in both directions. For example, the fact that it is possible to edit an LLM to make it consistently believe that the Eiffel Tower is in Rome is evidence that it understands certain facts about the world. However, the fact that GPT-3 can eg play chess, but not solve a verbally described maze, is evidence that it relies on memorisation as well. I would love to see a more thorough analysis of this.

As a slight digression, I currently suspect that this distinction might be very important, but that current machine learning theory essentially misses it completely. My characterisation of "understanding" as being about off-distribution performance already suggests that the supervised learning formalism in some ways is inadequate for capturing this concept. The example with the fully connected network and the identity function also shows the important point that a system may be able to express a function, but not "understand" that function.


10. Catastrophic Forgetting

Here, I just want to add the rather simple point that we currently cannot actually handle memory and dynamicism in a way that seems to be required for intelligence. LLMs are trained once, on a static set of data, and after their training phase, they cannot commit new knowledge to their long-term memory. If we instead try to train them continuously, then we run into the problem of catastrophic forgetting, which we currently do not know how to solve. This seems like a rather important obstacle to general intelligence.


Closing Words

In summary, there are several good arguments against the SIAYN hypothesis. First, there are several reasons to have high model uncertainty about AI timelines, even in the presence of strong inside-view models. In particular, people have historically been bad at predicting AI development, have historically had a tendency to overestimate language-based systems, and failed to account for the fallacy of the successful first step. Second, the argument that is most commonly used in favour of the SIAYN hypothesis fails, at least in the form that it is most often stated. In particular, the simple version of the scaling argument leaves out the scaling rate (which is crucial), and there are reasons to be skeptical that scaling will continue indefinitely, and that next-token prediction would give rise to all important cognitive capacities. Third, there are also some direct reasons to be skeptical of the SIAYN hypothesis (as opposed to the argument in favour of the SIAYN hypothesis). Many of these arguments amount to arguments against deep learning in general.

In addition to all of these points, I would also like to call attention to some of the many "simple" things that GPT-3 cannot do. Some good examples are available here, and other good examples can be found in many places on the internet (eg here). You can try these out for yourself, and see how they push your intuitions.

I should stress that I don't consider any of these arguments to strongly refute either the SIAYN hypothesis, or short timelines. I personally default to a very high-uncertainty model of AI timelines, with a decent amount of probability mass on both the short timeline and the long timeline scenario. Rather, my reason for writing this post is just to make some of these arguments better known and easier to find for people in the AI safety community, so that they can be used to inform intuitions and timeline models. 

I would love to see some more discussion of these points, so if you have any objections, questions, or additional points, then please let me know in the comments! I am especially keen to hear additional arguments for long timelines. 


21 comments

LLMs are trained to write text that would be maximally unsurprising if found on the internet. 

This claim is false. If you look at a random text on the internet it would be very surprising if every word in it is the most likely word to follow based on previous words. 

Kahneman's latest book is Noise: A Flaw in Human Judgment. In it, he describes errors in human decisions as a combination of bias and noise. If you take a large sample size of human decisions and build your model on it you remove all the noise.

While a large LLM trained on all internet text keeps all the bias of the internet text it can remove all the noise. 

In this section, I will give a brief summary of the view that these arguments oppose, as well as provide a standard justification for this view. In short, the view is that we can reach AGI by more or less simply scaling up existing methods (in terms of the size of the models, the amount of training data they are given, and/or the number of gradient steps they take, etc). 

The question of whether scaling large language models is enough might have seemed relevant a year ago, but it isn't really today, as the strategy of the top players isn't just scaling large language models.

The step from GPT-3 to InstructGPT and ChatGPT was not one of scaling up in terms of size of models and substantial increase in the amount of training data.

It was rather about learning from well-curated data. ChatGPT itself is a project to gather a lot of data, which in turn reveals a lot of the errors that ChatGPT makes, and there are likely people at OpenAI currently working on ways to learn from that data.

Over at DeepMind they have GATO, which is an approach that combines a large language model with other problem sets.

LLMs are trained once, on a static set of data, and after their training phase, they cannot commit new knowledge to their long-term memory. 

That's just not true for ChatGPT. ChatGPT was very fast in learning how people tricked it to produce TOS violating content. 

The Language of Thought

This, in turn, suggests a data structure that is discrete and combinatorial, with syntax trees, etc, and neural networks do (according to the argument) not use such representations. We should therefore expect neural networks to at some point hit a wall or limit to what they are able to do.

If you ask ChatGPT to multiply two 4-digit numbers it writes out the reasoning process in natural language and comes to the right answer. ChatGPT is already today decent at using language for its reasoning process.

If you ask ChatGPT to multiply two 4-digit numbers it writes out the reasoning process in natural language and comes to the right answer.

People keep saying such things. Am I missing something? I asked it to calculate 1024 * 2047, and the answer isn't even close. (Though to my surprise, the first 2 steps are at least correct steps, and not nonsense. And it is actually adding the right numbers together in step 3, again, to my surprise. I've seen it perform much, much worse.)

I did ask it at the beginning to multiply numbers, and it seems to behave differently now than it did 5 weeks ago and isn't making correct multiplications anymore. Unfortunately, I can't access the old chats.

Interesting. I'm having the opposite experience (due to timing, apparently), where at least it's making some sense now. I've seen it using tricks only applicable to addition and pulling numbers out of its ass, so I was surprised what it did wasn't completely wrong.

Asking the same question again even gives a completely different (but again wrong) result:

If you look at a random text on the internet it would be very surprising if every word in it is the most likely word to follow based on previous words. 

I'm not completely sure what your point is here. Suppose you have a biased coin, that comes up heads with p=0.6 and tails with p=0.4. Suppose you flip it 10 times. Would it be surprising if you then get heads 10 times in a row? Yes, in a sense. But that is still the most likely individual sequence.
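
To make the coin example concrete, here is a minimal sketch (the function names are just for illustration). It separates two questions that are easy to conflate: which individual sequence is most likely, and which number of heads is most likely.

```python
from math import comb

p_heads, n = 0.6, 10


def sequence_probability(num_heads: int) -> float:
    """Probability of one specific sequence containing num_heads heads."""
    return p_heads ** num_heads * (1 - p_heads) ** (n - num_heads)


def count_probability(num_heads: int) -> float:
    """Probability of seeing num_heads heads in any order."""
    return comb(n, num_heads) * sequence_probability(num_heads)


# The single most likely sequence is all heads...
most_likely_sequence = max(range(n + 1), key=sequence_probability)
# ...but the most likely *number* of heads is lower.
most_likely_count = max(range(n + 1), key=count_probability)

print(most_likely_sequence, sequence_probability(most_likely_sequence))
print(most_likely_count, count_probability(most_likely_count))
```

The all-heads sequence has probability 0.6^10 (about 0.6%), which is higher than that of any other individual sequence, even though a typical run of 10 flips contains around 6 heads.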

The step from GPT3 to InstructGPT and ChatGPT was not one of scaling up in terms of size of models and substantial increase in the amount of training data. [...] Over at DeepMind they have GATO, which is an approach that combines a large language model with other problem sets.

I would consider InstructGPT, ChatGPT, GATO, and similar systems, to all be in the general reference class of systems that are "mostly big transformers, trained in a self-supervised way, with some comparably minor things added on top".

That's just not true for ChatGPT. ChatGPT was very fast in learning how people tricked it to produce TOS violating content. 

I'm not sure if this has been made public, but I would be surprised if this was achieved by (substantial) retraining of the underlying foundation model. My guess is that this was achieved mainly by various filters put on top. But it is possible that fine tuning was used. Regardless, catastrophic forgetting remains a fundamental issue. There are various benchmarks you can take a look at, if you want.

If you ask ChatGPT to multiply two 4-digit numbers it writes out the reasoning process in natural language and comes to the right answer. ChatGPT is already today decent at using language for its reasoning process.

A system can multiply two 4-digit numbers and explain the reasoning process without exhibiting productivity and systematicity to the degree that an AGI would have to. Again, the point is not quite whether or not the system can use language to reason, the point is how it represents propositions, and what that tells us about its ability to generalise (the LoT hypothesis should really have been given a different name...).

I'm not sure if this has been made public, but I would be surprised if this was achieved by (substantial) retraining of the underlying foundation model. My guess is that this was achieved mainly by various filters put on top. But it is possible that fine tuning was used. Regardless, catastrophic forgetting remains a fundamental issue. There are various benchmarks you can take a look at, if you want.

The benchmarks tell you about what the existing systems do. They don't tell you about what's possible.

One of OpenAI's current projects is to figure out how to extract from the conversations that ChatGPT has valuable data for fine-tuning. 

There's no fundamental reason why it can't extract from the conversation it has all the relevant information and do fine-tuning to add it to its long-term memory. 

When it comes to ToS violations it seems evident that such a system is working, based on my interactions with it. ChatGPT has basically three ways to answer with normal text, with red text, and with custom answers which explain to you why it won't answer your query.

Both the red text answers and the custom answers increased over a variety of different prompts. When it does its red text answers there's a feedback button to tell them if you think it made a mistake. 

To me, it seems obvious that those red-text answers get used as training material for fine-tuning and that this helps with detecting similar cases in the future. 

I would consider InstructGPT, ChatGPT, GATO, and similar systems, to all be in the general reference class of systems that are "mostly big transformers, trained in a self-supervised way, with some comparably minor things added on top".

You could summarize InstructGPT's lesson as "You can get huge capability gains by comparably minor things added on top".

You can talk about how they are minor at a technical level but that doesn't change the fact that these minor things produce huge capability gains. 

In the future, there's also a lot of additional room to get more clever about providing training data. 

The benchmarks tell you about what the existing systems do. They don't tell you about what's possible.

Of course. It is almost certainly possible to solve the problem of catastrophic forgetting, and the solution might not be that complicated either. My point is that it is a fairly significant problem that has not yet been solved, and that solving it probably requires some insight or idea that does not yet exist. You can achieve some degree of lifelong learning through regularised fine-tuning, but you cannot get anywhere near what would be required for human-level cognition.

You could summarize InstructGPT's lesson as "You can get huge capability gains by comparably minor things added on top".

Yes, I think that lesson has been proven quite conclusively now. I also found systems like PaLM-SayCan very convincing for this point. But the question is not whether or not you can get huge capability gains -- this is evidently true -- the question is whether you get close to AGI without new theoretical breakthroughs. I want to know if we are now on (and close to) the end of the critical path, or whether we should expect unforeseeable breakthroughs to throw us off course a few more times before then.

Suppose you have a biased coin, that comes up heads with p=0.6 and tails with p=0.4. Suppose you flip it 10 times.

That's a different case. 

Given a text, you can calculate for every word the likelihood (L_text) that it would follow the preceding words in the text. You can also calculate the likelihood (L_ideal) of the most likely word to follow the preceding text.

L_ideal - L_text is, in Kahneman's words, noise. For a given text you can calculate the average noise per word.

The average noise that's produced by GPT3 is less than that of the average text on the internet. It would be surprising to encounter texts with so little noise randomly on the internet. 
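
As a rough sketch of the quantity described above, assuming a made-up toy model in which the next word depends only on the previous word (`TOY_MODEL` and its probabilities are invented for illustration and come from no real system):

```python
# Hypothetical toy model: next-word probabilities given the previous word.
TOY_MODEL = {
    "the": {"cat": 0.5, "dog": 0.3, "idea": 0.2},
    "cat": {"sat": 0.7, "ran": 0.2, "sang": 0.1},
    "dog": {"sat": 0.4, "ran": 0.5, "sang": 0.1},
}


def average_noise(words: list[str]) -> float:
    """Mean of (L_ideal - L_text) over every predicted word in the text."""
    gaps = []
    for previous, actual in zip(words, words[1:]):
        distribution = TOY_MODEL[previous]
        l_text = distribution.get(actual, 0.0)  # likelihood of the word actually used
        l_ideal = max(distribution.values())    # likelihood of the most likely word
        gaps.append(l_ideal - l_text)
    return sum(gaps) / len(gaps)


greedy_text = ["the", "cat", "sat"]       # always picks the most likely next word
human_like_text = ["the", "dog", "sang"]  # sometimes picks a less likely word

print(average_noise(greedy_text))      # 0.0
print(average_noise(human_like_text))  # roughly 0.3
```

A text that always takes the most likely next word has zero noise under this definition, while a text that sometimes takes a less likely word has positive noise.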

Ah, now I get your point, sorry. Yes, it is true that GPTs are not incentivised to reproduce the full data distribution, but rather, are incentivised to reproduce something more similar to a maximum-likelihood estimate point distribution. This means that they have lower variance (at least in the limit), which may improve performance in some domains, as you point out. But individual samples from the model will still have a high likelihood under the data distribution. 

But individual samples from the model will still have a high likelihood under the data distribution. 

That's not true for maximum-likelihood distributions in general. It's been more than a decade since I dealt with that topic in university while studying bioinformatics, but in the domain of bioinformatics maximum-likelihood distributions can frequently produce results that could not appear in reality, and there are a bunch of tricks to avoid that.

To get back to the actual case of large language models, imagine there's a complex chain of verbal reasoning. The next correct word in that reasoning chain has a higher likelihood than 200 different words that could be used that lead to a wrong conclusion. The likelihood of the correct word might be 0.01.

A large language model might pick the right word for the reasoning chain for every word over a 1000-word reasoning chain. The result is one that would be very unlikely to appear in the real world. 
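
A quick back-of-the-envelope version of this, using the figures from the example (0.01 per word, 1000 words) and treating the words as independent, which is a simplifying assumption:

```python
import math

per_word_probability = 0.01  # the figure used in the example above
chain_length = 1000

# 0.01 ** 1000 underflows to 0.0 in floating point, so work in log space.
log10_chain_probability = chain_length * math.log10(per_word_probability)
print(log10_chain_probability)  # approximately -2000
```

That is, the full chain would have a probability of about 10^-2000 under the data distribution, even though every individual word in it was the most likely one.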

Some short points:

"human-level question answering is believed to be AI-complete" - I doubt that. I think that we consistently far overestimate the role of language in our cognition, and how much we can actually express using language. The simplest example that comes to mind is trying to describe a human face to an AI system with no "visual cortex" in a way that would let it generate a human image (e.g. hex representation of pixels). For that matter, try to describe something less familiar than a human face to a human painter in the hope that they can paint it.

"GPT... already is better than humans at next-word prediction" - somewhat beside the point, but I do not think that the loss function that we use in training is actually the one that we care about. We don't care that much about specific phrasing, and use the "loss" of how much the content makes sense, is true, is useful... Also, we are probably much better at implicit predictions than at explicit predictions, in ways that make us underperform in many tests.

Language Invites Mind Projection - anecdotally, I keep asking ChatGPT to do things that I know it would suck at, because I just can't bring myself to internalise the existence of something that is so fluent, so knowledgeable and so damn stupid at the same time.

Memorization & generalisation - just noting that it is a spectrum rather than a dichotomy, as compression ratios are. Anyway, the current methods don't seem to generalise well enough to overcome the sparsity of public data in some domains - which may be the main bottleneck in (e.g.) RL anyway.

"This, in turn, suggests a data structure that is discrete and combinatorial, with syntax trees, etc, and neural networks do (according to the argument) not use such representations" - let's spell out the obvious objection - it is obviously possible to implement discrete representations over continuous representations. This is why we can have digital computers that are based on electrical currents rather than little rocks. The problem is just that keeping it robustly discrete is hard, and probably very hard to learn. I think that problem may be solved easily with minor changes of architecture though, and therefore should not affect timelines.

Inductive logic programming - generalise well in a much more restricted hypothesis space, as one should expect based on learning theory. The issue is that the real world is too messy for this hypothesis space, which is why it is not ruled by mathematicians/physicists. It may be useful as an augmentation for a deep-learning agent though, the way that calculators are useful for humans.

"human-level question answering is believed to be AI-complete" - I doubt that. I think that [...]

Yes, these are also good points. Human-level question answering is often listed as a candidate for an AI-complete problem, but there are of course people who disagree. I'm inclined to say that question-answering probably is AI-complete, but that belief is not very strongly held. In your example of the painter; you could still convey a low-resolution version of the image as a grid of flat colours (similar to how images are represented in computers), and tell the painter to first paint that out, and then paint another version of what the grid image depicts.

We don't care that much about specific phrasing, and use the "loss" of how much the content makes sense, is true, is useful...

Yes, I agree. Humans are certainly better than the GPTs at producing "representative" text, rather than text that is likely on a word-by-word basis. My point there was just to show that "reaching human-level performance on next-token prediction" does not correspond to human-level intelligence (and has already been reached).

Memorization & generalisation - just noting that it is a spectrum rather than a dichotomy, as compression ratios are. Anyway, the current methods don't seem to generalise well enough to overcome the sparsity of public data in some domains - which may be the main bottleneck in (e.g.) RL anyway.

I agree.

let's spell out the obvious objection - it is obviously possible to implement discrete representations over continuous representations. This is why we can have digital computers that are based on electrical currents rather than little rocks. The problem is just that keeping it robustly discrete is hard, and probably very hard to learn.

Of course. The main question is if it is at all possible to actually learn these representations in a reasonable way. The main benefit from these kinds of representations would come from a much better ability to generalise, and this is only useful if they are also reasonably easy to learn. Consider my example with an MLP learning an identity function -- it can learn it, but it is by "memorising" it rather than "actually learning" it. For AGI, we would need a system that can learn combinatorial representations quickly, rather than learn them in the way that an MLP learns an identity function.

I think that problem may be solved easily with minor changes of architecture though, and therefore should not affect timelines.

Maybe, that remains to be seen. My impression is that the most senior AI researchers (Yoshua Bengio, Yann LeCun, Stuart Russell, etc) lean in the other direction (but I could be wrong about this). As I said, I feel a bit confused/uncertain about the force of the LoT argument.

Inductive logic programming - generalise well in a much more restricted hypothesis space, as one should expect based on learning theory.

To me, it is not at all obvious that ILP systems have a more restricted hypothesis space than deep learning systems. If anything, I would expect it to be the other way around (though this of course depends on the particular system -- I have mainly used metagol). Rather, the way I think of it is that ILP systems have a much stronger simplicity bias than deep learning systems, and that this is the main reason for why they can generalise better from small amounts of data (and the reason they don't work well in practice is that this training method is too expensive for more large-scale problems).

Thanks for the detailed response. I think we agree about most of the things that matter, but about the rest:

About the loss function for next word prediction - my point was that I'm not sure whether the current GPT is already superhuman even in the prediction that we care about. It may be wrong less, but in ways that we count as more important. I agree that changing to a better loss will not make it significantly harder to learn it any more the same as intelligence etc.

About solving discrete representations with architectural change - I think that I meant only that the representation is easy and not the training, but anyway I agree that training it may be hard or at least require non-standard methods.

About the inductive logic and describing pictures in low-resolution: I made the same communication mistake in both, which is to consider things that are ridiculously highly regulated as not part of the hypothesis space at all. There probably is a logical formula that describes the probability of a given image being a cat, to every degree of precision. I claim that we will never be able to find or represent that formula, because it is so regulated against. And that this is the price that the theory forced us to pay for the generalisation.

Arguments 6-10 seem like the most interesting ones (as they respond more directly to the argument). But for all of them except argument 6, it seems like the same argument would imply that humans would not be generally intelligent.

[Argument 6]

The kinds of AI systems that we are worried about are the kinds of systems that can do original scientific research and autonomously form plans for taking over the world. LLMs are trained to write text that would be maximally unsurprising if found on the internet. These two things are fundamentally not the same thing. Why, exactly, would we expect that a system that is good at the latter necessarily would be able to do the former?

Because text on the Internet sometimes involves people using logic, reasoning, hypothesis generation, analyzing experimental evidence, etc, and so plausibly the simplest program that successfully predicts that text would do so by replicating that logic, reasoning etc, which you could then chain together to make scientific progress.

What does the argument say in response?

[Argument 7]

This, in turn, suggests a data structure that is discrete and combinatorial, with syntax trees, etc, and neural networks do (according to the argument) not use such representations.

How do you know neural networks won't use such representations? What is true of human brains but not of neural networks such that human brains can do this but neural networks can't?

(Particularly interested in this one since you said you found it compelling.)

[Argument 8]

However, neural networks do typically not have this ability, with most neural networks [...] instead being more analogous to Boolean circuits.

What is true of human brains but not neural networks such that human brains can represent programs but neural networks can't?

(I'd note that I'm including chain-of-thought as a way that neural networks can run programs.)

[Argument 9]

However, the fact that GPT-3 can eg play chess, but not solve a verbally described maze, is evidence that it relies on memorisation as well.

I would bet that you can play chess, but you cannot fold a protein (even if the rules for protein folding were verbally described to you). What's the difference?

[Argument 10]

If we instead try to train them continuously, then we run into the problem of catastrophic forgetting, which we currently do not know how to solve. 

Why doesn't this apply to humans as well? We forget stuff all the time.

But for all of them except argument 6, it seems like the same argument would imply that humans would not be generally intelligent.


Why is that?

Because text on the Internet sometimes involves people using logic, reasoning, hypothesis generation, analyzing experimental evidence, etc, and so plausibly the simplest program that successfully predicts that text would do so by replicating that logic, reasoning etc, which you could then chain together to make scientific progress.

What does the argument say in response?

There are a few ways to respond.

First of all, what comes after "plausibly" could just turn out to be wrong. Many people thought human-level chess would require human-like strategising, but this turned out to be wrong (though the case for text prediction is certainly more convincing).

Secondly, an LLM is almost certainly not learning the lowest K-complexity program for text prediction, and given that, the case becomes less clear. For example, suppose an LLM instead learns a truly massive ensemble of simple heuristics that together produce human-like text. It seems plausible that such an ensemble could produce convincing results without replicating logic, reasoning, etc. IBM Watson did something along these lines. Studies such as this one also provide some evidence for this perspective.

To give an intuition pump, suppose we trained an extremely large random forest classifier on the same data as GPT3 was trained on. How good would the output of this classifier be? While it would probably not be as good as GPT3, it would probably still be very impressive. And a random forest classifier is also a universal function approximator, whose performance keeps improving as it is given more training data. I'm sure there are scaling laws for them. But I don't think many people believe that we could get AGI by making a sufficiently big random forest classifier for next-token prediction. Why is that? I have found this to be an interesting prompt to think about. For me, a gestalt shift that makes long timelines seem plausible is to look at LLMs sort of like how you would look at a giant random forest classifier.

(Also, just to reiterate, I am not personally convinced of long timelines, I am just trying to make the best arguments for this view more easily available.)

How do you know neural networks won't use such representations?

I can't say this for sure, especially not for newer or more exotic architectures, but it certainly does not seem like these are the kinds of representations that deep learning systems are likely to learn. Rather, they seem much more likely to learn manifold-like representations, where proximity corresponds to relevant similarity, or something along those lines. Syntactically organised, combinatorial representations are certainly not very "native" to the deep learning paradigm.

It is worth clarifying that neural networks of course in principle could implement these representations, at least in the same sense as how a Boolean network can implement a Turing machine. The question is if they in practice can learn such representations in a reasonable way. Consider the example I gave with how an MLP can't learn an identity function, unless the training data essentially forces it to memorise one. The question is whether or not a similar thing is true of LoT-style representations. Can you think of a natural way to represent a LoT in a vector space, that a neural network might plausibly learn, without being "forced" by the training data?

As an extremely simple example, a CNN and an MLP will in practice not learn the same kinds of representations, even though the CNN model space is contained in the MLP model space (if you make them wide enough). How do I know that an MLP won't learn a CNN-like representation? Because these representations are not "natural" to MLPs, and the MLP will not be explicitly incentivised to learn them. My sense is that most deep learning systems are inclined away from LoT-like representations for similar reasons.
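
To illustrate the containment claim, here is a minimal sketch in plain Python (no deep learning library; all function names are hypothetical). It shows that a 1D convolution is a special case of a bias-free fully connected layer whose weight matrix has the kernel repeated along shifted rows.

```python
def conv1d(signal: list[float], kernel: list[float]) -> list[float]:
    """'Valid' 1D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv_as_dense_weights(kernel: list[float], input_size: int) -> list[list[float]]:
    """Dense weight matrix that reproduces conv1d: each row is the kernel, shifted."""
    k = len(kernel)
    rows = []
    for i in range(input_size - k + 1):
        row = [0.0] * input_size
        row[i:i + k] = kernel
        rows.append(row)
    return rows


def dense(weights: list[list[float]], signal: list[float]) -> list[float]:
    """A bias-free fully connected layer: one dot product per output unit."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
weights = conv_as_dense_weights(kernel, len(signal))
assert dense(weights, signal) == conv1d(signal, kernel)
```

The dense layer here has 15 free weights, but only a very particular setting (mostly zeros, with the same 3 values tied across rows) reproduces the convolution. Nothing in ordinary MLP training pushes the weights towards that setting, which is the sense of "not natural" above.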

What is true of human brains but not of neural networks such that human brains can do this but neural networks can't?

A human brain is not a tabula rasa system trained by gradient descent. I don't know how a human brain is organised, what learning algorithms are used, or what parts are learnt as opposed to innate, etc, but it does not seem as though it works in the same way as a deep learning system. 

What is true of human brains but not neural networks such that human brains can represent programs but neural networks can't?

(I'd note that I'm including chain-of-thought as a way that neural networks can run programs.)

Here I will again just say that a human brain isn't a tabula rasa system trained by gradient descent, so it is not inherently surprising for one of the two to have a property that the other one does not.

Chain-of-thought and attention mechanisms certainly do seem to bring deep learning systems much closer to the ability to reason in terms of variables. Whether or not it is sufficient, I do not know.

I would bet that you can play chess, but you cannot fold a protein (even if the rules for protein folding were verbally described to you). What's the difference?

Why wouldn't I be able to fold a protein? At least if the size of the relevant state space is similar to that of eg chess.

(Also, to be clear, GPT-3 struggles with verbally described mazes with as few as ~5 states.)
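
For a sense of scale, a maze of this size is a very small search problem. The sketch below uses a made-up 5-state maze (not one of the actual prompts given to GPT-3) and a standard breadth-first search.

```python
from collections import deque
from typing import Optional

# Hypothetical maze, as it might be described verbally:
# "Room A connects to B and C. B connects to D. C is a dead end.
#  D connects to E, which is the exit."
MAZE = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}


def shortest_path(start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search over the maze graph."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in MAZE[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route exists


print(shortest_path("A", "E"))  # ['A', 'B', 'D', 'E']
```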

Why doesn't this apply to humans as well? We forget stuff all the time.

The argument would have to be that humans are more strategic with what to remember, and what to forget.

Meta: A lot of this seems to have the following form:

You: Here is an argument that neural networks have property X.

Me: But that argument as you've stated it would imply that humans have property X, which is false.

You: Humans and neural networks work differently, so it wouldn't be surprising if neural networks have property X and humans don't.

I think you are misunderstanding what I am trying to do here. I'm not trying to claim that humans and neural networks will have the same properties or be identical. I'm trying to evaluate how much I should update on the particular argument you have provided. The general rule I'm following is "if the argument would say false things about humans, then don't update on it". It may in fact be the case that humans and neural networks differ on that property, but if so it will be for some other reason. There is a general catchall category of "maybe something I didn't think about makes humans and neural networks different on this property", and indeed I even assign it decently high probability, but that doesn't affect how much I should update on this particular argument.

Responding to particular pieces:

Why is that?

The rest of the comment was justifying that.

Studies such as this one also provide some evidence for this perspective.

I'm not seeing why that's evidence for the perspective. Even when word order is scrambled, if you see "= 32 44 +" and you have to predict the remaining number, you should predict some combination of 76, 12, and -12 to get optimal performance; to do that you need to be able to add and subtract, so the model presumably still develops addition and subtraction circuits. Similarly for text that involves logic and reasoning, even after scrambling word order it would still be helpful to use logic and reasoning to predict which words are likely to be present. The overall argument for why the resulting system would have strong, general capabilities seems to still go through.
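
To spell out where 76, 12, and -12 come from, here is a small sketch. It assumes the scrambled tokens came from a true equation of the form "p + q = r" with one number hidden; the function name is made up for illustration.

```python
def candidate_missing_numbers(a: int, b: int) -> set[int]:
    """All x such that {a, b, x} can be arranged into a true 'p + q = r'."""
    return {
        a + b,  # a + b = x
        a - b,  # b + x = a
        b - a,  # a + x = b
    }


print(sorted(candidate_missing_numbers(32, 44)))  # [-12, 12, 76]
```

Even with word order destroyed, narrowing the prediction down to these three candidates requires being able to add and subtract.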

In addition, I don't know why you expect that intelligence can't be implemented through "a truly massive ensemble of simple heuristics".

But I don't think many people believe that we could get AGI by making a sufficiently big random forest classifier for next-token prediction. Why is that?

Huh, really? I think that's pretty plausible, for all the same reasons that I think it's plausible in the neural network case. (Though not as likely, since I haven't seen the scaling laws for random forests extend as far as in the neural network case, and the analogy to human brains seems slightly weaker.) Why don't you think a big random forest classifier could lead to AGI?

Can you think of a natural way to represent a LoT in a vector space, that a neural network might plausibly learn, without being "forced" by the training data?

But it is "forced" by the training data? The argument here is that text prediction is hard enough that the only way the network can do it (to a very very high standard) is to develop these sorts of representation?

I certainly agree that a randomly initialized network is not going to have sensible representations, just as I'd predict that a randomly initialized human brain is not going to have sensible representations (modulo maybe some innate representations encoded by the genome). I assume you are saying something different from that but I'm not sure what.

it does not seem as though it works in the same way as a deep learning system. 

But why not? If I were to say "it seems as though the human brain works like a deep learning system, while of course being implemented somewhat differently", how would you argue against that?

Why wouldn't I be able to fold a protein? At least if the size of the relevant state space is similar to that of eg chess.

Oh, is your point "LLMs do not have a general notion of search that they can apply to arbitrary problems"? I agree this is currently true, whereas humans do have this. This doesn't seem too relevant to me, and I don't buy defining memorization as "things that are not general-purpose search" and then saying "things that do memorization are not intelligent", that seems too strong.

The argument would have to be that humans are more strategic with what to remember, and what to forget.

Do you actually endorse that response? Seems mostly false to me, except inasmuch as humans can write things down on external memory (which I expect an LLM could also easily do, we just haven't done that yet).

The general rule I'm following is "if the argument would say false things about humans, then don't update on it".

Yes, this is of course very sensible. However, I don't see why these arguments would apply to humans, unless you make some additional assumption or connection that I am not making. Considering the rest of the conversation, I assume the difference is that you draw a stronger analogy between brains and deep learning systems than I do?

I want to ask a question that goes something like "how correlated is your credence that arguments 5-10 apply to human brains with your credence that human brains and deep learning systems are analogous in important sense X"? But because I don't quite know what your beliefs are, or why you say that arguments 5-10 apply to humans, I find it hard to formulate this question in the right way.

For example, regarding argument 7 (language of thought), consider the following two propositions:

  1. Some part of the human brain is hard-coded to use LoT-like representations, and the way that these representations are updated in response to experience is not analogous to gradient descent.
  2. Updating the parameters of a neural network with gradient descent is very unlikely to yield (and maintain) LoT-like representations.

These claims could both be true simultaneously, no? Why, concretely, do you think that arguments 5-10 apply to human brains?

I'm not seeing why that's evidence for the perspective. Even when word order is scrambled, if you see "= 32 44 +" and you have to predict the remaining number, you should predict some combination of 76, 12, and -12 to get optimal performance; to do that you need to be able to add and subtract, so the model presumably still develops addition and subtraction circuits. Similarly for text that involves logic and reasoning, even after scrambling word order it would still be helpful to use logic and reasoning to predict which words are likely to be present. The overall argument for why the resulting system would have strong, general capabilities seems to still go through.

It is empirically true that the resulting system has strong and general capabilities, there is no need to question that. What I mean is that this is evidence that those capabilities are a result of information processing that is quite dissimilar from what humans do, which in turn opens up the possibility that those processes could not be re-tooled to create the kind of system that could take over the world. In particular, they could be much more shallow than they seem.

It is not hard to argue that a model with general capabilities for reasoning, hypothesis generation, and world modelling, etc, would get a good score at the task of an LLM. However, I think one of the central lessons from the history of AI is that there probably also are many other ways to get a good score at this task.

In addition, I don't know why you expect that intelligence can't be implemented through "a truly massive ensemble of simple heuristics".

Given a sufficiently loose definition of "intelligence", I would expect that you almost certainly could do this. However, if we instead consider systems that would be able to overpower humanity, or very significantly shorten the amount of time before such a system could be created, then it is much less clear to me.
 

Why don't you think a big random forest classifier could lead to AGI?

I don't rule out the possibility, but it seems unlikely that such a system could learn representations and circuits that would enable sufficiently strong out-of-distribution generalisation.

But it is "forced" by the training data? The argument here is that text prediction is hard enough that the only way the network can do it (to a very very high standard) is to develop these sorts of representation?

I think this may be worth zooming in on. One of the main points I'm trying to get at is that it is not just the asymptotic behaviour of the system that matters; two other (plausibly connected) things which are at least as important are how well the system generalises out-of-distribution, and how much data it needs to attain that performance. In other words, how good it is at extrapolating from observed examples to new situations. A system could be very bad at this, and yet eventually with enough training data get good in-distribution performance.

The main point of LoT-like representations would be a better ability to generalise. This benefit is removed if you could only learn LoT-like representations by observing training data corresponding to all the cases you would like to generalise to.

I certainly agree that a randomly initialized network is not going to have sensible representations, just as I'd predict that a randomly initialized human brain is not going to have sensible representations (modulo maybe some innate representations encoded by the genome). I assume you are saying something different from that but I'm not sure what.

Yes, I am not saying that.

Maybe if I rephrase it this way; to get us to AGI, LLMs would need to have a sufficiently good inductive bias, but I'm not convinced that they actually have a sufficiently good inductive bias.

But why not? If I were to say "it seems as though the human brain works like a deep learning system, while of course being implemented somewhat differently", how would you argue against that?

It is hard for me to argue against this, without knowing in more detail what you mean by "like", and "somewhat differently", as well as knowing what pieces of evidence underpin this belief/impression.

I would be quite surprised if there aren't important high-level principles in common between deep learning and at least parts of the human brain (it would be a bit too much of a coincidence if not). However, this does not mean that deep learning (in its current form) captures most of the important factors behind human intelligence. Given that there are both clear physiological differences (some of which seem more significant than others) and many behavioural differences, I think that the default should be to assume that there are important principles of human cognition that are not captured by (current) deep learning.

I know several arguments in favour of drawing a strong analogy between the brain and deep learning, and I have arguments against those arguments. However, I don't know if you believe in any of these arguments (eg, some of them are arguments like "the brain is made out of neurons, therefore deep learning"), so I don't want to type out long replies before I know why you believe that human brains work like deep learning systems.

Oh, is your point "LLMs do not have a general notion of search that they can apply to arbitrary problems"? I agree this is currently true, whereas humans do have this. This doesn't seem too relevant to me, and I don't buy defining memorization as "things that are not general-purpose search" and then saying "things that do memorization are not intelligent", that seems too strong.

Yes, that was my point. I'm definitely not saying that intelligence = search, I just brought this up as an example of a case where GPT3 has an impressive ability, but where the mechanism behind that ability is better construed as "memorising the training data" rather than "understanding the problem". The fact that the example involved search was coincidental.

Do you actually endorse that response? Seems mostly false to me, except inasmuch as humans can write things down on external memory (which I expect an LLM could also easily do, we just haven't done that yet).

I don't actually know much about this, but that is the impression I have got from speaking with people who work on this. Introspectively, it also feels like it's very non-random what I remember. But if we want to go deeper into this track, I would probably need to look more closely at the research first.

However, I don't see why these arguments would apply to humans

Okay, I'll take a stab at this.

6. Word Prediction is not Intelligence

"The kinds of humans that we are worried about are the kinds of humans that can do original scientific research and autonomously form plans for taking over the world. Human brains learn to take actions and plans that previously led to high rewards (outcomes like eating food when hungry, having sex, etc)*. These two things are fundamentally not the same thing. Why, exactly, would we expect that a system that is good at the latter necessarily would be able to do the former?"

*I expect that this isn't a fully accurate description of human brains, but I expect that if we did write the full description the argument would sound the same.

7. The Language of Thought

"This, in turn, suggests a data structure that is discrete and combinatorial, with syntax trees, etc, and humans do (according to the argument) not use such representations. We should therefore expect humans to at some point hit a wall or limit to what they are able to do."

(I find it hard to make the argument here because there is no argument -- it's just flatly asserted that neural networks don't use such representations, so all I can do is flatly assert that humans don't use such representations. If I had to guess, you would say something like "matrix multiplications don't seem like they can be discrete and combinatorial", to which I would say "the strength of brain neuron synapse firings doesn't seem like it can be discrete and combinatorial".)

8. Programs vs Circuits

We know, from computer science, that it is very powerful to be able to reason in terms of variables and operations on variables. It seems hard to see how you could have human-level intelligence without this ability. However, humans do not typically have this ability, with most human brains instead being more analogous to Boolean circuits, given their finite size and architecture of neuron connections.

9. Generalisation vs Memorisation

In this one I'd give the protein folding example, but apparently you think you'd be able to fold proteins just as well as you'd be able to play chess if they had similar state space sizes, which seems pretty wild to me.

Do you perhaps agree that you would have a hard time navigating in a 10-D space? Clearly you have simply memorized a bunch of heuristics that together are barely sufficient for navigating 3-D space, rather than truly understanding the underlying algorithm for navigating spaces.

10. Catastrophic Forgetting

(Discussed previously, I think humans are not very deliberate / selective about what they do / don't forget, except when they use external tools.)

In some other parts, I feel like in many places you are being one-sidedly skeptical, e.g.

  • "In particular, they could be much more shallow than they seem."
    • They could also be much more general than they seem.
  • I don't rule out the possibility, but it seems unlikely that such a system could learn representations and circuits that would enable sufficiently strong out-of-distribution generalisation.
    • Perhaps it would enable even stronger OOD generalisation than we have currently.

There could be good reasons for being one-sidedly skeptical, but I think you need to actually say what the reasons are. E.g. I directionally agree with you on the random forests case, but my reason for being one-sidedly skeptical is "we probably would have noticed if random forests generalized better and used them instead of neural nets, so probably they don't generalize better". Another potential argument is "decision trees learn arbitrary piecewise linear decision boundaries, whereas neural nets learn manifolds, reality seems more likely to be the second one" (tbc I don't necessarily agree with this).

""
The kinds of humans that we are worried about are the kinds of humans that can do original scientific research and autonomously form plans for taking over the world. Human brains learn to take actions and plans that previously led to high rewards (outcomes like eating food when hungry, having sex, etc)*. These two things are fundamentally not the same thing. Why, exactly, would we expect that a system that is good at the latter necessarily would be able to do the former?
""

This feels like a bit of a digression, but we do have concrete examples of systems that are good at eating food when hungry, having sex, etc, without being able to do original scientific research and autonomously form plans for taking over the world: animals. And the difference between humans and animals isn't just that humans have more training data (or even that we are that much better at survival and reproduction in the environment of evolutionary adaptation). But I should also note that I consider argument 6 to be one of the weaker arguments I know of.

""
We know, from computer science, that it is very powerful to be able to reason in terms of variables and operations on variables. It seems hard to see how you could have human-level intelligence without this ability. However, humans do not typically have this ability, with most human brains instead being more analogous to Boolean circuits, given their finite size and architecture of neuron connections.
""

The fact that human brains have a finite size and architecture of neuron connections does not mean that they are well-modelled as Boolean circuits. For example, a (real-world) computer is better modelled as a Turing machine than as a finite-state automaton, even though there is a sense in which it actually is a finite-state automaton.
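As a rough illustration of why the choice of model matters, consider checking whether parentheses are balanced. The sketch below is a hypothetical example: the function names and the depth bound are assumptions made for illustration. One checker is written as a program with a counter variable, the other as a finite-state recogniser with a fixed number of states.

```python
def balanced_with_counter(text):
    """Program-like checker: one counter variable, no fixed bound on nesting depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a closing bracket with nothing to close
    return depth == 0

def make_bounded_checker(max_depth):
    """Finite-state recogniser: states 0..max_depth, rejecting anything deeper."""
    def check(text):
        state = 0
        for ch in text:
            if ch == "(":
                if state == max_depth:
                    return False  # out of states, so deeper nesting cannot be tracked
                state += 1
            elif ch == ")":
                if state == 0:
                    return False
                state -= 1
        return state == 0
    return check

if __name__ == "__main__":
    shallow = "(()())"
    deep = "(" * 10 + ")" * 10
    checker = make_bounded_checker(3)
    print(balanced_with_counter(shallow), checker(shallow))
    print(balanced_with_counter(deep), checker(deep))
```

Strictly speaking, the counter version also runs on a machine with finite memory, so it too is a finite-state system. It is still more usefully described as a program operating on a variable, because its behaviour on deeper inputs follows from the same rule rather than from additional states.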

The brain is made out of neurons, yes, but it matters a great deal how those neurons are connected. Depending on the answer to that question, you could end up with a system that behaves more like Boolean circuits, or more like a Turing machine, or more like something else.

With neural networks, the training algorithm and the architecture together determine how the neurons end up connected, and therefore whether the resulting system is better thought of as a Boolean circuit, or a Turing machine, or something else. If the wiring of the brain is determined by a different mechanism than what determines the wiring of a deep learning system, then the two systems could end up with very different properties, even if they are made out of similar kinds of parts.

With the brain, we don't know what determines the wiring. This makes it difficult to draw strong conclusions about the high-level behaviour of brains from their low-level physiology. With deep learning, it is easier to do this.

""
I find it hard to make the argument here because there is no argument -- it's just flatly asserted that neural networks don't use such representations, so all I can do is flatly assert that humans don't use such representations. If I had to guess, you would say something like "matrix multiplications don't seem like they can be discrete and combinatorial", to which I would say "the strength of brain neuron synapse firings doesn't seem like it can be discrete and combinatorial".
""

What representations you end up with does not just depend on the model space, it also depends on the learning algorithm. Matrix multiplications can be discrete and combinatorial. The question is whether those are the kinds of representations that you would in fact end up with when you train a neural network by gradient descent, which to me seems unlikely. The brain does (most likely) not use gradient descent, so the argument does not apply to the brain.
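A minimal sketch of the first half of this claim: with hand-chosen integer weights and hard thresholds, two matrix multiplications compute XOR exactly. The weights, thresholds, and function names here are assumptions set by hand for illustration. Nothing is learned, so the sketch says nothing about which representations gradient descent would actually find.

```python
def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def step(values, thresholds):
    """Hard threshold: output 1 where the value reaches its threshold, else 0."""
    return [1 if v >= t else 0 for v, t in zip(values, thresholds)]

# Hidden layer (hand-set): unit 0 fires on OR, unit 1 fires on AND.
W1 = [[1, 1],
      [1, 1]]
T1 = [1, 2]

# Output layer (hand-set): OR minus AND, which is XOR.
W2 = [[1, -1]]
T2 = [1]

def xor_net(a, b):
    hidden = step(matvec(W1, [a, b]), T1)
    return step(matvec(W2, hidden), T2)[0]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

The model space therefore contains discrete, gate-like solutions. The open question in the paragraph above is about the learning algorithm, not about what the architecture can express.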

""
Do you perhaps agree that you would have a hard time navigating in a 10-D space? Clearly you have simply memorized a bunch of heuristics that together are barely sufficient for navigating 3-D space, rather than truly understanding the underlying algorithm for navigating spaces.
""

It would obviously be harder for me to do this, and narrow heuristics are obviously an important part of intelligence. But I could do it, which suggests a stronger transfer ability than what would be suggested if I couldn't do this.

""
In some other parts, I feel like in many places you are being one-sidedly skeptical.
"" 

Yes, as I said, my goal with this post is not to present a balanced view of the issue. Rather, my goal is just to summarise as many arguments as possible for being skeptical of strong scaling. This makes the skepticism one-sided in some places.

As a person with very short timelines, pure strong scaling was never what I expected. I expect somewhat strong scaling combined with a wider range of algorithms besides unsupervised.