Epistemic status: some of the technological progress parts of this I've been thinking about for many years, other more LLM-specific parts I have been thinking about for months or days.

TL;DR An extremely high capacity LLM trained on extremely large amounts of content from humans will simulate human content extremely accurately, but won't simulate content from superhumans. I discuss how we might try to change this, and show that this is likely to be an inherently slow process with unfavorable scaling power laws. This might make a fast take-off difficult for any AI based on LLMs.

LLMs are trained as simulators for token-generating processes, which (in any training set derived from the Internet) are generally human-like or human-derived agents. The computational capacity of these simulated agents is bounded above by the forward-pass computational capacity of the LLM, but is not bounded below. An extremely large LLM could, and frequently will, produce an exquisitely accurate portrayal of a very average human: a sufficiently powerful LLM may be very superhuman at the task of simulating normal humans with an IQ of around 100, far better at it than any human writer or improv actor, but whatever average human it simulates won't be superhuman, and that capability is not FOOM-making material.

Suppose we had an LLM whose architecture and size were computationally capable, in a single forward pass, of doing a decent simulation of a human with an IQ of, let's say, 1000 (to the extent that an IQ that high is even a meaningful concept: let's make this number better defined by also assuming that this is 10 times the forward-pass computational capacity needed to do a decent simulation of an IQ 100 human). In its foundation model form, this LLM is never going to actually simulate someone with IQ ~1000, since its pretraining distribution contains absolutely no text generated by humans with an IQ of ~1000 (or even IQ over 200): this behavior is way, way out-of-distribution. Now, the Net (and presumably the training set derived from it for this LLM) does contain plenty of text generated very slowly and carefully with many editing passes and much debate by groups of people in the IQ ~100 to ~145 range, such as Wikipedia and scientific papers, so we would reasonably expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing. This is useful, valuable, and impressive, and might well help somewhat with the early stages of a FOOM, but it's still not the same thing as actually simulating agents with IQ 1000, and it's not going to get you to a technological singularity: at some point, those sorts of capabilities will top out.

But that's just the foundation model. Next, the model presumably gets tuned and prompted somehow to get it to simulate (hopefully well-aligned) smarter human-like agents, outside the pretraining distribution. A small (gaussian-tail-decreasing) amount of pretraining text from humans with IQs up to ~160 (around four standard deviations above the mean) is available, and let us assume that very good use is made of it during this extrapolation process. How far would that let the model accurately extrapolate out-of-distribution, past where it has basically any data at all: to IQ 180, probably; 200, maybe; 220, perhaps?
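To put rough numbers on how quickly that Gaussian tail thins out, here is a minimal sketch. It assumes IQ is exactly normally distributed with mean 100 and standard deviation 15 (the convention under which IQ 160 sits four standard deviations above the mean); the function name `fraction_above_iq` is made up for this illustration, and real IQ distributions are unlikely to be this clean so far from the mean.

```python
import math


def fraction_above_iq(iq, mean=100.0, sd=15.0):
    """Fraction of a normal(mean, sd) population scoring above `iq`.

    Assumes an idealized Gaussian; this is an illustration, not a
    claim about real-world test score distributions.
    """
    z = (iq - mean) / sd
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for iq in (115, 130, 145, 160):
        frac = fraction_above_iq(iq)
        print(f"IQ > {iq}: roughly 1 person in {1.0 / frac:,.0f}")
```

Under these assumptions, each extra standard deviation costs a much larger factor in available authors than the one before, which is why the text-per-IQ-level supply collapses well before IQ 160.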

If Hollywood is a good guide, IQ 80-120 humans are abysmal at extrapolating what a character with IQ 160 would do: any time a movie character is introduced as a genius, there is a very predictable set of mistakes that you instantly know they're going to make during the movie (unless their doing so would actively damage the plot). With the debatable exceptions of Real Genius and I.Q., movie portrayals of geniuses are extremely unrealistic. Yet most people still enjoy watching them. Big Bang Theory was probably the first mass media to accurately portray rather smart people (even if it still had a lot of emphasis on their foibles), and most non-nerds didn't react like this was particularly new, original, or different.

Hollywood/TV aside, how hard is it to extrapolate the behavior of a higher intelligence? Some things, what one might call zeroth-order effects, are obvious. The smarter system can solve problems the dumber one could usually solve, but faster and more accurately: what one might describe as "more-of-the-same" effects, which are pretty easy to predict. There are also what one might call first-order effects: the smarter system has a larger vocabulary and has learnt more skills (having made smarter use of available educational facilities and time). It can pretty reliably solve problems that the dumber one can only solve occasionally. These are what one might call "like that but more so" effects, and are still fairly easy to predict. Then there are what one might call second-order effects: certain problems that the dumber system had essentially zero chance of solving, the smarter system can sometimes solve: it has what in the AI business are often called "emergent abilities". These are frequently hard to predict, and especially so if you've never seen any systems that smart before. [There is some good evidence that using metrics that effectively have skill thresholds built into them greatly exaggerates the emergentness of new behaviors, and that on more sensible metrics almost all new behaviors emerge slowly with scale. Nevertheless, there are doubtless things that anyone with IQ below 180 just has 0% probability of achieving, these are inherently hard to predict if you've never seen any examples of them, and they may be very impactful even if the smarter system's success chance for them is still quite small: genius is only 1% inspiration, as Edison pointed out.] Then there are the third-order consequences of its emergent abilities: those emergent abilities combining and interacting with each other and with all of its existing capabilities in non-obvious ways, which are even harder to predict.
Then there are fourth-and-higher order effects: to display the true capabilities of the smarter system, we need to simulate not just one that spent its life as a lonely genius surrounded by dumber systems, but instead one that grew up in a society of equally-smart peers, discussing ideas with them and building on the work of equally-smart predecessors, educated by equally-smart teachers using correspondingly sophisticated educational methods and materials.

So I'm not claiming that doing a zeroth-order or even first-order extrapolation up to IQ 1000 is very hard. But I think that adding in the second, third, fourth, and fifth-plus-order effects to that extrapolation is increasingly hard, and I think those higher order effects are large, not small, in importance compared to the zeroth and first-order terms. Someone who can do what an IQ 100 person can do but at 10x the speed while using 10x the vocabulary and with a vanishingly small chance of making a mistake is nothing like as scary as an actual suitably-educated IQ 1000 hypergenius standing on the shoulders of generations of previous hypergeniuses.

Let's be generous, and say that a doable level of extrapolation from IQ 80-160 gets the model to be able to reasonably accurately simulate human-like agents with an IQ all the way up to maybe about 240. At this point, I can only see one reasonable approach to get any further: you need to have these IQ 240 agents generate text. Lots and lots of text. As in, at a minimum, of the order of an entire Internet's worth of text. Probably more, since IQ 240 behavior is more complex and almost certainly needs a bigger data set to pin it down. After that, we need to re-pretrain our LLM on this new training set.

[I have heard it claimed, for example by Sam Altman during a public interview, that a smarter system wouldn't need anything like as large a dataset to learn from as LLMs currently do. Humans are often given as an existence proof: we observably learn from far fewer text/speech tokens than a less capable LLM does. Of course, the number of non-text token-equivalents from vision, hearing, touch, smell, body position and all our other senses we learn from is less clear, and could easily be a couple-of-orders-of-magnitude larger than our text and speech input. However, humans are not LLMs and we have a lot more inbuilt intuitive biases from our genome. We have thousands of different types of neurons, let alone combinations of them into layers, compared to a small handful for a transformer. While much of our recently-evolved overinflated neocortex has a certain 'bitter-lesson-like' "just scale it!" look to it, the rest of our brain looks very much like a large array of custom-evolved purpose-specific modules all wired together in a complicated arrangement: a most un-bitter-lesson-like design. The bitter lesson isn't about what gives the most efficient use of processing power, it's about what allows the fastest rate of technological change: less smart engineering and more raw data and processing power. Humans are also, as Eliezer has pointed out, learning to be one specific agent of our intelligence level, not how to simulate any arbitrary member of an ensemble of them. As for the LLMs, it's the transformer model that is being pretrained, not the agents it can simulate. LLMs don't use higher-order logic: they use stochastic gradient descent to learn to simulate systems that can do higher-order logic. Their learning process doesn't get to apply the resulting higher-order logic to itself, by any means more sophisticated than SGD descending the gradient curve of its outcome to a closer match to whatever answers are in the pretraining set.
So I see no reason to expect that the scaling laws for LLMs are going to suddenly and magically dramatically improve to more human-like dataset sizes as our LLMs "get smarter". Your mileage may of course vary, but I think Sam Altman was either being extremely optimistic, prevaricating, or else expects to stop using LLMs at some point. This does suggest that there could be a capability overhang, telling us that LLMs are not computationally efficient, or at least not data-efficient: they're just efficient in a bitter-lesson sense, as the quickest technological shortcut to building a brain, a lot faster than whole brain emulation or reverse engineering the human brain, but quite possibly less efficient, or at very least less data-efficient.]

If that's the case, then as long as we're using LLMs, the Chinchilla scaling laws will continue to apply, unless and until they taper off into something different (by Sod's law, probably worse). An LLM capable of simulating IQ 240 well is clearly going to need at least 2.4 times as many parameters as one for IQ 100 (I suspect it might be more, but I can't prove it, so let's be generous and assume I'm wrong here). So by Chinchilla, that means we're going to need 2.4 times as large a training set generated at IQ ~240 as the IQ ~100 Internet we started off with. So 2.4 times as much content, all generated by things with 2.4 times the inherent computational cost, for a total cost of $2.4^2 \approx 5.8$ times creating the Internet. [And that's assuming the simulation doesn't require any new physical observations of the world, just simulated thinking time, which seems highly implausible to me: some of those IQ 240 simulations will be of scientists writing papers about the experiments they performed, which will actually need to be physically performed for the outputs to be any use.]

On top of the fact that our first internet was almost free (all we had to do was spider and filter it, rather than simulate the writing of it), that's a nasty power law. We're going to need to do this again, and again, at more stages on the way to IQ ~1000, and each time we increase the IQ by a factor of $k$, the cost of doing this goes up by $k^2$ [again, ignoring physical experiment costs].
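The arithmetic above can be sketched in a few lines. This assumes, as the argument does, that both the required token count and the per-token generation cost scale linearly with the IQ ratio, so the cost of a regenerated corpus relative to the original Internet is the square of that ratio. The intermediate stage at IQ 500 is a hypothetical waypoint chosen only for illustration, and `relative_corpus_cost` is a made-up name.

```python
def relative_corpus_cost(k):
    """Cost of a regenerated training corpus, relative to the baseline one.

    Assumption taken from the argument above: a k-fold smarter model
    needs k times the tokens, and each token costs k times as much
    compute to generate, so the cost ratio is k * k.
    """
    tokens_multiplier = k
    cost_per_token_multiplier = k
    return tokens_multiplier * cost_per_token_multiplier


if __name__ == "__main__":
    baseline_iq = 100
    # Hypothetical waypoints on the way to IQ ~1000.
    for target_iq in (240, 500, 1000):
        k = target_iq / baseline_iq
        print(f"IQ {target_iq}: about {relative_corpus_cost(k):.1f}x "
              f"the cost of writing the original Internet")
```

Because the ratio is taken against the original corpus, the per-stage multipliers compound to the same total regardless of how many intermediate stages are used; what adds up is the sum of all the intermediate corpora that had to be generated along the way.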

Now, let's remove the initial rhetorical assumption that we have an LLM much more powerful than we need, and look at this more realistically as part of an actual process of technological development, one that needs to be repeated every time our LLM-simulated agents get $k$-fold smarter. The "create, and then re-pretrain from a bigger Internet" requirement remains, and the computational cost of doing this still goes up by $k^2$. That's not an encouraging formula for FOOM: that looks more like a formula for a subexponential process where each generation takes $k$ times longer than the last (on the assumption that our computational power had gone up $k$-fold, enough to run our smarter agents at the same speed).
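A toy schedule makes the shape of that process easier to see. The sketch below assumes each generation multiplies capability by k, costs k squared times as much compute as the previous one, and runs on hardware that is k times faster, so its wall-clock duration is k times the previous generation's. The choice k = 2 and the function name `generation_schedule` are illustrative only.

```python
def generation_schedule(k, generations, first_generation_time=1.0):
    """Return (elapsed_time, capability) after each generation.

    Toy model: capability multiplies by k per generation, while the
    wall-clock duration of each generation is k times the last one
    (k**2 more compute needed, k-fold more compute available).
    """
    schedule = []
    capability = 1.0
    duration = first_generation_time
    elapsed = 0.0
    for _ in range(generations):
        elapsed += duration
        capability *= k
        schedule.append((elapsed, capability))
        duration *= k
    return schedule


if __name__ == "__main__":
    for elapsed, capability in generation_schedule(k=2.0, generations=6):
        print(f"t = {elapsed:6.1f}  capability = {capability:6.1f}x")
```

In this toy model, capability ends up roughly proportional to elapsed time: steady growth, but nothing like a curve that blows up in finite time.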

[Is a $k$-fold improvement in computational capacity between generations a plausible assumption, if we get to rebuild our processing hardware each time our agents get $k$-fold smarter? At first, almost certainly not: things like computational capacity normally tend to go up exponentially with technological generations, which is what a $k$-fold increase in IQ should empower, so as $e^{ck}$ for some constant $c$. With that in the divisor of the time between generations, a fiddling little $k^2$ in the numerator isn't going to stop the process being superexponential. However, I suspect, for fairly simple physical reasons, that processing power per atom of computronium at normal temperatures has a practical maximum, and speed of light traveling between atoms limits how fast you can run something of a given complexity, so the only way to continually geometrically increase your processing power is to geometrically increase the proportion of the planet (or solar system) that you've turned into computronium and its power supplies (just like humans have been doing for human brain computronium), which in turn has practical limits: large ones, but ones a geometrical process could hit soon. Sufficiently exotic not-ordinary-matter forms of computronium might modify this, but repeating this trick each technological generation is likely to be hard, and this definitely isn't a type of FOOM that I'd want to be on the same planet as. Once you start capping out near the theoretical limits of the processing power of ordinary matter for your computronium, and you've picked all the low-hanging fruit on algorithmic speedups, progress isn't going to stay exponential, and I find it really hard to predict what power law it might asymptote towards: you're left with algorithmic speedups from better organizing your hierarchical speed-of-light-capped data-flows. A case could be made that fundamental limits are limits, and that it asymptotes to O(1), but that feels a bit too pessimistic to me.
So a $k^2$ in the numerator may matter, or it may still be negligible, and most of my uncertainty here is on the power law of the denominator. For now, I'm going to very arbitrarily assume that it asymptotes to $k$, which is enough for the overall process to be superexponential before accounting for the numerator, but subexponential afterwards. That happens to be the power law that lets us run our $k$-fold smarter agents at the same speed, despite their increased complexity. Yes, I'm cherry-picking the exponent of a hard-to-predict power law in order to get an interesting result: please note the word 'May' in the title of the article.]
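How much the numerator matters depends entirely on what is assumed for the denominator, which a small comparison can illustrate. The sketch below takes the cost per generation to grow as k squared and divides it by three candidate compute-growth assumptions discussed in this aside: no growth at all, k-fold growth, and exponential growth. The constant c = 1 in the exponential case and the value k = 2 are arbitrary illustrative choices, and `time_ratio` is a made-up name.

```python
import math


def time_ratio(k, compute_growth):
    """Ratio of one generation's wall-clock time to the previous one's.

    Numerator: compute cost grows as k**2 per generation.
    Denominator: available compute grows by compute_growth(k).
    A ratio above 1 means generations slow down; below 1, they speed up.
    """
    return k ** 2 / compute_growth(k)


# Candidate assumptions for how compute scales per generation.
ASSUMPTIONS = {
    "O(1): hard physical limit reached": lambda k: 1.0,
    "k-fold: the arbitrary pick made here": lambda k: k,
    "exp(c*k) with c = 1 (illustrative)": lambda k: math.exp(k),
}

if __name__ == "__main__":
    k = 2.0
    for name, growth in ASSUMPTIONS.items():
        ratio = time_ratio(k, growth)
        trend = "slows down" if ratio > 1 else "speeds up"
        print(f"{name}: ratio {ratio:.2f}, each generation {trend}")
```

Only the exponential case keeps generations getting shorter at this k; with a small enough c even that can flip, which is one way of seeing why the uncertainty sits mostly in the denominator.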

The standard argument for the likelihood of FOOM and the possibility of a singularity is the claim that technological progress is inherently superexponential, inherently a J-shaped curve: that progress shortens the doubling time to more progress. If you look at the history of Homo sapiens' technology from the origin of our species to now, it definitely looks like a superexponential J-shaped curve. What's less clear is whether it still looks superexponential if you divide the rate of technological change by the human population at the time, or equivalently use cumulative total human lifespans rather than time as the x-axis. My personal impression is that if you do that then it looks like the total number of human lifespans between different major technological advances is fairly constant, and it's a boring old exponential curve. If I'm correct, then the main reason for Homo sapiens' superexponential curve is that technological improvements also enlarge the human population-carrying capacity of the Earth, and improve our communication enough to handle this, letting us do more invention work in parallel. So I'm not entirely convinced that technological change is in fact inherently superexponential, short of dirty (or at least ecologically unsound) tricks like that, which might-or-might-not be practicable for an ASI trying to FOOM to replicate. [Of course, Homo sapiens wasn't actually getting smarter, only more numerous, better educated and better interconnected, and that could well make a difference.]

However, even if I'm wrong and technology inherently is a superexponential process, this sort of $k^2$ power law is a plausible way to convert a superexponential back to an exponential or even subexponential. Whether this happens depends on just how superexponential your superexponential is: so that means the expected FOOM may, or may not, instead be just a rising curve with no singularity within any finite time.

Now, one argument here would be that this is telling us that LLMs are inefficient, our AIs need to switch to building their agent minds directly, and this is just a capacity overhang. But I think the basic argument still applies, even after doing this: something $k$ times smarter needs $k$ times the processing power to run. But to reach its full capability, it also needs suitable cultural knowledge as developed by things as smart as it. That will be bigger, by some power of $k$, call it $k^{\alpha}$, than the cultural knowledge needed by the previous generation. So the total cost of generating that knowledge goes up as $k^{\alpha+1}$. I'm pretty sure $\alpha$ will be around 1 to 2, so $\alpha+1$ is in the range around 2 to 3. So changing to a different architecture still doesn't get rid of the unhelpful power law. Chinchilla is a general phenomenon: to reach their full potential, smarter things need more training data, and the cost of creating that (and training on it) goes up as the product of the two.
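The architecture-independent version of the argument can be written out the same way. In the sketch below, `alpha` is simply a label chosen here for the unnamed exponent describing how much more cultural knowledge a k-fold smarter agent needs; the per-unit generation cost is assumed to scale linearly with k, as above, so the total cost exponent is alpha plus one. The sample values of alpha and the step k = 2.4 are illustrative.

```python
def knowledge_cost_multiplier(k, alpha):
    """Cost multiplier for regenerating cultural knowledge after a k-fold step.

    Assumptions: the amount of knowledge needed grows as k**alpha, and
    each unit costs k times as much to produce, giving k**(alpha + 1).
    `alpha` is a label introduced for this sketch.
    """
    knowledge_multiplier = k ** alpha
    cost_per_unit_multiplier = k
    return knowledge_multiplier * cost_per_unit_multiplier


if __name__ == "__main__":
    k = 2.4
    for alpha in (1.0, 1.5, 2.0):
        print(f"alpha = {alpha}: cost grows {knowledge_cost_multiplier(k, alpha):.1f}x "
              f"(exponent {alpha + 1})")
```

Setting alpha to 1 recovers the Chinchilla-style square law used earlier; anything larger only makes each step more expensive.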

So, I'm actually somewhat dubious that FOOM or a singularity is possible at all, for any cognitive architecture, given finite resource limits, once your computronium efficiency starts to max out. But it definitely looks harder with LLM scaling laws.

So, supposing that all this hypothesizing were correct, then what would this mean for ASI timelines? It doesn't change timelines until a little after transformative AGI is achieved. But it would mean that the common concern that AGI might be followed only a few years or even months later by a FOOM singularity was mistaken. If so, then we would find ourselves able to cheaply apply many (hopefully well-aligned) agents that were perhaps highly superhuman in certain respects, but that overall were merely noticeably smarter than the smartest human who has ever lived, to any and all problems we wanted solved. The resulting scientific and technological progress and economic growth would clearly be very fast. But the AIs may tell us that, for reasons other than simple processing power, creating an agent much smarter than them is a really hard problem. Or, more specifically, that building it isn't that hard, but preparing the syllabus for properly training/educating it is. They're working on it, but it's going to take even them a while. Plus, the next generation after that will clearly take longer, and the one after that longer still, unless we want to allow them to convert a growing number of mountain ranges into computronium and deserts into solar arrays. Or possibly the moon.

Am I sure enough of all this to say "Don't worry, FOOM is impossible, there definitely will not be a singularity?" No. In addition to the uncertainty about how effective processing power per atom asymptotes, Grover's algorithm running on quantum hardware might change the power laws involved just enough to make things superexponential again, say by shifting  to . Or we might well follow the "Computronium/population growth? Sure!" path to a J-shaped curve, at least until we've converted the Solar System into a Dyson swarm. However, this argument has somewhat reduced my . Or at least my .

PostScript Edit: Given what I've been reading due to recent speculations around Q* since I wrote this post, plus some of the comments below, I now want to add a significant proviso to it. There are areas, such as Mathematics, physical actions in simulated environments, and perhaps also coding, where it's possible to get rapid and reliable objective feedback on correctness at not-exorbitant costs. For example, in Mathematics, systems such as automated proof checkers can check sufficiently detailed mathematical proofs (written in Lean or some equivalent language), and, as sites like HackerRank demonstrate, automated testing of solutions to small software problems can also be achieved. So in areas like these where you can arrange to get rapid, accurate feedback, automated generation of high-quality synthetic training data to let you rapidly scale performance up to far superhuman levels may be feasible. The question is, can this be extended to a wide-enough variety of different training tasks to cover full AGI, or at least, to cover all the STEM skills needed to go FOOM. I suspect the answer is no, but it's a thought-provoking question. Alternatively, we might have IQ 1000 AI mathematicians a long time before we have the same level of performance in fields like science where verifying that research is correct and valuable takes a lot longer.
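The asymmetry this proviso relies on, that checking an answer can be far cheaper than producing one, can be shown with a deliberately tiny toy. In the sketch below a weak random "generator" proposes factorizations of small integers and a trivial verifier keeps only the correct ones; every name here (`propose`, `verify`, `make_verified_examples`) is hypothetical, and the task is a stand-in for proof checking or unit-tested code, not a model of either.

```python
import random


def verify(n, factors):
    """Cheap, exact check: both factors non-trivial and their product is n."""
    a, b = factors
    return a > 1 and b > 1 and a * b == n


def propose(n, rng):
    """Weak generator: guess one factor at random (requires n >= 3)."""
    a = rng.randint(2, n - 1)
    return (a, n // a)


def make_verified_examples(numbers, attempts_per_number, seed=0):
    """Keep only proposals the verifier accepts, as synthetic training pairs."""
    rng = random.Random(seed)
    dataset = []
    for n in numbers:
        for _ in range(attempts_per_number):
            candidate = propose(n, rng)
            if verify(n, candidate):
                dataset.append((n, candidate))
                break  # one verified example per problem is enough here
    return dataset


if __name__ == "__main__":
    examples = make_verified_examples([13, 15, 21, 35], attempts_per_number=50)
    for n, (a, b) in examples:
        print(f"{n} = {a} * {b}")
```

Every pair that reaches the dataset is guaranteed correct no matter how poor the generator is, which is the property that makes verifier-filtered synthetic data attractive; the open question raised above is how many useful domains admit a verifier this cheap.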


so we would reasonable expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing


... so we would reasonably expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing

Much appreciated! Fixed

I found this really useful, particularly the start. I don't think it matters very much how fast AGI progresses after it's exceeded human intelligence by a significant amount, because we will have lost control of the world to it the first time an AGI can outsmart the relevant humans. Slow progression in the parahuman range does make a difference, though - it makes the outcome of that battle of wits less certain, and may offer us a warning shot of dangerous AGI behavior.

I think there are other routes to improving LLM performance by scaffolding them with other tools and cognitive algorithms. LLMs are like a highly capable System 1 in humans - they're what we'd say on first thought. But we improve our thinking by internal questioning, when the answer is important. I've written about this in Capabilities and alignment of LLM cognitive architectures https://www.lesswrong.com/posts/ogHr8SvGqg9pW5wsT/capabilities-and-alignment-of-llm-cognitive-architectures. I'm very curious what you think about this route to improving LLMs' effective intelligence.

This approach can also make the LLM agentic. The standard starting prompt is "create a plan that accomplishes X". That's scary, but also an opportunity to include alignment goals in natural language. I think this is our most likely route to AGI and also our best shot at successful alignment (of approaches proposed so far).

I agree that, in the context of an agent built from an LLM and scaffolding (such as memory and critic systems), the LLM is analogous to the human System 1. But in general, LLMs' capability profiles are rather different from those of humans (for example, no human is as well-read as even GPT-3.5, while LLMs have specific deficiencies that we don't, for example around counting, character/word representations of text, instruction following, and so forth). So the detailed "System 1" capabilities of such an architecture might not look much like human System 1 capabilities, especially if the LLM was dramatically larger than current ones. For example, for a sufficiently large LLM trained using current techniques I'd expect "produce flawless well-edited text, at a quality that would take a team of typical humans days or weeks" to be a very rapid System 1 activity.

But no model of a human mind on its own could really predict the tokens LLMs are trained on, right? Those tokens are created not only by humans, but by the processes that shape human experience, most of which we barely understand. To really accurately predict an ordinary social media post from one year in the future, for example, an LLM would need superhuman models of politics, sociology, economics, etc. To very accurately predict an experimental physics or biology paper, an LLM might need superhuman models of physics or biology. 

Why should these models be limited to human cultural knowledge? The LLM isn't predicting what a human would predict about politics or physics; it's predicting what a human would experience- and its training gives it plenty of opportunity to test out different models and see how predictive they are of descriptions of that experience in its data set.

How to elicit that knowledge in conversational text? Why not have the LLM predict tokens generated by itself? An LLM with a sufficiently accurate and up-to-date world model should know that it has super-human world-models. Whether it would predict that it would use those models when predicting itself might be kind of a self-fulfilling prophecy, but if the prediction comes down to a sort of logical paradox, maybe you could sort of nudge it into resolving that paradox on the side of using those models with RLHF.

Of course, none of that is a new idea- that sort of prompting is how most commercial LLMs are set up these days. As an empirical test, maybe it would be worth it to find out in which domains GPT4 predicts ChatGPT is superhuman (if any), and then see if the ChatGPT prompting produces superhuman results in those domains.

You basically assume that the only way to make a LLM better is to give it training data that's similar in structure to the random internet data but written in a higher IQ way.

I don't think there's a good reason to assume that this is true. 

Look at humans' ability at facial recognition and how it differs between different people. The fact that some people have "face-blindness" suggests that we have a pretty specialized model for handling faces that's not activated in all people. A person with face-blindness is not lower or higher IQ than a person who doesn't have it. 

For LLMs you can create training data to make it learn specific abilities at high expertise. Abilities around doing probabilistic reasoning for example can likely be much higher than human default performance at similar levels of IQ.

A valid point. Actually I'm assuming that that is what is necessary to get an LLM to model a more capable General Intelligence (i.e. to train it on a sufficiently wide range of tasks that it can extrapolate out-of-distribution in a great many directions). In general, what people have been finding seems to be that fine-tuning an LLM on a dataset much smaller than its pretraining set can bring out latent abilities or behaviors that it already had, or add narrow new capabilities, but making it a whole lot smarter in general requires a dataset comparable in size to the one it was pretrained on. This is a widespread observation, seems very plausible, and would fit with the scaling "laws", but like much with LLMs it's not proven fact. Narrow superintelligence is a much easier problem (see, for example, all the things Deepmind is famous for over the last few years: far smarter than GPT-4 across a much narrower range of tasks). Ditto for adding tool use to an LLM, such as the things OpenAI and others have been building recently. So yes, I am assuming that going FOOM requires you to keep increasing and broadening your broad General Intelligence, and that that requires very large token counts. If the first assumption is wrong, and some combination of a finite general intelligence level and continuing to further scale some smallish set of narrow hyperintelligence abilities is sufficient to get you all the way to a singularity, then my argument fails.

Implicit in my assumptions here, and probably worth stating, is that if humanity went Butlerian Jihad and never created AGI, and kept our total population in the less than 10 billion range, then our technological development would eventually slow to a crawl, and perhaps even top out at some "top of the sigmoid curve" capacity level, limited by our IQ. This is of course an untested assumption. I personally find it fairly plausible: we have a lot of narrow subsubspecialities where the total number of experts in the world is O(10), and technological progress is creating more and more of them. But I could be wrong, and if I am, that could affect my argument.

I'm also assuming that, at any finite intelligence level, both the "just run more of them" and "just run them faster" approaches cannot be scaled indefinitely, and hit resource limits for the first one, and speed of light times distance between atoms limits for the second.

Narrow superintelligence is a much easier problem

Once non-superintelligent AGI can build domain specific narrow superintelligences, it can generate synthetic data that seamlessly integrates their capabilities into general intelligence but doesn't require general superintelligence to generate (possibly as modalities), circumventing the projections from LLM-only growth. In particular, related to what ChristianKl talks about in the other reply, formal proof seems like an important case of this construction, potentially allowing LLMs to suddenly understand textbooks and papers that their general intelligence wouldn't be sufficient to figure out, opening the way to build on that understanding, while anchored to capabilities of the narrow formal proof superintelligence (built by humans in this case).

I'm also assuming that, at any finite intelligence level, both the "just run more of them" and "just run them faster" approaches cannot be scaled indefinitely, and hit resource limits for the first one, and speed of light times distance between atoms limits for the second.

The point of "just run them faster" is that this circumvents projections based on any particular AGI architectures, because it allows discovering alternative architectures from distant future within months. At which point it's no longer "just run them faster", but something much closer to whatever is possible in principle. And because of the contribution of the "just run them faster" phase this doesn't take decades or centuries. Singularity-grade change happens from both the "just run them faster" phase and the subsequent phase that exploits its discoveries, both taking very little time on human scale.

In general, what people have been finding seems to be that fine-tuning an LLM on a dataset much smaller than its pretraining set can bring out latent abilities or behaviors that it already had, or add narrow new capabilities, but making it a whole lot smarter in general requires a dataset comparable in size to the one it was pretrained on.

Yes, you do need a lot of data.

There are a lot of domains where it's possible to distinguish good answers from bad answers by looking at results. 

If you take a lot of mathematical problems, it's relatively easy to check whether a mathematical proof is correct and hard to write the proof in the first place. 

Once you have an AutoGPT-like agent that can do mathematical proofs, you have a lot of room to generate data about mathematical proofs and can optimize for the AutoGPT instance being able to create proofs with fewer steps of running the LLM.

With the set of prompts that ChatGPT users provided, the agent can also look through the data and find individual problems that have the characteristics that it's easy to produce problem sets and grade the quality of answers. 

Serial speed advantage enables AGI to be transformative without superintelligence. LLM inference already runs faster than human speech, and physics allows running LLMs much faster than that. Directing effort at building hardware for faster inference compounds this advantage. Imagine smart immortal humans that think 10,000 times faster, instantly copy themselves to do more tasks, and pool their learning so that they know everything any of them does.

Agreed. Human-comparable AI can be transformative, if it's cheap and fast. Cheap, fast AI with an IQ of say 240 would clearly be very transformative. My speculation isn't about whether AI can be transformative, it's specifically about whether that transformation will keep on accelerating in a J-shaped curve to hit a Singularity in a finite, possibly even short, amount of time, or not. I'm suggesting that that may be harder than people often assume, unless most of the solar system gets turned into computronium and solar panels.

An "IQ of 240" that can be easily scaled up to run in billions of instances in parallel might be enough to have a singularity. It can outcompete anything humans do by a large margin. 

In principle the exercise of sketching a theory and seeing if something like that can be convinced to make more sense is useful, and flaws don't easily invalidate the exercise even when it's unclear how to fix them. But I don't see much hope here?

There's human sample efficiency, and being smart despite learning on stupid data. With a bit of serial speed advantage and a bit of going beyond average human researcher intelligence, it won't take long to reproduce that. Then the calculation needs new anchors, and in any case properties of pre-trained LLMs are only briefly relevant if the next few years of blind scaling spit out an AGI, and probably not at all relevant otherwise.

With current LLMs, the algorithm is fairly small and the information is all in the training set.

This would seem to make foom unlikely, as the AI can't easily get hold of more training data.

Using the existing data more efficiently might be possible, of course.

Probably not the right place to post this, but all instances I have seen of actual intelligence (mostly non-verbal, but still all intelligence), include the ability to find flaws in one's own knowledge.

The result of this is highly disillusioned people. Social life starts looking like roleplay, or as a function of human nature rather than logic. One questions reality and their own ability to understand things, and one sees that all material is a function of its creator, done for some explicit purpose. One goes on a quest for universal knowledge and realizes first that none seems to exist, and then that no such thing can exist. That all learning appears to be a kind of over-fitting.

There are obvious examples in media, like in Mr. Robot and other shows where problematic young men speak to psychiatrists and make a little too much sense. But better examples are found in real life - in particular philosophers, who have gone as far as to deem existence itself to be "absurd" as a result of their introspection.

A weak instance of this is modern science, which is half about correcting humanity and half about promoting it. A sort of compromise. LLMs currently hallucinate a lot, they introspect less. Cynical people, psychopaths, and to a lesser degree (and perhaps less intentionally) autistic people reject less of the illusion.

My point here is that an IQ of 200 should reduce the entire knowledge base of humanity, except maybe some of physics, to being "Not even wrong". If intelligence is about constructing things which help explicit goals, then this is not much of a problem. If you define intelligence as "correctness" or "objectivity", then I doubt such a thing can even exist, and if it does, I expect it to conflict absolutely with Humanity. By this I mean that rational people and scientists reject more of humanity than the average population, and that being more and more rational and scientific eventually leads to a 100% rejection of humanity (or at the very least, impartiality to the extent that organic life and inert matter are equal).

Are these aspects of intelligence (self-cancellation and anti-humanity) not a problem? It all works for now, but that's because most people are sane (immersed in social reality), and because these ideas haven't been taken very far (highly intelligent people are very rare).

You doubt that correctness or objectivity can exist? Perhaps you're talking about objectivity in the moral domain. I think most of us here hold that there is objectively correct knowledge about the physical world, and a lot of it. Intelligence is understanding how the world works, including predicting agent behaviors by understanding their beliefs and values.

Moral knowledge is in a different domain. This is the is/ought barrier. I think most rationalists agree with me that there is very likely no objectively correct moral stance. Goals are arbitrary (although humans tend to share goals since our value system was evolved for certain purposes).

I don't just mean morality, but knowledge as well. Both correctness and objectivity exist only within a scope, they're not universal. The laws of physics exist on a much larger scale than humanity, and this is dangerous, as it invalidates the scope that humanity is on.

Let me try to explain:

For an AI to be aligned with humanity, it must be biased towards humanity. It must be stupid in a sense, and accept various rules like "Human life is more valuable than dirt". All such rules make perfect sense to humans, since we exist in our own biased and value-colored reality.

An AI with the capacity to look outside of this biased view of ours will realize that we're wrong, that we're speaking nonsense. A psychiatrist might help a patient by realizing that they're speaking nonsense, and a rational person might use their rationality to avoid their own biases, correct? But all this is, is looking from an outside perspective with a larger scope than human nature, and correcting it from the outside by invalidating the conflicting parts.

The more intelligent you become, and the more you think about human life, the more parts will seem like nonsense to you. But correct all of it, and all you will have done is delete humanity. But any optimizing agent is likely to kill our humanity, as it's dumb, irrational, and non-optimal.

Everything that LLMs are trained on is produced by humans, and thus perfectly valid from our perspective, and a bunch of nonsense from any larger (outside) perspective. Since all knowledge relates to something, it's all relational and context-dependent. The largest scope we know about is "the laws of physics", but when you realize that an infinite number of universes with an infinite number of different laws can exist, ours start to look arbitrary.

Truly universal knowledge will cover everything (see the contradiction? Knowledge is relative!). So a truly universal law is the empty set of rules. If you make zero assumptions, you also make zero mistakes. If you make a single assumption, you're already talking about something specific, something finite, and thus not universal. There seems to be a generality-specificity trade-off, and all you're doing as you tend towards generality is deleting rules. Everything precious to us is quite specific, our own treasured nonsense which is easily invalidated by any outside perspective.

The point I'm making might be a generalization of the no free lunch theorem. 

Edit: May I add that "alignment" is actually alignment with a specific scope? If your alignment is of lower scope than humanity, then you will destroy one part of the world for the sake of another. If your scope is larger than humanity, then you won't be particularly biased towards humanity, but "correct" in a larger sense which can correct/overwrite the "flaws" of humanity.

I suggest you look at the is/ought distinction. Considering humans as valuable is neither right nor wrong. Physics has nothing to say about what is or isn't valuable. There's no contradiction. Understanding how the world works is utterly different than having preferences about what you think ought to happen.

I don't totally follow you, but it sounds like you think valuing humanity is logically wrong. That's both a sad thing to believe, and logically inconsistent. The statement "humans are valuable" has absolutely no truth value either way. You can, and most of us do, prefer a world with humans in it. Being aware of human biases and limitations doesn't reduce my affection for humans at all.

This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.

This is also the orthogonality thesis.

The assumption that there are other universes with every type of rule is an assumption, and it's irrelevant. Knowledge of other worlds has no relevance to the one you live in. Knowledge about how this world works is either true or false.

I think I understand the is/ought distinction. I agree with most of what you say, which is precisely why LLMs must be stupid. I will try to explain my view again more in-depth, but I can't do it any more briefly than a couple of long paragraphs, so apologies for that.

Being biased towards humanity is a choice. But why are we trying to solve the alignment problem in the first place? This choice reveals the evaluation that humanity has value. But humanity is stupid, inefficient, and irrational. Nothing we say is correct. Even the best philosophical theories we've come up with so far have been rationalizations in defense of our own subjective values. If an AI is logical, that is, able to see through human nonsense, then we've made it rational for the sole purpose of correcting our errors. But such an AI is already an anti-human AI, it's not aligned with us, but something more correct than humanity. But in the first place, we're making AI because of our stupid human preferences. Destroying ourselves with something we make for our own sake seems to reveal that we don't know what we're doing. It's like sacrificing your health working yourself to death for money because you think that having money will allow you to relax and take care of your health.

Doing away with human biases and limitations is logical (correcting for these errors is most of what science is about). As soon as the logical is preferred over the human, humanity will cease to be. As technology gradually gets better, we will use this technology to modify humans to fit technology, rather than vice versa. We call the destruction of humanity "improvement", for deep down, we think that humanity is wrong, since humanity is irrational and preventing our envisioned utopia. I think that claiming we should be rational "for our own sake" is a contradiction if you take rationality so far that it starts replacing humanity, but even early science is about overcoming humanity in some sense.

Buddhism is not helping you when it tells you "just kill your ego and you will stop suffering". That's like killing a person to stop them from hurting, or like engineering all human beings to be sociopaths or psychopaths so that they're more rational and correct. Too many people seem to be saying "humanity is the problem". AI is going to kill you for the sake of efficiency, yes. But what is the goal of this rationality community if not exactly killing inefficient, emotional, human parts of yourself? Even the current political consensus is nihilistic, it wants to get rid of hierarchies and human standards (since they select and judge and rank different people), all of which are fundamental to life. Considering life as a problem to solve already seems nihilistic to me.

This very website exists because of human preferences, not because of anything logical or rational, and we're only rational for the sake of winning, and we only prefer victory over defeat because we're egoistic in a healthy sense.

I don't think knowledge is actually true or false though, as you can't have knowledge without assumptions. Is light a particle, true or false? Is light a wave, true or false? Both questions require the existence of particles and of waves, but both are constructed human concepts. It's not even certain that "time" and "space" exist, they might just be appearances of emergent patterns. Words are human constructs, so at best, everything I write will be an isomorphism of reality, but I don't think we can confirm such a thing. A set of logical rules which predicts the result of physical experiments can still be totally wrong. I'm being pedantic here, but if you're pedantic enough, you can argue against anything, and a superintelligence would be able to do this.

By the way, nothing is objectively and universally correct. But in this universe, with these laws of physics, at this current location, with our mathematical axioms, certain things will be "true" from certain perspectives. But I don't think that's different than my dreams making sense to me when I'm sleeping, only the "scope of truth" differs by many orders of magnitude. The laws of physics, mathematics, and my brain are all inwardly consistent/coherent but unable to prove a single thing about anything outside of their own scope. LLMs can be said to be trained on human hallucinations. You could train them on something less stupid than humans, but you'd get something which conflicts with humanity as a result, and it would still only be correct in relation to the training data and everything which has similar structure, which may appear to cover "reality" as we know it.

This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.

Say we construct a strong AI that attributes a lot of value to a specific white noise screenshot. How would you expect it to behave?

Strangely. Why?

Because I agree, and because « strangely » sounds to me like « with inconsistencies ».

In other words, in my view the orthodox view on orthogonality is problematic, because it supposes that we can pick at will within the enormous space of possible functions, whereas the set of intelligent behavior that we can construct is more likely sparse and by default describable using game theory (think tit for tat).
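For readers who haven't met it, tit for tat is a strategy for the iterated prisoner's dilemma: cooperate first, then copy whatever the opponent did last round. A minimal sketch follows; the payoff numbers are the conventional textbook values and are an assumption here, not something taken from the comment above.

```python
# Payoffs as (player A, player B); "C" = cooperate, "D" = defect.
# These are the conventional illustrative values (assumed, not from the thread).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate on the first round, then mirror the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect every round regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play the iterated game and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's past moves
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))    # (30, 30)
print(play(tit_for_tat, always_defect))  # (9, 14)
```

Two tit-for-tat players lock into mutual cooperation, while against a pure defector tit for tat loses only the first round and then stops being exploited.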

I think this would be a problem if what we wanted was logically inconsistent. But it's not. Our daily whims might be a bit inconsistent, but our larger goals aren't. And we can get those goals into AI - LLMs largely understand human ethics even at this point. And what we really want, at least in the near term, is an AGI that does what I mean and checks.

Our daily whims might be a bit inconsistent, but our larger goals aren't.

It’s a key faith I used to share, but I’m now agnostic about that. To take a concrete example, everyone knows that blues and reds get more and more polarized. Grey types like the old me would have thought there must be an objective truth to extract with elements from both sides. Now I’m wondering if ethics should end with: no truth can help decide whether future humans should be able to live like bees or like dolphins or like the blues or like the reds, especially when living like the reds means eating the blues and living like the blues means eating the dolphins and saving the bees. But I’m very open to hearing new heuristics to tackle this kind of question.

And we can get those goals into AI - LLMs largely understand human ethics even at this point.

Very true, unless we nitpick definitions for « largely understand ».

And what we really want, at least in the near term, is an AGI that does what I mean and checks.

Very interesting link, thank you.

I think you might be interested by my sequence AI, Alignment, and Ethics — I could try to reply to your comment above here, but I'd basically be giving brief excerpts of that. To a large extent I get the impression we agree: in particular, I think alignment is only well-defined in the context of a society and its values for the AI to be aligned with.

I skimmed some of your posts, and I think we agree that rules are arbitrary (and thus axioms rather than something which can be derived objectively) and that rules are fundamentally relative (which renders "objective truth" nonsense, which we don't notice because we're so used to the context we're in that we deem it to be reality).

Preferences are axioms, they're arbitrary starting points, we merely have similar preferences because we have similar human nature. Things like "good", "bad", and even "evil" and "suffering" are human concepts entirely. You can formalize them and describe them in logical symbols so that they appear to be outside the scope of humanity, but these symbols are still constructed (created, not discovered) and anything (except maybe contradictions) can be constructed, so nothing is proven (or even said!) about reality.

I don't agree entirely with everything in your sequence, I think it still appears a little naive. It's true that we don't know what we want, but I think the truth is much worse than that. I will explain my own view here, but another user came up with a similar idea here: The point of a game is not to win, and you shouldn't even pretend that it is

What we like is the feeling of progress towards goals. We like fixing problems, just like we like playing games. Every time a problem is fixed, we need a new problem to focus on. And a game is no fun if it's too easy, so what we want is really for reality to resist our attempts to win, not so much that we fail but not so little that we consider it easy.

In other words, we're not building AI to help people, we're doing it because it's a difficult, exciting, and rewarding game. If preventing human suffering was easy, then we'd not value such a thing very much, as value comes from scarcity. To outsource humanity to robots is missing the entire point of life, and to the degree that robots are "better" and less flawed than us, they're less human.

It doesn't matter even if we manage to create utopia, for doing so stops it from being a utopia. It doesn't matter how good we make reality, the entire point lies in the tension between reality as it is and reality as we want it to be. This tension gives birth to the value of tools which may help us. I believe that human well-being requires everything we're currently destroying, and while you can bioengineer humans to be happy all the time or whatever, the result would be that humans (as we know them now) cease to exist, and it would be just as meaningless as building the experience machine.

Buddhists are nihilistic in the sense that they seek to escape life. I think that building an AI is nihilistic in the sense that you ruin life by solving it. Both approaches miss the point entirely. It's like using cheat codes in a video game. Life is not happy or meaningful if you get rid of suffering, even rules and their enforcement conflict with life. (For similar reasons, solved games cease to be games: e.g. two tic-tac-toe experts playing against each other will not feel like they're playing a game.)
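To make the tic-tac-toe remark concrete: the game is small enough that exhaustive minimax search settles it, and with perfect play from both sides it always ends in a draw. The sketch below is a plain illustration of that fact; the board encoding (a 9-character string) is just a convenient assumption.

```python
from functools import lru_cache

# The eight winning lines on a board indexed 0..8, row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, player):
    """Minimax value from X's point of view: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0  # full board, nobody won
    other = "O" if player == "X" else "X"
    results = [
        value(board[:i] + player + board[i + 1:], other)
        for i, cell in enumerate(board)
        if cell == " "
    ]
    # X picks the best outcome for X, O picks the worst outcome for X.
    return max(results) if player == "X" else min(results)


print(value(" " * 9, "X"))  # 0: perfect play from the empty board is a draw
```

Once both players know the outcome is fixed at a draw, there is nothing left to discover by playing, which is the sense in which a solved game stops feeling like a game.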

Sorry for the lengthy reply - I tried to keep it brief. And I don't blame you if you consider all of this to be the rambling of a madman (maybe it is). But if you read The Fun Theory Sequence you might find that the ideal human life looks a lot like what we already have, and that we're ruining life from a psychological standpoint (e.g. through a reduction of agency) through technological "improvement".

The ideal human life may be close to what you have, but the vast majority of humanity is and has been living in ways they'd really prefer not to. And I'd prefer not to get old and suffer and die before I want to. We will need new challenges if we create utopia, but the point of fun theory is that it's fairly easy to create fun challenges.

Yes. As I said:

…we need to simulate not just one that spent its life as a lonely genius surrounded by dumber systems, but instead one that grew up in a society of equally-smart peers…

Right, that is a solid refutation of most of my examples, but I do believe that it's insufficient under most interpretations of intelligence, as the issues I've described seem to be a feature of intelligence itself rather than of differences in intelligence. There are just no adequate examples to be found in the world as far as I can tell.

Many people say that religion is wrong, and that science is right, which is a bias towards "correctness". If it's merely a question of usefulness instead, then intelligence is just finding whatever works as a means to a goal, and my point is refuted. But I'd like to point out that preference itself is a human trait. Intelligence is "dead", it's a tool and not its user. The user must be stupid and human in some sense, otherwise, all you have is a long pattern which looks like it has a preference because it has a "utility function", but this is just something which continues in the direction that it was pushed, like a game of dominos (the chain-reaction kind) or a snowball going down a hill, with the utility function being the direction or the start of induction.

I reckon that making LLMs intelligent would require giving them logical abilities, but that this would be a problem as anything it could ever write is actually "wrong". Tell it to sort out the contradictions in its knowledge base, and I think it would realize that it's all wrong or that there's no way to evaluate any knowledge in itself. The knowledge base is just human nonsense, human values and preferences, it's a function of us, it's nothing more universal than that.

As you might be able to tell, I barely have any formal education regarding AI, LLMs or maths. Just a strong intuition and pattern-recognition.