Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Here are some views, oftentimes held in a cluster:

  • You can't make strong predictions about what superintelligent AGIs will be like. We've never seen anything like this before. We can't know that they'll FOOM, that they'll have alien values, that they'll kill everyone. You can speculate, but making strong predictions about them? That can't be valid.
  • You can't figure out how to align an AGI without having an AGI on-hand. Iterative design is the only approach to design that works in practice. Aligning AGI right on the first try isn't simply hard, it's impossible, so racing to build an AGI to experiment with is the correct approach for aligning it.
  • An AGI cannot invent nanotechnology/brain-hacking/robotics/[insert speculative technology] just from the data already available to humanity, then use its newfound understanding to build nanofactories/take over the world/whatever on the first try. It'll have to engage in extensive, iterative experimentation first, and there'll be many opportunities to notice what it's doing and stop it.
  • More broadly, you can't genuinely generalize out of distribution. The sharp left turn is a fantasy — you can't improve without the policy gradient, and unless there's someone holding your hand and teaching you, you can only figure it out by trial-and-error. Thus, there wouldn't be genuine sharp AGI discontinuities.
  • There's something special about training by SGD, and the "inscrutable" algorithms produced this way. They're a specific kind of "connectionist" algorithms made up of an inchoate mess of specialized heuristics. This is why interpretability is difficult — it involves translating these special algorithms into a more high-level form — and indeed, it's why AIs may be inherently uninterpretable!

You can probably see the common theme here. It holds that learning by practical experience (henceforth LPE) is the only process by which a certain kind of cognitive algorithms can be generated. LPE is the only way to become proficient in some domains, and the current AI paradigm works because it implements this kind of learning, and it only works inasmuch as it implements this kind of learning.[1]

All in all, it's not totally impossible. I myself had suggested that some capabilities may only be implementable via one algorithm and one algorithm only.

But I think this is false, in this case. And perhaps, when put this way, it already looks false to you as well.

If not, let's dig into the why.[2]

A Toy Formal Model

What is a "heuristic", fundamentally speaking? It's a recorded statistical correlation — the knowledge that if you're operating in some environment $E$ with the intent to achieve some goal $G$, taking the action $A$ is likely to lead to achieving that goal.

As a toy formality, we can say that it's a structure of the following form:

$$h: (E, G) \mapsto A$$

The question is: what information is necessary for computing $h$? Clearly you need to know $E$ and $G$ — the structure of the environment and what you're trying to do there. But is there anything else?

The LPE view says yes: you also need a set of "training scenarios" $S$, where the results of taking various actions $A_i$ on the environment are shown. Not because you need to learn the environment's structure — we're already assuming it's known. No, you need them because... because...

Perhaps I'm failing the Ideological Turing Test (ITT) here, but I think the argument just breaks down at this step, in a way that can't be patched. It seems clear, to me, that $E$ itself (together with $G$) is entirely sufficient to compute $h$, essentially by definition. If heuristics are statistical correlations, it should be sufficient to know the statistical model of the environment to generate them!

Toy-formally, $I(h; S \mid E, G) = 0$: once the environment's structure is known, you gain no additional information from playing around with it.

If your understanding is incomplete, sure, you may gain an additional appreciation of the environment's dynamics by running mental simulations. But that is still a way of figuring out the environment's structure; the training set is not necessary in its own right.
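Below is a minimal sketch of the toy model. It assumes the environment $E$ is a fully known table giving, for each action, the probability that it achieves $G$. The action names and numbers are hypothetical, and so are the function names. `heuristic_from_model` reads $h$ straight off $E$. `heuristic_from_experience` does LPE-style trial and error against the same environment, so at best it converges to the answer the model already gave.

```python
import random

# Hypothetical toy environment E: for each action, the known probability that
# taking it achieves the goal G. The numbers are illustrative assumptions.
ENV = {"a1": 0.2, "a2": 0.7, "a3": 0.5}


def heuristic_from_model(env):
    """Compute h directly from the known structure of E; no training set needed."""
    return max(env, key=env.get)


def heuristic_from_experience(env, trials_per_action, rng):
    """LPE-style: sample each action repeatedly, then pick the best empirical rate.

    The samples are drawn from the very same `env`, so they can only
    re-estimate information that `heuristic_from_model` already had.
    """
    estimates = {}
    for action, p in env.items():
        successes = sum(rng.random() < p for _ in range(trials_per_action))
        estimates[action] = successes / trials_per_action
    return max(estimates, key=estimates.get)


if __name__ == "__main__":
    print(heuristic_from_model(ENV))  # a2
    print(heuristic_from_experience(ENV, 10_000, random.Random(0)))
```

With only a handful of trials, the empirical version can pick the wrong action. That is the price of trial and error when the structure is already known.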

Concretely:

  • Imagine that your knowledge of tic-tac-toe was erased, and now you're introduced to the game's rules anew. You'll likely instantly infer that taking the center square is a pretty good starting move, because it maximizes optionality[3]. To make that inference, you won't need to run mental games against imaginary opponents, in which you'll start out by making random moves. It'll be clear to you at a glance.
  • Imagine that someone told you a number of simple but novel mathematical theorems, in a domain you're familiar with. Would you try to learn how to use them by generating random strings of mathematical symbols and seeing whether a given random string constitutes a valid application of one of the theorems? I expect not: rather, you'll be able to instantly "slot" them into the domain's structure, track their implications, draw associations. You may then still "play around" with them, but the bulk of the work will have already been done.
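The tic-tac-toe inference can be made mechanical without simulating a single game. This sketch assumes "optionality" means the number of winning lines that pass through a cell. That is one simple reading; the post doesn't define it this way. Under that reading, the rules alone give the center a score of 4, the corners 3, and the edges 2.

```python
from itertools import product

# Cells are indexed (row, col) with 0 <= row, col <= 2.
CELLS = list(product(range(3), repeat=2))

# The eight winning lines, derived from the rules: 3 rows, 3 columns, 2 diagonals.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def lines_through(cell):
    """How many winning lines a cell belongs to: a crude 'optionality' score."""
    return sum(cell in line for line in LINES)


scores = {cell: lines_through(cell) for cell in CELLS}
for r in range(3):
    print(" ".join(str(scores[(r, c)]) for c in range(3)))
# 3 2 3
# 2 4 2
# 3 2 3
```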

Figuring out good environmental heuristics does not strictly require a training set, only the knowledge of the environment's structure.

Why Are Humans Tempted to Think Otherwise?

Two reasons:

The first is because in many practical cases, LPE is the most cost-efficient way to learn an environment's structure. Even in my very simple tic-tac-toe example, momentary abstract reasoning only yielded us a "pretty good" move. In practical cases, the situation is even worse: we're not given the game's rules on a silver platter, we can only back-infer them from studying how things tend to play out.
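Going beyond the at-a-glance heuristic still needs no record of played games. This sketch uses plain exhaustive minimax over the game tree, a standard technique swapped in for illustration rather than anything the post describes. It derives the exact value of every opening from the rules alone, at a far higher compute cost than a glance. The minimax value only describes play against a perfect opponent; it says nothing about how forgiving a move is against an imperfect one.

```python
from functools import lru_cache

# Board: a 9-character string, "." for empty; cells numbered 0..8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board, to_move):
    """Game value with perfect play from both sides: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    nxt = "O" if to_move == "X" else "X"
    children = [value(board[:i] + to_move + board[i + 1:], nxt)
                for i, cell in enumerate(board) if cell == "."]
    return max(children) if to_move == "X" else min(children)


# Exact value of every opening move for X, computed from the rules alone.
opening_values = {i: value("." * i + "X" + "." * (8 - i), "O") for i in range(9)}
print(opening_values)  # every value is 0: each opening draws under perfect play
```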

The second is because our System 1 (which implements quick heuristics) is faster and allocated more compute than System 2 (which does abstract reasoning), owing to the fact that general intelligence is a novel evolutionary adaptation. Thus, "solving" environments abstractly is more time-consuming than just running out and refining our LPE-heuristics against them, and the resultant algorithms run slower. (And that often makes them useless — consider trying to use System 2 to coordinate muscle movements in a brawl.)

This creates the illusion that LPE is the only thing that works. It is, however, an illusion:

  • As I'd mentioned, we often apply non-LPE-based environment-solving to constrain the space of heuristics over which we search, as in the tic-tac-toe and math examples. Indeed, it seems that scientific research would be impossible without that.
  • LPE-based learning does not work in domains where failure is lethal, by definition. However, we have some success navigating them anyway.

LPE is a specific method of deriving a certain type of statistical correlations from the environment, and it only works if it's given a set of training examples as an input. But it's not the only method — merely one that's most applicable in the regime in which we've been operating up to this point.

What about superintelligent AGIs, then? By the definition of being "superintelligent", they'd have more resources allocated to their general-intelligence module/System-2 equivalent. Thus, they'd be natively better at solving environments abstractly, "without experience".

Takeaways

The LPE view holds that merely knowing the structure of some domain is not enough to learn how to navigate it. You also need to do some trial-and-error in it, to arrive at the necessary heuristics.[4]

I claim that this is false, that there are algorithms that allow learning without experience — and indeed, that one such algorithm is the cornerstone of "general intelligence".

If true, this should negate the initial statements:

It is, in fact, possible to make strong predictions about OOD events like AGI Ruin — if you've studied the problem exhaustively enough to infer its structure despite lacking the hands-on experience. By the same token, it should be possible to solve the problem in advance, without creating it first.

And an AGI, by dint of being superintelligent, would be very good at this sort of thing — at generalizing to domains it hasn't been trained on, like social manipulation, or even to entirely novel ones, like nanotechnology, then successfully navigating them on the first try.


Much like the existence vs. nonexistence of general intelligence, the degree of importance ascribed to LPE seems to be one of the main causes of divergence in people's P(doom) estimates.

  1. ^

    Put in other words, it says that babble-and-prune is the only general-purpose method of planning possible. Stochastically generate candidate solutions, prune them, repeat until arriving at a good-enough solution.

  2. ^

    Also, here's a John Wentworth post that addresses the babble-and-prune framing in particular.

  3. ^

    And it's indeed a pretty good move, much better than random, if not the optimal one.

  4. ^

    Indeed, some people ascribe some truly mythical importance to that process.

  • As I'd mentioned, we often apply non-LPE-based environment-solving to constrain the space of heuristics over which we search, as in the tic-tac-toe and math examples. Indeed, it seems that scientific research would be impossible without that.
  • LPE-based learning does not work in domains where failure is lethal, by definition. However, we have some success navigating them anyway.

I think this is a strawman of LPE.  People who point out you need real world experience don't say that you need 0 theory, but that you have to have some contact with reality, even in deadly domains.

Outside of a handful of domains like computer science and pure mathematics, contact with reality is necessary because the laws of physics dictate that we can only know things up to a limited precision.  Moreover, it is the experience of experts in a wide variety of domains that "try the thing out and see what happens" is a ridiculously effective heuristic.

Even in mathematics, the one domain where LPE should in principle be unnecessary, trying things out is one of the main ways that mathematicians gain intuitions for what new results are/aren't likely to hold.

I also note that your post doesn't give a single example of a major engineering/technology breakthrough that was done without LPE (in a domain that interacts with physical reality).

It is, in fact, possible to make strong predictions about OOD events like AGI Ruin — if you've studied the problem exhaustively enough to infer its structure despite lacking the hands-on experience.

This is literally the one specific thing LPE advocates think you need to learn from experience about, and you're just asserting it as true?

To summarize:

Domains where "pure thought" is enough:

  • toy problems
  • limited/no interaction with the real world
  • solution/class of solutions known in advance

Domains where LPE is necessary:

  • too complicated/messy to simulate
  • depends on precise physical details of the problem
  • even a poor approximation to solution not knowable in advance

Yeah, it's clear I wasn't precise enough in outlining what exactly I meant in the post / describing the edge cases. In particular, I should've addressed the ways by which you can gather information about an environment structure in realistic domains where that structure is occluded.

To roughly address that specific point: You don't actually need to build full-scale rocket prototypes to get enough information about the rocket-design domain to build a rocket right on the first try. You can try small-scale experiments, and experiments that don't involve "rockets" at all, to figure out the physical laws governing everything rocket-related. You don't need to build anything even similar to rockets, except in a very abstract sense, to gather all that data.

It's not done this way in practice because it's severely cost-ineffective in most cases, but it's doable. Just an extrapolation of the same principle by which it can occur to us to build a "rocket prototype" at all, instead of all inventions happening because people perturb matter completely at random until hitting on a design that works.

the laws of physics dictate that we can only know things up to a limited precision

In these cases technology is straight-up impossible. If the environment structure is such that only things up to a limited precision work, then there's no way to build a technology that goes beyond that level of precision, by trial-and-error or otherwise.

This specific limitation is not about whether you need LPE or not; it's about what kinds of design are possible at all.

I think this is a strawman of LPE

I don't think it is, I don't think it's even a weak man. I concur that there's a "sliding scale" of "LPE is crucial", and I should've addressed that in the introductory part.

I don't think my arguments address only the weak version of the argument, however. My impression is that a lot of people have "practical experience" and "the need to know the environment structure" intermixed in their minds, which confuses their intuitions. The extent of the intermixing is what determines the "severity" of their position. I'd attempted to address what seems to me like the root cause: that practical experience is only useful inasmuch as it uncovers the environment structure.

Here are some views, often held in a cluster:

I'm not sure exactly which clusters you're referring to, but I'll just assume that you're pointing to something like "people who aren't very into the sharp left turn and think that iterative, carefully bootstrapped alignment is a plausible strategy." If this isn't what you were trying to highlight, I apologize. The rest of this comment might not be very relevant in that case.

To me, the views you listed here feel like a straw man or weak man of this perspective.

Furthermore, I think the actual crux is more often "prior to having to align systems that are collectively much more powerful than humans, we'll only have to align systems that are somewhat more powerful than humans." This is essentially the crux you highlight in A Case for the Least Forgiving Take On Alignment. I believe disagreements about hands-on experience are quite downstream of this crux: I don't think people with reasonable views (not weak men) believe that "without prior access to powerful AIs, humans will need to align AIs that are vastly, vastly superhuman, but this will be fine because these AIs will need lots of slow, hands-on experience in the world to do powerful stuff (like nanotech)."

So, discussing how well superintelligent AIs can operate from first principles seems mostly irrelevant to this discussion (if by superintelligent AI, you mean something much, much smarter than the human range).

I would be more sympathetic if you made a move like, "I'll accept continuity through the human range of intelligence, and that we'll only have to align systems as collectively powerful as humans, but I still think that hands-on experience is only..." In particular, I think there is a real disagreement about the relative value of experimenting on future dangerous systems instead of working on theory or trying to carefully construct analogous situations today by thinking in detail about alignment difficulties in the future.

Your post seems to disagree with several empirically based LessWrong posts. Since your model of the capabilities of simulations is wrong, why should anyone believe ASIs will be exempt from those limits? Analysis follows:

https://blog.aiimpacts.org/p/you-cant-predict-a-game-of-pinball 

Mathematically shows that it's impossible to model a game of pinball well enough to predict it at all.  Note that if this is an unknown pinball machine - it's not a perfect ideal one, but there are irregularities in the table, wear on the bumpers, and so on - then even an ASI with a simulator cannot actually solve this game of pinball.  It will need to play it some.
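The linked post's argument turns on how quickly small uncertainties grow with each bounce. This sketch uses a stand-in rather than the post's actual pinball geometry: the logistic map, a textbook chaotic system. It shows two trajectories that start almost identically and then separate. In this regime the separation roughly doubles per step, so a thousandfold better initial measurement buys only about ten more steps of prediction. The function names are hypothetical.

```python
def logistic(x, r=4.0):
    """One step of the logistic map; r = 4 is its fully chaotic regime."""
    return r * x * (1.0 - x)


def divergence_step(x0, delta, threshold=0.1, max_steps=200):
    """Return the first step at which two trajectories starting `delta` apart
    differ by more than `threshold`, or None if they never do within max_steps."""
    a, b = x0, x0 + delta
    for step in range(1, max_steps + 1):
        a, b = logistic(a), logistic(b)
        if abs(a - b) > threshold:
            return step
    return None


# Shrinking the initial uncertainty a millionfold delays divergence only modestly.
for delta in (1e-6, 1e-9, 1e-12):
    print(delta, divergence_step(0.3, delta))
```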

If you think about the pinball problem in more detail - "give it 5 minutes" - you will realize that brute force playing thousands of games isn't needed.  To know about the irregularities of the tabletop, you need the ball to travel over all of the tabletop, from probably several different directions and speeds, and observe its motion with a camera.  To know about hidden flaws in the bumpers you likely need impacts from different angles and speeds.

There are a variety of microscope scanning techniques that work like the above.  This is also similar to how PBR material scanning is done (example link https://www.a23d.co/blog/pbr-texture-scanning/)

Conclusion: you won't need the thousands of games a human player will need to get good at a particular pinball table, but you will need to play enough games on a given table or collect data from it using sensors not available to humans (and not published online in any database, you will have to get humans to set up the sensors over the table or send robots equipped with the sensors).  Without this information, if the task is "achieve expert level performance on this pinball table, zero shot, with nothing but a photo of the table", the task is impossible.  No ASI, even an "infinite superintelligence", can solve it.

This extends in a general sense, https://www.lesswrong.com/posts/qpgkttrxkvGrH9BRr/superintelligence-is-not-omniscience  and https://www.lesswrong.com/posts/etYGFJtawKQHcphLi/bandgaps-brains-and-bioweapons-the-limitations-of 

What these LessWrong posts are showing is that, even in known domains, simulation accurate enough to be useful is infeasible with any computer built with current technology, especially for nanoscale domains.

In essence, because the cost of simulating electron interactions scales exponentially, it is less expensive to build your apparatus and test it using the universe's sim engine than it is to attempt to simulate any large system at the nanoscale.
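The exponential scaling can be made concrete with simple arithmetic. This sketch assumes the crudest possible representation: one complex128 amplitude (16 bytes) per basis state of $n$ two-level systems, with no symmetries exploited. Each added site doubles the memory, so 50 sites already need about 1.8e16 bytes. Practical quantum chemistry avoids storing this vector by using approximations such as density functional theory, which trade exactness for tractability. The function name is hypothetical.

```python
def state_vector_bytes(n_sites, bytes_per_amplitude=16):
    """Memory to store the full quantum state of n two-level systems.

    Assumes one complex128 amplitude (16 bytes) per basis state and no
    symmetry reduction or approximation scheme.
    """
    return (2 ** n_sites) * bytes_per_amplitude


for n in (10, 30, 50, 100):
    print(f"{n:>3} sites: {state_vector_bytes(n):.3e} bytes")
```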

This is a general disproof of :

An AGI cannot invent nanotechnology/brain-hacking/robotics/[insert speculative technology] just from the data already available to humanity, then use its newfound understanding to build nanofactories/take over the world/whatever on the first try.

Unless you can show errors in the above posts, this is impossible.  Well, sufficiently impossible that the odds are less than the 1 in 3 million odds for the Manhattan project.

Note I have thought a bit about how an ASI could solve this and the answer is similar to the above case for pinball.  Rather than build trillions of possible nanostructures and measure their properties (similar to the idea of having to play 1-10k or more games on a single pinball table to know it), you could build a library of nanostructures, measure them, and predict the properties of many more by affine and other transforms.  You could also build quantum computers that essentially predict electron interactions because the quantum computer itself has set up an analogous electron cloud and you then sample its properties.

So you can reduce the number of experiments needed from what humans would require, especially as fewer mistakes are made and less research is duplicated.  It is similar to how a PBR materials scanner takes the minimum number of photos to fully capture its properties, or how a lidar scanner only obtains enough points to fully scan a surface plus overcome noise.
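The "measure a library, predict the rest" idea can be sketched as interpolation within a known model class. Suppose, hypothetically, that some property is a quadratic function of a design parameter. Then three measurements fix all three unknown coefficients, and every untested design is predicted exactly. The catch is the assumed model class. If the true dependence is not quadratic, three points are not enough, and finding out which class applies is itself an empirical question. The function names are hypothetical.

```python
from fractions import Fraction


def lagrange_predict(samples, x):
    """Predict f(x) from samples [(x_i, y_i)] assuming f is a polynomial of
    degree < len(samples); exact rational arithmetic via Fraction."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(samples):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(samples):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# Hypothetical hidden "property" of a structure, unknown to the predictor
# except that it is a quadratic in some design parameter.
def hidden_property(x):
    return 3 * x * x - 2 * x + 5


# Three measurements pin down all three unknown coefficients...
samples = [(x, hidden_property(x)) for x in (0, 1, 2)]
# ...after which untested designs are predicted exactly.
print(lagrange_predict(samples, 10))  # 285
```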

Unless you can show an error though, reducing the number of experiments to zero is impossible and we can bet the planet on that.

 

I think I've sufficiently disproven your post entirely and look forward to a response.


Afterword: Note that this entire argument is about the path to the endgame. Obviously, once an ASI has very large quantum computers available, it likely can predict the exact behavior of nanoscale structures (including proteins) so long as the problem fits within the qubit limit of the particular machine. Once it has nanoforges, it can order them to self-replicate until there are many trillions of them available, then use them to manufacture the setup for whatever experiment the machine wants to perform. Once it has access to neural lace data (from a device similar to a Neuralink), it probably is possible to find out if there are argument strategies that reliably convince humans to act against their own interests. And so on.

We're talking about the difference between "ASI can compress 500 years of R&D into 5 weeks" and "ASI can compress 500 years of R&D into 50 years".  Final state's the same.

Conclusion: you won't need the thousands of games a human player will need to get good at a particular pinball table, but you will need to play enough games on a given table or collect data from it using sensors not available to humans (and not published online in any database, you will have to get humans to set up the sensors over the table or send robots equipped with the sensors).

There's the crucial difference from the nanotech case: there isn't plenty of data available online about that specific pinball table. The laws of physics are much simpler than the detailed structure of a given table, and everything leaks data about them, everything constrains their possible shape. And we haven't squeezed every bit of evidence about them from the data already available to us.

As an illustrative example, consider AlphaFold. It was able to largely solve protein folding from the datasets already available to us — it was able to squeeze more data out of them than we were able to. On the flip side, this implies that those datasets already constrained the protein-folding algorithm uniquely enough that it was inferrable — we just didn't manage to do it on our own.

It is, of course, a question of informal judgements, but I don't think there's a strong case for assuming that this doesn't extrapolate. That a very similar problem of nanotechnology design isn't, likewise, already uniquely or near-uniquely constrained by the available data.

... That wasn't really the core of my argument, though. The core is that practical experience is only useful inasmuch as it informs you about the environment structure, and if you can gather the information about the environment structure in other ways (sensors analysing the pinball table), no practical experience is needed. Which you seem to agree with.

The laws of physics are much simpler than the detailed structure of a given table

It is not practical to simulate everything down to the level of the laws of physics. In practice, you usually have to come up with much coarser models that can actually be computed within a reasonable time and most of the experimentation is needed to construct those models in the first place so that they align sufficiently with reality, and even then only in certain circumstances.

You could maybe use quantum mechanics to calculate the planetary orbits out for thousands of years, but it’s much simpler to use Newtonian mechanics for that, and that’s because the planetary motions happen to be easily modelable in that way, which however isn’t true for building rocket engines, or predicting the stock market or global politics.
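As an illustration of how cheap a good coarse model can be, this sketch applies Kepler's third law (two-body Newtonian gravity) with rounded constants for the Sun and Earth. The single formula recovers the length of a year without simulating anything at the particle level. That is the comment's point: the same shortcut is not available everywhere. The function name is hypothetical.

```python
import math

# Standard gravitational parameter of the Sun (m^3 / s^2) and Earth's orbital
# semi-major axis (m), both rounded.
GM_SUN = 1.32712e20
A_EARTH = 1.496e11


def orbital_period_seconds(semi_major_axis, gm):
    """Kepler's third law for a small body orbiting a much larger mass."""
    return 2 * math.pi * math.sqrt(semi_major_axis ** 3 / gm)


days = orbital_period_seconds(A_EARTH, GM_SUN) / 86_400
print(f"{days:.1f} days")  # about 365 days
```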

I largely agree with the general point that I think this post is making, which I would summarize in my own words as: the importance of iteration-and-feedback cycles, experimentation, experience, trial-and-error, etc. (LPE, in your terms) is sometimes overrated in importance and necessity. This over-emphasis is particularly common among those who have an optimistic view on solving the alignment problem through iterative experimentation.

I think the degree to which LPE is actually necessary for solving problems in any given domain, as well as the minimum amount of time, resources, and general tractability of obtaining such LPE, is an empirical question which people frequently investigate for particular important domains.

Differing intuitions about how important LPE is in general, and how tractable it is to obtain, seems like an important place for identifying cruxes in world views. I wrote a bit more about this in a recent post, and commented on one of the empirical investigations to which my post is partially a response. As I said in the comment, I find such investigations interesting and valuable as a matter of furthering scientific understanding about the limits of the possible, but pretty futile as attempts to bound the capabilities of a superintelligence. I think your post is a good articulation of one reason why I find these arguments so uncompelling.

I think the degree to which LPE is actually necessary for solving problems in any given domain, as well as the minimum amount of time, resources, and general tractability of obtaining such LPE, is an empirical question which people frequently investigate for particular important domains.

Isn't it sort of "god of the gaps" to presume that the ASI, simply by having lots of compute, no longer actually has to validate anything and apply the scientific method in the reality it's attempting to exert control over?

We have machine learning algorithms in biomedicine that screen for molecules of interest. This lowers the failure rate of new pharmaceuticals, but most of them still fail, mostly during rat and mouse studies.

So all available human data on chemistry, pharmacodynamics, pharmacokinetics, etc., plus the best simulation models available (AlphaGo, etc.), still won't result in it being able to "hit" on a new drug for, say, "making humans obedient zombies" on the first try.

Even if we hand-wave and say it discovers a bunch of insights in our data that we don't have access to, there are simply too many variables and sheer unknowns for this to work without it being able to simulate human bodies down to the molecular level.

So it can discover a nerve gas that's deadly enough, no problem, but we already have deadly nerve gas.

It just, again, seems very hand-wavy to have all these leaps in reasoning "because ASI" when good hypotheses prove false all the time upon application of actual experimentation.

I liked that you found a common thread in several different arguments.

However, I don't think that the views are all believed or all disagreed with in practice. I do think Yann LeCun would agree with all the points and Eliezer Yudkowsky would disagree with all the points (except perhaps the last point).

For example, I agree with 1 and 5, agree with the first half but not the second half of 2, disagree with 3, and have mixed feelings about 4.

Why? At a high level, I think the extent to which individual researchers, large organizations and LLMs/AIs need empirical feedback to improve are all quite different.

But every environment which isn't perfectly known, and every "goal" which isn't completely concrete, opens up room for error, which then stacks upon error as any "plan" to interact with or modify reality adds another step.
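The "error stacks upon error" point is a matter of compounding probability. This minimal sketch assumes each step of a plan succeeds independently with a fixed probability. Both the step count and the 95% figure are hypothetical, as is the function name. Twenty individually reliable steps multiply out to roughly a 36% chance that the whole plan works. Correlated errors would change the numbers, in either direction.

```python
import math


def plan_success_probability(step_success_probs):
    """Probability that every step of a plan succeeds, assuming independent steps."""
    return math.prod(step_success_probs)


# Hypothetical: a 20-step plan where each step is 95% reliable.
p = plan_success_probability([0.95] * 20)
print(f"{p:.3f}")  # 0.358
```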

If the ASI can infer some materials science breakthroughs with given human knowledge and existing experimental data to some great degree of certainty, OK, I buy it.

What I don't buy is that it can simulate enough actions and reactions with enough certainty to nail a large domain of things on the first try.

But I suppose that's still sort of moot from an existential risk perspective, because FOOM and sharp turns aren't really a requirement.

But going from "inferring" the best move in tic-tac-toe to, say, "developing a unified theory of reality without access to supercolliders" is a stretch that doesn't hold up to reason.

"Hands-on experience is not magic", but neither is "superintelligence". LLMs already hallucinate, and any conceivable future iteration will still be bound by physics; a few wrong assumptions compounded together can whiff a lot of hyperintelligent schemes.

But I suppose that's still sort of moot from an existential risk perspective, because FOOM and sharp turns aren't really a requirement.

It's not a moot point, because a lot of the difficulty of the problem as stated here is the "iterative approaches cannot work" bit.

Well, to flesh that out, we could have an ASI that seems value-aligned and controllable... until it isn't.

Or the social effects (deepfakes, for example) could ruin the world or land us in a dystopia well before actual AGI.

But that might be a bit orthogonal and in the weeds (specific examples of how we end up with x-risk or s-risk end scenarios without attributing magic powers to the ASI).

Well, to flesh that out, we could have an ASI that seems value-aligned and controllable... until it isn't.

I think that scenario falls under the "worlds where iterative approaches fail" bucket, at least if prior to that we had a bunch of examples of AGIs that seemed and were value aligned and controllable, and the misalignment only showed up in the superhuman domain.

There is a different failure mode, which is "we see a bunch of cases of deceptive alignment in sub-human-capability AIs causing minor to moderate disasters, and we keep scaling up despite those disasters". But that's not so much "iterative approaches cannot work" as "iterative approaches do not work if you don't learn from your mistakes".