Comments

Yeah, plus all the other stuff Alexander and Metz wrote about it, I guess.

It's just a figure of speech for the sorts of thing Alexander describes in Kolmogorov Complicity. More or less the same idea as "Safe Space" in the NYT piece's title—a venue or network where people can have the conversations they want about those ideas without getting yelled at or worse.

Mathematician Andrey Kolmogorov lived in the Soviet Union at a time when true freedom of thought was impossible. He reacted by saying whatever the Soviets wanted him to say about politics, while honorably pursuing truth in everything else. As a result, he not only made great discoveries, but gained enough status to protect other scientists, and to make occasional very careful forays into defending people who needed defending. He used his power to build an academic bubble where science could be done right and where minorities persecuted by the communist authorities (like Jews) could do their work in peace...

But politically-savvy Kolmogorov types can’t just build a bubble. They have to build a whisper network...

They have to serve as psychological support. People who disagree with an orthodoxy can start hating themselves – the classic example is the atheist raised religious who worries they’re an evil person or bound for Hell – and the faster they can be connected with other people, the more likely they are to get through.

They have to help people get through their edgelord phase as quickly as possible. “No, you’re not allowed to say this. Yes, it could be true. No, you’re not allowed to say this one either. Yes, that one also could be true as best we can tell. This thing here you actually are allowed to say still, and it’s pretty useful, so do try to push back on that and maybe we can defend some of the space we’ve still got left.”

They have to find at-risk thinkers who had started to identify holes in the orthodoxy, communicate that they might be right but that it could be dangerous to go public, fill in whatever gaps are necessary to make their worldview consistent again, prevent overcorrection, and communicate some intuitions about exactly which areas to avoid. For this purpose, they might occasionally let themselves be seen associating with slightly heretical positions, so that they stand out to proto-heretics as a good source of information. They might very occasionally make calculated strikes against orthodox overreach in order to relieve some of their own burdens. The rest of the time, they would just stay quiet and do good work in their own fields.

That section is framed with

Part of the appeal of Slate Star Codex, faithful readers said, was Mr. Siskind’s willingness to step outside acceptable topics. But he wrote in a wordy, often roundabout way that left many wondering what he really believed.

More broadly, part of the piece's thesis is that the SSC community is the epicenter of a creative and influential intellectual movement, some of whose strengths come from a high tolerance for entertaining weird or disreputable ideas.

Metz is trying to convey how Alexander makes space for these ideas without staking his own credibility on them. This is, for example, what Kolmogorov Complicity is about; it's also what Alexander says he's doing with the neoreactionaries in his leaked email. It seems clear that Metz did enough reporting to understand this.

The juxtaposition of "Scott aligns himself with Murray [on something]" and "Murray has deplorable beliefs" specifically serves that thesis. It also pattern-matches to a very clumsy smear, which I get the impression is triggering readers before they manage to appreciate how it relates to the thesis. That's unfortunate, because the “vague insinuation” is much less interesting and less defensible than the inference that Alexander is being strategic in bringing up Murray on a subject where it seems safe to agree with him.

Muireall · 1mo

In 2021, I was following these events and already less fond of Scott Alexander than most people here, and I still came away with the impression that Metz's main modes were bumbling and pattern-matching. At least that's the impression I've been carrying around until today. I find his answers here clear, thoughtful, and occasionally cutting, although I get the impression he leaves more forceful versions on the table for the sake of geniality. I'm wondering whether I absorbed some of the community's preconceptions or instinctive judgments about him or journalists in general.

I do get the stubbornness, but I read that mostly as his having been basically proven right (and having put in the work at the time to be so confident).

Answer by Muireall · Jan 25, 2024

In the 2D case, there's no escaping exponential decay of the autocorrelation function for any observable satisfying certain regularity properties. (I'm not sure if this is known to be true in higher dimensions. If it's not, then there could conceivably be traps with sub-exponential escape times or even attractors, but I'd be surprised if that's relevant here—I think it's just hard to prove.) Sticking to 2D, the question is just how the time constant in that exponent for the observable in question compares to 20 seconds.
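To make that comparison concrete, here's a minimal sketch of estimating the time constant from a sampled observable. `samples` and `dt` are placeholders for whatever simulated trajectory you'd actually have, so treat this as an illustration rather than a claim about the numbers.

```python
import numpy as np

def autocorrelation_time(samples, dt):
    """Fit an exponential to the normalized autocorrelation of `samples`
    (an observable sampled every `dt` seconds) and return the time constant."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf /= acf[0]
    # Fit only the lags before the first zero crossing, where a single
    # exponential is a reasonable description (for an oscillating observable
    # you'd want to fit the envelope instead).
    first_cross = int(np.argmax(acf <= 0)) or acf.size
    lags = np.arange(1, first_cross) * dt
    slope = np.polyfit(lags, np.log(acf[1:first_cross]), 1)[0]
    return -1.0 / slope

# e.g. tau = autocorrelation_time(left_right_count_difference, dt=1e-4)
# and the question is how tau compares to 20 seconds.
```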

The presence of persistent collective behavior is a decent intuition, but I'm not sure it saves you. I'd start by noting that for any analysis of large-scale structure—like a spectral analysis where you're looking at superpositions of harmonic sound waves—a perturbation to a single particle's initial position is a perturbation to the initial condition of every component in the spectral basis, and each of those component perturbations then grows exponentially.

In this case you can effectively decompose the system into "Lyapunov modes", each with its own exponent for the growth rate of perturbations, and, in fact, because the system is close to linear in the harmonic basis, the modes with the smallest exponents will look like the low-wave-vector harmonic modes. One of these, conveniently, looks like a "left-right density" mode. So the lifetime (or Q factor) of that first harmonic is somewhat relevant, but the actual left-right density difference still involves the sum of many harmonics (for example, with more nodes in the up-down dimension) that have larger exponents. These individually contribute less (given equipartition of initial energy, these modes spend relatively more of their energy in up-down motion and so affect left-right density less), but collectively they should be enough to scramble the left-right density observable in 20 seconds even with a long-lived first harmonic.

On the other hand, 1 mol in 1 m^3 is not very dense, which should tend to make modes longer-lived in general. So I'm not totally confident on this one without doing any calculations. Edit: Wait, no, I think it's the other way around. Speed of sound and viscosity are roughly constant with gas density and attenuation otherwise scales inversely with density. But I think it's still plausible that you have a 300 Hz mode with Q in the thousands.
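For a rough sense of scale, here's a back-of-envelope for the classical (Stokes-Kirchhoff) bulk attenuation limit alone, assuming an air-like gas at room temperature and neglecting bulk viscosity and wall losses. All of those are assumptions of mine rather than anything specified in the problem, and wall losses in particular would ordinarily pull Q well below this bulk-limited figure.

```python
import math

# Air-like gas at ~293 K; only the 1 mol per m^3 density is from the problem,
# the rest are generic assumed values.
M = 0.029        # kg/mol, molar mass
rho = 1.0 * M    # kg/m^3 at 1 mol per m^3
c = 343.0        # m/s, speed of sound (roughly density-independent)
mu = 1.8e-5      # Pa*s, shear viscosity (also roughly density-independent)
kappa = 0.026    # W/(m*K), thermal conductivity
cp = 1005.0      # J/(kg*K), specific heat at constant pressure
gamma = 1.4
omega = 2 * math.pi * 300.0   # the ~300 Hz mode in question

# Classical absorption: amplitude decays spatially as exp(-alpha * x).
delta = (4.0 / 3.0) * mu + (gamma - 1.0) * kappa / cp   # bulk viscosity dropped
alpha = omega**2 * delta / (2.0 * rho * c**3)

# A standing mode's energy then decays at a rate 2*alpha*c, so Q = omega/(2*alpha*c),
# proportional to rho; lower density means lower Q, as in the edit above.
Q = omega / (2.0 * alpha * c)
print(f"alpha ~ {alpha:.1e} 1/m, bulk-limited Q ~ {Q:.0f}")
```

The bulk limit alone comes out well above a thousand here, so whether the mode really keeps Q in the thousands mostly comes down to the losses this estimate leaves out.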

Related would be some refactoring of Deception Chess.

When I think about what I'd expect to see in experiments like that, I get curious about a sort of "baseline" set of experiments without deception or even verbal explanations. When can I distinguish the better of two chess engines more efficiently than playing them against each other and looking at the win/loss record? How much does it help to see the engines' analyses over just observing moves? 

How is this related? Well, how deep is Chess? Ratings range between, say, 800 and 3500, with 300 points being enough to distinguish players (human or computer) reasonably well. So we might say there are about 10 "levels" in practice, or that it has a rating depth of 10.

If Chess were Best-Of-30 ChessMove as described above, then ChessMove would have a rating depth a bit below 2 (just dividing by √30 ≈ 5.5). In other words, we'd expect it to be very hard to ever distinguish any pair of engines off a single recommended move—and difficult with any number of isolated observations, given our own error-prone human evaluation. If it's closer to Best-Of-30 Don'tBlunder, it's a little more complicated—usually you can't tell the difference because there basically is none, but on rare pivotal moves it will be nearly as easy to tell as when looking at a whole game.
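As a sanity check on that factor, here's a minimal Monte Carlo sketch of one way to cash out "Elo-like performance and rating per move": each move is an independent Gaussian performance draw, and the game goes to whichever side has the larger 30-move total. The distributions and sample sizes are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n_moves = 30
d = 0.1            # per-move skill gap, in units of the per-move performance std
n_games = 200_000

# Each game: both players draw one performance per move; larger total wins.
a = rng.normal(d, 1.0, size=(n_games, n_moves)).sum(axis=1)
b = rng.normal(0.0, 1.0, size=(n_games, n_moves)).sum(axis=1)
p_win = (a > b).mean()

# Back out the single-draw gap that gives the same win probability
# (one draw each with gap d_eff is won with probability Phi(d_eff / sqrt(2))).
d_eff = np.sqrt(2.0) * norm.ppf(p_win)
print(f"amplification ~ {d_eff / d:.2f}, sqrt(30) ~ {np.sqrt(n_moves):.2f}")
```

The roughly 5.5x amplification of the per-move gap is where the divide-by-√30 above comes from.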

The solo version of the experiment looks like this:

  1. I find a chess engine with a rating around mine, and use it to analyze positions in games against other engines. Play a bunch of games to get a baseline "hybrid" rating for myself with that advisor.
  2. I do the same thing with a series of stronger chess engines, ideally each within a "level" of the last.
  3. I do the same thing with access to the output of two engines, and I'm blinded to which is which. (The blinding might require some care around, for example, timing, as well as openings.) In sub-experiment A, I only get top moves and their scores (there's a rough code sketch of this blinded setup after the list). In sub-experiment B, I can look at lines from the current position up to some depth. In sub-experiment C, I can use these engines however I want. For example, I can basically play them against each other if I want to run down my own clock doing it. (Because pairs might be within a level of one another, I can't be sure which is stronger from a single win/loss outcome. I'd hope to find more efficient ways of distinguishing them.)
  4. I repeat #3 with different random pairs of advisors.
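For concreteness, here's a rough sketch of the blinded step in sub-experiment A using the python-chess library. The engine paths, the fixed depth limit, and the bookkeeping are placeholder assumptions, not a worked-out protocol.

```python
import random

import chess
import chess.engine

# Placeholder paths; any two UCI engines would do.
ENGINE_PATHS = ["/path/to/engine_a", "/path/to/engine_b"]


def blinded_advice(board, engines, rng):
    """Top move and score from each advisor, presented in a shuffled order
    so that which engine is which stays hidden from me."""
    order = [0, 1]
    rng.shuffle(order)
    advice = []
    for idx in order:
        info = engines[idx].analyse(board, chess.engine.Limit(depth=12))
        # "pv" and "score" are reported by typical UCI engines during a depth search.
        advice.append((info["pv"][0], info["score"].pov(board.turn)))
    return advice, order  # `order` gets logged for later scoring, never shown to me


def main():
    rng = random.Random(0)
    engines = [chess.engine.SimpleEngine.popen_uci(path) for path in ENGINE_PATHS]
    board = chess.Board()
    try:
        advice, _order = blinded_advice(board, engines, rng)
        for move, score in advice:
            print(board.san(move), score)
    finally:
        for engine in engines:
            engine.quit()


if __name__ == "__main__":
    main()
```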

What I'd expect is that my ratings with pairs of advisors should be somewhere between my rating with the bad advisor and my rating with the good advisor. If I can successfully distinguish them, it's close to the latter. If I'm just guessing, it's close to the former (in the Don'tBlunder world) or to the midpoint (in the ChessMove world). I should have an easier time in sub-experiments B and C. Having a worse engine in the mix weighs me down relatively more (a) the closer the engines are to each other, and (b) the stronger both engines are compared to me.

The main question I'd hope might be answerable this way would be something like, "How do (a) and (b) trade off?" Which is easier to distinguish—1800 and 2100, or, say, 2700 and 3300? Will there be a ceiling beyond which I'm always just guessing? Might I tend to side with worse advisors because, being closer to my level, they agree with me?

It seems like we'd want some handle on these questions before asking how much worse outright deception can be.

(There's some trouble here because higher-ranked players are more likely to draw given a fixed rating difference. This itself is relatively Don'tBlunder-like, and it makes me wonder if it's possible to project how far our best engines are likely to be from perfect play. But it makes it harder to disentangle inability to draw distinctions in play above my level from "natural" indistinguishability. There are also more general issues in doing these experiments with computers—for example, weak engines tend to be weak in ways humans wouldn't be, and it's hard to calibrate ratings for superhuman play.)

(It might also be interesting to automate myself out of this experiment by choosing between recommendations using some simple scripted logic and evaluation by a relatively weak engine.)

Along the lines of what I wrote in the parent: even though I think there's potentially a related and fairly deep "worldview"-type crux (crux generator?) nearby when it comes to AI risk—are we in a ChessMove world or a Don'tBlunder world? (sorry, these are terrible names, since actual Chess moves are more like Don'tBlunder, which is itself horribly ugly)—I'm not particularly motivated to do this experiment, because I don't think any possible answer at this level of metaphor would be informative enough to shift anyone on more important questions.

I sometimes wonder how much we could learn from toy models of superhuman performance, in terms of what to expect from AI progress. I suspect the answer is "not much", but I figured I'd toss some thoughts out here, as much to discharge any need I feel to think about them further as to see if anyone has any good pointers here.

Like—when is performance about making super-smart moves, and when is it about consistently not blundering for as long as possible? My impression is that in Chess, something like "average centipawn loss" (according to some analysis engine) doesn't explain outcomes as well as "worst move per game". (I don't know the keywords to search for, but I relatedly found this neat paper which finds a power law for the difference between best and second-best moves in a position.) What does Go look like, in comparison?
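To pin down what I mean by those two statistics, here's a toy comparison on made-up per-move centipawn losses for two hypothetical games; none of these numbers come from real engine analysis.

```python
# Hypothetical per-move centipawn losses (evaluation drop relative to the
# engine's preferred move) for two games: game A bleeds small amounts steadily,
# game B is accurate except for one blunder.
game_a = [20, 15, 25, 10, 20, 15, 25, 10, 20, 15]
game_b = [2, 0, 3, 1, 0, 160, 2, 1, 3, 0]

for name, losses in [("A", game_a), ("B", game_b)]:
    acpl = sum(losses) / len(losses)
    worst = max(losses)
    print(f"game {name}: ACPL {acpl:.1f} cp, worst move {worst} cp")

# Both games have a similar average centipawn loss (about 17 cp) but very
# different worst moves; the question is which statistic tracks outcomes better.
```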

How deep are games? What's the longest chain of players such that each consistently beats the next? How much comes from the game itself being "deep" versus the game being made up of many repeated small contests? (E.g., the longest chain for best-of-9 Chess is going to be about 3 times longer than that for Chess, if the assumptions behind the rating system hold. Or, another example, is Chess better thought of as Best-Of-30 ChessMove with Elo-like performance and rating per move, or perhaps as Best-Of-30 Don'tBlunder with binary performance per move?)

Where do ceilings come from? Are there diminishing returns on driving down blunder probabilities given fixed deep uncertainties or external randomness? Is there such a thing as "perfect play", and when can we tell if we're approaching it? (Like—maybe there's some theoretically-motivated power law that matches a rating distribution until some cutoff at the extreme tail?)

What do real-world "games" and "rating distributions" look like in this light?

Muireall · 4mo

Many times have I heard people talk about ideas they thought up that are ‘super infohazardous’ and ‘may substantially advance capabilities’ and then later when I have been made privy to the idea, realized that they had, in fact, reinvented an idea that had been publicly available in the ML literature for several years with very mixed evidence for its success – hence why it was not widely used and known to the person coming up with the idea.

I’d be very interested if anyone has specific examples of ideas like this they could share (that are by now widely known or obviously not hazardous). I’m sympathetic to the sorts of things the article says, but I don’t actually have any picture of the class of ideas it’s talking about.

It sounds like you're saying that you can tell once someone's started transitioning, not that you can recognize trans people who haven't (or who haven't come out, at least not to a circle including you), right? Whether or not you're right, the spirit of this post includes the latter, too.

This reasoning is basically right, but the answer ends up being 5 for a relatively mundane reason.

If the time-averaged potential energy is k_B T / 2, so is the kinetic energy. Because damping is low, at some point in a cycle, you'll deterministically have the sum of the two in potential energy and nothing in kinetic energy. So you do have some variation getting averaged away.
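Here's a tiny sketch of that bookkeeping, sampling a 1D harmonic oscillator from the Boltzmann distribution; the parameters and units are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters, in units where kB = 1.
kB_T = 1.0
m, k = 1.0, 1.0
n = 1_000_000

# Equilibrium (Boltzmann) samples of position and momentum.
x = rng.normal(0.0, np.sqrt(kB_T / k), size=n)
p = rng.normal(0.0, np.sqrt(m * kB_T), size=n)

pe = 0.5 * k * x**2
ke = p**2 / (2.0 * m)

print(f"<PE> = {pe.mean():.3f}, <KE> = {ke.mean():.3f}, <E> = {(pe + ke).mean():.3f}")
# Expect <PE> = <KE> = kB*T/2 and <E> = kB*T. With low damping, at the turning
# point of a given cycle all of E sits momentarily in potential energy, so the
# per-cycle peak potential energy averages about twice the mean potential energy.
```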

More generally, while the relaxation timescale is the relevant timescale here, I also wanted to introduce an idea about very fast measurement events like the closing of the electrical circuit. If you have observables correlated on short timescales, then measurements faster than that won't necessarily follow expectations from naive equilibrium thinking.
