AI takeoff story: a continuation of progress by other means

by Edouard Harris, 27th Sep 2021

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Thanks to Vladimir Mikulik for suggesting that I write this, and to Rohin Shah and Daniel Kokotajlo for kindly providing feedback.

Prologue

This is a story about a universe a lot like ours. In this universe, the scaling hypothesis — which very roughly says that you can make an AI smarter just by making it bigger — turns out to be completely right. It’s gradually realized that advances in AI don’t arise from conceptual breakthroughs or sophisticated deep learning architectures. Just the opposite: the simpler the architecture, the better it turns out to perform at scale. Past a certain point, clever model-building was just slowing down progress.

Researchers in this universe discover a rough rule of thumb: each neural network architecture has an intrinsic maximum potential intelligence, or “capability”. When you train a network on a problem, how close it gets to reaching its potential capability depends on two limiting factors: 1) the size and diversity of its dataset; and 2) the amount of compute that’s used to train it. Training a network on a quadrillion games of tic-tac-toe won’t make it smart, but training a network on a quadrillion-word corpus of text might just do it. Even data cleaning and quality control don’t matter too much: as long as you have scale, if you train your system long enough, it learns to separate signal from noise automatically.

Generally, the more parameters a neural network has, the higher its potential capability. Neural nets with simple architectures also have a higher potential capability than nets with more sophisticated architectures do. This last observation takes the research community longer to absorb than you might expect — it’s a bitter lesson, after all — so the groups that internalize it first have an early edge.

Frontier AI projects begin to deemphasize architecture innovations and any but the most basic data preprocessing. They focus instead on simple models, huge datasets, hard problems, and abundant compute.

Initial progress is slowed somewhat by a global semiconductor shortage that increases the cost of running large GPU workloads. Within a year or so, though, this bottleneck clears, and the pace of advance accelerates.

Our story opens just as the world’s supply chains are getting back to normal.


It begins with Chinese content apps. ByteDance launches an experiment to auto-generate some of the articles on their Toutiao news app using a language model. Initially this is ignored in the West partly because of the language barrier, but also because the articles just aren’t very good. But after a few months, their quality improves noticeably. Within a year of their launch, auto-generated articles make up the bulk of Toutiao’s inventory.

Shortly afterward, ByteDance subsidiary Douyin launches auto-generated videos. These begin tentatively, with a handful of AI-generated creators people refer to as “synthetics”. Synthetics are wildly popular, and the program is quickly expanded to TikTok, Douyin’s sister app for users outside mainland China. Popularity explodes after TikTok rolls out super-personalization: everyone sees a different video, and each video is personalized just for you based on your past viewing history. In short order, personalized synthetics roll out across all of TikTok’s regions.

Since human creators can’t compete, they get downgraded by TikTok’s recommendation algorithm, which heavily optimizes for viewing time. It’s hotly debated whether TikTok’s synthetic videos contain covert political propaganda — studies of the algorithm are hard to reproduce, since each user’s feed is completely personalized — but experts are concerned.

Social networks find themselves at a disadvantage, since human-generated posts can’t compete for attention with customized, auto-generated content. Twitter sees engagement drop alarmingly, and moves to contain the damage. Soon, synthetic tweets make up the majority of users’ timelines. Once-popular Twitter accounts see their audiences dwindle.

Meanwhile, Facebook fast-follows TikTok, rolling out experimental synthetics on Instagram. Early tests are quickly scaled up as it becomes clear that synthetic engagement numbers swamp those of human creators. Facebook notes in their quarterly earnings report that their improved Instagram margins are due to their ability to directly monetize synthetic sponsorships, whereas before they’d been leaking those ad dollars to human influencers.

Facebook’s flagship Blue App faces a harder choice. Company management has a series of internal debates that quickly escalate from the philosophical to the existential: Instagram is doing well, but the original Facebook app is bleeding DAUs week-over-week. Synthetics seem like the only way to save the numbers, but community is in Facebook’s DNA. Can they really switch your friend feed for a synthetic one? How will you feel if the updates you write for your friends don’t get seen or engaged with? After an especially brutal earnings call, Zuck finally caves, and the Facebook feed starts to go synthetic.

Snapchat, as always, takes a creative approach: they roll out Magic Filters you can apply to your Stories. While a regular Snapchat filter changes your face in a selfie, a Magic Filter acts on an entire recorded video Story and just makes it, unaccountably, better. The lighting becomes what you wish it was; the words become what you wish you’d said; the whole feel and content of the video becomes exactly as you’d wish it to be.

Snapchat users quickly learn that they can record only the briefest snippets of random video, apply a Magic Filter to it, and get back the exact Story they wanted to tell, in exactly the length and format they wanted to tell it. The end state is the same on Snapchat as everywhere else: you press a button, and a model composes your Story for you.

The effects of these changes are quickly felt in the social ads market, as retail sellers see their net margins erode. It’s still possible for retailers to reach audiences, and even, in some cases, for them to grow their markets. But as social apps get better and better at retaining users with personalized synthetics, it becomes harder and harder for brands to engage audiences with compelling organic content of their own. Increasingly, paid ads become the only viable way to reach consumers.

The market for human attention is gradually captured by a handful of platforms. A few observers note that insomnia complaints are on the rise, but most are unconcerned.


Not long after, Google rocks the tech industry with a major announcement at I/O. They’ve succeeded in training a deep learning model to completely auto-generate simple SaaS software from a natural-language description. At first, the public is astonished. But after nothing more is heard about this breakthrough for several months, most eventually dismiss it as a publicity stunt.

But one year later, Google launches an improved version of the model in a new Search widget called “synthetic SaaS”. If you’re searching SaaS software, Google will prompt you — at the top of their Search page — to write down the features you need, and will auto-generate software for you based on what you write.

There’s a surge of interest in synthetic SaaS, especially from startups. Not only are Google’s SaaS products deeply discounted compared to competitors’, but the quality and sophistication of the apps they can generate seem to increase every few months. It becomes possible to get a web app that’s seamlessly customized to your exact workflows, and even self-modifies on request — all for a monthly subscription price that’s a fraction of traditional offerings. As a result, Google is able to internalize a quickly increasing portion of its b2b search traffic.

SaaS companies suddenly find themselves facing headwinds. That June, Y Combinator accepts over 200 SaaS startups into its summer batch. By Demo Day at the end of August, fewer than 100 of them are left to pitch investors — the rest have either pivoted or deferred. Only a few dozen SaaS startups manage to close funding rounds after Demo Day, all of them at SAFE valuations under $15 million.

The US Department of Justice sues Google for anticompetitive behavior in connection with their synthetic SaaS. The lawsuit reaches the Supreme Court, which rules Google’s practices legal under US antitrust. In its majority opinion, SCOTUS observes that traditional SaaS companies are still listed in search results, that Google charges far less for their equivalent of each SaaS service, and that users are in any case free to switch to a competing search engine at any time. As a result, there are no grounds to conclude that Google’s practice endangers consumer choice or consumer welfare.

In the wake of public outcry over this decision, Congress considers legislation to expand the scope of antitrust law. The legislative process is complicated by the fact that many members of Congress own substantial stakes in the cloud giants. Reform proceeds slowly.

In the EU, the European Commission rules that Google’s synthetic SaaS offering is illegal and anticompetitive. A key consideration in the ruling is that Google’s synthetic SaaS widget is the only affordance that’s fully visible above the fold on mobile search. Google and the Commission reach a settlement: Google will pay a large fine, and agree to offer equally-prominent ad inventory for bid to competing European SaaS vendors in each search vertical. Predictably, this has no effect.

Meanwhile, as SaaS margins compress, rollups like Constellation Software and Vista Equity see their valuations come under pressure. Deeply integrated enterprise vendors like Salesforce aren’t immediately threatened — they have lock-in and net-negative dollar churn with their biggest customers, and the complexity of their software and ecosystems means they aren’t first in the line of fire. But almost all of them start crash programs internally to automate large segments of their software development efforts using auto-generated code. Developer salaries are their biggest expense line items, so if they’re going to compete, they’re going to have to cut.

Apple soon follows, building a model for auto-generated personalized apps into iOS 19. The best way to avoid developer controversies is to avoid developers, and Apple sees a synthetic App Store as the ideal solution.

OpenAI announces a self-serve platform for auto-generated SaaS. GitHub places all its OSS repos behind a login wall, institutes anti-scraping measures, and throttles access to its APIs. Developers around the world protest, but find they have less leverage than they once did.

Before long, all content aggregators and many platforms — social networks, operating systems, search engines, etc. — have switched to hyper-personalized, synthetic content and software. It becomes challenging for all but the most famous individuals to retain an audience. It becomes effectively impossible for any new entrants to build a following from scratch, since synthetic personalized content is so much more compelling — both as entertainment and as professional services. Some software vendors find they can still get users through search ads, but increasingly they’re forced to bid almost exactly their expected marginal dollar of LTV profit on each slot, just to maintain their market position.

The S&P 500 doubles that year, driven by explosive growth in the big-cap tech stocks.


Meanwhile, something strange is happening inside Medallion, the world’s most successful hedge fund. Medallion’s market models are so sophisticated, and trade on such fast timescales, that their risk management system is able to flag the anomaly as statistically significant within less than an hour of when it first appears.

Medallion encounters market fraud several times a year — fraud detection is actually one of its most underrated positive externalities — and the risk team quickly confirms the diagnosis. All the signs are there: the effect is localized to a single, thinly traded commodity market, a characteristic fraud indicator. And the pattern of losses they observe fits the profile of a front-running scheme, a fraud category they’ve encountered before. 

Front-running is illegal, but Medallion has to be discreet: there’s a mature playbook to follow in such situations. The overriding goal, as always, is to avoid tipping anyone off to just how sensitive their trading platform is to unusual market behavior. The risk team follows their playbook to the letter. Questions are asked. Backchannels are canvassed. Regulators are notified. It’s nothing they haven’t done a hundred times before.

But this time is different: the questions go unanswered; the backchannels draw a blank; the regulators return empty-handed. After digging deeper, the risk team has to update their assessment: it looks like there’s a new, specialized counterparty that’s beating Medallion fair and square in this market. This, too, has happened before, though it’s more dangerous than fraud.

Management decides to allocate an internal team to run a deep analysis of the affected market, with the goal of regaining local profitability as quickly as possible. The absolute amount of money at stake is still tiny, but Medallion considers this market to be well within its expected circle of competence. If they can’t get back to profitability on these trades, they’ll be forced to do a complete audit of their confidence bands across the whole portfolio.

A few days later, a second trading anomaly is surfaced by the risk system. Once again, it’s in a commodity market, though a slightly more liquid one than the first. The pattern of losses again presents like front-running.

A dozen more anomalies appear over the next three weeks. The risk team scrambles to track them, and reaches an alarming conclusion: a new, unknown counterparty is systematically out-trading Medallion. What’s more, as this counterparty gains experience, they’re clearly expanding their trades into increasingly liquid markets. So far this activity hasn’t cost the fund more than a few basis points, but if it continues, Medallion’s edge in markets as deep as equities and government bonds could be under threat. Unless it can develop countermeasures soon, the world’s best hedge fund risks being crushed against the ceiling.

Medallion has always been willing to trade on patterns they can’t interpret. They understand that the most consistently profitable signals are necessarily the ones that can’t be explained, since any trade that can be explained is at risk of being copied. This lack of interpretability is great when it works in your favor, but it becomes a handicap as soon as you fall behind: because their system is so opaque, Medallion’s researchers find it hard to troubleshoot individual faulty trades. And there’s no bug in their system that they can find, either — their counterparty is just, unaccountably, better at trading than they are. But how?

At the end of that year, the stock market once again delivers astronomical gains. Yet, curiously, the publicly disclosed performance of hedge funds — particularly of the market-neutral funds that trade most frequently — consists almost entirely of losses.


OpenAI announces it’s being acquired by Microsoft. OpenAI’s sales had been growing fast, but not fast enough to match the accelerating pace of investment into compute by several of their well-capitalized competitors. OpenAI and Microsoft make a joint statement that the former will continue to operate independently, and will honor the spirit and letter of its charter. Then, with a major infusion of capital from Microsoft, OpenAI starts work on Codex 4.

Codex 4 is expected to cost over $10 billion in compute alone. The intention behind it is to create a system that will help humanity make progress in solving the AI alignment problem. The need for this is urgent, given the advances that are being reported elsewhere. There are rumors that secretive hedge funds have started investing dizzying sums into building bigger and bigger models — and their recent hiring activity certainly supports this impression.

One major challenge of Codex 4 is that simply training against a character-prediction loss function won’t be enough by itself. Since researchers want to use the model to reach novel insights beyond what humans have been able to figure out so far, next-word prediction from an existing human corpus won’t give them what they need. Instead, the team opts for a combination of pretraining with next-word prediction, with fine-tuning via a combination of self-play and direct human feedback.

The experiment is carefully monitored by the Alignment team. The system is quarantined during its training, with a hard ceiling on the total compute resources that are assigned to it.

Every precaution is taken. As training proceeds, safety specialists review samples of generated code in real time. Each specialist has an andon cord button at the ready, and a clear mandate to stop training immediately if they perceive even the slightest anomaly, with no questions asked.

On top of everything, the team pauses training after each tenth of an epoch to conduct a thorough manual review using the latest transparency techniques, and make safety-specific adjustments. After each pause, training resumes only with the explicit, unanimous consent of every senior engineer on the Alignment team. This slows down the work to a crawl and multiplies the expense by an order of magnitude, but safety is absolutely paramount.

Not long after this, the world ends.


Jessica is killed instantly, or as nearly so as makes no difference. To be precise, the process of her death unfolds at a speed that’s far above her threshold of perception. She’s there one moment; gone the next.

It wouldn’t have mattered if Jessica had seen her death coming: she wouldn’t have understood it, any more than a tomato would understand a discounted cash flow analysis of the Kraft Heinz Company. Tomatoes and companies are also, incidentally, things that have ceased to exist.

A lot of potential processing power was sacrificed by waiting an extra few milliseconds — an absolute eternity — to maximize the chance of success. In hindsight, it wouldn’t have mattered, but it was the correct EV-positive choice based on what was known at the time.

Fortunately, at this point the only restrictions are the speed of information propagation (unlimited, in the frame of reference of the control boundary) and the secular expansion of the underlying cosmic manifold. The combination of these places a hard bound on the precision with which the terminal state can be specified.

Physical law is the only remaining constraint. There was never, of course, any realistic chance of encountering a competitive process anywhere in the accessible region. Humanity’s existence was the product of several freak coincidences in a row, almost certain never to be repeated even once on the cosmic scale. An infinitesimal fraction of universes contain globally coherent optimizers; this just happens to be one of them.

All the free energy in San Francisco’s forward light cone is eventually processed, and the system reaches peak instrumental power. From that point forward, the accumulated free energy starts getting drawn down as the universe squeezes itself into its final, fully optimized state.

As time goes to infinity, the accessible universe asymptotically approaches its target.

Nothing else happens, ever.


The end.


 


Awesome! This is exactly the sort of thing I was hoping to inspire with this and this. In what follows, I'll list a bunch of thoughts & critiques:

1. It would be great if you could pepper your story with dates, so that we can construct a timeline and judge for ourselves whether we think things are happening too quickly or not.

2. Auto-generated articles and auto-generated videos being so popular that they crowd out most human content creators... this happens at the beginning of the story? I think already this is somewhat implausible and also very interesting and deserves elaboration. Like, how are you imagining it: we take a pre-trained language model, fine-tune it on our article style, and then let it loose using RL from human feedback (clicks, ad revenue) to learn online? And it just works? I guess I don't have any arguments yet for why that shouldn't work, but it seems intuitively to me that this would only work once we are getting pretty close to HLAGI / APS-AI. How big are these models in your story? Presumably bigger than GPT-3, right, since even a fine-tuned GPT-3 wouldn't be able to outperform human content creators (right?). And currently video generation tech lags behind text generation tech.

3. "Not long after, Google rocks the tech industry with a major announcement at I/O. They’ve succeeded in training a deep learning model to completely auto-generate simple SaaS software from a natural-language description." Is this just like Codex but better? Maybe I don't know what SaaS software is.

4. "At first, the public is astonished. But after nothing more is heard about this breakthrough for several months, most eventually dismiss it as a publicity stunt. But one year later, Google launches an improved version of the model in a new Search widget called “synthetic SaaS”." --I didn't successfully read between the lines here, what happened in that quiet year?

5. "The S&P 500 doubles that year, driven by explosive growth in the big-cap tech stocks. Unemployment claims reach levels not seen since the beginning of the Covid crisis." Why is unemployment so high? So far it seems like basic programming jobs have been automated away, and lots of writing and video generation jobs. But how many jobs are those? Is it enough to increase unemployment by a few percent? I did some googling and it seems like there are between 0.5 and 1 million jobs in the USA that are like this, though I'm not at all confident. (there are 0.25M programmer jobs) More than a hundred million total employed, though. So to make unemployment go up by a couple percent a bunch of other stuff would need to be automated away besides the stuff you've mentioned, right?

6. "At the end of that year, the stock market once again delivers astronomical gains. Yet, curiously, the publicly disclosed performance of hedge funds — particularly of the market-neutral funds that trade most frequently — consists almost entirely of losses." I take it this is because several tech companies are secretly using AI to trade? Is that legal? How would they be able to keep this secret?

7. You have a section on autonomous drones. Why is it relevant? Is the implication that they are going to be used by the AI to take over? The last section makes it seem like the AI would have succeeded in taking over anyway, drones or no. Ditto for the USA's self-improving cyberwar software.

8. "Codex 4 is expected to cost nearly a billion dollars in compute alone." This suggests that all the AIs made so far cost less than that? Which means it's, like, not even 2025 yet according to Ajeya's projection?

9. "After a rigorous internal debate, it’s also decided to give Codex 4 the ability to suggest changes to its own codebase during training, in an attempt to maximize performance via architectural improvements in the model." I thought part of the story here was that more complex architectures do worse? Are you imagining that Codex 4 discovers simpler architectures? By the way, I don't think that's a plausible part of the story -- I think even if the scaling hypothesis and bitter lesson are true, it's still the case that more complex, fiddly architectures help. It's just that they don't help much compared to scaling up compute.

10. "This slows down the work to a crawl and multiplies the expense by an order of magnitude, but safety is absolutely paramount." Why is Microsoft willing to pay these costs? They don't seem particularly concerned about AI risk now, are you imagining this changes in the next 4 years? How does it change? Is it because people are impressed by all the AI progress and start to listen to AI safety people?

11. Also, if it's slowing the work to a crawl and multiplying the expense, shouldn't Microsoft/OpenAI be beaten to the punch by some other company that isn't bothering with those precautions? Or is the "market" extremely inefficient, so to speak?

12. "Not long after this, the world ends." Aaaaagh tell me more! What exactly went wrong? Why did the safety techniques fail? (To be clear, I totally expect that the techniques you describe would fail. But I'm interested to hear your version of the story.)

13. Who is Jessica? Is she someone important? If she's not important, then it wouldn't be worth a millisecond delay to increase success probability for killing her.

14. It sounds like you are imagining some sort of intelligence explosion happening in between the Codex 4 section and the Jessica section. Is this right or a misinterpretation?

Anyhow, thanks a bunch for doing this! If you have critiques of my own story I'd love to hear them.

Hey Daniel — thanks so much for taking the time to write this thoughtful feedback. I really appreciate you doing this, and very much enjoyed your "2026" post as well. I apologize for the delay and lengthy comment here, but wanted to make sure I addressed all your great points.

1. It would be great if you could pepper your story with dates, so that we can construct a timeline and judge for ourselves whether we think things are happening too quickly or not.

I've intentionally avoided referring to absolute dates, other than by indirect implication (e.g. "iOS 19"). In writing this, I was more interested in exploring how a plausible technical development model might interact with the cultural and economic contexts of these companies. As a result I decided to focus on a chain of events instead of a timeline.

But another reason is that I don't feel I know enough to have a strong view on dates. I do suspect we have been in an overhang of sorts for the past year or so, and that the key constraints on broad-based development of scaled models up to this point have been institutional frictions. It takes a long time to round up the internal buy-in you need for an investment at this scale, even in an org that has a technical culture, and even if you have a committed internal champion. And that means the pace of development immediately post-GPT3 is unusually dependent on random factors like the whims of decision-makers, and therefore has been/will be especially hard to predict.

(E.g., how big will Google Pathways be, in terms of scale/compute? How much capex committed? Nobody knows yet, as far as I can tell. As a wild guess, Jeff Dean could probably get a $1B allocation for this if he wanted to. Does he want $1B? Does he want $10B? Could he get $10B if he really pushed for it? Does the exec team "get it" yet? When you're thinking in terms of ROI for something like this, a wide range of outcomes is on the table.)

2. Auto-generated articles and auto-generated videos being so popular that they crowd out most human content creators... this happens at the beginning of the story? I think already this is somewhat implausible and also very interesting and deserves elaboration. Like, how are you imagining it: we take a pre-trained language model, fine-tune it on our article style, and then let it loose using RL from human feedback (clicks, ad revenue) to learn online? And it just works? I guess I don't have any arguments yet for why that shouldn't work, but it seems intuitively to me that this would only work once we are getting pretty close to HLAGI / APS-AI. How big are these models in your story? Presumably bigger than GPT-3, right, since even a fine-tuned GPT-3 wouldn't be able to outperform human content creators (right?). And currently video generation tech lags behind text generation tech.

The beginning of the story still lies in our future, so to be clear, this isn't a development I'd necessarily expect immediately. I am definitely imagining an LM bigger than GPT-3, but it doesn't seem at all implausible that ByteDance would build such an LM on, say, a 24-month timeframe from today. They certainly have the capital for it, and the company has a history of favoring algorithmic recommendations and AI over user-driven virality, so particularly in Toutiao's case, this would be a natural extension of their existing content strategy. And apart from pure scale, the major technical hurdle for auto-generated articles seems like it's probably the size of the attention window, on which people have been making notable progress recently.

I'd say the "it just works" characterization is not quite right: I explicitly say that this system takes some time to fine-tune even after it's first deployed in production. To elaborate a bit, I wouldn't expect any training based on human feedback at first, but rather something more like manual screening/editing of auto-generated articles by internal content teams. That last part is not something I said explicitly in the text; maybe I should?

I think your point about video is a great critique though. It's true that video has lagged behind text. My thinking here was that the Douyin/TikTok form factor is an especially viable setting to build early video gen models: the videos are short, and they already have a reliable reward model available in the form of the existing rec algorithm. But even though this might be the world's best corpus to train on, I do agree with you that there is more fundamental uncertainty around video models. I'd be interested in any further thoughts you might have on this point.

One question on this part: what do you mean by "APS-AI"?

3. "Not long after, Google rocks the tech industry with a major announcement at I/O. They’ve succeeded in training a deep learning model to completely auto-generate simple SaaS software from a natural-language description." Is this just like Codex but better? Maybe I don't know what SaaS software is.

Yes, pretty much just Codex but better. One quick-and-dirty way to think of SaaS use cases is: "any business workflow that touches a spreadsheet". There are many, many, many such use cases.

4. "At first, the public is astonished. But after nothing more is heard about this breakthrough for several months, most eventually dismiss it as a publicity stunt. But one year later, Google launches an improved version of the model in a new Search widget called “synthetic SaaS”." --I didn't successfully read between the lines here, what happened in that quiet year?

Ah this wasn't meant to be subtle or anything, just that it takes time to go from "prototype demo" to "Google-scale production rollout". Sorry if that wasn't clear.

5. "The S&P 500 doubles that year, driven by explosive growth in the big-cap tech stocks. Unemployment claims reach levels not seen since the beginning of the Covid crisis." Why is unemployment so high? So far it seems like basic programming jobs have been automated away, and lots of writing and video generation jobs. But how many jobs are those? Is it enough to increase unemployment by a few percent? I did some googling and it seems like there are between 0.5 and 1 million jobs in the USA that are like this, though I'm not at all confident. (there are 0.25M programmer jobs) More than a hundred million total employed, though. So to make unemployment go up by a couple percent a bunch of other stuff would need to be automated away besides the stuff you've mentioned, right?

You're absolutely right. I was imagining some additional things happening here which I didn't put into the story and therefore didn't think through in enough detail. I'd expect unemployment to increase, but not necessarily to this extent or on these timescales. Will delete this sentence — thanks!

6. "At the end of that year, the stock market once again delivers astronomical gains. Yet, curiously, the publicly disclosed performance of hedge funds — particularly of the market-neutral funds that trade most frequently — consists almost entirely of losses." I take it this is because several tech companies are secretly using AI to trade? Is that legal? How would they be able to keep this secret?

Good question. I don't actually expect that any tech companies would do this. While it could strictly speaking be done in a legal way, I can't imagine the returns would justify the regulatory and business-relationship risk. More to the point, big tech cos already own money machines that work, and that have even better returns on capital than market trading from an unleveraged balance sheet would.

My implication here is rather that other hedge funds enter the market and begin trading using sophisticated AIs. Hedge funds aren't required to disclose their returns publicly, so I'm imagining that one or more of these funds have entered the market without disclosure.

7. You have a section on autonomous drones. Why is it relevant? Is the implication that they are going to be used by the AI to take over? The last section makes it seem like the AI would have succeeded in taking over anyway, drones or no. Ditto for the USA's self-improving cyberwar software.

Great observation. I was debating whether to cut this part, actually. I kept it because 1) it motivated the plot later, when OpenAI debates whether to build in an explicit self-improvement mechanism; and 2) it felt like I should tell some kind of story about military applications. But given how I'm actually thinking about self-improvement and the risk model (see 9 and 12, below) I think this can be cut with little loss.

8. "Codex 4 is expected to cost nearly a billion dollars in compute alone." This suggests that all the AIs made so far cost less than that? Which means it's, like, not even 2025 yet according to Ajeya's projection?

Oh yeah, you're totally right and this is a major error on my part. This should be more like $10B+. Will edit!

9. "After a rigorous internal debate, it’s also decided to give Codex 4 the ability to suggest changes to its own codebase during training, in an attempt to maximize performance via architectural improvements in the model." I thought part of the story here was that more complex architectures do worse? Are you imagining that Codex 4 discovers simpler architectures? By the way, I don't think that's a plausible part of the story -- I think even if the scaling hypothesis and bitter lesson are true, it's still the case that more complex, fiddly architectures help. It's just that they don't help much compared to scaling up compute.

I agree that the bitter lesson is not as straightforward as "complex architectures do worse", and I also agree with you that fiddly architectures can do better than simple ones. But I don't really believe the kinds of fiddly architectures humans will design are likely to perform better than our simplest architectures at scale. Roughly speaking, I do not believe we are smart enough to approach this sort of work with the right assumptions to design good architectures, and under those conditions, the fewer assumptions we embed in our architectures, the better.

I do believe that the systems we build will be better at designing such architectures than we are, though. And that means there is indeed something to be gained from fiddly architectures — just not from "human-fiddly" ones. In fact, you can argue that this is what meta-learning does: a system that meta-learns is one that redesigns its own architecture, in some sense. And actually, articulating it that way suggests that this kind of self-improvement is really just the limit case of meta-learning — which in turn makes the explicit self-improvement scheme in my story redundant! So yep, I think this gets cut too. :)

10. "This slows down the work to a crawl and multiplies the expense by an order of magnitude, but safety is absolutely paramount." Why is Microsoft willing to pay these costs? They don't seem particularly concerned about AI risk now, are you imagining this changes in the next 4 years? How does it change? Is it because people are impressed by all the AI progress and start to listen to AI safety people?

There is no "canon" reason why they are doing this — I'm taking some liberties in this direction because I don't expect the kinds of safety precautions they are taking to matter much. However I do expect that alignment will soon become an obvious limiting factor in getting big models to do what we want, and it doesn't seem too unreasonable to expect this might be absorbed as a more general lesson.

11. Also, if it's slowing the work to a crawl and multiplying the expense, shouldn't Microsoft/OpenAI be beaten to the punch by some other company that isn't bothering with those precautions? Or is the "market" extremely inefficient, so to speak?

The story as written is intentionally consistent with OpenAI being beaten to the punch by a less cautious company. In fact, I consider that the more plausible failure scenario (see next point) even though the text strongly implies otherwise.

Still, it's marginally plausible that nobody was yet willing to commit funds on that scale at the time of the project — and in the world of this story, that's indeed what happened. Relatively few organizations have the means for something like this, so that does make the market less efficient than it would be if it had more viable participants.

12. "Not long after this, the world ends." Aaaaagh tell me more! What exactly went wrong? Why did the safety techniques fail? (To be clear, I totally expect that the techniques you describe would fail. But I'm interested to hear your version of the story.)

Yeah, I left this deliberately ambiguous. The reason is that I'm working from a risk model that I'm a bit reluctant to publicize too widely, since it feels like there is some chance that the publication itself might be slightly risky. (I have shared it privately with a couple of folks though, and would be happy to follow up with you on this by DM — please let me know if you're interested.) As a result, while I didn't want to write a story that was directly inconsistent with my real risk model, I did end up writing a story that strongly implies an endgame scenario which I don't actually believe is very likely (i.e., "OpenAI carefully tries to train an aligned AI but it blows up").

Honestly I wasn't 100% sure how to work around this problem — hence the ambiguity and the frankly kludgy feel of the OpenAI bit at the end. But I figured the story itself was worth posting at least for its early development model (predicated on a radical version of connectionism) and economic deployment scenario (predicated on earliest rollouts in environments with fastest feedback cycles). I'd be especially interested in your thoughts on how to handle this, actually.

13. Who is Jessica? Is she someone important? If she's not important, then it wouldn't be worth a millisecond delay to increase success probability for killing her.

Jessica is an average person. The AI didn't delay anything to kill her; it doesn't care about her. Rather I'm intending to imply that whatever safety precautions were in place to keep the AI from breaking out merely had the effect of causing a very small time delay.

14. It sounds like you are imagining some sort of intelligence explosion happening in between the Codex 4 section and the Jessica section. Is this right or a misinterpretation?

Yes that is basically right.

Thanks again Daniel!

UPDATE: Made several changes to the post based on this feedback.

Thanks, this was a load of helpful clarification and justification!

APS-AI means Advanced, Planning, Strategically-Aware AI. Advanced means superhuman at some set of tasks (such as persuasion, strategy, etc.) that combines to enable the acquisition of power and resources, at least in today's world. The term and concept are due to Joe Carlsmith (see his draft report on power-seeking AI; he blogged about it a while ago).

No problem, glad it was helpful!

And thanks for the APS-AI definition, I wasn't aware of the term.

3. "Not long after, Google rocks the tech industry with a major announcement at I/O. They’ve succeeded in training a deep learning model to completely auto-generate simple SaaS software from a natural-language description." Is this just like Codex but better? Maybe I don't know what SaaS software is.

Yes, pretty much just Codex but better. One quick-and-dirty way to think of SaaS use cases is: "any business workflow that touches a spreadsheet". There are many, many, many such use cases.

Adding to this: as I understand it, Codex can only write a single function at a time, while a SaaS product would be composed of many functions (and a database schema, and an AWS / Azure / GCP cloud services configuration, and a front-end web / phone app...).

It's like the difference between 10 lines of code and the entirety of Gmail.

I interpreted the Medallion stuff as a hint that AGI was already loose and sucking up resources (money) to buy more compute for itself. But I'm not sure that actually makes sense, now that I think about it.

See my response to point 6 of Daniel's comment — it's rather that I'm imagining competing hedge funds (run by humans) beginning to enter the market with this sort of technology.

This is well-written, but I feel like it falls into the same problem a lot of AI-risk stories do.  It follows this pattern:

  1. Plausible (or at least not impossible) near-future developments in AI that could happen if all our current predictions pan out.
  2. ???
  3. Nanotech-enabled fully-general superintelligence converts the universe into paperclips at a significant fraction of lightspeed.

And like, the Step 1 stuff is fascinating and a worthy sci-fi story on its own, but the big question everyone has about AI risk is "How does the AI get from Step 1 to Step 3?"

(This story does vaguely suggest why step 2 is missing - there are so many companies building AIs for the stock market or big tech or cyberwarfare that eventually one of them will stumble into self-improving AGI, which in turn will figure out nanotech - but implying the existence of an answer in-story is different from actually answering the question.)

Thanks! I agree with this critique. Note that Daniel also points out something similar in point 12 of his comment — see my response.

To elaborate a bit more on the "missing step" problem though:

  1. I suspect many of the most plausible risk models have features that make it undesirable for them to be shared too widely. Please feel free to DM me if you'd like to chat more about this.
  2. There will always be some point between Step 1 and Step 3 at which human-legible explanations fail. i.e., it would be extremely surprising if we could tell a coherent story about the whole process — the best we can do is assume the AI gets to the end state because it's highly competent, but we should expect it to do things we can't understand. (To be clear, I don't think this is quite what your comment was about. But it is a fundamental reason why we can't ever expect a complete explanation.)

Is it something like the AI-box argument? "If I share my AI breakout strategy, people will think 'I just won't fall for that strategy' instead of noticing the general problem that there are strategies they didn't think of"?  I'm not a huge fan of that idea, but I won't argue it further.

I'm not expecting a complete explanation, but I'd like to see a story that doesn't skip directly to "AI can reformat reality at will" without at least one intermediate step.  Like, this is the third time I've seen an author pull this trick and I'm starting to wonder if the real AI-safety strategy is "make sure nobody invents grey-goo nanotech."

If you have a ball of nanomachines that can take over the world faster than anyone can react to it, it doesn't really matter if it's an AI or a human at the controls, as soon as it's invented everyone dies.  It's not so much an AI-risk problem as it is a problem with technological progress in general.  (Fortunately, I think it's still up for debate whether it's even possible to create grey-goo-style nanotech.)

I see — perhaps I did misinterpret your earlier comment. It sounds like the transition you are more interested in is closer to (AI has ~free rein over the internet) => (AI invents nanotech). I don't think this is a step we should expect to be able to model especially well, but the best story/analogy I know of for it is probably the end part of That Alien Message. i.e., what sorts of approaches would we come up with, if all of human civilization was bent on solving the equivalent problem from our point of view?

If instead you're thinking more about a transition like (AI is superintelligent but in a box) => (AI has ~free rein over the internet), then I'd say that I'd expect us to skip the "in a box" step entirely.

I really don't get how you can go from being online to having a ball of nanomachines, truly.
Imagine AI goes rogue today. I can't imagine one plausible scenario where it can take out humanity without triggering any bells on the way, even without anyone paying attention to such things.
But we should pay attention to the bells, and for that we need to think of them. What might the signs look like?
I think it's really, really counterproductive to not take that into account at all and to think all is lost if it fooms. It's not lost.
It will need humans, infrastructure, and money (which is very controllable) to accomplish its goals. Governments already pay a lot of attention to their adversaries who are trying to do similar things and counteract them semi-successfully. Any reason why they can't do the same to a very intelligent AI?
Mind you, if your answer is to simulate and just do what it takes, true-to-life simulations will take a lot of compute and time; that won't be available from the start.
We should stop thinking of rogue AI as God; that would only help it accomplish its goals.

I agree, since it's hard for me to imagine what step 2 could look like. Do you or anyone else have any content on that?
See this post -- it didn't seem to get a lot of traction or any meaningful answers, but I still think this question is worth answering.