"Taking AI Risk Seriously" (thoughts by Critch)




Raemon

Content Note: Serious discussion of end-of-the-world and what to do given limited info. Scrupulosity triggers, etc.

Epistemic Status: The first half of this post is summarizing my own views. I think I phrase each sentence about as strongly as I feel it. (When I include a caveat, please take the caveat seriously)

Much of this blogpost is summarizing opinions of Andrew Critch, who said them confidently. Some of them I feel comfortable defending explicitly, others I think are important to think seriously about but don’t necessarily endorse myself.

Critch is pretty busy these days and probably won’t have time to clarify things, so I’m trying to err on the side of presenting his opinions cautiously.

Table of Contents:

  • Core Claims
  • My Rough AGI Timelines
  • Conversations with Critch
    • Hierarchies
    • Deep Thinking
    • Turning Money into Time
    • Things worth learning
    • Planning, Thinking, Feedback, Doing
  • A Final Note on Marathons

Summary of Claims

In the past two years, my AI timelines have gone from "I dunno, anywhere from 25-100 years in the future?" to "ten years seems plausible, and twenty years seems quite possible, and thirty years seems quite likely."

My exact thoughts on this are still a bit imprecise, but I feel confident in a few claims:

Claim 1: Whatever your estimates two years ago for AGI timelines, they should probably be shorter and more explicit this year.

Claim 2: Relatedly, if you’ve been waiting for concrete things to happen for you to get worried enough to take AGI x-risk seriously, that time has come. Whatever your timelines currently are, they should probably be influencing your decisions in ways more specific than periodically saying “Well, this sounds concerning.”

Claim 3: Donating money is helpful (I endorse Zvi’s endorsement of MIRI and I think Lark’s survey of the overall organizational landscape is great), but honestly, we really need people who are invested in becoming useful for making sure the future is okay.

What this might mean depends on where you are. It took me 5 years to transition into being the sort of person able to consider this seriously. It was very important, for the first couple of those years, for me to be able to think about the questions without pressure or a sense of obligation.

I still don’t see any of this as an obligation - just as the obviously-right-thing-to-do if you’re a Raemon with the particular sets of beliefs and life-circumstances that I currently have.

But I do wish I’d been able to make the transition faster. Depending on where you currently are, this might mean:

  1. Get your shit together, in general. Become the sort of person who can do things on purpose, if you aren’t already.
  2. Develop your ability to think – such that if you spent an additional hour thinking about a problem, you tend to become less confused about that problem, rather than overwhelmed or running in circles.
  3. Get your financial shit together. (Get enough stability and runway that you can afford to take time off to think, and/or to spend money on things that improve your ability to think and act)
  4. Arranging your life such that you are able to learn about the state of AI development. There are reasons to do this both so that you find things to do to help, and so that you are just individually prepared for what’s coming, whatever it is.

I think that really getting a handle on what is happening in the world and what to do about it requires more time than “occasionally, in your off hours.”

Serious thinking requires serious effort and deep work.

I have some thoughts about “what to do, depending on your current life circumstances”, but first want to drill down a bit into why I’m concerned.

Rough Timelines

My current AI timelines aren’t very rigorous. A somewhat embarrassing chunk of my “ten to twenty years” numbers comes from “this is what smart people I respect seem to think”, and there’s a decent chance those smart people are undergoing a runaway information cascade.

But I still think there’s enough new information to make some real updates. My views are roughly the aggregate of the following:

My vague impression from reading informal but high-level discussions (among people of different paradigms) is that the conversation has shifted from “is AI even a thing to be concerned about?” to “what are the specific ways to approach safety in light of the progress we’ve seen with AlphaGo Zero?”

The main thrust of my belief is that current neural nets, while probably still a few insights away from true AGI a la Sarah Constantin...

  • Seem to be able to do a wide variety of tasks, using principles similar to how human brains actually do stuff.
  • Seem to be progressing faster at playing games than most people expected
  • We’re continuing to develop hardware (e.g. TPUs) optimized for AI that may make it easier to accomplish things via brute force even if we don’t have the best algorithms yet.
  • Upon hitting milestones, we seem to quickly go from “less good than a human” at a task to “superhuman” in a matter of months, and this is before recursive self-improvement enters the picture. (see Superintelligence FAQ)

Meanwhile, my takeaway from Katja Grace’s survey of actual AI researchers is that industry professionals saying “it’s at least decades away” don’t really have a model at all, since different framings of the question yield very different average responses.

Sarah Constantin offers the strongest argument I’ve seen that AGI isn’t imminent - that all the progress we’ve seen (even the ability to solve arbitrary arcade games) still doesn’t seem to indicate AI that understands concepts and can think deliberately about them. A key piece of general intelligence is missing.

She also argues that progress has been fairly linear, even as we have incorporated deep learning. “Performance trends in AI” was written a year ago and I’m not sure how much AlphaGo Zero would change her opinion.

But this isn’t that reassuring to me. At best, this seems to point towards a moderate takeoff that begins in earnest a couple decades from now. That still suggests radical changes to the world within our lifetimes, moving faster than I expect to be able to recover from mistakes.

Meanwhile, AlphaGo Zero isn’t overwhelming evidence to the contrary, but it seemed at least like a concrete, significant bit of evidence that progress can be faster, simpler, and more surprising.

From Metaphors to Evidence

When I first read the sequences, the arguments for fast takeoff seemed reasonable, but they also sounded like the sort of clever-sounding things a smart person could say to make anything sound plausible.

By now, I think enough concrete evidence has piled up that we’re way beyond “Is an Earth full of Einsteins a reasonable metaphor?” style debates, and squarely into “the actual things happening in the real world seem to firmly point towards sooner and faster rather than later.”

When you factor in that companies don’t share everything they’re working on, and that we should expect DeepMind et al. to have some projects we don’t even know about yet (and that the last couple of their announcements surprised many people), it seems that should further push probability mass towards more-progress-than-we-intuitively-expect.

If you aren’t currently persuaded on AI timelines being short or that you should change your behavior, that’s fine. This isn’t meant to be a comprehensive argument.

But, if you believe that 10-20 year AI timelines are plausible, and you’re psychologically, financially, and operationally ready to take that seriously, I think it’s a good time to kick your “take-seriously-o-meter” up a few notches.

If you’re close but not quite ready to take AI fully seriously, I think this is a useful set of things to start thinking about now, so that in a year or two when you’re more capable or the timing is better, the transition will be easier.

Conversations with Critch

I've had a few recent conversations with Andrew Critch, who's been involved at MIRI and currently helps run the Center for Human Compatible AI (CHAI) and the Berkeley Existential Risk Initiative (BERI).

In past conversations with Critch, I’ve frequently run into the pattern:

  • Critch: says thing that sounds ridiculous
  • Me: “That’s ridiculous”
  • *argues for an hour or two*
  • Me: “Huh, okay, I guess that does make sense.”

This has happened enough that I’ve learned to give him the benefit of the doubt. It’s usually a straightforward case of inferential distance, occasionally due to different goals. Usually there are caveats that make the ridiculous thing make more sense in context.

I mention this because when I asked him what advice he'd give to people looking to donate to x-risk or AI safety, he said something to the effect of:

“If you have three years of runway saved up, quit your job and use the money to fund yourself. Study the AI landscape full-time. Figure out what to do. Do it.”

This felt a little extreme.

Part of this extremity is due to various caveats:

  • “Three years of runway” means comfortable runway, not “you can technically live off of ramen noodles” runway.
  • [Edit for clarity] The goal is not to quit your job for three years – the goal is to have as much time as you need (i.e from 6 months to 2 years or so) to learn what you need before scarcity mindset kicks in. If you're comfortable living with less than a month of runway, you can get away with less.
  • This requires you to already be the sort of person who can do self-directed study with open ended, ambiguous goals.
  • This requires you to, in an important sense, know how to think.
  • This makes most sense if you’re not in the middle of plans that seem comparably important.
  • The core underlying idea is more like “it’s more important to invest in your ability to think, learn and do, than to donate your last spare dollar”, rather than the specific conclusion “quit your job to study full-time.”

But… the other part of it is simply...

If you actually think the world might be ending or forever changing in your lifetime – whether in ten years, or fifty…

...maybe you should be taking actions that feel extreme?

This applies even if you’re not interested in orienting your life around helping with x-risk, and just want to not be blindsided by radical changes that may be coming.

Critch on AI

[This next section is a summary/paraphrase of a few conversations with Critch, written first-person from his perspective.]

We need more people who are able to think full-time about AI safety.

I’ve gotten the sense that you think of me like I'm in some special inner circle of "people working on x-risk". But honestly, I struggle desperately to be useful to people like Stuart Russell who are actually in the inner circle of the world stage, who get to talk to government and industry leaders regularly.

Hierarchies are how you get things done for real.

Humans are limited in their time/attention. We need people focusing on specific problems, reporting up to people who are keeping an eye on the big picture. And we need those people reporting up to people keeping their eye on the bigger picture.

Right now our ability to grow the hierarchy is crippled - it can only be a couple layers deep, because there are few people who a) have their shit together, and b) understand both the technical theories/math/machine-learning and how inter-organizational politics works.

So people like me can't just hand complicated assignments off and trust they get done competently. Someone might understand the theory but not get the political nuances they need to do something useful with the theory. Or they get the political nuances, and maybe get the theory at-the-time, but aren't keeping up with the evolving technical landscape.

There are N things I'm working on right now that need to get done in the next 6 months, and I only really have the time to do M of them because there's no one else with the skills/credentials/network who can do it.

So the most important thing we need is more people putting enough time into making themselves useful.

I think that means focusing full-time.

We don't know exactly what will happen, but I expect serious changes of some sort over the next 10 years. Even if you aren't committing to saving the world, I think it's in your interest just to understand what is happening, so in a decade or two you aren't completely lost.

And even 'understanding the situation' is complicated enough that I think you need to be able to quit your day-job and focus full-time, in order to get oriented.

Deep Thinking

How much time have you spent just thinking about major problems?

There are increasing returns to deep, uninterrupted work. A half hour here and there is qualitatively different from spending a four-hour block of time, which is qualitatively different from focusing on a problem for an entire week.

A technique I use is to spend escalating chunks of time figuring out how to spend escalating chunks of time thinking. Spend half an hour thinking “how useful would it be to spend four hours thinking about X?”

When you have a lot of distractions (including things like a day job, or worrying about running out of money), it can be very difficult to give important problems the attention they deserve.

If you’re currently a student, take advantage of the fact that your life is currently structured to focus on thinking.

Funding Individuals

I think funding individuals who don't have that runway would be a good thing for major donors to do. The problem is that it's moderately expensive - even a major donor can only afford to do it a few times. It's really hard to evaluate which individuals to prioritize (and if people know you’re thinking about it, they’ll show up trying to get your money, whether they’re good or not).

The good/bad news is that, because the whole world may be changing in some fashion soon, it's in an individual's direct interest to have thought about that a lot in advance.

So while a major donor deciding to give someone 2-3 years of runway to think would be risking a lot on a hard-to-evaluate person, an individual person who self-funds is more likely to get a lot of value regardless.

If you do know someone else you highly trust, it may be worth funding them directly.

Small Scale "Money into Time"

A lot of people have internalized a "be thrifty" mindset, which makes it harder to spend money to gain more time. There are a lot of options that might feel a bit extravagant. But right now it looks to me like we may only have 10 years left, and every opportunity to turn money into time is valuable. Examples:

  • Buying a larger monitor, iPad or even large pen-and-paper notebook so you have more "exo-brain" to think on. A human can only really keep seven things in their head at once, but having things written down externally makes it easier to keep track of more.
  • Paying for cabs that give you space to think and write during travel time.
  • Paying for food delivery rather than making it or going out.
  • Paying for personal assistants who can do random odd-jobs for you. (Getting value out of this took a lot of practice – some things turned out to be hard to outsource, and managing people is a nuanced skill. But if you can put in the time experimenting, learning, and finding the right assistant, it’s very worthwhile)
  • Paying for a personal trainer to help you get better at exercise because it turns out exercise is pretty important overall.

What to Actually Read and Think About?

What to actually read is a hard question, since the landscape is changing fairly quickly, and most of the things worth reading aren’t optimized for easy learning, or figuring out if the thing is right-for-you.

But an underlying principle is to think about how minds work, and to study what’s happening in the world of AI development. If you don’t understand what’s going on in the world of AI development, figure out what background you need to learn in order to understand it.

[Edit: the goal here is not to "become an AI researcher." The goal is to understand the landscape well enough that whatever you're doing, you're informed enough on it]

A lot of this is necessarily technical, which can be pretty intimidating if you haven’t been thinking of yourself as a technical person. You can bypass some of this by finding technically oriented people who seem to be able to make good predictions about the future, and relying on them to tell you how they expect the world to change. But that will limit how flexible a plan you’ll be able to create for yourself. (And again, this seems relevant whether your goal is “help with x-risk” or just “not have your life and career upended as things begin changing radically”).

[Ray note: FWIW, I had acquired an image of myself as a “non-technical person”, averse to learning mathy stuff in domains I wasn’t already familiar with. I recently just… got over it, and started learning calculus, and actually enjoyed it and feel kinda lame about spending 10 years self-identifying in a way that prevented me from growing in that direction]

Rough notes on how to go about this:

  • If you can’t viscerally feel the difference between .1% and 1%, or a thousand and a million, you will probably need more of a statistics background to really understand things like “how much money is flowing into AI, and what is being accomplished, and what does it mean?”. A decent resource for this is Freedman’s Statistics, Fourth Edition.
  • Calculus is pretty important background for understanding most technical work.
  • Multivariable Calculus and Linear Algebra are important for understanding machine learning in particular.
  • Read the latest publications by DeepMind and OpenAI to have a sense of what progress is being made.
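One way to start building the gut feel mentioned in the first bullet is to restate the numbers in more familiar units. The sketch below is only an illustration: the helper names (`one_in_n`, `expected_hits`, `describe_seconds`) and the choice of 10,000 trials are made up for this example, not taken from any source.

```python
# Hypothetical helpers for re-expressing small probabilities and large counts.
# All names and example numbers here are illustrative assumptions.

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def one_in_n(probability: float) -> int:
    """Restate a probability as 'roughly 1 in N'."""
    return round(1 / probability)


def expected_hits(probability: float, trials: int) -> float:
    """Expected number of occurrences across independent trials."""
    return probability * trials


def describe_seconds(seconds: int) -> str:
    """Render a count of seconds in minutes or days, whichever reads more naturally."""
    if seconds < SECONDS_PER_DAY:
        return f"{seconds / SECONDS_PER_MINUTE:.1f} minutes"
    return f"{seconds / SECONDS_PER_DAY:.1f} days"


if __name__ == "__main__":
    # 0.1% versus 1%: a tenfold gap that is easy to gloss over.
    for p in (0.001, 0.01):
        print(f"{p:.1%} is about 1 in {one_in_n(p)}; "
              f"expect ~{expected_hits(p, 10_000):.0f} hits in 10,000 trials")

    # A thousand versus a million, expressed as elapsed time.
    for count in (1_000, 1_000_000):
        print(f"{count:,} seconds is about {describe_seconds(count)}")
```

Running it shows that a thousand seconds is a coffee break while a million seconds is more than a week, which is the kind of difference worth being able to feel rather than just compute.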

Remember as you’re learning all this to think about the fact that minds are made of matter, interacting. Statistics is the theory of aggregating information. You are a bunch of neurons aggregating information. Think about what that means, as you learn the technical background on what the latest machine learning is doing.

Planning, Thinking, Feedback, Doing

A different tack is, rather than simply catching up on reading, to practice formulating, getting feedback on, and executing plans.

A general strategy I find useful is to write up plans in Google Docs, making your thought process explicit. Google Docs are easy to share, and well suited to people providing both in-line comments and suggestions for major revisions.

If you can write up a plan and get feedback from 2-4 people who are representative of different thought processes, and they all agree that your plan makes sense, that’s evidence that you’ve got something worth doing.

Whereas if you just keep your plan in your head, you may run into a few issues:

  1. You only have so much working memory. Writing it down lets you make sure you can see all of your assumptions at once. You can catch obvious errors. You can build more complex models.
  2. You may have major blindspots. Getting feedback from multiple people with different outlooks helps ensure that you’re not running off majorly wrong models.
  3. The process of finding people to give feedback is an important skill that will be relevant towards executing plans that matter. Getting the buy-in from people to seriously review an idea can be hard. Buy-in towards actually executing a plan can be harder.

One of our limiting capabilities here is forming plans that people in multiple organizations with different goals are able to collaborate on. An early step for this is being aware of how people from different organizations think.

An important consideration is which people to get feedback from. The people you are most familiar with at each organization are probably the people who are most busy. Depending on your current network, some good practice is to start with people in your social circle who seem generally smart, then reach out to people at different organizations who aren’t the primary spokesperson or research heads.

Final Note on Marathons

(Speaking now as Raemon, again)

I've talked a lot lately about burning out, making sure you have enough slack. In the past, I was the sort of person who said "OMG the world is burning" and then became increasingly miserable for 3 years, and I've seen other people do the same.

Ten to twenty year timelines are quite scary. You should be more concretely worried than you were before. In the terms of a strategy game, we're transitioning from the mid-game to the late game.

But ten or twenty years is still a marathon, not a sprint. We're trying to maximize the distance covered in the next decade or two, not race as fast as we can for the next 6 months and then collapse in a heap. There may come a time when we're racing to the finish and it's worth employing strategies that are not long-term sustainable, but we are not at that point.

You know better than I what your own psychological, physical and financial situation is, and what is appropriate given that situation.

There's room to argue about the exact timelines. Smart people I know seem to agree there's a reasonable chance of AGI in ten years, but disagree on whether that's "likely" or just "possible."

But it is significant that we are definitively in a marathon now, as opposed to some people hanging out in a park arguing over whether a race even exists.

Wherever you are currently at, I recommend:

...if you haven’t acquired the general ability to do things on purpose, or think about things on purpose, figure out how to do that. If you haven’t spent 3 hours trying to understand and solve any complex problem, try that, on whatever problem seems most near/real to you.

...if you haven’t spent 3 hours thinking about AI in particular, and things that need doing, and skills that you have (or could learn), and plans you might enact… consider carving out those 3 hours.

If you haven’t carved out a full weekend to do deep thinking about it, maybe try that.

And if you’ve done all that, figure out how to rearrange your life to regularly give yourself large chunks of time to think and make plans. This may take the form of saving a lot of money and quitting your job for a while to orient. It may take the form of building social capital at your job so that you can periodically take weeks off to think and learn. It may take the form of getting a job where thinking about the future is somehow built naturally into your workflow.

Whatever your situation, take time to process that the future is coming, in one shape or another, and this should probably output some kind of decisions that are not business as usual.


Further Reading:

Deliberate Grad School

Bibliography for the Berkeley Center for Human Compatible AI