Note: Serious discussion of end-of-the-world and what to do given limited info. Scrupulosity triggers, etc.

Epistemic Status: The first half of this post is summarizing my own views. I think I phrase each sentence about as strongly as I feel it. (When I include a caveat, please take the caveat seriously)

Much of this blogpost is summarizing opinions of Andrew Critch, founder of the Berkeley Existential Risk Initiative, who said them confidently. Some of them I feel comfortable defending explicitly, others I think are important to think seriously about but don’t necessarily endorse myself.

Critch is pretty busy these days and probably won’t have time to clarify things, so I’m trying to err on the side of presenting his opinions cautiously.

Table of Contents:

  • Core Claims
  • My Rough AGI Timelines
  • Conversations with Critch
    • Hierarchies
    • Deep Thinking
    • Turning Money into Time
    • Things worth learning
    • Planning, Thinking, Feedback, Doing
  • A Final Note on Marathons

Summary of Claims

Between 2016 and 2018, my AI timelines have gone from "I dunno, anywhere from 25-100 years in the future?" to "ten years seems plausible, and twenty years seems quite possible, and thirty years seems quite likely."

My exact thoughts are still imprecise, but I feel confident that:

Claim 1: Whatever your estimates two years ago for AGI timelines, they should probably be shorter and more explicit this year.

Claim 2: Relatedly, if you’ve been waiting for concrete things to happen for you to get worried enough to take AGI x-risk seriously, that time has come. Whatever your timelines currently are, they should probably be influencing your decisions in ways more specific than periodically saying “Well, this sounds concerning.”

Claim 3: Donating money is helpful (I endorse Zvi’s endorsement of MIRI and I think Lark’s survey of the overall organizational landscape is great), but honestly, we really need people who are invested in becoming useful for making sure the future is okay.

What this might mean depends on where you are. It took me 5 years to transition into being the sort of person able to consider this seriously. It was very important, for the first couple of those years, for me to be able to think about the questions without pressure or a sense of obligation.

I still don’t see any of this as an obligation - just as the obviously-right-thing-to-do if you’re someone like me. But I wish I’d been able to make the transition faster. Depending on where you currently are, this might mean:

  1. Get your shit together, in general. Become the sort of person who can do things on purpose, if you aren’t already.
  2. Develop your ability to think – such that if you spent an additional hour thinking about a problem, you tend to become less confused about that problem, rather than overwhelmed or running in circles.
  3. Get your financial shit together. (Get enough stability and runway that you can afford to take time off to think, and/or to spend money on things that improve your ability to think and act)
  4. Arrange your life such that you are able to learn about the state of AI development. There are reasons to do this both so that you find things to do to help, and so that you are just individually prepared for what’s coming, whatever it is.

I think getting a handle on what is happening in the world, and what to do about it, requires more time than “occasionally, in your off hours.”

Serious thinking requires serious effort and deep work.

I have some thoughts about “what to do, depending on your current life circumstances”, but first want to drill down into why I’m concerned.

Rough Timelines

My current AI timelines aren’t very rigorous. A somewhat embarrassing chunk of my “ten to twenty years” numbers comes from “this is what smart people I respect seem to think”, and there’s a decent chance those smart people are undergoing a runaway information cascade.

But I still think there’s enough new information to make some real updates. My views are roughly the aggregate of the following:

My vague impression from reading informal but high-level discussions (among people of different paradigms) is that the conversation has shifted from “is AI even a thing to be concerned about?” to “what are the specific ways to approach safety in light of the progress we’ve seen with AlphaGo Zero?”

The main thrust of my belief is that current neural nets, while probably still a few insights away from true AGI...

  • Seem to be able to do a wide variety of tasks, using principles similar to how human brains actually do stuff.
  • Seem to be progressing faster at playing games than most people expected
  • We’re continuing to develop hardware (e.g. TPUs) optimized for AI that may make it easier to accomplish things via brute force even if we don’t have the best algorithms yet.
  • Upon hitting milestones, we seem to quickly go from “less good than a human” at a task to “superhuman” in a matter of months, and this is before recursive self improvement enters the picture. (see Superintelligence FAQ)

Meanwhile, my takeaway from Katja Grace’s survey of actual AI researchers is that industry professionals saying “it’s at least decades away” don’t really have a model at all, since different framings of the question yield very different average responses.

Sarah Constantin offers the strongest argument I’ve seen that AGI isn’t imminent - that all the progress we’ve seen (even the ability to solve arbitrary arcade games) still doesn’t seem to indicate AI that understands concepts and can think deliberately about them. A key piece of general intelligence is missing.

She also argues that progress has been fairly linear, even as we have incorporated deep learning. “Performance trends in AI” was written a year ago and I’m not sure how much AlphaGo Zero would change her opinion.

But this isn’t that reassuring to me. At best, this seems to point towards a moderate takeoff that begins in earnest a couple decades from now. That still suggests radical changes to the world within our lifetimes, moving faster than I expect to be able to recover from mistakes.

Meanwhile, AlphaGo Zero isn’t overwhelming evidence to the contrary, but it seemed at least like a concrete, significant bit of evidence that progress can be faster, simpler, and more surprising.

From Metaphors to Evidence

When I first read the sequences, the arguments for fast takeoff seemed reasonable, but they also sounded like the sort of clever-sounding things a smart person could say to make anything sound plausible.

By now, I think enough concrete evidence has piled up that we’re way beyond “Is an Earth full of Einsteins a reasonable metaphor?” style debates, and squarely into “the actual things happening in the real world seem to firmly point towards sooner and faster rather than later.”

When you factor in that companies don’t share everything they’re working on, and that we should expect DeepMind et al to have some projects we don’t even know about yet (and that the last couple of their announcements surprised many people), it seems that should further push probability mass towards more-progress-than-we-intuitively-expect.

If you aren’t currently persuaded on AI timelines being short or that you should change your behavior, that’s fine. This isn’t meant to be a comprehensive argument.

But, if you believe that 10-20 year AI timelines are plausible, and you’re psychologically, financially, and operationally ready to take that seriously, I think it’s a good time to kick your “take-seriously-o-meter” up a few notches.

If you’re close but not quite ready to take AI fully seriously, I think this is a useful set of things to start thinking about now, so that in a year or two when you’re more capable or the timing is better, the transition will be easier.

Conversations with Critch

I've had a few recent conversations with Andrew Critch, who's been involved at MIRI and currently helps run the Center for Human Compatible AI (CHAI) and the Berkeley Existential Risk Initiative (BERI).

In past conversations with Critch, I’ve frequently run into the pattern:

  • Critch: says thing that sounds ridiculous
  • Me: “That’s ridiculous”
  • *argues for an hour or two*
  • Me: “Huh, okay, I guess that does make sense.”

This has happened enough that I’ve learned to give him the benefit of the doubt. It’s usually a straightforward case of inferential distance, occasionally due to different goals. Usually there are caveats that make the ridiculous thing make more sense in context.

I mention this because when I asked him what advice he'd give to people looking to donate to x-risk or AI safety, he said something to the effect of:

“If you have three years of runway saved up, quit your job and use the money to fund yourself. Study the AI landscape full-time. Figure out what to do. Do it.”

This felt a little extreme.

Part of this extremity is due to various caveats:

  • “Three years of runway” means comfortable runway, not “you can technically live off of ramen noodles” runway.
  • [Edit for clarity] The goal is not to quit your job for three years – the goal is to have as much time as you need (i.e. from 6 months to 2 years or so) to learn what you need before scarcity mindset kicks in. If you're comfortable living with less than a month of runway, you can get away with less.
  • This requires you to already be the sort of person who can do self-directed study with open ended, ambiguous goals.
  • This requires you to, in an important sense, know how to think.
  • This makes most sense if you’re not in the middle of plans that seem comparably important.
  • The core underlying idea is more like “it’s more important to invest in your ability to think, learn and do, than to donate your last spare dollar”, rather than the specific conclusion “quit your job to study full-time.”
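To make the "comfortable runway" caveat concrete, here is a minimal sketch of the arithmetic. The savings and spending figures are hypothetical, chosen only for illustration, and the calculation assumes no income, interest, or inflation.

```python
def runway_months(savings: float, monthly_expenses: float) -> float:
    """Months that savings alone would cover, assuming no income or interest."""
    if monthly_expenses <= 0:
        raise ValueError("monthly_expenses must be positive")
    return savings / monthly_expenses


# Hypothetical figures, not taken from the conversation.
savings = 90_000
comfortable_spend = 2_500  # a budget you could live on without stress
bare_bones_spend = 1_500   # the "ramen noodles" budget

comfortable_months = runway_months(savings, comfortable_spend)
bare_bones_months = runway_months(savings, bare_bones_spend)

print(f"Comfortable runway: {comfortable_months:.0f} months")
print(f"Bare-bones runway:  {bare_bones_months:.0f} months")
```

With these made-up numbers, the same savings count as three years only at the comfortable budget. The bare-bones figure looks longer on paper, but it is the one the first caveat says not to rely on.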

But… the other part of it is simply...

If you actually think the world might be ending or forever changing in your lifetime – whether in ten years, or fifty…

...maybe you should be taking actions that feel extreme?

Even if you’re not interested in orienting your life around helping with x-risk – if you just want to not be blindsided by radical changes that may be coming,

Critch on AI

[This next section is a summary/paraphrase of a few conversations with Critch, written first-person from his perspective.]

We need more people who are able to think full-time about AI safety.

I’ve gotten the sense that you think of me like I'm in some special inner circle of "people working on x-risk". But honestly, I struggle desperately to be useful to people like Stuart Russell who are actually in the inner circle of the world stage, who get to talk to government and industry leaders regularly.

Hierarchies are how you get things done for real.

Humans are limited in their time/attention. We need people focusing on specific problems, reporting up to people who are keeping an eye on the big picture. And we need those people reporting up to people keeping their eye on the bigger picture.

Right now our ability to grow the hierarchy is crippled - it can only be a couple layers deep, because there are few people who a) have their shit together, and b) understand both the technical theories/math/machine-learning and how inter-organizational politics works.

So people like me can't just hand complicated assignments off and trust they get done competently. Someone might understand the theory but not get the political nuances they need to do something useful with the theory. Or they get the political nuances, and maybe get the theory at-the-time, but aren't keeping up with the evolving technical landscape.

There are N things I'm working on right now that need to get done in the next 6 months, and I only really have the time to do M of them because there's no one else with the skills/credentials/network who can do it.

So the most important thing we need is more people putting enough time into making themselves useful.

I think that means focusing full-time.

We don't know exactly what will happen, but I expect serious changes of some sort over the next 10 years. Even if you aren't committing to saving the world, I think it's in your interest just to understand what is happening, so in a decade or two you aren't completely lost.

And even 'understanding the situation' is complicated enough that I think you need to be able to quit your day-job and focus full-time, in order to get oriented.

Deep Thinking

How much time have you spent just thinking about major problems?

There are increasing returns to deep, uninterrupted work. A half hour here and there is qualitatively different from spending a four-hour block of time, which is qualitatively different from focusing on a problem for an entire week.

A technique I use is to spend escalating chunks of time figuring out how to spend escalating chunks of time thinking. Spend half an hour thinking “how useful would it be to spend four hours thinking about X?”

When you have a lot of distractions, including things like a day job or worrying about running out of money, it can be very difficult to give important problems the attention they deserve.

If you’re currently a student, take advantage of the fact that your life is currently structured to focus on thinking.

Funding Individuals

I think funding individuals who don't have that runway would be a good thing for major donors to do. The problem is that it's moderately expensive - even a major donor can only afford to do it a few times. It's really hard to evaluate which individuals to prioritize (and if people know you’re thinking about it, they’ll show up trying to get your money, whether they’re good or not).

The good/bad news is that, because the whole world may be changing in some fashion soon, it's in an individual's direct interest to have thought about that a lot in advance.

So while a major donor deciding to give someone 2-3 years of runway to think would be risking a lot on a hard-to-evaluate person, an individual person who self-funds is more likely to get a lot of value regardless.

If you do have the money, and know someone else you highly trust, it may be worth funding them directly.

Small Scale "Money into Time"

A lot of people have internalized a "be thrifty" mindset, which makes it harder to spend money to gain more time. There are a lot of options that might feel a bit extravagant. But right now it looks to me like we may only have 10 years left, and every opportunity to turn money into time is valuable. Examples:

  • Buying a larger monitor, iPad or even large pen-and-paper notebook so you have more "exo-brain" to think on. A human can only really keep seven things in their head at once, but having things written down externally makes it easier to keep track of more.
  • Paying for cabs, which gives you space to think and write during travel time.
  • Paying for food delivery rather than making it or going out.
  • Paying for personal assistants who can do random odd-jobs for you. (Getting value out of this took a lot of practice – some things turned out to be hard to outsource, and managing people is a nuanced skill. But if you can put in the time experimenting, learning, and finding the right assistant, it’s very worthwhile)
  • Paying for a personal trainer to help you get better at exercise because it turns out exercise is pretty important overall.

What to Actually Read and Think About?

What to actually read is a hard question, since the landscape is changing fairly quickly, and most of the things worth reading aren’t optimized for easy learning, or figuring out if the thing is right-for-you.

But an underlying principle is to think about how minds work, and to study what’s happening in the world of AI development. If you don’t understand what’s going on in the world of AI development, figure out what background you need to learn in order to understand it.

[Edit: the goal here is not to be "become an AI researcher." The goal is to understand the landscape well enough that whatever you're doing, you're informed enough on it]

A lot of this is necessarily technical, which can be pretty intimidating if you haven’t been thinking of yourself as a technical person. You can bypass some of this by finding technically oriented people who seem to be able to make good predictions about the future, and relying on them to tell you how they expect the world to change. But that will limit how flexible a plan you’ll be able to create for yourself. (And again, this seems relevant whether your goal is “help with x-risk” or just “not have your life and career upended as things begin changing radically”.)

[Ray note: FWIW, I had acquired an image of myself as a “non-technical person”, averse to learning mathy stuff in domains I wasn’t already familiar with. I recently just… got over it, and started learning calculus, and actually enjoyed it and feel kinda lame about spending 10 years self-identifying in a way that prevented me from growing in that direction]

Rough notes on how to go about this:

  • If you can’t viscerally feel the difference between .1% and 1%, or a thousand and a million, you will probably need more of a statistics background to really understand things like “how much money is flowing into AI, and what is being accomplished, and what does it mean?”. A decent resource for this is Friedman Statistics Fourth Edition.
  • Calculus is pretty important background for understanding most technical work.
  • Multivariable Calculus and Linear Algebra are important for understanding machine learning in particular.
  • Read the latest publications by DeepMind and OpenAI to have a sense of what progress is being made.
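One way to start building the intuition from the first bullet is to convert the numbers into units you can picture. The sketch below is only an illustration: the choice of seconds as a unit and the group of 10,000 people are assumptions, not figures from the conversation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def expected_hits(population: int, one_in: int) -> int:
    """How many members of a group are affected at a rate of '1 in one_in'."""
    return population // one_in


# A thousand versus a million, expressed as elapsed time.
thousand_seconds_in_minutes = 1_000 / 60
million_seconds_in_days = seconds_to_days(1_000_000)
print(f"A thousand seconds is about {thousand_seconds_in_minutes:.1f} minutes")
print(f"A million seconds is about {million_seconds_in_days:.1f} days")

# 0.1% (1 in 1,000) versus 1% (1 in 100), in a hypothetical group of 10,000.
group = 10_000
print(f"0.1% of {group:,} people: {expected_hits(group, 1_000)}")
print(f"1%   of {group:,} people: {expected_hits(group, 100)}")
```

A thousand seconds is a coffee break and a million seconds is over a week and a half. Likewise, 0.1% and 1% look similar on the page but differ by a factor of ten in how many people they describe.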

Remember as you’re learning all this to think about the fact that minds are made of matter, interacting. Statistics is the theory of aggregating information. You are a bunch of neurons aggregating information. Think about what that means, as you learn the technical background on what the latest machine learning is doing.

Planning, Thinking, Feedback, Doing

A different tack is, rather than simply catching up on reading, to practice formulating, getting feedback on, and executing plans.

A general strategy I find useful is to write up plans in Google Docs, making your thought process explicit. Google Docs are easy to share, and well suited to letting people provide both in-line comments and suggestions for major revisions.

If you can write up a plan and get feedback from 2-4 people who are representative of different thought processes, and they all agree that your plan makes sense, that’s evidence that you’ve got something worth doing.

Whereas if you just keep your plan in your head, you may run into a few issues:

  1. You only have so much working memory. Writing it down lets you make sure you can see all of your assumptions at once. You can catch obvious errors. You can build more complex models.
  2. You may have major blindspots. Getting feedback from multiple people with different outlooks helps ensure that you’re not running off majorly wrong models.
  3. The process of finding people to give feedback is an important skill that will be relevant towards executing plans that matter. Getting the buy-in from people to seriously review an idea can be hard. Buy-in towards actually executing a plan can be harder.

One of our limiting capabilities here is forming plans that people in multiple organizations with different goals are able to collaborate on. An early step for this is being aware of how people from different organizations think.

An important consideration is which people to get feedback from. The people you are most familiar with at each organization are probably the people who are most busy. Depending on your current network, some good practice is to start with people in your social circle who seem generally smart, then reach out to people at different organizations who aren’t the primary spokesperson or research heads.

Final Note on Marathons

(Speaking now as Raemon, again)

I've talked a lot lately about burning out, making sure you have enough slack. In the past, I was the sort of person who said "OMG the world is burning" and then became increasingly miserable for 3 years, and I've seen other people do the same.

Ten to twenty year timelines are quite scary. You should be more concretely worried than you were before. In the terms of a strategy game, we're transitioning from the mid-game to the late game.

But ten or twenty years is still a marathon, not a sprint. We're trying to maximize the distance covered in the next decade or two, not race as fast as we can for the next 6 months and then collapse in a heap. There may come a time when we're racing to the finish and it's worth employing strategies that are not long-term sustainable, but we are not at that point.

You know better than I what your own psychological, physical and financial situation is, and what is appropriate given that situation.

There's room to argue about the exact timelines. Smart people I know seem to agree there's a reasonable chance of AGI in ten years, but disagree on whether that's "likely" or just "possible."

But it is significant that we are definitively in a marathon now, as opposed to some people hanging out in a park arguing over whether a race even exists.

Wherever you are currently at, I recommend:

...if you haven’t acquired the general ability to do things on purpose, or think about things on purpose, figure out how to do that. If you haven’t spent 3 hours trying to understand and solve any complex problem, try that, on whatever problem seems most near/real to you.

...if you haven’t spent 3 hours thinking about AI in particular, and things that need doing, and skills that you have (or could learn), and plans you might enact… consider carving out those 3 hours.

If you haven’t carved out a full weekend to do deep thinking about it, maybe try that.

And if you’ve done all that, figure out how to rearrange your life to regularly give yourself large chunks of time to think and make plans. This may take the form of saving a lot of money and quitting your job for a while to orient. It may take the form of building social capital at your job so that you can periodically take weeks off to think and learn. It may take the form of getting a job where thinking about the future is somehow built naturally into your workflow.

Whatever your situation, take time to process that the future is coming, in one shape or another, and this should probably output some kind of decisions that are not business as usual.

Further Reading:

Deliberate Grad School

Bibliography for the Berkeley Center for Human Compatible AI



This post was an experiment in "making it easier to write down conversations."

One of the primary problems LessWrong is trying to solve (but can't just be solved via technical solutions) is getting ideas out from inside the heads of people who think seriously about stuff (but tend to communicate only in high-context, high-trust environments), into the public sphere where people can benefit from it.

Obstacles to this include:

  • Being too busy
  • Feeling like it'd take too long to polish an idea into something worth posting
  • Ideas being too sensitive (i.e. based on secrets or pseudo secrets, or wanting to avoid attracting journalists, etc)
  • Not wanting to deal with commenters who don't have the context.

"Ideas being too sensitive" may not have a good solution for public, permanent discourse, but "being too busy", and the related "no time to polish it" both seem like things that could be solved if the less busy participant in a conversation takes the time to write it up. (This only works when the other things the less-busy person could do are less valuable than writing up the post)

A dynamic that Ben, Oli and I have been discussing and hoping to bring about more is separating out the "idea generating" and "writing" roles, since idea generation is a more scarce resource.

A side effect of this is that, since Critch is busy off founding new x-risk mitigation organizations, he probably won't be able to clarify points that come up in the comments here, but I think it's still more valuable to have the ideas out there. (Both for the object level value of those ideas, and to help shift norms such that this sort of thing happens more often)

I'd love to see more conversations being written up, especially as someone who is outside of the Bay Area and so who is probably missing out on many ideas that are in wide circulation there. I wonder if there are any Less Wrongers with journalism training who would take up this role.

Future Perfect is sorta this, but probably too mainstream / high-powered for the use-case you have in mind.

... and it looks like they're hiring a writer!

“If you have three years of runway saved up, quit your job and use the money to fund yourself. Study the AI landscape full-time. Figure out what to do. Do it.”

In an important way, saying this is more honest than asking for funding--like, it's harder for the incentives of someone saying this to line up perversely. I'd basically say the same thing, but add in "noticing all the incentives you have to believe certain things" and "engineered pandemics" along with "the AI landscape", because that's just my take.

The one thing I have to wonder about, is if doing this on your own helps you get it right. Like, there's a cadre of depressed rationalists in Berkeley who are trying, in theory, to do this for themselves. It can't be helping that there's social approval to be had for doing this, because that's just a recipe (because of incentives) for people incorporating "I care about AI risk and am doing relevant things" into their narrative and self-image. If your self-esteem is tied into doing things that help with AI risk, then I empathize with you pretty hard, because everything that feels like a failure is going to hurt you, both emotionally and productivity-wise.

Grad students have a similar thing, where narrative investment into being good at research burns people out. Even productive grad students who have been having a few bad months. But if your social group is mostly other grad students who also think that being good at research is what makes one good and praiseworthy, then of course you'd have part of your self image invested into being good at research.

It'd be hard for a group of grad students to all simultaneously switch to not being emotionally invested in how good everyone else was at research. I'd say the same is true for AI-risk-oriented groups of rationalists who live near each other.

That's why I say it's best to study the landscape on your own. With geographic distance from others, even. Keep track of the work everyone else is doing, but keep your social group separate--if you choose your friends correctly, your self-esteem can be grounded in something more durable than your performance.

Whatever your estimates two years ago for AGI timelines, they should probably be shorter and more explicit this year.

Should they? (the rest of this comment is me thinking out loud)

AlphaGo and breakthroughs in the Atari domain were reasonable things to update on, but those were 2015 and early 2016 so around two years ago. Thinking about progress since then, GANs have done interesting stuff and there's been progress especially in image recognition and generation; but many of the results seem more like incremental than qualitative progress (the DeepDream stuff in 2015 already caused me to guess that image generation stuff would be on the horizon).

Intuitively, it does feel like AI results have been coming out faster recently, so in that sense there might be reason to update somewhat in the direction of shorter timelines - it shows that the qualitative breakthroughs of the earlier years could be successfully built on. But off the top of my head, it's not clear to me that anything would obviously contradict a model of "we're seeing another temporary AI boom enabled by new discoveries, which will run out of steam once the newest low-hanging fruit get picked" - a model which one might also have predicted two years ago.

While I'm seeing the incremental progress proceeding faster now than I probably would have predicted earlier, it mostly seems within the boundaries of the deep learning paradigm as implied by the 2015-early 2016 state of the art. So it feels like we may end up running against the limitations of the current paradigm (which people are already writing papers about) faster than expected, but there isn't any indication that we would get past those limitations any faster. In 2015/early 2016 there seemed to be rough agreement among experts that deep learning was enabling us to implement decades-old ideas because we finally had the hardware for it, but that there hadn't been any new ideas nor real progress in understanding intelligence; I'm under the impression that this still mostly reflects the current consensus.

One exception to the "looks like mostly incremental progress" is Google Neural Machine Translation (November 2016, which we should probably count as less than two years) which I wouldn't have predicted based on the earlier stuff.

On the other hand, one could make the argument that this wave of AI is going to boost economic growth and science; if e.g. various fields of science end up incorporating more AI techniques and accelerate as a result, then that could end up feeding back into AI and accelerating it further. In particular, advances in something like neuroscience could accelerate timelines, and deep learning is indeed being applied to stuff like neuroimaging.

Overall, it does feel to me like a reasonable claim that we should expect somewhat shorter AGI timelines now than two years ago, with most of the update coming from AI boosting science and the economy; but I worry that this feels more like an intuition driven by the ease of having found a plausible-sounding story ("AI will boost science in general and some of that progress will come back to boost AGI development") rather than any particularly rigorous evidence.

My short answer is that I think you're right enough here that I should probably walk back my claim somewhat, or at least justify it better than it currently is. (I.e. I notice that I have a hard time answering this in a way I feel confident and good about)

The mechanism by which I updated wasn't about AI boosting science and economy. It's more like:

Prior to the past few years, my understanding of how AIs might behave was almost entirely theoretical. In the absence of being able to do empiricism, working with theory is important. But by now we've seen things that before we could only theorize about.

I think my "2 years" remark is maybe framed slightly better as 3 years. In the past 3 years, we've seen 3 milestones that struck me as significant:

  • DeepMind creating a thing that can solve arbitrary Atari games given pixel input. (I think this was 3 or 4 years ago, and was a significant update for me about agent-like-things being able to interact with the world)
  • AlphaGo (which came sooner than people were expecting, even after the Atari stuff)
  • AlphaGo Zero (which, in my impression, also came sooner than people expected, even after AlphaGo, and where the improvements came from simplifying the architecture)

I do think I was overfixated on AlphaGo in particular. While writing the OP, rereading Sarah's posts that emphasize the other domains where progress isn't so incredible did slightly reverse my "oh god timelines are short" belief.

But Sarah's posts still note that we've been seeing improvements in gameplay in particular, which seems like the domain most relevant to AGI, even if the mechanism is "deep learning allows us to better leverage hardware improvements."

"On the other hand, one could make the argument that this wave of AI is going to boost economic growth and science" - One can make a much more direct argument than this. The rate of incremental progress is important because that determines the amount of money flowing into the field and the amount of programmers studying AI. Now that the scope of tasks solvable by AI has increased vastly, the size of the field has been permanently raised and this increases the chance that innovations in general will occur. Further, there has been an increase in optimism about the power of AI which encourages people to be more ambitious.

"AI" may be too broad of a category, though. As an analogy, consider that there is currently a huge demand for programmers who do all kinds of website development, but as far as I know, this hasn't translated into an increased number of academics studying - say - models of computation, even though both arguably fall under "computer science".

Similarly, the current wave of AI may get us a lot of people into doing deep learning and building machine learning models for specific customer applications, without increasing the number of people working on AGI much.

It's true that there is now more excitement for AI, including more excitement for AGI. On the other hand, more excitement followed by disillusionment has previously led to AI winters.

I'll just take this opportunity to say again that if you have ideas about AI alignment, our contest accepts entries of any size and skill level, and we discuss and send feedback to everyone. We don't expect you to solve the whole problem, every bit of incremental progress is welcome.

Thank you for doing this, and for giving feedback to all submissions!

As 2018 began, I started thinking about what I should do if I personally take AI seriously. So your post is timely for me. I've spent the last couple weeks figuring out how to catch up on the current state of AI development.

What I should do next is still pretty muddy. Or scary.

I have a computer engineering degree and have been a working software developer for several years. I do consider myself a "technical person," but I haven't focused on AI before now. I think I could potentially contribute to AI safety research if I spend some time studying first. I'm not caught up on the technical skills these research guides point to:

But I'm also not intimidated by the topics or the prospect of a ton of self-directed study. Self-directed study is my fun. I've already started on some of the materials.

The scary stuff is:

I could lose myself for years studying everything in those guides.

I have no network of people to bounce any ideas or plans off of.

  • I live in the bible belt, and my day-to-day interactions are completely devoid of anyone who would take any of this seriously.
  • People in the online community (rationality or AI Safety) don't know I exist, and I'm concerned that spending a lot of time getting noticed is a status game and time sink that doesn't help me learn about AI as fast as possible.

There's also a big step of actually reaching out to people in the field. I don't know how to know when I'm ready or qualified. Or if it's remotely worth contacting people sooner than later because I'm prone to anxious underconfidence, and I could at least let people know I exist, even if I doubt I'm impressive.

I do feel like one of these specialty CFAR workshops would be a wonderful kick-start, but none are yet listed for 2018.

[Context: I'm Matthew Graves, and currently handle a lot of MIRI's recruiting, but this is not MIRI's official view.]

We're hiring engineers to help with our research program, which doesn't require extensive familiarity with AI alignment research.

When reading through research guides, it's better to take a breadth-first approach where you only go deep on the things that are interesting to you, and not worry too much about consuming all of it before you start talking with people about it. Like with software projects, it's often better to find some open problem that's interesting and then learn the tools you need to tackle that problem, rather than trying to build a general-purpose toolkit and then find a problem to tackle.

There are some online forums where you can bounce ideas and plans off of; LW is historically a decent place for this, as are Facebook groups like AI Safety Open Discussion. I expect there to be more specialty CFAR workshops this year, but encourage you to get started on stuff now rather than waiting for one. There's also people like me at MIRI and other orgs who field these sorts of questions. I encourage you to contact us too early instead of too late; the worst-case scenario is that we give you a stock email with links we think will be helpful rather than you eat a huge amount of our time. (For getting detailed reactions to plans, I suspect posting to a group where several different people might have the time to respond to it soon would work better than emailing only one person about it, and it's considerably easier to answer specific questions than it is to give a general reaction to a plan.)


I think you're right in that getting additional feedback (bouncing stuff off) is good.

Unfortunately, my rough sense right now is that things are geographically constrained (e.g. there's stuff happening at CHAI in Berkeley, FHI in Oxford, and DeepMind in London, but not a lot of concentrated work elsewhere.) If you're in the bible belt, my guess is Roman Yampolskiy is probably the closest (maybe?) person who's doing lots of stuff in the field.

Speaking from my experience with CFAR (and not in any official capacity whatsoever), I think the AI Fellows tends to be held once a year in the summer/fall (although this might change w/ add'l funding), so that's maybe also a ways off.

I'd encourage you, though, to reach out to people sooner than later, as you mention. It's been my experience that people are helpful when you reach out, if you're genuine about this stuff.

I haven’t reached out to anyone yet, primarily because I imagined that they (Luke, Eliezer, etc) receive many of these kinds of "I’m super excited to help, what can I do?" emails and pattern-matched that onto "annoying person who didn’t read the syllabus". What has your experience been?

(I live in Oregon)

(This is part of what I was going for with the "find a person at a company who's NOT the highest profile person to get feedback from.")

Also, one person said to me "I'm generally quite happy to answer succinct, clear questions like 'I'm considering whether to do X. I've thought of considerations N, M, and O. I'm wondering if consideration W is relevant?'"

As opposed to "Hey, what do you think of OpenAI/DeepMind/MIRI?" or other vague (and 'can't quite tell if this is an undercover journalist') style questions.

I'm very glad that this was written up. My timelines are also about this short and I've been working on taking them increasingly seriously.

If you're committed to studying AI safety but have little money, here are two projects you can join (do feel free to add other suggestions):

1) If you want to join a beginners or advanced study group on reinforcement learning, post here in the RAISE group.

2) If you want to write research in a group, apply for the AI Safety Camp in Gran Canaria on 12-22 April.

Curious to know more about these (from the outside it's hard to tell if this is a good place to endorse people checking out. At risk of being Pat Modesto-like, who's running them and what's their background? And hopefully being not-Pat-Modesto-like, what material do these groups cover, and what is the thought process behind their approach?)

Great, let me throw together a reply to your questions in reverse order. I've had a long day and lack the energy to do the rigorous, concise write-up that I'd want to do. But please comment with specific questions/criticisms that I can look into later.

What is the thought process behind their approach?

RAISE (copy-paste from slightly-promotional-looking wiki):

AI safety is a small field. It has only about 50 researchers. The field is mostly talent-constrained. Given the dangers of an uncontrolled intelligence explosion, increasing the amount of AIS researchers is crucial for the long-term survival of humanity.

Within the LW community there are plenty of talented people that bear a sense of urgency about AI. They are willing to switch careers to doing research, but they are unable to get there. This is understandable: the path up to research-level understanding is lonely, arduous, long, and uncertain. It is like a pilgrimage. One has to study concepts from the papers in which they first appeared. This is not easy. Such papers are undistilled. Unless one is lucky, there is no one to provide guidance and answer questions. Then should one come out on top, there is no guarantee that the quality of their work will be sufficient for a paycheck or a useful contribution.

The field of AI safety is in an innovator phase. Innovators are highly risk-tolerant and have a large amount of agency, which allows them to survive an environment with little guidance or supporting infrastructure. Let community organisers not fall for the typical mind fallacy, expecting risk-averse people to move into AI safety all by themselves. Unless one is particularly risk-tolerant or has a perfect safety net, they will not be able to fully take the plunge. Plenty of measures can be made to make getting into AI safety more like an "It's a small world"-ride:

  • Let there be a tested path with signposts along the way to make progress clear and measurable.
  • Let there be social reinforcement so that we are not hindered but helped by our instinct for conformity.
  • Let there be high-quality explanations of the material to speed up and ease the learning process, so that it is cheap.

AI Safety Camp (copy-paste from our proposal, which will be posted on LW soon):

Aim: Efficiently launch aspiring AI safety and strategy researchers into concrete productivity by creating an ‘on-ramp’ for future researchers.


  1. Get people started on and immersed into concrete research work intended to lead to papers for publication.
  2. Address the bottleneck in AI safety/strategy of few experts being available to train or organize aspiring researchers by efficiently using expert time.
  3. Create a clear path from ‘interested/concerned’ to ‘active researcher’.
  4. Test a new method for bootstrapping talent-constrained research fields.

Method: Run an online research group culminating in a two week intensive in-person research camp.

(our plan is to test our approach in Gran Canaria on 12 April, for which we're taking in applications right now, and based on our refinements, organise a July camp at the planned EA Hotel in the UK)

What material do these groups cover?

RAISE (off the top of my head)

The study group has finished writing video scripts on the first corrigibility unit for the online course. It has now split into two to work on the second unit:

  1. group A is learning about reinforcement learning using this book
  2. group B is writing video scripts on inverse reinforcement learning

Robert Miles is also starting to make the first video of the first corrigibility unit (we've allowed ourselves to get delayed too much in actually publishing and testing material IMO). Past videos we've experimented with include a lecture by Johannes Treutin from FRI and Rupert McCallum giving lectures on corrigibility.

AI Safety Camp (copy-paste from proposal)

Participants will work in groups on tightly-defined research projects on the following topics:

  • Agent foundations
  • Machine learning safety
  • Policy & strategy
  • Human values

Projects will be proposed by participants prior to the start of the program. Expert advisors from AI Safety/Strategy organisations will help refine them into proposals that are tractable, suitable for this research environment, and answer currently unsolved research questions. This allows for time-efficient use of advisors’ domain knowledge and research experience, and ensures that research is well-aligned with current priorities.

Participants will then split into groups to work on these research questions in online collaborative groups over a period of several months. This period will culminate in a two week in-person research camp aimed at turning this exploratory research into first drafts of publishable research papers. This will also allow for cross-disciplinary conversations and community building. Following the two week camp, advisors will give feedback on manuscripts, guiding first drafts towards completion and advising on next steps for researchers.

Who's running them and what's their background?

Our two core teams mostly consist of young European researchers/autodidacts who haven't published much on AI safety yet (which does risk us not knowing enough about the outcomes we're trying to design for others).

RAISE (off the top of my head):

Toon Alfrink (founder, coordinator): AI bachelor student, also organises LessWrong meetups in Amsterdam.

Robert Miles (video maker): Runs a relatively well-known YouTube channel advocating carefully for AI safety.

Veerle de Goederen (oversees prereqs study group): Finished a Biology bachelor (and has been our most reliable team member)

Johannes Heidecke (oversees the advanced study group): Master student, researching inverse reinforcement learning in Spain.

Remmelt Ellen (planning coordinator): see below.

AI Safety Camp (copy-paste from proposal)

Remmelt Ellen: Remmelt is the Operations Manager of Effective Altruism Netherlands, where he coordinates national events, supports organisers of new meetups and takes care of mundane admin work. He also oversees planning for the team at RAISE, an online AI Safety course. He is a Bachelor intern at the Intelligent & Autonomous Systems research group. In his spare time, he’s exploring how to improve the interactions within multi-layered networks of agents to reach shared goals – especially approaches to collaboration within the EA community and the representation of persons and interest groups by negotiation agents in sub-exponential takeoff scenarios.

Tom McGrath: Tom is a maths PhD student in the Systems and Signals group at Imperial College, where he works on statistical models of animal behaviour and physical models of inference. He will be interning at the Future of Humanity Institute from Jan 2018, working with Owain Evans. His previous organisational experience includes co-running Imperial’s Maths Helpdesk and running a postgraduate deep learning study group.

Linda Linsefors: Linda has a PhD in theoretical physics, which she obtained at Université Grenoble Alpes for work on loop quantum gravity. Since then she has studied AI and AI Safety online for about a year. Linda is currently working at Integrated Science Lab in Umeå, Sweden, developing tools for analysing information flow in networks. She hopes to be able to work full time on AI Safety in the near future.

Nandi Schoots: Nandi did a research master in pure mathematics and a minor in psychology at Leiden University. Her master was focused on algebraic geometry and her thesis was in category theory. Since graduating she has been steering her career in the direction of AI safety. She is currently employed as a data scientist in the Netherlands. In parallel to her work she is part of a study group on AI safety and involved with the reinforcement learning section of RAISE.

David Kristoffersson: David has a background as R&D Project Manager at Ericsson where he led a project of 30 experienced software engineers developing many-core software development tools. He liaised with five internal stakeholder organisations, worked out strategy, made high-level technical decisions and coordinated a disparate set of subprojects spread over seven cities on two different continents. He has a further background as a Software Engineer and has a BS in Computer Engineering. In the past year, he has contracted for the Future of Humanity Institute, and has explored research projects in ML and AI strategy with FHI researchers.

Chris Pasek: After graduating from mathematics and theoretical computer science, Chris ended up touring the world in search of meaning and self-improvement, and finally settled on working as a freelance researcher focused on AI alignment. Currently also running a rationalist shared housing project on the tropical island of Gran Canaria and continuing to look for ways to gradually self-modify in the direction of a superhuman FDT-consequentialist.

Mistake: I now realise that not mentioning that I'm involved with both may resemble a conflict of interest – I had removed 'projects I'm involved with' from my earlier comment before posting it to keep it concise.

I think that we could increase the proportion of LWers actually doing something about this via positive social expectation: peer-centric goal-setting and feedback. Positive social expectation (as I've taken to calling it) is what happens when you agree to meet a friend at the gym at 5 - you're much more likely to honor a commitment to a friend than one to yourself. I founded a student group to this effect at my undergrad and am currently collaborating with my university (I highly recommend reading the writeup) to implement it on a larger scale.

Basically, we could have small groups of people checking in once a week for half an hour. Each person briefly summarizes their last week and what they want to do in the next week; others can share their suggestions. Everyone sets at least one habit goal (stop checking email more than once a day) and one performance goal (read x chapters of set theory, perhaps made possible by improved methodology suggested by more-/differently-experienced group members).

I believe that the approach has many advantages over having people self-start:

  • lowered psychological barrier to getting started on x-risk (all they have to do is join a group; they see other people who aren't (already) supergeniuses like Eliezer doing work, so they feel better about their own influence)
  • higher likelihood of avoiding time / understanding sinks (bad / unnecessary textbooks)
  • increased instrumental rationality
  • lower likelihood of burnout / negative affect spirals / unsustainable actions being taken
  • a good way to form friendships in the LW community
  • robust way to get important advice (not found in the Sequences) to newer people that may not be indexed under the keywords people initially think to search for

The downside is the small weekly time commitment.

I'll probably make a post on this soon, and perhaps even a sequence on Actually Getting Started (as I've been reorganizing my life with great success).

"Make a meetup" is indeed one of my favorite rationalsphere hammers. So I'm sympathetic to this approach. But I gradually experienced a few issues that make me skeptical about this:

1) Social commitment devices are very fragile. In my experience, as soon as one buddy doesn't show up to the gym once, it rapidly spirals into ineffectiveness. Building habits with an internal locus of control is very important to gaining real habits and skills.

I think the social commitment device can be useful to get started, but I think you should very rapidly try to evolve such that you don't need it.

2) I think x-risk really desperately needs people who already have the "I can self-start on my own" and "I can think usefully for myself" properties.

The problem is that there is little that needs doing that can be done by people who don't already have those skills. The AI safety field keeps having people show up who say "I want to help", and then it turns out not to be that easy to help, so those people sort of shrug and go back to their lives.

And the issue is that the people who are involved do need help, but it requires a lot of context, and giving people context requires a lot of time (i.e. having several lengthy conversations over several weeks, or working together on a project), and then that time is precious.

And if it then turns out that the person they're basically mentoring doesn't have the "I can self start, self motivate, and think for myself" properties, then the mentor hasn't gained an ally - they've gained a new obligation to take care of, or spend energy checking in on, or they just wasted their time.

I think a group like you describe could be useful, but there are a lot of ways for it to be ineffectual if not done carefully. I may have more thoughts later.

One more thought -

The AI safety field keeps having people show up who say "I want to help", and then it turns out not to be that easy to help, so those people sort of shrug and go back to their lives.

I think this can be nearly completely solved by using a method detailed in Decisive - expectation-setting. I remember that when employers warned potential employees of the difficulty and frustration involved with the job, retention skyrocketed. People (mostly) weren't being discouraged from the process, but having their expectations set properly actually made them not mind the experience.

I think the social commitment device can be useful to get started, but I think you should very rapidly try to evolve such that you don't need it

I agree. At uni, the idea is that it gets people into a framework where they're able to get started, even if they aren't self-starters. Here, one of the main benefits would be that people at various stages of the pipeline could share what worked and what didn't. For example, knowing that understanding one textbook is way easier if you've already learned a prereq is valuable information, and that doesn't seem to always be trivially-knowable ex ante. The onus is less on the social commitment and more on the "team of people working to learn AI Safety fundamentals".

I think x-risk really desperately needs people who already have the "I can self-start on my own" and "I can think usefully for myself" properties.

Agreed. I'm not looking to make the filter way easier to pass, but rather to encourage people to keep working. "I can self-start" is necessary, but I don't think we can expect everyone to be able to self-motivate indefinitely in the face of a large corpus of unfamiliar technical material. Sure, a good self-starter will reboot eventually, but it's better to have lightweight support structures that maintain a smooth rate of progress.

Additionally, my system 1 intuition is that there are people close to the self-starter threshold who are willing to work on safe AI, and that these people can be transformed into grittier, harder workers with the right structure. Maybe that's not even worth our time, but it's important to keep in mind the multiplicative benefits possible from actually getting more people involved. I could also be falling prey to the typical mind fallacy, as I only got serious when my worry overcame my doubts of being able to do anything.

And if it then turns out that the person they're basically mentoring doesn't have the "I can self start, self motivate, and think for myself" properties, then the mentor hasn't gained an ally - they've gained a new obligation to take care of, or spend energy checking in on, or they just wasted their time.

Perhaps a more beneficial structure than "one experienced person receives lots of obligations" could be "three pairs of people (all working on learning different areas of the syllabus at any given time) share insights they picked up in previous iterations". Working in pairs could spike efficiency relative to working alone due to each person making separate mistakes; together, they smooth over rough spots in their learning. I remember this problem being discussed in a post a few years back about how most of the poster's autodidact problems were due to trivial errors that weren't easily fixable by someone not familiar with that specific material.


FWIW, relatedly on the object-level, there's already a weekly AI safety reading group which people can join.

I think that we could increase the proportion of LWers actually doing something about this via public discussions on LW about AI related issues, new developments, new evidence, and posts inviting the readers to think about certain challenges (even if they don't already think the challenges are critical). That makes sense, right?

If I look for AI related posts over the last month, I see a few, but a lot of them are meta issues, and in general, I don't see anything that could force an unconvinced reader to update in either direction.

This is based on the premise that there are many people on LW who are familiar with the basic arguments, and would be able to engage in some meaningful work, but don't find the arguments all that convincing. Note, you need to convince people not only that AI risk exists (a trivial claim), but that it's more likely than, e.g. an asteroid impact (honestly, I don't think I've seen an argument in that spirit).

I think that could be a good idea, too. The concern is whether there is substantial meaningful (and non-technical) discussion left to be had. I haven't been on LW very long, but in my time here it has seemed like most people agree on FAI being super important. This topic seems (to me) to have already been well-discussed, but perhaps that was in part because I was searching out that content.

For many people, the importance hasn't reached knowing, on a gut level, that unfriendly AI can plausibly, and will probably (in longer timescales, at the very minimum), annihilate everything we care about, to a greater extent than even nuclear war. That is - if we don't act. That's a hell of a claim to believe, and it takes even more to be willing to do something.

That's a hell of a claim to believe

Yes, it is, and it should take a hell of an argument to make someone believe this. The fact that many people don't quite believe this (on a gut level, at least), suggests that there are still many arguments to be made (alternatively, it might not be true).

it has seemed like most people agree on FAI being super important.

There are many people here who believe it, who spend a lot of time thinking about it and who also happen to be active users in LW, which might skew your perception of the average user. I suspect that there are many people who find the arguments a little fishy, but don't quite know what's wrong with them. At the very least, there is me.

I'm also in this position at the moment, but in part due to this post I now plan to spend significant time (at least 5 days, i.e. 40 hrs, cumulatively) doing deep reflection on timelines this summer, with the goal of making my model detailed enough to make life decisions based on it. (Consider this my public precommitment to do so; I will at minimum post a confirmation of whether or not I have done so on my shortform feed by August 15th, and if it seems useful I may also post a writeup of my reflections).

Suppose the Manhattan project was currently in progress, meaning we somehow had the internet, mobile phones, etc. but not nuclear bombs. You are a smart physicist that keeps up with progress in many areas of physics and at some point you realize the possibility of a nuclear bomb. You also foresee the existential risk this poses.

You manage to convince a small group of people of this, but many people are skeptical and point out the technical hurdles that would need to be overcome, and political decisions that would need to be taken, for the existential risk to become reality. They think it will all blow over and work itself out. And most people fail to grasp enough of the details to have a rational opinion about the topic.

How would you (need to) go about convincing a sufficient amount of the right people that this development poses an existential risk?

Would you subsequently try to convince them we should preemptively push for treaties, and aggressive upholding of those treaties, to prevent the annihilation of the human species? How would you get them to cooperate? Would you try to convince them to put as much effort as possible into a Manhattan project to develop an FAI that can subsequently prevent any other AI from becoming powerful enough to threaten them? Another approach?

I’m probably treading well trodden ground, but it seems to me that knowledge about AI safety is not what matters. What matters is convincing enough sufficiently powerful people that we need such knowledge before AGI becomes reality. Which should result in regulating AI development or urgently pushing for obtaining knowledge on AI safety or ...

Without such people involved the net effect of the whole FAI community is a best effort skunkworks project attempting to uncover FAI knowledge, disseminate it as widely as possible and pray to god those first achieving AGI will actually make use of that knowledge. Or perhaps attempting to beat Google, the NSA or China to it. That seems like a hell of a gamble to me and although much more within the comfort zone of the community, vastly less likely to succeed than convincing Important People.

But I admit that I am clueless as to how that should be done. It’s just that it makes “set aside three years of your life to invest in AI safety research” ring pretty desperate and suboptimal to me.

But I admit that I am clueless as to how that should be done. It’s just that it makes “set aside three years of your life to invest in AI safety research” ring pretty desperate and suboptimal to me.

I think this sentence actually contains my own answer, basically. I didn't say "invest three years of your life in AI safety research." (I realize looking back that I didn't clearly *not* say that, so this misunderstanding is on me and I'll consider rewriting that section). What I meant to say was:

  • Get three years of runway (note: this does not mean you're quitting your job for three years, it means that you have 3 years of runway so you can quit your job for 1 or 2 years before starting to feel antsy about not having enough money)
  • Quit your job or arrange your life such that you have the time to think clearly
  • figure out what's going on (this involves keeping up on industry trends and understanding them well enough to know what they mean, keeping up on AI safety community discourse, following relevant bits of politics in government, corporations, etc.)
  • figure out what to do (including what skills you need to gain in order to be able to do it)
  • do it

i.e., the first step is to become not clueless. And then step 2 depends a lot on your existing skillset. I specifically am not saying to go into AI safety research (although I realize it may have looked that way). I'm asserting that some minimum threshold of technical literacy is necessary to make serious contributions in any domain.

Do you want to persuade powerful people to help? You'll need to know what you're talking about.

Do you want to direct funding to the right places? You need to understand what's going on well enough to know what needs funding.

Do you want to just be a cog in an organization where you mostly just work like a normal person but are helping move progress forward? You'll need to know what's going on enough to pick an organization where you'll be a marginally beneficial cog.

The question isn't "what is the optimal thing for AI risk people collectively to do". It's "what is the optimal thing for you in particular to do, given that the AI risk community exists." In the past 10 years, the AI risk community has gone from a few online discussion groups to a collection of orgs that have millions of funding in current dollars; funders who have millions or billions more; and, as of this week, Henry Kissinger endorsing AI risk as important.

In that context, "what is the best marginal contribution you personally can make to one of the most important problems humanity will face" is a difficult question.

The thesis of this post is that taking that question seriously requires a lot of time to think, and that because money is less of a limiting bottleneck now, you are more useful on the margin as a person who has carved out enough time to think seriously than as an Earning-to-Give person.

If you're not saying to go into AI safety research, what non-business-as-usual course of action are you expecting? Is your premise that everyone taking this seriously should figure out their comparative advantage within an AI risk organization because they contain many non-researcher roles, or are you imagining some potential course of action outside of "Give your time/money to MIRI/HCAI/etc"?

Is your premise that everyone taking this seriously should figure out their comparative advantage within an AI risk organization because they contain many non-researcher roles

Yes, basically. One of the specific possibilities I alluded to was taking on managerial or entrepreneurial roles, here:

So people like me can't just hand complicated assignments off and trust they get done competently. Someone might understand the theory but not get the political nuances they need to do something useful with the theory. Or they get the political nuances, and maybe get the theory at-the-time, but aren't keeping up with the evolving technical landscape.

The thesis of the post is intended to be 'donating to MIRI/CHAI etc is not the most useful thing you can be doing'

I'm curious as to whether or not the rationalsphere/AI risk community has ever experimented with hiring people to work on serious technical problems who aren't fully aligned with the values of the community or not fully invested in it already. It seems like ideological alignment is a major bottleneck to locating and attracting relevant skill levels and productivity levels, and there might be some benefit to being open about tradeoffs that favor skill and productivity at the expense of not being completely committed to solving AI risk.

I wrote a post with some reasons to be skeptical of this.

Not sure I agree. When I started working on this topic, I wasn't much invested in it, just wanted to get attention on LW. Wei has admitted to a similar motivation.

I think the right way to encourage work on AI alignment is by offering mainstream incentives (feedback, popularity, citations, money). You don't have to think of it as hiring, it can be lower pressure. Give them a grant, publish them in your journal, put them on the front page of your website, or just talk about their ideas. If they aren't thinking in the right direction, you have plenty of opportunity to set the terms of the game.

Not saying this is a panacea, but I feel like many people here focus too much on removing barriers and too little on creating incentives.

I can't remember the key bit of Ben's post (and wasn't able to find it quickly on skimming). But my hot take is:

It seems obviously net-positive for contributions on x-risk to be rewarded with status within the AI Safety community, but it's not obviously net-positive for those contributions to be rewarded with serious money or status in the broader world.

If the latter gets too large, then you start getting swarmed with people who want money and prestige but don't necessarily understand how to contribute, who are incentivized to degrade the signal of what's actually important.

To quote a conversation with habryka: there are two ways to make AI Safety prestigious. The first way is to make serious AI Safety work (i.e. solving the alignment problem) prestigious. The second is to change the definition of AI Safety to be something more obviously prestigious in the first place (which may get you things like 'solving problems with self-driving cars'). And the latter is often easier to do.

So, if you're making it easier for not-fully-aligned people to join the movement, motivated by prestige, they'll want to start changing the movement to make it easier for them to get ahead.

This isn't to say this obviously cashes out to "net-negative" either, just that it's something to be aware of.

Principal-agent problems certainly matter! But despite that, collaboration based on extrinsic rewards (instead of selfless agreement on every detail) has been a huge success story for mankind. Is our task unusually prone to principal-agent problems, compared to other tasks? In my experience, the opposite is true: AI alignment research is unusually easy to evaluate in detail, compared to checking the work of a contractor building your house or a programmer writing code for your company.

If the latter gets too large, then you start getting swarmed with people who want money and prestige but don't necessarily understand how to contribute, who are incentivized to degrade the signal of what's actually important.

During this decade the field of AI in general became one of the most prestigious and high-status academic fields to work in. But as far as I can tell, it hasn't slowed down the rate of progress in advancing AI capability. If anything, it has sped it up - by quite a bit. It's possible that a lot of newcomers to the field are largely driven by the prospect of status gain and money. And there are quite a few "AI" hype-driven startups that have popped up and seem doomed to fail, but despite this, it doesn't seem to be slowing the pace of the most productive research groups. Maybe the key here is that if you suddenly increase the prestige of a scientific field by a dramatic amount, you are bound to get a lot of nonsense or fraudulent activity, but this might be constrained to being outside of serious research circles. And the most serious people working in the field are likely to be helped by the rising tide as well, due to increased visibility and funding to their labs and so on.

It's also my understanding that the last few years (during the current AI boom) have been some of the most successful (financially and productively) for MIRI in their entire history.

This is an interesting point I hadn't considered. Still mulling it over a bit.

I don't think we disagree. The top-level comment said

experimented with hiring people to work on serious technical problems

and you said

You don't have to think of it as hiring

So you were talking past each other a bit.

Also, you said:

When I started working on this topic, I wasn't much invested in it, just wanted to get attention on LW. Wei has admitted to a similar motivation...
I feel like many people here focus too much on removing barriers and too little on creating incentives.

I think we can create a strong incentive landscape on LW for valuable work to be done (in alignment and in other areas); it's just very important to get it right, and to not build something that can easily fall prey to adversarial Goodhart (or even just plain old regressional Goodhart). I'm very pro creating incentives (have been thinking a lot lately about how to do that, have got a few ideas that I think are good for this, will write them up for feedback from y'all when I get a chance).

This is why I've always insisted, for example, that if you're going to start talking about "AI ethics", you had better be talking about how you are going to improve on the current situation using AI, rather than just keeping various things from going wrong.  Once you adopt criteria of mere comparison, you start losing track of your ideals—lose sight of wrong and right, and start seeing simply "different" and "same".

From: Guardians of the Truth

I'd put some serious time into that as well, if you can. If you think that you can influence the big lever that is AI, you should know which way to pull.

[Note: I am writing from my personal epistemic point of view from which pretty much all the content of the OP reads as obvious obviousness 101.]

The reason why people don't know this is not because it's hard to know it. This is some kind of common fallacy: "if I say true things that people apparently don't know, they will be shocked and turn their lives around". But in fact most people around here have more than enough theoretical capacity to figure this out, and much more, without any help. The real bottleneck is human psychology, which is not able to take certain beliefs seriously without much difficult work at the fundamentals. So "fact" posts about x-risk are mostly preaching to the choir. At best, you get some people acting out of scrupulosity and social pressure, and this is pretty useless.

Of course I still like your post a lot, and I think it's doing some good on the margin. It's just that it seems like you're wasting energy on fighting the wrong battle.

Note to everyone else: the least you can do is share this post until everyone you know is sick of it.

Note to everyone else: the least you can do is share this post until everyone you know is sick of it.

I would feel averse to this post being shared outside LW circles much, given its claims about AGI in the near future being plausible. I agree with the claim but not really for the reasons provided in the post; I think it's reasonable to put some (say 10-20%) probability on AGI in the next couple of decades due to the possibility of unexpectedly fast progress and the fact that we don't actually know what would be needed for AGI. But that isn't really spelled out in the post, and the general impression one gets from the post is that "recent machine learning advances suggest that AGI will be here within a few decades with high probability".

This is a pretty radical claim which many relevant experts would disagree with, but which is not really supported or argued for in the post. I would expect that many experts who saw this post would lower their credence in AI risk as a result, as they would see a view they strongly disagreed with, didn't see any supporting arguments they'd consider credible, and end up thinking that Raemon (and by extension AI risk people) didn't know what they were talking about.

I do mostly agree with not sharing this as a public-facing document. This post is designed to be read after you've read the sequences and/or Superintelligence and are already mostly on board.

I'm sympathetic to this. I do think there's something important about making all of this stuff common knowledge in addition to making it psychologically palatable to take seriously.

Generally, yeah.

But I know that I got something very valuable out of the conversations in question, which wasn't about social pressure or scrupulosity, but... just actually taking the thing seriously. This depended on my psychological state in the past year, and at least somewhat depended on psychological effects of having a serious conversation about xrisk with a serious xrisk person. My hope is that at least some of the benefits of that could be captured in written form.

If that turns out to just not be possible, well, fair. But I think if at least a couple of people in the right life circumstances get 25% of the value I got from the original conversation(s) from reading this, it'll have been a good use of time.

I also disagree slightly with the "the reason people don't know this isn't that it's hard to know." It's definitely achievable to figure out most of the content here. But there's a large search space of things worth figuring out, and not all of it is obvious.

All sounds sensible.

Also, reminds me of the 2nd Law of Owen:

In a funny sort of way, though, I guess I really did just end up writing a book for myself.

If you can’t viscerally feel the difference between .1% and 1%, or a thousand and a million, you will probably need more of a statistics background to really understand things like “how much money is flowing into AI, and what is being accomplished, and what does it mean?”

I'm surprised at the suggestion that studying statistics strengthens gut sense of the significance of probabilities. I've updated somewhat towards that based on the above, but I would still expect something more akin to playing with and visualising data to be useful for this
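As a minimal sketch of the "playing with data" idea, one could turn the abstract numbers into concrete counts and durations. The function names and the choice of 10,000 trials below are illustrative assumptions, not anything from the original post.

```python
# Hypothetical helpers for getting a feel for orders of magnitude.
# Nothing here comes from the post; the scenarios are made up for illustration.

SECONDS_PER_DAY = 86_400


def expected_hits(probability: float, trials: int) -> float:
    """Expected number of times an event with this probability occurs."""
    return probability * trials


def seconds_to_days(seconds: int) -> float:
    """Convert a count of seconds into days."""
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 0.1% versus 1%: the same gap expressed as counts out of 10,000 trials.
    for p in (0.001, 0.01):
        hits = expected_hits(p, 10_000)
        print(f"{p:.1%} of 10,000 trials -> about {hits:.0f} occurrences")

    # A thousand versus a million: the same gap expressed as elapsed time.
    for n in (1_000, 1_000_000):
        print(f"{n:,} seconds -> {seconds_to_days(n):.2f} days")
```

Framed this way, a thousand seconds is about 17 minutes while a million seconds is about 11.6 days, which may land more viscerally than the raw numbers.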

"that we should expect DeepMind et all to have some projects..."

et all should be et al.

Meta: the link Performance Trends in AI (Sarah Constantin) doesn't work. I see no content, only title, score and comments. Presumably it's the same post as this?

Thanks! Fixed.

Similar meta: none of the links to currently work due to, well, being to rather than

hmm. I can fix these links, but fyi if you clear your browser cache they should work for you. (If not, lemme know)

Doesn't work in incognito mode either. There appears to be an issue with when accessed over HTTPS — over HTTP it sends back a reasonable-looking 301 redirect, but on port 443 the TCP connection just hangs.

I swear I've seen a post like this before with someone (Critch I presume?) arguing for taking a year just to orient oneself. Anyone have the link?

That's a lot of text that leaves me confused in a lot of ways.

The claim about AI timeline is, I think, not contentious. But what about "AI will be dangerous"? Surely that's Claim 4? Is it so obvious that it's not even worth listing?

Regarding AI risk estimation, this seems a weirdly rare topic of discussion on LW. The few recent AI related posts feel like summaries of stuff like the superintelligence FAQ without much new substance. Are those discussions happening elsewhere? Maybe there aren't any new discussions to be had?

You have a section called "what to think about", but I didn't actually understand what to think about. Is this about solving alignment? Or about updating our personal plans to account for the coming space communism/apocalypse? You sort of hinted that there is a list of things that need to be done, but haven't really explained what they would be.

Regarding "thinking", would you have told a farmer in the 1700s to stop working, learn engineering and think about what the industrial revolution is going to look like? Does that really make sense? I think "wait and see" is also a good strategy.

We’ve had about 5-6 years of active discussion on AI risk estimation on LessWrong, as well as the shape of intelligence explosions. If you haven’t read Superintelligence or Rationality: A-Z yet, then I would recommend reading those, since most of the writing on LW will assume you’ve read those and take knowing the arguments in them mostly as given.

Note: I actually think your question is legitimate and valid, it’s just something we’ve literally spent about 40% of the content on this site discussing, and I think it’s important that we aren’t forced to have the same conversations over and over again. A lot of that discussion is now a few years old, which does mean it isn't really actively discussed anymore, but the current state of things does mean that I usually want someone to credibly signal that they've read the old discussions, before I engage in a super long debate with them rehashing the old arguments.

I'm vaguely aware there used to be discussion, though I wasn't there for it. So what happened? I'm not suggesting that you should replay the same arguments (though we might, surely I'm not the only new person here). I'm suggesting that you should have new arguments. Time has passed, new developments have happened, presumably progress is made on something and new problems were discovered and alignment remains unsolved.

Raemon suggests that you should spend time thinking about AI. If you agree about how important it is, you probably should. And if you're going to think, you might as well publish what you figure out. And if you're going to publish, you might as well do it on LW, right?

Regarding reading the old arguments, is there some way to find the good ones? A lot of the arguments I see are kind of weak, long, intuitive, make no assumptions that the reader knows CS 101, etc. Rationality: A-Z falls into this category, I think (though I haven't read it in a long time). Is "Superintelligence" better?

By the way, you saw my recent post on AI. Was it also a point previously talked about? Do you have links?

This isn't intended to argue about AI safety from the ground up; this is targeted towards people who are familiar with (and buy into) the arguments, but aren't taking action on them. (Scott Alexander's Superintelligence FAQ is the summary I point people to if they aren't buying the basic paradigm. If you've read that, and either aren't fully convinced or feel like you want more context, and you haven't read the Sequences and Superintelligence, I do literally suggest doing that first, as Habryka suggests.)

So, "AI timelines have shifted sooner, even among people who were taking them seriously" is the bit of new information for people who've been sort of following things but not religiously keeping track of a lot of in-person and FB discussions.

Just pointing out I'm still waiting on a response to my comment asking a similar question to zulu's here. I read the Sequences and Superintelligence, but I still don't see how an AI would proliferate and advance faster than our ability to kill it - a year to get from baby to Einstein-level intelligence is plenty long enough to react.

This post of mine is (among other things), one piece of a reply to this comment.

React to what? Such an AI might appear perfectly safe and increasingly useful right up to the point where it can no longer be turned off.

Why do you think people wouldn't shut down an AI when they see it developing the capability of being unable to shut down, regardless of how useful it is currently being?

Why do you think people wouldn't shut down an AI when they see it developing the capability of being unable to shut down, regardless of how useful it is currently being?

Does section 5.2 of Disjunctive Scenarios answer your question? There are plenty of reasons why various groups would set an AI free, and just something like it having unlimited Internet access may allow it to copy itself to be run somewhere else, preventing any future shutdown attempts.

Also, "we should shut this thing down because it's dangerous, regardless of how useful it is currently" is something that humans are empirically terrible at. Most people know that modern operating systems are probably full of undiscovered security holes that someone may be exploiting even as we speak, but nobody's seriously proposing that we take down all computers while we rebuild operating systems from the ground up to be more secure.

More generally, how many times have you read a report of some accident that includes some phrasing to the effect of "everyone knew it was a disaster just waiting to happen"? Eliezer also had a long article just recently about all kinds of situations where a lot of people know that the situation is fucked up, but can't really do anything about it despite wanting to.

OK, that answers my first two points. So, if I bought into the arguments, it would be clear to me what I should be thinking about? And the value of this thinking would also be clear to me?

That seems dubious but, anyway, since I don't yet fully buy into the arguments, would you explain what exactly to think about and why that should be a good idea?