The Best of LessWrong

The Best of LessWrong — LessWrong

Paul thinks common stories about what AI catastrophe might look like are flawed. Here, he paints a picture of two failure scenarios that seem more likely, summarized as: "ML helps us get what we measure (creating a slow rolling catastrophe)", and "ML gives rise to greedy patterns that try to expand their influence."

The Parable of Predict-O-Matic

A story in nine parts about someone creating an AI that predicts the future, and multiple people who wonder about the implications. What happens when the predictions influence what future happens?

#18

Chris Olah’s views on AGI safety

In thinking about AGI safety, I’ve found it useful to build a collection of different viewpoints from people that I respect, such that I can think from their perspective. I will often try to compare what an idea feels like when I put on my Paul Christiano hat, to when I put on my Scott Garrabrant hat. Recently, I feel like I’ve gained a "Chris Olah" hat, which often looks at AI through the lens of interpretability.

The goal of this post is to try to give that hat to more people.

#19

Reframing Superintelligence: Comprehensive AI Services as General Intelligence

Eric Drexler's CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D. This reframes the problems of AI safety and has implications for what technical safety researchers should be doing. Rohin reviews and summarizes the model

#33

human psycholinguists: a critical appraisal

nostalgebraist argues that GPT-2 is a fascinating and important development for our understanding of language and the mind, despite its flaws. They're frustrated that many psycholinguists who previously studied language in detail now seem uninterested in looking at what GPT-2 tells us about language, instead focusing on whether it's "real AI".

#34

AI Safety "Success Stories"

AI safety researchers have different ideas of what success would look like. This post explores five different AI safety "success stories" that researchers might be aiming for and compares them along several dimensions.

20fiddler

I think this post is incredibly useful as a concrete example of the challenges of seemingly benign powerful AI, and makes a compelling case for serious AI safety research being a prerequisite to any safe further AI development. I strongly dislike part 9, as painting the Predict-o-matic as consciously influencing others personality at the expense of short-term prediction error seems contradictory to the point of the rest of the story. I suspect I would dislike part 9 significantly less if it was framed in terms of a strategy to maximize predictive accuracy. More specifically, I really enjoy the focus on the complexity of “optimization” on a gears-level: I think that it’s a useful departure from high abstraction levels, as the question of what predictive accuracy means, and the strategy AI would use to pursue it, is highly influenced by the approach taken. I think a more rigorous approach to analyzing whether different AI approaches are susceptible to “undercutting” as a safety feature would be an extremely valuable piece. My suspicion is that even the engineer’s perspective here is significantly under-specified with the details necessary to determine whether this vulnerability exists. I also think that Part 9 detracts from the piece in two main ways: by painting the predict-o-matic as conscious, it implies a significantly more advanced AI than necessary to exhibit this effect. Additionally, because the AI admits to sacrificing predictIve accuracy in favor of some abstract value-add, it seems like pretty much any naive strategy would outcompete the current one, according to the engineer, meaning that the type of threat is also distorted: the main worry should be AI OPTIMIZING for predictive accuracy, not pursuing its own goals. That’s bad sci-fi or very advanced GAI, not a prediction-optimizer. I would support the deletion or aggressive editing of part 9 in this and future similar pieces: I’m not sure what it adds. ETA-I think whether or not this post should be upd

20DanielFilan

* Olah’s comment indicates that this is indeed a good summary of his views. * I think the first three listed benefits are indeed good reasons to work on transparency/interpretability. I am intrigued but less convinced by the prospect of ‘microscope AI’. * The ‘catching problems with auditing’ section describes an ‘auditing game’, and says that progress in this game might illustrate progress in using interpretability for alignment. It would be good to learn how much success the auditors have had in this game since the post was published. * One test of ‘microscope AI’: the go community has had a couple of years of the computer era, in which time open-source go programs stronger than AlphaGo have been released. This has indeed changed the way that humans think about go: seeing the corner variations that AIs tend to play has changed our views on which variations are good for which player, and seeing AI win probabilities conditioned on various moves, as well as the AI-recommended continuations, has made it easier to review games. Yet sadly, there has been to my knowledge no new go knowledge generated from looking at the internals of these systems, despite some visualization research being done (https://arxiv.org/pdf/1901.02184.pdf, https://link.springer.com/chapter/10.1007/978-3-319-97304-3_20). As far as I’m aware, we do not even know if these systems understand the combinatorial game theory of the late endgame, the one part of go that has been satisfactorily mathematized (and therefore unusually amenable to checking whether some program implements it). It’s not clear to me whether this is for a lack of trying, but this does seem like a setting where microscope AI would be useful if it were promising. * The paper mostly focuses on the benefits of transparency/interpretability for AI alignment. However, as far as I’m aware, since before this post was published, the strongest argument against work in this direction has been the problem of tractability - can we ac

20nostalgebraist

I wrote this post about a year ago. It now strikes me as an interesting mixture of 1. Ideas I still believe are true and important, and which are (still) not talked about enough 2. Ideas that were plausible at the time, but are much less so now 3. Claims I made for their aesthetic/emotional appeal, even though I did not fully believe them at the time In category 1 (true, important, not talked about enough): * GPT-2 is a source of valuable evidence about linguistics, because it demonstrates various forms of linguistic competence that previously were only demonstrated by humans. * Much scholarly ink has been spilled over questions of the form "what would it take, computationally, to do X?" -- where X is something GPT-2 can actually do. Since we now have a positive example, we should revisit these debates and determine which claims GPT-2 disproves, and which it supports. * Some of the key participants in those debates are not revisiting them in this way, and appear to think GPT-2 is entirely irrelevant to their work. In category 2 (plausible then but not now): * "The structure of the transformer is somehow specially apt for language, relative to other architectures that were tried." * I now think this is much less likely thanks to the 2 OpenAI scaling papers in 2020. * The first paper made it seem more plausible that LSTMs would behave like GPT-2 if given a much larger quantity of compute/data * The second paper showed that the things we know about transformers from the text domain generalize very well to image/video/math * I now think transformers are just a "good default architecture" for our current compute regime and may not have special linguistic properties * I'm finding this difficult to phrase, but in 2019 I think I believed Gary Marcus had similar preconceptions to me but was misreading the current evidence. * I now think he's more committed to the idea that GPT-2-like approaches are fundamentally barking up the wrong tree, and wi

The Best of LessWrong

Rationality

Optimization

World

Practical

AI Strategy

Technical AI Safety