This review is mostly going to talk about what I think the post does wrong and how to fix it, because the post itself does a good job explaining what it does right. But before we get to that, it's worth saying up-front what the post does well: it proposes a basically-correct notion of "power" for purposes of instrumental convergence, and then uses it to prove that instrumental convergence is in fact highly probable under a wide range of conditions. On that basis alone, it is an excellent post.
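As a concrete illustration of that core result (a toy sketch of my own, not the post's actual formalism; the MDP and state names here are made up for illustration): draw many random reward functions over a small deterministic MDP and check how often the optimal first move goes toward the state with more reachable options.

```python
import random

# Tiny deterministic MDP: from the start state, action "hub" leads to a
# state from which 3 terminal states are reachable; action "dead_end"
# leads to a state with only 1 reachable terminal. Reward is a random
# function over the terminal states.
HUB_TERMINALS = ["t1", "t2", "t3"]
DEAD_END_TERMINALS = ["t4"]

def optimal_first_action(reward):
    # The optimal first action just leads to whichever branch contains
    # the highest-reward reachable terminal state.
    best_hub = max(reward[t] for t in HUB_TERMINALS)
    best_dead_end = max(reward[t] for t in DEAD_END_TERMINALS)
    return "hub" if best_hub > best_dead_end else "dead_end"

trials = 100_000
hub_wins = sum(
    optimal_first_action({t: random.random() for t in HUB_TERMINALS + DEAD_END_TERMINALS}) == "hub"
    for _ in range(trials)
)
print(f"fraction of reward functions preferring the 3-option branch: {hub_wins / trials:.3f}")  # ~0.75
```

Most randomly-drawn reward functions favor the branch which keeps more options open; "keeping options open" is the intuition behind the notion of power the post formalizes.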
I see two (related) central problems, from which various o...
Looking back, I have quite different thoughts on this essay (and the comments) than I did when it was published. Or at least much more legible explanations; the seeds of these thoughts have been around for a while.
The basketballism analogy remains excellent. Yet searching the comments, I'm surprised that nobody ever mentioned the Fosbury Flop or the Three-Year Swim Club. In sports, from time to time somebody comes along with some crazy new technique and shatters all the records.
Comparing rationality practice to sports practice, rationality has ...
One of the main arguments in AI risk goes something like:
One common answer to this is "ok, how about we make AI which isn't goal-directed?"
Unconscious Economics says: selection effects will often create the same effect as goal-directedness, even if we're trying to build a non-goal-directed AI.
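To make that concrete, here's a minimal toy simulation (my own sketch, not from the post; all the numbers are arbitrary): no individual agent has a goal, but selection on a score makes the population behave as if it were optimizing that score.

```python
import random

# Each "agent" is just a fixed behavior: a number in [0, 1].
# No planning, no goals, no learning within an agent's lifetime.
def random_agent():
    return random.random()

# The environment happens to favor behaviors near 0.7. No agent
# "knows" this; it is purely a property of the selection process.
def score(agent):
    return -abs(agent - 0.7)

population = [random_agent() for _ in range(1000)]
for generation in range(50):
    # Selection: keep the top half, refill with noisy copies of survivors.
    population.sort(key=score, reverse=True)
    survivors = population[: len(population) // 2]
    children = [min(1.0, max(0.0, a + random.gauss(0, 0.05))) for a in survivors]
    population = survivors + children

mean = sum(population) / len(population)
print(f"mean behavior after selection: {mean:.3f}")  # ~0.700
```

The apparent "goal" lives entirely in the selection process, not in any individual agent, which is exactly why building a non-goal-directed AI doesn't automatically remove goal-directedness from the overall system.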
Discussions around CAIS are one obvious application. Paul's "you get what...
ETA 1/12: This review is critical and at times harsh, not because I want to harshly criticize the post or the author, but because I did not consider harshness of criticism when writing. I still think the post is positive-net-value, and might even vote it up in the review. I especially want to emphasize that I do not think it is in any way useful to blame or punish the author for the things I complain about below; this is intended as a "pointing out a problematic habit which a lot of people have and society often encourages" criticism, not a "bad thing must...
The type signature of goals is the overarching topic to which this post contributes. It can manifest in a lot of different ways in specific applications:
If we want to "align AI with human values", build ML interpretability tools, etc, then that's going to be pretty to...
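To gesture at what "type signature of goals" means in code terms, here's a sketch of a few candidate signatures (my own illustration; the type names are hypothetical):

```python
from typing import Callable

class Observation: ...  # what the agent directly senses
class WorldState: ...   # the true underlying state, possibly never observed directly
class Trajectory: ...   # a whole history of world states

# Candidate 1: utility over the agent's own observations.
# (Runs into wireheading-style problems: the goal can be satisfied by
# controlling the observations rather than the world.)
ObservationUtility = Callable[[Observation], float]

# Candidate 2: utility over latent world states the agent never directly sees.
# (Closer to how humans seem to care about things "out there".)
WorldStateUtility = Callable[[WorldState], float]

# Candidate 3: utility over whole trajectories rather than just end states.
TrajectoryUtility = Callable[[Trajectory], float]
```

Which of these (if any) is the right signature matters for what "align AI with human values" even means.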
The material here is one seed of a worldview which I've updated toward a lot more over the past year. Some other posts which involve the theme include Science in a High Dimensional World, What is Abstraction?, Alignment by Default, and the companion post to this one Book Review: Design Principles of Biological Circuits.
Two ideas unify all of these:
I revisited this post a few months ago, after Vaniver's review of Atlas Shrugged.
I've felt for a while that Atlas Shrugged has some really obvious easy-to-articulate problems, but also offers a lot of value in a much-harder-to-articulate way. After chewing on it for a while, I think the value of Atlas Shrugged is that it takes some facts about how incentives and economics and certain worldviews have historically played out, and propagates those facts into an aesthetic. (Specifically, the facts which drove Rand's aesthetics presumably came from growing up i...
This is an excellent post, with a valuable and well-presented message. This review is going to push back a bit and talk about some ways the post falls short, with the understanding that it's still a great post.
There's this video of a toddler throwing a tantrum. Whenever the mother (holding the camera) is visible, the child rolls on the floor and loudly cries. But when the mother walks out of sight, the toddler soon stops crying, gets up, and goes in search of the mother. Once the toddler sees the mother again, it's back to rolling on the floor crying.
A k...
In a field like alignment or embedded agency, it's useful to keep a list of one or two dozen ideas which seem like they should fit neatly into a full theory, although it's not yet clear how. When working on a theoretical framework, you regularly revisit each of those ideas, and think about how it fits in. Every once in a while, a piece will click, and another large chunk of the puzzle will come together.
Selection vs control is one of those ideas. It seems like it should fit neatly into a full theory, but it's not yet clear what that will look like. I revis...
So, this was apparently in 2019. Given how central the ideas have become, it definitely belongs in the review.
I don't particularly like dragging out the old coherence discussions, but the annual review is partly about building common knowledge, so it's the right time to bring it up.
This currently seems to be the canonical reference post on the subject. On the one hand, I think there are major problems/missing pieces with it. On the other hand, looking at the top "objection"-style comment (i.e. Said's), it's clear that the commenter didn't even finish reading the post and doesn't understand the pieces involved. I think this is pretty typical among people who object...
I generally like the post genre of "things obvious from some cultural lens which most LW readers haven't experienced", and would like to see more like this.
So, I don't necessarily think that all the details of this belong in the 2019 books, but... y'know, this is LessWrong, things just don't feel complete without a few levels of meta thrown in.
This seems to be where simulacra first started to appear in LW discourse? There doesn't seem to be a polished general post on the subject until 2020, but I feel like the concepts and classification were floating around in 2019, and some credit probably belongs to this post.
In order to apply economic reasoning in the real world, this is an indispensable concept, and this post is my go-to link for it.
This is an interesting frame which is orthogonal to most of my other frames on the topic and seems to capture something which those other frames miss (or at least deemphasize).
I'd love to nominate basically everything Jason writes. Heck, I'd totally buy a book of posts from roots of progress. But of those which showed up on LW in 2019, this is one of the two which were most roots-of-progress-y.
Nominating mainly for the diagrams, which stuck with me more than the specifics of the post.
I'd love to nominate basically everything Jason writes. Heck, I'd totally buy a book of posts from roots of progress. But of those which showed up on LW in 2019, this is one of the two which were most roots-of-progress-y.
I found this post interesting and helpful, and have used it as a mental hook on which to hang other things. Interpreting what's going on with double descent, and what it implies, is tricky, and I'll probably write a proper review at some point talking about that.
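For readers who haven't seen it, here's a minimal demo of the phenomenon itself (my own sketch, with assumed details: fixed random ReLU features fit by minimum-norm least squares, which is one standard setting where double descent shows up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth 1-D function.
n_train = 30
def target(x):
    return np.sin(2 * np.pi * x)
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = np.linspace(-1, 1, 200)
y_test = target(x_test)

def relu_features(x, w, b):
    # Fixed random ReLU features; only the linear output layer is fit.
    return np.maximum(0, np.outer(x, w) + b)

for width in [5, 10, 20, 30, 40, 100, 1000]:
    w = rng.standard_normal(width)
    b = rng.standard_normal(width)
    phi_train = relu_features(x_train, w, b)
    phi_test = relu_features(x_test, w, b)
    # lstsq returns the minimum-norm solution when width > n_train;
    # that implicit regularization is what allows the second descent.
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    test_mse = np.mean((phi_test @ coef - y_test) ** 2)
    print(f"width={width:5d}  test MSE={test_mse:8.3f}")

# Typically: test error rises as width approaches n_train (the
# interpolation threshold), then falls again as width grows further.
```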
I think this post remains under-appreciated. Aesthetics drive a surprisingly large chunk of our behavior, and I find it likely that some aesthetics tend to outperform others in terms of good decision-making. Yet it's a hard thing to discuss at a community level, because aesthetics are often inherently tied to politics. I'd like to see more intentional exploration of aesthetic-space, and more thinking about how to evaluate how-well-different-aesthetics-perform-on-decisions, assuming the pitfalls of politicization can be avoided.
Things To Take Away From The Essay
First and foremost: Yudkowsky makes absolutely no mention whatsoever of the VNM utility theorem. This is neither an oversight nor a simplification. The VNM utility theorem is not the primary coherence theorem. It's debatable whether it should be considered a coherence theorem at all.
Far and away the most common mistake when arguing about coherence (at least among a technically-educated audience) is for people who've only heard of VNM to think they know what the debate is about. Looking at the top-voted comments on this essay...
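For the record, here's the style of argument those discussions actually center on, in its simplest form: the money pump (a minimal sketch of my own, not taken from the essay). An agent with cyclic preferences will pay a small fee for each trade it "prefers", and can therefore be walked in circles while its money is extracted.

```python
# Cyclic preferences: A over B, B over C, C over A. The agent pays a
# small fee for any trade that moves it "up" its preference ordering.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}
FEE = 1

def accepts_trade(held, offered):
    return (offered, held) in prefers

held, money = "A", 100
for offered in ["C", "B", "A"] * 10:  # walk the agent around the cycle
    if accepts_trade(held, offered):
        held, money = offered, money - FEE

print(f"holding {held}, money left: {money}")  # holding A, money left: 70
```

Every individual trade looked like an improvement to the agent, yet after thirty trades it holds exactly what it started with and is 30 poorer. Coherence arguments of this flavor, not the VNM axioms per se, are what the essay is about.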