Reviews (All Years)

Things To Take Away From The Essay

First and foremost: Yudkowsky makes absolutely no mention whatsoever of the VNM utility theorem. This is neither an oversight nor a simplification. The VNM utility theorem is not the primary coherence theorem. It's debatable whether it should be considered a coherence theorem at all.

Far and away the most common mistake when arguing about coherence (at least among a technically-educated audience) is for people who've only heard of VNM to think they know what the debate is about. Looking at the top-voted comments on this ess...

I think this post is emblematic of the problem I have with most of Val's writing: there are useful nuggets of insight here and there, but you're meant to swallow them along with a metric ton of typical mind fallacy, projection, confirmation bias, and manipulative narrativemancy.

Elsewhere, Val has written words approximated by ~"I tried for years to fit my words into the shape the rationalists wanted me to, and now I've given up and I'm just going to speak my mind."

This is what it sounds like when you are blind to an important distinction. Trying to hedge m...

I didn't like this post. At the time, I didn't engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.

The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it...

In my opinion, the biggest shift in the study of rationality since the Sequences were published was a change in focus from "bad math" biases (anchoring, availability, base rate neglect, etc.) to socially-driven biases. And with good reason: while a crash course in Bayes' Law can alleviate many of the issues with intuitive math, group politics are a deep and inextricable part of everything our brains do.

There has been a lot of great writing describing the issue, like Scott’s essays on ingroups and outgroups and Robin Hanson’s the...

I think this post might be the best one of all the MIRI dialogues. I also feel confused about how to relate to the MIRI dialogues overall.

A lot of the MIRI dialogues consist of Eliezer and Nate saying things that seem really important and obvious to me, and a lot of my love for them comes from a feeling of "this actually makes a bunch of the important arguments for why the problem is hard". But the nature of the argument is kind of closed off. 

Like, I agree with these arguments, but like, if you believe these arguments, having traction on AI Alignment...

(I'm just going to speak for myself here, rather than the other authors, because I don't want to put words in anyone else's mouth. But many of the ideas I describe in this review are due to other people.)

I think this work was a solid intellectual contribution. I think that the metric proposed for how much you've explained a behavior is the most reasonable metric by a pretty large margin.

The core contribution of this paper was to produce negative results about interpretability. This led to us abandoning work on interpretability a few months later, which I'm...

This post is correct, and the point is important for people who want to use algorithmic information theory.

But, as many commenters noted, this point is well understood among algorithmic information theorists. I was taught this as a basic point in my algorithmic information theory class in university (which tbc was taught by one of the top algorithmic information theorists, so it's possible that it's missed in other treatments).

I'm slightly frustrated that Nate didn't realize that this point is unoriginal. His post seems to take this as an example of a case...

This is an unusually difficult post to review. In an ideal world, we'd like to be able to review things as they are, without reference to who the author is. In many settings, reviews are done anonymously (with the author's name stricken off), for just this reason. This post puts that to the test: the author is a pariah. And ordinarily I would say, that's irrelevant, we can just read the post and evaluate it on its own merits.

Other comments have mentioned that there could be PR concerns, i.e., that making the author's existence and participation on LessWrong...

This review is mostly going to talk about what I think the post does wrong and how to fix it, because the post itself does a good job explaining what it does right. But before we get to that, it's worth saying up-front what the post does well: the post proposes a basically-correct notion of "power" for purposes of instrumental convergence, and then uses it to prove that instrumental convergence is in fact highly probable under a wide range of conditions. On that basis alone, it is an excellent post.

I see two (related) central problems, from which various o...

This is a review of both the paper and the post itself, and turned more into a review of the paper (on which I think I have more to say) as opposed to the post. 

Disclaimer: this isn’t actually my area of expertise inside of technical alignment, and I’ve done very little linear probing myself. I’m relying primarily on my understanding of others’ results, so there’s some chance I’ve misunderstood something. Total amount of work on this review: ~8 hours, though about 4 of those were refreshing my memory of prior work and rereading the paper. 

TL...

I think that strictly speaking this post (or at least the main thrust) is true, and proven in the first section. The title is arguably less true: I think of 'coherence arguments' as including things like 'it's not possible for you to agree to give me a limitless number of dollars in return for nothing', which does imply some degree of 'goal-direction'.

I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term. In my mind, it provokes various follo...
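The kind of coherence argument mentioned above ("it's not possible for you to agree to give me a limitless number of dollars in return for nothing") is usually illustrated with a money pump. The sketch below is a hypothetical illustration, not taken from any of the reviewed posts: it assumes an agent with cyclic preferences A over B, B over C, and C over A, who pays a fixed fee for every swap to something it prefers.

```python
from __future__ import annotations

# Hypothetical cyclic preferences: (x, y) means "x is strictly preferred to y".
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}

def run_money_pump(start: str, fee: float, rounds: int) -> tuple[str, float]:
    """Repeatedly offer the agent a swap to an item it prefers, charging a fee.

    Returns the item the agent ends up holding and the total amount it paid.
    """
    # For each held item, the item the agent would pay to switch to.
    upgrade = {worse: better for (better, worse) in PREFERS}
    held, paid = start, 0.0
    for _ in range(rounds):
        offer = upgrade[held]
        if (offer, held) in PREFERS:  # the agent accepts any strictly preferred trade
            held = offer
            paid += fee
    return held, paid

item, total = run_money_pump("A", fee=1.0, rounds=9)
print(item, total)
```

After every three trades the agent is back where it started but three fees poorer, so its losses grow without bound as the number of rounds increases. Acyclic preferences are exactly what block this, which is the sense in which such arguments imply some structure on behavior.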

In this essay, ricraz argues that we shouldn't expect a clean mathematical theory of rationality and intelligence to exist. I have debated em about this, and I continue to endorse more or less everything I said in that debate. Here I want to restate some of my (critical) position by building it from the ground up, instead of responding to ricraz point by point.

When should we expect a domain to be "clean" or "messy"? Let's look at everything we know about science. The "cleanest" domains are mathematics and fundamental physics. There, we have crisply defined...

1. Manioc poisoning in Africa vs. indigenous Amazonian cultures: a biological explanation?

Note that while Joseph Henrich, the author of TSOOS, correctly points out that cassava poisoning remains a serious public health concern in Africa, he doesn't supply any evidence that it wasn't also a public health issue in Amazonia. One author notes that "none of the disorders which have been associated with high cassava diets in Africa have been found in Tukanoans or other indigenous groups on cassava-based diets in Amazonia."

Is this because Tukanoans have superior p...

What does this post add to the conversation?

Two pictures of elephant seals.

How did this post affect you, your thinking, and your actions?

I am, if not deeply, then certainly affected by this post. I felt some kind of joy looking at these animals. It calmed my anger and made my thoughts somewhat happier. I started to believe the world can become a better place, and I would like to make it happen. This post made me a better person.

Does it make accurate claims? Does it carve reality at the joints? How do you know?

The title says elephant seals 2 and c...

TL;DR: I don’t think that this post stands up to close scrutiny, although there may be unknown knowns anyway. This is partly due to a couple of things in the original paper which I think are a bit misleading for the purposes of analysing the markets.

The unknown knowns claim is based on 3 patterns in the data:

“The mean prediction market belief of replication is 63.4%, the survey mean was 60.6% and the final result was 61.9%. That’s impressive all around.”

“Every study that would replicate traded at a higher probability of suc...
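The first pattern can be checked with a few lines of arithmetic on the figures quoted above. This sketch only computes the signed gap between each aggregate forecast and the realized replication rate; it uses no data beyond the three percentages in the quote.

```python
# Figures as quoted in the review, in percent.
realized = 61.9  # final replication result
forecasts = {"market": 63.4, "survey": 60.6}

# Signed error of each aggregate forecast, in percentage points.
errors = {name: value - realized for name, value in forecasts.items()}
for name, err in errors.items():
    print(f"{name}: {err:+.1f} percentage points")
```

Both aggregates land within about 1.5 percentage points of the outcome. Aggregate calibration of this kind is a separate question from whether individual studies were ranked correctly, which is what the second pattern concerns.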

[this is a review by the author]

I think what this post was doing was pretty important (colliding two quite different perspectives). In general there is a thing where there is a "clueless / naive" perspective and a "loser / sociopath / zero-sum / predatory" perspective that usually hides itself from the clueless perspective (with some assistance from the clueless perspective; consider the "see no evil, hear no evil, speak no evil" mindset, a strategy for staying naive). And there are lots of difficulties in trying to establish communication. And the dial...

This post provides a valuable reframing of a common question in futurology: "here's an effect I'm interested in -- what sorts of things could cause it?"

That style of reasoning ends by postulating causes.  But causes have a life of their own: they don't just cause the one effect you're interested in, through the one causal pathway you were thinking about.  They do all kinds of things.

In the case of AI and compute, it's common to ask

  • Here's a hypothetical AI technology.  How much compute would it require?

But once we have an answer to this quest...

This post is a review of Paul Christiano's argument that the Solomonoff prior is malign, along with a discussion of several counterarguments and countercounterarguments. As such, I think it is a valuable resource for researchers who want to learn about the problem. I will not attempt to distill the contents: the post is already a distillation, and does a fairly good job of it.

Instead, I will focus on what I believe is the post's main weakness/oversight. Specifically, the author seems to think the Solomonoff prior is, in some way, a distorted model of rea...

Looking back, I have quite different thoughts on this essay (and the comments) than I did when it was published. Or at least much more legible explanations; the seeds of these thoughts have been around for a while.

On The Essay

The basketballism analogy remains excellent. Yet searching the comments, I'm surprised that nobody ever mentioned the Fosbury Flop or the Three-Year Swim Club. In sports, from time to time somebody comes along with some crazy new technique and shatters all the records.

Comparing rationality practice to sports practice, rationality has ...

Here are my thoughts.

  1. Being honest is hard, and there are many difficult and surprising edge-cases, including things like context failures, negotiating with powerful institutions, politicised narratives, and compute limitations.
  2. On top of the rule of trying very hard to be honest, Eliezer's post offers an additional general rule for navigating the edge cases. The rule is that when you’re having a general conversation all about the sorts of situations in which you would and wouldn’t lie, you must be absolutely honest. You can explicitly not answer certain questions if...

(This is a review of the entire sequence.)

On the day when I first conceived of this sequence, my room was covered in giant graph paper sticky notes. The walls, the windows, the dressers, the floor. Sticky pads everywhere, and every one of them packed with word clouds and doodles in messy bold marker.

My world is rich. The grain of the wood on the desk in front of me, the slightly raw sensation inside my nostrils that brightens each time I inhale, the pressure of my search for words as I write that rises up through my chest and makes my brain feel like it’s ...

I wrote up a longer, conceptual review. But I also did a brief data collection, which I'll post here as others might like to build on or go through a similar exercise. 

In 2019 YC released a list of their top 100 portfolio companies ranked by valuation and exit size, where applicable.

So I went through the top 50 companies on this list, and gave each company a ranking ranging from -2 for "Very approval-extracting" to 2 for "Very production-oriented".  

To decide on that number, I asked myself questions like "Would growth of this company seem cancero...

I've been thinking about this post a lot since it first came out. Overall, I think its core thesis is wrong, and I've seen a lot of people make confident wrong inferences on the basis of it.

The core problem with the post was covered by Eliezer's post "GPTs are Predictors, not Imitators" (which was not written, I think, as a direct response, but which still seems to me to convey the core problem with this post):  

Imagine yourself in a box, trying to predict the next word - assign as much probability mass to the next token as possible - for all t...

I don't know if I'll ever get to a full editing of this. I'll jot notes here of how I would edit it as I reread this.

  • I'd ax the whole opening section.
    • That was me trying to (a) brute force motivation for the reader and (b) navigate some social tension I was feeling around what it means to be able to make a claim here. In particular I was annoyed with Oli and wanted to sidestep discussion of the lemons problem. My focus was actually on making something in culture salient by offering a fake framework. The thing speaks for itself once you l...

In this essay, Rohin sets out to debunk what ey perceive as a prevalent but erroneous idea in the AI alignment community, namely: "VNM and similar theorems imply goal-directed behavior". This is placed in the context of Rohin's thesis that solving AI alignment is best achieved by designing AI which is not goal-directed. The main argument is: "coherence arguments" imply expected utility maximization, but expected utility maximization does not imply goal-directed behavior. Instead, it is a vacuous constraint, since any agent policy can be regarded as maximiz...
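The vacuity claim above ("any agent policy can be regarded as maximiz[ing]" expected utility) rests on a simple construction, sketched here in its common textbook form; it is not necessarily the exact argument in Rohin's post, and all names are hypothetical. Given any policy, define a utility over (history, action) pairs that pays 1 exactly when the action is the one the policy would take.

```python
from typing import Callable, Sequence, Tuple

History = Tuple[str, ...]
Policy = Callable[[History], str]
Utility = Callable[[History, str], float]

def rationalizing_utility(policy: Policy) -> Utility:
    """Build a utility function under which `policy` maximizes (expected) utility.

    U(h, a) = 1 if a is the action the policy takes after history h, else 0.
    """
    def utility(history: History, action: str) -> float:
        return 1.0 if action == policy(history) else 0.0
    return utility

def best_action(utility: Utility, history: History, actions: Sequence[str]) -> str:
    """Return an action with maximal utility after `history`."""
    return max(actions, key=lambda a: utility(history, a))

# An arbitrary policy that looks anything but goal-directed:
# it picks based on whether the history has even or odd length.
def parity_policy(history: History) -> str:
    return "left" if len(history) % 2 == 0 else "right"

u = rationalizing_utility(parity_policy)
for h in [(), ("x",), ("x", "y")]:
    assert best_action(u, h, ["left", "right", "stay"]) == parity_policy(h)
```

Because this U depends only on the history and the chosen action, no model of the environment is needed, and the construction works for every policy. That is the sense in which "maximizes expected utility" alone rules nothing out.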

(I reviewed this in a top-level post: Review of 'But exactly how complex and fragile?'.)

I've thought about (concepts related to) the fragility of value quite a bit over the last year, and so I returned to Katja Grace's But exactly how complex and fragile? with renewed appreciation (I'd previously commented only a very brief microcosm of this review). I'm glad that Katja wrote this post and I'm glad that everyone commented. I often see private Google docs full of nuanced discussion which will never see the light of day, and that makes me sad, and I'm happy ...

In this essay Paul Christiano proposes a definition of "AI alignment" which is more narrow than other definitions that are often employed. Specifically, Paul suggests defining alignment in terms of the motivation of the agent (which should be, helping the user), rather than what the agent actually does. That is, as long as the agent "means well", it is aligned, even if errors in its assumptions about the user's preferences or about the world at large lead it to actions that are bad for the user.

Rohin Shah's comment on the essay (which I believe is endorsed...

As far as I can tell, this post successfully communicates a cluster of claims relating to "Looking, insight meditation, and enlightenment". It's written in a quite readable style that uses a minimum of metaphorical language or Buddhist jargon. That being said, likely due to its focus as exposition and not persuasion, it contains and relies on several claims that are not supported in the text, such as:

  • Many forms of meditation successfully train cognitive defusion.
  • Meditation trains the ability to have true insights into the mental causes of mental process...

Zack's series of posts in late 2020/early 2021 were really important to me. They were a sort of return to form for LessWrong, focusing on the valuable parts.

What are the parts of The Sequences which are still valuable? Mainly, the parts that build on top of Korzybski's General Semantics and focus hard core on map-territory distinctions. This part is timeless and a large part of the value that you could get by (re)reading The Sequences today. Yudkowsky's credulity about results from the social sciences and his mind projection fallacying his own mental quirk...

I do not like this post. I think it gets most of its rhetorical oomph from speaking in a very moralizing tone, with effectively no data, and presenting everything in the worst light possible; I also think many of its claims are flat-out false. Let's go through each point in order.

1. You can excuse anything by appealing to The Incentives

No, seriously—anything. Once you start crying that The System is Broken in order to excuse your actions (or inactions), you can absolve yourself of responsibility for all kinds of behaviors that, on paper, should raise red f...

I strongly oppose collation of this post, despite thinking that it is an extremely well-written summary of an interesting argument on an interesting topic. I do so because I believe it represents a substantial epistemic hazard because of the way it was written and the source material it comes from. I think this is particularly harmful because both justifications for nominations amount to "this post was key in allowing percolation of a new thesis unaligned with the goals of the community into community knowledge," which is a justificatio...

I have several problems with including this in the 2018 review. The first is that it's community-navel-gaze-y - if it's not the kind of thing we allow on the frontpage because of concerns about newcomers seeing a bunch of in-group discussion, then it seems like we definitely wouldn't want it to be in a semi-public-facing book, either. 

The second is that I've found most discussion of the concept of 'status' in rationalist circles to be pretty uniformly unproductive, and maybe even counterproductive. People generally only discuss 'status' when they...

A brief authorial take - I think this post has aged well, although as with Caring Less (https://www.lesswrong.com/posts/dPLSxceMtnQN2mCxL/caring-less), this was an abstract piece and I didn't make any particular claims here.

I'm so glad that A) this was popular, and B) I wasn't making up a new word for a concept that most people already know by a different name, which I think will send you to at least the first layer of Discourse Hell on its own.

I've met at least one person in the community who said they knew and thought about this post a lot, well before they'd...

A year later, I continue to agree with this post; I still think its primary argument is sound and important. I'm somewhat sad that I still think it is important; I thought this was an obvious-once-pointed-out point, but I do not think the community actually believes it yet.

I particularly agree with this sentence of Daniel's review:

I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term.

"Constraining the types of valid arguments" is exactly the...

Comments on the outcomes of the post:

  • I'm reasonably happy with how this post turned out. I think it probably bought the Anthropic/superposition mechanistic interpretability agenda somewhere between 0.1 and 4 counterfactual months of progress, which feels like a win.
  • I think sparse autoencoders are likely to be a pretty central method in mechanistic interpretability work for the foreseeable future (which tbf is not very foreseeable).
  • Two parallel works used the method identified in the post (sparse autoencoders - SAEs) or a slight modification of it:
    • Cunningham et al.
...
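For readers unfamiliar with the method named in the list above: a sparse autoencoder maps an activation vector to a wider, mostly-zero feature vector through a ReLU encoder, reconstructs the activation with a linear decoder, and is trained on reconstruction error plus an L1 sparsity penalty on the features. The sketch below shows only the forward pass and loss in plain Python; the sizes, weights, and penalty coefficient are hypothetical, and details such as subtracting the decoder bias before encoding vary between implementations.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector) -> Tuple[Vector, Vector]:
    """Encode x into sparse features h, then decode h back to a reconstruction."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    h = [max(0.0, v) for v in pre]  # ReLU: features are nonnegative, often exactly zero
    x_hat = [r + b for r, b in zip(matvec(w_dec, h), b_dec)]
    return h, x_hat

def sae_loss(x: Vector, h: Vector, x_hat: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 penalty on the feature activations."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in h)
    return recon + l1_coeff * sparsity

# Toy example: 2-dimensional activations, 3 features (hypothetical weights).
x = [2.0, 0.0]
h, x_hat = sae_forward(x, [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0],
                       [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
print(h, x_hat, sae_loss(x, h, x_hat, l1_coeff=0.1))
```

In the toy example only one of the three features fires, and it reconstructs the input exactly, so the whole loss comes from the sparsity term. Training would adjust the weights to trade reconstruction error against how many features are active.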

I read this post for the first time in 2022, and I came back to it at least twice. 

What I found helpful

  • The proposed solution: I actually do come back to the “honor” frame sometimes. I have little Rob Bensinger and Anna Salamon shoulder models that remind me to act with integrity and honor. And these shoulder models are especially helpful when I’m noticing (unhelpful) concerns about social status.
  • A crisp and community-endorsed statement of the problem: It was nice to be like “oh yeah, this thing I’m experiencing is that thing that Anna Salamon calls PR...

I think I agree with the thrust of this, but I think the comment section raises caveats that seem important. Scott's acknowledged that there's danger in this, and I hope an updated version would put that in the post.

But also...

Steven Pinker is a black box who occasionally spits out ideas, opinions, and arguments for you to evaluate. If some of them are arguments you wouldn’t have come up with on your own, then he’s doing you a service. If 50% of them are false, then the best-case scenario is that they’re moronically, obviously false, so that you can reject...

In “Why Read The Classics?”, Italo Calvino proposes many different definitions of a classic work of literature, including this one:

A classic is a book which has never exhausted all it has to say to its readers.

For me, this captures what makes this sequence and corresponding paper a classic in the AI Alignment literature: it keeps on giving, readthrough after readthrough. That doesn’t mean I agree with everything in it, or that I don’t think it could have been improved in terms of structure. But when pushed to reread it, I found again and again that I had m...

Connection to Alignment

One of the main arguments in AI risk goes something like:

  • AI is likely to be a utility maximizer (or goal-directed in some other sense)
  • Goodhart, instrumental convergence, etc make powerful goal-directed agents dangerous by default

One common answer to this is "ok, how about we make AI which isn't goal-directed"?

Unconscious Economics says: selection effects will often create the same effect as goal-directedness, even if we're trying to build a non-goal-directed AI.

Discussions around CAIS are one obvious application. Paul's "you get what...

There are two separate lenses through which I view the idea of competitive markets as backpropagation.

First, it's an example of the real meat of economics. Many people - including economists - think of economics as studying human markets and exchange. But the theory of economics is, to a large extent, a general theory of distributed optimization. When we understand on a gut level that "price = derivative", and markets are just implementing backprop, it makes a lot more sense that things like markets would show up in other fields - e.g. AI or b...
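One concrete reading of "price = derivative" is the producer's first-order condition: a price-taking producer with cost c(q) maximizes profit p·q − c(q), and at the optimum its marginal cost c'(q) equals the price p. The sketch below assumes a quadratic cost c(q) = q² (a hypothetical choice, not from the review) and finds the optimum by gradient ascent, so the answer should be q* = p/2.

```python
def optimal_quantity(price: float, lr: float = 0.1, steps: int = 500) -> float:
    """Gradient ascent on profit(q) = price * q - q**2 (assumed quadratic cost)."""
    q = 0.0
    for _ in range(steps):
        marginal_cost = 2.0 * q        # c'(q) for c(q) = q**2
        grad = price - marginal_cost   # d(profit)/dq: price minus marginal cost
        q += lr * grad
    return q

p = 3.0
q_star = optimal_quantity(p)
# At the optimum, marginal cost 2 * q_star should equal the price p.
print(round(q_star, 6), round(2.0 * q_star, 6))
```

This illustrates only a single agent's first-order condition. The review's stronger claim, that competitive markets as a whole implement backpropagation, involves chaining such conditions across many producers and consumers and is not shown here.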

The referenced study on group selection on insects is "Group selection among laboratory populations of Tribolium," from 1976. Studies on Slack claims that "They hoped the insects would evolve to naturally limit their family size in order to keep their subpopulation alive. Instead, the insects became cannibals: they ate other insects’ children so they could have more of their own without the total population going up." 

This makes it sound like cannibalism was the only population-limiting behavior the beetles evolved. According to the original study, ho...

ETA 1/12: This review is critical and at times harsh, not because I want to harshly criticize the post or the author, but because I did not consider harshness of criticism when writing. I still think the post is positive-net-value, and might even vote it up in the review. I especially want to emphasize that I do not think it is in any way useful to blame or punish the author for the things I complain about below; this is intended as a "pointing out a problematic habit which a lot of people have and society often encourages" criticism, not a "bad thing must...

This post points to a rather large update, which I think has not yet propagated through the collective mind of the alignment community. Gains from algorithmic improvement have been roughly comparable to gains from compute and data, and much larger on harder tasks (which are what matter for takeoff).

Yet there's still an implicit assumption behind lots of alignment discussion that progress is mainly driven by compute. This is most obvious in discussions of a training pause: such proposals are almost always about stopping very large runs only. That would stop...

Selection vs Control is a distinction I always point to when discussing optimization. Yet these are not the two takes on optimization I generally use. My favored ones are internal optimization (which is basically search/selection), and external optimization (optimizing systems from Alex Flint’s The ground of optimization). So I do without control, or at least without Abram’s exact definition of control.

Why? Simply because the internal structure vs behavior distinction mentioned in this post seems more important than the actual definitions (which seem constra...

I think this post, as promised in the epistemic status, errs on the side of simplistic poetry. I see its core contribution as saying that the more people you want to communicate to, the less you can communicate to them, because the marginal people aren't willing to put in work to understand you, and because it's harder to talk to marginal people who are far away and can't ask clarifying questions or see your facial expressions or hear your tone of voice. The numbers attached (e.g. 'five' and 'thousands of people') seem to not be super precise.

That being sa...

I've been pleasantly surprised by how much this resource has caught on in terms of people using it and referring to it (definitely more than I expected when I made it). There were 30 examples on the list when it was posted in April 2018, and 20 new examples have been contributed through the form since then. I think the list has several properties that contributed to wide adoption: it's fun, standardized, up-to-date, comprehensive, and collaborative.

Some of the appeal is that it's fun to read about AI cheating at tasks in unexpected ways (I...

This essay had a significant influence on my growth in the past two years. I shifted from perceiving discomfort as something I am subject to, to considering my relationship with discomfort as an object that can be managed. There are many other writings and experiences that contributed to this growth, but this was the first piece I encountered that talked about managing our relationship with hazards as a thing we can manipulate and improve at. It made me wonder why all human activity may be considered running in the meadow and why contracting may be bad, it...

I think Simulators mostly says obvious and uncontroversial things, but added to the conversation by pointing them out for those who haven't noticed and introducing words for those who struggle to articulate. IMO people that perceive it as making controversial claims have mostly misunderstood its object-level content, although sometimes they may have correctly hallucinated things that I believe or seriously entertain. Others have complained that it only says obvious things, which I agree with in a way, but seeing as many upvoted it or said they found it ill...

It's been over a year since the original post and 7 months since the openphil revision.

A top level summary:

  1. My estimates for timelines are pretty much the same as they were.
  2. My P(doom) has gone down overall (to about 30%), and the nature of the doom has shifted (misuse, broadly construed, dominates).

And, while I don't think this is the most surprising outcome nor the most critical detail, it's probably worth pointing out some context. From NVIDIA:

In two quarters, from Q1 FY24 to Q3 FY24, datacenter revenues went from $4.28B to $14.51B.

From the post:

In 3 year...
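The NVIDIA figure quoted above implies a growth rate worth spelling out. The sketch below uses only the two revenue numbers in the quote and assumes, purely for illustration, that growth compounded at a constant rate across the two quarters; the review does not say how it was actually distributed.

```python
start_rev = 4.28   # datacenter revenue, Q1 FY24, $B (as quoted)
end_rev = 14.51    # datacenter revenue, Q3 FY24, $B (as quoted)
quarters = 2

total_ratio = end_rev / start_rev
# Constant-rate (geometric) growth assumption across the two quarters.
per_quarter = total_ratio ** (1 / quarters)
print(f"total: {total_ratio:.2f}x, per quarter: {per_quarter:.3f}x")
```

That is roughly a 3.4x increase overall, or about 84% growth per quarter under the even-compounding assumption.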

In this post, the author proposes a semiformal definition of the concept of "optimization". This is potentially valuable since "optimization" is a word often used in discussions about AI risk, and much confusion can follow from sloppy use of the term or from different people understanding it differently. While the definition given here is a useful perspective, I have some reservations about the claims made about its relevance and applications.

The key paragraph, which summarizes the definition itself, is the following:

An optimizing system is a system that...

I assign a decent probability to this sequence (of which I think this is the best post) being the most important contribution of 2022. I am however really not confident of that, and I do feel a bit stuck on how to figure out where to apply and how to confirm the validity of ideas in this sequence. 

Despite the abstract nature, I think if there are indeed arguments to do something closer to Kelly betting with one's resources, even in the absence of logarithmic returns to investment, then that would definitely have huge effects on how I think about my ow...
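For reference, the textbook Kelly criterion mentioned above covers a binary bet with win probability p and net odds b (win b per unit staked): the fraction f* = p − (1 − p)/b of current wealth maximizes the expected logarithm of wealth after the bet. This sketch shows only that standard case, not the sequence's argument for Kelly-like behavior without logarithmic returns; it checks the closed form against a brute-force grid search.

```python
import math

def kelly_fraction(p: float, b: float) -> float:
    """Kelly fraction for a bet paying b-to-1 with win probability p (clamped at 0)."""
    return max(0.0, p - (1.0 - p) / b)

def expected_log_growth(f: float, p: float, b: float) -> float:
    """Expected log of the wealth multiplier after staking fraction f once."""
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)

p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)
# Brute-force check: the best fraction on a fine grid should be close to f_star.
grid = [i / 1000 for i in range(1000)]  # f in [0, 0.999], avoiding log(0)
f_grid = max(grid, key=lambda f: expected_log_growth(f, p, b))
print(f_star, f_grid)
```

With p = 0.6 at even odds both approaches give a stake of about 20% of wealth; with no edge (for example p = 0.4 at even odds) the clamped formula says not to bet at all.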

I really like this post. I think it points out an important problem with intuitive credit-assignment algorithms which people often use. The incentive toward inaction is a real problem which is often encountered in practice. While I was somewhat aware of the problem before, this post explains it well.

I also think this post is wrong, in a significant way: asymmetric justice is not always a problem and is sometimes exactly what you want. In particular, it's how you want a justice system (in the sense of police, judges, etc.) to work.

The book Law's Order explai...

I <3 Specificity

For years, I've been aware of myself "activating my specificity powers" multiple times per day, but it's kind of a lonely power to have. "I'm going to swivel my brain around and ride it in the general→specific direction. Care to join me?" is not something you can say in most group settings. It's hard to explain to people that I'm not just asking them to be specific right now, in this one context. I wish I could make them see that specificity is just this massively under-appreciated cross-domain power. That's why I wanted this sequence to...

It was interesting to re-read this article 2 years later. It reminds me that I am generally working with a unique subset of the population, which is not fully representative of human psychology. That being said, I believe this article is misleading in important ways, which should be clarified. The article focused too much on class, and it is hard to see it as anything but classist. While I wrote an addendum at the end, this really should have been incorporated into the entire article and not tacked on, as the conclusions one would re...

Figuring out the edge cases about honesty and truth seems important to me, both as a matter of personal aesthetics and as a matter for LessWrong to pay attention to. One of the things people have used to describe what makes LessWrong special is that it's a community focused on truth-seeking, which makes "what is truth anyway and how do we talk about it" a worthwhile topic of conversation. This article talks about it, in a way that's clear. (The positive-example/negative-example pattern is a good approach to a topic that can really suffer from illusion of tr...

I really liked this post in that it seems to me to have tried quite seriously to engage with a bunch of other people's research, in a way that I feel like is quite rare in the field, and something I would like to see more of. 

One of the key challenges I see for the rationality/AI-Alignment/EA community is the difficulty of somehow building institutions that are not premised on the quality or tractability of their own work. My current best guess is that the field of AI Alignment has made very little progress in the last few years, which is really not w... (read more)

In this post, I appreciated two ideas in particular:

  1. Loss as chisel
  2. Shard Theory

"Loss as chisel" is a reminder of how loss truly does its job, and its implications on what AI systems may actually end up learning. I can't really argue with it and it doesn't sound new to my ear, but it just seems important to keep in mind. Alone, it justifies trying to break out of the inner/outer alignment frame. When I start reasoning in its terms, I more easily appreciate how successful alignment could realistically involve AIs that are neither outer nor inner aligned. In p... (read more)

I continue to believe that the Grabby Aliens model rests on an extremely sketchy foundation, namely the anthropic assumption “humanity is randomly-selected out of all intelligent civilizations in the past present and future”.

For one thing, given that the Grabby Aliens model does not weight civilizations by their populations, it follows that, if the Grabby Aliens model is right, then all the “popular” anthropic priors like SIA and SSA and UDASSA and so on are all wrong, IIUC.

For another (related) thing, in order to believe the Grabby Aliens model, we need t... (read more)

In this post, the author presents a case for replacing expected utility theory with some other structure which has no explicit utility function, but only quantities that correspond to conditional expectations of utility.

To provide motivation, the author starts from what he calls the "reductive utility view", which is the thesis he sets out to overthrow. He then identifies two problems with the view.

The first problem is about the ontology in which preferences are defined. In the reductive utility view, the domain of the utility function is the set of possib... (read more)

The work linked in this post was IMO the most important work done on understanding neural networks at the time it came out, and it has also significantly changed the way I think about optimization more generally.

That said, there's a lot of "noise" in the linked papers; it takes some digging to see the key ideas and the data backing them up, and there's a lot of space spent on things which IMO just aren't that interesting at all. So, I'll summarize the things which I consider central.

When optimizing an overparameterized system, there are many many different... (read more)

I think this post should be included in the best posts of 2018 collection. It does an excellent job of balancing several desirable qualities: it is very well written, being both clear and entertaining; it is informative and thorough; it is in the style of argument which is preferred on LessWrong, by which I mean it makes use of both theory and intuition in the explanation.

This post adds to the greater conversation by displaying rationality of the kind we are pursuing directed at a big societal problem. A specific example of what I mean that distinguishes this... (read more)

I thought I'd add a few quick notes as the author.

As I reread this, a few things jump out for me:

  • I enjoy its writing style. Its clarity is probably part of why it was nominated.
  • I'd now say this post is making a couple of distinct claims:
    • External forces can shape what we want to do. (I.e., there are lotuses.)
    • It's possible to notice this in real time. (I.e., you can notice the taste of lotuses.)
    • It's good to do so. Otherwise we find our wanting aligned with others' goals regardless of how they relate to our own.
    • If you notice this, you
... (read more)

This post is the best overview of the field so far that I know of. I appreciate how it frames things in terms of outer/inner alignment and training/performance competitiveness--it's very useful to have a framework with which to evaluate proposals and this is a pretty good framework I think.

Since it was written, this post has been my go-to reference both for getting other people up to speed on what the current AI alignment strategies look like (even though this post isn't exhaustive). Also, I've referred back to it myself several times. I learned a lot from... (read more)

[Disclaimer: I'm reading this post for the first time now, as of 1/11/2020. I also already have a broad understanding of the importance of AI safety. While I am skeptical about MIRI's approach to things, I am also a fan of MIRI. Where this puts me relative to the target demographic of this post, I cannot say.]

Overall Summary

I think this post is pretty good. It's a solid and well-written introduction to some of the intuitions behind AI alignment and the fundamental research that MIRI does. At the same time, the use of analogy made the post m... (read more)

Author here.

In hindsight, I still feel that the phenomenon is an interesting and potentially important topic to look into. I am not aware of any attempt to replicate it or dive deeper, though.

As for my attempt to explain the psychology underlying the phenomenon, I am not entirely happy with it. It's based only on introspection and lacks sound game-theoretic backing.

By the way, there's one interesting explanation I've read somewhere in the meantime (unfortunately, I don't remember the source):

Cooperation may incur different costs on different participants. If y

... (read more)

As with the CCS post, I'm reviewing both the paper and the post, though the majority of the review is on the paper. Writing this quickly (total time on review: ~1.5h), but I expect to be willing to defend the points being made --

There are a lot of reasons I like the work. It's an example of:

  1. Actually poking inside a real model. A lot of the mech interp work in early-mid 2022 was focused on getting a deep understanding of toy models trained on algorithmic tasks (at least in this community).[1] There was some effort at Redwood to do neuron-by-neuron replac
... (read more)

I'm glad I ran this survey, and I expect the overall agreement distribution probably still holds for the current GDM alignment team (or may have shifted somewhat in the direction of disagreement), though I haven't rerun the survey so I don't really know. Looking back at the "possible implications for our work" section, we are working on basically all of these things. 

Thoughts on some of the cruxes in the post based on last year's developments:

  • Is global cooperation sufficiently difficult that AGI would need to deploy new powerful technology to make it
... (read more)

A Cached Belief

I find this Wired article an important exploration of an enormous wrong cached belief in the medical establishment: namely that based on its size, Covid would be transmitted exclusively via droplets (which quickly fall to the ground), rather than aerosols (which hang in the air). This justified a bunch of extremely costly Covid policy decisions and recommendations: like the endless exhortations to disinfect everything and to wash hands all the time. Or the misguided attempt to protect people from Covid by closing public parks and playgrounds... (read more)

I still think this is great. Some minor updates, and an important note:

Minor updates: I'm a bit less concerned about AI-powered propaganda/persuasion than I was at the time, not sure why. Maybe I'm just in a more optimistic mood. See this critique for discussion. It's too early to tell whether reality is diverging from expectation on this front. I had been feeling mildly bad about my chatbot-centered narrative, as of a month ago, but given how ChatGPT was received I think things are basically on trend.
Diplomacy happened faster than I expected, though in a ... (read more)

This post states the problem of gradient hacking. It is valuable in that this problem is far from obvious, and if plausible, very dangerous. On the other hand, the presentation doesn’t go into enough detail, and so leaves gradient hacking open to attacks and confusion. Thus instead of just reviewing this post, I would like to clarify certain points, while interweaving my criticisms of the way gradient hacking was initially stated, and explaining why I consider this problem so important.

(Caveat: I’m not pretending that any of my objections are unknown to E... (read more)

“Phase change in 1960’s” - first claim is California’s prison pop went from 5k to 25k. According to Wikipedia this does seem to happen… but then it’s immediately followed by a drop in prison population between 1970 and 1980. It also looks like the growth is pretty stable starting in the 1940s.

According to this, prison pop in California was a bit higher than 5k historically (6k-8k), and started growing in 1945 by about 1k/year fairly consistently until 1963. It was then fairly steady, even dropping a bit, until 1982 when it REALLY exploded, more than doubling... (read more)

The only way to get information from a query is to be willing to (actually) accept different answers. Otherwise, conservation of expected evidence kicks in. This is the best encapsulation of this point, by far, that I know about, in terms of helping me/others quickly/deeply grok it. Seems essential.
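A minimal sketch of the arithmetic behind this, assuming a binary hypothesis H and a query with a finite set of possible answers (the function name and all numbers are illustrative, not from the post): whatever the likelihoods, the probability-weighted average of the possible posteriors equals the prior, so a query that can only ever come back one way cannot move your belief at all.

```python
def update_on_query(prior, likelihoods):
    """Bayesian update of a binary hypothesis H on a query's answer.

    prior: P(H).
    likelihoods: maps each possible answer to (P(answer | H), P(answer | not H)).
    Returns (posterior per answer, probability-weighted average of the posteriors).
    """
    posteriors = {}
    expected = 0.0
    for answer, (p_if_h, p_if_not_h) in likelihoods.items():
        p_answer = p_if_h * prior + p_if_not_h * (1 - prior)
        if p_answer == 0:
            continue  # this answer can never come back, so it carries no weight
        posteriors[answer] = p_if_h * prior / p_answer
        expected += p_answer * posteriors[answer]
    return posteriors, expected


# A query whose answers actually depend on H: each answer moves the belief,
# but the weighted average of the posteriors is still the prior.
informative = {"yes": (0.8, 0.4), "no": (0.2, 0.6)}
post, avg = update_on_query(0.3, informative)
print(post, avg)

# A query that can only ever return "yes": the posterior equals the prior.
rigged = {"yes": (1.0, 1.0)}
print(update_on_query(0.3, rigged))
```

Being "willing to accept different answers" corresponds to the informative case: the update only has room to happen when the answers are distributed differently under H and not-H.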

Reading this again, the thing I notice most is that I generally think of this point as being mostly about situations like the third one, but most of the post's examples are instead about internal epistemic situations, where someone can't confidently conclude or ... (read more)

What's the type signature of goals?

The type signature of goals is the overarching topic to which this post contributes. It can manifest in a lot of different ways in specific applications:

  • What's the type signature of human values?
  • What structure types should systems biologists or microscope AI researchers look for in supposedly-goal-oriented biological or ML systems?
  • Will AI be "goal-oriented", and what would be the type signature of its "goal"?

If we want to "align AI with human values", build ML interpretability tools, etc, then that's going to be pretty to... (read more)

I just re-read this sequence. Babble has definitely made its way into my core vocabulary. I think of "improving both the Babble and Prune of LessWrong" as being central to my current goals, and I think this post was counterfactually relevant for that. Originally I had planned to vote weakly in favor of this post, but am currently positioning it more at the upper-mid-range of my votes.

I think it's somewhat unfortunate that the Review focused only on posts, as opposed to sequences as a whole. I just re-read this sequence, and I think the posts More Babble, P

... (read more)

This is my post.

How my thinking has changed

I've spent much of the last year thinking about the pedagogical mistakes I made here, and am writing the Reframing Impact sequence to fix them. While this post recorded my 2018-thinking on impact measurement, I don't think it communicated the key insights well. Of course, I'm glad it seems to have nonetheless proven useful and exciting to some people!

If I were to update this post, it would probably turn into a rehash of Reframing Impact. Instead, I'll just briefly state the argument as I would present it today.

... (read more)

IMO, this post makes several locally correct points, but overall fails to defeat the argument that misaligned AIs are somewhat likely to spend (at least) a tiny fraction of resources (e.g., between 1/million and 1/trillion) to satisfy the preferences of currently existing humans.

AFAICT, this is the main argument it was trying to argue against, though it shifts to arguing about half of the universe (an obviously vastly bigger share) halfway through the piece.[1]

When it returns to arguing about the actual main question (a tiny fraction of resources) at the e... (read more)

I replicated this review, which you can check out in this colab notebook (I get much higher performance running it locally on my 20-core CPU).

There is only one cluster of discrepancies I found between my analysis and Vaniver's: in my analysis, mating is even more assortative than in the original work:

  • Pearson R of the sum of partner stats is 0.973 instead of the previous 0.857
  • 99.6% of partners have an absolute sum of stats difference < 6, instead of the previous 83.3%.
  • I wasn't completely sure if Vaniver's "net satisfaction" was the difference of self-sat
... (read more)
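For anyone checking the replication, here is a minimal sketch of how the two summary statistics above could be computed. The input format (a list of partner pairs, each partner given as a list of numeric stats), the function name, and the toy pairs are assumptions for illustration; the colab notebook's actual code may differ, and "< 6" is read here as a strict threshold on the absolute difference between the two partners' stat sums.

```python
import math


def assortativity_summary(pairs, threshold=6):
    """Return (Pearson r of partners' stat sums, share of pairs whose sums differ by < threshold).

    pairs: list of (stats_a, stats_b) tuples, where each element is a list of numbers.
    """
    sums_a = [sum(a) for a, _ in pairs]
    sums_b = [sum(b) for _, b in pairs]
    n = len(pairs)
    mean_a = sum(sums_a) / n
    mean_b = sum(sums_b) / n
    # Pearson correlation: covariance divided by the product of standard deviations
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(sums_a, sums_b))
    var_a = sum((x - mean_a) ** 2 for x in sums_a)
    var_b = sum((y - mean_b) ** 2 for y in sums_b)
    r = cov / math.sqrt(var_a * var_b)
    close = sum(1 for x, y in zip(sums_a, sums_b) if abs(x - y) < threshold)
    return r, close / n


# Hypothetical toy pairs, not data from the review
pairs = [([1, 2], [1, 2]), ([3, 4], [3, 5]), ([10, 10], [9, 10])]
r, share = assortativity_summary(pairs)
print(r, share)
```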
  • Oh man, what an interesting time to be writing this review!
  • I've now written second drafts of an entire sequence that more or less begins with an abridged (or re-written?) version of "Catching the Spark". The provisional title of the sequence is "Nuts and Bolts Of Naturalism".  (I'm still at least a month and probably more from beginning to publish the sequence, though.) This is the post in the sequence that's given me the most trouble; I've spent a lot of the past week trying to figure out where I stand with it.
  • I think if I just had to answer "yes" or
... (read more)

A short note to start the review that the author isn’t happy with how it is communicated. I agree it could be clearer and this is the reason I’m scoring this 4 instead of 9. The actual content seems very useful to me.

AllAmericanBreakfast has already reviewed this from a theoretical point of view but I wanted to look at it from a practical standpoint.

***

To test whether the conclusions of this post were true in practice I decided to take 5 examples from the Wikipedia page on the Prisoner’s dilemma and see if they were better modeled by Stag Hunt or Schelling... (read more)
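One concrete way to tell these models apart is to check which outcomes are pure-strategy Nash equilibria. A minimal sketch, using hypothetical textbook-style payoffs rather than numbers from any of the Wikipedia examples: in a Prisoner's Dilemma, mutual defection is the only equilibrium, while a Stag Hunt also has mutual cooperation as an equilibrium, so the practical problem shifts from escaping a dominant strategy to coordinating on the better equilibrium.

```python
from itertools import product

COOPERATE, DEFECT = 0, 1  # in the Stag Hunt, "cooperate" = hunt stag, "defect" = hunt hare


def pure_nash_equilibria(payoffs):
    """Return the pure-strategy Nash equilibria of a 2x2 game.

    payoffs maps (row_move, col_move) -> (row_payoff, col_payoff).
    An outcome is an equilibrium if neither player gains by switching alone.
    """
    moves = (COOPERATE, DEFECT)
    equilibria = []
    for row, col in product(moves, repeat=2):
        row_best = all(payoffs[(row, col)][0] >= payoffs[(alt, col)][0] for alt in moves)
        col_best = all(payoffs[(row, col)][1] >= payoffs[(row, alt)][1] for alt in moves)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria


C, D = COOPERATE, DEFECT
prisoners_dilemma = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
stag_hunt = {(C, C): (4, 4), (C, D): (0, 3), (D, C): (3, 0), (D, D): (3, 3)}

print(pure_nash_equilibria(prisoners_dilemma))
print(pure_nash_equilibria(stag_hunt))
```

Under this framing, a real-world case looks more like a Stag Hunt when cooperating is your best response once you know the other side is cooperating.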

This post is making a valid point (the time to intervene to prevent an outcome that would otherwise occur, is going to be before the outcome actually occurs), but I'm annoyed with the mind projection fallacy by which this post seems to treat "point of no return" as a feature of the territory, rather than your planning algorithm's map.

(And, incidentally, I wish this dumb robot cult still had a culture that cared about appreciating cognitive algorithms as the common interest of many causes, such that people would find it more natural to write a post about "p... (read more)

I'm reaffirming my relatively extensive review of this post.

The simbox idea seems like a valuable guide for safely testing AIs, even if the rest of the post turns out to be wrong.

Here's my too-terse summary of the post's most important (and more controversial) proposal: have the AI grow up in an artificial society, learning self-empowerment and learning to model other agents. Use something like retargeting the search to convert the AI's goals from self-empowerment to empowering other agents.

I've used the term "safetwashing" at least once every week or two in the last year. I don't know whether I've picked it up from this post, but it still seems good to have an explanation of a term that is this useful and this common that people are exposed to.

[anonymous] (review for the 2019 Review)

The parent-child model is my cornerstone of healthy emotional processing. I'd like to add that a child often doesn't need much more than your attention. This is one analogy for why meditation works: you just sit down for a while and you just listen.

The monks in my local monastery often quip about "sitting in a cave for 30 years", which is their suggested treatment for someone who is particularly deluded. This implies a model of emotional processing which I cannot stress enough: you can only get in the way. Take all distractions away from someone and t... (read more)

Author here. I still endorse the post and have continued to find it pretty central to how I think about myself and nearby ecosystems.

I just submitted some major edits to the post. Changes include:

1. Name change ("Robust, Coherent Agent")

After much hemming and hawing and arguing, I changed the name from "Being a Robust Agent" to "Being a Robust, Coherent Agent." I'm not sure if this was the right call.

It was hard to pin down exactly one "quality" that the post was aiming at. Coherence was the single word that pointed towards "what sort of agent to become." ... (read more)

I was surprised that this post ever seemed surprising, which either means it wasn't revolutionary, or was *very* revolutionary. Since it has 229 karma, it seems like it was the latter. I feel like the same post today would have been written with more explicit references to reinforcement learning, reward, addiction, and dopamine. The overall thesis seems to be that you can get a felt sense for these things, which would be surprising - isn't it the same kind of reward-seeking all the way down, including on things that are genuinely valuable? Not sure how to model this.

Author here.

I still believe this article is an important addition to the discussion of inadequate equilibria. While Scott Alexander's Moloch post and Eliezer Yudkowsky's book are great for introduction and discussion of the topic, both of them fail, in my opinion, to convey the sheer complexity of the problem as it occurs in the real world. That, I think, results in readers thinking about the issue in simple Malthusian or naive game-theoretic terms and eventually despairing about the inescapability of suboptimal Nash equilibria.

What I try to present is a world

... (read more)

I think about this post a lot, and sometimes in conjunction with my own post on common knowledge.

As well as it being a referent for when I think about fairness, it also ties in with how I think about LessWrong, Arbital and communal online endeavours for truth. The key line is:

For civilization to hold together, we need to make coordinated steps away from Nash equilibria in lockstep.

You can think of Wikipedia as being a set of communally editable web pages where the content of the page is constrained to be that which we can easily gain common knowledge of its

... (read more)

The core of this post seems to be this:

  • Decoupling norms: It is considered eminently reasonable to require your claims to be considered in isolation - free of any context or potential implications. An insistence on raising these issues despite a decoupling request are often seen as sloppy thinking or attempts to deflect.
  • Contextualising norms: It is considered eminently reasonable to expect certain contextual factors or implications to be addressed. Not addressing these factors is often seen as sloppy or even an intentional evasion.

As Zack_M_Davis points out ... (read more)

Many of the best LessWrong posts give a word and a clear mental handle for something I kinda sorta knew loosely in my head. With the concept firmly in mind, I can use it and build on it deliberately. Sazen is an excellent example of the form.

Sazens are common in many fields I have some expertise in. "Control the centre of the board" in chess. "Footwork is foundational" in martial arts. "Shots on goal" in sports. "Conservation of expected evidence" in rationality. "Premature optimization is the root of all evil" in programming. These sentences are a useful remi... (read more)

Uncharitable Summary

Most likely there’s something in the intuitions which got lost when transmitted to me via reading this text, but the mathematics itself seems pretty tautological to me (nevertheless I found it interesting since tautologies can have interesting structure! The proof itself was not trivial to me!). 

Here is my uncharitable summary:

Assume you have a Markov chain M_0 → M_1 → M_2 → … → M_n → … of variables in the universe. Assume you know M_n and want to predict M_0. The Telephone theorem says two things:

  • You don’t need to keep a
... (read more)
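The chain setup above can be made concrete with the simplest possible toy model. This is a minimal sketch under my own assumptions, not the post's construction: M_0 is a uniformly random bit, and each step passes the bit through a binary symmetric channel that flips it with probability p. The information M_n carries about M_0 (in bits) can only shrink as n grows, and it survives indefinitely only when every step is deterministic (p = 0 or p = 1).

```python
import math


def binary_entropy(q):
    """Entropy in bits of a coin that lands heads with probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def info_about_start(n, flip_prob):
    """I(M_0; M_n) in bits for a uniform bit M_0 sent through n noisy steps."""
    # n chained flips, each with probability p, act like a single flip with
    # probability q_n = (1 - (1 - 2p)^n) / 2.
    q = (1 - (1 - 2 * flip_prob) ** n) / 2
    return 1 - binary_entropy(q)


for p in (0.0, 0.1, 1.0):
    print(p, [round(info_about_start(n, p), 4) for n in (1, 5, 20, 100)])
```

With p = 0.1 the retained information decays toward zero, while p = 0 and p = 1 keep a full bit forever: in this toy chain, information is either perfectly conserved or eventually lost.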

This post is an excellent distillation of a cluster of past work on maligness of Solomonoff Induction, which has become a foundational argument/model for inner agency and malign models more generally.

I've long thought that the maligness argument overlooks some major counterarguments, but I never got around to writing them up. Now that this post is up for the 2020 review, it seems like a good time to walk through them.

In Solomonoff Model, Sufficiently Large Data Rules Out Malignness

There is a major outside-view reason to expect that the Solomonoff-is-malign ar... (read more)

This essay provides some fascinating case studies and insights about coordination problems and their solutions, from a book by Elinor Ostrom. Coordination problems are a major theme in LessWrongian thinking (for good reasons) and the essay is a valuable addition to the discussion. I especially liked the 8 features of sustainable governance systems (although I wish we got a little more explanation for "nested enterprises").

However, I think that the dichotomy between "absolutism (bad)" and "organically grown institutions (good)" that the essay creates needs

... (read more)

I read this a year or two ago, tucked it in the back of my mind, and continued with life.

When I reread it today, I suddenly realized oh duh, I’ve been banging my head against this on X for months. I’d noticed there was this interpersonal dynamic that kept trying to blow up, where I kept not seeing the significance of phrasings or word choices other people said they found deeply important. I’d been using Typical Mind Fallacy and trying to figure out how to see through their eyes, and it kept not working.

Colour Blindness feels like a close cousin of Typical ... (read more)

I was pleasantly surprised by how many people enjoyed this post about mountain climbing. I never expected it to gain so much traction, since it doesn't relate that clearly to rationality or AI or any of the topics usually discussed on LessWrong.

But when I finished the book it was based on, I just felt an overwhelming urge to tell other people about it. The story was just that insane.

Looking back I think Gwern probably summarized what this story is about best: a world beyond the reach of god. The universe does not respect your desire for a coherent, meaning... (read more)

Tl;dr I encourage people who changed their behavior based on this post or the larger sequence to comment with their stories.

I had already switched to freelance work for reasons overlapping with, although not synonymous with, moral mazes when I learned the concept, and since then the concept has altered how I approach freelance gigs. So I’m in general very on board with the concept.

But as I read this, I thought about my friend Jessica, who’s a manager at a Fortune 500 company. Jessica is principled and has put serious (but not overwhelming) effort into enacting th... (read more)

I still think this post is correct in spirit, and was part of my journey towards good understanding of neuroscience, and promising ideas in AGI alignment / safety.

But there are a bunch of little things that I got wrong or explained poorly. Shall I list them?

First, my "neocortex vs subcortex" division eventually developed into "learning subsystem vs steering subsystem", with the latter being mostly just the hypothalamus and brainstem, and the former being everything else, particularly the whole telencephalon and cerebellum. The main difference is that the "... (read more)

I think Luna Lovegood and the Chamber of Secrets would deserve to get into the Less Wrong Review if all we cared about were its merits. However, the Less Wrong Review is used to determine which posts get into a book that is sold on paper for money. I think this story should be disqualified from the Less Wrong Review on the grounds that Harry Potter fanfiction must remain non-commercial, especially in the strict sense of traditional print publishing.

First, some meta-level things I've learned since writing this:

  1. What people crave most is very practical advice on what to buy. In retrospect this should have been more obvious to me. When I look for help from others on how to solve a problem I do not know much about, the main thing I want is very actionable advice, like "buy this thing", "use this app", or "follow this Twitter account".

  2. Failing that, what people want is legible, easy-to-use criteria for making decisions on their own. Advice like "Find something with CRI>90, and more CRI is better" i

... (read more)

This post is even-handed and well-reasoned, and explains the issues involved well. The strategy-stealing assumption seems important, as a lot of predictions are inherently relying on it either being essentially true, or effectively false, and I think the assumption will often effectively be a crux in those disagreements, for reasons the post illustrates well.

The weird thing is that Paul ends the post saying he thinks the assumption is mostly true, whereas I thought the post was persuasive that the assumption is mostly false. The post illustrates that the u... (read more)

The discussion around "It's Not the Incentives, It's You" was pretty gnarly. I think at the time there were some concrete, simple mistakes I was making. I also think there were 4-6 major cruxes of disagreement between me and some other LessWrongers. The 2019 Review seemed like a good time to take stock of that.

I've spent around 12 hours talking with a couple of people who thought I was mistaken and/or harmful last time, and then 5-10 hours writing this up. And I don't feel anywhere near done, but I'm reaching the end of the timebox, so here goes.

Core Claims

I think th... (read more)

Quick authorial review: This post has brought me the greatest joy from other sources referring to it, including Marginal Revolution (https://marginalrevolution.com/marginalrevolution/2018/10/funnel-human-experience.html) and the New York Times bestseller "The Uninhabitable Earth". I was kind of hoping to supply a fact about the world that people could use in many different lights, and they have (see those and also like https://unherd.com/2018/10/why-are-woke-liberals-such-enemies-of-the-past/ )

An unintentional takeaway from this attention is solidifying my

... (read more)

One of the founders of Circling Europe sincerely and apropos-of-nothing thanked me for writing this post earlier this year, which I view as a sign that there were good consequences of me writing this post. My guess is that a bunch of rationalists found their way to Circling, and it was beneficial for people.

I've heard it said that this is one of the more rationalist-friendly summaries of Circling. I don't know that it's the best possible such summary, but I think it's doing OK. I would certainly write it differently now, but shrug.

At this point I&... (read more)

I wrote this post, and at the time I just wrote it because... well, I thought I'd be able to write a post with a grand conclusion about how science used to check the truth, and then point to how it changed, but I was so surprised to find that journals had not one sentence of criticism in them at all. So I wrote it up as a question post instead, framing my failure to answer the question as 'partial work' that 'helped define the question'.

In retrospect, I'm really glad I wrote the post, because it is a clear datapoint about how science does not work. I have

... (read more)

In my personal view, 'Shard theory of human values' illustrates both the upsides and pathologies of the local epistemic community.

The upsides
- the majority of the claims are true or at least approximately true
- "shard theory" as a social phenomenon reached critical mass making the ideas visible to the broader alignment community, which works e.g. by talking about them in person, votes on LW, series of posts,...
- shard theory coined a number of locally memetically fit names or phrases, such as 'shards'
- part of the success leads at some people in the AGI labs to... (read more)

Self-Review: After a while of being insecure about it, I'm now pretty fucking proud of this paper, and think it's one of the coolest pieces of research I've personally done. (I'm going to both review this post, and the subsequent paper). Though, as discussed below, I think people often overrate it.

Impact

The main impact IMO is proving that mechanistic interpretability is actually possible, that we can take a trained neural network and reverse-engineer non-trivial and unexpected algorithms from it. In particular, I think by focusing on grokking I (semi-acci... (read more)

This is a negative review of an admittedly highly-rated post.

The positives first; I think this post is highly reasonable and well written. I'm glad that it exists and think it contributes to the intellectual conversation in rationality. The examples help the reader reason better, and it contains many pieces of advice that I endorse.

But overall, 1) I ultimately disagree with its main point, and 2) it's way too strong/absolutist about it.

Throughout my life of attempting to have true beliefs and take effective actions, I have quite strongly learned some disti... (read more)

I wrote this post about a year ago.  It now strikes me as an interesting mixture of

  1. Ideas I still believe are true and important, and which are (still) not talked about enough
  2. Ideas that were plausible at the time, but are much less so now
  3. Claims I made for their aesthetic/emotional appeal, even though I did not fully believe them at the time

In category 1 (true, important, not talked about enough):

  • GPT-2 is a source of valuable evidence about linguistics, because it demonstrates various forms of linguistic competence that previously were only demonstrated
... (read more)

There is a joke about programmers that I picked up long ago (I don't remember where): "A good programmer will do hours of work to automate away minutes of drudgery." Some time last month, that joke came into my head, and I thought: yes, of course a programmer should do that, since most of the hours spent automating are building capital, not necessarily in direct drudgery-prevention but in learning how to automate in this domain.

I did not think of this post, when I had that thought. But I also don't think I would've noticed, if that joke had crosse... (read more)

This post seems excellent overall, and makes several arguments that I think represent the best of LessWrong self-reflection about rationality. It also spurred an interesting ongoing conversation about what integrity means, and how it interacts with updating.

The first part of the post is dedicated to discussions of misaligned incentives, and makes the claim that poorly aligned incentives are primarily to blame for irrational or incorrect decisions. I’m a little bit confused about this, specifically that nobody has pointed out the obvious corollary: the peop... (read more)

I think this post is incredibly useful as a concrete example of the challenges of seemingly benign powerful AI, and makes a compelling case for serious AI safety research being a prerequisite to any safe further AI development. I strongly dislike part 9, as painting the Predict-o-matic as consciously influencing others' personalities at the expense of short-term prediction error seems contradictory to the point of the rest of the story. I suspect I would dislike part 9 significantly less if it was framed in terms of a strategy to maximize predictive accuracy.... (read more)

Rereading this post, I'm a bit struck by how much effort I put into explaining my history with the underlying ideas, and motivating that this specifically is cool. I think this made sense as a rhetorical move--I'm hoping that a skeptical audience will follow me into territory labeled 'woo' so that they can see the parts of it that are real--and also as a pedagogical move (proofs may be easy to verify, but all of the interesting content of how they actually discovered that line of thought in concept space has been cleaned away; in this post, rather than hid... (read more)

Since others have done a contextualized review, I'll aim to do a decoupled review, with a caveat that I think the contextual elements are important for consideration with inclusion into the compendium.

Okay. There’s a social interaction concept that I’ve tried to convey multiple times in multiple conversations, so I’m going to just go ahead and make a graph.
I’m calling this concept “Affordance Widths”.

I'd like to see a clear definition here before launching into an example. In fact, there's no clear ... (read more)

This post snuck up on me.

The first time I read it, I was underwhelmed.  My reaction was: "well, yeah, duh.  Isn't this all kind of obvious if you've worked with GPTs?  I guess it's nice that someone wrote it down, in case anyone doesn't already know this stuff, but it's not going to shift my own thinking."

But sometimes putting a name to what you "already know" makes a whole world of difference.

Before I read "Simulators," when I'd encounter people who thought of GPT as an agent trying to maximize something, or people who treated MMLU-like one... (read more)

Epistemic Status

I am an aspiring selection theorist and I have thoughts.

Why Selection Theorems?

Learning about selection theorems was very exciting. It's one of those concepts that felt so obviously right. A missing component in my alignment ontology that just clicked and made everything stronger.

Selection Theorems as a Compelling Agent Foundations Paradigm

There are many reasons to be sympathetic to agent foundations style safety research as it most directly engages the hard problems/core confusions of alignment/safety. However, one concer... (read more)

This will not be a full review—it's more of a drive-by comment which I think is relevant to the review process.

However, the defense establishment has access to classified information and models that we civilians do not have, in addition to all the public material. I’m confident that nuclear war planners have thought deeply about the risks of climate change from nuclear war, even though I don’t know their conclusions or bureaucratic constraints.

I am extremely skeptical of and am not at all confident in this conclusion. Ellsberg's The Doomsday Machine descri... (read more)

Frames that describe perception can become tools for controlling perception.

The idea of simulacra has been generative here on LessWrong, used by Elizabeth in her analysis of negative feedback, and by Zvi in his writings on Covid-19. It appears to originate in private conversations between Benjamin Hoffman and Jessica Taylor. The four simulacra levels or stages are a conception of Baudrillard’s, from Simulacra and Simulation. The Wikipedia summary quoted on the original blog post between Hoffman and Taylor has been reworded several times by various authors ... (read more)

This post is based on the book Moral Mazes, which is a 1988 book describing "the way bureaucracy shapes moral consciousness" in US corporate managers. The central point is that it's possible to imagine relationship and organization structures in which unnecessarily destructive behavior, to self or others, is used as a costly signal of loyalty or status.

Zvi titles the post after what he says these behaviors are trying to avoid, motive ambiguity. He doesn't label the dynamic itself, so I'll refer to it here as "disambiguating destruction" (DD). Before procee... (read more)

Self Review.

I still endorse the broad thrusts of this post. But I think it should change at least somewhat. I'm not sure how extensively, but here are some considerations:

Clearer distinctions between Prisoner's Dilemma and Stag Hunts

I should be clearer about the game-theoretic distinctions I'm actually making between Prisoner's Dilemma and Stag Hunt. I think Rob Bensinger rightly criticized the current wording, which equivocates between "stag hunting is meaningfully different" and "'hunting rabbit' has nicer aesthetic properties than 'defect'"... (read more)

I revisited this post a few months ago, after Vaniver's review of Atlas Shrugged.

I've felt for a while that Atlas Shrugged has some really obvious easy-to-articulate problems, but also offers a lot of value in a much-harder-to-articulate way. After chewing on it for a while, I think the value of Atlas Shrugged is that it takes some facts about how incentives and economics and certain worldviews have historically played out, and propagates those facts into an aesthetic. (Specifically, the facts which drove Rand's aesthetics presumably came from growing up i... (read more)

I find it deeply sad that many of us feel the need to frequently link to this article - I don't think I have ever done so, because if I need to explain local validity, then perhaps I'm talking to the wrong people? But certainly the ignoring of this principle has gotten more and more blatant and common over time since this post, so it's becoming less reasonable to assume that people understand such things. Which is super scary.

Hi, I'm pleased to see that this has been nominated and has made a lasting impact.

Do I have any updates? I think it aged well. I'm not making any particular specific claims here, but I still endorse this and think it's an important concept.

I've done very little further thinking on this. I was quietly hoping that others might pick up the mantle and write more on strategies for caring less, as well as cases where this should be argued. I haven't seen this, but I'd love to see more of it.

I've referred to it myself when talking about values that I think people

... (read more)

This was counter to the prevailing narrative at the time, and I think did some of the work of changing the narrative. It's of historical significance, if nothing else.

Shoulder!Justis telling me to replace "it", "this", etc with real nouns is maybe the most legible improvement in my writing over the last few years.

I remain pretty happy with most of this, looking back -- I think this remains clear, accessible, and about as truthful as possible without getting too technical.

I do want to grade my conclusions / predictions, though.

(1). I predicted that this work would quickly be exceeded in sample efficiency. This was wrong -- it's been a bit over a year and EfficientZero is still SOTA on Atari. My 3-to-24-month timeframe hasn't run out, but I said that I expected "at least a 25% gain" towards the start of that period, which hasn't happened.

(2). There has been a shift to... (read more)

This is a long and good post with a title and early framing advertising a shorter and better post that does not fully exist, but would be great if it did. 

The actual post here is something more like "CFAR and the Quest to Change Core Beliefs While Staying Sane." 

The basic problem is that people by default have belief systems that allow them to operate normally in everyday life, and that protect them against weird beliefs and absurd actions, especially ones that would extract a lot of resources in ways that don't clearly pay off. And they similarl... (read more)

The goal of this post is to help us understand the similarities and differences between several different games, and to improve our intuitions about which game is the right default assumption when modeling real-world outcomes.

My main objective with this review is to check the game theoretic claims, identify the points at which this post makes empirical assertions, and see if there are any worrisome oversights or gaps. Most of my fact-checking will just be resorting to Wikipedia.

Let’s start with definitions of two key concepts.

Pareto-optimal: One dimension ... (read more)
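
The review's own definitions are cut off here, so what follows is not its text. As a minimal sketch of the kind of check it describes, assuming textbook-style illustrative payoffs (the numbers in `GAMES` are hypothetical, not taken from the post), one can enumerate the pure-strategy Nash equilibria and the Pareto-optimal outcomes of a 2x2 Prisoner's Dilemma and Stag Hunt:

```python
from itertools import product

# Hypothetical illustrative payoffs: (row player's payoff, column player's payoff).
GAMES = {
    "prisoners_dilemma": {
        ("C", "C"): (3, 3), ("C", "D"): (0, 5),
        ("D", "C"): (5, 0), ("D", "D"): (1, 1),
    },
    "stag_hunt": {
        ("S", "S"): (4, 4), ("S", "H"): (0, 3),
        ("H", "S"): (3, 0), ("H", "H"): (3, 3),
    },
}


def actions(game):
    """Return the sorted row and column action labels of a 2-player game."""
    rows = sorted({r for r, _ in game})
    cols = sorted({c for _, c in game})
    return rows, cols


def pure_nash_equilibria(game):
    """Outcomes where neither player gains by unilaterally switching action."""
    rows, cols = actions(game)
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = game[(r, c)]
        row_best = all(game[(r2, c)][0] <= u_row for r2 in rows)
        col_best = all(game[(r, c2)][1] <= u_col for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


def pareto_optimal(game):
    """Outcomes that no other outcome weakly improves for both players (and strictly for one)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and a != b

    return [outcome for outcome, payoff in game.items()
            if not any(dominates(other, payoff) for other in game.values())]


for name, game in GAMES.items():
    print(name, pure_nash_equilibria(game), pareto_optimal(game))
```

With these payoffs, the Prisoner's Dilemma's only equilibrium, mutual defection, is not Pareto-optimal, while the Stag Hunt has two equilibria, one of which (both hunting stag) is the unique Pareto-optimal outcome. Different payoff numbers can change the Pareto set, so this is an illustration rather than a general proof.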

I've alluded to this in other comments, but I think worth spelling out more comprehensively here.

I think this post makes a few main points:

  1. Categories are not arbitrary. You might need different categories for different purposes, but categories are for helping you think about the things you care about, and a category that doesn't correspond to the territory will be less helpful for thinking and communicating.
  2. Some categories might sort of look like they correspond to something in reality, but they are gerrymandered in a way optimized for deception. 
  3. You
... (read more)

There are two aspects of this post worth reviewing: as an experiment in a different mode of discourse, and as a description of the procession of simulacra, a schema originally advanced by Baudrillard.

As an experiment in a different mode of discourse, I think this was a success on its own terms, and a challenge to the idea that we should be looking for the best blog posts rather than the behavior patterns that lead to the best overall discourse.

The development of the concept occurred over email quite naturally without forceful effort. I would have written ... (read more)

The material here is one seed of a worldview which I've updated toward a lot more over the past year. Some other posts which involve the theme include Science in a High Dimensional World, What is Abstraction?, Alignment by Default, and the companion post to this one Book Review: Design Principles of Biological Circuits.

Two ideas unify all of these:

  1. Our universe has a simplifying structure: it abstracts well, implying a particular kind of modularity.
  2. Goal-oriented systems in our universe tend to evolve a modular structure which reflects the structure of the u
... (read more)

I notice I am confused.

I feel as though these types of posts add relatively little value to LessWrong; however, this post has quite a few upvotes. I don’t think novelty is a prerequisite for a high-quality post, but I feel as though this post was both not novel and not relevant, which worries me. I think that most of the information presented in this article is (a) not actionable, (b) not related to LessWrong, and (c) easily replaceable with a Wikipedia or similar search. This would be my totally spitballed test for a topical post: at least one of these 3 must... (read more)

This post kills me. Lots of great stuff, and I think this strongly makes the cut. Sarah has great insights into what is going on, then turns away from them right when following through would be most valuable. The post explains why she and an entire culture are being defrauded by aesthetics: how aesthetics is used to justify all sorts of things, including high prices and what counts as cool, based on things that have no underlying value, and how it contains lots of hostile subliminal messages that are driving her crazy. It's very clear. And then she... doesn't see the fnords. So close!

This post should be included in the Best-of-2018 compilation.

This is not only a good post, but one which cuts to the core of what this community is about. This site began not as a discussion of topics X, Y, and Z, but as a discussion of how to be... less wrong than the world around you (even/especially your own ingroup), and the difficulties this entails. Uncompromising honesty and self-skepticism are hard, and even though the best parts are a distillation of other parts of the Sequences, people need to be reminded more often than they need to be instructed.

I think this post was quite helpful. I think it does a good job laying out a fairly complete picture of a pretty reasonable safety plan, and the main sources of difficulty. I basically agree with most of the points. Along the way, it makes various helpful points, for example introducing the "action risk vs inaction risk" frame, which I use constantly. This post is probably one of the first ten posts I'd send someone on the topic of "the current state of AI safety technology".

I think that I somewhat prefer the version of these arguments that I give in e.g. ... (read more)

This post didn't lead to me discovering any new devices, and I haven't heard from anyone who found something they valued via it. So overall not a success, but it was easy to write so I don't regret the attempt. 

I have three views on this post.

One view: The first section (say, from "While working as the curriculum director" to "Do you know what you are doing, and why you are doing it?") I want as its own post. The Fundamental Question is too short. (https://www.lesswrong.com/posts/xWozAiMgx6fBZwcjo/the-fundamental-question) I think this is a useful question to have loaded in a person's brain, and the first section of this post explains how to use it and makes a pitch for why it's important. I haven't yet linked someone to How To: A Workshop (or anything) and told ... (read more)

I think this is still one of the most comprehensive and clear resources on counterpoints to x-risk arguments. I have referred to this post and pointed people to it a number of times. The most useful parts of the post for me were the outline of the basic x-risk case and section A on counterarguments to goal-directedness (this was particularly helpful for my thinking about threat models and understanding agency). 

I think it's a bit hard to tell how influential this post has been, though my best guess is "very". It's clear that sometime around when this post was published there was a pretty large shift in the strategies that I and a lot of other people pursued, with "slowing down AI" becoming a much more common goal for people to pursue.

I think (most of) the arguments in this post are good. I also think that when I read an initial draft of this post (around 1.5 years ago or so), and had a very hesitant reaction to the core strategy it proposes, that I was picking up... (read more)

When this post came out, I left a comment saying:

It is not for lack of regulatory ideas that the world has not banned gain-of-function research.

It is not for lack of demonstration of scary gain-of-function capabilities that the world has not banned gain-of-function research.

What exactly is the model by which some AI organization demonstrating AI capabilities will lead to world governments jointly preventing scary AI from being built, in a world which does not actually ban gain-of-function research?

Given how the past year has gone, I should probably lose at... (read more)

I find this post fairly uninteresting, and feel irritated when people confidently make statements about "simulacra." One problem is, on my understanding, that it doesn't really reduce the problem of how LLMs work. "Why did GPT-4 say that thing?" "Because it was simulating someone who was saying that thing." It does postulate some kind of internal gating network which chooses between the different "experts" (simulacra), so it isn't contentless, but... Yeah. 

Also I don't think that LLMs have "hidden internal intelligence", given e.g. LLMs trained on “A i... (read more)

I don't have any substantive comment to provide at the moment, but I want to share that this is the post that piqued my initial interest in alignment. It provided a fascinating conceptual framework around how we can qualitatively describe the behavior of LLMs, and got me thinking about implications of more powerful future models. Although it's possible that I would eventually become interested in alignment, this post (and simulator theory broadly) deserve a large chunk of the credit. Thanks janus.

End-of-2023 author retrospective: 

Yeah, this post holds up. I'm proud of it. The Roman dodecahedron and the fox lady still sit proudly on my desk.

I got the oldest known example wrong, but this was addressed in the sequel post: Who invented knitting? The plot thickens. If you haven't read the sequel, where I go looking for the origins of knitting, you will enjoy it. Yes, even if you're here for the broad ideas about history rather than specifically knitting. (That investigation ate my life for a few months in there. Please read it. 🥺)

I'm extremely ple... (read more)

I think this post paints a somewhat inaccurate view of the past.

The post claims that MIRI's talk of recursive self-improvement from a seed AI came about via MIRI’s attempts to respond to claims such as "AI will never exceed human capabilities" or "Growth rates post AI will be like growth rates beforehand." Thus, the post says, people in MIRI spoke of recursive self-improvement from a seed AI not because they thought this was a particularly likely mainline future -- but because they thought this was one obvious way that AI -- past a certain level of develop... (read more)

ELK was one of my first exposures to AI safety. I participated in the ELK contest shortly after moving to Berkeley to learn more about longtermism and AI safety. My review focuses on ELK’s impact on me, as well as my impressions of how ELK affected the Berkeley AIS community.

Things about ELK that I benefited from

Understanding ARC’s research methodology & the builder-breaker format. For me, most of the value of ELK came from seeing ELK’s builder-breaker research methodology in action. Much of the report focuses on presenting training strategies and pres... (read more)

I suppose, with one day left to review 2021 posts, I can add my 2¢ to my own here.

Overall I still like this post. I still think it points at true things and says them pretty well.

I had intended it as a kind of guide or instruction manual for anyone who felt inspired to create a truly potent rationality dojo. I'm a bit saddened that, to the best of my knowledge, no one seems to have taken what I named here and made it their own enough to build a Beisutsu dojo. I would really have liked to see that.

But this post wasn't meant to persuade anyone to do it. It w... (read more)

If you judge your social media usage by whether the average post you read is good or bad, you are missing half of the picture. The rapid context switching incurs an invisible cost even if the interaction itself is positive, as does the fact that you expect to be interrupted. "[T]he knowledge that interruptions could come at every time will change your mental state", as Elizabeth puts it.

This is the main object-level message of this post, and I don't have any qualms with it. It's very similar to what Sam Harris talks about a lot (e.g., here), and it seems t... (read more)

The post is still largely up-to-date. In the intervening year, I mostly worked on the theory of regret bounds for infra-Bayesian bandits, and haven't made much progress on open problems in infra-Bayesian physicalism. On the other hand, I also haven't found any new problems with the framework.

The strongest objection to this formalism is the apparent contradiction between the monotonicity principle and the sort of preferences humans have. While my thinking about this problem evolved a little, I am still at a spot where every solution I know requires biting a... (read more)

Alexandros Marinos (LW profile) has a long series where he reviewed Scott's post:

The Potemkin argument is my public peer review of Scott Alexander’s essay on ivermectin. In this series of posts, I go through that essay in detail, working through the various claims made and examining their validity. My essays will follow the structure of Scott’s essay, structured in four primary units, with additional material to follow

This is his summary of the series, and this is the index. Here's the main part of the index:

Introduction

Part 1: Introduction (TBC)

Part

... (read more)

The post claims:

I have investigated this issue in depth and concluded that even a full scale nuclear exchange is unlikely (<1%) to cause human extinction.

This review aims to assess whether having read the post I can conclude the same.

The review is split into 3 parts:

  • Epistemic spot check
  • Examining the argument
  • Outside the argument

Epistemic spot check

Claim: There are 14,000 nuclear warheads in the world.

Assessment: True

Claim: Average warhead yield <1 Mt, probably closer to 100kt

Assessment: Probably true, possibly misleading. Values I found were:

... (read more)

Simulacra levels were probably the biggest addition to the rationalist canon in 2020. This was one of maybe half-a-dozen posts which I think together cemented the idea pretty well. If we do books again, I could easily imagine a whole book on simulacra, and I'd want this post in it.

I've stepped back from thinking about ML and alignment the last few years, so I don't know how this fits into the discourse about it, but I felt like I got important insight here and I'd be excited to include this. The key concept that bigger models can be simpler seems very important. 

In my words, I'd say that when you don't have enough knobs, you're forced to find ways for each knob to serve multiple purposes slash combine multiple things, which is messy and complex and can be highly arbitrary, whereas with lots of knobs you can do 'the thing you na... (read more)

The notion of specificity may be useful, but to me its presentation in terms of tone (beginning with the title "The Power to Demolish Bad Arguments") and examples seemed rather antithetical to the Less Wrong philosophy of truth-seeking.

For instance, I read the "Uber exploits its drivers" example discussion as follows: the author already disagrees with the claim as their bottom line, then tries to win the discussion by picking their counterpart's arguments apart, all the while insulting this fictitious person with asides like "By sloshing around his mental ... (read more)

This is a self-review, looking back at the post after 13 months.

I have made a few edits to the post, including three major changes:
1. Sharpening my definition of what counts as "Rationalist self-improvement" to reduce confusion. This post is about improved epistemics leading to improved life outcomes, which I don't want to conflate with some CFAR techniques that are basically therapy packaged for skeptical nerds.
2. Addressing Scott's "counterargument from market efficiency" that we shouldn't expect to invent easy self-improvement techniques that haven't be... (read more)

(Self-review.) I've edited the post to include the calculation as footnote 10.

The post doesn't emphasize this angle, but this is also more-or-less my abstract story for the classic puzzle of why disagreement is so prevalent, which, from a Bayesian-wannabe rather than a human perspective, should be shocking: there's only one reality, so honest people should get the same answers. How can it simultaneously be the case that disagreement is ubiquitous, but people usually aren't outright lying? Explanation: the "dishonesty" is mostly in the form... (read more)

I'm a bit torn here, because the ideas in the post seem really important/useful to me (e.g., I use these phrases as a mental pointer sometimes), such that I'd want anyone trying to make sense of the human situation to have access to them (via this post or a number of other attempts at articulating much the same, e.g. "Elephant and the Brain"). And at the same time I think there's some crucial misunderstanding in it that is dangerous and that I can't articulate. Voting for it anyhow though.

[Update: the new version is now live!!]

[Author writing here.]

The initial version of this post was written quickly on a whim, but given the value people have gotten from this post (as evidenced by the 2018 Review nomination and reviews), I think it warrants a significant update, which I plan to write in time for possible publication in a book, and ideally the Review voting stage.

Things I plan to include in the update:

... (read more)

I hadn't realized this post was nominated, partially because of my comment, so here's a late review. I basically continue to agree with everything I wrote then, and I continue to like this post for those reasons, and so I support including it in the LW Review.

Since writing the comment, I've come across another argument for thinking about intent alignment -- it seems like a "generalization" of assistance games / CIRL, which itself seems like a formalization of an aligned agent in a toy setting. In assistance games, the agent explici... (read more)

Many people pointed out that the real cost of a Bitcoin in 2011 or whenever wasn't the couple of cents that it cost, but the several hours of work it would take to figure out how to purchase it. And that costs needed to be discounted by the significant risk that a Bitcoin purchased in 2011 would be lost or hacked - or by the many hours of work it would have taken to ensure that didn't happen. Also, that there was another hard problem of not selling your 2011-Bitcoins in 2014. I agree that all of these are problems with the original post, and tha... (read more)

I was going to write a longer review but I realised that Ben’s curation notice actually explains the strengths of this post very well so you should read that!

In terms of including this in the 2018 review I think this depends on what the review is for.

If the review is primarily for the purpose of building common knowledge within the community then including this post maybe isn’t worth it as it is already fairly well known, having been linked from SSC.

On the other hand if the review process is at least partly for, as Raemon put it:

“I wan... (read more)

This is a review of my own post.

The first thing to say is that for the 2018 Review Eli’s mathematicians post should take precedence because it was him who took up the challenge in the first place and inspired my post. I hope to find time to write a review on his post.

If people were interested (and Eli was ok with it) I would be happy to write a short summary of my findings to add as a footnote to Eli’s post if it was chosen for the review.

***

This was my first post on LessWrong and looking back at it I think it still holds up fairly well.

There... (read more)

Seems to me like a blindingly obvious post that was kind of outside of the Overton window for too long. Eliezer also smashed the window with his TIME article, but this was first, so I think it's still a pretty great post. +4

I should acknowledge first that I understand that writing is hard. If the only realistic choice was between this post as it is, and no post at all, then I'm glad we got the post rather than no post.

That said, by the standards I hold my own writing to, I would be embarrassed to publish a post like this which criticizes imaginary paraphrases of researchers, rather than citing and quoting the actual text they've actually published. (The post acknowledges this as a flaw, but if it were me, I wouldn't even publish.) The reason I don't think critics necessarily nee... (read more)

Epistemic status: I read the entire post slowly, taking careful sentence-by-sentence notes. I felt I understood the author's ideas and that something like the general dynamic they describe is real and important. I notice this post is part of a larger conversation, at least on the internet and possibly in person as well, and I'm not reading the linked background posts. I've spent quite a few years reading a substantial portion of LessWrong and LW-adjacent online literature and I used to write regularly for this website.

This post is long and complex. Here ar... (read more)

[This is a self-review because I see that no one has left a review to move it into the next phase. So8res's comment would also make a great review.]

I'm pretty proud of this post for the level of craftsmanship I was able to put into it. I think it embodies multiple rationalist virtues. It's a kind of "timeless" content, and is a central example of the kind of content people want to see on LW that isn't stuff about AI.

It would also look great printed in a book. :)

I remember reading this post and thinking it is very good and important. I have since pretty much forgotten about it and its insights, probably because I didn't think much about GDP anyway. Rereading the post, I maintain that it is very good and important. Any discussion of GDP should be with the understanding of what this post says, which I summarized to myself like so (it's mostly a combination of edited excerpts from the post):

Real GDP is usually calculated by adding up the total dollar value of all goods, using prices from some recent year (every few ye

... (read more)
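
The summary above is cut off before it gets past the basic definition, so the rest of the post's argument is not shown here. As a minimal sketch of the base-year calculation that first sentence names, assuming a made-up two-good economy (every price and quantity below is hypothetical), real GDP values each year's quantities at one fixed year's prices, and the growth figure you get depends heavily on which year's prices you pick once relative prices shift:

```python
# Hypothetical two-good economy: computers got much cheaper, food got pricier.
PRICES = {
    2000: {"food": 2.0, "computers": 1000.0},
    2020: {"food": 3.0, "computers": 100.0},
}
QUANTITIES = {
    2000: {"food": 100, "computers": 1},
    2020: {"food": 110, "computers": 50},
}


def nominal_gdp(year):
    """Value each good produced in `year` at that same year's prices."""
    return sum(PRICES[year][g] * q for g, q in QUANTITIES[year].items())


def real_gdp(year, base_year):
    """Value each good produced in `year` at the fixed prices of `base_year`."""
    return sum(PRICES[base_year][g] * q for g, q in QUANTITIES[year].items())


# Growth from 2000 to 2020, measured with each year's prices as the base.
growth_at_2000_prices = real_gdp(2020, 2000) / real_gdp(2000, 2000)
growth_at_2020_prices = real_gdp(2020, 2020) / real_gdp(2000, 2020)
print(growth_at_2000_prices, growth_at_2020_prices)
```

Here, using 2000 prices reports roughly 42x growth, while using 2020 prices reports roughly 13x, because the good whose output exploded (computers) carries far less weight at its later, lower price.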

Thermodynamics is the deep theory behind steam engine design (and many other things) -- it doesn't tell you how to build a steam engine, but to design a good one you probably need to draw on it somewhat.

This post feels like a gesture at a deep theory behind truth-oriented forum / community design (and many other things) -- it certainly doesn't help tell you how to build one, but you have to think at least around what it talks about to design a good one. Also applicable to many other things, of course.

It also has the virtue of being very short. Per word, one of my favorite posts.

I read this sequence and then went through the whole thing.  Without this sequence I'd probably still be procrastinating / putting it off.  I think everything else I could write in review is less important than how directly this impacted me.

Still, a review: (of the whole sequence, not just this post)

First off, it signposts well what it is and who it's for.  I really appreciate when posts do that, and this clearly gives the top-level focus and what's in/out.

This sequence is "How to do a thing" - a pretty big thing, with a lot of steps and bran... (read more)

I'm not sure I use this particular price mechanism fairly often, but I think this post was involved in me moving toward often figuring out fair prices for things between friends and allies, which I think helps a lot. The post puts together lots of the relevant intuitions, which is what's so helpful about it. +4

Summary

I summarize this post in a slightly reverse order. In AI alignment, one core question is how to think about utility maximization. What are agents doing that maximize utility? How does embeddedness play into this? What can we prove about such agents? Which types of systems become maximizers of utility in the first place?

This article reformulates expected utility maximization in equivalent terms in the hopes that the new formulation makes answering such questions easier. Concretely, a utility function u is given, and the goal of a u-maximizer is to ch... (read more)

This gave a satisfying "click" of how the Simulacra and Staghunt concepts fit together. 

Things I would consider changing:

1. Lion Parable. In the comments, John expands on this post with a parable about lion-hunters who believe in "magical protection against lions." That parable is actually what I normally think of when I think of this post, and I was sad to learn it wasn't actually in the post. I'd add it in, maybe as the opening example.

2. Do we actually need the word "simulacrum 3"? Something on my mind since last year's review is "how much work are... (read more)

I still think this is basically correct, and have raised my estimation of how important it is in x-risk in particular.  The emphasis on doing The Most Important Thing and Making Large Bets push people against leaving slack, which I think leads to high value but irregular opportunities for gains being ignored.

I generally endorse the claims made in this post and the overall analogy. Since this post was written, there are a few more examples I can add to the categories for slow takeoff properties. 

Learning from experience

  • The UK procrastinated on locking down in response to the Alpha variant due to political considerations (not wanting to "cancel Christmas"), though it was known that timely lockdowns are much more effective.
  • Various countries reacted to Omicron with travel bans after they already had community transmission (e.g. Canada and the UK), while it wa
... (read more)

I don't think this post added anything new to the conversation, both because Elizabeth Van Nostrand's epistemic spot check found essentially the same result previously and because, as I said in the post, it's "the blog equivalent of a null finding." 

I still think it's slightly valuable - it's useful to occasionally replicate reviews. 

(For me personally, writing this post was quite valuable - it was a good opportunity to examine the evidence for myself, try to appropriately incorporate the different types of evidence into my prior, and form my own opinions for when clients ask me related questions.) 

This came out in April 2019, and bore a lot of fruit especially in 2020. Without it, I wouldn't have thought about the simulacra concept and developed the ideas, and without those ideas, I don't think I would have made anything like as much progress understanding 2020 and its events, or how things work in general. 

I don't think this was an ideal introduction to the topic, but it was highly motivating regarding the topic, and also it's a very hard topic to introduce or grok, and this was the first attempt that allowed later attempts. I think we should reward all of that.

This is an excellent post, with a valuable and well-presented message. This review is going to push back a bit, talk about some ways that the post falls short, with the understanding that it's still a great post.

There's this video of a toddler throwing a tantrum. Whenever the mother (holding the camera) is visible, the child rolls on the floor and loudly cries. But when the mother walks out of sight, the toddler soon stops crying, gets up, and goes in search of the mother. Once the toddler sees the mother again, it's back to rolling on the floor crying.

A k...

This is my post. It is fundamentally a summary of an overview paper, which I wrote to introduce the concept to the community, and I think it works for that purpose. In terms of improvements, there are a few I would make; I would perhaps include the details about why people choose megaprojects as a venue, for completeness' sake. It might have helped if I had provided more examples in the post to motivate engagement; these are projects like power plants, chip fabs, oil rigs and airplanes, or in other words the fundamental blocks of modern civilization.

I cont...

This is a cogent, if sparse, high-level analysis of the epistemic distortions around megaprojects in AI and other fields.

It points out that projects like the Human Brain Project and the Fifth Generation Computer Systems project made massive promises, raised around a billion dollars, and totally flopped. I don't expect this was a simple error, I expect there were indeed systematic epistemic distortions involved, perpetuated at all levels.

It points out that similar-scale projects are being evaluated today involving various major AI companies globally, and po...

This post introduces a potentially very useful model, both for selecting problems to work on and for prioritizing personal development. This model could be called "The Pareto Frontier of Capability". Simply put:

  1. By an efficient markets-type argument, you shouldn't expect to have any particularly good ways of achieving money/status/whatever - if there was an unusually good way of doing that, somebody else would already be exploiting it.
  2. The exception to this is that if only a small number of people can exploit an opportunity, you may have a shot. So you s...
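A minimal sketch of what it means to sit on the Pareto frontier, assuming each person can be scored on a small number of skills. The skill names, people, and scores below are hypothetical illustrations, not taken from the post: a person is on the frontier if nobody else is at least as good at every skill and strictly better at one.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every skill and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(people: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of people whose skill vectors are not dominated by anyone else's."""
    return [
        name
        for name, skills in people.items()
        if not any(
            dominates(other, skills)
            for other_name, other in people.items()
            if other_name != name
        )
    ]


# Hypothetical skill scores: (statistics, biology)
people = {
    "statistician": (9.0, 2.0),
    "biologist": (2.0, 9.0),
    "generalist": (6.0, 6.0),
    "novice": (3.0, 3.0),
}

print(pareto_frontier(people))  # ['statistician', 'biologist', 'generalist']
```

In this toy example the generalist is not the best at either skill, yet nobody beats them on both at once, so they still sit on the frontier; the novice is dominated by the generalist and drops off.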

As has been mentioned elsewhere, this is a crushingly well-argued piece of philosophy of language and its relation to reasoning. I will say this post strikes me as somewhat longer than it needs to be, but that's also my opinion on much of the Sequences, so it is at least traditional.

Also, this piece is historically significant because it played a big role in litigating a community social conflict (which is no less important for having been (being?) mostly below the surface), and set the stage for a lot of further discussion. I think it's very important tha...

Review by the author:

I continue to endorse the contents of this post.

I don't really think about the post that much, but the post expresses a worldview that shapes how I do my research - that agency is a mechanical fact about the workings of a system.

To me, the main contribution of the post is setting up a question: what's a good definition of optimisation that avoids the counterexamples of the post? Ideally, this definition would refer or correspond to the mechanistic properties of the system, so that people could somehow statically determine whether a giv...

I wrote about this post extensively as part of my essay on Rationalist self-improvement. The general idea of this post is excellent: gathering data for a clever natural experiment of whether Rationalists actually win. Unfortunately, the analysis itself is very lacking and is not very data-driven.

The core result is: 15% of SSC readers who were referred by LessWrong made over $1,000 in crypto, 3% made $100,000. These quantities require quantitative analysis: Is 15%/3% a lot or a little compared to matched groups like the Silicon Valley or Libertarian blogosp...

It strikes me as pedagogically unfortunate that sections i. and ii. (on arguments and proof-steps being locally valid) are part of the same essay as sections iii.–vi. (on what this has to do with the function of Law in Society). Had this been written in the Sequences-era, one would imagine this being (at least) two separate posts, and it would be nice to have a reference link for just the concept of argumentative local validity (which is obviously correct and important to have a name for, even if some of the speculations about Law in sections iii.–vi. turned out to be wrong).

+9. This is a powerful set of arguments pointing out how humanity will literally go extinct soon due to AI development (or have something similarly bad happen to us). A lot of thought and research went into an understanding of the problem that can produce this level of understanding of the problems we face, and I'm extremely glad it was written up.

This is IMO actually a really important topic, and this is one of the best posts on it. I think it probably really matters whether the AIs will try to trade with us or care about our values even if we had little chance of making our actions with regards to them conditional on whether they do. I found the arguments in this post convincing, and have linked many people to it since it came out. 

Lightcone has evolved a bit since Jacob wrote this, and also I have a somewhat different experience from Jacob. 

Updates:

  • "Meeting day" is really important to prevent people being blocked by meetings all week, but, it's better to do it on Thursday than Tuesday (Tuesday Meeting Days basically kill all the momentum you built up on Monday)
  • We hit the upper limits of how many 1-1 public DM channels really made sense (because their number grew superlinearly with the number of employees). We mostly now have "wall channels" (i.e. raemon-wall), where people who want to me...
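To see why the number of 1-1 channels grows superlinearly, assume (for illustration only; the comment does not say exactly how channels were counted) one channel per pair of employees. The count is then n(n-1)/2, so doubling headcount roughly quadruples the channels. The function name below is hypothetical.

```python
def one_on_one_channels(n: int) -> int:
    """Number of distinct pairwise (1-1) channels among n people, i.e. n choose 2."""
    return n * (n - 1) // 2


# Doubling the team size roughly quadruples the number of pairwise channels.
for team_size in (5, 10, 20):
    print(team_size, one_on_one_channels(team_size))
```

This prints 10, 45 and 190 channels for teams of 5, 10 and 20, which is the quadratic growth that makes per-pair channels unwieldy as a team grows.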

This was one of those posts that I dearly wish somebody else besides me had written, but nobody did, so here we are. I have no particular expertise. (But then again, to some extent, maybe nobody does?)

I basically stand by everything I wrote here. I remain pessimistic for reasons spelled out in this post, but I also still have a niggling concern that I haven’t thought these things through carefully enough, and I often refer to this kind of stuff as “an area where reasonable people can disagree”.

If I were rewriting this post today, three changes I’d make wou...

I'm reaffirming my relatively long review of Drexler's full QNR paper.

Drexler's QNR proposal seems like it would, if implemented, guide AI toward more comprehensible systems. It might modestly speed up capabilities advances, while being somewhat more effective at making alignment easier.

Alas, the full paper is long, and not an easy read. I don't think I've managed to summarize its strengths well enough to persuade many people to read it.

This post and its companion have even more resonance now that I'm deeper into my graduate education and conducting my research more independently.

Here, the key insight is that research is an iterative process of re-scoping the project and executing on the current version of the plan. You are trying to make a product sufficient to move the conversation forward, not (typically) write the final word on the subject.

What you know, what resources you have access to, your awareness of what people care about, and what there's demand for, depend on your output. Tha...

This post is one of the best available explanations of what has been wrong with the approach used by Eliezer and people associated with him.

I had a pretty favorable recollection of the post from when I first read it. Rereading it convinced me that I still managed to underestimate it.

In my first pass at reviewing posts from 2022, I had some trouble deciding which post best explained shard theory. Now that I've reread this post during my second pass, I've decided this is the most important shard theory post. Not because it explains shard theory best, but bec...

Clearly a very influential post on a possible path to doom from someone who knows their stuff about deep learning! There are clear criticisms, but it is also one of the best of its era. It was also useful for even just getting a handle on how to think about our path to AGI.

The post is influential, but makes multiple somewhat confused claims and led many people to become confused. 

The central confusion stems from the fact that genetic evolution already created a lot of control circuitry before inventing the cortex, and did the obvious thing to 'align' the evolutionarily newer areas: bind them to the old circuitry via interoceptive inputs. By this mechanism, the genome is able to 'access' a lot of evolutionarily relevant beliefs and mental models. The trick is the higher/more distant to genome models are learned in part to predict in...

My review mostly concerns SMTM's A Chemical Hunger part of this review. RaDVaC was interesting if not particularly useful, but SMTM's series has been noted by many commenters to be a strange theory, possibly damaging, and there was, as of my last check, no response from SMTM to the various rebuttals.

It does not behoove rationalism to have members that do not respond to critical looks at their theories. They stand to do a lot of damage and cost a lot of lives if taken seriously.

Guzey substantially retracted this a year later. I think it would be great to publish both together as a case study of self-experimentation, but would be against publishing this on its own. 

Get minimum possible sustainable amount of sleep -> get enough sleep to have maximum energy during the day

Sleep makes me angry. I mean, why on Earth do I have to spend hours every day lying around unconscious?????????

In 2019, trying to learn about the science behind sleep I read Why We Sleep and got so angry at it for being essentially pseudoscience that I spent...

Reviewing this quickly because it doesn't have a review.

I've linked this post to several people in the last year. I think it's valuable for people (especially junior researchers or researchers outside of major AIS hubs) to be able to have a "practical sense" of what doing independent alignment research can be like, how the LTFF grant application process works, and some of the tradeoffs of doing this kind of work. 

This seems especially important for independent conceptual work, since this is the path that is least well-paved (relative to empirical work...

The first elephant seal barely didn't make it into the book, but this is our last chance. Will the future readers of LessWrong remember the glory of elephant seal?

I like this post in part because of the dual nature of the conclusion, aimed at two different audiences. Focusing on the cost of implementing various coordination schemes seems... relatively unexamined on LW, I think. The list of life-lessons is intelligible, actionable, and short.

On the other hand, I think you could probably push it even further in "Secret of Our Success" tradition / culture direction. Because there's... a somewhat false claim in it: "Once upon a time, someone had to be the first person to invent each of these concepts."

This seems false ...

Returning to this essay, it continues to be my favorite Paul post (even What Failure Looks Like only comes second), and I think it's a better way to engage with Paul's work than anything else (including the Eliciting Latent Knowledge document, which feels less grounded in the x-risk problem, is less in Paul's native language, and gets detailed on just one idea for 10x the space, thus communicating less of the big picture research goal). I feel I can understand all the arguments made in this post. I think this should be mandatory reading before reading Elici...

Epistemic Status: I don't actually know anything about machine learning or reinforcement learning and I'm just following your reasoning/explanation.

 

From each state, we can just check each possible action against the action-value function $q(s_t, a_t)$, and choose the action that returns the highest value from the action-value function. Greedy search against the action-value function for the optimal policy is thus equivalent to the optimal policy. For this reason, many algorithms try to learn the action-value function for the optimal policy.

This do...
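A minimal sketch of the greedy action selection described in the quoted passage, assuming a small tabular action-value function. The state names, action names, and q-values are made up for illustration. Acting greedily recovers the optimal policy only when q is the optimal action-value function q*; acting greedily with respect to some other q need not be optimal.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def greedy_action(
    q: Callable[[State, Action], float], state: State, actions: Sequence[Action]
) -> Action:
    """Return the action with the highest action-value q(state, a).

    Ties go to the first maximizing action in `actions` (Python's max behavior).
    """
    return max(actions, key=lambda a: q(state, a))


# Hypothetical tabular action-value function for a two-state toy problem.
q_table = {
    ("start", "left"): 0.1,
    ("start", "right"): 0.7,
    ("goal", "left"): 0.0,
    ("goal", "right"): 0.0,
}


def q(state: State, action: Action) -> float:
    return q_table[(state, action)]


print(greedy_action(q, "start", ["left", "right"]))  # right
```

The design choice here is that the policy is never stored explicitly: it is recomputed from q at each state, which is why learning q* is enough to act optimally.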

The combination of this post, and an earlier John post (Parable of the Dammed) has given me some better language for understanding what's going on in negotiations and norm-setting, two topics that I think are quite valuable. The concept of "you could actually move the Empire State Building, maybe, and that'd affect the Schelling point of meeting places", was a useful intuition pump for both "you can move norm Schelling points around" and how difficult that task is.

Two years later, I suppose we know more than we did when the article was written. I would like to read some postscript explaining how well this article has aged.

Both this document and John himself have been useful resources to me as I launch into my own career studying aging in graduate school. One thing I think would have been really helpful here is more thorough citation and sourcing. It's hard to follow John's points ("In sarcopenia, one cross-section of the long muscle cell will fail first - a “ragged red” section - and then failure gradually spreads along the length.") and trace them back to any specific source, and it's also hard to know which of the synthetic insights are original to John and which are in...

I’ll set aside what happens “by default” and focus on the interesting technical question of whether this post is describing a possible straightforward-ish path to aligned superintelligent AGI.

The background idea is “natural abstractions”. This is basically a claim that, when you use an unsupervised world-model-building learning algorithm, its latent space tends to systematically learn some patterns rather than others. Different learning algorithms will converge on similar learned patterns, because those learned patterns are a property of the world, not an ...

You can see my other reviews from this and past years, and check that I don't generally say this sort of thing:

This was the best post I've written in years. I think it distilled an idea that's perennially sorely needed in the EA community, and presented it well. I fully endorse it word-for-word today.

The only edit I'd consider making is to have the "Denial" reaction explicitly say "that pit over there doesn't really exist".

(Yeah, I know, not an especially informative review - just that the upvote to my past self is an exceptionally strong one.)

One factor no one mentions here is the changing nature of our ability to coordinate at all. If our ability to coordinate in general is breaking down rapidly, which seems at least highly plausible, then that will likely carry over to AGI, and until that reverses it will continuously make coordination on AGI harder same as everything else. 

In general, this post and the answers felt strangely non-"messy" in that sense, although there's also something to be said for the abstract view. 

In terms of inclusion, I think it's a question that deserves more thought, but I didn't feel like the answers here (in OP and below) were enlightening enough to merit inclusion. 

I chose this particular post to review because I think it does a great job of highlighting some of the biases and implicit assumptions that Zack makes throughout the rest of the sequence. Therefore this review should be considered not just a review of this post, but also all subsequent posts in Zack's sequence.

Firstly, I think the argument Zack is making here is reasonable. He's saying that if a fact is relevant to an argument it should be welcome, and if it's not relevant to an argument it should not be.

Throughout the rest of the sequence, he continues to ...

Post is very informal. It reads like, well, a personal blog post. A little in the direction of raw freewriting. It's fluid. Easy to read and relate to.

That matters, when you're trying to convey nuanced information about how minds work. Relatable means the reader is making connections with their personal experiences; one of the most powerful ways to check comprehension and increase retention. This post shows a subtle error as it appears from the inside. It doesn't surprise me that this post sparked some rich discussion in the comments.

To be frank, I'd be ve...

I still broadly agree with everything that I said in this post. I do feel that it is a little imprecise, in that I now have much more detailed and gears-y models for many of its claims. However, elaborating on those would require an entirely new post (one which I am currently working on) with a sequence's worth of prerequisites. So if I were to edit this post, I would probably mostly leave it as it is, but include a pointer to the new post once it's finished.

In terms of this post being included in a book, it is worth noting that the post situates it...

The LW team is encouraging authors to review their own posts, so:

In retrospect, I think this post set out to do a small thing, and did it well. This isn't a grand concept or a vast inferential distance, it's just a reframe that I think is valuable for many people to try for themselves.

I still bring up this concept quite a lot when I'm trying to help people through their psychological troubles, and when I'm reflecting on my own.

I don't know whether the post belongs in the Best of 2018, but I'm proud of it.

Insofar as the AI Alignment Forum is part of the Best-of-2018 Review, this post deserves to be included. It's the friendliest explanation of MIRI's research agenda (as of 2018) that currently exists.

I don't think this would fit into the 2022 review. Project Lawful has been quite influential, but I find it hard to imagine a way its impact could be included in a best-of.

Including this post in particular strikes me as misguided, as it contains none of the interesting ideas and lessons from Project Lawful, and thus doesn't make any intellectual progress.

One could try to do the distillation of finding particularly interesting or enlightening passages from the text, but that would be

  1. A huge amount of work[1], but maybe David Udell's sequence could be used...

This post publicly but non-confrontationally rebutting an argument that had been put forward and promoted by others was a tremendous community service, of a type we see too rarely, albeit far more often in this community than most. It does not engage in strawmanning, it clearly lays out both the original claim and the evidence, and it attempts to engage positively, including trying to find concrete predictions that the disputing party could agree with. 

I think this greatly moved community consensus on a moderately important topic in ways that were ver...

This post's point still seems correct, and it still seems important--I refer to it at least once a week.

I think this point is really crucial, and I was correct to make it, and it continues to explain a lot of disagreements about AI safety.

Weakly positive on this one overall. I like Coase's theory of the firm, and like making analogies with it to other things. I don't think this application quite worked, and I'm trying to write up why.

One thing I think feels off is an incomplete understanding of the Coase paper. What I think the article gets correct: Coase looks at the difference between markets (economists' preferred efficient mechanism) and firms / corporations, and observes that transaction costs (for people these would be contracts, but in general all tr...

Self-Review

I feel pretty happy with this post in hindsight! Nothing major comes to mind that I'd want to change.

I think that agency is a really, really important concept, and one of the biggest drivers of ways my life has improved. But the notion of agency as a legible, articulated concept (rather than just an intuitive notion) is foreign to a lot of people, and jargon-y. I don't think there was previously a good post cleanly explaining the concept, and I'm very satisfied that this one exists and that I can point people to it.

I particularly like my framin...

This post aims to clarify the definitions of a number of concepts in AI alignment introduced by the author and collaborators. The concepts are interesting, and some researchers evidently find them useful. Personally, I find the definitions confusing, but I did benefit a little from thinking about this confusion. In my opinion, the post could greatly benefit from introducing mathematical notation[1] and making the concepts precise at least in some very simplistic toy model.

In the following, I'll try going over some of the definitions and explicating my unde...

What this post does for me is that it encourages me to view products and services not as physical facts of our world, as things that happen to exist, but as the outcomes of an active creative process that is still ongoing and open to our participation. It reminds us that everything we might want to do is hard, and that the work of making that task less hard is valuable. Otherwise, we are liable to make the mistake of taking functionality and expertise for granted.

What is not an interface? That's the slipperiest aspect of this post. A programming language i...

An Orthodox Case Against Utility Functions was a shocking piece to me. Abram spends the first half of the post laying out a view he suspects people hold, but he thinks is clearly wrong, which is a perspective that approaches things "from the starting-point of the universe". I felt dread reading it, because it was a view I held at the time, and I used it as a key background perspective when I discussed Bayesian reasoning. The rest of the post lays out an alternative perspective that "starts from the standpoint of the agent". Instead of my beliefs being about t...

It would be slightly whimsical to include this post without any explanation in the 2020 review. Everything else in the review is so serious, we could catch a break from apocalypses to look at an elephant seal for ten seconds.

The central point of this article was that conformism was causing society to treat COVID-19 with insufficient alarm. Its goal was to give its readership social sanction and motivation to change that pattern. One of its sub-arguments was that the media was succumbing to conformity. This claim came with an implication that this post was ahead of the curve, and that it was indicative of a pattern of success among rationalists in achieving real benefits, both altruistically (in motivating positive social change) and selfishly (in finding alpha).

I thought it wo...

I think this post labels an important facet of the world, and skillfully paints it with examples without growing overlong. I liked it, and think it would make a good addition to the book.

There's a thing I find sort of fascinating about it from an evaluative perspective, which is that... it really doesn't stand on its own, and can't, as it's grounded in the external world, in webs of deference and trust. Paul Graham makes a claim about taste; do you trust Paul Graham's taste enough to believe it? It's a post about expertise that warns about snake oil salesm...

I liked this post a lot. In general, I think that the rationalist project should focus a lot more on "doing things" than on writing things. Producing tools like this is a great example of "doing things". Other examples include starting meetups and group houses.

So, I liked this post a) for being an example of "doing things", but also b) for being what I consider to be a good example of "doing things". Consider that quote from Paul Graham about "live in the future and build what's missing". To me, this has gotta be a tool that exists in the future, and I app...

Overall, you can break my and Jim's claims down into a few categories:
* Descriptions of things that had already happened, where no new information has overturned our interpretation (5)
* CDC made a guess with insufficient information, was correct (1- packages)
* CDC made a guess with insufficient information, we'll never know who was right because the terms were ambiguous (1- the state of post-quarantine individuals)
* CDC made a guess with insufficient information and we were right (1- masks)

That overall seems pretty good. It's great that covid didn't turn o...

(I am the author)

I still like & stand by this post. I refer back to it constantly. It does two things:

1. Argue that an AI-induced point of no return could come significantly before, or significantly after, world GDP growth accelerates--and indeed will probably come before!

2. Argue that we shouldn't define timelines and takeoff speeds in terms of economic growth. So, against "is there a 4 year doubling before a 1 year doubling?" and against "When will we have TAI = AI capable of doubling the economy in 4 years if deployed?"

I think both things are pretty impo...
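For readers translating the doubling-time framing above into growth rates: assuming constant exponential growth, a doubling time of T years corresponds to an annual growth rate g satisfying

$$ (1+g)^{T} = 2 \quad\Longrightarrow\quad g = 2^{1/T} - 1 $$

$$ T = 4:\; g = 2^{1/4} - 1 \approx 0.189, \qquad T = 1:\; g = 2 - 1 = 1 $$

So a 4-year doubling means roughly 19% growth per year, and a 1-year doubling means 100% growth per year.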

This post makes a straightforward analytic argument clarifying the relationship between reason and experience. The popularity of this post suggests that the ideas of cultural accumulation of knowledge, and the power of reason, have been politicized into a specious Hegelian opposition to each other. But for the most part neither Baconian science nor mathematics (except for the occasional Ramanujan) works as a human institution except by the accumulation of knowledge over time.

A good follow-up post would connect this to the ways in which modernist ideology p...

I had not read this post until just now. I think it is pretty great.

I also already had a vague belief that I should consume more timeless content. But, now I suddenly have a gearsy model that makes it a lot more intuitive why I might want to consume more timeless content. I also have a schema of how to think about "what is valuable to me?"

I bounced off this post the first couple times because, well, it opens with math and math makes my eyes glaze over and maybe it shouldn't cause that but it is what it is. I suspect it would be worth rewriting this post (or writing an alternate version), that puts the entire model in verbal-english front and center.

If this post is selected, I'd like to see the followup made into an addendum—I think it adds a very important piece, and it should have been nominated itself.

Self-review: Looking back, this post is one of the first sightings of a simple, very useful concrete suggestion to have chargers ready to go literally everywhere you might want them, and that is a remarkably large life improvement that got through to many people and that I'm very happy I realized. 

However, that could easily be more than all of this post's value, because essentially no one embraced the central concept of Dual Wielding the phones themselves. And after a few months, I stopped doing so as well, in favor of not getting confused about which p...

This post surprised me a lot. It still surprises me a lot, actually. I've also linked it a lot of times in the past year. 

The concrete context where this post has come up is in things like ML transparency research, as well as lots of theories about what promising approaches to AGI capabilities research are. In particular, there is a frequently recurring question of the type "to what degree do optimization processes like evolution and stochastic gradient descent give rise to understandable modular algorithms?". 

This was a profoundly impactful post and definitely belongs in the review. It prompted me and many others to dive deep into understanding how emotional learnings have coherence and to actually engage in dialogue with them rather than insisting they don't make sense. I've linked this post to people more than probably any other LessWrong post (50-100 times) as it is an excellent summary and introduction to the topic. It works well as a teaser for the full book as well as a standalone resource.

The post makes both conceptual and pragmatic claims. I haven't exa...

I'm trying out making some polls about posts for the Review (using the predictions feature). You can answer by hovering over the scale and clicking a number to indicate your agreement with the claim. 

Making more land out of the roughly 50 mi² of shallow water in the San Francisco Bay, south of the Dumbarton Bridge, would...

As mentioned in my comment, this book review overcame some skepticism from me and explained a new mental model about how inner conflict works. Plus, it was written with Kaj's usual clarity and humility. Recommended.

This seems to me like a valuable post, both on the object level, and as a particularly emblematic example of a category ("Just-so-story debunkers") that would be good to broadly encourage.

The tradeoff view of manioc production is an excellent insight, and is an important objection to encourage: the original post and book (which I haven't read in its entirety) appear to have leaned too heavily on what might be described as a special case of a just-so story: the phenomenon (a behavioral difference) is explained as an absolute using a post-hoc framework, and then doe...

In a field like alignment or embedded agency, it's useful to keep a list of one or two dozen ideas which seem like they should fit neatly into a full theory, although it's not yet clear how. When working on a theoretical framework, you regularly revisit each of those ideas, and think about how it fits in. Every once in a while, a piece will click, and another large chunk of the puzzle will come together.

Selection vs control is one of those ideas. It seems like it should fit neatly into a full theory, but it's not yet clear what that will look like. I revis...

This review is more broadly of the first several posts of the sequence, and discusses the entire sequence. 

Epistemic Status: The thesis of this review feels highly unoriginal, but I can't find where anyone else discusses it. I'm also very worried about proving too much. At minimum, I think this is an interesting exploration of some abstract ideas. Considering posting as a top-level post. I DO NOT ENDORSE THE POSITION IMPLIED BY THIS REVIEW (that leaving immoral mazes is bad), AND AM FAIRLY SURE I'M INCORRECT.

The rough thesis of "Meditations on Moloch"...

To effectively extend on Raemon's commentary:

I think this post is quite good, overall, and adequately elaborates on the disadvantages and insufficiencies of the Wizard's Code of Honesty beyond the irritatingly pedantic idiomatic example. However, I find the implicit thesis of the post deeply confusing (that EY's post is less "broadly useful" than it initially appears). As I understand them, the two posts are saying basically identical things, but are focused in slightly different areas, and draw very different conclusions. EY's notes the issues with the wi...

This sort of thing is exactly what Less Wrong is supposed to produce. It's a simple, straightforward and generally correct argument, with important consequences for the world, which other people mostly aren't making. That LW can produce posts like this—especially with positive reception and useful discussion—is a vindication of this community's style of thought.

I haven't thought about the bat and ball question specifically very much since writing this post, but I did get a lot of interesting comments and suggestions that have sort of been rolling around my head in background mode ever since. Here are a few I wanted to highlight:

Is the bat and ball question really different to the others? First off, it was interesting to see how much agreement there was with my intuition that the bat and ball question was interestingly different to the other two questions in the CRT. Reading through the comments I count four other p...

I view this post as providing value in three (related) ways:

  1. Making a pedagogical advancement regarding the so-called inner alignment problem
  2. Pointing out that a common view of "RL agents optimize reward" is subtly wrong
  3. Pushing for thinking mechanistically about cognition-updates

 

Re 1: I first heard about the inner alignment problem through Risks From Learned Optimization and popularizations of the work. I didn't truly comprehend it - sure, I could parrot back terms like "base optimizer" and "mesa-optimizer", but it didn't click. I was confused.

Some mon...

I feel like Project Lawful, as well as many of Lintamande's other glowfic since then, have given me a much deeper understanding of... a collection of virtues including honor, honesty, trustworthiness, etc, which I now mostly think of collectively as "Law".

I think this has been pretty valuable for me on an intellectual level—I think, if you show me some sort of deontological rule, I'm going to give a better account of why/whether it's a good idea to follow it than I would have before I read any glowfic.

It's difficult for me to separate how much of t...

Based on occasional conversations with new people, I would not be surprised if a majority of people who got into alignment between April 2022 and April 2023 did so mainly because of this post. Most of them say something like "man, I did not realize how dire the situation looked" or "I thought the MIRI folks were on it or something".

While the idea that looking at the truth even when it hurts is important isn't revolutionary in the community, I think this post gave me a much more concrete model of the benefits. Sure, I knew the abstract arguments that facing the truth is valuable, but I don't know if I'd have identified it as an essential skill for starting a company, or identified its absence as a critical component of staying in a bad relationship. (I think my model of bad relationships was that people knew leaving was a good idea but were unable to act on that information; in retrospect, a total inability to even consider leaving might be what's going on some of the time.)

I still endorse the breakdown of "sharp left turn" claims in this post. Writing this helped me understand the threat model better (or at all) and make it a bit more concrete.

This post could be improved by explicitly relating the claims to the "consensus" threat model summarized in Clarifying AI X-risk. Overall, SLT seems like a special case of that threat model, which makes a subset of the SLT claims: 

  • Claim 1 (capabilities generalize far) and Claim 3 (humans fail to intervene), but not Claims 1a/b (simultaneous / discontinuous generalization) or Claim...

I continue to endorse this categorization of threat models and the consensus threat model. I often refer people to this post and use the "SG + GMG → MAPS" framing in my alignment overview talks. I remain uncertain about the likelihood of the deceptive alignment part of the threat model (in particular the requisite level of goal-directedness) arising in the LLM paradigm, relative to other mechanisms for AI risk. 

In terms of adding new threat models to the categorization, the main one that comes to mind is Deep Deceptiveness (let's call it Soares2), whi...

This is the kind of post I'd love to see more of on LW, so it's with a heavy heart I insist it not go into the review without edits. This post needs to mention interactions with medication much earlier and more prominently. It would be nice if you could count on people not taking suggestions like this until they'd read the entire post and done some of their own research, but you can't, and I think this caveat is important enough and easy enough to explain that it is worth highlighting at the very beginning. 

If you're using it regularly you also need to consider absorption of nutrients, although that's probably not a big deal for most people when used occasionally?

I think I still mostly stand behind the claims in the post, i.e. nuclear is undervalued in most parts of society but it's not as much of a silver bullet as many people in the rationalist / new liberal bubble would make it seem. It's quite expensive and even with a lot of research and de-regulation, you may not get it cheaper than alternative forms of energy, e.g. renewables. 

One thing that bothered me after the post is that Johannes Ackva (who's arguably a world-leading expert in this field) and Samuel and I just didn't seem to be able to communicate w...

[This is a review for the whole sequence.]

I think of LessWrong as a place whose primary purpose is and always has been to develop the art of rationality. One issue is that this mission tends to attract a certain kind of person -- intelligent, systematizing, deprioritizing social harmony, etc -- and that can make it harder for other kinds of people to participate in the development of the art of rationality. But rationality is for everyone, and ideally the art would be equally accessible to all.

This sequence has many good traits, but one of the most disting...

This post consists of comments on summaries of a debate about the nature and difficulty of the alignment problem. The original debate was between Eliezer Yudkowsky and Richard Ngo, but this post does not contain the content from that debate. This post mostly consists of commentary by Jaan Tallinn on that debate, with comments by Eliezer.

The post provides a fascinating level of insight into true insider conversations about AI alignment. How do Eliezer and Jaan converse about alignment? Sure, this is a public setting, so perhaps they communicate differentl...

I think this post was valuable for starting a conversation, but isn't the canonical reference post on Frame Control I'd eventually like to see in the world. But re-reading the comments here, I am struck by the wealth of great analysis and ideas in the ensuing discussion.

John Wentworth's comment about Frame Independence:

The most robust defense against abuse is to foster independence in the corresponding domain. [...] The most robust defense against financial abuse is to foster financial independence [...] if I am not independent in some domain, then I am...

I was always surprised that small changes in public perception, such as a slight shift in consumption or political opinion, can have large effects. This post introduced the concept of social behaviour curves to me, and it feels like it explains quite a lot of things. The writer presents some example behaviours and movements (like why revolutions start slowly or why societal changes are sticky), and then provides clear explanations for them using this model. This shows how to use social behaviour curves and verifies some of the model's predictions at the s...
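The threshold idea behind social behaviour curves can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the post's own model: it assumes a Granovetter-style setup in which each person joins once at least some number of people already have, and the helper names `adopters_given` and `settle` are hypothetical. Repeatedly asking "how many people would act, given how many already do?" until the answer stops changing finds the equilibrium the group settles into.

```python
def adopters_given(thresholds, current):
    """Count people whose threshold is met when `current` people already act.

    Assumption: thresholds[i] is the (hypothetical) number of adopters
    person i needs to see before joining in.
    """
    return sum(1 for t in thresholds if t <= current)


def settle(thresholds, start=0):
    """Iterate the behaviour curve until it reaches a fixed point (an equilibrium)."""
    current = start
    while True:
        nxt = adopters_given(thresholds, current)
        if nxt == current:
            return current
        current = nxt


if __name__ == "__main__":
    cascade = list(range(10))                  # thresholds 0, 1, 2, ..., 9
    stalled = [0, 2, 2, 3, 4, 5, 6, 7, 8, 9]   # one person's threshold nudged from 1 to 2
    print(settle(cascade))            # 10: everyone ends up joining
    print(settle(stalled))            # 1: the cascade stops after the first adopter
    print(settle(stalled, start=10))  # 10: full adoption, once reached, is also stable
```

Under these toy assumptions, nudging a single person's threshold turns a full cascade into a stall at one adopter (a small change with a large effect), while the same group, once fully adopted, stays there (stickiness).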

I've written a bunch elsewhere about object-level thoughts on ELK. For this review, I want to focus instead on meta-level points.

I think ELK was very well-made; I think it did a great job of explaining itself with lots of surface area, explaining a way to think about solutions (the builder-breaker cycle), bridging the gap between toy demonstrations and philosophical problems, and focusing lots of attention on the same thing at the same time. In terms of impact on the growth and development of the AI safety community, I think this is one of the most importa...

I liked this post, but I don't think it belongs in the review. It's very long, it needs Zoe's also-very-long post for context, and almost everything you'll learn is about Leverage specifically, with few generalizable insights. There are some exceptions ("What to do when society is wrong about something?" would work as a standalone post, for example), but they're mostly just interesting questions without any work toward a solution. I think the relatively weak engagement that it got, relative to its length and quality, reflects that: Less W...

Self-Review

If you read this post, and wanted to put any of it into practice, I'd love to hear how it went! Whether you tried things and it failed, tried things and it worked, or never got round to trying anything at all. It's hard to reflect on a self-help post without data on how much it helped!

Personal reflections: I overall think this is pretty solid advice, and am very happy I wrote this post! I wrote this a year and a half ago, about an experiment I ran 4 years ago, and given all that, this holds up pretty well. I've refined my approach a fair bit, b...

Elephant seal is a picture of an elephant seal. It has a mysterious Mona Lisa smile that I can't pin down, that shows glee, intent, focus, forward-looking-ness, and satisfaction. It's fat and funny-looking. It looks very happy lying on the sand. I give this post a +4.

(This review is taken from my post Ben Pace's Controversial Picks for the 2020 Review.)

Introduction to Cartesian Frames is a piece that also gave me a new philosophical perspective on my life. 

I don't know how to simply describe it. I don't know what even to say here. 

One thing I can say is that the post formalized the idea of having "more agency" or "less agency", in terms of "what facts about the world can I force to be true?". The more I approach the world by stating things that are going to happen, that I can't change, the more I'm boxing-in my agency over the world. The more I treat constraints as things I could fight to chang...

This post is still endorsed; it still feels like a continually fruitful line of research. A notable aspect of it is that, as time goes on, I keep finding more connections and crisper ways of viewing things, which means that for many of the further linked posts about inframeasure theory, I think I could explain them from scratch better than the existing work does. One striking example is that the "Nirvana trick" stated in this intro (to encode nonstandard decision-theory problems), has transitioned from "weird hack that happens to work" to "pops straight out...

Why This Post Is Interesting

This post takes a previously-very-conceptually-difficult alignment problem, and shows that we can model this problem in a straightforward and fairly general way, just using good ol' Bayesian utility maximizers. The formalization makes the Pointers Problem mathematically legible: it's clear what the problem is, it's clear why the problem is important and hard for alignment, and that clarity is not just conceptual but mathematically precise.

Unfortunately, mathematical legibility is not the same as accessibility; the post does have...

Ajeya's timelines report is the best thing that's ever been written about AI timelines imo. Whenever people ask me for my views on timelines, I go through the following mini-flowchart:

1. Have you read Ajeya's report?

--If yes, launch into a conversation about the distribution over 2020's training compute and explain why I think the distribution should be substantially to the left, why I worry it might shift leftward faster than she projects, and why I think we should use it to forecast AI-PONR instead of TAI.

--If no, launch into a conversation about Ajey...

This post is both a huge contribution, giving a simpler and shorter explanation of a critical topic, with a far clearer context, and has been useful to point people to as an alternative to the main sequence. I wouldn't promote it as more important than the actual series, but I would suggest it as a strong alternative to including the full sequence in the 2020 Review. (Especially because I suspect that those who are very interested are likely to have read the full sequence, and most others will not even if it is included.)

Looking back, this all seems mostly correct, but it's missing a couple of assumed steps.

 I've talked to one person since about their mild anxiety talking to certain types of people; I found two additional steps that helped them.

  1. Actually trying to become better
  2. Understanding that their reaction is appropriate for some situations (like the original trauma), but it's overgeneralized to actually safe situations.

These steps are assumed in this post because, in my case, it's obvious I'm overreacting (there's no drone) and I understand PTSD is common and treat...

For the Review, I'm experimenting with using the predictions feature to poll users for their opinions about claims made in posts. 

The first two cite Scott almost verbatim, but for the third I tried to specify further.

Feel free to add your predictions above, and let me know if you have any questions about the experienc...

Biorisk: well, wouldn't it be nice if we'd all been familiar with the main principles of biorisk before 2020? I certainly regretted sticking my head in the sand.

> If concerned, intelligent people cannot articulate their reasons for censorship, cannot coordinate around principles of information management, then that itself is a cause for concern. Discussions may simply move to unregulated forums, and dangerous ideas will propagate through well intentioned ignorance.

Well. It certainly sounds prescient in hindsight, doesn't it?

Infohazards in particular cro...

One year later, I remain excited about this post, from its ideas, to its formalisms, to its implications. I think it helps us formally understand part of the difficulty of the alignment problem. This formalization of power and the Attainable Utility Landscape have together given me a novel frame for understanding alignment and corrigibility.

Since last December, I’ve spent several hundred hours expanding the formal results and rewriting the paper; I’ve generalized the theorems, added rigor, and taken great pains to spell out what the theorems do and do not...

"Epistemic Status: Confident"?

That's surprising to me.

I skipped past that before reading, and read it as fun, loose speculation. I liked it, as that.

But I wouldn't have thought it deserves "confident".

I'm not sure if I should give it less credence or more, now.

I'm confused.

When this article came out, I put a bit of money into alternate cryptocurrencies that I thought might have upside. They are now worth less than I invested.

I think it's good to review how you did in the past, but it's important not to overlearn specific lessons. In retrospect, I think that this article should have put more emphasis on that point.

Scott wonders how anyone could ever find this surprising. I think it's like many things - the underlying concept is obviously there once you point it out, but it's easier not to think about or notice it, and easier not to have a model of what's going on beyond a vague sense that it is there and that this counts as the virtuous level of noticing.

My sense over time of how important this is gets bigger, not smaller, and I see almost no one properly noticing the taste of the Lotus. So this seems like one of the most important posts.

“The Tails Coming Apart as a Metaphor for Life” should be retitled “The Tails Coming Apart as a Metaphor for Earth since 1800.” Scott does three things: 1) he notices that happiness research is framing-dependent, 2) he notices that happiness is a human-level term, but not specific at the extremes, and 3) he considers how this relates to deep-seated divergences in moral intuitions becoming ever more apparent in our world.

He hints at why moral divergence occurs with his examples. His extreme case of hedonic utilitarianism, converting...

Most people who commented on this post seemed to recognise it from their experience and get a general idea of what the different cultures look like (although some people differ on the details, see later). This is partly because it is explained well but also because I think the names were chosen well.

Here are a few people saying that they have used/referenced it: 1, 2, 3 plus me.

From a LW standpoint, thinking about this framing helps me not be offended by blunt comments. My family was very combat culture, but in life in general I find people are unwilling...

I'm generally in favor of public praise and private criticism, but this post really rubbed me the wrong way. To me it reads as a group of neurotic people getting together to try to get out of neuroticism by being even more neurotic at each other. Or, that in a quest to avoid interacting with the layer of intentions, let's go arbitrarily deep on the recursion stack at the algorithmic/strategy layer of understanding.

I was also really bothered by calling a series of reactions spread over time "levels of meta". Actually going meta would be paying attention to the structure of the back and forth rather than the individual steps in the back and forth.

Epistemics: Yes, it is sound. Not because of claims (they seem more like opinions to me), but because it is appropriately charitable to those that disagree with Paul, and tries hard to open up avenues of mutual understanding.

Valuable: Yes. It provides new third paradigms that bring clarity to people with different views. Very creative, good suggestions.

Should it be in the Best list?: No. It is from the middle of a conversation, and would be difficult to understand if you haven't read a lot about the 'Foom debate'.

Improved: The same concepts...

I thought this post and associated paper was worse than Richard's previous sequence "AGI safety from first principles", but despite that, I still think it's one of the best pieces of introductory content for AI X-risk. I've also updated that good communication around AI X-risk stuff will probably involve writing many specialized introductions that work within the epistemic frames and methodologies of many different communities, and I think this post does reasonably well at that for the ML community (though I am not a great judge of that).

This is a great complement to Eliezer's 'List of lethalities' in particular because, in cases of disagreement, the beliefs of most people working on the problem were, and still mostly are, closer to this post. Paul writing it provided a clear, well-written reference point, and, with many others expressing their views in comments and other posts, helped make the beliefs in AI safety more transparent.

I still occasionally reference this post when talking to people who, after reading a bit about the debate (e.g. on social media), first form an oversimplified model of the...

This is one of the few posts on LW from 2022 that I shared with people completely unrelated to the scene, because it was so fun.

Sometimes posts don't have to be about the most important issues to be good. They can just be good.

Meta level: I wrote this post in 1-3 hours, and am very satisfied with the returns per unit time! I don't think this is the best or most robust post I could have written, and I think some of these theories of impact are much more important than others. But I think that just collecting a ton of these in the same place was a valuable thing to do, and have heard from multiple people who appreciated this post's existence! More importantly, it was easy and fun, and I personally want to take this as inspiration to find more easy-to-write-yet-valuable things to d...

I haven't talked to that many academics about AI safety over the last year but I talked to more and more lawmakers, journalists, and members of civil society. In general, it feels like people are much more receptive to the arguments about AI safety. Turns out "we're building an entity that is smarter than us but we don't know how to control it" is quite intuitively scary. As you would expect, most people still don't update their actions but more people than anticipated start spreading the message or actually meaningfully update their actions (probably still less than 1 in 10 but better than nothing).

Since this post was written, OpenAI has done much more to communicate its overall approach to safety, making this post somewhat obsolete. At the time, I think it conveyed some useful information, although it was perceived as more defensive than I intended.

My main regret is bringing up the Anthropic split, since I was not able to do justice to the topic. I was trying to communicate that OpenAI maintained its alignment research capacity, but should have made that point without mentioning Anthropic.

Ultimately I think the post was mostly useful for sparking some interesting discussion in the comments.

I think this post makes a true and important point, a point that I also bring up from time to time.

I do have a complaint though: I think the title (“Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc”) is too strong. (This came up multiple times in the comments.)

In particular, suppose it takes N unlabeled parameters to solve a problem with deep learning, and it takes M unlabeled parameters to solve the same problem with probabilistic programming. And suppose that M<N, or even M<<N, which I think is generally plausible.

If P...

Summary

  • public discourse of politics is too focused on meta and not enough focused on object level
  • the downsides are primarily in insufficient exploration of possibility space

Definitions

  • "politics" is topics related to government, especially candidates for elected positions, and policy proposals
  • opposite of meta is object level - specific policies, or specific impacts of specific actions, etc
  • "meta" is focused on intangibles that are an abstraction away from some object-level feature, X, e.g. someone's beliefs about X, or incentives around X, or media coverage v...

I really like this post. It's a crisp, useful insight, made via a memorable concrete example (plus a few others), in a very efficient way. And it has stayed with me. 

  • Paul's post on takeoff speed had long been IMO the last major public step in the dialogue on this subject (not forgetting to honorably mention Katja's crazy discontinuous progress examples and Kokotajlo's arguments against using GDP as a metric), and I found it exceedingly valuable to read how it reads to someone else who has put a great deal of work into figuring out what's true about the topic, thinks about it in very different ways, and has come to different views on it. I found this very valuable for my own understanding of the subject, and I felt I...

I was less black-pilled when I wrote this. I also had the idea that, though my own attempts to learn AI safety stuff had failed spectacularly, perhaps I could encourage more gifted people to try the same. And given my skills or lack thereof, I was hoping this might be some way I could have an impact, as trying is the first filter. Though the world looks scarier now than when I wrote this, to those of high ability I would still say this: we are very close to a point where your genius will not be remarkable, where one can squeeze thoughts more beautiful and clear than you have any hope to achieve from a GPU. If there was ever a time to work on the actually important problems, it is surely now.

A great example of taking the initiative and actually trying something that looks useful, even when it would be weird or frowned upon in normal society. I would like to see a post-review, but I'm not even sure if that matters. Going ahead and trying something that seems obviously useful, but weird and no one else is doing is already hard enough. This post was inspiring. 

My first reaction when this post came out was being mad Duncan got the credit for an idea I also had, and wrote a different post than the one I would have written if I'd realized this needed a post. But at the end of the day the post exists and my post is imaginary, and it has saved me time in conversations with other people because now they have the concept neatly labeled.

Since writing this, I've run across even more examples:

  • The transatlantic telegraph was met with celebrations similar to the transcontinental railroad, etc. (somewhat premature as the first cable broke after two weeks). Towards the end of Samuel Morse's life and/or at his death, he was similarly feted as a hero.
  • The Wright Brothers were given an enormous parade and celebration in their hometown of Dayton, OH when they returned from their first international demonstrations of the airplane.

I'd like to write these up at some point.

Related: The poetry of progress (another form of celebration, broadly construed)

I haven't had time to reread this sequence in depth, but I wanted to at least touch on how I'd evaluate it. It seems to be aiming to be both a good introductory sequence, while being a "complete and compelling case I can for why the development of AGI might pose an existential threat".

The question is who this sequence is for, what its goal is, and how it compares to other writing targeting similar demographics.

Some writing that comes to mind to compare/contrast it with includes:

...

I remember this post very fondly. I often thought back to it and it inspired some thoughts of my own about rationality (which I had trouble writing down and are waiting in a draft to be written fully some day). I haven't used any of the phrases introduced here (Underperformance Swamp, Sinkholes of Sneer, Valley of Disintegration...), and I'm not sure whether it was the intention.

The post starts with the claim that rationalists "basically got everything about COVID-19 right and did so months ahead of the majority of government officials, journalists, and su...

If coordination services command high wages, as John predicts, this suggests that demand is high and supply is limited. Here are some reasons why this might be true:

  1. Coordination solutions scale linearly (because the problem is a general one) or exponentially (due to networking effects).
  2. Coordination is difficult, unpleasant, risky work.
  3. Coordination relies on further resources that are themselves in limited supply or on information that has a short life expectancy, such as involved personal relationships, technical knowhow that depends on a lot of implicit k...

The killer advice here was masks, which was genuinely controversial in the larger world at the time. When we wrote a summary of the best advice two weeks later, masks were listed under "well duh but also there's a shortage".

Of the advice that we felt was valuable enough to include in the best-of summary, but hadn't gone to fixation yet, there were 4.5 tips. Here's my review of those:

Cover your high-touch surfaces with copper tape 

I think the science behind this was solid, but turned out to be mostly irrelevant for covid-19 because it was so domi...

I wrote this relatively early in my journey of self-studying neuroscience. Rereading this now, I guess I'm only slightly embarrassed to have my name associated with it, which isn’t as bad as I expected going in. Some shifts I’ve made since writing it (some of which are already flagged in the text):

  • New terminology part 1: Instead of “blank slate” I now say “learning-from-scratch”, as defined and discussed here.
  • New terminology part 2: “neocortex vs subcortex” → “learning subsystem vs steering subsystem”, with the former including the whole telencephalon and...

(I am the author)

I still like & endorse this post. When I wrote it, I hadn't read more than the wiki articles on the subject. But then afterwards I went and read 3 books (written by historians) about it, and I think the original post held up very well to all this new info. In particular, the main critique the post got -- that disease was more important than I made it sound, in a way that undermined my conclusion -- seems to have been pretty wrong. (See e.g. this comment thread, these follow up posts)

So, why does it matter? What contribution did this po...

We all saw the GPT performance scaling graphs in the papers, and we all stared at them and imagined extending the trend for another five OOMs or so... but then Lanrian went and actually did it! Answered the question we had all been asking! And rigorously dealt with some technical complications along the way.

I've since referred to this post a bunch of times. It's my go-to reference when discussing performance scaling trends.

It seems like the core thing that this post is doing is treating the concept of "rule" as fundamental. 

If you have a general rule plus some exceptions, then obviously that "general rule" isn't the real process that is determining the results. And noticing that (obvious once you look at it) fact can be a useful insight/reframing.

The core claim that this post is putting forward, IMO, is that you should think of that "real process" as being a rule, and aim to give it the virtues of good rules such as being simple, explicit, stable, and legitimate (having...

This post seems to me to be misunderstanding a major piece of Paul's "sluggish updating" post, and clashing with Paul's post in ways that aren't explicit.

The core of Paul's post, as I understood it, is that incentive landscapes often reward people for changing their stated views too gradually in response to new arguments/evidence, and Paul thinks he has often observed this behavioral pattern which he called "sluggish updating." Paul illustrated this incentive landscape through a story involving Alice and Bob, where Bob is thinking through his optimal strat...

The author does a good job articulating his views on why Buddhist concentration and insight practices can lead to psychological benefits. As somebody who has spent years practicing these practices and engaging with various types of (Western) discourse about them, the author's psychological claims seem plausible to a point. He does not offer a compelling mechanism for why introspective awareness of sankharas should lead to diminishing them. He also offers no account of why, if insight does dissolve psychological patterns, it would preferentially dissolve ne...

After reading this, I went back and also re-read Gears in Understanding (https://www.lesswrong.com/posts/B7P97C27rvHPz3s9B/gears-in-understanding) which this is clearly working from. The key question to me was, is this a better explanation for some class of people? If so, it's quite valuable, since gears are a vital concept. If not, then it has to introduce something new in a way that I don't see here, or it's not worth including.

It's not easy to put myself in the mind of someone who doesn't know about gears. 

I think the original Gears in Understandin...

The central point here seems strong and important. One can, as Scott notes, take it too far, but mostly yes one should look where there are very interesting things even if the hit rate is not high, and it's important to note that. Given the karma numbers involved and some comments sometimes being included I'd want assurance that we wouldn't include any of that with regard to particular individuals. 

That comment section, though, I believe has done major harm and could keep doing more even in its current state, so I still worry about bringing more focus...

I didn't feel like I fully understood this post at the time when it was written, but in retrospect it feels like it's talking about essentially the same thing as Coherence Therapy does, just framed differently.

Any given symptom is coherently produced, in other words, by either (1) how the individual strives, without conscious awareness, to carry out strategies for safety or well-being; or (2) how the individual responds to having suffered violations of safety or well-being. This model of symptom production is squarely in accord with the construct...

As you would expect from someone who was one of the inspirations for the post, I strongly approve of the insight/advice contained herein. I also agree with the previous review that there is not a known better write-up of this concept. I like that this gets the thing out there compactly.

Where I am disappointed is that this does not feel like it gets across the motivation behind this or why it is so important. I neither read this and think 'yes, that explains why I care about this so much' nor 'I expect that this would move the needle much on p...

"Caring less" was in the air. People were noticing the phenomenon. People were trying to explain it. In a comment, I realized that I was in effect telling people to care less about things without realizing what I was doing. All we needed was a concise post to crystallize the concept, and eukaryote obliged.

The post, especially the beginning, gets straight to the point. It asks the question of why we don't hear more persuasion in the form of "care less", offers a realistic example and a memorable graphic, and calls to action. This is...

I still generally endorse this post, though I agree with everyone else's caveats that many arguments aren't like this. The biggest change is that I feel like I have a slightly better understanding of "high-level generators of disagreement" now, as differences in priors, contexts, and categorizations - see my post "Mental Mountains" for more.

TurnTrout is obviously correct that "robust grading is... extremely hard and unnatural" and that loss functions "chisel circuits into networks" and don't directly determine the target of the product AI. Where he loses me is the part where he suggests that this makes alignment easier and not harder. I think that all this just means we have even less control over the policy of the resulting AI, the default end case being some bizarre construction in policyspace with values very hard to determine based on the recipe. I don't understand what point he's making in the above post that contradicts this.

Reading Project Lawful (so far, which is the majority of Book 1) has given me a strong mental pointer to the question of "how to model a civilization that you find yourself in" and "what questions to ask when trying to improve it and fix it", from a baseline of not really having a pointer to this at all (I have only lived in one civilization and I've not been dropped into a new one before). I would do many things differently to Keltham (I suspect I'd build prediction markets before trying to scale up building roads) but it's nonetheless extremely valuable ...

I think Redwood's classifier project was a reasonable project to work towards, and I think this post was great because it both displayed a bunch of important virtues and avoided doubling down on trying to always frame one's research in a positive light. 

I was really very glad to see this update come out at the time, and it made me hopeful that we can have a great discourse on LessWrong and AI Alignment where when people sometimes overstate things, they can say "oops", learn and move on. My sense is Redwood made a pretty deep update from the first post they published (and this update), and hasn't made any similar errors since then.

The thing I want most from LessWrong and the Rationality Community writ large is the martial art of rationality. That was the Sequences post that hooked me, that is the thing I personally want to find if it exists, that is what I thought CFAR as an organization was pointed at.

When you are attempting something that many people have tried before (and to be clear, "come up with teachings to make people better" is something that many, many people have tried before), it may be useful to look and see what went wrong last time.

In the words of Scott Alexander, "I’m...

I currently think that the case study of computer security is one of the best places to learn about the challenges that AI control and AI Alignment projects will face. Despite that, I haven't seen much writing that tries to bridge the gap between computer security and AI safety. This post is one of the few that does, and I think it does so reasonably well.

Sharp Left Turn: a more important problem (and a more specific threat model) than people usually think

The sharp left turn is not a simple observation that we've seen capabilities generalise more than alignment. As I understand it, it is a more mechanistic understanding that some people at MIRI have, of dynamics that might produce systems with generalised capabilities but not alignment.

Many times over the past year, I've been surprised by people in the field who've read Nate's post but somehow completely missed the part where it talks about specific dynamic...

I wrote a review here. There, I identify the main generators of Christiano's disagreement with Yudkowsky[1] and add some critical commentary. I also frame it in terms of a broader debate in the AI alignment community.

  1. ^

    I divide those into "takeoff speeds", "attitude towards prosaic alignment" and "the metadebate" (the last one is about what debate norms we should have about this and what kinds of arguments we should listen to).

This post was, in the end, largely a failed experiment. It did win a lesser prize, and in a sense that proved its point, and I had fun doing it, but I do not think it successfully changed minds, and I don't think it has lasting value, although someone gave it a +9 so it presumably worked for them. The core idea - that EA in particular wants 'criticism' but it wants it in narrow friendly ways and it discourages actual substantive challenges to its core stuff - does seem important. But also this is LW, not EA Forum. If I had to do it over again, I wouldn't bother writing this.

Retrospective: I think this is the most important post I wrote in 2022. I deeply hope that more people benefit by fully integrating these ideas into their worldviews. I think there's a way to "see" this lesson everywhere in alignment: for it to inform your speculation about everything from supervised fine-tuning to reward overoptimization. To see past mistaken assumptions about how learning processes work, and to think for oneself instead. This post represents an invaluable tool in my mental toolbelt.

I wish I had written the key lessons and insights more p...

An excellent article that gives a lot of insight into LLMs. I consider it a significant piece of deconfusion.

Earlier this year I spent a lot of time trying to understand how to do research better. This post was one of the few resources that actually helped. It described several models that I resonated with, but which I had not read anywhere else. It essentially described a lot of the things I was already doing, and this gave me more confidence in deciding to continue doing full time AI alignment research. (It also helps that Karnofsky is an accomplished researcher, and so his advice has more weight!)

This post very cleverly uses Conway's Game of Life as an intuition pump for reasoning about agency in general. I found it both compelling and a natural extension of the other work on LW relating to agency & optimization. The post also spurred some object-level engineering work in Life, trying to find a pattern that clears away Ash. It also spurred people in the comments to think more deeply about the implications of the reversibility of the laws of physics. It's also reasonably self-contained, making it a good candidate for inclusion in the Review books.

Every few weeks I have the argument with someone that clearly AI will increase GDP drastically before it kills everyone. The arguments in this post are usually my first response. GDP doesn't mean what you think it means, and we don't actually really know how to measure economic output in the context of something like an AI takeoff, and this is important because that means you can't use GDP as a fire alarm, even in slow takeoff scenarios. 

This post has continued to be an important part of Ray's sense of How To Be a Good Citizen, and was one of the posts feeding into the Coordination Frontier sequence.

As an experiment, I'm going to start by listing the things that stuck with me, without rereading the post, and then note down things that seem important upon actual re-reading:

Things that stuck with me

  • If you're going to break an agreement, let people know as early as possible.
  • Try to take on as much of the cost of the reneging as you can.
  • Insofar as you can't take on the costs of reneging in...

On the whole I agree with Raemon’s review, particularly the first paragraph.

A further thing I would want to add (which would be relatively easy to fix) is that the description and math of the Kelly criterion is misleading / wrong.

The post states that you should:

bet a percentage of your bankroll equivalent to your expected edge

However the correct rule is:

bet such that you are trying to win a percentage of your bankroll equal to your percent edge.

(emphasis added)

The 2 definitions give the same results for 1:1 bets but will give strongly diverging r...
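To make the difference concrete, assume a simple binary bet with win probability p and net odds b (a win pays b per unit staked). The expected edge per unit staked is bp - (1 - p), and the Kelly fraction is f* = (bp - (1 - p)) / b, i.e. the edge divided by the odds. Staking f* means a win gains f*·b = edge of the bankroll, which is the corrected rule above; the two rules only coincide when b = 1. A minimal Python sketch under those assumptions (the function names are illustrative, not from the post):

```python
import math


def edge(p: float, b: float) -> float:
    """Expected profit per unit staked on a bet paying b-to-1 with win probability p."""
    return b * p - (1 - p)


def kelly_fraction(p: float, b: float) -> float:
    """Kelly stake as a fraction of bankroll: the edge divided by the net odds b."""
    return edge(p, b) / b


def naive_fraction(p: float, b: float) -> float:
    """The misstated rule: stake a fraction of bankroll equal to the edge itself."""
    return edge(p, b)


def log_growth(p: float, b: float, f: float) -> float:
    """Expected log growth of the bankroll per bet when staking fraction f."""
    if f >= 1.0:
        # Losing a stake of the whole bankroll (or more) leaves nothing: log(0) is -infinity.
        return float("-inf")
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)


for p, b in [(0.6, 1.0), (0.5, 3.0)]:
    k, n = kelly_fraction(p, b), naive_fraction(p, b)
    print(f"p={p}, b={b}: Kelly={k:.3f} (growth {log_growth(p, b, k):.4f}), "
          f"edge-rule={n:.3f} (growth {log_growth(p, b, n):.4f})")
```

At even odds with p = 0.6 both rules stake 20% of the bankroll. At 3:1 odds with p = 0.5 the edge is 1.0, so the "bet your edge" rule stakes everything and is ruined by the first loss, while Kelly stakes one third.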

I like what this post is trying to do more than I like this post. (I still gave it a +4.)

That is, I think that LW has been flirting with meditation and similar practices for years, and this sort of 'non-mystical explanation' is essential to make sure that we know what we're talking about, instead of just vibing. I'm glad to see more of it.

I think that no-self is a useful concept, and had written a (shorter, not attempting to be fully non-mystical) post on the subject several months before. I find myself sort of frustrated that there isn't a clear sentence ...

There's a lot of attention paid these days to accommodating the personal needs of students. For example, a student with PTSD may need at least one light on in the classroom at all times. Schools are starting to create mechanisms by which a student with this need can have it met more easily.

Our ability to do this depends on a lot of prior work. The mental health community had to establish PTSD as a diagnosis; the school had to create a bureaucratic mechanism to normalize accommodations of this kind; and the student had to spend a significant amount of time ...

tl;dr – I'd include Daniel Kokotajlo's 2x2 grid model in the book, as an alternate take on Simulacra levels.

Two things feel important to me about this Question Post:

  • This post kicked off discussion of how the evolving Simulacra Level definitions related to the original Baudrillard example. Zvi followed up on that here. This feels "historically significant", but not necessarily something that's going to stand the test of time as important in its own right.
  • Daniel Kokotajlo wrote AFAICT the first instance of the alternate 2x2 Grid model of Simulacrum levels. T...

This post holds up well in hindsight.  I still endorse most of the critiques here, and the ones I don't endorse are relatively unimportant.  Insofar as we have new evidence, I think it tends to support the claims here.

In particular:

  • Framing few-shot learning as "meta-learning" has caused a lot of confusion.  This framing made little sense to begin with, for the reasons I note in this post, and there is now some additional evidence against it.
  • The paper does very little to push the envelope of what is possible in NLP, even though GPT-3 is proba...

This post's main contribution is the formalization of game-theoretic defection as gaining personal utility at the expense of coalitional utility.

Rereading, the post feels charmingly straightforward and self-contained. The formalization feels obvious in hindsight, but I remember being quite confused about the precise difference between power-seeking and defection, perhaps because popular examples of taking over the world are also defections against the human/AI coalition. I now feel cleanly deconfused about this distinction. And if I was confused about...

Self review: I'm very flattered by the nomination!

Reflecting back on this post, a few quick thoughts:

  • I put a lot of effort into getting better at teaching, especially during my undergrad (publishing notes, mentoring, running lectures, etc). In hindsight, this was an amazing use of time, and has been shockingly useful in a range of areas. It makes me much better at field-building, facilitating fellowships, and writing up thoughts. Recently I've been reworking the pedagogy for explaining transformer interpretability work at Anthropic, and I've been shocked a...

Self-review: Looking at the essay a year and a half later, I am still reasonably happy with it.

In the meantime I've seen Swiss people recommending it as an introductory text for people asking about the Swiss political system, so I am, of course, honored, but it also gives me some confidence that I'm not totally off.

If I had to write the essay again, I would probably give less prominence to direct democracy and more to concordance and decentralization, which are less eye-catching but in a way more interesting/important.

Also, I would probably pay some attention ...

I did a lot of writing at the start of covid, most of which was eventually eclipsed by new information (thank God). This is one of a few pieces I wrote during that time that I refer to frequently, in my own thinking and in conversation with others. The fact that even very exogenous-looking changes to the economy are driven by economic fuckery behind the scenes was very clarifying for me in examining the economy as a whole.

This is one of those posts, like "when money is abundant, knowledge is the real wealth," that combines a memorable and informative and very useful and important slogan with a bunch of argumentation and examples to back up that slogan. I think this type of post is great for the LW review.

I haven't found this advice super applicable to my own life (because I already generally didn't do things that were painful...) but it has found application in my thinking and conversation with friends. I think it gets at an important phenomenon/problem for many people and provides a useful antidote.

This is one of those posts, like "pain is not the unit of effort," that combines a memorable and informative and very useful and important slogan with a bunch of argumentation and examples to back up that slogan. I think this type of post is great for the LW review.

When I first read this post, I thought it was boring and unimportant: trivially, there will be some circumstances where knowledge is the bottleneck, because for pretty much all X there will be some circumstances where X is the bottleneck.

However, since then I've ended up saying the slogan "when ...

I think practical posts with exercises are underprovided on LessWrong, and that this sequence in particular inculcated a useful habit. Babble wasn't a totally new idea, but I see people use it more now than they did before this sequence.

Worth including both for "Reveal Culture" as a concept, and for the more general thoughts on "what is required for a culture."

People I know still casually refer to Tell Culture, and I still wish they would instead casually refer to Reveal Culture, which seems like a frame much less likely to encourage people to shoot themselves in the foot. 

I still end up using the phrase "Tell Culture" when it comes up in meta-conversation, because I don't expect most people to have heard of Reveal Culture and I'd have to stop and explain the difference. I'm annoyed by that, and hope for this post to become more common knowledge.

This is a very important point to have intuitively integrated into one's model, and I charge a huge premium to activities that require this kind of reliability. I hope it makes the cut.

I also note that someone needs to write The Costs of Unreliability and I authorize reminding me in 3 months that I need to do this.

This post is hard enough to get through that the original person who nominated it didn't make it, and also I tried and gave up in order to look at more other things instead. I agree that it's possible there is something here, but we didn't build upon it, and if we put it in the book people are going to be confused as to what the hell is going on. I don't think we should include. 

This feels like an extremely important point. A huge number of arguments devolve into exactly this dynamic because each side only feels one of (the Rock|the Hard Place) as a viscerally real threat, while agreeing that the other is intellectually possible. 

Figuring out that many, if not most, life decisions are "damned if you do, damned if you don't" was an extremely important tool for me to let go of big, arbitrary psychological attachments which I initially developed out of fear of one nasty outcome.

I love this post; it's a really healthy way of exploring assumptions about one's goals and subagents. I think it's really hard to come up with simple diagrams that communicate key info, and I am impressed by choices such as changing the color of the path over time. I also find it insightful in matters relating to what a distracted agent looks like, or how adding subgoals can improve things.

It's the sort of thing I'd like to see more rationalists doing, and it's a great read, and I feel very excited about more of this sort of work on LessWrong. I hope it inspires more LessWrongers to build on it. I expect to vote it at somewhere between +5 and +7.

I specifically endorse the "literally just include the paragraph about buying lots of chargers" idea that Zvi suggested.

Echoing previous reviews (it's weird to me the site still suggested this to review anyway, seems like it was covered already?) I would strongly advise against including this. While it has a useful central point - that specificity is important and you should look for and request it - I agree with other reviewers that the style here is very much the set of things LW shouldn't be about, and LWers shouldn't be about, but that others think LW-style people are about, and it's structuring all these discussions as if arguments are soldiers and the goal is to win w...

Partial Self Review:

There's an obvious set of followup work to be done here, which is to ask "Okay, this post was vague poetry meant to roughly illustrate a point. But, how many words do you actually precisely have?" What are the in-depth models that let you predict precisely how much nuance you have to work with?

Less obvious to me is whether this post should become a longer, more rigorous post, or whether it should stay its short, poetic self, and have those questions get explored in a different post with different goals.

Also less obvious to me is ...

How do you review a post that was not written for you? I’m already doing research in AI Alignment, and I don’t plan on creating a group of collaborators for the moment. Still, I found some parts of this useful.

Maybe that’s how you do it: by taking different profiles, and running through the most useful advice for each profile from the post. Let’s do that.

Full time researcher (no team or MIRIx chapter)

For this profile (which is mine, by the way), the most useful piece of advice from this post comes from the model of transmitters and receivers. I’m convinced...

This post proposes 4 ideas to help build gears-level models from papers that have already passed the standard epistemic check (statistics, incentives):

  • Look for papers which are very specific and technical, to limit the incentives to overemphasize results and present them in a “saving the world” light.
  • Focus on data instead of on interpretations.
  • Read papers on different aspects of the same question/gear.
  • Look for mediating variables/gears to explain multiple results at once.

(The second section, “Zombie Theories”, sounds more like epistemic check than gears-level ...

Concise. The post briefly sums up the fields and directions in which rationality has been developed on the site, then asks users to list the big open questions that are still left to answer.

  • The post is mostly useful 1) to people wishing to continue their training in rationality after they have gone through the recommendations and are looking for what to do next, and 2) for continuing the conversation on how to improve rationality systematically. The post itself lists a few of the fields that have been developed and are being developed, in the answers there...

I want to have this post in a physical book so that I can easily reference it.

It might actually work better as a standalone pamphlet, though. 

I think we should encourage posts which are well-delimited and research based; "here's a question I had, and how I answered it in a finite amount of time" rather than "here's something I've been thinking about for a long time, and here's where I've gotten with it".

Also, this is an engaging topic and well-written.

I feel the "final thoughts" section could be tightened up/shortened, as to me it's not the heart of the piece.

This is the second time I've seen this. Now it seems obvious. I remember liking it the first time, but also remember it being obvious. That second part of the memory is probably false. I think it's likely that this explained the idea so well that I now think it's obvious.

In other words: very well done.

I remember thinking when I originally read this 'oh this is insightful', and when I re-read it I had the same thought. Then I realized that's exactly the type of feels-like-an-insight thinking the review is trying to get us away from! I've never used the concept or even thought about it since I first read the post, nor encountered it elsewhere, despite assuming I would do so. Bad sign.

I understand that this post seems wise to some people. To me, it seems like a series of tautologies on the surface, with an understructure of assumptions that are ultimately far more important and far more questionable. The basic assumption being made is that society-wide "memetic collapse" is a thing; the evidence given for this (even if you follow the links) is weak, and yet the attitude throughout is that further debate on this point is not worth our breath.

I am a co-author of statistics work with somebody whose standards of mathematical rigou...

This is a first-pass review that's just sort of organizing my thinking about this post.

This post makes a few different types of claims:

  • Hyperselected memes may be worse (generally) than weakly selected ones
  • Hyperselected memes may specifically be damaging our intelligence/social memetic software
  • People today are worse at negotiating complex conflicts from different filter bubbles
  • There's a particular set of memes (well represented in 1950s sci-fi) that was particularly important, and which are not as common nowadays.

It has a question which is listed although not...

I still endorse most of this post, but https://docs.google.com/document/d/1cEBsj18Y4NnVx5Qdu43cKEHMaVBODTTyfHBa8GIRSec/edit has clarified many of these issues for me and helped quantify the ways that science is, indeed, slowing down.

When I think of useful concepts in AI alignment that I frequently refer to, there are a bunch from the olden days (e.g. “instrumental convergence”, “treacherous turn”, …), and a bunch of idiosyncratic ones that I made up myself for my own purposes, and just a few others, one of which is “concept extrapolation”. For example I talk about it here. (Others in that last category include “goal misgeneralization” [here’s how I use the term] (which is related to concept extrapolation) and “inner and outer alignment” [here’s how I use the term].)

So anyway, in the c...

This post summarises and elaborates on Neil Postman's underrated "Amusing Ourselves to Death", about the effects of mediums (especially television) on public discourse. I wrote this post in 2019 and posted it to LessWrong in 2022.

Looking back at it, I continue to think that Postman's book is a valuable and concise contribution and formative to my own thinking on this topic. I'm fond of some of the sharp writing that I managed here (and less fond of other bits).

The broader question here is: "how does civilisation set up public discourse on important topics ...

I really liked this post since it took something I did intuitively and haphazardly and gave it a handle by providing the terms to start practicing it intentionally. This had at least two benefits:

First it allowed me to use this technique in a much wider set of circumstances, and to improve the voices that I already have. Identifying the phenomenon allowed it to move from a knack which showed up by luck, to a skill.

Second, it allowed me to communicate the experience more easily to others, and open the possibility for them to use it as well. Unlike many less...

It's rare that I encounter a lesswrong post that opens up a new area of human experience - especially rare for a post that doesn't present an argument or a new interpretation or schema for analysing the world.

But this one does. A simple review, with quotes, of an ethnographical study of late 19th century Russian peasants, opened up a whole new world and potentially changed my vision of the past.

Worth it for its many book extracts and its choice of subject matter.

I was impressed by this post. I don't have the mathematical chops to evaluate it as math -- probably it's fairly trivial -- but I think it's rare for math to tell us something so interesting and important about the world, as this seems to do. See this comment where I summarize my takeaways; is it not quite amazing that these conclusions about artificial neural nets are provable (or provable-given-plausible-conditions) rather than just conjectures-which-seem-to-be-borne-out-by-ANN-behavior-so-far? (E.g. conclusions like "Neural nets trained on very complex ...

There's a scarcity of stories about how things could go wrong with AI which are not centered on the "single advanced misaligned research project" scenario. This post (and the mentioned RAAP post by Critch) helps partially fill that gap.

It definitely helped me picture / feel some of what some potential worlds look like, to the degree I currently think something like this -- albeit probably slower, as mentioned in the story -- is more likely than the misaligned research project disaster.

It also is a (1) pretty good / fun story and (2) mentions the elements within the story which the author feels are unlikely, which is virtuous and helps prevent higher detail from being mistaken for plausibility.

Partly I just want to signal-boost this kind of message.

But I also just really like the way this post covers the topic. I didn't have words for some of these effects before, like how your goals and strategies might change even if your values stay the same.

The whole post feels like a great invitation to the topic IMO.

I didn't reread it in detail just now. I might have more thoughts were I to do so. I just want this to have a shot at inclusion in final voting. Getting unconfused about self-love is, IMO, way more important than most models people discuss on this site.

In many ways, this post is frustrating to read. It isn't straightforward, it needlessly insults people, and it mixes irrelevant details with the key ideas.

And yet, as with many of Eliezer's posts, its key points are right.

What this post does is uncover the main epistemological mistakes made by almost everyone trying their hands at figuring out timelines. Among others, there is:

  • Taking arbitrary guesses within a set of options that you don't have enough evidence to separate
  • Piling arbitrary assumption on arbitrary assumption, leading to completely uninforma...

One of the posts which has been sitting in my drafts pile the longest is titled "Economic Agents Who Have No Idea What's Happening". The draft starts like this:

Eight hundred years ago, a bloomery produces some iron. The process is not tightly controlled - the metal may contain a wide range of carbon content or slag impurities, and it’s not easy to measure the iron’s quality. There may be some externally-visible signs, but they’re imperfect proxies for the metal’s true composition. The producer has imperfect information about their own outputs.

Tha...

This post feels quite important from a global priorities standpoint. Nuclear war mitigation might have been one of the top priorities for humanity (and to be clear it's still plausibly quite important). But given that the longtermist community has limited resources, it matters a lot whether something falls in the top 5-10 priorities. 

A lot of people ask "Why is there so much focus on AI in the longtermist community? What about other x-risks like nuclear?". And I think it's an important, counterintuitive answer that nuclear war probably isn't an x-risk...

This was a concept which it never occurred to me that people might not have, until I saw the post. Noticing and drawing attention to such concepts seems pretty valuable in general. This post in particular was short, direct, and gave the concept a name, which is pretty good; the one thing I'd change about the post is that it could use a more concrete, everyday example/story at the beginning.

I strongly upvoted this.

On one hand – the CFAR handbook would be a weird fit for the anthology style books we have published so far. But, it would be a great fit for being a standalone book, and I think it makes sense to use the Review to take stock of what other books we should be publishing.

The current version of the CFAR handbook isn't super optimized for being read outside the context of a workshop. I think it'd be worth the effort of converting it both into standalone posts that articulate particular concepts, and editing together into a more cohesive...

This is the post that first spelled out how Simulacra levels worked in a way that seemed comprehensive and that I understood.

I really like the different archetypes (i.e. Oracle, Trickster, Sage, Lawyer, etc). They showcased how the different levels blend together, while still having distinct properties that made sense to reason about separately. Each archetype felt very natural to me, like I could imagine people operating in that way.

The description of Level 4 here still feels a bit inarticulate/confused. This post is mostly compatible with the 2x2 grid v...

I think this post does a good job of focusing on a stumbling block that many people encounter when trying to do something difficult. Since the stumbling block is about explicitly causing yourself pain, to the extent that this is a common problem and that the post can help avoid it, that's a very high return prospect.

I appreciate the list of quotes and anecdotes early in the post; it's hard for me to imagine what sort of empirical references someone could make to verify whether or not this is a problem. Well known quotes and a long list of anecdotes is a su...

I second Daniel's comment and review, remark that this is an exquisite example of distillation, and state that I believe this might be one of the most important texts of the last decade.

Also, I fixed an image used in the text; here's the fixed version:

Fixed recursive reward modeling

I will vote a 9 on this post.

I think this excerpt from Rationality: From AI to Zombies' preface says it all.

It was a mistake that I didn't write my two years of blog posts with the intention of helping people do better in their everyday lives. I wrote it with the intention of helping people solve big, difficult, important problems, and I chose impressive-sounding, abstract problems as my examples.

In retrospect, this was the second-largest mistake in my approach. It ties in to the first-largest mistake in my writing which was that I didn't realize that the big problem in learning thi

...

(I am the author of this piece)

In short, I recommend against including this post in the 2020 review.

Reasons against inclusion

  • Contained large mistakes in the past, might still contain mistakes (I don't know of any)
    • I fixed the last mistakes I know of two months ago
    • It's hard to audit because of the programming language it's written in
  • Didn't quite reach its goal
    • I wanted to be able to predict the decrease in ability to forecast long-out events, but the Brier scores are outside the 0-1 range for long ranges (>1 yr), which shouldn't be the case if w
...

A deceptively awesome example of original seeing.

This post feels like an important part of what I've referred to as The CFAR Development Branch Git Merge. Between 2013ish and 2017ish, a lot of rationality development happened in person, which built off the sequences. I think some of that work turned out to be dead ends, or a bit confused, or not as important as we thought at the time. But a lot of it has been quite essential to rationality as a practice. I'm glad it has gotten written up.

The felt sense, and focusing, have been two surprisingly important tools for me. One use case not quite mentioned here...

Echoing Raemon that this has become one of my standard reference points and I anticipate linking to this periodically for a long time. I think it's important. 

I'm also tagging this as something I should build upon explicitly some time soon, when I have the bandwidth for that, and I'm tagging Ben/Raemon to remind me of this in 6 months if I haven't done so yet, whether or not it makes the collection.

These issues are key ones to get right, involve difficult trade-offs, and didn't have a good descriptor that I knew of until this post.

Consider this as two posts.

The first post is Basketballism. That post is awesome. Loved it. 

The second post is the rest of the post. That post tries to answer the question in the title, but doesn't feel like it makes much progress to me. There's some good discussion that goes back and forth, but mostly everyone agrees on what should be clear to all: No, rationalism doesn't let you work miracles at will, and we're not obviously transforming the world or getting key questions reliably right. Yes, it seems to be helpful, and generally the people who do i...

I got an email from Jacob L. suggesting I review my own post, to add anything that might offer a more current perspective, so here goes...

One thing I've learned since writing this is that counterfactualizing, while it doesn't always cause akrasia, is definitely an important part of how we maintain akrasia: what some people have dubbed "meta-akrasia".

When we counterfactualize that we "should have done" something, we create moral license for our past behavior. But also, when we encounter a problem and think, "I should [future action]", we are often licen...

I like this post a lot.

I'm noticing an unspoken assumption: that Amish culture hasn't changed "much" since the 1800s. If that's not the case... it's not that anything here would necessarily be false, but it would be an important omission.

Like, taking this post as something that it's not-quite but also not-really-not, it uses the Amish as an example in support of a thesis: "cultural engineering is possible". You can, as a society, decide where you want your society to go and then go there. The Amish are an existence proof, and Ray bounces from them to askin...

I stand by this piece, and I now think it makes a nice complement to discussions of GPT-3. In both cases, we have significant improvements in chunking of concepts into latent spaces, but we don't appear to have anything like a causal model in either. And I've believed for several years that causal reasoning is the thing that puts us in the endgame.

(That's not to say either system would still be safe if scaled up massively; mesa-optimization would be a reason to worry.)

I trust past-me to have summarized CAIS much better than current-me; back when this post was written I had just finished reading CAIS for the third or fourth time, and I haven't read it since. (This isn't a compliment -- I read it multiple times because I had a lot of trouble understanding it.)

I've put in two points of my own in the post. First:

(My opinion: I think this isn't engaging with the worry with RL agents -- typically, we're worried about the setting where the RL agent is learning or planning at test time, which can happen in learn-to-learn and on

...

This has been one of the most useful posts on LessWrong in recent years for me personally. I find myself often referring to it, and I think almost everyone underestimates the difficulty gap between critiquing others and proposing their own, correct, ideas.

I believe this is an important gears-level addition to posts like hyperbolic growth, long-term growth as a sequence of exponential modes and an old Yudkowsky post I am unable to find at the moment.

I don't know how closely these texts are connected, but Modeling the Human Trajectory picks up one year later, creating two technical models: one stochastically fitting and extrapolating GDP growth; the other providing a deterministic outlook, considering labor, capital, human capital, technology and production (and, in one case, natural resources). Roodman arriv...

A good example of crucial arguments, in the wild.

I'm not sure I like it. It looks like a lot of talking past each other. Very casually informative of different perspectives without much direct confrontation. Relatively good for an internet argument, but still not as productive as one might hope for between experts debating a serious topic. I'm glad for the information; I strongly value concretely knowing that sometimes arguments play out like this.

But I still don't like it.

(To be fair, these are comments on a Facebook link post. I feel Ben misleads with technical truths when he describes this as an "actual debate" occurring in "a public space".)

The problem with evaluating a post like this is that long post is long and slow and methodical, and making points that I (and I'm guessing most others who are doing the review process) already knew even at the time it was written in 2017. So it's hard to know whether the post 'works' at doing the thing it is trying to do, and also hard to know whether it is an efficient means of transmitting that information. 

Why can't the post be much shorter and still get its point across? Would it perhaps even get the point across better if it was much shorter, bec...

I really dislike the central example used in this post, for reasons explained in this article. I hope it isn't included in the next LW book series without changing to a better example.

I think it was important to have something like this post exist. However, I now think it's not fit for purpose. In this discussion thread, rohinmshah, abramdemski and I end up spilling a lot of ink about a disagreement that ended up being at least partially because we took 'realism about rationality' to mean different things. rohinmshah thought that irrealism would mean that the theory of rationality was about as real as the theory of liberalism, abramdemski thought that irrealism would mean that the theory of rationality would be about as real as the theo

...

For the past few years I've read Logan-stuff, and felt a vague sense of impatience about it, and a vague sense of "if I were more patient, maybe a good thing would happen though?". This year I started putting more explicit effort into cultivating patience.

I've read this post thrice now, and each time I start out thinking "yeah, patience seems like a thing I could have more of"... and then I get to the breakdown of "tenacity, openness and thoroughness" and go "oh shit I forgot about that breakdown. What a useful breakdown." It feels helpful because it sugg...

I found this post to be a clear and reasonable-sounding articulation of one of the main arguments for there being catastrophic risk from AI development. It helped me with my own thinking to an extent. I think it has a lot of shareability value.

This experiment didn't really work out, but it's the kind of thing I expect to produce really great results every once in a while. 

This post didn't feel particularly important when I first read it.

Yet I notice that I've been acting on the post's advice since reading it. E.g. being more optimistic about drug companies that measure a wide variety of biomarkers.

I wasn't consciously doing that because I updated due to the post. I'm unsure to what extent the post changed me via subconscious influence, versus deriving the ideas independently.

I think this point is incredibly important and quite underrated, and safety researchers often do way dumber work because they don't think about it enough.

This post helped me relate to my own work better. I feel less confused about what's going on with the differences between my own working pace and the pace of many around me. I am obviously more like a 10,000 day monk than a 10 day monk, and I should think and plan accordingly. 

Partly because I read this post, I spend fewer resources frantically trying to show off a Marketable Product(TM) as quickly as possible ("How can I make a Unit out of this for the Workshop next month?"), and I spend more resources aiming for the progress I actually think would ...

This is one of those things that seems totally obvious after reading and makes you wonder how anyone thought otherwise but is somehow non-trivial anyways. 

I continue to stand by this post.

I believe that in our studies of human cognition, we have relatively neglected the aggressive parts of it. We understand they're there, but they're kind of yucky and unpleasant, so they get relatively little attention. We can and should go into more detail, try to understand, harness and optimize aggression, because it is part of the brains that we're trying to run rationality on.

I am preparing another post to do this in more depth.

I stand by what I said here: this post has a good goal but the implementation embodies exactly the issue it's trying to fight. 

This paper, like others from Anthropic, is exemplary science and exceptional science communication. The authors are clear, precise and thorough. It is evident that their research motivation is to solve a problem, and not to publish a paper, and that their communication motivation is to help others understand, and not to impress.

I think this point is very important, and I refer to it constantly.

I wish that I'd said "the prototypical AI catastrophe is either escaping from the datacenter or getting root access to it" instead (as I noted in a comment a few months ago).

The world is a complicated and chaotic place. Anything could interact with everything, and some of these interactions are good. This post describes how general paralysis of the insane could be cured with malaria, at least for patients who did not die during the treatment.

If late-stage syphilis (general paralysis) isn't treated, patients probably die within 3-5 years, with progressively worse symptoms each year. So even though 5-20% of them died immediately when the treatment started, they still had better survival rates at one and five years. A morbid example of an expected value choice: w...

I ended up referring back to this post multiple times when trying to understand the empirical data on takeoff speeds, and in particular when trying to estimate the speed of algorithmic progress independent of hardware progress.

I also was quite interested in the details here in order to understand the absolute returns to more intelligence/compute in different domains. 

One particular follow-up post I would love to see would run the same study, but this time with material disadvantage. In particular, I would really like to see, in both chess and go, ...

Most of the writing on simulacrum levels has left me feeling less able to reason about them, as if they were too evil to contemplate. This post engaged with them as one fact in the world among many, which was already an improvement. I've found myself referring to this idea several times over the last two years, and it left me more alert to looking for other explanations in this class.

This is a great theorem that's stuck around in my head this last year! It's presented clearly and engagingly, but more importantly, the ideas in this piece are suggestive of a broader agent foundations research direction. If you wanted to intimate that research direction with a single short post that additionally demonstrates something theoretically interesting in its own right, this might be the post you'd share.

I think this is an excellent response (I'd even say, companion piece) to Joe Carlsmith's also-excellent report on the risk from power-seeking AI. On a brief re-skim I think I agree with everything Nate says, though I'd also have a lot more to add and I'd shift emphasis around a bit. (Some of the same points I did in fact make in my own review of Joe's report.)

Why is it important for there to be a response? Well, the 5% number Joe came to at the end is just way too low. Even if you disagree with me about that, you'll concede that a big fraction of the ratio...

This strikes me as a core application of rationality. Learning to notice implicit "should"s and tabooing them. The example set is great.

Some of the richness is in the comments. Raemon's in particular highlights an element that strikes me as missing: The point is to notice the feeling of judging part of the territory as inherently good or bad, as opposed to recognizing the judgment as about your assessment of how you and/or others relate to the territory.

But it's an awful lot to ask of a rationality technique to cover all cases related to its domain.

If all ...

This post is a solid list of advice on self-studying. It matches the Review criterion of posts that continue to affect my behavior. (I don't explicitly think back to the post, but I do occasionally bring up its advice in my mind, like "ah yes, it's actually good to read multiple textbooks concurrently and not necessarily finish them".)

I actually disagree with the "most important piece of advice", which is to use spaced repetition software. Multiple times in my life, I have attempted to incorporate an SRS habit into my life, reflecting on why it previously ...

I consider this post as one of the most important ever written on issues of timelines and AI doom scenario. Not because it's perfect (some of its assumptions are unconvincing), but because it highlights a key aspect of AI Risk and the alignment problem which is so easy to miss coming from a rationalist mindset: it doesn't require an agent to take over the whole world. It is not about agency.

What RAAPs show instead is that even in a purely structural setting, where agency doesn't matter, these problems still crop up!

This insight was already present in Drexle...

This post is in my small list of +9s that I think count as a key part of how I think, where the post was responsible for clarifying my thinking on the subject. I've had a lingering confusion/nervousness about having extreme odds (anything beyond 100:1) but the name example shows that seeing odds ratios of 20,000,000:1 is just pretty common. I also appreciated Eliezer's corollary: "most beliefs worth having are extreme", this also influences how I think about my key beliefs.

(Haha, I just realized that I curated it back when it was published.)

I went back and forth on this post a lot. Ultimately the writing is really fantastic and I appreciate the thought and presence Joe put into it. It doesn't help me understand why someone would care about ant experience but it does help me understand the experience of caring about ant sentience.

This post feels like a fantasy description of a better society, one that I would internally label "wish-fulfilment". And yet it is history! So it makes me more hopeful about the world. And thus I find it beautiful.

"Search versus design" explores the basic way we build and trust systems in the world. A few notes: 

  • My favorite part is the definitions about an abstraction layer being an artifact combined with a helpful story about it. It helps me see the world as a series of abstraction layers. We're not actually close to true reality, we are very much living within abstraction layers — the simple stories we are able to tell about the artefacts we build. A world built by AIs will be far less comprehensible than the world we live in today. (Much more like biology is
...

This post was well written, interesting, had multiple useful examples, and generally filled in models of the world. I haven't explicitly fact-checked it but it accords with things I've read and verified elsewhere.

This post is hard for me to review, because I both 1) really like this post and 2) really failed to deliver on the IOUs. As is, I think the post deserves highly upvoted comments that are critical / have clarifying questions; I give some responses, but not enough that I feel like this is 'complete', even considering the long threads in the comments.

[This is somewhat especially disappointing, because I deliberately had "December 31st" as a deadline so that this would get into the 2019 review instead of the 2020 review, and had hoped this would be the first p...

One of the main problems I think about is how science and engineering are able to achieve such efficient progress despite the very high dimensionality of our world - and how we can more systematically leverage whatever techniques provide that efficiency. One broad class of techniques I think about a lot involves switching between search-for-designs and search-for-constraints - like proof and counterexample in math, or path and path-of-walls in a maze.

My own writing on the topic is usually pretty abstract; I'm thinking about it in algorithmic terms, as a searc...

Create a Full Alternative Stack is probably in the top 15 ideas I got from LW in 2020. Thinking through this as an option has helped me decide when and where to engage with "the establishment" in many areas (e.g. academia). Some parts of my life I work with the mazes whilst trying not to get too much of it on me, and some parts of my life I try to build alternative stacks. (Not the full version, I don't have the time to fix all of civilization.) I give it +4.

Broader comment on the Mazes sequence as a whole:

The sequence is an extended meditation on a theme

...

Self Review.

1. I'm quite confident in the core idea: "you should be capable of absorbing some surprise problems happening to you, as a matter of course". I think this is a centrally important concept for a community of people trying to do ambitious things, who will constantly be tempted to take on more than they can handle.

2. The specific quantification of "3 surprise problems" can be reasonably debated (although I think my rule-of-thumb is a good starting point, and I think the post is clear about my reasoning process so others can make their own informed choice)

3. ...

Apparently this has been nominated for the review. I assume that this is implicitly a nomination for the book, rather than my summary of it. If so, I think the post itself serves as a review of the book, and I continue to stand by the claims within.

This post is what first gave me a major update towards "an AI with a simple single architectural pattern scaled up sufficiently could become AGI"; in other words, there don't necessarily have to be complicated, fine-tuned algorithms for different advanced functions: you can get lots of different things from the same simple structure plus optimization. Since then, as far as I can tell, that's what we've been seeing.

I liked this article. It presents a novel view on mistake theory vs conflict theory, and a novel view on bargaining.

However, I found the definitions and arguments a bit confusing/inadequate.

Your definitions:

"Let's agree to maximize surplus. Once we agree to that, we can talk about allocation."

"Let's agree on an allocation. Once we do that, we can talk about maximizing surplus."

The wording of the options was quite confusing to me, because it's not immediately clear what "doing something first" and "doing some other thing second" really means.

For example, th...

So I reread this post, found I hadn't commented... and got a strong desire to write a response post until I realized I'd already written it, and it was even nominated. I'd be fine with including this if my response also gets included, but very worried about including this without the response. 

In particular, I felt the need to emphasize the idea that Stag Hunts frame coordination problems as going against incentive gradients and as being maximally fragile and punishing, by default. 

If even one person doesn't get with the program, for any reason, ...

I don't know whether it was this post, or maybe just a bunch of things I learned while trying to build LessWrong, but this feels like it has become a pretty important part of my model of how organizations work, and also what kind of things I pay attention to in my personal development. 

Some additional consequences of things that I believe that feel like they extend on this post: 

  • Automating is often valuable because it frequently replaces tasks that were really costly because they had to be executed reliably
  • I am very hesitant to start projects in
...

THIS. TIMES 1000.

I want more people to know this about the Amish! More people should have the concept of "distributed community with intentional norms that has a good relationship with the government and can mostly run on their own legal system" floating around in their head.

For followups, I'd want to see

  1. discussing the issues without proposing solutions
  2. a review of the history + outcomes of similar attempts to take the benedict option
    • this could turn out to be a terrible idea in practice, and if so I want to know that so I can start harping on about my next
...

At first when I read this, I strongly agreed with Zack's self-review that this doesn't make sense to include in context, but on reflection and upon re-reading the nominations, I think he's wrong and it would add a lot of value per page to do so, and it should probably be included. 

The false dichotomy this dissolves, where either you have to own all implications, so it's bad to say true things that imply things that are true but focus upon would have unpleasant consequences, or it has to be fine to ignore all the extra communication that's involved in ...

[NB: this is a review of the paper, which I have recently read, not of the post series, which I have not]

For a while before this paper was published, several people in AI alignment had discussed things like mesa-optimization as serious concerns. That being said, these concerns had not been published in great detail or in their most convincing form. The two counterexamples that I’m aware of are the posts What does the universal prior actually look like? by Paul Christiano, and Optimization daemons on Arbital. However, the first post only discussed the issue i...

  • Olah’s comment indicates that this is indeed a good summary of his views.
  • I think the first three listed benefits are indeed good reasons to work on transparency/interpretability. I am intrigued but less convinced by the prospect of ‘microscope AI’.
    • The ‘catching problems with auditing’ section describes an ‘auditing game’, and says that progress in this game might illustrate progress in using interpretability for alignment. It would be good to learn how much success the auditors have had in this game since the post was published.
    • One test of ‘microscope
...

I would probably include this post in the review as-is if I had to. However, I would quite prefer the post to change somewhat before putting it in the Best Of Book.

Most importantly, I think, is the title and central handle. It does an important job, but it does not work that well in the wild among people who don't share the concept handle. Several people have suggested alternatives. I don't know if any of them are good enough, but I think now is a good time to reflect on a longterm durable name.

I'd also like to see some more explicit differentiation of "as...

This post is well written and not over-long. If the concepts it describes are unfamiliar to you, it is a well written introduction. If you're already familiar with them, you can skim it quickly for a warm feeling of validation.

I think the post would be even better with a short introduction describing its topic and scope, but I'm aware that other people have different preferences. In particular:

  • There are more than two 'cultures' or styles of discussion, perhaps many more. The post calls this out towards the end (apparently this is new in
...

I do not understand Logical Induction, and I especially don't understand the relationship between it and updating on evidence. I feel like I keep viewing Bayes as a procedure separate from the agent, and then trying to slide LI into that same slot, and it fails because at least LI and probably Bayes are wrongly viewed that way.

But this post is what I leaned on to shift from an utter-darkness understanding of LI to a heavy-fog one, and re-reading it has been very useful in that regard. Since I am otherwise not a person who would be expected to understa...

I love how cleanly this brings up its point and asks the question. My answer is essentially that you can do this if and only if you can create an expectation of Successful Failure in some way. Thus, if the failing person's real mission can be the friends they made along the way, or skills they developed, or lessons learned, or they still got a healthy paycheck, or the attempt brings them honor, or whatever, that's huge.

Writing a full response is on my list of things to eventually do, which is rare for posts that are over a year old.

These are an absolute blast. I'm not rating it as important because it all seems so obvious to me that it would go down like this, and it's hard to see why people need convincing, but perhaps they do? Either way, it's great fun to read the examples again.

Note: this is on balance a negative review of the post, at least regarding the question of whether it should be included in a "Best of LessWrong 2018" compilation. I feel somewhat bad about writing it given that the author has already written a review that I regard as negative. That being said, I think that reviews of posts by people other than the author are important for readers looking to judge posts, since authors may well have distorted views of their own works.

  • The idea behind AUP, that ‘side effect avoidance’ should mean minimising changes in
...

This project (best read in the bolded link, not just in this post) seemed and still seems really valuable to me. My intuitions around "Might AI have discontinuous progress?" become a lot clearer once I see Katja framing them in terms of concrete questions like "How many past technologies had discontinuities equal to ten years of past progress?". I understand AI Impacts is working on an updated version of this, which I'm looking forward to.

I read this post when it initially came out. It resonated with me to such an extent that even three weeks ago, I found myself referencing it when counseling a colleague on how to deal with a student whose heterodoxy caused the colleague to make isolated demands for rigor from this student.

The author’s argument that Nurture Culture should be the default still resonates with me, but I think there are important amendments and caveats that should be made. The author said:

"To a fair extent, it doesn’t even matter if you believe that someone...

I think awareness of this effect is tremendously important. Your immune system needs to fight cancer (mindless unregulated replication) in order for you to function and pursue any goal with a lower time preference than the mindless replicators. But what's even worse than cancer is a disease that co-opts the immune system, leading to a lowered ability to fight off infections in general. People who care about the future are concerned about non-value-aligned replication outcompeting human values. But they should also be concerned about agentic processes that specifically undermine the ability to do low-time-preference work (aka antisocial punishers), and about the things that lead them to exist and flourish.

This is my own post. I continue to talk and think a lot about the world from the perspective of solving coordination problems, where facilitating the ability for people to build common knowledge is one of the central tools. I'm very glad I wrote the post; it made a lot of my own thinking more rigorous and clear.

This post seems to be making a few claims, which I think can be evaluated separately:

1) Decoupling norms exist
2) Contextualizing norms exist 
3) Decoupling and contextualization norms are useful to think of as opposites (either as a dichotomy or spectrum)

(i.e. there are enough people using those norms that it's a useful way to carve up the discussion-landscape)

There's a range of "strong" / "weak" versions of these claims – decoupling and/or contextualization might be principled norms that some people explicitly endorse, or they might just be clusters of t

...

I support this post being included in the Best-of-2018 Review.

It does a good job of starting with a straightforward concept, and explaining it clearly and vividly (a SlateStarScott special). And then it goes on to apply the concept to another phenomenon (ethical philosophy) and make more sense of an oft-observed phenomenon (the moral revulsion to both branches of thought experiments, sometimes by the same individual).

Reply: "Relevance Norms; Or, Gricean Implicature Queers the Decoupling/Contextualizing Binary" (further counterreplies in the comment section)

I argue that this post should not be included in the Best-of-2018 compilation.

I am surprised that a post with nearly 650 karma doesn't have a review yet. It seems like it should have at least one so it can go through to the voting phase.

I think that (1) this is a good deconfusion post, (2) it was an important post for me to read, and definitely made me conclude that I had been confused in the past, (3) and one of the kinds of posts that, ideally, in some hypothetical and probably-impossible past world, would have resulted in much more discussion and worked-out-cruxes in order to forestall the degeneration of AI risk arguments into mutually incomprehensible camps with differing premises, which at this point is starting to look like a done deal?

On the object level: I currently think that...

As of October, MIRI has shifted its focus. See their announcement for details.

I looked up MIRI's hiring page and it's still in about the same state. This kind of makes sense given the FTX implosion. But I would ask whether MIRI is unconcerned with the criticism it received here and/or actively likes its approach to hiring? We know Eliezer Yudkowsky, who's on their senior leadership team and board of directors, saw this, because he commented on it.

I found it odd that 3/5 members of the senior leadership team, Malo Bourgon, Alex Vermeer, and Jimmy Rintjema...

I am flattered that someone nominated this but I don't know why. I still believe in the project, but this doesn't match at all what I'd look to in this kind of review? The vision has changed and narrowed substantially. So this is a historical artifact of sorts, I suppose, but I don't see why it would belong.

I really like this post; it has been very influential in how I think about plans, and what to work on. I do think it's a bit vague though, and lacking in a certain kind of general formulation. It might be better if more examples were listed where the technique could be used.

Sazen turns out to be present everywhere once you see it. I'm noticing it in the news, in teaching, and in learning. I realized it in my FB posts about interesting concepts and on Twitter. I refer to Sazen not only in rationality discourse but have mentioned it to friends and family too. Being aware of Sazen helped me improve my communication. 

Since this post was written, I feel like there's been a zeitgeist of "Distillation Projects." I don't know how causal this post was (I think in some sense the ecosystem was ripe for a Distillation Wave), but it seemed useful to think about how that wave played out.

Some of the results have been great. But many of the results have felt kinda meh to me, and I now have a bit of a flinch/ugh reaction when I see a post with "distillation" in its title.

Basically, good distillations are a highly skilled effort. It's sort of natural to write a distillation of...

This post is very cute. I also reference it all the time to explain the 'inverse cat tax.' You can ask my colleagues, I definitely talk about that model a bunch. So, perhaps strangely, this is my most-referenced post of 2022. 🙃

My explanation of a model tax: this forum (and the EA Forum) really like models, so to get a post to be popular, you gotta put in a model.

I've referenced this post several times. I think the post has to balance being a straw vulcan with being unwilling to forcefully say its thesis, and I find Raemon to be surprisingly good at saying true things within that balance. It's also well-written, and a great length. Candidate for my favorite post of the year.

This post expresses an important idea in AI alignment that I have essentially believed for a long time, and which I have not seen expressed elsewhere. (I think a substantially better treatment of the idea is possible, but this post is fine, and you get a lot of points for being the only place where an idea is being shared.)

A few points:

  • This post caused me to notice the ways that I find emergencies attractive. I am myself drawn to scenarios where all the moves are forced moves. If there's a fire, there's no uncertainty, no awkwardness, I just do whatever I can right now to put it out. It's like reality is just dragging me along and I don't really have to take responsibility for the rest of it, because all the tradeoffs are easy and any harder evaluations must be tightly time-bounded. I have started to notice the unhealthy ways in which I am drawn to things that have this natu
...

This was my favorite non-AI post of 2022; perhaps partly because the story of Toni Kurz serves as a metaphor for the whole human enterprise. Reading about these men's trials was both riveting and sent me into deep reflection about my own values.

This post helped me understand the motivation for the Finite Factored Sets work, which I was confused about for a while. The framing of agency as time travel is a great intuition pump. 

I think this post points towards something important, which is a bit more than what the title suggests, but I have a problem describing it succinctly. :)

Computer programming is about creating abstractions, and leaky abstractions are a common enough occurrence to have their own wiki page. Most systems are hard to comprehend as a whole, and a human has to break them into parts which can be understood individually. But these are not perfect cuts, the boundaries are wobbly, and the parts "leak" into each other.

Most commonly these leaks happen because of a tech...

I and some others on Lightcone team have continued to use this exercise from time to time. Jacob Lagerros got really into it, and would ask us to taboo 'should' whenever someone made a vague claim about what we should do. In all honesty this was pretty annoying. :P 

But it also highlighted another use of tabooing 'should': checking what assumptions are shared between people. (John's post mostly seems to be addressing "single player mode", where you notice your own shoulds and what ignorance they conceal. But sometimes, Alice totally under...

I really, really liked this idea. In some sense it's just reframing the idea of trade-offs. But it's a really helpful (for me) reframe that makes it feel concrete and real to me.

I'd long been familiar with "the expert blind spot": the issue where experts forget what it's like to see like a non-expert and try to teach from their own expert vantage point. Like when aikido teachers would tell me to "just relax, act natural, and let the technique just happen on its own." That makes sense if you've been practicing that technique for a decade! But it's awful advice to give a ...

This post has a lot of particular charms, but also touches on a generally under-represented subject in LessWrong: the simple power of deliberate practice and competence. The community seems saturated with the kind of thinking that goes [let's reason about this endeavor from all angles and meta-angles and find the exact cheat code to game reality] at the expense of the simple [git gud scrub]. Of course, gitting gud at reason is one very important aspect of gitting gud in general, but only one aspect.

The fixation on calibration and correctness in this commun...

I did eventually get covid.

As was the general pattern with this whole RADVAC episode, it's ambiguous how much the vaccine helped. I caught covid in May 2022, about 15 months after the radvac doses, and about 13 months after my traditional vaccine (one shot J&J). In the intervening 15 months, my general mentality about covid was "I no longer need to give a shit"; I ate in restaurants multiple times per week, rode BART to and from the office daily, went to crowded places sometimes, traveled, and generally avoided wearing a mask insofar as that was socially acceptable.

The best compliment I can give this post is that the core idea seems so obviously true that it seems impossible that I haven't thought of or read it before. And yet, I don't think I have.

Aside from the core idea that it's scientifically useful to determine the short list of variables that fully determine or mediate an effect, the secondary claim is that this is the main type of science that is useful and the "hypothesis rejection" paradigm is a distraction. This is repeated a few times but not really proven, and it's not hard to think of counterexamples: m...

I've thought a good amount about Finite Factored Sets in the past year or two, but I do sure keep going back to thinking about the world primarily in the form of Pearlian causal influence diagrams, and I am not really sure why. 

I do think this one line by Scott at the top gave me at least one pointer towards what was happening: 

but I'm trained as a combinatorialist, so I'm giving a combinatorics talk upfront.

In the space of mathematical affinities, combinatorics is among the branches of math I feel most averse to, and I think that explains a good...

  1. This post is worthwhile and correct, with clear downstream impact.  It might be the only non-AI post of 2021 that I've heard cited in in-person conversation -- and the cite immediately improved the discussion.
  2. It's clearly written and laid out; unless you're already an excellent technical writer, you can probably learn something by ignoring its content and studying its structure.

This post is among the most concrete, actionable, valuable posts I read from 2021. Earlier this year, when I was trying to get a handle on the current-state-of-AI, this post transformed my opinion of Interpretability research from "man, this seems important but it looks so daunting and I can't imagine interpretability providing enough value in time" to "okay, I actually see a research framework I could expect to be scalable."

I'm not a technical researcher, so I have trouble comparing this post to other Alignment conceptual work. But my impression, from seein...

This post is the most comprehensive answer to the question "what was really going on at Leverage Research" anyone has ever given, and that question has been of interest to many in the LW community. I'm happy to see it's been nominated for the year-end review; thank you to whoever did that!

Since this got nominated, now's a good time to jump in and note that I wish that I had chosen different terminology for this post.

I was intending for "final crunch time" to be a riff on Eliezer saying, here, that we are currently in crunch time.

This is crunch time for the whole human species, and not just for us but for the intergalactic civilization whose existence depends on us. This is the hour before the final exam and we're trying to get as much studying done as possible.

I said explicitly, in this post, "I'm going to refer to this last stretch of a fe...

This might be the lowest karma post that I've given a significant review vote for. (I'm currently giving it a 4). I'd highly encourage folk to give it A Think.

This post seems to be asking an important question of how to integrate truthseeking and conflict theory. I think this is probably one of the most important questions in the world. Conflict is inevitable. Truthseeking is really important. They are in tension. What do we do about that?

I think this is an important civilizational question. Most people don't care nearly enough about truthseeking in the fi...

This post is on a very important topic: how could we scale ideas about value extrapolation or avoiding goal misgeneralisation... all the way up to superintelligence? As such, its ideas are very worth exploring and getting to grips with. It's a very important idea.

However, the post itself is not brilliantly written, and is more of "idea of a potential approach" than a well crafted theory post. I hope to be able to revisit it at some point soon, but haven't been able to find or make the time, yet.

I haven't thought about Oliver Sipple since I posted my original comment. Revisiting it now, I think it is a juicier consequentialist thought experiment than the trolley problem or the surgeon problem. Partly, this is because the ethics of the situation depend so much on which aspect you examine, at which time; it also illustrates how deeply entangled ethical discourse is with politics and PR.

It's also perfectly plausible to me that Oliver's decline was caused by the psychological effect of unwanted publicity and the dissolution of his family ties. But I'm not...

The ideas in this post greatly influence how I think about AI timelines, and I believe they comprise the current single best way to forecast timelines.

A +12-OOMs-style forecast, like a bioanchors-style forecast, has two components:

  1. an estimate of (effective) compute over time (including factors like compute getting cheaper and algorithms/ideas getting better in addition to spending increasing), and
  2. a probability distribution on the (effective) training compute requirements for TAI (or equivalently the probability that TAI is achievable as a function of tr
...
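To make the two-component structure concrete, here is a minimal Python sketch of one way the components can be combined: the probability that TAI is achievable by a given year is the probability that the training-compute requirement falls at or below the effective compute available that year. Everything specific here is an assumption for illustration only: the linear-in-orders-of-magnitude growth model, the normal distribution over log10 requirements, the function names, and all parameter values. None of them come from the post.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def effective_compute_oom(year: float, base_year: float, base_oom: float,
                          ooms_per_year: float) -> float:
    """Component 1 (hypothetical model): log10 of the effective compute
    available for a training run in `year`, growing by a fixed number of
    orders of magnitude (OOMs) per year."""
    return base_oom + ooms_per_year * (year - base_year)


def p_tai_by(year: float, base_year: float, base_oom: float,
             ooms_per_year: float, req_mean_oom: float,
             req_sd_oom: float) -> float:
    """Component 2 (hypothetical model): the log10 training-compute
    requirement for TAI is normally distributed. P(TAI achievable by `year`)
    is P(requirement <= effective compute available in `year`)."""
    available = effective_compute_oom(year, base_year, base_oom, ooms_per_year)
    return normal_cdf(available, req_mean_oom, req_sd_oom)


# Illustrative placeholder parameters, NOT taken from the post.
params = dict(base_year=2020, base_oom=24.0, ooms_per_year=0.5,
              req_mean_oom=30.0, req_sd_oom=3.0)

for year in (2020, 2030, 2040, 2050):
    print(year, round(p_tai_by(year, **params), 3))
```

Keeping the two components as separate functions mirrors the split above: changing assumptions about compute growth touches only `effective_compute_oom`, while a different view of requirements (for example, a mixture over several anchors) would only replace the distribution used in `p_tai_by`.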

I upvoted this highly for the review. I think of this as a canonical reference post now for the sort of writing I want to see on LessWrong. This post identified an important problem I've seen a lot of people struggle with, and writes out clear instructions for it. 

I guess a question I have is "how many people read this and had it actually help them write more quickly?". I've personally found the post somewhat helpful, but I think mostly already had the skill.

Pro: The piece aimed to bring a set of key ideas to a broad audience in an easily understood, actionable way, and I think it does a fair job of that. I would be very excited to see similar example-filled posts actionably communicating important ideas. (The goal here feels related to this post https://distill.pub/2017/research-debt/) 

Con: I don't think it adds new ideas to the conversation. Some people commented on the salesy style of the intro, and I think it's a fair criticism. The piece prioritizes engagingness and readability over nuance.

I read this post at the same time as reading Ascani 2019 and Ricón 2021 in an attempt to get clear about anti-aging research. Comparing these three texts against each other, I would classify Ascani 2019 as trying to figure out whether focusing on anti-aging research is a good idea, Ricón 2021 trying to give a gearsy overview of the field (objective unlocked: get Nintil posts cross-posted to LessWrong), and this text as showing what has already been accomplished.

In that regard it succeeds perfectly well: The structure of Part V is so clean I suspect that it...

The Skewed and the Screwed: When Mating Meets Politics is a post that compellingly explains the effects of gender ratios in a social space (a college, a city, etc).

There are lots of simple effects here that I never noticed. For example, if there's a 55/45 split of the two genders (just counting the heterosexual people), then the minority gender gets an edge of selectiveness, which they enjoy (everyone gets to pick someone they like a bit more than they otherwise would have), but 18% of the majority gender (10 out of every 55) do not have a partner. It's really bad for ...
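The 18% figure is simple arithmetic: if everyone in the smaller group pairs up one-to-one, the leftover share of the larger group is the difference between the two groups divided by the size of the larger one. A minimal Python sketch, assuming (as the simplified example does) strictly one-to-one pairing and that every member of the minority gender finds a partner; the function name is illustrative only:

```python
def unpaired_fraction(group_a: float, group_b: float) -> float:
    """Fraction of the larger group left without a partner, assuming strict
    one-to-one pairing and that every member of the smaller group pairs up."""
    larger, smaller = max(group_a, group_b), min(group_a, group_b)
    return (larger - smaller) / larger


print(f"{unpaired_fraction(55, 45):.1%}")  # prints 18.2%
```

The same function shows how quickly the effect grows: the leftover share is driven by the gap between the groups relative to the larger group, so even modest skews leave a noticeable fraction of the majority unpaired under these assumptions.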

Protecting Large Projects Against Mazedom is all key advice that seemed unintuitive to me when I was getting started doing things in the world, but now all the advice seems imperative to me. I've learned a bunch of this by doing it "the hard way" I guess. I give this post +4.

Broader comment on the Mazes sequence as a whole:

The sequence is an extended meditation on a theme, exploring it from lots of perspectives, about how large projects and large coordination efforts end up being eaten by Moloch. The specific perspective reminds me a bit of The Screwtape Le

...

It's really great when alignment work is checked in toy models.

In this case, I was especially intrigued by the way it exposed how the different kinds of baselines influence behavior in gridworlds, and how it highlighted the difficulty of transitioning from a clean conceptual model to an implementation.

Also, the fact that a single randomly generated reward function was sufficient for implementing AUP in SafeLife is quite astonishing. Another advantage of implementing your theorems: you get surprised by reality!

Unfortunately, some parts of the post w...

This was a really interesting post, and is part of a genre of similar posts about acausal interaction with consequentialists in simulatable universes.

The short argument is that if we (or not us, but someone like us with way more available compute) try to use the Kolmogorov complexity of some data to make a decision, our decision might get "hijacked" by simple programs that run for a very very long time and simulate aliens who look for universes where someone is trying to use the Solomonoff prior to make a decision and then based on what decision they want,...

I am grateful for this post. I'm very interested in mechanism design in general, and the design of political systems specifically, so this post has been very valuable both in introducing me to some of the ideas of the Swiss political system, and in showing what their consequences are in practice. 

I thought a lot about the things I learned about Switzerland from this post. I also brought it up a lot in discussion, and often pointed people to this post to learn about the Swiss political system.

Two things that came up when I discussed the Swiss political...

I drank acetone in 2021, which was plausibly causally downstream of this post.

Unfortunately, the problem described here is all too common.  Many 'experts' give advice as if their lack of knowledge is proof. That's just not the way the world works, but we have many examples of it that are probably salient to most people, though I don't wish to get into them.

Where this post is lacking is that it won't convince anyone who doesn't already agree with it, and doesn't have any real way to deal with it (not that it should, solving that would be quite an accomplishment).

Thus, this is simply another thing to keep in mind, where experts use terms in ways that are literally meaningless to the rest of the populace, because the expert usage is actually wrong. If you are in these fields, push back on it.

This was a great read at the time and still holds up. It's one of the rare artifacts that can only be produced after a decade or two: an account of major shifts in a person's perspective over the course of a decade or two. (In that way it's similar in genre, for me, to Buck's post in the review.)

It's a very excitingly written history, and gives me insight into the different perspectives on the issue of psycholinguistics, and helps me frame the current situation in AI. I expect to vote on this somewhere between +5 and +7.

I stand by my nomination. This was the most serious attempt I am aware of to set up straightforward amplification of someone's reasoning in this way, it was competently executed, the diagrams showing the results are awesome, and I am proud that this sort of work is on LessWrong. It's only a baby step, but I think this step is exciting and I hope it encourages others to run further with it.

I like this post a lot and it feels fairly foundational to me. But... I don't have a strong impression that the people I most wanted to take heed of it really did. 

In my mind this post pairs with my followup "Can you eliminate memetic scarcity instead of fighting?", which also didn't seem to take off as a key tool for conflict resolution.

I feel like there's some core underlying problem where even acknowledging this sort of problem feels a bit like ceding ground, and I don't know what to do about it. To be fair I also think this argument can be used as...

I think this post and the Gradient Hacking post caused me to actually understand and feel able to productively engage with the idea of inner-optimizers. I think the paper and full sequence was good, but I bounced off of it a few times, and this helped me get traction on the core ideas in the space. 

I also think that some parts of this essay hold up better as a core abstraction than the actual mesa-optimizer paper itself, though I am not at all confident about this. But I just noticed that when I am internally thinking through alignment problems relate...

This is a more kludgy, dense read than some of Kaj's other writing. I think I'm mostly only making sense of it because I'm familiar with similar ideas already. Some of those from Kaj's later posts! I guess I'm not that interested in an overview of a particular book? I can't tell if I read this post before, or if the same points were repeated in other writing. But I'm getting stuck on some clinical wordiness.

Doesn't seem... foundational? It's a starting-to-build on literature and other posts. I'm not sure how someone else would build on it.

If anything,...

There are factual claims in this section:

The point is, I know of a few people, acquaintances of mine, who, even when asked to try to find flaws, could not detect anything weird or mistaken in the GPT-2-generated samples.

There are probably a lot of people who would be completely taken in by literal “fake news”, as in, computer-generated fake articles and blog posts. This is pretty alarming. Even more alarming: unless I make a conscious effort to read carefully, I would be one of them.

I'm a little uncertain of how I would test this since it seems ...

This post made me try adding more randomness to my life for a week or so. I learned a small amount. I remain excited about automated tools that help do things like this, e.g. recent work from Ought.

So first off... I'd forgotten this existed. That's obviously a negative indication in terms of how much it guided my thinking over the past two years! It also meant I got to see it with fresh eyes two years later. 

I think the central point the post thinks it is making is that, extending the original econ paper, search effectiveness can rapidly become impossible to improve by expanding the size of one's search, if those you are searching understand they are in competition. To improve results further, one must instead improve average quality in the searc...

In the comments of this post, Scott Garrabrant says:

I think that Embedded Agency is basically a refactoring of Agent Foundations in a way that gives one central curiosity based goalpost, rather than making it look like a bunch of independent problems. It is mostly all the same problems, but it was previously packaged as "Here are a bunch of things we wish we understood about aligning AI," and in repackaged as "Here is a central mystery of the universe, and here are a bunch things we don't understand about it." It is not a coincidence that they are the sa

...

I do not think this is a strong analysis. Things were a lot more complicated than this, on many levels. Analyzing that in detail would be more interesting. This post seems more interested in the question of 'what grade should we get for our efforts' than in learning from the situation going forward, which is what I think is the far more interesting problem.

That's not to say that the actual evaluation is especially unfair. I give myself very low marks because I had the trading skills to know better, or I should have had them, and the spare c...

Big fan of this but, like most of us, I knew all this already. What I want to know is, how effective is/was this when not preaching to the choir? What happens when someone who doesn't understand MIRI's mission starts to read this? I'd like to think it helps them grok what is going on reasonably often, but I could be fooling myself, and that question is ultimately the test of how vital this really is.

I still feel some desire to finish up my "first pass 'help me organize my thoughts' review". I went through the post, organizing various claims and concepts. I came away with the main takeaway "Wowzers there is so much going on in this post. I think this could have been broken up into a full sequence, each post of which was saying something pretty important." 

There seem to be four major claims/themes here:

  • Aesthetics matter, being style-blind or style-rejecting puts you at a disadvantage
  • It particularly is disadvantageous to cede "the entire concept of
...

This is very interesting. I do not have a good chance of being able to try this out, so I cannot evaluate any of the claims made directly, but it seems well-written, well-thought, and all in all a top-tier post.

Pretty minimal in and of itself, but has prompted plenty of interesting discussion. Operationally that suggests to me that posts like this should be encouraged, but not by putting them into "best of" compilations.

This does exactly what it sets out to do: presents an issue, shows why we might care, and lays out some initial results (including both intuitive and counterintuitive ones). It's not world-shaking for me, but it certainly carries its weight.

This is truly one of the best posts I've read. It guides the reader through a complex argument in a way that's engaging and inspiring. Great job.

This post is close in my mind to Alex Zhu's post Paul's research agenda FAQ. They each helped to give me many new and interesting thoughts about alignment. 

This post was maybe the first time I'd seen an actual conversation about Paul's work between two people who had deep disagreements in this area - where Paul wrote things, someone wrote an effort-post response, and Paul responded once again. Eliezer did it again in the comments of Alex's FAQ, which also was a big deal for me in terms of learning.

I weakly think this post should be included in Best of LessWrong 2018. Although I'm not an expert, the post seems sound. The writing style is nice and relaxed. The author highlights a natural dichotomy; thinking about Babble/Prune has been useful to me on several occasions. For example, in a research brainstorming / confusion-noticing session, I might notice I'm not generating any ideas (Prune is too strong!). Having this concept handle lets me notice that more easily.

One improvement to this post could be the inclusion of specific examples of how the author used this dichotomy to improve their idea generation process.

I don't recommend this post for the Best-of-2018 Review.

It's an exploration of a fascinating idea, but it's kind of messy and unusually difficult to understand (in the later sections). Moreover, the author isn't even sure whether it's a good concept or one that will be abused, and in addition worries about it becoming a popularized/bastardized concept in a wider circle. (Compare what happened to "virtue signaling".)

I read posts as a beginner, and I'm thinking about a wider-access book format...

Great writing style, very accessible. Honest and informative.
A modern-day explorer of the frontiers of the mind and human experience.

Edit Notes:

1. I'd make this the 1st paragraph: "In recent years, Circling has caught the eye of rationalists..." Include a "WTF is circling?" as a question for a wider audience! And the LW bit isn't necessary now.

2. Include a definition for inferential distance for ease of reading to newbies.


3...

Although normally I am all for judging arguments by their merits, regardless of who speaks them, I think that in this particular case we need to think twice before including the essay in the "Best of 2018" book. The notoriety of the author is such that including it risks serious reputation damage for the community, especially since the content of the essay might be interpreted as a veiled attempt to justify the author's moral transgressions. To be clear, I am not saying we should censor everything that this man ever said, but giving it the spotlight in "Best of 2018" seems like a bad choice.

This is an example of a clear textual writeup of a principle of integrity. I think it's a pretty good principle, and one that I refer to a lot in my own thinking about integrity.

But even if I thought it was importantly flawed, I think integrity is super important, and therefore I really want to reward and support people thinking explicitly about it. That allows us to notice that our notions are flawed, and improve them, and it also allows us to declare to each other what norms we hold ourselves to, instead of sort of typical minding and assuming that our notion of integrity matches others' notion, and then being shocked when they behave badly on our terms.

Someone working full-time on an approach to the alignment problem that they feel optimistic about, and writing annual reflections on their work, is something that has been sorely lacking. +4

I think of this as a fairly central post in the unofficial series on How to specialize in Problems We Don't Understand (which, in turn, is the post that most sums up what I think the art of rationality is for. Or at least the parts I'm most excited about).

These kinds of overview posts are very valuable, and I think this one is as well. I think it was quite well executed, and I've seen it linked a lot, especially to newer people trying to orient to the state of the AI Alignment field, and the ever growing number of people working in it. 

I am not a huge fan of shard theory, but other people seem into it a bunch. This post captured at least a bunch of my problems with shard theory (though not all of them, and it's not a perfect post). This means the post at least has saved me some writing effort a bunch of times. 

Nuclear famine (and relatedly nuclear winter) is one of those things that comes up all the time in discussion about existential and catastrophic risk, and this post (together with the previous one) continues to be one of the things I reference most frequently when that topic comes up.

My previous review of this is in this older comment. Recap:

  • I tried teaching a variation of this exercise that was focused on observing your internal state (sort of as an alternative to "Focusing"). I forgot to include the "meta strategy" step, which upon reflection was super important (so important I independently derived it for another portion of the same workshop I was running at the time). The way I taught this exercise it fell flat, but I think this was probably due to my presentation.
  • I did have people do the "practice observing things" (physical things, no
...

I think this post was good as something like a first pass.

There's a large and multi-armed dynamic in modern Western liberal society that is a kind of freezing-in-place, as more and more moral weight gets attached to whether or not one is consciously avoiding harm in more and more ways.

For the most part, this is a positive process (and it's at the very least well-intentioned). But it's not as strategic as it could be, and substantially less baby could be thrown out with the bathwater.

This was an attempt to gesture at some baby that, I think, is being thrown...

Good post, and relevant in 2023 because of a certain board-related brouhaha.

i think about this story from time to time. it speaks to my soul.

  • it is cool that straight-up utopian fiction can have this effect on me.
  • it yanks me into a state of longing. it's as if i lost this world a long time ago, and i'm desperately trying to regain it.

i truly wish everything will be ok :,)

thank you for this, tamsin.

This piece was reasonably well-appreciated (over 100 points) but I nevertheless think of it as one of my most underrated posts, given my sense of how important/crucial the insight is. For me personally, this is one of the largest epiphanies of the past decade, and I think this is easily among the top three most valuable bits of writing I did in 2022. It's the number one essay I go out of my way to promote to the attention of people who already occasionally read my writing, given its usefulness and its relative obscurity.

If I had the chance to write this ov...

This makes an important point that I find myself consistently referring to - almost none of the confidence in predictions, even inside the rationalist community, is based on actual calibration data. Experts forecast poorly, and we need to stop treating expertise or argumentation as strong stand-alone reasons to accept claims which are implicitly disputed by forecasts.

On the other hand, I think that this post focused far too much on Eliezer. In fact, there are relatively few people in the community who have significant forecasting track records, and this co...

I did not see this post when it was first put on the forum, but reading it now, my personal view of this post is that it continues a trend of wasting time on a topic that is already a focus of too much effort, with little relevance to actual decisions, and no real new claim that the problems were relevant or worth addressing.

I was even more frustrated that it didn't address most of the specific arguments put forward in our paper from a year earlier on why value for decisionmaking was finite, and then put forward several arguments we explicitly gave reasons ...

IMO the biggest contribution of this post was popularizing having a phrase for the concept of mode collapse in the context of LLMs and more generally and as an example of a certain flavor of empirical research on LLMs. Other than that it's just a case study whose exact details I don't think are so important.

Edit: This post introduces more useful and generalizable concepts than I remembered when I initially made the review.

To elaborate on what I mean by the value of this post as an example of a certain kind of empirical LLM research: I don't know of much pu...

In a narrow technical sense, this post still seems accurate but in a more general sense, it might have been slightly wrong / misleading. 

In the post, we investigated different measures of FP32 compute growth and found that many of them were slower than Moore's law would predict. This made me personally believe that compute might be growing slower than people thought and most of the progress comes from throwing more money at larger and larger training runs. While most progress comes from investment scaling, I now think the true effective compute growth...

This post helped me distinguish between having good reasons for my beliefs, and being able to clearly communicate and explain my reasoning, and (to me) painted the latter as pro-social and as a virtue rather than a terrible cost I was expected to pay.

Strong agree, crucial points, +4.

This post was actually quite enlightening, and felt more immediately helpful for me in understanding the AI X-risk case than any other single document. I think that's because while other articles on AI risk can seem kind of abstract, this one considers concretely what kind of organization would be required for navigating around alignment issues, in the mildly inconvenient world where alignment is not solved without some deliberate effort, which put me firmly into near-mode.

This is a relatively banal meta-commentary on reasons people sometimes give for doing worst-case analysis, and the differences between those reasons. The post reads like a list of things with no clear through-line. There is a gesture at an important idea from a Yudkowsky post (the logistic success curve idea) but the post does not helpfully expound that idea. There is a kind of trailing-off towards the end of the post as things like "planning fallacy" seem to have been added to the list with little time taken to place them in the context of the other thing...

This post is pretty important to me. "Understanding frames" and "Understanding coordination theory" are like my two favorite things, and this post ties them together. 

When I previously explored frames, I was mostly thinking through the lens of "how do people fail to communicate." I like that this post goes into the more positively-focused "what are frames useful for, and how might you design a good frame on purpose?". Much discussion of frames on LW has been a bit vague and woo-y. I like that this post frames Frames as a technical product, and approac...

A good review of work done, which shows that the writer is following their research plan and following up their pledge to keep the community informed.

The contents, however, are less relevant, and I expect that they will change as the project goes on. I.e. I think it is a great positive that this post exists, but it may not be worth reading for most people, unless they are specifically interested in research in this area. They should wait for the final report, be it positive or negative.

I liked this post when I read it. It matched my sense that (e.g.) using "outside view" to refer to Hanson's phase transition model of agriculture->industry->AI was overstating the strength of the reasoning behind it.

But I've found that I've continued to use the terms "inside view" and "outside view" to refer to the broad categories sketched out in the two Big Lists O' Things. Both in my head and when speaking. (Or I'll use variants like "outside viewish" or similar.)

I think there is a meaningful distinction here: the reasoning moves on the "Outside" ...

This post provides a maximally clear and simple explanation of a complex alignment scheme. I read the original "learning the prior" post a few times but found it hard to follow. I only understood how the imitative generalization scheme works after reading this post (the examples and diagrams and clear structure helped a lot). 

I like this research agenda because it provides a rigorous framing for thinking about inductive biases for agency and gives detailed and actionable advice for making progress on this problem. I think this is one of the most useful research directions in alignment foundations since it is directly applicable to ML-based AI systems. 

I wrote up a bunch of my high-level views on the MIRI dialogues in this review, so let me say some things that are more specific to this post. 

Since the dialogues were written, I keep coming back to the question of the degree to which consequentialism is a natural abstraction that will show up in AI systems we train, and while this dialogue had some frustrating parts where communication didn't go perfectly, I still think it has some of the best intuition pumps for how to think about consequentialism in AI systems.

The other part I liked the most w...

I have a very contrarian take on Gödel's Incompleteness Theorem, which is that it's widely misunderstood, most things people conclude from it are false, and it's actually largely irrelevant. This is why I was excited to read a review of this book I've heard so much fuss about, to see if it would change my mind.

Well, it didn't. Sam himself doesn't think the second half of the book (where we talk about conclusions) is all that strong, and I agree. So as an exploration of what to do with the theorem, this review isn't that useful; it's more of a negative exam...

I think this post makes an important point -- or rather, raises a very important question, with some vivid examples to get you started. On the other hand, I feel like it doesn't go further, and probably should have -- I wish it e.g. sketched a concrete scenario in which the future is dystopian not because we failed to make our AGIs "moral" but because we succeeded, or e.g. got a bit more formal and complemented the quotes with a toy model (inspired by the quotes) of how moral deliberation in a society might work, under post-AGI-alignment conditions, and ho...

This post has tentatively entered my professional worldview. "Big if true."

I'm looking at this through the lens of "how do we find/create the right people to help solve x-risk and other key urgent problems." The track record of AI/rationalist training programs doesn't seem that great. (i.e. they seem to typically work mostly via selection[1]). 

In the past year, I've seen John attempt to make an actual training regimen for solving problems we don't understand. I feel at least somewhat optimistic about his current training attempts, partly because his m...

I like this post because it:

  • Focuses on a machine which is usually non-central to accounts of the industrial revolution (at least in the others I've read), which makes it novel and interesting to those interested in the roots of progress
  • Has a high ratio of specific empirical detail to speculation
  • Separates speculation from historical claims pretty cleanly

This post is a good review of a book about a space where small regulatory reform could result in great gains, and it also changed my mind about LNT. As an introduction to the topic, more focus on economic details would be great, but you can't be all things to all men.

This post gave me a piece of jargon that I've found useful since reading it. In some sense, the post is just saying "sometimes people don't do Bayesian updating", which is a pretty cold take. But I found it useful to read through a lot of examples and discussion of what the deal might be. In my practice of everyday rationality, this post made it easier for me to stop and ask things like, "Is this a trapped prior? Might my [or others'] reluctance to update be due to whatever mechanisms cause a trapped prior?"

(Self-review.)

I was surprised by how well-received this was!

I was also a bit disappointed at how many commenters focused on the AI angle. Not that it necessarily matters, but to me, this isn't a story about AI. (I threw in the last two paragraphs because I wasn't sure how to end it in a way that "felt like an ending.")

To me, this story is an excuse for an exploration about how concepts work (inspired by an exchange with John Wentworth on "Unnatural Categories Are Optimized for Deception"). The story-device itself is basically a retread of "That Alien Messa...

This is a post that gave me (an ML noob) a great deal of understanding of how language models work — for example the discussion of the difference between "being able to do a task" and "knowing when to perform that task" is one I hadn't conceptualized before reading this post, and makes a large difference in how to think about the improvements from scaling. I also thought the characterization of the split between different schools of thought and what they pay attention to was quite illuminating.

I don't have enough object-level engagement for my recommendation to be much independent evidence, but I still will be voting this either a +4 or +9, because I personally learned a bunch from it.

I've referenced this post, or at least this concept, in the past year. I think it's fairly important. I've definitely seen this dynamic. I've felt it as a participant who totally wants a responsible authority figure to look up to and follow, and I've seen it in how people respond to various teacher-figures in the rationalsphere.

I think the rationalsphere lucked out in its founding members being pretty wise, and going out of their way to try to ameliorate a lot of the effects here, and still those people end up getting treated in a weird cult-leader-y way even...

This post is one of the LW posts a younger version of myself would have been most excited to read. Building on what I got from the Embedded Agency sequence, this post lays out a broad-strokes research plan for getting the alignment problem right. It points to areas of confusion, it lists questions we should be able to answer if we got this right, it explains the reasoning behind some of the specific tactics the author is pursuing, and it answers multiple common questions and objections. It leaves me with a feeling of "Yeah, I could pursue that too if I wanted, and I expect I could make some progress" which is a shockingly high bar for a purported plan to solve the alignment problem. I give this post +9.

One particularly important thing I got out of this post was crystallizing a complaint I sometimes have about people using anthropic reasoning. If someone says there's trillion-to-1 evidence for (blah) based on anthropics, it's actually not so crazy to say "well I don't believe (blah) anyway, based on the evidence I get from observing the world", it seems to me.

Or less charitably to myself, maybe this post is helping me rationalize my unjustified and unthinking gut distrust of anthropic reasoning :-P

Anyway, great post.

This post was a great dive into two topics:

  • How an object-level research field has gone, and what are the challenges it faces.
  • Forming a model about how technologically optimistic projects go.

I think this post was good in its first edition, but became great after the author displayed an admirable ability to change their mind and willingness to update their post in light of new information.

Overall I must reluctantly only give this post a +1 vote for inclusion, as I think the books are better served by more general rationality content, but in terms of what I would like to see more of on this site, +9. Maybe I'll compromise and give +4.

For a long time, I could more-or-less follow the logical arguments related to e.g. Newcomb’s problem, but I didn’t really get it, like, it still felt wrong and stupid at some deep level. But when I read Joe’s description of “Perfect deterministic twin prisoner’s dilemma” in this post, and the surrounding discussion, thinking about that really helped me finally break through that cloud of vague doubt, and viscerally understand what everyone’s been talking about this whole time. The whole post is excellent; very strong recommend for the 2021 review.

This post trims down the philosophical premises that sit under many accounts of AI risk. In particular it routes entirely around notions of agency, goal-directedness, and consequentialism. It argues that it is not humans losing power that we should be most worried about, but humans quickly gaining power and misusing such a rapid increase in power.

Re-reading the post now, I have the sense that the arguments are even more relevant than when it was written, due to the broad improvements in machine learning models since it was written. The arguments in this po...

This post culminates years of thinking which formed a dramatic shift in my worldview. It is now a big part of my life and business philosophy, and I've shown it to friends many times when explaining my thinking. It's influenced me to attempt my own bike repair, patch my own clothes, and write web-crawlers to avoid paying for expensive API access. (The latter was a bust.)

I think this post highlights using rationality to analyze daily life in a manner much deeper than you can find outside of LessWrong. It's in the spirit of the 2012 post "Rational Toothpast...

This post exemplifies the rationalist virtues of curiosity and scholarship. This year's review is not meant to judge whether posts should be published in a book, but I do wonder what a LW project to create a workbook or rationality curriculum (including problem sets) would look like. I imagine posts like this one would feature prominently in either case.

So I do think such posts deserve recognition, though in what form I am less sure.


On an entirely unrelated note, it makes me sad that the Internet is afflicted with link rot and impermanence, and that LW isn'...

Although Zvi's overall output is fantastic, I don't know which specific posts of his should be called timeless, and this is particularly tricky for these valuable but fast-moving weekly Covid posts. When it comes to judging intellectual progress, however, things are maybe a bit easier?

After skimming the post, a few points I noticed were: Besides the headline prediction which did not come to pass, this post also includes lots of themes which have stood the test of time or remained relevant since its publication: e.g. the FDA dragging its feet wrt allowing C...

So the obvious take here is that this is a long post full of Paths Forward and basically none of those paths forward were taken, either by myself or others. 

Two years later, many if not most of those paths do still seem like good ideas for how to proceed, and I continue to owe the world Moloch's Army in particular. And I still really want to write The Journey of the Sensitive One to see what would happen. And so on. When the whole Covid thing is behind us sufficiently and I have time to breathe I hope to tackle some of this. 

But the bottom line f...

This post was important to my own thinking because it solidified the concept that there exists the thing Obvious Nonsense, that Very Serious People would be saying such Obvious Nonsense, that the government and mainstream media would take it seriously and plan and talk on such a basis, and that someone like me could usefully point out that this was happening, because when we say Obvious Nonsense oh boy are they putting the Obvious in Nonsense. It's strange to look back and think about how nervous I was then about making this kind of call, even when it was ...

I was surprised that I had misremembered this post significantly. Over the past two years somehow my brain summarized this as "discontinuities barely happen at all, maybe nukes, and even that's questionable." I'm not sure where I got that impression. 

Looking back here I am surprised at the number of discontinuities discovered, even if there are weird sampling issues of what trendlines got selected to investigate.

Rereading this, I'm excited by... the sort of sheer amount of details here. I like that there's a bunch of different domains being explored, ...

This is a nice little post that explores a neat idea using a simple math model. I do stand by the idea, even if I remain confused about the limits of its applicability.


The post has received a mixed response. Some people loved it, and I have received some private messages from people thanking me for writing it. Others thought it was confused or confusing.

In hindsight, I think the choice of examples is not the best. I think a cleaner example of this problem would be from the perspective of a funder, who is trying to finance researchers to solve a concre...

I haven't started lifelogging, due to largely having other priorities, and lifelogging being kinda weird, and me not viscerally caring about my own death that much. 

But I think this post makes a compelling case that if I did care about those things, having lots of details about who-I-am might matter. In addition to cryonics, I can imagine an ancestor resurrection process that has some rough archetypes of "what a baseline human of a given era is like", and using lifelog details to fill in the gaps. 

I'm fairly philosophically confused about how muc...

Mark mentions that he got this point from Ben Pace. A few months ago I heard the extended version from Ben, and what I really want is for Ben to write a post (or maybe a whole sequence) on it. But in the meantime, it's an important idea, and this short post is the best source to link to on it.

This post’s claim seems to have a strong and weak version, both of which are asserted at different places in the post.

  1. Strong claim: At some level of wealth and power, knowledge is the most common or only bottleneck for achieving one’s goals.
  2. Weak claim: Things money and power cannot obtain can become the bottleneck for achieving one’s goals.

The claim implied by the title is the strong form. Here is a quote representing the weak form:

“As one resource becomes abundant, other resources become bottlenecks. When wealth and power become abundant, anything we...

Crisis and opportunity during coronavirus seemed cute to me at the time, and now I feel like an idiot for not realizing it more. My point here is "this post was really right in retrospect and I should've listened to it at the time". This post, combined with John's "Making Vaccine", has led me to believe I was in a position to create large amounts of vaccine during the pandemic, at least narrowly for my community, and (more ambitiously) make very large amounts (100k+) in some country with weak regulation where I could have sold it. I'm not going to flesh o...

Based on my own experience and the experience of others I know, I think knowledge starts to become taut rather quickly - I’d say at an annual income level in the low hundred thousands.

I really appreciate this specific calling out of the audience for this post. It may be limiting, but it is also likely limiting to an audience with a strong overlap with LW readership.

Everything money can buy is “cheap”, because money is "cheap".

I feel like there's a catch-22 here, in that there are many problems that probably could be solved with money, but I don't know how ...

I enjoyed writing this post, but think it was one of my lesser posts. It's pretty ranty and doesn't bring much real factual evidence. I think people liked it because it was very straightforward, but I personally think it was a bit over-rated (compared to other posts of mine, and many posts of others). 

I think it fills a niche (quick takes have their place), and some of the discussion was good. 

This is the only one I rated 9. It looks like a boring post full of formulas but it is actually quite short and - as Raemon wrote in the curation - completes the system of all possible cooperation games, giving them succinct definitions and names.

I really liked this post in 2020, and I really like this post now. I wish I had actually carved this groove into my habits of thought. I'm working on doing that now.

One complaint: I find the bolded "This post is not about that topic." to be distracting. I recommend unbolding, and perhaps removing the part from "This post" through "that difference."

There's not much to say about the post itself. It is a question. Some context is provided. Perhaps more could have been, but I think it's fine.

What I want to comment on is the fact that I see this as an incredibly important question. I would really love to see something like microCOVID Project, but for other risks. And I would pay a pretty good amount of money for access to it. At least $1,000. Probably more if I had to.

Why do I place such a high value on this question? Because IMO, death is very, very, very bad, and so it makes sense to go to great lengths...

There are some posts with perennial value, and some which depend heavily on their surrounding context. This post is of the latter type. I think it was pretty worthwhile in its day (and in particular, the analogy between GPT upgrades and developmental stages is one I still find interesting), but I leave it to you whether the book should include time capsules like this.

It's also worth noting that, in the recent discussions, Eliezer has pointed to the GPT architecture as an example that scaling up has worked better than expected, but he diverges from the thes...

I think there have been a few posts about noticing by now, but as Mark says, I think The Noticing Skill is extremely valuable to get early on in the rationality skill tree. I think this is a good explanation for why it is important and how to go about learning it.

TODO: dig up some of the other "how to learn noticing" intro posts and see how some others compare to this one as a standalone introduction. I think this might potentially be the best one. At the very least I really like the mushroom metaphor at the beginning. (If I were assembling the Ideal Noticing Intro post from scratch I might include the mushroom example even if I changed the instructions on how to learn the rationality-relevant-skills)

I like that this post addresses a topic that is underrepresented on Less Wrong and does so in a concise technical manner approachable to non-specialists. It makes accurate claims. The author understands how drawing (and drawing pedagogy) works.

As someone who was involved in the conversations, and who cares about and focuses on such things frequently, this continues to feel important to me, and seems like one of the best examples of an actual attempt to do the thing being done, which is itself (at least partly) an example of the thing everyone is trying to figure out how to do. 

What I can't tell is whether anyone who wasn't involved is able to extract the value. So in a sense, I "trust the vote" on this so long as people read it first, or at least give it a chance, because if that doesn't convince them it's worthwhile, then it didn't work. Whereas if it does convince them, it's great and we should include it.

I didn't notice until just recently that this post fits into a similar genre as (what I think) the Moral Mazes discussion is pointing at (which may be different from what Zvi thinks).

Where one of the takeaways from Moral Mazes might be: "if you want your company to stay aligned, try not to grow the levels of hierarchy too much, or be extremely careful when you do."

"Don't grow the layers of hierarchy" is (in practice) perhaps a similar injunction to "don't grow the company too much at all" (since you need hierarchy to scale).

Immoral Mazes posits a specific f...

This was the first major, somewhat adversarial doublecrux that I've participated in.

(Perhaps this is a wrong framing. I participated in many other significant, somewhat adversarial doublecruxes before. But, I dunno, this felt significantly harder than all the previous ones, to the point where it feels like a difference in kind.)

It was a valuable learning experience for me. My two key questions for "Does this actually make sense as part of the 2019 Review Book" are:

  • Is this useful to others for learning how to doublecrux, pass ITTs, etc in a lowish-trust-setting
...

Self Review. 

I still think this is true, and important. Honestly, I'd like to bid for it being required-reading among org-founders in the rationalsphere (alongside Habryka's Integrity post)

I think healthy competition is particularly important for a (moderately small) constellation of orgs and proto-orgs to have in mind if they are trying to scale up and impact the world at large, while maintaining integrity. (i.e. the rationality/x-risk/EA ecosystem). 

I think this is one of the key answers to "what safeguards do we have against evolving into a mo...

This is another great response post from Zvi.

It takes a list of issues that Zvi didn't get to cherry pick, and then proceeds to explain all of them with a couple of core tools: Goodhart's Law, Asymmetric Justice/Copenhagen Interpretation of Ethics, Forbidden Considerations, Power, and Theft. I learned a lot and put a lot of key ideas together in this post. I think it makes a great follow-up read to some of the relevant articles (i.e. Asymmetric Justice, Goodhart Taxonomy, etc).

The only problem is it's very long. 8.5k words. That's about 4% of last year's book...

It's hard to know how to judge a post that deems itself superseded by a post from a later year, but I lean toward taking Daniel at his word and hoping we survive until the 2021 Review comes around.

The content here is very valuable, even if the genre of "I talked a lot with X and here's my articulation of X's model" comes across to me as a weird form of intellectual ghostwriting. I can't think of a way around that, though.

Last minute review. Daniel Kokotajlo, the author of this post, has written a review as a separate post, within which he identifies a flawed argument here and recommends against this post's inclusion in the review on that basis.

I disagree with that recommendation. The flaw Daniel identifies and improves does not invalidate the core claim of the post. It does appear to significantly shift the conclusion within the post, but:

  • I feel that this still falls within the scope of the title and purpose of the post.
  • I feel the shifted conclusion falls withi
...

I really liked this sequence. I agree that specificity is important, and think this sequence does a great job of illustrating many scenarios in which it might be useful.

However, I believe that there are a couple implicit frames that permeate the entire sequence, alongside the call for specificity.  I believe that these frames together can create a "valley of bad rationality" in which calls for specificity can actually make you worse at reasoning than the default.

------------------------------------

The first of these frames is not just that being speci...

I have now linked at least 10 times to the "'Generate evidence of difficulty' as a research purpose" section of this post. It was something I wanted to point to before this post came out, but I felt confused about it, and this post finally gave me a pointer to it.

I think that section was substantially more novel and valuable to me than the rest of this post, but it is also evidence that others might not have had some of the other ideas on their map, and so they might have found it similarly valuable because of a different section.

Writing this post helped clarify my understanding of the concepts in both taxonomies - the different levels of specification and types of Goodhart effects. The parts of the taxonomies that I was not sure how to match up usually corresponded to the concepts I was most confused about. For example, I initially thought that adversarial Goodhart is an emergent specification problem, but upon further reflection this didn't seem right. Looking back, I think I still endorse the mapping described in this post.

I hoped to get more comments on this post...

  • I think this paper does a good job at collecting papers about double descent into one place where they can be contrasted and discussed.
  • I am not convinced that deep double descent is a pervasive phenomenon in practically-used neural networks, for reasons described in Rohin’s opinion about Preetum et al. This wouldn’t be so bad, except the limitations of the evidence (smaller ResNets than usual, basically goes away without label noise in image classification, some sketchy choices made in the Belkin et al. experiments) are not really addressed or highlight
...

On reflection, I endorse the conclusion and arguments in this post. I also like that it's short and direct. Stylistically, it argues for a behavior change among LessWrong readers who sometimes make surveys, rather than being targeted at general LessWrong readers. In particular, the post doesn't spend much time or space building interest about surveys or taking a circumspect view of them. For this reason, I might suggest a change to the original post to add something to the top like "Target audience: LessWrong readers who often or occasionally make form...

Over the last year, I've thought a lot about human/AI power dynamics and influence-seeking behavior. I personally haven't used the strategy-stealing assumption (SSA) in reasoning about alignment, but it seems like a useful concept.

Overall, the post seems good. The analysis is well-reasoned and reasonably well-written, although it's sprinkled with opaque remarks (I marked up a Google doc with more detail). 

If this post is voted in, it might be nice if Paul gave more room to big-picture, broad-strokes "how does SSA tend to fail?" discussion, discussing ...

The basic claim of this post is that Paul Graham has written clearly and well about unlearning the desire to do perfectly on tests, but that his actions are incongruous, because he has built the organization that most encourages people to do perfectly on tests.

Not that he has done no better – he has done better than most – but that he is advertising himself as doing this, when he has instead probably just made much better tests to win at.

Sam Altman's desire to be a monopoly

On this, the post offers quotes as evidence that:

  • YC is a gatekeeper to funding
...

tl;dr: If this post included a section discussing push-poll concerns and advocating (at least) caution and (preferably) a policy that'd be robust against human foibles, I'd be interested in having this post in the 2019 Review Book.

I think this is an interesting idea that should likely get experimented with.

A thing I was worried about when this first came out, and still worried about, is the blurriness between "survey as tool to gather data" and "survey as tool to cause action in the respondent." 

Some commenters said "this seems like push-polling, isn'...

I don't feel the "stag hunt" example is a good fit for the situation described, but the post is clear in explaining the problem and suggesting how to adapt to it.

  • The post helps the reader understand in which situations group efforts where everyone has to invest heavy resources aren't likely to work, focusing on the different perspectives and inferential frames people have on the risks/benefits of the situation. The post is a bit lacking on possible strategies to promote stag hunts, but it specified it would focus on the Schelling choice being "rabbit".
  • The suggestio...

I found this post important to developing what's proved to be a useful model for me of thinking about neural annealing as a metaphor for how the brain operates in a variety of situations. In particular, I think it makes a lot of sense when thinking about what it is that meditation and psychedelics do to the brain, and consequently helps me think about how to use them as part of Zen practice.

One thing I like about this post is that it makes claims that should be verifiable via brain studies in that we should see things like brain wave patterns that correspo...

(Self-review.) I oppose including this post in a Best-of-2019 collection. I stand by what I wrote, but, as with "Relevance Norms", this was a "defensive" post; it exists as a reaction to "Meta-Honesty"'s candidacy in the 2018 Review, rather than trying to advance new material on its own terms.

The analogy between patch-resistance in AI alignment and humans finding ways to dodge the spirit of deontological rules is very important, but not enough to carry the entire post.

A standalone canon-potential explanation of why I think we need a broader conception of ...

I think this post is excellent, and judging by the comments I diverge from other readers in what I liked about it.

In the first place, I endorse the seriously-but-not-literally standard for posting concepts. The community - rightly in my view - is under continuous pressure to provide high quality posts, but when the standard gets too high we start to lose introduction of ideas and instead they just languish in the drafts folder, sometimes for years. In order to preserve the start of the intellectual pipeline, posts of this level must continue to be produced.

In th...

I did not follow the Moral Mazes discussion as it unfolded. I came across this article context-less. So I don't know that it adds much to LessWrong. If that context is relevant, it should get a summary before diving in. From my perspective, its inclusion in the list was a jump sideways.

It's written engagingly. I feel Yarkoni's anger. Frustration bleeds off the page, and he has clearly gotten on a roll. Not performing moral outrage, just *properly, thoroughly livid* that so much has gone wrong in the science world.

We might need that.

What he wrote does not o...

This is probably the post I got the most value out of in 2018. This is not so much because of the precise ideas (although I have got value out of the principle of meta-honesty, directly), but because it was an attempt to understand and resolve a confusing, difficult domain. Eliezer explores various issues facing meta-honesty – the privilege inherent in being fast-talking enough to remain honest in tricky domains, and the various subtleties of meta-honesty that might make it too subtle a set of rules to coordinate around.

This illustration of "how to contend w...

This phenomenon is closely related to "regression towards the mean". It is important, when discussing something like this, to include such jargon names, because there is a lot of existing writing and thought on the topic. Don't reinvent the wheel.

Other than that, it's a fine article.

This is a moderately interesting and well-written example, but did not really surprise me at any point. Worth having, but wouldn't be something I'd go out of my way to recommend.

It's nice to see such an in-depth analysis of the CRT questions. I don't really share drossbucket's intuition - for me the 100 widget question feels counterintuitive the same way as the ball and bat question, but neither feels really aversive, so it was hard for me to appreciate the feelings that generated this post. But this gives a good example of an idea of "training mathematical intuitions" I hadn't thought about before.

I think about this framing quite a lot: is what I say going to lead people to assume roughly what I think, even if I'm not precise? So the concept is pretty valuable to me.

I don't know if it was the post that did it, but maybe!

I've used this a bit, but not loads. I prefer Fatebook, Metaculus, Manifold, and betting. I don't quite know why I don't use it more; here are some guesses.

  • I found the tool kind of hard to use
  • It was hard to search for the kind of information that I use to forecast
  • Often I would generate priors based on my current state, but those were wrong in strange ways (I knew something happened but after the deadline)
  • It wasn't clear that it was helping me to get better versus doing lots of forecasting on other platforms.

This post seems like it was quite influential. This is basically a trivial review to allow the post to be voted on.

I think this concept is important. It feels sort of... incomplete. Like, it seems like there are some major follow-up threads, which are:

  • How to teach others what useful skills you have.
  • How to notice when an expert has a skill, and 
    • how to ask them questions that help them tease out the details.

This feels like a helpful concept to re-familiarize myself with as I explore the art of deliberate practice, since "actually get expert advice on what/how to practice" is one of the most centrally recommended facets.

This is a short self-review, but with a bit of distance, I think understanding 'limits to legibility' is maybe one of the top 5 things an aspiring rationalist should deeply understand, and a lack of this understanding leads to many bad outcomes in both rationalist and EA communities.

In a very brief form, maybe the most common cause of EA problems and stupidities is attempts to replace illegible S1 boxes able to represent human values such as 'caring' with legible, symbolically described, verbal moral reasoning subject to memetic pressure.

Maybe the most common cause of rationalist problems and difficulties with coordination are cases where people replace illegible smart S1 computations with legible S2 arguments.

I really liked this post. It's not world-shattering, but it was a nice clear dive into a specific topic that I like learning about. I would be glad about a LessWrong with more posts like this.

I am not that excited about marginal interpretability research, but I have nevertheless linked to this a few times. I think this post both clarifies a bunch of inroads into making marginal interpretability progress and maps out how long the journey is between where we are and where many important targets are for using interpretability methods to reduce AI x-risk.

Separately, besides my personal sense that marginal interpretability research is not a great use of most researchers' time, there are really a lot of people trying to get started doing work on A...

This was some pretty engaging sci-fi, and I'm glad it's on LessWrong.

I'm proud of this post, but it doesn't belong in the Best-of-2022 collection because it's on a niche topic.

This is short, has good object level advice, and points at a useful general lesson.

A lot of LessWrong articles are meta. They're about how to find out about things, or abstract theories about how to make decisions. This article isn't like that. Learning the specific lesson it's trying to teach takes minutes, and might save a life. Not "might save a life" as in "the expected value means somewhere out there in the distant world or distant future the actuarial statistics might be a little different." "Might save a life" as in "that person who, if it comes up,...

I used to deal with disappointment by minimizing it (e.g. it's not that important) or consoling myself (e.g. we'll do better next time). After reading this piece, I think to myself "disappointment is baby grief". 

Loss is a part of life, whether that is loss of something concrete/"real" or something that we imagined or hoped for. Disappointment is an opportunity to practice dealing with loss, so that I will be ready for the inevitable major losses in the future. I am sad because I did not get what I'd wanted or hoped for, and that is okay.

I really like this paper! This is one of my favourite interpretability papers of 2022, and has substantially influenced my research. I voted at 9 in the annual review. Specific things I like about it:

  • It really started the "narrow distribution" focused interpretability, just examining models on sentences of the form "John and Mary went to the store, John gave a bag to" -> " Mary". IMO this is a promising alternative focus to the "understand what model components mean on the full data distribution" mindset, and worth some real investment in. Model compo...

Hmmmm.

So when I read this post I initially thought it was good. But on second thought I don't think I actually get that much from it. If I had to summarise it, I'd say:

  • a few interesting anecdotes about experiments where measurement was misleading or difficult
  • some general talk about "low bit experiments" and how hard it is to control for confounders

The most interesting claim I found was the second law of experiment design. To quote: "The Second Law of Experiment Design: if you measure enough different stuff, you might figure out what you’re actually meas...

Practically useful, and it expands on and coherently puts into words intuitions I already had. Before I had said intuition, I tended to take a lower-than-optimal amount of risks, so I think this post will be similarly useful to careful people like me.

While not a particularly important concept in my mind, it ends up being one of the ones I use the most, competitive with "Moloch", which is pretty impressive.

This 'medical miracle' story engenders hope. To suffer a chronic illness and to have no tangible answers regarding diagnosis (for too many), treatment, or treatment that works, is disheartening, frustrating and draining. Here is a detailed account of one quest that, having exhausted the standard intellectual-medical approach, remarkably results in relief. This writer gets deeply into her specific experience, and the reminder that there is a case for exploring intuition, along with unlikely luck, is uplifting.

Origin and summary

This post arose from a feeling in a few conversations that I wasn't being crisp enough or epistemically virtuous enough when discussing the relationship between gradient-based ML methods and natural selection/mutate-and-select methods. Some people would respond like, 'yep, seems good', while others were far less willing to entertain analogies there. Clearly there was some logical uncertainty and room for learning, so I decided to 'math it out' and ended up clarifying a few details about the relationship, while leaving a lot unresolved. E...

I like this because it reminds me:

  • before complaining about someone not making the obvious choice, first ask if that option actually exists (e.g. are they capable of doing it?)
  • before complaining about a bad decision, to ask if the better alternatives actually exist (people aren't choosing a bad option because they think it's better than a good option; they're choosing it because all other options are worse)

However, since I use it for my own thinking, I think of it more as an imaginary/mirage option instead of a fabricated option. It is indeed an option...

I'm glad for this article because it sparked the conversation about the relevance of behavioral economics. I also agree with Scott's criticism of it (which unfortunately isn't part of the review). But together they made for a great update on the state of behavioral economics.

I checked if there's something new in the literature since these articles were published, and found this paper by three of the authors who wrote the 2020 article Scott wrote about in his article. They conclude that "the evidence of loss aversion that we report in this paper and in Mrkv...

I think the general claim this post makes is

  • incredibly important
  • well argued
  • non-obvious to many people

I think there's an objection here that value != consumption of material resources, hence the constraints on growth may be far higher than the author calculates. Still, the article is great.

Brilliant article. I’m also curious about the economics side of things.

I found an article which estimates that nuclear power would be two orders of magnitude cheaper if the regulatory process were to be improved, but it doesn’t explain the calculations which led to the ‘two orders of magnitude’ claim. https://www.mackinac.org/blog/2022/nuclear-wasted-why-the-cost-of-nuclear-energy-is-misunderstood

I think this is my second-favorite post in the MIRI dialogues (for my overall review see here). 

I think this post was valuable to me in a much more object-level way. I think this post was the first post that actually just went really concrete on the current landscape of efforts in the domain of AI Notkilleveryonism and talked concretely about what seems feasible for different actors to achieve, and what isn't, in a way that parsed for me, and didn't feel either like something obviously political, or delusional.

I didn't find the part about differ...

The Georgism series was my first interaction with a piece of economic theory that tried to make sense by building a different model than anything I had seen before. It was clear and engaging. It has been a primary motivator in my learning more about economics. 

I'm not sure how the whole series would work in the books, but the review of Progress and Poverty was a great introduction to all the main ideas. 

There are lots of anecdotes about choosing the unused path and being the disruptor, but I feel this post explains the idea more clearly, with better analogies and boundaries.

To achieve a goal you have to build a lot of skills (deliberate practice) and apply them when it is really needed (maximum performance). Less is said about searching for the best strategy and combination of skills. I think "deliberate play" is a good concept for this because it shows that strategy research is a small but important part of playing well.

I just really like the clarity of this example. Noticing concrete lived experience at this level of detail. It highlights the feeling in my own experience and makes me more likely to notice it in real time when it's happening in my own life.

As a 2021 "best of" post, the call for people to share their experiences doesn't make as much sense, particularly should this post end up included in book form. I'm not sure how that fits with the overall process though. I don't wish Anna hadn't asked for more examples!

So, "Don't Shoot the Dog" is a collection of parenting advice based solely on the principle of reinforcement learning, i.e., the idea that kids do things more if they are rewarded and less if they're punished. It gets a lot out of this, including things that many parents do wrong. And the nicest thing is that, because everything is based on such a simple idea, most of the advice is self-evident. Pretty good, considering that learning tips are often controversial.

For example, say you ask your kid to clean her room, but she procrastinates on the task. When s...

I liked this story enough to still remember it, separately from the original Sort By Controversial story. Trade across moral divides is a useful concept to have handles for.

I appreciate this post, though mostly secondhand.  It's special to me because it provided me with a way to participate more-or-less directly in an alignment project: one of my glowfic buddies decided to rope me in to write a glowfic thread in this format for the project [here](https://glowfic.com/posts/5726).  I'd like to hear more updates about how it's gone in the last year, though!

This was a very interesting read. Aside from just illuminating history and how people used to think differently, I think this story has a lot of implications for policy questions today.

The go-to suggestions for pretty much any structural ill in the world today are to "raise awareness" and "appoint someone". These two things often make the problem worse. "Raising awareness" mostly acts to give activists moral license to do nothing practical about the problem, and can even backfire by making the problem a political issue. For example, a campaign to raise awar...

EDIT: Oops, in a tired state I got muddled between this AMA post and the original introduction of dath ilan made in an April Fool's post in 2014 (golly, that's a while back)

When this was published, I had little idea of how ongoing a concept dath ilan would become in my world. I think there's value both in the further explorations of this (e.g. Mad Investor Chaos glowfic and other glowfics that illustrate a lot of rationality and better societal function than Earth), but also in just the underlying concept of "what would have produced you as the median pers...

I'm glad I started using Anki. I did Anki 362/365 days this year. 

I averaged 19 minutes of review a day (although I really think review tended to take longer), which nominally means I spent 4.75 clock-days studying Anki. 

Seems very worth it, in my experience. 

One of the things I like most about this post is that it shows how much careful work is required to communicate. It's not a trick, it's not just being better than other people, it's simple, understandable, and hard work.

I think all the diagrams are very helpful, and really slow down and zoom in on the goal of nailing down an interpretation of your words and succeeding at communicating.

There's a very related concept (almost so obvious that you could say it's contained in the post) of finding common ground, where, in negotiation/conflict/disagreement, you ca...

I'm torn here because this post is incremental progress, and the step size feels small for inclusion in the books. OTOH small-but-real progress is better than large steps in the wrong direction, and this post has the underrepresented virtues of "strong connection to reality" and "modeling the process of making progress". And just yesterday I referred someone to it because it contained concepts they needed.

I think this post was important. I used the phrase 'Law of No Evidence: Any claim that there is “no evidence” of something is evidence of bullshit' several times (mostly in reply to tweets using that phrase or when talking about an article that uses the phrase).

Was it important intellectual progress? I think so. Not as a cognitive tool for use in an ideal situation, where you and others are collaboratively truth-seeking - but for use in adversarial situations, where people, institutions and authorities lie, mislead and gaslight you.

It is not a tool meant t...

As I said in my original comment here, I'm not a parent, so I didn't get a chance to try this. But now I work at a kindergarten, and was reminded of this post by the review process, so I can actually try it! Expect another review after I do :)

This is a personal anecdote, so I'm not sure how to assess it as an intellectual contribution. That said, global developments like the Covid pandemic sure have made me more cynical towards our individual as well as societal ability to notice and react to warning signs. In that respect, this story is a useful complement to posts like There's No Fire Alarm for Artificial General Intelligence, Seeing the Smoke, and the 2021 Taliban offensive in Afghanistan (which even Metaculus was pretty blindsided by).

And separately, the post resulted in some great discussi...

Focusing on the Alpha (here 'English Strain') parts only and looking back, I'm happy with my reasoning and conclusions here. While the 70% prediction did not come to pass and in hindsight my estimate of 70% was overconfident, the reasons it didn't happen were that some of the inputs in my projection were wrong, in ways I reasoned out at the time would (if they were wrong in these ways) prevent the projection from becoming true. And at the time, people weren't making the leap to 'Alpha will take over, and might be a huge issue in some worlds depending on it...

I think this was an important question, even though I'm uncertain what effect it had.

It's interesting to note that this question was asked at the very beginning of the pandemic, just as it began to enter the public awareness (Nassim Taleb published a paper on the pandemic a day after this question was asked, and the first coronavirus post on LW was published 3 days later).

During the pandemic we have seen the degraded epistemic condition in effect; it was noticed very early (LW Example), and continued throughout the pandemic (e.g., Supreme Court judges stati...

I think this post was pretty causal in my interest in coordination theory/practice. In particular I think it helped shift my thinking from a somewhat unproductive "why can't I coordinate well with rationalists, who seem particularly hard to get to agree on anything?" to a more useful "how do you solve problems at scale?"

The CFAR handbook is very valuable but I wouldn't include it in the 2020 review. Or, if so, then more as a "further reading" section at the end. Actually, such a list could be valuable. It could include links to relevant blogs (e.g. those already supporting cross-posting).

Wow, I really love that this has been updated and appendix'd. It's really nice to see how this has grown with community feedback and been polished from a rough concept.

Creating common knowledge on how 'cultures' of communication can differ seems really valuable for a community focused on cooperatively finding truth.

In this post, the author describes a pathway by which AI alignment can succeed even without special research effort. The specific claim that this can happen "by default" is not very important, IMO (the author himself only assigns 10% probability to this). On the other hand, viewed as a technique that can be deliberately used to help with alignment, this pathway is very interesting.

The author's argument can be summarized as follows:

  • For anyone trying to predict events happening on Earth, the concept of "human values" is a "natural abstraction", i.e. someth...

The problem outlined in this post results from two major concerns on lesswrong: risks from advanced AI systems and irrationality due to parasitic memes.

It presents the problem of persuasion tools as continuous with the problems humanity has had with virulent ideologies and sticky memes, exacerbated by the increasing capability of narrowly intelligent machine learning systems to exploit biases in human thought. It provides (but doesn't explore) two examples from history to support its hypothesis: the printing press as a partial cause of the Thirty Years' War, an...

Self-review: I still like this post a lot; I went through and changed some punctuation typos, but besides that I think it's pretty good.

There are a few things I thought this post did.

First, be an example of 'rereading great books', tho it feels a little controversial to call Atlas Shrugged a great book. The main thing I mean by that is that it captures some real shard of reality, in a way that looks different as your perspective changes and so is worth returning to, rather than some other feature related to aesthetics or skill or morality/politics.

Second, ...

Thinking about this now, not to sound self-congratulatory, but I'm impressed with the quantity and quality of examples I was able to stumble across. I'm a huge believer in examples and concreteness. Most of the time I'm unhappy with the posts I write in large part because I'm unhappy with the quantity and quality of examples. But it's just so hard to think of good ones, and posting seems better than not posting, so I post.

I still endorse this post pretty strongly. Giving it a google strikes me as something that is still significantly underutilized. By the ...

I think this post was among the more crisp updates that helped me understand Benquo's worldview, and shifted my own. I think I still disagree with many of Benquo's next-steps or approach, but I'm actually not sure. Rereading this post is highlighting some areas I notice I'm confused about.

This post clearly articulates a problem with having language both have a function of "communicating about object level facts" and "political coalitions, attacks/defense, etc". It makes it really difficult to communicate about important true facts without poking at the soc...

The back-and-forth (here and elsewhere) between Kaj & pjeby was an unusually good, rich, productive discussion, and it would be cool if the book could capture some of that. Not sure how feasible that is, given the sprawling nature of the discussion.

Building off Raemon's review, this feels like it is an attempt to make a 101-style point that everyone needs to understand if they don't already (not as rationalists, but as people in general) but that seems to me like it fails because those reading it will fall into the categories of (1) those who already got it and (2) those who need to get it but won't. 

I think this post (and similarly, Evan's summary of Chris Olah's views) are essential both in their own right and as mutual foils to MIRI's research agenda. We see related concepts (mesa-optimization originally came out of Paul's talk of daemons in Solomonoff induction, if I remember right) but very different strategies for achieving both inner and outer alignment. (The crux of the disagreement seems to be the probability of success from adapting current methods.)

Strongly recommended for inclusion.

This is a true engagement with the ideas in Paul's original post. It actively changed my mind – at first I thought Paul was making a good recommendation, but now I think it was a bad one. It helped me step back from a very detailed argument and notice what rationalist virtues were in play. I think it's a great example of what a rebuttal of someone else's post looks like. I'd like to see it in the review, and I will vote on it somewhere between +3 and +7.

I really enjoyed this post. It was fun to read and really drove home the point about starting with examples. I also thought it was helpful that it didn't just say, "teach by example". I feel that simplistic idea is all too common and often leads to bad teaching where example after example is given with no clear definitions or high level explanations. However, this article emphasized how one needs to build on the example to connect it with abstract ideas. This creates a bridge between what we already understand and what we are learning.

As I was thinking...

The notion of paradigm shifts has felt pretty key to how I think about intellectual progress (which in turn means "how do I think about lesswrong?"). A lot of my thinking about this comes from listening to talks by Geoff Anders (huh, I just realized I was literally at a retreat organized by an org called Paradigm at the time, which was certainly not coincidence). 

In particular, I apply the paradigm-theory towards how to think about AI Alignment and Rationality progress, both of which are some manner of "pre-paradigmatic."

I think this post is a good wr...

This post starts as a discussion of babies enjoying simple repetitive games and observes that for babies this is how they learn a skill. It then suggests that we should apply the same frame to understand adults who engage in seemingly maladaptive social behaviors, such as repetitive arguments, romantic drama, and being shocking to get attention. Finally, it gives several ideas of what might be happening in very abstract terms, in the language of machine learning. It fails to connect any of these abstract, machine-learning-type explanations to any of the...

adamshimi says almost everything I wanted to say in my review, so I am very glad he made the points he did, and I would love for both his review and the top level post to be included in the book. 

The key thing I want to emphasize a bit more is that I think the post as given is very abstract, and I have personally gotten a lot of value out of trying to think of more concrete scenarios where gradient hacking can occur. 

I think one of the weakest aspects of the post is that it starts with the assumption that an AI system has already given rise to an...

This post makes assertions about YC's culture which I find really fascinating. If it's a valid assessment of YC, I rather expect it to have broad implications for the whole capitalist and educational edifice. I've found lots of crystallized insight in Paul Graham's writing, so if his project is failing in the dimensions he's explicitly pointed out as important this seems like critical evidence towards how hard the problem space really is.

What does it mean for the rationalist community if selecting people for quickness of response correlates with anxiety o...

This post is an observation about a difference between the patients in the doctor's prior practice dealing with poor Medicaid patients, and her current practice dealing with richer patients. The former were concerned with their relationships, the latter with their accomplishments. And the former wanted pills, the latter often refused pills. And for these richer patients, refusing pills is a matter of identity - they want to be the type of people who can muddle through and don't need pills. They continue at jobs they hate, because they want to be the type of...

I experimented with extracting some of the core claims from this post into polls: 

Personally, I find that answering polls like these makes me more of a "quest participant" than a passive reader. They provide a nice "think for yourself" prompt, that then makes me look at the essay with a more active mindset. But others might have different experiences, feel free to provide feedback on how it worked for...

I'm the author, writing a review/reflection.

I wrote this post mainly to express myself and make more real my understanding of my own situation. The summer of 2019 I was doing a lot of exploration on how I felt and experienced the world, and also I was doing lots of detective work trying to understand "how I got to now."

The most valuable thing it adds is a detailed example of what it feels like to mishandle advice about emotions from the inside. This was prompted by the fact that younger me "already knew" about dealing with his emotions, and I wanted to writ...

Oh man, I loved this post. Very vivid mental model for subtle inferential gaps and cross-purposes.

It's surprising it's been so long since I thought about it! Surely if it's such a strong and well-communicated mental model, I would have started using it. So why didn't I?

My guess: the vast majority of conversation frames are still *not the frame which contains talking about frames*. I started recognizing when I wanted to have a timeout and clarify the particular flavor of conversation desired before continuing, but it felt like I broke focus or rapport anyt...

Lesswrong review of Zettelkasten - I stumbled upon this post a few weeks ago, and it solidified several of my vague thoughts on how I might make my notes more useful. In particular, it helped me think of ways I could unify the structures and content-linkage between my roam-notes, orgmode notes, filesystem, and paper journal. I especially appreciated the background context and long-term followup. This post proved an invaluable branching point. I would love it if abram integrated the followup insights back in to the overall post.

That said, I didn't actually ...

May I just say: Aaaaaa!

This post did not update my explicit model much, but it sure did give my intuition a concrete picture to freak out about. Claim 2 especially. I greatly look forward to the rewrite. Can I interest you in sending an outline/draft to me to beta read?

Given your nomination was for later work building on this post and spinning off discussion, you can likely condense this piece and summarize the later work / responses. (Unless you are hoping they get separately nominated for 2020?)

Your "See: Colonialism" as a casual aside had me cracking up...

I like this post and would like to see it curated, conditional on the idea actually being good. There are a few places where I'd want more details about the world before knowing if this was true.

  • Who owns this land? I'm guessing this is part of the Guadalupe Watershed, though I'm not sure how I'd confirm that.

This watershed is owned and managed by the Santa Clara Valley Water District.

  • What legal limits are there on use of the land? Wikipedia notes:

The bay was designated a Ramsar Wetland of International Importance on February 2, 2012.

I don't know what that ...

I think Raemon’s comments accurately describe my general feeling about this post: intriguing, but not well-optimized for a post.

However, I also think that this post may be the source of a subtle misconception about simulacra levels that the broader LessWrong community has adopted. Specifically, I think the distinction between 3 and 4 is blurred in this post, which tries to draw the false analogy that 1:2::3:4. Going from 3 (masks the absence of a profound reality) to 4 (no profound reality) is more clearly described not as a “widespread understanding” that they...

This post seems helpful in that it expands on the basic idea of the Copenhagen interpretation of ethics, and when I first read it, it was modestly impactful to me, though it was mostly a way to reorganize what I already knew from the examples that Zvi uses.

It seems to be very accurate and testable, through simple tests of moral intuitions? 

I would like to see more on the conditions that get normal people out of this frame of mind, on surprising places that it pops up, and on realistic incentive design that can be used personally to keep this from happening in your brain.

This is a nice, simple model for thinking. But I notice that both logic and empiricism sometimes have "shortcuts" — non-obvious ways to shorten, or otherwise substantially robustify, the chain of (logic/evidence). It's reasonable to imagine that intuition/rationality would also have various shortcuts; some that would correspond to logical/empirical shortcuts, and some that would be different. Communication is more difficult when two people are using chains of reasoning that differ substantially in what shortcuts they use. You could ge...

Basic politeness rules, explained well for people who don't find them obvious, yay!

As I recall, this is a solid, well-written post. Skimming it over again prior to reviewing it, nothing stands out to me as something worth mentioning here. Overall, I probably wouldn't put it on my all-time best list, or re-read it too often, but I'm certainly glad I read it once; it's better than "most" IMO, even among posts with (say) over 100 karma.

I think the primary value of this post is in prompting Benquo's response. That's not nothing, but I don't think it's top-shelf, because it doesn't really explore the game theory of "care more" vs. "care less" attempts between two agents whose root values don't necessarily align.

This essay makes a valuable contribution to the vocabulary we use to discuss and think about AI risk. Building a common vocabulary like this is very important for productive knowledge transmission and debate, and makes it easier to think clearly about the subject.

I've had a read of this post.

It seems rather whiny. I'm struggling to see the value to the advancement of rational thinking.

edited to add - Imagine if this was an English comprehension test and the question was "with which character does the author most identify?"

Decision theory is hard. In trying to figure out why DT is useful (needed?) for AI alignment in the first place, I keep running into weirdness, including with bargaining.

Without getting too in-the-weeds: I'm pretty damn glad that some people out there are working on DT and bargaining.

Holden's posts on writing and research (this one, The Wicked Problem Experience and Useful Vices for Wicked Problems) have been the most useful posts for me from Cold Takes and have been directly applicable to things I've worked on. For instance, the Wicked Problem Experience was published soon after I wrote a post about the discovery of laws of nature, and it resonated with and validated some of how I approached and worked on that. I give all three +4.

This is a better spirit with which to accomplish great and important tasks than most I have around me, and I'm grateful that it was written up. I give this +4.

In his dialogue Deconfusing Some Core X-risk Problems, Max H writes:

Yeah, coordination failures rule everything around me. =/

I don't have good ideas here, but something that results in increasing the average Lawfulness among humans seems like a good start. Maybe step 0 of this is writing some kind of Law textbook or Sequences 2.0 or CFAR 2.0 curriculum, so people can pick up the concepts explicitly from more than just, like, reading glowfic and absorbing it by osmosis. (In planecrash terms, Coordination is a fragment of Law that follows from Validity, Util

...

Safetywashing describes a phenomenon that is real, inevitable, and profoundly unsurprising (I am still surprised whenever I see it, but that's my fault for knowing something is probable and being surprised anyway). Things like this are fundamental to human systems; people who read the Sequences know this.

This post doesn't prepare people, at all, for the complexity of how this would play out in reality. It's possible that most posts would fail to prepare people, because these posts change goalposts; and in the mundane process of following their incentives, ...

I really enjoyed this sequence; it provides useful guidance on how to combine different sources of knowledge and intuitions to reason about future AI systems. A great resource on how to think about alignment for an ML audience.

This post makes a pretty straightforward and important point, and I've referenced it a few times since then. It hasn't made a huge impact, and it isn't the best explanation, but I think it's a good one that covers the basics, and I think it could be linked to more frequently.

I had read this post at the time, but forgotten about it since then. I've also made the central point of this post many times, and wish I had linked to it more, since it's a pretty good explanation.

One of the best parts of the CFAR handbook, and most of the sections are solid standalone. A compressed instruction manual to the human brain. The analogies and examples are incredibly effective at covering the approach from different angles, transmitting information about "how to win" into your brain with high fidelity, and what I predict will be a low failure rate for a wide variety of people.

  1. Be Present is a scathing criticism of how modern society keeps people sort of half-alive. It's my favorite and what I found to be the most important meta-skill;
...

Self Review. 

I wasn't sure at the time whether the effort I put into this post would be worth it. I think I spent around 8 hours, and I didn't end up with a clear gearsy model of how High Reliability tends to work.

I did end up following up on this, in "Carefully Bootstrapped Alignment" is organizationally hard. Most of how this post applied there was me including the graph from the vague "hospital Reliability-ification process" paper, in which I argued:

The report is from Genesis Health System, a healthcare service provider in Iowa that services 5 hospitals. N

...

I've had a vague intent to deliberately apply this technique since first reading this two years ago. I haven't actually done so, alas.

It still looks pretty good to me on paper, and feels like something I should attempt at some point. 

I like this article for having amusing stories told with cute little diagrams that manage to explain a specific mental technique. At the end of reading it, I sat down, drew some trees, nodded, and felt like I'd learned a new tool. It's not a tool I use explicitly very often, but I use it a little, using it more wouldn't hurt, and if it happens to be a tool you'd use a lot (maybe because it covers a gap or mistake you make more) then this article is a really good explanation of how to use it and why.

It's interesting to compare this to Goal Factoring. They...

I've long believed TAPs are a fundamental skill-building block. But I've noticed lately that I never really gained, or solidified as well as I'd like, the skill of building TAPs. 

I just reread this post to see if it'd help. One paragraph that stands out to me is this:

And in cases where this is not enough—where your trigger does indeed fire, but after two weeks of giving yourself the chance to take the stairs, you discover that you have actually taken yourself up on it zero times—the solution is not TAPs!  The problem lies elsewhere—it's not an is

...

Solid, aside from the faux pas self-references. If anyone wonders why people would have a high p(doom), especially Yudkowsky himself, this doc solves the problem in a single place. It demonstrates why AI safety is superior to most other elite groups; we don't just say why we think something, we make it easy to find as well. There still isn't much need for Yudkowsky to clarify further, even now.

I'd like to note that my professional background makes me much better at evaluating Section C than Sections A and B. Section C is highly quotable, well worth multiple ...

This is huge if it works; you're basically able to reprogram your entire life. 

I haven't gotten it to work yet, though this post makes it look like it's not that hard to set up. In my case at least, it will probably require multiple days to set the trigger. It's probably worth many hours of effort to set a trigger for "stop and think about what I should be thinking about", but I've heard of at least one other person having a hard time setting the trigger such that it would go off.

I, at least, was led to think by this post that it would take less effort than actuall...

I think this post has the highest number of people who have reached out privately to me, thanking me for it. (I think 4-5 people). So seems like it's been at least pretty useful to some people.

I previously wrote an addendum a few months after publishing, mostly saying that while I stood by the technical words of the post, I felt like the narrative vibe of the post somewhat oversold my particular approach. Major grievings still seem to take a long time.

One update is that, the following December after writing this, I did a second "Private Dark Solstice" ritu...

Overall I'm delighted with this post. It gave me a quick encapsulation of an idea I now refer to a lot, and I've received many reports of it inspiring other people to run helpful tests.

A number of my specifics were wrong; it now looks like potatoes were irrelevant or at least insufficient for weight loss, and I missed the miracle of watermelon.  I think this strengthens rather than weakens the core philosophical point of the post, although of course I'd rather have been right all along. 

My current view is this post is decent at explaining something which is "2nd type of obvious" in a limited space, using a physics metaphor.  What is there to see is basically given in the title: you can get a nuanced understanding of the relations between deontology, virtue ethics and consequentialism using the frame of "effective theory" originating in physics, and using "bounded rationality" from econ.

There are many other ways to get this: for example, you can read hundreds of pages of moral philosophy, or do a degree in it. Advantage of t...

I think this post successfully got me to notice this phenomenon when I do it, at least sometimes.

For me, the canonical example here is "how clean exactly are we supposed to keep the apartment?", where there's just a huge array of how much effort (and how much ambient clutter) is considered normal.

I notice that, while I had previously read both Setting the Default and Choosing the Zero Point, this post seemed to do a very different thing (this is especially weird because it seems like structurally it's making the exact same argument as Setting the Defa...

I continue to frequently refer back to my functional understanding of bounded distrust. I now try to link to 'How To Bounded Distrust' instead because it's more compact, but this is, I think, the better full treatment for those who have the time. I'm sad this isn't seeing more support, presumably because it isn't centrally LW-focused enough? But to me this is a core rationalist skill not discussed enough, among its other features.

I hadn't seen this post at all until a couple weeks ago. I'd never heard "exfohazard" or similar used. 

Insisting on using a different word seems unnecessary. I see how it can be confusing. I also ran into people confused by this a few years ago, and proposed "cognitohazard" for the "thing that harms the knower" subgenre. That also has not caught on. XD The point is, I'm pro-disambiguating the terms, since they have different implications. But I still believe what I did then, that the original broader meaning of the word "infohazard" is occasionally us...

I like this post! Steven Byrnes and Jacob Cannell are two people with big models of the brain and intelligence which give unique, concrete predictions, and both are large contributors to my own thinking. The post can only be excellent, and indeed it is! Byrnes doesn't always respond to Cannell how I would, but his responses usually shifted my opinion somewhat.

I like many aspects of this post. 

  • It promotes using intuitions from humans. Using human, social, or biological approaches is neglected compared to approaches that are more abstract and general. It is also scalable, because people who wouldn't be able to work directly on the abstract approaches can work on it.
  • It reflects on a specific problem the author had and offers the same approach to readers.
  • It uses concrete examples to illustrate.
  • It is short and accessible. 

I like the alternative presentation of Logical Induction. I believe Logical Induction to be an important concept and for such concepts making it accessible for different audiences or different cognitive styles is great.

I'm a father of four sons myself, and I'm very happy about this write-up and its lens on observation of learning of the growing child. Everything must be learned, and it happens stage by stage. Each stage builds on top of previous stages and in many cases, there are very many small steps that can be observed.

Initially, I sorta felt bummed out that a post-singularity utopia would render my achievements meaningless. After reading this, I started thinking about it more and now I feel less bummed out. It could've done with mentions of biblically accurate angels playing incomprehensible 24d MMOs or omniscient buddhas living in equanimous cosmic bliss, but it still works.

Ideally reviews would be done by people who read the posts last year, so they could reflect on how their thinking and actions changed. Unfortunately, I only discovered this post today, so I lack that perspective.

Posts relating to the psychology and mental well being of LessWrongers are welcome and I feel like I take a nugget of wisdom from each one (but always fail to import the entirety of the wisdom the author is trying to convey.) 

The nugget from "Here's the exit" that I wish I had read a year ago is "If your body's emergency mobilization sys...

I'm glad I did this project and wrote this up. When your goal is to make a thing to make the AI alignment community wiser, it's not really obvious how to tell if you're succeeding, and this was a nice step forward in doing that in a way that "showed my work". That said, it's hard to draw super firm conclusions, because of bias in who takes the survey and some amount of vagueness in the questions. Also, if the survey says a small number of people used a resource and all found it very useful, it's hard to tell if people who chose not to use the resource woul...

I think, in retrospect, this feature was a really great addition to the website.

This post has been surprisingly important to me, and has made me notice how I was confused around what motivation is, conceptually. I've used Steam as a concept maybe once a week, both when introspecting during meditation and when thinking about AI alignment.

I remember three different occasions where I've used Steam:

  1. Performing a conceptual analysis of "optimism" in this comment, in which I think I've clarified some of the usage of "optimism", and why I feel frustrated by the word.
  2. When considering whether to undertake a risky and kind of for-me-out-of-di
...

This isn't a post that I feel compelled to tell everyone they need to read, but nonetheless the idea immediately entered my lexicon, and I use it to this day, which is a pretty rare feat. +4

(Sometimes, instead of this phrase, I say that I'm feeling precious about my idea.)

One review criticized my post for being inadequate at world modeling - readers who wish to learn more about predictions are better served by other books and posts (but it also praised me for being willing to update the post's content after new information arrived). I don't disagree, but I felt it was necessary to clarify the background to my writing it.

First and foremost, this post was meant specifically as (1) a review of the research progress on Whole Brain Emulation of C. elegans, and (2) a request for more information from the community. I became aware of this resea...

Many people believe that they already understand Dennett's intentional stance idea, and due to that will not read this post in detail. That is, in many cases, a mistake. This post makes an excellent and important point, which is wonderfully summarized in the second-to-last paragraph:

In general, I think that much of the confusion about whether some system that appears agent-y “really is an agent” derives from an intuitive sense that the beliefs and desires we experience internally are somehow fundamentally different from those that we “merely” infer and a

...

I like this for the idea of distinguishing between what is real (how we behave) vs what is perceived (other people's judgment of how we are behaving). It helped me see that rather than focusing on making other people happy or seeking their approval, I should instead focus on what I believe I should do (e.g. what kinds of behaviour create value in the world) and measure myself accordingly. My beliefs may be wrong, but feedback from reality is far more objective and consistent than things like social approval, so it's a much saner goal. And more importantly,...

I'm in two minds about this post.

On one hand, I think the core claim is correct: most people are generally too afraid of low negative-EV stuff like lawsuits, terrorism, being murdered, etc. I think this is also a subset of the general argument that goes something like "most people are too cowardly. Being less cowardly is in most cases better."

That being said, I have a few key problems with this article that make me downvote it.

  • I feel like it's writing to persuade, not to explain. It's all arguments for not caring about lawsuits and no examination of why y
...

I'd ideally like to see a review from someone who actually got started on Independent Alignment Research via this document, and/or grantmakers or senior researchers who have seen up-and-coming researchers who were influenced by this document.

But, from everything I understand about the field, this seems about right to me, and seems like a valuable resource for people figuring out how to help with Alignment. I like that it both explains the problems the field faces, and it lays out some of the realpolitik of getting grants.

Actually, rereading this, it strikes me as a pretty good "intro to the John Wentworth worldview", weaving a bunch of disparate posts together into a clear frame. 

Learning what we can about how ML algorithms generalize seems very important. The classical philosophy of alignment tends to be very pessimistic about anything like this possibly being helpful. (That is, it is claimed that trying to reward "happiness-producing actions" in the training environment is doomed, because the learned goal will definitely generalize to something not-what-you-meant like "tiling the galaxies with smiley faces.") That is, of course, the conservative assumption. (We would prefer not to bet the entire future history of the world on AI ...

I haven't read a ton of Dominic Cummings, but the writing of his that I have read had a pretty large influence on me. It is very rare to get any insider story about how politics works internally from someone who speaks in a mechanistic language about the world, and I pretty majorly updated my models of how to achieve political outcomes, and also majorly updated upwards on my ability to achieve things in politics without going completely crazy (I don't think politics had no effect on Cummings's sanity, but he seems to have weathered it in a healthier way than the vast majority of other people I've seen go into it).

I have been doing various grantmaking work for a few years now, and I genuinely think this is one of the best and most important posts to read for someone who is themselves interested in grantmaking in the EA space. It doesn't remotely cover everything, but almost everything that it does say isn't said anywhere else.

I feel like this post is the best current thing to link to for understanding the point of coherence arguments in AI Alignment, which I think are really crucial, and even in 2023 I still see lots of people make bad arguments either overextending the validity of coherence arguments, or dismissing coherence arguments completely in an unproductive way.

This post in particular feels like it has aged well and became surprisingly relevant to the FTX situation. Indeed, post-FTX I saw a lot of people who were confidently claiming that you should not take a 51% bet to double or nothing your wealth, even if you have non-diminishing returns to money, and I sent this post to multiple people to explain why I think that criticism is not valid.
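The arithmetic of the bet can be written out in one line. This is a minimal sketch, assuming utility exactly linear in wealth (the "non-diminishing returns" case); the symbol W for current wealth is introduced here for illustration:

```latex
\mathbb{E}[W_{\text{after}}] = 0.51 \cdot 2W + 0.49 \cdot 0 = 1.02\,W > W
```

Under that assumption the bet strictly increases expected utility. With diminishing returns (for example, logarithmic utility), the 49% chance of ending at zero wealth dominates and the same bet is rejected, so the disagreement is over which utility function applies rather than over the arithmetic.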

A decent introduction to the natural abstraction hypothesis, and how testing it might be attempted. A very worthy project, but it isn't that easy to follow for beginners, nor does it provide a good understanding of how the testing might work in detail. What would constitute a success, and what would constitute a failure, of this testing? A decent introduction, but only an introduction, and it should have been part of a sequence or a longer post.

I still appreciate this post for getting me to think about the question "how much language can dogs learn?". I also still find the evidence pretty sus, and mostly tantalizing in the form of "man I wish there were more/better experiments like this."

BUT, what feels (probably?) less sus to me is JenniferRM's comment about the dog Chaser, who learned explicit nouns and verbs. This is more believable to me, and seems to have had more of a scientific setup. (Ideally I'd like to spend this review-time spot-checking that the paper seems reasonable, alas, in the gr...

This post has successfully stuck around in my mind for two years now! In particular, it's made me explicitly aware of the possibility of flinching away from observations because they're normie-tribe-coded.

I think I deny the evidence on most of the cases of dogs generating complex English claims. But it was epistemically healthy for that model anomaly to be rubbed in my face, rather than filter-bubbled away plus flinched away from and ignored.

This is a fantastic piece of economic reasoning applied to a not-flagged-as-economics puzzle! As the post says, a lot of its content is floating out there on the internet somewhere: the draw here is putting all those scattered insights together under their common theory of the firm and transaction costs framework. In doing so, it explicitly hooked up two parts of my world model that had previously remained separate, because they weren't obviously connected.

I think this is a fantastically clear analysis of how power and politics work, that made a lot of things click for me. I agree it should be shorter but honestly every part of this is insightful. I find myself confused even how to review it, because I don't know how to compare this to how confusing the world was before this post. This is some of the best sense-making I have read about how governmental organizations function today.

There's a hope that you can just put the person who's most obviously right in charge. This post walks through the basic things th...

I'm torn about this one. On the one hand, it's basically a linkpost; Katja adds some useful commentary but it's not nearly as important/valuable as the quotes from Lewis IMO. On the other hand, the things Lewis said really need to be heard by most people at some point in their life, and especially by anyone interested in rationality, and Katja did LessWrong a service by noticing this & sharing it with the community. I tentatively recommend inclusion.

The comments have some good discussion too.

This post was personally meaningful to me, and I'll try to cover that in my review while still analyzing it in the context of lesswrong articles.

I don't have much to add about the 'history of rationality' or the description of interactions of specific people.

Most of my value from this post wasn't directly from the content, but from how the content connected to things outside of rationality and lesswrong. So, basically, I loved the citations.

Lesswrong is very dense in self-links and self-citations, and to a lesser degree does still have a good number of li...

I think this post is valuable because it encourages people to try solving very hard problems, specifically by showing them how they might be able to do that! I think its main positive effect is simply in pointing out that it is possible to get good at solving hard problems, and the majority of the concretes in the post are useful for continuing to convince the reader of this basic possibility.

I'm leaving this review primarily because this post somehow doesn't have one yet, and it's way too important to get dropped out of the Review!

ELK had some of the most alignment community engagement of any technical content that I've seen. It is extremely thorough, well-crafted, and aims at a core problem in alignment. It serves as an exemplar of how to present concrete problems to induce more people to work on AI alignment.

That said, I personally bounced after reading the first few pages of the document. It was good as far as I got, but it was pretty effortful to get through, and (as mentioned above) already had tons of attention on it.

I like this post for reinforcing a point that I consider important about intellectual progress, and for pushing against a failure mode of the Sequences-style rationalists.

As far as I can tell, intellectual progress is made bit by bit, with later work building on earlier work. Francis Bacon gets credit for a landmark evolution of the scientific method, but it didn't spring from nowhere; he was building on ideas that had built on ideas, etc.

This says the same is true for our flavor of rationality. It's built on many things, and not just probability theory.

The f...

I think that of all the posts Holden has written in the last two years, this is the one I tend to refer to by far the most, in particular the "size of economy" graph.

I think there are a number of other arguments that lead you to roughly the same conclusion ("that whatever has been happening for the last few centuries/millennia has to be an abnormal time in history, unless you posit something very cyclical"), that other people have written about (Luke's old post about "there was only one industrial revolution" is the one that I used to link for this the...

I was aware of this post and I think read it in 2021, but kind of bounced off it for the dumb reason that "split and commit" sounds approximately synonymous with "disagree and commit", though Duncan is using it in a very different way.

In fact, the concept means something pretty damn useful, is my guess, and I can begin to see cases where I wish I was practicing this more. I intend to start. I might need to invent a synonym to make it feel less like an overloaded term. Or disagree and commit on matters of naming things :P

This was an important and worthy post.

I'm more pessimistic than Ajeya; I foresee thorny meta-ethical challenges with building AI that does good things and not bad things, challenges not captured by sandwiching on e.g. medical advice. We don't really have much internal disagreement about the standards by which we should judge medical advice, or the ontology in which medical advice should live. But there are lots of important challenges that are captured by sandwiching problems - sandwiching requires advances in how we interpret human feedback, and how we tr...

I'm generally happy to see someone do something concrete and report back, and this was an exceptionally high-value thing to try. 

This post felt like a great counterpoint to the drowning child thought experiment, and as such I found it a useful insight. A reminder that it's okay to take care of yourself is important, especially in these times and in a community of people dedicated to things like EA and the Alignment Problem. 

This was a useful and concrete example of a social technique I plan on using as soon as possible. Being able to explain why is super useful to me, and this post helped me do that. Explaining explicitly the intuitions behind communication cultures is useful for cooperation. This post feels like a step in the right direction in that regard.

I really enjoyed this post as a compelling explanation of slack in a domain that I don't see referred to that often. It helped me realize the value of having "unproductive" time that is unscheduled. It's now something I consider when previously I did not. 

This is a post about the mystery of agency. It sets up a thought experiment in which we consider a completely deterministic environment that operates according to very simple rules, and ask what it would be for an agentic entity to exist within that.

People in the game of life community actually spent some time investigating the empirical questions that were raised in this post. Dave Greene notes:

The technology for clearing random ash out of a region of space isn't entirely proven yet, but it's looking a lot more likely than it was a year ago, that a work

...

This post attempts to separate a certain phenomenon from a certain very common model that we use to understand that phenomenon. The model is the "agent model" in which intelligent systems operate according to an unchanging algorithm. In order to make sense of there being an unchanging algorithm at the heart of each "agent", we suppose that this algorithm exchanges inputs and outputs with the environment via communication channels known as "observations" and "actions".

This post really is my central critique of contemporary artificial intelligence discourse....

This is an essay about methodology. It is about the ethos with which we approach deep philosophical impasses of the kind that really matter. The first part of the essay is about those impasses themselves, and the second part is about what I learned in a monastery about addressing those impasses.

I cried a lot while writing this essay. The subject matter (the impasses themselves) is deeply meaningful to me, and I have the sense that they really do matter.

It is certainly true that there are these three philosophical impasses -- each has been discussed in...

The underlying assumption of this post is looking increasingly unlikely to obtain. Nevertheless, I find myself back here every once in a while, wistfully fantasizing about a world that might have been.

I think the predictions hold up fairly well, though it's hard to evaluate, since they are conditioning on something unlikely, and because it's only been 1.5 years out of 20, it's unsurprising that the predictions look about as plausible now as they did then. I've since learned that the bottleneck for drone delivery is indeed very much regulatory, so who know...

Quick self-review:

Yep, I still endorse this post. I remember it fondly because it was really fun to write and read. I still marvel at how nicely the prediction worked out for me (predicting correctly before seeing the data that power/weight ratio was the key metric for forecasting when planes would be invented). My main regret is that I fell for the pendulum rocket fallacy and so picked an example that inadvertently contradicted, rather than illustrated, the point I wanted to make! I still think the point overall is solid but I do actually think this embar...

Solid post. Comments:

  • long; I find it hard to parse as a result. Formatting could be improved significantly to improve skimmability. tldr helps, but if the rest of the post's words are worth their time to read, they could use better highlighting - probably bold rather than italic.
  • I'm very unclear how this differs from a happy price. The forking of the term seems unnecessary.
  • This concept entered my thinking a long time ago.
  • Use of single-currency trade assumes an efficient market; the law of one price is broken by today's exponentially inefficient market
...

I found this post a delightful object-level exploration of a really weird phenomenon (the sporadic occurrence of the "tree" phenotype among plants). The most striking line for me was:

Most “fruits” or “berries” are not descended from a common “fruit” or “berry” ancestor. Citrus fruits are all derived from a common fruit, and so are apples and pears, and plums and apricots – but an apple and an orange, or a fig and a peach, do not share a fruit ancestor.

What is even going on here?!

On a meta-level my takeaway was to be a bit more humble in saying what complex/evolved/learned systems should/shouldn't be capable of/do.

Brief review: I think this post represents a realization many people around here have made, and says it clearly. I think it's fine to keep it as a record that people used to be blasé about the ease of secrecy, and later learned that it was much more complex than they thought. I think I'm at +1.

My quick two-line review is something like: this post (and its sequel) is an artifact from someone with an interesting perspective on the world looking at the whole problem and trying to communicate their practical perspective. I don't really share this perspective, but it is looking at enough of the real things, and differently enough to the other perspectives I hear, that I am personally glad to have engaged with it. +4.

Since the very beginning of LW, there has been a common theme that you can't always defer to experts, or that the experts aren't always competent, or that experts on a topic don't always exist, or that you sometimes have to do your own reasoning to determine who the experts are, etc. (E.g. LW on Hero Licensing, or on the Correct Contrarian Cluster, or on Inadequate Equilibria; or ACX in Movie Review: Don't Look Up.)

I don't think this post makes a particularly unique contribution to that larger theme, but I did appreciate its timing, and how it made and ref...

I'm not qualified to assess the accuracy of this post, but do very much appreciate its contribution to the discussion.

  • I appreciated the historical overview of vaccination at a time just before the mRNA vaccines had become formally approved anywhere. And more generally, I always like to see the Progress Studies perspective of history on LW, even if I don't always agree with it.
  • This post also put the various vaccines in context, and how and why vaccination technology was developed.
  • And it made clear that the vaccine technology you use fundamentally changes it
...

On one hand, AFAICT the math here is pretty fuzzy, and one could have written this post without it, instead just using the same examples to say "you should probably be less risk averse." I think, in practice for most people, the math is a vague tribal signifier that you can trust the post, to help the advice go down.

But, I see this post in a similar reference class to Bayes' Theorem. I think most people don't actually need to know Bayes' Theorem. They need to remember a few useful heuristics like "remember the base rates, not just the salient evidence you c...

In this post I speculated on the reasons for why mathematics is so useful so often, and I still stand behind it. The context, though, is the ongoing debate in the AI alignment community between the proponents of heuristic approaches and empirical research[1] ("prosaic alignment") and the proponents of building foundational theory and mathematical analysis (as exemplified in MIRI's "agent foundations" and my own "learning-theoretic" research agendas).

Previous volleys in this debate include Ngo's "realism about rationality" (on the anti-theory side), the pro...

This post states a subproblem of AI alignment which the author calls "the pointers problem". The user is regarded as an expected utility maximizer, operating according to causal decision theory. Importantly, the utility function depends on latent (unobserved) variables in the causal network. The AI operates according to a different, superior, model of the world. The problem is then, how do we translate the utility function from the user's model to the AI's model? This is very similar to the "ontological crisis" problem described by De Blanc, only De Blanc ...

Conversations with Ray clarified for me how much secret keeping is a skill, separate from the question of when keeping a secret is a good idea in principle, which has been very helpful in thinking through confidentiality agreements/decisions. 

I... haven't actually used this technique verbatim through to completion. I've made a few attempts to practice and learn it on my own, but usually struggled a bit to reach conclusions that felt right.

I have some sense that this skill is important, and it'd be worthwhile for me to go to a workshop similar to the one where Habryka and Eli first put this together. This feels like it should be an important post, and I'm not sure if my struggle to realize its value personally is more due to "it's not as valuable as I thought" or "you actually have to do a fair...

I think this is an important skill and I'm glad it's written up at all. I would love to see the newer version Eli describes even more though. 

This post defines and discusses an informal notion of "inaccessible information" in AI.

AIs are expected to acquire all sorts of knowledge about the world in the course of their training, including knowledge only tangentially related to their training objective. The author proposes to classify this knowledge into "accessible" and "inaccessible" information. In my own words, information inside an AI is "accessible" when there is a straightforward way to set up a training protocol that will incentivize the AI to reliably and accurately communicate this inform...

Can crimes be discussed literally? makes a short case that when you straightforwardly describe misbehavior and wrongdoing, people commonly criticize the language you use, reading it as an attempt to build a coalition to attack the parties you're talking about. At the time I didn't think that this was my experience, and thought the post was probably wrong and confused. I don't remember when I changed my mind, but nowadays I'm much more aware of requests on me to not talk about what a person or group has done or is doing. I find myself the subject of such re...

What are some beautiful, rationalist artworks? has many pieces of art that help me resonate with what rationality is about.

Look at this statue.

A rationalist must rebuild their self and their mind.

That's the first piece; there are many more that help me have a visual handle on rationality. I give this post a +4.

This is an extension of the Embedded Agency philosophical position. It is a story told using that understanding, and it is fun and fleshes out lots of parts of Bayesian rationality. I give it +4.

(This review is taken from my post Ben Pace's Controversial Picks for the 2020 Review.)

The best single piece of the whole Mazes sequence. It's the one to read to get all the key points. High in gears, low in detail. I give it +4.

Broader comment on the sequence as a whole:

The sequence is an extended meditation on one theme, explored from lots of perspectives: how large projects and large coordination efforts end up being eaten by Moloch. The specific perspective reminds me a bit of The Screwtape Letters. In The Screwtape Letters, the two devils are focused on causing people to be immoral. The explicit optimization for vices and persona

...

A Significant Portion of COVID-19 Transmission is Presymptomatic argued for something that is blindingly obvious now, but a real surprise to me at the time. Covid has an incubation period of up to 2 weeks at the extreme, where you can have no symptoms but still give it to people. This totally changed my threat model, where I didn't need to know if someone was symptomatic, but instead I had to calculate how much risk they took in the last 7-14 days. The author got this point out fast (March 14th) which I really appreciated. I give this +4.

(This review is ta...

This is one of multiple posts by Steven that explain the cognitive architecture of the brain. All posts together helped me understand the mechanism of motivation and learning and answered open questions. Unfortunately, the best post of the sequence is not in 2020. I recommend including either this compact and self-contained post or the longer (and currently higher voted) My computational framework for the brain.

This was a promising and practical policy idea, of a type that I think is generally under-provided by the rationalist community. Specifically, it attempts to actually consider how to solve a problem, instead of just diagnosing or analyzing it. Unfortunately, it took far too long to get attention paid, and the window for its usefulness has passed.

I think microCOVID was a hugely useful tool, and probably the most visibly useful thing that rationalists did related to the pandemic in 2020.

In graduate school, I came across micromorts, and so was already familiar with the basic idea; the main innovation for me in microCOVID was that they had collected what data was available about the infectiousness of activities and paired it with an updating database on case counts.
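To make the micromort analogy concrete: one microCOVID is a one-in-a-million chance of catching COVID, and combining a per-activity infectiousness estimate with local case counts yields a number in those units. Below is a minimal sketch of that kind of arithmetic, under loudly stated assumptions. The function name, the independent-contacts model, and the example numbers are all hypothetical illustrations; microCOVID's actual methodology is considerably more detailed.

```python
# Hypothetical, simplified risk model; NOT microCOVID's published methodology.
MICROCOVIDS_PER_UNIT = 1_000_000  # 1 microCOVID = a one-in-a-million chance of infection


def activity_risk_microcovids(prevalence: float,
                              transmission_prob: float,
                              people: int) -> float:
    """Estimate the infection risk of a single activity, in microCOVIDs.

    prevalence:        assumed fraction of people currently infectious,
                       derived from local case counts
    transmission_prob: assumed chance of catching it from one infectious
                       person during this activity
    people:            number of other people you interact with
    """
    if not (0 <= prevalence <= 1 and 0 <= transmission_prob <= 1):
        raise ValueError("probabilities must be in [0, 1]")
    if people < 0:
        raise ValueError("people must be non-negative")
    # Chance that one given contact is both infectious and transmits to you.
    per_contact = prevalence * transmission_prob
    # Assume contacts are independent: you stay uninfected only if
    # every single contact fails to infect you.
    p_infected = 1 - (1 - per_contact) ** people
    return p_infected * MICROCOVIDS_PER_UNIT


if __name__ == "__main__":
    # Illustrative inputs only: 0.2% prevalence, 6% per-contact transmission, 5 people.
    print(round(activity_risk_microcovids(0.002, 0.06, 5)))
```

With those illustrative inputs the sketch gives about 600 microCOVIDs, roughly a 0.06% chance of infection. The micro-unit framing is the same move as micromorts: small probabilities become whole numbers that are easy to compare and, to a good approximation, to add up across activities.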

While the main use I got out of it was group house harmony (as now, rather than having to carefully evaluate and argue over particular acti...

This post went in a similar direction as Daniel Kokotajlo's 2x2 Simulacrum grid. It seems to have a "medium amount of embedded worldmodel", contrasted with some of Zvi's later simulacra writing (which I think bundles a bunch of Moral-Maze-ish considerations into Simulacrum 4) and Daniel's grid-version (which is basically unopinionated about where the levels came from).

I like that this post notes the distinction between domains where Simulacrum 3 is a degenerate form of level 1, vs domains where Simulacrum 3 is the "natural" form of expression.

Of the agent foundations work from 2020, I think this sequence is my favorite, and I say this without actually understanding it.

The core idea is that Bayesianism is too hard. And so what we ultimately want is to replace probability distributions over all possible things with simple rules that don't have to put a probability on all possible things. In some ways this is the complement to logical uncertainty - logical uncertainty is about not having to have all possible probability distributions possible, this is about not having to put probability distributi...

Of the posts in the "personal advice" genre from 2020, this is the one that made the biggest impression on me.

The central lesson is that there is a core generator of good listening, and if you can tap into your curiosity about the other person's perspective, this both automatically makes you take actions that come off as good listening, and also improves the value of what you do eventually say.

Since reading it, I can supply anecdotal evidence that it's good advice for people like me. Not only do you naturally sound like one of those highly empathetic peop...

As mentioned in the post, I think it's personally helpful to look back, and is a critical service to the community as well. Looking back at looking back, there are things I should add to this list - and even something (hospital transmission) which I edited more recently because I have updated against having been wrong about in this post - but it was, of course, an interim postmortem, so both of these types of post-hoc updates seem inevitable.

I think that the most critical lesson I learned was to be more skeptical of information sources generally - even the...

I've been asked to self-review this post as part of the 2020 review. I pretty clearly still stand by it given that I was willing to crosspost it from my own blog 5 years after I originally wrote it. But having said that, I've had some new insights since mid-2020, so let me take a moment and re-read the post and make sure it doesn't now strike me as fatally confused...

...yeah, no, it's good! I made a couple of small formatting and phrasing edits just now but it's otherwise ready to go from my perspective.

The post is sort of weirdly contextual in that it's p...

Happened to look this post up again this morning and apparently it's review season, so here goes...

This post inspired me to play around with some very basic visualisation exercises last year. I didn't spend that long on it, but I think of myself as having a very weak visual imagination and this pushed me in the direction of thinking that I could improve this a good deal if I put the work in. It was also fascinating to surface some old visual memories.

I'd be intrigued to know if you've kept using these techniques since writing the post.

I haven't followed the covid discourse that'd indicate whether this post's ideas turned out to make sense. But, I really appreciated how this post went about investigating its model, and explored the IF-THEN ramifications of that investigation.

I like how it broke down "how does one take this hypothesis seriously?", i.e.:

  1. There are things we could do to get better information.
  2. There are things individuals or small groups can do to improve their situation.
  3. There are things society as a whole could try to do that don’t have big downsides.
  4. We could take bold action
...

I think this post is interesting as a historical document. I would like to look back at this post in 2050 with the benefits of hindsight.

I like this post because following its advice has improved my quality of life.

I've already written a comment with a suggestion that this post needs a summary so that you can benefit from it, even if you don't feel like wading through a bunch of technical material.

This post is excellent, in that it has a very high importance-to-word-count ratio. It takes up only a page or so, but conveys a very useful and relevant idea, and moreover asks an important question that will hopefully stimulate further thought.

This was important to the discussions around timelines at the time, back when the talk about timelines felt central. This felt like it helped give me permission to no longer consider them as central, and to fully consider a wide range of models of what could be going on. It helped make me more sane, and that's pretty important.

It was also important for the discussion about the use of words and the creation of clarity. There's been a long issue of exactly when and where to use words like "scam" and "lie" to describe things - when is it accurate, when is it ...

I can't think of a question on which this post narrows my probability distribution.

Not recommended.

In general, I think this post does a great job of articulating a single, incomplete frame. Others in the review take umbrage with the moralizing tone, but I think the moralizing tone is actually quite useful to give an inside view of this frame. 

I believe this frame is incomplete, but gives an important perspective that is often ignored in the LessWrong/Gray tribe.

I haven't reviewed the specific claims of the literature here, but I did live through a pandemic where a lot of these concerns came up directly, and I think I can comment directly on the experience.

  • Some LessWrong team members disagree with me on how bad remote-work is. I overall thought it was "Sort of fine, it made some things a bit harder, other things easier. It made it harder to fix some deeper team problems, but we also didn't really succeed at fixing those team problems in previous non-pandemic years."
    • Epistemic Status, btw: I live the farthest aw
...

I think I have juuust enough background to follow the broad strokes of this post, but not to quite grok the parts I think Abram was most interested in. 

It definitely caused me to think about credit assignment. I actually ended up thinking about it largely through the lens of Moral Mazes (where challenges of credit assignment combine with other forces to create a really bad environment). Re-reading this post, while I don't quite follow everything, I do successfully get a taste of how credit assignment fits into a bunch of different domains.

For the "myop...

The post attempts to point out the important gap between fighting over norms/values and getting on the same page about what people's norms/values even are, and offers a linguistic tool to help readers navigate it in their life.

A lot of (the first half of) the post feels like An Intuitive Introduction to Being Pro Conversation Before Fighting, and it's all great reading.

I think the OP wants to see people really have conversation about these important differences in values, and is excited about that. Duncan believes that this phrase is a key step allowing (c...

I think that, among those who've done serious thought about how intellectual progress happens, it was pretty well known that in some domains a lot of research is happening on forums, and that forum participation as a research strategy can work. But in the broader world, most people treat forums as more like social spaces, and have a model of how research works that puts it in distant, inaccessible institutional settings. Many people think research means papers in prestigious journals, with no model of where those papers come from. I think it's worth making common knowledge that getting involved in research can be as simple as tweaking your forum subscriptions.

I observe: There are techniques floating around the rationality community, with models attached, where the techniques seem anecdotally effective, but the descriptions seem like crazy woo. This post has a model that predicts the same techniques will work, but the model is much more reasonable (it isn't grounded out in axon-connections, but in principle it could be). I want to resolve this tension in this post's favor. In fact I want that enough to distrust my own judgment on the post. But it does look probably true, in the way that models of mind can ever be true (ie if you squint hard enough).

This is not the clearest or the best explanation of simulacrum levels on LessWrong, but it is the first. The later posts on the subject (Simulacra and Subjectivity, Negative Feedback and Simulacra, Simulacra Levels and Their Interactions) are causally downstream of it, and are some of the most important posts on LessWrong. However, those posts were written in 2020, so I can't vote for them in the 2019 review.

I have applied the Simulacrum Levels concept often. I made spaced-repetition cards based on them. Some questions are easy to notice and ask, in simula...

For me, this is the paper where I learned to connect ideas about delegation to machine learning. The paper sets up simple ideas of mesa-optimizers, and shows a number of constraints and variables that will determine how the mesa-optimizers will be developed – in some environments you want to do a lot of thinking in advance then delegate execution of a very simple algorithm to do your work (e.g. this simple algorithm Critch developed that my group house uses to decide on the rent for each room), and in some environments you want to do a little thinking and ...

Note 1: This review is also a top-level post.

Note 2: I think that 'robust instrumentality' is a more apt name for 'instrumental convergence.' That said, for backwards compatibility, this comment often uses the latter. 

In the summer of 2019, I was building up a corpus of basic reinforcement learning theory. I wandered through a sun-dappled Berkeley, my head in the clouds, my mind bent on a single ambition: proving the existence of instrumental convergence. 

Somehow. 

I needed to find the right definitions first, and I couldn't even imagine what...

So, reviewing this seriously seems like a pretty big todo, which has not yet been done. I don't feel qualified to do it. But... this feels plausible enough to consider in at least a bit more depth, and if taken seriously it might have ramifications on how to think about current events.

I am interested in at least seeing a rough pass of how this post fares in the vote. I'd like to see a distillation of this post, plus Scott's Ages of Discord post, plus the SSC subreddit's response to Peter Turchin's response. (Maybe this already happened in some SSC highligh...

I... feel like this post is important. But I'm not actually sure how to use it and build around it.

I have vague memories of seeing this link-dropped by folk in the Benquo/Jessicata/Zack/Zvi crowd in various comments, but usually in a way that feels more like an injoke than a substantive point. 

I just checked the three top-level-post pingbacks, and I do think they make meaningful reference to this post. Which I think is sufficient for "yes this concept got followed up on in the past 2 years". But I'm left with a vague frustration with the concept feeli...

Author here: I think this post could use a bunch of improvements. It spends a bunch of time on tangential things (e.g. the discussion of Inadequacy and why this doesn't come through in textbooks, spending a while initially setting up a view to then tear down). 

But really what would be nice is to have it do a much better job at delivering the core insight. This is currently just done in two bullets + one exercise for the reader. 

Even more important would be to include JenniferRM's comment which adds a core mechanism (something like "cultural learn...

This points out something true and important that is often not noticed, and definitely is under-considered. That seems very good. The question I ask is, did this cause other people to realize this effect exists, and to remember to notice and think about it more? I don't know either way.

If so, it's an important post, and I'd be moderately excited to include it. 

If not, it's not worth the space. 

I'm guessing this post could be improved/sharpened relatively easily, if it did get included - it's good, and there's nothing wrong exactly, but feels l...

I've written up a review here, which I made into a separate post because it's long.

Now that I read the instructions more carefully, I realize that I maybe should have just put it here and waited for mods to promote it if they wanted to. Oops, sorry, happy to undo if you like.

This is a retroactively obvious concept that I'd never seen so clearly stated before, which makes it a fantastic contribution to our repertoire of ideas. I've even used it to sanity-check my statements on social media. Well, I've tried.

Recommended, obviously.

This makes a simple and valuable point. As discussed in and below Anna's comment, it's very different when applied to a person who can interact with you directly versus a person whose works you read. But the usefulness in the latter context, and the way I expect new readers to assume that context, leads me to recommend it.

I liked the comments on this post more than I liked the post itself. As Paul commented, there's as much criticism of short AGI timelines as there is of long AGI timelines; and as Scott pointed out, this was an uncharitable take on AI proponents' motives.

Without the context of those comments, I don't recommend this post for inclusion.

Here are prediction questions for the predictions that TurnTrout himself provided in the concluding post of the Reframing Impact sequence

...

I continue to agree with my original comment on this post (though it is a bit long-winded and goes off on more tangents than I would like), and I think it can serve as a review of this post.

If this post were to be rewritten, I'd be particularly interested to hear example "deployment scenarios" where we use an AGI without human models and this makes the future go well. I know of two examples:

  1. We use strong global coordination to ensure that no powerful AI systems with human models are ever deployed.
  2. We build an AGI that can do science / engineering really wel
...

(You can find a list of all 2019 Review poll questions here.)

I've referred and linked to this post in discussions outside the rationalist community; that's how important the principle is. (Many people understand the idea in the domain of consent, but have never thought about it in the domain of epistemology.)

Recommended.

I've known about S-curves for a long time, and I don't think I read this the first time. If you don't know S-curves exist, this has good info, and it seems to be well explained. There are also a few useful nuggets otherwise. As someone who has long known of S-curves, it's hard for me to say how big an insight this is to others, but my instinct is that while I have nothing against this post and I'm very glad it exists, this isn't sufficiently essential to justify including. 

I think the CAIS framing that Eric Drexler proposed gave concrete shape to a set of intuitions that many people have been relying on for their thinking about AGI. I also tend to think that those intuitions and models aren't actually very good at modeling AGI, but I nevertheless think it productively moved the discourse forward a good bit. 

In particular I am very grateful for the comment thread between Wei Dai and Rohin, which really helped me engage with the CAIS ideas, and I think were necessary to get me to my current understanding of CAIS and to ...

So. I have the distinct sense I just read an unusually mathematical vagueblog.

Was there a way to explain these dynamics with concrete examples, and *NOT* have those groups' politics blow up in your face about it? Not sure. I'm really not sure. Could do with a flow chart?

I would be fascinated to see this in the form of a flowchart, and *then* run an experiment to test if jointly going through it shortens the time it takes to get two people arguing over norms/punishments to a state of double-crux.

This is so lovely! Pure happy infodump energy. Reveling in the wonder of reality.

It's very close to a zetetic explanation, which I massively approve of. You might even say it's a concrete example.

(Side note: Rebar reinforced concrete was a mistake. It rusts in place and this fucks up so much modern architecture.)

I think the point this post makes is right, both as a literal definition of what a rule is, and of how you should respond to the tendency to make "exceptions." I prefer the notion of a "framework" to a rule, because it suggests that the rules can be flexible, layered, and only operating in specific contexts (where appropriate). For example, I'm trying to implement a set of rules about when I take breaks from work, but the rule "25 minutes on, 5 minutes off" is only valid when I'm actually at work.

My point of disagreement is the conclusion - that exceptions...

tl;dr – I'd like to see further work that examines a ton of examples of real coordination problems that rationalists have run into ("stag hunt" shaped and otherwise), and then attempt to extract more general life lessons and ontologies from that. 

...

1.5 years later, this post still seems good for helping to understand the nuances of stag hunts, and I think was worth a re-read. But something that strikes me as I reread it is that it doesn't have any particular takeaway. Or rather, it doesn't compress easily. 

I spent 5 minutes seeing if I could dis...

I read this post only half a year ago after seeing it being referenced in several different places, mostly as a newer, better alternative to the existing FOOM-type failure scenarios. I also didn't follow the comments on this post when it came out.

This post makes a lot of sense in Christiano's worldview, where we have a relatively continuous, somewhat multipolar takeoff which to a large extent inherits the problems of our current world. This especially applies to part I: we already have many different instances of scenarios where humans follow measured in...

(Self-review.) I oppose including this post in a Best-of-2019 collection. I stand by what I wrote, but it's not potential "canon" material, because this was a "defensive" post for the 2018 Review: if the "contextualizing vs. decoupling" idea hadn't been as popular and well-received as it was, there would be no reason for this post to exist.

A standalone Less Wrong "house brand" explanation of Gricean implicature (in terms of Bayesian signaling games, probably?) could be a useful reference post, but that's not what this is.

The factual point that moderate liberals are more censorious is easy to lose track of, and I saw confusion about it today that sent me back to this article.

I appreciate that this post starts from a study, and outlines not just the headline from the study but the sample size. I might appreciate more details on the numbers, such as how big the error bars are, especially for subgroup stats.

Historical context links are good, and I confirm that they state what they claim to state.

Renee DiResta is no longer at New Knowledge, though her previous work there is st...

I see where Raemon is going with this, and for a simplified model, where number of words is the only factor, this is at least plausible. Super-simplified models can be useful not only insofar as they make accurate predictions, but because they suggest what a slightly more complex model might look like.

In this case, what other factors play into the number of people you can coordinate with about X words?

  • Motivation (payment, commitment to a cause, social ties, status)
  • Repetition, word choice, presentation
  • Intelligence of the audience
  • Concreteness and familiar...

in retrospect it looks like understanding cognitive biases doesn’t actually make you substantially more effective

 

I'm not convinced that this is true, or that it's an important critique of the original sequences.

 

Looking at the definition of agent, I'm curious how this matches with Cartesian Frames.

Given that we want to learn to think about humans in a new way, we should look for ways to map the new way of thinking into a native mode of thought

I was very happy to read this pingback, but it's purely anecdotal. There are better sources for t...

This post sparked some meta topic ideas to extend the conversation on note taking and productivity:

  • A list of 50 factors influencing productivity, such as "notetaking methods," "desk setup" and "cold-emailing experts to ask questions" so that people could get a broad perspective on aspects of their productivity to explore.
  • A map of books or web pages listing numerous examples and descriptions in each factor category so that people could experiment.
  • When people study productivity methods, how do they go about it? Are the research methods sound?
  • I tried this met...

This post gave me a slightly better understanding of the dynamics happening inside SGD. I think deep double descent is strong evidence that something like a simplicity prior exists in SGD, which might have actively bad generalization properties, e.g. by incentivizing deceptive alignment. I remain cautiously optimistic that approaches like Learning the Prior can circumnavigate this problem.

Can you help me paint a specific mental picture of a driver being exploited by Uber?

I've had similar "exploitation" arguments with people:

"Commodification" and "dehumanization" don't mean anything unless you can point to their concrete effects.

I think your way of handling it is much, much better than how I've handled it. It comes across as less adversarial while still making the other person do the work of explaining themselves better. I've found that small tricks like this can completely flip a conversation from dysfunctional to effective. I'll have to remember to use your suggestion.

This post raises some reasonable-sounding and important-if-true hypotheses. There seems to be a vast open space of possible predictions, relevant observations, and alternative explanations. A lot of it has good treatment, but not on LW, as far as I know.

I would recommend this post as an introduction to some ideas and a starting point, but not as a good argument or a basis for any firm conclusions. I hope to see more content about this on LW in the future.

The content here is pretty awesome. I'm a little wary of including it in our review because it is, as the author notes, more of a general-audience thing, but it's both a lot of fun and makes important points.

Re-reading this for review was a weird roller-coaster. I had remembered (in 2018) my strong takeaway that aesthetics mattered to rationality, and that "Aesthetic Doublecrux" would be an important innovation.

But I forgot most of the second half of the article. And when I got to it, I had such a "woah" moment that I stopped writing this review, went to go rewrite my conclusion in "Propagating Facts into Aesthetics" and then forgot to finish the actual review. The part that really strikes me is her analysis of Scott:
 

Sometimes I can almost feel this happ...

This essay defines and clearly explains an important property of human moral intuitions: the divergence of possible extrapolations from the parts of state space we're used to thinking about. This property is a challenge in moral philosophy that has implications for AI alignment and for long-term or "extreme" thinking in effective altruism. Although I don't think it was especially novel to me personally, it is valuable to have a solid reference for explaining this concept.

I support the inclusion of this post in the Best-of-2018 Review.

It's a thorough explicit explanation of a core concept in group epistemology, allowing aspects of social reality to better add up to normality, and so it's extremely relevant to this community.

The consensus goals strongly need rethinking, imo. This is a clear and fairly simple start at such an effort. Challenging the basics matters.

Still seems too early to tell if this is right, but man is it a crux (explicit or implicit).

 

Terence Tao seems to have gotten some use out of the most recent LLMs.

I cannot claim to have used this tool very much since it was announced, but this sure seems like the way to go if one wanted to quickly get better at any kind of medium or long-term forecasting. I would also really love a review from someone who has actually used it a bunch, or a self-review by the people who built it. 

I am confused about whether the videos are real and exactly how much faster AIs could be run. But I think at the very least it's a promising direction to look for grokkable bounds on how advanced AI will go.

I feel kind of conflicted about this post overall, but I certainly find myself using the concept fairly frequently.

This post is not only a groundbreaking piece of research into the nature of LLMs but also a perfect meme. Janus's ideas are now widely cited at AI conferences and in papers around the world. Whether or not its assumptions are correct, the Simulators theory has sparked huge interest among a broad audience, not only AI researchers. Let's also appreciate the fact that this post was written based on the author's interactions with the non-RLHFed GPT-3 model, well before the release of ChatGPT or Bing, and it has accurately predicted some quirks in their behavi...

This post is really important because a lot of other material on LessWrong (notably AI to Zombies) really berates the idea of trying out things that haven't been tested via the Scientific Method.
This post explains that some conditions (especially health conditions) may fall completely outside the scope of what can be tested via the scientific method, and that at some point turning to chance is a good idea, reminding us that intuition may often be wrong but can work wonders when used as a last resort.
This is something to remember when trying to solve problems that don't seem to have one perfect mathematical solution (yet).

I liked this for the idea that fear of scarcity can drive "unreasonable" behaviors. This helps me better understand why others may behave in "undesirable" ways and provides a more productive way of addressing the problem than blaming them for being e.g. selfish. This also provides a more enjoyable way of changing my behaviors. Instead of being annoyed with myself for e.g. being too scared to talk to people, I look out for tiny accomplishments (e.g. speaking up when someone got my order wrong) and the benefits it brings (e.g. getting what I wanted to order), to show myself that I am capable. The more capable I feel, the less afraid I am of the world.

This post was critically important to the core task of solving alignment - or deciding we can't solve it in time and must take low-odds alternative strategies.

Letting Eliezer be the lone voice in the wilderness isn't a good idea. This post and others like it trying to capture his core points in a different voice are crucial.

After going back and forth between this post and the original LoL several times, I think Zvi has captured the core points very well.

I think this post did good work in its moment, but it doesn't have much lasting relevance, and I can't see why someone would revisit it at this point. It shouldn't be going into any timeless best-of lists.

Doesn't say anything particularly novel but presents it in a very clear and elegant manner which has value in and of itself.

The way I understood it, this post is thinking aloud while embarking on the scientific quest of searching for search algorithms in neural networks. It's a way to prepare the ground for doing the actual experiments. 

Imagine a researcher embarking on the quest of "searching for search". I highlight in italics the parts present in the post (if they are present at least a little):

- At some point, the researcher reads Risks From Learned Optimization.
- They complain: "OK, Hubinger, fine, but you haven't told me what search is anyway"
- They read or get invol...

This is a great reference for the importance and excitement in Interpretability.

I just read this for the first time today. I’m currently learning about Interpretability in hopes I can participate, and this post solidified my understanding of how Interpretability might help.

The whole field of Interpretability is a test of this post. Some of the theories of change won’t pan out. Hopefully many will. Perhaps more theories not listed will be discovered.

One idea I’m surprised wasn’t mentioned is the potential for Interpretability to supercharge all of the scien...

This post suggests a (and, quite possibly, the) way to select an outcome in bargaining (a fully deterministic multi-player game).

ROSE values replace Competition-Cooperation ones by having players not compete against each other in attempts to extract more utility, but instead average their payoffs over possible initiative orders. The post notes a probable social consequence: people wouldn't threaten everyone else in order to get something they want, but would rather maximize their own utility (improve their situation themselves).

ROSE values are resistant to unconditional threats...

I think this post highlights some of the difficulties in transmitting information between people - particularly the case of trying to transmit complex thoughts via short aphorisms.

I think the comments provided a wealth of feedback as to the various edge cases that can occur in such a transmission, but the broad strokes of the post remain accurate: understanding compressed wisdom can't really be done without the life experience that the wisdom tried to compress to begin with.

If I were to rewrite the post, I'd likely emphasize the takeaway that, when giving a...

Since simplicity is such a core heuristic of rationality, having a correct conceptualization of complexity is important.

The usual definition of complexity used on LW (and elsewhere) is Kolmogorov Complexity, which is the length of a thing's shortest code. Nate suggests a compelling idea: that another way to be simple is to have many different codes. For example, a state with five 3-bit codes is simpler than a state with one 2-bit code.
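As a rough illustration of the arithmetic in that example, here is a minimal sketch that assumes each code of length L contributes weight 2^-L (as in a Solomonoff-style prior) and scores a state by -log2 of its total weight. This scoring rule and the function name `effective_complexity` are illustrative assumptions, not necessarily the formalism Nate uses.

```python
import math


def effective_complexity(code_lengths):
    """Return -log2 of the total weight sum(2**-L) over all codes of a state.

    Assumption (illustrative only): each code of length L contributes
    weight 2**-L, so many longer codes can outweigh one short code.
    """
    total_weight = sum(2.0 ** -length for length in code_lengths)
    return -math.log2(total_weight)


# One 2-bit code: total weight 1/4, i.e. 2 bits.
one_short_code = effective_complexity([2])
# Five 3-bit codes: total weight 5/8, i.e. about 0.68 bits.
five_longer_codes = effective_complexity([3] * 5)

print(f"one 2-bit code:   {one_short_code:.3f} bits")
print(f"five 3-bit codes: {five_longer_codes:.3f} bits")
```

Under this assumed rule, the state with five 3-bit codes comes out simpler than the state with a single 2-bit code, matching the example above.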

Nate justifies that with a bunch of math, and some physics, which I don't understand well enough to comment on.

A bunch of people wh...

Probably not the most important thing ever, but this is really pleasing to look at, from the layout to the helpful pictures, which makes it an absolute joy to read.

Also pretty good at explaining Chinchilla scaling too I guess.

I tend to get into pointless internet arguments where both parties end up agreeing but in a bitter angry way somehow so I needed to hear this. 

This made me grok Löb's theorem enough to figure out how to exploit it for self-improvement purposes.

Motivated me to actually go out and do agent foundations so thumbs up!

I found this post extremely clear, thoughtful, and enlightening, and it’s one of my favorite lesswrong (cross)posts of 2022. I have gone back and reread it at least a couple times since it was first posted, and I cited it recently here.

This post demonstrates another surface of the important interplay between our "logical" (really just verbal) part-of-mind and our emotional part-of-mind. Other posts on this site, including by Kaj Sotala and Valentine, go into this interplay and how our rationality is affected by it.

It's important to note, both for ourselves and for our relationships with others, that the emotional part is not something that can be dismissed or fought with, and I think this post does well in explaining an important facet of that. Plus, when we're shown the possible pitfalls ahead of any limerence, we can be more aware of it when we do fall in love, which is always nice.

I really like this post because it directly clarified my position on ethics, namely making me abandon unbounded utilities. I want to give this post a Δ and +4 for doing that, and for being clearly written and fairly short.

FWIW, this is not a post that I ended up integrating into my thinking in the past 2 years. I had a conversation with Duncan once where he brought up the metaphor and it made sense in context, but it doesn't feel like a super natural category to me.

I think this post does two things well:

  • helps lower the internal barrier for what is "worth posting" on LW
  • helps communicate the epistemic/communication norms that define good rationalish writing

I think this post describes an important idea for political situations.

While online politics is a mind-killer, the post (mostly) manages to avoid "controversial" topics and stays on a meta-level. The examples show that in group decisions the main factor is not the truth of statements but the early focus of attention. This dynamic can be used for good or bad, but it feels like it really happens a lot, and it accurately describes an aspect of social reality.

Fails to make a clear point; talks about the ability to publish in the modern world, then brushes over cancel culture, immigration, and gender differences. Needs to make a stronger argument and back it up with evidence.

This post points at an interesting fact: some people, communities or organizations already called themselves "rationalists" before the current rationalist movement. It brings forth the idea that the rationalist movement may be anchored in a longer history than what might first seem from reading LessWrong/Overcoming Bias/Eliezer history.

However, this post reads more like a Wikipedia article, or a historical overview. It does not read like it has a goal. Is this post making some sort of argument that the current rationalist community is descended from those...

Our attention is one of the most valuable resources we have, and it is now through recent AI developments in NLP and machine vision that we are realizing that it might very well be a fundamental component of intelligence itself. 

This post brings this point to attention (pun intended) by using video games as examples, and encourages us to optimize the way we use this limited resource to maximize information gain and to improve our cooperation skills by avoiding being 'sound absorbers'. 

On one level, the post used a simple but emotionally and logically powerful argument to convince me that the creation of happy lives is good. 

On a higher level, I feel like I switch positions on population ethics every time I read something about it, so I am reluctant to predict that I will hold the post's position for long. I remain unsettled that the field of population ethics, which is central to long-term visions of what the future should look like, has so little solid knowledge. My thinking, and therefore my actions, will remain split among ...

The post expands on the intuition in the ML field that reinforcement learning doesn't always work and that getting it to work is a fiddly process.

In the final chapter, a DeepMind paper arguing that 'one weird trick' will work is demolished.

This post is short, but important. The fact that we regularly receive enormously improbable evidence is relevant for a wide variety of areas. It's an integral part of having accurate beliefs, and despite this being such a key idea, it's underappreciated generally (I've only seen this post referenced once, and it's never come up in conversation with other rationalists). 

Consider this a short review of the entire "The Most Important Century" sequence, not just the Introduction post.

This series was one of the first and most compelling writings I read when I was first starting to consider AI risk. It basically swayed me from thinking AI was a long way off and will likely have moderate impact among technologies, to thinking AI will likely be transformative and come in the next few decades.

After that I decided to become an AI alignment researcher, in part because of these posts. So the impact of these posts on me personally w...

I want to see Adam do a retrospective on his old goal-deconfusion stuff.

I am quite happy about this post. It appears to be easily digestible by readers, and it points to a very important problem.

Rationalists can be great, but if there's one thing we're vulnerable to, then it is probably to get carried away with theories about things that turn out to be detached from reality.

One type of work that I would like to see more of in the rationalist community is comprehensive empirical work, to help work against the problem described here. Of course, it's much easier to ask for that than to provide it. It might also be good to develop more rationalist theory about how to efficiently pick the relevant empirical things to work on.

Mixed feelings. I want to produce a rewrite of this incorporating all of the background information and inferences I have in the addendum, and wish I had been a good enough writer/thoonker to do so in the first place. But overall I'm happy with it and was glad others found it informative :)

Writing up your thoughts is useful. Both for communication and for clarification to oneself. Not writing for fear of poor epistemics is an easy failure mode to fall into, and this post clearly lays out how to write anyway. More writing equals more learning, sharing, and opportunities for coordination and cooperation. This directly addresses a key point of failure when it comes to groups of people being more rational. 

A great explanation of something I've felt, but not been able to articulate. Connecting the ideas of Stag-Hunt, Coordination problems, and simulacrum levels is a great insight that has paid dividends as an explanatory tool. 

It's a fine overview of modern language models. It highlights the idea that scaling improves all skills at the same time, which differs from human developmental psychology. Since publication, the 540B-parameter PaLM model seemed to show jumps on around 25% of BIG-bench tasks.

The post discusses the inadequacy of measuring average performance of LLMs, where a proportion of outputs is good and the rest is outright failure from a human point of view. Scale seems to help with the rate of success.

Paul creates a subproblem of alignment which is "alignment with low stakes." Basically, this problem has one relaxation from the full problem: we never have to care about single decisions, or, more formally, traps cannot happen within a small set of actions.

Another way to say it is we temporarily limit distributional shift to safe bounds.

I like this relaxation of the problem, because it gets at a realistic outcome we may be able to reach, and in particular it lets people work on it without much context.

However, the fact that inner alignment doesn't need to be solved may be a problem, depending on your beliefs about outer vs. inner alignment.

I'd give it a +3 in my opinion.

This is a great post that exemplifies what it is conveying quite well. I have found it very useful when talking with people and trying to understand why I am having trouble explaining or understanding something. 

This post introduces the concept of a "cheerful price" and (through examples and counterexamples) narrows it down to a precise notion that's useful for negotiating payment. Concretely:

  1. Having "cheerful price" in your conceptual toolkit means you know you can look for the number at which you are cheerful (as opposed to "the lowest number I can get by on", "the highest number I think they'll go for", or other common strategies). If you genuinely want to ask for an amount that makes you cheerful and no more, knowing that such a number might exist at all is u...

I'm not sure how I feel about this post. 

Here are three different things I took it to mean:

  1. There are two different algorithms you might want to follow. One is "uphold a specific standard that you care about meeting". The other is "Avoiding making people upset (more generally)." The first algorithm is bounded, the second algorithm is unbounded, and requires you to model other people.
  2. You might call the first algorithm "Uphold honor" and the second algorithm "Manage PR concerns", and using those names is probably a better intuition-guide.
  3. The "Avoiding ma...

The Moral Mazes sequence prompted a lot of interesting hypotheses about why large parts of the world seem anti-rational (in various senses). I think this post is the most crisp summary of the entire model. 

I agree with lionhearted that it'd be really nice if somehow the massive list of things could be distilled into chunks that are easier to conceptualize, but sympathize that reality doesn't always lend itself to such simplification. I'd be interested in Zvi taking another stab at it though.

One good idea to take out of this is that other people's ability to articulate their reasons for their belief can be weak—weak enough that it can distract from the strength of evidence for the actual belief. (More people can catch a ball than explain why it follows the arc that it does).

This idea seems obviously correct, all the responses to objections seem correct, and the chance of this happening any time soon is about epsilon. 

In some sense I wish the reasons it will never happen were less obvious than they are, so it would be a better example of our inability to do things that are obviously correct. 

The question is, how much does this add to the collection. Do we want to use a slot on practical good ideas that we could totally do if we could do things, and used to do? I'm not sure. 

Kaj_sotala's book summary provided me with something I hadn't seen before: a non-mysterious answer to the question of consciousness. And I say this as someone who took graduate-level courses in neuroscience (albeit a few years before the book was published). Briefly, the book defines consciousness as the ability to access and communicate sensory signals, and shows that this correlates highly with those signals being shared over a cortical Global Neuronal Workspace (GNW). It further correlates with access to working memory. The review also gives a great ac...

Author of the post here. I edited the post by:

(1) adding an introduction — for context, and to make the example in Part I less abrupt

(2) editing the last section — the original version was centered on my conversations with Rationalists in 2011-2014; I changed it to be a more general discussion, so as to broaden the post's applicability and make the post more accessible


I've read a lot of books in the self-help/therapy/psychology cluster, but this is the first which gives a clear and plausible model of why the mental structure they're all working with (IFS exiles, EMDR unprocessed memories, trauma) has enough fitness-enhancing value to evolve despite the obvious costs.

It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review. 

In an earlier review, johnswentworth argues:

I think instrumental convergence provides a strong argument that...we can use trade-offs with those resources in order to work out implied preferences over everything else, at least for the sorts of "agents" we actually care about (i.e. agents which have significant impact on the world).

I think this...

A Question post!

I think I want to write up a summary of the 2009 Nobel Prize book I own on commons governance. This post had me update to think it's more topically relevant than I realized.

The LW review could use more question posts, if the goal is to solidify something like a canon of articles to build on. A question invites responses. I am disappointed in the existing answers, which appear less thought through than the question. Good curation, good nomination.

I like that this responds to a conflict between two of Eliezer's posts that are far apart in time. That seems like a strong indicator that it's actually building on something.

Either "just say the truth", or "just say whatever you feel you're expected to say" are both likely better strategies.

I find this believable but not obvious. For example, if the pressure on you is that you'll be executed for saying the truth, saying nothing is probably better than saying the truth. If the pressure on you is remembering being bullied on tumblr, and you're being asked if you...

(Epistemic status: I don’t have much background in this. Not particularly confident, and attempting to avoid making statements that don’t seem strongly supported.)

I found this post interesting and useful, because it brought a clear unexpected result to the fore, and proposed a potential model that seems not incongruent with reality. On a meta-level, I think supporting these types of posts is quite good, especially because this one has a clear distinction between the “hard thing to explain” and the “potential explanation,” which seems very important to allo...

I’m pretty impressed by this post overall, not necessarily because of the object-level arguments (though those are good as well), but because I think it’s emblematic of a very good epistemic habit that is unfortunately rare. The debate between Hanson and Zvi over this, as habryka noted, is an excellent example of how to do good object-level debate that reveals details of shared models over text. I suspect that this is the best post to canonize to reward that, but I’m not convinced of this. On the meta-level, the one major improvement/further work I’d lik...

I enjoyed this post. It brings a more worldwide view to LW (sorely missed in some things I've read here) and makes the important point that we don't all think the same. Experiences can be very different, and so are our reactions and reasoning, which come with their own logic. We should not ignore the human element of how the world works.

I would suggest a bit of an edit to move the description of the game with punishment to after the non-punishment results just for ease of reading and absorption.

-- -

I also enjoyed reading the supporting material h...

I think this post is a short yet concrete description of what I and others find controlling and mind-warping about the present-day schooling system. It's one of the best short posts that I can link to on this subject. +4

I disagree with the conclusion of this post, but still found it a valuable reference for a bunch of arguments I do think are important to model in the space.

I summarized my thoughts on this sequence in my other review on the next post in this sequence. Most of my thoughts there also apply to this post.

I tested it on a bunch of different things, important and unimportant. It did exactly what it said it did; substantially better predictions, immediate results, costs either little or no effort (aside from the initial cringe away from using this technique at all). It just works.

Looking at it like training/practicing a martial art is helpful as well.

Rich, mostly succinct pedagogy of timeless essentials, highly recommended reference post.

The prose could be a little tighter and less self-conscious in some places. (Things like "I won't go through all of that in this post. There are several online resources that do a good job of explaining it" break the flow of mathematical awe and don't need to be said.)

"Exfohazard" is a quicker way to say "information that should not be leaked". AI capabilities have progressed through seemingly trivial breakthroughs, and now we have shorter timelines.

The more people who know and understand the "exfohazard" concept, the safer we are from AI risk.

More framings help the clarity of the discussion. If someone doesn't understand (or agree with) classic AI-takeover scenarios, this is one of the posts I'd use to explain them.

This is an important post which I think deserves inclusion in the best-of compilation because, despite its usefulness and widespread agreement about that fact, it seems not to have been highlighted well to the community.

After I read this, I started avoiding reading about others' takes on alignment so I could develop my own opinions.

I think this post was a good exercise to clarify my internal model of what I expect the world to look like with strong AI. Obviously, most of the very specific predictions I make are too precise (which was clear at the time of writing) and won't play out exactly like that, but the underlying trends still seem plausible to me. For example, I expect some major misuse of powerful AI systems, rampant automation of labor that will displace many people and rob them of a sense of meaning, AI taking over the digital world years before taking over the physical world ...

I still stand behind most of the disagreements that I presented in this post. There was one prediction that would make timelines longer because I thought compute hardware progress was slower than Moore's law. I now mostly think this argument is wrong because it relies on FP32 precision. However, lower precision formats and tensor cores are the norm in ML, and if you take them into account, compute hardware improvements are faster than Moore's law. We wrote a piece with Epoch on this: https://epochai.org/blog/trends-in-machine-learning-hardware

If anything, ...

A nice write-up of something that actually matters, specifically the fact that so many people rely on a general factor of doom to answer questions.

While I think there is reason to expect correlation between these factors, and a very weak form of the General Factor of Doom is plausible, I overall like the point that people probably use a General Factor of Doom too much, and the post offers some explanations of why.

Overall, I'd give this a +1 or +4. A reasonably useful post that talks about a small thing pretty competently.

I read this post when it came out and felt like it was an important (perhaps even monumental) piece of intellectual progress on the topic of rationality. In hindsight, it didn't affect me much and I've only thought back to it a few times, mainly shortly after reading it.

Man, I haven't had time to thoroughly review this, but given that it's an in-depth review of another post up for review, it seems sad not to include it.

This piece took an important topic that I hadn't realized I was confused/muddled about, convinced me I was confused/muddled about it, while simultaneously providing a good framework for thinking about it. I feel like I have a clearer sense of how Worst Case Thinking applies in alignment.

I also appreciated a lot of the comments here that explore the topic in more detail.

I think this exchange between Paul Christiano (author) and Wei Dai (commenter) is pretty important food for thought, for anyone interested in achieving a good future in the long run, and for anyone interested in how morality and society evolve more generally.

Rereading this 2 years later, I'm still legit-unsure about how much it matters. I still think coordination capacity is one of the most important things for a society or for an organization. Coordination Capital is one of my few viable contenders for a resource that might solve x-risk.

The questions here, IMO, are:

  • Is coordination capacity a major bottleneck?
  • Are novel coordination schemes an important way to reduce that bottleneck, or just a shiny distraction? (i.e. maybe there's just a bunch of obvious wisdom we should be following, and if we just did
... (read more)

This one was fun to play with and it was nice to feel like I was helping.

"Anyone who resists? Why, I'll simply mulch them," said Tyranicca. Many, many people resisted, and Tyrannica prepared her mulching machine.

Her workers did the rest. 0.15%

I think I've known about happy/cheerful prices for a long time (since before this post), and yet I find myself using the concept only once or twice a year, and not in a particularly important way.

This was despite it seeming like a very valuable concept.

I think this is likely because people's happy prices can be quite high (too high to be acceptable) and yet it's worth it to still trade at less than this.

What I do think is valuable, and what this post teaches (even if unintentionally), is that you don't have to be magically tied to the "market price" or "fair price"; you can just negotiate for what you want.

This post has stayed with me as a canonical example of how to effect political change. It is shockingly different from many standard narratives about joining the borg and working your way up it for scraps of power; it is detailed, and it is a true account. I am very grateful to have read this post, and I give it +9.

This is an outstanding blog post that would make a weird book chapter. I remember those posts from 2015 and 2016, and this is the cold postmodernist water on those modernist flames. Though on the flip side, one of the best parts about all of the advice on how not to do spaced repetition in the classroom is that it's valuable advice on how to do spaced repetition in the classroom.

This review of A Pattern Language is dense, and one can see the influence the book had on the poster, as it had on me. I think it is worth promoting the work of Christopher Alexander, who had such a strong influence on many fields: not just architecture, but also software engineering.

(Self-review.) My comment that we could stop wearing masks was glib; I didn't foresee the Delta, Omicron, etc. waves. But I think the general point stands.

I think that the work we did on the question of finite or infinite value settles an important practical question about whether, in the real world, we need to think about infinite value. While there are remaining objections, I think it is clear that the possibility of infinite value is conditional on specific factual and improbable claims, and because of the conditionals involved, this has minimal to no impact on decision-making more generally, since most choices do not involve those infinities - and so finite decision theories should suffice, and attempts ... (read more)

I cite the ideas in this piece (especially "you're trying to coordinate unilaterally and that's gonna fail") a lot. I do think Raemon's thoughts have clarified since then and ideally he'd do a substantial edit, but I'm glad the thoughts got written down at all.

Good and important, but long. I'd like to see a short summary in the book.

My story of this post:

  • Someone provided a model of the world that relied on specific beliefs about how container shipping and ports worked.
  • The beliefs about the gears of container shipping were wrong, in ways that meant the example couldn't support (or weaken) the larger claim.
  • This post spelled out details on container shipping so we could understand why the claim was nonsensical.

It was also generally interesting and grounded in a way I'd like to see more of on LW.

Full disclosure: I offered the writer money for fleshing out his original comment, although at the moment I can't remember if he actually took me up on it. 

This post gives us a glimpse at the exploitable inefficiencies of prediction markets. Whether prediction markets become mainstream or not is yet to be seen, but even today there are ways for sharp people to make money arbitraging and reading the fine print of the market rules carefully.

A year after publishing this essay, I still think this is an important and useful idea, and I think back to it whenever I try to analyze or predict the behavior of leaders and the organizations they lead.

Unfortunately, I didn't end up writing any of the follow-up posts I said I wanted to write, like the one reviewing the evidence for the theory, which I think would have made this post a lot stronger. (If you want to help with these posts, send me a message, though I might only have time to work on them in February.)

I wrote to Bruce Bueno de Mesquita, one of th... (read more)

Edit to shorten (more focus on arguments, less rhetoric), and include the top comment by jbash as a response / second part. The topic is important, but the article seems to have its bottom line already written.

Important topic. Needs some editing. At the very least, do not name Geoff, and possibly no one specific (unless the book editors want to expose themselves to a possible lawsuit). Also, links to Twitter and Facebook posts will not work on paper.

Perhaps there is a solution for both: quote the relevant parts of the Twitter and Facebook posts in the article, with names removed.

A fascinating example of how natural categories can defy our naive expectations.

Unless you are a biologist, would you ever consider a category that contains beans, peas, lentils, peanuts,... and a 30-meter-tall tree? And yet from a certain perspective these are like peas in a pod.

What else is like this?

One thing I've learned since then: I now think this is wrong:

To my (limited) understanding, this does not produce a significantly different immune response than injecting the antigen directly.

My understanding now (which is still quite limited) is that there is an improved immune response. If I have it right, the reason is that in a traditional vaccine, the antigen only exists in the bloodstream; with an mRNA vaccine, the antigen originates inside the cell—which more closely mimics how an actual virus works.

There is a tendency among nerds and technology enthusiasts to always adopt the latest technology, to assume newer is necessarily better. On the other hand, some people argue that as time passes, some aspects of technology are in fact getting worse. Jonathan Blow argues this point extensively on the topic of software development and programming (here is a 1-hour talk on the topic). The Qt anecdote in this post is an excellent example of the thing he'd complain about.

Anyway, in this context, I read this post as recognizing that when you replace wired devices... (read more)

I still really like this post, and rereading it I'm surprised how well it captures points I'm still trying to push because I see a lot of people out there not quite getting them, especially by mixing up models and reality in creative ways. I had not yet written much about the problem of the criterion at this time, for example, yet it carries all the threads I continue to think are important today. Still recommend reading this post and endorse what it says.

It strikes me that this post looks like (AFAICT?) a stepping stone towards the Eliciting Latent Knowledge research agenda, which currently has a lot of support/traction. That makes this post fairly historically important.

I've highly voted this post for a few reasons. 

First, this post contains a bunch of other individual ideas I've found quite helpful for orienting. Some examples:

  • Useful thoughts on which term definitions have "staying power," and are worth coordinating around.
  • The zero/single/multi alignment framework.
  • The details on how to anticipate, legitimize, and fulfill governance demands.

But my primary reason was learning Critch's views on what research fields are promising, and how they fit into his worldview. I'm not sure if I agree with Critch, but I think "Figur... (read more)

The Four Children of the Seder as the Simulacra Levels is an interpretation of a classic Jewish reading through the lens of simulacra levels. It makes an awful lot of sense to me, helps me understand them better, and also engages the simulacra levels with the perspective of "how should a society deal with these sorts of people/strategies". I feel like I got some wisdom from that, but I'm not sure how to describe it. Anyway, I give this post a +4.

I think "Simulacra Levels and their Interactions" is the best post on Simulacra levels, and this is the second p... (read more)

Radical Probabilism is an extension of the Embedded Agency philosophical position. I remember reading it and feeling a strong sense that I really got to see a well pinned-down argument using that philosophy. Radical Probabilism might be a +9; I will have to re-read it, but for now I give it +4.

(This review is taken from my post Ben Pace's Controversial Picks for the 2020 Review.)

Covid-19: My Current Model was where I got most of my practical Covid updates. It's so obvious now, but risk follows a power law (i.e. I should focus on reducing my riskiest 1 or 2 activities), surfaces are mostly harmless (this was when I stopped washing my packages), outdoor activity is relatively harmless (my housemates and I stopped avoiding people on the street around this time), and more. I give this +4.

(This review is taken from my post Ben Pace's Controversial Picks for the 2020 Review.)
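The power-law point is easy to check with made-up numbers. Below is a minimal Python sketch; the activity names and risk values are hypothetical, invented only to be heavy-tailed, and are not taken from the post. It computes what fraction of total risk comes from the k riskiest activities.

```python
# Hypothetical weekly risk contributions (arbitrary units). These numbers are
# illustrative assumptions chosen to be heavy-tailed, not data from the post.
risks = {
    "indoor party": 50.0,
    "restaurant dinner": 20.0,
    "grocery run": 2.0,
    "outdoor walk": 0.5,
    "handling packages": 0.1,
}


def share_of_top_k(risks: dict[str, float], k: int) -> float:
    """Fraction of total risk contributed by the k riskiest activities."""
    ordered = sorted(risks.values(), reverse=True)
    return sum(ordered[:k]) / sum(ordered)


print(f"top-2 share of total risk: {share_of_top_k(risks, 2):.1%}")
```

With a distribution this lopsided, dropping the top two activities removes almost all of the risk, while giving up the low-risk ones (like washing packages) barely moves the total.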

I like this, in the sense that it's provoking fascinating thoughts and makes me want to talk with the author about it further. As a communication of a particular concept? I'm kinda having a hard time following what the intent is.

Initial reaction: I like this post a lot. It's short, to the point. It has examples relating its concept to several different areas of life: relationships, business, politics, fashion. It demonstrates a fucky dynamic that in hindsight obviously game-theoretically exists, and gives me an "oh shit" reaction.

Meditating a bit on an itch I had: what this post doesn't tell me is how common this dynamic is, or how to detect when it's happening.

While writing this review: hm, is this dynamic meaningfully different from the idea of a costly signal?

Thinking about the ex... (read more)

Author here. One thing I think I've done wrong in the post is to equate black-box-search-in-large-parametrized-space with all of machine learning. I've now added this paragraph at the end of chapter 1:

Admittedly, the inner alignment model is not maximally general. In this post, we've looked at black box search, where we have a parametrized model and do SGD to update the parameters. This describes most of what Machine Learning is up to in 2020, but it does not describe what the field did pre-2000 and, in the event of a paradigm shift similar to the deep l

... (read more)

On the one hand this is an interesting and useful piece of data on AI scaling and the progress of algorithms. It's also important because it makes the point that the very notion of "progress of algorithms" implies hardware overhang as important as >10 years of Moore's law. I also enjoyed the follow-up work that this spawned in 2021.
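To make "as important as >10 years of Moore's law" concrete, here is a small Python sketch that converts between years of Moore's law and a hardware speedup factor. It assumes a fixed 2-year doubling time, a common rule of thumb rather than a figure from the post; change `doubling_time_years` to try other assumptions.

```python
import math


def moores_law_factor(years: float, doubling_time_years: float = 2.0) -> float:
    """Hardware speedup implied by `years` of Moore's law at a fixed doubling time."""
    return 2.0 ** (years / doubling_time_years)


def equivalent_years(speedup: float, doubling_time_years: float = 2.0) -> float:
    """Years of Moore's law needed to match a given speedup factor."""
    return doubling_time_years * math.log2(speedup)


print(moores_law_factor(10))   # 32.0
print(equivalent_years(32.0))  # 10.0
```

Under that assumption, 10 years of Moore's law is a 2^5 = 32x speedup, so an algorithmic gain worth more than 32x of hardware corresponds to the ">10 years" in the review.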

Crucial. I definitely remember reading this and thinking it was one of the most valuable posts I'd seen all year. Good logical structure.

But it's hard to read? It has a jarring, erratic rhetorical flow: succinct where elaboration is predictably needed, and verbose where it is redundant. A mathematician's scratch notes, I think.

I agree it would be good to add a note about push polling, but it's also good to note that the absence of information is itself a choice! The most spare possible survey is not necessarily the most informative. The question of what is a neutral framing is a tricky one, and a question about the future that deliberately does not draw attention to responsibilities is not necessarily less push-poll-y than one that does.

Agreeing that just the final paragraph would be a good idea to include; otherwise, I don't think this passes my bar for "worth including as best-of."

Given all of the discussion around simulacra, I would be disappointed if this post wasn't updated in light of this.

S-curves are a concept that I use frequently.

I would love to see a more concise version of this.

This is an excellent post; my only question is how accurately it translates Buddhism, which is not something I'm qualified to have a strong opinion on. Nonetheless, it matches my limited understanding of meditation.

Of the two progress studies up for review, I think this is better than the invention of concrete one. Mostly because it dips more into how the development of fertilizer interacted with other domains (notably: war), as well as some politics/history.

This part was actually most interesting to me, which you may have missed if you started reading and then decided "meh, I don't care how artificial fertilizer was invented."

The Alchemy of Air is as much about the lives of Haber and Bosch, and what happened after their process became a reality, as it is about the s

... (read more)

While the sort of Zettelkasten-adjacent notes that I do in Roam have really helped how I do research, I'd say No to this article. The literal Zettelkasten method is adapted to a world without hypertext, which is why I describe [what everyone does in Roam] as Zettelkasten-adjacent instead of Zettelkasten proper.

This is not to knock this post, it's a good overview of the literal Zettelkasten method. But I don't think it should be included.

These are good lists of open problems, although, as Ben notes, they are bad lists if they are meant to be all the open problems. I don't think that is the fault of the post, and it's easy enough to make clear the lists are not meant to be complete.

This seems like a spot where a good list of open problems is a good idea, but here we're mostly going to be taking a few comments. I think that's still a reasonable use of space, but not exciting enough to think of this as important.

I'm all for such things existing and a book entirely composed of such things seems like it should exist, but I don't know what it would be doing in this particular book. 

The combination of the two previous reviews, by hamnox and fiddler, seems to sum it up: It's a pure happy infodump that doesn't add much, that gets a lot of upvotes, and that says more about the voting system than about what is valuable.

I don't think this post introduced its ideas to many people, including Raemon and Ben who nominated it. Nor does it seem like it provides a superior frame with which to examine those issues. Not recommended.

Zvi wrote two whole posts on perfect/imperfect competition and how more competition can be bad. However, this is the only post that has really stuck with me in teaching me how increased competition can be worse overall for the system, and it helped me appreciate Moloch in more detail. I expect to vote for this post around +4 or +5.

As with one or two others by Zvi, I think it's a touch longer than it needs to be, and can be made more concise.

This is a core piece of a mental toolkit, being able to quantify life choices like this, and the post explains it well. I think I would like a version in the book to spend a bit more space helping the reader do the calculation that you do in the Clearer Thinking tool. A lot of the value of the post is in showing how to use the number to make decisions.

I think it's a valuable post, and I expect to vote for it somewhere in the range of +2 to +4.

I'm probably going to write a second review that is more accessible. But, first: I made a couple vague promises here:

  • I said I would try to think of examples of what it would look like, if "someone I trusted, who looked like they had a deep model, in fact was just highly motivated." (I said I'd think about it in advance so that if I learned new facts about someone I trusted, I wouldn't confabulate excuses for them)
  • I said I would think more about my factual cruxes for "propagating the level of fear/disgust/concern that Benquo/Jessica/Zack had, into my own ae
... (read more)

I continue to think this post is important, for basically the same reasons as I did when I curated it. I think for many conversations, having the affordance and vocabulary to talk about frames makes the difference between them going well and them going poorly.

So, I think this post is pretty bad as a 'comprehensive' list of the open problems, or as 'the rationality agenda'. All of the top answers (Wei, Scott, Brienne, Thrasymachus) add something valuable, but I'd be pretty unhappy if this was considered the canonical answer to "what is the research agenda of LW", or our best attempt at answering that question (I think we can do a lot better). I think it doesn't address many things I care about. Here are a few examples:

  • What are the best exercises for improving your rationality? Fermi estimates, Thinking Physics, Ca
... (read more)

On initially reading it, I found it quite interesting, but over time it's come to shape my thinking much more than I expected.

Robin has correctly pointed out that blackmail is just a special case of free trade between apparently consenting adults, which tends to be pretty good, and you need quite a strong argument for making the law interfere with that. He also points out that it creates good incentives not to do things that you wouldn't want people finding out about.

However Zvi's point is that this is an incredibly strong incentive for someone to ruin you... (read more)

Alas, I haven't made it through this post. I do not understand what I have made of it, and nor does anyone else I know (except maybe Jacob Falkovich). I do wish there had been real conversation around this post, and I think there's some probability (~30%) that I will look back and deeply regret not engaging with it much more, but in my current epistemic state I can only vote against its inclusion in the book. Somewhere around -1 to -4.

I've made these comments previously, but for purposes of having at least one official review:

  1. I think the names aren't optimal. Zombie days or Slug days or some-such seem like an improvement on "recovery days." I think it's also possible that Rest day might be less ambiguous if it were called a "Restorative day" or something.
  2. I think the post doesn't quite call enough attention to "this is specifically about listening to your gut. Your gut is a specific part of your body. Listening to it is a skill you might not have. Listening to it is a useful source o
... (read more)

A good explanation of the difference between intellectual exploration and promoting people. You don't need to agree with everything someone says, and you don't even need to like them, but if they occasionally provide good insight, they are worth taking into account. If you propagate this strategy, you may even get to a "wisdom of the crowds" scenario - you'll have many voices to integrate in your own thinking, potentially getting you farther along than if you just had one thought leader you liked.

Having many smart people you don't necessarily agree with, l... (read more)

I don't have much to say in a review that I didn't already say in my nomination. But a key point of this post is "the math checks out in a way that thoroughly dissolves a confusion," and I'd kinda like it if someone else did a more thorough review confirming that the math actually checks out.

Update: Made these changes

I originally wrote this post because I saw quite a few of what I perceived as mistakes in the reasoning of rationalists around predicting trends and innovation.

  • People confusing s-curves with exponential growth.
  • People confusing evolution and diffusion curves, and assuming they were the same thing.
  • People making basic mistakes about how technologies would likely evolve, because they didn't understand historical evolutionary patterns.

At the time, I thought that simply making a post explaining the models they were missing would create a ... (read more)
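A minimal Python sketch of the first mistake in the list above, confusing s-curves with exponential growth. It uses a logistic function as the s-curve (one common model among several) with arbitrary parameters: ceiling `k`, growth rate `r`, midpoint `t0`. Well before the midpoint the two curves are almost identical, so early data cannot tell them apart; they only diverge as the s-curve approaches its midpoint and ceiling.

```python
import math


def logistic(t: float, k: float = 100.0, r: float = 1.0, t0: float = 10.0) -> float:
    """S-curve: grows roughly exponentially at first, then saturates at k."""
    return k / (1.0 + math.exp(-r * (t - t0)))


def exponential(t: float, k: float = 100.0, r: float = 1.0, t0: float = 10.0) -> float:
    """The exponential that matches the logistic curve's early behaviour."""
    return k * math.exp(r * (t - t0))


for t in [0, 4, 8, 10, 12, 16]:
    s, e = logistic(t), exponential(t)
    # The ratio s / e equals 1 / (1 + exp(r * (t - t0))): near 1 early, 0.5 at t0.
    print(f"t={t:>2}  s-curve={s:10.3f}  exponential={e:12.3f}  ratio={s / e:.3f}")
```

Extrapolating the early points of this curve as an exponential would overshoot the ceiling by orders of magnitude only a few time units past the midpoint.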

Okay, whenever I read this post, I don't get it.

There's some fermi-estimation happening, but the fermi is obviously wrong. As Benquo points out, certain religions have EVERYONE read their book, memorize it, chant it, discuss it every Sunday (or Saturday).

I feel like the post is saying "there are lots of bandwidth problems. the solution to all of them is '5'." and I don't get why 5.

So I read Ray's comment on Daniel Filan's review, where he says:

...at some maximum scale, your coordination-complexity is bottlenecked on a single working-memory-cluster, which (

... (read more)

This reminds me of That Alien Message, but as a parable about mesa-alignment rather than outer alignment. It reads well, and helps make the concepts more salient. Recommended.

I made some prediction questions for this, and as of January 9th, there interestingly seems to be some disagreement with the author on these. 

Would definitely be curious for some discussion between Matthew and some of the people with low-ish predictions. Or perhaps for Matthew to clarify the argument made on these points, and see if that changes people's minds.

I took some liberties in operationalising what seemed to me a core thesis underlying the post. Let me know if you think it doesn't really capture the important stuff!

(You can find a list of all review poll questions here.)

Broken image link! Broken image link! Sad.

Using "evolution" when referring to things other than breeding populations, BOO!

Important concept. Very light on backing evidence and references. I wanna hear more about Systems and Networks angles. Could do with fewer examples of Innovation.

I tend to try to do things that I think are in my comparative advantage. This post hammered home the point that comparative advantage exists along multiple dimensions. For example, as a pseudo-student, I have almost no accumulated career capital, so I risk less by doing projects that might not pan out (under the assumption that career capital gets less useful over time). This fact can be combined with other properties I have to more precisely determine comparative advantage.

This post also gives the useful intuition that being good at multiple things exponentially cuts down the number of people you're competing with. I use this heuristic a reasonable amount when trying to decide the best projects to be working on.
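A back-of-the-envelope Python sketch of the "exponentially fewer competitors" heuristic. It assumes, as a simplification not stated in the post, that each skill is held independently by the same fraction of a hypothetical population; with n skills the pool shrinks by that fraction n times over. Real skills tend to be positively correlated, in which case the true number of competitors is larger than this estimate.

```python
import math


def competitors(population: int, skill_fractions: list[float]) -> float:
    """Expected number of people who are good at *all* the listed skills.

    Assumes skills are independent; positive correlation between skills
    would make the real number larger than this.
    """
    return population * math.prod(skill_fractions)


population = 1_000_000  # hypothetical pool of people in your field
for n_skills in range(1, 4):
    # Each skill is assumed to be held by the top 10% of the pool.
    print(n_skills, round(competitors(population, [0.1] * n_skills)))
```

Each extra skill at the 10% level divides the pool by ten, which is where the exponential shrinkage comes from.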

Include bendini's post with it.

But it shows all the free energy in the world. Good nod to Inadequate Equilibria.

[Rambly notes while voting.] This post has some merit, but it feels too...jumpy, and, as the initial comments point out, it's unclear what's being considered "explicit" vs "implicit" communication. Only on getting to the comments did I realize that the author's sense of those words was not quite my own.

I'm also not sure it's either 1) telling the whole picture or 2) correct. A couple of examples are given, but examples are easy to cherry-pick. The fact that the case brought with Bruce Lee seemed to be in favor of a non-compassionate feels maybe, maybe l

... (read more)

Interesting list, but seems to have a triumphalist bias. I doubt that "50K years ago, nobody could imagine changing the world" is true, and I suspect that "hunter-gathering cultures have actually found locally-optimal ways of life, and were generally happier and healthier than most premodern post-agricultural people" was a much bigger factor than most of these.

I would weakly support this post's inclusion in the Best-of-2018 Review. It's a solid exposition of an important topic, though not a topic that is core to this community.

I would not include this in the Best-of-2018 Review.

While it's good and well-researched, it's more or less a footnote to the Slate Star Codex post linked above. (I think there's an argument for back-porting old SSC posts to LW with Scott's consent, and if that were done I'd have nominated several of those.)

That's certainly an interesting position in discussion about what people want!

Namely, that actions and preferences are just conditionally activated, and those context activations are balanced against each other. That means a person's preference system may be not only incomplete but architecturally incoherent, and moral systems and goals obtained via reflection are almost certainly not total (they will be missing in some contexts), which creates problems for RLHF.

The first assumption, that some neurons are basically randomly initialized, can't be tested really well becau... (read more)

The post explains the difference between two similar-looking positions:

  1. Model gets reward, and attempts to maximize it by selecting appropriate actions.
  2. Model does some actions on the environment, and selection/mutation process calculates the reward to find out how to maximize it by modifying the model.

Those positions differ in the level at which optimization pressure is applied. Either of them can actually be implemented (that is, the "reward" value can be given to the model), but common RL uses the second option.
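A toy Python sketch of position 2. It assumes a hypothetical task (the best action is 3 times the observation) and uses simple mutate-and-select hill climbing in place of a gradient-based RL algorithm, so it illustrates the structure rather than any particular RL method. The point is that `policy` takes no reward argument; only the outer training loop computes reward, and it uses it to decide which parameter values survive.

```python
import random


def policy(theta: float, observation: float) -> float:
    """The 'model': maps an observation to an action. It never sees reward."""
    return theta * observation


def reward(observation: float, action: float) -> float:
    """Environment-side reward for a hypothetical task: best when action == 3 * observation."""
    return -(action - 3.0 * observation) ** 2


def train(steps: int = 2000, seed: int = 0) -> float:
    """Selection/mutation loop: the outer process uses reward to pick parameters."""
    rng = random.Random(seed)
    observations = [rng.uniform(-1.0, 1.0) for _ in range(32)]

    def score(theta: float) -> float:
        return sum(reward(o, policy(theta, o)) for o in observations)

    theta = 0.0
    for _ in range(steps):
        candidate = theta + rng.gauss(0.0, 0.1)  # mutate the model's parameter
        if score(candidate) >= score(theta):     # select using reward
            theta = candidate
    return theta


print(round(train(), 2))
```

The trained `theta` ends up near 3, i.e. reward shaped the model, but at no point did the model receive reward or try to maximize it.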

I don't have much to say, I just use the word "exfohazard" a lot. Would like a term for things that are not infohazardous in the way the basilisk is alleged, but close enough to the latter to cause distress to unprepared readers.

An early paper that Anthropic then built on to produce their recent exciting results. I found the author's insight and detailed parameter tuning advice helpful.

The author seems aware of all the issues with why RL could be dangerous, but wants to use it anyway, rather than looking for alternatives: frankly it feels confused to me. I found it unhelpful.

This short post is astounding because it succinctly describes, and prescribes, how to pay attention, to become grounded when a smart and sensitive human could end up engulfed in doom. The post is insightful and helpful to any of us in search of clarity and coping.

The references to research that clarify and counter assertions made in a previous piece on sleep allow for useful knowledge sharing. And the examples of the effects of sleep deprivation are mostly hilarious!

Simply by the author stating and exploring examples of 'greyed out options' one is reminded of possible choices, some of which may benefit the reader. Feeling stuck, fettered, having little control, direction, without meaning or purpose, struggling, or subject to ennui, or varying degrees of stress or anxiety, might be helped by considering physical/psychological, actionable changes in behaviour. Trying new stuff/things/ways of being, may be at the edges of one's thought or comfort zone; the reader is gently reminded to look. This writing gives me good pause; it encourages the act of reflecting on personal possibilities, and subsequent impetus to pursue something novel, (thereby challenging) which may be life enhancing.

So, most people see sleep as something that's obviously beneficial, but this post was great at sparking conversation about this topic, and questioning that assumption about whether sleep is good. It's well-researched and addresses many of the pro-sleep studies and points regarding the issue. 

I'd like to see people do more studies on the effects of low sleep on other diseases or activities. There are many good objections in the comments, such as increased risk of Alzheimer's, driving while sleepy and how the analogy of sleep deprivation to fasting may b... (read more)

[This comment is no longer endorsed by its author]

I just gave this a re-read; I forgot what a trip it is to read the thoughts of Eliezer Yudkowsky. It continues to be some of my favorite stuff written on LessWrong in recent years.

It's hard to relate to the world with the level of mastery over basic ideas that Eliezer has. I don't mean by this to vouch that his perspective is certainly correct, but I believe it is at least possible, and so I think he aspires to a knowledge of reality that I rarely if ever aspire to. Reading it inspires me to really think about how the world works, and really figure out what I know and what I don't. +9

(And the smart people dialoguing with him here are good sports for keeping up their side of the argument.)

This post got me to do something like exposure therapy to myself in 10+ situations, which felt like the "obvious" thing to do in those situations. This is a huge amount of life-change-per-post.

This concept helped me clarify my thinking around questions like "how much should I charge for doing X" and "how much should I pay to resolve inconvenience Y", in ways that have resulted in substantially changed behavior today.

I found this post valuable for putting a meme "into the water" which, on the margin, I expect made people's risk assessments more accurate.

While there are probably first-mover disadvantages for highly legible institutions attempting to follow this advice, it seems much more possible for individuals, which makes this an example of an inadequate equilibrium.

The problem under consideration is very important for some possible futures of humanity.

However, the author's eudaimonic wishlist is self-admittedly geared toward fiction production, and doesn't seem to be very enforceable.

Simulated intelligence is real intelligence. Although probably nobody is going to simulate the person/figure properly, the typical LW reader will actually occasionally outperform the actual person/figure.

Not sure to what extent shoulder advisors outperform deliberate thinking, especially compared to other CFAR handbook techniques.

I spent a few months in late 2021/early 2022 learning about various alignment research directions and trying to evaluate them. Quintin's thoughtful comparison between interpretability and 1960s neuroscience in this post convinced me of the strong potential for interpretability research more than I think anything else I encountered at that time.

Update: Since I think this series of posts should be considered together, I've just written a new review of the entire "The Most Important Century" series. So I'm effectively retracting the review I wrote of the All Possible Views About Humanity's Future Are Wild post below, which is now redundant, and in which I think I also confused some of the content between this post and This Can't Go On (another post in the same series).

I found this post early in my reading on AI risk and EA topics. It persuaded me about how the world as we know it is completely... (read more)

Project Lawful or the CFAR handbook might be better options than HPMOR.

I really enjoyed this. Taking the time to lay this out feels more useful than just reading about it in a textbook lecture. The same way doing a math or code problem makes it stick in my head more. One of the biggest takeaways for me was realizing that it was possible to break economic principles down this far in a concrete way that felt graspable. I think this is a good demonstration of that kind of work. 

Clearly articulating the extra costs involved is valuable. I have seen the time tradeoff before, but I hadn't thought through the other costs that I, as a human, also go through.

More than a year since writing this post, I would still say it represents the key ideas in the sequence on mesa-optimisation which remain central in today's conversations on mesa-optimisation. I still largely stand by what I wrote, and recommend this post as a complement to that sequence for two reasons:

First, skipping some detail allows it to focus on the important points, making it better-suited than the full sequence for obtaining an overview of the area. 

Second, unlike the sequence, it deemphasises the mechanism of optimisation, and explicitly cas... (read more)

I think this post significantly benefits in popularity, and lacks in rigor and epistemic value, from being written in English. The assumptions that the post makes in some part of the post contradict the judgements reached in others, and the entire post, in my eyes, does not support its conclusion. I have two main issues with the post, neither of which involve the title or the concept, which I find excellent:

First, the concrete examples presented in the article point towards a different definition of optimal takeover than the one eventually reached. All of the p...

Very clear and simple. It's tempting to dismiss this as not significant or novel, but there is a place for presenting basic things well.

And it's positively framed. We could all use a little hope right now.

There was noise in my model of what AI safety research is supposed to do, and I had learned to ignore it. It surprises me how big a difference this post makes, and how comparatively calm and settled I feel having the typical success narratives in front of me, disambiguated from each other. There's much more confusion to tackle, but it seems more manageable.

The next time I stumble...

I would like to see a post on this concept included in the best of 2018, but I also agree that there are reputational risks given the author. I'd like to suggest a possible compromise: perhaps we could include the concept, but write our own explanation of it instead of including this article?

This post made me understand something I did not understand before that seems very important; important enough that it made me reconsider a bunch of related beliefs about AI.

My actual honest reaction to this sort of thing: please, please stop. This kind of thinking actively drives me and many others I know away from LW/EA/Rationality. I see it as asking the wrong questions with the wrong moral frameworks, and using them to justify abominable conclusions and priorities, and ultimately the betrayal of humanity itself. Even if people who talk like this don't write the last line of their arguments, it's not like the rest of us don't notice it. I don't have any idea what to say to someone who writes '...

This review is not very charitable, because I think the meaning of the post is different from how the author presents it.

The things it describes at the beginning are clearly true, with plea-bargaining being institutionalized lying, but this is, in the end, a poorly written plea not to say so. It's not that I don't see a point. It would be nice to be able to simply describe things as they are without it becoming a fight, but that isn't how things would work out. Words like 'lie' and 'fraud' have their extremely negative connotations because they are extremely negativ...