Reviews (All Years)

Things To Take Away From The Essay

First and foremost: Yudkowsky makes absolutely no mention whatsoever of the VNM utility theorem. This is neither an oversight nor a simplification. The VNM utility theorem is not the primary coherence theorem. It's debatable whether it should be considered a coherence theorem at all.

Far and away the most common mistake when arguing about coherence (at least among a technically-educated audience) is for people who've only heard of VNM to think they know what the debate is about. Looking at the top-voted comments on this ess... (read more)

I didn't like this post. At the time, I didn't engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.

The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it... (read more)

In my opinion, the biggest shift in the study of rationality since the Sequences were published was a change in focus from "bad math" biases (anchoring, availability, base rate neglect, etc.) to socially-driven biases. And with good reason: while a crash course in Bayes' Law can alleviate many of the issues with intuitive math, group politics are a deep and inextricable part of everything our brains do.

There has been a lot of great writing describing the issue like Scott’s essays on ingroups and outgroups and Robin Hanson’s the... (read more)

I think this post might be the best one of all the MIRI dialogues. I also feel confused about how to relate to the MIRI dialogues overall.

A lot of the MIRI dialogues consist of Eliezer and Nate saying things that seem really important and obvious to me, and a lot of my love for them comes from a feeling of "this actually makes a bunch of the important arguments for why the problem is hard". But the nature of the argument is kind of closed off. 

Like, I agree with these arguments, but like, if you believe these arguments, having traction on AI Alignment... (read more)

This review is mostly going to talk about what I think the post does wrong and how to fix it, because the post itself does a good job explaining what it does right. But before we get to that, it's worth saying up-front what the post does well: the post proposes a basically-correct notion of "power" for purposes of instrumental convergence, and then uses it to prove that instrumental convergence is in fact highly probable under a wide range of conditions. On that basis alone, it is an excellent post.

I see two (related) central problems, from which various o... (read more)

This is an unusually difficult post to review. In an ideal world, we'd like to be able to review things as they are, without reference to who the author is. In many settings, reviews are done anonymously (with the author's name stricken off), for just this reason. This post puts that to the test: the author is a pariah. And ordinarily I would say, that's irrelevant, we can just read the post and evaluate it on its own merits.

Other comments have mentioned that there could be PR concerns, i.e., that making the author's existence and participation on LessWrong

... (read more)

In this essay, ricraz argues that we shouldn't expect a clean mathematical theory of rationality and intelligence to exist. I have debated em about this, and I continue to endorse more or less everything I said in that debate. Here I want to restate some of my (critical) position by building it from the ground up, instead of responding to ricraz point by point.

When should we expect a domain to be "clean" or "messy"? Let's look at everything we know about science. The "cleanest" domains are mathematics and fundamental physics. There, we have crisply defined

... (read more)

I think that strictly speaking this post (or at least the main thrust) is true, and proven in the first section. The title is arguably less true: I think of 'coherence arguments' as including things like 'it's not possible for you to agree to give me a limitless number of dollars in return for nothing', which does imply some degree of 'goal-direction'.

I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term. In my mind, it provokes various follo

... (read more)

1. Manioc poisoning in Africa vs. indigenous Amazonian cultures: a biological explanation?

Note that while Joseph Henrich, the author of TSOOS, correctly points out that cassava poisoning remains a serious public health concern in Africa, he doesn't supply any evidence that it wasn't also a public health issue in Amazonia. One author notes that "none of the disorders which have been associated with high cassava diets in Africa have been found in Tukanoans or other indigenous groups on cassava-based diets in Amazonia."

Is this because Tukanoans have superior p... (read more)

What does this post add to the conversation?

Two pictures of elephant seals.

How did this post affect you, your thinking, and your actions?

I am, if not deeply, then certainly affected by this post. I felt some kind of joy looking at these animals. It calmed my anger and made my thoughts somewhat happier. I started to believe the world can become a better place, and I would like to make it happen. This post made me a better person.

Does it make accurate claims? Does it carve reality at the joints? How do you know?

The title says elephant seals 2 and c... (read more)

[this is a review by the author]

I think what this post was doing was pretty important (colliding two quite different perspectives). In general there is a thing where there is a "clueless / naive" perspective and a "loser / sociopath / zero-sum / predatory" perspective that usually hides itself from the clueless perspective (with some assistance from the clueless perspective; consider the "see no evil, hear no evil, speak no evil" mindset, a strategy for staying naive). And there are lots of difficulties in trying to establish communication. And the dial

... (read more)

Tl;dr: I don’t think this post stands up to close scrutiny, although there may be unknown knowns anyway. This is partly due to a couple of things in the original paper which I think are a bit misleading for the purposes of analysing the markets.

The unknown knowns claim is based on 3 patterns in the data:

“The mean prediction market belief of replication is 63.4%, the survey mean was 60.6% and the final result was 61.9%. That’s impressive all around.”

“Every study that would replicate traded at a higher probability of suc... (read more)

This post is a review of Paul Christiano's argument that the Solomonoff prior is malign, along with a discussion of several counterarguments and countercounterarguments. As such, I think it is a valuable resource for researchers who want to learn about the problem. I will not attempt to distill the contents: the post is already a distillation, and does a fairly good job of it.

Instead, I will focus on what I believe is the post's main weakness/oversight. Specifically, the author seems to think the Solomonoff prior is, in some way, a distorted model of rea... (read more)

I wrote up a longer, conceptual review. But I also did a brief data collection, which I'll post here as others might like to build on or go through a similar exercise. 

In 2019 YC released a list of their top 100 portfolio companies ranked by valuation and exit size, where applicable.

So I went through the top 50 companies on this list, and gave each company a ranking ranging from -2 for "Very approval-extracting" to 2 for "Very production-oriented".  

To decide on that number, I asked myself questions like "Would growth of this company seem cancero... (read more)

Looking back, I have quite different thoughts on this essay (and the comments) than I did when it was published. Or at least much more legible explanations; the seeds of these thoughts have been around for a while.

On The Essay

The basketballism analogy remains excellent. Yet searching the comments, I'm surprised that nobody ever mentioned the Fosbury Flop or the Three-Year Swim Club. In sports, from time to time somebody comes along with some crazy new technique and shatters all the records.

Comparing rationality practice to sports practice, rationality has ... (read more)

Here are my thoughts.

  1. Being honest is hard, and there are many difficult and surprising edge-cases, including things like context failures, negotiating with powerful institutions, politicised narratives, and compute limitations.
  2. On top of the rule of trying very hard to be honest, Eliezer's post offers an additional general rule for navigating the edge cases. The rule is that when you’re having a general conversation about the sorts of situations in which you would and wouldn’t lie, you must be absolutely honest. You can explicitly not answer certain questions if
... (read more)

I don't know if I'll ever get to a full editing of this. I'll jot notes here of how I would edit it as I reread this.

  • I'd ax the whole opening section.
    • That was me trying to (a) brute force motivation for the reader and (b) navigate some social tension I was feeling around what it means to be able to make a claim here. In particular I was annoyed with Oli and wanted to sidestep discussion of the lemons problem. My focus was actually on making something in culture salient by offering a fake framework. The thing speaks for itself once you l
... (read more)

This post provides a valuable reframing of a common question in futurology: "here's an effect I'm interested in -- what sorts of things could cause it?"

That style of reasoning ends by postulating causes.  But causes have a life of their own: they don't just cause the one effect you're interested in, through the one causal pathway you were thinking about.  They do all kinds of things.

In the case of AI and compute, it's common to ask

  • Here's a hypothetical AI technology.  How much compute would it require?

But once we have an answer to this quest... (read more)

In this essay Paul Christiano proposes a definition of "AI alignment" which is more narrow than other definitions that are often employed. Specifically, Paul suggests defining alignment in terms of the motivation of the agent (which should be helping the user), rather than what the agent actually does. That is, as long as the agent "means well", it is aligned, even if errors in its assumptions about the user's preferences or about the world at large lead it to actions that are bad for the user.

Rohin Shah's comment on the essay (which I believe is endorsed

... (read more)

I do not like this post. I think it gets most of its rhetorical oomph from speaking in a very moralizing tone, with effectively no data, and presenting everything in the worst light possible; I also think many of its claims are flat-out false. Let's go through each point in order.

1. You can excuse anything by appealing to The Incentives

No, seriously—anything. Once you start crying that The System is Broken in order to excuse your actions (or inactions), you can absolve yourself of responsibility for all kinds of behaviors that, on paper, should raise red f

... (read more)

In this essay, Rohin sets out to debunk what ey perceive as a prevalent but erroneous idea in the AI alignment community, namely: "VNM and similar theorems imply goal-directed behavior". This is placed in the context of Rohin's thesis that solving AI alignment is best achieved by designing AI which is not goal-directed. The main argument is: "coherence arguments" imply expected utility maximization, but expected utility maximization does not imply goal-directed behavior. Instead, it is a vacuous constraint, since any agent policy can be regarded as maximiz

... (read more)
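
To spell out the vacuity construction (my own gloss; this is the standard form of the argument, not a quote from the essay): for any policy π whatsoever, define a utility function U over histories by U(h) = 1 if every action in h matches what π prescribes, and U(h) = 0 otherwise. Then π maximizes expected U. So "behaves like an expected utility maximizer" by itself rules nothing out; the constraint only bites once you restrict what the utility function is allowed to depend on.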

As far as I can tell, this post successfully communicates a cluster of claims relating to "Looking, insight meditation, and enlightenment". It's written in a quite readable style that uses a minimum of metaphorical language or Buddhist jargon. That being said, likely due to its focus as exposition and not persuasion, it contains and relies on several claims that are not supported in the text, such as:

  • Many forms of meditation successfully train cognitive defusion.
  • Meditation trains the ability to have true insights into the mental causes of mental process
... (read more)

(I reviewed this in a top-level post: Review of 'But exactly how complex and fragile?'.)

I've thought about (concepts related to) the fragility of value quite a bit over the last year, and so I returned to Katja Grace's But exactly how complex and fragile? with renewed appreciation (I'd previously left only a very brief comment, a microcosm of this review). I'm glad that Katja wrote this post and I'm glad that everyone commented. I often see private Google docs full of nuanced discussion which will never see the light of day, and that makes me sad, and I'm happy ... (read more)

I strongly oppose collation of this post, despite thinking that it is an extremely well-written summary of an interesting argument on an interesting topic. I do so because I believe it represents a substantial epistemic hazard, owing to the way it was written and the source material it comes from. I think this is particularly harmful because both justifications for nominations amount to "this post was key in allowing percolation of a new thesis unaligned with the goals of the community into community knowledge," which is a justificatio... (read more)

I have several problems with including this in the 2018 review. The first is that it's community-navel-gaze-y - if it's not the kind of thing we allow on the frontpage because of concerns about newcomers seeing a bunch of in-group discussion, then it seems like we definitely wouldn't want it to be in a semi-public-facing book, either. 

The second is that I've found most discussion of the concept of 'status' in rationalist circles to be pretty uniformly unproductive, and maybe even counterproductive. People generally only discuss 'status' when they

... (read more)

Connection to Alignment

One of the main arguments in AI risk goes something like:

  • AI is likely to be a utility maximizer (or goal-directed in some other sense)
  • Goodhart, instrumental convergence, etc make powerful goal-directed agents dangerous by default

One common answer to this is "ok, how about we make AI which isn't goal-directed?"

Unconscious Economics says: selection effects will often create the same effect as goal-directedness, even if we're trying to build a non-goal-directed AI.

Discussions around CAIS are one obvious application. Paul's "you get what... (read more)

A year later, I continue to agree with this post; I still think its primary argument is sound and important. I'm somewhat sad that I still think it is important; I thought this was an obvious-once-pointed-out point, but I do not think the community actually believes it yet.

I particularly agree with this sentence of Daniel's review:

I think the post is important, because it constrains the types of valid arguments that can be given for 'freaking out about goal-directedness', for lack of a better term.

"Constraining the types of valid arguments" is exactly the... (read more)

I read this post for the first time in 2022, and I came back to it at least twice. 

What I found helpful

  • The proposed solution: I actually do come back to the “honor” frame sometimes. I have little Rob Bensinger and Anna Salamon shoulder models that remind me to act with integrity and honor. And these shoulder models are especially helpful when I’m noticing (unhelpful) concerns about social status.
  • A crisp and community-endorsed statement of the problem: It was nice to be like “oh yeah, this thing I’m experiencing is that thing that Anna Salamon calls PR
... (read more)

In “Why Read The Classics?”, Italo Calvino proposes many different definitions of a classic work of literature, including this one:

A classic is a book which has never exhausted all it has to say to its readers.

For me, this captures what makes this sequence and corresponding paper a classic in the AI Alignment literature: it keeps on giving, readthrough after readthrough. That doesn’t mean I agree with everything in it, or that I don’t think it could have been improved in terms of structure. But when pushed to reread it, I found again and again that I had m... (read more)

There are two separate lenses through which I view the idea of competitive markets as backpropagation.

First, it's an example of the real meat of economics. Many people - including economists - think of economics as studying human markets and exchange. But the theory of economics is, to a large extent, general theory of distributed optimization. When we understand on a gut level that "price = derivative", and markets are just implementing backprop, it makes a lot more sense that things like markets would show up in other fields - e.g. AI or b... (read more)
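
To make "price = derivative" concrete, here is a toy sketch of my own (the linear demand curves and the tatonnement update rule are my assumptions for illustration, not from the post). Two firms share a fixed resource; each demands it up to the point where its marginal value equals the price, and the price adjusts along excess demand:

    # Toy illustration (mine, not from the post): tatonnement price
    # adjustment in a competitive market for a fixed resource.

    SUPPLY = 6.0

    def demand_a(price):
        # firm A's marginal value for quantity q is 10 - 2q,
        # so A demands the q solving 10 - 2q = price
        return max(0.0, (10.0 - price) / 2.0)

    def demand_b(price):
        # firm B's marginal value is 8 - q
        return max(0.0, 8.0 - price)

    price = 0.0
    for _ in range(2000):
        excess = demand_a(price) + demand_b(price) - SUPPLY
        price += 0.01 * excess  # price moves along excess demand

    # At the fixed point (~4.667 here), both firms' marginal values
    # equal the price: the "derivative" of value is equalized across
    # users, the same stationarity condition gradient descent enforces
    # on a shared parameter.
    print(round(price, 3), round(demand_a(price), 3), round(demand_b(price), 3))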

Zack's series of posts in late 2020/early 2021 were really important to me. They were a sort of return to form for LessWrong, focusing on the valuable parts.

What are the parts of The Sequences which are still valuable? Mainly, the parts that build on top of Korzybski's General Semantics and focus hard core on map-territory distinctions. This part is timeless and a large part of the value that you could get by (re)reading The Sequences today. Yudkowsky's credulity about results from the social sciences and his mind projection fallacying his own mental quirk... (read more)

I think this post, as promised in the epistemic status, errs on the side of simplistic poetry. I see its core contribution as saying that the more people you want to communicate to, the less you can communicate to them, because the marginal people aren't willing to put in work to understand you, and because it's harder to talk to marginal people who are far away and can't ask clarifying questions or see your facial expressions or hear your tone of voice. The numbers attached (e.g. 'five' and 'thousands of people') seem to not be super precise.

That being sa... (read more)

I've been pleasantly surprised by how much this resource has caught on in terms of people using it and referring to it (definitely more than I expected when I made it). There were 30 examples on the list when it was posted in April 2018, and 20 new examples have been contributed through the form since then. I think the list has several properties that contributed to wide adoption: it's fun, standardized, up-to-date, comprehensive, and collaborative.

Some of the appeal is that it's fun to read about AI cheating at tasks in unexpected ways (I... (read more)

I <3 Specificity

For years, I've been aware of myself "activating my specificity powers" multiple times per day, but it's kind of a lonely power to have. "I'm going to swivel my brain around and ride it in the general→specific direction. Care to join me?" is not something you can say in most group settings. It's hard to explain to people that I'm not just asking them to be specific right now, in this one context. I wish I could make them see that specificity is just this massively under-appreciated cross-domain power. That's why I wanted this sequence to... (read more)

It was interesting to re-read this article 2 years later.  It reminds me that I am generally working with a unique subset of the population, which is not fully representative of human psychology.  That being said, I believe this article is misleading in important ways, which should be clarified.  The article focused too much on class, and it is hard to see it as anything but classist. While I wrote an addendum at the end, this really should have been incorporated into the entire article and not tacked on, as the conclusions one would re... (read more)

ETA 1/12: This review is critical and at times harsh, not because I want to harshly criticize the post or the author, but because I did not consider harshness of criticism when writing. I still think the post is positive-net-value, and might even vote it up in the review. I especially want to emphasize that I do not think it is in any way useful to blame or punish the author for the things I complain about below; this is intended as a "pointing out a problematic habit which a lot of people have and society often encourages" criticism, not a "bad thing must... (read more)

Selection vs Control is a distinction I always point to when discussing optimization. Yet this is not the two takes on optimization I generally use. My favored ones are internal optimization (which is basically search/selection), and external optimization (optimizing systems from Alex Flint’s The ground of optimization). So I do without control, or at least without Abram’s exact definition of control.

Why? Simply because the internal structure vs behavior distinction mentioned in this post seems more important than the actual definitions (which seem constra... (read more)

A brief authorial take - I think this post has aged well, although as with Caring Less (https://www.lesswrong.com/posts/dPLSxceMtnQN2mCxL/caring-less), this was an abstract piece and I didn't make any particular claims here.

I'm so glad that A) this was popular, and B) I wasn't making up a new word for a concept that most people already know by a different name, which I think will send you to at least the first layer of Discourse Hell on its own.

I've met at least one person in the community who said they knew and thought about this post a lot, well before they'd

... (read more)

In this post, the author proposes a semiformal definition of the concept of "optimization". This is potentially valuable since "optimization" is a word often used in discussions about AI risk, and much confusion can follow from sloppy use of the term or from different people understanding it differently. While the definition given here is a useful perspective, I have some reservations about the claims made about its relevance and applications.

The key paragraph, which summarizes the definition itself, is the following:

An optimizing system is a system that

... (read more)

This post is the best overview of the field so far that I know of. I appreciate how it frames things in terms of outer/inner alignment and training/performance competitiveness--it's very useful to have a framework with which to evaluate proposals and this is a pretty good framework I think.

Since it was written, this post has been my go-to reference both for getting other people up to speed on what the current AI alignment strategies look like (even though this post isn't exhaustive). Also, I've referred back to it myself several times. I learned a lot from... (read more)

I think this post should be included in the best posts of 2018 collection. It does an excellent job of balancing several desirable qualities: it is very well written, being both clear and entertaining; it is informative and thorough; it is in the style of argument which is preferred on LessWrong, by which I mean it makes use of both theory and intuition in the explanation.

This post adds to the greater conversation by displaying rationality of the kind we are pursuing directed at a big societal problem. A specific example of what I mean that distinguishes this... (read more)

I thought I'd add a few quick notes as the author.

As I reread this, a few things jump out for me:

  • I enjoy its writing style. Its clarity is probably part of why it was nominated.
  • I'd now say this post is making a couple of distinct claims:
    • External forces can shape what we want to do. (I.e., there are lotuses.)
    • It's possible to notice this in real time. (I.e., you can notice the taste of lotuses.)
    • It's good to do so. Otherwise we find our wanting aligned with others' goals regardless of how they relate to our own.
    • If you notice this, you
... (read more)

I really like this post. I think it points out an important problem with intuitive credit-assignment algorithms which people often use. The incentive toward inaction is a real problem which is often encountered in practice. While I was somewhat aware of the problem before, this post explains it well.

I also think this post is wrong, in a significant way: asymmetric justice is not always a problem and is sometimes exactly what you want. In particular, it's how you want a justice system (in the sense of police, judges, etc) to work.

The book Law's Order explai... (read more)

“Phase change in 1960’s” - the first claim is that California’s prison pop went from 5k to 25k. According to Wikipedia this does seem to happen… but then it’s immediately followed by a drop in prison population between 1970 and 1980. It also looks like the growth is pretty stable starting in the 1940s.

According to this, prison pop in California was a bit higher than 5k historically, 6k-8k, and started growing in 1945 by about 1k/year fairly consistently until 1963. It was then fairly steady, even dropping a bit, until 1982 when it REALLY exploded, more than doubling... (read more)

I just re-read this sequence. Babble has definitely made its way into my core vocabulary. I think of "improving both the Babble and Prune of LessWrong" as being central to my current goals, and I think this post was counterfactually relevant for that. Originally I had planned to vote weakly in favor of this post, but am currently positioning it more at the upper-mid-range of my votes.

I think it's somewhat unfortunate that the Review focused only on posts, as opposed to sequences as a whole. I just re-read this sequence, and I think the posts More Babble, P

... (read more)

[Disclaimer: I'm reading this post for the first time now, as of 1/11/2020. I also already have a broad understanding of the importance of AI safety. While I am skeptical about MIRI's approach to things, I am also a fan of MIRI. Where this puts me relative to the target demographic of this post, I cannot say.]

Overall Summary

I think this post is pretty good. It's a solid and well-written introduction to some of the intuitions behind AI alignment and the fundamental research that MIRI does. At the same time, the use of analogy made the post m... (read more)

This is my post.

How my thinking has changed

I've spent much of the last year thinking about the pedagogical mistakes I made here, and am writing the Reframing Impact sequence to fix them. While this post recorded my 2018-thinking on impact measurement, I don't think it communicated the key insights well. Of course, I'm glad it seems to have nonetheless proven useful and exciting to some people!

If I were to update this post, it would probably turn into a rehash of Reframing Impact. Instead, I'll just briefly state the argument as I would present it today.

... (read more)
  • Oh man, what an interesting time to be writing this review!
  • I've now written second drafts of an entire sequence that more or less begins with an abridged (or re-written?) version of "Catching the Spark". The provisional title of the sequence is "Nuts and Bolts Of Naturalism".  (I'm still at least a month and probably more from beginning to publish the sequence, though.) This is the post in the sequence that's given me the most trouble; I've spent a lot of the past week trying to figure out where I stand with it.
  • I think if I just had to answer "yes" or
... (read more)

The work linked in this post was IMO the most important work done on understanding neural networks at the time it came out, and it has also significantly changed the way I think about optimization more generally.

That said, there's a lot of "noise" in the linked papers; it takes some digging to see the key ideas and the data backing them up, and there's a lot of space spent on things which IMO just aren't that interesting at all. So, I'll summarize the things which I consider central.

When optimizing an overparameterized system, there are many many different... (read more)
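
As a minimal illustration of the overparameterized setup (my own sketch, not from the linked papers): with more parameters than data points, many very different parameter settings all achieve exactly zero training loss.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2, 5))  # 2 data points, 5 parameters
    y = rng.normal(size=2)

    w1 = np.linalg.pinv(X) @ y            # the minimum-norm zero-loss solution
    null_dirs = np.linalg.svd(X)[2][2:]   # directions the data cannot see
    w2 = w1 + 3.0 * null_dirs[0]          # another exact zero-loss solution

    print(np.allclose(X @ w1, y), np.allclose(X @ w2, y))  # True True
    print(np.linalg.norm(w1 - w2))                         # 3.0: far apart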

This post is making a valid point (the time to intervene to prevent an outcome that would otherwise occur, is going to be before the outcome actually occurs), but I'm annoyed with the mind projection fallacy by which this post seems to treat "point of no return" as a feature of the territory, rather than your planning algorithm's map.

(And, incidentally, I wish this dumb robot cult still had a culture that cared about appreciating cognitive algorithms as the common interest of many causes, such that people would find it more natural to write a post about "p... (read more)

The referenced study on group selection on insects is "Group selection among laboratory populations of Tribolium," from 1976. Studies on Slack claims that "They hoped the insects would evolve to naturally limit their family size in order to keep their subpopulation alive. Instead, the insects became cannibals: they ate other insects’ children so they could have more of their own without the total population going up." 

This makes it sound like cannibalism was the only population-limiting behavior the beetles evolved. According to the original study, ho... (read more)

What's the type signature of goals?

The type signature of goals is the overarching topic to which this post contributes. It can manifest in a lot of different ways in specific applications:

  • What's the type signature of human values?
  • What structure types should systems biologists or microscope AI researchers look for in supposedly-goal-oriented biological or ML systems?
  • Will AI be "goal-oriented", and what would be the type signature of its "goal"?

If we want to "align AI with human values", build ML interpretability tools, etc, then that's going to be pretty to... (read more)

This post states the problem of gradient hacking. It is valuable in that this problem is far from obvious, and if plausible, very dangerous. On the other hand, the presentation doesn’t go into enough details, and so leaves gradient hacking open to attacks and confusion. Thus instead of just reviewing this post, I would like to clarify certain points, while interweaving my criticisms of the way gradient hacking was initially stated, and explaining why I consider this problem so important.

(Caveat: I’m not claiming that any of my objections are unknown to E... (read more)

The only way to get information from a query is to be willing to (actually) accept different answers. Otherwise, conservation of expected evidence kicks in. This is the best encapsulation of this point, by far, that I know about, in terms of helping me/others quickly/deeply grok it. Seems essential.
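
For reference, the identity doing the work here is conservation of expected evidence, i.e. the law of total probability applied to the posterior (standard probability, not specific to the post):

    P(H) = P(E)·P(H|E) + P(¬E)·P(H|¬E)

The prior is a weighted average of the posteriors, so if no possible answer to a query would lower your credence, then no answer can raise it either.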

Reading this again, the thing I notice most is that I generally think of this point as being mostly about situations like the third one, but most of the post's examples are instead about internal epistemic situations, where someone can't confidently conclude or ... (read more)

Author here. I still endorse the post and have continued to find it pretty central to how I think about myself and nearby ecosystems.

I just submitted some major edits to the post. Changes include:

1. Name change ("Robust, Coherent Agent")

After much hemming and hawing and arguing, I changed the name from "Being a Robust Agent" to "Being a Robust, Coherent Agent." I'm not sure if this was the right call.

It was hard to pin down exactly one "quality" that the post was aiming at. Coherence was the single word that pointed towards "what sort of agent to become." ... (read more)

Uncharitable Summary

Most likely there’s something in the intuitions which got lost when transmitted to me via reading this text, but the mathematics itself seems pretty tautological to me (nevertheless I found it interesting since tautologies can have interesting structure! The proof itself was not trivial to me!). 

Here is my uncharitable summary:

Assume you have a Markov chain M_0 → M_1 → M_2 → … → M_n → … of variables in the universe. Assume you know M_n and want to predict M_0. The Telephone theorem says two things:

  • You don’t need to keep a
... (read more)
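
For readers who want the formal skeleton (my paraphrase of the setup; only the inequality below is standard information theory, and the final sentence is my loose gloss of the theorem):

    M_0 → M_1 → … → M_n → …   (Markov chain)
    I(M_0; M_{n+1}) ≤ I(M_0; M_n)   (data processing inequality)

Information about M_0 can only decay along the chain, and the Telephone theorem characterizes what survives arbitrarily far: roughly, only quantities that are conserved (in expectation) from each step to the next.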

I continue to believe that the Grabby Aliens model rests on an extremely sketchy foundation, namely the anthropic assumption “humanity is randomly-selected out of all intelligent civilizations in the past present and future”.

For one thing, given that the Grabby Aliens model does not weight civilizations by their populations, it follows that, if the Grabby Aliens model is right, then all the “popular” anthropic priors like SIA and SSA and UDASSA and so on are all wrong, IIUC.

For another (related) thing, in order to believe the Grabby Aliens model, we need t... (read more)

A short note to start the review: the author isn’t happy with how this post is communicated. I agree it could be clearer, and this is the reason I’m scoring it 4 instead of 9. The actual content seems very useful to me.

AllAmericanBreakfast has already reviewed this from a theoretical point of view, but I wanted to look at it from a practical standpoint.

***

To test whether the conclusions of this post were true in practice I decided to take 5 examples from the Wikipedia page on the Prisoner’s dilemma and see if they were better modeled by Stag Hunt or Schelling... (read more)

In this post, the author presents a case for replacing expected utility theory with some other structure which has no explicit utility function, but only quantities that correspond to conditional expectations of utility.

To provide motivation, the author starts from what he calls the "reductive utility view", which is the thesis he sets out to overthrow. He then identifies two problems with the view.

The first problem is about the ontology in which preferences are defined. In the reductive utility view, the domain of the utility function is the set of possib... (read more)

This essay provides some fascinating case studies and insights about coordination problems and their solutions, from a book by Elinor Ostrom. Coordination problems are a major theme in LessWrongian thinking (for good reasons) and the essay is a valuable addition to the discussion. I especially liked the 8 features of sustainable governance systems (although I wish we got a little more explanation for "nested enterprises").

However, I think that the dichotomy between "absolutism (bad)" and "organically grown institutions (good)" that the essay creates needs

... (read more)

I was surprised that this post ever seemed surprising, which either means it wasn't revolutionary, or was *very* revolutionary. Since it has 229 karma, seems like it was the latter. I feel like the same post today would have been written with more explicit references to reinforcement learning, reward, addiction, and dopamine. The overall thesis seems to be that you can get a felt sense for these things, which would be surprising - isn't it the same kind of reward-seeking all the way down, including on things that are genuinely valuable? Not sure how to model this.

I think about this post a lot, and sometimes in conjunction with my own post on common knowledge.

As well as being a referent when I think about fairness, it ties in with how I think about LessWrong, Arbital, and communal online endeavours for truth. The key line is:

For civilization to hold together, we need to make coordinated steps away from Nash equilibria in lockstep.

You can think of Wikipedia as being a set of communally editable web pages where the content of the page is constrained to be that which we can easily gain common knowledge of its

... (read more)

The core of this post seems to be this

  • Decoupling norms: It is considered eminently reasonable to require your claims to be considered in isolation - free of any context or potential implications. An insistence on raising these issues despite a decoupling request is often seen as sloppy thinking or an attempt to deflect.
  • Contextualising norms: It is considered eminently reasonable to expect certain contextual factors or implications to be addressed. Not addressing these factors is often seen as sloppy or even an intentional evasion.

As Zack_M_Davis points out ... (read more)

I still think this post is correct in spirit, and was part of my journey towards good understanding of neuroscience, and promising ideas in AGI alignment / safety.

But there are a bunch of little things that I got wrong or explained poorly. Shall I list them?

First, my "neocortex vs subcortex" division eventually developed into "learning subsystem vs steering subsystem", with the latter being mostly just the hypothalamus and brainstem, and the former being everything else, particularly the whole telencephalon and cerebellum. The main difference is that the "... (read more)

This post is an excellent distillation of a cluster of past work on malignness of Solomonoff Induction, which has become a foundational argument/model for inner agency and malign models more generally.

I've long thought that the malignness argument overlooks some major counterarguments, but I never got around to writing them up. Now that this post is up for the 2020 review, seems like a good time to walk through them.

In the Solomonoff Model, Sufficiently Large Data Rules Out Malignness

There is a major outside-view reason to expect that the Solomonoff-is-malign ar... (read more)

I think Luna Lovegood and the Chamber of Secrets would deserve to get into the Less Wrong Review if all we cared about were its merits. However, the Less Wrong Review is used to determine which posts get into a book that is sold on paper for money. I think this story should be disqualified from the Less Wrong Review on the grounds that Harry Potter fanfiction must remain non-commercial, especially in the strict sense of traditional print publishing.

The discussion around "It's Not the Incentives, It's You" was pretty gnarly. I think at the time there were some concrete, simple mistakes I was making. I also think there were 4-6 major cruxes of disagreement between me and some other LessWrongers. The 2019 Review seemed like a good time to take stock of that.

I've spent around 12 hours talking with a couple people who thought I was mistaken and/or harmful last time, and then 5-10 hours writing this up. And I don't feel anywhere near done, but I'm reaching the end of the timebox so here goes.

Core Claims

I think th... (read more)

The parent-child model is my cornerstone of healthy emotional processing. I'd like to add that a child often doesn't need much more than your attention. This is one analogy for why meditation works: you just sit down for a while and you just listen.

The monks in my local monastery often quip about "sitting in a cave for 30 years", which is their suggested treatment for someone who is particularly deluded. This implies a model of emotional processing which I cannot stress enough: you can only get in the way. Take all distractions away from someone and t... (read more)

Quick authorial review: This post has brought me the greatest joy from other sources referring to it, including Marginal Revolution (https://marginalrevolution.com/marginalrevolution/2018/10/funnel-human-experience.html) and the New York Times bestseller "The Uninhabitable Earth". I was kind of hoping to supply a fact about the world that people could use in many different lights, and they have (see those and also like https://unherd.com/2018/10/why-are-woke-liberals-such-enemies-of-the-past/ )

An unintentional takeaway from this attention is solidifying my

... (read more)

One of the founders of Circling Europe sincerely and apropos-of-nothing thanked me for writing this post earlier this year, which I view as a sign that there were good consequences of me writing this post. My guess is that a bunch of rationalists found their way to Circling, and it was beneficial for people.

I've heard it said that this is one of the more rationalist-friendly summaries of Circling. I don't know that it's the best possible such summary, but I think it's doing OK. I would certainly write it differently now, but shrug.

At this point I... (read more)

Author here.

I still believe this article is an important addition to the discussion of inadequate equilibria. While Scott Alexander's Moloch post and Eliezer Yudkowsky's book are great for introduction and discussion of the topic, both of them fail, in my opinion, to convey the sheer complexity of the problem as it occurs in the real world. That, I think, results in readers thinking about the issue in simple Malthusian or naive game-theoretic terms and eventually despairing about the inescapability of suboptimal Nash equilibria.

What I try to present is a world

... (read more)

I wrote this post, and at the time I just wrote it because... well, I thought I'd be able to write a post with a grand conclusion about how science used to check the truth, and then point to how it changed, but I was so surprised to find that journals had not one sentence of criticism in them at all. So I wrote it up as a question post instead, framing my failure to answer the question as 'partial work' that 'helped define the question'.

In retrospect, I'm really glad I wrote the post, because it is a clear datapoint about how science does not work. I have

... (read more)

This is a negative review of an admittedly highly-rated post.

The positives first; I think this post is highly reasonable and well written. I'm glad that it exists and think it contributes to the intellectual conversation in rationality. The examples help the reader reason better, and it contains many pieces of advice that I endorse.

But overall, 1) I ultimately disagree with its main point, and 2) it's way too strong/absolutist about it.

Throughout my life of attempting to have true beliefs and take effective actions, I have quite strongly learned some disti... (read more)

A Cached Belief

I find this Wired article an important exploration of an enormous wrong cached belief in the medical establishment: namely, that based on its size, Covid would be transmitted exclusively via droplets (which quickly fall to the ground), rather than aerosols (which hang in the air). This justified a bunch of extremely costly Covid policy decisions and recommendations: like the endless exhortations to disinfect everything and to wash hands all the time. Or the misguided attempt to protect people from Covid by closing public parks and playgrounds... (read more)

I still think this is great. Some minor updates, and an important note:

Minor updates: I'm a bit less concerned about AI-powered propaganda/persuasion than I was at the time, not sure why. Maybe I'm just in a more optimistic mood. See this critique for discussion. It's too early to tell whether reality is diverging from expectation on this front. I had been feeling mildly bad about my chatbot-centered narrative, as of a month ago, but given how ChatGPT was received I think things are basically on trend.
Diplomacy happened faster than I expected, though in a ... (read more)

I wrote this post about a year ago.  It now strikes me as an interesting mixture of

  1. Ideas I still believe are true and important, and which are (still) not talked about enough
  2. Ideas that were plausible at the time, but are much less so now
  3. Claims I made for their aesthetic/emotional appeal, even though I did not fully believe them at the time

In category 1 (true, important, not talked about enough):

  • GPT-2 is a source of valuable evidence about linguistics, because it demonstrates various forms of linguistic competence that previously were only demonstrated
... (read more)

I think this post is incredibly useful as a concrete example of the challenges of seemingly benign powerful AI, and makes a compelling case for serious AI safety research being a prerequisite to any safe further AI development. I strongly dislike part 9, as painting the Predict-o-matic as consciously influencing others' personalities at the expense of short-term prediction error seems contradictory to the point of the rest of the story. I suspect I would dislike part 9 significantly less if it were framed in terms of a strategy to maximize predictive accuracy.... (read more)

Rereading this post, I'm a bit struck by how much effort I put into explaining my history with the underlying ideas, and motivating that this specifically is cool. I think this made sense as a rhetorical move--I'm hoping that a skeptical audience will follow me into territory labeled 'woo' so that they can see the parts of it that are real--and also as a pedagogical move (proofs may be easy to verify, but all of the interesting content of how they actually discovered that line of thought in concept space has been cleaned away; in this post, rather than hid... (read more)

Since others have done a contextualized review, I'll aim to do a decoupled review, with a caveat that I think the contextual elements are important for consideration with inclusion into the compendium.

Okay. There’s a social interaction concept that I’ve tried to convey multiple times in multiple conversations, so I’m going to just go ahead and make a graph.
I’m calling this concept “Affordance Widths”.

I'd like to see a clear definition here before launching into an example. In fact, there's no clear ... (read more)

Tl;dr: I encourage people who changed their behavior based on this post or the larger sequence to comment with their stories.

I had already switched to freelance work for reasons overlapping with, though not identical to, moral mazes when I learned the concept, and since then the concept has altered how I approach freelance gigs. So I’m in general very on board with the concept.

But as I read this, I thought about my friend Jessica, who’s a manager at a Fortune 500 company. Jessica is principled and has put serious (but not overwhelming) effort into enacting th... (read more)

Frames that describe perception can become tools for controlling perception.

The idea of simulacra has been generative here on LessWrong, used by Elizabeth in her analysis of negative feedback, and by Zvi in his writings on Covid-19. It appears to originate in private conversations between Benjamin Hoffman and Jessica Taylor. The four simulacra levels or stages are a conception of Baudrillard’s, from Simulacra and Simulation. The Wikipedia summary quoted on the original blog post between Hoffman and Taylor has been reworded several times by various authors ... (read more)

This post is based on the book Moral Mazes, which is a 1988 book describing "the way bureaucracy shapes moral consciousness" in US corporate managers. The central point is that it's possible to imagine relationship and organization structures in which unnecessarily destructive behavior, to self or others, is used as a costly signal of loyalty or status.

Zvi titles the post after what he says these behaviors are trying to avoid, motive ambiguity. He doesn't label the dynamic itself, so I'll refer to it here as "disambiguating destruction" (DD). Before procee... (read more)

First, some meta-level things I've learned since writing this:

  1. What people crave most is very practical advice on what to buy. In retrospect this should have been more obvious to me. When I look for help from others on how to solve a problem I do not know much about, the main thing I want is very actionable advice, like "buy this thing", "use this app", or "follow this Twitter account".

  2. Failing that, what people want is legible, easy-to-use criteria for making decisions on their own. Advice like "Find something with CRI>90, and more CRI is better" i

... (read more)

Self Review.

I still endorse the broad thrusts of this post. But I think it should change at least somewhat. I'm not sure how extensively, but here are some considerations:

Clearer distinctions between Prisoner's Dilemma and Stag Hunts

I should be clearer about the game-theoretic distinctions I'm actually making between Prisoner's Dilemma and Stag Hunt. I think Rob Bensinger rightly criticized the current wording, which equivocates between "stag hunting is meaningfully different" and "'hunting rabbit' has nicer aesthetic properties than 'defect'". ... (read more)

There is a joke about programmers, that I picked up long ago, I don't remember where, that says: A good programmer will do hours of work to automate away minutes of drudgery. Some time last month, that joke came into my head, and I thought: yes of course, a programmer should do that, since most of the hours spent automating are building capital, not necessarily in direct drudgery-prevention but in learning how to automate in this domain.

I did not think of this post, when I had that thought. But I also don't think I would've noticed, if that joke had crosse... (read more)

I revisited this post a few months ago, after Vaniver's review of Atlas Shrugged.

I've felt for a while that Atlas Shrugged has some really obvious easy-to-articulate problems, but also offers a lot of value in a much-harder-to-articulate way. After chewing on it for a while, I think the value of Atlas Shrugged is that it takes some facts about how incentives and economics and certain worldviews have historically played out, and propagates those facts into an aesthetic. (Specifically, the facts which drove Rand's aesthetics presumably came from growing up i... (read more)

This post seems excellent overall, and makes several arguments that I think represent the best of LessWrong self-reflection about rationality. It also spurred an interesting ongoing conversation about what integrity means, and how it interacts with updating.

The first part of the post is dedicated to discussions of misaligned incentives, and makes the claim that poorly aligned incentives are primarily to blame for irrational or incorrect decisions. I’m a little bit confused about this, specifically that nobody has pointed out the obvious corollary: the peop... (read more)

Hi, I'm pleased to see that this has been nominated and has made a lasting impact.

Do I have any updates? I think it aged well. I'm not making any particular specific claims here, but I still endorse this and think it's an important concept.

I've done very little further thinking on this. I was quietly hoping that others might pick up the mantle and write more on strategies for caring less, as well as cases where this should be argued. I haven't seen this, but I'd love to see more of it.

I've referred to it myself when talking about values that I think people

... (read more)

Author here.

In hindsight, I still feel that the phenomenon is an interesting and potentially important topic to look into. I am not aware of any attempt to replicate it or dive deeper, though.

As for my attempt to explain the psychology underlying the phenomenon, I am not entirely happy with it. It's based only on introspection and lacks sound game-theoretic backing.

By the way, there's one interesting explanation I've read somewhere in the meantime (unfortunately, I don't remember the source):

Cooperation may incur different costs on different participants. If y

... (read more)

This is a long and good post with a title and early framing advertising a shorter and better post that does not fully exist, but would be great if it did. 

The actual post here is something more like "CFAR and the Quest to Change Core Beliefs While Staying Sane." 

The basic problem is that people by default have belief systems that allow them to operate normally in everyday life, and that protect them against weird beliefs and absurd actions, especially ones that would extract a lot of resources in ways that don't clearly pay off. And they similarl... (read more)

The goal of this post is to help us understand the similarities and differences between several different games, and to improve our intuitions about which game is the right default assumption when modeling real-world outcomes.

My main objective with this review is to check the game theoretic claims, identify the points at which this post makes empirical assertions, and see if there are any worrisome oversights or gaps. Most of my fact-checking will just be resorting to Wikipedia.

Let’s start with definitions of two key concepts.

Pareto-optimal: One dimension ... (read more)
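
Since the definition above is cut off, here is a self-contained check of Pareto-optimality on a Stag Hunt payoff table (my own example; the payoff numbers are common textbook values, not taken from the post). An outcome is Pareto-optimal if no other outcome makes some player better off without making another worse off:

    def is_pareto_optimal(outcome, outcomes):
        def dominates(a, b):
            # a weakly improves every payoff and strictly improves at least one
            return (all(x >= y for x, y in zip(a, b))
                    and any(x > y for x, y in zip(a, b)))
        return not any(dominates(other, outcome) for other in outcomes)

    # Stag Hunt payoffs as (row player, column player):
    stag_hunt = {
        ("stag", "stag"): (4, 4),
        ("stag", "hare"): (0, 3),
        ("hare", "stag"): (3, 0),
        ("hare", "hare"): (2, 2),
    }
    for actions, payoff in stag_hunt.items():
        print(actions, payoff, is_pareto_optimal(payoff, list(stag_hunt.values())))
    # Only (stag, stag) -> (4, 4) is Pareto-optimal: it dominates every other cell.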

There are two aspects of this post worth reviewing: as an experiment in a different mode of discourse, and as a description of the procession of simulacra, a schema originally advanced by Baudrillard.

As an experiment in a different mode of discourse, I think this was a success on its own terms, and a challenge to the idea that we should be looking for the best blog posts rather than the behavior patterns that lead to the best overall discourse.

The development of the concept occurred over email quite naturally without forceful effort. I would have written ... (read more)

The material here is one seed of a worldview which I've updated toward a lot more over the past year. Some other posts which involve the theme include Science in a High Dimensional World, What is Abstraction?, Alignment by Default, and the companion post to this one Book Review: Design Principles of Biological Circuits.

Two ideas unify all of these:

  1. Our universe has a simplifying structure: it abstracts well, implying a particular kind of modularity.
  2. Goal-oriented systems in our universe tend to evolve a modular structure which reflects the structure of the u
... (read more)

I notice I am confused.

I feel as though these types of posts add relatively little value to LessWrong; however, this post has quite a few upvotes. I don’t think novelty is a prerequisite for a high-quality post, but I feel as though this post was both not novel and not relevant, which worries me. I think that most of the information presented in this article is a. Not actionable b. Not related to LessWrong, and c. Easily replaceable with a Wikipedia or similar search. This would be my totally spitballed test for a topical post: at least one of these 3 must... (read more)

I find it deeply sad that many of us feel the need to frequently link to this article - I don't think I have ever done so, because if I need to explain local validity, then perhaps I'm talking to the wrong people? But certainly the ignoring of this principle has gotten more and more blatant and common over time since this post, so it's becoming less reasonable to assume that people understand such things. Which is super scary.

This post kills me. Lots of great stuff, and I think this strongly makes the cut. Sarah has great insights into what is going on, then turns away from them right when following through would be most valuable. The post is explaining why she and an entire culture is being defrauded by aesthetics. That is it used to justify all sorts of things, including high prices and what is cool, based on things that have no underlying value. How it contains lots of hostile subliminal messages that are driving her crazy. It's very clear. And then she... doesn't see the fnords. So close!

This post should be included in the Best-of-2018 compilation.

This is not only a good post, but one which cuts to the core of what this community is about. This site began not as a discussion of topics X, Y, and Z, but as a discussion of how to be... less wrong than the world around you (even/especially your own ingroup), and the difficulties this entails. Uncompromising honesty and self-skepticism are hard, and even though the best parts are a distillation of other parts of the Sequences, people need to be reminded more often than they need to be instructed.

Epistemic Status

I am an aspiring selection theorist and I have thoughts.

Why Selection Theorems?

Learning about selection theorems was very exciting. It's one of those concepts that felt so obviously right. A missing component in my alignment ontology that just clicked and made everything stronger.

Selection Theorems as a Compelling Agent Foundations Paradigm

There are many reasons to be sympathetic to agent foundations style safety research as it most directly engages the hard problems/core confusions of alignment/safety. However, one concer... (read more)

ELK was one of my first exposures to AI safety. I participated in the ELK contest shortly after moving to Berkeley to learn more about longtermism and AI safety. My review focuses on ELK’s impact on me, as well as my impressions of how ELK affected the Berkeley AIS community.

Things about ELK that I benefited from

Understanding ARC’s research methodology & the builder-breaker format. For me, most of the value of ELK came from seeing ELK’s builder-breaker research methodology in action. Much of the report focuses on presenting training strategies and pres... (read more)

I remain pretty happy with most of this, looking back -- I think this remains clear, accessible, and about as truthful as possible without getting too technical.

I do want to grade my conclusions / predictions, though.

(1). I predicted that this work would quickly be exceeded in sample efficiency. This was wrong -- it's been a bit over a year and EfficientZero is still SOTA on Atari. My 3-to-24-month timeframe hasn't run out, but I said that I expected "at least a 25% gain" towards the start of that window, which hasn't happened.

(2). There has been a shift to... (read more)

I suppose, with one day left to review 2021 posts, I can add my 2¢ to my own post here.

Overall I still like this post. I still think it points at true things and says them pretty well.

I had intended it as a kind of guide or instruction manual for anyone who felt inspired to create a truly potent rationality dojo. I'm a bit saddened that, to the best of my knowledge, no one seems to have taken what I named here and made it their own enough to build a Beisutsu dojo. I would really have liked to see that.

But this post wasn't meant to persuade anyone to do it. It w... (read more)

If you judge your social media usage by whether the average post you read is good or bad, you are missing half of the picture. The rapid context switching incurs an invisible cost even if the interaction itself is positive, as does the fact that you expect to be interrupted. "[T]he knowledge that interruptions could come at every time will change your mental state", as Elizabeth puts it.

This is the main object-level message of this post, and I don't have any qualms with it. It's very similar to what Sam Harris talks about a lot (e.g., here), and it seems t... (read more)

The post is still largely up-to-date. In the intervening year, I mostly worked on the theory of regret bounds for infra-Bayesian bandits, and haven't made much progress on open problems in infra-Bayesian physicalism. On the other hand, I also haven't found any new problems with the framework.

The strongest objection to this formalism is the apparent contradiction between the monotonicity principle and the sort of preferences humans have. While my thinking about this problem evolved a little, I am still at a spot where every solution I know requires biting a... (read more)

Alexandros Marinos (LW profile) has a long series where he reviewed Scott's post:

> The Potemkin argument is my public peer review of Scott Alexander’s essay on ivermectin. In this series of posts, I go through that essay in detail, working through the various claims made and examining their validity. My essays will follow the structure of Scott’s essay, structured in four primary units, with additional material to follow

 This is his summary of the series, and this is the index. Here's the main part of the index:

Introduction

Part 1: Introduction (TBC)

Part

... (read more)

The post claims:

> I have investigated this issue in depth and concluded that even a full scale nuclear exchange is unlikely (<1%) to cause human extinction.

This review aims to assess whether, having read the post, I can conclude the same.

The review is split into 3 parts:

  • Epistemic spot check
  • Examining the argument
  • Outside the argument

Epistemic spot check

Claim: There are 14,000 nuclear warheads in the world.

Assessment: True

Claim: Average warhead yield <1 Mt, probably closer to 100kt

Assessment: Probably true, possibly misleading. Values I found were:

... (read more)

I've stepped back from thinking about ML and alignment the last few years, so I don't know how this fits into the discourse about it, but I felt like I got important insight here and I'd be excited to include this. The key concept that bigger models can be simpler seems very important. 

In my words, I'd say that when you don't have enough knobs, you're forced to find ways for each knob to serve multiple purposes slash combine multiple things, which is messy and complex and can be highly arbitrary, whereas with lots of knobs you can do 'the thing you na... (read more)

The notion of specificity may be useful, but to me its presentation in terms of tone (beginning with the title "The Power to Demolish Bad Arguments") and examples seemed rather antithetical to the Less Wrong philosophy of truth-seeking.

For instance, I read the "Uber exploits its drivers" example discussion as follows: the author already disagrees with the claim as their bottom line, then tries to win the discussion by picking their counterpart's arguments apart, all the while insulting this fictitious person with asides like "By sloshing around his mental ... (read more)

I'm a bit torn here, because the ideas in the post seem really important/useful to me (e.g., I use these phrases as a mental pointer sometimes), such that I'd want anyone trying to make sense of the human situation to have access to them (via this post or a number of other attempts at articulating much the same, e.g. "Elephant and the Brain"). And at the same time I think there's some crucial misunderstanding in it that is dangerous and that I can't articulate. Voting for it anyhow though.

[Update: the new version is now live!!]

[Author writing here.]

The initial version of this post was written quickly on a whim, but given the value people have gotten from this post (as evidenced by the 2018 Review nomination and reviews), I think it warrants a significant update, which I plan to write in time for possible publication in a book, and ideally for the Review voting stage.

Things I plan to include in the update:

... (read more)

I hadn't realized this post was nominated, partially because of my comment, so here's a late review. I basically continue to agree with everything I wrote then, and I continue to like this post for those reasons, and so I support including it in the LW Review.

Since writing the comment, I've come across another argument for thinking about intent alignment -- it seems like a "generalization" of assistance games / CIRL, which itself seems like a formalization of an aligned agent in a toy setting. In assistance games, the agent explici... (read more)

Many people pointed out that the real cost of a Bitcoin in 2011 or whenever wasn't the couple of cents that it cost, but the several hours of work it would take to figure out how to purchase it. And that costs needed to be discounted by the significant risk that a Bitcoin purchased in 2011 would be lost or hacked - or by the many hours of work it would have taken to ensure that didn't happen. Also, that there was another hard problem of not selling your 2011-Bitcoins in 2014. I agree that all of these are problems with the original post, and tha... (read more)
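As a rough sketch of that discounting argument (my framing and symbols, not the original post's): if $c$ is the purchase price, $w$ the value of the hours spent figuring out the purchase and securing the coins against loss or hacks, $p$ the probability the coins actually survive until a well-timed sale, and $V$ the eventual sale value, then the quantity that mattered was never $V - c$ but something like

$$\mathbb{E}[\text{profit}] \approx p \cdot V - c - w,$$

which is considerably less flattering for plausible 2011 values of $p$ and $w$.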

I was going to write a longer review but I realised that Ben’s curation notice actually explains the strengths of this post very well so you should read that!

In terms of including this in the 2018 review I think this depends on what the review is for.

If the review is primarily for the purpose of building common knowledge within the community, then including this post maybe isn't worth it, as it is already fairly well known, having been linked from SSC.

On the other hand if the review process is at least partly for, as Raemon put it:

“I wan... (read more)

This is a review of my own post.

The first thing to say is that for the 2018 Review, Eli's mathematicians post should take precedence, because it was he who took up the challenge in the first place and inspired my post. I hope to find time to write a review on his post.

If people were interested (and Eli was ok with it) I would be happy to write a short summary of my findings to add as a footnote to Eli’s post if it was chosen for the review.

***

This was my first post on LessWrong and looking back at it I think it still holds up fairly well.

There... (read more)

I remember reading this post and thinking it was very good and important. I have since pretty much forgotten about it and its insights, probably because I didn't think much about GDPs anyway. Rereading the post, I maintain that it is very good and important. Any discussion of GDP should proceed with an understanding of what this post says, which I summarized to myself like so (it's mostly a combination of edited excerpts from the post):

Real GDP is usually calculated by adding up the total dollar value of all goods, using prices from some recent year (every few ye

... (read more)
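To make the base-year-prices calculation concrete, here is a minimal sketch with made-up numbers (two goods, two years; illustrative only, not from the post):

```python
# Toy real-GDP calculation: value each year's quantities at fixed
# base-year prices, so only changes in quantities show up in growth.
base_prices = {"bread": 2.0, "computer": 1000.0}   # base-year prices ($)

quantities_2020 = {"bread": 100, "computer": 1}
quantities_2021 = {"bread": 100, "computer": 3}    # more computers produced

def real_gdp(quantities, prices):
    return sum(quantities[good] * prices[good] for good in quantities)

print(real_gdp(quantities_2020, base_prices))  # 1200.0
print(real_gdp(quantities_2021, base_prices))  # 3200.0
```

The choice of base year matters because relative prices shift over time, which is the sort of subtlety the post explores.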

I read this sequence and then went through the whole thing.  Without this sequence I'd probably still be procrastinating / putting it off.  I think everything else I could write in review is less important than how directly this impacted me.

Still, a review: (of the whole sequence, not just this post)

First off, it signposts well what it is and who it's for.  I really appreciate when posts do that, and this clearly gives the top level focus and what's in/out.

This sequence is "How to do a thing" - a pretty big thing, with a lot of steps and bran... (read more)

I'm not sure I use this particular price mechanism fairly often, but I think this post was involved in me moving toward often figuring out fair prices for things between friends and allies, which I think helps a lot. The post puts together lots of the relevant intuitions, which is what's so helpful about it. +4

This gave a satisfying "click" of how the Simulacra and Staghunt concepts fit together. 

Things I would consider changing:

1. Lion Parable. In the comments, John expands on this post with a parable about lion-hunters who believe in "magical protection against lions." That parable is actually what I normally think of when I think of this post, and I was sad to learn it wasn't actually in the post. I'd add it in, maybe as the opening example.

2. Do we actually need the word "simulacrum 3"? Something on my mind since last year's review is "how much work are... (read more)

I generally endorse the claims made in this post and the overall analogy. Since this post was written, there are a few more examples I can add to the categories for slow takeoff properties. 

Learning from experience

  • The UK procrastinated on locking down in response to the Alpha variant due to political considerations (not wanting to "cancel Christmas"), though it was known that timely lockdowns are much more effective.
  • Various countries reacted to Omicron with travel bans after they already had community transmission (e.g. Canada and the UK), while it wa
... (read more)

This will not be a full review—it's more of a drive-by comment which I think is relevant to the review process.

> However, the defense establishment has access to classified information and models that we civilians do not have, in addition to all the public material. I’m confident that nuclear war planners have thought deeply about the risks of climate change from nuclear war, even though I don’t know their conclusions or bureaucratic constraints.

I am extremely skeptical of and am not at all confident in this conclusion. Ellsberg's The Doomsday Machine descri... (read more)

I don't think this post added anything new to the conversation, both because Elizabeth Van Nostrand's epistemic spot check found essentially the same result previously and because, as I said in the post, it's "the blog equivalent of a null finding." 

I still think it's slightly valuable - it's useful to occasionally replicate reviews. 

(For me personally, writing this post was quite valuable - it was a good opportunity to examine the evidence for myself, try to appropriately incorporate the different types of evidence into my prior, and form my own opinions for when clients ask me related questions.) 

Simulacra levels were probably the biggest addition to the rationalist canon in 2020. This was one of maybe half-a-dozen posts which I think together cemented the idea pretty well. If we do books again, I could easily imagine a whole book on simulacra, and I'd want this post in it.

I've alluded to this in other comments, but I think worth spelling out more comprehensively here.

I think this post makes a few main points:

  1. Categories are not arbitrary. You might need different categories for different purposes, but categories are for helping you think about the things you care about, and a category that doesn't correspond to the territory will be less helpful for thinking and communicating.
  2. Some categories might sort of look like they correspond to something in reality, but they are gerrymandered in a way optimized for deception. 
  3. You
... (read more)

This came out in April 2019, and bore a lot of fruit especially in 2020. Without it, I wouldn't have thought about the simulacra concept and developed the ideas, and without those ideas, I don't think I would have made anything like as much progress understanding 2020 and its events, or how things work in general. 

I don't think this was an ideal introduction to the topic, but it was highly motivating regarding the topic, and also it's a very hard topic to introduce or grok, and this was the first attempt that allowed later attempts. I think we should reward all of that.

This is a self-review, looking back at the post after 13 months.

I have made a few edits to the post, including three major changes:
1. Sharpening my definition of what counts as "Rationalist self-improvement" to reduce confusion. This post is about improved epistemics leading to improved life outcomes, which I don't want to conflate with some CFAR techniques that are basically therapy packaged for skeptical nerds.
2. Addressing Scott's "counterargument from market efficiency" that we shouldn't expect to invent easy self-improvement techniques that haven't be... (read more)

This is my post. It is fundamentally a summary of an overview paper, which I wrote to introduce the concept to the community, and I think it works for that purpose. In terms of improvements there are a few I would make; I would perhaps include the details about why people choose megaprojects as a venue, for completeness' sake.  It might have helped if I provided more examples in the post to motivate engagement; these are projects like powerplants, chip fabs, oil rigs and airplanes, or in other words the fundamental blocks of modern civilization.

I cont... (read more)

This is a cogent, if sparse, high-level analysis of the epistemic distortions around megaprojects in AI and other fields.

It points out that projects like the human brain project and the fifth generation computer systems project made massive promises, raised around a billion dollars, and totally flopped. I don't expect this was a simple error, I expect there were indeed systematic epistemic distortions involved, perpetuated at all levels.

It points out that similar scale projects are being evaluated today involving various major AI companies globally, and po... (read more)

(Self-review.) I've edited the post to include the calculation as footnote 10.

The post doesn't emphasize this angle, but this is also more-or-less my abstract story for the classic puzzle of why disagreement is so prevalent, which, from a Bayesian-wannabe rather than a human perspective, should be shocking: there's only one reality, so honest people should get the same answers. How can it simultaneously be the case that disagreement is ubiquitous, but people usually aren't outright lying? Explanation: the "dishonesty" is mostly in the form... (read more)

I think I agree with the thrust of this, but I think the comment section raises caveats that seem important. Scott's acknowledged that there's danger in this, and I hope an updated version would put that in the post.

But also...

> Steven Pinker is a black box who occasionally spits out ideas, opinions, and arguments for you to evaluate. If some of them are arguments you wouldn’t have come up with on your own, then he’s doing you a service. If 50% of them are false, then the best-case scenario is that they’re moronically, obviously false, so that you can reject

... (read more)

Review by the author:

I continue to endorse the contents of this post.

I don't really think about the post that much, but the post expresses a worldview that shapes how I do my research - that agency is a mechanical fact about the workings of a system.

To me, the main contribution of the post is setting up a question: what's a good definition of optimisation that avoids the counterexamples of the post? Ideally, this definition would refer or correspond to the mechanistic properties of the system, so that people could somehow statically determine whether a giv

... (read more)

I wrote about this post extensively as part of my essay on Rationalist self-improvement. The general idea of this post is excellent: gathering data for a clever natural experiment of whether Rationalists actually win. Unfortunately, the analysis itself is very lacking and is not very data-driven.

The core result is: 15% of SSC readers who were referred by LessWrong made over $1,000 in crypto, 3% made $100,000. These quantities require quantitative analysis: Is 15%/3% a lot or a little compared to matched groups like the Silicon Valley or Libertarian blogosp... (read more)
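As a sketch of the kind of quantitative comparison being asked for (all numbers, including the matched group, invented for illustration):

```python
# Hypothetical two-proportion z-test: is 15% (LW-referred readers who made
# over $1,000 in crypto) high relative to a matched comparison group?
from statsmodels.stats.proportion import proportions_ztest

lw_hits, lw_n = 150, 1000         # 15% in the LW-referred sample (made up)
match_hits, match_n = 80, 1000    # 8% in a hypothetical matched group

z, p = proportions_ztest([lw_hits, match_hits], [lw_n, match_n])
print(f"z = {z:.2f}, p = {p:.4f}")
```

Without some comparison group along these lines, the raw percentages can't tell us whether rationalists actually won.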

Reviewing this quickly because it doesn't have a review.

I've linked this post to several people in the last year. I think it's valuable for people (especially junior researchers or researchers outside of major AIS hubs) to be able to have a "practical sense" of what doing independent alignment research can be like, how the LTFF grant application process works, and some of the tradeoffs of doing this kind of work. 

This seems especially important for independent conceptual work, since this is the path that is least well-paved (relative to empirical work... (read more)

Returning to this essay, it continues to be my favorite Paul post (even What Failure Looks Like only comes second), and I think it's a better way to engage with Paul's work than anything else (including the Eliciting Latent Knowledge document, which feels less grounded in the x-risk problem, is less in Paul's native language, and gets detailed on just one idea for 10x the space, thus communicating less of the big-picture research goal). I feel I can understand all the arguments made in this post. I think this should be mandatory reading before reading Elici... (read more)

Epistemic Status: I don't actually know anything about machine learning or reinforcement learning; I'm just following your reasoning/explanation.

> From each state, we can just check each possible action against the action-value function $q(s_t, a_t)$, and choose the action that returns the highest value from the action-value function. Greedy search against the action-value function for the optimal policy is thus equivalent to the optimal policy. For this reason, many algorithms try to learn the action-value function for the optimal policy.

This do... (read more)
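The quoted passage is the standard result that acting greedily with respect to the optimal action-value function recovers an optimal policy. A minimal tabular sketch of greedy action selection (toy values, not from the post):

```python
import numpy as np

# q_table[s, a] plays the role of q(s, a); greedy search is just an argmax.
q_table = np.array([
    [0.1, 0.9],   # state 0: action 1 looks best
    [0.5, 0.2],   # state 1: action 0 looks best
])

def greedy_action(q_table: np.ndarray, state: int) -> int:
    """Pick the action maximizing q(state, a)."""
    return int(np.argmax(q_table[state]))

print(greedy_action(q_table, 0))  # -> 1
print(greedy_action(q_table, 1))  # -> 0
```

If the table equals the optimal $q^*$, this greedy policy is optimal, which is why so many algorithms target the action-value function directly.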

The combination of this post and an earlier John post (Parable of the Dammed) has given me some better language for understanding what's going on in negotiations and norm-setting, two topics that I think are quite valuable. The concept of "you could actually move the Empire State Building, maybe, and that'd affect the Schelling point of meeting places" was a useful intuition pump both for "you can move norm Schelling points around" and for how difficult a task that is.

Two years later, I suppose we know more than we did when the article was written. I would like to read some postscript explaining how well this article has aged.

Both this document and John himself have been useful resources to me as I launch into my own career studying aging in graduate school. One thing I think would have been really helpful here are more thorough citations and sourcing. It's hard to follow John's points ("In sarcopenia, one cross-section of the long muscle cell will fail first - a “ragged red” section - and then failure gradually spreads along the length.") and trace them back to any specific source, and it's also hard to know which of the synthetic insights are original to John and which are in... (read more)

I’ll set aside what happens “by default” and focus on the interesting technical question of whether this post is describing a possible straightforward-ish path to aligned superintelligent AGI.

The background idea is “natural abstractions”. This is basically a claim that, when you use an unsupervised world-model-building learning algorithm, its latent space tends to systematically learn some patterns rather than others. Different learning algorithms will converge on similar learned patterns, because those learned patterns are a property of the world, not an ... (read more)

You can see my other reviews from this and past years, and check that I don't generally say this sort of thing:

This was the best post I've written in years. I think it distilled an idea that's perennially sorely needed in the EA community, and presented it well. I fully endorse it word-for-word today.

The only edit I'd consider making is to have the "Denial" reaction explicitly say "that pit over there doesn't really exist".

(Yeah, I know, not an especially informative review - just that the upvote to my past self is an exceptionally strong one.)

One factor no one mentions here is the changing nature of our ability to coordinate at all. If our ability to coordinate in general is breaking down rapidly, which seems at least highly plausible, then that will likely carry over to AGI, and until that reverses it will continuously make coordination on AGI harder same as everything else. 

In general, this post and the answers felt strangely non-"messy" in that sense, although there's also something to be said for the abstract view. 

In terms of inclusion, I think it's a question that deserves more thought, but I didn't feel like the answers here (in OP and below) were enlightening enough to merit inclusion. 

I chose this particular post to review because I think it does a great job of highlighting some of the biases and implicit assumptions that Zack makes throughout the rest of the sequence. Therefore this review should be considered not just a review of this post, but also of all subsequent posts in Zack's sequence.

Firstly, I think the argument Zack is making here is reasonable. He's saying that if a fact is relevant to an argument it should be welcome, and if it's not relevant to an argument it should not be.

Throughout the rest of the sequence, he continues to ... (read more)

This post introduces a potentially very useful model, both for selecting problems to work on and for prioritizing personal development. This model could be called "The Pareto Frontier of Capability". Simply put:

  1. By an efficient markets-type argument, you shouldn't expect to have any particularly good ways of achieving money/status/whatever - if there was an unusually good way of doing that, somebody else would already be exploiting it.
  2. The exception to this is that if only a small amount of people can exploit an opportunity, you may have a shot. So you s
... (read more)

I still broadly agree with everything that I said in this post. I do feel that it is a little imprecise, in that I now have much more detailed and gears-y models for many of its claims. However, elaborating on those would require an entirely new post (one which I am currently working on) with a sequence's worth of prerequisites. So if I were to edit this post, I would probably mostly leave it as it is, but include a pointer to the new post once it's finished.

In terms of this post being included in a book, it is worth noting that the post situates it... (read more)

The LW team is encouraging authors to review their own posts, so:

In retrospect, I think this post set out to do a small thing, and did it well. This isn't a grand concept or a vast inferential distance, it's just a reframe that I think is valuable for many people to try for themselves.

I still bring up this concept quite a lot when I'm trying to help people through their psychological troubles, and when I'm reflecting on my own.

I don't know whether the post belongs in the Best of 2018, but I'm proud of it.

Insofar as the AI Alignment Forum is part of the Best-of-2018 Review, this post deserves to be included. It's the friendliest explanation to MIRI's research agenda (as of 2018) that currently exists.

It strikes me as pedagogically unfortunate that sections i. and ii. (on arguments and proof-steps being locally valid) are part of the same essay as sections iii.–vi. (on what this has to do with the function of Law in Society). Had this been written in the Sequences-era, one would imagine this being (at least) two separate posts, and it would be nice to have a reference link for just the concept of argumentative local validity (which is obviously correct and important to have a name for, even if some of the speculations about Law in sections iii.–vi. turned out to be wrong).

Thermodynamics is the deep theory behind steam engine design (and many other things) -- it doesn't tell you how to build a steam engine, but to design a good one you probably need to draw on it somewhat.

This post feels like a gesture at a deep theory behind truth-oriented forum / community design (and many other things) -- it certainly doesn't help tell you how to build one, but you have to think at least around what it talks about to design a good one. Also applicable to many other things, of course.

It also has virtue of being very short. Per-word one of my favorite posts.

Weakly positive on this one overall.  I like Coase's theory of the firm, and like making analogies with it to other things.  I don't think this application quite worked for me, and I'm trying to write up why.

One thing that I think feels off is an incomplete understanding of the Coase paper.  What I think the article gets correct: Coase looks at the difference between markets (economists' preferred efficient mechanism) and firms/corporations, and observes that transaction costs (for people these would be contracts, but in general all tr... (read more)

I like this post in part because of the dual nature of the conclusion, aimed at two different audiences. Focusing on the cost of implementing various coordination schemes seems... relatively unexamined on LW, I think. The list of life-lessons is intelligible, actionable, and short.

On the other hand, I think you could probably push it even further in "Secret of Our Success" tradition / culture direction. Because there's... a somewhat false claim in it: "Once upon a time, someone had to be the first person to invent each of these concepts."

This seems false ... (read more)

Summary

I summarize this post in a slightly reverse order. In AI alignment, one core question is how to think about utility maximization. What are agents doing that maximize utility? How does embeddedness play into this? What can we prove about such agents? Which types of systems become maximizers of utility in the first place?

This article reformulates expected utility maximization in equivalent terms in the hopes that the new formulation makes answering such questions easier. Concretely, a utility function u is given, and the goal of a u-maximizer is to ch... (read more)
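For reference, the standard objective being reformulated, in assumed notation (not necessarily the post's): the agent chooses a policy $\pi$ to maximize expected utility over outcomes $o$,

$$\pi^* \in \arg\max_{\pi} \; \mathbb{E}_{o \sim P(\cdot \mid \pi)}\big[u(o)\big],$$

and the post asks what equivalent characterizations of this objective look like, in the hope that they make the questions above more tractable.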

Self-Review

I feel pretty happy with this post in hindsight! Nothing major comes to mind that I'd want to change.

I think that agency is a really, really important concept, and one of the biggest drivers of ways my life has improved. But the notion of agency as a legible, articulated concept (rather than just an intuitive notion) is foreign to a lot of people, and jargon-y. I don't think there was previously a good post cleanly explaining the concept, and I'm very satisfied that this one exists and that I can point people to it.

I particularly like my framin... (read more)

I still think this is basically correct, and have raised my estimation of how important it is in x-risk in particular.  The emphasis on doing The Most Important Thing and Making Large Bets pushes people against leaving slack, which I think leads to high-value but irregular opportunities for gains being ignored.

This post aims to clarify the definitions of a number of concepts in AI alignment introduced by the author and collaborators. The concepts are interesting, and some researchers evidently find them useful. Personally, I find the definitions confusing, but I did benefit a little from thinking about this confusion. In my opinion, the post could greatly benefit from introducing mathematical notation[1] and making the concepts precise at least in some very simplistic toy model.

In the following, I'll try going over some of the definitions and explicating my unde... (read more)

What this post does for me is that it encourages me to view products and services not as physical facts of our world, as things that happen to exist, but as the outcomes of an active creative process that is still ongoing and open to our participation. It reminds us that everything we might want to do is hard, and that the work of making that task less hard is valuable. Otherwise, we are liable to make the mistake of taking functionality and expertise for granted.

What is not an interface? That's the slipperiest aspect of this post. A programming language i... (read more)

An Orthodox Case Against Utility Functions was a shocking piece to me. Abram spends the first half of the post laying out a view he suspects people hold, but he thinks is clearly wrong, which is a perspective that approaches things "from the starting-point of the universe". I felt dread reading it, because it was a view I held at the time, and I used as a key background perspective when I discussed bayesian reasoning. The rest of the post lays out an alternative perspective that "starts from the standpoint of the agent". Instead of my beliefs being about t... (read more)

It would be slightly whimsical to include this post without any explanation in the 2020 review. Everything else in the review is so serious, we could catch a break from apocalypses to look at an elephant seal for ten seconds.

The central point of this article was that conformism was causing society to treat COVID-19 with insufficient alarm. Its goal was to give its readership social sanction and motivation to change that pattern. One of its sub-arguments was that the media was succumbing to conformity. This claim came with an implication that this post was ahead of the curve, and that it was indicative of a pattern of success among rationalists in achieving real benefits, both altruistically (in motivating positive social change) and selfishly (in finding alpha).

I thought it wo... (read more)

I liked this post a lot. In general, I think that the rationalist project should focus a lot more on "doing things" than on writing things. Producing tools like this is a great example of "doing things". Other examples include starting meetups and group houses.

So, I liked this post a) for being an example of "doing things", but also b) for being what I consider to be a good example of "doing things". Consider that quote from Paul Graham about "live in the future and build what's missing". To me, this has gotta be a tool that exists in the future, and I app... (read more)

Overall, you can break my and Jim's claims down into a few categories:
* Descriptions of things that had already happened, where no new information has overturned our interpretation (5)
* CDC made a guess with insufficient information, was correct (1- packages)
* CDC made a guess with insufficient information, we'll never know who was right because the terms were ambiguous (1- the state of post-quarantine individuals)
* CDC made a guess with insufficient information and we were right (1- masks)

That overall seems pretty good. It's great that covid didn't turn o... (read more)

(I am the author)

I still like & stand by this post. I refer back to it constantly. It does two things:

1. Argue that an AI-induced point of no return could come significantly before, or significantly after, world GDP growth accelerates--and indeed will probably come before!

2. Argue that we shouldn't define timelines and takeoff speeds in terms of economic growth. So, against "is there a 4 year doubling before a 1 year doubling?" and against "When will we have TAI = AI capable of doubling the economy in 4 years if deployed?"

I think both things are pretty impo... (read more)

This post makes a straightforward analytic argument clarifying the relationship between reason and experience. The popularity of this post suggests that the ideas of cultural accumulation of knowledge, and the power of reason, have been politicized into a specious Hegelian opposition to each other. But for the most part neither Baconian science nor mathematics (except for the occasional Ramanujan) works as a human institution except by the accumulation of knowledge over time.

A good follow-up post would connect this to the ways in which modernist ideology p... (read more)

If this post is selected, I'd like to see the followup made into an addendum—I think it adds a very important piece, and it should have been nominated itself.

Self-review: Looking back, this post is one of the first sightings of a simple, very useful concrete suggestion: have chargers ready to go literally everywhere you might want them. That is a remarkably large life improvement that got through to many people, and I'm very happy I realized it.

However, that could easily be more than all of this post's value, because essentially no one embraced the central concept of Dual Wielding the phones themselves. And after a few months, I stopped doing so as well, in favor of not getting confused about which p... (read more)

This post surprised me a lot. It still surprises me a lot, actually. I've also linked it a lot of times in the past year. 

The concrete context where this post has come up is in things like ML transparency research, as well as lots of theories about what promising approaches to AGI capabilities research are. In particular, there is a frequently recurring question of the type "to what degree do optimization processes like evolution and stochastic gradient descent give rise to understandable modular algorithms?". 

I'm trying out making some polls about posts for the Review (using the predictions feature). You can answer by hovering over the scale and clicking a number to indicate your agreement with the claim. 

Making more land out of the about 50mi^2 shallow water in the San Francisco Bay, South of the Dumbarton Bridge, would... 

... (read more)

This seems to me like a valuable post, both on the object level, and as a particularly emblematic example of a category ("Just-so-story debunkers") that would be good to broadly encourage.

The tradeoff view of manioc production is an excellent insight, and is an important objection to encourage: the original post and book (which I haven't read in their entirety) appear to have leaned too heavily on what might be described as a special case of a just-so story: the phenomenon (a behavior difference) is explained as an absolute by using a post-hoc framework, and then doe... (read more)

In a field like alignment or embedded agency, it's useful to keep a list of one or two dozen ideas which seem like they should fit neatly into a full theory, although it's not yet clear how. When working on a theoretical framework, you regularly revisit each of those ideas, and think about how it fits in. Every once in a while, a piece will click, and another large chunk of the puzzle will come together.

Selection vs control is one of those ideas. It seems like it should fit neatly into a full theory, but it's not yet clear what that will look like. I revis... (read more)

To effectively extend on Raemon's commentary:

I think this post is quite good, overall, and adequately elaborates on the disadvantages and insufficiencies of the Wizard's Code of Honesty beyond the irritatingly pedantic idiomatic example. However, I find the implicit thesis of the post deeply confusing (that EY's post is less "broadly useful" than it initially appears). As I understand them, the two posts are saying basically identical things, but are focused in slightly different areas, and draw very different conclusions. EY's post notes the issues with the wi... (read more)

Post is very informal. It reads like, well, a personal blog post. A little in the direction of raw freewriting. It's fluid. Easy to read and relate to.

That matters, when you're trying to convey nuanced information about how minds work. Relatable means the reader is making connections with their personal experiences; one of the most powerful ways to check comprehension and increase retention. This post shows a subtle error as it appears from the inside. It doesn't surprise me that this post sparked some rich discussion in the comments.

To be frank, I'd be ve... (read more)

As has been mentioned elsewhere, this is a crushingly well-argued piece of philosophy of language and its relation to reasoning. I will say this post strikes me as somewhat longer than it needs to be, but that's also my opinion on much of the Sequences, so it is at least traditional.

Also, this piece is historically significant because it played a big role in litigating a community social conflict (which is no less important for having been (being?) mostly below the surface), and set the stage for a lot of further discussion. I think it's very important tha... (read more)

I haven't thought about the bat and ball question specifically very much since writing this post, but I did get a lot of interesting comments and suggestions that have sort of been rolling around my head in background mode ever since. Here's a few I wanted to highlight:

Is the bat and ball question really different to the others? First off, it was interesting to see how much agreement there was with my intuition that the bat and ball question was interestingly different to the other two questions in the CRT. Reading through the comments I count four other p

... (read more)
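(For reference, the bat-and-ball question asks: a bat and a ball cost $1.10 together, and the bat costs $1.00 more than the ball; how much does the ball cost? Writing $b$ for the ball's price, $b + (b + 1.00) = 1.10$, so $b = 0.05$ - five cents, not the intuitive ten.)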

I was always surprised that small changes in public perception, or a slight change in consumption or political opinion, can have large effects. This post introduced the concept of social behaviour curves to me, and it feels like it explains quite a lot of things. The writer presents some example behaviours and movements (like why revolutions start slowly or why societal changes are sticky), and then provides clear explanations for them using this model. This shows how to use social behaviour curves and verifies some of the model's predictions at the s... (read more)

I've written a bunch elsewhere about object-level thoughts on ELK. For this review, I want to focus instead on meta-level points.

I think ELK was very well-made; I think it did a great job of explaining itself with lots of surface area, explaining a way to think about solutions (the builder-breaker cycle), bridging the gap between toy demonstrations and philosophical problems, and focusing lots of attention on the same thing at the same time. In terms of impact on the growth and development on the AI safety community, I think this is one of the most importa... (read more)

In many ways, this post is frustrating to read. It isn't straightforward, it needlessly insults people, and it mixes irrelevant details with the key ideas.

And yet, as with many of Eliezer's posts, its key points are right.

What this post does is uncover the main epistemological mistakes made by almost everyone trying their hand at figuring out timelines. Among others:

  • Taking arbitrary guesses within a set of options that you don't have enough evidence to separate
  • Piling arbitrary assumption on arbitrary assumption, leading to completely uninforma
... (read more)

I liked this post, but I don't think it belongs in the review.  It's very long, it needs Zoe's also-very-long post for context, and almost everything you'll learn is about Leverage specifically, with few generalizable insights.  There are some exceptions ("What to do when society is wrong about something?" would work as a standalone post, for example), but they're mostly just interesting questions without any work toward a solution.  I think the relatively weak engagement that it got, relative to its length and quality, reflects that: Less W... (read more)

Self-Review

If you read this post, and wanted to put any of it into practice, I'd love to hear how it went! Whether you tried things and it failed, tried things and it worked, or never got round to trying anything at all. It's hard to reflect on a self-help post without data on how much it helped!

Personal reflections: I overall think this is pretty solid advice, and am very happy I wrote this post! I wrote this a year and a half ago, about an experiment I ran 4 years ago, and given all that, this holds up pretty well. I've refined my approach a fair bit, b... (read more)

Elephant seal is a picture of an elephant seal. It has a mysterious Mona Lisa smile that I can't pin down, that shows glee, intent, focus, forward-looking-ness, and satisfaction. It's fat and funny-looking. It looks very happy lying on the sand. I give this post a +4.

(This review is taken from my post Ben Pace's Controversial Picks for the 2020 Review.)

Introduction to Cartesian Frames is a piece that also gave me a new philosophical perspective on my life. 

I don't know how to simply describe it. I don't know what even to say here. 

One thing I can say is that the post formalized the idea of having "more agency" or "less agency", in terms of "what facts about the world can I force to be true?". The more I approach the world by stating things that are going to happen, that I can't change, the more I'm boxing-in my agency over the world. The more I treat constraints as things I could fight to chang... (read more)

This post is still endorsed, it still feels like a continually fruitful line of research. A notable aspect of it is that, as time goes on, I keep finding more connections and crisper ways of viewing things which means that for many of the further linked posts about inframeasure theory, I think I could explain them from scratch better than the existing work does. One striking example is that the "Nirvana trick" stated in this intro (to encode nonstandard decision-theory problems), has transitioned from "weird hack that happens to work" to "pops straight out... (read more)

Why This Post Is Interesting

This post takes a previously-very-conceptually-difficult alignment problem, and shows that we can model this problem in a straightforward and fairly general way, just using good ol' Bayesian utility maximizers. The formalization makes the Pointers Problem mathematically legible: it's clear what the problem is, it's clear why the problem is important and hard for alignment, and that clarity is not just conceptual but mathematically precise.

Unfortunately, mathematical legibility is not the same as accessibility; the post does have... (read more)

Ajeya's timelines report is the best thing that's ever been written about AI timelines imo. Whenever people ask me for my views on timelines, I go through the following mini-flowchart:

1. Have you read Ajeya's report?

--If yes, launch into a conversation about the distribution over 2020's training compute and explain why I think the distribution should be substantially to the left, why I worry it might shift leftward faster than she projects, and why I think we should use it to forecast AI-PONR instead of TAI.

--If no, launch into a conversation about Ajey... (read more)

This post is both a huge contribution, giving a simpler and shorter explanation of a critical topic, with a far clearer context, and has been useful to point people to as an alternative to the main sequence. I wouldn't promote it as more important than the actual series, but I would suggest it as a strong alternative to including the full sequence in the 2020 Review. (Especially because I suspect that those who are very interested are likely to have read the full sequence, and most others will not even if it is included.)

I had not read this post until just now. I think it is pretty great.

I also already had a vague belief that I should consume more timeless content. But, now I suddenly have a gearsy model that makes it a lot more intuitive why I might want to consume more timeless content. I also have a schema of how to think about "what is valuable to me?"

I bounced off this post the first couple times because, well, it opens with math, and math makes my eyes glaze over; maybe it shouldn't cause that, but it is what it is. I suspect it would be worth rewriting this post (or writing an alternate version) that puts the entire model in verbal English front and center.

Looking back, this all seems mostly correct, but missing a couple of assumed steps.

 I've talked to one person since about their mild anxiety talking to certain types of people; I found two additional steps that helped them.

  1. Actually trying to become better
  2. Understanding that their reaction is appropriate for some situations (like the original trauma), but it's overgeneralized to actually safe situations.

These steps are assumed in this post because, in my case, it's obvious I'm overreacting (there's no drone) and I understand PTSD is common and treat... (read more)

This is an excellent post, with a valuable and well-presented message. This review is going to push back a bit, talk about some ways that the post falls short, with the understanding that it's still a great post.

There's this video of a toddler throwing a tantrum. Whenever the mother (holding the camera) is visible, the child rolls on the floor and loudly cries. But when the mother walks out of sight, the toddler soon stops crying, gets up, and goes in search of the mother. Once the toddler sees the mother again, it's back to rolling on the floor crying.

A k... (read more)

For the Review, I'm experimenting with using the predictions feature to poll users for their opinions about claims made in posts. 

The first two cite Scott almost verbatim, but for the third I tried to specify further.

Feel free to add your predictions above, and let me know if you have any questions about the experienc... (read more)

As mentioned in my comment, this book review overcame some skepticism from me and explained a new mental model about how inner conflict works. Plus, it was written with Kaj's usual clarity and humility. Recommended.

This review is more broadly of the first several posts of the sequence, and discusses the entire sequence. 

Epistemic Status: The thesis of this review feels highly unoriginal, but I can't find where anyone else discusses it. I'm also very worried about proving too much. At minimum, I think this is an interesting exploration of some abstract ideas. Considering posting as a top-level post. I DO NOT ENDORSE THE POSITION IMPLIED BY THIS REVIEW (that leaving immoral mazes is bad), AND AM FAIRLY SURE I'M INCORRECT.

The rough thesis of "Meditations on Moloch"... (read more)

Biorisk - well, wouldn't it be nice if we'd all been familiar with the main principles of biorisk before 2020? I certainly regretted sticking my head in the sand.

> If concerned, intelligent people cannot articulate their reasons for censorship, cannot coordinate around principles of information management, then that itself is a cause for concern. Discussions may simply move to unregulated forums, and dangerous ideas will propagate through well intentioned ignorance.

Well. It certainly sounds prescient in hindsight, doesn't it?

Infohazards in particular cro... (read more)

One year later, I remain excited about this post, from its ideas, to its formalisms, to its implications. I think it helps us formally understand part of the difficulty of the alignment problem. This formalization of power and the Attainable Utility Landscape have together given me a novel frame for understanding alignment and corrigibility.

Since last December, I’ve spent several hundred hours expanding the formal results and rewriting the paper; I’ve generalized the theorems, added rigor, and taken great pains to spell out what the theorems do and do not ... (read more)

This sort of thing is exactly what Less Wrong is supposed to produce. It's a simple, straightforward and generally correct argument, with important consequences for the world, which other people mostly aren't making. That LW can produce posts like this—especially with positive reception and useful discussion—is a vindication of this community's style of thought.

“The Tails Coming Apart as a Metaphor for Life” should be retitled “The Tails Coming Apart as a Metaphor for Earth since 1800.” Scott does three things: 1) he notices that happiness research is framing-dependent, 2) he notices that happiness is a human-level term that is not specific at the extremes, 3) he considers how this relates to deep-seated divergences in moral intuitions becoming ever more apparent in our world.

He hints at why moral divergence occurs with his examples. His extreme case of hedonic utilitarianism, converting... (read more)

Most people who commented on this post seemed to recognise it from their experience and get a general idea of what the different cultures look like (although some people differ on the details, see later). This is partly because it is explained well but also because I think the names were chosen well.

Here are a few people saying that they have used/referenced it: 1, 2, 3 plus me.

From a LW standpoint thinking about this framing helps me to not be offended by blunt comments. My family was very combat culture but in life in general I find people are unwilling ... (read more)

I'm generally in favor of public praise and private criticism, but this post really rubbed me the wrong way. To me it reads as a group of neurotic people getting together to try to get out of neuroticism by being even more neurotic at each other. Or, that in a quest to avoid interacting with the layer of intentions, let's go arbitrarily deep on the recursion stack at the algorithmic/strategy layer of understanding.

Also, I'm really bothered by calling a series of reactions spread over time "levels of meta". Actually going meta would be paying attention to the structure of the back and forth rather than the individual steps in the back and forth.

Epistemics: Yes, it is sound. Not because of claims (they seem more like opinions to me), but because it is appropriately charitable to those that disagree with Paul, and tries hard to open up avenues of mutual understanding.

Valuable: Yes. It provides new third paradigms that bring clarity to people with different views. Very creative, good suggestions.

Should it be in the Best list?: No. It is from the middle of a conversation, and would be difficult to understand if you haven't read a lot about the 'Foom debate'.

Improved: The same concepts... (read more)

Summary

  • public discourse of politics is too focused on meta and not enough focused on object level
  • the downsides are primarily in insufficient exploration of possibility space

Definitions

  • "politics" is topics related to government, especially candidates for elected positions, and policy proposals
  • opposite of meta is object level - specific policies, or specific impacts of specific actions, etc
  • "meta" is focused on intangibles that are an abstraction away from some object-level feature, X, e.g. someones beliefs about X, or incentives around X, or media coverage v
... (read more)
  • Paul's post on takeoff speed had long been IMO the last major public step in the dialogue on this subject (not forgetting to honorably mention