Are we full of bullshit?
If we wish to really spur the destruction of bullshit, perhaps there should be an anti-review: a selection process aimed at posts that received many upvotes and seem widely loved, but in retrospect were either false or so confused as to be as bad as or worse than false. The worst of LW, rather than the best; the things that seemed most shiny and were most useless.
I note that for purposes of evaluating whether we are full of bullshit, the current review process will very likely fail because of how it is constructed; it isn't an attempt to falsify, it's making the wrong move on the Wason Selection Task. Such a negative process, by contrast, might do the opposite.
(Of course, the questionable social dynamics around this would be even worse)
Huh, I feel like it's pretty good for that purpose? If you want a list of posts that were popular but not endorsed, just take the difference between the highly upvoted posts and the review results.
The only requirement for a post to enter the review phase is that anyone thinks it still has anything going for it. As such, if a post is obviously a fad, it won't end up thoroughly reviewed, but if it's really that common-knowledge that it was a fad, that seems fine. And even then, we still occasionally have people nominate posts for review just because they think the posts are bad and want people to do a retrospective on them.
One thing I want to remind people: if something looks like it's going to end up winning the review and you disagree with it, then writing a critical review that gets upvoted (10+ karma) means your review will show up whenever we spotlight the post. This may not be fully satisfying if you were really hoping to change everyone's mind, but it does mean our infrastructure will at least make sure everyone knows about your disagreement.
(I recommend optimizing your first sentence to convey the most important argument of your disagreement, so the one-line version of the comment gets the core idea across)
For example, AI Control was one of the leading candidates from the last review, but John's countertake is highlighted for people who are skimming through the /bestoflesswrong page.
Here's a feature proposal.
The problem: At present, when a post has 0 reviews, there is an incentive against writing critical reviews. Writing such a review enables the post to enter the voting phase, which you don't especially want to happen if you think the post is undeserving. This seems perverse: critical reviews are valuable, especially so if someone would write a positive review later, enabling the post to enter voting anyway. (In principle, you can "lie in ambush" until someone writes a positive review and only then write your negative review, but that requires annoying logistics.)
My suggestion: Allow flagging reviews as "critical" in the UI. (One option is to consider a review "critical" whenever your own vote for the post is negative, another is to have a separate checkbox.) Such reviews would not count for enabling the post to enter voting.
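To make the proposal concrete, here's a purely hypothetical sketch in Python (not LessWrong's actual data model or code) of the eligibility rule being suggested: a post would advance to voting only if it has at least one review not flagged as critical.

```python
from dataclasses import dataclass


@dataclass
class Review:
    author: str
    # Set via an explicit checkbox, or derived from whether the
    # reviewer's own vote on the post is negative (the two options above).
    is_critical: bool


def eligible_for_voting(reviews: list[Review]) -> bool:
    # Under the proposal, only non-critical reviews count toward
    # letting the post enter the voting phase.
    return any(not r.is_critical for r in reviews)
```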
Mmm.
Somewhat related problem: a lot of the impact of writing a review is that it bumps the post into awareness on the frontpage, which makes it more likely that people who liked it will see it and vote positively on it. (Whether this is good or bad from the perspective of a critical reviewer depends on whether you think you're writing a takedown of something popular, or just explaining the flaws in something people already roughly agreed wasn't that good.) I don't know that that problem needs "solving", but wanted to acknowledge it and see if anyone had thoughts.
How does crossposting something to nominate work? I tried with Thresholding and the system is tracking its date as the date I crossposted, not the date of the original. Reasonable but not great for my purposes. Is there something I'm supposed to do?
Just to check, did you use the "Submit Linkposts" functionality on the nomination page for that, or did you crosspost it some other way?
ETA: Ok, looks like the library responsible for extracting external article data/metadata didn't successfully extract the date the article was published. I've manually set it to the correct date.
We have a ritual around these parts.
Every year, we have ourselves a little argument about the annual LessWrong Review, and whether it's a good use of our time or not.
Every year, we decide it passes the cost-benefit analysis[1].
Oh, also, every[2] year, you do the following:
Maybe you can tell that I'm one of the more skeptical members of the team, when it comes to the Review.
Nonetheless, I think the Review is probably worth your time, even (or maybe especially) if your time is otherwise highly valuable. I will explain why I think this, then I will tell you which stretch of ditch you're responsible for digging this year.
Are we full of bullshit?
Every serious field of inquiry has some mechanism(s) by which it discourages its participants from huffing their own farts. Fields which have fewer of these mechanisms tend to be correspondingly less attached to reality. The best fields are those where formal validation is possible (math) or where you can get consistent, easily-replicable experiment results which cleanly refute large swathes of hypothesis-space (much but not all of physics). The worst fields are those where there is no ground truth, or where the "ground truth" is a pointer to a rapidly changing[3] social reality.
In this respect, LessWrong is playing on hard mode. Most of the intellectual inquiry that "we" (broadly construed) are conducting is not the kind where you can trivially run experiments and get really huge odds ratios to update on based on the results. In most of the cases where we can relatively easily run replicable experiments, like all the ML stuff, it's not clear how much evidence any of that is providing with respect to the underlying questions that are motivating that research (how/when/if/why AI is going to kill everyone).
We need some mechanism by which we look at the posts we were so excited about when they were first published, and check whether they still make any sense now that the NRE[4] has worn off. This is doubly-important if those posts have spread their memes far and wide - if those memes turned out to be wrong, we should try to figure out whether there were any mistakes that could have been caught at the time, with heuristics or reasoning procedures that wouldn't also throw out all true and useful updates too (and maybe attempt to propagate corrections, though that can be pretty hopeless).
Is there gold in them thar hills?
Separate from the question of whether we're unwittingly committing epistemic crimes and stuffing everyone's heads full of misinformation is the question of whether all of the blood, sweat, tears, and doomscrolling is producing anything of positive value.
I wish we could point to the slightly unusual number of people who went from reading and writing on LessWrong to getting very rich as proof positive that there's something good here. But I fear those dwarves are digging too deep...
So we must turn to somewhat less legible, but hopefully also less cursed, evidence. I've found it interesting to consider questions like:
Imagine that we've struck the motherlode and the answers to some of those questions are "yes". The Review is a chance to form a more holistic, common-knowledge understanding of how you and other people in your intellectual sphere are relating to these questions. It'd be a little sad to go around with some random mental construction in your head, constantly using it to understand and relate to the world, assuming that everyone else also had the same gadget, and to later learn that you were the only one. By the law of the excluded middle, that gadget is either good, in which case you need to make sure that everyone else also installs it into their heads, or it's bad, which means you should get rid of it ASAP. No other options exist!
If your time and attention is valuable, and you spend a lot of it on LessWrong, it's even more important for you to make sure that it's being well-spent. And so...
The Ask
Similar to last year, actually. Quoting Ray:
Except, uh, s/2023/2024. This year, you'll be nominating posts from 2024!
How To Dig
Copied verbatim from last year's announcement post.
Instructions Here
Nuts and Bolts: How does the review work?
Phase 1: Preliminary Voting
To nominate a post, cast a preliminary vote for it. Eligible voters will see this UI:
If you think a post was an important intellectual contribution, you can cast a vote indicating roughly how important it was. For some rough guidance:
Votes cost quadratic points – a vote strength of "1" costs 1 point. A vote of strength 4 costs 10 points. A vote of strength 9 costs 45. If you spend more than 500 points, your votes will be scaled down proportionately.
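For the curious, here is a minimal sketch in Python (not the actual LessWrong implementation) of the cost rule implied by those examples. The formula cost(n) = n(n+1)/2 is inferred from the stated numbers, and exactly how the over-budget scaling is applied is an assumption; this just computes the proportional factor.

```python
def vote_cost(strength: int) -> int:
    # Inferred from the examples above: 1 -> 1, 4 -> 10, 9 -> 45,
    # i.e. the n-th triangular number, which grows quadratically in n.
    return strength * (strength + 1) // 2


def scaling_factor(vote_strengths: list[int], budget: int = 500) -> float:
    # If total spend exceeds the budget, votes are scaled down proportionately.
    # (What exactly gets scaled is an assumption; this only computes the factor.)
    total = sum(vote_cost(abs(s)) for s in vote_strengths)
    return 1.0 if total <= budget else budget / total


assert vote_cost(1) == 1 and vote_cost(4) == 10 and vote_cost(9) == 45
```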
Use the Nominate Posts page to find posts to vote on.
Posts that get at least one positive vote go to the Voting Dashboard, where other users can vote on them. You’re encouraged to give at least a rough vote based on what you remember from last year. It's okay (encouraged!) to change your mind later.
Posts with at least 2 positive votes will move on to the Discussion Phase.
Writing a short review
If you feel a post was important, you’re also encouraged to write up at least a short review of it saying what stands out about the post and why it matters. (You’re welcome to write multiple reviews of a post, if you want to start by jotting down your quick impressions, and later review it in more detail)
Posts with at least one review get sorted to the top of the list of posts to vote on, so if you'd like a post to get more attention it's helpful to review it.
Why preliminary voting? Why two voting phases?
Each year, more posts get written on LessWrong. The first Review, which covered 2018, considered 1,500 posts. By 2021, there were 4,250. Processing that many posts is a lot of work.
Preliminary voting is designed to help handle the increased number of posts. Instead of simply nominating posts, we start directly with a vote. Those preliminary votes will then be published, and only posts that at least two people voted on go to the next round.
In the review phase, this allows individual site members to notice if something seems particularly misplaced. If you think a post was inaccurately ranked low, you can write a positive review arguing it should be higher, which other people can take into account for the final vote. Posts which received lots of middling votes can get deprioritized in the review phase, allowing us to focus on the conversations that are most likely to matter for the final result.
Phase 2: Discussion
The second phase is a month long, and focuses entirely on writing reviews. Reviews are special comments that evaluate a post. Good questions to answer in a review include:
In the discussion phase, aim for reviews that somehow give a voter more information. It's not that useful to say "this post is great/overrated." It's more useful to say "I link people to this post a lot" or "this post seemed to cause a lot of misunderstandings."
But it's even more useful to say "I've linked this to ~7 people and it helped them understand X", or "This post helped me understand Y, which changed my plans in Z fashion" or "this post seems to cause specific misunderstanding W."
Phase 3: Final Voting
Posts that receive at least one review move on to the Final Voting Phase.
The UI will require voters to at least briefly skim reviews before finalizing their vote for each post, so arguments about each post can be considered.
As in previous years, we'll publish the voting results for users with 1000+ karma, as well as all users. The LessWrong moderation team will take the voting results as a strong indicator of which posts to include in the Best of 2024, although we reserve some right to make editorial judgments.
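For readers who like the process spelled out mechanically, here is a tiny illustrative sketch in Python (not actual site code) of the advancement thresholds described above:

```python
def advances_to_discussion(positive_preliminary_votes: int) -> bool:
    # Phase 1 -> Phase 2: a post needs at least two positive preliminary votes.
    return positive_preliminary_votes >= 2


def advances_to_final_voting(review_count: int) -> bool:
    # Phase 2 -> Phase 3: a post needs at least one review.
    return review_count >= 1
```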
Your mind is your lantern. Your keyboard, your shovel. Go forth and dig!
Or at least get tired enough of arguing about it that sheer momentum forces our hands.
Historical procedures have varied. This year is the same as last year.
And sometimes anti-inductive!
New relationship energy.
Ray: "Maybe also literal but I haven't done the UI design yet."
Ray: "In previous years, we had a distinction between "nomination" comments and "review" comments. I streamlined them into a single type for the 2020 Review, although I'm not sure if that was the right call. Next year I may revert to distinguishing them more."
Ray: "These don't have to be long, but aim to either a) highlight pieces within the post you think a cursory voter would most benefit from being reminded of, b) note the specific ways it has helped you, c) share things you've learned since writing the post, or d) note your biggest disagreement with the post."