Moloch Hasn’t Won
Best of LessWrong 2019

Scott Alexander's "Meditations on Moloch" paints a gloomy picture of the world being inevitably consumed by destructive forces of competition and optimization. But Zvi argues this isn't actually how the world works - we've managed to resist and overcome these forces throughout history. 

by Zvi
470 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 74 comments
13 · fiddler

This review is more broadly of the first several posts of the sequence, and discusses the entire sequence.

Epistemic Status: The thesis of this review feels highly unoriginal, but I can't find where anyone else discusses it. I'm also very worried about proving too much. At minimum, I think this is an interesting exploration of some abstract ideas. Considering posting as a top-level post. I DO NOT ENDORSE THE POSITION IMPLIED BY THIS REVIEW (that leaving immoral mazes is bad), AND AM FAIRLY SURE I'M INCORRECT.

The rough thesis of "Meditations on Moloch" is that unregulated perfect competition will inevitably maximize for success-survival, eventually destroying all value in service of this greater goal. Zvi (correctly) points out that this does not happen in the real world, suggesting that something is at least partially incorrect about the above model, and/or its applicability. Zvi then suggests a two-pronged explanation: 1. most competition is imperfect, and 2. most of the actual cases in which we see an excess of Moloch occur when there are strong social or signaling pressures to give up slack.

In this essay, I posit an alternative explanation of how an environment with high levels of perfect competition can prevent the destruction of all value, and further, why the immoral mazes discussed later in this sequence are an example of highly imperfect competition that causes their Molochian nature.

First, a brief digression on perfect competition: perfect competition assumes perfectly rational agents. Because all strategies discussed are continuous-time, the decisions made in any individual moment are relatively unimportant (assuming that strategies do not change wildly from moment to moment), meaning that the majority of these situations can be modeled as perfect-information situations.

Second, the majority of value-destroying optimization issues in a perfect-competition environment can be presented as prisoner's dilemmas: both...
AGI Forum @ Purdue University
Tue Jul 1•West Lafayette
Lighthaven Sequences Reading Group #40 (Tuesday 7/1)
Wed Jul 2•Berkeley
Mikhail Samin12h26-4
5
i made a thing! it is a chatbot with 200k tokens of context about AI safety. it is surprisingly good - better than you'd expect current LLMs to be - at answering questions and counterarguments about AI safety. A third of its dialogues contain genuinely great and valid arguments. You can try the chatbot at https://whycare.aisgf.us (ignore the interface; it hasn't been optimized yet). Please ask it some hard questions! Especially if you're not convinced of AI x-risk yourself, or can repeat the kinds of questions others ask you. Send feedback to ms@contact.ms. A couple of examples of conversations with users:
leogao1d630
7
random brainstorming ideas for things the ideal sane discourse encouraging social media platform would have:

* have an LM look at the comment you're writing and give real-time feedback on things like "are you sure you want to say that? people will interpret that as an attack and become more defensive, so your point will not be heard". addendum: if it notices you're really fuming and flame warring, literally gray out the text box for 2 minutes with a message like "take a deep breath. go for a walk. yelling never changes minds" (a minimal sketch of this follows below the list)
* have some threaded chat component bolted on (I have takes on best threading system). big problem is posts are fundamentally too high effort to be a way to think; people want to talk over chat (see success of discord). dialogues were ok but still too high effort and nobody wants to read the transcript. one stupid idea is have an LM look at the transcript and gently nudge people to write things up if the convo is interesting, and to have UI affordances to make it low friction (e.g. a single button that instantly creates a new post, automatically invites everyone from the convo to edit, and auto-populates the headers)
* inspired by the court system, the most autistically rule following part of the US government: have explicit trusted judges who can be summoned to adjudicate claims or meta level "is this valid arguing" claims. top level judges are selected for fixed terms by a weighted sortition scheme that uses some game theoretic / schelling point stuff to discourage partisanship
* recommendation system where you can say what kind of stuff you want to be recommended in some text box in the settings. also when people click "good/bad rec" buttons on the home page, try to notice patterns and occasionally ask the user whether a specific noticed pattern is correct and whether they want it appended to their rec preferences
* opt in anti scrolling pop up that asks you every few days what the highest value interaction you had recently on the
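Below is a minimal sketch (my illustration, not leogao's design) of the first bullet: an LM reviews a draft comment and returns tone feedback before posting. It assumes the OpenAI Python client; the model name, prompt wording, and the "reply OK if fine" convention are placeholder choices.

```python
# Sketch: pre-posting tone check for a draft forum comment.
# Assumes OPENAI_API_KEY is set; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

FEEDBACK_PROMPT = (
    "You are a writing assistant for a discussion forum. Given a draft comment, "
    "briefly point out anything likely to read as a personal attack or to make "
    "readers defensive, and suggest a calmer phrasing. If the draft is fine, "
    "reply with exactly: OK."
)

def review_draft(draft: str, model: str = "gpt-4o-mini") -> str | None:
    """Return feedback to show the author, or None if the draft looks fine."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": FEEDBACK_PROMPT},
            {"role": "user", "content": draft},
        ],
        temperature=0,
    )
    feedback = response.choices[0].message.content.strip()
    return None if feedback == "OK" else feedback

if __name__ == "__main__":
    # The UI would presumably surface this next to the comment box,
    # as a non-blocking hint rather than a gate on posting.
    print(review_draft("Only an idiot would believe this post."))
```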
johnswentworth2dΩ411201
26
I was a relatively late adopter of the smartphone. I was still using a flip phone until around 2015 or 2016 ish. From 2013 to early 2015, I worked as a data scientist at a startup whose product was a mobile social media app; my determination to avoid smartphones became somewhat of a joke there.

Even back then, developers talked about UI design for smartphones in terms of attention. Like, the core "advantages" of the smartphone were the "ability to present timely information" (i.e. interrupt/distract you) and always being on hand. Also it was small, so anything too complicated to fit in like three words and one icon was not going to fly.

... and, like, man, that sure did not make me want to buy a smartphone. Even today, I view my phone as a demon which will try to suck away my attention if I let my guard down. I have zero social media apps on there, and no app ever gets push notif permissions when not open except vanilla phone calls and SMS.

People would sometimes say something like "John, you should really get a smartphone, you'll fall behind without one" and my gut response was roughly "No, I'm staying in place, and the rest of you are moving backwards". And in hindsight, boy howdy do I endorse that attitude! Past John's gut was right on the money with that one.

I notice that I have an extremely similar gut feeling about LLMs today. Like, when I look at the people who are relatively early adopters, making relatively heavy use of LLMs... I do not feel like I'll fall behind if I don't leverage them more. I feel like the people using them a lot are mostly moving backwards, and I'm staying in place.
AI Safety Thursdays: Are LLMs aware of their learned behaviors?
Thu Jul 10•Toronto
LessWrong Community Weekend 2025
Fri Aug 29•Berlin
410 · A case for courage, when speaking of AI danger · So8res · 5d · 44 comments
84 · Authors Have a Responsibility to Communicate Clearly · TurnTrout · 10h · 16 comments
342 · A deep critique of AI 2027’s bad timeline models · titotal · 13d · 39 comments
469 · What We Learned from Briefing 70+ Lawmakers on the Threat from AI · leticiagarcia · 1mo · 15 comments
340 · the void [Ω] · nostalgebraist · 21d · 98 comments
534 · Orienting Toward Wizard Power · johnswentworth · 1mo · 142 comments
660 · AI 2027: What Superintelligence Looks Like [Ω] · Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo · 3mo · 222 comments
206 · Foom & Doom 1: “Brain in a box in a basement” [Ω] · Steven Byrnes · 8d · 78 comments
87 · What We Learned Trying to Diff Base and Chat Models (And Why It Matters) [Ω] · Clément Dumas, Julian Minder, Neel Nanda · 1d · 0 comments
286 · Beware General Claims about “Generalizable Reasoning Capabilities” (of Modern AI Systems) [Ω] · LawrenceC · 20d · 19 comments
73 · The best simple argument for Pausing AI? · Gary Marcus · 1d · 10 comments
159 · My pitch for the AI Village · Daniel Kokotajlo · 7d · 29 comments
418 · Accountability Sinks · Martin Sustrik · 2mo · 57 comments
The Best Tacit Knowledge Videos on Every Subject
437
Parker Conley
1y

TL;DR

Tacit knowledge is extremely valuable. Unfortunately, developing tacit knowledge is usually bottlenecked by apprentice-master relationships. Tacit Knowledge Videos could widen this bottleneck. This post is a Schelling point for aggregating these videos—aiming to be The Best Textbooks on Every Subject for Tacit Knowledge Videos. Scroll down to the list if that's what you're here for. Post videos that highlight tacit knowledge in the comments and I’ll add them to the post. Experts in the videos include Stephen Wolfram, Holden Karnofsky, Andy Matuschak, Jonathan Blow, Tyler Cowen, George Hotz, and others.

What are Tacit Knowledge Videos?

Samo Burja claims YouTube has opened the gates for a revolution in tacit knowledge transfer. Burja defines tacit knowledge as follows:

Tacit knowledge is knowledge that can’t properly be transmitted via verbal or written instruction, like the ability to create

...
(Continue Reading – 6195 more words)
Parker Conley1m10

Any chance you could unpin this comment? Seems like the idea of people suggesting videos based on it didn't work, and having the updates be the first pinned comment would probably provide more value to people looking at the post.

Raemon's Shortform
Raemon
Ω 08y

This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:

  1. don't feel ready to be written up as a full post
  2. I think the process of writing them up might make them worse (i.e. longer than they need to be)

I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.

11Raemon6h
TAP for fighting LLM-induced brain atrophy: "send LLM query" ---> "open up a thinking doc and think on purpose."

What a thinking doc looks like varies by person. Also, if you are sufficiently good at thinking, just "think on purpose" is maybe fine, but I recommend having a clear sense of what it means to think on purpose and whether you are actually doing it. I think having a doc is useful because it's easier to establish a context switch that is supportive of thinking.

For me, "think on purpose" means:

* ask myself what my goals are right now (try to notice at least 3)
* ask myself what would be the best thing to do next (try for at least 3 ideas)
* flowing downhill from there is fine
Thane Ruthenis3m20

Whenever I send an LLM some query I expect to be able to answer myself (instead of requesting a primer on some unknown-to-me subject), I usually try to figure out how to solve it myself, either before reading the response, or before sending the query at all. I. e., I treat the LLM's take as a second opinion.

This isn't a strategy against brain atrophy, though: it's because (1) I often expect to be disappointed by the LLM's answer, meaning I'll end up needing to solve the problem myself anyway, so might as well get started on that, (2) I'm wary of the LLM co... (read more)

Problematic Professors
12
Eggs
1d

Don't judge a principle by its professors—look to its practitioners.[1]


"Professor" is an interesting word. At one point in my professional life, I had the opportunity to teach college classes. I often corrected students who called me "Professor Eggs," telling them I was just "Mr. Eggs." "Professor" was a title and a high status I hadn’t earned. But at the same time, the root "profess" often carries the opposite implication. To profess means to declare, sometimes loudly, sometimes without credibility. A profess-er, in this light, sounds less like a scholar and more like a huckster.

An intriguing contrast might be the word "practitioner." Connotatively, it sounds humble, even lowly, the opposite of the high-minded professor. But denotatively, it's closer to the true opposite of a profess-er: someone who applies...

(See More – 391 more words)
tailcalled15m20

An issue with judging the practitioners is that practicing the principle may be correlated with other things that are much more harmful. Like all the talk about how single parenthood is supposedly bad for you, but then it doesn't hold up to more careful scrutiny, AFAIK.

Degamification
23
Nate Showell
2y

Goodhart's Law refers to the tendency that when someone sets a performance metric for a goal, the metric itself becomes a target of optimization, often at the expense of the goal it's supposed to measure. Some metrics are subject to imperfectly-aligned incentives in ways that are easy to identify, such as when students optimize for getting high grades rather than understanding the course material. But in other scenarios, metrics fail in less obvious ways. For example, someone might limit himself to one drink per night, but still end up drinking too much because he drinks every night and overestimates how much alcohol counts as "one drink." There's no custom-made giant wineglass staring you in the face, but the metric is still failing to fulfill its intended purpose.
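As a rough back-of-the-envelope check on that example (my numbers, not the author's): assuming a generous 250 ml pour of 14% ABV wine and the US definition of a standard drink (14 g of ethanol), the nightly "one drink" is really about two, and the weekly total quietly doubles what the rule implies.

```python
# Rough arithmetic for the "one drink per night" example; pour size and ABV are
# illustrative assumptions, not figures from the post.
ETHANOL_DENSITY_G_PER_ML = 0.789
US_STANDARD_DRINK_G = 14.0  # grams of pure ethanol in one US standard drink

def standard_drinks(volume_ml: float, abv: float) -> float:
    """How many US standard drinks a pour of a given size and strength contains."""
    return volume_ml * abv * ETHANOL_DENSITY_G_PER_ML / US_STANDARD_DRINK_G

nightly_pour = standard_drinks(volume_ml=250, abv=0.14)       # a large glass of wine
print(f"per night: {nightly_pour:.1f} standard drinks")        # ~2.0
print(f"per week:  {7 * nightly_pour:.1f} standard drinks")    # ~13.8
```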

 

The...

(See More – 493 more words)
Said Achmiz16m20

GreaterWrong also has an anti-kibitzer feature.

Why Engaging with Global Majority AI Policy Matters
2
Heramb
24m

Over the past 6-8 months, I have been involved in drafting AI policy recommendations and official statements directed at governments and institutions across the Global Majority: Chile, Lesotho, Malaysia, the African Commission on Human and Peoples' Rights (ACHPR), Israel, and others. At first glance, this may appear to be a less impactful use of time compared to influencing more powerful jurisdictions like the United States or the European Union. But I argue that engaging with the Global Majority is essential, neglected, and potentially pivotal in shaping a globally safe AI future. Below, I outline four core reasons.

1. National-Level Safeguards Are Essential in a Fracturing World

As global alignment becomes harder, we need decentralized, national-level safety nets. Some things to keep in mind:

  • What if the EU AI Act is
...
(See More – 549 more words)
Sam Marks's Shortform
Sam Marks
Ω 03y
25Sam Marks7h
The "uncensored" Perplexity-R1-1776 becomes censored again after quantizing Perplexity-R1-1776 is an "uncensored" fine-tune of R1, in the sense that Perplexity trained it not to refuse discussion of topics that are politically sensitive in China. However, Rager et al. (2025)[1] documents (see section 4.4) that after quantizing, Perplexity-R1-1776 again censors its responses: I found this pretty surprising. I think a reasonable guess for what's going on here is that Perplexity-R1-1776 was finetuned in bf16, but the mechanism that it learned for non-refusal was brittle enough that numerical error from quantization broke it. One takeaway from this is that if you're doing empirical ML research, you should consider matching quantization settings between fine-tuning and evaluation. E.g. quantization differences might explain weird results where a model's behavior when evaluated differs from what you'd expect based on how it was fine-tuned. 1. ^ I'm not sure if Rager et al. (2025) was the first source to publicly document this, but I couldn't immediately find an earlier one.
Sam Marks28m20

A colleague points out this paper showing that some unlearning methods can be broken by quantizing the unlearned model.

AI Moratorium Stripped From BBB
59
Zvi
7h

The insane attempted AI moratorium has been stripped from the BBB. That doesn’t mean they won’t try again, but we are good for now. We should use this victory as an opportunity to learn. Here’s what happened.

What Happened

Senator Ted Cruz and others attempted to push hard for a 10-year moratorium on enforcement of all AI-specific regulations at the state and local level, and attempted to ram this into the giant BBB despite it being obviously not about the budget.

This was an extremely aggressive move, which most did not expect to survive the Byrd amendment, likely as a form of reconnaissance-in-force for a future attempt.

It looked for a while like it might work and get passed outright, with it even surviving the Byrd amendment, but opposition steadily grew.

We’d...

(Continue Reading – 1681 more words)
RationalElf39m10

Did this case update you to think "If you’re trying to pass a good bill, you need to state and emphasize the good reasons you want to pass that bill, and what actually matters". If so, why? The lesson I think one would naively take from this story is an update in the direction of: "if you want to pass a good bill, you should try to throw in a bunch of stuff you don't actually care about but that others do and build a giant coalition, or make disingenuous but politically expedient arguments for your good stuff, or try to make out people who oppose the bill ... (read more)

4Thane Ruthenis1h
Here's to the world staying around long enough for us to read AI #1191.
1Wbrom7h
Isn't the problem that any significant AI regulation promulgated by any state is a de facto national regulation, due to the nature of the internet? I mean, sure, age-gate AI like porn, but you know it's going to be broader than that.
6MondSemmel6h
As mentioned in the post, Congress is perfectly free to get its act together and do proper legislation, but since they don't actually want to do that*, then it's insane for them to pre-empt the states from doing it. * (E.g. the US Senate, or rather all the 100 individual Senators, could at any time abolish the modern filibuster and actually restore their ability to legislate as a co-equal branch of government, if they ever wanted to. But they don't.)
Authors Have a Responsibility to Communicate Clearly
84
TurnTrout
10h
This is a linkpost for https://turntrout.com/author-responsibility

When a claim is shown to be incorrect, defenders may say that the author was just being “sloppy” and actually meant something else entirely. I argue that this move is not harmless, charitable, or healthy. At best, this attempt at charity reduces an author’s incentive to express themselves clearly – they can clarify later![1] – while burdening the reader with finding the “right” interpretation of the author’s words. At worst, this move is a dishonest defensive tactic which shields the author with the unfalsifiable question of what the author “really” meant.

⚠️ Preemptive clarification

The context for this essay is serious, high-stakes communication: papers, technical blog posts, and tweet threads. In that context, communication is a partnership. A reader has a responsibility to engage in good faith, and an author

...
(Continue Reading – 1572 more words)
1Gavin Runeblade3h
Eschew obfuscation. -- Mark Twain
10sunwillrise5h
Without getting into the specifics of this particular example,[1] sometimes this pattern occurs when an author wants to communicate some intuitive conclusion they have reached, but that intuition is the result of interacting with such a large amount of literature/research/empirics/self-reflection/personal experience etc. that they can no longer point to any single short and compelling argument for why it's true.

Mindful of norms and razors such as "what can be asserted without evidence can also be dismissed without evidence," they believe saying "I believe X because of intuition, but I can't explain why" would be dismissed by their audience,[2] so they instead try to manufacture some makeshift argument for it, because it feels more epistemically virtuous[3] to come up with "proper" scientific evidence. Unfortunately, their argument is often subtly wrong or slightly off-topic, which shouldn't be surprising; after all, it's not the argument that actually caused them to believe their conclusion, but more akin to a post-hoc confabulation.

See also Kaj Sotala's comment here and Ben Pace's summary of Jan Kulveit's comment here.

1. ^ Which I acknowledge at the outset doesn't actually fit into what I'm describing here
2. ^ Which isn't entirely unreasonable to do if you're such an audience member, but certainly not the optimal action to take if you have some degree of trust in the competence of the speaker's intuition
3. ^ Even though it isn't
AnthonyC1h20

I think this is an important point, especially when experts are talking to other experts about their respective fields. I once had a client call this "thinking in webs." If you have a conclusion that you reached via a bunch of weak pieces of evidence collected over a bunch of projects and conversations and things you've read all spread out over years, it might or might not be epistemically correct to add those up to a strong opinion. But, there may be literally no verbally compelling way to express the source of that certainty. If you try, you'll have forg... (read more)

4mattmacdermott5h
Not sure how to think about this overall. I can come up with examples where it seems like you should assign basically full credit for sloppy or straightforwardly wrong statements.

E.g. suppose Alice claims that BIC only make black pens. Bob says, "I literally have a packet of blue BIC pens in my desk drawer. We will go to my house, open the drawer, and you will see them." They go to Bob's house, and lo, the desk drawer is empty. Turns out the pens are on the kitchen table instead. Clearly it's fine for Bob to say, "All I really meant was that I had blue pens at my house, the point stands."

I think your mention of motte-and-baileys probably points at the right refinement: maybe it's fine to be sloppy if the version you later correct yourself to has the same implications as what you literally said. But if you correct yourself to something easier to defend but that doesn't support your initial conclusion to the same extent, that's bad.

EDIT: another important feature of the pens example is that the statement Bob switched to is uncontroversially true. If on finding the desk drawer empty he instead wanted to switch to "I left them at work", then probably he should pause and admit a mistake first.
Lessons from Building Secular Ritual: A Winter Solstice Experiment
2
joshuamerriam
1h

 

This is a follow-up to my earlier post about designing a Winter Solstice gathering that combined Rationalist Solstice traditions with local Māori Matariki practices. Here's what I learned from actually running the event.

TL;DR: People wanted structured conversation more than curated performance. Starting with collective acknowledgment of loss made subsequent vulnerability feel natural. Social coordination mechanics are harder than they look, but small-scale practice matters for larger coordination challenges.

What I Was Trying to Solve

Growing up in a religious family, I personally wasn't getting the meaningful aspects of seasonal gatherings which I fondly remember from my childhood. Living in New Zealand, I wanted to create something that honored both Rationalist Solstice traditions and local Matariki practices without falling into either cultural appropriation or forcing cringy fake rituals on people.

My...

(Continue Reading – 1081 more words)
Goodhart's Law
Tacit knowledge
Cole Wyeth1d6037
The best simple argument for Pausing AI?
Welcome to LessWrong! I’m glad you’ve decided to join the conversation here.

A problem with this argument is that it doesn’t prove we should pause AI, only that we should avoid deploying AI in high-impact (e.g. military) applications. Insofar as LLMs can’t follow rules, the argument seems to indicate that we should continue to develop the technology until they can.

Personally, I’m concerned about the type of AI system which can follow rules, but is not intrinsically motivated to follow our moral rules. Whether LLMs will reach that threshold is not clear to me (see https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms) but this argument seems to cut against my actual concerns.
habryka1d*4732
Don't Eat Honey
My guess is this is obvious, but IMO it seems extremely unlikely to me that bee-experience is remotely as important to care about as cow experience. Enough as to make statements like this just sound approximately insane:

> 97% of years of animal life brought about by industrial farming have been through the honey industry (though this doesn’t take into account other insect farming).

Like, no, this isn't how this works. This obviously isn't how this works. You can't add up experience hours like this. At the very least use some kind of neuron basis.

> The median estimate, from the most detailed report ever done on the intensity of pleasure and pain in animals, was that bees suffer 7% as intensely as humans. The mean estimate was around 15% as intensely as people. Bees were guessed to be more intensely conscious than salmon!

If anyone remotely thinks a bee suffering is 15% (!!!!!!!!) as important as a human suffering, you do not sound like someone who has thought about this reasonably at all. It is so many orders of magnitude away from what sounds reasonable to me that I find myself wanting to look somewhere other than the arguments in things like the Rethink Priorities report (which I have read, and argued with people about for many hours, and which still sound insane to me, and IMO do not hold up), and instead look toward things like there being some kind of social signaling madness where someone is trying to signal commitment to some group standard of dedication, which involves some runaway set of extreme beliefs.

Edit: And to avoid a slipping of local norms here: I am only leaving this comment now after I have seriously entertained the hypothesis that I might be wrong, that maybe there do exist good arguments for moral weights that seem crazy to me from where I originally stood; but no, after looking into the arguments for quite a while, they still seem crazy to me, and so now I feel comfortable moving on and trying to think about what psychological or social process produces posts like this. And still, I am hesitant about it, because many readers have probably not gone through the same journey, and I don't want a culture of dismissing things just because they are big and would imply drastic actions.
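For concreteness, here is the kind of crude neuron-count comparison the comment gestures at (a rough illustration using approximate published neuron counts, not a calculation from the thread):

```python
# Crude neuron-count basis vs. the ~15% mean estimate quoted above.
# Neuron counts are approximate public figures, used only for scale.
BEE_NEURONS = 1.0e6      # honey bee brain, roughly 10^6 neurons
HUMAN_NEURONS = 8.6e10   # human brain, roughly 8.6 * 10^10 neurons

neuron_ratio = BEE_NEURONS / HUMAN_NEURONS
print(f"neuron-count ratio: {neuron_ratio:.2e}")   # ~1.2e-05
print(f"quoted mean estimate: 0.15")
print(f"gap: ~{0.15 / neuron_ratio:,.0f}x, i.e. roughly 4 orders of magnitude")
```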
Kaj_Sotala7h2511
Authors Have a Responsibility to Communicate Clearly
It used to be that I would sometimes read something and interpret it to mean X (sometimes, even if the author expressed it sloppily). Then I would say "I think the author meant X" and get into arguments with people who thought the author meant something different. These arguments would be very frustrating, since no matter how certain I was of my interpretation, short of asking the author there was no way to determine who was right.

At some point I realized that there was no reason to make claims about the author's intent. Instead of saying "I think the author meant X", I could just say "this reads to me as saying X". Now I'm only reporting on how I'm personally interpreting their words, regardless of what they might have meant. That both avoids pointless arguments about what the author really meant, and is more epistemically sensible, since in most cases I don't know that my reading of the words is what the author really intended.

Of course, sometimes I might have reason to believe that I do know the author's intent. For example, if I've spent quite some time discussing X with the author directly, and have a good understanding of how they think about the topic. In those cases I might still make claims of their intent. But generally I've stopped making such claims, which has saved me from plenty of pointless arguments.
84 · Proposal for making credible commitments to AIs. · Cleo Nardo · 1d · 33 comments
150 · X explains Z% of the variance in Y · Leon Lang · 4d · 23 comments