
LESSWRONG
Harry Potter and The Methods of Rationality

What if Harry was a scientist? What would you do if the universe had magic in it? 
A story that conveys many rationality concepts, helping to make them more visceral, intuitive and emotionally compelling.

Rationalist Shabbat · Fri Nov 7 · Rockville
Regular Meetup (Topic: TBD) · Sat Nov 8 · Meguro City
Berkeley Solstice Weekend · Fri Dec 5 · Berkeley
2025 NYC Secular Solstice & East Coast Rationalist Megameetup · Fri Dec 19 · New York
Popular Comments

Raemon · 3d
Heroic Responsibility
I think this part of Heroic Responsibility isn't too surprising or novel to people. Obviously the business owner has responsibility for the business. The part that's novel is more like: If I'm some guy working in legal, and I notice this hot potato going around, and it's explicitly not my job to deal with it, I might nonetheless say "ugh, the CEO is too busy to deal with this today and it's not anyone else's job. I will deal with it."

Then you go to each department head, even if you're not a department head yourself but (say) a lowly intern, and say "guys, I think we need to decide who's going to deal with this." And if their ego won't let them take advice from an intern, you might also take it as your responsibility to figure out how to navigate their ego – maybe by making them feel like it was their own idea, by threatening to escalate to the CEO if they don't get to it themselves, or by appealing to their sense of duty.

A great example of this, staying with the realm of "random bureaucracy", I got from @Elizabeth: E. D. Morel was a random bureaucrat at a shipping company in 1891. He noticed that his company was shipping guns and manacles into the Congo, and shipping rubber and other resources back out to Belgium. It was not Morel's job to notice that this was a bit weird. It was not Morel's job to notice that the weirdness was a clue, to look into it, and then to find out what was actually happening: weapons were being sent to the Congo to forcibly steal resources at gunpoint. It was not his job to make it his mission to raise awareness of the Congo abuses and stop them. But he did.

...

P.S. A failure mode of rationalists is to try to take Heroic Responsibility for everything, especially in a sort of angsty way that is counterproductive and exhausting. It's also a failure mode to act as if only you can possibly take Heroic Responsibility, rather than trying to model the ecosystem around you and the other actors (some of whom might be Live Players who are also taking Heroic Responsibility, and some of whom might be local actors following normal incentives but who are still, like, part of the solution). There is nuance to when and how to do Heroic Responsibility well.
niplav · 1d
People Seem Funny In The Head About Subtle Signals
Hm, I am unsure how much to believe this, even though my intuitions go the same way as yours.

As a correlational datapoint, I tracked my success from cold approach and the time I've spent meditating (including a 2-month period of usually ~2 hours of meditation/day), and I don't see any measurable improvement in my success rate from cold approach. (Note that the linked analysis also includes a linear regression with slope -6.35e-08, but with p=0.936, so it could be random.)

In cases where meditation does stuff to your vibe-reading of other people, I would guess that I'd approach women who are more open to being approached. I haven't dug deeper into my fairly rich data on this, and the data doesn't include many post-retreat approaches, but I still find the data I currently have instructive.

I wish more people tracked and analyzed this kind of data, but I seem to be alone in this so far. I do feel some annoyance at everyone (the, ah, "cool people"?) in this area making big claims (and sometimes money off of those claims) without even trying to track any data and analyze it, leaving it basically to me to scramble together some DataFrames and effect sizes next to my dayjob.[1]

> So start meditating for an hour a day for 3 months using the mind illuminated as an experiment (getting some of the cool skills mentioned in Kaj Sotala's sequence?) and see what happens?

Do you have any concrete measurable predictions for what would happen in that case?

----------------------------------------

1. I often wonder if empiricism is just incredibly unintuitive for humans in general, and experimentation and measurement even more so. Outside the laboratory very few people do it; see e.g. Aristotle's claims about the number of women's teeth, or his theory of ballistics, which went un(con)tested for almost 2000 years. What is going on here? Is empiricism really that hard? Is it about what people bother to look at? Is making shit up just so much easier that everyone stays in that mode, which is a stable equilibrium? ↩︎
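Below is a minimal sketch of the kind of tracking-and-regression analysis described above, assuming a hypothetical per-approach log with a 0/1 outcome column and a running total of meditation hours; the file name and column names are illustrative, not the actual dataset:

```python
# Minimal sketch (hypothetical data layout): one row per cold approach,
# with a 0/1 "success" outcome and cumulative meditation hours at that date.
import pandas as pd
from scipy import stats

df = pd.read_csv("approaches.csv", parse_dates=["date"])

print("n approaches:", len(df))
print("overall success rate:", df["success"].mean())

# Linear regression of per-approach outcome on cumulative meditation hours.
# A slope near zero with a large p-value would match the null result above.
result = stats.linregress(df["cumulative_meditation_hours"], df["success"])
print(f"slope={result.slope:.3g}, p={result.pvalue:.3f}")
```

With a binary outcome, a logistic regression (e.g. statsmodels' Logit) would be the more standard choice, but a simple linear fit is enough to eyeball whether any trend shows up at all.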
Raemon · 4d
The Tale of the Top-Tier Intellect
Serious question: (well, it'll start as "more of a comment, really", but at the end I do have a question)

The comment: I think the world is bottlenecked on people understanding the sort of concepts in posts like these. I don't think the world is particularly bottlenecked on current-gen-Yudkowsky-shaped dialogue essays about it. They appeal to a small set of people. My guess is you write them anyway because they are pretty easy to write in your default style and maybe just mostly fun-for-their-own-sake. And when you're in higher-effort modes, you do also write things like If Anyone Builds It, which are shaped pretty differently. And, probably, these essays still help some people, and maybe they help workshop new analogies that eventually can be refined into If Anyone-style books or podcast interviews.

But, that said, my questions are:

* How much have you experimented with finding low-energy-but-different formats, that might reroll on who finds them compelling?
  * (I'm particularly interested in whether there turns out to be anything short in this reference class.)
* How much have you (or anyone else, this doesn't have to be you) systematically thought about how to improve the distribution channel of this sort of essay so it reaches more people?

Both of these are presumably high effort. I'm not sure if the first one is a better use of your high-effort time than other things, or how likely it is to work out. But I'm wondering if this is an area where you think you've already checked for low- or mid-hanging fruit.

(Also, having now thought about it for 5 min, I think this sort of thing would actually make a good youtube video that the Rational Animations people might do. That could be mostly outsourced.)
492 · Welcome to LessWrong! · Ruby, Raemon, RobertM, habryka · 6y · 76
Quick Takes

LWLW · 3h
I just can’t wrap my head around people who work on AI capabilities or AI control. My worst fear is that AI control works, power inevitably concentrates, and then the people who have the power abuse it. What is outlandish about this chain of events? It just seems like we’re trading X-risk for S-risks, which seems like an unbelievably stupid idea. Do people just not care? Are they genuinely fine with a world with S-risks as long as it’s not happening to them? That’s completely monstrous and I can’t wrap my head around it. The people who work at the top labs make me ashamed to be human. It’s a shandah.

This probably won’t make a difference, but I’ll write this anyways. If you’re working on AI control, do you trust the people who end up in charge of the technology to wield it well? If you don’t, why are you working on AI control?
GradientDissenter · 2d
Notes on living semi-frugally in the Bay Area.

I live in the Bay Area, but my cost of living is pretty low: roughly $30k/year. I think I live an extremely comfortable life. I try to be fairly frugal, both so I don't end up dependent on jobs with high salaries and so that I can donate a lot of my income, but it doesn't feel like much of a sacrifice. Often when I tell people how little I spend, they're shocked. I think people conceive of the Bay as exorbitantly expensive, and it can be, but it doesn't have to be.

Rent: I pay ~$850 a month for my room. It's a small room in a fairly large group house I live in with nine friends. It's a nice space with plenty of common areas and a big backyard. I know of a few other places like this (including in even pricier areas like Palo Alto). You just need to know where to look and to be willing to live with friends. On top of rent I pay ~$200/month (edit: I was missing one expense, it's more like $300) for things like utilities, repairs on the house, and keeping the house tidy.

I pool the grocery bill with my housemates so we can optimize where we shop a little. We also often cook for each other (notably most of us, including myself, also get free meals on weekdays in the offices we work from, though I don't think my cost of living was much higher when I was cooking for myself each day not that long ago). It works out to ~$200/month.

I don't buy that much stuff. I thrift most of my clothes, but I buy myself nice items when it matters (for example, comfy, somewhat-expensive socks really do make my day better when I wear them). I have a bunch of miscellaneous small expenses like my Claude subscription, toothpaste, etc., but they don't add up to much.

I don't have a car, a child, or a pet (but my housemate has a cat, which is almost the same thing). I try to avoid meal delivery and Ubers, though I use them in a pinch. Public transportation costs aren't nothing, but they're quite manageable.

I actually have a PA who helps me with
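As a rough, illustrative tally of the figures listed above (using the edited ~$300 utilities number; the gap between these categories and the stated ~$30k/year is the unlisted spending such as transit and miscellaneous purchases):

```python
# Back-of-the-envelope tally of the approximate monthly figures stated above.
monthly = {
    "rent": 850,
    "utilities_and_house": 300,  # the edited figure
    "groceries": 200,
}
listed_monthly = sum(monthly.values())    # ~$1,350/month
listed_annual = listed_monthly * 12       # ~$16,200/year
unlisted_annual = 30_000 - listed_annual  # ~$13,800/year for everything else
print(listed_monthly, listed_annual, unlisted_annual)
```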
Dave Banerjee · 34m
Why Steal Model Weights?

Epistemic status: Hastily written. I dictated in a doc for 7 minutes. Then I spent an hour polishing it. I don’t think there are any hot takes in this post? It’s mostly a quick overview of model weight security so I can keep track of my threat models.

Here’s a quick list of reasons why an attacker might steal frontier AI model weights (lmk if I'm missing something big):

1. Attackers won’t profit from publicly serving the stolen model on an API. A state actor like Russia couldn't price-compete with OpenAI due to lack of GPU infrastructure and economies of scale, so they wouldn’t make money via a public API (unless the model they stole was not publicly available previously and is at the price-performance pareto frontier. But even if this happens, I assume the company (whose model weights were just stolen) would release their best model and outcompete the attacker’s API due to economies of scale[1]).

2. Attackers won’t gain many AI R&D insights from just stealing the model weights. Stealing weights reveals the model architecture but not much else. However, if the stolen model was using online learning or had algorithmic insights baked into the weights (i.e. the model was trained on internal OpenAI docs, resulting in OpenAI’s algorithmic trade secrets getting baked into the model weights), then the attackers could extract these algorithmic insights by prompting the stolen model. That being said, I think that just stealing the model weights won’t give much R&D insight, since you don’t also get the juicy algorithmic secrets (FWIW, if an attacker is able to steal the model weights, they’ve probably stolen many of your algorithmic secrets too…).[2]

3. Attackers could remove the stolen model’s safety guardrails for misuse. Attackers could fine-tune away alignment/safety measures to create helpful-only models. They could then use these helpful-only models for misuse (e.g. mass-surveillance, bioterrorism, weapons development, etc.).

4. Attackers c
jacquesthibs · 1d
Building an AI safety business that tackles the core challenges of the alignment problem is hard.

Epistemic status: uncertain; trying to articulate my cruxes. Please excuse the scattered nature of these thoughts, I’m still trying to make sense of all of it.

You can build a guardrails or evals platform, but if your main threat model involves misalignment via internal deployment with self-improving AI (potentially stemming from something like online learning on hard problems like alignment, which leads to AI safety sabotage), it is so tied to capabilities that you will likely never have the ability to influence the process.

You can build reliability-as-a-business, but this probably speeds up timelines via second-order effects and doesn’t really matter for superintelligence. I guess you can home in on the types of problems where Goodharting is an obvious problem and you are building reliable detectors to help reduce it. Maybe you can find companies that would value that as a feature and you can relate it to the alignment-relevant situations.

You can build RL environments, sell evals, or sell training data, but you still seemingly end up too far removed from what is happening internally.

You could choose a high-stakes vertical you can make money with as a test-bed for alignment and build tooling/techniques that ensure a high level of guarantees. If you have a theory of change, it will likely need to be some technical alignment breakthrough you make legible and low-friction to incorporate, or some open-source infra the labs can leverage.

You can build ControlArena or Inspect, open-source it, and then try to make a business around it, but of course you are not tackling the core alignment challenges. Unless your entire theory of change is building infrastructure that the labs will port into their local Frankenstein infra, and that Control ends up being the only thing the labs needed for solving alignment with AIs.

And I guess from a startup perspective, you recognize that bu
Daniel Tan · 12h
Question for people with insider knowledge of how labs train frontier models: Is it more common to do alignment training as the last step of training, or RL as the last step of training?

* Edit: I'm mainly referring to on-policy RL, e.g. the type of RL that is used to induce new capabilities like coding / reasoning / math / tool use. I'm excluding RLHF because I think it's pretty disanalogous (though I also welcome disagreement / takes on this point.)

Naively I'd expect we want alignment to happen last. But I have a sense that usually RL happens last - why is this the case? Is it because RL capabilities are too brittle to subsequent finetuning?
GradientDissenter · 4d
Here's my attempt at a neutral look at Prop 50, which people in California can vote on Tuesday (Nov 4th). The bill seems like a case-study in high-stakes game theory and when to cooperate or defect.

The bill would allow the CA legislature to re-write the congressional district maps until 2030 (when district-drawing would go back to normal). Currently, the district maps are drawn by an independent body designed to be politically neutral. In essence, this would allow the CA legislature to gerrymander California. That would probably give Democrats an extra 3-5 seats in Congress. It seems like there's a ~17% chance that it swings the House in the midterms.

Gerrymandering is generally agreed to be a bad thing, since it means elections are determined on the margin more by the map makers and less by the people. The proponents of this bill don't seem to think otherwise. They argue the bill is in response to Texas passing a similar bill to redistrict in a way that is predicted to give Republicans 5 new House seats (not to mention similar bills in North Carolina and Missouri that would give Republicans an additional 2 seats). Trump specifically urged Texas, North Carolina, and Missouri to pass their bills, and the rationale was straightforwardly to give Republicans a greater chance at winning the midterms. For example, Rep. Todd Hunter, the author of Texas's redistricting bill, said "The underlying goal of this plan is straightforward, [to] improve Republican political performance". Notably, some Republicans have also tried to argue that the Texas bill is in response to Democrats gerrymandering and obstructionism, but this doesn't match how Trump seems to have described the rationale originally.[1]

The opponents of Prop 50 don't seem to challenge the notion that the Republican redistricting was bad.[2] They just argue that gerrymandering is bad for all the standard reasons.

So, it's an iterated prisoners' dilemma! Gerrymandering is bad, but the Republicans did it, maybe
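As a toy illustration of the cooperate/defect framing invoked above (a standard iterated prisoner's dilemma with a conventional payoff matrix and two simple strategies; this is a sketch of the abstract game, not a model of redistricting itself):

```python
# Toy iterated prisoner's dilemma: conventional payoffs, two simple strategies.
PAYOFFS = {  # (my move, their move) -> (my payoff, their payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(strat_a, strat_b, rounds=10):
    hist_a, hist_b = [], []  # each entry: (own move, opponent's move)
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(hist_a), strat_b(hist_b)
        pa, pb = PAYOFFS[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append((a, b))
        hist_b.append((b, a))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation: (30, 30)
print(play(tit_for_tat, always_defect))  # retaliation caps the loss: (9, 14)
```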
koanchuk · 10m
Given superintelligence, what happens next depends on the success of the alignment project. The two options:

1. It fails, and we die soon thereafter (or worse).
2. It succeeds, and we now have an entity that can solve problems for us far better than any human or human organization.

We are now in a world where humans have zero socioeconomic utility. The ASI can create entertainment and comfort that surpasses anything any human can provide. Sure, you can still interact with others willing to interact with you, it just won't be as fun as whatever stimulus the ASI can provide, and both your pool of available playmates and your own willingness to partake will shrink as the ASI gets better at artificially generating the stimuli and emotions you want. We will spend eternity in this state thanks to advanced medicine. Unless the ASI recognizes a right to die, not that many would choose to invoke it given the infinite bliss.

Am I missing something? No matter what, it's beginning to look like the afterlife is fast approaching, whether we die or not. What a life.
25 · Solstice Season 2025: Ritual Roundup & Megameetups · Raemon · 21h · 0
270 · I ate bear fat with honey and salt flakes, to prove a point · aggliu · 4d · 38
221 · Legible vs. Illegible AI Safety Problems · Wei Dai · 3d · 59
291 · Why I Transitioned: A Case Study · Fiora Sunshine · 6d · 48
740 · The Company Man · Tomás B. · 1mo · 70
681 · The Rise of Parasitic AI · Adele Lopez · 2mo · 178
162 · The Unreasonable Effectiveness of Fiction · Raelifin · 2d · 16
186 · You’re always stressed, your mind is always busy, you never have enough time · mingyuan · 6d · 6
149 · Lack of Social Grace is a Lack of Skill · Screwtape · 5d · 21
147 · What's up with Anthropic predicting AGI by early 2027? · ryan_greenblatt · 4d · 15
42 · A country of alien idiots in a datacenter: AI progress and public alarm · Seth Herd · 8h · 1
232 · On Fleshling Safety: A Debate by Klurl and Trapaucius. · Eliezer Yudkowsky · 12d · 51
356 · Hospitalization: A Review · Logan Riggs · 1mo · 21
85 · LLM-generated text is not testimony · TsviBT · 5d · 80