A general guide for pursuing independent research, from conceptual questions like "how do you figure out how to prioritize, learn, and think?" to practical questions like "what sort of snacks should you buy to maximize productivity?"

I'm surprised that some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective there are some nice arguments about how companies should have to internalize the externalities of their actions, etc. But in practice, I'd be fairly surprised if liability approaches were actually able to produce a meaningful incentive shift for frontier AI developers.

My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms; it would be horrible for Meta and the entire OS movement if an OS model were able to cause $1B in damages). And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing ("you have to convince us that risks are below an acceptable threshold to proceed").

I think liability also has the "added" problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulations as anti-innovation, argue that they create a moat (only big companies can afford to comply), and argue that it's just not how America ends up regulating things (we don't hold Adobe accountable for someone doing something bad with Photoshop). To be clear, I don't think "something is politically unpopular" should be a full-stop argument against advocating for it. But I do think that "liability for AI companies" scores poorly both on "actual usefulness if implemented" and on "political popularity/feasibility."
I also think "liability for AI companies" advocacy often ends up in abstract-philosophy land (to what extent should companies internalize externalities?) and avoids some of the "weirder" points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop). I would rather people just make the direct case that AI poses extreme risks and discuss the direct policy interventions that are warranted.

With this in mind, I'm not an expert in liability and admittedly haven't been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing in). I'd be interested in hearing more from people who have thought about liability, particularly concrete stories for how liability would be expected to meaningfully shift the incentives of labs. (See also here.)

Stylistic note: I'd prefer replies along the lines of "here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases" rather than "here is a thing you can read about the general legal/philosophical arguments for why liability is good."
From Newcastle, Australia to Berkeley, San Francisco. I arrived yesterday for Less.online. I've had a bit of culture shock, a big helping of being increasingly scared, and quite a few questions. I'll start with those. Feel free to skip them.

These questions are based on warnings I've gotten from local non-rationalists. I don't know if they're scared because of the media they consume or because of actual stats. I'm asking these because they feel untrue.

1. Is it ok to be outside after dark?
2. Will I really get 'rolled' midday in Oakland?
3. Are there gangs walking around Oakland looking to stab people?
4. Will all the streets fill up with homeless people at night?
5. Are they chill? In Aus they're usually down to talk if you are.

Culture shocks for your enjoyment:

1. Why is everyone doing yoga?
2. To my Uber driver: "THAT TRAIN IS ON THE ROAD!?"
3. "I thought (X) was just in movies!"
4. Your billboards are about science instead of coal mining!
5. "Wait, you're telling me everything is vegan?" Thank Bayes, this is the best. All our vegan restaurants went out of business.
6. People brag about things? And they do it openly? At least, I think that's what's happening?
7. "Silicon Valley is actually a valley?!" Should have predicted this one. I kinda knew, but I didn't know like I do now.
8. "Wow! This shop is openly selling nangs!" (whippits) "And a jungle juice display!"
9. All your cars are so new and shiny. 60% of ours are second-hand.
10. Most people I see in the streets look below 40. It's like I'm walking around a university!
11. Wow. It's really sunny.
12. American accents irl make me feel like I'm walking through a film.
13. "HOLY SHIT! A CYBERTRUCK?!"
14. Ok, this is a big one. Apps I've had for 8+ years are suddenly different when I arrive here?
15. This is what Uber is meant to be. I will go back to Australia and cry. Your airport has custom instructions... in app! WHAT!? The car arrives in 2 minutes instead of 30 minutes. Also, the car arrives at all.
16. The Google app has a beaker for tests now?
17. Snap Maps has gifs in it.
18. Apple Maps lets you scan buildings? And has tips about good restaurants and events?
19. When I bet in the Manifold app, a real paper crane flies from the nearest tree, lands in front of me and unfolds. Written inside: "Will Eliezer Yudkowsky open a rationalist bakery?" I circle "Yes". The paper meticulously folds itself back into a crane. It looks at me, makes a little sound that doesn't echo in the streets but in my head, and it burns. Every time this happens I save the ashes. Is Manifold creating new matter? How are they doing this?
20. That one was a lie.

Things that won't kill me but scare me, rational/irrational:

1. What if I've been wrong? What if this is all a scam? A cult? What if Mum was right?
2. What if I show up to the location and there is no building there?
3. What if I make some terribly awkward cultural blunder for SF and everyone yells at me?
4. What if no one tells me?
5. I'm sure I'll be at least in the bottom 5% for intelligence at Less Online. I won't be surprised or hurt if I've got the least Gs of people there. But what if it all goes over my head? Maybe I can't even communicate with smart people about the things I care about.
6. What if I can't handle people telling me what they think of my arguments without kid gloves? What if I get angry and haven't learnt to handle that?
7. I'm just a Drama teacher and Psych student. My head is filled with improv games and fun facts about Clever Hans! 'Average' Americans seem to achieve much higher than 'average' Australians. I'm scared of feeling underqualified.

Other things:

1. Can you think of something I should be worried about that I've not written here?
2. I've brought my copies of the Rationality A-Z books. I want to ask people I meet to sign their favourite post in the two books. Is that culturally acceptable? Feels kinda weird bc Yud is going to be there. But it would be a really warm/fuzzy item to me in the future.
3. I don't actually know what a lot of the writers going look like. I hope this doesn't result in a blunder. But it might be funny, given that I expect rationalists to be pretty chill.
4. Are other people as excited about the Fooming Shoggoths as I am?
5. I'm 23. I have no idea if that is very old, very young, or about normal for a rationalist. I'd guess about normal, with a big spread across the right of the graph.

It feels super weird to be in the same town as a bunch of you guys now. I've never met a rationalist irl. I talked to Ruby over Zoom once, who said to me, "You know you don't have to stay in Australia, right?" I hope Ruby is a good baseline for niceness levels of you all. If you're going, I'll see you at Less.Online. If you're not, I'd still love to meet you. Feel free to DM me!
Feels like FLI is a massively underrated org. Because of the whole Vitalik donation thing, they have something like $300M.
My mainline prediction scenario for the next decades. My mainline prediction*:

* LLMs will not scale to AGI. They will not spawn evil gremlins or mesa-optimizers. BUT scaling laws will continue to hold, and future LLMs will be very impressive and make a sizable impact on the real economy and science over the next decade.
* There is a single innovation left to make AGI-in-the-alex-sense work, i.e. coherent, long-term planning agents (LTPA) that are effective and efficient in data-sparse domains over long horizons.
* That innovation will be found within the next 10-15 years.
* It will be clear to the general public that these are dangerous.
* Governments will act quickly and (relatively) decisively to bring these agents under state control. National security concerns will dominate.
* Power will reside mostly with governments' AI safety institutes and national security agencies. Insofar as divisions of tech companies are able to create LTPAs, they will be effectively nationalized.
* International treaties will be made to constrain AI, outlawing the development of LTPAs by private companies. Great-power competition will mean the US and China continue developing LTPAs, possibly largely boxed. Treaties will try to constrain this development with only partial success (similar to nuclear treaties).
* LLMs will continue to exist and be used by the general public.
* Conditional on AI ruin, the closest analogy is probably something like the Cortez-Pizarro-Afonso takeovers. Unaligned AI will rely on human infrastructure and human allies for the earlier parts of takeover, but its inherent advantages in tech, coherence, decision-making and (artificial) plagues will be the deciding factor.
* The world may be mildly multipolar.
  * This will involve conflict between AIs.
  * AIs may very possibly be able to cooperate in ways humans can't.
* The arrival of AGI will immediately inaugurate a scientific revolution. Sci-fi-sounding progress like advanced robotics, quantum magic, nanotech, life extension, laser weapons, large space engineering, and cures for many/most remaining diseases will become possible within two decades of AGI, possibly much faster.
* Military power will shift to automated manufacturing of drones and weaponized artificial plagues. Drones, mostly flying, will dominate the battlefield. Mass production of drones and their rapid and effective deployment in swarms will be key to victory.

Two points on which I differ with most commentators:

(i) I believe AGI is a real (mostly discrete) thing, not a vibe or a general increase of improved tools. I believe it is inherently agentic. I don't think spontaneous emergence of agents is impossible, but I think it is more plausible that agents will be built rather than grown.

(ii) I believe the EA/AI-safety community is in general way overrating the importance of individual tech companies vis-à-vis broader trends and the power of governments. I strongly agree with Stefan Schubert's take here on the latent hidden power of government: https://stefanschubert.substack.com/p/crises-reveal-centralisation Consequently, the EA/AI-safety community is often myopically focused on boardroom politics that are relatively inconsequential in the grand scheme of things.

*Where by "mainline prediction" I mean the scenario that is the mode of what I expect. This is the single likeliest scenario. However, since it contains a large number of details, each of which could go differently, the probability of this specific scenario is still low.
Ava on looking for rejection:

> There is no penalty for asking. You can apply to the same thing 10 times and no one’s gonna get mad at you. You can advertise something on the Internet and even if 99% of people think it’s dumb, 1% might think it’s really cool. You are always doing things for the one person who will give you the yes. And often one yes is enough. I’ve been trying to reframe my relationship with rejection from avoiding it to literally looking for rejection—going out there and risking the NOs. I’ve been doing it in really silly ways, like trying to buy a staff-only hoodie at my favorite grocery store and walking into restaurants when there are no reservations available, but also in meaningful ways like proposing writing collaborations and meeting new people. It’s so fun! I can’t believe I’ve been running away from it for so long.

This reminded me of Anonymous[1]'s post on how to become more agentic. The replies to the accompanying tweet also have some fun examples of people asking for stuff, and getting it.

1. I know who wrote it because I've read the post before, but I want to respect their privacy. ↩︎


Recent Discussion



As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research over technical alignment research. This follow-up post goes into detail about our communications strategy.

The Objective: Shut it Down[1]

Our objective is to convince major powers to shut down the development of frontier AI systems worldwide before it is too late. We believe that nothing less than this will prevent future misaligned smarter-than-human AI systems from destroying humanity. Persuading governments worldwide to take sufficiently drastic action will not be easy, but we believe this is the most viable path.

Policymakers deal mostly in compromise: they form coalitions by giving a little here to gain a little somewhere else. We are concerned that most legislation intended to keep humanity alive will go...

Stephen McAleese
Is MIRI still doing technical alignment research as well?
Sure, but (1) I only put 80% or so on MWI/MUH etc., and (2) I'm talking about optimizing for more positive-human-lived-seconds, not just a binary "I want some humans to keep living."
Then why aren't you mostly dominated by the possibility of >10^50 positive-human-lived-seconds via human control of the light cone? Maybe some sort of diminishing returns?

I am dominated by it, and okay, I see what you are saying. Whichever scenario results in a higher chance of human control of the light cone is the one I prefer, and these considerations are relevant only where we don't control it.

Epistemic status: mental model which I have found picks out bullshit surprisingly well.

Idea 1: Parasitic memes tend to be value-claims, as opposed to belief-claims

By "parasitic memes" I mean memes whose main function is to copy themselves - as opposed to, say, actually provide value to a human in some way (so that the human then passes it on). Scott's old Toxoplasma of Rage post is a central example; "share to support X" is another.

Insofar as a meme is centered on a factual claim, the claim gets entangled with lots of other facts about the world; it's the phenomenon of Entangled Truths, Contagious Lies. So unless the meme tries to knock out a person's entire epistemic foundation, there's a strong feedback signal pushing against it if it makes a false...

They're coextensive with/parasitic on virtues, virtues being hard-won compressions of lots of contextual information about how to prioritize and behave so as to min-max costs and benefits in a side-effect-free way. Since virtues are illegible to younger people who haven't built up enough data yet, values are an easy attribute substitution.

Yeah, admittedly health is kind of a borderline case where it's technically factual but in practice mostly operates as a standard value-claim because of low entanglement and high reason to care. I basically agree with your claim that the heuristic is approximating (reason to care) + (low entanglement).
I don't think the key element in the aging example is "being about value claims." Instead, it's that the question of what's healthy is one that many people wonder about. Since many people wonder about that question, some people will venture an answer, even if humanity hasn't yet built up enough knowledge to have an accurate one.

Thousands of years ago, many people wondered what the deal is with the moon, and some of them made up stories about this factual (non-value) question whose correct answer was beyond them. And it plays out similarly these days with rumors/speculation/gossip about the topics that grab people's attention. Where curiosity and interest exceed knowledge, speculation will fill the gaps, sometimes taking on a similar presentation to knowledge.

Note the dynamic in your aging example: when you're in a room with 5+ people and you mention that you've read a lot about aging, someone asks the question about what's healthy. No particular answer needs to be memetic, because it's the question that keeps popping up, and so answers will follow. If we don't know a sufficiently good/accurate/thorough answer, then the answers that follow will often be bullshit, whether that's a small number of bullshit answers that are especially memetically fit or a more varied and changing froth of made-up answers.

There are some kinds of value claims that are pretty vague and floaty, disconnected from entangled truths and empirical constraints. But that is not so true of instrumental claims about things like health, where (e.g.) the claim that smoking causes lung cancer is very much empirical and entangled. You might still see a lot of bullshit about these sorts of instrumental value claims, because people will wonder about the question even if humanity doesn't have a good answer. It's useful to know (e.g.) what foods are healthy, so the question of what foods are healthy is one that will keep popping up when there's hope that someone in the room might...
I think the value-ladenness is part of why it comes up even when we don't have an answer, since for value-laden things there's a natural incentive to go up right to the boundary of our knowledge to get as much value as possible.
the gears to ascension
I went pretty stir crazy without enough room to move around.

That's fair, but it sounds like a personal preference. I asked because maybe you knew there was something unusually bad about small flats in the Bay Area that even folks like me would find annoying. 

That first point made me laugh. It’s exactly the type of mistake I expected to make, and I still didn’t see it coming. I appreciate all this safety advice and will update my decision making based on that. Geez, the weed thing surprises me. I hadn’t planned to smoke any until after the event. But I think I’ll avoid that now. I’m already struggling with motivation from jet lag. I don’t want to increase that feeling.
the gears to ascension
It's a relatively chemically safe drug, but it's easily habit-forming and knocks you out of a productive space if used more than once every 3 to 6 months, imo. Your reasoning seems reasonable. Have fun with the trip!

METR has not intended to claim to have audited anything, or to be providing meaningful oversight or accountability, but there has been some confusion about whether METR is an auditor or planning to be one.

To clarify this point:

  1. METR’s top priority is to develop the science of evaluations, and we don’t need to be auditors in order to succeed at this.
    • We aim to build evaluation protocols that can be used by evaluators/auditors regardless of whether that is the government, an internal lab team, another third party, or a team at METR. 
  2. We should not be considered to have ‘audited’ GPT-4 or Claude.
    • Those were informal pilots of what an audit might involve, or research collaborations – not providing meaningful oversight. For example, it was all under NDA – we didn’t
Bogdan Ionut Cirstea
I'd be interested in seeing the strongest arguments (e.g. safety-washing?) for why, at this point, one shouldn't collaborate with OpenAI (e.g. not even part-time, for AI safety [evaluations] purposes).

If we’re taking the perspective of the entire community, this bears less weight, but: it likely becomes close-to-impossible to criticize OpenAI from that point forward. I’m not even anti-OpenAI, I just try to be truth-seeking where I think people are dropping the ball, and I think there’s almost 0 chance I’ll be able to work with OpenAI in the future given my comments on Twitter.

A cohabitive game[1] is a partially cooperative, partially competitive multiplayer game that provides an anarchic dojo for development in applied cooperative bargaining, or negotiation.

Applied cooperative bargaining isn't currently taught, despite being an infrastructural literacy for peace, trade, democracy or any other form of pluralism. We suffer for that. There are many good board games that come close to meeting the criteria of a cohabitive game today, but they all[2] miss in one way or another, forbidding sophisticated negotiation from being practiced.

So, over the past couple of years, we've been gradually and irregularly designing and playtesting the first[2] cohabitive boardgame, which for now we can call Difference and Peace Peacewager 1, or P1. This article explains why we think this new genre is important, how it's been going, what we've learned,...

2D is a limit, but there's also more design language built around 2D UIs. I still think there's a ton of unexplored design space around "tabletop games" that make use of modern web flows.

I agree shared presence is important. I also think it's unsolved. VR isn't high-fidelity enough to transmit sufficient social information, and it's still very inaccessible due to price and physical discomfort.

This is the story of my personal experience with Buddhism (so far).

First Experiences

My first experience with Buddhism was in my high school's World Religions class. For homework, I had to visit a religious institution. I was getting bad grades, so I asked if I could get extra credit for visiting two and my teacher said yes. I picked an Amida Buddhist church and a Tibetan Buddhist meditation center.

I took off my shoes at the entrance to the Tibetan Buddhist meditation center. It was like nothing I had ever seen before in real life. There were no chairs. Cushions were on the floor instead. The walls were covered in murals. There were no instructions. People just sat down and meditated. After that there was some walking meditation. I...


I'm not a doctor, but it is my understanding that meditation, drugs and sleep deprivation all make psychosis more likely.

There's this idea that long hours of meditation can trigger a psychotic episode but that short sessions don't. While it is true that longer hours carry higher risk, I rarely meditated even one hour in a single day. I have meditated three hours in one day only once, and that was long after the events in this narrative. For me, psychosis happened because I entered wild territory before I had cultivated the wisdom to navigate it safely. I t...

Fixed. Thanks.
Jonas Hallgren
I was doing the same samadhi thing with TMI and was looking for insight practices from there. My teacher (non-dual Thai Forest tradition) said that the Burmese traditions set up a bit of a strange reality dualism, and basically said that the dark night of the soul is often due to developing concentration before awareness, loving-kindness, and wisdom. So I'm Mahamudra-pilled now (Pointing Out the Great Way is a really good book for this). I do still like the insight model you proposed; I'm still reeling a bit from the insights I got during my last retreat, so it seems true. Thank you for sharing your experience!

LessOnline Festival

May 31st to June 2nd, Berkeley CA