Recent Discussion

Hello! This is jacobjacob from the LessWrong / Lightcone team. 

This is a meta thread for you to share any thoughts, feelings, feedback, or other stuff about LessWrong that's been on your mind. 

Examples of things you might share: 

  • "I really like agree/disagree voting!"
  • "What's up with all this Dialogues stuff? It's confusing..."
  • "Hm... it seems like recently the vibe on the site has changed somehow... in particular [insert 10 paragraphs]"

...or anything else! 

The point of this thread is to give you an affordance to share anything that's been on your mind, in a place where you know that a team member will be listening. 

(We're a small team and have to prioritise what we work on, so I of course don't promise to action everything mentioned here. But I will at least listen...)

I wonder what fraction of the weirdest writers here feel the same way. I can't remember the last time I've read something on LessWrong and thought to myself, "What a strange, daring, radical idea. It might even be true. I'm scared of what the implications might be." I miss that.

The guy with the post about not paying one's taxes was pretty out there and had plenty of interesting discussion, but now it's been voted down to the negatives. I wish it were a bit higher (at 0-ish karma, say), which might've happened if people could disagree-vote on it.

But yes, overall...

I think it's still the best forum for discussing the most important thing happening in the world. 
Answer by Chris_Leong
I think there's a lot of good content here, but there are definitely issues with it tilting so much towards AI Safety. I'm an AI Safety person myself, but I'm beginning to wonder if it is crowding out the other topics of conversation. We need to make a decision: are we fine with Less Wrong basically becoming about AI because that's the most important topic or do we need to split the discussion somehow?
Do you have any thoughts on what the most common issues you see are, or is it more that it's a different issue every time?

Alexander, Matt and I want to chat about the field of Agent Foundations (AF), where it's at and how to strengthen and grow it going forward. 

We will kick off by each of us making a first message outlining some of our key beliefs and open questions at the moment. Rather than giving a comprehensive take, the idea is to pick out 1-3 things we each care about/think are important, and/or that we are confused about/would like to discuss. We may respond to some subset of the following prompts: 

Where is the field of AF at in your view? How do you see the role of AF in the larger alignment landscape/with respect to making AI futures go well? Where would you like to see it go? What do


The video of John's talk has now been uploaded on YouTube here.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Yep, happened to me too. I like the LW aesthetic, so I wouldn't want profile pics, but I think personal notes on users (like Discord has) would be great.

Yes, it happened before for me as well. I think it would be good to have profile pictures to make it easier to recognize users.
It's good to use prediction markets in practice, but most people who read the post likely don't get that much value from reading it. Larry McEnerney is good at explaining that good writing isn't writing that's cool or interesting, but simply writing that provides value to the reader. As far as the actual execution goes, it might have been better to create fewer markets and focus on fewer experiments, so that each one gets more attention.

(This post is inspired by Carl Shulman’s recent podcast with Dwarkesh Patel, which I highly recommend. See also discussion from Buck Shlegeris and Ryan Greenblatt here, and Evan Hubinger here.)



The “no sandbagging on checkable tasks” hypothesis: With rare exceptions, if a not-wildly-superhuman ML model is capable of doing some task X, and you can check whether it has done X, then you can get it to do X using already-available training techniques (e.g., fine-tuning it using gradient descent).[1]

Borrowing from Shulman, here’s an example of the sort of thing I mean. Suppose that you have a computer that you don’t know how to hack, and that only someone who had hacked it could make a blue banana show up on the screen. You’re wondering whether a given model can hack this...

Lukas Finnveden
I think (5) also depends on further details. As you have written it, both the 2023 and 2033 attempts use similar data and similar compute. But in my proposed operationalization, "you can get it to do X" is allowed to use a much greater amount of resources ("say, 1% of the pre-training budget") than the test for whether the model is "capable of doing X" ("say, at most 1000 data points"). I think that's important:

  • If both the 2023 and the 2033 attempts are really cheap, low-effort attempts, then I don't think that the experiment is very relevant for whether "you can get it to do X" in the sort of high-stakes, high-effort situations that I'm imagining we'll be in when we're trying to eval/align AI models to avoid takeover.
  • It seems super plausible that a low-effort attempt could fail, and then succeed later on with 10 more years' knowledge of best practices. I wouldn't learn much from that happening once.
  • If both the 2023 and the 2033 attempts are really expensive and high-effort (e.g. 1% of the pre-training budget), then I think it's very plausible that the 2033 training run gave the model new capabilities that it didn't have before.
  • And in particular: capabilities that the model wouldn't have been able to utilize in a takeover attempt that it was very motivated to do its best at. (Which is ultimately what we care about.)

By a similar argument, I would think that (4) wouldn't falsify the hypothesis as written, but would falsify the hypothesis if the first run was a much more high-effort attempt, with lots of iteration by a competent team and more like a $1,000,000 budget, while the 2nd run, with a much more curated and high-quality dataset, still just used $1,000 of training compute.

One thing that I'm noticing while writing this is something like: the argument that "elicitation efforts would get to use ≥1% of the training budget" makes sense if we're eliciting all the capabilities at once, or if there's only a few important capabilities to e

I think I agree with all of that (with the caveat that it's been months and I only briefly skimmed the past context, so further thinking is unusually likely to change my mind).

There are many things I feel like the post authors miss, and I want to share a few things that seem good to communicate.

I'm going to focus on controlling superintelligent AI systems: systems powerful enough to solve alignment (in the CEV sense) completely, or to kill everyone on the planet. 

In this post, I'm going to ignore other AI-related sources of x-risk, such as AI-enabled bioterrorism, and I'm not commenting on everything that seems important to comment on.

I'm also not going to point at all the slippery claims that I think can make the reader generalize incorrectly, as it'd be nitpicky and also not worth the time (examples of what I'd skip: I couldn't find evidence that GPT-4 has undergone any supervised fine-tuning; RLHF shapes chatbots' brains into...

O O
I find this line of reasoning (and even mentioning it) not useful. Any alignment solution will be alignment-complete, so it's tautological. I think you've defined alignment as a hard problem, which no one will disagree with, but you also define any steps taken towards solving the alignment problem as alignment-complete, and thus impossible unless they also seem infeasibly hard.

Can there not be an iterative way to solve alignment? I think we can construct some trivial hypotheticals where we iteratively solve it. For the sake of argument, say I created a superhuman math theorem solver, something that can solve IMO problems written in Lean with ease. I then use it to solve a lot of important math problems within alignment. This in turn affords us strong guarantees about certain elements of alignment or gradient descent. Can you convince me that the solution to getting a narrow AI useful for alignment is as hard as aligning a generally superhuman AI?

What if we reframe it to some real-world example? The proof of the Riemann hypothesis begins with a handful of difficult but comparatively simple lemmas. Solving those lemmas is not as hard as solving the Riemann hypothesis. And we can keep decomposing this proof into parts that are simpler than the whole. A step in a process being simpler than the end result of the process is not an argument against that step.
Mikhail Samin
Thanks for the comment!

People shouldn't be doing anything like that; I'm saying that if there is actually a CEV-aligned superintelligence, then this is a good thing. Would you disagree?

I agree with "Evolution optimized humans to be reproductively successful, but despite that humans do not optimize for inclusive genetic fitness", and the point I was making was that the stuff that humans do optimize for is similar to the stuff other humans optimize for. Were you confused by what I said in the post, or are you just suggesting a better wording?
I think an actual CEV-aligned superintelligence would probably be good, conditional on being possible. But I also expect that anyone who thinks they have a plan to create one is almost certainly wrong about that, so plans of that nature are a bad idea in expectation, and much more so if the plan looks like "do a bunch of stuff that would be obviously terrible if not for the end goal in the name of optimizing the universe". I was specifically unsure which meaning of "optimize for" you were referring to with each usage of the term.

Yep, I agree


In May and June of 2023, I (Akash) had about 50-70 meetings about AI risks with congressional staffers. I had been meaning to write a post reflecting on the experience and some of my takeaways, and I figured it could be a good topic for a LessWrong dialogue. I saw that hath had offered to do LW dialogues with folks, and I reached out. 

In this dialogue, we discuss how I decided to chat with staffers, my initial observations in DC, some context about how Congressional offices work, what my meetings looked like, lessons I learned, and some miscellaneous takes about my experience. 



Hey! In your message, you mentioned a few topics that relate to your time in DC. 

I figured we should start with your experience talking to congressional

Zach Stein-Perlman
Most don't do policy at all. Many do research. Since you're incredulous, here are some examples of great AI governance research (which don't synergize with talking to policymakers):

  • Towards best practices in AGI safety and governance
  • Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring
  • Survey on intermediate goals in AI governance

I mean, those are all decent projects, but I would call zero of them "great". Like, the whole appeal of governance as an approach to AI safety is that it's (supposed to be) bottlenecked mainly on execution, not on research. None of the projects you list sound like they're addressing an actual rate-limiting step to useful AI governance.

Mikhail Samin
It's great to see this being publicly posted!
Thanks for all of this! Here's a response to your point about committees. I agree that the committee process is extremely important. It's especially important if you're trying to push forward specific legislation.

For people who aren't familiar with committees or why they're important, here's a quick summary of my current understanding (there may be a few mistakes):

1. When a bill gets introduced in the House or the Senate, it gets sent to a committee. The decision is made by the Speaker of the House or the presiding officer in the Senate. In practice, however, they often defer to a non-partisan "parliamentarian" who specializes in figuring out which committee would be most appropriate. My impression is that this process is actually pretty legitimate and non-partisan in most cases(?).

2. It takes some degree of skill to predict which committee(s) a bill is most likely to be referred to. Some bills are obvious (an agriculture bill will go to an agriculture committee). In my opinion, artificial intelligence bills are often harder to predict: there is obviously no "AI committee", and AI stuff can be argued to affect multiple areas. With all that in mind, I think it's not too hard to narrow things down to ~1-3 likely committees in the House and ~1-3 likely committees in the Senate.

3. The most influential person in the committee is the committee chair. The committee chair is the highest-ranking member from the majority party (so in the House, all the committee chairs are currently Republicans; in the Senate, all the committee chairs are currently Democrats).

4. A bill cannot be brought to the House floor or the Senate floor (cannot be properly debated or voted on) until it has gone through committee. The committee is responsible for finalizing the text of the bill and then voting on whether or not they want the bill to advance to the chamber (House or Senate).

5. The committee chair typically has a lot of influence over the committee. The commi

This is the fourth post in my series on Anthropics. The previous one is Anthropical probabilities are fully explained by difference in possible outcomes.


If there is nothing special about anthropics, if it's just about correctly applying standard probability theory, why do we keep encountering anthropical paradoxes instead of general probability theory paradoxes? Part of the answer is that people tend to be worse at applying probability theory in some cases than in others. 

But most importantly, the whole premise is wrong. We do encounter paradoxes of probability theory all the time. We are just not paying enough attention to them, and occasionally attribute them to anthropics.

Updateless Dilemma and Psy-Kosh's non-anthropic problem

As an example, let’s investigate Updateless Dilemma, introduced by Eliezer Yudkowsky in 2009.

Let us start with a


Value learning is a proposed method for incorporating human values in an AGI. It involves the creation of an artificial learner whose actions consider many possible sets of values and preferences, weighed by their likelihood. Value learning could prevent an AGI from having goals detrimental to human values, hence helping in the creation of Friendly AI.

There are many proposed ways to incorporate human values in an AGI (e.g. Coherent Extrapolated Volition, Coherent Aggregated Volition and Coherent Blended Volition, mostly dating from 2004-2010). This method was suggested in 2011 in Daniel Dewey's paper 'Learning What to Value'. Like most authors, he assumes that human goals would not naturally occur in an artificial agent, which therefore needs to be intentionally aligned to human goals. First, Dewey argues against the simple use of reinforcement learning to solve this problem, on the basis that it leads to the maximization of specific rewards that can diverge from value maximization: even if we engineer the agent to maximize rewards that also track human values, it could suffer from goal misspecification or reward hacking (e.g. if the reward was human happiness, the agent could alter the human mind so it became happy with anything).

To solve these problems, Dewey proposes a utility function maximizer, which considers all possible utility functions weighted by their Bayesian probabilities: "[W]e propose uncertainty over utility functions. Instead of providing an agent one utility function up front, we provide an agent with a pool of possible utility functions and a probability distribution P such that each utility function can be assigned probability P(U_j|yx_m) given a particular interaction history [yx_m]. An agent can then calculate an expected value over possible utility functions given a particular interaction history." He concludes by saying that although this method solves many of the mentioned problems, it still leaves many open questions; however, it should provide a direction for future work.
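Dewey's core idea can be sketched in a few lines of code. This is a hypothetical toy illustration, not from the paper: all the function names, candidate utilities, and numbers below are made up. The agent holds a pool of candidate utility functions, a posterior distribution over them (standing in for P(U_j|yx_m)), and scores each action by its probability-weighted expected utility.

```python
# Toy sketch of value learning with uncertainty over utility functions.
# All names and numbers are illustrative, not from Dewey's paper.

def expected_utility(action, utility_pool, posterior):
    """Probability-weighted utility of an action over the candidate pool."""
    return sum(p * u(action) for u, p in zip(utility_pool, posterior))

def best_action(actions, utility_pool, posterior):
    """Pick the action maximizing expected utility under utility uncertainty."""
    return max(actions, key=lambda a: expected_utility(a, utility_pool, posterior))

# Two candidate utility functions the designer is uncertain between:
utility_pool = [
    lambda a: a["happiness"],                      # U_1: raw reward proxy
    lambda a: a["happiness"] - a["manipulation"],  # U_2: penalizes wireheading-style shortcuts
]

# Posterior P(U_j | interaction history): evidence so far favors U_2.
posterior = [0.2, 0.8]

actions = [
    {"happiness": 10, "manipulation": 9},  # alter the human mind to make it happy
    {"happiness": 6, "manipulation": 0},   # genuinely help
]

print(best_action(actions, utility_pool, posterior))
# → {'happiness': 6, 'manipulation': 0}
```

The point of the sketch: under a single fixed reward (U_1 alone), the manipulative action wins; once the agent weighs a pool of candidate utilities by their probabilities, residual credence in U_2 is enough to steer it away from the reward-hacking action.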