California state senator Scott Wiener, author of AI safety bills SB 1047 and SB 53, just announced that he is running for Congress! I'm very excited about this, and I wrote a blog post about why.
It’s an uncanny, weird coincidence that the two biggest legislative champions for AI safety in the entire country announced their bids for Congress just two days apart. But here we are.*
In my opinion, Scott Wiener has done really amazing work on AI safety. SB 1047 is my absolute favorite AI safety bill, and SB 53 is the best AI safety bill that has passed anywhere in the country. He's been a dedicated AI safety champion who has spent a huge amount of political capital in his efforts to make us safer from advanced AI.
On Monday, I made the case for donating to Alex Bores -- author of the New York RAISE Act -- calling it a "once in every couple of years" opportunity, but flagging that I was also really excited about Scott Wiener.
I plan to have a more detailed analysis posted soon, but my bottom line is that donating to Wiener today is about 75% as good as donating to Bores was on Monday, and that this is also an excellent opportunity that will come up very rarely. (The main reason that it loo...
Alex Bores is the author of the RAISE Act, a piece of AI safety legislation in New York that Zvi profiled positively a few months ago. Today, Bores announced that he's running for Congress.
In my opinion, Bores is one of the best lawmakers anywhere in the country on the issue of AI safety. I wrote a post making the case for donating to his campaign.
If you feel persuaded by the post, here's a link to donate! (But if you think you might want to work in government, then read the section on career capital considerations before donating.)
Note that I expect donations in the first 24 hours to be ~20% better than donations after that, because donations in the first 24 hours will help generate positive press for the campaign. But I don't mean to rush anyone: if you don't feel equipped to assess the donation opportunity on your own terms, you should take your time!
I have something like mixed feelings about the LW homepage being themed around "If Anyone Builds it, Everyone Dies":
Huh, I personally am kind of hesitant about it, but not because it might cause people to think LessWrong endorses the message. We've promoted lots of stuff at the top of the frontpage before, and in general promote lots of stuff with highly specific object-level takes. Like, whenever we curate something, or we create a spotlight for a post or sequence, we show it to lots of people, and most of the time what we promote is some opinionated object-level perspective.
I agree that if this were the only promotion of this kind we have done or will ever do, it would feel more like we are tipping the scales in some object-level discourse, but it feels very continuous with other kinds of content promotions we have done (e.g. I am hoping that we will do a similar promotion for some AI 2027 work we are collaborating on with the AI Futures Project, and also for other books that seem high-quality and are written by good authors; if any of the other top authors on LW were releasing a book, I would be pretty happy to do similar things).
The thing that makes me saddest is that ultimately the thing we are linking and promoting is something that current readers do not have the abi...
Maybe the crux is whether the dark color significantly degrades user experience. For me it clearly does, and my guess is that's what Sam is referring to when he says "What is the LW team thinking? This promo goes far beyond anything they've done or that I expected they would do."
For me, that's why this promotion feels like a different reference class than seeing the curated posts on the top or seeing ads on the SSC sidebar.
(like, within 24 hours there should be a button that just gives you back whatever normal color scheme you previously had on the frontpage).
@David Matolcsi There is now a button in the top right corner of the frontpage you can click to disable the whole banner!
The thing that makes me saddest is that ultimately the thing we are linking and promoting is something that current readers do not have the ability to actually evaluate on their own
This has been nagging at me throughout the promotion of the book. I've preordered for myself and two other people, but only with caveats about how I haven't read the book. I don't feel comfortable doing more promotion without reading it[1] and it feels kind of bad that I'm being asked to.
I talked to Rob Bensinger about this, and I might be able to get a preview copy if it were a crux for a grand promotional plan, but not for more mild promotion.
Some things we promoted in the right column:
LessOnline (also, see the spotlights at the top for random curated posts):
LessOnline again:
LessWrong review vote:
Best of LessWrong results:
Best of LessWrong results (again):
The LessWrong books:
The HPMOR wrap parties:
Our fundraiser:
ACX Meetups everywhere:
We also either deployed for a bit, or almost deployed, a PR where individual posts that we have spotlights for (which is just a different kind of long-term curation) get shown as big banners on the right. I can't currently find a screenshot of it, but it looked pretty similar to all the banners you see above for all the other stuff, just promoting individual posts.
To be clear, the current frontpage promotion is a bunch more intense than this!
Mostly this is because Ray/I had a cool UI design idea that we could only make work in dark mode, and so we by default inverted the color scheme for the frontpage, and also just because I got better as a designer and I don't think I could have pulled off the current design a year ago. If I could do something as intricate/high-effort as this all year round for great content I want to promote, I...
Yeah, all of these feel pretty different to me than promoting IABIED.
A bunch of them are about events or content that many LW users will be interested in just by virtue of being LW users (e.g. the review, fundraiser, BoLW results, and LessOnline). I feel similarly about the highlighting of content posted to LW, especially given that that's a central thing that a forum should do. I think the HPMOR wrap parties and ACX meetups feel slightly worse to me, but not too bad given that they're just advertising meet-ups.
Why promoting IABIED feels pretty bad to me:
This deserves a full post, but for now a quick take: in my opinion, P(no AI takeover) = 75%, P(future goes extremely well | no AI takeover) = 20%, and most of the value of the future is in worlds where it goes extremely well (and comparatively little value comes from locking in a world that's good-but-not-great).
Under this view, an intervention is good insofar as it affects P(no AI takeover) * P(things go really well | no AI takeover). Suppose that a given intervention can change P(no AI takeover) and/or P(future goes extremely well | no AI takeover). Then the overall effect of the intervention is proportional to ΔP(no AI takeover) * P(things go really well | no AI takeover) + P(no AI takeover) * ΔP(things go really well | no AI takeover).
Plugging in my numbers, this gives us 0.2 * ΔP(no AI takeover) + 0.75 * ΔP(things go really well | no AI takeover).
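To make the arithmetic concrete, here's a minimal sketch of the calculation using the numbers above (the function name and the one-percentage-point deltas are my own illustrative choices, not from the original):

```python
# Minimal sketch of the marginal-value arithmetic above.
# The probabilities are the author's stated guesses; the example deltas
# (1 percentage point each) are illustrative assumptions.

p_no_takeover = 0.75              # P(no AI takeover)
p_great_given_no_takeover = 0.20  # P(future goes extremely well | no AI takeover)

def marginal_value(delta_no_takeover, delta_great_given_no_takeover):
    """First-order change in P(no takeover) * P(goes extremely well | no takeover)."""
    return (delta_no_takeover * p_great_given_no_takeover
            + p_no_takeover * delta_great_given_no_takeover)

# An intervention that adds 1 percentage point to P(no AI takeover):
print(marginal_value(0.01, 0.00))  # ~0.002
# An intervention that adds 1 percentage point to P(goes well | no takeover):
print(marginal_value(0.00, 0.01))  # ~0.0075
```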
And yet, I think that very little AI safety work focuses on affecting P(things go really well | no AI takeover). Probably Forethought is doing the best work in this space.
(And I don't think it's a tractability issue: I think affecting P(things go really well | no AI takeover) is pretty tractable!)
(Of course, if you think P(AI takeover) is 90%, that would probably be a crux.)
I guess that influencing P(future goes extremely well | no AI takeover) is maybe pretty hard, and plagued by cluelessness problems. Avoiding AI takeover is a goal that I have at least some confidence is good.
That said, I do wish more people were thinking about how to make the future go well. I think my favorite thing to aim for is increasing the probability that we do a Long Reflection, although I haven't really thought at all about how to do that.
You can also work on things that help with both:
This would require a longer post, but roughly speaking, I'd want the people making the most important decisions about how advanced AI is used once it's built to be smart, sane, and selfless. (Huh, that was some convenient alliteration.)
And so I'm pretty keen on interventions that make it more likely that smart, sane, and selfless people are in a position to make the most important decisions. This includes things like:
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:
By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
I think misaligned AI is probably somewhat worse than no Earth-originating spacefaring civilization because of the potential for aliens, but also that misaligned AI control is considerably better than no one ever heavily utilizing intergalactic resources.
Perhaps half of the value of misaligned AI control is from acausal trade and half from the AI itself being valuable.
You might be interested in When is unaligned AI morally valuable? by Paul.
One key consideration here is that the relevant comparison is:
Conditioning on the AI succeeding at acquiring power changes my views of what its plausible values are (for instance, humans seem to have failed at instilling preferences/values which avoid seizing control).
...A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way."
Perhaps half of the value of misaligned AI control is from acausal trade and half from the AI itself being valuable.
Why do you think these values are positive? I've been pointing out, and I see that Daniel Kokotajlo also pointed out in 2018, that these values could well be negative. I'm very uncertain, but my own best guess is that the expected value of misaligned AI controlling the universe is negative, in part because I put some weight on suffering-focused ethics.
I frequently find myself in the following situation:
Friend: I'm confused about X
Me: Well, I'm not confused about X, but I bet it's because you have more information than me, and if I knew what you knew then I would be confused.
(E.g. my friend who knows more chemistry than me might say "I'm confused about how soap works", and while I have an explanation for why soap works, their confusion is at a deeper level, where if I gave them my explanation of how soap works, it wouldn't actually clarify their confusion.)
This is different from the "usual" state of affairs, where you're not confused but you know more than the other person.
I would love to have a succinct word or phrase for this kind of being not-confused!
What are some examples of people making a prediction of the form "Although X happening seems like obviously a bad thing, in fact the good second-order effects would outweigh the bad first-order effects, so X is good actually", and then turning out to be correct?
(Loosely inspired by this quick take, although I definitely don't mean to imply that the author is making such a prediction in this case.)
Many economic arguments take this form and are pretty solid, eg “although lowering the minimum wage would cause many to get paid less, in the longer term more would be willing to hire, so there will be more jobs, and less risk of automation to those currently with jobs. Also, services would get cheaper which benefits everyone”.
I think it’s useful to think about the causation here.
Is it:
Intervention -> Obvious bad effect -> Good effect
For example: Terrible economic policies -> Economy crashes -> AI capability progress slows
Or is it:
Obvious bad effect <- Intervention -> Good effect
For example: Patient survivably poisoned <- Chemotherapy -> Cancer gets poisoned to death
lots of food and body things that are easily verifiable, quick, and robust. take med, get headache, not die. take poison, kill cancer, not die. stop eating good food, blood sugar regulation better, more coherent. cut open body, move some stuff around, knit it together, tada healthier.
all of these are extremely specific, if you do them wrong you get bad effect. take wrong med, get headache, still die. take wrong poison, die immediately. stop eating good food but still eat crash inducing food, unhappy and not more coherent. cut open body randomly, die quickly.
Pacts against coordinating meanness.
I just re-read Scott Alexander's Be Nice, At Least Until You Can Coordinate Meanness, in which he argues that a necessary (but not sufficient) condition for restricting people's freedom is that you first get societal consensus that restricting freedom in that way is desirable (e.g. by passing a law via the appropriate mechanisms).
In a sufficiently polarized society, there could be two similarly-sized camps that each want to restrict each other's freedom. Imagine a country that's equally divided between Chris...
Yeah I've argued that banning lab meat is completely rational for the meat-eater because if progress continues then animal meat will probably be banned before the quality/price of lab meat is superior for everyone.
I think the "commitment" you're describing is similar to the difference between "ordinary" and "constitutional" policy-making in e.g. The Calculus of Consent; under that model, people make the kind of non-aggression pacts you're describing mainly under conditions of uncertainty where they're not sure what their future interests or position of political advantage will be.
It's often very hard to make commitments like this, so I think that most of the relevant literature might be about how you can't do this. E.g. a Thucydides trap is when a stronger power launches a preventative war against a weaker rising power; one particular reason for this is that the weaker power can't commit to not abuse their power in the future. See also security dilemma.
People like to talk about decoupling vs. contextualizing norms. To summarize, decoupling norms encourage arguments to be assessed in isolation from their surrounding context, while contextualizing norms consider the context around an argument to be really important.
I think it's worth distinguishing between two kinds of contextualizing:
(1) If someone says X, updating on the fact that they are the sort of person who would say X. (E.g. if most people who say X in fact believe Y, contextualizing norms are fine with assuming that your interlocutor believes Y unless...