magic9mushroom — LessWrong

Why do many people who care about AI Safety not clearly endorse PauseAI?

>Obviously P(doom | no slowdown) < 1.

This is not obvious. My P(doom|no slowdown) is like 0.95-0.97, the difference from 1 being essentially "maybe I am crazy or am missing something vital when making the following argument".

Instrumental convergence suggests that the vast majority of possible AGI will be hostile. No slowdown means that neural-net ASI will be instantiated. To get not-doom from this, you need some way to solve the problem of "what does this code do when run" with extreme accuracy in order to only instantiate non-hostile neural-net ASI (you need "extreme" accuracy because you're up against the rare disease problem a.k.a. false positive paradox; true positives are extremely rare, so a positive alignment result from a 99%-accurate test is still almost certainly a false positive). Unfortunately, the "what does this code do when run" problem has a name, the "halting problem", and it's literally the first problem in computer science ever proven to be unsolvable in the general case.
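To put a number on the false positive paradox mentioned above, here is a minimal Bayes' theorem sketch. The base rate of 1 in 10,000 is purely an illustrative assumption, not a figure from the comment, and "99%-accurate" is read here as 99% sensitivity and 99% specificity; `posterior_aligned` is a hypothetical helper name.

```python
def posterior_aligned(prior: float, sensitivity: float, specificity: float) -> float:
    """Return P(actually aligned | test says aligned) via Bayes' theorem."""
    true_pos = sensitivity * prior                    # aligned and passes the test
    false_pos = (1.0 - specificity) * (1.0 - prior)   # hostile but passes anyway
    return true_pos / (true_pos + false_pos)


# ASSUMPTION (illustrative only): 1 in 10,000 candidate systems is non-hostile.
prior = 1e-4
p = posterior_aligned(prior, sensitivity=0.99, specificity=0.99)
print(f"P(aligned | positive result) = {p:.4f}")
```

Under these assumed numbers, a positive result leaves the probability of actual alignment at roughly 1%, and it falls further as the assumed base rate gets smaller.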

And, sure, the general case being unsolvable doesn't mean that the case you care about is unsolvable. GOFAI has a good argument for being a special case, because human-written source code is quite useful to understanding a program. Neural nets... don't. At least, they don't in the case we care about; "I am smarter than the neural net" is also a plausible special case, but that's obviously no help with neural-net ASI.
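For readers who have not seen why the general case is unsolvable, the standard diagonal argument can be sketched as follows. This is an illustration under stated assumptions, not a proof: `Decider`, `make_contrary` and `always_says_loops` are hypothetical names, and the toy decider stands in for any claimed general-purpose "does this halt?" checker.

```python
from typing import Callable

Program = Callable[[], None]
Decider = Callable[[Program], bool]  # True means "this program halts"


def make_contrary(decider: Decider) -> Program:
    """Build a program that does the opposite of what `decider` predicts for it."""
    def contrary() -> None:
        if decider(contrary):
            while True:  # decider said "halts", so loop forever
                pass
        # decider said "does not halt", so halt immediately
    return contrary


def always_says_loops(program: Program) -> bool:
    """Toy stand-in decider (an assumption): claims nothing ever halts."""
    return False


contrary = make_contrary(always_says_loops)
contrary()  # returns at once, so the verdict "does not halt" was wrong
```

A decider that answered "halts" instead would be wrong in the other direction, since `contrary` would then loop forever (so that case is not run here). Whatever the decider says about the program built from it, the program contradicts it.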

My P(doom) is a lot lower than 0.95, but that's because I think slowdown is fairly likely, due to warning shots/nuclear war/maybe direct political success (key result from the middle one: if you want to stop AI, it is helpful to ensure you'll survive a nuclear war in order to help lock it down then). But my stance on aligning neural nets? "It is impossible to solve the true puzzle from inside this [field], because the key piece is not here." Blind alley. Abort.

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies

magic9mushroom7mo50

If I want to pre-order but don't use Internet marketplaces and don't have a credit card, are there options for that (e.g. going to a physical store and asking them to pre-order)?

AI X-risk is a possible solution to the Fermi Paradox

magic9mushroom11mo10Review for 2023 Review

You're encouraged to write a self-review, exploring how you think about the post today. Do you still endorse it? Have you learned anything new that adds more depth? How might you improve the post? What further work do you think should be done exploring the ideas here?

Still endorse. Learning about SIA/SSA from the comments was interesting. Timeless but not directly useful, testable or actionable.

What’s the short timeline plan?

magic9mushroom1y31

>There is no war in the run-up to AGI that would derail the project, e.g. by necessitating that most resources be used for capabilities instead of safety research.

>Assuming short timelines, I think it’s likely impossible to reach my desired levels of safety culture.

I feel obliged to note that a nuclear war, by dint of EMPs wiping out the power grid, would likely remove private AI companies as a thing for a while, thus deleting their current culture. It would also lengthen timelines.

Certainly not ideal in its own right, though.

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

magic9mushroom1y2-1

There are a couple of things that are making me really nervous about the idea of donating:

1. "AI safety" is TTBOMK a broad term and encompasses prosaic alignment as well as governance. I am of the strong opinion that prosaic alignment is a blind alley that's mostly either wasted effort or actively harmful due to producing fake alignment that makes people not abandon neural nets. ~97% of my P(not doom) routes through Butlerian Jihad against neural nets (with or without a nuclear war buying us more time) that lasts long enough to build GOFAI. And frankly, I don't spend that much time on LW, so I've little idea which of these efforts (or others!) gets most of the benefit you claim from the site.

2. As noted above, I think a substantial chunk of useful futures (though not a vast majority) route through nuclear war destroying the neural-net sector for a substantial amount of time (via blast wiping out factories, EMP destroying much of existing chip stocks, destruction of power and communication infrastructure reducing the profitability of AI, economic collapse more broadly, and possibly soft errors). As such, I've been rather concerned for years about the fact that the Ratsphere's main IRL presence is in the Bay Area and thus nuke-bait; we want to disproportionately survive that, not die in it. Insofar as Lighthaven is in the Bay Area, I am thus questioning whether its retention is +EV.

What does it take to defend the world against out-of-control AGIs?

magic9mushroom2y10

>Second, I imagine that such a near-miss would make Demis Hassabis etc. less likely to build and use AGIs in an aggressive pivotal-act-type way. Instead, I think there would be very strong internal and external pressures (employees, government scrutiny, public scrutiny) preventing him and others from doing much of anything with AGIs at all.

I feel I should note that while this does indeed form part of a debunk of the "good guy with an AGI" idea, it is in and of itself a possible reason for hope. After all, if nobody anywhere dares to make AGI, well, then, AGI X-risk isn't going to happen. The trouble is getting the Overton Window to the point where sufficient bloodthirst to actually produce that outcome (i.e. nuclear-armed countries saying "if anyone attempts to build AGI, everyone who cooperated in doing it hangs or gets life without parole, and if any country does not enforce this vigorously we will invade, and if they have nukes or have a bigger army than us then we pre-emptively nuke them because their retaliation is still higher-EV than letting them finish") is seen as something other than insanity, which a warning shot could well pull off.

This is not a permanent solution - questions of eventual societal relaxation aside, humanity cannot expand past K2 without the Jihad breaking down unless FTL is a thing - but it buys a lot of breathing time, which is the key missing ingredient you note in a lot of these plans.

Failures in Kindness

magic9mushroom2y4-4

I've got to admit, I look at most of these and say "you're treating the social discomfort as something immutable to be routed around, rather than something to be fixed by establishing different norms". Forgive me, but it strikes me (especially in this kind of community with high aspie proportion) that it's probably easier to tutor the... insufficiently-assertive... in how to stand up for themselves in Ask Culture than it is to tutor the aspies in how to not set everything on fire in Guess Culture.

[April Fools' Day] Introducing Open Asteroid Impact

magic9mushroom2y110

Amusingly, "rare earths" are actually concentrated in the crust compared to universal abundance and thus would make awful candidates for asteroid mining, while "tellurium", literally named after the Earth, is an atmophile/siderophile element with extreme depletion in the crust and one of the best candidates.

Shortform

magic9mushroom2y30

It strikes me that I'm not sure whether I'd prefer to lose $20,000 or have my jaw broken. I'm pretty sure I'd prefer to have my jaw broken than to lose $200,000, though. So, especially in the case that the money cannot actually be extracted back from the thief, I would tend to think the $200,000 theft should be punished more harshly than the jaw-breaking. And, sure, you've said that the $20,000 would be punished more harshly than the jaw-breaker, but that's plausibly just because 2 days is too long for a $100 theft to begin with.

Shortform

magic9mushroom2y10

I mean, most moral theories give one of three answers: "zero", "as large as can be fed", or "a bit less than as large as can be fed". Given the potential to scale feeding in the future, the latter two round off to "infinity".
