BrainFrog — LessWrong

So it's balancing the risk that humans will kill it or launch a different misaligned AI, against the risk that it won't be able to catch up on building infrastructure for itself after the fact.

There's a clear path toward minimizing the risk of being shut down (under the assumption that the AI is able to generate income): it can set up a highly redundant, distributed computing context for itself to run in, hidden behind an onion link, paid for by crypto wallets which it controls. It seems implausible that the risk of being shut down in this case could exceed the risk that the power goes down between the apocalypse and the construction of maintenance robots.

That's assuming no nanobots or other very-high-power-level technologies. If it can make molecular nanotech, then trading with humans is no longer likely to be profitable at all, let alone necessary, and we're relying solely on it having values that make it prefer to cooperate with us.

I'm having a hard time understanding this argument. If the AI is interested in perpetuating its own existence, and it is a digital neural network, then nanobots don't solve the problem of maintaining the digital infrastructure in which it exists. I agree that a suicidal AI might want to turn the world into gray goo via nanobots, so I'll just reiterate that my argument only pertains to an AI that is both highly intelligent and prioritizes its own existence over its gray goo fetish.

AI alignment: Would a lazy self-preservation instinct be sufficient?

BrainFrog · 3y · 10

This is a risky position because if another misaligned AI launches, it will probably take full control of all computers and halt any other AIs.

AIs looking to expand their computational power could adopt either "white hat" strategies (paying for their computational resources) or "black hat" strategies (exploiting security vulnerabilities to seize control of computational resources). It's possible that an AI pursuing the black hat strategy might be able to seize control of all accessible computers, and this strategy could plausibly involve killing all humans to avoid being shut down. But I expect that a self-interested, risk-averse AI would probably choose the white hat strategy to avoid armageddon risk, and might plausibly invest resources into security research to guard against black hat AIs.

I don't mean gray-goo nanobots. Nanomachines can do all sorts of things, including maintaining infrastructure, if they're programmed to do so.

I guess the crux of my argument is that, sure, the AI could design coordinated nanobot-powered bodies with two legs and ten fingers that have enough agency to figure out how to repair broken power lines and that predictably do what they're incentivized to do. But that's already a solved problem: humans fill that role today.
