Simulation_Brain — LessWrong

All AGI Safety questions welcome (especially basic ones) [~monthly thread]

I think the main concern is that feedforward nets are used as a component in systems that achieve full AGI. For instance, DeepMind's agent systems include a few networks and run a few times before selecting an action. Current networks are more like individual pieces of the human brain, like a visual system and a language system. Putting them together and getting them to choose and pursue goals and subgoals appropriately seems all too plausible.

Now, some people also think that just increasing the size of nets and training data sets will produce AGI, because progress has been so good so far. Those people seem to be less concerned with safety. This is probably because such feedforward nets would be more like tools than agents. I tend to agree with you that this approach seems unlikely to produce real AGI, much less ASI, but it could produce very useful systems that are superhuman in limited areas. It already has in a few areas, such as protein folding.

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Simulation_Brain3y10

I think those are perfectly good concerns. But they don't seem so likely that they make me want to exterminate humanity to avoid them.

I think you're describing a failure of corrigibility. Which could certainly happen, for the reason you give. But it does seem quite possible (and perhaps likely) that an agentic system will be designed primarily for corrigibility, or alternately, alignment by obedience.

The second seems like a failure of morality. Which could certainly happen. But I see very few people who both enjoy inflicting suffering, and who would continue to enjoy that even given unlimited time and resources to become happy themselves.

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Simulation_Brain3y20

You are probably guessing correctly. I'm hoping that whoever gets ahold of aligned AGI will also make it corrigible, and that over time they'll trend toward a similar moral view to that generally held in this community. It doesn't have to be fast.

To be fair, I'm probably pretty biased against the idea that all we can realistically hope for is extinction. The recent [case against AGI alignment](https://www.lesswrong.com/posts/CtXaFo3hikGMWW4C9/the-case-against-ai-alignment) post was the first time I'd seen arguments that strong in that direction. I haven't really assimilated them yet.

My take on human nature is that, while humans are often stunningly vicious, they are also often remarkably generous. Further, it seems that the viciousness is usually happening when they feel materially threatened. Someone in charge of an aligned AGI will not feel very threatened for very long. And generosity will be safer and easier than it usually is.

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Simulation_Brain3y32

Yes. But that seems awfully unlikely to me. What would it need to be, two years from now? AI hype is going to keep ramping up as ChatGPT and its successors are more widely used and improved.

If the odds of slipping it by governments and militaries are slight, wouldn't the conclusion be the opposite: we should spread understanding of AGI alignment issues so that those in power have thought about them by the time they appropriate the leading projects?

This strikes me as a really practically important question. I personally may be rearranging my future based on what the community comes to believe about this.

Edit: I think the community tends to agree with you and be working in hopes that we reach the finish line before the broader world takes note. But this seems more like wishful thinking than a realistic guess about likely futures.

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Simulation_Brain3y10

I think there's a possibility that their lives, or some of them, are vastly worse than death. See the recent post "the case against value alignment" for some pretty convincing concerns.

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Simulation_Brain3y20

I totally agree with the core logic. I've been refraining from spreading these ideas, as much as I want to.

Here's the problem: Do you really think the whole government and military complex is dumb enough to miss this logic, right up to successful AGI? You don't think they'll roll in and nationalize the efforts when the power of AI keeps on progressively freaking people out more and more?

I think a lot of folks in the military are a lot smarter than you give them credit for. Or the issue will become much more obvious than you assume, as we get closer to general AI.

But I don't think that's necessarily going to spell doom.

I hope that emphasizing corrigibility might be adequate. That would at least let the one group who've controlled creation of AGI change their minds down the road.

I think a lot of folks in the government and military might be swayed by logic, once they can perfectly protect and provide abundantly for themselves and everyone they value. Their circle of compassion can expand, just like everyone here has expanded theirs.

Superintelligence 19: Post-transition formation of a singleton

Simulation_Brain11y00

Really? Can you say a little more about why you think you have that value? I guess I'm not convinced that it's really a terminal value if it varies so widely across people of otherwise similar beliefs. Presumably that's what lalartu meant as well, but I just don't get it. I like myself, so I'd like more of myself in the world!

How to Beat Procrastination

Simulation_Brain11y00

Perhaps you're thinking of the dopamine spike when reward is actually given? I had thought the predictive spike was purely proportional to the odds of success and the amount of reward, which would indeed change with boring tasks, but not in any linear way. If you're right about that basic structure of the predictive spike, I should know about it for my research; can you give a reference?

Book review: The Reputation Society. Part II

Simulation_Brain12y30

Less Wrong seems like the ideal community to think up better reputation systems. Doctorow's Whuffie is reasonably well thought out but intended for a post-scarcity economy; still, its idea of distinguishing right-handed reputation (from people who agree with you) from left-handed reputation (from people who generally don't agree with you) seems like one useful ingredient. Reducing the influence of those who tend to vote together seems like another potential win.

I like to imagine a face-based system; snap an image from a smartphone, and access reputation.

I hope to see more discussion, in particular of VAuroch's suggestion.

AI risk, executive summary

Simulation_Brain12y20

I think the example is weak: the software was not that dangerous; the researchers were idiots who broke a vial they knew was insanely dangerous.

I think it dilutes the argument to broaden it to software in general; it could be very dangerous under exactly those circumstances (with terrible physical safety measures), but the dangers of superhuman AGI are vastly larger IMHO and deserve to remain the focus, particularly of the ultra-reduced bullet points.

I think this is as crisp and convincing a summary as I've ever seen; nice work! I also liked the book, but condensing it even further is a great idea.
