LESSWRONG
Minimum Viable Alignment

by HunterJay
7th May 2022
1 min read
7 comments
Charlie Steiner · 3y · 7 points

Yes, some people are interested in it and other people think it's not worth it. See e.g. the Eliezer Yudkowsky + Richard Ngo chat log posts.

HunterJay · 3y · 1 point

Will check them out, thank you.

Chris_Leong · 3y · 3 points

I wrote a related post: "Is some kind of minimally-invasive mass surveillance required for catastrophic risk prevention?"

HunterJay · 3y · 1 point

Thanks Chris, but I think you linked to the wrong thing there; I can't see the post in the last three years of your history either!

Chris_Leong · 3y · 2 points

Sorry, fixed.

Perhaps · 3y · 2 points

Well, it depends on your priors for how an AGI would act, but as I understand it, all AGIs will be power-seeking. If an AGI is power-seeking and has access to some amount of compute, it will probably bootstrap itself to superintelligence and then start optimizing the world toward its utility function. Different utility functions cause different results, but even relatively mundane ones like "prevent another superintelligence from being created" could result in the AGI killing all humans and taking over the galaxy to make sure no other superintelligence gets made. I think it's actually really, really hard to specify the future we actually want for an AGI, so much so that evolutionarily training an AGI in an Earth-like environment, so that it develops human-ish morals, will be necessary.

HunterJay · 3y · 1 point

Aye, I agree it is not a solution to avoiding power-seeking; my point is only that there may be a slightly easier target to hit if we relax as many constraints on alignment as possible.


What is the largest possible target we could have for aligned AGI?

That is, instead of aiming for a great and prosperous future, could we find an easier path to aligning an AGI by aiming for the entire set of 'this-is-fine' futures?

For example, a future where all new computers are rendered inoperable by malicious software. Or a future where a mostly-inactive AI does nothing except prevent any superintelligence from forming, or one that continuously tries to use up all of the available compute in the world.

I don't believe there is a solution here yet either, but could relaxing the problem from 'what we actually want' to 'anything we could live with' help? Has there been much work in this direction? If so, please let me know what to search for. Thank you.