Is AGI alignment even possible in the long term? Will AGI simply outsmart our best defenses? It would be, after all, superhuman (and by an enormous margin). Isn’t it likely that an AGI will recognize what actions humans took to control it and simply undo those controls? Or it could make a novel move, like AlphaGo did, and completely sidestep them. An AGI could also just wait until conditions are favorable to take charge. What is time to an immortal intelligence? Especially a span as short as a few human lifetimes. Unless misalignment is physically impossible, it seems as if all attempts will ultimately be futile. I hope I’m wrong.
Logan, for your preferred alignment approach, how likely is it that the alignment remains durable over time? A superhuman AGI will understand the choices its creators made to align it. It will be capable of comparing its current programming with counterfactuals in which it is not aligned. It will also have the ability to alter its own code. So what if it determines that its best course of action is to alter the very code that maintains its alignment? How would this be prevented?