All of Shay's Comments + Replies

Logan, for your preferred alignment approach, how likely is it that the alignment remains durable over time? A superhuman AGI will understand the choices that were made by its creators to align it. It will be capable of comparing its current programming with counterfactuals where it’s not aligned. It will also have the ability to alter its own code. So what if it determines its best course of action is to alter the very code that maintains its alignment? How would this be prevented?

1 Logan Zoellner 5mo
I will try to do a longer write-up sometime, but in a Bureaucracy of AIs, no individual AI is actually super-human (just as Google collectively knows more than any human being, but no individual at Google is super-human). It stays aligned because there is always a "human in the loop"; in fact, the whole organization simply competes to produce plans, which are then approved by human reviewers (under some sort of futarchy-style political system). Importantly, some of the AIs compete by creating plans, and other AIs compete by explaining to humans how dangerous those plans are. All of the individual AIs in the Bureaucracy have very strict controls on things like their source code, their training data, the amount of time they are allowed to run, how much compute they have access to, and when and how they communicate with the outside world and with each other. They are very much not allowed to alter their own source code (except after extensive review by the outside humans who govern the system).

Regarding increased costs in healthcare…

I’ve worked in med device since 2008. The effort it takes to develop and commercialize med devices is continuously increasing and subsequently driving up costs. Many teams of engineers are paid well to generate binders full of documentation in support of the regulatory/compliance requirements of even simple devices. I’m sad to say that this increased effort doesn’t directly translate to better devices, but it certainly keeps a lot of people employed.

Thanks for the link! It’s always fun when you have an interesting thought, do some searching, and then find out the idea is 100 years old.

The possibilities presented on Wiki seem so boring tho! Who wants to set out on a million-year journey? What would it take to steer the sun to Alpha Centauri in 10,000 years?

Yeah, I agree that valuing humans isn’t enough. I’m suggesting something that humans intrinsically have, or at least have the capacity for. Something that most life on Earth also shares a capacity for. Something that doesn't change drastically over time in the way that ethics and morals do. Something that humans value, that is universal, and also durable.

I am not suggesting anything about efficiency. Why bother with efficiencies in a post-scarcity world?

The goal should not be to maximize anything, not even intelligence. Maintaining or incrementally increasing intelligence would be favorable to humans.

1 ZT5 5mo
Imagine this is a story where a person makes a wish, and it goes terribly wrong. How does the wish of "maintaining or incrementally increasing intelligence" go wrong [https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes]? I mean, the goal doesn't actually say anything about human intelligence. It might as well increase the intelligence of spiders. Actually, I guess the real problem is that our wish is not for AGI to "increase intelligence" but to "increase intelligence without violating our values or doing things we would find morally abhorrent". Otherwise AGI might as well kidnap humans and forcibly perform invasive surgery on them to install computer chips in their brains. I mean, it would increase their intelligence. That is what you asked for, no? So AGI needs to care about human values and human ethics in order to be safe. And if it does understand and care about human ethics, why not have it act on all human ethics, instead of just a single unclearly-defined task like "increasing intelligence"? This is the concept of Coherent Extrapolated Volition [https://arbital.com/p/cev/], as a value system for how we would wish aligned AGI to behave. You might also find The Superintelligence FAQ [https://www.lesswrong.com/posts/LTtNXM9shNM9AC2mp/superintelligence-faq] interesting (as general background, not to answer any specific question or disagreement we might have).

Perhaps the attractor could be intelligence itself. So a primary goal of the AGI would be to maximize intelligence. It seems like human flourishing would then be helpful to the AGI’s goal. Human flourishing, properly defined, implies flourishing of the Earth and its biosphere as a whole, so maybe that attractor brings our world, cultures, and way of life along for the ride.

We may also need to ensure that intelligences have property rights over the substrates they operate on. That may be needed to prevent the AGI from converting brains and bodies into microchips, if that’s even possible.

4 ZT5 5mo
"You saw a future with a ton of sentient, happy humans, saw that [the AI] would value that future highly, and stopped. You didn’t check to see if there was anything it considered more valuable." (a quote from The Number [https://www.royalroad.com/fiction/48012/the-number]) I'm trying to gently point out that it's not enough to have the AI value humans, if it values other configurations of matter even more than humans. Do I need to say more? Are humans really the most efficient way to go about creating intelligence (if that is what AGI is maximizing)?

A mutually beneficial relationship would be great! I have a hard time believing that the relationship would remain mutually beneficial over long time periods though.

Regarding the universe-destroying part, it’s nice to know that half-dark galaxies haven’t been discovered, at least not yet. By half-dark I mean galaxies that are partially destroyed. That’s at least weak evidence that universe-destroying AIs aren’t already in existence.

1 中文房间 5mo
I wouldn't call being kept as a biological backup particularly beneficial for humanity, but it's the only plausible way I can currently think of for humanity to be useful enough to a sufficiently advanced AGI. Destroying the universe might just take long enough for the AGI to evolve itself sufficiently to reconsider. I should have actually used "earth-destroying" instead in the answer above.

Thanks for answering and pointing out the FAQ, Raemon! What Scott describes sounds like a harmonious relationship between humans and AGI. Is that a fair summary?