As we approach AGI, we also approach the risk of alignment failure: due either to a mismatch between the goals we intend and the goals we specify ('outer misalignment'), or to a mismatch between the goals we specify and the goals that actually emerge in the trained system ('inner misalignment'), we end up with a catastrophic failure in an extremely powerful system, which puts us all in a very bad place.

Right now, we don't know what kind of model will lead to AGI. We can guess, but a few years back people didn't have much hope for LLMs, and now look where we are; a lot of people were deeply surprised. Solutions to alignment failure may be model-dependent, and AGI may emerge from something entirely new, or from something old that starts to behave in surprising ways when scaled. It's quite hard to know where to start when faced with unknown unknowns. Of course, if you're here, you already know all of this.

These challenges and uncertainties may be relatively novel to the academic research community (in which AI safety has only recently become a respectable area of study), but they are by no means novel to most parents, who also train models (albeit 'wet' ones) via RLHF and inevitably experience alignment issues. Much of the time, the biggies get mostly resolved by adulthood and we end up with humans who balance self-interest with respect for humanity as a whole, but not always. For example, assertiveness, persuasiveness, public-speaking ability and self-confidence are common training goals, but when alignment failure occurs you get humans (who often manage to convince the masses to elect them democratically) who are just one button push away from the destruction of humanity, and who have the pathological personality type to actually push it. We don't need AGI to see alignment failure put the future of humanity on a knife edge. We're already there.

Since we expect AGI (even superhuman AGI) to exhibit a large number of human-type behaviours, because its models will presumably be trained on human output, why do we not spend more time looking at alignment failure in humans (its determinants, its solutions and its warning signs) and seeing whether those things yield insights into AI alignment failure mechanisms? As any human parent (like me) will tell you, although probably not in quite these words, aligning biological general intelligences (BGIs) is hard, counterintuitive and also immensely important. But, unlike AI alignment, there is a vast body of research and data in the literature on what works and what doesn't when it comes to aligning BGIs.

So, why is there so little cross-domain reach into human (especially child) psychology and related fields in the AI community?

3 comments

This is a reasonable question, although I don't think we need to worry much about analogues between AI and human psychology. Instead we should be looking for robust alignment schemes that will work regardless of the design of the agent; even if a scheme correctly concludes that a given agent cannot be aligned, we should know that up front.

Because of this, I frame the problem more like this: if we design a working alignment scheme, it should at least work to align humans, since humans are less robust optimizers but are (minimally) general intelligences.

So if we're trying to think of how to build aligned AI, a reasonable test for any scheme that is not "design the AI with an architecture that guarantees that it is aligned" is to check if the scheme could align humans. We have thus far not solved the human alignment problem, but solving the problem of how to get humans to align on the right behavior without Goodharting should be a step in the direction of coming up with an alignment scheme that might be able to handle super-optimizing AI.

Thanks for the thoughtful response, although I'm not quite sure about the approach. For starters, 'aligning humans' takes a long time, and we may simply not have time to test any proposed alignment scheme on humans if we want to avoid an AGI misalignment catastrophe. Not to mention the ethical issues, and so forth.

Society has been working on human alignment for a very long time, and we've settled on a dual approach: (1) training the model in particular ways (e.g. parenting, schooling) and (2) a very complex, likely suboptimal system of 'checks and balances' to try to mitigate issues that arise, for whatever reason, post-training (e.g. the criminal justice system, national and foreign intelligence agencies). This is clearly imperfect, but it seems to maintain cohesion a lot better than prior systems of 'mob justice', where if you didn't like someone you'd just club them over the head. Unfortunately, we now also have much more destructive weapons than the clubs of the 'mob justice' era. Nonetheless, as of today you and I are still alive, and significant chunks of society more or less function and continue to do so over extended periods, so the equilibrium we're in could be a lot worse. But, per my original post regarding the nuclear button, it's clear we have ended up on a knife edge.

Fortunately, we do have a powerful advantage with AGI alignment over human alignment: for the former we can inspect the internal state (weights etc.). Interpretability of that internal state is of course a challenge, but we nonetheless have access. (The reason human alignment requires a criminal justice system like the one we have, with all its complexity and failings, is largely that we cannot inspect the internal state.) So it seems that AGI alignment may well be achieved through a combination of the right approach during the training phase and a subsequent 'check' methodology based on continuous analysis of the internal state.

I believe that bringing the large body of understanding, research and data we already have on human alignment in the training phase (i.e. psychology and parenting) to the AI safety researcher's table could be very helpful. And right now, I don't see this happening. Look, for example, at the open roles posted on the OpenAI web site: they are all 'nerd' jobs, with none aimed at experts in human behaviour. This surprises me and I don't really understand it. If AGI alignment is as important to us as we claim, we should be more proactive about bringing experts from other disciplines into the fold when there's a reasonable argument for it, not just more and more computer scientists.
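To make the 'continuous analysis of the internal state' idea above a little more concrete, here is a minimal, purely hypothetical sketch (assuming a PyTorch-style model; the probe, the threshold and the monitored layer are all placeholders I've invented, not a real misalignment detector). It attaches a hook to a hidden layer and scores the activations on every forward pass:

```python
# Purely illustrative sketch, not a real misalignment detector: the probe, the
# threshold and the monitored layer are placeholders. Assumes PyTorch.
import torch
import torch.nn as nn

# A stand-in model and a linear probe over one of its hidden layers.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
probe = nn.Linear(32, 1)  # hypothetically trained to flag concerning internal states

def check_internal_state(module, inputs, output):
    # Fires on every forward pass: score the hidden activations and flag high scores.
    score = torch.sigmoid(probe(output)).mean().item()
    if score > 0.9:  # placeholder alert threshold
        print(f"internal-state check flagged this forward pass (score={score:.2f})")

# 'Continuous analysis': the hook runs whenever the model is used.
model[1].register_forward_hook(check_internal_state)
model(torch.randn(8, 16))
```

Obviously a real check would need validated probes and a far richer notion of internal state; the point is only that a model's internals are available for this kind of monitoring in a way that human internals are not.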

I had used the term 'Natural Intelligence'. Something that is not designed is not that great a resource to get design tips from.

But when I think about the term 'Biological General Intelligence', I wonder whether an AGI could be a BGI. One could argue that humans are not 100% natural intelligence but are partly artificial.

One could also argue that the standard cognitive individual contains silicon body parts in the form of a cellphone, or that Google or Facebook is, to a not-insignificant degree, part of how an individual human intelligence controls its attention.