Understanding differences between humans and intelligence-in-general to build safe AGI

[-]Dave Orr3y40

I think the claim that AIs can access their own thoughts needs more work to show that it is actually the case. Certainly the state-of-the-art AIs now, e.g. GPT-3 or PaLM, have if anything less access to their own state than people do. They can't really introspect; all they can do is process the data that they are given.

Maybe that will change, but as you note, the configuration space of intelligence is large, and it seems to me that we could easily end up without that particular ability.

I have similar reservations about the next one, thoughts of others, though you do caveat that one.

One thing that might be missing is that humans tend to have a defined location -- I know where I am, and "where I am" has a relatively clear definition. That may not hold for AIs, which are much more loosely coupled to the computers running them.

[-]Florian_Dietz3y10

I agree that current AIs cannot introspect. My own research has bled into my beliefs here. I am actually working on this problem, and I expect that we won't get anything like AGI until we have solved this issue. As far as I can tell, an AI that works properly and has any chance of becoming an AGI will necessarily have to be able to introspect. Many of the big open problems in the field seem to me like they can't be solved precisely because we haven't figured out how to do this yet.

The "defined location" point you note is intended to be covered by "being sure about the nature of your reality", but it's much more specific, and you are right that it might be worth considering as a separate point.

[-]Shmi3y30

I don't think your listed points are the crux of the difference, though maybe AI (self-)interpretability is an important one. My personal feeling is that what matters is that humans are not coherent agents with goals. We just do things, often sphexing and being random or, conversely, routine, without acting to advance any of our stated goals.

[-]Florian_Dietz3y30

This is a great point. I don't expect that the first AGI will be a coherent agent either, though.

As far as I can tell from my research, being a coherent agent is not an intrinsic property you can build into an AI, or at least not if you want it to have a reasonably effective ability to learn. It seems more like being coherent is a property that each agent has to continuously work on.

The reason for this is basically that every time we discover new things about the way reality works, the new knowledge might contradict some of the assumptions on which our goals are grounded. If this happens, we need a way to reconfigure and catch ourselves.

Example: A child does not yet have the capacity to understand ethics. So it is told "hurting people is bad", and that is good enough to keep it from doing terrible things until it is old enough to learn more complex ethics. Trying to teach it about utilitarian ethics before it has an understanding of probability theory would be counterproductive.

[-]Shmi3y2-2

I agree that even an AGI would have shifting goals. But at least at every single instant of time one assumes that there is a goal it optimizes for. Or a set of rules it follows. Or a set of acceptable behaviors. Or maybe some combination of those. Humans are not like that. There is no inner coherence ever; we just do stuff we are compelled to do in the moment.

[-]Florian_Dietz3y-1-2

Contemporary AI agents that are based on neural networks are exactly like that. They do stuff they feel compelled to do in the moment. If anything, they have less coherence than humans, and no capacity for introspection at all. I doubt that AI will magically go from this current, very sad state to a coherent agent. It might modify itself into being coherent some time after becoming superintelligent, but it won't be coherent out of the box.

[-]Shmi3y52

Interesting. I know very little about the ML field, and my impression from reading what the ML and AI alignment experts write on this site is that they model an AI as an agent to some degree, not just "do something incoherent at any given moment".

[-]Florian_Dietz3y-1-2

I mean "do something incoherent at any given moment" is also perfectly agent-y behavior. Babies are agents, too.

I think the problem is that modelling incoherent AI is even harder than modelling coherent AI, so most alignment researchers just hope that AI researchers will be able to build coherence in before there is a takeoff, so that they can base their own theories on the assumption that the AI is already coherent.

I find that view overly optimistic. I expect that AI is going to remain incoherent until long after it has become superintelligent.
