KatjaGrace

Comments

Coherence arguments imply a force for goal-directed behavior

I wrote an AI Impacts page summarizing the situation as I understand it. If anyone feels like looking, I'm interested in corrections/suggestions (either here or in the AI Impacts feedback box).

Coherence arguments imply a force for goal-directed behavior

A few quick thoughts on reasons for confusion:

I think maybe one thing going on is that I already took the coherence arguments to apply only in getting you from weakly having goals to strongly having goals, so since you were arguing against their applicability, I thought you were talking about the step from weaker to stronger goal direction. (I’m not sure what arguments people use to get from 1 to 2 though, so maybe you are right that it is also something to do with coherence, at least implicitly.)

It also seems natural to think of ‘weakly has goals’ as something other than ‘goal directed’, and ‘goal directed’ as referring only to ‘strongly has goals’, so that ‘coherence arguments do not imply goal directed behavior’ (in combination with expecting coherence arguments to be in the weak->strong part of the argument) sounds like ‘coherence arguments do not get you from “weakly has goals” to “strongly has goals”’.

I also think separating out the step from no goal direction to weak, and the step from weak to strong, might help with clarity. It sounded to me like you were considering an argument from 'any kind of agent' to 'strongly goal directed' and finding it lacking, and I was like 'but any kind of agent includes a mix of those that this force will work on, and those it won't, so shouldn't it be a partial/probabilistic move toward goal direction?' Whereas you were just meaning to talk about what fraction of existing things are weakly goal directed.

Coherence arguments imply a force for goal-directed behavior

Thanks. Let me check if I understand you correctly:

You think I take the original argument to be arguing from ‘has goals’ to ‘has goals’, essentially, and agree that that holds, but don’t find it very interesting/relevant.

What you disagree with is an argument from ‘anything smart’ to ‘has goals’, which seems to be what is needed for the AI risk argument to apply to any superintelligent agent.

Is that right?

If so, I think it’s helpful to distinguish between ‘weakly has goals’ and ‘strongly has goals’:

  1. Weakly has goals: ‘has some sort of drive toward something, at least sometimes’ (e.g. aspects of outcomes are taken into account in decisions in some way)
  2. Strongly has goals: ‘pursues outcomes consistently and effectively’ (i.e. decisions maximize expected utility)
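
To make definition 2 a bit more concrete, here is a minimal sketch of what 'decisions maximize expected utility' could look like. The outcomes, probabilities, utility numbers and function names are all made up for illustration; nothing here is meant as a claim about how any real agent is built.

```python
# Hypothetical toy example: an agent that 'strongly has goals' in the sense
# above picks whichever action has the highest expected utility.

def expected_utility(lottery, utility):
    """Probability-weighted sum of utilities over an action's outcomes."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())

def choose(actions, utility):
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))

# Illustrative numbers only.
utility = {"apple": 1.0, "banana": 0.5, "nothing": 0.0}
actions = {
    "safe": {"banana": 1.0},                   # banana for certain
    "gamble": {"apple": 0.6, "nothing": 0.4},  # 60% apple, 40% nothing
}

best = choose(actions, utility)
print(best)
```

An agent that only 'weakly has goals' might take these outcomes into account sometimes, or inconsistently, without reliably choosing this way.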

 

So that the full argument I currently take you to be responding to is closer to:

  1. By hypothesis, we will have superintelligent machines
  2. They will weakly have goals (for various reasons, e.g. they will do something, and maybe that means ‘weakly having goals’ in the relevant way? Probably other arguments go in here.)
  3. Anything that weakly has goals has reason to reform to become an EU maximizer, i.e. to strongly have goals
  4. Therefore we will have superintelligent machines that strongly have goals

 

In that case, my current understanding is that you are disagreeing with 2, and that you agree that if 2 holds in some case, then the argument goes through. That is, creatures that are weakly goal directed are liable to become strongly goal directed. (E.g. an agent that twitches because it has various flickering and potentially conflicting urges toward different outcomes is liable to become an agent that more systematically seeks to bring about some such outcomes.) Does that sound right?

If so, I think we agree. (In my intuition I characterize the situation as ‘there is roughly a gradient of goal directedness, and a force pulling less goal directed things into being more goal directed’. This force probably doesn’t exist out at the zero goal directedness edges, but it is unclear how strong it is in the rest of the space, i.e. whether it becomes substantial as soon as you move out from zero goal directedness, or is weak until you are in a few specific places right next to ‘maximally goal directed’.)

Animal faces

Good points. Though I claim that I do hold the same facial expression for long periods sometimes, if that's what you mean by 'not moving'. In particular, sometimes it is very hard for me not to screw up my face in a kind of disgusted frown, especially if it is morning. And sometimes I grin for so long that my face hurts, and I still can't stop.

Tentative covid surface risk estimates

It doesn't seem that hard to wash your hands after putting away groceries, say. If I recall, I was not imagining getting many touches during such a trip. I'm mostly imagining that you put many of the groceries you purchase in your fridge or eat them within a couple of days, such that they are still fairly contaminated if they started out contaminated, and it is harder to not touch your face whenever you are eating recently acquired or cold food.

Wordtune review

Yes - I like 'application' over 'potentially useful product' and 'my more refined writing skills' over 'my more honed writing', in its first one, for instance.

Neck abacus

I pinch the string and/or the beads I don't want to move between the thumb and finger of one hand, and push the bead I do want to move with the thumb and finger of the other hand. (I don't need to see it because I can feel it, and the beads don't move when I touch them.) I can also do it more awkwardly with one hand.

Neck abacus

Thanks for further varieties! I hadn't seen the ring, and have had such a clicker but have not got the hang of using it non-awkwardly (where do you put it? With your keys? Who knows where those are? In your pocket? Who reliably has a pocket that fits things in? In your bag? Then you have to dig it out...)

Good point regarding wanting to know what number you have reached. I only want to know the exact number very occasionally, like with a bank account, but I agree that's not true of many use cases.
