Beyond fire alarms: freeing the groupstruck

Agreed, the difference between actors and real companions is very important! I think you misread me (see my response to AllAmericanBreakfast's comment above).

Your current model appears to be wrong (supposing people should respond to fire alarms quickly).

From the paper:

"Subjects in the three naive bystander condition were markedly inhibited from reporting the smoke. Since 75% of the alone subjects reported the smoke, we would expect over 98% of the three-person groups to contain at least one reporter. In fact, in only 38% of the eight groups in this condition did even 1 subject report (p < .01). Of the 24 people run in these eight groups, only 1 person reported the smoke within the first 4 minutes before the room got noticeably unpleasant. Only 3 people reported the smoke within the entire experimental period." 

At a glance, Fig 1 in the paper also seems to imply that the solitary people all reported it before 4 minutes.
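
For reference, the "over 98%" figure in the quoted passage seems to follow from treating the three group members as independent, each reporting at the solo rate of 75%. Here is a minimal Python sketch of that arithmetic. The function name is mine, and reading "38% of the eight groups" as 3 of 8 is my assumption, not something the quote states outright.

```python
def p_at_least_one_reporter(p_single: float, group_size: int) -> float:
    """Chance that at least one of `group_size` independent subjects reports,
    if each reports with probability `p_single` (independence is an assumption)."""
    return 1.0 - (1.0 - p_single) ** group_size


# 75% of solo subjects reported the smoke (from the quoted passage).
expected = p_at_least_one_reporter(0.75, 3)

# Assumption: "38% of the eight groups" corresponds to 3 of 8 groups.
observed = 3 / 8

print(f"expected if independent: {expected:.4f}")
print(f"observed in groups:      {observed:.4f}")
```

The gap between the two numbers is the inhibition effect the paper is describing.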

Beyond fire alarms: freeing the groupstruck

Sorry for being unclear. The first video shows a rerun of the original experiment, which I think is interesting because it is nice to actually see how people behave, though it is missing footage of the (I agree crucial) three-person group case. The original experiment itself definitely included groups of entirely innocent participants, and I agree that if it didn't it wouldn't be very interesting. (According to the researcher in the footage, via private conversation, he recalls that the filmed rerun also included at least one trial with all innocent people, but it was a while ago, so he didn't sound confident. See footnote there.)

It still looks to me like this is what I say, but perhaps I could signpost more clearly that the video is different from the proper experiment? 

Ask Not "How Are You Doing?"

I think I would have agreed that answering honestly is a social gaffe a few years ago, and in my even younger years I found it embarrassing to ask such things when we both knew I wasn't trying to learn the answer, but now I feel like it's very natural to elaborate a bit, and it usually doesn't feel like an error. e.g. 'Alright - somewhat regretting signing up for this thing, but it's reminding me that I'm interested in the topic' or 'eh, seen better days, but making crepes - want one?' I wonder if I've become oblivious in my old age, or socially chill, or the context has changed. It could be partly that this depends on how well the conversationalists know each other, and it has been a slow year and a half for seeing people I don't live with.

Feedback is central to agency

To check I have this: in the two-level adaptive system, one level is the program adjusting its plan toward the target configuration of being a good plan, and the other level is the car (for instance) adjusting its behavior (due to following the plan) toward getting to a particular place without crashing?

Taboo "Outside View"

Fwiw I'm not aware of using or understanding 'outside view' to mean something other than basically reference class forecasting (or trend extrapolation, which I'd say is the same). In your initial example, it seems like the other person is using it fine - yes, if you had more examples of an AGI takeoff, you could do better reference class forecasting, but their point is that in the absence of any examples of the specific thing, you also lack other non-reference-class-forecasting methods (e.g. a model), and you lack them even more than you lack relevant reference classes. They might be wrong, but it seems like a valid use. I assume you're right that some people do use the term for other stuff, because they say so in the comments, but is it actually that common?

I don't follow your critique of doing an intuitively-weighted average of outside view and some inside view. In particular, you say 'This is not Tetlock’s advice, nor is it the lesson from the forecasting tournaments...'. But in the blog post section that you point to, you say 'Tetlock’s advice is to start with the outside view, and then adjust using the inside view.', which sounds like he is endorsing something very similar, or a superset of the thing you're citing him as disagreeing with?

Holidaying and purpose

I too thought the one cruise I've been on was a pretty good type of holiday! A giant moving building full of nice things is so much more convenient a vehicle than the usual series of planes and cabs and subways and hauling bags along the road and stationary buildings etc.

Coherence arguments imply a force for goal-directed behavior

I wrote an AI Impacts page summary of the situation as I understand it. If anyone feels like looking, I'm interested in corrections/suggestions (either here or in the AI Impacts feedback box).  

Coherence arguments imply a force for goal-directed behavior

A few quick thoughts on reasons for confusion:

I think maybe one thing going on is that I already took the coherence arguments to apply only in getting you from weakly having goals to strongly having goals, so since you were arguing against their applicability, I thought you were talking about the step from weaker to stronger goal direction. (I’m not sure what arguments people use to get from 1 to 2 though, so maybe you are right that it is also something to do with coherence, at least implicitly.)

It also seems natural to think of ‘weakly has goals’ as something other than ‘goal directed’, and ‘goal directed’ as referring only to ‘strongly has goals’, so that ‘coherence arguments do not imply goal directed behavior’ (in combination with expecting coherence arguments to be in the weak->strong part of the argument) sounds like ‘coherence arguments do not get you from “weakly has goals” to “strongly has goals”’.

I also think separating out the step from no goal direction to weak, and the step from weak to strong, might help with clarity. It sounded to me like you were considering an argument from 'any kind of agent' to 'strong goal directed' and finding it lacking, and I was like 'but any kind of agent includes a mix of those that this force will work on, and those it won't, so shouldn't it be a partial/probabilistic move toward goal direction?' Whereas you were just meaning to talk about what fraction of existing things are weakly goal directed.

Coherence arguments imply a force for goal-directed behavior

Thanks. Let me check if I understand you correctly:

You think I take the original argument to be arguing from ‘has goals’ to ‘has goals’, essentially, and agree that that holds, but don’t find it very interesting/relevant.

What you disagree with is an argument from ‘anything smart’ to ‘has goals’, which seems to be what is needed for the AI risk argument to apply to any superintelligent agent.

Is that right?

If so, I think it’s helpful to distinguish between ‘weakly has goals’ and ‘strongly has goals’:

  1. Weakly has goals: ‘has some sort of drive toward something, at least sometimes’ (e.g. aspects of outcomes are taken into account in decisions in some way)
  2. Strongly has goals: ‘pursues outcomes consistently and effectively’ (i.e. decisions maximize expected utility)


So that the full argument I currently take you to be responding to is closer to:

  1. By hypothesis, we will have superintelligent machines
  2. They will weakly have goals (for various reasons, e.g. they will do something, and maybe that means ‘weakly having goals’ in the relevant way? Probably other arguments go in here.)
  3. Anything that weakly has goals has reason to reform to become an EU maximizer, i.e. to strongly have goals
  4. Therefore we will have superintelligent machines that strongly have goals


In that case, my current understanding is that you are disagreeing with 2, and that you agree that if 2 holds in some case, then the argument goes through. That is, creatures that are weakly goal directed are liable to become strongly goal directed. (e.g. an agent that twitches because it has various flickering and potentially conflicting urges toward different outcomes is liable to become an agent that more systematically seeks to bring about some such outcomes) Does that sound right?

If so, I think we agree. (In my intuition I characterize the situation as ‘there is roughly a gradient of goal directedness, and a force pulling less goal directed things into being more goal directed’. This force probably doesn’t exist out at the zero goal directedness edges, but it is unclear how strong it is in the rest of the space, i.e. whether it becomes substantial as soon as you move out from zero goal directedness, or is weak until you are in a few specific places right next to ‘maximally goal directed’.)

Animal faces

Good points. Though I claim that I do hold the same facial expression for long periods sometimes, if that's what you mean by 'not moving'. In particular, sometimes it is very hard for me not to screw up my face in a kind of disgusted frown, especially if it is morning. And sometimes I grin for so long that my face hurts, and I still can't stop.
