I'm Michael "Valentine" Smith. Cofounder & senior instructor at CFAR back in the day. I've been in the rationalist scene since 2011 but mostly left in late 2018. To the extent that "post-rationalist" means anything, the term should probably apply to me.
You can find my non-LW writing on my Substack. You can also find my social media profiles via my Linktree.
My guess is self-deception is incentivized so long as other deception is incentivized
I think you're probably right.
Just to be clear, I wasn't trying to lay out a theory of all self-deception. I was trying to lay out a theory of one cause of at least one kind of self-deception (namely Newcomblike self-deception). I noticed that the problem seemed like it should be real, and that it has multiple solutions, Newcomblike self-deception being one of them. That setup had some interesting logic that panned out with some casual experimentation.
I kind of wonder what portion of self-deception is entirely about dealing with hostile telepaths. Is it 100%? Probably not. My intuitive impression is that it's well over 50% though. But even if it's just 10%, I don't think that affects the logic of the post whatsoever. It's a suggestion that thus-and-such type of self-deception arises from solving a particular problem that has other possible solutions, not a theory that all self-deception comes from this mechanism.
Regarding the mother whose glasses broke as a hostile telepath: I think a more charitable interpretation is that she's an ineffective socializer.
Someone else brought this up too. I think you're right. And perhaps I wrote that part in a misleading way!
That said, I don't think your very accurate point affects the example whatsoever.
The question isn't about what's really going on. The question is, from the perspective of the child, is he dealing with a hostile telepath? Which is to say, is he dealing with someone (a) who seems to be able to read his internal states and (b) whom he doesn't trust not to make his life worse based on what she finds? If the answer is "yes", he's faced with a hostile telepath problem, for which he needs some kind of solution.
It really doesn't matter whatsoever how badly that represents the mother's subjective state or motivations. The child doesn't have access to that. The child just knows that Mother is mad at him, is demanding that he "be sorry", and is checking. It's possible the mom isn't even mad! Maybe she's a perfect saint bringing pure love and care and understanding while gently guiding the child to the best of her ability. But if the child perceives the mother as a hostile telepath, then he needs a solution, and Newcomblike self-deception is one such solution.
It seems relevant that another reviewer gave the same pushback though. I wonder if I've been unclear, or if I'm missing something. Let me know if it seems to you like I've missed your point.
I highly doubt that this explains why the field and the associated risk predictions exist in the first place, or that the field's validity should be questioned on such grounds, but this seems to happen in the article if I'm not entirely misreading it.
Not entirely. It's a bit of a misreading. In this case I think the bit matters though.
(And it's an understandable bit! It's a subtle point I find I have a hard time communicating clearly.)
I'm trying to say two things:

1. I think estimates of the probability of doom are significantly elevated by how memetic evolution works, and you can't know how elevated they are until you seriously check.
2. Some portion of the fear of AI doom might be psychological projection, possibly all of it.

I think some people, such as you, are reacting really strongly to that second point, as though I'm taking a stand for AI risk being a non-issue and saying it's all psychological projection.
I'm saying the chance of that is nonzero, but close to zero. It's a more plausible hypothesis to me than I think it is to this community. But that's not because I'm going through the arguments that AI risk is real and finding refutations. It's because I've seen some shockingly basic things turn out to be psychological projection, and I don't think Less Wrong collectively understands that projection really can be that deep. I just don't see it accounted for in the arguments for doom.
But that's not the central point I'm trying to make. My point is more that I think the probability of doom is significantly elevated as a result of how memetic evolution works — and, stupidly, I think that makes doom more likely as a result of the "Don't hit the tree" phenomenon.
And maybe even more centrally, you cannot know how elevated the probability is until you seriously check for memetic probability boosters. And even then, how you check needs to account for those memetic influences.
I'm not trying to say that AI safety shouldn't exist as a field though.
From my point of view, there is already an overemphasis on psychological factors in the broader debate, and it would be desirable to get back to the object level.
Wow, you and I sure must be seeing different parts of the debate! I approximately only hear people talking about the object level. That's part of my concern.
I mean, I see some folk doing hot takes on Twitter about psychological angles. But most of those strike me as more like pot shots and less like attempts to engage in a dialogue.
This was a great steelmanning, and is exactly the kind of thing I hope people will do in contact with what I offer. Even though I don't agree with every detail, I feel received and like the thing I care about is being well enough held. Thank you.
Good call. I haven't been reading Less Wrong in enough detail for a while to pull this up usefully. My impression comes from in-person conversations plus Twitter interactions. Admittedly, the period when I encountered these terms most heavily in rationality circles was about a decade ago. But I'm not sure how much of that is due to my spending less time in rationality circles versus discourse norms moving on. I still encounter them almost solely from folk tied to LW-style rationality.
I don't recall hearing you use the terms in ways that bothered me this way, FWIW.
it still seems bad to advocate for exactly the wrong policy, especially one that doesn't make sense even if you turn out to be correct (as habryka points out in the original comment, many think 2028 is not really when most people expect AGI to have happened).
I'm super sensitive to framing effects. I notice one here. I could be wrong, and I'm guessing that even if I'm right you didn't intend it. But I want to push back against it here anyway. Framing effects don't have to be intentional!
It's not that I started with what I thought was a wrong or bad policy and tried to advocate for it. It's that given all the constraints, I thought that preregistering a possibility as a "pause and reconsider" moment might be the most effective and respectful. It's not what I'd have preferred if things were different. But things aren't different from how they are, so I made a guess about the best compromise.
I then learned that I'd made some assumptions that weren't right, and that determining such a pause point in a way that carries collective weight is much trickier. Alas.
But it was Oliver's comment that brought this problem to my awareness. At no point did I advocate for what I thought at the time was the wrong policy. I had hope because I thought folk were laying down some timeline predictions that could be falsified soon. Turns out, approximately nope.
i think you would have much more luck advocating for chilling today and citing past evidence to make your case.
Empirically I disagree. That demonstrably has not been within the reach of my skill to do effectively. But it's a sensible thing to consider trying again sometime.
Although I don't like comments starting with "your logic slipped" because it gives off passive-aggressive "you are stupid" vibes, I will reply.
Sorry, that's not how I meant it. I meant it more like "Oh, I think your foot slipped there, so if you take another step I think it won't have the effect you're looking for." We can all slip up. It's intended as a friendly note.
I agree that, on rereading it, it didn't come across that way.
So what you are saying is that yes, this time is different, just not today. It will definitely happen and all the doomerism is correct, but not on a short timeline, because ____ insert reasoning that is different than what the top AI minds are saying today.
Uh, no. That's not what I'm saying.
I'm saying something more like: if it turns out that doomerism is once again exaggerated, perhaps we should take a step back and ask what's creating the exaggeration instead of plowing ahead as we have been.
I love this direction of inquiry. It's tricky to get right because of lots of confounds. But I think something like it should be doable.
I love that this is a space where people care about this stuff.
I'm poorly suited to actually carry out these examinations on my own. But if someone does, or if someone wants to lead the charge and would like me to help design the study, I'd love to hear about it!
I think personally I'm more grim and more traumatized than most people around here, and other people are too happy and not neurotic enough to understand how fucked alignment research is; and yet I have much longer timelines than most people around here.
Two notes:

1. Overall I like your comment.
2. I agree with you.
My best guess is that the degree of doom is exaggerated but not fabricated. The exaggeration matters because, if it's there, it's warping perception of what to do about the real problem, so it would be ideal to address its cause, even though from the inside that's probably always going to feel like the wrong thing to focus on.
In the end, I think a healthy attitude looks more like facing the darkness hand-in-hand with joy in our hearts and music in our throats.
I just replied to another reviewer about this point. In short: I agree, I think it's worth noticing, and I also think the point is irrelevant. The question isn't whether the mother truly is hostile vs. aligned with the child. The question is whether the child experiences threat from an apparent telepath.
This point is related to footnote 4. I think it's unhelpful to ask whether the mother "actually is" a hostile telepath. Hostile telepathy is about a perception someone has of another. If you perceive someone (or something) as a hostile telepath, you need some solution to that problem. One possible solution is to discover that they are not, in fact, hostile. But if you don't converge on that solution, you'll need some other one.
As I mentioned to the other reviewer, it stands out to me that two people both zoomed in on the same objection. I'm not sure what's going on there. Let me know if I've missed your point?
Of course! My impression is that many (most?) math students don't get sucked into the Newcomblike self-deception pattern I was naming. But some do! You're pointing out an example of it not happening. If I were claiming that this happens for all math students, your point would totally refute mine. And to the degree you thought I was making that claim, or it came across as ambiguous about whether I was, I'm glad you brought it up! But my point wasn't that all math students encounter this. It's that some do. And I don't think it's super rare.
Yep. Key to why I worked on it to begin with. I'm glad you caught that and named it!