I'm Michael "Valentine" Smith. Cofounder & senior instructor at CFAR back in the day. I've been in the rationalist scene since 2011 but mostly left in late 2018. To the extent that "post-rationalist" means anything, the term should probably apply to me.
You can find my non-LW writing on my Substack. You can also find my social media profiles via my Linktree.
My guess is self-deception is incentivized so long as other deception is incentivized
I think you're probably right.
Just to be clear, I wasn't trying to lay out a theory of all self-deception. I was trying to lay out a theory of one cause of at least one kind of self-deception (namely Newcomblike self-deception). I noticed that the problem seemed like it should be real, and that it has multiple solutions, Newcomblike self-deception being one of them. That setup had some interesting logic that panned out with some casual experimentation.
I kind of wonder what portion of self-deception is entirely about dealing with hostile telepaths. Is it 100%? Probably not. My intuitive impression is that it's well over 50% though. But even if it's just 10%, I don't think that affects the logic of the post whatsoever. It's a suggestion that thus-and-such type of self-deception arises from solving a particular problem that has other possible solutions, not a theory that all self-deception comes from this mechanism.
Regarding the mother whose glasses broke as a hostile telepath: I think a more charitable interpretation is that she's an ineffective socializer.
Someone else brought this up too. I think you're right. And perhaps I wrote that part in a misleading way!
That said, I don't think your very accurate point affects the example whatsoever.
The question isn't about what's really going on. The question is, from the perspective of the child, is he dealing with a hostile telepath? Which is to say, is he dealing with someone (a) who seems to be able to read his internal states and (b) whom he doesn't trust not to make his life worse based on what she finds? If the answer is "yes", he's faced with a hostile telepath problem, for which he needs some kind of solution.
It really doesn't matter whatsoever how badly that represents the mother's subjective state or motivations. The child doesn't have access to that. The child just knows that Mother is mad at him, is demanding that he "be sorry", and is checking. It's possible the mom isn't even mad! Maybe she's a perfect saint bringing pure love and care and understanding while gently guiding the child to the best of her ability. But if the child perceives the mother as a hostile telepath, then he needs a solution, and Newcomblike self-deception is one such solution.
It seems relevant that another reviewer gave the same pushback though. I wonder if I've been unclear, or if I'm missing something. Let me know if it seems to you like I've missed your point.
I highly doubt that this explains why the field and the associated risk predictions exist in the first place, or that the field's validity should be questioned on such grounds, but this seems to happen in the article if I'm not entirely misreading it.
Not entirely. It's a bit of a misreading. In this case I think the bit matters though.
(And it's an understandable bit! It's a subtle point I find I have a hard time communicating clearly.)
I'm trying to say two things:

1. I think estimates of the probability of doom are significantly elevated by how memetic evolution works, and you can't know how elevated they are until you seriously check.
2. Some portion of the fear of AI doom might be psychological projection, possibly all of it.

I think some people, such as you, are reacting really strongly to that second point, as though I'm taking a stand for AI risk being a non-issue and saying it's all psychological projection.
I'm saying the chance of that is nonzero, but close to zero. It's a more plausible hypothesis to me than I think it is to this community. But that's not because I'm going through the arguments that AI risk is real and finding refutations. It's because I've seen some shockingly basic things turn out to be psychological projection, and I don't think Less Wrong collectively understands that projection really can be that deep. I just don't see it accounted for in the arguments for doom.
But that's not the central point I'm trying to make. My point is more that I think the probability of doom is significantly elevated as a result of how memetic evolution works — and, stupidly, I think that makes doom more likely as a result of the "Don't hit the tree" phenomenon.
And maybe even more centrally, you cannot know how elevated the probability is until you seriously check for memetic probability boosters. And even then, how you check needs to account for those memetic influences.
I'm not trying to say that AI safety shouldn't exist as a field though.
From my point of view, there is already an overemphasis on psychological factors in the broader debate, and it would be desirable to get back to the object level.
Wow, you and I sure must be seeing different parts of the debate! I approximately only hear people talking about the object level. That's part of my concern.
I mean, I see some folk doing hot takes on Twitter about psychological angles. But most of those strike me as more like pot shots and less like attempts to engage in a dialogue.
This was a great steelmanning, and is exactly the kind of thing I hope people will do in contact with what I offer. Even though I don't agree with every detail, I feel received and like the thing I care about is being well enough held. Thank you.
Good call. I haven't been reading Less Wrong in enough detail for a while to pull this up usefully. My impression comes from in-person conversations plus Twitter interactions. Admittedly, the period when I encountered these terms most heavily in rationality circles was about a decade ago. But I'm not sure how much of that is due to my spending less time in rationality circles versus discourse norms moving on. I still encounter them almost solely from folk tied to LW-style rationality.
I don't recall hearing you use the terms in ways that bothered me this way, FWIW.
it still seems bad to advocate for exactly the wrong policy, especially one that doesn't make sense even if you turn out to be correct (as habryka points out in the original comment, many think 2028 is not really when most people expect AGI to have happened).
I'm super sensitive to framing effects. I notice one here. I could be wrong, and I'm guessing that even if I'm right you didn't intend it. But I want to push back against it here anyway. Framing effects don't have to be intentional!
It's not that I started with what I thought was a wrong or bad policy and tried to advocate for it. It's that given all the constraints, I thought that preregistering a possibility as a "pause and reconsider" moment might be the most effective and respectful. It's not what I'd have preferred if things were different. But things aren't different from how they are, so I made a guess about the best compromise.
I then learned that I'd made some assumptions that weren't right, and that determining such a pause point in a way that carries collective weight is much trickier. Alas.
But it was Oliver's comment that brought this problem to my awareness. At no point did I advocate for what I thought at the time was the wrong policy. I had hope because I thought folk were laying down some timeline predictions that could be falsified soon. Turns out, approximately nope.
i think you would have much more luck advocating for chilling today and citing past evidence to make your case.
Empirically I disagree. That demonstrably has not been within the reach of my skill to do effectively. But it's a sensible thing to consider trying again sometime.
Although I don't like comments starting with "your logic slipped" because it gives off passive-aggressive "you are stupid" vibes, I will reply.
Sorry, that's not how I meant it. I meant it more like "Oh, I think your foot slipped there, so if you take another step I think it won't have the effect you're looking for." We can all slip up. It's intended as a friendly note.
I agree that, on rereading it, it didn't come across that way.
So what you are saying is that yes, this time is different, just not today. It will definitely happen and all the doomerism is correct, but not on a short timeline, because ____ insert reasoning that is different than what the top AI minds are saying today.
Uh, no. That's not what I'm saying.
I'm saying something more like: if it turns out that doomerism is once again exaggerated, perhaps we should take a step back and ask what's creating the exaggeration instead of plowing ahead as we have been.
I love this direction of inquiry. It's tricky to get right because of lots of confounds. But I think something like it should be doable.
I love that this is a space where people care about this stuff.
I'm poorly suited to actually carry out these examinations on my own. But if someone does, or if someone wants to lead the charge and would like me to help design the study, I'd love to hear about it!
I think personally I'm more grim and more traumatized than most people around here, and other people are too happy and not neurotic enough to understand how fucked alignment research is; and yet I have much longer timelines than most people around here.
Two notes:

1. Overall I like your comment.
2. I agree with you.
My best guess is that the degree of doom is exaggerated but not fabricated. The exaggeration matters because, if it's there, it's warping perception of what to do about the real problem, so it would be ideal to address its cause, even though from the inside that's probably always going to feel like the wrong thing to focus on.
In the end, I think a healthy attitude looks more like facing the darkness hand-in-hand with joy in our hearts and music in our throats.
I just replied to another reviewer about this point. In short: I agree, I think it's worth noticing, and I also think the point is irrelevant. The question isn't whether the mother truly is hostile vs. aligned with the child. The question is whether the child experiences threat from an apparent telepath.
This point is related to footnote 4. I think it's unhelpful to ask whether the mother "actually is" a hostile telepath. Hostile telepathy is about a perception someone has of another. If you perceive someone (or something) as a hostile telepath, you need some solution to that problem. One possible solution is to discover that they are not, in fact, hostile. But if you don't converge on that solution, you'll need some other one.
As I mentioned to the other reviewer, it stands out to me that two people both zoomed in on the same objection. I'm not sure what's going on there. Let me know if I've missed your point?
Of course! My impression is that many (most?) math students don't get sucked into the Newcomblike self-deception pattern I was naming. But some do! You're pointing out an example of it not happening. If I were claiming that this happens for all math students, your point would totally refute mine. And to the degree you thought I was making that claim, or it came across as ambiguous about whether I was, I'm glad you brought it up! But my point wasn't that all math students encounter this. It's that some do. And I don't think it's super rare.
Yep. Key to why I worked on it to begin with. I'm glad you caught that and named it!