I don't use LessWrong much anymore. Find me at www.turntrout.com.
My name is Alex Turner. I'm a research scientist at Google DeepMind on the Scalable Alignment team. My views are strictly my own; I do not represent Google. Reach me at alex[at]turntrout.com
Maybe "psychology" is just the wrong word to use here, because I think it conjures up ideas of anthropomorphism, when in fact I read you as simply making an argument that the processes interior to an AI system matter as to whether and how an AI might try to instrumentally converge towards some goals.
I agree. I welcome suggestions for alternate titles, if anyone has any! I tried myself but didn't find anything immediately. "No instrumental convergence without considering how the AI will make decisions" isn't exactly the snappiest title.
EDIT: I actually think "psychology" is pretty good here, despite some flaws.
Like, imo, "most programs which make a mind upload device also kill humanity" is (if true) an interesting and somewhat compelling first claim to make in a discussion of AI risk, to which the claim "but one can at least in principle have a distribution on programs such that most programs which make mind uploads no not also kill humans" alone is not a comparably interesting or compelling response.
I disagree somewhat, but—whatever the facts about programs—at least it is not appropriate to claim "not only do most programs which make a mind upload device also kill humanity, it's an issue with the space of programs themselves, not with the way we generate distributions over those programs." That is not true.
It is at least not true "in principle," and perhaps it is not true for more substantial reasons (depending on the task you want and its alignment tax, psychology becomes more or less important in explaining the difficulty, as in the examples I gave). On this, we perhaps agree?
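To spell out the "in principle" point with a minimal sketch (the notation $S$, $B$, $\mu$ is introduced here just for illustration, and I'm assuming at least one safe upload-producing program exists): let $S$ be the set of programs that produce a mind upload, and $B \subseteq S$ the subset that also kill humanity. Even if $B$ covers "most" of $S$ under some default weighting, say a simplicity prior $\pi$ with $\pi(B \mid S) \approx 1$, one is free to choose a different distribution $\mu$ supported on $S \setminus B$, in which case

$$\mu(B \mid S) = \frac{\mu(B)}{\mu(S)} = 0.$$

So "most upload-producing programs kill humanity" is a claim about a particular measure over programs, not about the space of programs by itself.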
I think the problem of "may suggest a potentially suboptimal intervention" is less severe than "isn't descriptive." Plus, I think we're going to see "self-fulfilling alignment" be upsampled after the recent positive results. :)
When talking about "self-fulfilling misalignment", "hyperstition" is a fun name but not a good name which actually describes the concept to a new listener. (In this sense, the name has the same problem as "shard theory" --- cool but not descriptive unless you already know the idea.) As a matter of discourse health, I think people should use "self-fulfilling {misalignment, alignment, ...}" instead.
Based. Thank you for your altruism, Sheikh. :)
Last week, I took the 10% giving pledge to donate at least 10% of my income to effective charities, for the rest of my life. I encourage you to think carefully and honestly about what you can do to improve this world. Maybe you should take the pledge yourself.
Yes, I have left many comments on Nate's posts which I think he would agree were valuable. By blocking me, he confirmed that he was not merely moving (supposedly) irrelevant information, but retaliating for sharing unfavorable information.
I had spent nearly two years without making any public comments regarding Nate's behavior, so I don't see any rational basis for him to expect I would "hound" him in future comment sections.
Different people have different experiences. Some of Nate's coworkers I interviewed felt just fine working with him, as I have mentioned.
I would share your concern if TurnTrout or others were replying to everything Nate published in this way. But well... the original comment seemed reasonably relevant to the topic of the post and TurnTrout's reply seemed relevant to the comment. So it seems like there's likely a limiting principle here.
I think there is a huge limiter. Consider that Nate's inappropriate behavior towards Kurt Brown happened in 2017 & 2018 but resulted in no consequences until 5 and a half years later. This suggests that victims are massively under-supplying information due to high costs. We do not have an over-supply problem.
Let me share some of what I've learned from my own experience and reflection over the last two years, and from speaking with ~10 people who recounted their own experiences.
Speaking out against powerful people is costly. Due to how tight-knit the community is, speaking out may well limit your professional opportunities, get you uninvited to crucial networking events, and reduce your chances of getting funding. Junior researchers may worry about displeased moderators thumbing the scales against future work they might want to share on the Alignment Forum. (And I imagine that junior, vulnerable community members are more likely to be mistreated to begin with.)
People who come forward will also have their motivations scrutinized. Were they being "too triggered"? This is exhausting, especially because (more hurt) -> (more trauma) -> (less equanimity). However, LessWrong culture demands equanimity while recounting trauma. If you show signs of pain or upset, or even verbally admit that you're upset while writing calmly, you face accusations of irrationality. Alternatively, observers might invent false psychological narratives --- claiming a grievance is actually about a romantic situation or a personal grudge --- rather than engaging with the specific evidence and claims provided by the person who came forward.
But if abuse actually took place, then the victim is quite likely to feel upset! What sense, then, does it make to penalize people because they are upset, when that's exactly what you'd see from many people who were abused? [1]
This irrational, insular set of incentives damages community health and subsidizes silence, which in turn reduces penalties for abuse.
Certainly, people should write clearly, honestly, and without unnecessary hostility. However, I'm critiquing the norm of "dismiss people who are mad or upset, even if they communicate appropriately." ↩︎
More evidence that AIs are fundamentally human-like and non-alien!
(Not sure how serious I'm being, but it's non-zero)