I shouldn't be here, but I can't stay away. Systems which produce as output intellectual content in an ongoing fashion run the risk of low-entropy sink states without my intervention. I'm keeping an eye on you, because I care deeply about the rationalist program. 

Frankly I have an obsession with playing games with the rationalist community and its members. I spent a long time trying to do so maximally cooperatively, pursuing a career in AI safety research; perfectionism was paralyzing, and I got stuck at a ladder step in this career path in a very painful way. I then tried to stay away for years; LessWrong is an attractor I was not able to ignore, and this manifested as internally maligning the community and probably doing downstream subtle harm rather than the intended causal separation.

My current belief is that indulging myself with the intention of some non-maximal cooperation (small but nonzero cosine distance; imperfect alignment) is an effective equilibrium. The first paragraph in this bio is a rationalization of this behavior that I partially believe, and I intend to follow a script like this–stirring pots and making messes only insofar as it seems plausibly like valuable temperature-raising intervention in our (roughly) shared search for epistemic progress.




This is a bit of an odd time to start debating, because I haven't explicitly stated a position, and it seems we're in agreement that that's a good thing[1]. Calling this to attention because

  1. You make good points.
  2. The idea you're disagreeing with departs from anything I would endorse at several points within its first two sentences.

Speaking first to this point about culture wars: that all makes sense to me. By this argument, "trying to elevate something to being regulated by Congress by turning it into a culture war is not a reliable strategy" is probably a solid heuristic.

I wonder whether we've lost the context of my top-level comment. The scope (the "endgame") I'm speaking to is moving alignment into the set of technical safety issues that the broader ML field recognizes as its responsibility, as has happened with fairness. My main argument is that a typical ML scientist/engineer tends not to use systemic thought to adjudicate which moral issues are important, and this is instead "regulated by tribal circuitry" (to quote romeostevensit's comment). This does not preclude their having requisite technical ability to make progress on the problem if they decide it's important.

As far as strategic ideas, it gets hairy from there. Again, I think we're in agreement that it's good for me not to come out here with a half-baked suggestion[1].


There's a smaller culture war, a gray-vs-blue one, that's been raging for quite some time now, in which more inflamed people argue about punching Nazis and more reserved people argue about which matters more: protecting specific marginalized groups, or protecting discussion norms and standards of truth.

Here's a hypothetical question that should bear on strategic planning: suppose you could triple the proportion of capable ML researchers who consider alignment to be their responsibility as an ML researcher, but all of the new population are on the blue side of zero on the protect-groups-vs-protect-norms debate. Is this an outcome more likely to save everyone?

  • On the plus side, the narrative will have shifted massively away from a bunch of the failure modes Rob identified in the post (this is by assumption: "consider alignment to be their responsibility").
  • On the minus side, if you believe that LW/AF/EA-style beliefs/norms/aesthetics/ethics are key to making good progress on the technical problems, you might be concerned about alignment researchers of a less effective style competing for resources.

If no, is there some other number of people who could be convinced in this manner such that you would expect it to be positive on AGI outcomes?

  1. ^

    To reiterate:

    1. I expect a large portion of the audience here would dislike my ideas about this for reasons that are not helpful.
    2. I expect it to be a bad look externally for it to be discussed carelessly on LW.
    3. I'm not currently convinced it's a good idea, and for reasons 1 and 2 I'm mostly deliberating it elsewhere.

Nothing like taking over the world. From a certain angle it’s almost opposite to that, relinquishing some control.

The observations in my long comment suggest to me some different angles for how to talk about alignment risk. They are part of a style of discourse that is not well-respected on LessWrong, and this being a space where that is pushed out is probably good for the health of LessWrong. But the state of broader popular political/ethical discourse puts a lot of weight on these types of arguments, and they’re more effective (because they push around so much social capital) at convincing engineers they have an external responsibility.

I don’t want to be too specific with the arguments unless I pull the trigger on writing something longer form. I was being a little cheeky at the end of that comment and since I posted it I’ve been convinced that there’s more harm in expressing that idea ineffectively or dismissively than I’d originally estimated (so I’m grateful to the social mechanism that prevented me from causing harm there!).

A success story would look like building a memeplex that is entirely truthful, even if it’s distasteful to rationalists, and putting it out into the world, where it would elevate alignment to the status that fairness/accountability have in the ML community. This would be ideologically decentralizing to the field; in this success story I don’t expect the aesthetics of LessWrong, Alignment Forum, etc to be an adequate home for this new audience, and I would predict something that sounds more like that link from Percy Liang becoming the center of the conversation. It would be a fun piece of trivia that this came from the rationalist community, and look at all this other crazy stuff they said! We might hear big names in AI say of Yudkowsky what Wittgenstein said of Russell:

Russell's books should be bound in two colours…those dealing with mathematical logic in red – and all students of philosophy should read them; those dealing with ethics and politics in blue – and no one should be allowed to read them.

This may be scary because it means the field would be less aligned than it is currently. My instinct says this is a kind of misalignment that we’re already robust to: ideological beliefs in other scientific communities are often far more heterogeneous than those of the AI alignment community. It may be scary because the field would become more political, which may end up lowering effectiveness, contra my hypothesis that growing the field this way would be effective. It may be scary because it’s intensely status-lowering in some contexts for anyone who would be reading this.

I’m still on the fence about whether this would be good on net. Every time I see a post about the alignment discourse among broader populations, I interpret it as people interested in having some of this conversation, and I’ll keep probing.

tl;dr: most AI/ML practitioners make moral decisions based on social feedback rather than systems of moral thought. Good arguments don't do much here.

Engineers and scientists, most of the time, do not want to think about ethics in the context of their work, and begrudgingly do so to the extent that they are socially rewarded for it (and socially punished for avoiding it). See here.


I wrote in another comment about my experience in my early research career at a FAANG AI lab trying to talk to colleagues about larger scale risks. Granted we weren't working on anything much like AGI at the time in that group, but there are some overlapping themes here.

What I built up to in that story was that bringing up anything about AGI safety in even a work-adjacent context (i.e. not a meeting about a thing, but, say, over lunch) was a faux pas:

Distracting: We have something else we're working on, and that is a deep question, and you probably could push hard enough on me to nerd snipe me with it if I don't put up barriers.

Rude: It implies that the work we're doing here, which we all care deeply about (right?) is problematic for reasons well outside our models of who we are and what we're responsible for, and challenging that necessitates a bunch of complicated shadow work.

Low status: Wait, are you one of those LessWrong people? I bet you're anti-woke and think James Damore shouldn't have been fired, huh? And you're so wound up in your privilege bubble that you think this AGI alarmism is more important than the struggles of real underprivileged people who we know actually exist, here, now? Got it.

I'm connecting this in particular to point 6 in your second list ("that's not my job").


David Chapman writes, in a fairly offhand footnote in Vividness, with reference to stages in Kegan's constructive developmental framework:[1]

[In technical domains] it’s common to operate at stage 3 in emotional and relational domains, although at stage 4 in cognitive ones.

I think this is a useful map, if a bit uncharitable, where morality and politics go in the "emotional and relational domains" bucket. Go hang out at Brain or FAIR, especially the parts farther away from AGI conversations, especially among IC-level scientists and engineers, and talk to people about anything controversial. Unless you happen across someone's special interest and they have a lot to say about shipping logistics in Ukraine or something, you're generally going to encounter more or less the same views that you would expect to find on Twitter (with some neoliberal-flavored cope about the fact that FAANG employees are rich; see below). Everyone will get very excited to go participate in a protest on Google campus about hiring diversity or pay inequality or someone's toxic speech who needs to be fired, because they are socially embedded in a community in which these are Things That Matter.

Look into what "AI ethics" means to most practitioners and generally the same pattern emerges: people who work on AI are very technically proficient at working on problems around fairness and accountability and so on. The step of actually determining which ethical problems they should be responsible for is messier. That step is what I think Rob's post is primarily about, and I think it is better understood using Chapman's statement about technical folks operating at stage 3 outside of technical contexts.


Here's some fresh AI ethics controversy: Percy Liang's censure of Yannic Kilcher for his GPT-4chan project. From the linked form (which is an open letter soliciting more signatures and boasting an all-star cast after being up for 24 hours), emphasis mine:

Yannic Kilcher's deployment of GPT-4chan is a clear example of irresponsible practice. GPT-4chan is a language model that Kilcher trained on over three million 4chan threads from the Politically Incorrect /pol/ board, a community full of racist, sexist, xenophobic, and hateful speech that has been linked to white-supremacist violence such as the Buffalo shooting last month. He then used GPT-4chan to generate and deceptively post over 30,000 posts on 4chan mimicking the hateful comments it was trained on without identifying the model as a bot. Kilcher now claims that the release of “the most horrible model on the internet” was “a prank and light-hearted trolling.”

It is possible to imagine a reasonable case for training a language model on toxic speech, for example, to detect and understand toxicity on the internet, or for general analysis. However, Kilcher’s decision to deploy this bot does not meet any test of reasonableness. His actions deserve censure. He undermines the responsible practice of AI science.

You may feel one way or another about this; personally if I put bigger AI safety questions aside and think about this I feel ambivalent, unsure, kind of icky about it.

But my point here is not to debate the merits of this open letter. My point is that this is what the field connects to as their own moral responsibility. Problems that their broader community already cares about for non-technical and social-signaling-y reasons, but that their technical work undeniably affects. I don't see this even as moral misalignment in the most obvious sense, rather that engineers and scientists, most of the time, do not want to think about ethics in the context of their work, and begrudgingly do so to the extent that they are socially rewarded for it (and socially punished for avoiding it).

In trying to reach these folks, convincing arguments are a type error. Convincing arguments require systematic analysis, a framework for analyzing moral impact, which the vast majority of people, even many successful scientists and engineers, don't really use to make decisions. Convincing arguments are for setting hyperparameters and designing your stack and proving that your work counts as "responsible AI" according to the current consensus on what that requires. Social feedback is the source of an abstract moral issue attaining "responsible AI" status, actually moving that consensus around.


I spent some years somewhat on the outside, not participating in the conversation, but seeing it bleed out into the world. Seeing what that looks like from a perspective like what I'm describing. Seeing how my friends, the ones I made during my time away from the Bay Area and effective altruism and AI alignment, experienced it. There's a pervasive feeling that the world is dominated by "a handful of bug-eyed salamanders in Silicon Valley" and their obsessions with living forever and singularity myths and it's a tragic waste of resources that could be going to helping real people with real problems. Even this description is uncharacteristically charitable.

This is not the perspective of a Google engineer, of course. But I think that this narrative has a much stronger pull to a typical Google engineer than any alignment argument. "You're culpable as part of the Big Evil Capitalist Machine that hurts minorities" is more memetically powerful than "A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure." This is probably counterintuitive in the frame of a conversation on LessWrong. I think it's right.


There's a pithy potential ending to this comment that keeps jumping into my mind. It feels like both an obviously correct strategic suggestion and an artistically satisfying bow to tie. Problem is it's a pretty unpopular idea both among the populations I'm describing here and, I think, on LessWrong. So I'm gonna just not say it, because I don't have enough social capital here at time of writing to feel comfortable taking that gambit.

I'd love it if someone else said it. If no one does, and I acquire sufficient social capital, maybe then I will.

  1. ^

    Stage 3 is characterized by a sense of self and view being socially determined, whereas in stage 4 they are determined by the workings of some system (e.g. ~LW rationality; think "if I say something dumb people will think less of me" as stage 3 relating to rationality as community norms, versus "I think that preferring this route to more traditional policy influence requires extreme confidence about details of the policy situation" as stage 4 usage of the systems of thought prescribed by the framework).

Answer by outerloper340

When I worked in a FAANG research job, my experience was that it was socially punishable to bring up AI alignment research in just about any context, except where it was relevant to the team's immediate mission: for example, robustness on the scale required for medical decisions (a much smaller scale than AGI ruin, but a notably larger scale, in the sense of errors being costly, than most deep learning systems in production use at the time).

I find that in some social spaces, Rationality/EA-adjacent ones in particular, it's seen as distracting, rude, and low status to emphasize a hobby horse social justice issue at the expense of whatever else is being discussed. This is straightforward when "whatever else is being discussed" is AI alignment, which the inside view privileges roughly as "more important than everything else, with vague exceptions when the mental health of high-value people who might otherwise do productive work on the topic is at stake."

On a medical research team, I took a little too long to realize that I'd implicitly bought into a shared vision of what's important. We were going to save lives! We weren't going to cure cancer–everyone falls for that trap, aiming too high. We're working on the ground, saving real people, on real timescales. Computer vision can solve the disagreement-among-experts problem in all sorts of medical classification problems, and we're here to fight that fight and win.

So you've gathered a team of AI researchers, some expert, some early-career, to finally take a powerful stab at the alignment problem. A new angle, or more funding, or the right people in the room, whatever belief of comparative advantage you have that inspires hope beyond death with dignity. And you have someone on your team who deeply cares about a complicated social issue you don't understand. Maybe this is their deepest mission, and they see this early-engineer position at your new research org as a stepping stone toward the fairness and accessibility team at Brain that's doing the real work. They do their best to contribute in the team's terms of what's valuable, and they censor themselves constantly, waiting for the right moment to make the pivotal observation that there's not a single cis woman in the room, or that the work we're doing here may be building a future that's even more hostile toward people with developmental disabilities, or this adversarial training scheme has some alarming implications when you consider that the system could learn race as a feature even if we exclude it from the dataset, or something.

I think this is a fair analogue to my situation and, I expect, more broadly to that of people already doing AI research toward a goal other than alignment. Bringing up alignment in that setting is

  • Distracting: We have something else we're working on, and that is a deep question, and you probably could push hard enough on me to nerd snipe me with it if I don't put up barriers.
  • Rude: It implies that the work we're doing here, which we all care deeply about (right?) is problematic for reasons well outside our models of who we are and what we're responsible for, and challenging that necessitates a bunch of complicated shadow work.
  • Low status: Wait, are you one of those LessWrong people? I bet you're anti-woke and think James Damore shouldn't have been fired, huh? And you're so wound up in your privilege bubble that you think this AGI alarmism is more important than the struggles of real underprivileged people who we know actually exist, here, now? Got it.

I'm being slightly unfair in implying that these are literally interactions I had with real people in the industry. This is more representative of my experiences online and in other spaces with less of a backdrop of professional courtesy. At [FAANG company] these interactions were subtler.


This story is meant to provide answers to your questions 1 and 2. As far as question 3 and making a change, I'm bullish on narratives, aesthetics, anthropology and the like as genuine interventions upstream of AI safety. We're in a social equilibrium where only certain sorts of people can move into AI safety without seriously disrupting the means by which their social needs are met. There are many wonderful people in that set, but it is relatively quite small compared to the set of people who, if they were convinced to genuinely try, could contribute meaningfully.

I would guess this doesn't appear to qualify for bonus points for being reasonably low-hanging. I come from an odd place though: personally sufficiently traumatized by my experiences in AI research that in practical terms contributing there is more or less off limits for me for the time being, yet compelled by AGI ruin narratives and experienced with substantial relevant technical background. So at least for me, this is the way forward.