my guess:
I am not surprised to hear this but also, this is insane.
All the lab heads repeatedly and publicly claim that their work could cause human extinction and that superintelligence is within reach, yet a majority of people at their own labs don't take them seriously on this.
I'm somewhat confused about what causes a group of people who talk to each other every day, work on the same projects, observe the same evidence, etc., to come to such wildly different conclusions about the work they're doing together, and then be uninterested in resolving the disagreement.
Is there a taboo being enforced against discussing these disagreements inside the labs?
this is pretty normal? it's really hard for leadership to make employees care about or believe specific things. do you really think the average Amazon employee or whatever has strong opinions on the future of delivery drones? does the average Waymo employee have extremely strong beliefs about the future of self driving?
for most people in the world, their job is just a job. people obviously avoid working on things they believe are completely doomed, and tend to work on cool trendy things. but generally most people do not really have strong beliefs about where the stuff they're working on is going.
no specific taboo is required to ensure that people don't really iron out deep philosophical disagreements with their coworkers. people care about all sorts of other things in life. they care about money, they care whether they're enjoying the work, they care whether their coworkers are pleasant to be around, they care about their wife and kids and house.
once you have a company with more than 10 people, it requires constant effort to maintain culture. hiring is way harder if you can only hire people who are aligned, or if you insist on aligning people. if you grow very fast (and openai has grown very fast - it's approximately doubled every single year I've been here), it's inevitable that your culture will splinter. forget about having everyone on the same page; you're going to have entire little googletowns and amazontowns and so on of people who bring Google or Amazon culture with them and agglomerate with other recent transplants from those companies.
I agree with your point. I think this tendency is especially visible in global conglomerates with rigid personnel structures. Take Samsung, for example — a company that supplies memory semiconductors while working with Nvidia and TSMC in the AI space. Samsung’s influence is comparable to that of many Silicon Valley giants, but its HR system is still based on traditional corporate aptitude tests and academic pedigree.
They hire on a massive scale, and the problem isn’t really about whether someone believes in X-risk or not — sometimes promotions are limited or even employment is terminated simply because of age. In more flexible job markets, like the Bay Area, the issue you described seems much less pronounced.
I’m writing this from the perspective of Seoul, where the job market is very isolated and rigid, so my comment doesn’t represent the whole world. I just wanted to say that, in that sense, what you wrote might actually make more sense on a global level. On the other side of the planet, there are people who worry less about whether the AI they’re building might end the world, and more about how they’ll feed their families tomorrow.
VojtaKovarik expressed that beautifully through the concept of fiduciary duty, but from an individual’s point of view, it can be summed up more simply as: “One meal tomorrow matters more than the end of the world ten years from now.”
I think it’s notably abnormal precisely because it wasn’t the “default” equilibrium for OpenAI.
Like you mentioned earlier:
selective hiring is very real. lots of people who are xrisk pilled just refuse to join oai. people who care a lot often end up very stressed and leave in large part because of the stress.
and
most cases of xrisk people leaving are just because people felt sidelined/unhappy and chose to leave.
One model of this is "it's normal that people at any company don't have strong opinions about their work". Another model is "lots of people in various positions did in fact have strong opinions about this, given the stakes, and left".
If you send such strong signals about safety that people preemptively filter themselves out of the hiring pipeline, and the people who are already there with strong opinions on safety feel sidelined, then IMO the obvious interpretation is "you actively filtered against people with strong views on safety".
I appreciate your answer, it adds useful info.
do you really think the average Amazon employee or whatever has strong opinions on the future of delivery drones? does the average Waymo employee have extremely strong beliefs about the future of self driving?
I assume yes, a lot of employees do believe technical capabilities are further ahead than the rest of the public realizes, because they have insider knowledge. And they are aware they are accelerating it.
They might not deeply care about drones but they are aware what is going on.
With AI, you can't not care about ASI or extinction or catastrophe or dictatorship or any of the words thrown around, as they directly affect your life too.
once you have a company with more than 10 people, it requires constant effort to maintain culture. hiring is way harder if you can only hire people who are aligned, or if you insist on aligning people.
Then I expect OpenAI to splinter hard as we get closer to ASI.
With AI, you can't not care about ASI or extinction or catastrophe or dictatorship or any of the words thrown around, as they directly affect your life too.
I think it is important to distinguish between caring in the sense of "it hugely affects your life" and caring in the sense of, for example, "having opinions on it, voicing them, or even taking costlier actions". Presumably, if something affects you hugely, you should care about it in the second sense, but that doesn't mean people do. My uninformed intuition is that this would apply to most employees of AI companies.
Can you gesture at what you are basing this guess on?
Ignore this if you are busy, but I was also wondering: Do you have any takes on "do people get fired over acting on beliefs about xrisk", as opposed to "people getting fired over caring about xrisk"?
And perhaps on "whether people would get fired over acting on xrisk beliefs, so they don't act on them"? (Though this seems difficult to operationalise.)
as far as I'm aware, the only person who can be argued to have ever been fired for acting on beliefs about x risk is leopold, and the circumstances there are pretty complicated. since I don't think he's the only person to have ever acted on xrisk at oai to the extent he did, I don't think this is just because other people don't do anything about xrisk.
most cases of xrisk people leaving are just because people felt sidelined/unhappy and chose to leave. which is ofc also bad, but quite different.
I don't think this is just because other people don't do anything about xrisk.
Why is that? If someone went off and consistently worked on an agenda that was directly xrisk-related (one that didn’t contribute to short-term capabilities or product safety), you’re saying they wouldn’t get sidelined / not allocated resources / fired?
Another dynamic is that employees at AI companies often think that their AI company not going bankrupt will allow P(doom) reduction, at least via the classic mechanism of "having more power lets us do more good things" (e.g. advocating for good policies using the clout of a leading AI company, doing the cheap and important safety/security stuff that the counterfactual company would not even bother doing - and demonstrating that these cheap and important things are cheap and important, using AIs in net-good ways that other companies would have not bothered doing, not trying to spread dangerous ideologies, ...).
And it's unclear how incorrect this reasoning is. This seems to me like a sensible thing to do if you know the company will actually use its power for good - but it's worrisome that so many people in so many different AI companies think they are in an AI company that will do something better with the power than the company that would replace them if they went bankrupt.
I think that "the company is doing something first-order risky, but employees think it's actually good because going bankrupt would shift power towards other, more irresponsible actors" is at least as central as "the company is doing something risky to avoid going bankrupt, and employees would prefer that the company not do it, but [dynamic] prevents employees from stopping it".
But I would be curious to know which dynamics actually take place, how much each of them matters, and what the overall effect is.
I would be interested to know more about those too! However, I don't have any direct experience with the insides of AI companies, and I don't have any friends who do either, so I'm hoping that other readers of this post might have insights that they are willing to share.
For those who have worked for AI companies or have reliable info from others who have worked for an AI company, these are a few things I am especially curious about, categorised by the mechanisms mentioned in the post:
Selective hiring
Firing dissenters as a coordination problem
Compartmentalising undesirable information (and other types of information control)
Yep, these all seem relevant, and I would be interested in answers. (And my thanks to leogao for their takes.)
I would additionally highlight[1] the complication that even if a tendency is present -- say, selective hiring for a particular belief -- it might not work as explicitly as people knowingly paying attention to it. It could be implicitly present in something else ("culture fit"), or purely correlational, etc. I am not sure how best to deal with this.
FWIW, I think this concern is important, but we need to be very cautious about it, since one could always go "yeah, your results say nothing like this is going on, but the real mechanism is even more indirect". I imagine that fields like fairness & bias have to encounter this a lot, so there might be some insights there.
To be clear, your comment seems aware of this issue. I just wanted to emphasise it.
it might not work as explicitly as people knowingly paying attention to it. It could be implicitly present in something else ("culture fit")
Yes, that sounds plausible to me as well. I did not mention those because I found it much harder to think of ways to tell when those dynamics are actually in play.
If I understand this correctly:
FWIW, I think this concern is important, but we need to be very cautious about it, since one could always go "yeah, your results say nothing like this is going on, but the real mechanism is even more indirect".
I think you are gesturing at the same issue?
I imagine that fields like fairness & bias have to encounter this a lot, so there might be some insights there.
It makes sense to me that the implicit pathways for these dynamics would be an area of interest to the fields of fairness and bias. But I would not expect them to have any better tools for identifying causes and mechanisms than anyone else[1]. What kinds of insights would you expect those fields to offer?
To be clear, I have only a superficial awareness of the fields of fairness and bias.
I imagine that fields like fairness & bias have to encounter this a lot, so there might be some insights there.
It makes sense to me that the implicit pathways for these dynamics would be an area of interest to the fields of fairness and bias. But I would not expect them to have any better tools for identifying causes and mechanisms than anyone else[1]. What kinds of insights would you expect those fields to offer?
[Rephrasing to require less context.] Consider the questions "Are companies 'trying to' override the safety concerns of their employees?" and "Are companies 'trying to' hire fewer women or pay them less?". I imagine that both of these suffer from similar issues: (1) Even if a company is doing the bad thing, it might not be doing it through explicit means like having an explicit hiring policy. (2) At the same time, you can probably always postulate one more level of indirection, and end up going on a witch hunt even in places where there are no witches.
Mostly, it just seemed to me that fairness & bias might be the biggest examples of where (1) and (2) are in tension. (Which might have something to do with there being both strong incentives to discriminate and strong taboos against it.) So it seems more likely that somebody there would have insights about how to fight it, compared to other fields like physics or AI, and perhaps even more than economics and politics. (Of course, "somebody having insights" is consistent with "most people are doing it wrong".)
As to what those insights would look like: I don't know :-(. It could just be a collection of clear historical examples where we definitely know that something bad was going on but naive methods X, Y, Z didn't show it, together with some opposite examples where nothing was wrong but people kept looking until they found some signs? Plus some heuristics for how to distinguish these.
tl;dr: If an AI company simply follows its financial interests, then going bankrupt and destroying the world are equivalent outcomes for it, and mechanisms such as selective hiring and coordination problems can push it towards the risky choice even if every single employee would prefer otherwise.
Epistemic status & disclaimers: The mechanisms I describe definitely play some role in real AI companies. But in practice, there are more things going on simultaneously, and this post is not trying to give a full picture.[2][3] Also, none of this is meant to be novel; I am just putting existing things together and applying them to AI risk.
Let's leave aside the question of how real companies act. Instead, we start with a simple observation: if all a company cared about were its financial interests, then bankruptcy and the world getting destroyed would be equivalent outcomes. Unsurprisingly, this translates into undesirable decisions in various situations.
For example, consider an over-simplified scenario where an AI company somehow has precisely these two options[4]: Option A, deploy its AI, which carries some chance of destroying the world but keeps the company afloat if the world survives; or Option B, hold off on deployment and go bankrupt.
We could imagine that this corresponds to racing ahead (which risks causing the end of the world) and taking things slowly (which leads to a loss of revenue). But to simplify the discussion, we make the assumption that Option A has no benefit (beyond the company avoiding bankruptcy) and everybody knows this. In this situation, if the company were following its financial interests (and knew the numbers), it should take Option A -- deploy the AI and risk destroying the world.
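To make the accounting concrete, here is a minimal sketch (not from the original post; the probability and the dollar figures are invented purely for illustration). The point it illustrates is that a purely financial ledger scores bankruptcy and a destroyed world both as zero, so Option A comes out ahead financially, while a tally that also counts the rest of the world prefers Option B.

```python
# Toy comparison of Option A and Option B under two accounting perspectives.
# All numbers are made up purely for illustration.

P_DOOM = 0.10                  # assumed chance that deploying destroys the world
COMPANY_VALUE_IF_OK = 1_000    # company's payoff if it deploys and the world survives
WORLD_VALUE = 1_000_000        # stand-in for everything lost along with the world

def company_payoff(option):
    """Expected payoff as seen by a purely financial actor:
    bankruptcy and a destroyed world are both scored as zero."""
    if option == "A":  # deploy
        return (1 - P_DOOM) * COMPANY_VALUE_IF_OK + P_DOOM * 0
    return 0           # B: don't deploy -> go bankrupt

def total_payoff(option):
    """Expected payoff if the rest of the world is counted too."""
    if option == "A":
        return (1 - P_DOOM) * (COMPANY_VALUE_IF_OK + WORLD_VALUE)
    return WORLD_VALUE  # B: company gone, world intact

for option in ("A", "B"):
    print(option, company_payoff(option), total_payoff(option))

# Financially, A (~900) beats B (0); counting the world, B (1,000,000)
# beats A (~900,900). The company's ledger cannot see the difference
# between "we went bankrupt" and "there is no world left".
```

Of course, real decisions are nothing like this clean; the numbers only restate the equivalence from the previous paragraph.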
However, companies are made of people, who might not be happy with risking the world. Shouldn't we expect that they would decide to take Option B instead? I am going to argue that this might not necessarily be the case. That is, that there are ways in which the company might end up taking Option A even if every employee would prefer Option B instead.
It shouldn't come as a surprise that companies are good at getting people to act against their preferences. The basic example of this is paying people off: by giving people a salary, we override their preference for staying home rather than working. Less benignly, an AI company might use a similar tactic to override people's reluctance to gamble with the world -- bribe them with obscene amounts of money, and if that is not enough, dangle the chance to shape the future of the universe. However, accepting these bribes is morally questionable to say the least, and might not work on everybody -- and my claim was that AI companies might act irresponsibly even if all of their employees are good people. So later in this text, we will go over a few other mechanisms.
To preempt a possible misunderstanding: Getting a company to act like this does not require deliberate effort[5] by individuals inside the company. Sure, things might go more smoothly if a supervillain CEO can have a meeting with mustache-twirling HR personnel to figure out the best ways to get their employees to go along with profit-seeking. And to some extent, fiduciary duty might imply the CEO should be doing this. But mostly, I expect these things to happen organically. Many of the mechanisms will be part of the standard package for how to structure a modern business. Because companies compete and evolve over time, we should expect the most successful ones to have "adaptations" that help their bottom line.
So, what are some of the mechanisms that could help a company to pursue its financial interests even when they are at odds with what employees would naively prefer?
(To reiterate, this list is meant as an existence proof rather than an accurate picture of the key dynamics responsible for the behaviour of AI companies.)
I described some dynamics that could plausibly take place inside AI companies -- that probably do take place there. But I would be curious to know which dynamics actually take place, how much each of them matters, and what the overall effect is. (For all I know, this could be towards responsible behaviour.) Looking at the actions that the companies have taken so far gives some information, but it isn't clear to me how, say, lobbying behaviour generalises to decisions about deploying superintelligence.
Why care about this? Partly, this just seems fascinating on its own. Partly, this seems important to understand if somebody wanted to make AI companies "more aligned" with society. Or it might be that AI companies are so fundamentally "misaligned" that gentle interventions are never going to be enough -- but if that was the case, it would be important to make a clearer case that this is so. Either way, understanding this topic better seems like a good next step. (If you have any pointers, I would be curious!)
Finally, I get the impression that there is general reluctance to engage with the possibility that AI companies are basically "pure evil" and should be viewed as completely untrustworthy.[9] I am confused about why this is. But one guess is that it's because some equate "the company is evil" with "the employees are bad people". But this is incorrect: An AI company could be the most harmful entity in human history even if every single employee was a decent person. We should hesitate to accuse individual people, but this should not prevent us from recognising that the organisation might be untrustworthy.
When I mention following financial interests, I just mean the vague notion of seeking profits, revenue, shareholder value, influence, and things like that (and being somewhat decent at it). I don't think the exact details matter for the point of this post. I definitely don't mean to imply that the company acts as a perfectly rational agent or that it is free of internal inefficiencies such as those described in Immoral Mazes or Recursive Middle Manager Hell.
More precisely, there will be various dynamics in play. Some of these push in the direction of following profits, others towards things like doing good or following the company's stated mission, and some just cause internal inefficiencies. I expect that the push towards profits will be stronger when there is stronger competition and higher financial stakes. But I don't have a confident take on where the overall balance lies. Similarly, I don't claim that the mechanisms I give here as examples (selective hiring and miscoordination) are the most important ones among those that push towards profit-following.
Related to footnotes 1 and 2: Richard Ngo made some good points about why the framing I adopt here is not the right one. (His post Power Lies Trembling is relevant and offers a good framing of dynamics inside countries -- but probably companies too.) Still, I think the dynamics I mention here are relevant too, and in the absence of a better pointer to a discussion of them, this point was cheap enough to write.
The scenario is over-simplified and unrealistic, but this shouldn't matter too much. The same dynamics should show up in many other cases as well.
The post Unconscious Economics feels relevant here.
I am not sure how exactly this works and how it interacts with "negative externalities" such as "unclear risk of extinction". There definitely is a threat of getting sued over this, but I am not sure how much this really matters, as opposed to serving as a convenient excuse to not rock the boat.
This will be costly for the company as well. If nothing else, it causes delays, and they will be replacing you with somebody less skilled (otherwise they would have hired that person already). So I advocate for not complying with things you are against. But despite this, the coordination problems are definitely real and difficult to solve.
To reiterate, this does not require deliberate plotting by anybody inside the company. You don't need to actually fire those people; it should be enough to incidentally underfund them, or perhaps converge on a company culture where they leave on their own.
I am not saying they definitely are. But I do view this as plausible enough that acting on it would seem reasonable.