my guess:
Can you gesture at what you are basing this guess on?
Ignore this if you are busy, but I was also wondering: Do you have any takes on "do people get fired over acting on beliefs about xrisk", as opposed to "do people get fired over caring about xrisk"?
And perhaps on "whether people would get fired over acting on xrisk beliefs, so they don't act on them"? (Though this seems difficult to operationalise.)
as far as I'm aware, the only person who can be argued to have ever been fired for acting on beliefs about xrisk is leopold, and the circumstances there are pretty complicated. since I don't think he's the only person to have ever acted on xrisk at oai to the extent he did, I don't think this is just because other people don't do anything about xrisk.
most cases of xrisk people leaving are just because people felt sidelined/unhappy and chose to leave. which is ofc also bad, but quite different.
Another dynamic is that employees at AI companies often think that keeping their AI company from going bankrupt will allow P(doom) reduction, at least via the classic mechanism of "having more power lets us do more good things" (e.g. advocating for good policies using the clout of a leading AI company; doing the cheap and important safety/security work that the counterfactual company would not even bother with, and demonstrating that this work is cheap and important; using AIs in net-good ways that other companies would not have bothered with; not trying to spread dangerous ideologies; ...).
And it's unclear how incorrect this reasoning is. It seems like a sensible thing to do if you know the company will actually use its power for good -- but it's worrisome that so many people in so many different AI companies believe they are in the AI company that will use its power better than whichever company would take its place if it went bankrupt.
I think that "the company is doing sth first-order risky but employees think it's actually good because going bankrupt would shift power towards some other more irresponsible actors" is at least as central as "the company is doing something risky to avoid going bankrupt and employees would prefer that the company would not do it, but [dynamic] prevents employees from stopping it".
But I would be curious to know which dynamics actually take place, how much each of them matters, and what the overall effect is.
I would be interested to know more about those too! However, I don't have any direct experience with the insides of AI companies, and I don't have any friends who do either, so I'm hoping that other readers of this post might have insights that they are willing to share.
For those who have worked for AI companies or have reliable info from others who have worked for an AI company, these are a few things I am especially curious about, categorised by the mechanisms mentioned in the post:
Selective hiring
Firing dissenters as a coordination problem
Compartmentalising undesirable information (and other types of information control)
Yep, these all seem relevant, and I would be interested in answers. (And my thanks to leogao for their takes.)
I would additionally highlight[1] the complication that even if a tendency is present -- say, selective hiring for a particular belief -- it might not operate through people knowingly paying attention to it. It could be implicitly present in something else ("culture fit"), or purely correlational, etc. I am not sure how best to deal with this.
FWIW, I think this concern is important, but we need to be very cautious with it, since one could always go "yeah, your results say nothing like this is going on, but the real mechanism is even more indirect". I imagine that fields like fairness & bias have to deal with this a lot, so there might be some insights there.
To be clear, your comment seems aware of this issue. I just wanted to emphasise it.
tl;dr:
Epistemic status & disclaimers: The mechanisms I describe definitely play some role in real AI companies. But in practice, there are more things going on simultaneously, and this post is not trying to give a full picture.[2][3] Also, none of this is meant to be novel; I am just putting existing ideas together and applying them to AI risk.
Let's leave aside the question of how real companies act. Instead, we start with a simple observation: if all a company cared about were financial interests, bankruptcy and the world getting destroyed would be equivalent. Unsurprisingly, this translates into undesirable decisions in various situations.
For example, consider an over-simplified scenario where an AI company somehow has precisely these two options[4]:
We could imagine that this corresponds to racing ahead (which risks causing the end of the world) and taking things slowly (which leads to a loss of revenue). But to simplify the discussion, we assume that Option A has no benefit beyond its financial one, and everybody knows this. In this situation, if the company were following its financial interests (and knew the numbers), it should take Option A -- deploy the AI and risk destroying the world.
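To make the asymmetry concrete, here is a toy calculation (the symbols are purely illustrative and not estimates of anything real): write V for the financial value of the company if it stays solvent, and p for the probability that deploying the AI destroys the world. From a purely financial perspective,

$$
\underbrace{(1-p)\cdot V + p\cdot 0}_{\text{expected financial value of Option A}} \;>\; \underbrace{0}_{\text{financial value of Option B (bankruptcy)}} \quad \text{for any } p < 1.
$$

A destroyed world shows up on the company's books only as "the company's value goes to zero" -- which is exactly what bankruptcy already does -- so no value of p below 1 can make Option B look better financially.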
However, companies are made of people, who might not be happy about risking the world. Shouldn't we expect them to take Option B instead? I am going to argue that this is not necessarily the case -- that there are ways in which the company might end up taking Option A even if every employee would prefer Option B.
It shouldn't come as a surprise that companies are good at getting people to act against their preferences. The basic example of this is paying people off: by giving people a salary, a company overrides their preference for staying home rather than working. Less benignly, an AI company might use a similar tactic to override people's reluctance to gamble with the world -- bribe them with obscene amounts of money, and if that is not enough, dangle the chance to shape the future of the universe. However, accepting these bribes is morally questionable to say the least, and might not work on everybody -- and my claim was that AI companies might act irresponsibly even if all of their employees are good people. So later in this text, we will go over a few other mechanisms.
To preempt a possible misunderstanding: Getting a company to act like this does not require deliberate effort[5] by individuals inside the company. Sure, things might go more smoothly if a supervillain CEO can have a meeting with mustache-twirling HR personnel to figure out the best ways to get employees to go along with profit-seeking. And to some extent, fiduciary duty might imply the CEO should be doing this. But mostly, I expect these things to happen organically. Many of the mechanisms will be part of the standard package for how to structure a modern business. Because companies compete and evolve over time, we should expect the most successful ones to have "adaptations" that help their bottom line.
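As a toy illustration of this selection story (my own sketch; the numbers and the assumed profit-caution trade-off are made up for the example), consider a population of firms whose level of caution is fixed at founding, where less profitable firms go bankrupt more often and get replaced by new entrants:

```python
import random

# Toy selection model, assuming a made-up trade-off where more "caution"
# means lower profit and therefore a higher chance of going bankrupt.
# No firm ever changes its own behaviour; only selection pressure acts.

def simulate(n_firms=1000, n_rounds=20, seed=0):
    rng = random.Random(seed)
    # Each firm is founded with a fixed caution level drawn uniformly from [0, 1].
    firms = [rng.random() for _ in range(n_firms)]
    for _ in range(n_rounds):
        survivors = []
        for caution in firms:
            # Assumed trade-off: more caution -> lower profit -> higher
            # probability of going bankrupt this round.
            p_bankrupt = 0.1 + 0.3 * caution
            if rng.random() > p_bankrupt:
                survivors.append(caution)
        # Bankrupt firms are replaced by fresh entrants with random caution,
        # keeping the market size constant.
        firms = survivors + [rng.random() for _ in range(n_firms - len(survivors))]
    return sum(firms) / len(firms)

if __name__ == "__main__":
    # Random founding alone would give an average caution of about 0.5;
    # after selection, the surviving market is noticeably less cautious.
    print("average caution after selection:", round(simulate(), 3))
```

No individual firm ever decides to become less cautious, yet the surviving population drifts that way -- the "adaptation" is produced by selection alone.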
So, what are some of the mechanisms that could help a company to pursue its financial interests even when they are at odds with what employees would naively prefer?
(To reiterate, this list is meant as an existence proof rather than an accurate picture of the key dynamics responsible for the behaviour of AI companies.)
I described some dynamics that could plausibly take place inside AI companies -- that probably do take place there. But I would be curious to know which dynamics actually take place, how much each of them matters, and what the overall effect is. (For all I know, it could be a push towards responsible behaviour.) Looking at the actions the companies have taken so far gives some information, but it isn't clear to me how, say, lobbying behaviour generalises to decisions about deploying superintelligence.
Why care about this? Partly, it just seems fascinating on its own. Partly, it seems important to understand for anybody who wanted to make AI companies "more aligned" with society. Or it might be that AI companies are so fundamentally "misaligned" that gentle interventions are never going to be enough -- but if that were the case, it would be important to make a clearer case that this is so. Either way, understanding this topic better seems like a good next step. (If you have any pointers, I would be curious!)
Finally, I get the impression that there is a general reluctance to engage with the possibility that AI companies are basically "pure evil" and should be viewed as completely untrustworthy.[9] I am confused about why this is, but one guess is that some people equate "the company is evil" with "the employees are bad people". This is incorrect: an AI company could be the most harmful entity in human history even if every single employee were a decent person. We should hesitate to accuse individual people, but this should not prevent us from recognising that the organisation might be untrustworthy.
When I mention following financial interests, I just mean the vague notion of seeking profits, revenue, shareholder value, influence, and things like that (and being somewhat decent at it). I don't think the exact details matter for the point of this post. I definitely don't mean to imply that the company acts as a perfectly rational agent or that it is free of internal inefficiencies such as those described in Immoral Mazes or Recursive Middle Manager Hell.
More precisely, there will be various dynamics in play. Some of these push in the direction of following profits, others towards things like doing good or following the company's stated mission, and some just cause internal inefficiencies. I expect the push towards profits to be stronger when there is stronger competition and higher financial stakes, but I don't have a confident take on where the overall balance lies. Similarly, I don't claim that the mechanisms I give here as examples (selective hiring and miscoordination) are the most important ones among those that push towards profit-following.
In relation to footnotes 1 and 2, Richard Ngo made some good points about why the framing I adopt here is not the right one. (His post Power Lies Trembling is relevant and offers a good framing of dynamics inside countries -- and probably companies too.) Still, I think the dynamics I mention here are relevant as well, and since I don't know of a better existing discussion to point to, writing this seemed cheap enough.
The scenario is over-simplified and unrealistic, but this shouldn't matter too much. The same dynamics should show up in many other cases as well.
The post Unconscious Economics feels relevant here.
I am not sure exactly how this works, or how it interacts with "negative externalities" such as "unclear risk of extinction". There definitely is a threat of getting sued over this, but I am not sure how much this really matters, as opposed to serving as a convenient excuse not to rock the boat.
This will be costly for the company as well. If nothing else, it causes delays, and they will be replacing you with somebody less skilled (otherwise they would have hired that person already). So I advocate for not complying with things you are against. But despite this, the coordination problems are definitely real and difficult to solve.
To reiterate, this does not require deliberate plotting by anybody inside the company. You don't need to actually fire those people; it should be enough to incidentally underfund them, or perhaps converge on a company culture where they leave on their own.
I am not saying they definitely are. But I do view this as plausible enough that acting on it would seem reasonable.