Suppose you are passionate about the problem of AGI alignment. You want to research solutions to the problem, and you state it more or less like this:
We are on track for an AI catastrophe, most likely within the next few decades. We do not know how (or even if) we can prevent it. We need to do continued research right now, and closely follow new developments in industry. If there are enough of us dedicated to the problem, perhaps we can find a solution in time. Otherwise, the companies developing AI won't do it because they only care about short-term business goals, and won't take the threat seriously until it's too late. Come with us if you want to live.
This is a tough sell. Any organization granting research funds expects to hear two things: a) this problem is extremely important, and b) our research is the most promising way to solve it. I believe that making the case for a) is easier than for b). Why? Because the notable AI advancements from the past few years came from companies like Facebook and Google. If you are not inside those companies, your job as a researcher is to react to whatever they decide to share with the rest of the world.
If I were completely convinced that this is the most pressing problem that exists, I would want to be as close as possible to the source of new developments so I could influence them directly. It would be a tough job, because I would have to walk a fine line. I could not get myself fired by opposing the deployment of every new mechanism, but perhaps I could spend a significant portion of my time playing devil's advocate. As someone involved in deploying production systems, I could insist on adding circuit breakers and other safety mechanisms. In addition, I would be involved in the hiring process, so I could try to get like-minded people to join the team. Perhaps we might manage to nudge the company culture away from "move fast and break things (irreversibly)."
If you cannot beat them, join them and promote change from within.
This seems pretty implausible to me when compared to "work at a lab with an explicit safety team or focus" (e.g. DeepMind, OpenAI, Anthropic, Redwood, Conjecture). Researchers generally also don't get formal power in companies of Google's or Facebook's scale, nor any real ability to shift the culture.
I'm surprised that "is joining Facebook or Google the best way to work on alignment" seems likely enough to even be worth asking, when you could just work on the problem directly in so many other places, including some with better R&D track records.
I work at Anthropic, but my opinions are my own.
You don't really have to be that sneaky. People are usually happy to hear the case for AI safety if you put it in palatable terms.
Not that this will instantly enable them to pick the right technical issues to worry about, or make them willing to make sacrifices that affect their own personal comfort. But I bet that AI safety initiatives within companies and universities are mostly formed by going to someone in control of resources, making a reasonable-if-normie case for building AIs that do good things and not bad things, and then getting money or time allocated to you in a way that looks good for the person doing the allocating.
This is not the level of influence needed to start a Manhattan Project for value-aligned AI, so if that turns out to be necessary we're probably S.O.L., but it seems like a lot more influence than you can exert while being sneaky.
potentially relevant: https://alphabetworkersunion.org/
my view is that approaches within Google, such as AWU, are useful brakes on the terribleness but not likely to have enough impact to offset how fast Google is getting ahead in capabilities. I'd personally suggest that even starting your own capabilities lab is more useful for safety than joining Google; capabilities mixed with safety is what e.g. Anthropic are doing (according to me; maybe Anthropic doesn't see their work as advancing capability).