Google’s Ethical AI team and AI Safety

magfrump

cross-posted from my blog

Background on the events

I have been thinking about this since the firing of Dr. Timnit Gebru, and yet still no one has actually written about it beyond my own tweets, so I guess it falls to me.

I find, and I imagine many people in the rat-sphere agree, that energy consumption and climate change sit low on my list of ethical priorities surrounding AI. I find the concern uncompelling because I think that (a) this cost can be weighed against the benefits AI can create and (b) this cost can be literally offset by potential current and future carbon capture technologies. I think this is well established in the EA community, with recent possible exceptions taking shape.

But these ideas rely on current assumptions about how much power is being used for what purposes. If AI continues to scale by adding compute, as is generally expected, this could create conflicts of interest in the AI space. That would be bad for a number of reasons, chief among them that it would mean that only actors who are willing to impose substantial costs on the commons would be able to implement their visions. This is my central point, so I will return to it later.

For now, just keep in mind that the low priority of climate change among EAs is an empirical question of how easy it is to influence certain changes. I don’t think any of the specific work by Dr. Gebru makes a convincing case to me that the question has a different answer. But I haven’t heard literally any other single person say that!

Instead, she was fired, and today the other co-lead of her team was also fired. The justification for firing Gebru was “she quit.” No public statement has been made, even internally to the team both managed, about why Margaret Mitchell was fired, unless you count “it’s part of a re-org.” For reference, my team at Google has been re-org’d at least four times, and I have never seen anyone fired or even moved out of their management position in that time. Usually I don’t even notice.

(Because of timing there has been some conflation of this incident with the firing of a recruiter who worked mostly with historically Black colleges and universities. Maybe this is evidence that racism played a part in the decision, but I intend to regard “bad decision-making” on the part of Alphabet as a bit of a black box because harming AI safety prospects is very bad regardless of whether they are doing it for racist reasons.)

So at this stage, it looks like a big corporation made some bad HR decisions and fired people who were well regarded as managers but ultimately doing work that I value about as much as most of the day to day work at Google. That’s not so bad, beyond the fact that we live in a world where small sets of high ranking execs get to make bad decisions without oversight, but we all already knew we were living in that world.

Models of AI Safety

The reason I think this is bad is that I think it invalidates my ideas about how AI Safety could be implemented in the real world.

In brief: in order to “align” your AI, you will need to reduce its direct efficacy on some measure, which will be opposed by a middle manager. I had hoped that “official avenues” like the Ethical AI team could be sufficiently ingrained that when the ideas needed to solve AI Safety are developed, there is a way to incorporate them into the projects which have enough compute to create an aligned AI before others accidentally create misaligned AI.

In more detail:

1. AI scales intensely with compute. (99%+ confidence)
2. Large projects, such as Google or the US government, will have access to more compute than projects formed by small organizations, being able to easily put together several million dollars of compute on short notice. (95% confidence)
3. Some small set of large projects will be in position to create AGI with some alignment plan for a few years before large numbers of small actors will be in position to do so. (requires ~1+2, slightly less confident than 2, say a 90% confidence interval of 1.5-10 years)
4. Once a large number of small actors are able to easily create AGI, one of them will accidentally create misaligned AGI pretty quickly. This is my “time limit” on how long we have to get AGI implemented, assuming a ‘solution’ exists before AGI is possible. (~80% chance a misaligned AGI emerges within 30 years of it being possible to make an AGI with <1 year’s SWE salary of compute in your garage; 50% chance within 10 years, 10% chance within 2 years)
5. The first available solution to AGI alignment will require spending more time to develop an aligned AGI than the first available plan for creating any AGI. (90% confidence interval of how much longer: 2 weeks - 5 years)
6. Therefore, in order to be confident that the first AGI created is aligned, the process from “AGI alignment is solved and AGI is possible with access to the best compute on earth” to “An org with enough compute to build AGI is executing an alignment plan with enough head start not to be overtaken by a new misaligned AGI project” needs to be as short as possible, because 1.5 years of compute advantage plus 2 years of code accessibility is already possibly not enough to cover the delay needed to align the AGI. (follows from 3,4,5)
7. Ethical AI teams are a natural place to introduce an alignment solution to a large organization like Google. (90% confidence interval of how much faster the team could impose an alignment plan than any other team: 0.5 - 10 years. Probability I think such a team would impose such a plan, if they had it and were in a position to do so: 80%+. Probability I think any other team would impose such a plan: ~30%)
   1. The team has to be aware of cutting edge developments in alignment enough to identify the earliest correct solution. Teams focused on improving results or applying to specific cases will not reliably have that familiarity, but it fits directly into the scope of ethical AI teams.
   2. The team has to be technically capable of influencing the direction of actually implemented AI projects at the org. If one Google exec believes something strongly, they can’t implement a technical program on their own. If people in Ads understand the program, transitioning to Google Brain’s codebase alone would be a difficult task. An ethical AI team should have specific firsthand experience applying alignment-like frameworks to actual AI projects, so that they can begin executing as soon as the priority is clear to them.
   3. The team has to be politically capable of influencing the direction of actually implemented AI projects at the org. If a SWE says “I can do this in two weeks” and their manager says “ship it tomorrow or I’ll get Jane to do it instead,” then you need to have influence over every possible SWE that could do the work. If the organization instead sees the value of oversight programs and has people in place to execute those programs, you only need to influence the leader of that team to start the plan.

I don’t think any of these points are controversial or surprising.

There has long been agreement that large-scale projects such as those at large corporations or governments will be able to create AGI earlier. This is a possible way to get the timing lead necessary if building an aligned AGI takes much more time than building a misaligned one.

But that timing lead only manifests if it takes a shorter period of time to become part of the org and set the direction of a major project than it does to wait for compute to get cheaper.
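
The timing argument above (lead time plus grace period, set against the extra time alignment takes) can be made concrete with a little arithmetic. The sketch below is only an illustration: the numbers are the interval endpoints quoted in the list, while the function name `adoption_budget_years` and the choice to simply add and subtract the endpoints are assumptions made here, not part of the original model.

```python
# Illustrative only: the figures are the interval endpoints quoted in the post;
# combining them by simple addition and subtraction is an assumption of this sketch.

def adoption_budget_years(lead, grace, alignment_delay):
    """Years available for an org to adopt an alignment plan before being overtaken.

    lead: years large projects can build AGI before small actors can
    grace: years after that before some small actor builds a misaligned AGI
    alignment_delay: extra years an aligned AGI takes compared with any AGI
    """
    return lead + grace - alignment_delay

# Pessimistic case: low end of the lead (1.5 years), the 2-year horizon given a
# 10% chance of misaligned AGI, and the high end of the alignment delay (5 years).
pessimistic = adoption_budget_years(lead=1.5, grace=2.0, alignment_delay=5.0)

# Optimistic case: high end of the lead (10 years), the 30-year horizon given an
# ~80% chance, and the low end of the alignment delay (2 weeks).
optimistic = adoption_budget_years(lead=10.0, grace=30.0, alignment_delay=2 / 52)

print(f"pessimistic budget: {pessimistic:+.1f} years")
print(f"optimistic budget: {optimistic:+.1f} years")
```

A negative budget means the org would need to have adopted the plan before the solution even existed, which is the sense in which the adoption step has to be as short as possible.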

Putting two and two together

My previously existing hope was something like this:

1. Existing ethical AI teams are maintained at large companies because they add value through:
   1. Something like PR maintenance by keeping them from being too evil
   2. Finding pockets of non-obvious value through accessibility or long-term incentives.
2. Existing ethical AI teams actually care about ethics, and have some members that keep up with AI Safety research.
3. The members who keep up with safety research can convince their team to act when there is a “solution.”
4. The ethical AI team can convince other teams to act.
5. One or more large companies with such teams and such plans will move forward confidently while other possible early actors will not, giving them a strategic advantage.
6. ????
7. Profit

The news of Drs. Gebru and Mitchell being removed from Google seems to be a direct refutation of (4), because their attempts to create action more broadly caused retaliation against them.

It also makes me very concerned about (1), and especially (1.1), in that it seems that Google kept to this course of action through three months of bad press. I can also take this as some evidence that (1.2) isn’t working out, or else there would be some profit motive for Google not to take these steps.

Google is the prime example of a tech company that values ethics, or it was in the recent past. I have much less faith in Amazon or Microsoft or Facebook or the US federal government or the Chinese government that they would even make gestures toward responsibility in AI. And the paper that is widely cited as kicking off this debate is raising concerns about racial equity and climate change, which are exactly the boring central parts of the culture war that I’d expect Googlers to support in massive majorities.

What else could happen

One of the big reactions I’m having is despair. I don’t think this is a killing blow to humanity for a few reasons, but it did seem like the single most likely path to a really good future, so I’m pretty sad about it.

But I think there are still a number of other ways that the problem of implementing an AGI alignment strategy could be solved.

It could be much easier to solve alignment than expected, or the solution could make it easier to implement AGI than otherwise! That would be nice, but it’s not within anyone’s control.
New ethical AI teams could form at other organizations, especially the US Gov’t. This seems directly actionable, though I don’t really see how someone would sell this work to an org.
More secretive connections (such as the leadership of DeepMind or OpenAI) could establish a more direct throughline that I will never know about until I start living out the first chapter of Friendship is Optimal. This also does not seem actionable to me.
Fix (4) in the above by pressure from people who can exert influence over such organizations.
1. By creating helpful legislation for alignment if such a thing is possible, which I am somewhat skeptical of
2. Worker organizations that can pressure tech companies on problems of the commons, such as the newly formed Alphabet Workers Union. This seems directly actionable, but requires a lot of effort of a sort that I’m uncomfortable with.

I guess I’m mostly out of thoughts, but I hope this makes the case at least somewhat that these events are important, even if you don’t care at all about the specific politics involved.

Two comments on your model, both leading to the same conclusion that even if Google had an AI Ethics panel with good recommendations, teams that might produce AGI would not implement them:

Not being stopped

To prevent a bad end, the first aligned AGI must also prevent the creation of any future possibly-unaligned (or differently aligned) AGIs. This is implied by some formulations of AGI (e.g. converging instrumental goals) but it's good to state it explicitly. Let's call this "AGI conquers the world".

A team good enough to build an aligned AGI is likely also good enough to foresee that it might conquer the world. Once this becomes known, in a company the size of Google, enough people would strongly disapprove that it would be leaked outside the company, and then the project would be shut down or taken over by external forces.

Building an AGI that might take over the world can only work if it's kept very secret - harder in a company like Google than in a small company or in a secretive government agency - or if no one outside the project believes it can succeed. In either case, an AI ethics committee wouldn't intervene.

Value alignment

Suppose that the AI Alignment problem is solved. The solution is public, easy to implement, and proven correct. However, the values to align to still need to be chosen; they are independent of the solution.

There will still be teams competing to build the first AGI. Each team will align its AGI with its own values. (Ignore for the moment disagreements between team members.) But what does that mean in practice - who gets to choose these values? The programmers? Middle management? The CEO? The President? The AI Ethics committee?

Any public attempt to agree on values will generate a huge, unsolvable political storm. Faced with a chance to control our future lightcone, humans will argue about democracy, religion, and the unfair firing of Dr Gebru. Meanwhile, anyone who thinks they can influence the values in their favor will have the biggest imaginable incentive to do so. This includes, of course, use of force and potential sabotage of the project.

(For sabotage, read nuclear first strike. If you think that's unlikely, consider how e.g. the US military might react if they truly believe a Chinese company is about to develop a singleton AGI. Or how Israel might react if they believe that about Iran.)

Therefore, any team who thinks they are building an AGI and realises the implications will do their best to stay secret. Which means not revealing yourself to the Google AI Ethics panel. They might still implement ethics or alignment, but they would be equally likely to implement things published outside Google.

Meanwhile, any team that doesn't realize they are building an AGI will probably not be able to make it aligned, even with the best ethics advice. They will take the perfect solution to alignment and use it with a value like "maximize user engagement with our ads (over the future lightcone of humanity)".

Google is the prime example of a tech company that values ethics, or it was in the recent past. I have much less faith in Amazon or Microsoft or Facebook or the US federal government or the Chinese government that they would even make gestures toward responsibility in AI.

I work for Microsoft, though not in AI/ML. My impression is that we do care deeply about using AI responsibly, but not necessarily about the kinds of alignment issues that people on LessWrong are most interested in.

Microsoft's leadership seems to be mostly concerned that AI will be biased in various ways, or will make mistakes when it's deployed in the real world. There are also privacy concerns around how data is being collected (though I suspect that's also an opportunistic way to attack Google and Facebook, since they get most of the revenue for personalized ads).

The LessWrong community seems to be more concerned that AI will be too good at achieving its objectives, and we'll realize when it's too late that those aren't the actual objectives we want (e.g., Paperclip Maximizer).

To me those seem like mostly opposite concerns. That's why I'm actually somewhat skeptical of your hope that ethical AI teams would push a solution for the alignment issue. The work might overlap in some ways, but I think the main goals are different.

Does that make sense?

I think this makes sense, but I disagree with it as a factual assessment.

In particular I think "will make mistakes" is actually an example of some combination of inner and outer alignment problems that are exactly the focus of LW-style alignment.

I also tend to think that the failure to make this connection is perhaps the biggest single problem in both ethical AI and AI alignment spaces, and I continue to be confused about why no one else seems to take this perspective.

Necroing.

"This perspective" being smuggling LW alignment into corps by expanding the fear of the AI "making mistakes" to include our fears?

After reading some of this reddit thread I think I have a better picture of how people are reacting to these events. I will probably edit this post or write a follow-up.

My high level takeaway is:

people are afraid to engage in speech that will be interpreted as political, so are saying nothing.
nobody is actually making statements about my model of alignment deployment, possibly nobody is even thinking about it.

In the edit or possibly in a separate followup post I will try to present the model at a further disconnect from the specific events and actors involved, which I am only interested in as inputs to the implementation model anyway.

people are afraid to engage in speech that will be interpreted as political [...] nobody is actually making statements about my model of alignment deployment [...] try to present the model at a further disconnect from the specific events and actors involved

This seems pretty unfortunate insofar as some genuinely relevant real-world details might not survive the obfuscation of premature abstraction.

Example of such an empirical consideration (relevant to the "have some members that keep up with AI Safety research" point in your hopeful plan): how much overlap and cultural compatibility is there between AI-ethics-researchers-as-exemplified-by-Timnit-Gebru and AI-safety-researchers-as-exemplified-by-Paul-Christiano? (By all rights, there should be overlap and compatibility, because the skills you need to prevent your credit-score AI from being racist (with respect to whatever the correct technical reduction of racism turns out to be) should be a strict subset of the skills you need to prevent your AGI from destroying all value in the universe (with respect to whatever the correct technical reduction of value turns out to be).)

Have you tried asking people to comment privately?

I can't figure out why this is being downvoted. I found the model of how AI safety work is likely to actually ensure (or not) the development of safe AI to be helpful, and I thought this was a pretty good case that this firing is a worrying sign, even if it's not directly related to safety in particular.

I can’t figure out why this is being downvoted.

Because the events are not related to AI alignment, but are the focus of a current battle in the culture war.

Upvoting this comment because it helped me understand why nobody seems to be engaging with what I think the central point of my post is.

Why do you think Google is in the wrong here?

Hacker News discussion about why Margaret Mitchell was fired. Or see this NYT story:

Google confirmed that her employment had been terminated. “After conducting a review of this manager’s conduct, we confirmed that there were multiple violations of our code of conduct,” read a statement from the company.

The statement went on to claim that Dr. Mitchell had violated the company’s security policies by lifting confidential documents and private employee data from the Google network. The company said previously that Dr. Mitchell had tried to remove such files, the news site Axios reported last month.

See also this Twitter thread for some additional insights into this story. Given the politically sensitive nature of the topic, it may not be a great idea to discuss it much further on this platform, as that could further antagonize various camps interested in AI, among other potential negative consequences.

I appreciate the thread as context for a different perspective, but it seems to me that it loses track of verifiable facts partway through (around here), though I don't mean to say it's wrong after that.

I think in terms of implementation of frameworks around AI, it still seems very meaningful to me how influence and responsibility are handled. I don't think that a federal agency specifically would do a good job handling an alignment plan, but I also don't think Yann LeCun setting things up on his own without a dedicated team could handle it.

I would want to see a strong justification before deciding not to discuss something that is directly relevant to the purpose of the site.

Noted that a statement has been made. I don't find it convincing, and even if I did I don't think it changes the effect of the argument.

In particular, even if it was the case that both dismissals were completely justified, I think the chain of logic still holds.

I hope this makes the case at least somewhat that these events are important, even if you don’t care at all about the specific politics involved.

I would argue that the specific politics inherent in these events are exactly why I don't want to approach them. From the outside, the mix of corporate politics, reputation management, culture war (even the boring part), all of which belong in the giant near-opaque system that is Google, is a distraction from the underlying (indeed important) AI governance problems.

For that particular series of events, I already got all the governance-relevant information I needed from the paper that apparently made the dominoes fall. I don't want my attention to get caught in the whirlwind. It's too messy (and still is after months). It's too shiny. It's not tractable for me. It would be an opportunity cost. So I take a deep breath and avert my eyes.
