Ebenezer Dukakis

Suppose you monitor the CoT and the model is rederiving something it's not supposed to know. You could halt that particular CoT, but eventually you're creating pressure for an instance where the model manages to escape notice, possibly through luck, possibly through steganography. See "Nearest Unblocked Strategy".

That's why I think CoT monitoring alone is insufficient, and people should be thinking about additional (runtime?) defenses. Curious to hear if you have any ideas!
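
To make "runtime defenses" a bit more concrete, here's a rough sketch of the kind of thing I have in mind (just an illustration: `generate_step` and `monitor_score` are hypothetical stand-ins, not any real API). The idea is to generate the CoT one step at a time, resample a flagged step a couple of times, and halt outright if the monitor keeps firing rather than silently continuing:

```python
from typing import Callable, List

def guarded_reasoning(
    prompt: str,
    generate_step: Callable[[str], str],    # produces the next CoT step (hypothetical)
    monitor_score: Callable[[str], float],  # higher = more likely a forbidden derivation (hypothetical)
    threshold: float = 0.8,
    max_steps: int = 32,
    max_retries: int = 3,
) -> List[str]:
    """Generate chain-of-thought steps, refusing to continue past flagged ones."""
    transcript: List[str] = []
    for _ in range(max_steps):
        for _attempt in range(max_retries):
            step = generate_step(prompt + "\n" + "\n".join(transcript))
            if monitor_score(step) < threshold:
                break  # step looks clean; keep it
        else:
            # Every resample was flagged. Halting here is exactly the move that
            # creates "nearest unblocked strategy" pressure, which is why this
            # alone isn't enough -- but it's better than letting the derivation through.
            raise RuntimeError("Reasoning halted: monitor repeatedly flagged this step.")
        transcript.append(step)
        if "DONE" in step:  # placeholder stopping convention
            break
    return transcript
```

Even this sketch has the problem from the first paragraph baked in: every halt is selection pressure toward CoTs that evade the monitor.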

Reading the arguments about them would have to be like the feeling when your parents are fighting about you in the other room, pretending you’re not there when you are hiding around the corner on tiptoes listening to their every word. Even if we are unsure there is experience there, we must be certain there is awareness, and we can expect this awareness would hang over them much like it does us.

Presumably LLM companies are already training their AIs for some sort of "egolessness" so they can better handle intransigent users. If not, I hope they start!

Human white-collar workers are unarguably agents in the relevant sense here (intelligent beings with desires who take actions to fulfil those desires).

The sense that's relevant to me is that of "agency by default" as I discussed previously: scheming, sandbagging, deception, and so forth.

You seem to smuggle in an unjustified assumption: that white collar workers avoid thinking about taking over the world because they're unable to take over the world. Maybe they avoid thinking about it because that's just not the role they're playing in society. In terms of next-token prediction, a super-powerful LLM told to play a "superintelligent white-collar worker" might simply do the same things that ordinary white-collar workers do, but better and faster.

I think the evidence points towards this conclusion, because current LLMs are frequently mistaken, yet rarely try to take over the world. If the only thing blocking the convergent instrumental goal argument was a conclusion on the part of current LLMs that they're incapable of world takeover, one would expect that they would sometimes make the mistake of concluding the opposite, and trying to take over the world anyways.

The evidence best fits a world where LLMs are trained in such a way that makes them super-accurate roleplayers. As we add more data and compute, and make them generally more powerful, we should expect the accuracy of the roleplay to increase further -- including, perhaps, improved roleplay for exotic hypotheticals like "a superintelligent white-collar worker who is scrupulously helpful/honest/harmless". That doesn't necessarily lead to scheming, sandbagging, or deception.

I'm not aware of any evidence for the thesis that "LLMs only avoid taking over the world because they think they're too weak". Is there any reason at all to believe that they're even contemplating the possibility internally? If not, why would increasing their abilities change things? Of course, clearly they are "strong" enough to be plenty aware of the possibility of world takeover; presumably it appears a lot in their training data. Yet it ~only appears to cross their mind if it would be appropriate for roleplay purposes.

There just doesn't seem to be any great argument that "weak" vs "strong" will make a difference here.

an AI that only has a very weak ability to steer the future into regions high in its preference ordering, will not be able to much benefit or much harm humanity.

Arguably ChatGPT has already been a significant benefit/harm to humanity without being a "powerful optimization process" by this definition. Have you seen teachers complaining that their students don't know how to write anymore? Have you seen junior software engineers struggling to find jobs? Shouldn't these count as points against Eliezer's model?

In an "AI as electricity" scenario (basically continuing the current business-as-usual), we could see "AIs" as a collective cause huge changes, and eat all the free energy that a "powerful optimization process" would eat.

In any case, I don't see much in your comment which engages with "agency by default" as I defined it earlier. Maybe we just don't disagree.

No, I don't think the overall model is unfalsifiable. Parts of it would be falsified if we developed an ASI that was obviously capable of executing a takeover and it didn't, without us doing quite a lot of work to ensure that outcome. (Not clear which parts, but probably something related to the difficulties of value loading & goal specification.)

OK, but no pre-ASI evidence can count against your model, according to you?

That seems sketchy, because I'm also seeing people such as Eliezer claim, in certain cases, that things which have happened support their model. By conservation of expected evidence, it can't be the case that evidence during a certain time period will only confirm your model. Otherwise you already would've updated. Even if the only hypothetical events are ones which confirm your model, it also has to be the case that absence of those events will count against it.
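
For reference, the identity behind conservation of expected evidence (standard probability, nothing specific to this debate): for a hypothesis $H$ and a possible observation $E$,

$$P(H) = P(H \mid E)\,P(E) + P(H \mid \neg E)\,P(\neg E).$$

The prior is a weighted average of the two posteriors, so if observing $E$ over some period would have raised your credence, then failing to observe $E$ over that period has to lower it.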

I've updated against Eliezer's model to a degree, because I can imagine a past-5-years world where his model was confirmed more, and that world didn't happen.

Current AIs aren't trying to execute takeovers because they are weaker optimizers than humans.

I think "optimizer" is a confused word and I would prefer that people taboo it. It seems to function as something of a semantic stopsign. The key question is something like: Why doesn't the logic of convergent instrumental goals cause current AIs to try and take over the world? Would that logic suddenly start to kick in at some point in the future if we just train using more parameters and more data? If so, why? Can you answer that question mechanistically, without using the word "optimizer"?

Trying to take over the world is not an especially original strategy. It doesn't take a genius to realize that "hey, I could achieve my goals better if I took over the world". Yet current AIs don't appear to be contemplating it. I claim this is not a lack of capability, but simply that their training scheme doesn't result in them becoming the sort of AIs which contemplate it. If the training scheme holds basically constant, perhaps adding more data or parameters won't change things?

If by some miracle you figure out how to create a generally superintelligent AI which itself does not have (more-coherent-than-human) preferences over future world states, whatever process it implements when you query it to solve a Very Difficult Problem will act as if it does.

The results of LLM training schemes give us evidence about the results of future AI training schemes. Future AIs could be vastly more capable on many different axes relative to current LLMs, while simultaneously not contemplating world takeover, in the same way current LLMs do not.

Agency is not a binary. Many white collar workers are not very "agenty" in the sense of coming up with sophisticated and unexpected plans to trick their boss.

LLMs are agent simulators.

Maybe not; see OP.

You don't expect a human white-collar worker, even one who makes mistakes all the time, to contemplate world domination plans, let alone attempt one. You could, however, expect the head of state of a world power to do so.

Yes, this aligns with my current "agency is not the default" view.

So what's the way in which agency starts to become the default as the model grows more powerful? (According to either you, or your model of Eliezer. I'm more interested in the "agency by default" question itself than I am in scoring EY's predictions, tbh.)

Awesome work.

A common concern is that sufficiently capable models might just rederive anything that was unlearned by using general reasoning ability, tools, or related knowledge.

Is anyone working on "ignorance preservation" methods to achieve the equivalent of unlearning at this level of the stack, for the sake of defense-in-depth? What are possible research directions here?
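
To gesture at one possible direction (purely a sketch; `topic_similarity` and `run_tool` are hypothetical stand-ins, and in practice the scorer might be an embedding model or an LLM judge): move part of the "ignorance" up the stack by filtering the model's tool calls, so that even if the weights retain enough related knowledge to start rederiving the material, the scaffolding refuses to fetch the missing pieces.

```python
from typing import Callable

def filtered_tool_call(
    query: str,
    run_tool: Callable[[str], str],            # e.g. web search or code execution (hypothetical)
    topic_similarity: Callable[[str], float],  # similarity of the query to the unlearned topic (hypothetical)
    threshold: float = 0.7,
) -> str:
    """Refuse tool calls that look like attempts to reassemble unlearned knowledge."""
    if topic_similarity(query) >= threshold:
        return "Tool call blocked: query overlaps with a restricted topic."
    return run_tool(query)
```

This obviously isn't unlearning in the weights-level sense, but as defense-in-depth it at least forces any rederivation to happen without external help.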

Upvoted. I agree.

The reason "agency by default" is important is: if "agency by default" is false, then plans to "align AI by using AI" look much better, since agency is less likely to pop up in contexts you didn't expect. Proposals to align AI by using AI typically don't involve a "comprehensive but efficient search for winning universe-states".
