[The original question was "Is OpenAI increasing the existential risks related to AI?" I changed it to the current one following a discussion with Rohin in the comments. It clarifies that my question asks about the consequences of OpenAI's work while assuming positive and aligned intentions.]

This is a question I've been asked recently by friends interested in AI Safety and EA. Usually this question comes from discussions around GPT-3 and the tendency of OpenAI to invest a lot in capabilities research.

[Following this answer by Vaniver, I propose as a baseline/counterfactual the world where OpenAI doesn't exist but the researchers there still do.]

Yet I haven't seen it discussed here. Is it a debate we failed to have, or has there already been some discussion around it? I found a post from 3 years ago, but I think the situation probably changed in the meantime.

A couple of arguments for and against to prompt your thinking:

  • OpenAI is increasing the existential risks related to AI because:
    • They are doing far more capability research than safety research;
    • They are pushing the state of the art of capability research;
    • Their results will motivate many people to go work on AI capabilities, whether out of wonder or out of fear of unemployment.
  • OpenAI is not increasing the existential risks related to AI because:
    • They have a top-notch safety team;
    • They restrict the access to their models, by either not releasing them outright (GPT-2) or bottlenecking access through their API (GPT-3);
    • Their results are showing the potential dangers of AI, and pushing many people to go work on AI safety.

6 Answers

Vaniver


[Speaking solely for myself in this comment; I know some people at OpenAI, but don't have much in the way of special info. I also previously worked at MIRI, but am not currently.]

I think "increasing" requires some baseline, and I don't think it's obvious what baseline to pick here.

For example, consider instead the question "is MIRI decreasing the existential risks related to AI?". Well, are we comparing to the world where everyone currently employed at MIRI vanishes? Or are we comparing to the world where MIRI as an organization implodes, but the employees are still around, and find jobs somewhere else? Or are we comparing to the world where MIRI as an organization gets absorbed by some other entity? Or are we comparing to the world where MIRI still exists, the same employees still work there, but the mission is somehow changed to be the null mission?

Or perhaps we're interested in the effects on the margins--if MIRI had more dollars to spend, or fewer dollars, how would the existential risks change? Even the answers to those last two questions could easily be quite different--perhaps firing any current MIRI employee would make things worse, but there are no additional people that could be hired by MIRI to make things better. [Prove me wrong!]

---

With that preamble out of the way, I think there are three main obstacles to discussing this in public, a la Benquo's earlier post.

The main one is something like "appeals to consequences." Talking in public has two main functions: coordinating and information-processing, and it's quite difficult to separate the two functions. [See this post and the related posts at the bottom.] Suppose I think OpenAI makes humanity less safe, and I want humanity to be more safe; I might try to figure out which strategy will be most persuasive (while still correcting me if I'm the mistaken one!) and pursue that strategy, instead of employing a strategy that more quickly 'settles the question' at the cost of making it harder to shift OpenAI's beliefs. More generally, the people with the most information will be people closest to OpenAI, which probably makes them more careful about what they will or won't say. There also seem to be significant asymmetries here, as it might be very easy to say "here are three OpenAI researchers I think are making existential risk lower" but very difficult to say "here are three OpenAI researchers I think are making existential risk higher." [Setting aside the social costs, there's their personal safety to consider.]

The second one is something like "prediction is hard." One of my favorite math stories is the history of the Markov chain; in the version I heard, Markov's rival said a thing, Markov thought to himself "that's not true!" and then formalized the counterexample in a way that dramatically improved that field. Suppose Benquo's story of how OpenAI came about is true, that OpenAI will succeed at making beneficial AI, and that (counterfactually) DeepMind wouldn't have succeeded. In this hypothetical world, while the direct effect of DeepMind on existential AI risk would have been negative, the indirect effect would be positive (as otherwise OpenAI, which succeeded, wouldn't have existed). While we often think we have a good sense of the direct effect of things, in complicated systems it becomes very non-obvious what the total effects are.

The third one is something like "heterogeneity." Rather than passing judgment on the org as a whole, it would make more sense to make my judgments narrower: "widespread access to AI seems like it makes things worse instead of better," for example, a point on which OpenAI already seems to have shifted its views, focusing on widespread benefits rather than widespread access.

---

With those obstacles out of the way, here are some limited thoughts:

I think OpenAI has changed for the better in several important ways over time; for example, the 'Open' part of the name is not really appropriate anymore, but this seems good instead of bad on my models of how to avoid existential risks from AI. I think their fraction of technical staff devoted to reasoning about and mitigating risks is higher than DeepMind's, although lower than MIRI's (tho MIRI's fraction is a very high bar); I don't have a good sense whether that fraction is high enough.

I think the main effects of OpenAI are the impacts they have on the people they hire (and the impacts they don't have on the people they don't hire). There are three main effects to consider here: resources, direction-shifting, and osmosis.

On resources, imagine that there's Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don't give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]

On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don't really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.

On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their 'political' opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.

I don't have a great sense of how those factors aggregate into an overall sense of "OpenAI: increasing or decreasing risks?", but I think people who take safety seriously should consider working at OpenAI, especially on teams clearly related to decreasing existential risks. [I think people who don't take safety seriously should consider taking safety seriously.]

Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark and potentially others from OpenAI make you change your opinion?

Thanks a lot for this great answer!

First, I should have written it, but my baseline (or my counterfactual) is a world where OpenAI doesn't exist but the people working there still exist. This might be an improvement if you think that pushing the scaling hypothesis is dangerous and that most of the safety team would find money to keep working, or an issue if you think someone else, probably less aligned, would have pushed the scaling hypothesis, and that the structure given by OpenAI to its safety team is really special and important.

As for your obst... (read more)

Vaniver
But part of the problem here is that the question "what's the impact of our stance on OpenAI on existential risks?" is potentially very different from "is OpenAI's current direction increasing or decreasing existential risks?", and as people outside of OpenAI have much more control over their stance than they do over OpenAI's current direction, the first question is much more actionable. And so we run into the standard question substitution problems, where we might be pretending to talk about a probabilistic assessment of an org's impact while actually targeting the question of "how do I think people should relate to OpenAI?". [That said, I see the desire to have clear discussion of the current direction, and that's why I wrote as much as I did, but I think it has prerequisites that aren't quite achieved yet.]

I would reemphasize that the "does OpenAI increase risks" is a counterfactual question. That means we need to be clearer about what we are asking as a matter of predicting what the counterfactuals are, and consider strategy options for going forward. This is a major set of questions, and increasing or decreasing risks as a single metric isn't enough to capture much of interest.

For a taste of what we'd want to consider, what about the following:

Are we asking OpenAI to pick a different, "safer" strategy?

Perhaps they should focu... (read more)

Vaniver
Also apparently Megaman is less popular than I thought so I added links to the names.
Raemon
Fwiw I recently listened to the excellent song 'The Good Doctor' which has me quite delighted to get random megaman references.
Davidmanheim
Oh. Right. I should have gotten the reference, but wasn't thinking about it.
adamShimi
Just so you know, I got the reference. ;)

Ben Pace


I think it's fairly self-evident that you should have exceedingly high standards for projects intending to build AGI (OpenAI, DeepMind, others). It's really hard to reduce existential risk from AI, and I think much thought around this has been naive and misguided. 

(Two examples of this outside of OpenAI include: senior AI researchers talking about military use of AI instead of misalignment, and senior AI researchers responding to the problems of specification gaming by saying "objectives can be changed quickly when issues surface" and "existential threats to humanity have to be explicitly designed as such".)

An obvious reason to think OpenAI's impact will be net negative is that they seem to be trying to reach AGI as fast as possible, and trying a route different from DeepMind and other competitors, so are in some world shortening the timeline until AI. (I'm aware that there are arguments about why a shorter timeline is better, but I'm not sold on them right now.)

There are also more detailed conversations, about alignment, what the core of the problem actually is, and other strategic questions. I expect (and take from occasional things I hear) I have substantial disagreements with OpenAI decision-makers, which I think alone is sufficient reason for me to feel doomy about humanity's prospects.

That said, I'm quite impressed with their actions around release practises and also their work in becoming a profit-capped entity. I felt like they were a live player with these acts and were clearly acting against their short-term self-interest in favour of humanity's broader good, with some relatively sane models around these specific aspects of what's important. Those were both substantial updates for me, and make me feel pretty cooperative with them.

And of course I'm very happy indeed about a bunch of the safety work they do and support. The org gives lots of support and engineers to people like Paul Christiano, Chris Olah, etc., which I think is better than what those people would probably get counterfactually, and I'm very grateful that the organisation provides this.

Overall I don't feel my opinion is very robust, and it could easily change. Here are some examples of things that I think could substantially change my opinion:

  • How senior decision-making happens at OpenAI
  • What technical models of AGI senior researchers at OpenAI have
  • Broader trends that would have happened to the field of AI (and the field of AI alignment) in the counterfactual world where they were not founded

Thanks for your answer! Trying to make your examples of what might change your opinion substantially more concrete, I got these:

  • Does senior decision-making at OpenAI always consider safety issues before greenlighting new capability research?
  • Do senior researchers at OpenAI believe that their current research directly leads to AGI in the short term?
  • Would the Scaling Hypothesis (and thus GPT-N) have been vindicated as soon in a world without OpenAI?

Do you agree with these? Do you have other ideas of concrete questions?

Ben Pace

The first one feels a bit too optimistic. It’s something more like: Are they able to be direct in their disagreement with one another? What level of internal politicking is there? How much ability do some of the leadership have to make unilateral decisions? Etc.

The second one is the one more about alignment, takeoff dynamics, and timelines. All the details, like the likelihood of Mesa optimisers. What are their thoughts on this, and how much do they think about it?

For the third, that one’s good. Also things about how differently things would’ve gone at DeepMind, and also how good/bad the world would be if Musk hadn’t shifted the Overton window so much (which I think is counterfactually linked up with OpenAI existing; you get both or neither).

Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark and potentially others from OpenAI make you change your opinion?

Ben Pace


See all the discussion under the OpenAI tag. Don't forget SSC's post on it either.

I mostly think we had a good discussion about it when it launched (primarily due to Ben Hoffman and Scott Alexander deliberately creating the discussion).

Do you think you (or someone else) could summarize this discussion here? I have to admit that the ideas being spread out between multiple posts doesn't help.

Ben Pace
I don’t plan to. I’d strong upvote if someone else did a nice job of summarising the discussion, perhaps inspired by how I distilled the discussion around what failure looks like. (To be clear I think my distillation of the comment section was much better and more useful than the distillation of the post itself.)

Veedrac


Putting aside the general question of whether OpenAI is good for the world, I want to consider the smaller question: how do OpenAI's demonstrations of scaled-up versions of current models affect AI safety?

I think there's a much easier answer to this. Any risk we face from scaling up models we already have, with funding much less than tens of billions of dollars, amounts to unexploded uranium sitting around that we're refining in microgram quantities. The absolute worst that can happen with connectionist architectures is that we solve all the hard problems without having done the trivial scaled-up variants, and therefore scaling up is trivial, and so that final step to superhuman AI also becomes trivial.

Even if scaling up ahead of time results in slightly faster progress towards AGI, it seems that it at least makes it easier to see what's coming, as incremental improvements require research and thought, not just trivial quantities of dollars.

Going back to the general question, one good I see OpenAI producing is the normalization of the conversation around AI safety. It is important for authority figures to be talking about long-term outcomes, and in order to be an authority figure, you need a shiny demo. It's not obvious how a company could be more authoritative than OpenAI while being less novel.

Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark and potentially others from OpenAI make you change your opinion?

Veedrac
To the question, how do OpenAI's demonstrations of scaled up versions of current models affect AI safety?, I don't think much changes? It does seem that OpenAI is aiming to go beyond simple scaling, which seems much riskier. As to the general question, certainly that news makes me more worried about the state of things. I know way too little about the decision to be more concrete than that.

SoerenMind


OpenAI's work speeds up progress, but in a way that likely smooths progress later on. If you spend as much compute as possible now, you reduce potential surprises in the future.

But what if they reach AGI during their speed up? The smoothing at a later time assumes that we'll end up with diminishing returns before AGI, which is not what happens for the moment.

Veedrac
If OpenAI changed direction tomorrow, how long would that slow the progress to larger models? I can't see it lasting; the field of AI is already incessantly moving towards scale, and big models are better. Even in a counterfactual where OpenAI never started scaling models, is this really something that no other company can gradient descent on? Models were getting bigger without OpenAI, and the hardware to do it at scale is getting cheaper.

Well, if we take this comment by gwern at face value, it clearly seems that no one with the actual resources has any interest in doing it for now. Based on these premises, scaling towards incredibly larger models would probably not have happened for years.

So I do think that if you believe this is wrong, you should be able to show where gwern's comment is wrong.

Veedrac
Gwern's claim is that these other institutions won't scale up as a consequence of believing the scaling hypothesis; that is, they won't bet on it as a path to AGI, and thus won't spend this money on abstract or philosophical grounds. My point is that this only matters on short-term scales. None of these companies are blind to the obvious conclusion that bigger models are better. The difference between a hundred-trillion dollar payout and a hundred-million dollar payout is philosophical when you're talking about justifying <$5m investments. NVIDIA trained an 8.3 B parameter model as practically an afterthought. I get the impression Microsoft's 17 B parameter Turing-NLG was basically trained to test DeepSpeed. As markets open up to exploit the power of these larger models, the money spent on model scaling is going to continue to rise. These companies aren't competing with OpenAI. They've built these incredibly powerful systems incidentally, because it's the obvious way to do better than everyone else. It's a tool they use for market competitiveness, not as a fundamental insight into the nature of intelligence. OpenAI's key differentiator is only that they view scale as integral and explanatory, rather than an incidental nuisance. With this insight, OpenAI can make moonshots that the others can't: build a huge model, scale it up, and throw money at it. Without this understanding, others will only get there piecewise, scaling up one paper at a time. The delta between the two is at best a handful of years.

The scaling hypothesis implies that it'll happen eventually, yes: but the details matter a lot. One way to think of it is Eliezer's quip: the IQ necessary to destroy the world drops by 1 point per year. Similarly, to do scaling or bitter-lesson-style research, you need resources * fanaticism < a constant. This constant seems to be very small, which is why compute had to drop all the way to ~$1k before any researchers worldwide were fanatical enough to bother trying CNNs and create AlexNet. Countless entities, and companies, could have used this 'obvious way to do better than everyone else, for market competitiveness' for years - or decades - beforehand. But they didn't.

For the question of who gets there first, 'a handful of years' is decisive. So this is pretty important if you want to think about the current plausible AGI trajectories, which for many people (even excluding individuals like Moravec, or Shane Legg who has projected out to ~2028 for a long time now), have shrunk rapidly to timescales on which 'a handful of years' represents a large fraction of the outstanding timeline!


Incidentally, it has now been 86 days since the GPT-3 paper was uploaded, or a quarter of a year. Excluding GShard (which as a sparse model is not at all comparable parameter-wise), as far as I know no one has announced any new (dense) models which are even as large as Turing-NLG - much less larger than GPT-3.

ESRogs
A fairly minor point, but I don't quite follow the formula / analogy. Don't resources and fanaticism help you do the scaling research? So shouldn't it be a > sign rather than <, and shouldn't we say that the constant is large rather than small?
Veedrac
I agree this makes a large fractional change to some AI timelines, and has significant impacts on questions like ownership. But when considering very short timescales, while I can see OpenAI halting their work would change ownership, presumably to some worse steward, I don't see the gap being large enough to materially affect alignment research. That is, it's better OpenAI gets it in 2024 than someone else gets it in 2026. It's hard to be fanatical when you don't have results. Nowadays AI is so successful it's hard to imagine this being a significant impediment. I wouldn't dismiss GShard altogether. The parameter counts aren't equal, but MoE(2048E, 60L) is still a beast, and it opens up room for more scaling than a standard model.
SoerenMind
I agree, but I think it's unlikely OpenAI will be the first to build AGI. (Except maybe if it turns out AGI isn't economically viable).

Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark and potentially others from OpenAI make you change your opinion?

SoerenMind
No. Amodei led the GPT-3 project, he's clearly not opposed to scaling things. Idk why they're leaving but since they're all starting a new thing together, I presume that's the reason.

Evan R. Murphy


I think that OpenAI is certainly reducing massive misuse risks of AI. By existing, they have created a significant chance that a capped-profit entity will be the first to develop transformative AI. Without them, it's much more likely that a 100% for-profit company would be the first, and a for-profit company is more likely than a capped-profit entity to misuse the power that comes with being first.

As for misaligned AI, I'm not sure because while I think they're unlikely to develop a super-powerful misaligned AI, as the OP says they are accelerating development of AI capabilities for everyone and spreading awareness of these.

24 comments

Strong downvote because the question feels a bit targeted / leading. Maybe OpenAI is decreasing AI xrisk. Maybe other organizations are also engaged in similar behaviors that increase AI xrisk. I think a better approach would be to break things down into:

1) What factors affect AI xrisk? (Or, since that's pretty broad, have specific questions like "Does X affect AI xrisk?") (E.g. "How does pushing the state of the art of capability research affect AI xrisk?")

2) Have specific questions about OpenAI actions / traits that can be relatively easily grounded. (E.g. "How would you rate the quality of the OpenAI safety team?")

It's easy enough for people to put #1 and #2 together, but it has the added benefit of people answering #1 without targeting any specific company. Plus answers from #1 apply to other organizations.

Thanks for explaining your downvote! I agree that the question is targeted. I tried to also give arguments against this idea of OpenAI increasing xrisks, but it probably still reads as biased.

That being said, I disagree about not targeting OpenAI. Everything that I've seen discussed by friends is centered completely on OpenAI. I think it would be great to have an answer showing that OpenAI is only the most visible group acting that way, but that others follow the same template. It's still true that the question is raised way more about OpenAI than any other research group.

As I read this question, it translated as:

"Is everyone at OpenAI a moral monster?"

I would much prefer this question if it instead translated as:

"Are OpenAI's efforts counterproductive?"

The current version of the question seems needlessly controversial / aggressive. (This is similar to Alexei's point, except I haven't downvoted because I think the question could easily be rephrased to be fine, even if it specifically names OpenAI.)

FWIW, I thought the original question text was slightly better, since I didn't read it as aggressive, and it didn't needlessly explicitly assume that everyone at OpenAI is avoiding increasing existential risk. Furthermore, it seems clear to me that an organisation can be increasing existential risk without everybody at that organisation being a moral monster, since most organisations are heterogeneous.

In general, I think one should be able to ask questions of the form "is actor X causing harm Y" on LessWrong, and furthermore that people should not thereby assume that the questioner thinks that actor X is evil. I also think that some people are moral monsters and/or evil, and the way to figure out whether or not that's true is to ask questions of this form.

In general, I think one should be able to ask questions of the form "is actor X causing harm Y" on LessWrong, and furthermore that people should not thereby assume that the questioner thinks that actor X is evil.

I can believe that should be the case (not sure). I do not think it is actually the case. Is this the battle you choose to fight?

I also think that some people are moral monsters and/or evil, and the way to figure out whether or not that's true is to ask questions of this form.

If you do in fact want to know this answer, I feel more okay about asking the question (though I have bigger disagreements upstream). I don't think OP was particularly interested in this answer.

I see what you mean. Although my question is definitely pointed at OpenAI, I don't want to accuse them of anything. One thing I wanted to write in the question but that I forgot was that the question asks about the consequences of OpenAI's work, not the intentions. So there might be negative consequences that were not intentional (or no negative consequences of course).

Is "Are the consequences of OpenAI's work positive or negative for xrisks?" better?

"Will OpenAI's work unintentionally increase existential risks related to AI?"

"Will OpenAI's strategy succeed at reducing existential risks related to AI?"

The point is to build in a presumption of good intentions, unless you explicitly want to challenge that presumption (which I expect you do not want to do).

David's suggestion also seems good to me, though is asking a slightly different question and is a bit wordier.

Done! I used your first proposal, as it is more in line with my original question.

habryka

I much prefer Rohin's alternative version of: "Are OpenAI's efforts to reduce existential risk counterproductive?". The current version does feel like it screens off substantial portions of the potential risk.

Example? I legitimately struggle to imagine something covered by "Are OpenAI's efforts to reduce existential risk counterproductive?" but not by "Will OpenAI's work unintentionally increase existential risks related to AI?"; if anything it seems the latter covers more than the former.

One route would be if some of them thought that existential risks weren't that much worse than major global catastrophes. 

If I think that likely 10% of everyone will die because of the wrong people getting control of the killer AI drones ("slaughterbots"), and it's important that we get to AI as quickly as possible, then we might move it forward as quickly as possible because we want to be in control, at the expense of some kinds of unlikely alignment problems. This person accepts a very small increase in the chance of existential risk via indirect AI issues at the price of a substantial decrease in the chance of 10% of humanity being wiped out via bad direct use of the AI. This would be intentionally increasing x-risk in expectation, and they would agree.

You might correctly point out that Paul Christiano and Chris Olah don't think like this, but I don't really know who is involved in leadership at OpenAI, perhaps "safe" AI to some of them means "non-military". So this is a case that the new title rules out.

Yeah, that's a good example, thanks.

(I do think it is unlikely.)

I think this is a worse question now? Like, I expect OpenAI leadership explicitly thinks of themselves as increasing x-risk a bit by choosing to attempt to speed up progress to AGI. 

On net they expect it’s probably the right call, but they also probably would say “Yes, our actions are intentionally increasing the chances of x-risk in some worlds, but on net we think it’s improving things”. And then, supposing they’re wrong, and those worlds are the actual world, then they’re intentionally increasing x-risk. And now the question tells me to ignore that possibility.

The initial question made no mention of intention, which seemed better to me.

Like, I expect OpenAI leadership explicitly thinks of themselves as increasing x-risk a bit by choosing to attempt to speed up progress to AGI.

Do you think that they think they are increasing x-risk in expectation (where the expectation is according to their beliefs)? I'd find that extremely surprising (unless their reasoning is something like "yes, we raise it from 1 in a trillion to 2 in a trillion, this doesn't matter").

See my reply downthread, responding to where you asked Oli for an example.

Hum, my perspective is that in the example that you describe, OpenAI isn't intentionally increasing the risks, in that they think it improves things overall. My line at "intentionally increasing xrisks" would be to literally decide to act while thinking/knowing that your actions are making things worse in general for xrisks, which doesn't sound like your example.

I'd focus even more (per my comment to Vaniver's response) and ask "What parts of OpenAI are most and least valuable, and how do these relate to their strategy - and what strategy is best?"

[anonymous]

Some OpenAI people are on LW. It'd be interesting to hear their thoughts as well.

Two general things which have made me less optimistic about OpenAI are that:

  1. They recently spun-out a capped-profit company, which seems like the end goal is monetizing some of their recent advancements. The page linked in the previous sentence also has some stuff about safety and about how none of their day-to-day work is changing, but it doesn't seem that encouraging.

  2. They've recently partnered up with Microsoft, presumably for product integration. This seems like it positions them as less of a neutral entity, especially as Alphabet owns DeepMind.

Vaniver
They recently spun-out a capped-profit company, which seems like the end goal is monetizing some of their recent advancements. The page linked in the previous sentence also has some stuff about safety and about how none of their day-to-day work is changing, but it doesn't seem that encouraging.

I found this moderately encouraging instead of discouraging. So far I think OpenAI is 2 for 2 on managing organizational transitions in ways that seem likely to not compromise safety very much (or even improve safety) while expanding their access to resources; if you think the story of building AGI looks more like assembling a coalition that's able to deploy massive resources to solve the problem than a flash of insight in a basement, then the ability to manage those transitions becomes a core part of the overall safety story.

[anonymous]

This makes sense to me, given the situation you describe.

That's an interesting point. Why do you think that the new organizational transition is not compromising safety? (I haven't formed an opinion on this, but it seems that adding economic incentives is dangerous by default.)

Vaniver

I agree that adding economic incentives is dangerous by default, but think their safeguards are basically adequate to overcome that incentive pressure. At the time I spent an hour trying to come up with improvements to the structure, and ended up not thinking of anything. Also remember that this sort of change, even if it isn't a direct improvement, can be an indirect improvement by cutting off unpleasant possibilities; for example, before the move to the LP, there was some risk OpenAI would become a regular for-profit, and the LP move dramatically lowered that risk.

I also think for most of the things I'm concerned about, psychological pressure to think the thing isn't dangerous is more important; like, I don't think we're in the cigarette case where it's mostly other people who get cancer while the company profits; I think we're in the case where either the bomb ignites the atmosphere or it doesn't, and even in wartime the evidence was that people would abandon plans that posed a serious chance of destroying humanity.

Note also that economic incentives quite possibly push away from AGI towards providing narrow services (see Drexler's various arguments that AGI isn't economically useful, and so people won't make it by default). If you are more worried about companies that want to build AGIs and then ask them what to do than you are about companies that want to build AIs to accomplish specific tasks, increased short-term profit motive makes OpenAI more likely to move in the second direction. [I think this consideration is pretty weak but worth thinking about.]

So if I understand your main point, you argue that OpenAI LP incentivized new investments without endangering safety, thanks to the capped returns, and that this tradeoff looks like one of the best possible, compared to becoming a for-profit or getting bought by a big for-profit company. Is that right?

I also think for most of the things I'm concerned about, psychological pressure to think the thing isn't dangerous is more important; like, I don't think we're in the cigarette case where it's mostly other people who get cancer while the company profits; I think we're in the case where either the bomb ignites the atmosphere or it doesn't, and even in wartime the evidence was that people would abandon plans that posed a serious chance of destroying humanity.

I agree with you that we're in the second case, but that doesn't necessarily mean that there's a fire alarm. And economic incentives might push you to go slightly further, where it looks like everything is still okay, but we reached transformative AI in a terrible way. [I don't think it is actually the case for OpenAI right now, just responding to your point.]

Note also that economic incentives quite possibly push away from AGI towards providing narrow services (see Drexler's various arguments that AGI isn't economically useful, and so people won't make it by default). If you are more worried about companies that want to build AGIs and then ask them what to do than you are about companies that want to build AIs to accomplish specific tasks, increased short-term profit motive makes OpenAI more likely to move in the second direction.

Good point, I need to think more about that. A counterargument that springs to mind is that AGI research might push forward other kinds of AI, and thus bring transformative AI sooner even if it isn't an AGI.

Vaniver
thanks to the capped returns

Out of the various mechanisms, I think the capped returns are relatively low ranking; probably the top on my list is the nonprofit board having control over decision-making (and implicitly the nonprofit board's membership not being determined by investors, as would happen in a normal company).