ryan_greenblatt

I work at Redwood Research.


I believe the approach is to forecast questions that resolve in the future and allow arbitrary internet access. I'm not totally sure though.

I now think this is false. From The Information:

Musk claims xAI built a cluster of 100,000 Nvidia H100 GPUs—one of the most advanced broadly available chips—in a facility in Memphis, Tenn.

In a post on Monday, Musk said the 100,000-chip cluster, known as Colossus, is already up and running and is the “most powerful AI training system in the world.” Two people with direct knowledge of xAI’s chip order and power capacity at the site said they believed that fewer than half of those chips are currently in operation, largely because of constraints involving power or networking gear.

Whether Musk’s claims are embellished or not, they have caused a stir among other top AI developers, which fear falling behind. OpenAI CEO Sam Altman, for instance, has told some Microsoft executives he is concerned that xAI could soon have more access to computing power than OpenAI does, according to someone who heard his comments.

Like, presumably the ideal scenario is that a risk evaluator estimates the risk in an objective way, and then the company takes (hopefully predefined) actions based on that estimate. The outcome of this interaction should not depend on social cues like how loyal they seem, or how personally costly it was for them to communicate that information. To the extent it does, I think this is evidence that the risk management framework is broken.

I agree with all of this, but I don't expect us to live in an ideal world with a non-broken risk management framework, and we're making decisions on that margin.

I also think predefined actions are somewhat tricky to get right even in a pretty ideal setup, but I agree you can get reasonably close.
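To make the "predefined actions" idea concrete, here is a minimal sketch of a precommitted response policy keyed to an objective risk estimate. The thresholds, action names, and the `predefined_action` helper are all hypothetical illustrations, not anyone's actual framework.

```python
# Hypothetical sketch of a precommitted response policy. The thresholds and
# action names are made-up placeholders; the point is only that the action
# depends on the risk estimate alone, not on social cues about the evaluator.

RESPONSE_POLICY = [
    # (max_estimated_risk, predefined_action)
    (0.001, "continue as planned"),
    (0.01, "deploy only with additional mitigations"),
    (0.05, "pause further scaling pending external review"),
    (1.0, "halt training and deployment"),
]


def predefined_action(estimated_risk: float) -> str:
    """Return the precommitted action for a given (objective) risk estimate."""
    if not 0.0 <= estimated_risk <= 1.0:
        raise ValueError("estimated_risk must be in [0, 1]")
    for threshold, action in RESPONSE_POLICY:
        if estimated_risk <= threshold:
            return action
    return RESPONSE_POLICY[-1][1]


print(predefined_action(0.03))  # -> pause further scaling pending external review
```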

Note that I don't necessarily endorse the arguments I quoted (this is just the strongest objection I'm aware of), and as a bottom line, I think you should pay risk evaluators in cash.

(I'm confident this isn't incentive-compatible. Consider, as an example, the case where you happen to know the other person's bid exactly. I do think it is a good baseline mechanism, though. Because it isn't incentive-compatible, both parties need to precommit to no further negotiation if .)
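For concreteness, here is a toy sketch of the kind of incentive-compatibility failure I have in mind. The mechanism under discussion isn't reproduced in this thread, so the sketch assumes something like a sealed-bid, split-the-difference trade; the values, bids, and helper names are all made up.

```python
# Toy illustration of the incentive-compatibility failure, assuming the
# mechanism is something like a sealed-bid, split-the-difference trade
# (this is my guess at the setup; it is not spelled out in the thread).

def trade_price(buyer_bid: float, seller_ask: float):
    """Trade happens at the midpoint iff the buyer's bid meets the seller's ask."""
    if buyer_bid >= seller_ask:
        return (buyer_bid + seller_ask) / 2
    return None


def buyer_profit(buyer_value: float, buyer_bid: float, seller_ask: float) -> float:
    price = trade_price(buyer_bid, seller_ask)
    return buyer_value - price if price is not None else 0.0


buyer_value = 100.0  # buyer's true value (hypothetical)
known_ask = 60.0     # the buyer somehow knows the seller's ask exactly

honest = buyer_profit(buyer_value, buyer_value, known_ask)  # bid truthfully: pay 80, profit 20
shaded = buyer_profit(buyer_value, known_ask, known_ask)    # bid the known ask: pay 60, profit 40

assert shaded > honest  # misreporting beats truthful bidding, so not incentive-compatible
```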

I disagree with the claim near the end that this seems better than Stop.

At the start of the doc, I say:

It’s plausible that the optimal approach for the AI lab is to delay training the model and wait for additional safety progress. However, we’ll assume the situation is roughly: there is a large amount of institutional will to implement this plan, but we can only tolerate so much delay. In practice, it’s unclear if there will be sufficient institutional will to faithfully implement this proposal.

Towards the end of the doc I say:

This plan requires quite a bit of institutional will, but it seems good to at least know of a concrete achievable ask to fight for other than “shut everything down”. I think careful implementation of this sort of plan is probably better than “shut everything down” for most AI labs, though I might advocate for slower scaling and a bunch of other changes on current margins.

Presumably, you're objecting to 'I think careful implementation of this sort of plan is probably better than “shut everything down” for most AI labs'.

My current view is something like:

  • If there were broad, strong, and durable political will and buy-in for heavily prioritizing AI takeover risk in the US, I think it would be good if the US government shut down scaling and took strong actions to prevent frontier AI progress while also accelerating huge amounts of plausibly-safety-related research.
    • You'd need to carefully manage the transition back to scaling to reduce hardware overhang issues. This is part of why I think "durable" political will is important. There are various routes for doing this with different costs.
    • I'm sympathetic to thinking this doesn't make sense if you just care about deaths prior to age 120 of currently alive people and widespread cryonics is hopeless (even conditional on that level of political support for mitigating AI takeover risk). Some other views which just care about achieving close-to-normal lifespans for currently alive humans also maybe aren't into pausing.
  • Regulations/actions which slow down scaling as a side effect, but which aren't part of a broad package, seem way less good. This is partially due to hardware/algorithmic overhang concerns, but more generally due to follow-through concerns. I also wouldn't advocate such regulations (regulations whose main impact is the slowdown, achieved as a side effect) due to integrity/legitimacy concerns.
  • Unilateral shutdown is different advice from "it would be good if everyone shut down", because AI companies might think (correctly or not) that they would be better on safety than other companies. In practice, no AI lab seems to have expressed a view close to "acceleration is bad" except Anthropic (see core views on AI safety).
  • We are very, very far from broad, strong, and durable political will for heavily prioritizing AI takeover risk, so weaker and conditional interventions for actors on the margin seem useful.

I raised a similar proposal to various people a while ago.

The strongest objection I'm aware of is something like:

You actually want evaluators to have as much skin in the game as other employees so that when they take actions that might shut the company down or notably reduce the value of equity, this is a costly signal.

Further, it's good if evaluators are just considered normal employees and aren't separated out in any specific way. Then, other employees at the company will consider these evaluators to be part of their tribe and will feel some loyalty. (This also reduces the chance that risk evaluators feel like they are part of a rival "AI safety" tribe.) This probably has a variety of benefits in terms of support from the company. For example, when evaluators make a decision which is very costly for the company, it is more likely to be respected by other employees.

A key question is how much of the problem lies in getting evaluators within the company to sound the alarm, versus getting the rest of the company (and the world) to respect that alarm. And, of course, how much the conflict of interest (and tribal-ish affiliation with the company) will alter the judgment of evaluators in a problematic way.

If I were confident that all risk evaluators were incorruptible, high-integrity, total altruists (to the world or something even more cosmopolitan), then I think the case for them getting compensated normally with equity is pretty good. This would allow them to send a costly signal later and would have other benefits. Though perhaps the costly signal is less strong if it is clear these people don't care about money (but this itself might lend some credibility).
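As a rough illustration of the trade-off (all numbers hypothetical): the same dollar figure that makes the alarm a costly signal is also the size of the evaluator's conflict of interest.

```python
# Hypothetical numbers: the personal cost an evaluator bears when their report
# pauses the company, under equity vs. cash compensation. The same dollar figure
# is both the strength of the "costly signal" and the size of the conflict of interest.

equity_stake = 2_000_000   # value of the evaluator's equity (hypothetical)
equity_hit_if_pause = 0.5  # fraction of equity value lost if their report triggers a pause

cost_if_paid_in_equity = equity_stake * equity_hit_if_pause  # $1,000,000
cost_if_paid_in_cash = 0                                     # roughly no direct financial downside

print(f"equity-compensated evaluator's personal cost of sounding the alarm: ${cost_if_paid_in_equity:,.0f}")
print(f"cash-compensated evaluator's personal cost: ${cost_if_paid_in_cash:,.0f}")
```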

Given my understanding of the actual situation at AI companies, I think AI companies should ideally pay risk evaluators in cash.

I have more complex views about the situation at each of the different major AI labs and what is highest priority for ensuring good risk evaluations.

I posted the link here.

Here is the doc, though note that it is very out of date. I don't particularly want to recommend people read this doc, but it is possible that someone will find it valuable to read.

About 1 year ago, I wrote up a ready-to-go plan for AI safety focused on current science (what we roughly know how to do right now). It targets reducing catastrophic risks from the point when we have transformatively powerful AIs (e.g. AIs roughly as capable as humans).

I never finished this doc, and it is now considerably out of date relative to how I currently think about what should happen, but I still think it might be helpful to share.

Here is the doc. I don't particularly want to recommend people read this doc, but it is possible that someone will find it valuable to read.

I plan on trying to think through the best ready-to-go plan roughly once a year. Buck and I have recently started work on a similar effort. Maybe this time we'll actually put out an overall plan rather than just spinning off various docs.

We're working on something along these lines. The most up-to-date published posts are just our control post and our Notes on control evaluations for safety cases, which are obviously incomplete.

I'm planning on posting a link to our best draft of a ready-to-go-ish plan as of 1 year ago, though it is quite out of date and incomplete.

I don't think funders are in a good position to do this. Also, funders are generally not "coherent": they don't have much top-down strategy. Individual grantmakers could write up thoughts.
