Jelle Donders

Biomedical Engineering > Philosophy of AI student trying to figure out how I can robustly set myself up to contribute meaningfully to making transformative AI go well, likely through AI governance and/or field building.

Posts


Wiki Contributions

Comments

Thanks for this post! I've been thinking a lot about AI governance strategies and their robustness/tractability lately, much of which feels like a close match to what you've written here.

For many AI governance strategies, I think we are more clueless than many seem to assume about whether a strategy ends up positively shaping the development of AI or backfiring in some foreseen or unforeseen way. There are many crucial considerations for AI governance strategies; miss or get just one wrong and the whole strategy can fall apart, or become actively counterproductive. What I've been trying to do is:

  • Draft a list of trajectories for how the development and governance of AI could unfold up until we get to TAI, estimating the likelihood of each trajectory and the x-risk from AI associated with it.
    • e.g. "There ends up being no meaningful international agreements or harsh regulation and labs race each other until TAI. Probability of trajectory: 10%, Xrisk from AI for scenario: 20%."
  • Draft a list of AI governance strategies that can be pursued.
    • e.g. "push for slowing down frontier AI development by licensing the development of large models above a compute threshold and putting significant regulatory burden on them".
  • For each combination of trajectory and strategy, assess whether we are clueless about what the sign of the strategy's impact would be, or whether the strategy would be robustly good (~predictably lowers x-risk from AI in expectation), at least for this trajectory. A third option would of course be robustly bad.
    • e.g. "Clueless, it's not clear which consideration should have more weight, and backfiring could be as bad as success is good.
      • + This strategy would make this trajectory less likely and possibly shift it to a trajectory with lower xrisk from AI.
      • - Getting proper international agreement seems unlikely for this pessimistic trajectory. Partial regulation could disproportionally slow down good actors, or lead to open source proliferation and increases misuse risk."
  • Try to identify strategies that are robust across a wide array of trajectories (a toy sketch of this bookkeeping follows below).
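To make the arithmetic concrete, here is a minimal sketch in Python of what this bookkeeping could look like. Every trajectory name, strategy name, probability, and x-risk number is a hypothetical placeholder (the 10%/20% trajectory borrows from the example above), and the sign judgments stand in for the qualitative assessment step, not the output of any model:

```python
# Toy sketch of the trajectory/strategy bookkeeping described above.
# Every name and number here is a hypothetical placeholder.

# Trajectories: probability of occurring, and x-risk from AI conditional on it.
trajectories = {
    "labs_race_to_TAI_unregulated": {"p": 0.10, "xrisk": 0.20},  # example above
    "partial_national_regulation":  {"p": 0.45, "xrisk": 0.10},
    "strong_intl_agreements":       {"p": 0.45, "xrisk": 0.03},
}

# Sign judgment for one strategy on each trajectory:
# +1 robustly good, -1 robustly bad, 0 clueless.
compute_licensing = {
    "labs_race_to_TAI_unregulated": 0,   # clueless, per the example above
    "partial_national_regulation":  +1,
    "strong_intl_agreements":       +1,
}

def expected_xrisk(trajs):
    """Baseline expected x-risk: sum of P(trajectory) * P(x-risk | trajectory)."""
    return sum(t["p"] * t["xrisk"] for t in trajs.values())

def robustness(signs, trajs):
    """Probability mass on trajectories where the strategy is judged
    robustly good, robustly bad, or where we remain clueless."""
    good = sum(trajs[n]["p"] for n, s in signs.items() if s > 0)
    bad = sum(trajs[n]["p"] for n, s in signs.items() if s < 0)
    return {"good": good, "bad": bad, "clueless": round(1.0 - good - bad, 3)}

# 0.10*0.20 + 0.45*0.10 + 0.45*0.03 = 7.85% baseline expected x-risk
print(f"baseline expected x-risk: {expected_xrisk(trajectories):.2%}")
print("compute licensing robustness:", robustness(compute_licensing, trajectories))
```

On these made-up numbers, a "robust" strategy would be one whose sign-positive probability mass is high and whose sign-negative mass is ~zero; the clueless mass is exactly where the crucial-consideration worries above bite.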

I'm just winging it without much background in how such foresight-related work is normally done, so any thoughts or feedback on how to approach this kind of investigation, or on which existing foresight frameworks you think would be particularly helpful here, are very much appreciated!

If this event fills up at some point, there's a meetup in Eindhoven on the same day :)

43.  This situation you see when you look around you is not what a surviving world looks like.

A similar argument could have been made during the Cold War to argue that nuclear war was inevitable, yet here we are.

Is it certain that the meetup will be tomorrow? There seemed to be some uncertainty at the Eindhoven meetup...