Corin Katzke

Comments
The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Corin Katzke · 6mo · 41


Note: coauthored by Gideon Futerman. 

Scenario planning for AI x-risk
Corin Katzke · 1y · 50

Thank you for reading and responding to it! For what it's worth, some of these ideas got rolling during your "AI safety under uncertainty" workshop at EAG Boston.

Scenario planning for AI x-risk
Corin Katzke · 1y · 52

Yep, another good point, and in principle I agree. A couple of caveats, though:

First, it's not clear to me that experts would agree on enough dynamics to make these clusters predictively reliable. There might be agreement on the dynamics between scaling laws and timelines (and that's a nice insight!) — but the Killian et al. paper considered 14 variables, which would mean 91 pairwise dynamics to agree on. I'd at least like some data on whether conditional forecasts converge. I think FRI is doing some work on that.
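
For concreteness, 91 is just the number of unordered pairs among the 14 variables:

$$\binom{14}{2} = \frac{14 \times 13}{2} = 91$$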

Second, the Grace et al. paper suggested that expert forecasts exhibited framing effects. So, even if experts did agree on underlying dynamics, it might not be possible to elicit that agreement reliably. But maybe conditional forecasts are less susceptible to framing effects.

Scenario planning for AI x-risk
Corin Katzke · 1y · 30

Thanks for the clarification! I didn't mean to imply that Anthropic hasn't been thinking about the full spectrum of risk — only that "misuse" and "autonomy and replication" are the two categories of catastrophic risk explicitly listed in the RSP.

If I do think of a good way to evaluate accident risks before deployment, I'll definitely let you know. (I might actually pitch my team to work on this.)

Scenario planning for AI x-risk
Corin Katzke · 1y · 30

Yep, fair enough. I agree that an MTBF of millions of years is an alternative sustainable theory of victory. 

Could you expand on "the challenge is almost entirely in getting to an acceptably low rate"? It's not clear to me that that's true. For example, it seems plausible that at some point nuclear risk was at an acceptably low rate (maybe post-fall of the USSR? I'm neither an expert nor old enough to remember) conditional on a further downward trend — but we didn't get a further downward trend.
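
To make "acceptably low rate" concrete in MTBF terms: under the simplifying assumption of a constant hazard rate (an illustrative model, not one anyone in this thread has committed to), the cumulative probability of catastrophe over a horizon $t$ is

$$P(\text{catastrophe by } t) = 1 - e^{-t/\mathrm{MTBF}} \approx \frac{t}{\mathrm{MTBF}} \quad \text{for } t \ll \mathrm{MTBF},$$

so an MTBF of a million years works out to roughly a $10^{-4}$ chance of catastrophe per century.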

Responsible Scaling Policies Are Risk Management Done Wrong
Corin Katzke · 2y · 10

"It's called 'responsible scaling'. In its own name, it conveys the idea that not further scaling those systems as a risk mitigation measure is not an option."

That seems like an uncharitable reading of "responsible scaling." Strictly speaking, the only thing that name implies is that it is possible to scale responsibly. It could be more charitably interpreted as "we will only scale when it is responsible to do so." Regardless of whether Anthropic is getting the criteria for "responsibility" right, it does seem like their RSP leaves open the possibility of not scaling. 

Posts

AISN #59: EU Publishes General-Purpose AI Code of Practice (4 points · 10h · 0 comments)
AISN #58: Senate Removes State AI Regulation Moratorium (4 points · 12d · 0 comments)
AISN #57: The RAISE Act (4 points · 1mo · 0 comments)
AISN #56: Google Releases Veo 3 (5 points · 2mo · 0 comments)
AISN #55: Trump Administration Rescinds AI Diffusion Rule, Allows Chip Sales to Gulf States (4 points · 2mo · 1 comment)
AISN #54: OpenAI Updates Restructure Plan (6 points · 2mo · 1 comment)
AISN #53: An Open Letter Attempts to Block OpenAI Restructuring (5 points · 3mo · 0 comments)
AISN #52: An Expert Virology Benchmark (4 points · 3mo · 0 comments)
AISN #51: AI Frontiers (6 points · 3mo · 1 comment)
AISN #50: AI Action Plan Responses (4 points · 3mo · 0 comments)