
Charbel-Raphael Segerie

https://crsegerie.github.io/ 

Living in Paris

Comments
Political Funding Expertise (Post 6 of 7 on AI Governance)
Charbel-Raphaël · 10d · 40

This is convincing!

Mainstream Grantmaking Expertise (Post 7 of 7 on AI Governance)
Charbel-Raphaël · 13d* · 67

If there is a shortage of staff time, then AI safety funders need to hire more staff. If they don’t have time to hire more staff, then they need to hire headhunters to do so for them. If a grantee is running up against a budget crisis before the new grantmaking staff can be on-boarded, then funders can maintain the grantee’s program at present funding levels while they wait for their new staff to become available.

+1 - and this has been a problem for many years.

Political Funding Expertise (Post 6 of 7 on AI Governance)
Charbel-Raphaël · 13d · 62

I find it slightly concerning that this post is not receiving more attention.

Political Funding Expertise (Post 6 of 7 on AI Governance)
Charbel-Raphaël · 13d · 20

By the time we observe whether AI governance grants have been successful, it will be too late to change course.

I don't understand this part. I think it is possible to assess the progress of an advocacy effort in much more granular detail.

Orphaned Policies (Post 5 of 7 on AI Governance)
Charbel-Raphaël · 13d · 60

Strong upvote. A few complementary remarks:

  • Many more people agree on the risks than on the solutions - advocating for situational awareness of the different risks might be more productive and urgent than arguing for a particular policy, even though I also see the benefits of pushing for a policy.
  • The AI Safety movement is highly uncoordinated; everyone is pushing their own idea. By default, I think this might be negative - maybe we should coordinate better.
  • The list of orphaned policies could go on. For example, at CeSIA we focus more on formalizing what unacceptable risks would mean, and on drawing precise red lines and risk thresholds. We think this approach is: 1) most acceptable to states, since even rival countries have an interest in cooperating to prevent worst-case scenarios, as the Nuclear Non-Proliferation Treaty demonstrated during the Cold War; 2) the most widely endorsed by research institutes, think tanks, and advocacy groups (and we think it is a good candidate policy to push as a coalition); 3) reasonable, as most AI companies already voluntarily committed to these principles at the International AI Summit in Seoul. However, to date, the red lines have remained largely vague and are not yet implementable.
Mikhail Samin's Shortform
Charbel-Raphaël · 1mo · 116

P(doom|Anthropic builds AGI) is 15% and P(doom|some other company builds AGI) is 30% --> You also need to weight this by the probability that Anthropic is first, and by the probability that the other companies won't go on to create AGI once Anthropic has already created it, which is by default not the case.
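A minimal sketch of the decomposition this comment is pointing at, in notation not taken from the thread (P(A), the probability that Anthropic builds AGI first, is not given there):

$$P(\text{doom}) \approx P(A)\cdot 0.15 + \bigl(1 - P(A)\bigr)\cdot 0.30$$

and even this only holds if the two conditionals describe mutually exclusive scenarios, i.e. if the other companies actually stop once Anthropic has built AGI, which is the assumption being questioned.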

The 80/20 playbook for mitigating AI scheming in 2025
Charbel-Raphaël · 1mo* · Ω120

I'm going to collect here new papers that might be relevant:

  • https://x.com/bartoszcyw/status/1925220617256628587
  • Why Do Some Language Models Fake Alignment While Others Don’t? (link)
Season Recap of the Village: Agents raise $2,000
Charbel-Raphaël · 1mo · 20

I was thinking about this:

  • Perhaps this link is relevant: https://www.fanaticalfuturist.com/2024/12/ai-agents-created-a-minecraft-civilisation-complete-with-culture-religion-and-tax/ (it's not a research paper, but neither is yours, I think?)
  • Voyager is a single agent, but it's very visual: https://voyager.minedojo.org/ 
  • OpenAI already did the hide-and-seek project a while ago: https://openai.com/index/emergent-tool-use/ 


While those are not examples of computer use, I think they fit the bill for presenting multi-agent capabilities in a visual way.

I'm happy to see that you are creating recaps for journalists and social media.

Regarding the comment on advocacy, "I think it also has some important epistemic challenges": I won't deny that in a highly optimized slide deck you won't have time to balance every argument. But does it matter that much? Rationality is winning, and to win, we need to be persuasive in a limited amount of time. I don't have time to also fix civilizational inadequacy around epistemics, so I play the game, as the other side is doing.

Also, I'm not criticizing the work itself, but rather the justification or goal. I think that if you did the goal factoring, you could optimize for this more directly.

Let's chat in person!

Season Recap of the Village: Agents raise $2,000
Charbel-Raphaël · 1mo · 31

I'm skeptical that this is the best way to achieve this goal, as many existing works already demonstrate these capabilities. Also, I think policymakers may struggle to connect these types of seemingly non-dangerous capabilities to AI risks. If I only had three minutes to pitch the case for AI safety, I wouldn't use this work; I would primarily present some examples of scary demos.

Also, what you are doing is essentially capability research, which is not very neglected. There are already plenty of impressive capability papers that I could use for a presentation.

For info, here is the slide deck that I generally use in different contexts.

I have considerable experience pitching to policymakers, and I'm very confident that my bottleneck in making my case isn't a need for more experiments or papers, but rather more opportunities, more cold emails, and generally more advocacy.

I'm happy to jump on a call if you'd like to hear more about my perspective on what resonates with policymakers.

See also: We're Not Advertising Enough.

Constructability: Plainly-coded AGIs may be feasible in the near future
Charbel-Raphaël · 1mo · Ω240

relevant: https://x.com/adcock_brett/status/1929207216910790946 

Posts

  • Charbel-Raphaël's Shortform (6 points, Ω, 3mo, 7 comments)
  • AI Control (1y)
  • The bitter lesson of misuse detection (24 points, Ω, 1d, 2 comments)
  • The 80/20 playbook for mitigating AI scheming in 2025 (39 points, Ω, 1mo, 2 comments)
  • [Paper] Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods (25 points, 2mo, 0 comments)
  • 🇫🇷 Announcing CeSIA: The French Center for AI Safety (94 points, 7mo, 2 comments)
  • Are we dropping the ball on Recommendation AIs? (41 points, Ω, 9mo, 17 comments)
  • We might be dropping the ball on Autonomous Replication and Adaptation. (63 points, QΩ, 1y, 30 comments)
  • AI Safety Strategies Landscape (34 points, Ω, 1y, 1 comment)
  • Constructability: Plainly-coded AGIs may be feasible in the near future (91 points, Ω, 1y, 15 comments)
  • What convincing warning shot could help prevent extinction from AI? (108 points, QΩ, 1y, 22 comments)