tchauvin — LessWrong

Vulnerabilities and exploits: where are we headed?

In "Are Mythos’ cyber capabilities overhyped?", co-authored with Epoch AI, we looked at the public evidence on how good Mythos Preview was at vulnerability discovery and exploit development. In this post, I consider the implications. For vulnerability discovery: moving from sparse sampling to dense sampling, AI vs fuzzing, long-term defense...

Jun 18 · 15

End-to-end hacking with language models

Cross-posted from https://tchauvin.com/end-to-end-hacking-with-language-models. Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort. Thanks to JS Denain and Léo Grinsztajn for valuable feedback on drafts of this post. How close are we to autonomous hacking agents, i.e. AI agents that can surpass humans in cyber-offensive...

Apr 5, 2024 · 29

An Overview of AI risks - the Flyer

by Charbel-Raphaël, Jonathan Claybrough, and tchauvin

EffiSciences recently published a document outlining various types of AI risks, intended for a technical audience. We shared this flyer as part of a hackathon we organized with Entrepreneur First at MetaAI Paris. Feel free to copy and modify this flyer for your own use. Special thanks to Jonathan Claybrough...

Jul 17, 2023 · 20

Is there an analysis of the common consideration that splitting an AI lab into two (e.g. the founding of Anthropic) speeds up the development of TAI and therefore increases AI x-risk?

I'm asking because I can think of arguments going both ways. Note: this post is focused on the generic question "what to expect from an AI lab splitting into two" more than on the specifics of the OpenAI vs Anthropic case. Here's the basic argument: after splitting, the two groups...

Mar 16, 2023 · 4

Using spaced repetition to make the most out of blog posts and books

Cross-posted from timot.cool. Spaced repetition (SR) is still an early field of collective experimentation. People have been coming up with many ideas on what to use SR for: trivia like the capitals of the world, foreign vocabulary, their domain of expertise... What I almost never see discussed is the use...

Dec 21, 2020 · 10