I'm Jérémy Perret. Based in France. PhD in AI (NLP). AI Safety & EA meetup organizer. Information sponge. Mostly lurking since 2014. Seeking more experience, and eventually a position, in AI safety/governance.

Extremely annoyed by the lack of an explorable framework for AI risk/benefits. Working on that.



Strongly upvoted for the clear write-up, thank you for that, and engagement with a potentially neglected issue.

Following your post I'd distinguish two issues:

(a) Lack of data privacy enabling a powerful future agent to target/manipulate you personally, because your data is just there for the taking, stored in not-so-well-protected databases; cross-referencing gets easier at higher capability levels, and singling you out to fine-tune a behavioral model on you in particular isn't hard;

(b) Lack of data privacy enabling a powerful future agent to build that generic behavioral model of humans from the thousands/millions of well-documented examples from people who aren't particularly bothered by privacy, from the same databases as above, plus simply (semi-)public social media records.

From your deception examples we already have strong evidence that (b) is possible. LLM capabilities will get better, and it will get worse when [redacted plausible scenario because my infohazard policies are ringing].

If (b) comes to pass, I would argue that the marginal effort needed to prevent (a) would only be useful for preventing certain whole coordinated groups of people (who should already be infosec-aware) from being manipulated. Rephrased: there are already a ton of epistemic failures all over the place, but maybe there can be pockets of sanity linked to critical assets.

I may be missing something as well. Also seconding the Seed webtoon recommendation.

Quick review of the review: this could indeed make a very good top-level post.

No need to apologize, I'm usually late as well!

I don't think there is a great answer to "What is the most comprehensive repository of resources on the work being done in AI Safety?"

There is no great answer, but I am compelled to list the few I know of (that I wanted to update my Resources post with):

  • Vael Gates's transcripts, which attempt to cover multiple views but, by the nature of conversations, aren't very legible;
  • The Stampy project to build a comprehensive AGI safety FAQ, and to go beyond questions only, they do need motivated people;
  • Issa Rice's AI Watch, which is definitely tucked in a corner of the Internet (had I not worked with Issa, I would never have discovered it); lots of data about orgs, people, and labs, but not much context.

Other mapping resources cover not the work being done but arguments and scenarios; for example, there's Lukas Trötzmüller's excellent argument compilation, but that wouldn't exactly help someone get into the field faster.

Just in case you don't know about it, there's the AI alignment field-building tag on LW, which mentions an initiative run by plex, who also coordinates Stampy.

I'd be interested in reviewing stuff, yes, time permitting!

Answers in order: there is none, there were, there are none yet.

(Context starts, feel free to skip, this is the first time I can share this story)

After posting this, I was contacted by Richard Mallah, who (if memory serves) created the map, compiled the references, and wrote most of the text in 2017, to help with the next iteration of the map. The goal was to build a Body of Knowledge for AI Safety, including AGI topics but also more current-capabilities ML Safety methods.

This was going to happen in conjunction with the contributions of many academic & industry stakeholders, under the umbrella of CLAIS (Consortium on the Landscape of AI Safety), mentioned here.

There were design documents for the interactivity of the resource, and I volunteered. Back in 2020 I had severely overestimated both my web development skills and my ability to work during a lockdown; I never published a prototype interface, and for unrelated reasons the CLAIS project... wound down.

(End of context)

I do not remember Richard mentioning a review of the map contents, apart from the feedback he received back when he wrote them. The map has been a bit tucked in a corner of the Internet for a while now.

The plans to update/expand it failed as far as I can tell. There is no new version and I'm not aware of any new plans to create one. I stopped working on this in April 2021.

There is no current map with this level of interactivity and visualization, but there have been a number of initiatives trying to be more comprehensive and up-to-date!

I second this, and expansions of these ideas.

Thank you, that is clearer!

But let's suppose that the first team of people who build a superintelligence first decide not to turn the machine on and immediately surrender our future to it. Suppose they recognize the danger and decide not to press "run" until they have solved alignment.

The section ends here but... isn't there a paragraph missing? I was expecting the standard continuation along the lines of "Will the second team make the same decision, once they reach the same capability? Will the third, or the fourth?" and so on.

Thank you for this post, I find this distinction very useful and would like to see more of it. Has the talk been recorded, by any chance (or will you give it again)?

Thank you, that was my understanding. Looking forward to the second competition! And good luck sorting out all the submissions for this one.

[Meta comment]

The deadline is past; should we keep the submissions coming, or is it too late? Some of the best arguments I could find elsewhere are rather long, in the vein of the Superintelligence FAQ. I did not want to copy-paste chunks of it, and the arguments stand better as part of a longer format.

Anyway, signalling that the lack of money incentive will not stop me from trying to generate more compelling arguments... but I'd rather do it in French instead of posting here (I'm currently working on some video scripts on AI alignment, there's not enough French content of that type).
