To do more good than harm as an AI safety researcher, read Legible vs. Illegible AI Safety Problems.
Pretty useful for some recs, thanks.
I'd recommend learning about things like how RNC chairs cycle out faster than DNC chairs, how this affects things, why the DNC picked Kamala, how staffers work and influence things, how laws are enforced, how reports change compliance with laws, which laws do and don't get enforced and why, etc. There isn't a super clean place where this is written down, as far as I know, but just reading the Financial Times, Politico, CNBC, etc. can be useful, and then seeing if you can predict the next headlines.
If you'd like to test yourself on this after reading: do you know who Susie Wiles is and why she's important?
In a similar vein, a solid moral-philosophical grounding in what the stakes of power are and why they matter is a must-have. Too many AI safety researchers are desperately naive about the game of power and about the overall sweep of sociotechnical development.
It seems to me that this has already been explored in enough detail, and that our goal, conditioned on solving technical alignment, is to convince decisionmakers to implement the solution. The TL;DR of the post-alignment problem is that mankind will be rendered unable to meaningfully impact the economy, because decisionmakers will realize that any cognitive task is far cheaper to outsource to AIs, so that mankind can no longer obtain resources through any form of labor. The closest thing to a solution would be implementing some combination of MacAskill's Universal Basic Resources and games with clear rules and the ability to earn prizes.
As for solving technical alignment itself, I understand how math-related books or the security mindset could help, but I struggle to understand how exactly governance-related ideas or sections on game theory and coordination problems can help, with the exception of making deals with not-so-aligned AIs.
There is also the additional consideration that rogue or careless AGI development (think of xAI's lack of precautions) should be prevented until alignment is solved, or forever if alignment is deemed insoluble, but I suspect that this, too, is easily understood. Therefore, I cannot see a causal pathway by which reading about governance becomes useful for AI safety.
(Epistemic status: Opinions, but justifiable ones. For several notable AI safety research sprint program directors of my acquaintance, whom I won't mention by name here; with thanks to @WhatsTrueKittycat and @Morphism, the latter of whom insisted that I crosspost.)
...then what would I make absolutely sure that the new blood read, played, or otherwise interacted with? And why? This list is not meant to be exhaustive, but I've tried pretty hard to cover a lot of ground very fast. You may assume that this is in addition to classics like "A List of Lethalities", excerpts from Bostrom, and "Ten Levels of AI Alignment Difficulty". Accordingly, these are the things that I would personally add to that curriculum, or perhaps bump some marginal items in favor of. It's aimed all over the spectrum of what "new AI safety researcher" means: some entries are for totally new people, some are for people who have a sense of which subfield they want to attack, and some could benefit literally everyone, including established researchers. I've tried to pick things that are specifically underutilized and a relatively short time commitment, or, when longer, at least decidedly not dry; in all cases, I've favored works that are prescient or that are not downstream of the AI safety community.
To begin, a few things that I'd have literally everyone read. The first subcategory is basic grounding - what are we to contemplate and study and disassemble, and why?
In a similar vein, a solid moral-philosophical grounding in what the stakes of power are and why they matter is a must-have. Too many AI safety researchers are desperately naive about the game of power and about the overall sweep of sociotechnical development. Unfortunately, I might also have described this subcategory as "texts that point out an extremely severe and corrosive problem, examine it at length and with sophistication, and end by presenting no answer, but rather apologizing for having none and expressing a sureness or fervent wish that in the decades to come someone cleverer will find the answers, or that everyone will put down their arms and hold hands to find that answer". Gentle reader: know your history; they did not.
I've noticed a few major missing categories from AI safety reading lists. I've listed them off here, along with my imperial decrees on which texts to pick.
Finally, a few miscellaneous entries invaluable to specific subfields but bearing fruit for all.
Honorable mentions, given their presence in many existing introductory AI safety reading lists:
I fully expect that at least three people will happily tell me where this decree has obviously and crucially faltered, and be wise advisors in doing so. Such is the nature of lists made by a single fox rather than a committee of hedgehogs. I welcome your sage good-faith counsel and promise not to send you to the GPU mines.