I am the co-founder of and researcher at the quantitative long-term strategy organization Convergence (see here for our growing list of publications). Over the last fourteen years I have worked with MIRI, CFAR, EA Global, and Founders Fund, and done work in EA strategy, fundraising, networking, teaching, cognitive enhancement, and AI safety research. I have an MS degree in computer science and BS degrees in computer science, mathematics, and physics.
More generally, you can use the following typology to inspire further interventions.
Intervention points for changing or shaping an AGI company and its surroundings toward safer x-risk outcomes (I've used this in advising startups on AI safety; it is also related to my post on positions where people can be in the loop):
Thanks for asking the question!
Some things I'd especially like to see change (inasmuch as I know what is happening) are:
Gotcha. What determines the "ratios" is some sort of underlying causal structure of which some aspects can be summarized by a tech tree. For thinking about the causal structure you may also like this post: https://forum.effectivealtruism.org/posts/TfRexamDYBqSwg7er/causal-diagrams-of-the-paths-to-existential-catastrophe
Complementary ideas to this article:
Relatedly, here is a post that goes beyond the ratio-of-progress framework to consider the effect on the ratio of research that still needs to be done for various outcomes: https://www.lesswrong.com/posts/BfKQGYJBwdHfik4Kd/fai-research-constraints-and-agi-side-effects
Extending further, one can examine higher-order derivatives and curvature in a space of existential risk trajectories: https://forum.effectivealtruism.org/posts/TCxik4KvTgGzMowP9/state-space-of-x-risk-trajectories
Roughly speaking, in terms of the actions you take, various timelines should be weighted as P(AGI in year t) * DifferenceYouCanProduceInAGIAlignmentAt(t). This produces a new, unnormalized distribution of how much to prioritize each time (you can renormalize it if you wish to make it more like a "probability").
Note that this is just a first approximation and there are additional subtleties.
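The weighting above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `prioritization_weights`, the dictionary inputs, and every number in the usage example are hypothetical placeholders, not estimates anyone has made.

```python
def prioritization_weights(p_agi, impact):
    """Weight each year t by P(AGI in year t) * difference you can make at t.

    p_agi:  dict mapping year -> probability that AGI arrives in that year
    impact: dict mapping year -> difference you could produce in AGI
            alignment if AGI arrives in that year (arbitrary units)

    Returns (raw, normalized). `raw` is the unnormalized weighting;
    `normalized` rescales it to sum to 1 so it reads like a probability.
    """
    raw = {t: p_agi[t] * impact[t] for t in sorted(p_agi)}
    total = sum(raw.values())
    if total > 0:
        normalized = {t: w / total for t, w in raw.items()}
    else:
        # Nothing to renormalize if every weight is zero.
        normalized = dict(raw)
    return raw, normalized


# Hypothetical, purely illustrative inputs.
example_p_agi = {2030: 0.2, 2040: 0.5, 2060: 0.3}
example_impact = {2030: 0.1, 2040: 0.4, 2060: 1.0}
raw, normalized = prioritization_weights(example_p_agi, example_impact)
```

With these made-up inputs, the most probable year (2040) is not the one that gets the largest weight, because the assumed ability to make a difference is larger for the later timeline. That is the point of multiplying the two terms rather than prioritizing by probability alone.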
(Meta: I use this reasoning often and may make a full post on it someday.)
I think causal diagrams naturally emerge when thinking about Goodhart's law and its implications.
I came up with the concept of Goodhart's law causal graphs above because of a presentation someone gave at the EA Hotel in late 2019 of Scott's Goodhart Taxonomy. I thought causal diagrams were a clearer way to describe some parts of the taxonomy, but their relationship to the taxonomy is complex. I also only encountered the paper you and Scott wrote a couple of weeks ago, while getting ready to write this post prompted by Good Heart Week. I was planning to reference it in the next post, when we address "causal stomping" and "function generalization error" and can more comprehensively describe the relationship with the paper.
In terms of the relationship to the paper, I think that the Goodhart's law causal graphs I describe above are more fundamental and atomically describe the relationship types between the target and proxies in a unified way. I read your paper's use of causal diagrams as instead describing various ways causal graph relationships may be broken by taking action, rather than simply describing relationships between proxies and targets and ways they may be confused with each other (which is the function of the Goodhart's law causal graphs above).
Mostly, the purpose of this post and the next is to present an alternative, and I think cleaner, ontological structure for thinking about Goodhart's law, though there will still be some messiness in carving up reality.
As to your suggested mitigations, both randomization and secret metric are good to add, though I'm not as sure about post hoc. Thanks for the suggestions and the surrounding paper.
I like the distinction that you're making and that you gave it a clear name.
Relatedly, there is the method of Lagrange multipliers for finding optima within the subspace.
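As a small worked illustration of Lagrange multipliers picking out an optimum within a constrained subspace, here is a toy problem; the objective $f$, the constraint, and the multiplier symbol $\lambda$ are my own choices for illustration, not taken from the post.

```latex
\begin{align*}
&\text{Minimize } f(x, y) = x^2 + y^2
  \quad \text{subject to} \quad x + y = 2. \\
&\mathcal{L}(x, y, \lambda) = x^2 + y^2 - \lambda\,(x + y - 2) \\
&\frac{\partial \mathcal{L}}{\partial x} = 2x - \lambda = 0, \qquad
 \frac{\partial \mathcal{L}}{\partial y} = 2y - \lambda = 0, \qquad
 x + y = 2 \\
&\Rightarrow\; x = y = 1, \quad \lambda = 2, \quad f(1, 1) = 2.
\end{align*}
```

The unconstrained minimum is at $(0, 0)$ with $f = 0$, so the point $(1, 1)$ is optimal only relative to the subspace defined by the constraint.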
On a side note: there is a way to partially unify subspace optimum and local optimum by saying that the subspace optimum is a local optimum with respect to the local set of parameters you're using to define the subspace. You're at a local optimum with respect to defining the underlying space to optimize over (aka the subspace) and a local optimum within that space (the subspace). (Relatedly, moduli spaces.)
I've decided to try modelling testing and contact tracing over the weekend. If you wish to join and want to ping me, my contact details are in the doc.
I agree.
Anthropic's marginal contribution to safety (compared to what we would have in a world without Anthropic) probably doesn't offset Anthropic's contribution to the AI race.
I think there are more worlds where Anthropic is contributing to the race in a negative fashion than there are worlds where Anthropic's marginal safety improvement over OpenAI/DeepMind-ish orgs is critical for securing a good future with AGI (weighing things according to the impact sizes and probabilities).