Joining big labs to work on alignment = actually helping improve capabilities, helping concentrate power at the top
I'm working through my own theory of change right now and would really appreciate any sources that helped you arrive here.
My current prior is weaker. I think the fungibility argument has weight: alignment research feeds back into capabilities, safety teams lend legitimacy to labs that could be misused, and commercial pressure bends commitments (e.g. Anthropic's RSP v3 walking back concrete if-then triggers). But I don't currently see it as fully fu...
Hello!
I'm new here, but I've been reading through the Sequences and other posts for the last few weeks and would love some feedback on a post idea. I'm writing up my theory of change for AI safety and how I can help. I've defined my priors, identified cruxes, and am now reading papers and blog posts to challenge those priors. I've seen a few theory-of-change posts (e.g., Critch's healthtech post), but I'm wondering whether I should post mine as a working document, starting with an unfinished draft and updating it as I refine my beliefs.
Is an in-progress...
Agreed. Even in non-technical consulting projects (e.g., strategy, change management), I've seen high ROI from turning our process documentation into skills and our project plans into memory layers (e.g., current focus, recent progress). Lessons and workflows from software development turn out to be useful in pretty much any domain, especially as more work is delegated to the agent, as you mentioned.