I'm an Operations Associate at Arcadia Impact, a UK nonprofit that empowers people and organisations to tackle pressing global issues, with a focus on AI safety and governance.
Before joining Arcadia, I:
You can reach out to me at brian [at] arcadiaimpact [dot] org or find me on LinkedIn.
My typo reaction may have glitched, but I think you meant "Don't push the frontier of capabilities" in the last bullet?
I've only read the blog post and a bit of the paper so far, but do you plan to investigate how to remove alignment faking in these situations? I wonder whether there are simple methods to do so without negatively affecting the model's capabilities or safety.
Thanks for doing this important research! I may have found 2 minor typos:
Thanks for this analysis! A minor note: you're probably aware of this, but OpenPhil funds a lot of technical AI safety field-building work as part of their "Global Catastrophic Risks Capacity Building" grants. So the proportion of field-building / talent-development grants would be significantly higher if those were included.
Thanks for making this! This is minor, but I think the total should be $189M and not $169M?
The last sentence of your first paragraph seems to be cut off at "gets a lot more than"!
Following up on Leon's question: have the results already been posted? If not, when will they be posted, if at all? Thanks!
Thanks for this. This tweet from Dr. Jacob Glanville, founder and CEO of Centivax, also makes me worried about the variant:
The new B.1.1.529 strain out of South Africa has 15 mutations in the RBD where majority of neutralizing antibodies bind. The current vaccines and even Delta-based vaccines probably won’t work against this new strain. Swift, vigorous containment is needed.
Thanks for linking these! I also want to highlight that Sam shared his AGI timeline in the Bloomberg interview: "I think AGI will probably get developed during this president’s term, and getting that right seems really important."