Carson Denison

I work on deceptive alignment and reward hacking at Anthropic.


Having just finished reading Scott Garrabrant's sequence on geometric rationality (https://www.lesswrong.com/s/4hmf7rdfuXDJkxhfg), these lines:
- Give a de-facto veto to each major faction
- Within each major faction, do pure democracy.
remind me very much of additive expectation/maximization within coordinated objects and multiplicative expectation/maximization between adversarial ones. For example: maximizing expected reward within each hypothesis, but sampling which hypothesis to listen to for a given action in proportion to its expected utility, rather than just taking the max.
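A minimal sketch of that contrast in Python (the hypotheses, actions, and numbers here are made up for illustration, not taken from the sequence):

```python
import random

# Toy setup: each hypothesis assigns an expected reward to each action.
# All names and values are illustrative.
hypotheses = {
    "h1": {"left": 1.0, "right": 0.2},
    "h2": {"left": 0.1, "right": 0.9},
}

def best_action(action_values):
    # Within a hypothesis: additive expectation-maximization,
    # i.e. just take the action with the highest expected reward.
    return max(action_values, key=action_values.get)

def pick_action(hypotheses):
    # Between hypotheses: instead of deferring to whichever hypothesis
    # promises the most (pure max), sample one in proportion to its
    # expected utility, then act on its recommendation.
    utilities = {h: max(vals.values()) for h, vals in hypotheses.items()}
    chosen = random.choices(list(utilities), weights=list(utilities.values()))[0]
    return best_action(hypotheses[chosen])

print(pick_action(hypotheses))  # "left" ~53% of the time, "right" ~47%
```

Pure max would always listen to h1 here; sampling by expected utility lets h2 keep a proportional say, which is the between-faction, multiplicative flavor.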

Thank you for catching this. 

Those links pointed to section titles in our draft Google Doc for this post. I have replaced them with references to the corresponding sections of the post itself.