When wishful thinking worksΩ

Safely and usefully spectating on AIs optimizing over toy worldsΩ

Computational efficiency reasons not to model VNM-rational preference relations with utility functions

A comment on the IDA-AlphaGoZero metaphor; capabilities versus alignmentΩ

[Link] Logical uncertainty and mathematical uncertaintyΩ

More on the Linear Utility Hypothesis and the Leverage Prior

Value learning subproblem: learning goals of simple agentsΩ

Against the Linear Utility Hypothesis and the Leverage Penalty

Being legible to other agents by committing to using weaker reasoning systemsΩ