When is it Better to Train on the Alignment Proxy?
This is a response to Matt's earlier post. If you see "a large mixture of alignment proxies" when you look at a standard loss function, this post might save you from drawing silly conclusions from that one. If you parse the world into non-overlapping magisteria of "validation losses" and...
Mar 11, 2025