Hjalmar_Wijk's Shortform
May 31, 2024
Note: This is a rough attempt to write down a more concrete threshold at which models might pose significant risks from autonomous replication and adaptation (ARA). It is fairly in the weeds and does not attempt to motivate or contextualize the idea of ARA very much, nor is it developed...
This post is an attempt to sketch a presentation of the alignment problem while tabooing words like agency, goals, or optimization as core parts of the ontology.[1] This is not a critique of frameworks that treat these topics as fundamental; in fact, I end up concluding that this is likely...