Multiple AIs in boxes, evaluating each other's alignment
Summary: Below I describe a variation of the classical AI box experiment, in which two AIs are created, each in its own box, and each is asked to determine whether the other is aligned with the values of humanity. Several provisions in the experiment are designed to discourage the AIs from hiding a potential failure...
May 29, 2022