Good post, thanks. I especially agree with the points about current research teaching AIs synthetic facts and about following through with deals made during research.
It seems very likely that we will have to coordinate with AIs in the future, and we want maximum credibility when we do. If we aim to advance capabilities for instilling false beliefs in AIs, which is promising from a safety-research perspective, we should be very clear about when and how this manipulation is done, so that it does not undermine our credibility. Developing such capabilities further also makes the currently most promising pathway for honoring our commitments, fine-tuning the contract into the model, more uncertain from the model's perspective.
Committing to even superficial deals early and often is a strong signal of credibility, and we want to accumulate as much of this kind of evidence as possible. This matters for human psychology as well. As mentioned, dealmaking with AIs is currently a fringe view societally, and if even the most serious safety researchers have set no precedent for it, it becomes a much larger step for the safety community to bargain for large amounts of human-held resources if push comes to shove at some point.