Can we steer AI models toward safer actions by making them instrumentally useful? — LessWrong