Direct Preference Optimization in One Minute — LessWrong