What Makes a Good Terminal Bench Task — LessWrong