A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior — LessWrong