We Need a Baseline for LLM-Aided Experiments — LessWrong