As AI becomes increasingly capable of following instructions and conducting analyses, Chenhao Tan believes that scientists will increasingly play the role of selector and evaluator. In this talk, he will share recent advances in AI-enabled hypothesis generation and research evaluation. Rather than treating AI hallucinations as obstacles to eliminate, he leverages data and literature to steer AI creativity toward generating effective hypotheses. He will also introduce HypoBench, a dedicated benchmark for evaluating hypothesis generation, which reveals significant room for improvement in current AI models. Finally, he will present ongoing work that formalizes the evaluation of research outcomes beyond the paper itself and uses AI to conduct robust evaluation of research evaluation, with a case study on mechanistic interpretability.