Summary
Can LLMs science? The answer to this question can tell us important things about timelines to AGI. In this small pilot experiment, we test frontier LLMs on their ability to perform a minimal version of scientific research: they must discover a hidden rule about lists of integers by iteratively generating and testing hypotheses. Results are ambiguous: most models perform poorly, but the top systems show apparent signs of life. We're working on a larger, more rigorous experiment, and we really want your input.
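To make the task concrete, here is a minimal Python sketch of the generate-and-test loop. Everything in it is an assumption made for illustration: the hidden rule, the fixed pool of candidate hypotheses, the probe lists, and all of the names are hypothetical and are not taken from the experiment itself. In particular, the models have to invent their own hypotheses, whereas this sketch only filters a fixed set.

```python
from typing import Callable, Dict, List

Rule = Callable[[List[int]], bool]


def hidden_rule(xs: List[int]) -> bool:
    """Hypothetical hidden rule (illustration only): strictly increasing."""
    return all(a < b for a, b in zip(xs, xs[1:]))


# Hypothetical pool of candidate hypotheses a tester might consider.
CANDIDATES: Dict[str, Rule] = {
    "strictly increasing": lambda xs: all(a < b for a, b in zip(xs, xs[1:])),
    "all even": lambda xs: all(x % 2 == 0 for x in xs),
    "sum is positive": lambda xs: sum(xs) > 0,
}


def surviving_hypotheses(
    oracle: Rule, candidates: Dict[str, Rule], probes: List[List[int]]
) -> List[str]:
    """Test each probe against the oracle and drop hypotheses that disagree."""
    alive = dict(candidates)
    for probe in probes:
        verdict = oracle(probe)  # the only feedback: does this list follow the rule?
        alive = {
            name: rule for name, rule in alive.items() if rule(probe) == verdict
        }
    return sorted(alive)


# Hypothetical probe lists; the second one is chosen to separate the candidates.
PROBES = [[2, 4, 6], [6, 4, 2], [1, 3, 5]]

if __name__ == "__main__":
    print(surviving_hypotheses(hidden_rule, CANDIDATES, PROBES))
```

In this sketch the first probe fits all three candidates and so rules nothing out. The second probe is the informative one, because the candidates disagree about it. Choosing tests that can falsify a hypothesis, rather than only confirm it, is the part of the task that resembles real research.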
Structure
In this post we:
- Describe an experiment on general reasoning and scientific research ability in LLMs.
- Describe the main research project for which this is a pilot project.
- Ask for your input.