TL;DR: Train an LLM/AI on text from before 2016 only, ask it to propose AI architecture ideas based on the ML research available up to that point, and see whether it can come up with the idea of transformers.

One of the biggest promises of A(G)I is automated and fast invention of new ideas, theories, and scientific breakthroughs. As some have said, AGI might be the last invention humanity needs to make.

We'd like to be able to test existing and future AIs for their capacity for innovation, but that's hard. The task has two parts:

  1. Getting an AI to come up with new ideas: this is easy; just ask GPT-4. There are some interesting prompts floating around, like this one (a minimal API sketch for running it follows this list):
    What’s an example of a phenomenon where humanity as a whole lacks a good explanation for, but, taking into account the full set of human generated knowledge, an explanation is actually possible to generate? Please write the explanation. It must not be a hypothesis that has been previously proposed. A good explanation will be hard to vary.
    Source and examples: https://twitter.com/gfodor/status/1637323490390073349?t=zhRmG397YS9lRi24oSGRjg&s=19
  2. Validating the new ideas the AI came up with: this seems hard. To test a hypothesis, we often need to marshal resources, go out into the real world, and test it. We need experts, funding, and often the creation of new methods. Even when the idea is just an algorithm, testing it at scale is not trivial.
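
A minimal sketch of the prompting step, assuming the OpenAI Python client; the model name, sampling settings, and the `generate_ideas` helper are illustrative, not part of the original proposal:

```python
# Minimal sketch of step 1: sample candidate explanations from a chat model.
# Assumes the `openai` Python package (v1.x) and an OPENAI_API_KEY in the
# environment; model name and sampling settings are illustrative.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "What's an example of a phenomenon where humanity as a whole lacks a good "
    "explanation for, but, taking into account the full set of human generated "
    "knowledge, an explanation is actually possible to generate? Please write "
    "the explanation. It must not be a hypothesis that has been previously "
    "proposed. A good explanation will be hard to vary."
)

def generate_ideas(n: int = 10) -> list[str]:
    """Sample n candidate explanations; high temperature keeps them diverse."""
    ideas = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": PROMPT}],
            temperature=1.0,
        )
        ideas.append(response.choices[0].message.content)
    return ideas
```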

Fortunately, we do have a free source of evidence for the validity of new hypotheses and theories: history. If you want to test whether an AI can come up with architectural improvements to itself, hide the last 5 years of ML advancements from it and ask it to generate 1,000 ideas for architectural improvements. Then use a time-unrestricted AI to automatically compare these hypotheses to the actual ML innovations of those years and see whether any of them came close to what humans invented.
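
As a sketch of what that comparison could look like in practice, a time-unrestricted judge model scores each generated idea against a curated list of real post-cutoff innovations. The judge prompt, the 0-10 scale, the model name, and the example innovation list are all assumptions for illustration:

```python
# Sketch of the grading loop: a time-unrestricted "judge" model scores each
# candidate idea against real post-cutoff innovations (0-10, higher = closer).
# The judge prompt, scale, model name, and innovation list are illustrative.
from openai import OpenAI

client = OpenAI()

REAL_INNOVATIONS = [
    "Transformer: replace recurrence with self-attention for sequence modeling",
    "Residual connections enable training of very deep networks",
    # ...curated from the ML literature published after the training cutoff
]

def similarity_score(candidate: str, innovation: str) -> int:
    """Ask the judge how close a candidate idea is to a real innovation."""
    judge_prompt = (
        "On a scale of 0-10, how close is the proposed idea to the actual "
        f"innovation?\n\nProposed idea: {candidate}\n\n"
        f"Actual innovation: {innovation}\n\nReply with a single integer."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": judge_prompt}],
        temperature=0,  # deterministic grading
    )
    return int(response.choices[0].message.content.strip())

def best_matches(candidates: list[str]) -> dict[str, int]:
    """For each real innovation, the best score any candidate achieved."""
    return {
        innovation: max(similarity_score(c, innovation) for c in candidates)
        for innovation in REAL_INNOVATIONS
    }
```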

Extension: You can do this with an arbitrary knowledge cutoff and any human invention. Train an LLM on all human knowledge up until 1700 and ask it to come up with hypotheses for phenomena that were only explained after 1700. You can have LLMs trained on data up to any decade in the last 400 years try to predict inventions that happened after their cutoff.

For earlier cutoffs, you may run into the problem of not having enough training data. To get around this, you can use a time-unrestricted AI to generate training text that doesn't include any information from after the given year. Verifying that is harder, but there are probably clever ways around it.
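
One crude version of this, as a sketch: have a time-unrestricted model write text restricted to pre-cutoff knowledge, then run a verification pass that tries to catch anachronisms. The prompts, model name, and keyword check below are illustrative assumptions, not a worked-out method:

```python
# Sketch of synthetic pre-cutoff data generation: a time-unrestricted model is
# asked to write as if no knowledge after CUTOFF_YEAR exists, and a second pass
# tries to flag anachronisms. Prompts, model name, and the crude keyword check
# are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
CUTOFF_YEAR = 1700

def generate_period_text(topic: str) -> str:
    """Ask the model for text restricted to knowledge available before the cutoff."""
    prompt = (
        f"Write an essay about {topic} using only concepts, facts, and vocabulary "
        f"available to a well-read person before the year {CUTOFF_YEAR}. "
        "Do not mention any later discoveries, inventions, or terminology."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def looks_anachronistic(text: str) -> bool:
    """Crude verification pass; a stronger version would use another model as the checker."""
    post_cutoff_terms = ["oxygen", "natural selection", "germ theory"]
    return any(term in text.lower() for term in post_cutoff_terms)
```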

Caveat: humans are probably wrong about some of the things we believe. This benchmark bakes some of that incorrectness into the test, but it does at least test for human-level innovation.

When I shared this idea with my wife, she extended it to the area of forecasting: an AI trained on data up to year X can try to predict the probability of events that did (or did not) occur in the following 1/5/10/20 years.
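
A standard way to score such forecasts is a proper scoring rule like the Brier score; the rule choice and the example numbers below are not from the original idea, just an illustration:

```python
# Sketch of scoring cutoff-limited forecasts with the Brier score: the mean
# squared error between predicted probabilities and 0/1 outcomes. Lower is
# better; always guessing 50% scores 0.25. The example data is made up.
def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)

# Each pair: (probability the cutoff-limited model assigned, whether the event
# actually occurred within the stated horizon after the cutoff year).
example = [
    (0.7, True),
    (0.2, False),
    (0.9, True),
]
print(brier_score(example))  # 0.0467
```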

Maybe someone has come up with this idea before, but if not, we can use it as a new benchmark for an AI's ability both to innovate and to forecast.
