Tyler Cowen's challenge to develop an 'actual mathematical model' for AI X-Risk

Comments: Steven Byrnes, faul_sname, green_leaf, aogara

If you go back 10 million years and ask for an “actual mathematical model” supporting a claim that descendants of chimpanzees may pose an existential threat to the descendants of ground sloths (for example)—a model that can then be “tested against data”—man, I would just have no idea how to do that.

Like, chimpanzees aren’t even living on the same continent as ground sloths! And a ground sloth could crush a chimpanzee in a fight anyway! It’s not like there’s some trendline where the chimpanzee-descendants are gradually killing more and more ground sloths, and we can extrapolate it out. Instead you have to start making up somewhat-speculative stories (“what if the chimp-descendants invent a thing called weapons!?”). And then it’s not really a “mathematical model” anymore, or at least I don’t think it’s the kind of mathematical model that Tyler Cowen is hoping for.

One mathematical model that seems like it would be *particularly* valuable here is a model of the shape of the curve relating resources invested to optimization power gained. The reason I think an explicit model would be valuable there is that a lot of the AI risk discussion centers on recursive self-improvement. For example, instrumental convergence, the orthogonality thesis, and pivotal acts are relevant mostly in contexts where we expect a single agent to become more powerful than everyone else combined. (I am aware that there are other types of risk associated with AI, like "better AI tools will allow for worse outcomes from malicious humans / accidents"; those are outside the scope of the particular model I'm discussing.)

To expand on what I mean by this, let's consider a couple of examples of recursive self-improvement.

For the first example, let's consider the game of Factorio. Let's specifically consider the "mine coal + iron ore + stone / smelt iron / make miners and smelters" loop. Each miner produces some raw materials, and those raw materials can be used to craft more miners. This feedback loop is extremely rapid, and once that cycle gets started the number of miners placed grows exponentially until all available ore patches are covered with miners.
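That loop can be sketched in a few lines of simulation. The numbers below are made up for illustration (they are not actual Factorio values); the point is only the shape of the dynamics:

```python
# Toy simulation of the Factorio miner loop: each tick, every miner
# pulls ore from the remaining patches, and accumulated ore is
# immediately reinvested in crafting more miners.
def simulate_miners(miners=1, patches=5_000, ore_per_miner=3,
                    miner_cost=10, ticks=30):
    ore = 0
    history = []
    for _ in range(ticks):
        mined = min(miners * ore_per_miner, patches)  # can't mine depleted patches
        patches -= mined
        ore += mined
        built = ore // miner_cost   # reinvest everything into new miners
        ore -= built * miner_cost
        miners += built
        history.append(miners)
    return history
```

With these (assumed) parameters the miner count compounds roughly exponentially for the first twenty or so ticks, then flatlines abruptly once the patches are exhausted: exponential growth up to a hard resource ceiling.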

For our second example, let's consider the case of an optimizing compiler like gcc. A compiler takes some code and turns it into an executable. An *optimizing* compiler does the same thing, but also looks for ways to output an executable that does the same thing more efficiently. Some of the optimization steps give better results in expectation the more resources you allocate to them, at the cost of (sometimes enormously) greater time and memory for the optimization step; as such, optimizing compilers like gcc have a number of flags that let you specify exactly how hard they should try.

Let's consider the following program:

```
#!/bin/bash
# Repeatedly rebuild gcc with itself, raising the inlining threshold
# a little on every pass.
INLINE_LIMIT=1
# <snip gcc source download / configure steps>
while true; do
    # Each pass is compiled by the gcc installed on the previous pass.
    make CC="gcc" CFLAGS="-O3 -finline-limit=$INLINE_LIMIT"
    make install
    INLINE_LIMIT=$((INLINE_LIMIT + 1))
done
```

This is also a thing which will recursively self-improve, in the technical sense of "the result of each iteration will, in expectation, be better than the result of the previous iteration, and the improvements it finds help it more efficiently find future improvements". However, it seems pretty obvious that this "recursive self-improver" will not do the kind of exponential takeoff we care about.
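The contrast can be made concrete with a toy calculation (the functional forms below are my own illustration, not anything measured). In the Factorio-style loop, each round's gain is proportional to current capability, so capability compounds without bound; in the gcc-style loop, round n's gain shrinks fast enough that the total improvement converges to a finite ceiling no matter how long it runs:

```python
# Two self-improvement loops with different gain curves (assumed forms).
def run_loop(gain, rounds=50, c0=1.0):
    c = c0
    for n in range(1, rounds + 1):
        c += gain(c, n)   # gain may depend on capability c and round n
    return c

compounding = run_loop(lambda c, n: 0.2 * c)     # gain proportional to c: explodes
diminishing = run_loop(lambda c, n: 1.0 / n**2)  # gain ~ 1/n^2: bounded forever
```

The first loop multiplies capability by 1.2 each round and reaches thousands of times its starting point in 50 rounds; the second converges toward a fixed ceiling (here 1 + pi^2/6) even if you let it run forever. Both are "recursive self-improvers"; only one takes off.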

The difference between these two cases comes down to the *shapes of the curves*. So one area of mathematical modeling that I think would be pretty valuable would be to:

- Figure out what shapes of curves lead to gaining orders of magnitude more capabilities in a short period of time, given constant hardware
- The same question, but given the ability to rent or buy more hardware
- The same question, but now allowing investment in improving chip fabs, with the same increase in investment required for each improvement as we have previously observed for chip fabs
- What do the empirical scaling laws for deep learning look like? Do they look like they come in under the curves from 1-3? What if we look at the change in the best scaling laws over time -- where does that line point?
- Check whether your model now says that we should have been eaten by a recursively self improving AI in 1982. If it says that, the model may require additional work.
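One minimal way to formalize the first question (this is my own toy model; the functional form and the parameter alpha are assumptions, not measurements): suppose capability C on fixed hardware grows as dC/dt = C^alpha, where alpha captures how much each capability gain helps find the next one. Then alpha > 1 gives finite-time blowup, alpha = 1 gives ordinary exponential growth, and alpha < 1 gives only polynomial growth:

```python
# Euler-integrate the toy ODE dC/dt = C**alpha from C(0) = 1 and report
# how long it takes capability to reach a target level, if it ever does.
def time_to_reach(alpha, target=1e9, dt=1e-3, t_max=200.0):
    c, t = 1.0, 0.0
    while t < t_max:
        c += (c ** alpha) * dt
        t += dt
        if c >= target:
            return t
    return None

# alpha = 1.2 hits 1e9 around t ~ 5 (finite-time blowup);
# alpha = 1.0 takes t ~ ln(1e9) ~ 21 (plain exponential);
# alpha = 0.5 never gets there within t_max (polynomial growth).
```

Questions 2 and 3 then amount to letting the right-hand side also scale with a growing hardware budget, and question 5 is the sanity check that whatever parameters you fit shouldn't place the blowup in 1982.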

I will throw in an additional $300 bounty for an explicit model of this specific question, subject to the usual caveats (payable to only one person, can't be in a sanctioned country, etc), because I personally would like to know.

**Edit:** Apparently Tyler Cowen didn't actually bounty this. My $300 bounty offer stands but you will not be getting additional money from Tyler it looks like.

If we can model the spread of a virus, why can't we model a superintelligence? A brilliant question indeed.

I built a preliminary model here: https://colab.research.google.com/drive/108YuOmrf18nQTOQksV30vch6HNPivvX3?authuser=2

It’s definitely too simple to treat as strong evidence, but it shows some interesting dynamics. For example, levels of alignment rise at first, then fall rapidly once AI deception skills exceed human oversight capacity. I sent it to Tyler and he agreed: cool, but not actual evidence.

If anyone wants to work on improving this, feel free to reach out!

On episode #893 of Russ Roberts' EconTalk podcast, guest Tyler Cowen challenges Eliezer Yudkowsky and the LessWrong/EA alignment communities to develop a mathematical model for AI X-Risk.

Will Tyler Cowen agree that an 'actual mathematical model' for AI X-Risk has been developed by October 15, 2023?
https://manifold.markets/JoeBrenton/will-tyler-cowen-agree-that-an-actu?r=Sm9lQnJlbnRvbg

(This market resolves to "YES" if Tyler Cowen publicly acknowledges, by October 15 2023, that an actual mathematical model of AI X-Risk has been developed.)

Two excerpts from the conversation:

https://youtube.com/clip/Ugkxtf8ZD3FSvs8TAM2lhqlWvRh7xo7bISkp

https://youtube.com/clip/Ugkx4msoNRn5ryBWhrIZS-oQml8NpStT_FEU

Related markets:

- Will there be a funding commitment of at least $1 billion in 2023 to a program for mitigating AI risk? https://manifold.markets/JoeBrenton/will-there-be-a-funding-commitment?r=Sm9lQnJlbnRvbg
- Will the US government launch an effort in 2023 to augment human intelligence biologically in response to AI risk? https://manifold.markets/JoeBrenton/will-the-us-government-launch-an-ef?r=Sm9lQnJlbnRvbg
- Will the general public in the United States become deeply concerned by LLM-facilitated scams by Aug 2 2023? https://manifold.markets/JoeBrenton/will-the-general-public-in-the-unit?r=Sm9lQnJlbnRvbg