Conceding a short timelines bet early

16th Mar 2023


Last year I bet some people about short AI timelines. While I don't think I've lost the bet yet, I think it's clear at this point that I will lose with high probability. I've outlined the reasons why I think that in a retrospective here. Even if I end up winning, I think it will likely be the result of a technicality, and that wouldn't be very interesting.

Because I'd prefer to settle this matter without delay, I have decided to concede the bet now. Note, however, that I am not asking Tamay to do the same. I have messaged the relevant parties and asked them to send me details on how to pay them.

I congratulate Nathan Helm-Burger and Tomás B. for taking the other side of the bet.

Tags: AI Timelines, Betting, AI

Mentioned in

A concrete bet offer to those with short AGI timelines

AI #4: Introducing GPT-4

EA & LW Forum Weekly Summary (13th - 19th March 2023)

17 comments

Tomás B. (3y)

My emotional state right now: https://twitter.com/emojimashupbot/status/1409934745895583750?s=46

Nathan Helm-Burger (3y)

I will accept the early resolution, but I'd like to reserve the option to reverse the decision and the payment should the world turn out unexpectedly in our favor.

Also, I'd like to state that I commit to using the money to buy more equipment for my AI safety research. [Edit: Matthew paid up!] [Edit 2: So did Tamay!]

Liron (3y)

Bravo.

Which 2+ outcomes from the list do you think are most likely to lead to your loss?

Matthew Barnett (3y)

I suspect the MMLU and MATH milestones are the easiest to achieve. I expect that will probably happen once a GPT-4-level model is specialized to perform well in mathematics, as Minerva was.

hold_my_fish (3y)

I'm curious about this too. The retrospective covers weaknesses in each milestone, but a collection of weak milestones doesn't necessarily aggregate to a guaranteed loss, since performance ought to be correlated (due to an underlying general factor of AI progress).

[anonymous] (3y)

Hmm? The $10 billion funding increase to OpenAI and the arms race with Google pretty much guarantee that the 10^30 / $1 billion training-machine condition will be satisfied. So we can mark that one as "almost certainly" satisfied by EOY 2023. The only way it isn't is a shortage of GPUs/TPUs.

GPT-4 likely satisfies the MMLU condition. So that's two "almost certain" conditions met, and even if by some fluke they aren't met by 2026, there are still several other ways Matt can lose the bet.

Matthew Barnett (3y)

I think you're overconfident here. I'm quite skeptical that GPT-4 already got above 80% on every single task in the MMLU since there are 57 tasks and it got 86.4% on average. I'm also skeptical that OpenAI will very soon spend >$1 billion to train a single model, but I definitely don't think that's implausible. "Almost certain" for either of those seems wrong.
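The point about averages can be checked with a little arithmetic. The sketch below assumes the 86.4% figure is an unweighted mean over the 57 tasks (MMLU averages are sometimes computed per question instead), and the per-task scores in it are hypothetical, invented only to show that such a mean is compatible with several tasks falling well below 80%.

```python
from statistics import mean

NUM_TASKS = 57
REPORTED_MEAN = 86.4  # average MMLU accuracy quoted above, in percent


def worst_case_min(num_tasks: float, avg: float, cap: float = 100.0) -> float:
    """Lowest score one task could have if every other task scored `cap`.

    Clamped at 0, since a task score cannot be negative.
    """
    return max(0.0, num_tasks * avg - (num_tasks - 1) * cap)


# Hypothetical per-task scores (NOT real GPT-4 results): 48 tasks at 90%
# and 9 tasks at 67.2%, chosen so the unweighted mean comes out to 86.4%.
hypothetical = [90.0] * 48 + [67.2] * 9

print(worst_case_min(NUM_TASKS, REPORTED_MEAN))    # the mean alone gives no floor
print(round(mean(hypothetical), 1), min(hypothetical))
print(all(score > 80 for score in hypothetical))   # not every task clears 80%
```

In other words, an 86.4% average by itself puts no lower bound on the weakest task; whether every task clears 80% has to be read off the per-task results.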

[anonymous] (3y)

There's GPT-5, though, or GPT-4.math.finetune. You saw the Minerva results. You know there will be significant gains from a fine-tune, likely enough to satisfy 2-3 of your conditions.

As I said it's ridiculous to think someone either in the Google or OAI camp won't have more than 1 billion USD in training hardware, in service for a single model (training many instances in parallel) by openAI.

Think about what that means. One A100 costs about $25k. The cluster Meta uses has 2,048 of them. So that's about $50 million.
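Putting the figures from the comment above next to the $1 billion threshold helps show the scale involved. This is a rough sketch: the ~$25k unit price and the 2,048-GPU cluster are the commenter's approximate numbers, not verified figures, and only GPU purchase cost is counted (networking, power, and staff are ignored).

```python
# Assumed figures, taken from the comment above (approximate, unverified).
A100_PRICE_USD = 25_000
META_CLUSTER_GPUS = 2_048
BET_THRESHOLD_USD = 1_000_000_000  # the $1 billion single-model condition


def cluster_cost_usd(num_gpus: int, price_per_gpu: int = A100_PRICE_USD) -> int:
    """GPU purchase cost only; ignores networking, power, and staff."""
    return num_gpus * price_per_gpu


def gpus_for_budget(budget_usd: int, price_per_gpu: int = A100_PRICE_USD) -> int:
    """How many GPUs a budget buys at the assumed unit price (rounded down)."""
    return budget_usd // price_per_gpu


meta_cost = cluster_cost_usd(META_CLUSTER_GPUS)
print(f"Meta-sized cluster: ${meta_cost:,}")                    # $51,200,000
print(f"A100s for $1B: {gpus_for_budget(BET_THRESHOLD_USD):,}")  # 40,000
print(f"Meta-sized clusters per $1B: {BET_THRESHOLD_USD / meta_cost:.1f}")  # 19.5
```

On these assumptions, a cluster like Meta's is roughly one-twentieth of the threshold, so meeting the condition would take on the order of 40,000 A100s dedicated to one model.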

Why would you not go for the most powerful model possible as soon as you can? Either the world's largest tech giant is about to lose it all, or they are going to put the proportional effort in.

Matthew Barnett (3y)

"As I said it's ridiculous to think someone either in the Google or OAI camp won't have more than 1 billion USD in training hardware, in service for a single model (training many instances in parallel) by openAI."

I think you're reading this condition incorrectly. The $1 billion would need to be spent for a single model. If OpenAI buys a $2 billion supercomputer but they train 10 models with it, that won't necessarily qualify.

[anonymous] (3y)

Then why did you add the term? I assume you meant that the entire supercomputer is working on instances of the same model at once. Obviously training is massively parallel.

Once the model is done, the supercomputer will obviously be used for other things.

Evan R. Murphy (3y)

"I congratulate Nathan Helm-Burger and Tomás B. for taking the other side of the bet."

Just for the record, I also took your bet. ;)

Matthew Barnett (3y)

Congratulations. However, unless I'm mistaken, you simply said you'd be open to taking the bet. We didn't actually take it with you, did we?

Evan R. Murphy (3y)

Yeah, I guess I was a little unclear on whether your post constituted a bet offer that people could accept simply by replying, as I did, or whether you would follow up individually to finalize the bet agreements. I see you did that with Nathan and Tomás, so it makes sense that you didn't view our bet as on. It's OK; I was more interested in the epistemic/forecasting points than the $1,000 anyway. ;)

I commend you for following up and for your great retrospective analysis of the benchmark criteria. Even though I offered to take your bet, I didn't realize just how problematic the benchmark criteria were for your side of the bet.

Most importantly, it's disquieting and bad news that long timelines are looking increasingly implausible. I would have felt less worried about a world where you were right about that.

Ahmdal Oberth (3y)

Darn. Who should I defer to now if I want to believe longer timelines?

Ben Pace (3y)

lol

Lone Pine (3y)

Wild that this bet lasted less than a year.

If you're interested in re-betting, maybe you could make the threshold 3 or 4 items.

Review Bot (1y)

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
