Several unimpressive tasks, with my associated P(GPT-4 can't do it):
s from start to finish). I'm happy to operationalize and bet on any of these, taking the "GPT-4 can't do it" side.
I'd be interested to hear thoughts on this argument for optimism that I've never seen anybody address: if we create a superintelligent AI (which will, by instrumental convergence, want to take over the world), it might rush, for fear of competition. If it waits a month, some other superintelligent AI might get developed and take over / destroy the world; so, unless there's a quick safe way for the AI to determine that it's not in a race, it might need to shoot from the hip, which might give its plans a significant chance of failure / getting caught?
Counterarguments I can generate:
Log of my attempts so far:
Attempt #1: note that, for any probability p, you can compute "number of predictions you made with probability less than p that came true". If you're perfectly calibrated, then this should be a random variable with:
mean = sum(q for q in prediction_probs if q<p)
variance = sum(q*(1-q) for q in prediction_probs if q<p)
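A minimal sketch of those two formulas next to the observed count, in case it helps to see them together. The names `calibration_stats` and `outcomes` are mine, not from the original; `outcomes[i]` is assumed to be True when prediction i came true.

```python
from math import sqrt

def calibration_stats(prediction_probs, outcomes, p):
    """Observed count, expected mean, and variance for predictions with q < p."""
    below = [(q, hit) for q, hit in zip(prediction_probs, outcomes) if q < p]
    observed = sum(1 for _, hit in below if hit)    # predictions that came true
    mean = sum(q for q, _ in below)                 # expected count if calibrated
    variance = sum(q * (1 - q) for q, _ in below)   # sum of Bernoulli variances
    return observed, mean, variance

# Toy data, made up for illustration only.
probs = [0.1, 0.3, 0.5, 0.9]
outcomes = [False, True, False, True]
observed, mean, variance = calibration_stats(probs, outcomes, p=0.6)
z_score = (observed - mean) / sqrt(variance)  # how many stdevs from expectation
```

The variance formula relies on the predictions being independent, which is an assumption the formulas above already make implicitly.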
Let's see what this looks like if we plot it as a function of p. Let's consider three people:
Let's have each person make 1000 predictions with probabilities uniformly distributed in [0,1]; and then sample outcomes for each set of predictions and plot out their num-true-predictions-below functions. (The gray lines show the mean and first 3 stdev intervals for a perfectly calibrated predictor.)
Hrrm. The y-axis is too big to see the variation. Let's subtract off the mean.
And to get a feeling for how else this plot could have looked, let's run 100 more simulations for each of the three people:
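For anyone who wants to reproduce this kind of simulation, here is a rough sketch of how the mean-subtracted curves could be generated. It is not the code behind the plots: `deviation_curve` is a name I made up, and the "overconfident" predictor is a hypothetical stand-in for a biased forecaster, whose events happen with a frequency pulled halfway toward 50%.

```python
import random
from math import sqrt

def deviation_curve(prediction_probs, outcomes, thresholds):
    """For each threshold p, return (observed - mean, stdev) over predictions with q < p."""
    curve = []
    for p in thresholds:
        below = [(q, hit) for q, hit in zip(prediction_probs, outcomes) if q < p]
        observed = sum(1 for _, hit in below if hit)
        mean = sum(q for q, _ in below)
        stdev = sqrt(sum(q * (1 - q) for q, _ in below))
        curve.append((observed - mean, stdev))
    return curve

rng = random.Random(0)  # fixed seed so the run is repeatable
probs = [rng.random() for _ in range(1000)]

# Calibrated predictor: each event happens with exactly the stated probability.
calibrated = [rng.random() < q for q in probs]
# Hypothetical overconfident predictor: true frequency is 0.25 + 0.5*q.
overconfident = [rng.random() < 0.5 + 0.5 * (q - 0.5) for q in probs]

thresholds = [i / 20 for i in range(1, 21)]
calibrated_curve = deviation_curve(probs, calibrated, thresholds)
overconfident_curve = deviation_curve(probs, overconfident, thresholds)
```

Plotting the first element of each pair against `thresholds`, with bands at multiples of the second element, should give something like the mean-subtracted picture; rerunning with different seeds gives the spread of possible curves.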
Okay, this is pretty good!
But it's not perfect: everything's too squished together on the left to see what's happening -- a predictor could be really screwing up their very-low-probability predictions and this graph would hide it. Possibly related to that squishing, I feel like the plot should be right-left symmetric, to reflect the symmetries of the predictors' biases. But it's not.
Attempt #2: the same thing, except instead of plotting
sum((1 if came_true else 0) for q in prediction_probs if q<p)
we plot
sum(-log(prob you assigned to the correct outcome) for q in prediction_probs if q<p)
i.e. we measure the total "surprisal" for all your predictions with probability under p. (I'm very fond of surprisal; it has some very appealing information-theory-esque properties.)
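Spelled out as runnable code, the surprisal sum might look like the sketch below. The function names and the `outcomes` list are my own additions; I use natural log (swap in `log2` for bits) and assume every probability is strictly between 0 and 1, since a prediction of exactly 0 or 1 can have infinite surprisal.

```python
from math import log

def total_surprisal(prediction_probs, outcomes, p):
    """Sum of -log(probability assigned to what actually happened), over predictions with q < p."""
    total = 0.0
    for q, hit in zip(prediction_probs, outcomes):
        if q < p:
            # If the event happened you assigned it q; otherwise you assigned 1 - q.
            total += -log(q if hit else 1 - q)
    return total

def expected_surprisal(prediction_probs, p):
    """Expected total for a perfectly calibrated predictor: a sum of binary entropies."""
    return sum(-q * log(q) - (1 - q) * log(1 - q)
               for q in prediction_probs if q < p)

# Toy data, made up for illustration only.
toy_total = total_surprisal([0.5, 0.25], [True, False], p=0.6)
toy_expected = expected_surprisal([0.5], p=0.6)
```

Subtracting `expected_surprisal` from `total_surprisal` plays the same role as subtracting off the mean in Attempt #1.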
On the bright side, this plot has less overlap between the three predictors' typical sets of lines. And the red curves look... more symmetrical, kinda, like an odd function, if you squint. Same for the blue curves.
On the dark side, everything is still too squished together on the left. (I think this is a problem inherent to any "sum(... for q in prediction_probs if q<p)" function. I tried normalizing everything in terms of stdevs, but it ruined the symmetry and made everything kinda crazy on the left-hand side.)
Plot of global infant mortality rate versus time.
I donated for some nonzero X:
My attempted condensation, in case it helps future generations (or in case somebody wants to set me straight): here's my understanding of the "pay $0.50 to win $1.10 if you correctly guess the next flip of a coin that's weighted either 40% or 60% Heads" game:
You, a traditional Bayesian, say, "My priors are 50/50 on which bias the coin has. So, I'm playing this single-player 'game':
"I see that my highest-EV option is to play, betting on either H or T, doesn't matter."
Perry says, "I'm playing this zero-sum multi-player game, where my 'Knightian uncertainty' represents a layer in the decision tree where the Devil makes a decision:
"By minimax, I see that my highest-EV option is to not play."
...and the difference between Perry and Caul seems purely philosophical: I think they always make the same decisions.
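To make the two readings of the coin game concrete, here is a small sketch of the arithmetic as I understand it. Two assumptions of mine: the $1.10 is the gross payout on a correct guess (so a win nets $0.60), and the Devil picks the coin's bias after seeing which side you bet on.

```python
STAKE = 0.50        # cost to play
PAYOUT = 1.10       # assumed gross payout on a correct guess
BIASES = (0.4, 0.6) # the two possible values of P(Heads)

def ev_of_bet(p_heads, guess):
    """Expected profit from betting on `guess` when the coin lands Heads with p_heads."""
    p_win = p_heads if guess == "H" else 1 - p_heads
    return p_win * PAYOUT - STAKE

def bayesian_ev(guess, prior=(0.5, 0.5)):
    """Average over the two biases, weighted by the 50/50 prior."""
    return sum(w * ev_of_bet(b, guess) for w, b in zip(prior, BIASES))

def minimax_ev(guess):
    """Worst case over the two biases: the Devil picks whichever hurts the bettor most."""
    return min(ev_of_bet(b, guess) for b in BIASES)

bayes_best = max(bayesian_ev(g) for g in "HT")  # about +0.05, so playing beats 0
devil_best = max(minimax_ev(g) for g in "HT")   # about -0.06, so not playing (0) wins
```

Under these assumptions the Bayesian's numbers don't depend on whether they bet H or T, and the minimax player's best bet is still worse than walking away, which matches the two conclusions quoted above.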
I regret to report that I goofed the scheduling, and will be out of town, but @Orborde will be there to run the show! Sorry to miss you. Next time!
you say that IVF costs $12k and surrogacy costs $100k, but also that surrogacy is only $20k more than IVF? That doesn't add up to me.
Ah, yes, this threw me too! I think @weft is right that (a) I wasn't accounting for multiple cycles of IVF being necessary, and (b) medical expenses etc. are part of the $100k surrogacy figure.
sperm/egg donation are usually you getting paid to give those things
Thanks for revealing that I wrote this ambiguously! The figures in the book are for receiving donated eggs/sperm. (Get inseminated for $355, get an egg implanted in you for $10k.)
Ooh, you raise a good point: Caplan gives $12k as the per-cycle cost of IVF, which I failed to factor in. I will edit that in. Thank you for your data!
And you're right that medical expenses are part of the gap: the book says the "$100k" figure for surrogacy includes medical expenses (which you'd have to pay anyway) and "miscellaneous" (which... ???).
So, if we stick with the book's "$12k per cycle" figure, times an average of maybe 2 cycles, that gives $24k, which still leaves a $56k gap to be explained. Conceivably, medical expenses and "miscellaneous" could fill that gap? I'm sure you know better than I!
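For the record, the arithmetic behind that $56k, using the figures from this thread. The 2-cycle average is my rough guess, not a number from the book, and the variable names are just for illustration.

```python
surrogacy_total = 100_000  # the book's "$100k" surrogacy figure
ivf_per_cycle = 12_000     # the book's per-cycle IVF cost
assumed_cycles = 2         # rough guess at the average number of cycles
claimed_premium = 20_000   # "surrogacy is only $20k more than IVF"

ivf_total = ivf_per_cycle * assumed_cycles
unexplained_gap = surrogacy_total - ivf_total - claimed_premium
print(f"IVF total: ${ivf_total:,}; unexplained gap: ${unexplained_gap:,}")
```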
I am thinking of mazes as complicated as the top one here! And few-shot is perfectly okay.
(I'd be flabbergasted if it could solve an ascii-art maze "in one step" (i.e. I present the maze in a prompt, and GPT-4 just generates a stream of tokens that shows the path through the maze). I'd accept a program that iteratively runs GPT-4 on several prompts until it considers the maze "solved," as long as it was clear that the maze-solving logic lived in GPT-4 and not the wrapper program.)
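To illustrate what I mean by the logic living in the model rather than the wrapper, here is a sketch of an acceptably "dumb" wrapper. Everything in it is hypothetical: `ask_model` stands in for whatever function sends a prompt to GPT-4 and returns its reply (no real API is used here), and the prompt wording and the "SOLVED" convention are made up.

```python
from typing import Callable, List, Optional

def solve_with_model(maze: str,
                     ask_model: Callable[[str], str],
                     max_steps: int = 50) -> Optional[List[str]]:
    """Relay the maze and the moves so far to the model until it says SOLVED.

    The wrapper never inspects the maze: it only passes text back and forth.
    """
    moves: List[str] = []
    for _ in range(max_steps):
        history = " ".join(moves) or "(none)"
        prompt = (
            f"Maze:\n{maze}\n\n"
            f"Moves so far: {history}\n"
            "Reply with the next move (U/D/L/R) or SOLVED."
        )
        reply = ask_model(prompt).strip()
        if reply == "SOLVED":
            return moves
        moves.append(reply)
    return None  # the model never declared the maze solved

# Stand-in "model" that replays a fixed script, just to exercise the loop.
scripted = iter(["R", "D", "SOLVED"])
demo_moves = solve_with_model("S.\n.E", lambda prompt: next(scripted))
```

A wrapper that ran its own search, or checked moves against the walls and fed corrections back, would be putting maze-solving logic outside the model, which is the thing I wouldn't count.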