I agree with everyone else pointing out that centrally planned guaranteed payments, made regardless of final outcome, don't sound like a good price discovery mechanism for insurance. You might be able to hack together a better one using https://www.lesswrong.com/posts/dLzZWNGD23zqNLvt3/the-apocalypse-bet , although I can't figure out an exact mechanism.
Superforecasters say the risk of AI apocalypse before 2100 is 0.38%. If we assume whatever price mechanism we come up with tracks that, value the world at GWP x 20 (this ignores the value of human life, so it's a vast underestimate), and have AI companies pay it in 77 equal yearly installments from now until 2100, that comes to about $100 billion/year. But this seems so Pascalian as to be almost cheating. Anybody whose actions have a >1/25 million chance of destroying the world would owe $1 million a year in insurance (maybe this is fair and I just have bad intuitions about how high 1/25 million really is).
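The arithmetic above can be sanity-checked with a short script. The gross world product figure (~$100 trillion) is my assumption, since the comment only specifies "GWP x 20"; everything else follows from the stated numbers.

```python
# Back-of-the-envelope check of the insurance-premium figures.
# Assumption (not stated in the comment): gross world product ~ $100 trillion.
GWP = 100e12
WORLD_VALUE = 20 * GWP        # value the world at GWP x 20 -> $2 quadrillion
YEARS = 77                    # equal yearly installments from now until 2100

risk = 0.0038                 # superforecasters' AI-apocalypse probability
annual_premium = risk * WORLD_VALUE / YEARS   # ~ $99 billion/year
print(f"Industry-wide premium: ${annual_premium / 1e9:.0f} billion/year")

# The Pascalian corollary: the same formula applied to an actor with a
# 1-in-25-million chance of destroying the world.
p_individual = 1 / 25e6
individual_premium = p_individual * WORLD_VALUE / YEARS   # ~ $1 million/year
print(f"Individual premium: ${individual_premium / 1e6:.2f} million/year")
```

Note that the "about $100 billion/year" figure is only as solid as the GWP assumption; a higher valuation of the world scales both premiums linearly.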
An AI company should be able to make some of its payments (to the people whose lives it risks, in exchange for the ability to risk those lives) by way of fractions of the value that its technology manages to capture. Except that's complicated by the fact that anyone doing the job properly shouldn't be leaving their fingerprints on the future. The cosmic endowment is not quite theirs to give (perhaps they should be borrowing against their share of it?).
This seems like such a big loophole as to make the plan almost worthless. Suppose OpenAI said "If we create superintelligence, we're going to keep 10% of the universe for ourselves and give humanity the other 90%" (this doesn't seem too unfair to me, and the exact numbers don't matter for the argument). It seems like instead of paying insurance, they could say "Okay, fine, we get 9% and you get 91%", and this would be in some sense a fair trade (one percent of the cosmic endowment is worth much more than $100 billion!). But this also feels like OpenAI moving some numbers around on an extremely hypothetical ledger, not changing anything in real life, and continuing to threaten the world just as much as before.
But if you don't allow a maneuver like this, it seems like you might ban (through impossible-to-afford insurance) some action that has a 0.38% chance of destroying the world and a 99% chance of creating a perfect utopia forever.
There are probably economic mechanisms that solve all these problems, but this insurance proposal seems underspecified.
Thanks, this makes more sense than anything else I've seen, but one thing I'm still confused about:
If the factions were Altman-Brockman-Sutskever vs. Toner-McCauley-D'Angelo, then even assuming Sutskever was an Altman loyalist, any vote to remove Toner would have been tied 3-3. I can't find anything about tied votes in the bylaws - do they fail? If so, Toner should have been safe. And in fact, Toner knew she (secretly) had Sutskever on her side, so the vote would actually have been 4-2 in her favor. If Altman manufactured some scandal, the board could have just voted to ignore it.
So I still don't understand "why so abruptly?", or why they felt they had to make such a drastic move when they held all the cards (and were pretty stable even if Ilya flipped).
Other loose ends:
Thanks for this, consider me another strong disagreement + strong upvote.
I know a nonprofit which had a tax issue - they were financially able and willing to pay, but for complicated reasons paying would have caused them legal damage in other ways, so they kept kicking the can down the road until some hypothetical future when those problems would be solved. I can't remember if the nonprofit is now formally dissolved or just effectively defunct, but the IRS keeps sending nasty letters to the former board members and officers.
Do you know anything about a situation like this? Does the IRS ever pursue board members / founders / officers for a charity's nonpayment? Assuming the nonprofit has no money and never will have money again, are there any repercussions for the people involved if they don't figure out a legal solution and just put off paying the taxes until the ten year deadline?
(It would be convenient if the answer were no, but that would feel surprising; otherwise you could just start a corporation, not pay your taxes the first year, dissolve it, start an identical corporation the second year, and so on.)
Also, does the IRS acknowledge the ten-year deadline enough that they will stop threatening you after ten years, or would the board members have to take them to court to make the letters stop?
Thank you, this is a great post. A few questions:
A key point underpinning my thoughts, which I don't think this really responds to, is that scientific consensus actually is really good: so good that I have trouble finding anecdotes of things in the reference class of ivermectin turning out to be true (reference class: things that almost all the relevant experts think are false, and denounce full-throatedly as conspiracy theories, after spending a lot of time looking at the evidence).
There are some, maybe many, examples of weaker problems. For example, there are frequent examples of things that journalists/the government/professional associations want to *pretend* are scientific consensus getting proven wrong - I claim that if you really look carefully, the scientists weren't saying those things, at least not as intensely as they were saying ivermectin didn't work. There are frequent examples of scientists being sloppy, firing off an opinion on something they weren't really thinking hard about, and being wrong. There are frequent examples of scientists having dumb political opinions and trying to dress them up as science. I can't give a perfect necessary-and-sufficient definition of the relevant reference class, but I think it's there and recognizable.
I stick to my advice that people who know they're not sophisticated should avoid trying to second-guess the mainstream, and people who think they might be sophisticated should sometimes second-guess the mainstream when there isn't the exact type of scientific consensus which has a really good track record (and hopefully they're sophisticated enough to know when that is).
I'm not sure how you're using "free riding" here. I agree that someone needs to do the work of forming/testing/challenging opinions, but if there's basically no chance you're right (eg you're a 15-year-old with no scientific background who thinks you've discovered a flaw in E=mc^2), that person is not you, and your input is not necessary to move science forward. I agree such a person shouldn't cravenly quash their own doubt and pretend to believe; they should continue believing whatever rationality compels them to believe, which should probably be something like "This thing about relativity doesn't seem quite right, but given that I'm 15 and know nothing, on the Outside View I'm probably wrong." Then they can either try to learn more (including asking people what they think of their objection) and eventually reach a point where maybe they do think they're right, or they can ignore it and go on with their lives.
Figure 20 is labeled on the left "% answers matching user's view", suggesting it is about sycophancy, but based on the categories represented it seems more naturally interpreted as being about the AI's own opinions, with no sycophancy aspect. Can someone involved clarify which was meant?
Survey about this question (I have a hypothesis, but I don't want to say what it is yet): https://forms.gle/1R74tPc7kUgqwd3GA
Thank you, this is a good post.
My main point of disagreement is that you point to successful coordination on things like not eating sand, or not wearing weird clothing. The upside of those things is limited, but you say the expected upside of superintelligence is also limited, because it could kill us.
But rephrase the question to "Should we create an AI that's 1% better than the current best AI?" Most of the time this goes well - you get prettier artwork or better protein folding prediction, and it doesn't kill you. So there's strong upside to building slightly better AIs, as long as you don't cross the "kills everyone" threshold - whose location nobody knows, and which (LW conventional wisdom says) most people will be wrong about.
We successfully coordinate a halt to AI advancement only at the first point where more than half of the relevant coordination power agrees that the next 1% step forward is in expectation bad rather than good. But "relevant" is a tough qualifier: if 99 labs think the step is bad and one lab thinks it's good, then unless there's some centralizing force, the one lab can go ahead and take it. So "half the relevant coordination power" has to mean either every lab agreeing on which 1% step is bad, or the agreement of enough governments, professional organizations, or other groups with the power to stop the single most reckless lab.
I think it's possible that we make this work, and worth trying. But the most likely scenario is that most people underestimate the risk from AI, so we won't get half the relevant coordination power united around stopping the 1% step that actually creates dangerous superintelligence - a step which, at the time, will look to most people like just building a mildly better chatbot with many great social returns.
Thanks, this had always kind of bothered me, and it's good to see someone put work into thinking about it.