Thank you, this is a great post. A few questions:
A key point underpinning my thoughts, which I don't think this really responds to, is that scientific consensus actually is really good, so good that I have trouble finding anecdotes of things in the reference class of ivermectin turning out to be true (reference class: things that almost all the relevant experts think are false and denounce full-throatedly as a conspiracy theory after spending a lot of time looking at the evidence).
There are some, maybe many, examples of weaker problems. For example, there are frequent examples of things that journalists/the government/professional associations want to *pretend* are scientific consensus getting proven wrong - I claim that if you really look carefully, the scientists weren't really saying those things, at least not as intensely as they were saying ivermectin didn't work. There are frequent examples of scientists being sloppy and firing off an opinion on something they weren't really thinking hard about and being wrong. There are frequent examples of scientists having dumb political opinions and trying to dress them up as science. I can't give a perfect necessary-and-sufficient definition of the relevant reference class. But I think it's there and recognizable.
I stick to my advice that people who know they're not sophisticated should avoid trying to second-guess the mainstream, and people who think they might be sophisticated should sometimes second-guess the mainstream when there isn't the exact type of scientific consensus which has a really good track record (and hopefully they're sophisticated enough to know when that is).
I'm not sure how you're using "free riding" here. I agree that someone needs to do the work of forming/testing/challenging opinions, but I think if there's basically no chance you're right (e.g. you're a 15 year old with no scientific background who thinks they've discovered a flaw in E=mc^2), that person is not you, and your input is not necessary to move science forward. I agree that person shouldn't cravenly quash their own doubt and pretend to believe; they should continue believing whatever rationality compels them to believe, which should probably be something like "This thing about relativity doesn't seem quite right, but given that I'm 15 and know nothing, on the Outside View I'm probably wrong." Then they can either try to learn more (including asking people what they think of their objection) and eventually reach a point where maybe they do think they're right, or they can ignore it and go on with their lives.
Figure 20 is labeled on the left "% answers matching user's view", suggesting it is about sycophancy, but based on the categories represented it seems more natural to read it as being about the AI's own opinions, without a sycophancy aspect. Can someone involved clarify which was meant?
Survey about this question (I have a hypothesis, but I don't want to say what it is yet): https://forms.gle/1R74tPc7kUgqwd3GA
Thank you, this is a good post.
My main point of disagreement is that you point to successful coordination in things like not eating sand, or not wearing weird clothing. The upside of these things is limited, but you say the upside of superintelligence is also limited because it could kill us.
But rephrase the question to "Should we create an AI that's 1% better than the current best AI?" Most of the time this goes well - you get prettier artwork or better protein folding prediction, and it doesn't kill you. So there's strong upside to building slightly better AIs, as long as you don't cross the "kills everyone" level. Which nobody knows the location of. And which (LW conventional wisdom says) most people will be wrong about.
We successfully coordinate a halt to AI advancement at the first point where more than half of the relevant coordination power agrees that the next 1% step forward is in expectation bad rather than good. But "relevant" is a tough qualifier, because if 99 labs think it's bad, and one lab thinks it's good, then unless there's some centralizing force, the one lab can go ahead and take the step. So "half the relevant coordination power" has to include either every lab agreeing on which 1% step is bad, or the agreement of lots of governments, professional organizations, or other groups that have the power to stop the single most reckless lab.
I think it's possible that we make this work, and worth trying, but that the most likely scenario is that most people underestimate the risk from AI, and so we don't get half the relevant coordination power united around stopping the 1% step that actually creates dangerous superintelligence - which at the time will look to most people like just building a mildly better chatbot with many great social returns.
Thanks, this had always kind of bothered me, and it's good to see someone put work into thinking about it.
Thanks for posting this, it was really interesting. Some very dumb questions from someone who doesn't understand ML at all:
1. All of the loss numbers in this post "feel" very close together, and close to the minimum loss of 1.69. Does loss only make sense on a very small scale (like from 1.69 to 2.2), or is this telling us that language models are very close to optimal and there are only minimal remaining possible gains? What was the loss of GPT-1?
2. Humans "feel" better than even SOTA language models, but need less training data than those models, even though right now the only way to improve the models is through more training data. What am I supposed to conclude from this? Are humans running on such a different paradigm that none of this matters? Or is it just that humans are better at common-sense language tasks, but worse at token-prediction language tasks, in some way where the tails come apart once language models get good enough?
3. Does this disprove claims that "scale is all you need" for AI, since we've already maxed out scale, or are those claims talking about something different?
For the first part of the experiment, mostly nuts, bananas, olives, and eggs. Later I added vegan sausages + condiments.
Adding my anecdote to everyone else's: after learning about the palatability hypothesis, I resolved to eat only non-tasty food for a while, and lost 30 pounds over about four months (200 -> 170). I've since relaxed my diet somewhat to include a little tasty food, and now (8 months after the start) have maintained that loss (even going down a little further).