'Empiricism!' as Anti-Epistemology

It is not the case that an observation of things happening in the past automatically translates into a high probability of them continuing to happen. Solomonoff Induction actually operates over possible programs that generate our observation set (and, by extension, the observable universe), and it may or may not be the case that the simplest universe is such that any given trend persists into the future. There are also no easy rules that tell you when this happens; you just have to do the hard work of comparing world models.

I'm not sure the post says sufficiently many other things to justify its length.

[-]Drake Morrison2y6855

If you already have the concept, you only need a pointer. If you don't have the concept, you need the whole construction. ^[1]

^[1] Related: Sazen and Wisdom Cannot Be Unzipped

[-]kave2y*4351

I sometimes like things being said in a long way. Mostly that's just because it helps me stew on the ideas and look at them from different angles. But also, specifically, I liked the engagement with a bunch of epistemological intuitions and figuring out what can be recovered from them. I like in particular connecting the "trend continues" trend to the redoubtable "electron will weigh the same tomorrow" intuition.

(I realise you didn't claim there was nothing else in the dialogue, just not enough to justify the length)

[-]Olli Järviniemi2y3612

I strongly empathize with "I sometimes like things being said in a long way.", and am in general doubtful of comments like "I think this post can be summarized as [one paragraph]".

(The extreme caricature of this is "isn't your post just [one sentence description that strips off all nuance and rounds the post to the closest nearby cliche, completely missing the point, perhaps also mocking the author about complicating such a simple matter]", which I have encountered sometimes.)

Some of the most valuable blog posts I have read have been exactly of the form "write a long essay about a common-wisdom-ish thing, but really drill down on the details and look at the thing from multiple perspectives".

Some years back I read Scott Alexander's I Can Tolerate Anything Except The Outgroup. For context, I'm not from the US. I was very excited about the post and upon reading it hastily tried to explain it to my friends. I said something like "your outgroups are not who you think they are, in the US partisan biases are stronger than racial biases". The response I got?

"Yeah I mean the US partisan biases are really extreme.", in a tone implying that surely nothing like that affects us in [country I li... (read more)

[-]cubefox2y1415

I'll add that sometimes, there is a big difference between verbally agreeing with a short summary, even if it is accurate, and really understanding and appreciating it and its implications. That often requires long explanations with many examples and looking at the same issue from various angles. The two Scott Alexander posts you mentioned are a good example.

4Said Achmiz2y

If the post describes a method for analyzing a situation, and that described method is not in fact the correct method for analyzing that situation (and is actually much worse than the correct method), then this is a problem with the post. (Also, your description of my approach as “appealing very concretely to the object level”, and your corresponding dismissal of that approach, is very ironic! The post, in essence, argues precisely for appealing concretely to the object level; but then if we actually do that, as I demonstrated, we render the post moot.)

[-]Shankar Sivarajan2y220

For even more brevity with no loss of substance:

A turkey gets fed every day, right up until it's slaughtered before Thanksgiving.

2M. Y. Zuo2y

The shorter the better. Or as Lao Tzu said, Those who know don’t talk. Those who talk don’t know…

[-]green_leaf2y188

Nobody would understand that.

This sort of saying-things-directly doesn't usually work unless the other person feels the social obligation to parse what you're saying to the extent they can't run away from it.

[-]cubefox2y112

Yeah, but I do actually think this paragraph is wrong on the existence of easy rules. It is a bit like saying: There are only the laws of fundamental physics, don't bother with trying to find high level laws, you just have to do the hard work of learning to apply fundamental physics when you are trying to understand a pendulum or a hot gas. Or biology.

Similarly, for induction there are actually easy rules applicable to certain domains of interest. Like Laplace's rule of succession, which assumes random i.i.d. sampling. Which implies the sample distribution tends to resemble the population distribution. The same assumption is made by supervised learning about the training distribution, which works very well in many cases. There are other examples like the Lindy effect (mentioned in another comment) and various popular models in statistics. Induction heads also come to mind.
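As a rough illustration of the rule of succession mentioned above (a minimal sketch, not something from the original discussion): after observing s successes in n i.i.d. trials, with a uniform prior on the unknown success rate, the estimated probability that the next trial succeeds is (s + 1) / (n + 2). The function name `rule_of_succession` is chosen here purely for illustration.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate that the next trial succeeds.

    Assumes i.i.d. Bernoulli trials and a uniform prior on the success rate.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# No data yet: the estimate is the prior mean.
print(rule_of_succession(0, 0))    # 1/2
# 9 successes in 10 trials.
print(rule_of_succession(9, 10))   # 5/6
```

The estimate is only as good as the i.i.d. assumption. Applied to a process whose generating mechanism can change, it will happily return a probability near 1 right up until the change.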

Even if there is just one, complex, fully general method applicable to science or induction, there may still exist "easy" specialized methods, with applicability restricted to a certain domain.

6Rob Lucas2y

This reminds me of a bit from Feynman's Lectures on Physics: "What is this law of gravitation? It is that every object in the universe attracts every other object with a force which for any two bodies is proportional to the mass of each and varies inversely as the square of the distance between them. This statement can be expressed mathematically by the equation F=Gmm'/r^2. If to this we add the fact that an object responds to a force by accelerating in the direction of the force by an amount that is inversely proportional to the mass of the object, we shall have said everything required, for a sufficiently talented mathematician could then deduce all the consequences of these two principles." [emphasis added] Like Feynman, however, I think his next sentence is important: "However, since you are not assumed to be sufficiently talented yet, we shall discuss the consequences in more detail, and not just leave you with these two bare principles."

4dr_s2y

I think you could, but then it would be unintelligible to most people who don't know wtf is Solomonoff Induction. The Ponzi Pyramid scheme IMO is an excellent framework, but the post still suffers from a certain, eh, lack of conciseness. I think you could make the point a lot more simply with just a few exchanges from the first section and anyone worth their salt will absolutely get the spirit of the point.

2AnthonyC2y

Yes on the overall gist, and I feel like most of the rest of the post is trying to define the word "things" more precisely. The Spokesperson thinks "past annual returns of a specific investment opportunity" are a "thing." The Scientist thinks this is not unreasonable, but that "extrapolations from established physical theories I'm familiar with" are more of a "thing." The Epistemologist says only the most basic low-level facts we have, taken as a whole set, are a "thing" and we would ideally reason from all of them without drawing these other boundaries with too sharp and rigid a line. Or at least, that in places where we disagree about the nature of the "things," that's the direction in which we should move to settle the disagreement.

[-]ryan_greenblatt2y5721

I find this essay interesting as a case study in discourse and argumentation norms. Particularly as a case study of issues with discourse around AI risk.

When I first skimmed this essay when it came out, I thought it was ok, but mostly uninteresting or obvious. Then, on reading the comments and looking back at the body, I thought it did some pretty bad strawmanning.

I reread the essay yesterday and now I feel quite differently. Parts (i), (ii), and (iv) which don't directly talk about AI are actually great and many of the more subtle points are pretty well executed. The connection to AI risk in part (iii) is quite bad and notably degrades the essay as a whole. I think a well-executed connection to AI risk would have been good. Part (iii) seems likely to contribute to AI risk being problematically politicized and negatively polarized (e.g. low quality dunks and animosity). Further, I think this is characteristic of problems I have with the current AI risk discourse.

In parts (i), (ii), and (iv), it is mostly clear that the Spokesperson is an exaggerated straw person who doesn't correspond to any particular side of an issue. This seems like a reasonable rhetorical move to better explain... (read more)

[-]ryan_greenblatt2y153

Another way to put this is that posts should often discuss their limitations, particularly when debunking bad arguments that are similar to more reasonable arguments.

I think discussing limitations clearly is a reasonable norm for scientific papers that reduces the extent to which people intentionally or unintentionally get away with implying their results prove more than they do.

[-]ryan_greenblatt2y115

What are the close-by arguments that are actually reasonable? Here is a list of close-by arguments (not necessarily endorsed by me!):

On empirical updates from current systems: If current AI systems are broadly pretty easy to steer and there is good generalization of this steering, that should serve as some evidence that future more powerful AI systems will also be relatively easier to steer. This will help prevent concerns like scheming from arising in the first place or make these issues easier to remove.
- This argument holds to some extent regardless of whether current AIs are smart enough to think through and successfully execute scheming strategies. For instance, imagine we were in a world where steering current AIs was clearly extremely hard: AIs would quickly overfit and goodhart training processes, RLHF was finicky and had terrible sample efficiency, and AIs were much worse at sample efficiently updating on questions about human deontological constraints relative to questions about how to successfully accomplish other tasks. In such a world, I think we should justifiably be more worried about future systems.
- And in fact, people do argue about how hard it is to steer curren

... (read more)

-4Noosphere892y

I basically endorse argument 1, and one other update you haven't mentioned but which is important is that the values of a human turn out to be less complicated and fragile, and more generalizable than people thought (this is because human values data is likely a small part of GPT-4, and yet it can correctly answer a lot of morality questions, and I think LLMs are genuinely learning new regularities here, so they can generalize from their training data). Implications for AI risk of course abound.

[-]Daniel Kokotajlo2y4715

This part resonates with me; my experience in philosophy of science + talking to people unfamiliar with philosophy of science also led me to the same conclusion:

"You talk it out on the object level," said the Epistemologist. "You debate out how the world probably is. And you don't let anybody come forth with a claim that Epistemology means the conversation instantly ends in their favor."
"Wait, so your whole lesson is simply 'Shut up about epistemology'?" said the Scientist.
"If only it were that easy!" said the Epistemologist. "Most people don't even know when they're talking about epistemology, see? That's why we need Epistemologists -- to notice when somebody has started trying to invoke epistemology, and tell them to shut up and get back to the object level."

The main benefit of learning about philosophy is to protect you from bad philosophy. And there's a ton of bad philosophy done in the name of Empiricism, philosophy masquerading as science.

9Chris_Leong2y

Very Wittgensteinian: [...]

[-]TurnTrout2y3512

This scans as less "here's a helpful parable for thinking more clearly" and more "here's who to sneer at" -- namely, at AI optimists. Or "hopesters", as Eliezer recently called them, which I think is a play on "huckster" (and which accords with this essay analogizing optimists to Ponzi scheme scammers).

I am saddened (but unsurprised) to see few others decrying the obvious strawmen:

what if [the optimists] cried 'Unfalsifiable!' when we couldn't predict whether a phase shift would occur within the next two years exactly?
...
"But now imagine if -- like this Spokesperson here -- the AI-allowers cried 'Empiricism!', to try to convince you to do the blindly naive extrapolation from the raw data of 'Has it destroyed the world yet?' or 'Has it threatened humans? no not that time with Bing Sydney we're not counting that threat as credible'."

Thinly-veiled insults:

Nobody could possibly be foolish enough to reason from the apparently good behavior of AI models too dumb to fool us or scheme, to AI models smart enough to kill everyone; it wouldn't fly even as a parable, and would just be confusing as a metaphor.

and insinuations of bad faith:

What if, when you tried to reason about why the mo

... (read more)

[-]habryka2y26-4

I don't think this essay is commenting on AI optimists in general. It is commenting on some specific arguments that I have seen around, but I don't really see how it relates to the recent stuff that Quintin, Nora or you have been writing (and I would be reasonably surprised if Eliezer intended it to apply to that).

You can also leave it up to the reader to decide whether and when the analogy discussed here applies or not. I could spend a few hours digging up people engaging in reasoning very close to what is discussed in this article, though by default I am not going to.

[-]Martin Randall2y1723

Ideally Yudkowsky would have linked to the arguments he is commenting on. This would demonstrate that he is responding to real, prominent, serious arguments, and that he is not distorting those arguments. It would also have saved me some time.

But now imagine if -- like this Spokesperson here -- the AI-allowers cried 'Empiricism!', to try to convince you to do the blindly naive extrapolation from the raw data of 'Has it destroyed the world yet?'

The first hit I got searching for "AI risk empiricism" was Ignore the Doomers: Why AI marks a resurgence of empiricism. The second hit was AI Doom and David Hume: A Defence of Empiricism in AI Safety, which linked Anthropic's Core Views on AI Safety. These are hardly analogous to the Spokesman's claims of 100% risk-free returns.

Next I sampled several Don't Worry about the Vase AI newsletters and "some people are not so worried". I didn't really see any cases of blindly naive extrapolation from the raw data of 'Has AI destroyed the world yet?'. I found Alex Tabarrok saying "I want to see that the AI baby is dangerous before we strangle it in the crib.". I found Jacob Buckman saying "I'm Not Worried About An AI Apocalypse". These things are... (read more)

[-]Zack_M_Davis2y1613

saddened (but unsurprised) to see few others decrying the obvious strawmen

In general, the "market" for criticism just doesn't seem very efficient at all! You might have hoped that people would mostly agree about what constitutes a flaw, critics would compete to find flaws in order to win status, and authors would learn not to write posts with flaws in them (in order to not lose status to the critics competing to point out flaws).

I wonder which part of the criticism market is failing: is it more that people don't agree about what constitutes a flaw, or that authors don't have enough of an incentive to care, or something else? We seem to end up with a lot of critics who specialize in detecting a specific kind of flaw ("needs examples" guy, "reward is not the optimization target" guy, "categories aren't arbitrary" guy, &c.), with very limited reaction from authors or imitation by other potential critics.

5kave2y

My quick guess is that people don't agree about what constitutes a (relevant) flaw. (And there are lots of irrelevant flaws so you can't just check for the existence of any flaws at all). I think if people could agree, the authorial incentives would follow. I'm fairly sympathetic to the idea that readers aren't incentivised to correctly agree on what constitutes a flaw.

[-]tailcalled2y14-5

Apparently Eliezer decided to not take the time to read e.g. Quintin Pope's actual critiques, but he does have time to write a long chain of strawmen and smears-by-analogy.

A lot of Quintin Pope's critiques are just obviously wrong and lots of commenters were offering to help correct them. In such a case, it seems legitimate to me for a busy person to request that Quintin sorts out the problems together with the commenters before spending time on it. Even from the perspective of correcting and informing Eliezer, people can more effectively be corrected and informed if their attention is guided to the right place, with junk/distractions removed.

(Note: I mainly say this because I think the main point of the message you and Quintin are raising does not stand up to scrutiny, and so I mainly think the value the message can provide is in certain technical corrections that you don't emphasize as much, even if strictly speaking they are part of your message. If I thought the main point of your message stood up to scrutiny, I'd also think it would be Eliezer's job to realize it despite the inconvenience.)

[-]Quintin Pope2y1317

I stand by pretty much everything I wrote in Objections, with the partial exception of the stuff about strawberry alignment, which I should probably rewrite at some point.

Also, Yudkowsky explained exactly how he'd prefer someone to engage with his position "To grapple with the intellectual content of my ideas, consider picking one item from "A List of Lethalities" and engaging with that.", which I pointed out I'd previously done in a post that literally quotes exactly one point from LoL and explains why it's wrong. I've gotten no response from him on that post, so it seems clear that Yudkowsky isn't running an optimal 'good discourse promoting' engagement policy.

I don't hold that against him, though. I personally hate arguing with people on this site.

[-]Eliezer Yudkowsky2y15-2

Unless I'm greatly misremembering, you did pick out what you said was your strongest item from Lethalities, separately from this, and I responded to it. You'd just straightforwardly misunderstood my argument in that case, so it wasn't a long response, but I responded. Asking for a second try is one thing, but I don't think it's cool to act like you never picked out any one item or I never responded to it.

EDIT: I'm misremembering, it was Quintin's strongest point about the Bankless podcast. https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky?commentId=cr54ivfjndn6dxraD

6tailcalled2y

I'm kind of ambivalent about this. On the one hand, when there is a misunderstanding, but he claims his argument still goes through after correcting the misunderstanding, it seems like you should also address that corrected form. On the other hand, Quintin Pope's correction does seem very silly. At least by my analysis: [...] This approach considers only the things OpenAI could do with their current ChatGPT setup, and yes it's correct that there's not much online learning opportunity in this. But that's precisely why you'd expect GPT+DPO to not be the future of AI; Quintin Pope has clearly identified a capabilities bottleneck that prevents it from staying fully competitive. (Note that humans can learn even if there is a fraction of people who are sharing intentionally malicious information, because unlike GPT and DPO, humans don't believe everything we're told.) A more autonomous AI could collect actionable information at much greater scale, as it wouldn't be dependent on trusting its users for evaluating what information to update on, and it would have much more information about what's going on than the chat-based I/O. This sure does look to me like a huge bottleneck that's blocking current AI methods, analogous to the evolutionary bottleneck: The full power of the AI cannot be used to accumulate OOM more information to further improve the power of the AI.

9Noosphere892y

My main disagreement is that I actually do think that at least some of the critiques are right here. In particular, the claim Quintin Pope is making that I think is right is that evolution is extremely different from how we train our AIs, and thus none of the inferences that work under an evolution model work under the AIs under consideration, which importantly includes a lot of analogies to apes/Neanderthals making smarter humans (which they didn't do, BTW), which presumably failed to be aligned, ergo we can't align AI smarter than us. The basic issue though is that evolution doesn't have a purpose or goal, and thus the common claim that evolution failed to align humans to X thing is nonsensical, as it assumes a teleological goal that just does not exist in evolution, which is quite different from humans making AIs with particular goals in mind. Thus talk of an alignment problem between, say, chimps/Neanderthals and humans is entirely nonsensical. This is also why this generalized example of misgeneralization fails to work, since evolution is not a trainer or designer in the way that, say, an OpenAI employee making AI would be, and thus there is no generalization error, since there wasn't a goal or behavior to purposefully generalize in the first place: [...] There are other problems with the analogy that Quintin Pope covered, like the fact that it doesn't actually capture misgeneralization correctly, since the ancient/modern human distinction is not the same as one AI doing a treacherous turn, or how the example of ice cream overwhelming our reward center isn't misgeneralization, but the fact that evolution has no purpose or goal is the main problem I see with a lot of evolution analogies. Another issue is that evolution is extremely inefficient at the timescales required, which is why dominant training methods for AI borrow little from evolution at best, and even from an AI capabilities perspective it's not really worth it to rerun evolution to get AI p

[-]Quintin Pope2y246

The basic issue though is that evolution doesn't have a purpose or goal

FWIW, I don't think this is the main issue with the evolution analogy. The main issue is that evolution faced a series of basically insurmountable, yet evolution-specific, challenges in successfully generalizing human 'value alignment' to the modern environment, such as the fact that optimization over the genome can only influence within lifetime value formation through insanely unstable Rube Goldberg-esque mechanisms that rely on steps like "successfully zero-shot directing an organism's online learning processes through novel environments via reward shaping", or the fact that accumulated lifetime value learning is mostly reset with each successive generation without massive fixed corpuses of human text / RLHF supervisors to act as an anchor against value drift, or evolution having a massive optimization power overhang in the inner loop of its optimization process.

These issues fully explain away the 'misalignment' humans have with IGF and other intergenerational value instability. If we imagine a deep learning optimization process with an equivalent structure to evolution, then we could easily predi... (read more)

2Daniel Kokotajlo2y

I'm curious to hear more about this. Reviewing the analogy:

- Evolution, 'trying' to get general intelligences that are great at reproducing <--> The AI Industry / AI Corporations, 'trying' to get AGIs that are HHH
- Genes, instructing cells on how to behave and connect to each other and in particular how synapses should update their 'weights' in response to the environment <--> Code, instructing GPUs on how to behave and in particular how 'weights' in the neural net should update in response to the environment
- Brains, growing and learning over the course of lifetime <--> Weights, changing and learning over the course of training

Now turning to your three points about evolution:

1. Optimizing the genome indirectly influences value formation within lifetime, via this unstable Rube Goldberg mechanism that has to zero-shot direct an organism's online learning processes through novel environments via reward shaping --> translating that into the analogy, it would be "optimizing the code indirectly influences value formation over the course of training, via this unstable Rube Goldberg mechanism that has to zero-shot direct the model's learning process through novel environments via reward shaping"... yep seems to check out. idk. What do you think?
2. Accumulated lifetime value learning is mostly reset with each successive generation without massive fixed corpuses of human text / RLHF supervisors --> Accumulated learning in the weights is mostly reset when new models are trained since they are randomly initialized; fortunately there is a lot of overlap in training environment (internet text doesn't change that much from model to model) and also you can use previous models as RLAIF supervisors... (though isn't that also analogous to how humans generally have a lot of shared text and culture that spans generations, and also each generation of humans literally supervises and teaches the next?)
3. Massive optimization power overhang in the inner loop of its optimization proce

2tailcalled2y

Can people who vote disagree also mark the parts they disagree with using reacts or something?

1Tapatakt2y

Do you think that if someone filtered and steelmanned Quintin's criticism, it would be valuable? (No promises)

4tailcalled2y

Yes. Filtering away mistakes, unimportant points, unnecessary complications, etc., from preexisting ideas is (as long as the core idea one extracts is good) a very general way to contribute value, because it makes the ideas involved easier to understand. Adding stronger arguments, more informative and accessible examples, etc. contributes value because then it shows what is more robust and gives more material to dig down into understanding it, and also because it clarifies why some people may find the idea attractive. Explanations for the changes, especially for the dropped things, can build value because it clarifies the consensus about what parts were wrong, and if Quintin disagrees with the removals, it provides signals to him about what he didn't clarify well enough. When these are done on a sufficiently important point, with sufficiently much skill, and maybe also with sufficiently much luck, this can in principle provide a ton of value, both because information in general is high-leverage due to being easily shareable, and because this particular form of information can help resolve conflicts and rebuild trust.

[-]Eliezer Yudkowsky2y134

If Quintin hasn't yelled "Empiricism!" then it's not about him. This is more about (some) e/accs.

2Eli Tyre2y

This is definitely a tangent, and I don't want to detract from your more substantive points (about which I don't have as strong an opinion one way or the other). [...] I read this as a play on the word "Doomer", which is a term that is slightly derogatory, but mostly descriptive. My read of "hopester", without any additional context, is the same.

1Tapatakt2y

I think from Eliezer's point of view it goes kinda like this:

1. People can't see why the arguments of the other side are invalid.
2. Eliezer tried to engage with them, but most listeners/readers can't tell who is right in these discussions.
3. Eliezer thinks that if he provides people with strawmanned versions of the other side's arguments and refutations of these strawmanned arguments, then the chance that these people will see why he's right in the real discussion will go up.
4. Eliezer writes this discussion with strawmen as a fictional parable because otherwise it would be either dishonest and rude or a quite boring text with a lot of disclaimers. Or because it's just easier for him to write it this way.

After reading this text at least one person (you) thinks that the goal "avoid dishonesty and rudeness" was not achieved, so the text is a failure.

After reading this text at least one person (me) thinks that (1) I got some useful ideas and models, and (2) of course, at least the smartest opponents of Eliezer have better arguments and I don't think Eliezer would disagree with that; so the text is a success.

Ideally, Eliezer should update his strategy of writing texts based on both pieces of evidence. I can be wrong, of course.

[-]Richard_Ngo2y354

"Well, since it's too late there," said the Scientist, "would you maybe agree with me that 'eternal returns' is a prediction derived by looking at observations in a simple way, and then doing some pretty simple reasoning on it; and that's, like, cool? Even if that coolness is not the single overwhelming decisive factor in what to believe?"
"Depends exactly what you mean by 'cool'," said the Epistemologist.

"Okay, let me give it a shot," said the Scientist. "Suppose you model me as having a bunch of subagents who make trades on some kind of internal prediction market. The whole time I've been watching Ponzi Pyramid Incorporated, I've had a very simple and dumb internal trader who has been making a bunch of money betting that they will keep going up by 20%. Of course, my mind contains a whole range of other traders too, so this one isn't able to swing the market by itself, but what I mean by 'cool' is that this trader does have a bunch of money now! (More than others do, because in my internal prediction markets, simpler traders start off with more money.)"
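The Scientist's internal-market picture can be made concrete with a toy model. The following is only a sketch under made-up assumptions: the trader names, their complexities in bits, and their probabilities are all hypothetical, and wealth is updated by multiplying by the probability each trader assigned to what was observed (which is equivalent to a Bayesian update over a small set of hypotheses). It ignores any adversarial selection of the observations.

```python
# Each hypothetical trader: (name, complexity in bits, P(returns are paid this year)).
TRADERS = [
    ("always_pays", 1, 0.99),
    ("coin_flip", 2, 0.50),
    ("collapses_soon", 3, 0.70),
]


def run_market(traders, observations):
    """Return (wealth shares by trader name, market price for the next year).

    Simpler traders start richer: initial wealth is 2 ** -complexity.
    Each year, a trader's wealth is multiplied by the probability it
    assigned to what actually happened.
    """
    wealth = {name: 2.0 ** -bits for name, bits, _ in traders}
    for paid in observations:
        for name, _, p_paid in traders:
            wealth[name] *= p_paid if paid else (1.0 - p_paid)
    total = sum(wealth.values())
    shares = {name: w / total for name, w in wealth.items()}
    # Market price: wealth-weighted average of the traders' predictions.
    price = sum(shares[name] * p_paid for name, _, p_paid in traders)
    return shares, price


# Ten straight years of observed payouts.
shares, price = run_market(TRADERS, [True] * 10)
richest = max(shares, key=shares.get)
print(richest, round(price, 3))
```

In this toy setup the simple "always pays" trader ends up holding most of the wealth after a run of payouts, which is the sense of "cool" the Scientist describes; it says nothing about whether that trader should be trusted when the data were chosen by someone with an interest in the outcome.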

"The problem," said the Epistemologist, "is that you're in an adversarial context, where the observations you're seeing have ... (read more)

1Martín Soto2y

Cool connections! Resonates with how I've been thinking about intelligence and learning lately. Some more connections: [...] That's reward/exploration hacking. Although I do think most times we "look up some data" in real life it's not due to an internal heuristic / subagent being strategic enough to purposefully try and exploit others, but rather just because some earnest simple heuristics recommending to look up information have scored well in the past. [...] I think this doesn't always happen. As good as the internal traders might be, the agent sometimes needs to explore, and that means giving up some of the agent's money. [...] Here (starting at "Put in terms of Logical Inductors") I mention other "computational shortcuts" for inductors. Mainly, if two "categories of bets" seem pretty unrelated (they are two different specialized magisteria), then not having thick trade between them won't lose you out on much performance (and will avoid much computation). You can have "meta-traders" betting on which categories of bets are unrelated (and testing them but only sparsely, etc.), and use them to make your inductor more computationally efficient. Of course object-level traders already do this (decide where to look, etc.), and in the limit this will converge like a Logical Inductor, but I have the intuition this will converge faster (at least, in structured enough domains). This is of course very related to my ideas and formalism on meta-heuristics. [...] This adversarial selection is also a problem for heuristic arguments: Your heuristic estimator might be very good at assessing likelihoods given a list of heuristic arguments, but what if the latter has been selected against your estimator, to drive it in a wrong direction? Last time I discussed this with them (very long ago), they were just happy to pick an apparently random process to generate the heuristic arguments, that they're confident enough hasn't been tampered with. Something more ambitious would be

[-]Sheikh Abdur Raheem Ali2y192

As a direct result of reading this, I have changed my mind on an important, but private, decision.

[-]titotal2y13-1

The basic premise of this post is wrong, based on the strawman that an empiricist/scientist would only look at a single piece of information. You have the empiricist and scientists just looking at the returns on investment on Bankman's scheme, and extrapolating blindly from there.

But an actual empiricist looks at all the empirical evidence. They can look at the average rate of return of a typical investment, noting that this one is unusually high. They can learn how the economy works and figure out if there are any plausible mechanisms for this kind of economic returns. They can look up economic history, and note that Ponzi schemes are a thing that exists and happens reasonably often. From all the empirical evidence, the conclusion "this is a Ponzi scheme" is not particularly hard to arrive at.

Your "scientist" and "empiricist" characters are neither scientists nor empiricists: they are blathering morons.

As for AI risk, you've successfully knocked down the very basic argument that AI must be safe because it hasn't destroyed us yet. But that is not the core of any skeptic's argument that I know.

Instead, an actual empiricist skeptic might look at the actual empirical e...

5habryka2y

I don't think this essay is intended to make generalizations to all "Empiricists", scientists, and "Epistemologists". It's just using those names as a shorthand for three types of people (whose existence seems clear to me, though of course their character does not reflect everyone who might identify under that label).

[-]Said Achmiz2y130

Or:

“In the past, people who have offered such apparently-very-lucrative deals have usually been scammers, cheaters, and liars. And, in general, we have on many occasions observed people lying, scamming, cheating, etc. On the other hand, we have only very rarely seen such an apparently-very-lucrative deal turn out to actually be a good idea. Therefore, on the general principle that the future will be similar to the past, we predict a very high chance that Bernie is a cheating, lying scammer, and that this so-called ‘investment opportunity’ is fake.”

We thus defeat the Spokesperson’s argument on his own terms, without needing to get into abstractions or theory—and we do it in one paragraph.

This happens to also be precisely the correct approach to take in real life when faced with apparently-very-lucrative deals and investment opportunities (unless you have the time to carefully investigate, in great detail and with considerable diligence, all such deals that are offered to you).

[-]niplav2y5625

Ah, but there is some non-empirical cognitive work done here that is really relevant, namely the choice of what equivalence class to put Bernie Bankman into when trying to forecast. In the dialogue, the empiricists use the equivalence class of Bankman in the past, while you propose using the equivalence class of all people that have offered apparently-very-lucrative deals.

And this choice is in general non-trivial, and requires abstractions and/or theory. (And the dismissal of this choice as trivial is my biggest gripe with folk-frequentism—what counts as a sample, and what doesn't?)

2Said Achmiz2y

I disagree. It seems to me that this choice is, in general, pretty easy to make, and takes naught but common sense. Certainly that’s the case in the given example scenario. Of course there are exceptions, where the choice of reference class is trickier—but in general, no, it’s pretty easy. (Whether the choice “requires abstractions and/or theory” is another matter. Perhaps it does, in a technical sense. But it doesn’t particularly require talking about abstractions and/or theory, and that matters.)

9xpym2y

Sure, there is common sense, available to plenty of people, of which reference classes apply to Ponzi schemes (but, somehow, not to everybody, far from it). Yudkowsky's point, however, is that the issue of future AIs is entirely analogous, so people who disagree with him on this are as dumb as those taken in by Bernies and Bankmans. Which just seems empirically false - I'm sure that the proportion of AI doom skeptics among ML experts is much higher than that of Ponzi believers among professional economists. So, if there is progress to be made here, it probably lies in grappling with whatever asymmetries are between these situations. Telling skeptics a hundredth time that they're just dumb doesn't look promising.

1Ben Livengood2y

I mean, the Spokesperson is being dumb, the Scientist is being confused. Most AI researchers aren't even being Scientists, they have different theoretical models than EY. But some of them don't immediately discount the Spokesperson's false-empiricism argument publicly, much like the Scientist tries not to. I think the latter pattern is what has annoyed EY and what he writes against here. However, a large number of current AI experts do recently seem to be boldly claiming that LLMs will never be sufficient for even AGI, not to mention ASI. So maybe it's also aimed at them a bit.

1xpym2y

Most likely as a part of the usual arguments-as-soldiers political dynamic. I do think that there's an actual argument to be made that we have much less empirical evidence regarding AIs compared to Ponzis, and plenty of people on both sides of this debate are far too overconfident in their grand theories, EY very much included.

1Martin Randall2y

I agree that there is some non-empirical cognitive work to be done in choosing how to weight different reference classes. How much do we weight the history of Ponzi Pyramid Inc, the history of Bernie Bankman, the history of the stock market, and the history of apparently-very-lucrative deals? This is all useful work to do to estimate the risk of investing in PP Inc. However, the mere existence of other possible reference classes is sufficient to defeat the Spokesperson's argument, because it shows that his arguments lead to a contradiction.

8quetzal_rainbow2y

Apparently, the dialogue is happening in an inverted world - Ponzi schemes have never happened here and everybody agrees on the AI x-risk problem.

4Said Achmiz2y

Yes. (If it were otherwise, then the response would be even simpler: “oh, this is obviously just a Ponzi scheme”.)

4AnthonyC2y

Unfortunately in the world I live in, the same people who would accept "This is obviously a Ponzi scheme" (but who don't understand AI x-risk well) have to also contend with the fact that most people they hear talking about AI are indistinguishable (to them) from people talking about crypto as an investment, or about how transformative AI will lead to GDP doubling times dropping to years, months, or weeks. So, the same argument could be used to get (some of) them to dismiss the notion that AI could become that powerful at all with even less seeming-weirdness. Arguments that something has the form of a Ponzi scheme are, fortunately and unfortunately, not always correct. Some changes really do enable permanently (at least on the timescales the person thinks of as permanent) faster growth.

2Said Achmiz2y

I don’t say that you’re wrong, necessarily, but what would you say is an example of something that “has the form of a Ponzi scheme”, but is actually a change that enables permanently faster growth?

2AnthonyC2y

From the outside, depending on your level of detail of understanding, any franchise could look that way. Avon and Tupperware look a bit that way. Some MLM companies are more legitimate than others. From a more abstract point of view, I could argue that "cities" are an example. "Hey, send your kids to live here, let some king and his warriors be in charge, and give up your independence, and you'll all get richer!" It wasn't at all clear in the beginning how "Pay taxes and die of diseases!" was going to be good for anyone but the rulers, but the societies that did it more and better thrived and won.

6Said Achmiz2y

That… does not seem like a historically accurate account of the formation and growth of cities.

2AnthonyC2y

Yeah, you're right, but for most of history they were net population sinks that generated outsized investment returns. Today they're not population sinks because of sanitation etc. etc. I know I'm being imprecise and handwavy, so feel free to ignore me, but really my thought was just that lots of things look vaguely like Ponzi schemes without getting into more details than most people are going to pay attention to.

2Simon Fischer2y

I think this would be a good argument against Said Achmiz's suggested response, but I feel the text doesn't completely support it, e.g. the Epistemologist says "such schemes often go through two phases" and "many schemes like that start with a flawed person", suggesting that such schemes are known to him.

4Said Achmiz2y

Even setting aside such textual anomalies, why is this a good argument? As I noted in a sibling comment to yours, my response assumes that Ponzi schemes have never happened in this world, because otherwise we’d simply identify the Spokesperson’s plan as a Ponzi scheme! The reasoning that I described is only necessary because we can’t say “ah, a Ponzi scheme”!

1Simon Fischer2y

Ah, I think there was a misunderstanding. I (and maybe also quetzal_rainbow?) thought that in the inverted world also no "apparently-very-lucrative deals" that turn out to be scams are known, whereas you made a distinction between those kinds of deals and Ponzi schemes in particular. I think my interpretation is more in the spirit of the inversion, otherwise the Epistemologist should really have answered as you suggested, and the whole premise of the discussion (people seem to have trouble understanding what the Spokesperson is doing) is broken.

3Martin Randall2y

If I was living in a world where there are zero observed apparently-very-lucrative deals that turn out to be scams then I hope I would conclude that there is some supernatural Creator who is putting a thumb on the scale to be sure that cheaters never win and winners never cheat. So I would invest in Ponzi Pyramid Inc. I would not expect to be scammed, because this is a world where there are zero observed apparently-very-lucrative deals that turn out to be scams. I would aim to invest in a diversified portfolio of apparently-very-lucrative deals, for all the same reasons I have a diversified portfolio in this world. In such a world the Epistemologist is promoting a world model that does not explain my observations and I would not take their investment advice, similarly to how in this world I ignore investment advice from people who believe that the economy is secretly controlled by lizard people.

3Said Achmiz2y

If the premise is a world where nobody ever does any scams or tries to swindle anyone out of money, then it’s so far removed from our world that I don’t rightly know how to interpret any of the included commentary on human nature / psychology / etc. Lying for personal gain is one of those “human universals”, without which I wouldn’t even recognize the characters as anything resembling humans.

3aphyer2y

<trolling> The S&P500 has returned an average of ~8%/year for the past 30 years. As you say, we have on many occasions observed people lying, cheating, and scamming. But we have only rarely observed lucrative good ideas! Why, even banks, which claim much more safety and offer much lower returns than the stock market, have frequently gone bust! It follows inevitably, therefore, that there is a very high chance that the S&P 500, and the stock market in general, is a scam, and will steal all your money. It follows further that the only safe investment approach is to put all your money into something that you retain personal custody of. Like gold bars buried in your backyard! Or Bitcoin! </trolling>

2Said Achmiz2y

Well, here’s a question: what happens more often—stock market downturns, or banks going bust? [...] Now this is simply an invalid extrapolation. Note that I made no claims along these lines about what does or does not supposedly follow. Claims like “X reasoning is invalid” / “Y plan is unlikely to work” stand on their own; “what is the correct reasoning” / “what is a good plan” is a wholly separate question.

2Shankar Sivarajan2y

This is perfectly sound reasoning. What does applying it to people prophesying doom, arising from technological advance or otherwise, yield?

4Said Achmiz2y

Well, people prophesying doom in general have a pretty poor track record, so if that’s all we know, our prior should be that any such person is likely to be very wrong. Of course, most people throughout history who have prophesied doom have had in mind a religious sort of doom. People prophesying doom from technological advance specifically have a better track record. The Luddites were correct, for example. (Their chosen remedy left something to be desired, of course; but that is common, sadly. Identifying the problem does not, by itself, suffice to solve the problem.) And we’ve had quite a bit of doom from technological advance. Indeed, as technology has advanced, we’ve had more and more doom from that advance. So, on the whole, I’d say that applying the reasoning I describe to people prophesying doom from technological advance is that there is probably something to what they say, even if their specific predictions are not spot-on.

3Shankar Sivarajan2y

You consider some people's jobs being automated an instance of "doom"?

4Said Achmiz2y

This is in reference to the Luddites, I suppose? If so, “some people’s jobs being automated” is rather a glib description of the early effects of industrialization. There was considerable disruption and chaos, which, indeed, is “doom”, of more or less the sort that the Luddites predicted. (They never claimed that the world would end as a result of the new machines, as far as I know.)

[-]mike_hawke2y128

Only praise yourself as taking 'the outside view' if (1) there's only one defensible choice of reference class;

I think this point is underrated. The word "the" in "the outside view" is sometimes doing too much work, and it is often better to appeal to an outside view, or multiple outside views.

9Andrew McKnight2y

lukeprog argued similarly that we should drop the "the"

[-]Ben Pace, the Vacationing Vagabond2y120

Crossposted from where?

7Eli Tyre2y

Twitter. https://threadreaderapp.com/thread/1767710372306530562.html <= This link will take you to the thread, but NOT hosted on twitter.

3kave2y

Twitter

[-]Ben Pace, the Vacationing Vagabond2y118

K. I recommend that people include links for those of us who mostly do not read Twitter.

[-]Ben Pace, the Vacationing Vagabond6mo82Review for 2024 Review

+4. In some regards it's sad to have to rehash this argument, but I feel that this argument has been going around in the public discourse, and so it's worthwhile to write up a thorough account of what's naive about it and how to move past it. My sense is that it has become less prevalent since the essay; perhaps the essay helped.

Many folks have distaste for Eliezer's style, or for perhaps implying that a weak-man argument is fully representative of positions he disagrees with; I think some of these criticisms are valid but do not mean the essay isn't (a) p...

2Martin Randall5mo

It is written that [...] Is The Spokesperson a realistic villain? Has this argument been going around? In 2024, I wasn't able to find anyone making this argument. My sense is that it was not at all prevalent, and continues to be not at all prevalent. By analogy, Bernie Bankman is OpenAI (or other AI lab) and The Spokesperson is OpenAI's representatives. As far as I know, OpenAI were not making the argument in 2024 that OpenAI hasn't killed everyone and therefore they won't kill everyone in the future. Since 2024, AI has advanced substantially, so I asked Opus 4.5 for examples of people making this argument. It wasn't aware of any. Its first concrete suggestion was Andrew Ng: Fearing a rise of killer robots is like worrying about overpopulation on Mars from 2015. [...] That was a defensible position in 2015. With the benefit of hindsight it doesn't seem that "work on not turning AI evil" in 2015 was especially effective at altering our trajectory as a civilization, the main group who tried to do that work was MIRI, and while they argue it was worth doing the work, they admit that it didn't pan out. Regardless, Andrew Ng is not making The Spokesperson's argument, he specifically allows that killer robots could exist in the future, despite not existing in 2015. So I remain unaware of anyone making the argument with a straight face. If you disagree, I encourage you to find an example (or two!) and update me. Best existing rebuttal? There are certainly many people who act in various circumstances as if there will never be any surprises, but without actually saying things like "there will never be any surprises". So maybe we need a rebuttal to the blindness, rather than to the non-existent arguments. Thinking about the best other rebuttals to such blindness, I think Nassim Taleb covers this well as "tail risk blindness". Nassim Taleb is not to everyone's taste, I know, but he's a good writer on this topic. It may seem silly to talk about AI-caused extinction

6Steven Byrnes5mo

I feel like I see it pretty often. Check out “Unfalsifiable stories of doom”, for example. Or really, anyone who uses the phrase “hypothetical risk” or “hypothetical threat” as a conversation-stopper when talking about ASI extinction, is implicitly invoking the intuitive idea that we should by default be deeply skeptical of things that we have not already seen with our own eyes. [...] Obviously I agree that The Spokesperson is not going to sound realistic and sympathetic when he is arguing for “Ponzi Pyramid Incorporated” led by “Bernie Bankman”. It’s a reductio ad absurdum, showing that this style of argument proves too much. That’s the whole point.

[-]Martin Randall5mo*140

Thank you for the concrete example of Unfalsifiable Stories of Doom from Barnett et al in November 2025. I think there are several important differences between the two arguments. To avoid taking up too much of our time, I'm going to dwell on one in particular.

Dismiss or engage with theoretical arguments?

The Spokesperson in Empiricism! is dismissive of the entire concept of predicting the future using "words words words and thinking". Barnett et al are not. I think this is clearest in their engagement with IABIED's claim that AIs steer in alien directions that only mostly coincide with helpfulness. Here's the claim:

Modern AIs are pretty helpful (or at least not harmful) to most users, most of the time. But as we noted above, a critical question is how to distinguish an AI that deeply wants to be helpful and do the right thing, from an AI with weirder and more complex drives that happen to line up with helpfulness under typical conditions, but which would prefer other conditions and outcomes even more. ... This long list of cases look just like what the “alien drives” theory predicts, in sharp contrast with the “it’s easy to make AIs nice” theory that labs are eager to put forward.

Th...

[-]Vivek Hebbar1y64

I suspect there is some merit to the Scientist's intuition (and the idea that constant returns are more "empirical") which nobody has managed to explain well. I'll try to explain it here.^[1]

The Epistemologist's notion of simplicity is about short programs with unbounded runtime which perfectly explain all evidence. The [non-straw] empiricist notion of simplicity is about short programs with heavily-bounded runtime which approximately explain a subset of the evidence. The Epistemologist is right that there is nothing of value in the empiri...

6Vivek Hebbar1y

Slightly more spelled-out thoughts about bounded minds: 1. We can't actually run the hypotheses of Solomonoff induction. We can only make arguments about what they will output. 2. In fact, almost all of the relevant uncertainty is logical uncertainty. The "hypotheses" (programs) of Solomonoff induction are not the same as the "hypotheses" entertained by bounded Bayesian minds. I don't know of any published formal account of what these bounded hypotheses even are and how they relate to Solomonoff induction. But informally, all I'm talking about are ordinary hypotheses like "the Ponzi guy only gets money from new investors". 3. In addition to "bounded hypotheses" (of unknown type), we also have "arguments". An argument is a thing whose existence provides fallible evidence for a claim. 4. Arguments are made of pieces which can be combined "conjunctively" or "disjunctively". The conjunction of two subarguments is weaker evidence for its claim than each subargument was for its subclaim. This is the sense in which "big arguments" are worse.
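
One way to make point 4 concrete is to treat each subargument as an event with a probability and, as a simplifying assumption that the comment itself does not make, to treat the two as independent. The helper names `conjunction` and `disjunction` and the example strengths below are purely illustrative, not a formal account of bounded arguments.

```python
def conjunction(p_a, p_b):
    """Probability that both subclaims hold, assuming independence."""
    return p_a * p_b


def disjunction(p_a, p_b):
    """Probability that at least one subclaim holds, assuming independence."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


# Hypothetical strengths for two subarguments (illustrative numbers only).
p_a, p_b = 0.8, 0.7

both = conjunction(p_a, p_b)    # weaker than either subargument alone
either = disjunction(p_a, p_b)  # stronger than either subargument alone

print(f"conjunction: {both:.2f}, disjunction: {either:.2f}")
```

Even without the independence assumption, the probability of a conjunction can never exceed the smaller of the two component probabilities, so a conjunctive chain is at most as strong as its weakest link.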

[-]tailcalled2y31

You've got to think about what might be going on behind the scenes, in both cases.

But a tricky bit with AI is that it involves innovating fundamentally new ways of doing things. The methods we already have are not sufficient to create ASI, and also if you extrapolate out the SOTA methods at larger scale, it's genuinely not that dangerous. Rather with AI, we imagine that people will make up new things behind the scenes which are radically different from what we have so far, or that what we have so far will turn out to be much more powerful due to being radically different from how we understand it today.

3dxu2y

I think I like the disjunct “If it’s smart enough to be transformative, it’s smart enough to be dangerous”, where the contrapositive further implies competitive pressures towards creating something dangerous (as opposed to not doing that). There’s still a rub here—namely, operationalizing “transformative” in such a way as to give the necessary implications (both “transformative -> dangerous” and “not transformative -> competitive pressures towards capability gain”). This is where I expect intuitions to differ the most, since in the absence of empirical observations there seem multiple consistent views.

4tailcalled2y

That (on its own, without further postulates) is a fully general argument against improving intelligence. We have to accept some level of danger inherent in existence; the question is what makes AI particularly dangerous. If this special factor isn't present in GPT+DPO, then GPT+DPO is not an AI notkilleveryoneism issue.

2dxu2y

Well, it's primarily a statement about capabilities. The intended construal is that if a given system's capabilities profile permits it to accomplish some sufficiently transformative task, then that system's capabilities are not limited to only benign such tasks. I think this claim applies to most intelligences that can arise in a physical universe like our own (though necessarily not in all logically possible universes, given NFL theorems): that there exists no natural subclass of transformative tasks that includes only benign such tasks. (Where, again, the rub lies in operationalizing "transformative" such that the claim follows.) [...] I'm not sure how likely GPT+DPO (or GPT+RLHF, or in general GPT-plus-some-kind-of-RL) is to be dangerous in the limits of scaling. My understanding of the argument against, is that the base (large language) model derives most (if not all) of its capabilities from imitation, and the amount of RL needed to elicit desirable behavior from that base set of capabilities isn't enough to introduce substantial additional strategic/goal-directed cognition compared to the base imitative paradigm, i.e. the amount and kinds of training we'll be doing in practice are more likely to bias the model towards behaviors that were already a part of the base model's (primarily imitative) predictive distribution, than they are to elicit strategic thinking de novo. That strikes me as substantially an empirical proposition, which I'm not convinced the evidence from current models says a whole lot about. But where the disjunct I mentioned comes in, isn't an argument for or against the proposition; you can instead see it as a larger claim that parametrizes the class of systems for which the smaller claim might or might not be true, with respect to certain capabilities thresholds associated with specific kinds of tasks. And what the larger claim says is that, to the extent that GPT+DPO (and associated paradigms) fail to produce reasoners which could (in

2tailcalled2y

What I'm saying is that if GPT+DPO creates imitation-based intelligences that can be dangerous due to being intentionally instructed to do something bad ("hey, please kill that guy" and then it kills him), then that's not particularly concerning from an AI alignment perspective, because it has a similar danger profile to telling humans this. You would still want policy to govern it, similar to how we have policy to govern human-on-human violence, but it's not the kind of x-risk that notkilleveryoneism is about. So basically you can have "GPT+DPO is superintelligent, capable and dangerous" without having "GPT+DPO is an x-risk". That said, I expect GPT+DPO to stagnate and be replaced by something else, and that something else could be an x-risk (and conditional on the negation of natural impact regularization, I strongly expect it would be).

2dxu2y

To the extent that I buy the story about imitation-based intelligences inheriting safety properties via imitative training, I correspondingly expect such intelligences not to scale to having powerful, novel, transformative capabilities—not without an amplification step somewhere in the mix that does not rely on imitation of weaker (human) agents. Since I believe this, that makes it hard for me to concretely visualize the hypothetical of a superintelligent GPT+DPO agent that nevertheless only does what is instructed. I mostly don't expect to be able to get to superintelligence without either (1) the "RL" portion of the GPT+RL paradigm playing a much stronger role than it does for current systems, or (2) using some other training paradigm entirely. And the argument for obedience/corrigibility becomes weaker/nonexistent respectively in each of those cases. Possibly we're in agreement here? You say you expect GPT+DPO to stagnate and be replaced by something else; I agree with that. I merely happen to think the reason it will stagnate is that its safety properties don't come free; they're bought and paid for by a price in capabilities.

2tailcalled2y

Are we using the word "transformative" in the same way? I imagine that if society got reorganized into e.g. AI minds that hire tons of people to continually learn novel tasks that it can then imitate, that would be considered transformative because it would entirely change people's role in society, like the agricultural revolution did. Whereas right now very few people have jobs that are explicitly about pushing the frontier of knowledge, in the future that might be ~the only job that exists (conditional on GPT+DPO being the future, which again is not a mainline scenario).

2ChristianKl2y

One core problem with AI is that it's not just "people" who make up new things behind the scenes but AI itself that will make up new things.

[-]Signer2y32

I agree that this should be said, but there is also actual disagreement about which theory is better.

Getting reliable 20% returns every year is really quite amazingly hard.

Foundations for analogous arguments about future AI systems are not sufficiently understood - I mean, maybe we can get a very capable system that optimises softly like current systems.

And then the AI companies, if they’re allowed to keep selling those—we have now observed—just brute-RLHF their models into not talking about that. Which means we can’t get any trustworthy observations of

...

1Tapatakt2y

As I understand, interpretability research hasn't exactly gotten stuck, but it's very-very-very far from something like this even for not-SotA models. And the gap is growing.

[-]cubefox2y2-3

There does actually seem to be a simple and general rule of extrapolation that can be used when no other data is available: If a trend has so far held for some timespan t, it will continue to hold, in expectation, for another timespan t, and then break down.

In other words, if we ask ourselves how long an observed trend will continue to hold, it does seem, absent further data, a good indifference assumption to think that we are currently in the middle of the trend; that we have so far seen half of it.

Of course it is possible that we are currently near the b...
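
The indifference assumption above can be checked numerically. The sketch below is not from the thread: it assumes the observation falls at a uniformly random fraction of the trend's total lifetime, and the function name `simulate_remaining` and the elapsed time of 10 are illustrative. Under that assumption the median remaining duration equals the elapsed time t; the mean is not finite, so "in expectation" is probably best read as "at the median".

```python
import random
import statistics


def simulate_remaining(elapsed, trials=100_000, seed=0):
    """Sample remaining durations, assuming we observe the trend at a
    uniformly random fraction of its total lifetime."""
    rng = random.Random(seed)
    remaining = []
    for _ in range(trials):
        # Fraction of the lifetime already elapsed; 1 - random() lies in (0, 1],
        # which avoids dividing by zero.
        fraction_elapsed = 1.0 - rng.random()
        total_lifetime = elapsed / fraction_elapsed
        remaining.append(total_lifetime - elapsed)
    return remaining


elapsed = 10.0  # hypothetical: the trend has held for 10 time units so far
samples = simulate_remaining(elapsed)

median_remaining = statistics.median(samples)
share_shorter = sum(r <= elapsed for r in samples) / len(samples)

print(f"median remaining: {median_remaining:.2f}")
print(f"share of runs ending within another {elapsed:g} units: {share_shorter:.3f}")
```

About half of the simulated trends break down within another t and half last longer, which is the sense in which the rule is a weak prior rather than a forecast.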

6Zac Hatfield-Dodds2y

Trivially true to the extent that you are about equally likely to observe a thing throughout that timespan; and the Lindy Effect is at least regularly talked of. But there are classes of observations for which this is systematically wrong: for example, most people who see a ship part-way through a voyage will do so while it's either departing or arriving in port. Investment schemes are just such a class, because markets are usually up to the task of consuming alpha and tend to be better when the idea is widely known - even Buffett's returns have oscillated around the index over the last few years!

5tailcalled2y

Another reason investment schemes are an exception is because they grow exponentially. This probably means you are much more likely to see them at their peak than at a random time.
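
A rough way to see the size of this effect, under assumptions that are not in the comment: suppose the number of new participants multiplies by a fixed factor each period, and suppose your chance of encountering the scheme in a given period is proportional to the number of people joining then. The function name `share_in_last_periods` and the numbers are illustrative.

```python
def share_in_last_periods(growth, periods, last):
    """Fraction of all joiners who joined during the final `last` periods,
    when the number of new joiners multiplies by `growth` each period."""
    new_joiners = [growth ** k for k in range(periods)]
    return sum(new_joiners[-last:]) / sum(new_joiners)


# Hypothetical scheme: new joiners double every period for 20 periods.
final_period_share = share_in_last_periods(growth=2.0, periods=20, last=1)
final_three_share = share_in_last_periods(growth=2.0, periods=20, last=3)
uniform_baseline = 1 / 20  # share of one period if exposure were spread evenly

print(f"final period: {final_period_share:.3f}")
print(f"final three periods: {final_three_share:.3f}")
print(f"uniform baseline for one period: {uniform_baseline:.3f}")
```

With doubling, roughly half of all encounters fall in the final period, against one in twenty if exposure were even across the scheme's lifetime.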

1cubefox2y

Yeah, one has to correct, when possible, for likelihood of observing a particular part of the lifetime of the trend. Though absent any further information our probability distribution should arguably be even. Which does suggest there is indeed a sort of "straight rule" of induction when extrapolating trends, as the scientist in the dialogue suspected. It is just that it serves as a weak prior that is easily changed by additional information.

[-]Kieren2y10

Fun read! I was surprised that the spokesperson kept up with the conversation as well as he did 🙂

Of course there's an Art of when to trust more in less complicated reasoning -- an Art of when to pay attention to data more narrowly in a domain and less to inferences from generalizations on data from wider domains

I would like to try and expand on what that Art is. The Spokesperson is offering up an inductive argument. For an inductive argument to be any good I believe it requires something like the following.

A theory or general rule that it is attempting to

...

[-]Tapatakt2y11

Which concept they might obtain by reading my book on Highly Advanced Epistemology 101 For Beginners, or maybe just my essay on Local Validity as a Key to Sanity and Civilization, I guess?"

Perhaps, there should be two links here?

[-]Review Bot2y*10

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
