Bayesian updating in real life is mostly about understanding your hypotheses
My sense is that an increasingly common viewpoint around here is that the last ~20 years of AI development and AI x-risk discourse are well described by the following narrative:

> Eliezer Yudkowsky (and various others who were at least initially heavily influenced by his ideas) developed detailed models of key issues likely to be inherent in the process of developing smarter-than-human AI.
>
> These models were somewhere between "maybe plausible" and "quite compelling" at the time they were put forth, but recent developments in AI (e.g. the behavioral characteristics of language models, the smoothness / gradualness of scaling) have shown that reality just isn't panning out in quite the way Eliezer's models predicted.
>
> These developments haven't entirely falsified Eliezer's models and key predictions, but there are now plenty of alternative models and theories. Some or all of these competing models either do, or claim to:
>
> * have a better recent track record of predicting near-term AI developments
> * better retrodict past developments[1]
> * draw support from empirical results in machine learning and / or neuroscience
> * feel more intuitively plausible and evidence-backed to people with different backgrounds and areas of expertise
>
> Therefore, even if we can't entirely discount Eliezer's models, there's clearly a directional Bayesian update which any good Bayesian (including Eliezer himself) should be able to make by observing recent developments and considering the alternate theories which they support. Even if the precise degree of the overall update (and the final landing place of the posterior) remains highly uncertain and debatable, the basic direction is clear.

Without getting into the object-level too much, or even into whether the narrative as a whole reflects the actual views of particular real people, I want to make some remarks on the concept of belief updating as it is typically used in narratives like this.

Note, there's a sense in which any (valid) change in on
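Since the narrative above leans on the mechanics of Bayesian updating (posteriors, directional updates), here is a minimal toy sketch of the point in the title: the posterior a hypothesis ends up with, and hence how an update "lands", depends on which other hypotheses are in the comparison set, not just on how well that hypothesis predicted the evidence. The hypothesis names, priors, and likelihoods below are invented purely for illustration and aren't meant to represent anyone's actual models.

```python
# Toy illustration only: hypotheses, priors, and likelihoods are made up.

def posterior(priors, likelihoods):
    """Bayes' rule over a discrete hypothesis set: P(h | e) ∝ P(h) * P(e | h)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# How well each hypothesis "predicted" some piece of evidence e.
likelihoods = {"H_A": 0.2, "H_B": 0.4, "H_C": 0.4}

# Comparing only H_A and H_B, with uniform priors:
two = posterior({"H_A": 0.5, "H_B": 0.5},
                {h: likelihoods[h] for h in ("H_A", "H_B")})
print(two)    # ≈ {'H_A': 0.33, 'H_B': 0.67}

# Same evidence, same likelihood for H_A, but with a third hypothesis in the mix:
three = posterior({"H_A": 1/3, "H_B": 1/3, "H_C": 1/3}, likelihoods)
print(three)  # ≈ {'H_A': 0.20, 'H_B': 0.40, 'H_C': 0.40}
```

Nothing about how well H_A predicted the evidence changed between the two runs; its posterior moved only because the hypothesis space did.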
Peter Thiel pointed out that the common folk wisdom in business, that you learn more from failure than from success, is actually wrong: failure is overdetermined and thus uninteresting.
I think you can make an analogous observation about some prosaic alignment research: a lot of it is the study of (intellectually) interesting failures, which means it can make for a good nerd-snipe, but it's not necessarily that informative or useful if you're actually trying to succeed at (or model) doing something truly hard and transformative.
Glitch tokens, the hot mess work, and various things related to jailbreaking, simulators, and hallucinations come to mind as examples of lines of research and discussion that...