Adam Scholl

Wiki Contributions


I agree we don't really understand anything in LLMs at this level of detail, but I liked Jan highlighting this confusion anyway, since I think it's useful to promote particular weird behaviors to attention. I would be quite thrilled if more people got nerd sniped on trying to explain such things!

John, it seems totally plausible to me that these examples do just reflect something like “hallucination,” in the sense you describe. But I feel nervous about assuming that! I know of no principled way to distinguish “hallucination” from more goal-oriented thinking or planning, and my impression is that nobody else does either.

I think it’s generally unwise to assume LLM output reflects its internal computation in a naively comprehensible way; it usually doesn’t, so I think it’s a sane prior to suspect it doesn't here, either. But at our current level of understanding of the internal computation happening in these models, I feel wary of confident-seeming assertions that they're well-described in any particular way—e.g., as "hallucinations," which I think is far from a well-defined concept, and which I don't have much confidence carves reality at its joints—much less that they're not dangerous.

So while I would personally bet fairly strongly against the explicit threats produced by Bing being meaningfully reflective of threatening intent, it seems quite overconfident to me to suggest they don’t “at all imply” it! From my perspective, they obviously imply it, even if that's not my lead hypothesis for what's going on.

If simple outcompetes complex, wouldn't we expect to see more prokaryotic DNA in the biosphere? Whereas in fact we see 2-3x as much eukaryotic DNA, depending on how you count—hardly a small niche!

I also found the writing way clearer than usual, which I appreciate - it made the post much easier for me to engage with.

As I understand it, the recent US semiconductor policy updates—e.g., CHIPS Act, export controls—are unusually extreme, which does seem consistent with the hypothesis that they're starting to take some AI-related threats more seriously. But my guess is that they're mostly worried about more mundane/routine impacts on economic and military affairs, etc., rather than about this being the most significant event since the big bang; perhaps naively, I suspect we'd see more obvious signs if they were worried about the latter, a la physics departments clearing out during the Manhattan Project.

Critch, I agree it’s easy for most people to understand the case for AI being risky. I think the core argument for concern—that it seems plausibly unsafe to build something far smarter than us—is simple and intuitive, and personally, that simple argument in fact motivates a plurality of my concern. That said:

  • I think it often takes weirder, less intuitive arguments to address many common objections—e.g., that this seems unlikely to happen within our lifetimes, that intelligence far superior to ours doesn’t even seem possible, that we’re safe because software can’t affect physical reality, that this risk doesn’t seem more pressing than other risks, that alignment seems easy to solve if we just x, etc.
  • It’s also remarkably easy to convince many people that aliens visit Earth on a regular basis, that the theory of evolution via natural selection is bunk, that lottery tickets are worth buying, etc. So while I definitely think some who engage with these arguments come away having good reason to believe the threat is likely, for values of “good” and “believe” and “likely” at least roughly similar those common around here, I suspect most update something more like their professed belief-in-belief, than their real expectations—and that even many who do update their real expectations do so via symmetric arguments that leave them with poor models of the threat.

These factors make me nervous about strategies that rely heavily on convincing everyday people, or people in government, to care about AI risk, for reasons I don’t think are well described as “systematically discounting their opinions/agency.” Personally, I’ve engaged a lot with people working in various corners of politics and government, and decently much with academics, and I respect and admire many of them, including in ways I rarely admire rationalists or EA’s.

(For example, by my lights, the best ops teams in government are much more competent than the best ops teams around here; the best policy wonks, lawyers, and economists are genuinely really quite smart, and have domain expertise few R/EA’s have without which it’s hard to cause many sorts of plausibly-relevant societal change; perhaps most spicily, I think academics affiliated with the Santa Fe Institute have probably made around as much progress on the alignment problem so far as alignment researchers, without even trying to, and despite being (imo) deeply epistemically confused in a variety of relevant ways).

But there are also a number of respects in which I think rationalists and EA’s tend to far outperform any other group I’m aware of—for example, in having beliefs that actually reflect their expectations, trying seriously to make sure those beliefs are true, being open to changing their mind, thinking probabilistically, “actually trying” to achieve their goals as a behavior distinct from “trying their best,” etc. My bullishness about these traits is why e.g. I live and work around here, and read this website.

And on the whole, I am bullish about this culture. But it’s mostly the relative scarcity of these and similar traits in particular, not my overall level of enthusiasm or respect for other groups, that causes me to worry they wouldn’t take helpful actions if persuaded of AI risk.

My impression is that it’s unusually difficult to figure out how to take actions that reduce AI risk without substantial epistemic skill of a sort people sometimes have around here, but only rarely have elsewhere. On my models, this is mostly because:

  • There are many more ways to make the situation worse than better;
  • A number of key considerations are super weird and/or terrifying, such that it's unusually hard to reason well about them;
  • It seems easier for people to grok the potential importance of transformative AI, than the potential danger.

My strong prior is that, to accomplish large-scale societal change, you nearly always need to collaborate with people who disagree with you, even about critical points. And I’m sympathetic to the view that this is true here, too; I think some of it probably is. But I think the above features make this more fraught than usual, in a way that makes it easy for people who grok the (simpler) core argument for concern, but not some of the (typically more complex) ancillary considerations, to accidentally end up making the situation even worse.

Here are some examples of (what seem to me like) this happening:

  • The closest thing I'm aware of to an official US government position on AI risk is described in the 2016 and 2017 National Science and Technology Council reports. I haven't read all of them, but the parts I have read struck me as a strange mix of claims like “maybe this will be a big deal, like mobile phones were,” and “maybe this will be a big deal, in the sense that life on Earth will cease to exist.” And like, I can definitely imagine explanations for this that don't much involve the authors misjudging the situation—maybe their aim was more to survey experts than describe their own views, or maybe they were intentionally underplaying the threat for fear of starting an arms race, etc. But I think my lead hypothesis is more that the authors just didn’t actually, viscerally consider that the sentences they were writing might be true, in the sense of describing a reality they might soon inhabit.
    • I think rationalists and EA's tend to make this sort of mistake less often, since the “taking beliefs seriously”-style epistemic orientation common around here has the effect of making it easier for people to viscerally grasp that trend lines on graphs and so forth might actually reflect reality. (Like, one frame on EA as a whole, is “an exercise in avoiding the ‘learning about the death of a million feels like a statistic, not a tragedy’ error”). And this makes me at least somewhat more confident they won’t do dumb things upon becoming worried about AI risk, since without this epistemic skill, I think it’s easier to make critical errors like overestimating how much time we have, or underestimating the magnitude or strangeness of the threat.
  • As I understand it, OpenAI is named what it is because, at least at first, its founders literally hoped to make AGI open source. (Elon Musk: “I think the best defense against the misuse of AI is to empower as many people as possible to have AI. If everyone has AI powers, then there’s not any one person or a small set of individuals who can have AI superpower.”)
    • By my lights, there are unfortunately a lot of examples of rationalists and EA’s making big mistakes while attempting to reduce AI risk. But it’s at least... hard for me to imagine most of them making this one? Maybe I’m being insufficiently charitable here, but from my perspective, this just fails a really basic “wait, but then what happens next?” sanity check, that I think should have occurred to them more or less immediately, and that I suspect would have to most rationalists and EA's.
  • For me, the most striking aspect of the AI Impacts poll, was that all those ML researchers who reported thinking ML had a substantial chance of killing everyone, still research ML. I’m not sure why they do this; I’d guess some of them are convinced for some reason or another that working on it still makes sense, even given that. But my perhaps-uncharitable guess is that most of them actually don’t—that they don’t even have arguments which feel compelling to them that justify their actions, but that they for some reason press on anyway. This too strikes me as a sort of error R/EA’s are less likely to make.
    • (When Bostrom asked Geoffrey Hinton why he still worked on AI, if he thought governments would likely use it to terrorize people, he replied, "I could give you the usual arguments, but the truth is that the prospect of discovery is too sweet").
  • Sam Altman recently suggested, on the topic of whether to slow down AI, that “either we figure out how to make AGI go well or we wait for the asteroid to hit."
    • Maybe he was joking, or meant "asteroid" as a stand-in for all potentially civilization-ending threats, or something? But that's not my guess, because his follow-up comment is about how we need AGI to colonize space, which makes me suspect he actually considers asteroid risk in particular a relevant consideration for deciding when to deploy advanced AI. Which if true, strikes me as... well, more confused than any comment in this thread strikes me. And it seems like the kind of error that might, for example, cause someone to start an org with the hope of reducing existential risk, that mostly just ends up exacerbating  it.

Obviously our social network doesn't have a monopoly on good reasoning, intelligence, or competence, and lord knows it has plenty of its own pathologies. But as I understand it, most of the reason the rationality project exists is to help people reason more clearly about the strange, horrifying problem of AI risk. And I do think it has succeeded to some degree, such that empirically, people with less exposure to this epistemic environment far more often take actions which seem terribly harmful to me.

One comment in this thread compares the OP to Philip Morris’ claims to be working toward a “smoke-free future.” I think this analogy is overstated, in that I expect Philip Morris is being more intentionally deceptive than Jacob Hilton here. But I quite liked the comment anyway, because I share the sense that (regardless of Jacob's intention) the OP has an effect much like safetywashing, and I think the exaggerated satire helps make that easier to see.

The OP is framed as addressing common misconceptions about OpenAI, of which it lists five:

  1. OpenAI is not working on scalable alignment.
  2. Most people who were working on alignment at OpenAI left for Anthropic.
  3. OpenAI is a purely for-profit organization.
  4. OpenAI is not aware of the risks of race dynamics.
  5. OpenAI leadership is dismissive of existential risk from AI.

Of these, I think 1, 3, and 4 address positions that are held by basically no one. So by “debunking” much dumber versions of the claims people actually make, the post gives the impression of engaging with criticism, without actually meaningfully doing that. 2 at least addresses a real argument, but at least as I understand it, is quite misleading—while technically true, it seriously underplays the degree to which there was an exodus of key safety-conscious staff, who left because they felt OpenAI leadership was too reckless. So of these, only 5 strikes me as responding non-misleadingly to a real criticism people actually regularly make.

In response to the Philip Morris analogy, Jacob advised caution:

rhetoric like this seems like an excellent way to discourage OpenAI employees from ever engaging with the alignment community.

For many years, the criticism I heard of OpenAI in private was dramatically more vociferous than what I heard in public. I think much of this was because many people shared Jacob’s concern—if we say what we actually think about their strategy, maybe they’ll write us off as enemies, and not listen later when it really counts?

But I think this is starting to change. I’ve seen a lot more public criticism lately, which I think is probably at least in part because it’s become so obvious that the strategy of mincing our words hasn't worked. If they mostly ignore all but the very most optimistic alignment researchers now, why should we expect that will change later, as long as we keep being careful to avoid stating any of our offensive-sounding beliefs?

From talking with early employees and others, my impression is that OpenAI’s founding was incredibly reckless, in the sense that they rushed to deploy their org, before first taking much time to figure out how to ensure that went well. The founders' early comments about accident risk mostly strike me as so naive and unwise, that I find it hard to imagine they thought much at all about the existing alignment literature before deciding to charge ahead and create a new lab. Their initial plan—the one still baked into their name—would have been terribly dangerous if implemented, for reasons I’d think should have been immediately obvious to them had they stopped to think hard about accident risk at all.

And I think their actions since then have mostly been similarly reckless. When they got the scaling laws result, they published a paper about it, thereby popularizing the notion that “just making the black box bigger” might be a viable path to AGI. When they demoed this strategy with products like GPT-3, DALL-E, and CLIP, they described much of the architecture publicly, inspiring others to pursue similar research directions.

So in effect, as far as I can tell, they created a very productive “creating the x-risk” department, alongside a smaller “mitigating that risk” department—the presence of which I take the OP to describe as reassuring—staffed by a few of the most notably optimistic alignment researchers, many of whom left because even they felt too worried about OpenAI’s recklessness.

After all of that, why would we expect they’ll suddenly start being prudent and cautious when it comes time to deploy transformative tech? I don’t think we should.

My strong bet is that OpenAI leadership are good people, in the standard deontological sense, and I think that’s overwhelmingly the sense that should govern interpersonal interactions. I think they’re very likely trying hard, from their perspective, to make this go well, and I urge you, dear reader, not to be an asshole to them. Figuring out what makes sense is hard; doing things is hard; attempts to achieve goals often somehow accidentally end up causing the opposite thing to happen; nobody will want to work with you if small strategic updates might cause you to suddenly treat them totally differently.

But I think we are well past the point where it plausibly makes sense for pessimistic folks to refrain from stating their true views about OpenAI (or any other lab) just to be polite. They didn’t listen the first times alignment researchers screamed in horror, and they probably won’t listen the next times either. So you might as well just say what you actually think—at least that way, anyone who does listen will find a message worth hearing.

Incorrect: OpenAI leadership is dismissive of existential risk from AI.

Why, then, would they continue to build the technology which causes that risk? Why do they consider it morally acceptable to build something which might well end life on Earth?

Incorrect: OpenAI is not aware of the risks of race dynamics.

I don't think this is a common misconception. I, at least, have never heard anyone claim OpenAI isn't aware of the risk of race dynamics—just that it nonetheless exacerbates them. So I think this section is responding to a far dumber criticism than the one which people actually commonly make.

I don’t expect a discontinuous jump in AI systems’ generality or depth of thought from stumbling upon a deep core of intelligence

I felt surprised reading this, since "ability to automate AI development" feels to me like a central example of a "deep core of intelligence"—i.e., of a cognitive ability which makes attaining many other cognitive abilities far easier. Does it not feel like a central example to you?

Load More