In their hopes that it’s not too late for course correction around AI, Nate and Eliezer have written a book making the detailed case for this unfortunate reality. The book comes out in September; you can preorder it now, or read endorsements, quotes, and reviews from scientists, national security officials, and more.
This is the second of a two-post series on foom (previous post) and doom (this post).
The last post talked about how I expect future AI to be different from present AI. This post will argue that this future AI will be egregiously misaligned and scheming, not even ‘slightly nice’, absent some future conceptual breakthrough.
I will particularly focus on exactly how and why I differ from the LLM-focused researchers who wind up with (from my perspective) bizarrely over-optimistic beliefs like “P(doom) ≲ 50%”.[1]
In particular, I will argue that these “optimists” are right that “Claude seems basically nice, by and large” is nonzero evidence for feeling good about current LLMs (with various caveats). But I think that future AIs...
This is quite specific and only engages with section 2.3, but it made me curious.
I want to ask a question about a core assumption in your argument about human imitative learning. You claim that when humans imitate, this "always ultimately arises from RL reward signals": that we imitate because we "want to," even if unconsciously. Is this the case at all times, though?
Let me work through object permanence as a concrete case study. The standard developmental timeline shows infants acquiring this ability around 8-12 months through gradual exposur...
Two decades don't seem like enough to generate the effect he's talking about. He might disagree though.
This is a two-post series on AI “foom” (this post) and “doom” (next post).
A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” to “Artificial Superintelligence” (ASI) within a very short time window (days, weeks, maybe months), involving very little compute (e.g. “brain in a box in a basement”), via recursive self-improvement. Absent some future technical breakthrough, the ASI would definitely be egregiously misaligned, without the slightest intrinsic interest in whether humans live or die. The ASI would be born into a world generally much like today’s, a world utterly unprepared for this...
LLMs are already good at solving complicated, Ph.D.-level mathematical problems, which improves...
They're not. I work a lot with math, and o3 is useful for asking basic questions about domains I'm unfamiliar with and pulling up relevant concepts/literature. But if you ask it to prove something nontrivial, 95+% of the time it will invite you for a game of "here's a proof that 2 + 2 = 5, spot the error!".
That can also be useful: it's like dropping a malfunctioning probe into a cave and mapping out its interior off of the random flashes of light and sounds of imp...
Edition #9, School is Hell, turned out to hit quite a nerve.
Thus, I’m going to continue with the system of giving the roundups more focused themes, with this one being everything other than school questions, except for the question of banning phones in schools, which seemed to fit.
...Henry Shevlin: I asked a high school teacher friend about the biggest change in teens over the past decade. His answer was interesting. He said whereas the ‘default state’ of teenage psychology used to be boredom, now it was
On references: I find it baffling how much of a cultural disconnect I feel between myself (born 1987) and almost anyone <~5-8 yrs younger than me. I can easily have conversations with people in their 70s and get at least a majority of their references, but go just a few years in the other direction and (for a recent example) I'll talk to a coworker who not only had never seen Seinfeld but had never heard of the Soup Nazi. Or (for another) a trivia night where the hosts not only didn't know Anaconda sampled Baby Got Back but were somehow confused by the ...
In an attempt to get myself to write more, here is my own shortform feed. Ideally I would write something daily, but we will see how it goes.
Of course this is now used as an excuse to revert any recent attempts to improve the article.
From reading the relevant talk page, it is pretty clear that those arguing against the changes on these bases aren’t exactly doing so in good faith; if they did not have this bit of ammunition to use, they would use something else, but then with fewer detractors (since clearly nobody else followed or cared about that page).
I’ve been thinking a lot recently about the relationship between AI control and traditional computer security. Here’s one point that I think is important.
My understanding is that there's a big qualitative distinction between two ends of a spectrum of security work that organizations do, which I’ll call “security from outsiders” and “security from insiders”.
On the “security from outsiders” end of the spectrum, you have some security invariants you try to maintain entirely by restricting affordances with static, entirely automated systems. My sense is that this is most of how Facebook or AWS relates to its users: they want to ensure that, no matter what actions the users take on their user interfaces, they can't violate fundamental security properties. For example, no matter what text I enter into the...
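As a minimal sketch of that idea (the schema and function here are hypothetical, purely illustrative): the invariant “users can only read their own rows” is enforced by a static mechanism, so no text the user types can change what the query does.

```python
# Sketch of "security from outsiders": the affordance itself is restricted,
# with no human in the loop, so no possible input violates the invariant.
import sqlite3

def get_my_documents(db: sqlite3.Connection, user_id: int, search_text: str):
    # The untrusted string is passed as a bound parameter, never spliced into
    # the SQL, so it cannot alter the query's structure regardless of content.
    return db.execute(
        "SELECT title FROM documents WHERE owner_id = ? AND title LIKE ?",
        (user_id, f"%{search_text}%"),
    ).fetchall()
```

The guarantee comes from how the system is built, not from anyone inspecting what the user typed.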
one reason it works with humans is that we have skin in the game
Another reason is that different humans have different interests: your accountant and your electrician would struggle to work out a deal to enrich themselves at your expense, but it would get much easier if they shared the same brain and were just pretending to be separate people.
"We don't want it to be the case that models can be convinced to blackmail people just by putting them in a situation that the predictor thinks is fictional!"
This is interesting! I guess that, in some sense, means that you see certain ways in which even a future Claude N+1 won't be a truly general intelligence?
[ Context: The Debate on Animal Consciousness, 2014 ]
There's a story in Growing Up Yanomamo where the author, Mike Dawson, a white boy from America growing up among Yanomamö hunter-gatherer kids in the Amazon, is woken up in the early morning by two of his friends.
One of the friends says, "We're going to go fishing".
So he goes with them.
At some point on the walk to the river he realizes that his friends haven't said whose boat they'll use [ they're too young to have their own boat ].
He considers asking, then realizes that if he asks, and they're planning to borrow an older tribesmember's boat without permission [ which is almost certainly the case, given that they didn't specify up front ], his friends will have to...
Yet trying to imagine being something with half as much consciousness or twice as much consciousness as myself seems impossible
To me, it doesn't even need to be imagined. Everyone has experienced partial consciousness, e.g.:
Dreaming, where you have phenomenal awareness, but not of an external world.
Deliberate visualisation, which is less phenomenally vivid than perception in most people.
Drowsiness, and the states between sleep and waking.
Autopilot and flow states, where the sense of a self deciding actions is absent.
More rarely there are forms of ...
Salutations,
I have been a regular reader (and big fan) of LessWrong for quite some time now, so let me just say that I feel honoured to be able to share some of my thoughts with the likes of you folks.
I don't reckon myself a good writer, nor a very polished thinker (unlike many of the veteran writers here), so I hope you'll bear with me and be gentle with your feedback (it is my first time, after all).
Without further ado: I have recently been wrestling with the concept of abductive reasoning. I have been searching for good definitions and explanations of it, but none have persuaded me that abductive reasoning is actually a needed concept.
The argument goes as follows: “Any proposed instance of abductive reasoning can be fully...
Abductive reasoning results from the abduction of one's reason.
Couldn't resist the quip. To speak more seriously: There is deduction, which from true premises always yields true conclusions. There is Bayesian reasoning, which from probabilities derives probabilities. There is no other form of reasoning. "Induction" and "abduction" are pre-Bayesian gropings in the dark, of no more account than the theory of humours in medicine.
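To make the Bayesian recasting concrete, here is a minimal sketch: what abduction calls “inference to the best explanation” is just picking the hypothesis H_i with the highest posterior given the evidence E.

```latex
% Bayes' rule: posterior of hypothesis H_i given evidence E
% (assuming the H_j are mutually exclusive and exhaustive)
P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{\sum_j P(E \mid H_j)\, P(H_j)}
```

The prior P(H_i) carries the plausibility of an explanation, and the likelihood P(E | H_i) carries how well it accounts for the evidence; nothing beyond this is needed.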