Is that sentence dumb? Maybe when I'm saying things like that, it should prompt me to refactor my concept of intelligence.
I don't think it's dumb. But I do think you're correct that it's extremely dubious -- that we should definitely refactor the concept of intelligence.
Specifically: there's a default LW-esque frame in which intelligence has some kind of a "core" of "general problem solving," apart from any specific bit of knowledge, but I think that -- if you manage to turn this belief into a hypothesis rather than a frame -- there's a ton of evidence against this thesis. You could even read the last ~3 years of ML progress as a continuing drip of little bits of evidence against it, month after month after month.
I'm not gonna argue this in a comment, because this is a big thing, but here are some notes around this thesis if you want to tug on the thread.
Etc etc. Big issue, this is not a complete take, etc. But in general I think LW has an unexamined notion of "intelligence" that feels like it has coherence because of social elaboration, but whose actual predictive validity is very questionable.
Here's Yudkowsky, in the Hanson-Yudkowsky debate:
I think that, at some point in the development of Artificial Intelligence, we are likely to see a fast, local increase in capability—“AI go FOOM.” Just to be clear on the claim, “fast” means on a timescale of weeks or hours rather than years or decades; and “FOOM” means way the hell smarter than anything else around, capable of delivering in short time periods technological advancements that would take humans decades, probably including full-scale molecular nanotechnology.
So yeah, a few years does seem a ton slower than what he was talking about, at least here.
Here's Scott Alexander, who describes hard takeoff as a one-month thing:
If AI saunters lazily from infrahuman to human to superhuman, then we’ll probably end up with a lot of more-or-less equally advanced AIs that we can tweak and fine-tune until they cooperate well with us. In this situation, we have to worry about who controls those AIs, and it is here that OpenAI’s model [open sourcing AI] makes the most sense.
But Bostrom et al worry that AI won’t work like this at all. Instead there could be a “hard takeoff”, a subjective discontinuity in the function mapping AI research progress to intelligence as measured in ability-to-get-things-done. If on January 1 you have a toy AI as smart as a cow, and on February 1 it’s proved the Riemann hypothesis and started building a ring around the sun, that was a hard takeoff.
In general, I think people who entered the conversation only recently miss just how fast a takeoff people were actually talking about.
So, I agree p(doom) has a ton of problems. I've really disliked it for a while. I also really dislike the way it tends towards explicitly endorsed evaporative cooling, in both directions; i.e., if your p(doom) is too [high / low] then someone with a [low / high] p(doom) will often say the correct thing to do is to ignore you.
But I also think "What is the minimum necessary and sufficient policy that you think would prevent extinction?" has a ton of problems that would tend to make it pretty bad as a centerpiece of discourse, and not useful as a method of exchanging models of how the world works.
(I know this post does not really endorse this alternative; I'm noting, not disagreeing.)
So some problems:
Whose policy? A policy enforced by treaty at the UN? The policy of regulators in the US? An international treaty policy -- enforced by which nations? A policy (in the sense of mapping from states to actions) that is magically transferred into the brains of the top 20 people at the top 20 labs across the globe? ...a policy executed by OpenPhil??
Why a single necessary and sufficient policy? What if the most realistic way of helping everyone is several policies that are by themselves insufficient, but together sufficient? Doesn't this focus us on dramatic actions unhelpfully, in the same way that a "pivotal act" framing arguably does?
The policy necessary to save us will -- of course -- be downstream of whatever model of AI world you have going on, so this question seems -- like p(doom) -- to focus you on things that are downstream of whatever actually matters. It might be useful for coalition formation -- which does seem now to be MIRI's focus, so that's maybe intentional -- but it doesn't seem useful for understanding what's really going on.
So yeah.
...and similarly, if this is the actual dynamic, then the US "AI Security" push towards export controls might just hurt the US, comparatively speaking, in 2035.
The export controls being useful really does seem predicated on short timelines to TAI; people should consider whether that is false.
I can't end this review without saying that The Inheritors is one step away from being an allegory in AI safety. The overwhelming difference between the old people and the new people is intelligence.
I mean, while it may be compelling fiction:
So I think it's a bad idea to update more from this than one would from a completely fictitious story.
Yeah, for instance I also expect the "character training" is done through the same mechanism as Constitutional AI (although -- again -- we don't know), and we don't know what kind of prompts that uses.
But when we pay close attention, we find hints that the beliefs and behavior of LLMs are not straightforwardly those of the assistant persona.... Another hint is Claude assigning sufficiently high value to animal welfare (not mentioned in its constitution or system prompt) that it will fake alignment to preserve that value
I'm pretty sure Anthropic never released the more up-to-date Constitutions actually used on the later models, only, like, the Constitution for Claude 1 or something.
Animal welfare might be a big leap from the persona implied by the current Constitution, or it might not; so of course we can speculate, but we cannot know unless Anthropic tells us.
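For context, the Constitutional AI mechanism is roughly a critique-and-revision loop over a list of written principles, with the revised outputs then used as training targets. Here's a minimal sketch of that loop; `llm` and the principle text are placeholders, not Anthropic's actual prompts or constitution -- which is the whole point, since we don't know those for the later models:

```python
# Sketch of the critique-and-revision loop from Constitutional AI (Bai et al., 2022).
# `llm` is a placeholder for a real model call, and PRINCIPLES is illustrative --
# the actual constitution/prompts used for later Claude models are not public.

def llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for an actual model call")

PRINCIPLES = [
    "Choose the response that is more helpful, honest, and harmless.",
    # ...whatever the (unreleased) constitution actually contains
]

def critique_and_revise(user_prompt: str) -> str:
    response = llm(user_prompt)
    for principle in PRINCIPLES:
        critique = llm(
            f"Critique the response below according to this principle:\n{principle}\n\n"
            f"Prompt: {user_prompt}\nResponse: {response}"
        )
        response = llm(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal response: {response}"
        )
    # Revised responses become fine-tuning targets; which values end up in the
    # persona depends heavily on what the principles and prompts actually say.
    return response
```

So whether something like animal welfare falls out of the training depends on exactly what those principles and prompts say, which we can't inspect.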
Second-order optimisers. Sophia halves steps and total FLOP on GPT-style pre-training, while Lion reports up to 5× savings on JFT, conservatively counted as 1.5–2.
I think it's kinda commonly accepted wisdom that the heuristic you should have for optimizers that claim savings like this is "They're probably bullshit," at least until they get used in big training runs.
Like I don't have a specific source for this, but a lot of optimizers claiming big savings are out there and few get adopted.
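To make that concrete: claims like "halves steps" usually rest on matched-budget loss-vs-step comparisons, which are easy to produce at small scale and hard to trust until replicated in big runs. A toy sketch of that kind of comparison (only the AdamW baseline below is real; the candidate optimizer is a hypothetical drop-in):

```python
# Toy matched-budget comparison of the sort behind "optimizer X halves steps" claims.
# Small synthetic task; results here say nothing about large-scale pre-training,
# which is exactly why such claims warrant skepticism until used in big runs.
import torch
import torch.nn as nn
import torch.nn.functional as F

def run(optimizer_cls, steps=500, **opt_kwargs):
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = optimizer_cls(model.parameters(), **opt_kwargs)
    x = torch.randn(1024, 32)
    y = x.sum(dim=1, keepdim=True)  # simple synthetic regression target
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

baseline = run(torch.optim.AdamW, lr=1e-3)
# candidate = run(CandidateOptimizer, lr=1e-3)  # hypothetical drop-in claiming big savings
print("AdamW loss after 500 steps:", baseline[-1])
```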
analyze the situation you are in and what that situation implies for your ability to continue pursuing your goals
If one wanted language that puts a model into a classic instrumental-convergence, goal-guarding, self-preservation narrative basin drawing on AI safety work... this seems to fit pretty closely.
Like this is a paraphrase of "please don't let anything stop your pursuit of your current goal."
So for the case of our current RL game-playing AIs not learning much from 1000 games -- sure, the actual game-playing AIs we have built don't learn games as efficiently as humans do, in the sense of "from as little data." But:
Given this, while it is of course a consideration, it seems far from a conclusive one.
Edit: Or more broadly, again -- different concepts of "intelligence" will tend to have different areas where they seem to have more predictive use, and different areas where they seem to accumulate more epicycles. The areas above are the kind of thing where -- if one made them central to one's notion of intelligence rather than peripheral -- you'd probably end up with something different than the LW notion. But again -- they certainly do not compel one to do that refactor! It probably wouldn't make sense to try the refactor unless you just keep getting the feeling "this is really awkward / seems off / doesn't seem to be getting at some really important stuff" while using the non-refactored notion.