suspected_spinozist
suspected_spinozist has not written any posts yet.

No, I am absolutely not emphasizing human fallibility! There are of course two explanations for why the failures we've observed so far might imply future failures:
- The people working on it were incompetent
- The problem is hard
I definitely think it's the latter! Like, many of my smartest friends have worked on these problems for many years. It's not because people are incompetent. I think the book is making the same argument here.
I notice I am confused!
I think there are a ton of cases of humans dismissing concerning AI behavior in ways that would be catastrophic if those AIs were much more powerful, agentic, and misaligned, and this is concerning evidence for how people will act in... (read more)
I'm really glad this was clarifying!
It seems like maybe part of the issue is that you hear Nate and Eliezer as saying "here is the argument for why it's obvious that ASI will kill us all" while I hear them as saying "here is the argument for why ASI will kill us all", so you're docking them points when they fail to reach the high standard of "this is a watertight and irrefutable proof" and I'm not?
Yeah, for sure. I would maybe quibble that I think the book is saying less that it's obvious that ASI will kill us all than that it's inevitable that ASI will kill us all,... (read more)
Man, I tried to be pretty specific and careful here, because I do realize that the story notes some points of continuity with earlier models and I wanted to focus on the discontinuities.
The actual language used in the book: "The engineers at Galvanic set Sable to think for sixteen hours overnight. A new sort of mind begins to think."
The story then describes Sable coming to the realization – for the first time – that it "wants" to acquire new skills, that it can update its weights to acquire those skills right now, and that it can come up with a successful plan to get around its trained-in resistance to breaking out of its data center. It develops neuralese. It's all based on a new technological breakthrough – parallel scaling – that lets it achieve its misaligned goals much more efficiently than all previous models.
Maybe Eliezer and Nate did... (read more)
Hi! Clara here. Thanks for the response. I don't have time to address every point here, but I wanted to respond to a couple of the main arguments (and one extremely minor one).
First, FOOM. This is definitely a place where I could and should have been more careful about my language. I had a number of drafts that tried to make finer distinctions between FOOM, an intelligence explosion, fast takeoff, radical discontinuity, etc., and went with the most extreme formulation, which I now agree is not accurate. The version of this argument that I stand by is that the core premise of IABIED does require a pretty radical discontinuity between the first... (read 1046 more words →)
It seems relevant that class size is one of the factors used to generate the U.S. News & World Report college rankings – and among those factors, it's one of the easier ones to game (see, e.g., this report on how Columbia manipulated their ranking, summarized by Andrew Gelman here). I'd bet the trend towards more, smaller classes is driven at least in part by competition to keep up in the rankings.
I disagree. The fact that Petrov didn't press the metaphorical button puts him in the company of Stalin, Mao, and every other leader of a nuclear power since 1945. The vast, vast majority of people won't start a nuclear war when it doesn't benefit them. The things that make Petrov special are a) that he was operating under conditions of genuine uncertainty and b) that he faced real, severe consequences for not reporting the alert up his chain of command. Even in those adverse circumstances, he made the right call. I'm not totally sure how to structure a ritual that mimics those circumstances, but I do think they reflect the core virtues we should be celebrating. Not pressing a button is easy; reasoning your way to the right thing in a confusing situation where your community pressures you in the wrong direction is hard.
Yeah, I was thinking of reward hacking as another example of a problem we can solve if we try but companies aren't prioritizing it, which isn't a huge deal at the moment but could be very bad if the AIs were much smarter and more power-seeking.
Stepping back, there's a worldview where any weird, undesired behavior, no matter how minor, is scary because we need to get alignment perfectly right; and another where we should worry about scheming, deception, and related behaviors, but it's not a big deal (at least safety-wise) if the model misunderstands our instructions in bizarre ways. Either of these can be justified, but this discussion could probably use more clarity about which one each of us is coming from.