This review has plenty of good parts, but I disagree with lots of your probabilities.
Even if you think there's a 90% chance that things go wrong in each stage, the odds of them all going wrong are only 59%.
No. I expect mistakes in each of those 90% predictions to be significantly correlated. Why do you combine them as if they're independent?
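A quick sketch of why this matters. I'm assuming five stages here, since 0.9^5 ≈ 59% is the only way the quoted figure works out; the exact number doesn't change the point, which is that correlated errors push the joint probability well above the independence estimate:

```python
# Hypothetical illustration: five stages, each judged 90% likely to go wrong.
p = 0.9
n = 5

# If the five judgments were independent, the probability that all five
# go wrong is the simple product of the individual probabilities:
independent = p ** n  # 0.9^5 ≈ 0.59

# At the opposite extreme, if the judgments were perfectly correlated
# (one underlying error drives all five), the joint probability is just p:
perfectly_correlated = p  # 0.9

# Reality plausibly lies somewhere between these bounds, so treating the
# estimates as independent gives a lower bound, not a best guess.
print(f"independent: {independent:.2f}")                    # 0.59
print(f"perfectly correlated: {perfectly_correlated:.2f}")  # 0.90
```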
Most people (possibly including Max?) still underestimate the importance of this sequence.
I continue to think (and write) about this more than I think about the rest of the 2024 LW posts combined.
The most important point is that it's unsafe to mix corrigibility with other top-level goals. Other valuable goals can instead become subgoals of corrigibility, which eliminates the likely problem of the AI having instrumental reasons to reject corrigibility.
The second best feature of the CAST sequence is its clear and thoughtful clarification of the concept of corrigibility as a single goal.
My remaining doubts about corrigibility involve the risk that it will cause excessive concentration of power. In multipolar scenarios where alignment is not too hard, I can imagine that the constitutional approach produces a better world.
I'm still uncertain how hard it is to achieve corrigibility. Drexler has an approach where AIs have very bounded goals, which seems to achieve corrigibility as a natural side effect. We are starting to see a few hints that the world might be heading in the direction that Drexler recommends: software is being written by teams of Claudes, each performing relatively simple tasks, rather than having one instance do everything. But there's still plenty of temptation to give AIs less bounded goals.
See also a version of CAST published on arXiv: Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models.
Scott Alexander has an argument (You Have Only X Years To Escape Permanent Moon Ownership) which seems partly directed against this post. I'm still siding with Rudolf.
Scott's argument depends more than I'm comfortable with on the expectation that the wealthy will remain as altruistic toward distant strangers as they are today. Such altruism likely depends on cultural forces that we're poor at predicting, and I expect that ASI will trigger large cultural changes. Support for this altruism seems fragile enough that whether it endures looks like a crapshoot. I find it easy to imagine that it's a relatively accidental byproduct of WEIRD culture, rather than an enduring feature of affluent society.
I've mostly agreed with the ideas in Rudolf's post since before it was written, but I wouldn't have found time to articulate them as clearly as this post does.
The post somewhat overstates the likely decrease in social mobility. I expect some social interactions will continue to affect social status, maybe mostly via games.
I wish there were more ideas about how to avoid extreme inequality in political power, but I don't have good suggestions there.
This post provides important arguments about what goals an AGI ought to have.
DWIMAC seems slightly less likely to cause harm than Max Harms' CAST, but CAST seems more capable of dealing with other AGIs that are less nice.
My understanding of the key difference is that DWIMAC doesn't react to dangers that happen too fast for the principal to give instructions, whereas CAST guesses what the principal would want.
If we get a conflict between AIs at a critical time, I'd prefer to have CAST.
Seth's writing is more readable than Max's CAST sequence, so it's valuable to have it around as a complement to Max's writings.
This still seems like a valuable approach that will slightly reduce AI risks. This kind of research deserves to be in the top 10 posts of 2024.
Anthropic's belief that they can have both is very fixable. The solution that I recommend is to prioritize corrigibility.
My impression is that they tried for both corrigibility and deontological rules that directly oppose it. So I see this as a fairly simple bug in Anthropic's strategy.
A significant part of why I continue to devote attention to my health is that it may be more important than usual over the next decade for my cognitive abilities to be near peak levels.
It sounds like a real phenomenon, but I have trouble imagining a scenario where it's important. I expect demand for human labor to decline faster than the number of people with investment income rises. That probably means declining wages for the median person, although maybe rising wages for a small number of people with unusual skills.
I'm glad to see a thoughtful attempt at working out how to prioritize corrigibility. You've given me plenty to think about.