Eli Tyre

Comments

Eli Tyre

I think that it is about as likely that Midjourney is conscious as that Claude is conscious. I'd assign maybe 20%? (But this is really an ass number.)

But I'd assign at least 5% to plants and my laptop being at least somewhat conscious, and at least 10% to some large fraction of intelligent, civilization-building aliens being non-conscious. ¯\_(ツ)_/¯
 

Eli Tyre

Hm. I'm tempted to upvote this post, except that the title is clickbait and, crucially, misleading. (The prompting here is a critical piece of the result, but the title doesn't suggest that.) I think that's sufficient to downvote instead.

All this sounds correct to me.

Reflecting on some previous conversations that parallel the opening vignette, I now suspect that many people are just not conscious in the way that I am / seem to be.

Eli Tyre

It seems important to note that the posited reasoning flaw is much more rational in the context of a puzzle game. In that context, you know that there's a correct answer. If you've exhaustively searched through the possibilities, and you're very confident that there's no solution unless assumption X holds...that's actually a pretty good reason to believe X. 

A syllogism: 

P1: There is a solution.

P2: If not X, there would be no solution.

C: Therefore X.

In this context, subjects seem to be repeatedly miscalibrated about P2. It turns out that they had searched through the space of possibilities much less exhaustively than they thought they had. So they were definitely wrong, overall. 

But this syllogism is still valid. And P1 is a quite reasonable assumption. 
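
To spell out the validity claim, here is a minimal formal sketch in Lean 4, where `Solution` and `X` are placeholder propositions standing in for "there is a solution" and the assumption in question:

```lean
-- A sketch of the syllogism from above, in propositional logic.
-- `Solution` and `X` are placeholders; the argument form is valid
-- (it's essentially modus tollens plus double-negation elimination).
example (Solution X : Prop)
    (p1 : Solution)            -- P1: there is a solution
    (p2 : ¬X → ¬Solution)      -- P2: if not X, there would be no solution
    : X :=                     -- C: therefore X
  Classical.byContradiction (fun hnx => p2 hnx p1)
```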

There's an implication that people are applying an identical pattern of reasoning to plans for AI risk. In that context, not only is P2 non-obvious for any given version of P2, but P1 isn't solid either: there's no guarantee that there's a remotely feasible solution in scope, given the available resources. 

In that case, if indeed people are reasoning this way, they're making two mistakes, about both P1 and P2, whereas in the puzzle-game context, people are only really making one.


From my observations of people thinking about AI risk, I also think that people are (emotionally) assuming that there's a viable solution to AI risk, and their thinking warps accordingly.

But it seems important to note that if we're drawing the conclusion that people make that particular error from this dataset, we're attributing a reasoning error to them in one context, based on observing them use a similar assumption in a different context in which that assumption is, in fact, correct.

These are interesting anecdotes but it feels like they could just as easily be used to argue for the opposite conclusion.

That is, your frame here is something like "planning is hard therefore you should distrust alignment plans".

But you could just as easily frame this as "abstract reasoning about unfamiliar domains is hard therefore you should distrust doom arguments".

That doesn't sound right to me. 

The reported observation is not just that these particular people failed at a planning / reasoning task. The reported observation is that they repeatedly made optimistic, miscalibrated assumptions, because those assumptions supported a plan.

There's a more specific reasoning error being posited, beyond "people are often wrong when trying to reason about abstract domains without feedback". Something like "people will anchor on ideas if those ideas are necessary for the success of a plan and they don't see an alternative plan."

If that posit is correct, that's not just an update of "reasoning abstractly is hard and we should widen our confidence intervals / be more uncertain". We should update to having a much higher evidential bar for the efficacy of plans.

It seems like "human values" aren't particularly reflective then? Like I could describe the behavioral properties of a species of animal, including what they value or don't value.

But that leaves something out? 

when a concept definition has a step where its boundary depends on your own internals rather than pure facts about the environment

Could anyone possibly offer 2 positive and 2 negative examples of a reflective-in-this-sense concept?

Thank you for this. As you note, this seems like a very important insight / clarification for power-accrual / status-accrual based plans. In general, I observe people thinking only very vaguely about these kinds of plans, and this post gives me a sense of the kind of crisp modeling that is possible here.

Hm. Can I request tabooing the phrase "genuine safety role" in favor of a more detailed description of the work that's done? There's broad disagreement about which kinds of research are (or should count as) "AI safety", and what's required for that to succeed.
