FeepingCreature's Comments

Predictors exist: CDT going bonkers... forever

Since when does CDT include backtracking upon noticing other people's predictive inconsistency? And I'm not sure that any such explicitly iterative algorithm would be stable.

  1. The CDT agent considers making the decision to say “one” but notices that Omega’s prediction aligns with its actions.

This is the key. You're not playing CDT here, you're playing "human-style hacky decision theory." CDT cannot notice that Omega's prediction aligns with its hypothetical decision because Omega's prediction is causally "before" CDT's decision, so any causal decision graph cannot condition on it. This is why post-TDT decision theories are also called "acausal."
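To make the causal point concrete, here is a minimal toy model of Newcomb's problem (my own illustration; the payoff numbers and function names are assumptions, not from the original post). Because CDT holds the prediction, and hence the box contents, fixed when it intervenes on its action, two-boxing dominates for every belief it could have about the prediction:

```python
# Toy Newcomb's problem: CDT treats Omega's prediction as causally fixed,
# so it evaluates actions against a fixed probability that the opaque box
# is full. Payoffs and names are illustrative assumptions.

BOX_B = 1_000_000  # payoff if the opaque box is full
BOX_A = 1_000      # payoff in the transparent box

def cdt_expected_utility(action, p_full):
    """Expected utility under a causal intervention on `action`,
    with the prediction (and hence box contents) held fixed."""
    if action == "one-box":
        return p_full * BOX_B
    else:  # "two-box"
        return p_full * BOX_B + BOX_A

def cdt_choice(p_full):
    """CDT picks the action maximizing EU at a fixed belief p_full."""
    return max(["one-box", "two-box"],
               key=lambda a: cdt_expected_utility(a, p_full))

# Two-boxing dominates for every fixed belief about the prediction,
# which is exactly why CDT cannot "notice" the prediction tracking it:
assert all(cdt_choice(p / 10) == "two-box" for p in range(11))
```

The point of the sketch is that no value of `p_full` changes the choice: the prediction enters only as an exogenous constant, never as something conditioned on the hypothetical decision.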

The "Commitment Races" problem

True, sorry, I forgot the whole set of paradoxes that led up to FDT/UDT. I mean something like... "this is equivalent to the problem that FDT/UDT already has to solve anyways." Allowing you to make exceptions doesn't make your job harder.

The "Commitment Races" problem

I concur in general, but:

you might accidentally realize that such-and-such type of agent will threaten you regardless of what you commit to and then if you are a coward you will “give in” by making an exception for that agent.

This seems like a problem for humans and badly built AIs. Nothing that reliably one-boxes should ever do this.

Meta-discussion from "Circling as Cousin to Rationality"

I don't think it's so implausible for some people to be significantly more baffled by some things than others are that we must interpret their bafflement as an attack. An unusually large imposition of costs is not inherently an attack! We may as well blame the disabled for dastardly forcing us to waste money on wheelchair ramps.

Meta-discussion from "Circling as Cousin to Rationality"

I think this once again presupposes a lot of unestablished consensus: for one, that it's trivial for people to generate hypotheses for undefined words, that doing so is a worthwhile skill, and that it's a proper approach in the first place. I don't think a post author should get to impose this level of ideological conformance on a commenter, and it weirds me out how much the people on this site now seem to agree that Said deserves censure for (verbosely and repeatedly) disagreeing with this position.

And then it seems to be doing a lot of long-distance inference, presuming a "typical" mindset on Said's part and working out implications about what they were doing, which is exactly the thing Said wanted to avoid by not guessing a definition. Doesn't that rather prove their point?

More importantly, I at least consider providing hypotheses for a definition to be obviously supererogatory. If you don't know the meaning of a word in a text, the meaning may be either obvious or obscure; the risk you take by asking is wasting somebody's time for no reason. But I consider it far from shown that giving a hypothesis shortens that time at all, and more importantly, no such Schelling point has been established, so it seems a stretch of propriety to demand it as if it were an agreed-upon convention. Certainly the work to establish it as a convention should be done before the readership breaks out the mass downvotes. I mean, seriously: what the fuck, LessWrong?

Meta-discussion from "Circling as Cousin to Rationality"

I find myself thinking: if you’re so consistently unable to guess what people might mean, or why people might think something, maybe the problem is (at least some of the time) with your imagination.

Who cares who "the problem" is with? Text is supposed to be understood. The thing that attracted me to the Sequences to begin with was sensible, comprehensible and coherent explanations of complex concepts. Are we giving up on this? Or are people who value clear language and want to avoid misunderstandings (and may even be, dare I say, neuroatypical) no longer part of the target group, but instead someone to be suspicious of?

The Sequences exist to provide a canon of shared information and terminology to reference. If you can't explain something without referencing a term that is evidently not shared by everyone, a term that you not only don't bother to define but react to with hostility when pressed on, then ... frankly, I don't think that behavior is in keeping with the spirit of this blog.

Does GPT-2 Understand Anything?

Sentences 1 and 4 should have higher probability than sentences 2 and 3. What they find is that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood.

I can anecdotally confirm this; I've been personally calling it the "GPT swerve", i.e. sentences of the form "We are in favor of recycling, because recycling doesn't actually improve the environment, and that's why we are against recycling."

The proposed explanation makes sense as well. Is anyone trying to pre-train a GPT-2 with unlikelihood avoidance?
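For reference, the unlikelihood training objective (Welleck et al., 2019) augments the usual negative log-likelihood with a term that pushes down the probability of specified negative candidate tokens. A minimal sketch with made-up token probabilities (the `probs` table, token names, and `alpha` value are all illustrative assumptions, not from any actual model):

```python
import math

def unlikelihood_loss(probs, target, negatives, alpha=1.0):
    """Token-level unlikelihood objective (after Welleck et al. 2019):
    standard NLL on the target token, plus a penalty term
    -log(1 - p(neg)) for each negative candidate token, which grows
    as the model assigns the negatives more probability."""
    nll = -math.log(probs[target])
    penalty = -sum(math.log(1.0 - probs[neg]) for neg in negatives)
    return nll + alpha * penalty

# Toy distribution over next tokens (made-up numbers):
probs = {"recycling": 0.4, "not": 0.3, "the": 0.3}

base = unlikelihood_loss(probs, "recycling", negatives=[])
penalized = unlikelihood_loss(probs, "recycling", negatives=["not"])
assert penalized > base  # mass on the negation token now costs loss
```

In the "swerve" setting, one could imagine treating negation tokens that flip a sentence's meaning as negative candidates, so that a likely sentence and its negated variant stop receiving near-identical scores; whether anyone has pre-trained GPT-2 this way is exactly the open question above.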

Circling as Cousin to Rationality

I think the Litany of Gendlin sorta bridges between those sentiments - anything that can be destroyed by the truth should be, because it cannot be a load-bearing belief since it doesn't do any work.

Of course, the amount of effort you have to put in to (re)construct a properly working belief may be significant and the interval in between may be quite unsettling.

The "Commitment Races" problem

I think this undervalues conditional commitments. The problem of "early commitment" depends entirely on you possibly having a wrong image of the state of the world. So if you just condition your commitment on the information you have available, you avoid premature commitments made in ignorance and give other agents an incentive to improve your world model. Likewise, this would protect you from learning about other agents' commitments "too late" - you can always just condition on things like "unless I find an agent with commitment X". You can do this whether or not you even know to think of an agent with commitment X, as long as other agents who care about X can predict your reaction to learning about X.

Commitments aren't inescapable shackles, they're just another term for "predictable behavior." The usefulness of commitments doesn't require you to bind yourself regardless of learning any new information about reality. Oaths are highly binding for humans because we "look for excuses", our behavior is hard to predict, and we can't reliably predict and evaluate complex rule systems. None of those should pose serious problems for trading superintelligences.
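As a sketch of "commitments are just predictable behavior": a conditional commitment can be modeled as a pure function of the information available, so other agents can predict it, carve-outs included, even before the triggering information arrives. The specific rule and names below are my own illustrative assumptions:

```python
# A conditional commitment modeled as a deterministic function of
# observed information: fully predictable by other agents, with an
# explicit carve-out that only triggers on new information.
# The rule and key names are illustrative assumptions.

def committed_policy(info):
    """Return the committed action given everything observed so far."""
    # Carve-out conditioned on information we might learn later; agents
    # who care about commitment X can predict this branch in advance,
    # which gives them an incentive to make X visible to us.
    if info.get("found_agent_with_commitment_X"):
        return "exception_for_X"
    # Baseline commitment, made with the information available now.
    return "baseline_action"

assert committed_policy({}) == "baseline_action"
assert committed_policy({"found_agent_with_commitment_X": True}) == "exception_for_X"
```

Nothing here requires the commitment to be made "early" or "late": the function is fixed once, and learning about X simply moves execution onto a branch that was always part of the predictable policy.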

Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28

Probabilities can be empirically wrong, sure, but I find it weird to say that they're "not probabilities" until they're calibrated. If you imagine 20 scenarios in this class, and your brain says "I expect to be wrong in one of those", that just is a probability straight up.

(This may come down to frequency vs belief interpretations of probability, but I think saying that beliefs aren't probabilistic at all needs defending separately.)
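The arithmetic in the "20 scenarios" reading is straightforward; a toy calibration check makes it explicit (the outcome list below is made up for illustration):

```python
# "I expect to be wrong in 1 of these 20 scenarios" read directly as
# a probability, plus a toy calibration check. Outcomes are made up.
p_wrong = 1 / 20
assert p_wrong == 0.05

# Among 20 claims each asserted at 95% confidence, being wrong about
# exactly one of them is perfectly calibrated:
outcomes = [True] * 19 + [False]
observed_accuracy = sum(outcomes) / len(outcomes)
assert observed_accuracy == 1 - p_wrong
```

Whether you call `p_wrong` a frequency over the scenario class or a degree of belief, it enters expected-value calculations the same way, which is the sense in which it "just is a probability."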
