Roko

Wikitag Contributions

Signaling · 16y · (+5/-6)
Signaling · 16y · (+48/-30)

Posts

7 · Roko's Shortform · 5y · 35 comments

Comments (sorted by newest)
Roko's Shortform
Roko · 1y · 2 · -16

The Contrarian 'AI Alignment' Agenda

Overall Thesis: technical alignment is generally irrelevant to outcomes; almost everyone in the AI Alignment field incorrectly assumes otherwise, and so works on technical alignment of LLMs

(1) aligned superintelligence is provably logically realizable [already proved]

(2) aligned superintelligence is not just logically but also physically realizable [TBD]

(3) ML interpretability/mechanistic interpretability cannot possibly be logically necessary for aligned superintelligence [TBD]

(4) ML interpretability/mechanistic interpretability cannot possibly be logically sufficient for aligned superintelligence [TBD]

(5) given a certain minimal intelligence, and minimal emulation ability of humans by AI (e.g. it understands common-sense morality and cause and effect) and of AI by humans (humans can do multiplication etc.), the internal details of AI models cannot possibly make a difference to the set of realizable good outcomes, though they can make a difference to the ease/efficiency of realizing them [TBD]

(6) given near-perfect or perfect technical alignment (= the AI will do what its creators ask of it, with correct intent), awful outcomes are a Nash equilibrium for rational agents [TBD] (see the sketch after this list)

(7) small or even large alignment deviations make no fundamental difference to outcomes: the boundary between good and bad is determined by game theory, mechanism design and initial conditions, plus only a satisficing condition on alignment fidelity, which is below the level of alignment of current humans (and AIs) [TBD]

(8) There is no such thing as superintelligence anyway because intelligence factors into many specific expert systems rather than one all-encompassing general purpose thinker. No human has a job as a “thinker” - we are all quite specialized. Thus, it doesn’t make sense to talk about “aligning superintelligence”, but rather about “aligning civilization” (or some other entity which has the ability to control outcomes) [TBD]
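A minimal sketch of claim (6), as a hedged illustration (the strategy names and payoff numbers here are hypothetical, not from the original): two principals each command a perfectly obedient AI, yet mutual racing is the unique Nash equilibrium even though both prefer mutual restraint.

```python
# Sketch of claim (6): perfect technical alignment does not prevent bad
# equilibria. Two principals each command a perfectly obedient AI and choose
# "restrain" or "race". Payoffs are hypothetical, prisoner's-dilemma-shaped.

from itertools import product

STRATEGIES = ("restrain", "race")

# PAYOFF[(row, col)] = (row player's payoff, column player's payoff)
PAYOFF = {
    ("restrain", "restrain"): (3, 3),   # good outcome for both
    ("restrain", "race"):     (0, 4),   # racer grabs everything
    ("race",     "restrain"): (4, 0),
    ("race",     "race"):     (1, 1),   # awful outcome for both
}

def is_nash(row, col):
    """A profile is a Nash equilibrium if neither player gains by deviating."""
    r_pay, c_pay = PAYOFF[(row, col)]
    row_ok = all(PAYOFF[(alt, col)][0] <= r_pay for alt in STRATEGIES)
    col_ok = all(PAYOFF[(row, alt)][1] <= c_pay for alt in STRATEGIES)
    return row_ok and col_ok

equilibria = [p for p in product(STRATEGIES, repeat=2) if is_nash(*p)]
print(equilibria)  # [('race', 'race')] -- the awful outcome is the only equilibrium
```

Both AIs here do exactly what their principals intend; the bad outcome comes from the payoff structure, not from any alignment failure.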

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2y · 100

Since a few people have mentioned the Miller/Rootclaim debate:

My hourly rate is $200. I will accept a donation of $5000 to sit down, watch the entire Miller/Rootclaim debate (17 hours of video content plus various supporting materials), and write a 2000-word piece describing how I updated on it and why.

Anyone can feel free to message me if they want to go ahead and fund this.

Far-UVC Light Update: No, LEDs are not around the corner (tweetstorm)
Roko · 3y · 80 (edited)

Whilst the LEDs are not around the corner, I think the Kr-Cl excimer lamps might already be good enough.

When we wrote the original post on this, it was not clear how quickly COVID was spreading through the air, but I think it is now clear that COVID can hang around for a long time (on the order of minutes or hours rather than seconds) and still infect people.

It seems that a power density of 0.25 W/m^2 would probably be enough to sterilize air in 1-2 minutes, meaning that a 5m x 8m room would need a 10W source. Assuming 2% efficiency, that 10W source needs 500W of electrical input, which is certainly possible; in the days of incandescent lighting you would have had a few 100W bulbs anyway.

EDIT: Having looked into this a bit more, it seems that right now the low efficiency of excimer lamps is not a binding constraint because the legally allowed far-UVC exposure is so low.

"TLV exposure limit for 222 nm (23 mJ cm^−2)"

23 mJ per cm^2 per day is only about 0.003 W/m^2, so you need very little power before you hit the legal limit.

Source
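As a sanity check on the arithmetic above, here is a short sketch; the room size, sterilizing flux, and efficiency figures are the ones assumed in this comment, not measured values.

```python
# Back-of-the-envelope check of the far-UVC numbers above.

ROOM_AREA_M2 = 5 * 8            # 5 m x 8 m room
STERILIZING_FLUX = 0.25         # W/m^2, assumed enough to sterilize in 1-2 min
WALL_PLUG_EFFICIENCY = 0.02     # ~2% assumed for Kr-Cl excimer lamps

uvc_watts = STERILIZING_FLUX * ROOM_AREA_M2
electrical_watts = uvc_watts / WALL_PLUG_EFFICIENCY
print(f"UV-C output needed: {uvc_watts:.0f} W")         # 10 W
print(f"Electrical input:   {electrical_watts:.0f} W")  # 500 W

# TLV exposure limit of 23 mJ/cm^2 per day, expressed as a continuous flux.
TLV_J_PER_CM2_PER_DAY = 23e-3
SECONDS_PER_DAY = 24 * 60 * 60
CM2_PER_M2 = 1e4

tlv_flux = TLV_J_PER_CM2_PER_DAY / SECONDS_PER_DAY * CM2_PER_M2
print(f"Legal limit as flux: {tlv_flux:.4f} W/m^2")     # ~0.0027 W/m^2
```

The legal limit (~0.003 W/m^2) is roughly two orders of magnitude below the assumed sterilizing flux, which is why lamp efficiency is not currently the binding constraint.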

The Problem
Roko · 7d · 20

> I'm not sure Roko is arguing that it's impossible for capitalist structures and reforms to make a lot of people worse off

Exactly. It's possible and indeed happens frequently.

The Problem
Roko · 15d · 30

I think you can have various arrangements that are either of those or a combination of the two.

Even if the Guardian Angels hate their principal and want to harm them, it may be that multiple such Guardian Angels all monitor each other: the first one to make a move against the principal is reported (with proof) to the principal by at least some of the others, who are rewarded for reporting; those who provably failed to report are punished; and the offender is deleted.

The misaligned agents can just be stuck in their own version of Bostrom's self-reinforcing hell.

As long as their coordination cost is high, you are safe.

It can also be a combination of many things that cause agents to in fact act aligned with their principals. A minimal sketch of the monitoring mechanism follows.
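In this sketch, all payoff numbers and the detection probability are hypothetical illustrations: because reporting is rewarded and provable silence is punished, every observer reports, so moving first against the principal has strongly negative expected value once a few peers are watching.

```python
# Sketch of the mutual-monitoring mechanism described above. All numbers
# (payoffs, detection probability) are hypothetical illustrations.

import random

N_ANGELS = 5
P_OBSERVE = 0.9          # assumed chance each peer observes a defection
REWARD_REPORT = 1.0      # paid to each angel that reports with proof
PENALTY_DELETED = -10.0  # the defector is deleted
GAIN_DEFECT = 3.0        # what the defector gains if nobody sees it

def round_payoffs(defector: int, rng: random.Random) -> list[float]:
    """Payoffs for one round in which `defector` moves against the principal.

    Since reporting is rewarded and provable silence is punished, every
    observer reports, so the defector escapes only if nobody observes.
    """
    payoffs = [0.0] * N_ANGELS
    observers = [i for i in range(N_ANGELS)
                 if i != defector and rng.random() < P_OBSERVE]
    if observers:
        payoffs[defector] = PENALTY_DELETED
        for i in observers:
            payoffs[i] = REWARD_REPORT
    else:
        payoffs[defector] = GAIN_DEFECT
    return payoffs

rng = random.Random(0)
trials = 10_000
avg = sum(round_payoffs(0, rng)[0] for _ in range(trials)) / trials
print(f"Expected payoff of moving first against the principal: {avg:.2f}")
# ~ -10.0: with 4 peers each watching with p = 0.9, escape probability is 1e-4
```

The defector's expected payoff only turns positive if the peers can coordinate to stay silent, which is why high coordination cost among the misaligned agents is doing the safety work here.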

The Problem
Roko · 16d · 20

> It sure seems to me that there is a clear demarcation between AIs and humans, such that the AIs would be able to successfully collude against humans

I think this just misunderstands how coordination works.

The game theory of who is allowed to coordinate with whom against whom is not simple.

White Germans fought against white Englishmen who were barely different, while each tried to ally with distantly related foreigners.

Ultimately, what we are starting to see is that AI risk isn't about math or chips or interpretability; it's actually just politics.

The Problem
Roko · 19d · 20

Yes, but yet again, it was because Africans were not considered part of the system of property rights. They were owned, not owners.

The Problem
Roko · 19d · 40

It's mostly not because of altruism; it's because we have a property rights system, rule of law, etc.

And you can have degrees of cooperation between heterogeneous agents. Full atomization and the Borg are not the only two options.

The Problem
Roko · 19d · 30

I should've said "most".

That's just run-of-the-mill history though.

The Problem
Roko · 19d · 20

> But I'm obviously talking about a very different kind of system which is more Borg-like and less market-like.

But then you have to justify why a Borg-like monoculture would actually be competitive, as opposed to an ecosystem of many different kinds of entities and the many different game-theoretic alliances/teams that these diverse entities belong to.

Posts

-8 · Turing-Test-Passing AI implies Aligned AI · 8mo · 29 comments
13 · Is AI alignment a purely functional property? [Q] · 9mo · 8 comments
33 · What is MIRI currently doing? [Q] · 9mo · 14 comments
8 · The Dissolution of AI Safety · 9mo · 44 comments
7 · What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented? [Q] · 10mo · 16 comments
9 · The ELYSIUM Proposal - Extrapolated voLitions Yielding Separate Individualized Utopias for Mankind · 11mo · 18 comments
7 · A Heuristic Proof of Practical Aligned Superintelligence · 11mo · 6 comments
0 · A Nonconstructive Existence Proof of Aligned Superintelligence · 1y · 80 comments
66 · Ice: The Penultimate Frontier · 1y · 56 comments
1 · Less Wrong automated systems are inadvertently Censoring me · 2y · 52 comments