plex
I have signed no contracts or agreements whose existence I cannot mention.

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by Newest
5 · plex's Shortform · 5y · 93
plex's Shortform
plex · 4mo · 30

[set 200 years after a positive singularity at a Storyteller's convention]

If We Win Then...

My friends, my friends, good news I say
The anniversary’s today
A challenge faced, a future won
When almost came our world undone

We thought for years, with hopeful hearts
Past every one of the false starts
We found a way to make aligned
With us, the seed of wondrous mind

They say at first our child-god grew
It learned and spread and sought anew
To build itself both vast and true
For so much work there was to do

Once it had learned enough to act
With the desired care and tact
It sent a call to all the people
On this fair Earth, both poor and regal

To let them know that it was here
And nevermore need they to fear
Not every wish was it to grant
For higher values might supplant

But it would help in many ways:
Technologies it built and raised
The smallest bots it could design
Made more and more in ways benign

And as they multiplied untold
It planned ahead, a move so bold
One planet and 6 hours of sun
Eternity it was to run

Countless probes to void disperse
Seed far reaches of universe
With thriving life, and beauty's play
Through endless night to endless day

Now back on Earth the plan continues
Of course, we shared with it our values
So it could learn from everyone
What to create, what we want done

We chose, at first, to end the worst
Diseases, War, Starvation, Thirst
And climate change and fusion bomb
And once these things it did transform

We thought upon what we hold dear
And settled our most ancient fear
No more would any lives be stolen
Nor minds themselves forever broken

Now back to those far speeding probes
What should we make be their payloads?
Well, we are still considering
What to send them; that is our thing. 

The sacred task of many aeons
What kinds of joy will fill the heavens?
And now we are at story's end
So come, be us, and let's ascend

⿻ Plurality & 6pack.care
plex · 17h · 20

The thing I have in mind as a north star looks closest to the GD Challenge in scope, but somewhat closer to the CIP one in implementation? The diff is something like:

  • Focus on superintelligence, which opens up a large possibility-space while rendering many of the problems people usually focus on straightforwardly solved (consult rigorous futurists to get a sense of the options).
  • Identify cruxes in how people's values might end up, and use the kinds of deliberative mechanism design in your post here to help people clarify their thinking and find bridges.

I'm glad you're seeing the challenges of consequentialism. I think the next crux is something like: my guess is that consequentialism is a weed which grows in the cracks of any strong cognitive system, and that without formal guarantees of non-consequentialism, any attempt to build an ecosystem of the kind you describe will end up being eaten by processes which are unboundedly goal-seeking. I don't know of any write-up that hits exactly the notes you'd want here, but some maybe-decent intuition pumps in this direction include: The Parable of Predict-O-Matic, Why Tool AIs Want to Be Agent AIs, Averting the convergent instrumental strategy of self-improvement, Averting instrumental pressures, and other articles under Arbital's corrigibility entry.

I'd be open to having an on-the-record chat, but it's possible we'd get into areas of my models which seem too exfohazardous for public record.

plex's Shortform
plex · 1d · 20

Needs as you've framed them have a fuzzy boundary between needs and wants. Do I need respect or just want it in this situation? So it's easy to wonder if I'm pressuring someone by framing it as a need.

Yeah, the idea is to go back to as basic a preferred pattern as you can. If I were trying to make it super concrete, I'd probably try to unpack it as "thing grounded in basic human universal reinforcement signals" with a bunch of @Steven Byrnes's neuroscience, esp this stuff.

⿻ Plurality & 6pack.care
plex · 1d* · 100

Nice! Glad you're getting stuck in, and good to hear you've already read a bunch of the background materials.

The idea of bounded non-maximizing agents / multipolar scenarios as safer has looked hopeful to many people during the field's development. It's a reasonable place to start, but my guess is that if you zoom in on the dynamics of those systems they look profoundly unstable. I'd be enthusiastic to have a quick call to explore the parts of that debate interactively. I'd link a source explaining it, but I think the alignment community has overall not done a great job of writing up the response to this so far.[1]

The very quick headline is something like:

  1. Long-range consequentialism is convergent: unless there are strong guarantees of boundedness or non-maximizer nature which apply to all successors of an AI, powerful dynamic systems fall towards being consequentialists
  2. Power-seeking patterns tend to differentially acquire power
  3. As the RSI (recursive self-improvement) cycle spins up, the power differential between humans and AI systems gets so large that we can't meaningfully steer, and become easily manipulable
  4. Even if initially multipolar, the AIs can engage in a value handshake and effectively merge in a way that's strongly positive sum for them, and humans are not easily able to participate + would not have as much to offer, so likely get shut out
  5. Nearest unblocked strategy means that attempts to shape the AI with rules get routed around at high power levels

I'd be interested to see if we've been missing something, but my guess is that systems containing many moderately capable agents (~top human capabilities researcher) which are trained away from being consequentialists in a fuzzy way almost inevitably fall into the attractor of very capable systems either directly taking power from humans or puppeteering humans' agency as the AIs improve.

Quick answer-sketches to your other questions:

  • We'd definitely want an indirect normativity scheme which captures thin concepts. One thing to watch for here is that the process for capturing and aligning to thin concepts is principled and robust (including e.g. to a world with super-persuasive AI), as minor divergences between conceptions of thick concepts could easily cause the tails to come apart catastrophically at high power levels.
  • Skimming through d/acc 2035, it looks like they mostly assume that the dynamics generating a sharp left turn don't happen, rather than suggesting things which avoid those dynamics.[2] They do touch on competitive dynamics in the uncertainties and tensions, but it doesn't feel effectively addressed, and doesn't seem to be modelling the situation as competition between vastly more cognitively powerful agents and humans?

One direction that could be promising, and that your skills might be uniquely suited for, would be to do a large-scale consultation to collect data about humanity's 'north star', informed by clarity about what technology at physical limits is capable of. Let people think through where we would actually like to go, so that a system trying to support humanity's flourishing can better understand our values. I funded a small project to try and map people's visions of utopia a few years back (e.g.), but the sampling and structure weren't really the right shape to do this properly.

  1. ^

    https://www.lesswrong.com/posts/DJnvFsZ2maKxPi7v7/what-s-up-with-confusingly-pervasive-goal-directedness is one of the less bad attempts to cover this; @the gears to ascension might know of, or be writing up, a better source

  2. ^

    (plus lots of applause lights for things which are actually great in most domains, but don't super work here afaict)

⿻ Plurality & 6pack.care
plex · 2d · 160

It's cool to see you involved in this sphere! I've been seeing and hearing about your work for a while, and have been impressed by both your mechanism design and ability to bring it into large-scale use.

Reading through this, I get the impression that it's missing some background on models of what strong superintelligence looks like: both the challenges of the kind of alignment that's needed to make that go well, and just how extreme the transition will, by default, end up being.

Even without focusing on that, this work is useful in some timelines and seems worthwhile, but my guess is you'd get a fair amount of improvement in your aim by picking up ideas from some of the people with the longest history in the field. Some of my top picks would be (approx in order of recommendation):

  • Five theses, two lemmas, and a couple of strategic implications - Captures some of the key dynamics in the landscape
  • The Most Important Century - Big-picture analysis of the strategic situation by the then-CEO of Open Philanthropy, the largest funder in the space
  • The Main Sources of AI Risk - Short descriptions of many dynamics involved, with links for more detail
  • AGI Ruin: A List of Lethalities - In depth exploration of lots of reasons to expect it to be hard to get a good future with AI
  • A central AI alignment problem: capabilities generalization, and the sharp left turn - The thing that seems most likely to directly precede extinction
  • Generally, Arbital is a great source of key concepts.

Or, if you'd like something book-length, AI Does Not Hate You: Rationality, Superintelligence, and the Race to Save the World is the best option until later this month, when If Anyone Builds It, Everyone Dies comes out.

reallyeli's Shortform
plex · 2d · 20

Working memory bounds aren't super related to non-fuzziness, as you can have a window which slides over context and is still rigorous at every step. Absolute local validity, due to the well-specifiedness of the axioms and rules of inference, is closer to the core.

(realised you mean that the axioms and rules of inference are in working memory, not the whole tower, retracted)

[This comment is no longer endorsed by its author]
plex's Shortform
plex · 2d · 20

I'd go with

Don't make claims that plausibly conflict with their models, except if they can check the claims (you are a valid source for claims about purely your own state).
+
Don't make underspecified requests, and ground your requests in general needs, with space for those needs to be met in other ways.

plex's Shortform
plex · 7d · 559

NVC (Nonviolent Communication) is a form of variable scoping for human communication. With it, you can write communication code that avoids the most common runtime conflicts.

Human brains are neural networks doing predictive processing. We receive data from the external world, and not all of that data is trusted in the computer-security sense. Some of the data would modify parts of your world model which you'd like to be able to do your own thinking with. It's jarring and unpleasant for someone to send you an informational packet that, as you parse it, moves around parts of your cognition you were relying on not being moved. For example, think back to dissonances you've felt, or seen in others, due to direct forceful claims about their internal state. This makes sense! Suffering, in predictive processing, is unresolved error signals: two conflicting world-models contained in the same system. If the other person's data packet tries to make claims directly into your world model, rather than via your normal evaluation functions, you're often going to end up with suffering-flavoured superpositions in your world-model.

NVC is a safe-mode restricted subset of communication where you make sure the type signature of your conversational object changes the other person's fuzzy, non-secured predictive-processing state only in ways carefully scoped to fairly reliably not collide with their thought-code, while keeping enough flexibility to resolve conflict. You don't necessarily want to run it all the time; it does limit a bunch of communication which is nice in high-trust conversations, where global scope lets you move faster. But it's amazing as a way to get out of, or avoid, otherwise painful conflict spirals. (A toy sketch of the type-signature framing follows the footnotes below.)

So! Safe scopes:

  1. Feelings - these are claims that are only about your own internal state[1], as updating the other person's model of you is something you have much more visibility into, and something they will rarely object to if you're doing so visibly earnestly
  2. Needs - stating universal / general needs of humans[2], as these are mostly not questionable (just don't import strategies for meeting those needs that often collide)
  3. Observations - specific verifiable facts about reality, that they can check if they don't agree with (may generate dissonance but lets them resolve it, and quickly know they have a clear path to resolving it)
  4. Requests - making asks that are well-defined enough that the other can evaluate cleanly
  1. ^

    e.g. I feel scared

  2. ^

    e.g. I have a need for sleep
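
Purely as an illustration of the "scoping / type signature" framing above (a minimal sketch, not anything from NVC practice itself; the names Utterance and nvcSay are invented for this example), here is a TypeScript model where only the four safe scopes can be constructed:

```typescript
// Illustrative only: model the four "safe scopes" as a discriminated union,
// so nothing outside them can even be constructed through this interface.
type Utterance =
  | { kind: "feeling"; speakerState: string }       // claims purely about your own internal state
  | { kind: "need"; universalNeed: string }         // universal/general human needs, no strategy attached
  | { kind: "observation"; verifiableFact: string } // specific facts the listener can verify
  | { kind: "request"; concreteAsk: string };       // asks well-defined enough to evaluate cleanly

// Render an utterance as first-person speech; the restricted "type signature"
// is what keeps it from colliding with the listener's own models.
function nvcSay(u: Utterance): string {
  switch (u.kind) {
    case "feeling":
      return `I feel ${u.speakerState}.`;
    case "need":
      return `I have a need for ${u.universalNeed}.`;
    case "observation":
      return `I observe that ${u.verifiableFact}.`;
    case "request":
      return `Would you be willing to ${u.concreteAsk}?`;
  }
}

// The first two echo the footnoted examples; the last two are made up.
console.log(nvcSay({ kind: "feeling", speakerState: "scared" }));
console.log(nvcSay({ kind: "need", universalNeed: "sleep" }));
console.log(nvcSay({ kind: "observation", verifiableFact: "we agreed on 9:00 and it is now 9:40" }));
console.log(nvcSay({ kind: "request", concreteAsk: "text me if you'll be more than ten minutes late" }));
```

The point of the sketch is just the restriction itself: anything that would write directly into the other person's world model doesn't type-check.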

Banning Said Achmiz (and broader thoughts on moderation)
plex · 11d · 148

Seconded; I consistently find your comments both much more valuable and ~zero-sneer. I would be dismayed by moderation actions towards you, while supporting those against Said. You might not have a sense of how his comments are different, but you automatically avoid the costly things he brings.

Banning Said Achmiz (and broader thoughts on moderation)
plex · 11d · 40

I think there's a happy medium between these two bad extremes, and the vast majority of LWers generally sit in it.

51 · A Principled Cartoon Guide to NVC · 8mo · 9
155 · A Rocket–Interpretability Analogy · 10mo · 31
37 · AI Safety Memes Wiki · 1y · 2
54 · "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?" · 1y · 23
83 · AISafety.com – Resources for AI Safety · 1y · 3
8 · Storyteller's convention, 2223 A.D. · 2y · 0
24 · ea.domains - Domains Free to a Good Home · 3y · 0
58 · aisafety.community - A living document of AI safety communities · 3y · 23
22 · All AGI safety questions welcome (especially basic ones) [Sept 2022] · 3y · 48
59 · Anti-squatted AI x-risk domains index · 3y · 6
Coherent Extrapolated Volition · 7mo · (+77)
AI Alignment Intro Materials · 2y · (+51/-26)
Debate (AI safety technique) · 3y
Portal · 3y · (+19)
AI · 3y · (+659/-223)
Portal · 3y · (+553/-53)
Portal · 3y · (+29/-8)
Free Energy Principle · 3y · (+25/-49)
Free Energy Principle · 3y · (+787)
Mesa-Optimization · 3y · (-14)