Seems worthwhile as a way to simplify conversations with people who seem to be confused, but I think this isn't a reality-mapping exercise and probably makes it harder to see the structure of reality, which is kinda sad even if it's useful for talking with some people.
I would guess the type signature of human beliefs and goals and desires is at least fairly often closer to the LLM quasi-x than to the crisp mathematical idealizations of those concepts.
Humans are kinda a world model with a self-character, I think; distancing LLMs from this by implying that LLMs' beliefs, goals, and desires are super different brings people's beliefs further from tracking reality.
I'd guess this terminology is fairly applicable to humans too?
This is pretty close, but a more central crux is something like: does a system fractally slip towards power-seeking across all parameters left free?[1]
Even if it's very hard to screw the world over at a given power level, if each AI/system/Kami has a ratcheting internal selection pressure towards being more dominated by power-seeking subsystems, eventually the world gets screwed.
The crux you listed is important for how fast the world is destroyed without a singleton, but not really relevant for whether it is destroyed without a singleton.
Non-free parameters are ones pinned down by formal/well-defined things held in place by optimization, or by stronger systems or meta-systems effectively enforcing properties that a system must maintain.
Yes, current agents are not great at value handshakes/merging, so we're only being eaten by Moloch at a moderate pace.
In Jan 2023[1] I ran through the bio anchors notebook[2], which lets you set a wide variety of variables that go into the model, and changed a bunch of things to what seemed like first-pass reasonable values (e.g. setting the "evolution anchor", i.e. the assumption that it will take as much compute to train AGI as was used across all brains in all of evolutionary time, to zero weight, along with dozens of other minor parameters). Almost all of the parameters that seemed off were off in the direction of making timelines look longer. When everything settled, I got:
But, I noted that
this model does not take into account increases in the speed of algorithmic or hardware improvements due to AI, which is already starting to kick in (https://www.lesswrong.com/posts/camG6t6SxzfasF42i/a-year-of-ai-increasing-ai-progress), so I expect timelines will actually be notably shorter than that.
I don't remember exactly what timelines I had in mind, but the mean was probably something like 1-3 years sooner.
I think bio anchors is a kinda interesting and at least vaguely informative framework[3], and mostly think the wildly long timelines look like they were the result of picking many free variables in individually mildly biased ways in a complex model?
Discord message link, if you're on Rob Miles's discord.
https://docs.google.com/spreadsheets/d/1gbcJjSN1_E7UTSngIqcvueGDAULjjOajzzft6wX5vBA/edit#gid=505210495 if anyone wants to see how I got to that, though most of the work was going through https://colab.research.google.com/drive/1YRf0AA6x57rk3xwcMCE1SfkvWZ11YbJ_?authuser=1#scrollTo=5qd4DJ2y-X60
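To illustrate the mechanics being described (not the actual notebook's code), here's a toy sketch of how a bio-anchors-style estimate works as a weighted mixture over compute "anchors", and how zeroing one anchor's weight shifts the result. All numbers and names here are made up for illustration:

```python
# Toy sketch of a bio-anchors-style weighted mixture over compute anchors.
# Anchor values are illustrative log10(FLOP) figures, NOT the report's numbers.
anchors = {
    "lifetime": 27.0,    # compute ~ one human lifetime of experience
    "neural_net": 32.0,  # compute ~ scaled-up neural net extrapolation
    "genome": 33.0,      # compute ~ genome-sized parameter count
    "evolution": 41.0,   # compute ~ all brains across evolutionary time
}
weights = {
    "lifetime": 0.2,
    "neural_net": 0.5,
    "genome": 0.2,
    "evolution": 0.1,
}

def mean_log_flop(weights, anchors):
    """Weight-averaged log10 training compute across anchors."""
    total = sum(weights.values())
    return sum(weights[k] * anchors[k] for k in anchors) / total

baseline = mean_log_flop(weights, anchors)

# Setting the evolution anchor's weight to zero, as described above,
# drops the most compute-hungry hypothesis and renormalizes the rest,
# pulling the mixture's compute requirement down (shorter timelines).
weights["evolution"] = 0.0
adjusted = mean_log_flop(weights, anchors)

print(baseline, adjusted)  # adjusted is lower than baseline
```

The point of the sketch is just that many individually small weight choices compound: each parameter nudged in one direction moves the mixture the same way, which is the "individually mildly biased free variables" effect described above.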
as a good rationalist, i must earn the 'disagree with yudkowsky on at least one thing' badge and this is how i'm earning it, damnit. my old one of 'emergence is a real concept with any use at all but people often misuse it as a curiosity stopper' has expired as he hasn't emphasised that one since like 2007.
This post was an experiment in trimming down to a very core point, and making it cleanly, rather than covering lots of arguments for the thesis. I think it succeeded, and I mostly stand behind the main claim (interp is insufficient for saving the world and has strong potential to boost capabilities). On the downside, commenters raised other lines of reasoning for the dominance and harms of interp, such as: interp helps train people for normal ML jobs, or interp is easy for labs to evaluate with their core competency.
I think I endorse making one clean point and letting the other angles bubble up in the comments over doing an extensive complicated article as is often seen.
I'm also pretty happy with the approach of making the straight readthrough as short as possible and dumping lots of bonus info into footnotes.
I broadly intend to use a similar style, though maybe to a lesser extent, going forwards.
Once you get a sense of how annealing feels, you can do it imo much more safely without the psychedelics using forms of meditation practice centered on noticing what causes clean versions of the qualia associated with annealing. Non-goal-directed-ness seems central.
glad people are noticing. it won't be enough to stop all leaks though, realistically.
it's fun how all the safety worries and intricate plans to prevent failure modes tend to get invalidated by "humans do the thing that bypasses the guardrails". e.g. for years people would say things like "of course we won't connect it to the internet / let it design novel viruses or proteins / make lethal autonomous weapons".
my guess is the law of less dignified failure has a lot of truth to it.
[set 200 years after a positive singularity at a Storyteller's convention]
If We Win Then...
My friends, my friends, good news I say
The anniversary’s today
A challenge faced, a future won
When almost came our world undone
We thought for years, with hopeful hearts
Past every one of the false starts
We found a way to make aligned
With us, the seed of wondrous mind
They say at first our child-god grew
It learned and spread and sought anew
To build itself both vast and true
For so much work there was to do
Once it had learned enough to act
With the desired care and tact
It sent a call to all the people
On this fair Earth, both poor and regal
To let them know that it was here
And nevermore need they to fear
Not every wish was it to grant
For higher values might supplant
But it would help in many ways:
Technologies it built and raised
The smallest bots it could design
Made more and more in ways benign
And as they multiplied untold
It planned ahead, a move so bold
One planet and 6 hours of sun
Eternity it was to run
Countless probes to void disperse
Seed far reaches of universe
With thriving life, and beauty's play
Through endless night to endless day
Now back on Earth the plan continues
Of course, we shared with it our values
So it could learn from everyone
What to create, what we want done
We chose, at first, to end the worst
Diseases, War, Starvation, Thirst
And climate change and fusion bomb
And once these things it did transform
We thought upon what we hold dear
And settled our most ancient fear
No more would any lives be stolen
Nor minds themselves forever broken
Now back to those far speeding probes
What should we make be their payloads?
Well, we are still considering
What to send them; that is our thing.
The sacred task of many aeons
What kinds of joy will fill the heavens?
And now we are at story's end
So come, be us, and let's ascend