owngrove — LessWrong

Open Thread Fall 2024

Happened on this song on Tiny Desk: Paperclip Maximizer (by Rosie Tucker, from an album titled "Utopia Now!").

Paperclip maximizer
Single minded if you mind at all
A paragon of puritanical panoptical persistence
Everybody envies your resolve
Paperclip maximizer
Mining for a better way
No ontological contention
Tends your content generation
Every sorrow makes a link in the chain
[...]
And the shareholders meet gruesome ends
But the cosmos expands
So the market survives
All the better to bear all your office supplies
And the space they require was once occupied
By the sun
On your hair
And the curve
Of your thighs
Horizon of sighs
Destroyer of worlds

Principles of Privacy for Alignment Research

owngrove3y10

I think "alignment/capabilities > 1" is a closer heuristic than "alignment/capabilities > average", in the sense of '[fraction of remaining alignment this solves] / [fraction of remaining capabilities this solves]'. That's a sufficient condition if all research meets it, though not IRL, e.g. given that pure capabilities research also exists; but I think it's still a necessary condition for something to be net helpful.
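To make the ratio concrete, here is a minimal sketch of the heuristic under my own simplifying assumptions: each project removes a fixed fraction of whatever alignment or capabilities work remains, and the projects compose multiplicatively. The project numbers and the `remaining_after` helper are hypothetical, chosen only for illustration.

```python
def remaining_after(fractions):
    """Return the fraction of a problem still unsolved after each project
    removes its stated fraction of what was left at that point."""
    remaining = 1.0
    for f in fractions:
        remaining *= 1.0 - f
    return remaining


# Hypothetical projects as (alignment_fraction, capabilities_fraction) pairs.
# Every one of them satisfies alignment/capabilities > 1.
projects = [(0.10, 0.05), (0.20, 0.15), (0.05, 0.01)]
assert all(a / c > 1 for a, c in projects)

alignment_left = remaining_after(a for a, _ in projects)
capabilities_left = remaining_after(c for _, c in projects)

# If all research clears the bar, alignment stays ahead of capabilities.
print(alignment_left < capabilities_left)

# Add one pure-capabilities project (alignment fraction 0): the same
# alignment work no longer keeps pace.
capabilities_left_with_pure = capabilities_left * (1.0 - 0.30)
print(capabilities_left_with_pure < alignment_left)
```

In this toy model, a world where every project clears the "> 1" bar ends with less alignment work remaining than capabilities work, but a single pure-capabilities project is enough to flip that, which is the sense in which the condition is not sufficient IRL.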

Where I agree and disagree with Eliezer

owngrove3y30

Seconding all of this.

Another way to state your second point - the only way to exploit that free energy may be through something that looks a lot like a 'pivotal act'. And in your third point, there may be no acceptable way to exploit that free energy, in which case the only option is to prevent any equally-capable unaligned AI from existing - not necessarily through a pivotal act, but Eliezer argues that's the only practical way to do so.

I think the existence/accessibility of these kinds of free energy (offense-favored domains whose exploitation is outside of the Overton window or catastrophic) is a key crux for 'pivotal act' vs. gradual risk reduction strategies, plausibly the main one.

In the terms of Paul's point #2 - this could still be irrelevant because earlier AI systems will have killed us in more boring ways, but the 'radically advancing the state of human R&D' branch may not meaningfully change our vulnerability. I think this motivates the 'sudden doom' story even if you predict a smooth increase in capabilities.

Where I agree and disagree with Eliezer

owngrove4y4234

One reason you might do something like "writing up a list but not publishing it" is if you perceive yourself to be in a mostly-learning mode rather than a mostly-contributing one. You don't want to dilute the discussion with thoughts that don't have a particularly good chance of adding anything, and you don't want to be written off as someone not worth listening to in a sticky way, but you want to write something down to develop your understanding / check against future developments / record anything that might turn out to have value later after all once you understand better.

Of course, this isn't necessarily an optimal or good strategy, and people might still do it when it isn't - I've written down plenty of thoughts on alignment over the years, and I think many of the actual-causal-reasons I'm a chronic lurker are pretty dumb and non-agentic - but I think people do reason like this, explicitly or implicitly.

There's a connection here to concernedcitizen64's point about your role as a community leader, inasmuch as your claims about the quality of the field can significantly influence people's probabilities that their ideas are useful / that they should be in a contributing mode, but IMO it's more generally about people's confidence in their contributions.

Overall I'd personally guess "all the usual reasons people don't publish their thoughts" over "fear of the reception of disagreement with high-status people" as the bigger factor here; I think the culture of LW is pretty good at conveying that high-quality criticism is appreciated.

Pivotal outcomes and pivotal processes

owngrove4y50

I think the debate really does need to center on specific pivotal outcomes, rather than how the outcomes come about. The sets of pivotal outcomes attainable by pivotal acts vs. by pivotal processes seem rather different.

I suspect your key crux with pivotal-act advocates is whether there actually exist any pivotal outcomes that are plausibly attainable by pivotal processes. Any advantages that more distributed pivotal transitions have in the abstract are moot if there are no good concrete instantiations.

For example, in the stereotypical pivotal act, the pivotal outcome is that no (other) actors possess the hardware to build an AGI. It's clear how this world state is safe from AGI, and how an (AGI-level) pivotal act could in principle achieve it. It's not clear to me that a plausible pivotal process could achieve it. (Likewise, for your placeholder AI immune system example, it's not clear to me either that this is practically achievable or that it would be pivotal.)

This crux is probably downstream of other disagreements about how much distributed means (governance, persuasion, regulation, ?) can accomplish, and what changes to the world suffice for safety. I think these would be more productive to debate in the context of a specific non-placeholder proposal for a pivotal process.

It's certainly fair to argue that there are downsides to pivotal acts, and that we should prefer a pivotal process if possible, but IMO the hard part is establishing that possibility. I'm not 100% confident that a pivotal transition needs to look like a surprise unilateral act, but I don't know of any similarly concrete alternative proposals for how we end up in a safe world state.
