I found it a bit challenging to unpack what you're saying here. I think, after a reread, that you're using ‘slow’ and ‘fast’ the way I would use ‘soon’ and ‘far away’ (i.e., referring to how far from the present it will occur). Is this read about correct?
If ‘Opt into Petrov Day’ sat beside something other than a big red ominous button, I would think the obvious answer is that it's a free choice, and I'd be positively inclined towards it. Petrov Day is a good thing with good side effects, quite unlike launching nuclear weapons.
It is confusing to me that it is beside a big red ominous button. On the one hand, Petrov's story is about the value of caution. To quote a top comment from an older Petrov Day,
Petrov thought the message looked legit, but noticed there were clues that it wasn't.
On the other hand, risk-taking is good, opting in to good things is good, and if one is taking Petrov Day to mean ‘don't take risks if they look scary’ I think one is taking an almost diametrically wrong message from the story.
All that said, for now I am going to fall prey to my own criticism and not press the big red ominous button around Petrov Day.
I'm interested in having Musk-company articles on LessWrong if it can be done while preserving LessWrong norms. I'm a lot less interested in it if it means bringing in sarcasm, name calling, and ungrounded motive-speculation.
if, judging by some economic numbers, poverty hasn't existed for centuries, why do we feel so poor?
Let's not forget that people who read LW are often highly intelligent and have well-paying jobs, such as in software development.
This underlines what I find so incongruous about EY's argument. I think I genuinely felt richer as a child in the UK, eating free school meals but going to a nice school, with parents who owned a house, than I do as an obscenely-by-my-standards wealthy person in San Francisco. I'm hearing this elaborate theory to explain why social security doesn't work when I have lived through, and seen in others, clear evidence that it can and does. If the question “why hasn't a factor-100 increase in productivity felt like a factor-100 increase in productivity?” were leveled at my childhood specifically, my response is that actually it felt like exactly that.
By the standards of low-earning households my childhood was probably pretty atypical, and I don't mean to say there aren't major systemic issues, especially given the number of people locked into bad employment, people with lives destroyed by addiction, people who struggle to navigate economic systems, people trapped in abusive or ineffectual families, etc. etc. etc. I really don't want to present a case based just on my lived experience, even including those I know living various lives under government assistance. But equally, I think someone's lived experience of being wealthy in San Francisco and seeing drug addicts on the street is also not an unbiased view of what social security does for poverty.
Eg. a moderately smart person asking it to do something else by trying a few prompts. We're getting better at this for very simple properties but I still consider it unsolved there.
Reply to https://twitter.com/krishnanrohit/status/1794804152444580213, too long for twitter without a subscription so I threw it here, but do please treat it like a twitter comment.
rohit: Which part of [the traditional AI risk view] doesn't seem accounted for here? I admit AI safety is a 'big tent' but there's a reason they're congregated together.
You wrote in your list,
the LLMs might start even setting bad objectives, by errors of omission or commission. this is a consequence of their innards not being the same as people (either hallucinations or just not having world model or misunderstanding the world)
In the context of traditional AI risk views, this misses the argument. Roughly the concern is instead like so:
ASI is by definition very capable of doing things (aka. selecting for outcomes), in at least all the ways collections of humans can. It is both theoretically true and observably the case in reality that when things are selected for, a bunch of other things that aren't them get traded off, and that the stronger something is selected for, the more stuff ends up traded away, incidentally or not.
We should expect any ASI to have world-changing effects, and for those effects to trade off strongly against other things. There is a bunch of stuff we want that we don't want traded off (eg. being alive).
The first problem is that we don't know how to put any preferences into an AI such that it's robust to even trivial selection pressure, not in theory, not in practice on existing models, and certainly not in ways that would apply to arbitrary systems that indirectly contain ML models but aren't constrained by those models’ expressed preferences.
The second problem is that there are a bunch of instrumental goals (not eg. lying, but eg. continuing to have causal effect on the world) that are useful to almost all goals, and that are concrete examples of why an ASI would want to disempower humans. Aka. almost every thing that could plausibly be called an ASI will be effective at doing a thing, and the natural strategies for doing things involve not failing at them in easily-foreseeable ways.
Stuff like lying is not the key issue here. It often comes up because people say ‘why don’t we just ask the AI if it’s going to be bad’ and the answer is basically code for ‘you don’t seem to understand that we are talking about something that is trying to do a thing and is also good at it.’
Similarly for ‘we wouldn't even know why it chooses outcomes, or how it accomplishes them’ — these are problematic because they are yet another reason to rule out simple fixes, not because they are fundamental to the issue. Like, if you understand why a bridge falls down, you can make a targeted fix and solve that problem, and if you don’t know then probably it’s a lot harder. But you can know every line of code of Stockfish (pre-NNUE) and still not have a chance against it, because Stockfish is actively selecting for outcomes and it is better at selecting them than you.
“LLMs have already lied to us” from the traditional AI risk crowd is similarly not about LLM lying being intrinsically scary, it is a yell of “even here you have no idea what you are doing, even here you have these creations you cannot control, so how in the world do you expect any of this to work when the child is smarter than you and it’s actually trying to achieve something?”
It took me a good while reading this to figure out whether it was a deconstruction of tabooing words. I would have been less unsure if the post didn't keep replacing terms with ones that are both no less charged and no more descriptive of the underlying system, and then start drawing conclusions from the resulting terms' aesthetics.
With regards to Yudkowsky's takes, the key thing to keep in mind is that Yudkowsky started down his path by reasoning backwards from properties ASI would have, not from reasoning forward from a particular implementation strategy. The key reason to be concerned that outer optimization doesn't define inner optimization isn't a specific hypothesis about whether some specific strategy with neural networks will have inner optimizers, it's because ASI will by necessity involve active optimization on things, and we want our alignment techniques to have at least any reason to work in that regime at all.
There is no ‘the final token’ for weights not at the final layer.
Because that is where all the gradients flow from, and why the dog wags the tail.
Aggregations of things need not be of the same kind as their constituent things? This is a lot like calling an LLM an activation optimizer. While strictly in some sense true of the pieces that make up the training regime, it's also kind of a wild way to talk about things in the context of ascribing motivation to the resulting network.
I think maybe you're intending ‘next token prediction’ to mean something more like ‘represents the data distribution, as opposed to some metric on the output’, but if you are this seems like a rather unclear way of stating it.
You're at token i in a non-final layer. Which token's output are you optimizing for? i+1?
By construction, a decoder-only transformer is agnostic about which future token it should be informative for, within the context limit, except in the sense that it doesn't need to represent detail that will be more cheaply available from future tokens.
As a transformer is also unrolled in the context dimension, the architecture itself is effectively required to be generic both in what information it gathers and in where that information is used. Bias towards next-token prediction is not so much a consequence of reward in isolation as of competitive advantage: at position i, the network has an advantage in predicting token i+1 over the network at previous positions, by having more recent tokens, and an advantage over the network at future positions, by virtue of still needing to predict token i+1. However, if a token is more predictive of some abstract future token than of the next token specifically (say it's a name that might be referenced later), one would expect the dominant learnt effect to be non-myopically optimizing for later use in some timestamp-invariant way.
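A toy numpy sketch of the training setup being described (shapes and names like `per_position_loss` are my own, and random logits stand in for a real model): the loss has one cross-entropy term per position, and the causal mask makes whatever a position computes available to every later position's prediction, which is why there is no single ‘final token’ that non-final-layer weights serve.

```python
import numpy as np

# Toy decoder-style setup: T positions, each predicting the next token.
T, V = 5, 10                          # sequence length, vocab size (arbitrary)
rng = np.random.default_rng(0)

logits = rng.normal(size=(T, V))      # stand-in for per-position model outputs
targets = rng.integers(0, V, size=T)  # token at i+1, shifted into position i

def log_softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

# One cross-entropy term per position: gradient reaches the representation at
# position i both from its own next-token prediction and, via attention, from
# every later position that reads it.
per_position_loss = -log_softmax(logits)[np.arange(T), targets]
loss = per_position_loss.mean()

# Causal mask: position i attends to positions <= i, so information written at
# position i is available to every future position's prediction.
causal_mask = np.tril(np.ones((T, T), dtype=bool))

print(per_position_loss.shape)  # (5,): a loss term at every position
print(causal_mask[2])           # row 2 can see positions 0..2 only
```

The point of the sketch is just the shape of the objective: nothing in it singles out the last position, which is the sense in which the gradients ‘flow from everywhere’.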
Fundamentally, the story was about the failure cases of trying to make capable systems that don't share your values safe by preventing specific means by which their problem-solving capabilities express themselves in scary ways. This is different to what you are getting at here, which is having those systems actually, operationally share your values. A well-aligned system, in the traditional ‘Friendly AI’ sense of alignment, simply won't make the choices that the one in the story did.