## LessWrong

Zack_M_Davis

Open & Welcome Thread - September 2020

I've reliably used the word "threat" to simply mean signaling some kind of intention of inflicting some kind of punishment in response to some condition by the other person. Curi and other people from FI have done this repeatedly, and the "list of people who have evaded/lied/etc." is exactly one of such threats, whether explicitly labeled as such or not.

This game-theoretic concept of "threat" is fine, but underdetermined: what counts as a threat in this sense depends on where the "zero point" is; what counts as aggression versus self-defense depends on what the relevant "property rights" are. (Scare quotes on "property rights" because I'm not talking about legal claims, but "property rights" is an apt choice of words, because I'm claiming that the way people negotiate disputes that don't rise to the level of dragging in the (slow, expensive) formal legal system has a similar structure.)

If people have a "right" to not be publicly described as lying, evading, &c., then someone who puts up a "these people lied, evaded, &c." page on their own website is engaging in a kind of aggression. The page functions as a threat: "If you don't keep engaging in a way that satisfies my standards of discourse, I'll publicly call you a liar, evader, &c.."

If people don't have a "right" to not be publicly described as lying, evading, &c., then a website administrator who cites a user's "these people lied, evaded, &c." page on their own website as part of a rationale for banning that user, is engaging in a kind of aggression. The ban functions as a threat: "If you don't cede your claim on being able to describe other people as lying, evading, &c., I won't let you participate in this forum."

The size of the website administrator's threat depends on the website's "market power." Less Wrong is probably small enough and niche enough that the threat doesn't end up controlling anyone's off-site behavior: anyone who perceives not being able to post on Less Wrong as a serious threat is probably already so deeply socially embedded in our little robot cult that they either have similar property-rights intuitions as the administrators, or are too loyal to the group to publicly accuse other group members of lying, evading, &c., even if they privately think they are lying, evading, &c.. (Nobody likes self-styled whistleblowers!) But getting kicked off a service with the market power of a Google, Facebook, Twitter, &c. is a sufficiently big deal to sufficiently many people that those websites' terms-of-service do exert some controlling pressure on the rest of Society.

What are the consequences of each of these "property rights" regimes?

In a world where people have a right to not be publicly described as lying, evading, &c., then people don't have to be afraid of losing reputation on that account. But we also lose out on the possibility of having a public accounting of who has actually in fact lied, evaded, &c.. We give up on maintaining the coordination equilibrium such that words like "lie" have a literal meaning that can actually be true or false, rather than the word itself simply constituting an attack.

Which regime better fulfills our charter of advancing the art of human rationality? I don't think I've written this skillfully enough for you to not be able to guess what answer I lean towards, but you shouldn't trust my answer if it seems like something I might lie or evade about! You need to think it through for yourself.

Causal Reality vs Social Reality

No problem. Hope your research is going well!

(Um, as long as you're initiating an interaction, maybe I should mention that I have been planning to very belatedly address your concern about premature abstraction potentially functioning as a covert meta-attack by putting up a non-Frontpagable "Motivation and Political Context for My Philosophy of Language Agenda" post in conjunction with my next philosophy-of-language post? I'm hoping that will make things better rather than worse from your perspective? But if not, um, sorry.)

Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem

Can I also point to this as (some amount of) evidence against concerns that "we" (members of this stupid robot cult that I continue to feel contempt for but don't know how to quit) shouldn't try to have systematically truthseeking discussions about potentially sensitive or low-status subjects because guilt-by-association splash damage from those conversations will hurt AI alignment efforts, which are the most important thing in the world? (Previously: 1 2 3.)

Like, I agree that some nonzero amount of splash damage exists. But look! The most popular AI textbook, used in almost fifteen hundred colleges and universities, clearly explains the paperclip-maximizer problem, in the authorial voice, in the first chapter. "These behaviors are not 'unintelligent' or 'insane'; they are a logical consequence of defining winning as the sole objective for the machine." Italics in original! I couldn't transcribe it, but there's even one of those pay-attention-to-this triangles (◀) in the margin, in teal ink.

Everyone who gets a CS degree from this year onwards is going to know from the teal ink that there's a problem. If there was a marketing war to legitimize AI risk, we won! Now can "we" please stop using the marketing war as an excuse for lying?!

Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem

It is an "iff" in §16.7.2 "Deference to Humans", but the toy setting in which this is shown is pretty impoverished. It's a story problem about a robot Robbie deciding whether to book an expensive hotel room for busy human Harriet, or whether to ask Harriet first.

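Formally, let P(u) be Robbie's prior probability density over Harriet's utility for the proposed action a. Then the value of going ahead with a is

$$EU(a) = \int_{-\infty}^{\infty} P(u) \cdot u \, du = \int_{-\infty}^{0} P(u) \cdot u \, du + \int_{0}^{\infty} P(u) \cdot u \, du.$$

(We will see shortly why the integral is split up this way.) On the other hand, the value of action d, deferring to Harriet, is composed of two parts: if u > 0 then Harriet lets Robbie go ahead, so the value is u, but if u < 0 then Harriet switches Robbie off, so the value is 0:

$$EU(d) = \int_{-\infty}^{0} P(u) \cdot 0 \, du + \int_{0}^{\infty} P(u) \cdot u \, du.$$

Comparing the expressions for EU(a) and EU(d), we see immediately that

$$EU(d) \geq EU(a)$$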

because the expression for EU(d) has the negative-utility region zeroed out. The two choices have equal value only when the negative region has zero probability—that is, when Robbie is already certain that Harriet likes the proposed action.

(I think this is fine as a topic-introducing story problem, but agree that the sentence in Chapter 1 referencing it shouldn't have been phrased to make it sound like it applies to machines-in-general.)
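For anyone who wants to poke at the toy setting numerically, here is a minimal sketch. It assumes a Gaussian prior over Harriet's utility u, which is my choice for illustration and not something the textbook specifies; the function name `expected_utilities` and the integration bounds are likewise hypothetical. It approximates EU(a) and EU(d) with a midpoint Riemann sum, scoring the negative-utility region as 0 for the deferring action.

```python
from statistics import NormalDist


def expected_utilities(prior, lo=-10.0, hi=10.0, steps=20_000):
    """Approximate EU(a) and EU(d) by a midpoint Riemann sum over [lo, hi].

    `prior` is Robbie's belief about Harriet's utility u (assumed Gaussian
    here purely for illustration). The bounds must cover nearly all of the
    prior's probability mass for the approximation to be meaningful.
    """
    du = (hi - lo) / steps
    eu_act = 0.0    # EU(a): go ahead without asking
    eu_defer = 0.0  # EU(d): ask Harriet first
    for i in range(steps):
        u = lo + (i + 0.5) * du
        weight = prior.pdf(u) * du
        eu_act += weight * u               # Robbie gets u whatever its sign
        eu_defer += weight * max(u, 0.0)   # Harriet vetoes when u < 0, so value 0
    return eu_act, eu_defer


# Uncertain Robbie: a sizable chunk of the prior sits below zero.
uncertain = expected_utilities(NormalDist(mu=1.0, sigma=2.0))

# Nearly certain Robbie: essentially no prior mass below zero.
confident = expected_utilities(NormalDist(mu=5.0, sigma=0.5))

print("uncertain:  EU(a)=%.4f  EU(d)=%.4f" % uncertain)
print("confident:  EU(a)=%.4f  EU(d)=%.4f" % confident)
```

With the uncertain prior, deferring comes out strictly ahead; with the confident prior, the two values agree to within numerical error, which is the "equal only when the negative region has zero probability" case. Note that this only models the impoverished story problem (Harriet answers costlessly and correctly), so it says nothing about machines in general.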

Decoherence is Falsifiable and Testable

It's mentioned in passing in the "Technical Explanation" (but yes, not a full independently-linkable post):

Humans are very fond of making their predictions afterward, so the social process of science requires an advance prediction before we say that a result confirms a theory. But how humans may move in harmony with the way of Bayes, and so wield the power, is a separate issue from whether the math works. When we’re doing the math, we just take for granted that likelihood density functions are fixed properties of a hypothesis and the probability mass sums to 1 and you’d never dream of doing it any other way.

How easily can we separate a friendly AI in design space from one which would bring about a hyperexistential catastrophe?

Sleep is very important! Get regular sleep every night! Speaking from personal experience, you don't want to have a sleep-deprivation-induced mental breakdown while thinking about Singularity stuff!

Tofly's Shortform

Yudkowsky addresses some of these objections in more detail in "Intelligence Explosion Microeconomics".

Sherrinford's Shortform

The wikipedia article, as far as I can see, explains in that paragraph where the neoreactionary movement originated.

It's not true, though! The article claims: "The neoreactionary movement first grew on LessWrong, attracted by discussions on the site of eugenics and evolutionary psychology".

I mean, okay, it's true that we've had discussions on eugenics and evolutionary psychology, and it's true that a few of the contrarian nerds who enthusiastically read Overcoming Bias back in the late 'aughts were also a few of the contrarian nerds who enthusiastically read Unqualified Reservations. But "first grew" (Wikipedia) and "originated" (your comment) really don't seem like a fair summary of that kind of minor overlap in readership. No one was doing neoreactionary political theorizing on this website. Okay, I don't have an exact formalization of what I mean by "no one" in the previous sentence because I haven't personally read and remembered every post in our archives; maybe there are nonzero posts with nonnegative karma that could be construed to match this description. Still, in essence, you can only make the claim "true" by gerrymandering the construal of those words.

And yet the characterization will remain in Wikipedia's view of us—glancing at the talk page, I don't expect to win an edit war with David Gerard.

Generalized Efficient Markets in Political Power

It gets worse. We also face coordination problems on the concepts we use to think with. In order for language to work, we need shared word definitions, so that the probabilistic model in my head when I say a word matches up with the model in your head when you hear the word. A leader isn't just in a position to coordinate what the group does, but also which aspects of reality the group is able to think about.

Open & Welcome Thread - July 2020

I'm disappointed that the LaTeX processor doesn't seem to accept \nicefrac ("TeX parse error: Undefined control sequence \nicefrac"), but I suppose \frac will suffice.