Thoth Hermes

Wiki Contributions

Analogies From AI Applied To Rationality

Comments

Announcing MIRI’s new CEO and leadership team

Arguments for optimism on AI Alignment (I don't endorse this version, will reupload a new version soon.)

But humans are capable of thinking about what their values "actually should be" including whether or not they should be the values evolution selected for (either alone or in addition to other things). We're also capable of thinking about whether things like wireheading are actually good to do, even after trying it for a bit.

We don't simply commit to tricking our reward systems forever and only doing that, for example.

So that overall suggests a level of coherency and consistency in the "coherent extrapolated volition" sense. Evolution enabled CEV without us becoming completely orthogonal to evolution, for example.

Announcing MIRI’s new CEO and leadership team

Thoth Hermes9mo10

Unfortunately, I do not have a long response prepared to answer this (and perhaps it would be somewhat inappropriate at this time). However, I wanted to express the following:

They wear their despair on their sleeves? I am admittedly somewhat surprised by this.

Thoth Hermes's Shortform

Thoth Hermes9mo10

"Up to you" means you can select better criteria if you think that would be better.

Dishonorable Gossip and Going Crazy

Thoth Hermes9mo54

I think if you ask people a question like, "Are you planning on going off and doing something / believing in something crazy?", they will, generally speaking, say "no" to that, and that is roughly more likely the more isomorphic your question is to that, even if you didn't exactly word it that way. My guess is that it was at least heavily implied that you meant "crazy" by the way you worded it.

To be clear, they might have said "yes" (that they will go and do the thing you think is crazy), but I doubt they will internally represent that thing, or wanting to do it, as "crazy." Thus the answer is probably going to be one of "no" (as a partial lie, where "no" indirectly points to the crazy assertion) or "yes" (also as a partial lie, pointing to taking the action).

In practice, people have a very hard time instantiating the status identifier "crazy" on themselves, and I don't think that can be easily dismissed.

I think the utility of the word "crazy" is heavily overestimated by you, given that there are many situations where the word cannot be used the same way by the people relevant to the conversation in which it is used. Words should have the same meaning to the people in the conversation, and since some people using this word are guaranteed to perceive it as hostile and some are not, that causes it to have asymmetrical meaning inherently.

I also think you've brought in too much risk of "throwing stones in a glass house" here. The LW memespace is, in my estimation, full of ideas besides Roko's Basilisk that I would also consider "crazy" in the same sense that I believe you mean it: Wrong ideas which are also harmful and cause a lot of distress.

Pessimism, submitting to failure and defeat, high "p(doom)", both MIRI and CFAR giving up (by considering the problems they wish to solve too inherently difficult, rather than concluding they must be wrong about something), and people being worried that they are "net negative" despite their best intentions, are all (IMO) pretty much the same type of "crazy" that you're worried about.

Our major difference, I believe, is in why we think these wrong ideas persist, and what causes them to be generated in the first place. The ones I've mentioned don't seem to be caused by individuals suddenly going nuts against the grain of their egregore.

I know this is a problem you've mentioned before and consider both important and unsolved, but I think it would be odd to hold both that it is notably worse in the LW community and that it is only the result of individuals going crazy on their own (and thus to conclude that the community's overall sanity can be reliably increased by ejecting those people).

By the way, I think "sanity" is a certain type of feature which is considerably "smooth under expectation," which means roughly that if p(person = insane) = 25%, that person should appear to be roughly 25% insane in most interactions. In other words, it's not the kind of probability where they appear to be sane most of the time, but you suspect that they might have gone nuts in some way that's hard to see or they might be hiding it.

The flip side of that is that if they only appear to be, say, 10% crazy in most interactions, then I would suggest lowering your assessment of their insanity to roughly that level.

That said, I still don't find this feature all that useful, though using it this way is preferable to treating it as a binary feature.

Dishonorable Gossip and Going Crazy

Thoth Hermes9mo4-2

Sometimes people want to go off and explore things that seem far away from their in-group, and perhaps are actively disfavored by their in-group. These people don't necessarily know what's going to happen when they do this, and they are very likely completely open to discovering that their in-group was right to distance itself from that thing, but also, maybe not.

People don't usually go off exploring strange things because they stop caring about what's true.

But if their in-group sees this as the person "no longer caring about truth-seeking," that is a pretty glaring red flag for that in-group.

Also, the gossip / ousting wouldn't be necessary if someone was already inclined to distance themselves from the group.

Like, to give an overly concrete example that is probably rude (and not intended to be very accurate to be clear), if at some point you start saying "Well I've realized that beauty is truth and the one way and we all need to follow that path and I'm not going to change my mind about this Ben and also it's affecting all of my behavior and I know that it seems like I'm doing things that are wrong but one day you'll understand why actually this is good" then I'll be like "Oh no, Ren's gone crazy".

"I'm worried that if we let someone go off and try something different, they will suddenly become way less open to changing their mind, and be dead set on thinking they've found the One True Way" seems like something weird to be worried about. (It also seems like something someone who actually was better characterized by this fear would be more likely to say about someone else!) I can see though, if you're someone who tends not to trust themselves, and would rather put most of their trust in some society, institution or in-group, that you would naturally be somewhat worried about someone who wants to swap their authority (the one you've chosen) for another one.

I sometimes feel a bit awkward when I write these types of criticisms, because they simultaneously seem:

Directed at fairly respected, high-level people.
Rather straightforwardly simple, intuitively obvious things (from my perspective, but I also know there are others who would see things similarly).
Directed at someone who by assumption would disagree, and yet, I feel like the previous point might make these criticisms feel condescending.

The only time people are actually incentivized to stop caring about the truth is when their in-group actively disfavors it by discouraging exploration. People don't usually unilaterally stop caring about the truth via purely individual motivations.

(In-groups becoming culty is also a fairly natural process, no matter what the original intent of the in-group was, so the default should be to assume that it has culty aspects, accept that as normal, and then work towards installing mitigations for the harmful aspects.)

Thoth Hermes's Shortform

Thoth Hermes9mo10

Not sure how convinced I am by your statement. Perhaps you can add to it a bit more?

What "the math" appears to say is that if it's bad to believe things because someone told them to me "well," then there would have to be some other, completely different set of criteria, one that has nothing to do with what I think of it, for performing the updates.

Don't you think that would introduce some fairly hefty problems?

Announcing MIRI’s new CEO and leadership team

Thoth Hermes9mo30

I suppose I have two questions which naturally come to mind here:

Given Nate's comment: "This change is in large part an enshrinement of the status quo. Malo’s been doing a fine job running MIRI day-to-day for many many years (including feats like acquiring a rural residence for all staff who wanted to avoid cities during COVID, and getting that venue running smoothly). In recent years, morale has been low and I, at least, haven’t seen many hopeful paths before us." (Bold emphases are mine). Do you see the first bold sentence as being in conflict with the second, at all? If morale is low, why do you see that as an indicator that the status quo should remain in place?
Why do you see communications as being as decoupled from research as you currently do (whether you think it inherently is, or that it should be)?

Thoth Hermes's Shortform

Thoth Hermes9mo10

Remember that what we take "communicated well" to mean is up to us. So I could, for example, raise my standard for that when you tell me "I bought a lottery ticket today." I could consider this not communicated well if you are unable to show me proof (such as the ticket itself and a receipt). Likewise, lies and deceptions are usually things that buckle when placed under a high enough burden of proof. If you are unable to procure proof for me, I can consider that "communicated badly" and thus update in the other (correct) direction.

"Communicated badly" is different from "communicated neither well nor badly." The latter might refer to when A is the proposition in question and one simply states "A" or when no proof is given at all. The former might refer to when the opposite is actually communicated - either because a contradiction is shown or because a rebuttal is made but is self-refuting, which strengthens the thesis it intended to shoot down.

Consider the situation where A is true, but you actually believe strongly that A is false. Therefore, because A is true, it is possible that you witness proofs for A that seem to you to be "communicated well." But if you're sure that A is false, you might be led to believe that my thesis, the one I've been arguing for here, is in fact false.

I consider that to be an argument in favor of the thesis.

Thoth Hermes's Shortform

Thoth Hermes10mo10

If I'm not mistaken, if A = "Dagon has bought a lottery ticket this week" and B = Dagon states "A", then I still think p(A | B) > p(A), even if it's possible you're lying. I think the only way it would be less than the base rate p(A) is if, for some reason, I thought you would only say that if it was definitely not the case.
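To make the inequality concrete, here is a minimal sketch of the update using Bayes' rule. The specific probabilities are made-up assumptions for illustration only, not estimates of anything about Dagon. The point it shows is that p(A | B) exceeds p(A) whenever the statement is more likely to be made when A is true than when it is false, and drops below p(A) only in the "would only say it if it were not the case" scenario.

```python
def posterior(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) from P(A), P(B | A), and P(B | not A) via Bayes' rule."""
    evidence = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("B has probability zero under these assumptions")
    return p_b_given_a * prior / evidence


# Hypothetical base rate for A = "Dagon has bought a lottery ticket this week".
prior = 0.05

# Case 1: the speaker sometimes lies, but still says "A" more often when A is true.
mostly_honest = posterior(prior, p_b_given_a=0.5, p_b_given_not_a=0.01)

# Case 2: the speaker says "A" almost only when A is false.
contrarian = posterior(prior, p_b_given_a=0.01, p_b_given_not_a=0.5)

# Case 3: the statement is equally likely either way, so it carries no information.
uninformative = posterior(prior, p_b_given_a=0.2, p_b_given_not_a=0.2)

print(f"prior          = {prior:.4f}")
print(f"mostly honest  = {mostly_honest:.4f}")
print(f"contrarian     = {contrarian:.4f}")
print(f"uninformative  = {uninformative:.4f}")
```

Under these assumed numbers, only the second case pushes the posterior below the base rate, which matches the exception described above.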
