You are way more fallible than you think

Common question: "Well, but what if God was real and actually appeared to you in flame and glory, wouldn't it be silly to not be convinced in that case?"

My answer: "I don't know, do you think my thought patterns are likely to be deployed in such an environment?"

Transcript for Geoff Anders and Anna Salamon's Oct. 23 conversation

I think it can be reasonable to have 100% confidence in beliefs where the negation of the belief would invalidate the ability to reason, or to benefit from reason. Though with humans, I think it always makes sense to leave an epsilon for errors of reason.

Discussion with Eliezer Yudkowsky on AGI interventions

I don't think the verbal/pre-verbal stream of consciousness that describes our behavior to ourselves is identical with ourselves. But I do think our brain exploits it to exert feedback on its unconscious behavior, and that's a large part of how our morality works. So maybe this is still relevant for AI safety.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

That's true, but ... I feel in most cases, it's a good idea to run mixed strategies. I think that by naivety I mean the notion that any single strategy will handle all cases - even if there are strategies where this is true, it's wrong for almost all of them.

Humans can be stumped, but we're fairly good at dynamic strategy selection, which tends to protect us from being reliably exploited.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Well, one may develop an AI that handles noisy TV by learning that it can't predict the noisy TV. The idea was to give it a space that is filled with novelty reward, but doesn't lead to a performance payoff.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

What would stump a (naive) exploration-based AI? One may imagine a game like this: the player starts on the left side of a featureless room. If they go to the right side of the room, they win. In the middle of the room is a terminal. If one interacts with the terminal, one is kicked into an embedded copy of the original Doom.

An exploration-based agent would probably discern that Doom is way more interesting than the featureless room, whereas a human would probably put it aside at some point to "finish" exploring the starter room first. I think this demands a sort of mixed breadth-depth exploration?
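To make the thought experiment concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the 7-cell room, modeling "Doom" as an endless chain of never-before-seen frames, the discounted novelty lookahead, and the names `step`, `score`, `run` and `depth_budget`. It is not how EfficientZero or any real exploration method works; it only shows a pure novelty seeker getting absorbed by the terminal, while an agent with a depth budget shelves Doom and finishes the room.

```python
ACTIONS = ("left", "right", "interact", "exit")
ROOM_WIDTH = 7        # room positions 0..6; reaching position 6 wins
TERMINAL_POS = 3      # the terminal sits in the middle of the room
GOAL = ("room", ROOM_WIDTH - 1)
DISCOUNT = 0.9        # novelty found sooner counts for more


def step(state, action):
    """Toy dynamics. A state is ("room", x) or ("doom", frame)."""
    kind, value = state
    if kind == "doom":
        # Inside Doom, "exit" returns to the terminal; any other action
        # advances to the next frame.
        return ("room", TERMINAL_POS) if action == "exit" else ("doom", value + 1)
    if action == "left":
        return ("room", max(0, value - 1))
    if action == "right":
        return ("room", min(ROOM_WIDTH - 1, value + 1))
    if action == "interact" and value == TERMINAL_POS:
        return ("doom", 0)
    return state  # everything else is a no-op


def score(state, action, visited, horizon, count_doom):
    """Discounted number of unvisited states reachable by taking `action`."""
    nxt = step(state, action)
    is_new = nxt not in visited and (count_doom or nxt[0] == "room")
    if horizon <= 1 or nxt == GOAL:  # winning ends the episode
        return float(is_new)
    future = max(score(nxt, a, visited | {nxt}, horizon - 1, count_doom)
                 for a in ACTIONS)
    return is_new + DISCOUNT * future


def run(depth_budget=None, max_steps=60, horizon=5):
    """Return the step at which the agent wins, or None if it never does.

    depth_budget=None is the naive novelty seeker. An integer makes the
    agent leave Doom after that many consecutive frames and stop counting
    Doom frames as novel for the rest of the episode.
    """
    state, visited = ("room", 0), {("room", 0)}
    doom_steps, count_doom = 0, True
    for t in range(1, max_steps + 1):
        over_budget = depth_budget is not None and doom_steps >= depth_budget
        if state[0] == "doom" and over_budget:
            action, count_doom = "exit", False
        else:
            action = max(ACTIONS, key=lambda a: score(
                state, a, visited, horizon, count_doom))
        state = step(state, action)
        visited.add(state)
        doom_steps = doom_steps + 1 if state[0] == "doom" else 0
        if state == GOAL:
            return t
    return None


print("naive agent wins at step:", run())
print("depth-budgeted agent wins at step:", run(depth_budget=5))
```

In this toy, the room only ever offers a bounded amount of novelty and Doom offers an unbounded amount, so any agent that purely maximizes expected novelty prefers the terminal once it is in reach. The depth budget is the crudest possible version of "put it aside and finish the starter room first"; a less hacky agent would presumably learn when to shelve a branch rather than have it hard-coded.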

I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness

Sure, but that definition is so generic and applies to so many things that are obviously not like human pain (landslides?) that it lacks all moral compulsion.

I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness


Oh God! I am in horrible pain right now! For no reason, my body feels like it's on fire! Every single part of my body feels like it's burning up! I'm being burned alive! Help! Please make it stop! Help me!!

Okay, so that thing that I just said was a lie. I was not actually in pain (I can confirm this introspectively); instead, I merely pretended to be in pain.

Sir Ian McKellen has an instructive video.

The Turing test works for many things, but I don't think it works for checking for the existence of internal phenomenological states. If you asked me what GPT-3 was doing, I would expect it to be closer to "acting" than "experiencing."

(Why? Because the experience of pain is a means to an end, and the end is behavioral aversion. GPT-3 has no behavior to be averse to. If anything, I'd expect GPT-3 to "experience pain" during training - but of course, it's not aware while its weights are being updated. At the very least, I think no system that is trained offline can experience pain at all.)

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

I mostly see where you're coming from, but I think the reasonable answer to "point 1 or 2 is a false dichotomy" is this classic, uh, tumblr quote (from memory):

"People cannot just. At no time in the history of the human species has any person or group ever just. If your plan relies on people to just, then your plan will fail."

This goes especially if the thing that comes after "just" is "just precommit."

My expectation regarding interaction with Vassar is that the people who espouse 1 or 2 expect that the people interacting are incapable of precommitting to the required strength. I don't know if they're correct, but I'd expect them to be, because I think people are just really bad at precommitting in general. If precommitting were easy, I think we'd all be a lot more fit and get a lot more done. Also, Beeminder would be bankrupt.

Blood Is Thicker Than Water 🐬

I don't think Scott is claiming it's arbitrary; I think he's claiming it's subjective, which is to say instrumental. As Eliezer kept pointing out in the morality debates, subjective things are objective if you close over the observer - human (i.e., specific humans') morality is subjective, but not arbitrary, and certainly not unknowable.

But I also don't think that phylo categorization is stronger per se than niche categorization in predicting animal behavior, especially when it comes to relatively mutable properties like food consumption. Behavior, body shape, etc. are downstream of genes, but genes are cyclical with niche. And a lot of animals select their food opportunistically.

Phylo reveals information that niche doesn't. But niche also reveals information that is much harder to predict from phylo. I think Scott's objection goes against the absolutizing claim that "phylo is all you need."
