I mean, that makes sense - perhaps more so than it does for Hells, if we allow arbitrarily smart deceptive adversaries - but now I'm wondering if your first sentence is a strawman.

I'm glad Jacob agrees that empowerment could theoretically help arbitrary entities achieve arbitrary goals. (I recall someone who was supposedly great at board games recommending it as a fairly general strategy.) I don't see how, if empowerment is compatible with almost any goal, it could prevent the AI from changing our goals whenever this is convenient.

Perhaps he thinks we can define "empowerment" to exclude this? Quick reaction: that seems likely to be FAI-complete, and somewhat unlikely to be a fruitful approach. My understanding of physics says that pretty much any action has a physical effect on our brains. Therefore, the definition of which changes to our brains "empower" and which "disempower" us may be doing all of the heavy lifting. How does this become easier to program than CEV?

Jacob responds: The distribution shift from humans born in 0AD to humans born in 2000AD seems fairly inconsequential for human alignment.

I now have additional questions. The above seems likely enough in the context of CEV (again), but otherwise false.

>FDT has bigger problems than that.

Does it. The post you linked does nothing to support that claim, and I don't think you've presented any actual problem which definitively wouldn't be solved by logical counterfactuals. (Would this problem also apply to real people killing terrorists, instead of giving in to their demands? Because zero percent of the people obeying FDT in that regard are doing so because they think they might not be real.) This post is actually about TDT, but it's unclear to me why the ideas couldn't be transferred.

I also note that 100% of responses in this thread, so far, appear to assume that your ghosts would need to have qualia in order for the argument to make sense. I think your predictions were bad. I think you should stop doing that, and concentrate on the object-level ideas.

Again, it isn't more resilient, and thinking you doubt a concept you call "qualia" doesn't mean you can doubt your own qualia. Perhaps the more important point here is that you are typically more uncertain of mathematical statements, which is why you haven't removed and cannot remove the need for logical counterfactuals.

Real humans have some degree of uncertainty about most mathematical theorems. There may be exceptions, like 0+1=1, or the halting problem and its application to God, but typically we have enough uncertainty about mathematics that we might need to consider counterfactuals. Indeed, this seems to be required by the theorem alluded to at the above link - logical omniscience seems logically impossible.

For a concrete (though unimportant) example of how regular people might use such counterfactuals in everyday life, consider P=NP. That statement is likely false. Yet, we can ask meaningful-sounding questions about what its truth would mean, and even say that the episode of 'Elementary' which dealt with that question made unjustified leaps. "Even if someone did prove P=NP," I find myself reasoning, "that wouldn't automatically entail what they're claiming."

Tell me if I've misunderstood, but it sounds like you're claiming we can't do something which we plainly do all the time. That is unconvincing. It doesn't get any more convincing when you add that maybe my experience of doing so isn't real. I am very confident that you will convince zero average people by telling them that they might not actually be conscious. I'm skeptical that even a philosopher would swallow that.

If you think you might not have qualia, then by definition you don't have qualia. This just seems like a restatement of the idea that we should act as if we were choosing the output of a computation. On its face, this is at least as likely to be coherent as 'What if the claim we have the most certainty of were false,' because the whole point of counterfactuals in general is to screen off potential contradictions.

The problem arises because, for some reason, you've assumed the ghosts have qualia. Now, that might be a necessary assumption if you require us to be uncertain about our degree of ghostliness. Necessary or not, though, it seems both dubious and potentially fatal to the whole argument.

That is indeed somewhat similar to the "Hansonian adjustment" approach to solving the Mugging, when larger numbers come into play. Hanson originally suggested that, conditional on the claim that 3^^^^3 distinct people will come into existence, we should need a lot of evidence to convince us we're the one with a unique opportunity to determine almost all of their fates. It seems like such claims should be penalized by a factor of 1/3^^^^3. We can perhaps extend this so it applies to causal nodes as well as people. That idea seems more promising to me than bounded utility, which implies that even a selfish agent would be unable to share many goals with its future self (and technically, even a simple expected value calculation takes time).
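To make the arithmetic behind that penalty concrete, here is a minimal sketch, not anything Hanson wrote. The function name, its parameters, and the stand-in population sizes are all my own assumptions for illustration; 3^^^^3 itself is far too large to represent, so smaller numbers take its place. The point is only that a 1/N penalty cancels the N in the payoff, so the expected value stops growing with the size of the claim.

```python
from fractions import Fraction


def penalized_expected_value(num_people, prior_before_penalty, value_per_person=1):
    """Expected value of a claim about num_people lives, after a 1/num_people penalty.

    All names here are hypothetical, chosen for this illustration only.
    """
    # Penalty for claiming to be the one agent who controls num_people fates.
    penalty = Fraction(1, num_people)
    probability = prior_before_penalty * penalty
    # The payoff scales with num_people, so the penalty cancels it exactly.
    return probability * num_people * value_per_person


# Stand-ins for "large" and "absurdly large" claims; both give the same result.
small_claim = penalized_expected_value(10**6, Fraction(1, 2))
huge_claim = penalized_expected_value(10**100, Fraction(1, 2))
print(small_claim, huge_claim)
```

Under these assumptions, the mugger gains nothing by naming a bigger number, which is the property that makes the adjustment attractive.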

Your numbers above are, at least, more credible than saying there's a 1/512 chance someone will offer you a chance to pick between a billion US dollars and one hundred million.

I may reply to this more fully, but first I'd like you to acknowledge that you cannot in fact point to a false prediction by EY here, and in the exact post you seemed to be referring to, he says that his view is compatible with this sort of AI producing realistic sculptures of human faces!

Maybe I don't understand the point of this example in which AI creates non-conscious images of smiling faces. Are you really arguing that, based on evidence like this, a generalization of modern AI wouldn't automatically produce horrific or deadly results when asked to copy human values?

Peripherally: that video contains simulacra of a lot more than faces, and I may have other minor objections in that vein.

ETA, I may want to say more about the actual human analysis which I think informed the AI's "success," but first let me go back to what I said about linking EY's actual words. Here is 2008-Eliezer:

Now you, finally presented with a tiny molecular smiley - or perhaps a very realistic tiny sculpture of a human face - know at once that this is not what you want to count as a smile.  But that judgment reflects an unnatural category, one whose classification boundary depends sensitively on your complicated values.  It is your own plans and desires that are at work when you say "No!"

Hibbard knows instinctively that a tiny molecular smileyface isn't a "smile", because he knows that's not what he wants his putative AI to do.  If someone else were presented with a different task, like classifying artworks, they might feel that the Mona Lisa was obviously smiling - as opposed to frowning, say - even though it's only paint.

without inevitably failing by instead only producing superficial simulacra of faces

That's clearly exactly what it does today? It seems I disagree with your point on a more basic level than expected.
