Christopher King


Even moreso if we’re talking about human observers, who form memories of what they’ve seen in the form of changes to the structure of their brains. Macroscopically different branches can’t “cancel” and more generally macroscopically different branches can’t interfere in a way that has any measurable effect.

Ah, but that's the crux of the issue. They can. How should Wigner's friend be performing inference?

I don't think Born's rule alone suffices. What happens if the branch "you" are in gets cancelled by another branch? It's not clear to me how you're supposed to do inference with just the Born rule. See also: https://www.lesswrong.com/posts/7A9rsJFLFqjpuxFy5/i-m-still-mystified-by-the-born-rule#Q1__What_hypothesis_is_QM_
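A toy illustration of the cancellation worry, as a sketch (the two-path setup and the numbers are illustrative, not from any particular experiment): two branches reach the same outcome with opposite amplitudes, so the Born rule applied to the summed amplitude gives a different answer than naively treating each branch as a separate world.

```python
import math

# Two "branches" arriving at the same final outcome with opposite amplitudes.
# Treating each branch as an independent world suggests the outcome has
# probability |a|^2 + |-a|^2, but the Born rule applies to the *summed*
# amplitude, which cancels to zero.
a = 1 / math.sqrt(2)
branch_1 = a    # amplitude via path 1
branch_2 = -a   # amplitude via path 2

p_naive = abs(branch_1) ** 2 + abs(branch_2) ** 2  # 1.0 -- branch-counting answer
p_born = abs(branch_1 + branch_2) ** 2             # 0.0 -- what's actually observed
```

The point is just that an observer inside one of the branches can't do inference by weighting branches independently; interference between them changes the observed statistics.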

In, eg, the theory of Solomonoff induction, a "hypothesis" is some method for generating a stream of sensory data, interpreted as a prediction of what we'll see. Suppose you know for a fact that reality is some particular state vector in some Hilbert space. How do you get out a stream of sensory data?

 

The trick is to not use MWI (and don't use Copenhagen either, of course). Use something else like pilot wave theory, or some other interpretation with a unique history. MWI is not suited to embedded agency.

I do agree that it provides some evidence against the idea, but I've read some people trying to dismiss AI risk in its entirety with the argument that it's sci-fi. That is obviously far too strong a conclusion to reach, because it would've prevented you from accepting the current harms.

A lot of the money comes from the bad traders. If you have no bad traders, the prices are correct.

A better mechanism, though, is to "subsidize the market": the person who wants the information incentivizes the market to collect it. In particular, you can set up subsidy schemes where the average cost to the subsidizer is proportional to the number of bits of information they gained.
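One standard scheme with roughly this property is Hanson's logarithmic market scoring rule (LMSR): the subsidizer's worst-case loss is bounded by b·ln(n) over n outcomes, and the subsidy paid out scales with the information traders contribute. A minimal sketch (parameter names are my own):

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)),
    where q[i] is the outstanding shares of outcome i."""
    return b * math.log(sum(math.exp(qi / b) for qi in q))

def lmsr_price(q, i, b=100.0):
    """Instantaneous price (implied probability) of outcome i."""
    z = sum(math.exp(qj / b) for qj in q)
    return math.exp(q[i] / b) / z

def trade_cost(q, dq, b=100.0):
    """What a trader pays to move outstanding shares from q to q + dq."""
    q_new = [qi + di for qi, di in zip(q, dq)]
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)
```

Prices always sum to 1 and move toward outcomes traders buy; the subsidizer (market maker) funds the gap between collected trade costs and final payouts, which is what pays traders for the information they reveal.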

Actually, I think the universal prior being malign does break this. (I thought it might be only a little malign, which would be okay, but after a little reading it appears it might be really malign!)

A crude example of how this might impact IMDEAR: while using Solomonoff induction to model the human, the prior sneakily inserts evil nanobots into the model of the bloodstream. (This specific issue can probably be patched, but there are more subtle ways it can mess up IMDEAR.)

Even creating a model of the simulation environment is messed up, since I planned on using inference for the difficult part.

The only thing we can hope for, I guess, is finding a different prior that isn't malign; for now we just leave the prior as a free variable. (See some of the pingbacks on the universal prior post for approaches in this direction.) But I'm not sure how likely we are to find such a prior. 🤔

Also, Paul Christiano has a proposal with similar requirements to IMDEAR, but at a lower tech level: Specifying a human precisely (reprise).

The alternative is to adjust IMDEAR to not use Solomonoff induction at all, and define/model everything directly, but this is probably much harder.

I believe this has been proposed before (I'm not sure what the first time was).

The main obstacle is that this still doesn't solve impact regularization, or a more generalized type of shutdownability than the one you presented.

'define a system that will let you press its off-switch without it trying to make you press the off-switch' presents no challenge at all to them...
...building a Thing all of whose designs and strategies will also contain an off-switch, such that you can abort them individually and collectively and then get low impact beyond that point. This is conceptually a part meant to prevent an animated broom with a naive 'off-switch' that turns off just that broom, from animating other brooms that don't have off-switches in them, or building some other automatic cauldron-filling process.

I think this might lead to the tails coming apart.

As our world exists, sentience and being a moral patient are strongly correlated. But I expect that, since AI comes from an optimization process, it will hit points where this stops being the case. In particular, I think there are edge cases where perfect models of moral patients are not themselves moral patients.
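The "tails come apart" effect can be seen in a standard statistical toy model (the setup and numbers here are illustrative, not specific to moral patienthood): a proxy strongly correlated with a true quantity overall decouples from it under heavy selection on the proxy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A proxy that is strongly correlated with the true quantity overall...
true_value = rng.normal(size=100_000)
proxy = true_value + rng.normal(size=100_000)  # true signal + independent noise
overall_corr = np.corrcoef(true_value, proxy)[0, 1]  # about 0.71

# ...but among the most extreme proxy scores, the relationship weakens:
# selecting hard on the proxy preferentially picks up the noise term.
top = proxy >= np.quantile(proxy, 0.999)
mean_proxy_top = proxy[top].mean()
mean_true_top = true_value[top].mean()  # roughly half of mean_proxy_top
```

An optimization process selecting on one member of a correlated pair (here, "looks like a moral patient") will land exactly in the regime where the correlation with the other member breaks down.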

From Some background for reasoning about dual-use alignment research:

Doing research but not publishing it has niche uses. If research would be bad for other people to know about, you should mainly just not do it.
