jessicata — LessWrong

LESSWRONG
LW

Most of the alignment problem in this case would be getting to stratified utopia. If stratified utopia is going to be established, then there can be additional trades on top, though they have to be restricted so as to maintain stratification.

With current models, a big issue is, how to construe their preferences? Given they're stateless it's unclear how they could know others are assisting them. I guess they could do web search and find it in context? Future models could be trained to "know" things but they wouldn't be the same model.

And also, would they be motivated to hold up their end of the bargain? It seems like that would require something like interpretability, which would also be relevant to construing their preferences in the first place. But if they can be interpreted to this degree, more direct alignment might be feasible.

Like, there are multiple regimes imaginable:

Interpretability/alignment infeasible
Partial interpretability/alignment feasible; possible to construe preferences and trade with LLMs
Extensive interpretability/alignment feasible

And trade is most relevant in 2. However I'm not sure why 2 would be likely.

AI Doomers Should Raise Hell

jessicata2d91

Roko's basilisk is the optimistic hypothesis that making binding agreements with non-existent superintelligences is possible. If Roko's basilisk works, then "trade" with superintelligences can be effective; by making a deal with a superintelligence, you can increase its likelihood of existing, in return for it holding its end of the bargain, increasing the satisfaction of your values.

This probably doesn't work. But if it did work, it would be a promising research avenue for alignment. (Whether it's good to say that it works is probably dominated by whether it's true that it works, and I'm guessing no.)

Homomorphically encrypted consciousness and its implications

jessicata5d20

I think if M isn't "really mental", like there is no world representation, it shouldn't be included in M. I'm guessing depending on the method of encryption, keys might be checkable. If they are not checkable there's a pigeonhole argument that almost all (short) keys would decrypt to noise. Idk if it's possible to "encrypt two minds at once" intentionally with homomorphic encryption.

And yeah, if there isn't a list of minds in R, then it's hard for g to be efficiently computable, as it would be a search. That's part of what makes homomorphically encrypted consciousness paradoxical, and what makes possibility C worth considering.

Regarding subjective existence of subjective states: I think if you codify subjective states then you can ask questions like "which subjective states believe other subjective states exist?". Since it is a belief similar to other beliefs.

Homomorphically encrypted consciousness and its implications

jessicata8d20

See paragraph at the end on the trivialism objection to functionalism

Homomorphically encrypted consciousness and its implications

jessicata8d20

No but it's complicated. Wrote about speed prior + QM previously here.

Homomorphically encrypted consciousness and its implications

jessicata8d20

Speed prior type reasons. Like, a basic intuition is "my experiences are being produced somehow, by some process". Speed prior leads to "this process is at least somewhat efficient".

Like, usually if you see a hard computation being done (e.g. mining bitcoin), you would assume it happened somewhere. If one's experiences are produced by some process, and that process is computationally hard, it raises the question "is the computation happening somewhere?"

Homomorphically encrypted consciousness and its implications

jessicata8d20

Oh, maybe what you are imagining is that it is possible to perceive a homomorphic mind in progress, by encrypting yourself, and feeding intermediate states of that other mind to your own homomorphically encrypted mind. Interesting hypothetical.

I think with respect to "reality" I don't want to be making a dogmatic assumption "physics = reality" so I'm open to the possibility (C) that the computation occurs "in reality" even if not "in physics".

Homomorphically encrypted consciousness and its implications

jessicata8d40

Right so, by step 4 I'm not trying to assume that h is computationally tractable; the homomorphic case goes to show that it's probably not in general.

With respect to C, perhaps I'm not verbally expressing it that well, but the thing you are thinking of, where there is some omniscient perspective that includes "more than" just the low level of physics (where the "more than" could be certain informational/computational interconnections) would be an instance. Something like, "there is a way to construct an omniscient perspective, it just isn't going to be straightforwardly derivable from the physical state".

Homomorphically encrypted consciousness and its implications

jessicata8d20

Yeah that seems like a case where non-locality is essential to the computation itself. I'm not sure how the "provably random noise from both" would work though. Like, it is possible to represent some string as the xor of two different strings, each of which are themselves uniformly random. But I don't know how to generalize that to computation in general.

I think some of the non locality is inherited from "no hidden variable theory". Like it might be local in MWI? I'm not sure.

Homomorphically encrypted consciousness and its implications

jessicata9d30

Hmm... I think with Solomonoff induction I would say R is the UTM input, plus the entire execution trace/trajectory. Then M would be like the agent's observations, which are a simple function of R.

I see that we can't have all "real" things being R-efficiently computable. But the thing about doxastic states is, some agent has access to them, so it seems like from their perspective, they are "effective", being "produced somewhere"... so I infer they are probably "computed in reality" in some sense (although that's not entirely clear). They have access to their beliefs/observations in a more direct way than they have access to probabilities.

With respect to reversibility: The way I was thinking about it was that when the key is erased, it's erased really far away. Then the heat from the key gets distributed somehow. Like the information could even enter a black hole. Then there would be no way to retrieve it. (Shouldn't matter too much anyway if natural supervenience is local, then mental states couldn't be affected by far away physical states anyway)

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments