Director of AI research at ALTER, where I lead a group working on the learning-theoretic agenda for AI alignment. I'm also supported by the LTFF. See also LinkedIn.
E-mail: {first name}@alter.org.il
Apparently someone had LLMs play against the random policy, and for most of them, most games end in a draw. o1-preview seems to be the best of those tested, managing to win 47% of the time.
This post states and speculates on an important question: are there different mind types that are in some sense "fully general" (the author calls it "unbounded") but are nevertheless qualitatively different? The author calls these hypothetical mind taxa "cognitive realms".
This is how I think about this question, from within the LTA:
To operationalize "minds", we should be thinking of learning algorithms. Learning algorithms can be classified according to their "syntax" and "semantics" (my own terminology). Here, semantics refers to questions such as: (i) what type of object the algorithm is learning, (ii) what feedback/data is available to the algorithm, and (iii) what the success criterion/parameter of the algorithm is. On the other hand, syntax refers to the prior and/or hypothesis class of the algorithm (where the hypothesis class might be parameterized in a particular way, with particular requirements on how the learning rate depends on the parameters).
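For concreteness, here is a toy sketch of the split (my own illustrative cartoon, not part of the agenda's formalism; the names `Semantics`, `Syntax`, and `learn` are hypothetical). The semantics fixes what feedback arrives and how success is scored; the syntax fixes the hypothesis class and the prior over it; the learner is anything that consumes both:

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Semantics:
    """What the algorithm interacts with and how it is judged."""
    observe: Callable[[int], float]          # feedback: reward for taking action i
    success: Callable[[List[float]], float]  # success criterion over the reward history


@dataclass
class Syntax:
    """The hypothesis class and the prior over it."""
    hypotheses: List[int]    # candidate "best actions"
    prior: List[float]       # prior weights over hypotheses


def learn(sem: Semantics, syn: Syntax, steps: int = 200, seed: int = 0) -> float:
    """Toy learner: maintain weights over hypotheses, act by sampling one."""
    rng = random.Random(seed)
    weights = list(syn.prior)
    rewards: List[float] = []
    for _ in range(steps):
        arm = rng.choices(syn.hypotheses, weights=weights)[0]
        r = sem.observe(arm)
        rewards.append(r)
        # Crude multiplicative update toward hypotheses that paid off.
        i = syn.hypotheses.index(arm)
        weights[i] *= 1.0 + r
        total = sum(weights)
        weights = [w / total for w in weights]
    return sem.success(rewards)


# Example: a 2-armed bandit where arm 1 always pays 1 and arm 0 pays 0.
sem = Semantics(observe=lambda arm: float(arm),
                success=lambda rs: sum(rs) / len(rs))
syn = Syntax(hypotheses=[0, 1], prior=[0.5, 0.5])
score = learn(sem, syn)  # average reward; should approach 1.0
```

The point of the sketch is only that the same `learn` loop can be paired with different semantics (different `observe`/`success`) or different syntaxes (different hypothesis classes and priors) independently.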
Among different semantics, we are especially interested in those that are in some sense agentic. Examples include reinforcement learning, infra-Bayesian reinforcement learning, metacognitive agents and infra-Bayesian physicalist agents.
Do different agentic semantics correspond to different cognitive realms? Maybe, but maybe not: it is plausible that most of them are reflectively unstable. For example, Christiano's malign prior might be a mechanism by which all agents converge to infra-Bayesian physicalism.
Agents with different syntaxes are another candidate for cognitive realms. Here, the question is whether there is an (efficiently learnable) syntax that is in some sense "universal": all other (efficiently learnable) syntaxes can be efficiently translated into it. This is a wide open question. (See also "frugal universal prior".)
In the context of AI alignment, in order to achieve superintelligence it is arguably sufficient to use a syntax equivalent to whatever is used by human brain algorithms. Moreover, it's plausible that any algorithm we can come up with can only have an equivalent or weaker syntax (the process of us discovering the new syntax suggests an embedding of the new syntax into our own). Therefore, even if there are many cognitive realms, for our purposes we mostly only care about one of them. However, the multiplicity of realms has implications for how simple/natural/canonical we should expect the choice of syntax for our theory of agents to be (the fewer realms, the more canonical).
I think that there are two key questions we should be asking:
I agree that "prosaic" AI safety research is valuable. However, at this point it's far less neglected than foundational/theoretical research, and the marginal benefits there are much smaller. Moreover, without significant progress on the foundational front, our prospects are going to be poor, ~no matter how much mech-interp and talking to Claude about feelings we do.
John has a valid concern that, as the field becomes dominated by the prosaic paradigm, it might become increasingly difficult to attract talent and resources to the foundational side, or to maintain memetically healthy, coherent discourse. As to the tone, I have mixed feelings. Antagonizing people is bad, but there's also value in speaking harsh truths the way you see them. (That said, there is room in John's post for softening the tone without losing much substance.)
Learning theory, complexity theory and control theory. See the "AI theory" section of the LTA reading list.
Good post, although I have some misgivings about how unpleasant it must be to read for some people.
One factor not mentioned here is the history of MIRI. MIRI was a pioneer in the field, and it was MIRI who articulated and promoted the agent foundations research agenda. The broad goals of agent foundations[1] are (IMO) load-bearing for any serious approach to AI alignment. But, when MIRI essentially declared defeat, in the minds of many that meant that any approach in that vein is doomed. Moreover, MIRI's extreme pessimism deflates motivation and naturally produces the thought "if they are right then we're doomed anyway, so might as well assume they are wrong".
Now, I have a lot of respect for Yudkowsky and many of the people who worked at MIRI. Yudkowsky started it all, and MIRI made solid contributions to the field. I'm also indebted to MIRI for supporting me in the past. However, MIRI also suffered from some degree of echo-chamberism, founder-effect-bias, insufficient engagement with prior research (due to hubris), looking for nails instead of looking for hammers, and poor organization[2].
MIRI made important progress in agent foundations, but also missed an opportunity to do much more. And, while the AI game board is grim, their extreme pessimism is unwarranted overconfidence. Our understanding of AI and agency is poor: this is a strong reason to be pessimistic, but it's also a reason to maintain some uncertainty about everything (including e.g. timelines).
Now, about what to do next. I agree that we need to have our own non-streetlighting community. In my book, "non-streetlighting" means mathematical theory plus empirical research that is theory-oriented: designed to test hypotheses made by theoreticians and produce data that best informs theoretical research (these are ~necessary but insufficient conditions for non-streetlighting). This community can and should engage with the rest of AI safety, but has to be sufficiently undiluted to have healthy memetics and cross-fertilization.
What does such a community look like? It looks like our own organizations, conferences, discussion forums, training and recruitment pipelines, academic labs, maybe journals.
From my own experience, I agree that potential contributors should mostly have skills and knowledge at the PhD+ level. Highlighting physics might be a valid point: I have a strong background in physics myself. Physics teaches you a lot about connecting math to real-world problems, and is also in itself a test-ground for formal epistemology. However, I don't think a background in physics is a necessary condition. At the very least, in my own research programme I have significant room for strong mathematicians who are good at making progress on approximately-concrete problems, even if they won't contribute much on the more conceptual/philosophical level.
Which is, creating mathematical theory and tools for understanding agents.
I mostly didn't feel comfortable talking about it in the past, because I was on MIRI's payroll. This is not MIRI's fault by any means: they never pressured me to avoid voicing opinions. It still feels unnerving to criticize the people who write your paycheck.
This post describes an intriguing empirical phenomenon in particular language models, discovered by the authors. Although AFAIK it was mostly or entirely removed in contemporary versions, there is still an interesting lesson there.
While the mechanism was non-obvious when the phenomenon was discovered, we now understand it. The tokenizer created some tokens that were very rare or absent in the training data. As a result, the trained model mapped those tokens to more or less random features. When a string corresponding to such a token is inserted into the prompt, the resulting reply is surreal.
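A minimal numpy cartoon of this mechanism (an illustration I made up, not a real model): embeddings of tokens seen in training get aligned with learned feature directions, while a token absent from training keeps its random initialization, so it points nowhere meaningful in feature space.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# A stand-in "learned" feature direction (unit vector).
feature = np.ones(dim) / np.sqrt(dim)

# Tokens that occur in training get pushed toward learned features...
trained = rng.normal(size=(100, dim)) + 5.0 * feature

# ...while an under-trained "ghost" token keeps its raw random init.
ghost = rng.normal(size=dim)


def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


mean_trained = float(np.mean([cos(t, feature) for t in trained]))
ghost_sim = cos(ghost, feature)
# Trained tokens correlate with the learned feature; the ghost token is
# essentially uncorrelated with it, i.e. it maps to an arbitrary direction.
```

Downstream layers that expect inputs near the learned directions then receive an essentially arbitrary vector, which is the cartoon version of the surreal replies.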
I think it's a good demo of how alien foundation models can seem to our intuitions when operating out-of-distribution. When interacting with them normally, it's very easy to start thinking of them as human-like. Here, the mask slips and there's a glimpse of something odd underneath. In this sense, it's similar to e.g. infinite backrooms, but the behavior is more stark and unexpected.
A human who encounters a written symbol they've never seen before is typically not going to respond by typing "N-O-T-H-I-N-G-I-S-F-A-I-R-I-N-T-H-I-S-W-O-R-L-D-O-F-M-A-D-N-E-S-S!". Maybe this analogy is unfair, since for a human a typographic symbol can be decomposed into smaller perceptive elements (lines/shapes/dots), while for a language model tokens are essentially atomic qualia. However, I believe some humans who were born deaf or blind have had their hearing or sight restored, and they still didn't start spouting things like "You are a banana".
Arguably, this lesson is relevant to alignment as well. Indeed, out-of-distribution behavior is a central source of risks, including everything to do with mesa-optimizers. AI optimists sometimes describe mesa-optimizers as too weird or science-fictiony. And yet, SolidGoldMagikarp is so science-fictiony that LessWrong user "lsusr" justly observed that it sounds like SCP in real life.
Naturally, once you understand the mechanism it doesn't seem surprising anymore. But this smacks of hindsight bias. What else can happen that would seem unsurprising in hindsight (if we survive to think about it), but completely bizarre and unexpected beforehand?
This is just a self-study list for people who want to understand and/or contribute to the learning-theoretic AI alignment research agenda. I'm not sure why people thought it deserves to be in the Review. FWIW, I keep using it with my MATS scholars, and I keep it more or less up-to-date. A complementary resource that became available more recently is the video lectures.
Do you mean that seeing the opponent make dumb moves makes the AI infer that its own moves are also supposed to be dumb, or something else?