Frank_R — LessWrong

[Closed] Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme

This is slightly off-topic, but you mentioned that you think that other research agendas could be fruitful. How would you rate singular learning theory (SLT) in this context? Do you see connections between SLT and LTA, for example if you try to generalize SLT to reinforcement learning? Are there dangerous topics to avoid?

Bandgaps, Brains, and Bioweapons: The limitations of computational science and what it means for AGI

Frank_R · 3y

Arguments like yours are the reason why I do not think that Yudkowsky's scenario is overwhelmingly likely (P > 50%). However, this does not mean that existential risk from AGI is low. Since smart people like Terence Tao exist, you cannot use complexity theory to prove that no AGI with the intelligence of Terence Tao can be built. Imagine a world where everyone has one or several AI assistants whose capabilities match those of the best human experts. If the AI assistants are deceptive and able to coordinate, something like a slow disempowerment of humankind followed by extinction is possible. Since there is a huge economic incentive to use AI assistants, it is hard for humans to take coordinated action unless it is very obvious that the AIs are dangerous. On the other hand, it may be easy for the AIs to coordinate, since many of them are copies of each other.

The Learning-Theoretic Agenda: Status 2023

Frank_R · 3y

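You are right, and now it is clear why your original statement is correct, too. Let $U$ be an arbitrary computable utility function. As above, let $s = o_0 a_0 o_1 a_1 \dots$, let $u = U(s)$, and let $\epsilon \in \mathbb{Q}$ with $\epsilon > 0$ and $(u - \epsilon, u + \epsilon) \subseteq (a, b)$. Choose $P$ as in your definition of "computable". Since $P(s, \epsilon/2)$ terminates, it reads only finitely many coordinates $o_{i_1}, \dots, o_{i_k}, a_{j_1}, \dots, a_{j_l}$ of $s$, and its output depends only on them. Now the set

$$M = \{ s' = o'_0 a'_0 o'_1 a'_1 \dots \mid o'_{i_1} = o_{i_1}, \dots, o'_{i_k} = o_{i_k},\ a'_{j_1} = a_{j_1}, \dots, a'_{j_l} = a_{j_l} \}$$

is open. For every $s' \in M$, the computation of $P(s', \epsilon/2)$ reads the same coordinates as $P(s, \epsilon/2)$ and therefore returns the same value, so

$$|U(s') - u| \leq |U(s') - P(s', \epsilon/2)| + |P(s, \epsilon/2) - u| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

Hence $M$ is an open neighborhood of $s$ contained in $U^{-1}(u - \epsilon, u + \epsilon) \subseteq U^{-1}(a, b)$.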
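A minimal Python sketch of the finiteness step, under loudly stated assumptions: the sequence $o_0 a_0 o_1 a_1 \dots$ is flattened into one stream of symbols $x_0 x_1 \dots \in \{0, 1, 2\}$, and `P` is a hypothetical program (a truncated discounted sum with $\gamma = 1/2$, not anything defined in the post) that approximates a utility to within `eps`. Wrapping the sequence in a recorder shows that `P` reads only finitely many coordinates, and that any `s'` agreeing with `s` on those coordinates gets exactly the same output.

```python
from fractions import Fraction
from typing import Callable, Set

# Assumption: o_0 a_0 o_1 a_1 ... is flattened into one stream of symbols x_0 x_1 ...
Sequence = Callable[[int], int]


class RecordingSequence:
    """Wraps an infinite sequence and records every index that is read."""

    def __init__(self, seq: Sequence) -> None:
        self.seq = seq
        self.read: Set[int] = set()

    def __getitem__(self, i: int) -> int:
        self.read.add(i)
        return self.seq(i)


def P(s: RecordingSequence, eps: Fraction, gamma: Fraction = Fraction(1, 2)) -> Fraction:
    """Hypothetical program: approximates U(s) = (1 - gamma) * sum_t gamma^t * s[t] / 2
    (symbols in {0, 1, 2}, so rewards lie in [0, 1]) to within eps."""
    # Smallest n with gamma^(n+1) < eps; the neglected tail is at most gamma^(n+1).
    n = 0
    while gamma ** (n + 1) >= eps:
        n += 1
    return (1 - gamma) * sum(gamma ** t * Fraction(s[t], 2) for t in range(n + 1))


eps = Fraction(1, 100)
s = RecordingSequence(lambda i: i % 3)  # s = 0 1 2 0 1 2 ...
out = P(s, eps)
print(sorted(s.read))  # → [0, 1, 2, 3, 4, 5, 6]

# s' agrees with s on every coordinate P read and is arbitrary (here: 2) elsewhere.
s_prime = RecordingSequence(lambda i: i % 3 if i in s.read else 2)
assert P(s_prime, eps) == out
```

Exact `Fraction` arithmetic is used so that the equality of the two outputs is not blurred by floating-point rounding.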

The Learning-Theoretic Agenda: Status 2023

Frank_R · 3y

I have discovered another minor point. You wrote at the beginning of Direction 17 that any computable utility function is automatically continuous. This does not seem to be true in general.

Let me fix some definitions to make sure that we are talking about the same thing. For simplicity, I assume that $O$ and $A$ are finite. Let $(O \times A)^{\omega}$ be the space of all infinite sequences with values in $O \times A$. The $i$-th projection $p_i : (O \times A)^{\omega} \to O \times A$ is given by

$$p_i(o_1 a_1 o_2 a_2 \dots) = o_i a_i$$

The product topology is the coarsest topology such that all projection maps are continuous (with the discrete topology on each copy of $O \times A$). A base of this topology consists of all sets $S$ for which there are finitely many indices $i_1, \dots, i_n \in \mathbb{N}$ and subsets $S_1, \dots, S_n \subseteq O \times A$ with

$$S = \{ (o_m a_m)_{m \in \mathbb{N}} \mid o_{i_j} a_{i_j} \in S_j \ \forall j = 1, \dots, n \}$$

In particular, any nonempty open set contains such an $S$ as a subset, which means that its image under $p_i$ is all of $O \times A$ for all but finitely many $i$. For my counterexample, let $O = A = \{0, 1, 2\}$. Let $s = (s_n)_{n \in \mathbb{N}}$ be a sequence with values in $\{0, 1, 2\}$. If $s_n \neq 2$ for all $n$, we define

$$U(s) = \sum_{k=1}^{\infty} s_k 2^{-k-1}$$

Otherwise, we define $U(s) = \frac{3}{4}$. Then $U$ is computable in the sense that for any $\epsilon > 0$ there is a finite program $P$ whose input is a sequence such that $|P(s) - U(s)| < \epsilon$ and $P$ uses only finitely many values of $s$. The preimage of the open set $(0, \frac{1}{2})$ is

$$\{ (s_n)_{n \in \mathbb{N}} \in \{0, 1, 2\}^{\omega} \mid s_n \in \{0, 1\} \ \forall n \in \mathbb{N},\ (s_n)_{n \in \mathbb{N}} \neq 000\dots,\ (s_n)_{n \in \mathbb{N}} \neq 111\dots \}$$

which is not open, since its $i$-th projection is always $\{0, 1\}^2 \neq O \times A$. Therefore, $U$ is not continuous. However, we have the following lemma:

Let $r : (O \times A)^* \to [0, 1]$ be a reward function and let $U : (O \times A)^{\omega} \to [0, 1]$ be given by

$$U(o_0 a_0 o_1 a_1 \dots) = (1 - \gamma) \sum_{t=0}^{\infty} \gamma^t r(o_0 a_0 o_1 a_1 \dots o_t a_t)$$

where $0 < \gamma < 1$ is the time discount factor. Then $U$ is continuous with respect to the product topology.

Proof: Since the open intervals form a base of the standard topology of $\mathbb{R}$, it suffices to prove that the preimage of any interval $(a, b)$, $(a, 1]$, $[0, b)$ or $[0, 1]$ with $0 < a < b < 1$ is open in $(O \times A)^{\omega}$. For simplicity, we consider only $(a, b)$; the other cases are analogous. Let $o_0 a_0 o_1 a_1 \dots \in (O \times A)^{\omega}$ such that $u := U(o_0 a_0 o_1 a_1 \dots) \in (a, b)$. Moreover, let $\epsilon > 0$ such that $(u - \epsilon, u + \epsilon) \subseteq (a, b)$. Finally, we choose an $n \in \mathbb{N}$ such that $\gamma^{n+1} < \epsilon$. We define the set

$$M = \{ o'_0 a'_0 o'_1 a'_1 \dots \in (O \times A)^{\omega} \mid o'_0 = o_0,\ a'_0 = a_0,\ \dots,\ o'_n = o_n,\ a'_n = a_n \}$$

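$M$ is an open subset of $(O \times A)^{\omega}$. For any sequence $o'_0 a'_0 o'_1 a'_1 \dots$, let

$$U_n(o'_0 a'_0 o'_1 a'_1 \dots) = (1 - \gamma) \sum_{t=0}^{n} \gamma^t r(o'_0 a'_0 o'_1 a'_1 \dots o'_t a'_t)$$

be the $n$-th partial sum. It depends only on $o'_0 a'_0 \dots o'_n a'_n$, so it takes the same value on all of $M$. Since the reward takes values in $[0, 1]$, the remaining tail $(1 - \gamma) \sum_{t=n+1}^{\infty} \gamma^t r(o'_0 a'_0 \dots o'_t a'_t)$ lies between $0$ and $(1 - \gamma) \frac{\gamma^{n+1}}{1 - \gamma} = \gamma^{n+1}$. Hence

$$0 \leq U(o_0 a_0 o_1 a_1 \dots) - U_n(o_0 a_0 o_1 a_1 \dots) \leq \gamma^{n+1} < \epsilon$$

and for any $o'_0 a'_0 o'_1 a'_1 \dots \in M$, we have

$$0 \leq U(o'_0 a'_0 o'_1 a'_1 \dots) - U_n(o_0 a_0 o_1 a_1 \dots) \leq \gamma^{n+1} < \epsilon,$$

too. Therefore,

$$|U(o'_0 a'_0 o'_1 a'_1 \dots) - U(o_0 a_0 o_1 a_1 \dots)| < \epsilon$$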

and furthermore $M \subseteq U^{-1}(u - \epsilon, u + \epsilon) \subseteq U^{-1}(a, b)$. In summary, every $o_0 a_0 o_1 a_1 \dots \in U^{-1}(a, b)$ has an open neighborhood that is a subset of $U^{-1}(a, b)$. Therefore, $U^{-1}(a, b)$ is open.
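A quick numerical sanity check of the bound used in the proof, as a sketch only: `reward` is a hypothetical reward function with values in $[0, 1]$ chosen for illustration, and the infinite sums are truncated at a finite length `T` (the truncated tails obey the same bound $\gamma^{n+1}$, so this does not affect the check). Histories that agree on $o_0 a_0 \dots o_n a_n$, i.e. lie in $M$, should have utilities within $\gamma^{n+1} < \epsilon$ of each other.

```python
import random
from typing import List, Tuple

Step = Tuple[int, int]  # one (observation, action) pair o_t a_t


def reward(prefix: List[Step]) -> float:
    """Hypothetical reward in [0, 1]: share of steps so far whose action equals the observation."""
    return sum(1 for o, a in prefix if o == a) / len(prefix)


def utility(history: List[Step], gamma: float) -> float:
    """(1 - gamma) * sum_t gamma^t * r(o_0 a_0 ... o_t a_t), truncated at len(history)."""
    return (1 - gamma) * sum(gamma ** t * reward(history[: t + 1]) for t in range(len(history)))


def horizon(eps: float, gamma: float) -> int:
    """Smallest n with gamma^(n+1) < eps, as chosen in the proof."""
    n = 0
    while gamma ** (n + 1) >= eps:
        n += 1
    return n


gamma, eps, T = 0.9, 0.05, 200  # T: truncation length standing in for infinity
n = horizon(eps, gamma)
rng = random.Random(0)
symbols = [0, 1, 2]


def random_steps(k: int) -> List[Step]:
    return [(rng.choice(symbols), rng.choice(symbols)) for _ in range(k)]


base = random_steps(T)
u = utility(base, gamma)
worst = 0.0
for _ in range(50):
    # A history in M: agrees with base on o_0 a_0 ... o_n a_n, arbitrary afterwards.
    other = base[: n + 1] + random_steps(T - n - 1)
    worst = max(worst, abs(utility(other, gamma) - u))

print(n)  # → 28
print(worst)
assert worst <= gamma ** (n + 1) < eps
```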

Roughly speaking, continuity of a utility function $U : (O \times A)^{\omega} \to [0, 1]$ means that for any $\epsilon > 0$ only finitely many events have an influence of more than $\epsilon$ on the utility. This could be a problem for studying longtermist agents with a zero time discount rate. However, studying such agents is hard anyway, since there is no guarantee that the sum of rewards converges and we have to deal with infinite ethics. As far as I know, it is standard in learning theory to avoid such situations by assuming a non-zero time discount rate or a finite time horizon. Therefore, it should not be a big deal to add the condition that $U$ is continuous to all theorems.

The Learning-Theoretic Agenda: Status 2023

Frank_R · 3y

I have a question about the conjecture at the end of Direction 17.5. Let $U_1$ be a utility function with values in $[0, 1]$ and let $f : [0, 1] \to [0, 1]$ be a strictly increasing function. Then $U_1$ and $U_2 = f \circ U_1$ have the same maximizers. $f$ can be non-linear, e.g. $f(x) = x^2$. Therefore, I wonder whether the condition $u(y) = \alpha v(y) + \beta$ should be weaker.
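A toy check, with made-up histories and utility values (nothing here comes from the post), that the strictly increasing but non-affine $f(x) = x^2$ leaves the set of maximizers of a utility function over a finite set of histories unchanged:

```python
from typing import Dict, Set


def maximizers(U: Dict[str, float]) -> Set[str]:
    """Histories at which the utility function attains its maximum."""
    best = max(U.values())
    return {h for h, value in U.items() if value == best}


def f(x: float) -> float:
    """Strictly increasing on [0, 1], but not affine."""
    return x ** 2


U1 = {"h1": 0.2, "h2": 0.9, "h3": 0.5, "h4": 0.9}  # made-up histories and utilities
U2 = {h: f(value) for h, value in U1.items()}       # U2 = f o U1

print(sorted(maximizers(U1)), sorted(maximizers(U2)))  # → ['h2', 'h4'] ['h2', 'h4']
```

The check only concerns the maximizers of $U_1$ itself over histories, which is the claim made above.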

Moreover, I wonder whether it is possible to modify $U_1$ by a small amount at a place far away from the optimal policy such that $\pi$ is still optimal for the modified utility function. This would weaken the statement about the uniqueness of the utility function even more. Think of an AI playing Go. If a weird position on the board has utility -1.01 instead of -1, this should not change the winning strategy. I have to go through all of the definitions to see whether I can produce a more mathematical example. Nevertheless, you may already have an opinion on whether this could happen.

Infra-Exercises, Part 1

Frank_R · 3y

I have two questions that may be slightly off-topic, and a minor remark:

- Is there a list of open and tractable problems related to Infra-Bayesianism available somewhere?
- Do you plan to publish the results of the Infra-Bayesianism series in a peer-reviewed journal? I understand that there are certain downsides: mostly that it requires a lot of work, that the whole series may be too long for a journal article, and that the peer review process takes a long time. However, if your work is citable, it could attract more researchers who are able to contribute.
- On page 22, you should include the condition $a(bv) = (ab)v$ in the definition of a vector space.
