Although we cannot rigorously say this yet, since we have not chosen a definition of agent, I think this intuitively applies, and therefore (H2) can only hold when you are restricted to some set of tasks, perhaps "reasonable tasks", yes.
I wonder if in the stochastic interpretation of a task this issue disappears, because "No Free Lunch" tasks that diagonalize against a model in a particular fashion have very low probability.
What do you mean by a product topology here? The product topology used for stochastic processes? That requires a topology on the state space in the first place, and right now I have not specified any topologies.
Regarding the stochastic aspect, I have thought about that, but so far I have not seen a benefit in including it, because any stochastic approach can be seen as just a deterministic approach on the level of distributions. I.e. if a model M is actually a random variable, and a task T is also a random variable, then the important thing, namely the function M×T ⟶ R+, which would now be a random object, can be replaced by a function P(M)×P(T) ⟶ P(R+). I.e. we map distributions of models and tasks to distributions of scores.
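As a toy numerical sketch of this replacement (the scoring rule and the finite-support distributions below are my own illustration, not from the text): a deterministic score s : M×T ⟶ R+ pushes forward to a map P(M)×P(T) ⟶ P(R+) on distributions.

```python
import itertools
from collections import defaultdict

# Toy stand-in for a deterministic scoring rule s : M x T -> R+ (my own
# illustration; any function of model and task would do here).
def score(model, task):
    return abs(model - task)

# Finite-support distributions over models and tasks: {outcome: probability}.
p_models = {1.0: 0.5, 2.0: 0.5}
p_tasks = {0.0: 0.25, 3.0: 0.75}

def pushforward(p_m, p_t, s):
    """Lift s : M x T -> R+ to P(M) x P(T) -> P(R+), sampling independently."""
    p_scores = defaultdict(float)
    for (m, pm), (t, pt) in itertools.product(p_m.items(), p_t.items()):
        p_scores[s(m, t)] += pm * pt
    return dict(p_scores)

print(pushforward(p_models, p_tasks, score))  # a distribution over scores
```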
Nevertheless, on a bit of a different note, consider the following.
I described a task as something which a model can generate an answer to, which is then somehow scored. If instead we consider the score of a model on a task to represent the expected number of correct answers given a large number of tries, then we can say that (writing s(M, T) for the score and λ·M for λ cooperating copies of M)

s(λ·M, T) = λ·s(M, T) for all λ ≥ 0,

i.e. we get a new axiom! This states that tasks are not just any functions, but 1-homogeneous functions. But tasks are certainly not linear, as cooperation of a model with itself may bring no improvement on non-parallelizable tasks.
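A toy numerical check of this distinction (my own illustration: I interpret λ·M as λ independent tries of M, each correct with probability p):

```python
# Score = expected number of correct answers over some tries: this is
# 1-homogeneous in the number of tries.
def expected_correct(tries, p):
    return tries * p

assert expected_correct(2 * 10, 0.3) == 2 * expected_correct(10, 0.3)

# By contrast, "probability of at least one success" -- closer in spirit to
# a non-parallelizable objective -- is NOT 1-homogeneous in the tries.
def p_at_least_one(tries, p):
    return 1 - (1 - p) ** tries

print(p_at_least_one(10, 0.3), 2 * p_at_least_one(5, 0.3))  # clearly unequal
```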
Let me try to rephrase this in more conventional probability theory. You are looking at a metric space of universes (U, d). You probably want to take the Borel σ-algebra B as your collection of events. We think of propositions as sets A ∈ B, which really just means that A ⊂ U is a subset which is not too irregular.
Then the indicator function χ_A(u) is 1 if A holds in universe u and 0 otherwise.
Your elaborations do not depend much on time, so we set t = 0.
You now talk about picking a universe uniformly from a ball B_ε(u0) = {u ∈ U : d(u, u0) < ε}.
This is a problem. On finite-dimensional vector spaces we have the Lebesgue measure and we can have such a uniform distribution. On your metric space of universes it is entirely unclear what this means. You have to actually specify a distribution, and this choice of distribution then influences your outcome to the extreme.
It is similar to how you cannot uniformly pick a natural number.
So here your result will be strongly influenced by the distribution.
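To make the finite-dimensional contrast concrete, here is a minimal sketch (my own illustration): in R^d the uniform distribution on B_ε(u0) exists thanks to the Lebesgue measure, and rejection sampling draws from it; no analogous canonical construction exists on a general metric space of universes.

```python
import random

# Rejection sampling from the uniform distribution on the open ball
# B_eps(u0) in R^d: propose uniformly in the bounding cube, accept if
# inside the ball. Well-defined only because of the Lebesgue measure.
def uniform_ball(u0, eps, rng=random.Random(0)):
    d = len(u0)
    while True:
        x = [rng.uniform(-eps, eps) for _ in range(d)]  # propose in a cube
        if sum(xi * xi for xi in x) < eps * eps:        # accept if in ball
            return [ci + xi for ci, xi in zip(u0, x)]

sample = uniform_ball([1.0, 2.0, 3.0], 0.5)
print(sample)  # a point strictly inside the ball around (1, 2, 3)
```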
What we can do is say the following: we fix a sequence of probability measures ρ_n on U such that ρ_n converges to δ_{u0} in the sense of weak convergence of probability measures. What this means is that you choose a sequence of distributions which approximate the Dirac delta at u0, the distribution which samples u0 with probability 1.
Then you can say something like: "The butterfly probability decay sequence around u0 with respect to ρ_n is given by P_{u∼ρ_n}(u ∈ A)."
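A minimal sketch of such a decay sequence (my own toy instantiation, with U = R and Gaussians as the approximating measures):

```python
import random

# Approximate the Dirac delta at u0 by rho_n = N(u0, 1/n) and estimate the
# "butterfly probability decay sequence" P_{u ~ rho_n}(u in A) by Monte Carlo.
rng = random.Random(0)
u0 = 0.0

def holds(u):
    return u > 1.0  # proposition A = "u > 1", which is false at u0 itself

def decay_term(n, samples=100_000):
    hits = sum(holds(rng.gauss(u0, 1.0 / n)) for _ in range(samples))
    return hits / samples

seq = [decay_term(n) for n in (1, 2, 4, 8)]
print(seq)  # decreases towards 0 as rho_n concentrates at u0
```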
Here I am also not formalizing your sense of "convergence in the middle", because this is extremely unlikely to correspond to something rigorous. You can view the above as a sequence in n and then study its decay as n goes to infinity, which corresponds to ε going to zero.
But everything here will depend on your choice of ρ_n. You cannot necessarily choose uniformly from a small neighbourhood in an arbitrary metric space; if the metric space is an infinite-dimensional vector space, this is not possible.
There may be an alternative which means you don't have to choose the ρ_n. You can fix a metric between probability measures which metrizes weak convergence, for example the Wasserstein distance W. Then you could perhaps look at something like the limit of P_{u∼ρ}(u ∈ A) / W(ρ, δ_{u0}) as W(ρ, δ_{u0}) → 0.
This may be infinite or zero though.
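A minimal sketch of the metrization claim (my own toy instantiation, with U = R and the same shrinking Gaussians as before): the Wasserstein-1 distance between ρ_n = N(u0, 1/n) and δ_{u0} is the mean absolute deviation, which shrinks like 1/n, so W(ρ_n, δ_{u0}) → 0 tracks the weak convergence ρ_n → δ_{u0}.

```python
import random
from scipy.stats import wasserstein_distance  # 1-D Wasserstein-1 distance

rng = random.Random(0)
u0 = 0.0

def w_to_delta(n, samples=50_000):
    """Empirical W1 distance between a sample of N(u0, 1/n) and delta_{u0}."""
    xs = [rng.gauss(u0, 1.0 / n) for _ in range(samples)]
    return wasserstein_distance(xs, [u0])  # delta_{u0} as a one-point sample

dists = [w_to_delta(n) for n in (1, 2, 4, 8)]
print(dists)  # roughly halves at each step: convergence to the Dirac delta
```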