Matthias G. Mayer

I'm currently working on Causality with Deterministic Relationships (Finite Factored Sets put differently, and more).

Concretely, this mainly means extending the framework to the infinite setting, proving decidability of finite temporal inference, and developing efficient finite temporal inference.


Comments

It is correct, but I agree the reasoning is not written out properly. I'll write y for 

We are in the case . So if , then , where the last inequality is true because it's equivalent to . So we have , which is a contradiction to the choice of y.

A direct application would require that you have an uncountable variable. You might want to infer one if you have enough evidence to do so confidently. As a simple example, imagine a real-valued graph where all your data points lie almost on the identity diagonal; you might then want to infer a variable which is the identity.
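
A minimal sketch of that example in Python (the data, noise level, and threshold are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
y = x + rng.normal(0, 0.01, size=1000)  # data points lie almost on the diagonal y = x

# If the residuals are tiny, infer the deterministic identity relationship.
if np.max(np.abs(y - x)) < 0.1:
    print("infer deterministic variable: Y = X (identity)")
```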

As a more general application, we want to model infinities because the world is probably infinite in some aspects. We then want a theorem telling us that even if the underlying model is infinite, having enough data points gets you close enough, analogous to the strong law of large numbers.
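
For illustration only (this is the strong law of large numbers in code, not the FFS theorem I'm after): finitely many samples get you arbitrarily close to the true parameter of the underlying model.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # true parameter of the underlying model
for n in [10, 1_000, 100_000]:
    samples = rng.binomial(1, p, size=n)
    print(n, abs(samples.mean() - p))  # the error shrinks as n grows
```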

I'm working on the FFS framework in general. I'm currently writing up decidability of finite temporal inference. After that I will probably start working on efficient finite temporal inference, which is what you're referencing, if I understood correctly.

I'm also working on extending the framework to the infinite setting and am almost finished except for conditional orthogonality for uncountable sets.

I quite like the name Logical Time Theory, under which I will probably publish those results in a month or so.

Here, we need to find a variable W such that the following hold (a brute-force sketch follows the list):

  1. P(W|X,Y) is deterministic, because X and Y already fully describe our sample space. This means P(W|X,Y) is either 0 or 1.
  2. Z and W are independent
  3. X and W are dependent
  4. Y and W are dependent
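
Here is a brute-force sketch of such a search, assuming binary variables and a made-up joint distribution P (everything in the snippet is hypothetical; condition 1 holds by construction, since W is defined as a function of X and Y):

```python
from itertools import product

# Hypothetical joint distribution over binary (X, Y, Z): all three uniform and independent.
P = {(x, y, z): 0.125 for x, y, z in product([0, 1], repeat=3)}

def marginal(P, idx):
    m = {}
    for outcome, p in P.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return m

def independent(P, i, j, tol=1e-9):
    mi, mj, mij = marginal(P, (i,)), marginal(P, (j,)), marginal(P, (i, j))
    return all(abs(mij.get((a, b), 0.0) - mi[(a,)] * mj[(b,)]) < tol
               for (a,) in mi for (b,) in mj)

# Enumerate all deterministic W = f(X, Y); this guarantees condition 1.
for f in product([0, 1], repeat=4):  # f[2*x + y] is the value of W at (x, y)
    Pw = {}
    for (x, y, z), p in P.items():
        w = f[2 * x + y]
        Pw[(x, y, z, w)] = Pw.get((x, y, z, w), 0.0) + p
    if (independent(Pw, 2, 3)                # condition 2: Z and W independent
            and not independent(Pw, 0, 3)    # condition 3: X and W dependent
            and not independent(Pw, 1, 3)):  # condition 4: Y and W dependent
        print("W = f(X, Y) with f =", f)     # e.g. f = (0, 0, 0, 1) is W = X AND Y
```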

I think your arguments in Section 3 to rule out Graph 3 can't be correct if you accept Graph 2.

To see this, note that there is a symmetry between  and . Namely, if we use FFS temporal inference, then we know that  and  are both before  (and ). (Here we even have , so they are entirely exchangeable.)

Therefore, if you accept Graph 2 then we can clearly switch  and  in Graph 2 and obtain a solution for Graph 3. Also, note that in these solutions  or , so if we see variables as their information content, as in FFS, this is Graph 1 in disguise.

 

Also, in Graph 2 there is a typo: P(W=0|Z=0) instead of P(Z=0|W=0).

Two features are orthogonal if their  norm is zero

Just as a side note about terminology: it is a bit imprecise to use inner product and norm interchangeably.

The inner product is the function $(x, y) \mapsto \langle x, y \rangle$ and the norm is $\lVert x \rVert = \sqrt{\langle x, x \rangle}$.
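
To make the distinction concrete (plain numpy, my own toy vectors):

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([0.0, 2.0])

inner = np.dot(u, v)                          # inner product <u, v> = 0, so u and v are orthogonal
norms = np.linalg.norm(u), np.linalg.norm(v)  # norms are 1.0 and 2.0, both nonzero

print(inner, norms)  # orthogonality is a statement about the inner product, not the norm
```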

The internals of a system of course determine its functional behavior. But different systems can have identical functional behavior while differing in how they actually compute it. E.g., different sorting algorithms all end up with a sorted list but arrive at it differently. Likewise, a pathfinding algorithm like Dijkstra's is different from checking every possible path and picking the best one.
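
As a toy illustration (my own example): two sorting procedures with the same input-output map but very different internals.

```python
def bubble_sort(xs):
    # Repeatedly swaps adjacent out-of-order pairs: O(n^2) comparisons.
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def merge_sort(xs):
    # Recursively splits and merges: a completely different internal process.
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    out = []
    while left and right:
        out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return out + left + right

data = [3, 1, 2]
assert bubble_sort(data) == merge_sort(data) == [1, 2, 3]  # same functional behavior
```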

Looking only at functional behavior strips you of the ability to make predictions: you only know what has already happened, and you can't generalize to new inputs.

This is the actual crux of why we care about the internals. We don't know the functional behavior of a NN except by executing it (there are some interpretability tools, but they are not yet sufficient). We want to understand what a NN will do before executing it.

Let's put this in the context of an AGI: we have a giant model which is executed on multiple GPUs. Ideally, we want to know that it won't kill us without having to run it. If we had a method to find 'search processes' and similar things going on in its brain, then we could see whether it searches for things like 'how can I disempower humanity?'.

What if you had some computation that could be interpreted (e.g. decrypted with two different keys) as either a simulation full of happy people, or a simulation full of depressed people? I think an adequate theory of experience is able to look at the encrypted computation (or any computation) and decide directly if there is suffering happening there.

Also, what is the difference between normal computation and encrypted computation? I feel like looking at a process that you haven't programmed yourself is not really that different from looking at an encrypted version of it. In either case, we don't have a clue about what's going on. And if we have a theory that lets us figure it out, it should work on both a normal and an encrypted version.

This is not what I meant (I've edited the post to make that clearer). I am looking for a way to naturally express that the result of a computation changes how the computation progresses. In a*(b+c) + (1-a)*(b-c) you compute both (b+c) and (b-c). This is not what actually happens in the program.
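
A minimal sketch of the contrast (hypothetical Python): the arithmetic form evaluates both subexpressions no matter what, while the branching form evaluates only the one selected by a.

```python
def select_arithmetic(a, b, c):
    # Both (b + c) and (b - c) are computed, regardless of a.
    return a * (b + c) + (1 - a) * (b - c)

def select_branching(a, b, c):
    # Only one subexpression is ever computed: the result of the test
    # on a changes how the computation progresses.
    return b + c if a == 1 else b - c

assert select_arithmetic(1, 5, 2) == select_branching(1, 5, 2) == 7
assert select_arithmetic(0, 5, 2) == select_branching(0, 5, 2) == 3
```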

The curried node is an interesting idea but breaks down once we move away from this toy example: if both branches contain subgraphs with a different number of nodes and different connections between them, then currying does not work (or is very unnatural).

(Currying is a nice idea, so yes.)

Turing completeness regards only the functional behavior of a class of computational systems. I want to look at the internals, at what the system is actually doing, and find abstractions in there: modularity, search processes, and steering mechanisms, for instance.

So it’s not about finding yet another framework whose expressiveness is equivalent to Turing completeness. It’s about finding a framework to express the actual computation.