Master's student in applied mathematics, funded by the Center on Long-Term Risk to investigate the cheating problem in safe Pareto improvements. Former Dovetail fellow with @Alex_Altair.
I also talked to Aram recently & he's optimistic that there's an algorithmic version of the generalized heat engine where the hot vs cold pools correspond to high- vs low-K-complexity strings. I'm quite interested in doing follow-up work on that
The continuous state space is coarse-grained into discrete cells where the dynamics are approximately Markovian (the theory is currently classical), & the "laws of physics" probably refers to the stochastic matrix that specifies the transition probabilities between the discrete cells (otherwise we could probably deal with infinite precision through limit computability)
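To make the "stochastic matrix = coarse-grained laws of physics" reading concrete, here's a minimal toy sketch (my own made-up dynamics, not anything from the paper): bin a continuous trajectory into cells and estimate the cell-to-cell transition probabilities.

```python
# Toy sketch (my own example): coarse-grain a continuous 1-D trajectory into
# discrete cells and estimate the stochastic matrix of cell-to-cell transitions.
import numpy as np

def estimate_transition_matrix(trajectory, n_cells, lo, hi):
    """Bin a 1-D trajectory into n_cells and count cell-to-cell transitions."""
    cells = np.clip(((trajectory - lo) / (hi - lo) * n_cells).astype(int), 0, n_cells - 1)
    counts = np.zeros((n_cells, n_cells))
    for a, b in zip(cells[:-1], cells[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no visits get a uniform distribution so every row is stochastic.
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_cells)

# Made-up continuous dynamics: x' = 0.9 x + Gaussian noise.
rng = np.random.default_rng(0)
x = np.zeros(10_000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal(scale=0.5)

T = estimate_transition_matrix(x, n_cells=4, lo=-3.0, hi=3.0)
print(np.round(T, 2))  # each row sums to ~1: the coarse-grained "laws of physics"
```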
As in, take a set of variables X, then search for some set of its (non-overlapping?) subsets such that there's a nontrivial natural latent over it? Right, it's what we're doing here as well.
I think the subsets can actually be partially overlapping: for instance you may have a $\Lambda$ that's approximately deterministic w.r.t. $\{X_1, X_2\}$ and $\{X_2, X_3\}$ but not $X_2$ alone; weak redundancy (approximately deterministic w.r.t. $X_{\neq i}$ for each $i$) is also an example of redunds across overlapping subsets
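To make the first example concrete (my own toy check, with a deliberately tiny distribution): take $\Lambda$ a fair bit, $X_1 = X_3 = \Lambda$, and $X_2$ independent noise. Then $\Lambda$ is exactly deterministic given $\{X_1, X_2\}$ or $\{X_2, X_3\}$, which overlap in $X_2$, but not given $X_2$ alone.

```python
# Toy check (my own example): a latent that's deterministic w.r.t. two
# overlapping subsets {X1,X2} and {X2,X3} but not w.r.t. X2 alone.
from collections import Counter
from math import log2

def conditional_entropy(samples, target_idx, given_idxs):
    """H(target | given) estimated from a list of equally likely joint samples."""
    n = len(samples)
    joint = Counter(tuple(s[i] for i in (*given_idxs, target_idx)) for s in samples)
    marg = Counter(tuple(s[i] for i in given_idxs) for s in samples)
    return -sum(c / n * log2((c / n) / (marg[k[:-1]] / n)) for k, c in joint.items())

# Enumerate the joint distribution: Lambda, X2 independent fair bits; X1 = X3 = Lambda.
samples = [(lam, lam, x2, lam) for lam in (0, 1) for x2 in (0, 1)]  # (Lambda, X1, X2, X3)

print(conditional_entropy(samples, 0, (1, 2)))  # H(Lambda | X1, X2) = 0
print(conditional_entropy(samples, 0, (2, 3)))  # H(Lambda | X2, X3) = 0
print(conditional_entropy(samples, 0, (2,)))    # H(Lambda | X2)     = 1
```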
Mm, this one's shaky. Cross-hypothesis abstractions don't seem to be a good idea, see here.
yea so I think the final theory of abstraction will have a weaker notion of equivalence, especially once we incorporate ontology shifts. E.g. we want to say that water is the same concept before and after we discover water is H2O, but the discovery obviously breaks predictive agreement (indeed, the Solomonoff version of natural latents is more robust to the agreement condition)
Also, you can totally add new information/abstraction that is not shared between your current and new hypotheses, & that seems consistent with the picture you described here (you can have separate ontologies, but you try to capture the overlap as much as possible)
My guess is that there's something like a hierarchy of hypotheses, with specific high-level hypotheses corresponding to several lower-level more-detailed hypotheses, and what you're pointing at by "redundant information across a wide variety of hypotheses" is just an abstraction in a (single) high-level hypothesis which is then copied over into lower-level hypotheses. (E.g., the high-level hypothesis is the concept of a tree, the lower-level hypotheses are about how many trees are in this forest.)
yes I think that's the right picture
But we don't derive it by generating a bunch of low-level hypotheses and then abstracting over them, that'd lead to broken ontologies.
I agree that we don't do that in practice, as it'd be slower (instead we simply generate an abstraction & use future feedback to determine whether it's a robust one), but I think if you did generate a bunch of low-level hypotheses and looked for redundant computation among them, then an adequate version of that would just recover the "high-level/low-level hypotheses" picture you've described?
In particular, with cross-hypothesis abstraction we don't have to separately define what the variables are, so we can sidestep dataset-assembly entirely & perhaps simplify the shifting structures problem
Nice, I've gestured at similar things in this comment. Conceptually, the main thing you want to model is variables that control the relationships between other variables; the upshot is that you can continue the recursion indefinitely: once you have second-order variables that control the relationships between other variables, you can then have variables that control the relationships among second-order variables, and so on.
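A minimal illustration of the recursion (my own hypothetical toy code, not anything from the comment): a second-order variable selects which relationship holds between two first-order variables, and a third-order variable selects how the second-order variable is set.

```python
# Toy sketch (hypothetical example): C controls which relationship holds
# between X, Y and Z, and D controls how C itself depends on a context K.
def relation_a(x, y):
    return x ^ y          # one possible X,Y -> Z relationship

def relation_b(x, y):
    return x & y          # another possible X,Y -> Z relationship

def z_given(c, x, y):
    # Second-order variable C picks which relationship is in force.
    return relation_a(x, y) if c == 0 else relation_b(x, y)

def c_given(d, k):
    # Third-order variable D controls how C is determined from the context K.
    # Nothing stops a fourth-order variable from controlling D, and so on.
    return k if d == 0 else 1 - k

x, y, k = 1, 0, 1
for d in (0, 1):
    c = c_given(d, k)
    print(f"d={d} -> c={c} -> z={z_given(c, x, y)}")
```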
Using function calls as an analogy: When you're executing a function that itself makes a lot of function calls, there are two main ways these function calls can be useful:
An adequate version of this should also be Turing complete, which means it can accommodate shifting structures, & function calls seem like a good way to represent hierarchies of abstractions
CSI in Bayesian networks also deals with the idea that the causal structure between variables changes over time/depending on context (you're probably more interested in how relationships between levels of abstraction change with context, but the two directions seem linked). I plan to explore the following variant at some point (not sure if it's already in the literature):
But note that synergistic information can be defined by referring purely to the system we're examining, with no "external" target variable. If we have a set of variables $X_1, \dots, X_n$, we can define the variable $s$ such that $I(s; X_1, \dots, X_n)$ is maximized under the constraint of $I(s; S) = 0$ for every $S \in \mathcal{P}$. (Where $\mathcal{P}$ is the set of all subsets of $\{X_1, \dots, X_n\}$ except $\{X_1, \dots, X_n\}$ itself.)
That's a nice formulation of synergistic information. It's independent of redundant info via the data-processing inequality, so it's somewhat promising that the components can add up to the total entropy.
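As a quick sanity check of that construction (my own toy example): with $X_1, X_2$ independent fair bits and $s = X_1 \oplus X_2$, $s$ has zero mutual information with every proper subset but a full bit with the whole, so it's pure synergy.

```python
# Toy check (my own example): s = X1 XOR X2 has zero mutual information with
# each proper subset {X1}, {X2} but one full bit with the pair (X1, X2).
from collections import Counter
from math import log2

def mutual_information(samples, a_idxs, b_idxs):
    """I(A; B) estimated from a list of equally likely joint samples."""
    n = len(samples)
    pa = Counter(tuple(s[i] for i in a_idxs) for s in samples)
    pb = Counter(tuple(s[i] for i in b_idxs) for s in samples)
    pab = Counter(tuple(s[i] for i in (*a_idxs, *b_idxs)) for s in samples)
    return sum(
        c / n * log2((c / n) / ((pa[k[:len(a_idxs)]] / n) * (pb[k[len(a_idxs):]] / n)))
        for k, c in pab.items()
    )

# Enumerate the joint distribution: X1, X2 independent fair bits, s = X1 XOR X2.
samples = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)]

print(mutual_information(samples, (2,), (0,)))    # I(s; X1)     = 0
print(mutual_information(samples, (2,), (1,)))    # I(s; X2)     = 0
print(mutual_information(samples, (2,), (0, 1)))  # I(s; X1, X2) = 1
```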
You might be interested in this comment if distinguishing between synergistic and redundant information is not your main objective: you can simply define redunds over collections of subsets, such that e.g. "dogness" is a redund over every subset of atoms that allows you to conclude you're looking at a dog. In particular, the redundancy-lattice approach seems simpler when the latent depends on not just synergistic but also redundant and unique information
One issue with PID worth mentioning is that they haven't figured out what measure to use for quantifying multivariate redundant information. It's the same problem we seem to have. But it's probably not a major issue in the setting we're working in (the well-abstracting universes).
A recent impossibility result seems to rule out a general multivariate PID that guarantees non-negativity of all components, though partial entropy decomposition may be more tractable
- If there's a pair of $X_i$, $X_j$ such that $H(X_i \mid X_j) = 0$, then $X_j$ necessarily contains all information in $X_i$. Re-define $X_j$, removing all information present in $X_i$.
This seems similar to capturing unique information, where the constructive approach is probably harder in PID than in PED. E.g. in BROJA it involves an optimization problem over distributions with some constraints on the marginals, but it only estimates the magnitude of unique info, not an actual random variable that represents the unique info
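In the easiest case the re-definition step is mechanical; the hard part is when the containment isn't explicit. A toy illustration of the easy case (my own made-up example, nothing from the post):

```python
# Toy illustration (my own example): detect H(Xi | Xj) = 0 and, in the easy
# case where Xj = (Xi, Y) explicitly, "re-define" Xj to the leftover Y.
from collections import Counter
from math import log2

def conditional_entropy(samples, target_idx, given_idx):
    """H(target | given) estimated from a list of equally likely joint samples."""
    n = len(samples)
    joint = Counter((s[given_idx], s[target_idx]) for s in samples)
    marg = Counter(s[given_idx] for s in samples)
    return -sum(c / n * log2((c / n) / (marg[g] / n)) for (g, t), c in joint.items())

# Xi and Y are independent fair bits; Xj packages both, so H(Xi | Xj) = 0.
samples = [(xi, (xi, y)) for xi in (0, 1) for y in (0, 1)]  # (Xi, Xj)

print(conditional_entropy(samples, 0, 1))  # 0.0: Xj contains all of Xi

# Re-definition is easy here because the decomposition is explicit: keep only Y.
redefined = [(xi, xj[1]) for xi, xj in samples]
print(conditional_entropy(redefined, 1, 0))  # 1.0: the new Xj shares nothing with Xi
```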
Nice post!
Some frames about abstractions & ontology shifts I had while thinking through similar problems (which you may have considered already):
happy to discuss more via PM as some of my ideas seem exfohazardous
Neat idea, I've thought about similar directions in the context of traders betting on traders in decision markets
A complication might be that a regular deductive process doesn't discount the "reward" of a proposition based on its complexity, whereas your model does, so it might have a different notion of the logical induction criterion. For instance, you could have an inductor that's exploitable, but only on propositions with larger and larger complexities over time, such that with the complexity discounting the cash loss is still finite (whereas the regular LI loss would be infinite, so it wouldn't satisfy the regular LI criterion)
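To illustrate with made-up numbers (my own sketch): suppose a trader extracts one unit of cash from each proposition $\phi_n$, where $\phi_n$ has complexity $n$ and its payout is discounted by $2^{-n}$. Then the undiscounted loss is $\sum_{n=1}^{\infty} 1 = \infty$, violating the regular LI criterion, while the complexity-discounted loss is $\sum_{n=1}^{\infty} 2^{-n} \cdot 1 = 1 < \infty$, so the discounted criterion can still be satisfied.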
(Note that betting on "earlier propositions" already seems beneficial in regular LI since if you can receive payouts earlier you can use it to place larger bets earlier)
There's also some redundancy where each proposition can be encoded by many different Turing machines, whereas a deductive process can guarantee uniqueness in its ordering & be more efficient that way
Are prices still determined using Brouwer’s fixed point theorem? Or do you have a more auction-based mechanism in mind?
Yes I agree
I think it's similar to CIRL except less reliant on the reward function & more reliant on the things we get to do once we solve ontology identification
The current theory is based on classical Hamiltonian mechanics, but I think the theorems apply whenever you have a Markovian coarse-graining. Fermion doubling is a problem for spacetime discretization in the quantum case, so the coarse-graining might need to be different. (E.g. coarse-grain the entire Hilbert space, which might have locality issues, but that's probably not load-bearing for algorithmic thermodynamics.)
On the outside view, quantum mechanics reduces to classical (which admits a Markovian coarse-graining) in the correspondence limit, so there must be some coarse-graining that works