An important prior I think you missed is the circuit-complexity prior, which is different from the Kolmogorov simplicity prior. I also note that most of the high-level structures we see in the world (like humans, tables, trees, pulley systems, etc.) seem like they're drawn from some kind of circuit-complexity-adjacent distribution over processes, where objects in systems we see in the world generally have a small number of low-dimensional interfaces between them. This connects to the natural abstraction hypothesis.
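To make the distinction between the two priors concrete, here is a toy sketch. Everything in it is an assumption for illustration: the hypothesis names, the bit and gate counts, and in particular the choice to weight a circuit-style prior as 2^-(gate count), which the comment above does not specify. It only shows that the two weightings can rank the same hypotheses differently.

```python
# Toy comparison of a simplicity prior and a circuit-style prior.
# All names and numbers are hypothetical, chosen only for illustration.

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Each hypothesis maps to (program_length_bits, circuit_size_gates).
hypotheses = {
    "short_program_large_circuit": (10, 40),
    "longer_program_small_circuit": (20, 15),
}

# Simplicity prior: weight 2^-(program length in bits).
simplicity_prior = normalize(
    {name: 2.0 ** -bits for name, (bits, _) in hypotheses.items()}
)

# Circuit-style prior (assumed form): weight 2^-(number of gates).
circuit_prior = normalize(
    {name: 2.0 ** -gates for name, (_, gates) in hypotheses.items()}
)

simplicity_favorite = max(simplicity_prior, key=simplicity_prior.get)
circuit_favorite = max(circuit_prior, key=circuit_prior.get)

print("simplicity prior favors:", simplicity_favorite)
print("circuit-style prior favors:", circuit_favorite)
```

With these made-up numbers, the short program wins under the simplicity prior, while the hypothesis with the smaller circuit wins under the circuit-style prior.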

[-]Paul Bricman3y10

Thanks a lot for the reference, I haven't come across it before. Would you say that it focuses on gauging modularity?

[-]Garrett Baker3y10

I don’t know enough details about the circuit prior or the NAH to say confidently yes or no, but I’d lean no. Bicycles are more modular than addition, but if you feed in the quantum wave function representing a universe filled to the brim with bicycles versus an addition function, the circuit prior will likely say addition is more probable.

[-]Garrett Baker3y10

But it would still be interesting to see work on whether a circuit complexity prior will induce more interpretable networks!

[-]Garrett Baker3y10

Another interesting note: If we’d like a prior such that the best search processes drawn from that prior themselves draw from the prior to resolve search queries, then whatever prior describes the distribution of high-level objects in our universe seems like it has this property.

I.e., we can use approximately the same Ockham’s razor when analyzing human brains, the objects they create, and the physical and chemical processes which produce human brains.

[-]Noosphere893y20

However, simplicity is not enough to e.g. select among all the possible objectives which might have explained the behavior of a human. Quite bluntly, Occam's razor is insufficient to infer the preferences of irrational agents. Humans (i.e. irrational agents) appear not to actually value the simplest thing which would explain their behavior. For another failure mode, consider the notion of acausal attack highlighted by Vanessa Kosoy here, or framed in terms of the fact that the Solomonoff prior is malign. In this scenario, an AGI incentivized to keep it simple might conclude that the shortest explanation of how the universe works is "[The Great Old One] has been running all possible worlds based on all possible physics, including us." This inference might allegedly incentivize the AGI to defer to The Great Old One as its causal precursor, therefore missing us in the process. Entertaining related ideas with a sprinkle of anthropics has a tendency to get you into the unproductive state of wondering whether you're in the middle of a computation being run by a language model on another plane of existence which is being prompted to generate a John Wentworth post.

I feel a lot of the problem relates to an Extremal Goodhart effect, where the popular imagination views simulations as not equivalent to reality.

However, my guess is that simplicity priors, not speed or stability priors, are the default.

[-]Paul Bricman3y10

I feel a lot of the problem relates to an Extremal Goodhart effect, where the popular imagination views simulations as not equivalent to reality.

That seems right, but aren't all those heuristics prone to Goodharting? If your prior distribution is extremely sharp and you barely update from it, it seems likely that you run into all those various failure modes.
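The point about a sharp prior that barely updates can be sketched with a two-hypothesis Bayesian update. This is a minimal illustration under assumed numbers: the hypotheses `A` and `B`, the likelihoods, and the prior values are all hypothetical, picked so that each observation favors `B` by 10 to 1.

```python
# Toy Bayesian update: a very sharp prior vs. a broad one.
# All numbers are hypothetical and chosen only for illustration.

def update(prior, likelihoods):
    """One Bayes step: multiply by likelihoods, then renormalize."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def update_n(prior, likelihoods, n):
    """Apply the same independent observation n times."""
    belief = list(prior)
    for _ in range(n):
        belief = update(belief, likelihoods)
    return belief

# P(observation | A) and P(observation | B): evidence favors B 10:1.
likelihoods = [0.05, 0.5]

sharp_prior = [1 - 1e-9, 1e-9]  # almost all mass on A
broad_prior = [0.5, 0.5]        # no initial preference

sharp_posterior = update_n(sharp_prior, likelihoods, 3)
broad_posterior = update_n(broad_prior, likelihoods, 3)

print("P(B) after 3 observations, sharp prior:", sharp_posterior[1])
print("P(B) after 3 observations, broad prior:", broad_posterior[1])
```

After three observations the broad prior has moved almost entirely to `B`, while the sharp prior still assigns `B` a probability on the order of one in a million, so whatever the sharp prior got wrong stays wrong for a long time.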

However, my guess is that simplicity priors, not speed or stability priors, are the default.

Not sure what you mean by default here. Likely to be used, effective, or something else?

[-]Noosphere893y10

I actually should focus on the circuit complexity prior, but my view is that, because agents are small compared to reality, they must generalize very well to new environments, which pushes in a simplicity direction.

LESSWRONG

Cataloguing Priors in Theory and Practice