An N=1 observational study on interpretability of Natural General Intelligence (NGI)

by dr_s
27th Sep 2025

Thinking about intelligence, I've decided to try something. The ARC-AGI puzzles are open and free for anyone to try via a convenient web interface. Having a decently well-trained general intelligence at hand in my own head, if I'm allowed the immodesty, I decided to make a little experiment: tackle one of them (the Daily Puzzle, to be precise), and record myself doing so. I'm going to talk about it and spoil it for you if you haven't played it yet, so if you want to try, go do so first.

The solution

00:01
I'm going to record myself solving the ARC-AGI Daily Puzzle from September 27, 2025. I'm now starting the puzzle. Alright, so the puzzle shows a few inputs. I'm going to take a screenshot of one, and the other, and a third example.

Here are the screenshots for context:

Example 1
Example 2
Example 3

First stage: analysis of the patterns.

00:42
at a glance it has a series of patterns. I can immediately spot the similarity between patterns on the left and on the right, the shapes are clearly very identical. I can also very clearly classify immediately. Some of these patterns have
00:59
have a uniform color and some of them are different colors in particular every example seems to have at least one that is very patchwork and sort of different and the others are either uniform or at most one color in the second example so looking at it more closely I found those already more thorough pattern

Stage two, formulating a hypothesis.

01:26
There is one shape that is identical, there is one shape that gets its own fill. There are actually two shapes sometimes, and there is one that is multicolor and it stays the same. Working hypothesis, the multicolor image is encoding in some way the fill-in transformation.
01:48
Now, looking at it, I'm working on that hypothesis, it actually makes a lot of sense. If we take the one row, it means the original color, and the other row, the filling color. Now we say the green is filled with green, the yellow is filled with purple, which it is in the second example. And orange does not go through, so orange is not filled.

(the above refers to example 1 by the way)

02:09
The same thing seems to happen here where the outside row is the reference. So is it the outside always reference? I suppose it is, whether it's topmost or the leftmost.
02:24
And then we're saying, okay, so this one row tells us the light blue gets filled with dark blue, and the yellow gets filled with grey, and we have an orange that's full of grey. And the same happens here, red with orange, and orange with yellow. So I think I have solved the problem. I'm going to need 13 by 13, really, in fact, the easiest thing I can do is just copy from input.

(the above refers to examples 2 and 3)

At this point the solution board looked like this:

Starting state for the solution after I simply copied the input
02:53
and then I have to edit the cells. So this is actually creating this kind of... creating a puzzle here because we now have an example of the coloured box being in the bottom row, right? So... I'm going to take a screenshot of the test. We don't have that, so we don't really know whether the rule, the prevalence rule is the row outside or...
03:27
the row that is on top. And I believe this is an interesting case because this is literally like just under-determined. This is an out-of-distribution problem and I have to simply infer what the intent here was. I'm going to go with the one on top and therefore I'm going to fill the grey with red. I actually am going to take the flood fill because that makes it easier. I'm going to fill the grey with red.
03:53
There's no yellow, there's no purple, there's no orange, there's no light grey. So that's all I have to do. Let me try submitting this solution. It is wrong. So I guess maybe it was the other way around. So let's go the other way around, which means that I flood filled the red with grey, the light blue with yellow, the green with purple, [garbled]
04:19
the blue with orange, that is also something we cannot, and any of that with the line h [?]. And we submit, and that is correct. That's it.
The solved puzzle
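
For concreteness, here is the transformation rule I converged on, written out as a minimal Python sketch. The function name, its signature, and the assumption that the legend's two rows have already been located and extracted are all mine for illustration (as is treating colour 0 as background); this is not the format the ARC harness actually uses.

```python
from collections import deque

def apply_fill_rule(grid, outline_row, fill_row, background=0):
    """grid: list of lists of colour ints, ARC-style.
    outline_row / fill_row: the two rows of the legend rectangle;
    which of the legend's rows plays which role was exactly the
    under-determined part of this puzzle."""
    colour_map = dict(zip(outline_row, fill_row))
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]             # copy the input verbatim...
    seen = [[False] * w for _ in range(h)]

    for r in range(h):
        for c in range(w):
            if grid[r][c] != background or seen[r][c]:
                continue
            # ...then trace each connected background region
            region, border_colours, touches_edge = [], set(), False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        touches_edge = True    # open region: not a shape interior
                    elif grid[ny][nx] == background:
                        if not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                    else:
                        border_colours.add(grid[ny][nx])
            # a shape interior is fully enclosed by a single outline colour;
            # flood-fill it with whatever the legend maps that colour to
            if not touches_edge and len(border_colours) == 1:
                outline = border_colours.pop()
                if outline in colour_map:
                    for y, x in region:
                        out[y][x] = colour_map[outline]
    return out
```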

Analysis

I finished the problem in 4 minutes and 9 seconds (at least 45 of which were spent taking screenshots) and with two attempts. I could probably have gone faster had I not been speaking the whole time, and whether it took one attempt or two feels like it came down to luck. While I'm not a hardcore puzzle solver, I've got a fair amount of experience with puzzles (and I have solved quite a few ARC ones before) and I probably have way more of a "puzzle brain" than the average person, so I would guess this time is still fairly good.

A few sparse observations:

  • it's fascinating to me how quick visual pattern recognition is. It's not conscious and it's not procedural: I did not go through hypotheses and attempts. A huge part of the solution simply came to me by glancing at those examples and immediately noticing that the shapes were identical, and that the outlier multicoloured rectangle had some kind of special meaning. There are a number of ways in which the shapes could have differed slightly (been translated, rotated, changed in colour, filled differently) and I'm sure I would still have spotted the correspondence at a single glance. This is obviously a huge component of my ability to solve the problem at all;
  • the next step, spotting the hypothesis that the rectangle was encoding some transformation information, feels very tied to my priors. Obviously there is something special about it, but "it must carry information in some trivial way" was my immediate, obvious hypothesis, and it is very coder-brained, if you get what I mean (also, as I said, I have played other such tests and looked into toy problems for attention mechanisms; "put information in one part of the sequence that affects how some other part is processed" is a textbook attention test). So this part of the thinking was strongly informed by my priors on what kind of trick I expect from this sort of problem, and once formulated the hypothesis was easy to check. I sort of reproduced the pattern of an ML training run: formulating a hypothesis on the first example alone and using the two successive ones as a test set. I do wonder: had my hypothesis been wrong, and had I been forced to try more candidates using my knowledge of all three examples, would I have been at higher risk of "overfitting"?
  • the situation with "which row is the edge colour and which row is the fill colour" is very interesting and telling to me. There were two legitimate hypotheses that I'd call equally simple: "when it's horizontal, it's always the top row" or "it's always the row on the outside" (both are sketched in the snippet after this list). Both fit the examples. I feel like my considering them equally simple may be affected by being very aware of the usual indexing strategies in pixel images, with left and top being the 0-index for X and Y respectively, meaning the first assumption equates to "pick the line with the lowest index". Coder-brain again; that is not necessarily equally obvious to everyone. I could have gone with the other hypothesis via a sort of meta-thinking: it requires doing more work in this specific case, and the puzzle makers would probably set the problem up in a way that requires more work, so maybe that was the intent. But consider how many assumptions that by all means should not belong in such a problem creep into that reasoning!
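
Since both tie-break rules came up, here they are side by side as a rough sketch. The Legend record, its fields, and the restriction to horizontal legends are hypothetical scaffolding of mine, not anything from the puzzle format; the point is only that the two rules coincide whenever the legend hugs the top (or left) edge, which is, as far as I could tell, all the worked examples showed, and that they first diverge on the test grid.

```python
from dataclasses import dataclass

@dataclass
class Legend:
    rows: tuple        # the legend's two rows of colours, top to bottom
    top_index: int     # grid row where the legend rectangle starts
    grid_height: int   # total number of rows in the grid

def keys_by_position(legend: Legend):
    """Rule A: the topmost (or leftmost) legend row holds the outline colours."""
    return legend.rows[0]

def keys_by_outside(legend: Legend):
    """Rule B: whichever legend row sits closer to the grid border holds them."""
    dist_top = legend.top_index
    dist_bottom = legend.grid_height - (legend.top_index + len(legend.rows))
    return legend.rows[0] if dist_top <= dist_bottom else legend.rows[-1]

# Legend against the top edge: both rules return rows[0] and agree.
# Legend against the bottom edge (as in the test): Rule A still returns
# rows[0] while Rule B returns rows[-1], hence the coin-flip second attempt.
```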

The interesting conclusion to me is that two things here did almost all of the work. One of them is pattern recognition, a mechanism that seems completely unconscious and that I imagine is some sophisticated elaboration on convolution and/or attention, but one noticeably biased to detect similarity even across the specific operations that are assumed likely and not enough to break correspondence (like translation or filling). The second is a lot of priors, and these priors are very, very specific. I don't think they're even necessarily about things being "simpler" in some absolute, Solomonoff sense. They just feel like they're about puzzles, and about how puzzles, or code, or things designed for us, work. This probably speaks in part to the limits of the ARC dataset (though this is one of the easiest problems I've found in it; others are much less intuitive).

But it also makes me think about the concept of AGI and ASI. Ideally, we imagine a superintelligence being able to spot much more sophisticated patterns than we do. This implies a certain untethering from our own biases and limits. And yet our ability to solve certain categories of problems in the first place seems to be dictated by having a ready-made library of concepts that we strongly select on when trying to navigate the tree of all possible explanations. As if what makes us a general intelligence is having a collection of simple programs that are factually very useful, and an algorithm well trained on building a tree out of them to fit a given problem.
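
To make that last idea a bit more concrete, here is a toy rendering, entirely my own, of "a library of simple programs plus a search that composes them": a handful of grid primitives and a brute-force enumeration of their compositions, shortest first, keeping the first one that reproduces every training pair. Real solvers (and, presumably, brains) are far cleverer about which branches get tried, and that ordering is exactly where the priors live.

```python
from itertools import product

# a tiny, arbitrary library of "simple programs" over ARC-style grids
def identity(g):  return [row[:] for row in g]
def flip_h(g):    return [row[::-1] for row in g]
def flip_v(g):    return g[::-1]
def transpose(g): return [list(col) for col in zip(*g)]

PRIMITIVES = [identity, flip_h, flip_v, transpose]

def fit_program(train_pairs, max_depth=3):
    """Return the shortest composition of primitives mapping every training
    input to its output, or None if nothing up to max_depth works."""
    for depth in range(1, max_depth + 1):
        for combo in product(PRIMITIVES, repeat=depth):
            def run(g, combo=combo):
                for f in combo:
                    g = f(g)
                return g
            if all(run(x) == y for x, y in train_pairs):
                return [f.__name__ for f in combo]
    return None
```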

This makes us very, very efficient at anything that can be tackled with those simple programs, but it also limits our ability to transcend them. If a superintelligence had to go beyond that and explore higher regions of problem-space, would it do so by developing a richer or more complex collection of pieces (and then, in turn, be limited by them)? Or would it do so by transcending the mechanism altogether and just having a really good way to build hypotheses from a true universal prior? One can assume, for example, that there is a correct prior for formulating physics hypotheses, and that ultimately everything is grounded in physics, but even for an ASI, having to think its way up from atoms every time it wants to understand humans seems a bit much. Yet having specialised human-understanding priors would in turn prejudice its ability to understand some hitherto unknown alien. Does intelligence even exist without this trade-off?
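
A standard observation from the Solomonoff/MDL corner gives that trade-off a concrete, if very idealised, shape on the evidence side (this is a textbook bound about mixtures, not anything specific to ARC). If the broad prior is a weighted mixture that merely contains a specialised human-modelling prior as one component with weight $w_h$, then

$$P_{\mathrm{mix}}(x) = \sum_k w_k\, P_k(x) \;\ge\; w_h\, P_h(x) \quad\Longrightarrow\quad -\log_2 P_{\mathrm{mix}}(x) \;\le\; -\log_2 P_h(x) + \log_2\!\frac{1}{w_h},$$

so predicting with the broad mixture instead of the specialised prior costs at most a fixed $\log_2(1/w_h)$ extra bits of surprise: a one-time fee for not knowing in advance which world you are in, rather than a permanent handicap. The catch is computational: actually representing and searching such a mixture is exactly the thinking-your-way-up-from-atoms that seems like a bit much.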

That's a puzzle you don't solve in four minutes with two attempts.
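
Comments

AnthonyC:

The ability to consciously decide when to discard or rewrite or call on the simple programs is a superpower evolution didn't give humans. One that seems like the obvious solution for an AI that gets to call on an external, updatable set of tools, or an ASI that gets to rewrite the parts of itself that call the tools or notice (what it previously thought were) edge cases.

AKA, an ASI can go ahead and have a human-specific prior. It can choose to apply it until it meets entities that are alien, then stop applying it. Humans can't really do that, in the same way that we can't turn off our visual heuristics when encountering things we consciously know are weirdly constructed adversarial examples, even if we can sometimes override them with enough effort. The ASI, presumably, would further react to encountering aliens by reasoning from more basic principles (recurse as needed) as it learns enough to create 1) a new prior specific to those aliens, 2) a new prior specific to those aliens' species, culture, world, etc.

Or at least, that's my <4 minute human-level single attempt at guessing a lower bound on an ASI's solution.

dr_s:

I think if you start having meta-priors, then what, you've got to have meta-meta-priors and so on? At some point that's just having more basic, fundamental priors that embrace a wider range of possibilities. The question is what those would look like, or whether being general enough just descends into a completely uniform (or barely informative) prior that is essentially of no help; you can think anything, but the trade-off is that it's always going to be inefficient.

AnthonyC:

True, but I think in this case there's at least no risk of an infinite regress. At one end, yes, it bottoms out in an extremely vague and inefficient but general hyperprior. I would guess from the little I've read that in humans these are the layers that govern how we learn from even before we're born. I would imagine an ASI would have at least one layer more fundamental than this, which would enable it to change various fixed-in-humans assumptions about things.

At the other end would be the most specific or most abstracted layer of priors that has proven useful to date. Somewhere in the stack are your current best processes for deciding whether particular priors or layers of priors are useful or worth keeping, or whether you need a new one.

I'm actually not sure whether "prior" is quite the right term here? Some of it feels like the distinction between thingspace and conceptspace, where the priors might be more about expectations of what things exist, where natural concept boundaries lie, and how to evaluate and re-evaluate those.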