You are talking about a "verifier for explanations". I don't know how an explanation could be verified under constructivist epistemology and pragmatist meta-epistemology.

I've recently been thinking about the relationship between GFlowNets and constructivism. Here are some excerpts, unedited, which I hope will be helpful to someone.

It’s interesting that GFlowNets suggest constructing a trajectory, i.e., an explanation, rather than sampling it via Monte Carlo methods (e.g., Monte Carlo Tree Search), as suggested in the current ActInf-based AI architectures (Fountas et al. 2020). This can be either an explanation for oneself, justifying the most proximate action to take (as in Active Inference: an agent creates a plan, takes the first step (action) from it, and then creates a new plan), or an explanation for others, explaining some actions that have already been taken (P_B(\tau|x) in GFlowNets).
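To make "constructing a trajectory" concrete, here is a minimal sketch, not Bengio's implementation. It assumes a toy state space (bit strings built one bit at a time) and a made-up `reward` function. A real GFlowNet learns the flows with a neural network; here they are computed exactly by enumeration, which is only feasible because the space is tiny and tree-structured. The point is that a step-by-step forward policy P_F(child | parent) = F(child) / F(parent) builds each object with probability proportional to its reward, with no Markov chain to mix.

```python
import random
from itertools import product

N_BITS = 3  # hypothetical toy problem: objects are bit strings of length 3


def reward(x):
    # Hypothetical reward, chosen only for illustration.
    return 1.0 + 2.0 * sum(x)


terminals = list(product((0, 1), repeat=N_BITS))


def flow(prefix):
    """Total reward of all complete strings that extend `prefix`."""
    return sum(reward(x) for x in terminals if x[:len(prefix)] == prefix)


def forward_policy(prefix, bit):
    """P_F(prefix + bit | prefix) = F(child) / F(parent)."""
    return flow(prefix + (bit,)) / flow(prefix)


def construction_probability(x):
    """Probability that the forward policy builds exactly x."""
    prob = 1.0
    for t in range(len(x)):
        prob *= forward_policy(x[:t], x[t])
    return prob


def sample(rng):
    """Construct one object step by step using the forward policy."""
    prefix = ()
    while len(prefix) < N_BITS:
        p_one = forward_policy(prefix, 1)
        prefix += (1,) if rng.random() < p_one else (0,)
    return prefix


Z = flow(())  # total flow at the root equals the partition function
probs = {x: construction_probability(x) for x in terminals}

if __name__ == "__main__":
    for x in terminals:
        print(x, round(probs[x], 4), round(reward(x) / Z, 4))
    print("one constructed sample:", sample(random.Random(0)))
```

In this tree-structured case each object has a single construction path, so the construction probability equals `reward(x) / Z` exactly. In a general GFlowNet, several trajectories can lead to the same object, and the flows have to be learned rather than enumerated.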

So, it seems that GFlowNets fit Deutsch’s account of creative explanations rather well. Deutsch (together with Pearl) contrasts constructed, subjective, individualised explanations with empiricism, Bayesianism (in the sense of associational reasoning, “rung one of the causality ladder”, per Pearl) and “counting averages and population stats”, which seem to correspond to sampling-based methods such as MCMC. Bengio also explains why GFlowNets are statistically superior to variational (Bayesian) inference methods (though I don’t understand what he means by “mode-following”, “mean-following”, and “high variance gradients”):

The typical variational inference objective (the ELBO or reverse-KL) leads to mode-following (focussing on one mode) and the forward KL leads to mean-following (overly conservative, sampling too broadly) and annoying variance when implemented with importance sampling. Instead the off-policy GFlowNet objectives (e.g., with a tempered version of P_F as training policy) seem to strike a different balance and tend to recover more of the modes without the down-side of the forward-KL variational inference variants (mean-following and high variance gradients).
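The "mode-following" versus "mean-following" contrast in the quote can be illustrated with a small numerical sketch. This is an assumption-laden toy, not anything from Bengio's text: the target is a made-up mixture of two Gaussians, the approximating family is a single Gaussian, and both KL divergences are minimised by brute-force grid search rather than by gradient-based variational inference. It says nothing about the "high variance gradients" part of the quote, which concerns importance-sampled gradient estimators.

```python
import numpy as np

# Integration grid for numerical KL estimates.
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]


def log_normal(x, mu, sigma):
    """Log-density of a Gaussian N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))


# Hypothetical bimodal target: equal mixture of N(-3, 0.7^2) and N(3, 0.7^2).
log_p = np.logaddexp(log_normal(x, -3.0, 0.7), log_normal(x, 3.0, 0.7)) - np.log(2.0)


def kl(log_a, log_b):
    """Numerical estimate of KL(a || b) on the grid."""
    return float(np.sum(np.exp(log_a) * (log_a - log_b)) * dx)


mus = np.linspace(-5.0, 5.0, 41)      # candidate means, step 0.25
sigmas = np.linspace(0.3, 5.0, 48)    # candidate standard deviations, step 0.1

best_forward = (np.inf, None, None)   # minimises KL(p || q)
best_reverse = (np.inf, None, None)   # minimises KL(q || p), as in the ELBO
for mu in mus:
    for sigma in sigmas:
        log_q = log_normal(x, mu, sigma)
        fwd = kl(log_p, log_q)
        rev = kl(log_q, log_p)
        if fwd < best_forward[0]:
            best_forward = (fwd, mu, sigma)
        if rev < best_reverse[0]:
            best_reverse = (rev, mu, sigma)

_, mu_f, sigma_f = best_forward
_, mu_r, sigma_r = best_reverse

if __name__ == "__main__":
    print(f"forward KL fit: mu={mu_f:.2f}, sigma={sigma_f:.2f}")
    print(f"reverse KL fit: mu={mu_r:.2f}, sigma={sigma_r:.2f}")
```

In this toy, the forward-KL fit is a broad Gaussian centred between the two modes (it covers both but puts mass where the target has almost none), while the reverse-KL fit collapses onto a single mode. Which of the two modes it picks is an arbitrary tie-break of the grid search.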

With regard to this idea of “construction of explanations”, it’s tempting to look for parallels with Core Constructive Ontology, Baez and Stay’s ideas about system construction, and Deutsch and Marletto’s Constructor Theory, and to call these trends in epistemology, ontology, and philosophy of language, together with GFlowNets, “the constructive turn”, by analogy with the pragmatic turn in philosophy.

GFlowNets don't assume a static causal graph, but stochastically construct one by sampling from the Bayesian posterior (given the past evidence) over the space of all possible causal graphs.

GFlowNets seem more closely related to language generation and the contents of consciousness: humans do seem to generate these “randomly”. Humans come up with “stochastic” justifications for past events and actions when these are not yet stabilised in reference frames in their heads.

Another thing that Bengio suggests, “hypergraph sampling”, doesn’t feel neurobiologically plausible (or does it?), but it’s not clear whether Bengio suggests it as a neurobiological explanation or as an architecture for intelligence (in which case the fact that humans’ causal graphs are simple graphs, rather than hypergraphs, is our inductive prior).

Bengio’s idea of sampling from the Bayesian posterior over the space of causal graphs when generating linguistic explanations can be extended. On this view, an agent constructs explanatory theories in service of constructing an action trajectory to achieve a certain goal (the pragmatic stance), sampling from the Bayesian posterior over the space of all possible theories: not only causal graphs, but also any other sorts of theories, from formal theories written as closed-form equations or attached to executable diagrams to “theories” as unreliable as induction rules, heuristics, and intuitions. This corresponds to epistemological pluralism, which, John Krakauer thinks, is also how the brain actually works: “There is pluralism in how the nervous system sees the control problem and the representations it uses, and that is the ontological truth of pluralism, and there is a mapping onto epistemological pluralism. This may well be the reason why we have psychology and neuroscience, and we have psychiatry and neurology.”

The selection of the explanatory stance (the perspective, the level of emergence) would be one of the core steps in constructing an explanatory theory for a pragmatic purpose. For instance, we can call either a psychiatrist or a neurologist if our goal is to diagnose and then cure a patient's illness. So, there cannot be one “right” perspective on any object. Instead, an intelligent agent always chooses the perspective most suitable (that is, minimising the expected free energy) for reaching a particular goal. There is also a normative imperative to improve the quality of these choices. This can amount to training in selecting from a set of coherent theories that already exist, as well as attempting to create and criticise new coherent explanatory theories.


paulfchristiano 3y 20

To clarify: by a "verifier for explanations" I mostly mean something like a heuristic estimator as introduced in Formalizing the Presumption of Independence (or else something even further from formality that would fill a similar role).


Roman Leventov 3y -1 -2

I think that adding new types of systems and agents to the universe changes the optimal "applied ethics" in the situation (I wrote about this here, in the "PS.", the last paragraph), so the only hope is for the discriminator to be 1) a general intelligence that is 2) using a scale-free, naturalistic theory of ethics as a theoretical discipline for evaluating any applied ethics theories in any situations and contexts.

Also, hopefully, the "least wrong" scale-free ethics is "aligned" with humans, in the sense that it "saves" us from oblivion. For example, a version of theoretical scale-free ethics could just favour increasing the amount of consciousness in the universe while destroying as little existing consciousness as possible. Let's say something like IIT is right. Then the best plan that an AGI could come up with is to engineer some mind upload scheme for humans and integrate our consciousnesses together, to form a single planetary-scale mega-consciousness (which must be more ethically valuable precisely because of the integration; by the same token, a brain is more ethically valuable than all of its neurons taken in isolation).



Can we efficiently explain model behaviors?


I. Finding explanations is closely related to interpretability

II. Searching for explanations is a well-posed and plausibly tractable search problem

III. If this search problem is intractable it may be a much deeper problem for alignment

IV. I’m excited about ARC’s plan even if we can’t solve every step for arbitrary models

Conclusion