This is a continuation of a discussion with Vanessa from the MIRIxDiscord group. I'll make some comments on things Vanessa has said, but those should not be considered a summary of the discussion so far. My comments here are also informed by discussion with Sam.
1: Logic as Proxy
1a: The Role of Prediction
Vanessa has said that predictive accuracy is sufficient; consideration of logic is not needed to judge (partial) models. A hypothesis should ultimately ground out to perceptual information. So why is there any need to consider other sorts of "predictions" it can make? (That is, why should we think of it as possessing internal propositions which have a logic of their own?)
But similarly, why should agents use predictive accuracy to learn? What's the argument for it? Ultimately, predicting perceptions ahead of time should only be in service of achieving higher reward.
We could instead learn from reward feedback alone. A (partial) "hypothesis" would really be a (partial) strategy, helping us to generate actions. We would judge strategies on (something like) average reward achieved, not even trying to predict precise reward signals. The agent still receives incoming perceptual information, and strategies can use it to update internal states and to inform actions. However, strategies are not asked to produce any predictions. (The framework I'm describing is, of course, model-free RL.)
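As a minimal sketch of what "judging strategies on average reward" could look like, consider the toy setup below. Everything in it is an assumption made for illustration: the environment (reward 1 when the action matches a random bit), the four hand-written strategies, and the names `run_episode` and `best_strategy`. The point is only that strategies see observations and are scored on reward, and are never asked to predict anything.

```python
import random

def run_episode(strategy, rng, steps=200):
    """Return the average reward a strategy earns; no predictions are requested."""
    total = 0.0
    for _ in range(steps):
        obs = rng.randint(0, 1)        # incoming perceptual information
        action = strategy(obs)         # the strategy may use it however it likes
        # Toy reward (an assumption): 1 if the action matches the observation.
        total += 1.0 if action == obs else 0.0
    return total / steps

# Hypothetical strategy space: each strategy maps an observation to an action.
STRATEGIES = {
    "always_0": lambda obs: 0,
    "always_1": lambda obs: 1,
    "copy_obs": lambda obs: obs,
    "flip_obs": lambda obs: 1 - obs,
}

def best_strategy(seed=0):
    """Score every strategy by average reward and return the winner."""
    rng = random.Random(seed)
    scores = {name: run_episode(s, rng) for name, s in STRATEGIES.items()}
    return max(scores, key=scores.get), scores
```

Calling `best_strategy()` selects `copy_obs` in this toy environment, purely on the basis of reward achieved.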
Intuitively, it seems as if this is missing something. A model-based agent can learn a lot about the world just by watching, taking no actions. However, individual strategies can implement prediction-based learning within themselves. So, it seems difficult to say what benefit model-based RL provides beyond model-free RL, besides a better prior over strategies.
It might be that we can't say anything recommending model-based learning over model-free learning in a standard bounded-regret framework. (I actually haven't thought about it much, but the argument that model-free strategies can implement models internally seems potentially strong. Perhaps you just can't get much in the AIXI framework because there are no good loss bounds in that framework at all, as Vanessa mentions.) However, if so, this seems like a weakness of standard bounded-regret frameworks. Predicting the world seems to be a significant aspect of intelligence; we should be able to talk about this formally somehow.
Granted, it doesn't make sense for bounded agents to pursue predictive accuracy above all else. There is a computational trade-off, and you don't need to predict something which isn't important. My claim is roughly this: you should try to predict when you don't yet have an effective strategy. After you have an effective strategy, you don't really need to generate predictions. Before that, you need to generate predictions because you're still grappling with the world, trying to understand what's basically going on.
If we're trying to understand intelligence, the idea that model-free learners can internally manage these trade-offs (by choosing strategies which judiciously choose to learn from predictions when it is efficacious to do so) seems less satisfying than a proper theory of learning from prediction. What is fundamental vs non-fundamental to intelligence can get fuzzy, but learning from prediction seems like something we expect any sufficiently intelligent agent to do (whether it was built-in or learned behavior).
On the other hand, judging hypotheses on their predictive accuracy is kind of a weird thing to do if what you ultimately want a hypothesis to do for you is generate actions. It's like this: you've got two tasks, task A and task B. Task A is what you really care about, but it might be quite difficult to tackle on its own. Task B is really very different from task A, but you can get a lot of feedback on task B. So you ask your hypotheses to compete on task B, and judge them on that in addition to task A. Somehow you're expecting to get a lot of information about task A from performance on task B. And indeed, it seems you do: predictive accuracy of a hypothesis is somehow a useful proxy for efficacy of that hypothesis in guiding action.
(It should also be noted that a reward-learning framework presumes we get feedback about utility at all. If we get no feedback about reward, then we're forced to only judge hypotheses by predictions, and make what inferences about utility we will. A dire situation for learning theory, but a situation where we can still talk about rational agency more generally.)
1b: The Analogy to Logic
My argument is going to be that if achieving high reward is task A, and predicting perception is task B, logic can be task C. Like task B, it is very different from task A. Like task B, it nonetheless provides useful information. Like task B, it seems to me that a theory of (boundedly) rational agency is missing something without it.
The basic picture is this. Perceptual prediction provides a lot of good feedback about the quality of cognitive algorithms. But if you really want to train up some good cognitive algorithms for yourself, it is helpful to do some imaginative play on the side.
One way to visualize this is an agent making up math puzzles in order to strengthen its reasoning skills. This might suggest a picture where the puzzles are always well-defined (terminating) computations. However, there's no special dividing line between decidable and undecidable problems: any particular restriction to a decidable class might rule out some interesting (decidable but non-obviously so) stuff which we could learn from. So we might end up just going with any computations (halting or not).
Similarly, we might not restrict ourselves to entirely well-defined propositions. It makes a lot of sense to test cognitive heuristics on scenarios closer to life.
Why do I think sufficiently advanced agents are likely to do this?
Well, just as it seems important that we can learn a whole lot from prediction before we ever take an action in a given type of situation, it seems important that we can learn a whole lot by reasoning before we even observe that situation. I'm not formulating a precise learning-theoretic conjecture, but intuitively, it is related to whether we could reasonably expect the agent to get something right on the first try. Good perceptual prediction alone does not guarantee that we can correctly anticipate the effects of actions we have never tried before. Still, if I see an agent generate an effective strategy in a situation it has never intervened in before (but has had opportunity to observe), I expect that internally it is learning from perception at some level (even if it is model-free in overall architecture). Similarly, if I see an agent quickly pick up a reasoning-heavy game like chess, then I suspect it of learning from hypothetical simulations at some level.
Again, "on the first try" is not supposed to be a formal learning-theoretic requirement; I realize you can't exactly expect anything to work on the first try with learning agents. What I'm getting at has something to do with generalization.
2: Learning-Theoretic Criteria
Part of the frame has been learning-theory-vs-logic. One might interpret my closing remarks from the previous section that way; I don't know how to formulate my intuition learning-theoretically, but I expect that reasoning helps agents in particular situations. It may be that the phenomena of the previous section cannot be understood learning-theoretically, and only amount to a "better prior over strategies" as I mentioned. However, I don't want it to be a learning-theory-vs-logic argument. I would hope that something learning-theoretic can be said in favor of learning from perception, and in favor of learning from logic. Even if it can't, learning theory is still an important component here, regardless of the importance of logic.
I'll try to say something about how I think learning theory should interface with logic.
Vanessa said some relevant things in a comment, which I'll quote in full:
Heterodox opinion: I think the entire MIRIesque (and academic philosophy) approach to decision theory is confused. The basic assumption seems to be, that we can decouple the problem of learning a model of the world from the problem of taking a decision given such a model. We then ignore the first problem, and assume a particular shape for the model (for example, causal network) which allows us to consider decision theories such as CDT, EDT etc. However, in reality the two problems cannot be decoupled. This is because the type signature of a world model is only meaningful if it comes with an algorithm for how to learn a model of this type.
For example, consider Newcomb's paradox. The agent makes a decision under the assumption that Omega behaves in a certain way. But, where did the assumption come from? Realistic agents have to learn everything they know. Learning normally requires a time sequence. For example, we can consider the iterated Newcomb's paradox (INP). In INP, any reinforcement learning (RL) algorithm will converge to one-boxing, simply because one-boxing gives it the money. This is despite RL naively looking like CDT. Why does it happen? Because in the learned model, the "causal" relationships are not physical causality. The agent comes to believe that taking the one box causes the money to appear there.
In Newcomb's paradox EDT succeeds but CDT fails. Let's consider an example where CDT succeeds and EDT fails: the XOR blackmail. The iterated version would be IXB. In IXB, classical RL doesn't guarantee much because the environment is more complex than the agent (it contains Omega). To overcome this, we can use RL with incomplete models. I believe that this indeed solves both INP and IXB.
Then we can consider e.g. counterfactual mugging. In counterfactual mugging, RL with incomplete models doesn't work. That's because the assumption that Omega responds in a way that depends on a counterfactual world is not in the space of models at all. Indeed, it's unclear how can any agent learn such a fact from empirical observations. One way to fix it is by allowing the agent to precommit. Then the assumption about Omega becomes empirically verifiable. But, if we do this, then RL with incomplete models can solve the problem again.
The only class of problems that I'm genuinely unsure how to deal with is game-theoretic superrationality. However, I also don't see much evidence the MIRIesque approach has succeeded on that front. We probably need to start with just solving the grain of truth problem in the sense of converging to ordinary Nash (or similar) equilibria (which might be possible using incomplete models). Later we can consider agents that observe each other's source code, and maybe something along the lines of this can apply.
Besides the MIRI-vs-learning frame, I agree with a lot of this. I wrote a comment elsewhere making some related points about the need for a learning-theoretic approach. Some of the points also relate to my CDT=EDT sequence; I have been arguing that CDT and EDT don't behave as people broadly imagine (often not having the bad behavior which people broadly imagine). Some of those arguments were learning-theoretic while others were not, but the conclusions were similar either way.
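To make the iterated Newcomb claim from the quoted comment concrete, here is a rough sketch of a simple reward-driven learner on INP. The specifics are my assumptions for illustration, not anything from the comment: an epsilon-greedy bandit learner, payoffs of $1,000,000 and $1,000, and a perfect predictor, modeled by letting the payoff depend directly on the action taken that round. That last modeling choice is exactly the sense in which the learned "causal" relationship is not physical causality.

```python
import random

ONE_BOX, TWO_BOX = 0, 1

def newcomb_payoff(action):
    # Assumption: Omega predicts the agent's action on this round perfectly,
    # so the big box is full exactly when the agent one-boxes.
    big = 1_000_000 if action == ONE_BOX else 0
    small = 1_000 if action == TWO_BOX else 0
    return big + small

def run_inp(rounds=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit learner on the iterated Newcomb problem."""
    rng = random.Random(seed)
    value = [0.0, 0.0]   # running average reward for each action
    count = [0, 0]       # how often each action was taken
    for _ in range(rounds):
        if rng.random() < epsilon:
            action = rng.choice([ONE_BOX, TWO_BOX])   # explore
        else:
            action = max((ONE_BOX, TWO_BOX), key=lambda a: value[a])  # exploit
        reward = newcomb_payoff(action)
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value, count
```

In this sketch the learner ends up one-boxing on almost every round, with two-boxing occurring only through exploration. It says nothing about harder cases such as IXB, where the quoted comment appeals to incomplete models.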
In any case, I think the following criterion (originally mentioned to me by Jack Gallagher) makes sense:
A decision problem should be conceived as a sequence, but the algorithm deciding what to do on a particular element of the sequence should not know/care what the whole sequence is.
Asymptotic decision theory was the first major proposal to conceive of decision problems as sequences in this way. Decision-problem-as-sequence allows decision theory to be addressed learning-theoretically; we can't expect a learning agent to necessarily do well in any particular case (because it could have a sufficiently poor prior, and so still be learning in that particular case), but we can expect it to eventually perform well (provided the problem meets some "fairness" conditions which make it learnable).
As for the second part of the criterion, requiring that the agent is ignorant of the overall sequence when deciding what to do on an instance: this captures the idea of learning from logic. Providing the agent with the sequence is cheating, because you're essentially giving the agent your interpretation of the situation.
Jack mentioned this criterion to me in a discussion of averaging decision theory (AvDT), in order to explain why AvDT was cheating.
AvDT is based on a fairly simple idea: look at the average performance of a strategy so far, rather than its expected performance on this particular problem. Unfortunately, "performance so far" requires things to be defined in terms of a training sequence (counter to the logical-induction philosophy of non-sequential learning).
I created AvDT to try to address some shortcomings of asymptotic decision theory (let's call it AsDT). Specifically, AsDT does not do well in counterlogical mugging. AvDT is capable of doing well in counterlogical mugging. However, it depends on the training sequence. Counterlogical mugging requires the agent to decide on the "probability" of Omega asking for money vs paying up, to figure out whether participation is worth it overall. AvDT solves this problem by looking at the training sequence to see how often Omega pays up. So, the problem of doing well in decision problems is "reduced" to specifying good training sequences. This (1) doesn't obviously make things easier, and (2) puts the work on the human trainers.
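The dependence on the training sequence can be seen in a toy sketch. This is not the actual definition of AvDT, just an illustration of "average performance so far" under assumptions of my choosing: a cost of 100 when Omega asks, a prize of 10,000 when Omega pays up, two fixed policies ("pay" and "refuse"), and a training sequence given as a list of booleans recording whether Omega asked on each instance. The function names are hypothetical.

```python
def payoff(policy_pays, omega_asks, ask_cost=100, prize=10_000):
    """Reward on one instance of a toy counterlogical mugging (assumed amounts)."""
    if omega_asks:
        return -ask_cost if policy_pays else 0
    # Omega pays up only to policies that would have paid when asked.
    return prize if policy_pays else 0

def average_performance(policy_pays, training_sequence):
    """Average reward of a fixed policy over the whole sequence so far."""
    rewards = [payoff(policy_pays, asks) for asks in training_sequence]
    return sum(rewards) / len(rewards)

def avdt_choice(training_sequence):
    """Pick whichever policy has the better average over the sequence."""
    pay = average_performance(True, training_sequence)
    refuse = average_performance(False, training_sequence)
    return ("pay" if pay > refuse else "refuse"), pay, refuse
```

On a sequence where Omega asks half the time, such as `[True, False, True, False]`, paying wins on average. On a sequence where Omega always asks, such as `[True, True, True, True]`, refusing wins. The same decision rule gives opposite answers depending on which sequence the trainers supplied, which is the sense in which the work is pushed onto them.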
Jack is saying that the system should be looking through logic on its own to find analogous scenarios to generalize from. When judging whether a system gets counterlogical mugging right, we have to define counterlogical mugging as a sequence to enable learning-theoretic analysis; but the agent has to figure things out on its own.
This is a somewhat subtle point. A realistic agent experiences the world sequentially, and learns by treating its history as a training sequence of sorts. This is physical time. I have no problem with this. What I'm saying is that if an agent is also learning from analogous circumstances within logic, as I suggested sophisticated agents will do in the first part, then Jack's condition should come into play. We aren't handed, from on high, a sequence of logically defined scenarios which we can locate ourselves within. We only have regular physical time, plus a bunch of hypothetical scenarios which we can define and whose relevance we have to determine.
This gets back to my earlier intuition about agents having a reasonable chance of getting certain things right on the first try. Learning-theoretic agents don't get things right on the first try. However, agents who learn from logic have "lots of tries" before their first real try in physical time. If you can successfully determine which logical scenarios are relevantly analogous to your own, you can learn what to do just by thinking. (Of course, you still need a lot of physical-time learning to know enough about your situation to do that!)
So, getting back to Vanessa's point in the comment I quoted: can we solve MIRI-style decision problems by considering the iterated problem, rather than the single-shot version? To a large extent, I think so: in logical time, all games are iterated games. However, I don't want to have to set an agent up with a training sequence in which it encounters those specific problems many times. For example, finding good strategies in chess via self-play should come naturally from the way the agent thinks about the world, rather than being an explicit training regime which the designer has to implement. Once the rules for chess are understood, the bottleneck should be thinking time rather than (physical) training instances.