I think probabilistic lambda calculus is exactly equivalent to monotone Turing machines, so in the continuous case relevant to Solomonoff induction there is no difference. It’s “unfair” to compare the standard lambda calculus to monotone Turing machines because the former does not admit infinite length “programs.”
> ...The prototypical example of a prior based on Turing machines is Solomonoff's prior. Someone not familiar with the distinction between Turing-complete and Turing-universal might naively think that a prior based on lambda calculus would be equally powerful. It is not so. Solomonoff's prior guarantees a constant Bayes loss compared to the best computable prior for the job. In contrast, a prior based on lambda calculus can guarantee only a multiplicative loss.
Can you please make this precise?
When I think of "a prior based on lambda calculus", I imagine something like the following. First, we choose some reasonable complexity measure on lambda terms: denote the set of lambda terms by $\Lambda$, and choose $C : \Lambda \to \mathbb{N}$ such that $\sum_{t \in \Lambda} 2^{-C(t)} \le 1$.
Now, we choose some reasonable way to describe lower-semicomputable semimeasures using lambda terms, and make the prior probabilities of different lambda terms proportional to $2^{-C(t)}$. It seems to me that the resulting semimeasure dominates every lower-semicomputable semimeasure and is arguably "as good as" the Solomonoff prior. What am I missing?
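To make the weighting step concrete, here is a minimal sketch (my own illustration, not part of the original comment), taking $C(t)$ to be the length of a prefix-free binary encoding of de Bruijn terms in the style of binary lambda calculus. Because the code is prefix-free, $\sum_t 2^{-C(t)} \le 1$ holds, so $2^{-C(t)}$ can serve directly as an unnormalized prior weight.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    index: int          # de Bruijn index, counting from 1

@dataclass(frozen=True)
class Lam:
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def encode(term) -> str:
    """Binary-lambda-calculus-style code: Lam -> 00<body>, App -> 01<fn><arg>, Var i -> 1^i 0."""
    if isinstance(term, Var):
        return "1" * term.index + "0"
    if isinstance(term, Lam):
        return "00" + encode(term.body)
    if isinstance(term, App):
        return "01" + encode(term.fn) + encode(term.arg)
    raise TypeError(f"not a lambda term: {term!r}")

def complexity(term) -> int:
    """C(t): code length of the term, in bits."""
    return len(encode(term))

def prior_weight(term) -> float:
    """Unnormalized prior weight proportional to 2^{-C(t)}."""
    return 2.0 ** -complexity(term)

# Examples: the identity (lambda x. x) and the Church boolean true (lambda x. lambda y. x).
identity = Lam(Var(1))
church_true = Lam(Lam(Var(2)))
for t in (identity, church_true):
    print(encode(t), complexity(t), prior_weight(t))
```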
I think people often confuse the concept of Turing-completeness with that of Turing-universality.
Turing-universality is the property a universal Turing machine (UTM) has:[1]
Turing-universality: an abstract machine[2] $M$ is Turing-universal if and only if, for any Turing machine $T$, there is a sequence of bits $p_T$ which can be fed into $M$, after which $M$ exactly imitates $T$.
That is: a Turing-universal machine can be programmed to precisely imitate any Turing machine.
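As a toy illustration of that "fed a program, then imitates" pattern, here is a sketch of a universal interpreter. It is not a real universal Turing machine: the target machine $T$ is described by a Python dict rather than by a sequence of bits, purely to keep the sketch short.

```python
# A toy "universal machine": it first takes a description of a target machine T,
# then runs the remaining input exactly as T itself would.

def run_tm(transitions, start, halt, tape):
    """Simulate a one-tape Turing machine.
    transitions: (state, symbol) -> (new_state, written_symbol, move in {-1, +1})."""
    cells = dict(enumerate(tape))
    state, head = start, 0
    while state != halt:
        symbol = cells.get(head, "_")                     # "_" is the blank symbol
        state, cells[head], move = transitions[(state, symbol)]
        head += move
    return "".join(cells[i] for i in sorted(cells)).strip("_")

def universal(description, tape):
    """Behaves exactly like the machine it was handed, on the remaining input."""
    return run_tm(description["transitions"], description["start"],
                  description["halt"], tape)

# Target machine T: scan right, flipping every bit, halt on the first blank.
T = {
    "transitions": {
        ("scan", "0"): ("scan", "1", +1),
        ("scan", "1"): ("scan", "0", +1),
        ("scan", "_"): ("done", "_", +1),
    },
    "start": "scan",
    "halt": "done",
}
print(universal(T, "0110"))   # -> "1001", the same output T itself would produce
```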
Turing-completeness, on the other hand, works as follows:
Turing-completeness: a mathematical object (such as an abstract machine, a formalism for computation, a cellular automaton, a video game, etc.) is Turing-complete if and only if it can be used to simulate any computable function.
The word "emulation" might replace "simulation" above.
Some critical differences:

- Turing-universality is a property of a single abstract machine; Turing-completeness can apply to all sorts of mathematical objects (formalisms, cellular automata, video games, etc.).
- Turing-universality requires exact imitation of any Turing machine, triggered by feeding the machine a specific sequence of bits; Turing-completeness requires only "simulation", with no fixed input/output encoding.
The reason for these differences is that Turing-universality is a notion for use within the study of Turing machines, to show that there's something like a programmable computer in the formalism. Turing-completeness, on the other hand, is a notion for comparing different formalisms for computation. It would not be appropriate to require the input/output to be in bits, since bits are an arbitrary choice of the formalism.
As a consequence, Turing-completeness is an inherently weaker property. This has important learning-theoretic consequences. For example, the prototypical example of a Turing-complete formalism is lambda calculus. The prototypical example of a prior based on Turing machines is Solomonoff's prior. Someone not familiar with the distinction between Turing-complete and Turing-universal might naively think that a prior based on lambda calculus would be equally powerful. It is not so. Solomonoff's prior guarantees a constant Bayes loss compared to the best computable prior for the job. In contrast, a prior based on lambda calculus can guarantee only a multiplicative loss.[3]
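To spell out the standard property behind the "constant loss" claim (this is the usual dominance statement for the Solomonoff prior $M$, not anything specific to this post): for every lower-semicomputable semimeasure $\nu$ there is a constant $c_\nu > 0$, roughly $2^{-K(\nu)}$ where $K(\nu)$ is the length of a program for $\nu$, such that

$$M(x) \;\ge\; c_\nu \, \nu(x) \qquad \text{for all finite strings } x,$$

so the log-loss of $M$ exceeds that of $\nu$ by at most the additive constant $\log_2(1/c_\nu)$, uniformly in the data.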
Turing-completeness is also less well-specified, because the notion of "simulation" is informal. As far as I know, there are still open debates in philosophy about whether a bucket of water is "simulating" a human mind (making computationalist metaphysics effectively trivial -- basically, affirming Dust Theory from Permutation City).
More concretely: are the digits of $\pi$ Turing-complete?[4] Argument in favor: we must be allowed some computation, in order to find the outputs of a computation in our chosen formalism and translate them. We can use this computation to search for whatever answers we want in the digits of $\pi$.[5]
The above two examples are silly, sure, but there are real cases where the ambiguity matters. Specifically, I've heard there was controversy about Matthew Cook's proof that Wolfram's "rule 110" is Turing-complete. (I haven't been able to find a good reference for this, however.)
I think it isn't difficult to come up with rules which block obvious bad arguments, but I believe there is still some ambiguity about what counts as a valid "simulation" for this purpose. How much computation is allowed in the steps where you translate between formalisms? How do we rule out using this computation "illegitimately" (sneaking the computational work the so-called Turing-complete formalism was supposed to do into the translation step, as with the argument for the Turing-completeness of the digits of $\pi$)?
Note: although I think the terminology here follows easily from commonly given definitions, it is not entirely standard. When I try to search the internet for good references on these matters, I commonly see people using "universal" when they clearly mean what I here call "Turing-complete". LLMs also don't make the distinction the way I'm making it here; in my tests, they either don't make a firm distinction, or they say that Turing-completeness applies to formalisms while universality applies to a single machine. (I think this is typically true, but not the right principled distinction to point out; for one thing, a formalism can be such that it has just a single machine within it, and still be Turing-complete.)
The phrase "universal Turing machine" is common in the literature, but "Turing-universal" is not. I think the common practice is to just say "universal" as in "We can prove that this machine is universal by considering...". However, while this usage is clear enough in an appropriate context, it seems better for the general term to stipulate that the universality is across Turing machines, so I'm saying "Turing-universal" here.
An abstract machine is just a mathematically specified machine (in contrast to some specific physical machine).
The reason: a universal Turing machine can be "locked in" to behaving like a target Turing machine $T$, given just the bits of the program $p_T$. Solomonoff's prior therefore merely has to learn those bits in order to imitate any other (computable) prior.
Lambda calculus, on the other hand, cannot be "locked in" in this way. We can construct the machine $T$ as a lambda term, of course, and we can simulate $T$'s behavior by feeding it translated 1s and 0s. (The standard translation encodes bits as the Church booleans $\lambda x.\lambda y.x$ and $\lambda x.\lambda y.y$.) However, there's no way to create a directive forcing the simulated $T$ to only be fed simulated bits. A lambda-calculus-based prior might "try to be clever" by finding regularities in the bits, e.g., encoding one hundred zeros in a row via a more compact term which expands to one hundred simulated zeros.
This might sound like an advantage, but it isn't. It means there's no way to learn precisely and only the regularity represented by $T$ in lambda calculus. The universal Turing machine can also do the clever thing where it compresses one hundred zeros together, but there it is a distinct hypothesis. This makes universal machines strictly more powerful, learning-theoretically, than lambda calculus.
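To make the compression example concrete, here is a small sketch, with Python lambdas standing in for lambda terms (my own illustration; the "append a character" step is just a stand-in for feeding a simulated bit): a compact term built from Church numerals expands into one hundred simulated zeros.

```python
# Church-numeral arithmetic lets a short term denote a long, regular bit string.
two = lambda f: lambda x: f(f(x))                           # Church numeral 2
ten = lambda f: lambda x: f(f(f(f(f(f(f(f(f(f(x))))))))))   # Church numeral 10
hundred = two(ten)   # Church exponentiation: applying 2 to 10 yields 10^2 = 100

# "Emit a simulated zero", modelled here as appending the character "0".
append_zero = lambda s: s + "0"

zeros = hundred(append_zero)("")   # apply append_zero one hundred times
assert zeros == "0" * 100          # a compact term, one hundred zeros
print(len(zeros))
```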
Philip Parker introduced me to this argument.
I don't think this bad argument is too hard to refute, but I don't have the energy right now.