SamEisenstat — LessWrong

I have a feeling that something is missing from the Łukasiewicz stuff you've been doing because I don't think that assigning a quantitative degree of truth is all that vagueness really ought to be about. Vague sentences shift their meanings as the context changes. Vague statements can become crisp statements, and accordingly our language can gain more powerful synonymy. As a slogan perhaps, "vagueness wants to become ambiguity".

Informality

SamEisenstat19d40

I want to use some notion of the power of logic/synonymy relation here. We can always have a vacuous synonymy relation--then all valid substitions preserve truth. Something like propositional logic would be more powerful, and quantificational logic more powerful yet.

Consider how much we can do in Peano arithmetic because of the power of the quantifier. We can use induction to deduce many things about divisibility, the solutions of Diophantine equations, and onward to large parts of mathematics.

For a less precise example, consider reasoning only in coordinates, or with the ability to perform a change of coordinates. A change of coordinates gives a synonymy relation, which we may find simplifies an argument significantly, or lets us "really understand" what's happening. The picture that I have is that of a weaker and a stronger logical system--in the weaker system, we can't express the notion of a change of coordinates, but in the stronger system we can. In fact, it's doubtful whether there's a reasonable language in which we can express many other thoughts, but not changes of coordinates, not even by some "coding trick". So, we can take this example as a metaphor, or as a premonition of a generalization of the idea struggling to come into existence. I hope it is still able to shed light on what I mean by the power of a synonymy relation.

Anyway, here you want a normative direction pointing from vaguer languages to crisper languages. The crisper languages are better because they have synonymy. As you say "you don't really understand something unless you can translate it into a transparent-context description".

I want to construe this as about the power of synonymy relations. Crisp languages are better because we can use them to have interesting sequences of thoughts, because their synonymy does something for us. And so not just any synonymy relation are better--powerful ones are better.

I strongly doubt, as may be clear, the possibility of a "simple" definition that could measure power in terms of e.g. elementary syntactic criteria, but I think that we still have to admit that it's what we want here.

In Defense of Goodness

SamEisenstat19d10

I worry that this equivocates between "we make our values in an ongoing way together" and simpler kinds of aggregation (analogous to voting, linear combinations, etc.). Well, strictly speaking you haven't spelled out the former position--maybe I should just assert that it's the only plausible way to interpret the creation of the individual out of "sub-self" entities, and that ultimately I'd like to describe what different individuals do together in this way also.

To explain somewhat further, I'll assert that entities to which the concept of reinforcement learning is appropriate (say, Q-learners with function approximation, taking some sort of observation as the "state")--and which I think I am right to construe as exemplifying the sort of sub-self entities that you'd like this discussion to describe--don't have values. (The truth of that assertion probably depends on how I'm intending some of the words in ways that readers may not find obvious.) Most simply, they don't really have beliefs; they don't have anything we could call "world models" in a strong sense. To ask what they'd think of some possible world, we have to invent a way of presenting that world to their observation stream, opening the possibility of things like framing effects. If we find something that looks like incoherence, we don't have a way of asking them to reflect on it. (We could try to invent ways to do such things, but if we equip them with such, then I want to say that they are no longer "mere" reinforcement learners.)

As such, they can't vote, assign utilities, Nash bargain, etc. I think that there's an interesting phenomenon of the arising of selves. I think that economics-flavored discussion of rational agent concepts can often distract from what's most important, but in cases like this, can point in a productive direction. And here we see the disanalogy between the arising of selves and simple aggregation.

(Indeed, we see the limits of economic-flavored analysis is that fact that what is really missing from reinforcement learners is the space of possible worlds, which is classically taken as the precondition for analysis of rationality rather than something that can be evaluated as rational or irrational.)

Reflective oracles as a solution to the converse Lawvere problem

SamEisenstat9mo10

Yeah, it's a bit weird to call this particular problem the converse Lawvere problem. The name suggests that there is a converse Lawvere problem for any concrete category and any base object $Y$--namely, to find an object $X$ and a map $f: X \to Y^X$ such that every morphism $X \to Y$ has the form $f(x)$ for some $x \in X$. But, among a small group of researchers, we ended up using the name for the case where $Y$ is the unit interval and the category is that of topological spaces, or alternatively a reasonable topological-like category. In this post, I play with that problem a bit, since I pass to considering functions with computability properties, rather than a topological definition. (I also don't exactly define a category here...) I agree that what's going on here is a UTM property.

I think this context of considering different categories is interesting. There's an interplay between diagonal-lemma-type fixed-point theorems and topological fixed-point theorems going on in work like probabilistic truth predicates, reflective oracles, and logical induction. For more on analogies between fixed-point theorems, you can see e.g. A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points and Scott Garrabrant's fixed point sequence.

Shortform

SamEisenstat1y215

I think a lot of this is factual knowledge. There are five publicly available questions from the FrontierMath dataset. Look at the last of these, which is supposed to be the easiest. The solution given is basically "apply the Weil conjectures". These were long-standing conjectures, a focal point of lots of research in algebraic geometry in the 20th century. I couldn't have solved the problem this way, since I wouldn't have recalled the statement. Many grad students would immediately know what to do, and there are many books discussing this, but there are also many mathematicians in other areas who just don't know this.

In order to apply the Weil conjectures, you have to recognize that they are relevant, know what they say, and do some routine calculation. As I suggested, the Weil conjectures are a very natural subject to have a problem about. If you know anything about the Weil conjectures, you know that they are about counting points of varieties over a finite field, which is straightforwardly what the problems asks. Further, this is the simplest case, that of a curve, which is e.g. what you'd see as an example in an introduction to the subject.

Regarding the calculation, parts of it are easier if you can run some code, but basically at this point you've following a routine pattern. There are definitely many examples of someone working out what the Weil conjectures say for some curve in the training set.

Further, asking Claude a bit, it looks like are particularly common cases here. So, if you skip some of the calculation and guess, or if you make a mistake, you have a decent chance of getting the right answer by luck. You still need the sign on the middle term, but that's just one bit of information. I don't understand this well enough to know if there's a shortcut here without guessing.

Overall, I feel that the benchmark has been misrepresented. If this problem is representative, it seems to test broad factual knowledge of advanced mathematics more than problem-solving ability. Of course, this question is marked as the easiest of the listed ones. Daniel Litt says something like this about some other problems as well, but I don't really understand how routine he's saying that they are, are I haven't tried to understand the solutions myself.

The consistent guessing problem is easier than the halting problem

SamEisenstat2y10

Well, I guess describing a model of a computably enumerable theory, like PA or ZFC counts. We could also ask for a model of PA that's nonstandard in a particular way that we want, e.g. by asking for a model of , and that works the same way. Describing a reflective oracle has low solutions too, though this is pretty similar to the consistent guessing problem. Another one, which is really just a restatement of the low basis theorem, but perhaps a more evocative one, is as follows. Suppose some oracle machine $T$ has the property that there is some oracle that would cause it to run forever starting from an empty tape. Then, there is a low such oracle.

(Technically, these aren't decision problems, since they don't tell us what the right decision is, but just give us conditions that whatever decisions we make have to satisfy. I don't know what to say instead; this is more general then e.g. promise problems. Maybe I'd use something like decision-class problems?)

All these have a somewhat similar flavour by the nature of the low basis theorem. We can enumerate a set of constraints, but we can't necessarily compute a single object satisfying all the constraints. But the theorem tells us that there's a low such object.

I don't know what the situation is for subsets of the digits of Chaitin's constant. Can it be as hard as the halting problem? You might try to refute this using some sort of incompressibility idea. Can it be low? I'd expect not, at least for computable subsets of indices of positive density. Plausibly computability theorists know about this stuff. They do like constructing posets of Turing degrees of various shapes, and they know about which shapes can be realized between $0$ and the halting degree $0^{'}$ . (E.g. this paper.)

The consistent guessing problem is easier than the halting problem

SamEisenstat2y*30

Yeah, there's a sort of trick here. The natural question is uniform--we want a single reduction that can work from any consistent guessing oracle, and we think it would be cheating to do different things with different oracles. But this doesn't matter for the solution, since we produce a single consistent guessing oracle that can't be reduced to the halting problem.

This reminds me of the theory of enumeration degrees, a generalization of Turing degrees allowing open-set-flavoured behaviour like we see in partial recursive functions; if the answer to an oracle query is positive, the oracle must eventually tell you, but if the answer is negative it keeps you waiting indefinitely. I find the theory of enumeration degrees to be underemphasized in discussion of computability theory, but e.g. Odifreddi has a chapter on it all the way at the end of Classical Recursion Theory Volume II.

The consistent guessing problem isn't a problem about enumeration degrees. It's using a stronger kind of uniformity--we want to be uniform over oracles that differently guess consistently, not over a set of ways to give the same answers, but to present them differently. But there is again a kind of strangeness in the behaviour of uniformity, in that we get equivalent notions if we do or do not ask that a reduction between sets , $B$ be a single function that uniformly enumerates $A$ from enumerations of $B$ , so there might be some common idea here. More generally, enumeration degrees feel like they let us express more naturally things that are a bit awkward to say in terms of Turing degrees--it's natural to think about the set of computations that are enumerable/ $Σ_{1}$ in a set--so it might be a useful keyword.

The consistent guessing problem is easier than the halting problem

SamEisenstat2y192

Note that Andy Drucker is not claiming to have discovered this; the paper you link is expository.

Since Drucker doesn't say this in the link, I'll mention that the objects you're discussing are conventionally know as PA degrees. The PA here stands for Peano arithmetic; a Turing degree solves the consistent guessing problem iff it computes some model of PA. This name may be a little misleading, in that PA isn't really special here. A Turing degree computes some model of PA iff it computes some model of ZFC, or more generally any theory capable of expressing arithmetic.

Drucker also doesn't mention the name of the theorem that this result is a special case of: the low basis theorem. "Low" here suggests low computability strength. Explicitly, a Turing degree $A$ is low if solving the halting problem for machines with an oracle for $A$ is equivalent (in the sense of reductions) to solving the halting problem for Turing machines without any oracle. The low basis theorem says that every computable binary tree has a low path. We are able to apply the theorem to this problem, concluding that there is a consistent guessing oracle $C$ which is low. So, we cannot use this oracle to solve the halting problem; if we could, then an oracle machine with access to $C$ would be at least as strong as an oracle machine with access to the halting set, but we know that the halting set suffices to compute the halting problem for such a machine, which is a contradiction.

Various other things are known about PA degrees, though I'm not sure what might be of interest to you or others here. This stuff is discussed in books on computability theory, like Robert Soare's Computability Theory and Applications. Though, I thought I learned about PA degrees from his earlier book, but now I don't see them in there, so maybe I just learned about PA degrees around the same time, possibly following my interest in your and others' work on reflective oracles. The basics of computability theory--Turing degrees, the Turing jump, and the arithmetic hierarchy in the computability sense--may be of interest to the extent there is anything there that you're not already familiar with. With regard to PA degrees, in particular people like to talk about diagonally nonrecursive functions. This works as follows. Let $φ_{n}$ denote the $n$ th partial computable function according to some Goedel numbering. The PA degrees are exactly the Turing degrees that compute functions $f : N \to 2$ such that $f (n) \neq φ_{n} (n)$ for all numbers $n$ at which the right-hand side is defined. This is suggestive of the ideas around reflective oracles, the Lawvere fixed-point theorem, etc. But I wouldn't say that when I think about these things, I think of them in terms of diagonally nonrecursive functions; plausibly that's not an interesting direction to point people in.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments