Eigil Rischel

I maintain a reading list on Goodreads. I have a personal website with some blog posts, mostly technical stuff about math research. I am also on github

OpenAI Solves (Some) Formal Math Olympiad Problems

Most of the heavy lifting in these proofs seem to be done by the Lean tactics. The comment "arguments to `nlinarith`

are fully invented by our model" above a proof which is literally the single line
`nlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)]`

makes me feel like they're trying too hard to convince me this is impressive.

The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of "traditional" search methods for proving algebraic inequalities, rather than an impressive AI milestone. People on twitter have claimed that some of the other problems are also one-liners using existing proof assistant strategies - I find this claim totally plausible.

I would be much more impressed with an AI-generated proof of a combinatorics or number theory IMO problem (eg problem 1 or 5 from 2021). Someone with more experience in proof assistants probably has a better intuition for which problems are hard to solve with "dumb" searching like `nlinarith`

, but this is my guess.

Eigil Rischel's Shortform

Yes, that's right. It's the same basic issue that leads to the Anvil Problem

Eigil Rischel's Shortform

Compare two views of "the universal prior"

- AIXI: The external world is a Turing machine that receives our actions as input and produces our sensory impressions as output. Our prior belief about this Turing machine should be that it's simple, i.e. the Solomonoff prior
- "The embedded prior": The "entire" world is a sort of Turing machine, which we happen to be
*one component of*in some sense. Our prior for this Turing machine should be that it's simple (again, the Solomonoff prior), but we have to condition on the observation that it's complicated enough to contain observers ("Descartes' update"). (This is essentially Naturalized induction

I think of the difference between these as "solipsism" - AIXI gives its own existence a distinguished role in reality.

Importantly, the laws of physics seem fairly complicated in an absolute sense - clearly they require tens^{[1]} or hundreds of bits to specify. This is evidence against solipsism, because on the solipsistic prior, we expect to interact with a largely empty universe. But they *don't* seem much more complicated than necessary for a universe that contains at least one observer, since the minimal source code for an observer is probably also fairly long.

More evidence against solipsism:

- The laws of physics don't seem to privilege my frame of reference. This is a pretty astounding coincidence on the solipsistic viewpoint - it means we randomly picked a universe which simulates some observer-independent laws of physics, then picks out a specific point inside it, depending on some fairly complex parameters, to show me.
- When I look out into the universe external to my mind, one of the things I find there is my brain, which really seems to contain a copy of my mind. This is another pretty startling coincidence on the solipsistic prior, that the external universe being run happens to contain this kind of representation of the Cartesian observe

This is obviously a very small number but I'm trying to be maximally conservative here. ↩︎

davidad's Shortform

Right, that's essentially what I mean. You're of course right that this doesn't let you get around the existence of nonmeasurepreserving automorphisms. I guess what I'm saying is that, if you're trying to find a prior on , you should try to think about what system of finite measurements this is idealizing, and see if you can apply a symmetry argument to those bits. Which isn't always the case! You can only apply the principle of indifference if you're actually indifferent. But *if* it's the case that is generated in a way where "there's no reason to suspect that any bit should be rather than , or that there should be correlations between the bits", then it's of course not the case that has this same property. But of course you still need to look at the actual situation to see what symmetry exists or doesn't exist.

Haar measures are a high-powered way of doing this, I was just thinking about taking the iterated product measure of the uniform probability measure on (justified by symmetry considerations). You can of course find maps from to all sorts of spaces but it seems harder to transport symmetry considerations along these maps.

The Promise and Peril of Finite Sets

I strongly recommend Escardó's Seemingly impossible functional programs, which constructs a function `search : ((Nat -> Bool) -> Bool) -> (Nat -> Bool)`

which, given a predicate on infinite bitstrings, finds an infinite bitstring satisfying the predicate if one exists. (In other words, if `p : (Nat -> Bool) -> Bool`

and any bitstring at all satisfies `p`

, then `p (search p) == True`

(Here I'm intending `Nat`

to be the type of natural numbers, of course) .

davidad's Shortform

Ha, I was just about to write this post. To add something, I think you can justify the uniform measure on bounded intervals of reals (for illustration purposes, say ) by the following argument: "Measuring a real number " is obviously simply impossible if interpreted literally, containing an infinite amount of data. Instead this is supposed to be some sort of idealization of a situation where you can observe "as many bits as you want" of the binary expansion of the number (choosing another base gives the same measure). If you now apply the principle of indifference to each measured bit, you're left with Lebesgue measure.

It's not clear that there's a "right" way to apply this type of thinking to produce "the correct" prior on (or or any other non-compact space.

I’m no longer sure that I buy dutch book arguments and this makes me skeptical of the "utility function" abstraction

This seems somewhat connected to this previous argument. Basically, coherent agents can be modeled as utility-optimizers, yes, but what this really proves is that almost any behavior fits into the model "utility-optimizer", not that coherent agents must necessarily look like our intuitive picture of a utility-optimizer.

Paraphrasing Rohin's arguments somewhat, the arguments for universal convergence say something like "for "most" "natural" utility functions, optimizing that function will mean acquiring power, killing off adversaries, acquiring resources, etc". We know that all coherent behavior comes from a utility function, but it doesn't follow that most coherent behavior exhibits this sort of power-seeking.

We need a standard set of community advice for how to financially prepare for AGI

My impression from skimming a few AI ETFs is that they are more or less just generic technology ETFs with different branding and a few random stocks thrown in. So they're not catastrophically worse than the baseline "Google, Microsoft and Facebook" strategy you outlined, but I don't think they're better in any real way either.

Finite Factored Sets

This is really cool!

The example of inferring from the independence of and reminds me of some techniques discussed in Elements of Causal Inference. They discuss a few different techniques for 2-variable causal inference.

One of them, which seems to be essentially analogous to this example, is that if are real-valued variables, then if the regression error (i.e for some constant ) is independent of , it's highly likely that is downstream of . It sounds like factored sets (or some extension to capture continuous-valued variables) might be the right general framework to accommodate this class of examples.

I think we're basically in agreement here (and I think your summary of the results is fair, I was mostly pushing back on a too-hyped tone coming from OpenAI, not from you)