Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Can HCH epistemically dominate Ramanujan?



If HCH is ascription universal, then it should be able to epistemically dominate an AI theorem-prover that reasons similarly to how Ramanujan reasoned.

My guess is that HCH has to reverse engineer the theorem prover, figure out how/why it works, and then reproduce the same kind of reasoning. This seems plausible to me, but (if this is what Paul has in mind too) I'm not sure why Paul assumes HCH can do this quickly:

Moreover, I’ll assume that it’s universal with some small blow-up, i.e. that if we give HCH a budget k N, then it epistemically dominates any computation that could be simulated by HCH with budget N.

It seems to me that it could easily take a super-linear (or even super-polynomial) budget to reverse engineer how a computation works since it could require disentangling opaque and convoluted algorithms, and acquiring new theoretical understandings that would explain how/why the algorithms work. In the case of "an AI theorem-prover that reasons similarly to how Ramanujan reasoned" this might for example require gaining a good theoretical understanding of logical uncertainty and seeing how the AI theorem-prover uses an approximate solution to logical uncertainty to guide its proof search.
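One way to state this worry more precisely, using notation that is not in the original discussion: let $C(N)$ be the smallest budget with which HCH epistemically dominates every computation that HCH could simulate with budget $N$. The symbol $C$ is introduced here purely for illustration. The quoted assumption and the two possibilities raised above can then be sketched as:

```latex
% Assumed "small blow-up": some constant k, independent of N
C(N) \le k N

% Concern 1: reverse engineering needs a super-linear budget
C(N) = \omega(N)

% Concern 2: reverse engineering needs a super-polynomial budget
C(N) = N^{\omega(1)}
```

If either of the last two held, no fixed constant $k$ would satisfy the first line for all $N$.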

It occurs to me that if the overseer understands everything that the ML model (that it's training) is doing, and the training is via some kind of local optimization algorithm like gradient descent, the overseer is essentially manually programming the ML model by gradually nudging it from some initial (e.g., random) point in configuration space. If this is a good way to think about what's happening, we could generalize the idea by letting the overseer use other ways to program the model (for example, by using standard programming methodologies adapted to the model class), which can probably be much more efficient than just using the "nudging" method. This suggests that maybe IDA has less to do with ML than it first appears, or maybe that basing IDA on ML only makes sense if ML goes beyond local optimization at some point (so that the analogy with "nudging" breaks down), or that we have to figure out how to do IDA safely without the overseer understanding everything that the ML model is doing (which is another way that the analogy could break down).

It seems like for Filtered-HCH, the application in the post you linked to, you might be able to do a weaker version where you label any computation that you can't understand in kN steps as problematic, only accepting things you think you can efficiently understand. (But I don't think Paul is arguing for this weaker version.)
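A minimal sketch of that weaker filtering rule, under loud assumptions: `Computation`, `filter_computations`, and `understands_within` are hypothetical names invented for illustration, and the "understanding" check is a toy stand-in (a lookup table of made-up costs), not anything specified for Filtered-HCH.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Computation:
    """Hypothetical record: a named computation and its simulation budget N."""
    name: str
    size: int  # N, the budget needed to simulate this computation


def filter_computations(
    computations: Iterable[Computation],
    understands_within: Callable[[Computation, int], bool],
    k: int,
) -> Tuple[List[Computation], List[Computation]]:
    """Accept a computation only if it can be understood within k * N steps.

    Everything else is labeled problematic, whether or not it is actually
    harmful. `understands_within` is an assumed oracle supplied by the caller.
    """
    accepted: List[Computation] = []
    problematic: List[Computation] = []
    for c in computations:
        budget = k * c.size  # the kN budget from the comment above
        if understands_within(c, budget):
            accepted.append(c)
        else:
            problematic.append(c)
    return accepted, problematic


# Toy stand-in for "steps needed to understand"; the numbers are made up.
toy_cost = {"linear_search": 50, "opaque_prover": 10_000}


def toy_understands_within(c: Computation, budget: int) -> bool:
    return toy_cost[c.name] <= budget


comps = [Computation("linear_search", 10), Computation("opaque_prover", 10)]
accepted, problematic = filter_computations(comps, toy_understands_within, k=8)
print([c.name for c in accepted])     # ['linear_search']
print([c.name for c in problematic])  # ['opaque_prover']
```

Note the trade-off this makes explicit: the filter never needs to succeed at reverse engineering an opaque computation, but it may reject computations that are useful and benign merely because they are expensive to understand.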

Srinivasa Ramanujan was an Indian mathematician famous for solving math problems with sudden and inexplicable flashes of insight. From his Wikipedia page:

and

His style of mathematical reasoning was completely novel to the mathematicians around him, and led to groundbreaking research:

If HCH is ascription universal, then it should be able to epistemically dominate an AI theorem-prover that reasons similarly to how Ramanujan reasoned. But I don't currently have any intuitions as to why explicit verbal breakdowns of reasoning should be able to replicate the intuitions that generated Ramanujan's results (or any style of reasoning employed by any mathematician since Ramanujan, for that matter).

I do think explicit verbal breakdowns of reasoning are adequate for verifying the validity of Ramanujan's results. At the very least, mathematicians since Ramanujan have been able to verify a majority of his claims.

But, as far as I'm aware, there has not been a single mathematician with Ramanujan's style of reasoning since Ramanujan himself. This makes me skeptical that explicit verbal breakdowns of reasoning would be able to replicate the intuitions that generated Ramanujan's results, which I understand (perhaps erroneously) to be a necessary prerequisite for HCH to be ascription universal.