Paradigms for computation

by Cole Wyeth
30th Jun 2025
AI Alignment Forum
8 comments, sorted by top scoring

Alexander Gietelink Oldenziel (13d)

Nice post Cole. 

I think I'm sympathetic to your overall point. That said, I am less pessimistic than you about the claim that neural network computation can never be understood beyond the macroscopic level of "what does it do."

The Turing machine paradigm is just one out of many paradigms to understand computation. It would be a mistake to be too pessimistic based on just the failure of the ur-classical TM paradigm. 

Computational learning theory's bounds are vacuous for realistic machine learning. I would guess, and I say this as a nonexpert, that this is chiefly due to 

(i) a general immaturity of the field of computational complexity, i.e. most of the field rests on conjectures; it's hard to prove much about time complexity even when we're quite confident the statements are true

(ii) computational learning theory grew out of classical learning theory and has not fully incorporated the lessons of singular learning theory. Much of the field is working in the wrong 'worst-case/pessimistic' framework when they should be thinking in terms of Bayesian inference & simplicity/degeneracy bias. Additionally, there is perhaps too much focus on exact discrete bounds when instead one should be thinking in terms of smooth relaxation and geometry of loss landscapes. 

That said, I agree with you that the big questions are currently largely open. 

mishka (12d)

Thanks for writing this! I'd like to put together a few hopefully relevant remarks.

The gap between theory and practice is strongly pronounced in computer science. The theory is governed by its inner logic (what's more tractable, what leads to richer, more intellectually satisfying formalisms, and so on). As a result, the fit of theory to practical matters is often so-so.

For example, with run-time complexity, the richest, most developed, and most commonly taught theory deals with worst-case complexity, whereas in practice average-case complexity tends to be more relevant (e.g. the simplex method, quicksort, and so on).
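
To make the quicksort example concrete, here is a minimal sketch (counts will vary run to run) comparing quicksort with a naive first-element pivot on pre-sorted versus shuffled input; the former hits the quadratic worst case, the latter the typical n log n average case.

    import random

    def quicksort_comparisons(xs):
        """Count the comparisons made by quicksort with a first-element pivot."""
        if len(xs) <= 1:
            return 0
        pivot, rest = xs[0], xs[1:]
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        # len(rest) comparisons at this level, plus the two recursive calls.
        return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

    n = 400
    print(quicksort_comparisons(list(range(n))))               # worst case: ~n^2/2 comparisons
    print(quicksort_comparisons(random.sample(range(n), n)))   # average case: roughly n log n comparisons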

Computers operating in the real world are at the very least "computations with an oracle" (somewhat modeled by type-2 Turing machines), with the world being the oracle. If one believes together with Penrose that the world is not computable, then the "Penrose argument" falls apart. If one believes together with some other thinkers that the world is computable, then any valid version of the "Penrose argument" would be equally applicable to humans.

In reality, even the "computations with an oracle" paradigm is not quite adequate for the real world, since when computers write into an external world and read from it, they can activate "computational circuits" in the external world, and therefore the notion of strict boundary between the internal computations within the computer and the dynamics of the external world becomes inadequate.

The theoretical computational paradigms are not very hardware-friendly (e.g. neither Turing machines nor lambda calculus maps onto our computers all that well). At the same time, a typical computer does not have a well-developed autonomous capability to expand its memory in an unlimited fashion and, therefore, is not Turing-complete in a formal sense (e.g. if memory is limited, the halting problem stops being undecidable, and so on).

In particular, these days we are missing a GPU-friendly computational paradigm. It is actually possible to create a neuromorphic computational paradigm, and such paradigms go decades back, but a typical incarnation would not be GPU-friendly, so a sizeable gap between theory and practice would remain.

For example, results of Turing universality of RNNs with memory expanding in an unlimited fashion go back to at least 1987. However, originally those results were using unlimited precision reals and were abusing binary expansions of those reals as tapes of Turing machines. This was incompatible with the key property of practical neural computations, namely their relative robustness to noise. This incompatibility is yet another example of the gap between theory and practice.

More recently, this particular incompatibility has been fixed. One can take an RNN and replace neurons handling scalar flows with neurons handling vector flows, and one can consider countably-sized vectors having a finite number of non-zero coordinates at any given moment of time, replacing linear combinations of scalar inputs with linear combinations of vector inputs ("generalized attention"). Then one gets a theoretically nice neuromorphic platform for stream-oriented computations which is more robust to noise. It is still not GPU-friendly in its full generality.

The remarks above are about computational architectures. Much more can be said about the great diversity of possible methods of generating (synthesizing) computations. When our architectures are neuromorphic rather than discrete, we have a greater variety of such methods potentially available (not just gradient methods, but, for example, decomposition of connectivity matrices into series, and so on).

Noosphere89 (9d)

My take on how recursion theory failed to be relevant for today's AI is that it turned out that what a machine could do if unconstrained basically didn't matter at all, and in particular it basically didn't matter what limits an ideal machine could do, because once we actually impose constraints that force computation to use very limited amounts of resources, we get a non-trivial theory and importantly all of the difficulty of explaining how humans do stuff lies here.

There was too much focus on "could a machine do something at all?" and not enough focus on "what can a machine with severe limitations do?"

The reason is that in a sense, it is trivial to solve any problem with a machine if I'm allowed zero constraints except that the machine has to exist in a mathematical sense.

A good example of this is the paper on A Universal Hypercomputer, which shows how absurdly powerful computation can be if you are truly unconstrained:

https://arxiv.org/abs/1806.08747

Or davidad's comment on how every optimization problem is trivial under worst-case scenarios:

https://www.lesswrong.com/posts/yTvBSFrXhZfL8vr5a/worst-case-thinking-in-ai-alignment#N3avtTM3ESH4KHmfN

There are other problems, like embeddedness issues, but this turned out to be a core issue, and it turned out that the constants and constraints mattered more than recursion theory thought.

I'll flag here that while it's probably true that a future paradigm will involve more online learning/continual learning, LLMs currently don't do this, and after they're trained their weights/neurons are very much frozen. Indeed, I think this is a central reason why LLMs currently underperform their benchmarks, and is why I don't expect pure LLMs to work out as a paradigm for AGI in practice:

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF

Unfortunately, trained neural networks are the things that perform learning and inference online, during deployment. It seems like a bad sign for alignment if we can only understand the behavior of A.G.I. indirectly.

On the argument against simple algorithms that learn generally and have high performance existing, I think a key assumption of LW is that our physics is pretty simple in Kolmogorov complexity (perhaps excluding the constants), and if we buy this, then the true complexity of the world is upper bounded low enough that glass-box learners have real use.

That doesn't mean they have to be findable easily, though.

IMO, the most successful next paradigm will require us to focus much more strongly on behavioral properties rather than hoping for mechanistic interpretability to succeed: we will likely have to aim for generalizable statements about input/output behavior and mostly ignore the model's structure, since it's probably going to be too messy to work with by default.

This is why I'm more bullish on Vanessa Kosoy's IBP than basically any other theoretical agenda except Natural Abstractions.

Cole Wyeth (8d)

My take on how recursion theory failed to be relevant for today's AI is that it turned out that what a machine could do if unconstrained basically didn't matter at all, and in particular it basically didn't matter what limits an ideal machine could do, because once we actually impose constraints that force computation to use very limited amounts of resources, we get a non-trivial theory and importantly all of the difficulty of explaining how humans do stuff lies here.

That's partially true (computational complexity is now much more active than recursion theory), but it doesn't explain the failure of the Penrose-Lucas argument. 

I'll flag here that while it's probably true that a future paradigm will involve more online learning/continual learning, LLMs currently don't do this, and after they're trained their weights/neurons are very much frozen.

While their weights are frozen and they don't really do continual learning, they do online learning.

 On the argument against simple algorithms that learn generally and have high performance existing, I think a key assumption of LW is that our physics is pretty simple in Kolmogorov complexity (perhaps excluding the constants), and if we buy this, then the true complexity of the world is upper bounded low enough that glass-box learners have real use.

Well, it's not a key assumption of mine :)

I mostly agree with your observation, but I also think it's funny when people say this kind of thing to me: I've been reading lesswrong for like an hour a day for years (maybe, uh, more than an hour if I'm being honest...), and I produce a non-negligible fraction of the rationality/epistemics content. So - the fact that this is sometimes taken as an assumption on lesswrong doesn't seem to move me very much - I wonder if you mean to accept or to question this assumption? Anyway, I don't mean this as a criticism of your raising the point.  

Physics aside, I think that when you take indexical complexity into account, it's not true.

Noosphere89 (8d)

but it doesn't explain the failure of the Penrose-Lucas argument.

The failure of the Penrose-Lucas argument is that Godel's incompleteness theorem doesn't let you derive the conclusion he derived, because it only implies that you cannot use a computably enumerable set of axioms to make all of mathematics sound and complete, and critically this doesn't mean you cannot automate a subset of mathematics that is relevant.

There's a closely related argument about the Chinese Room, where I pointed out that intuitions from our own reality, which includes lots of constraints, fundamentally fail to transfer over to hypotheticals. This is a really important example of why arguments around AI need to actually attend to the constraints that are relevant in specific worlds, because without them it's trivial to have strong AI solve any problem:

https://www.lesswrong.com/posts/zxLbepy29tPg8qMnw/refuting-searle-s-wall-putnam-s-rock-and-johnson-s-popcorn#wbBQXmE5aAfHirhZ2

Really, a lot of the issue with the arguments against strong AI made by philosophers is that they have no sense of scale or of what the mathematical theorems are actually saying, and thus fail to understand what has actually been established, combined with overextrapolating their intuitions into cases where those intuitions have been deliberately set up to fail.

While their weights are frozen and they don't really do continual learning, they do online learning.

While I agree that in-context learning does give them some form of online learning, which at least partially explains why LLMs succeed (combined with their immense amount of data and their muddling through with extremely data-inefficient algorithms compared to brains, a known weakness that could plausibly lead to the death of pure LLM scaling by 2028-2030, though note that doesn't necessarily mean timelines get that much longer), this currently isn't enough to automate lots of jobs away, and fairly critically it might not be good enough in practice, with realistic compute and data constraints, to compete with better continual learning algorithms.

To be clear, this doesn't mean any future paradigm will be more understandable, because it will rely on more online/continual learning than current LLMs do.

Well, it's not a key assumption of mine :)

I mostly agree with your observation, but I also think it's funny when people say this kind of thing to me: I've been reading lesswrong for like an hour a day for years (maybe, uh, more than an hour if I'm being honest...), and I produce a non-negligible fraction of the rationality/epistemics content. So - the fact that this is sometimes taken as an assumption on lesswrong doesn't seem to move me very much - I wonder if you mean to accept or to question this assumption? Anyway, I don't mean this as a criticism of your raising the point.  

Physics aside, I think that when you take indexical complexity into account, it's not true.

I'm mostly in the "pointing out that the assumption exists and is used widely" camp, rather than questioning or accepting it, because I wanted to understand how MIRI could believe that simple algorithms that learn generally and have high performance exist despite the theorem, and this was the first thing that came to my mind.

And while AIXI is a useful toy model of how intelligence works, its treatment of indexical complexity/the first-person view as privileged is an area where I seriously start to depart from it, and one of the biggest reasons I'm much more of a fan of IBP than AIXI is that it makes a serious effort to actually remove the first-person privilege often seen in many frameworks of intelligence.

And this actually helps us to pretty much entirely defuse the Solomonoff prior's potential malignness, because we no longer have immense probability mass on malignant simulation hypotheses, and the simulation hypotheses that do get considered can't trick the IBP agent into changing its values.

So the indexical complexity isn't actually relevant here, thankfully (though this doesn't guarantee that the world is truly low complexity in the sense necessary for LW's view of simple, general high performing algorithms to actually work).

And the nice thing is that it no longer requires monotonic preferences:

https://www.lesswrong.com/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform?commentId=kDaAdjc3YbnuS2ssp

https://www.lesswrong.com/posts/DobZ62XMdiPigii9H/non-monotonic-infra-bayesian-physicalism

Cole Wyeth (8d)

Whether you use AIXI or IBP, a continual learning algorithm must contend with indexical uncertainty, which means it must contend with indexical complexity in some fashion. 

As far as I understand, IBP tries to evaluate hypotheses according to the complexity of the laws of physics, not the bridge transformation (or indexical) information. But that cannot allow it to overcome the fundamental limitations of the first-person perspective faced by an online learner as proved by Shane Legg. That’s a fact about the difficulty of the problem, not a feature (or bug) of the solution. 

I agree that Solomonoff induction faces a potential malign prior problem, and I default to believing Vanessa that IBP solves this.


By the way, Scott Aaronson has also made this rebuttal to Searle in his paper on why philosophers should care about computational complexity.

Eleni Angelou (13d)

Is this the post you had in mind? https://www.lesswrong.com/posts/Bi4yt7onyttmroRZb/executable-philosophy-as-a-failed-totalizing-meta-worldview

Cole Wyeth (13d)

Thanks, but no. The post I had in mind was an explanation of a particular person's totalizing meta-worldview, which had to do with evolutionary psychology. I remember recognizing the username - also I have that apparently common form of synesthesia where letters seem to have colors, and I vaguely remember the color of it (@lc? @lsusr?) but not what it was.

Paradigms for computation

Epistemic status: Though I can't find it now, I remember reading a lesswrong post asking "what is your totalizing worldview?" I think this post gets at my answer; in fact, I initially intended to title it "My totalizing worldview" but decided on a slightly more restricted scope (anyway, I tend to change important aspects of my worldview so frequently it's a little unsettling, so I'm not sure if it can be called totalizing). Still, I think these ideas underlie some of the cruxes behind my meta-theory of rationality sequence AND my model of what is going on with LLMs among other examples.

The idea of a fixed program as the central object of computation has gradually fallen out of favor. As a result, the word "algorithm" seems to have replaced "program" as a catch-all term for computations that computers run. When the computation is massive, automatically generated, without guarantees, and illegible to humans, "program" and "algorithm" both have the wrong connotations - I'm not sure I even know a "true name" for such a thing. Circuit is almost right, but doesn't include iteration. Perhaps discrete finite automaton or just computational process to avoid inappropriate associations? In the context of machine learning, at least, we have the word "model."

When computer science was born, the subject was deeply entangled with recursion theory, the study of programs (and at the time, an algorithm was just a mathematical abstraction of a program). With an increasing focus on learning, algorithms took the role of learning/constructing models, until in recent years the models even do the learning part in-context, to a limited extent. In this essay, I am mostly interested in investigating the accompanying rise and fall of conceptual frames (or paradigms) for understanding computation, and in asking which were less wrong, and in which cases this was predictable. 

Recursion theory

Back in the 20th century, mathematicians were really interested in the limits of mathematical logic. For instance, Goedel's incompleteness theorems (particularly the first) show that in some sense mathematics cannot be automated. A lot of intelligent people took this and ran with it; if mathematics cannot be automated, computers can't do mathematics, so human creativity must not be computable! Therefore, computers will never be able to do X (for many values of X).

For instance, according to Penrose:

The inescapable conclusion seems to be: Mathematicians are not using a knowably sound calculation procedure in order to ascertain mathematical truth. We deduce that mathematical understanding – the means whereby mathematicians arrive at their conclusions with respect to mathematical truth – cannot be reduced to blind calculation!

This is referred to as a "Penrose-Lucas Argument." See "An Introduction to Universal Artificial Intelligence," section 16.5.4 (pg 423) for some further examples. 

Now computers are getting pretty good at mathematics, and though it is too early to declare Penrose and Lucas empirically wrong, I think the writing is pretty clearly on the wall. In fact, I believe that it has become pretty obvious that this entire 20th century way of looking at things was mistaken (though Roger Penrose apparently still hasn't realized it).

 But this was all based on rigorous (and rather impressively deep) mathematical theorems! How could it be so misleading?

Let's consider the object level first.

The first incompleteness theorem says that there is no recursively enumerable (meaning computably listable) and consistent set of axioms sufficient to prove all true statements about arithmetic.
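
In one standard formulation (the Gödel-Rosser version, which needs only consistency; stated here as a reminder rather than a quotation): for any consistent, recursively enumerable theory $T$ containing basic arithmetic, there is an arithmetical sentence $G_T$ with

$$ T \nvdash G_T \quad\text{and}\quad T \nvdash \lnot G_T. $$

Since one of $G_T$ and $\lnot G_T$ is true, $T$ fails to prove some true statement of arithmetic.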

Mathematicians proposed that part of their job (proposing axioms) therefore could not be automated away. After all, a computer can only produce recursively enumerable axioms by definition!

I would argue that this definition had more to do with the surmountable limits of computers at the time than the fundamental limits of computation. It conceptualized a computer program as a fixed finite set of instructions that ran in a dark room and spat its output on a tape: a machine of pure contemplation. Perhaps that is how computers acted in the 20th century. 

It's obvious that humans aren't like that. We go out into the world and experience things and learn. That informs the axioms that we choose to explore - they are intended to model some of the interesting systems that we encounter. 

Computers can also be hooked up to a continual stream of rich input, and adapt to that input. Indeed, it wasn't long before we attached them to a world much, much larger than their source code (for instance, high resolution sensors or an entire internet of text).

Of course, machines accept input in recursion theory. It's just that recursion theory doesn't really centralize the input, doesn't conceptualize it as massive, messy, and richly structured, and that perspective leaks into the type of results that the logicians and theoretical computer scientists of that time pursued.

At least, that's the best way I've been able to summarize the feeling that I get reading the work of 20th century logicians (admittedly, mostly secondhand through today's recursion theory textbooks). There's some kind of break between our implicit mental models of computation. It's hard to put my finger on exactly where we depart - the first unjustified assumption, the first "wrong" turn.

My initial hypothesis was that they just didn't think of the inputs as big enough, compared to the machines. This is kind of a compelling idea, since initially computers filled a room, and their inputs were a stack of punch cards.

Hypothesis 1: Algorithms were supposed to accept very cleanly structured input, and perform some fairly constrained task, using some equipment that was already inside it to begin with. The purpose of an algorithm was something built into it.

There is some evidence that the logicians visualized machines as operating in a prescribed way on smaller inputs. For instance, streaming algorithms (with infinite input and/or output) do not seem to have been as well studied during that period (even real-valued computation in the style of computable analysis seems to be less developed to this day). In algorithmic information theory we call these monotone machines, while computable analysts call them Type 2 machines (with slightly different intentions but identical definitions). While recursion theorists do consider infinite inputs (for instance in descriptive set theory, the Baire space ℕ^ℕ), this usually seems to be in the context of higher Turing degrees that have little to say about real machines (my interest decays exponentially with each Turing jump).
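
As a toy illustration of the monotone / Type 2 picture (a sketch with an arbitrary infinite input stream, chosen only for concreteness): the machine consumes an endless input and emits output incrementally, never revising anything it has already written.

    from itertools import count, islice

    def running_parity(bits):
        """Monotone stream transducer: read one input bit at a time and emit the
        parity of everything seen so far. The output only ever grows; nothing
        already written is retracted (the defining monotonicity property)."""
        parity = 0
        for b in bits:
            parity ^= b
            yield parity

    def input_stream():
        """An arbitrary infinite 0/1 stream, standing in for sensor data, text, etc."""
        for i in count():
            yield int(i % 3 == 0)

    # Inspect the first ten output bits of an in-principle infinite computation.
    print(list(islice(running_parity(input_stream()), 10)))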

We can extend recursion theory to infinite input/output streams, but a lot of the classical results become inapplicable/irrelevant, and once you reorient your perspective enough to have some kind of interesting theory you've wandered into another discipline at least as distant as algorithmic information theory (and usually even more distant) from where you started.

I don't think this framing of the mistake is exactly right though. One of the earliest important results was the existence of a universal Turing machine, which takes as input another machine's description and an input to simulate that machine on. This clearly breaks down the program/data distinction. The functional perspective of λ-calculus and its implementation in LISP totally obliterate the distinction. It's clear that the idea of an input as containing rich semantics was in the Overton window.

So, their selective blindness wasn't exactly about restricting the size or meaning of the input.

Hypothesis 2: Logicians didn't study learning.[1]

This hypothesis feels right, but in a way it begs the question. What, precisely, is it about machine learning that recursion theory did not capture? In fact, the obvious answer is "the input is large and has semantics," and that is just hypothesis 1.

I don't have a confident answer to this question. I think a big piece of it is just the correctness standard for programs - a teleological shift. Classically, an algorithm is meant to correctly solve a problem. But a learning "algorithm" can be deceived. A learning "algorithm" usually works. There's a pretty good reason for those scare quotes (though I won't stick to them from now on, because it would be exhausting).

Of course, there are useful randomized algorithms that we wouldn't describe as learning algorithms. I think Hypothesis 2 still stands up though; I don't recall reading much serious consideration of randomized algorithms (for learning or otherwise) in the 20th century, and certainly Goedel doesn't seem to have had anything to say about them.

Randomized algorithms grow as they run

I think the advantage of randomized algorithms is that they are almost-non-uniform. A randomized algorithm is really an interpolation of a family of infinitely complicated algorithms: a finite program + an infinite sequence of coin flips. At a fixed input size it (usually) only uses a finite number of coin flips, but that number grows longer and longer on larger inputs, so that the full algorithm description effectively grows with the input. A randomized algorithm usually succeeds because you can't construct adversarial inputs for the entire family at once (or even a constant fraction of the family). 
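
A concrete instance of the "finite program + coin flips" picture (a sketch of Freivalds' algorithm, drawn from standard references rather than anything in this post): it checks a claimed matrix product using n fresh random bits per trial, never rejects a correct answer, and accepts a wrong one with probability at most 2^-trials - no single adversarial input defeats more than half of the coin-flip family in any trial.

    import random

    def freivalds(A, B, C, trials=20):
        """Probabilistically check whether A times B equals C for n x n integer
        matrices given as lists of lists. Never rejects a correct C; accepts an
        incorrect C with probability at most 2**-trials."""
        n = len(A)
        for _ in range(trials):
            r = [random.randint(0, 1) for _ in range(n)]                        # n coin flips
            Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
            ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
            Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
            if ABr != Cr:
                return False   # certainly wrong
        return True            # probably right

    A, B = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
    print(freivalds(A, B, [[2, 1], [4, 3]]), freivalds(A, B, [[2, 1], [4, 4]]))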

(Relatedly, I (weakly) hold the controversial inside view that it is quite plausible that BPP is not equal to P - but I am far from an expert, and would not bet that way.)  

I give you the stronger claim:

Hypothesis 2.1: Logicians didn't study algorithms that fail sometimes.   

By the way, I think this distinction has a lot to do with Moravec's paradox. Logic seemed like the most serious intellectual activity to logicians, and logic is concerned with absolute certainty. The hardest aspects of intelligent behavior to compute tend to deal with the much messier real world (particularly sensing and acting) where it seems like occasional failure is pretty much guaranteed for any (computer or biological) agent. I haven't been able to wrap this into an alternative hypothesis though, because I think Moravec's paradox was not a proximal cause.

For whatever reason, it turns out that the entire paradigm of logic (and recursion theory) just isn't a very good description for a machine that observes and adapts. Modern A.I. algorithms draw much more from statistics, probability, and optimization than logic (though arguably even these newer paradigms are becoming outdated to various degrees - it's not clear there is a good candidate for a theoretical replacement).

The real situation has diverged so much from the once-reasonable assumptions of the 20th century logic-based model that, despite making no specific errors, the pure logicians have pretty much just slid into irrelevance - at least, when it comes to the limitations of AI-style computing.

How could this error have been predicted in advance? I'm not exactly sure. I think one would have had to imagine the development of technology not only pushing computers towards the ideal of recursion theory (that is, allowing many things computable in principle to become computable in practice) but also to push computers beyond it, making the assumptions of recursion theory outdated. Or perhaps one should have just looked at humans and asked not "can our current machines do what humans do?" but "if humans were just machines, what type of machine would they be?" Basically, one would have needed to suspect that limitations revealed by recursion theory might be limitations of recursion theory. 

Sufficiently surprising conclusions within a paradigm cast doubt on the paradigm. Surprising conclusions are both the highest success and the death of paradigms. 

Computational learning theory

Recursion theory was buried by LLMs, but it was killed much earlier. I remember a stretch of at least 10 or 20 years (about 2000-2020) when the paradigm of A.I. had clearly switched to machine learning, but before neural nets (and in particular, later, foundation models) had taken their place as the standard approach to nearly all learning problems. During that time, computational learning theory (CLT) was the ruling paradigm, basically porting over ideas from statistics (particularly statistical learning theory) and studying their computability  / computational complexity. I'm not going to say much about CLT, but I will say that it definitely incorporates the idea that learning should usually succeed (see, for example, Valiant's PAC-learning). But it still studies human-interpretable algorithms with probabilistic guarantees. 
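
For flavor, the kind of statement CLT aims at (the standard realizable-case bound for a finite hypothesis class, quoted from standard texts rather than this post): any learner that outputs a hypothesis consistent with

$$ m \;\ge\; \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right) $$

i.i.d. labeled examples will, with probability at least $1-\delta$, have true error at most $\epsilon$. The guarantee is probabilistic and approximate, but the algorithm itself is still assumed to be something a human wrote down and can reason about.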

The computational learning theory paradigm had barely coalesced when LLMs rose, and (perhaps as a result) it has died more quietly, but also less completely... As far as I am aware, CLT has no convincing explanation for why neural networks generalize effectively, let alone the phenomena of LLMs.

I think that the growing irrelevance of computational learning theory goes a bit deeper than theory temporarily lagging behind practice. Existing CLT is struggling to incorporate a higher-level reappearance of the bitter lesson: with pretraining, even the learner is learned. I think that the real impressive ability of LLMs is their incredibly efficient and flexible learning and inference in-context, which is essentially amortized over the course of pretraining. Though no one seems to put it this way, pretraining is one of the first actually-useful meta-learning algorithms (pretraining -> metalearning, ICL -> learning). Another framing is that LLMs learn offline to learn online, becoming vast before they even start performing their intended purpose (there doesn't seem to be a well-developed theory for this problem - someone should probably invent it...).

Viewed this way, an LLM is a very messy "algorithm" with no(?) proven (even probabilistic) guarantees. Yes, the scare quotes are back. An LLM really doesn't look much like an algorithm as conceptualized by CLT. It's more like a massive circuit than a Turing machine. Though technically a circuit is computable by a TM, this is not a productive way to think of circuits - they are a non-uniform model of computation and they act very differently. For instance, there is a circuit that solves the halting problem at its input size, for the simple reason that there is a circuit to solve ANY problem at a fixed input size, with (basically) a massive lookup table.
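
To make the non-uniformity point concrete, a toy sketch: at a fixed input length n, any Boolean function whatsoever is computed by a lookup table with 2^n entries. For "does this length-n program halt?" the table still exists mathematically; there is just no uniform way to construct it.

    from itertools import product

    def build_lookup_circuit(f, n):
        """Tabulate an arbitrary Boolean function on n bits. The resulting 'circuit'
        is a dict with 2**n entries; nothing about f needs to be uniformly computable,
        each entry just has to be settled somehow."""
        return {bits: f(bits) for bits in product((0, 1), repeat=n)}

    def eval_circuit(table, bits):
        return table[tuple(bits)]

    # Demonstrated with a function we can actually write down (parity); for the
    # halting problem at length n we could not build the table, but it exists.
    table = build_lookup_circuit(lambda bits: sum(bits) % 2, 3)
    print(eval_circuit(table, (1, 0, 1)))  # -> 0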

This conceptual shift is particularly crisp when it comes to computational complexity. Our theory of computational lower bounds on circuits basically doesn't exist - the subject is notoriously intractable. My personal view (received from my advisor/collaborator Carl Sturtivant) is that there are circuits to do all sorts of particular crazy things compactly, and some of them might work for reasons that depend on mathematics hundreds of years beyond us or resist concise proof entirely. I think computational complexity itself remains important, but perhaps the Turing machine resource model grows less relevant for understanding A.I.

I am somewhat more optimistic about vaguely-CLT-like techniques proving things about how pretraining arrives at neural networks that generalize.[2] I am much less optimistic that we will ever be able to understand how trained neural networks work. Unfortunately, trained neural networks are the things that perform learning and inference online, during deployment. It seems like a bad sign for alignment if we can only understand the behavior of A.G.I. indirectly.

Also, my pessimism about understanding individual trained neural networks does a lot to dampen my hopes for constructing a rigorous theory of deep learning. An inability to understand the space of circuits seems likely to be a barrier to even high-level (say, statistical) attempts at understanding a search process over that space.

AI has always been distinguished from the more rigorous branches of computer science by finding heuristic solutions. Now, we've automated the search for heuristic solutions. I think that both theoretical computer scientists and rationalists are uncomfortable with this (for good reason) and this discomfort is sometimes expressed in the hope that beneath it all there is some rigorous and elegant way to construct an intelligence. I certainly hope so; that would be a beautiful and enlightening thing to behold. But the idea, really, is questionable, and perhaps based on a twice-outdated paradigm of computation. Why should every heuristic be explicable - and if not, why should a heuristic search be more explicable, and not less? To me, it seems that there is no reason that everything true about mathematics or computation should be true for a fundamental, rather than an incidental reason (as an old friend of mine from pure mathematics would say, I don't believe in "Math God"). At the very least, the search for proofs often seems to lag behind the recognition of truth.[3]

Are there simple, well-performing learning algorithms at all?

I wrote a whole post about this.

My current best answer is a little subtle. I think there is no simple, general, high-performance online learner - that is essentially ruled out by Shane Legg's "no elegant universal theory of prediction" impossibility result. 

BUT: 

- Solomonoff induction avoids this barrier by not being an algorithm (it is only lower semicomputable, l.s.c.; see the formula just after this list), so I suppose that it should be possible to spend enough dollars on inference-time compute to force a simple algorithm to perform well in practice.

- LLMs and other foundation models avoid this problem by becoming actually very complicated algorithms by gorging themselves on a massive amount of training data offline before they ever need to learn online (in context). In hindsight, this is an obvious "flaw" in Shane Legg's argument (or rather, in an easy misapplication of his argument): there is no hard division between algorithm and input, as long as the adversary in Legg's paper doesn't get to see (the pretraining part of) the input. As an existence proof, there is a simple algorithm which interprets the beginning of its input as encoding a learning algorithm and then simulates that learning algorithm; the simulated learning algorithm can be arbitrarily complex and therefore difficult for the adversary to defeat.
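
The sense in which Solomonoff induction is only lower semicomputable, as referenced in the first point above (one standard form of the universal prior, glossing over the usual technicalities about monotone machines and minimal programs):

$$ M(x) \;=\; \sum_{p\,:\,U(p)=x\ast} 2^{-|p|}, $$

where the sum runs over programs $p$ whose output on the universal monotone machine $U$ begins with $x$. It can be approximated from below by enumerating programs, but no algorithm computes it to guaranteed precision - which is why Legg's impossibility result does not apply to it.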

For this and other reasons, I am skeptical that there is an elegant glass box learning algorithm which remains glass box as it learns. But in principle, my model of the world does not rule out fairly simple "core" algorithms that eventually grow into powerful learning algorithms - I suppose ruling that out would be silly, since the evolution of life seems like a plausible candidate for an example of just that.

Again, this situation does not bode well for (theoretical) A.I. alignment. 

Bayesian decision theory... as a paradigm of computation?

Bayesian approaches are already incorporated into CLT (e.g. PAC-Bayes), but may take a more central role as a description of black-box systems we are unable to bound computationally. We can simply ask what the optimal performance is on a given decision problem, and assume that as AGI approaches ASI, it will perform in that way. Arguably, some form of Bayesian decision theory is normative, so we expect it to be the endpoint of the offline learning process - we expect Bayesian behavior online. This is, of course, essentially a retreat, abandoning any direct attempt at understanding the offline learning process. Perhaps this is the limiting paradigm of computation: the computer will do what it was optimized to do.
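
At the level of behavior, the prediction is just standard Bayesian decision theory (a schematic statement, not a claim about any particular system's internals): after observing data $D$, the system acts as if it selects

$$ a^{*} \;=\; \arg\max_{a} \sum_{h} P(h \mid D)\, U(a, h), $$

maximizing expected utility under its posterior over hypotheses $h$, whatever internal process actually produces that behavior.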

Of course, considering "inner alignment" issues, that might be considered too strident.

Now, because of this retreat in scope to a black-box view, it is not clear to me that Bayesian decision theory succeeds as a complete/totalizing paradigm for the next wave of computation. 

In contrast, some agent foundations researchers want to explicitly build Bayesian-or-so-inspired glass box learning algorithms from the ground up, which is a distinct approach that is not often distinguished explicitly.

These researchers might endorse the more ambitious claim that (some future version of) Bayesian learning can entirely capture CLT. I think it remains unclear whether the Bayesian approach can improve on the CLT understanding of pretraining. Certainly current CLT struggles to understand generalization of overparameterized models, and it's plausible that a strong inductive bias expressible as a prior is the only explanation.

Of course, this all strays a bit from paradigms of computation. I don't want to discuss it further here. In a future post, I want to get ahead of things and discuss the blindspots that the Bayesian paradigm, whether it is applied to the entire learning process, or only the resulting AGI/ASI, might enforce. Spoiler: I think that these limitations are somewhat more problematic when it is applied to the entire learning process.

Paradigms are for generating good ideas

One lesson from this history is that paradigms should not always be judged based on the formal correctness of their claims. The role of a paradigm is to help us direct our thinking towards high quality ideas, mostly by pruning unnecessary thinking through simplifying assumptions (which are often hidden). Obviously, this risks constraining creativity when the paradigm's assumptions become inappropriate. However, it can also constrain creativity when the paradigm fails to simplify things, effectively becoming dead weight - for instance, I think this can happen when the Bayesian approach serves as a semantic stop-sign (okay, I want something that can be represented as maximizing some expected utility... but what can't be?).

To combat the risks of using a paradigm, I suggest imagining that it breaks and working backwards to its weakest point - certainly NOT attacking the strength of its theorems. Even the axioms, taken one at a time, may not be the weakest point. I have more faith in the unbridled imagination constructing counterexamples as they might appear in the real world, which at their best should suggest the assumptions at fault, and finally the axioms expressing them.

  1. ^

    He wasn't exactly a logician, but E. Mark Gold studied language learning in the limit in the 1960s, and I think this is an exception in spirit.   

  2. ^

    This may justify the developmental approach to interpretability through singular learning theory. Pretraining may be the last point at which we have a chance to understand the details of what is going on. 

  3. ^

    Readers familiar with agent foundations may guess that constructing a rigorous theory of logical uncertainty should explain heuristic reasoning. But I have never seen a theory of logical uncertainty executed to top benchmarks on a practical problem - and though I think this sort of idea is promising and may yield fruit eventually, it is not clear that a formally derived LI algorithm will defeat loosely inspired heuristic methods on the same sort of problems. So I think this only pushes the question one level higher.

Mentioned in
Alignment Proposal: Adversarially Robust Augmentation and Distillation