Paradigms for computation

by Cole Wyeth
30th Jun 2025
AI Alignment Forum
8 comments, sorted by top scoring

Alexander Gietelink Oldenziel (13d)

Nice post Cole. 

I think I'm sympathetic to your overall point. That said, I am less pessimistic than you about the claim that neural network computation can never be understood beyond the macroscopic level of "what does it do."

The Turing machine paradigm is just one out of many paradigms to understand computation. It would be a mistake to be too pessimistic based on just the failure of the ur-classical TM paradigm. 

Computational learning theory's bounds are vacuous for realistic machine learning. I would guess, and I say this as a nonexpert, that this is chiefly due to 

(i) a general immaturity of the field of computational complexity, i.e. most of the field rests on conjectures; it's hard to prove much about time complexity even when we're quite confident the statements are true

(ii) computational learning theory grew out of classical learning theory and has not fully incorporated the lessons of singular learning theory. Much of the field is working in the wrong 'worst-case/pessimistic' framework when they should be thinking in terms of Bayesian inference & simplicity/degeneracy bias. Additionally, there is perhaps too much focus on exact discrete bounds when instead one should be thinking in terms of smooth relaxation and geometry of loss landscapes. 

That said, I agree with you that the big questions are currently largely open. 

mishka (12d)

Thanks for writing this! I'd like to put together a few hopefully relevant remarks.

The gap between theory and practice is strongly pronounced in computer science. The theory is governed by its inner logic (what's more tractable, what leads to richer, more intellectually satisfying formalisms, and so on). As a result, the fit of theory to practical matters is often so-so.

For example, with run-time complexity, the richest, most developed, and most commonly taught theory deals with worst-case complexity, whereas in practice average-case complexity tends to be more relevant (e.g. the simplex method, quicksort, and so on).
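
To make the quicksort example concrete, here is a minimal sketch (counts will vary run to run) comparing quicksort with a naive first-element pivot on pre-sorted versus shuffled input; the former hits the quadratic worst case, the latter the typical n log n average case.

    import random

    def quicksort_comparisons(xs):
        """Count the comparisons made by quicksort with a first-element pivot."""
        if len(xs) <= 1:
            return 0
        pivot, rest = xs[0], xs[1:]
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        # len(rest) comparisons at this level, plus the two recursive calls.
        return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

    n = 400
    print(quicksort_comparisons(list(range(n))))               # worst case: ~n^2/2 comparisons
    print(quicksort_comparisons(random.sample(range(n), n)))   # average case: roughly n log n comparisons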

Computers operating in the real world are at the very least "computations with an oracle" (somewhat modeled by type-2 Turing machines), with the world being the oracle. If one believes together with Penrose that the world is not computable, then the "Penrose argument" falls apart. If one believes together with some other thinkers that the world is computable, then any valid version of the "Penrose argument" would be equally applicable to humans.

In reality, even the "computations with an oracle" paradigm is not quite adequate for the real world, since when computers write into an external world and read from it, they can activate "computational circuits" in the external world, and therefore the notion of strict boundary between the internal computations within the computer and the dynamics of the external world becomes inadequate.

The theoretical computational paradigms are not very hardware-friendly (e.g. neither Turing machines nor lambda calculus maps onto our computers all that well). At the same time, a typical computer does not have a well-developed autonomous capability to expand its memory in an unlimited fashion and, therefore, is not Turing-complete in a formal sense (e.g. if memory is limited, the halting problem stops being undecidable, and so on).

In particular, these days we are missing a GPU-friendly computational paradigm. It is actually possible to create a neuromorphic computational paradigm, and such paradigms go decades back, but a typical incarnation would not be GPU-friendly, so a sizeable gap between theory and practice would remain.

For example, results of Turing universality of RNNs with memory expanding in an unlimited fashion go back to at least 1987. However, originally those results were using unlimited precision reals and were abusing binary expansions of those reals as tapes of Turing machines. This was incompatible with the key property of practical neural computations, namely their relative robustness to noise. This incompatibility is yet another example of the gap between theory and practice.

More recently, this particular incompatibility has been fixed. One can take an RNN and replace neurons handling scalar flows with neurons handling vector flows, and one can consider countably-sized vectors having a finite number of non-zero coordinates at any given moment of time, replacing linear combinations of scalar inputs with linear combinations of vector inputs ("generalized attention"). Then one gets a theoretically nice neuromorphic platform for stream-oriented computations which is more robust to noise. It is still not GPU-friendly in its full generality.

The remarks above are about computational architectures. Much more can be said about the great diversity of possible methods of generating (synthesizing) computations. When our architectures are neuromorphic rather than discrete, we have a greater variety of such methods potentially available (not just gradient methods, but, for example, decomposition of connectivity matrices into series, and so on).

Noosphere89 (9d)

My take on how recursion theory failed to be relevant for today's AI is that it turned out that what a machine could do if unconstrained basically didn't matter at all, and in particular it basically didn't matter what limits an ideal machine could do, because once we actually impose constraints that force computation to use very limited amounts of resources, we get a non-trivial theory and importantly all of the difficulty of explaining how humans do stuff lies here.

There was too much focus on "could a machine do something at all?" and not enough focus on "what can a machine with severe limitations do?"

The reason is that in a sense, it is trivial to solve any problem with a machine if I'm allowed zero constraints except that the machine has to exist in a mathematical sense.

A good example of this is the paper on A Universal Hypercomputer, which shows how absurdly powerful computation can be if you are truly unconstrained:

https://arxiv.org/abs/1806.08747

Or davidad's comment on how every optimization problem is trivial under worst-case scenarios:

https://www.lesswrong.com/posts/yTvBSFrXhZfL8vr5a/worst-case-thinking-in-ai-alignment#N3avtTM3ESH4KHmfN

There are other problems, like embeddedness issues, but this turned out to be a core issue, and it turned out that the constants and constraints mattered more than recursion theory thought.

I'll flag here that while it's probably true that a future paradigm will involve more online learning/continual learning, LLMs currently don't do this, and after they're trained their weights/neurons are very much frozen. Indeed, I think this is a central reason why LLMs currently underperform their benchmarks, and is why I don't expect pure LLMs to work out as a paradigm for AGI in practice:

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF

Unfortunately, trained neural networks are the things that perform learning and inference online, during deployment. It seems like a bad sign for alignment if we can only understand the behavior of A.G.I. indirectly.

On the argument against simple algorithms that learn generally and have high performance existing, I think a key assumption of LW is that our physics is pretty simple in Kolmogorov complexity (perhaps excluding the constants), and if we buy this, then the true complexity of the world is upper bounded low enough that glass-box learners have real use.

That doesn't mean they have to be findable easily, though.

IMO, the most successful next paradigm will require us to focus much more strongly on behavioral properties rather than hoping for mechanistic interpretability to succeed: we will likely have to aim for generalizable statements about input/output behavior and mostly ignore the model's structure, since it's probably going to be too messy to work with by default.

This is why I'm more bullish on Vanessa Kosoy's IBP than basically any other theoretical agenda except Natural Abstractions.

Cole Wyeth (8d)

My take on how recursion theory failed to be relevant for today's AI is that it turned out that what a machine could do if unconstrained basically didn't matter at all, and in particular it basically didn't matter what limits an ideal machine could do, because once we actually impose constraints that force computation to use very limited amounts of resources, we get a non-trivial theory and importantly all of the difficulty of explaining how humans do stuff lies here.

That's partially true (computational complexity is now much more active than recursion theory), but it doesn't explain the failure of the Penrose-Lucas argument. 

I'll flag here that while it's probably true that a future paradigm will involve more online learning/continual learning, LLMs currently don't do this, and after they're trained their weights/neurons are very much frozen.

While their weights are frozen and they don't really do continual learning, they do online learning.

 On the argument against simple algorithms that learn generally and have high performance existing, I think a key assumption of LW is that our physics is pretty simple in Kolmogorov complexity (perhaps excluding the constants), and if we buy this, then the true complexity of the world is upper bounded low enough that glass-box learners have real use.

Well, it's not a key assumption of mine :)

I mostly agree with your observation, but I also think it's funny when people say this kind of thing to me: I've been reading lesswrong for like an hour a day for years (maybe, uh, more than an hour if I'm being honest...), and I produce a non-negligible fraction of the rationality/epistemics content. So - the fact that this is sometimes taken as an assumption on lesswrong doesn't seem to move me very much - I wonder if you mean to accept or to question this assumption? Anyway, I don't mean this as a criticism of your raising the point.  

Physics aside, I think that when you take indexical complexity into account, it's not true.

Noosphere89 (8d)

but it doesn't explain the failure of the Penrose-Lucas argument.

The failure of the Penrose-Lucas argument is that Godel's incompleteness theorem doesn't let you derive the conclusion he derived, because it only implies that you cannot use a computably enumerable set of axioms to make all of mathematics sound and complete, and critically this doesn't mean you cannot automate a subset of mathematics that is relevant.

There's a closely related argument about the Chinese Room, where I pointed out that intuitions from our own reality, which includes lots of constraints, fundamentally fail to transfer over to hypotheticals. This is a really important example of why arguments around AI need to actually attend to the constraints that are relevant in specific worlds, because without them it's trivial to have strong AI solve any problem:

https://www.lesswrong.com/posts/zxLbepy29tPg8qMnw/refuting-searle-s-wall-putnam-s-rock-and-johnson-s-popcorn#wbBQXmE5aAfHirhZ2

Really, a lot of the issue with the arguments against strong AI made by philosophers is that they have no sense of scale or of what the mathematical theorems are actually saying, and thus fail to understand what has actually been established, combined with overextrapolating their intuitions into cases where those intuitions have been deliberately set up to fail.

While their weights are frozen and they don't really do continual learning, they do online learning.

While I agree that in-context learning does give them some form of online learning, which at least partially explains why LLMs succeed (combined with their immense amount of data and their muddling through with extremely data-inefficient algorithms compared to brains, a known weakness that could plausibly lead to the death of pure LLM scaling by 2028-2030, though note that doesn't necessarily mean timelines get that much longer), this currently isn't enough to automate lots of jobs away, and fairly critically it might not be good enough in practice, with realistic compute and data constraints, to compete with better continual learning algorithms.

To be clear, this doesn't mean any future paradigm will be more understandable, because it will rely on more online/continual learning than current LLMs do.

Well, it's not a key assumption of mine :)

I mostly agree with your observation, but I also think it's funny when people say this kind of thing to me: I've been reading lesswrong for like an hour a day for years (maybe, uh, more than an hour if I'm being honest...), and I produce a non-negligible fraction of the rationality/epistemics content. So - the fact that this is sometimes taken as an assumption on lesswrong doesn't seem to move me very much - I wonder if you mean to accept or to question this assumption? Anyway, I don't mean this as a criticism of your raising the point.  

Physics aside, I think that when you take indexical complexity into account, it's not true.

I'm mostly in the "pointing out that the assumption exists and is used widely" camp, rather than questioning or accepting it, because I wanted to understand how MIRI could believe that simple algorithms that learn generally and have high performance exist despite the theorem, and this was the first thing that came to my mind.

And while AIXI is a useful toy model of how intelligence works, its treatment of indexical complexity/the first-person view as privileged is an area where I seriously start to depart from it, and one of the biggest reasons I'm much more of a fan of IBP than AIXI is that it makes a serious effort to actually remove the first-person privilege often seen in many frameworks of intelligence.

And this actually helps us to pretty much entirely defuse the Solomonoff prior's potential malignness, because we no longer have immense probability mass on malignant simulation hypotheses, and the simulation hypotheses that do get considered can't trick the IBP agent into changing its values.

So the indexical complexity isn't actually relevant here, thankfully (though this doesn't guarantee that the world is truly low complexity in the sense necessary for LW's view of simple, general high performing algorithms to actually work).

And the nice thing is that it no longer requires monotonic preferences:

https://www.lesswrong.com/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform?commentId=kDaAdjc3YbnuS2ssp

https://www.lesswrong.com/posts/DobZ62XMdiPigii9H/non-monotonic-infra-bayesian-physicalism

Cole Wyeth (8d)

Whether you use AIXI or IBP, a continual learning algorithm must contend with indexical uncertainty, which means it must contend with indexical complexity in some fashion. 

As far as I understand, IBP tries to evaluate hypotheses according to the complexity of the laws of physics, not the bridge transformation (or indexical) information. But that cannot allow it to overcome the fundamental limitations of the first-person perspective faced by an online learner as proved by Shane Legg. That’s a fact about the difficulty of the problem, not a feature (or bug) of the solution. 

I agree that Solomonoff induction faces a potential malign prior problem, and I default to believing Vanessa that IBP solves this.


By the way, Scott Aaronson has also made this rebuttal to Searle in his paper on why philosophers should care about computational complexity.

Eleni Angelou (13d)

Is this the post you had in mind? https://www.lesswrong.com/posts/Bi4yt7onyttmroRZb/executable-philosophy-as-a-failed-totalizing-meta-worldview

Cole Wyeth (13d)

Thanks, but no. The post I had in mind was an explanation of a particular person's totalizing meta-worldview, which had to do with evolutionary psychology. I remember recognizing the username - also I have that apparently common form of synesthesia where letters seem to have colors, and I vaguely remember the color of it (@lc? @lsusr?) but not what it was.

Paradigms for computation

Epistemic status: Though I can't find it now, I remember reading a lesswrong post asking "what is your totalizing worldview?" I think this post gets at my answer; in fact, I initially intended to title it "My totalizing worldview" but decided on a slightly more restricted scope (anyway, I tend to change important aspects of my worldview so frequently it's a little unsettling, so I'm not sure if it can be called totalizing). Still, I think these ideas underlie some of the cruxes behind my meta-theory of rationality sequence AND my model of what is going on with LLMs among other examples.

The idea of a fixed program as the central object of computation has gradually fallen out of favor. As a result, the word "algorithm" seems to have replaced "program" as a catch-all term for computations that computers run. When the computation is massive, automatically generated, without guarantees, and illegible to humans, "program" and "algorithm" both have the wrong connotations - I'm not sure I even know a "true name" for such a thing. Circuit is almost right, but doesn't include iteration. Perhaps discrete finite automaton or just computational process to avoid inappropriate associations? In the context of machine learning, at least, we have the word "model."

When computer science was born, the subject was deeply entangled with recursion theory, the study of programs (and at the time, an algorithm was just a mathematical abstraction of a program). With an increasing focus on learning, algorithms took the role of learning/constructing models, until in recent years the models even do the learning part in-context, to a limited extent. In this essay, I am mostly interested in investigating the accompanying rise and fall of conceptual frames (or paradigms) for understanding computation, and in asking which were less wrong, and in which cases this was predictable. 

Recursion theory

Back in the 20th century, mathematicians were really interested in the limits of mathematical logic. For instance, Goedel's incompleteness theorems (particularly the first) show that in some sense mathematics cannot be automated. A lot of intelligent people took this and ran with it; if mathematics cannot be automated, computers can't do mathematics, so human creativity must not be computable! Therefore, computers will never be able to do X (for many values of X).

For instance, according to Penrose:

The inescapable conclusion seems to be: Mathematicians are not using a knowably sound calculation procedure in order to ascertain mathematical truth. We deduce that mathematical understanding – the means whereby mathematicians arrive at their conclusions with respect to mathematical truth – cannot be reduced to blind calculation!

This is referred to as a "Penrose-Lucas Argument." See "An Introduction to Universal Artificial Intelligence," section 16.5.4 (pg 423) for some further examples. 

Now computers are getting pretty good at mathematics, and though it is too early to declare Penrose and Lucas empirically wrong, I think the writing is pretty clearly on the wall. In fact, I believe that it has become pretty obvious that this entire 20th century way of looking at things was mistaken (though Roger Penrose apparently still hasn't realized it).

 But this was all based on rigorous (and rather impressively deep) mathematical theorems! How could it be so misleading?

Let's consider the object level first.

The first incompleteness theorem says that there is no recursively enumerable (meaning computably listable) and consistent set of axioms sufficient to prove all true statements about arithmetic.
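
In one standard formulation (the Gödel-Rosser version, which needs only consistency; stated here as a reminder rather than a quotation): for any consistent, recursively enumerable theory $T$ containing basic arithmetic, there is an arithmetical sentence $G_T$ with

$$ T \nvdash G_T \quad\text{and}\quad T \nvdash \lnot G_T. $$

Since one of $G_T$ and $\lnot G_T$ is true, $T$ fails to prove some true statement of arithmetic.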

Mathematicians proposed that part of their job (proposing axioms) therefore could not be automated away. After all, a computer can only produce recursively enumerable axioms by definition!

I would argue that this definition had more to do with the surmountable limits of computers at the time than the fundamental limits of computation. It conceptualized a computer program as a fixed finite set of instructions that ran in a dark room and spat its output on a tape: a machine of pure contemplation. Perhaps that is how computers acted in the 20th century. 

It's obvious that humans aren't like that. We go out into the world and experience things and learn. That informs the axioms that we choose to explore - they are intended to model some of the interesting systems that we encounter. 

Computers can also be hooked up to a continual stream of rich input, and adapt to that input. Indeed, it wasn't long before we attached them to a world much, much larger than their source code (for instance, high resolution sensors or an entire internet of text).

Of course, machines accept input in recursion theory. It's just that recursion theory doesn't really centralize the input, doesn't conceptualize it as massive, messy, and richly structured, and that perspective leaks into the type of results that the logicians and theoretical computer scientists of that time pursued.

At least, that's the best way I've been able to summarize the feeling that I get reading the work of 20th century logicians (admittedly, mostly secondhand through today's recursion theory textbooks). There's some kind of break between our implicit mental models of computation. It's hard to put my finger on exactly where we depart - the first unjustified assumption, the first "wrong" turn.

My initial hypothesis was that they just didn't think of the inputs as big enough, compared to the machines. This is kind of a compelling idea, since initially computers filled a room, and their inputs were a stack of punch cards.

Hypothesis 1: Algorithms were supposed to accept very cleanly structured input, and perform some fairly constrained task, using some equipment that was already inside it to begin with. The purpose of an algorithm was something built into it.

There is some evidence that the logicians visualized machines as operating in a prescribed way on smaller inputs. For instance, streaming algorithms (with infinite input and/or output) do not seem to have been as well studied during that period (even real-valued computation in the style of computable analysis seems to be less developed to this day). In algorithmic information theory we call these monotone machines, while computable analysts call them Type 2 machines (with slightly different intentions but identical definitions). While recursion theorists do consider infinite inputs (for instance in descriptive set theory, the Baire space ℕ^ℕ), this usually seems to be in the context of higher Turing degrees that have little to say about real machines (my interest decays exponentially with each Turing jump).
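
As a toy illustration of the monotone / Type 2 picture (a sketch with an arbitrary infinite input stream, chosen only for concreteness): the machine consumes an endless input and emits output incrementally, never revising anything it has already written.

    from itertools import count, islice

    def running_parity(bits):
        """Monotone stream transducer: read one input bit at a time and emit the
        parity of everything seen so far. The output only ever grows; nothing
        already written is retracted (the defining monotonicity property)."""
        parity = 0
        for b in bits:
            parity ^= b
            yield parity

    def input_stream():
        """An arbitrary infinite 0/1 stream, standing in for sensor data, text, etc."""
        for i in count():
            yield int(i % 3 == 0)

    # Inspect the first ten output bits of an in-principle infinite computation.
    print(list(islice(running_parity(input_stream()), 10)))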

We can extend recursion theory to infinite input/output streams, but a lot of the classical results become inapplicable/irrelevant, and once you reorient your perspective enough to have some kind of interesting theory you've wandered into another discipline at least as distant as algorithmic information theory (and usually even more distant) from where you started.

I don't think this framing of the mistake is exactly right though. One of the earliest important results was the existence of a universal Turing machine, which takes as input another machine's description and an input to simulate that machine on. This clearly breaks down the program/data distinction. The functional perspective of λ-calculus and its implementation in LISP totally obliterate the distinction. It's clear that the idea of an input as containing rich semantics was in the Overton window.

So, their selective blindness wasn't exactly about restricting the size or meaning of the input.

Hypothesis 2: Logicians didn't study learning.[1]

This hypothesis feels right, but in a way it begs the question. What, precisely, is it about machine learning that recursion theory did not capture? In fact, the obvious answer is "the input is large and has semantics," and that is just hypothesis 1.

I don't have a confident answer to this question. I think a big piece of it is just the correctness standard for programs - a teleological shift. Classically, an algorithm is meant to correctly solve a problem. But a learning "algorithm" can be deceived. A learning "algorithm" usually works. There's a pretty good reason for those scare quotes (though I won't stick to them from now on, because it would be exhausting).

Of course, there are useful randomized algorithms that we wouldn't describe as learning algorithms. I think Hypothesis 2 still stands up though; I don't recall reading much serious consideration of randomized algorithms (for learning or otherwise) in the 20th century, and certainly Goedel doesn't seem to have had anything to say about them.

Randomized algorithms grow as they run

I think the advantage of randomized algorithms is that they are almost-non-uniform. A randomized algorithm is really an interpolation of a family of infinitely complicated algorithms: a finite program + an infinite sequence of coin flips. At a fixed input size it (usually) only uses a finite number of coin flips, but that number grows longer and longer on larger inputs, so that the full algorithm description effectively grows with the input. A randomized algorithm usually succeeds because you can't construct adversarial inputs for the entire family at once (or even a constant fraction of the family). 
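
A concrete instance of the "finite program + coin flips" picture (a sketch of Freivalds' algorithm, drawn from standard references rather than anything in this post): it checks a claimed matrix product using n fresh random bits per trial, never rejects a correct answer, and accepts a wrong one with probability at most 2^-trials - no single adversarial input defeats more than half of the coin-flip family in any trial.

    import random

    def freivalds(A, B, C, trials=20):
        """Probabilistically check whether A times B equals C for n x n integer
        matrices given as lists of lists. Never rejects a correct C; accepts an
        incorrect C with probability at most 2**-trials."""
        n = len(A)
        for _ in range(trials):
            r = [random.randint(0, 1) for _ in range(n)]                        # n coin flips
            Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
            ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
            Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
            if ABr != Cr:
                return False   # certainly wrong
        return True            # probably right

    A, B = [[1, 2], [3, 4]], [[0, 1], [1, 0]]
    print(freivalds(A, B, [[2, 1], [4, 3]]), freivalds(A, B, [[2, 1], [4, 4]]))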

(Relatedly, I (weakly) hold the controversial inside view that it is quite plausible that BPP is not equal to P - but I am far from an expert, and would not bet that way.)  

I give you the stronger claim:

Hypothesis 2.1: Logicians didn't study algorithms that fail sometimes.   

By the way, I think this distinction has a lot to do with Moravec's paradox. Logic seemed like the most serious intellectual activity to logicians, and logic is concerned with absolute certainty. The hardest aspects of intelligent behavior to compute tend to deal with the much messier real world (particularly sensing and acting) where it seems like occasional failure is pretty much guaranteed for any (computer or biological) agent. I haven't been able to wrap this into an alternative hypothesis though, because I think Moravec's paradox was not a proximal cause.

For whatever reason, it turns out that the entire paradigm of logic (and recursion theory) just isn't a very good description for a machine that observes and adapts. Modern A.I. algorithms draw much more from statistics, probability, and optimization than logic (though arguably even these newer paradigms are becoming outdated to various degrees - it's not clear there is a good candidate for a theoretical replacement).

The real situation has diverged so much from the once-reasonable assumptions of the 20th century logic-based model that, despite making no specific errors, the pure logicians have pretty much just slid into irrelevance - at least, when it comes to the limitations of AI-style computing.

How could this error have been predicted in advance? I'm not exactly sure. I think one would have had to imagine the development of technology not only pushing computers towards the ideal of recursion theory (that is, allowing many things computable in principle to become computable in practice) but also to push computers beyond it, making the assumptions of recursion theory outdated. Or perhaps one should have just looked at humans and asked not "can our current machines do what humans do?" but "if humans were just machines, what type of machine would they be?" Basically, one would have needed to suspect that limitations revealed by recursion theory might be limitations of recursion theory. 

Sufficiently surprising conclusions within a paradigm cast doubt on the paradigm. Surprising conclusions are both the highest success and the death of paradigms. 

Computational learning theory

Recursion theory was buried by LLMs, but it was killed much earlier. I remember a stretch of at least 10 or 20 years (about 2000-2020) when the paradigm of A.I. had clearly switched to machine learning, but before neural nets (and in particular, later, foundation models) had taken their place as the standard approach to nearly all learning problems. During that time, computational learning theory (CLT) was the ruling paradigm, basically porting over ideas from statistics (particularly statistical learning theory) and studying their computability  / computational complexity. I'm not going to say much about CLT, but I will say that it definitely incorporates the idea that learning should usually succeed (see, for example, Valiant's PAC-learning). But it still studies human-interpretable algorithms with probabilistic guarantees. 
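
For flavor, the kind of statement CLT aims at (the standard realizable-case bound for a finite hypothesis class, quoted from standard texts rather than this post): any learner that outputs a hypothesis consistent with

$$ m \;\ge\; \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right) $$

i.i.d. labeled examples will, with probability at least $1-\delta$, have true error at most $\epsilon$. The guarantee is probabilistic and approximate, but the algorithm itself is still assumed to be something a human wrote down and can reason about.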

The computational learning theory paradigm had barely coalesced when LLMs rose, and (perhaps as a result) it has died more quietly, but also less completely... As far as I am aware, CLT has no convincing explanation for why neural networks generalize effectively, let alone the phenomena of LLMs.

I think that the growing irrelevance of computational learning theory goes a bit deeper than theory temporarily lagging behind practice. Existing CLT is struggling to incorporate a higher-level reappearance of the bitter lesson: with pretraining, even the learner is learned. I think that the real impressive ability of LLMs is their incredibly efficient and flexible learning and inference in-context, which is essentially amortized over the course of pretraining. Though no one seems to put it this way, pretraining is one of the first actually-useful meta-learning algorithms (pretraining -> metalearning, ICL -> learning). Another framing is that LLMs learn offline to learn online, becoming vast before they even start performing their intended purpose (there doesn't seem to be a well-developed theory for this problem - someone should probably invent it...).

Viewed this way, an LLM is a very messy "algorithm" with no(?) proven (even probabilistic) guarantees. Yes, the scare quotes are back. An LLM really doesn't look much like an algorithm as conceptualized by CLT. It's more like a massive circuit than a Turing machine. Though technically a circuit is computable by a TM, this is not a productive way to think of circuits - they are a non-uniform model of computation and they act very differently. For instance, there is a circuit that solves the halting problem at its input size, for the simple reason that there is a circuit to solve ANY problem at a fixed input size, with (basically) a massive lookup table.
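
To make the non-uniformity point concrete, a toy sketch: at a fixed input length n, any Boolean function whatsoever is computed by a lookup table with 2^n entries. For "does this length-n program halt?" the table still exists mathematically; there is just no uniform way to construct it.

    from itertools import product

    def build_lookup_circuit(f, n):
        """Tabulate an arbitrary Boolean function on n bits. The resulting 'circuit'
        is a dict with 2**n entries; nothing about f needs to be uniformly computable,
        each entry just has to be settled somehow."""
        return {bits: f(bits) for bits in product((0, 1), repeat=n)}

    def eval_circuit(table, bits):
        return table[tuple(bits)]

    # Demonstrated with a function we can actually write down (parity); for the
    # halting problem at length n we could not build the table, but it exists.
    table = build_lookup_circuit(lambda bits: sum(bits) % 2, 3)
    print(eval_circuit(table, (1, 0, 1)))  # -> 0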

This conceptual shift is particularly crisp when it comes to computational complexity. Our theory of computational lower bounds on circuits basically doesn't exist - the subject is notoriously intractable. My personal view (received from my advisor/collaborator Carl Sturtivant) is that there are circuits to do all sorts of particular crazy things compactly, and some of them might work for reasons that depend on mathematics hundreds of years beyond us or resist concise proof entirely. I think computational complexity itself remains important, but perhaps the Turing machine resource model grows less relevant for understanding A.I.

I am somewhat more optimistic about vaguely-CLT-like techniques proving things about how pretraining arrives at neural networks that generalize.[2] I am much less optimistic that we will ever be able to understand how trained neural networks work. Unfortunately, trained neural networks are the things that perform learning and inference online, during deployment. It seems like a bad sign for alignment if we can only understand the behavior of A.G.I. indirectly.

Also, my pessimism about understanding individual trained neural networks does a lot to dampen my hopes for constructing a rigorous theory of deep learning. An inability to understand the space of circuits seems likely to be a barrier to even high-level (say, statistical) attempts at understanding a search process over that space.

AI has always been distinguished from the more rigorous branches of computer science by finding heuristic solutions. Now, we've automated the search for heuristic solutions. I think that both theoretical computer scientists and rationalists are uncomfortable with this (for good reason) and this discomfort is sometimes expressed in the hope that beneath it all there is some rigorous and elegant way to construct an intelligence. I certainly hope so; that would be a beautiful and enlightening thing to behold. But the idea, really, is questionable, and perhaps based on a twice-outdated paradigm of computation. Why should every heuristic be explicable - and if not, why should a heuristic search be more explicable, and not less? To me, it seems that there is no reason that everything true about mathematics or computation should be true for a fundamental, rather than an incidental reason (as an old friend of mine from pure mathematics would say, I don't believe in "Math God"). At the very least, the search for proofs often seems to lag behind the recognition of truth.[3]

Are there simple, well-performing learning algorithms at all?

I wrote a whole post about this.

My current best answer is a little subtle. I think there is no simple, general, high-performance online learner - that is essentially ruled out by Shane Legg's "no elegant universal theory of prediction" impossibility result. 

BUT: 

- Solomonoff induction avoids this barrier by not being an algorithm (it is only lower semicomputable, l.s.c.; see the formula just after this list), so I suppose that it should be possible to spend enough dollars on inference-time compute to force a simple algorithm to perform well in practice.

- LLMs and other foundation models avoid this problem by becoming actually very complicated algorithms by gorging themselves on a massive amount of training data offline before they ever need to learn online (in context). In hindsight, this is an obvious "flaw" in Shane Legg's argument (or rather, in an easy misapplication of his argument): there is no hard division between algorithm and input, as long as the adversary in Legg's paper doesn't get to see (the pretraining part of) the input. As an existence proof, there is a simple algorithm which interprets the beginning of its input as encoding a learning algorithm and then simulates that learning algorithm; the simulated learning algorithm can be arbitrarily complex and therefore difficult for the adversary to defeat.
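
The sense in which Solomonoff induction is only lower semicomputable, as referenced in the first point above (one standard form of the universal prior, glossing over the usual technicalities about monotone machines and minimal programs):

$$ M(x) \;=\; \sum_{p\,:\,U(p)=x\ast} 2^{-|p|}, $$

where the sum runs over programs $p$ whose output on the universal monotone machine $U$ begins with $x$. It can be approximated from below by enumerating programs, but no algorithm computes it to guaranteed precision - which is why Legg's impossibility result does not apply to it.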

For this and other reasons, I am skeptical that there is an elegant glass box learning algorithm which remains glass box as it learns. But in principle, my model of the world does not rule out fairly simple "core" algorithms that eventually grow into powerful learning algorithms - I suppose ruling that out would be silly, since the evolution of life seems like a plausible candidate for an example of just that.

Again, this situation does not bode well for (theoretical) A.I. alignment. 

Bayesian decision theory... as a paradigm of computation?

Bayesian approaches are already incorporated into CLT (e.g. PAC-Bayes), but may take a more central role as a description of black-box systems we are unable to bound computationally. We can simply ask what the optimal performance is on a given decision problem, and assume that as AGI approaches ASI, it will perform in that way. Arguably, some form of Bayesian decision theory is normative, so we expect it to be the endpoint of the offline learning process - we expect Bayesian behavior online. This is, of course, essentially a retreat, abandoning any direct attempt at understanding the offline learning process. Perhaps this is the limiting paradigm of computation: the computer will do what it was optimized to do.
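
At the level of behavior, the prediction is just standard Bayesian decision theory (a schematic statement, not a claim about any particular system's internals): after observing data $D$, the system acts as if it selects

$$ a^{*} \;=\; \arg\max_{a} \sum_{h} P(h \mid D)\, U(a, h), $$

maximizing expected utility under its posterior over hypotheses $h$, whatever internal process actually produces that behavior.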

Of course, considering "inner alignment" issues, that might be considered too strident.

Now, because of this retreat in scope to a black-box view, it is not clear to me that Bayesian decision theory succeeds as a complete/totalizing paradigm for the next wave of computation. 

In contrast, some agent foundations researchers want to explicitly build Bayesian-or-so-inspired glass box learning algorithms from the ground up, which is a distinct approach that is not often distinguished explicitly.

These researchers might endorse the more ambitious claim that (some future version of) Bayesian learning can entirely capture CLT. I think it remains unclear whether the Bayesian approach can improve on the CLT understanding of pretraining. Certainly current CLT struggles to understand generalization of overparameterized models, and it's plausible that a strong inductive bias expressible as a prior is the only explanation.

Of course, this all strays a bit from paradigms of computation. I don't want to discuss it further here. In a future post, I want to get ahead of things and discuss the blindspots that the Bayesian paradigm, whether it is applied to the entire learning process, or only the resulting AGI/ASI, might enforce. Spoiler: I think that these limitations are somewhat more problematic when it is applied to the entire learning process.

Paradigms are for generating good ideas

One lesson from this history is that paradigms should not always be judged based on the formal correctness of their claims. The role of a paradigm is to help us direct our thinking towards high quality ideas, mostly by pruning unnecessary thinking through simplifying assumptions (which are often hidden). Obviously, this risks constraining creativity when the paradigm's assumptions become inappropriate. However, it can also constrain creativity when the paradigm fails to simplify things, effectively becoming dead weight - for instance, I think this can happen when the Bayesian approach serves as a semantic stop-sign (okay, I want something that can be represented as maximizing some expected utility... but what can't be?).

To combat the risks of using a paradigm, I suggest imagining that it breaks and working backwards to its weakest point - certainly NOT attacking the strength of its theorems. Even the axioms, taken one at a time, may not be the weakest point. I have more faith in the unbridled imagination constructing counterexamples as they might appear in the real world, which at their best should suggest the assumptions at fault, and finally the axioms expressing them.

  1. ^

    He wasn't exactly a logician, but E. Mark Gold studied language learning in the limit in the 1960s, and I think this is an exception in spirit.   

  2. ^

    This may justify the developmental approach to interpretability through singular learning theory. Pretraining may be the last point at which we have a chance to understand the details of what is going on. 

  3. ^

    Readers familiar with agent foundations may guess that constructing a rigorous theory of logical uncertainty should explain heuristic reasoning. But I have never seen a theory of logical uncertainty executed to top benchmarks on a practical problem - and though I think this sort of idea is promising and may yield fruit eventually, it is not clear that a formally derived LI algorithm will defeat loosely inspired heuristic methods on the same sort of problems. So I think this only pushes the question one level higher.

Mentioned in
Alignment Proposal: Adversarially Robust Augmentation and Distillation