[-]Viliam3y20

Suppose you meet with a smart, pretty, high-status person but his/her genitals are concealed and his/her other sexually-dimorphic features are ambiguous. Would you want to kiss this sexy person?

Begging the question. It assumes that the person is "pretty" and "sexy", but isn't that the part we were actually curious about? A better question would be: "Do you find people with sexually dimorphic features sexy?" Some people would say "yes", some people would say "no".

[-]lsusr3y20

Good point. I've recently been talking with someone whose native language isn't English and we've been using "pretty" as the imprecise translation of a non-gender-specific adjective. I have removed the word "sexy" entirely.

[-]abramdemski3y20

We now have all the components we need to create a system that robustly pursues abstract goals.

(I note that if this part works out reliably, alignment would essentially be solved.)

[-]lsusr3y20

I would be flattered, had your comment been a compliment. ☺

What I meant is that we have a system with a self-correcting world model which solves the "finger pointing at the Moon" problem. It optimizes the world according to its beliefs about the Moon, even though all we could give it was the finger.

[-]abramdemski3y52

To be clear, I don't necessarily think you're wrong about how bio brains do it. A lot rests on the word "reliably". One possible explanation for sexual fetishes is that the human biological mechanism for pointing at sexual partners is quite unreliable (a hypothesis I predict you agree with).

But if we could get a similar mechanism to work reliably, we'd have a mechanism for pointing learning machines at things in the world.

[-]abramdemski3y20

We haven't programmed the predictive processor to do anything yet but it already has values (namely, to minimize surprise). The Orthogonality Thesis states that an agent can have any combination of intelligence level and final goal. Predictive processors are non-orthogonal. They cannot have any final goal because any goal must, in some sense, minimize free energy (surprise).

I could be wrong, because my understanding of PP is limited, but this feels like a level-confused argument to me. Adjusting neural weights to minimize surprise does not imply the same thing as active planning to minimize future surprise!! For example, generative pretraining for large language models (LLMs) minimizes next-bit prediction error, but does not train behaviors which actively plan ahead to manipulate the environment to minimize future predictive error. (Any such behaviors would be a side-effect due to unexpected inner optimization misgeneralizing the goal; not at all directly selected for.)

On the other hand, fine-tuning of LLMs can use RL, which is to say, can optimize for longer-term sequential strategies which accomplish specific objectives.

So gradient descent can do either thing -- pure prediction (supervised learning from labeled data, eg sequences labeled with continuations) or reinforcement learning (incentivizing long-term planning by assigning credit to many recent actions rather than just the one most recent output).
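To make the distinction concrete, here is a minimal sketch (not taken from any of the posts under discussion) of the two kinds of update applied to the same softmax parameters. The function names `prediction_step` and `reinforce_step` are hypothetical, and the "policy" is a toy with no state: it is just three logits. The supervised step only adjusts the probability of the one observed target, while the REINFORCE-style step spreads credit for a single episode return across every action taken in that episode.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_step(logits, target, lr=0.5):
    """Pure prediction: one gradient-descent step on cross-entropy
    for a single labeled target (e.g. the observed next token)."""
    probs = softmax(logits)
    # d(-log p[target]) / d logit_i = p_i - 1[i == target]
    return [
        l - lr * (p - (1.0 if i == target else 0.0))
        for i, (l, p) in enumerate(zip(logits, probs))
    ]

def reinforce_step(logits, actions, episode_return, lr=0.5):
    """REINFORCE-style update: every action in the episode shares
    credit (or blame) for the one return observed at the end."""
    probs = softmax(logits)
    new_logits = list(logits)
    for a in actions:
        for i, p in enumerate(probs):
            # d(log p[a]) / d logit_i = 1[i == a] - p_i
            grad_log_prob = (1.0 if i == a else 0.0) - p
            new_logits[i] += lr * episode_return * grad_log_prob
    return new_logits
```

In this toy, the prediction step has no notion of consequences: it only makes the labeled target more likely. The REINFORCE step makes actions that preceded a high return more likely and actions that preceded a negative return less likely, which is the sense in which it can select for longer-term strategies.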

An earlier post of yours claims PP is equivalent to gradient descent, so I assume PP can also do both of those things, although I don't currently understand how PP does RL. But if so, there should be no obstacle to the orthogonality thesis within PP.

[-]abramdemski3y20

Having now worked through some of the technical details of the equivalence between backprop and predictive coding, I still think my objection is right.

[-]lsusr3y*30

Your comment is a good one. I want to give it the treatment it deserves. There are a few ways I can better explain what I'm attempting to get at in my original post, and I'm not sure what the best approach is.

Instead of addressing you point-by-point, I'm going to try backing up and looking at the bigger picture.

The Orthogonality Thesis

I think that the most important thing you take issue with is my claim that PP violates the orthogonality thesis. Well, I also claim that PP is (in some abstract mathematical sense) equivalent to backpropagation. If PP violates the orthogonality thesis then I should be able to provide an example of how backpropagation violates the orthogonality thesis too.

Consider a backpropagation-based feed-forward neural network (FFNN). Our FFNN is the standard multilayer perceptron (perhaps with improvements such as an attention mechanism). The FFNN has some input sensors which read data from the real world. These can be cameras, microphones and an Internet connection. The FFNN takes actions on the outside world via its output nodes. Its output nodes are hooked up to robots.

We train our FFNN via backpropagation. We feed in examples of sensory information. The FFNN generates actions. Then we calculate the error between what we want the FFNN to output and what the FFNN actually did output. Then we use the backpropagation algorithm to adjust the internal weights of the FFNN.
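As a rough illustration of the loop just described (feed in an example, compute the error against the desired output, adjust the weights), here is a toy sketch. It assumes a network with a single tanh hidden unit and a single linear output, a squared-error loss, and made-up training data; all names (`forward`, `backprop_step`, `train`, `DATA`) are hypothetical. It stands in for the far larger sensor-to-robot network in the thought experiment.

```python
import math

def forward(params, x):
    """One hidden tanh unit feeding one linear output unit."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2

def mean_loss(params, data):
    """Mean of 0.5 * (output - target)^2 over the dataset."""
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)

def backprop_step(params, x, target, lr=0.05):
    """One gradient-descent step on a single (input, desired output) pair."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target                # dLoss/dy for 0.5 * (y - target)^2
    dw2, db2 = dy * h, dy          # gradients for the output layer
    dz = dy * w2 * (1.0 - h * h)   # chain rule back through tanh
    dw1, db1 = dz * x, dz          # gradients for the hidden layer
    return (w1 - lr * dw1, b1 - lr * db1, w2 - lr * dw2, b2 - lr * db2)

def train(data, params=(0.5, 0.0, 0.5, 0.0), epochs=200):
    """Repeat the step over the data; return loss before, loss after, and weights."""
    initial = mean_loss(params, data)
    for _ in range(epochs):
        for x, t in data:
            params = backprop_step(params, x, t)
    return initial, mean_loss(params, data), params

# Made-up targets: the "desired action" is simply twice the input.
DATA = [(x / 2.0, 2.0 * (x / 2.0)) for x in range(-2, 3)]
```

Calling `train(DATA)` returns the mean loss before and after training along with the learned weights. Note that every step moves the weights downhill on the error, whatever the targets happen to be.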

Is there anything we can't teach the FFNN via this method?

We can teach it to play chess, build cars, take over the world and disassemble stars. But there is one thing we can't teach it to do: We can't teach it to maximize its own error function. It's not just physically impossible. It's a logical contradiction.

Given sufficient data, a FFNN can map any reasonably behaved function from $A \to B$. But it can't necessarily map any function $F : F, A \to B$, because self-reference imposes a cyclic constraint.

If you build a FFNN in the real world and want it to optimize the real world…well, the FFNN is part of the real world. That causes a self-reference, which constrains the freedom of the orthogonality thesis.

I don't expect this explanation to fully answer all of your objections, but I hope it gets us closer to understanding each other.

Clarifying my original post

You write "I don't currently understand how PP does RL". I'm not claiming that PP does RL. PP can do RL, but that's not important. The biological neural network model in my original post is getting trained simultaneously by two different algorithms with two different optimization targets. The PP algorithm is running at all times and is training the neural network to minimize surprise. The RL algorithm is activated intermittently and trains the neural network to take actions that produce squirts of dopamine.

Adjusting neural weights to minimize surprise does not imply the same thing as active planning to minimize future surprise!!

You're right. The model I've described only does local gradient descent (of surprise, not error). It doesn't do strategic planning (unless it developed emergent complex machinery to do so).

[-]abramdemski3y40

You write "I don't currently understand how PP does RL". I'm not claiming that PP does RL. PP can do RL, but that's not important. The biological neural network model in my original post is getting trained simultaneously by two different algorithms with two different optimization targets. The PP algorithm is running at all times and is training the neural network to minimize surprise. The RL algorithm is activated intermittently and trains the neural network to take actions that produce squirts of dopamine.

To be clear here, I did understand that you posit a dual-system approach in this post, with squirts of dopamine for RL and PP for everything else. However, I didn't really understand why you wanted to posit that, in the context of your other posts, where you mention PP doing the RL part too.

We can teach it to play chess, build cars, take over the world and disassemble stars. But there is one thing we can't teach it to do: We can't teach it to maximize its own error function. It's not just physically impossible. It's a logical contradiction.

But is this important/interesting?

Here's my problem. I feel like in general when talking about PP, I end up chasing shadows. First, there's a lot of naive PP discourse out there, where people just talk about "minimizing free energy" like it explains everything, with no apparent understanding of the nuances behind what different types of free energy you can minimize, and in what kind of minimization framework, etc. People claim that they can explain any psychological phenomenon in terms of minimizing predictive error. So you get paragraphs like:

If you connect the neurons in a predictive processor into motor (output) neurons, then the predictive processor will learn to send motor outputs which minimize predictive error, i.e., minimize surprise.

And then you get the semi-experts/semi-dilettantes, who have read a few papers on the subject and can't claim to explain everything but recognize the obvious fallacies and believe that there are ways around them.

So then you get paragraphs like:

Wait a minute. Don't we get bored and seek out novelty? And isn't novelty a form of surprise (which increases free energy)? Yes, but that's because the human brain isn't a pure predictive processor. The brain gets a squirt of dopamine when it exhibits a behavior that evolution wants to reinforce. Dopamine-moderated reinforcement learning alone is enough to elicit non-free-energy-minimizing behaviors (such as gambling) from a predictive processor.

Do you see my problem yet? First you start with a theory (free energy minimization) which can already explain anything and everything, but if you take it really seriously, it does heuristically suggest some predictions over others. And then some of those predictions are wrong; EG it predicts that organisms disproportionately like to hang out in dark, quiet rooms where there's no surprise. So maybe you retreat to the general can-predict-anything version. Or maybe you start patching it, by tacking on some amount of RL. Or maybe you do something else.

This seems to me like a recipe for scientific disaster.

I get this feeling that people must be initially attracted to PP by (a) the promised generality (which actually means it doesn't predict anything very strongly), or (b) the neat math, or (c) some particular clever arguments about how some specific phenomena can be understood as minimization of prediction error, like maybe how humans often seem to confuse 'is' with 'ought'. And then, if they get far enough, they start to see how the naive version can't make sense; but there are so many ways to patch it, and other smart people who seem to believe that things work out...

I haven't examined the pile of evidence that's supposedly in favor of actual PP in the actual brain. I'm missing a ton of context. I just get the feeling from a distance, that it's this intellectual black hole.

[-]lsusr3y30

We can't teach it to maximize its own error function. It's not just physically impossible. It's a logical contradiction.

But is this important/interesting?

Because it implies the existence of a fixed point of epistemic convergence that's robust against wireheading. It solves one of the fundamental questions of AI Alignment, at least in theory.

Do you see my problem yet? First you start with a theory (free energy minimization) which can already explain anything and everything, but if you take it really seriously, it does heuristically suggest some predictions over others. And then some of those predictions are wrong; EG it predicts that organisms disproportionately like to hang out in dark, quiet rooms where there's no surprise. So maybe you retreat to the general can-predict-anything version. Or maybe you start patching it, by tacking on some amount of RL. Or maybe you do something else.

I totally hang out in dark, quiet rooms where there's no surprise.

But more seriously, this is basically how evolution works too. It starts with a simple system and then it patches it. Evolved systems are messy and convoluted.

This seems to me like a recipe for scientific disaster.…I haven't examined the pile of evidence that's supposedly in favor of actual PP in the actual brain. I'm missing a ton of context. I just get the feeling from a distance, that it's this intellectual black hole.

You're right. The problem is even broader than you write. Psychology is a recipe for scientific disaster. Freud was a disaster. The Behaviorists were (less of) a disaster. And those are (to my knowledge) the two most powerful schools in psychiatry.

But I think I'm mostly right about the basics, and the right thing to do under such circumstances is to post my predictions on a public forum. If you think I'm wrong, then you can register your counter-prediction and we can check back in 30 years and we'll see if one of us has been proven right.

[-]abramdemski3y30

But more seriously, this is basically how evolution works too. It starts with a simple system and then it patches it. Evolved systems are messy and convoluted.

I don't deny this. My fear isn't a general fear that any time we conclude there's a base system with some patches, we're wrong. Rather, I have a fear of using these patches to excuse a bad theory, like epicycle theory vs Newton. The specific worry is more like why do people start buying this in the first place? I've never seen concrete evidence that it helps people understand things?? And when people check the math in Friston papers, it seems to be a Swiss Cheese of errors???

If you think I'm wrong, then you can register your counter-prediction and we can check back in 30 years and we'll see if one of us has been proven right.

To state the obvious, this feedback loop is too slow, but obviously that's compatible with your point here.

Still, I hope we can find predictions that can be tested faster.

Or, even more so, I hope that we can spell out reasons for believing things which help us find double-cruxes which we can settle through simple discussion.

Treating "PP" as a monolithic ideology probably greatly exaggerates the seeming disagreement. I don't have any dispute with a lot of the concrete PP methodology. For example, the predictive coding = gradient descent paper commits no sins by my lights. I haven't understood the math in enough detail to believe the biological implications yet (I feel, uneasily, like there might be a catch somewhere which makes it still not too biologically plausible). But at base, it's a result showing that a specific variational method is in-some-sense equivalent to gradient descent.

(As long as we're in the realm of "some specific variational method" instead of blurring everything together into "free energy minimization", I'm relatively happier.)

[-]lsusr3y30

If you want to get into that level of technical granularity then there are major things that need to change before applying the PP methodology in the paper to real biological neurons. Two of the big ones are brainwave oscillations and existing in the flow of time.

Mostly what I find interesting is the theory that the bulk of animal brain processing goes into creating a real-time internal simulation of the world, that this is mathematically plausible via forward-propagating signals, and that error and entropy are fused together.

When I say "free energy minimization" I mean the idea that error and surprise are fused together (possibly with an entropy minimizer thrown in).

[-]abramdemski3y20

Because it implies the existence of a fixed point of epistemic convergence that's robust against wireheading. It solves one of the fundamental questions of AI Alignment, at least in theory.

Your claim is a variant of, like, "you can't seek to minimize your own utility function". Like, sure, yeah...

I expected that the historical record would show that carefully spelled-out versions of the orthogonality thesis would claim something like "preferences can vary almost independently of intelligence" (for reasons such as, an agent can prefer to behave unintelligently; if it successfully does so, it scarcely seems fair to call it highly intelligent, at least in so far as definitions of intelligence were supposed to be behavioral).

I was wrong; it appears that historical definitions of the orthogonality thesis do make the strong claim that goals can vary independently of intellect.

So yeah, I think there are some exceptions to the strongest form of the orthogonality thesis (at least, depending on definitions of intelligence).

OTOH, the claims that no agent can seek to maximize its own learning-theoretic loss, or minimize its own utility-theoretic preferences, don't really speak against Orthogonality, since they're intelligence-independent constraints.

But you were talking about wireheading.

How does "agents cannot seek to maximize their own learning-theoretic loss" take a bite out of wireheading? It seems entirely compatible with wireheading.

[-]lsusr3y20

I appreciate your epistemic honesty regarding the historical record.

As for the theory of wireheading, I think it's drifting away from the original topic of my post here. I created a new post Self-Reference Breaks the Orthogonality Thesis which I think provides a cleaner version of what I'm trying to say, without the biological spandrels. If you want to continue this discussion, I think it'd be better to do so there.


Beyond Reinforcement Learning: Predictive Processing and Checksums
