# 11

Epistemic Status: Unfinished deep-dive into the nature of intelligence[1]. I committed to writing down my research path, but three weeks in I don't have a coherent answer to what intelligence is, and I do have a next question I want to dig into instead. Thus, here are the rough and rambly threads on intelligence that I've gathered. This piece is lower polish than I'd like because of the trade-off between writing and research. Skimming might be more productive than a full read!

## Thread 1: Intelligence as path finding through reality

Intelligence is path finding through world states, where 'path finding' is a poetic term for optimization. Taking a closer look at optimization, it turns out that bad optimizers are still optimizers. Essentially, optimizers do not need to be optimal.

There exist three categories of optimization techniques:

1. optimization algorithms (finitely terminating)
2. iterative methods (convergent)
3. heuristics (approximate solutions, but no guarantee)
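The three categories above can be made concrete with a toy objective. This is only a sketch: the objective `f`, the function names, and all step sizes are made up for illustration, and each function is just one simple representative of its category.

```python
import random

def f(x):
    """Toy objective (an assumption for illustration): minimum at x = 3."""
    return (x - 3.0) ** 2

# 1. Finitely terminating algorithm: exhaustive search over a finite candidate set.
def exhaustive_search(candidates):
    return min(candidates, key=f)

# 2. Iterative method: gradient descent, which converges on this convex problem.
def gradient_descent(x0, learning_rate=0.1, steps=200):
    x = x0
    for _ in range(steps):
        gradient = 2.0 * (x - 3.0)  # derivative of f at x
        x -= learning_rate * gradient
    return x

# 3. Heuristic: random hill climbing, which carries no guarantee in general.
def random_hill_climb(x0, steps=2000, step_size=0.5, seed=0):
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        if f(candidate) < f(x):  # keep a random step only if it improves f
            x = candidate
    return x

exact = exhaustive_search(range(-10, 11))
iterative = gradient_descent(x0=-10.0)
heuristic = random_hill_climb(x0=-10.0)
print(exact, iterative, heuristic)
```

All three land near the same answer on this easy problem. They differ in what they promise: the first stops after a fixed number of evaluations, the second only gets arbitrarily close, and the third happens to work here without any guarantee.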

Genetic algorithms and evolutionary algorithms are optimization heuristics. Thus we can trace our past back to the primordial soup through simpler and simpler optimization techniques, and we can project our future to the singularity through the creation of better and better optimization techniques. Humans are a point on this scale of increasingly sophisticated and better-performing optimization techniques instantiated in reality.

Each of the optimization techniques can in turn be instantiated in three different ways:

• Mechanical
• Computational
• Collective

I made these up -- There must be an existing framework that outlines something like this. Or maybe I'm misunderstanding the concept of optimization or how one can categorize the types of instantiations. Either way, here is what I mean by each:

Mechanical optimization cannot learn. It's a tree growing toward the light or a water wheel generating power.

Computational optimization can learn but cannot be divided. It can compute all computable functions (Turing machine or a human with pen/paper). However, if you break up the cognitive processing parts, no computation will take place.

Collective optimization can be divided. Every unit can implement mechanical or computational optimization in itself, and the units work together emergently or coordinately to a greater result than the individual pieces. For instance, a fungus can be split in two such that both halves will keep growing and functioning as individuals. A flock of birds can be split in two such that both halves will coordinate their flight in the same manner as when they were one. And of course, human societies can be split in two and both halves will coordinate again into societies.

### The structure of deep learning mimics the structure of intelligence as path finding through world states

Intelligence = Mapping current world state to target world state (or target direction)

Deep learning = Mapping input layer to output layer

This seems analogous to me, but maybe it's not. My reasoning is that deep learning relies on hidden layers between input and output layers. Learning consists of setting the right weights between all the neurons in all the layers. This is analogous to my understanding of human intelligence as path finding through reality -- in machine learning, a neural network is finding the function that maps inputs to outputs. In human intelligence, we look for the actions that map the current state of reality to a desired future state of, or direction through, reality.
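To make "learning consists of setting the right weights" concrete, here is a minimal sketch that fits a single weight and bias so that inputs map onto targets. It deliberately uses a linear model instead of a deep network to stay short; the function name, data, and learning rate are illustrative assumptions.

```python
def train_linear_map(inputs, targets, learning_rate=0.05, steps=2000):
    """Fit targets ≈ w * x + b with full-batch gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(inputs, targets)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(inputs, targets)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy "world": the hidden mapping is y = 2x + 1.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [2.0 * x + 1.0 for x in inputs]

w, b = train_linear_map(inputs, targets)
print(w, b)
```

The learned `w` and `b` approach the hidden mapping's 2 and 1. A deep network does the same kind of weight-setting, just with many more weights and nonlinear layers in between.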

Maybe this is a tautological non-insight.

### Segue on data augmentation

Data augmentation is transforming input data such that the network can learn to recognize more forms of that data and extract different features from it. Is human imagination and "thinking through different ways past events might have gone" a form of data augmentation? We perturb a memory and then project out how we would have felt and what we would have wanted to do. This seems quite similar to using simulation to generate and improve predictions.
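As a rough illustration of what data augmentation does mechanically, the sketch below perturbs a tiny "image" (a nested list of pixel values) into a few variants that would all keep the same label. The helper names and the choice of transformations are assumptions for illustration, not a reference to any particular library.

```python
def horizontal_flip(image):
    """Mirror every row of the image."""
    return [row[::-1] for row in image]

def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding with `fill`."""
    return [[fill] + row[:-1] for row in image]

def augment(image):
    """Return the original plus a few perturbed variants of it."""
    flipped = horizontal_flip(image)
    return [image, flipped, shift_right(image), shift_right(flipped)]

image = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
variants = augment(image)
for variant in variants:
    print(variant)
```

The analogy to imagination would be that a single memory (one training example) gets replayed in several perturbed forms, each of which can be learned from.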

## Thread 2: Alignment as preference mapping

My core insight here is 4 reasoning steps followed by an intuitive leap:

Neural Networks can encode any computable function.

Our neural activity is a computable function.

Our utility function is encoded in our neural activity.

Thus a neural network can encode our utility function.

Beep-boop-brrrrrr -- MAGIC LEAP:

An aligned AGI is one that has learned the function that maps our neurally encoded utility function to observable world states.

This seems true to me but maybe is not -- Loose threads indeed.

### Alignment and measurement error of human-in-the-loop

Alignment is human preference profiling performed by an artificial intelligence. In preference profiling, you need to decide which input parameters you will use to predict the output parameter (preferences, in this case for world states instead of products). Input parameters can be behavioral, linguistic, or biological. They can also be directly elicited or indirectly observed. Behavioral and linguistic measures are imprecise because a human's actions are only the end result of how their cognitive abilities and conflicting drives happen to converge. A lot of actions are suboptimal because humans are not good optimizers. Thus the most reliable signal of the human utility function is either:

• Aggregation over a large enough sample that all the noise is cancelled out
• Direct biological measures of our utility function

However, who says there are no systematic biases and errors in our behavior that do not cancel out over large samples?
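The worry about systematic bias can be shown with a small simulation. This is a sketch under made-up numbers: `TRUE_PREFERENCE`, the noise level, and the bias of 1.5 are all hypothetical, and a single scalar stands in for something as rich as a preference.

```python
import random
from statistics import mean

def observe(true_value, n, noise_sd, bias, seed):
    """Simulate n noisy behavioral measurements of one underlying preference."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

TRUE_PREFERENCE = 5.0  # hypothetical ground truth

# Zero-mean noise only: averaging recovers the true value.
unbiased_estimate = mean(observe(TRUE_PREFERENCE, 50_000, noise_sd=2.0, bias=0.0, seed=1))

# Same noise plus a systematic bias: averaging converges to the wrong value.
biased_estimate = mean(observe(TRUE_PREFERENCE, 50_000, noise_sd=2.0, bias=1.5, seed=1))

print(unbiased_estimate, biased_estimate)
```

More samples shrink the random scatter in both cases, but the biased estimate settles around 6.5 instead of 5.0 no matter how large the sample gets.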

And who says that observing our utility function directly won't change it through observation? Our experiences change us, and if our experiences are limited to being measured in a lab room, then this will not represent anything current humans consider to be our utility function.

Notably, RLHF relies on linguistic (and/or behavioral) mappings, so it exposes our humans-in-the-loop to the faulty mapping between what we actually want and our words and actions.

### Human Utility Functions are more hyper- than parameter

Hyperparameters are parameters across your parameters. For instance, learning rate is the parameter that controls how much you update the weights in a neural network at each step. Human utility functions seem to have hyperparameters too, which makes conceptualizing and encoding them complicated to say the least. Specifically, humans gain utility directly from various stimuli and observations like eating sweet food or looking at puppies. These would be the parameters of the human utility function. But much of the utility people strive for is not this direct hedonic payoff. Instead, we have many (scarcely known) hyperparameters where the utility we get from our observations comes from the transformation and evaluation of one or many sets of observations. For instance, the satisfaction of a job well-done relies on observing the entire process and then evaluating the end result as good. Similarly, many observations that consist of directly negative stimuli (parameters) are evaluated as positive by some hyperparameter such as the meaningfulness of childbirth or the beautiful release of a funeral.
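For the machine learning half of this analogy, the sketch below shows how one hyperparameter, the learning rate, decides what happens to a parameter during training. The objective `f(x) = x**2` and the two learning rates are chosen purely for illustration.

```python
def run_gradient_descent(learning_rate, steps=50, x0=1.0):
    """Minimize f(x) = x**2, where x is the parameter being learned."""
    x = x0
    for _ in range(steps):
        gradient = 2.0 * x
        x -= learning_rate * gradient  # the hyperparameter scales every update
    return x

small = run_gradient_descent(learning_rate=0.1)  # each step multiplies x by 0.8
large = run_gradient_descent(learning_rate=1.1)  # each step multiplies x by -1.2

print(small, large)
```

Same objective, same starting point, same update rule: with the small learning rate the parameter shrinks toward the minimum at 0, and with the large one it overshoots further on every step and diverges.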

The evaluation of the aggregates of our observations even changes our biochemistry such that hyperparameters influence the parameters of direct experience. For instance, evaluating someone's social cues as them liking you can directly generate feelings of relaxation that are physiologically embodied and thus direct parameters. While the exact same encounter, if evaluated negatively, could cause tension in the body that is also a direct parameter. Thus the exact same stimuli result in completely different reward signals purely based on the settings of the hyperparameters that control how a set of observations is transformed and then evaluated.

Thus it seems conceptually straightforward to map the parameters of our utility function into something learnable by an AGI, but it's much less clear how we'd map the ever-fickle hyperparameters of our utility function that entirely hinge on the evaluations and transformations we ourselves apply to our experiences ... it's a value we compute internally that would require the AGI to simulate us as full-bodied beings to get the exact same result. This would be undesirable because such a simulation can validly suffer as much as we ourselves do. And thus we don't want to map our utility function directly to an AGI but use some proxy. And the only sensible proxy is then "do as I say, don't do as I seem to want to do", which then boils down to needing corrigibility.

### Collaborative Filtering on Values?

Collaborative filtering finds the latent factors for how to match two types of things together (like humans and movies). Are there latent factors to humans and the values they espouse? If you run Principal Component Analysis on the values, would you get a few limited clusters? This seems easy to google and has probably been done, but probably was also not encoded well and it's hard to see how one would accurately extract value data from people such that the analysis makes sense and has useful results.
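To show what "finding latent factors" could look like mechanically, here is a sketch of a small latent-factor model fit by stochastic gradient descent. Note that this is plain matrix factorization, not Principal Component Analysis, and the survey matrix `RATINGS` is entirely hypothetical: rows are people, columns are value statements, and `None` marks a missing answer.

```python
import random

# Hypothetical agreement scores (1 to 5); None means the person did not answer.
RATINGS = [
    [5, 4, None, 1],
    [4, None, 1, 1],
    [1, 1, 5, None],
    [None, 1, 4, 5],
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_squared_error(P, Q, observed):
    return sum((r - dot(P[i], Q[j])) ** 2 for i, j, r in observed) / len(observed)

def factorize(ratings, n_factors=2, learning_rate=0.01, epochs=2000, seed=0):
    """Fit ratings[i][j] ≈ dot(P[i], Q[j]) using only the observed entries."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)] for _ in ratings]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)] for _ in ratings[0]]
    observed = [(i, j, r) for i, row in enumerate(ratings)
                for j, r in enumerate(row) if r is not None]
    history = [mean_squared_error(P, Q, observed)]
    for _ in range(epochs):
        for i, j, r in observed:
            error = r - dot(P[i], Q[j])
            for k in range(n_factors):
                p_ik, q_jk = P[i][k], Q[j][k]
                P[i][k] += learning_rate * error * q_jk
                Q[j][k] += learning_rate * error * p_ik
        history.append(mean_squared_error(P, Q, observed))
    return P, Q, history

person_factors, value_factors, history = factorize(RATINGS)
print(f"error before: {history[0]:.2f}, after: {history[-1]:.2f}")
```

Each person and each value statement ends up as a short vector, and people with similar vectors answered similarly. Whether real value data has such low-dimensional structure is exactly the open question in the paragraph above; the code only shows the mechanics.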

## Thread 3: Natural language as data compression

Language is a data compression format that inherently encodes relational properties across abstract entities such that models of reality can be communicated and reasoned about. In contrast, images are sense data, where sense data can be compressed, but inherently is not. Similarly, sense data can encode relational properties across abstract entities, but inherently does not (e.g., a picture of a book with language in it, or a picture of a diagram).

Is it then true that AGI can result from language models but not from image models? The counterargument would be that language models lack grounding in reality. Image models can be grounded in reality cause they consume sense data and thus can be hooked up to cameras. However, we've created systems that allow sense data to be directly translated to language data and language data to be directly translated to actions. Thus, even though an abstract data compression format like language is not inherently grounded in reality, we have given it eyes and hands such that it can sense and act in the real world without directly consuming sense data or outputting motor data.

So actually, AGI can result from image models that read and write, but that's many more steps than you'd need when using a language model. Thus AGI from language models will exist before AGI from image models or other sense-data-only models.

### What's mentalese?

Human reasoning happens in "mentalese". People's introspection on how they reason is plausibly faulty, but many people have some experience of reasoning in language, imagery, and spatial relations. Are these just a side-effect of reasoning, and does it all take place "under the hood" anyway? Could one reason without having any conscious process of reasoning? Presumably, yes. Is that what the zombie-discussion points to? What happens if we input both language and image data into a big enough neural network? Will reasoning then take place in both? Is there any value in enhancing intelligence with sense data?

### Supervised Learning as the bootstrap of collective intelligence

Self-supervised learning is the default form of learning for individual agents embedded in reality. You make a prediction of what reality will look like, and then time passes and you see if your prediction is true. Or you make a prediction of what reality will look like if you do an action, then you do the action, and see if it is true.
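The predict-then-observe loop can be sketched in a few lines. The toy "world" below decays by a fixed factor each step, and the learner only ever sees the stream of observations; the label for each prediction is simply the next observation. The world, the model form, and the learning rate are all assumptions for illustration.

```python
def self_supervised_fit(observations, learning_rate=0.05, epochs=200):
    """Learn a in x[t+1] ≈ a * x[t]; the next observation acts as the label."""
    a = 0.0
    for _ in range(epochs):
        for current, upcoming in zip(observations, observations[1:]):
            prediction = a * current        # predict what reality will look like
            error = upcoming - prediction   # then time passes and we compare
            a += learning_rate * error * current
    return a

# Toy world: each state is 0.9 times the previous one.
world = [0.9 ** t for t in range(20)]

learned_factor = self_supervised_fit(world)
print(learned_factor)
```

No second intelligence had to provide labels here, which is the contrast with supervised learning.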

Supervised learning in contrast is a form of collective intelligence. It only works if another intelligence has already learned the mapping and can thus output the labels for you. So supervised learning is how we bootstrap AI and launch it to a much higher entry point than we as biological organisms could start with. We've learned to integrate supervised data since we're a collective intelligence that uses language (mostly) for coordination. However, self-supervised learning is the only option for an AGI to learn things we don't know yet.

Feature engineering seems like a form of pre-processing, and thus not a relevant concept for AGI? We'd expect AGI to learn its own features, which is what kernels in convolutional neural networks do, for instance.

Overfitting in a neural network is basically memorizing the data set. This lines up with Steve Byrnes' explanation of almost all of the brain being essentially memory models, but particular types of memory models. But how does this work exactly?
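The extreme case of "memorizing the data set" is a lookup table. The sketch below uses a made-up task (is a number even?) to show a pure memorizer that is perfect on what it has seen and useless on anything new; it is a caricature of overfitting, not a claim about how neural networks or brains store things.

```python
def is_even(n):
    """Ground-truth rule the learner is supposed to pick up."""
    return n % 2 == 0

train_inputs = list(range(10))
test_inputs = list(range(10, 20))

# Pure memorization: store every training example and nothing else.
memory = {x: is_even(x) for x in train_inputs}

def memorizer(x):
    return memory.get(x)  # returns None for anything it has not seen

def accuracy(model, inputs):
    return sum(model(x) == is_even(x) for x in inputs) / len(inputs)

train_accuracy = accuracy(memorizer, train_inputs)
test_accuracy = accuracy(memorizer, test_inputs)
print(train_accuracy, test_accuracy)
```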

Are System 1 and System 2 reasoning pretty much a 2-piece ensemble? If so, wouldn't you expect more models? Maybe there are? Maybe we are integrating those lower down? This seems not super relevant.

What's a positive feedback loop called in the alignment problem? In current ML you already need to watch out for positive feedback loops if the output of the network influences the input it will later get. The given example is a collaborative filtering network that matches users to movies, but then mostly the matched movies will be watched and thus rated, and thus matched again, etc. This clearly creates a massive issue with AGI that interacts with humans ... what is this problem called in the existing alignment literature? I had a concept on this called "manipulation threshold" meaning some formalization of how much and what kind of influence an AGI is allowed to have on any human when discussing plans that have not been signed off yet (as the other elements will be subsumed by corrigibility).
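The recommender example can be simulated directly. In this sketch every item starts out equal, the system always shows the items with the most ratings, and only shown items can collect new ratings. The item count, number of rounds, and tie-breaking rule are arbitrary assumptions.

```python
def simulate_feedback_loop(n_items=5, rounds=100, shown_per_round=2):
    """Recommend the most-rated items; only shown items can gain ratings."""
    rating_counts = [1] * n_items  # every item starts with the same count
    for _ in range(rounds):
        # Rank by current count; ties are broken by item index (arbitrary rule).
        ranked = sorted(range(n_items), key=lambda item: (-rating_counts[item], item))
        for item in ranked[:shown_per_round]:
            rating_counts[item] += 1  # the output feeds back into the next input
    return rating_counts

counts = simulate_feedback_loop()
print(counts)
```

An arbitrary tie-break in the very first round decides which two items absorb all later attention, even though no item was better than any other.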

[1] The deep-dive consisted of running through the Fast.AI course in 2 weeks, generating my own spin-off questions from that, and then conceptually working through the nature of intelligence from scratch on my own, googling little bits and pieces as I went. The main body text will contain as many references as I can recall to link insights together and to sources.


Did you accidentally forget to add this post to your research journal sequence?

Here my quick reactions on many of the points in the post:

1. optimization algorithms (finitely terminating)
2. iterative methods (convergent)

That sounds as if they are always finitely terminating or convergent, which they're not. (I don't think you wanted to say they are)

Computational optimization can learn but cannot be divided. It can compute all computable functions (Turing machine or a human with pen/paper). However, if you break up the cognitive processing parts, no computation will take place.

I don't quite understand this. What does the sentence "computational optimization can compute all computable functions" mean? Additionally, in my conception of "computational optimization" (which is admittedly rather vague), learning need not take place.

### The structure of deep learning mimics the structure of intelligence as path finding through world states

I find these analogies and your explanations a bit vague. What makes it hard for me to judge what's behind these analogies:

• You write "Intelligence = Mapping current world state to target world state (or target direction)":
  • these two options are conceptually quite different and might influence the meaning of the analogy. If intelligence computes only a "target direction", then this corresponds to a heuristic approach in which locally, the correct direction in action space is chosen. However, if you view intelligence as an actual optimization algorithm, then what's chosen is not only a direction but a whole path.
  • Further nitpick: I wouldn't use the verb "to map" here. I think you mean more something like "to transform", especially if you mean the optimization viewpoint.
• You write "Learning consists of setting the right weights between all the neurons in all the layers. This is analogous to my understanding of human intelligence as path-finding through reality":
  • Learning is a thing you do once, and then you use the resulting neural network repeatedly. In contrast, if you search for a path, you usually use that path only once.
  • The output of a neural network can be a found path itself. That makes the analogy even more difficult to me.

Is human imagination and "thinking through different ways past events might have gone" a form of data augmentation? We perturb a memory and then project out how we would have felt and what we would have wanted to do. This seems quite similar to using simulation to generate and improve predictions.

Off-policy reinforcement learning is built on this idea. One famous example is DQN, which uses experience replay. The paper is still worth reading today; some consider it the start of deep RL.
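For readers who have not seen experience replay, here is a minimal sketch of the core data structure: a fixed-size memory of past transitions from which random batches are drawn for learning. This is not the DQN paper's code; the class name, the transition format, and the capacity are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of past transitions, sampled at random for learning."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)  # oldest entries drop out automatically
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling breaks up the order in which events were experienced.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, action="move", reward=1.0, next_state=step + 1)

batch = buffer.sample(batch_size=2)
print(len(buffer), batch)
```

The link to "thinking through past events" is that the agent keeps learning from stored memories long after the events themselves happened, though plain replay reuses them unperturbed.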

Our utility function is encoded in our neural activity.

I think the terms "our utility function" and "encoded" are not well-defined enough to be able to outright say whether this is true or not, but under a reasonable interpretation of the terms, it seems correct to me.

An aligned AGI is one that has learned the function that maps our neurally encoded utility function to observable world states.

I do not know what you mean by "mapping a utility function to world states". Is the following a correct paraphrasing of what you mean?

"An aligned AGI is one that tries to steer toward world states such that the neurally encoded utility function, if queried, would say 'these states are rather optimal' "

Thus the most reliable signal of the human utility function is either:

• Aggregation over a large enough sample that all the noise is cancelled out
• Direct biological measures of our utility function

However, who says there are no systematic biases and errors in our behavior that do not cancel out over large samples?

There are indeed biases in our decision-making that mean that the utility function cannot be inferred from our behavior alone, as shown in "Humans can be assigned any values whatsoever".

I also don't think it's feasible to directly measure our utility function. In my own view, our utility function isn't an observable thing. There might be a utility function that gets revealed by running history far into the future and observing what humans converge on, but I don't think the end result of what we value can be directly measured in our brains.

Specifically, humans gain utility directly from various stimuli and observations like eating sweet food or looking at puppies.

We cannot gain "utility". We can only gain "reward". Utility is a measure of world states, whereas reward is a thing happening in our brains.

Instead, we have many (scarcely known) hyperparameters where the utility we get from our observations comes from the transformation and evaluation of one or many sets of observations. For instance, the satisfaction of a job well-done relies on observing the entire process and then evaluating the end result as good. Similarly, many observations that consist of directly negative stimuli (parameters) are evaluated as positive by some hyperparameter such as the meaningfulness of childbirth or the beautiful release of a funeral.

I don't quite understand the analogy to hyperparameters here. To me, it seems like childbirth's meaning is in itself a reward that, by credit assignment, leads to a positive evaluation of the actions that led to it, even though in the experience the reward was mostly negative. It is indeed interesting figuring out what exactly is going on here (and the shard theory of human values might be an interesting frame for that, see also this interesting post looking at how the same external events can trigger different value updates), but I don't yet see how it connects to hyperparameters.

but it's much less clear how we'd map the ever-fickle hyperparameters of our utility function that entirely hinge on our evaluations and transformations we ourselves apply to our experiences ... it's a value we compute internally that would require the AGI to simulate us as full-bodied beings to get the exact same result.

What if instead of trying to build an AI that tries to decode our brain's utility function, we build the process that created our values in the first place and expose the AI to this process?

The counterargument would be that language models lack grounding in reality.

The distinction between vision and language models breaks down with things like vision transformers. But in general, the lack of grounding of pure language models seems a problem to me for reaching AGI with them. But I think a language model that interacts with the world through, e.g., an internet connection, might already get rid of this grounding problem.

Self-supervised learning is the default form of learning for individual agents embedded in reality.

That seems pretty plausible to me for achieving AGI, but many RL agents do not have an explicit self-supervised component.

Feature engineering seems like a form of pre-processing, and thus not a relevant concept for AGI? We'd expect AGI to learn it's own features. Which is what kernels in convolutional neural networks do, for instance.

I mostly agree. But note that feature engineering is just a form of an inductive prior, and it's not possible to get rid of those and let them be "learned" --- there is no free lunch.

Overfitting in a neural network is basically memorizing the data set.

Many models that do not overfit also memorize much of the data set.

Did you accidentally forget to add this post to your research journal sequence?

I thought I added it but apparently hadn't pressed submit. Thank you for pointing that out!

1. optimization algorithms (finitely terminating)
2. iterative methods (convergent)

That sounds as if they are always finitely terminating or convergent, which they're not. (I don't think you wanted to say they are)

I was going by the Wikipedia definition:

To solve problems, researchers may use algorithms that terminate in a finite number of steps, or iterative methods that converge to a solution (on some specified class of problems), or heuristics that may provide approximate solutions to some problems (although their iterates need not converge).

I don't quite understand this. What does the sentence "computational optimization can compute all computable functions" mean? Additionally, in my conception of "computational optimization" (which is admittedly rather vague), learning need not take place.

I might have overloaded the phrase "computational" here. My intention was to point out what can be encoded by such a system. Maybe "coding" is a better word? E.g., neural coding. These systems can implement Turing machines, so they can potentially have the same properties as Turing machines.

these two options are conceptually quite different and might influence the meaning of the analogy. If intelligence computes only a "target direction", then this corresponds to a heuristic approach in which locally, the correct direction in action space is chosen. However, if you view intelligence as an actual optimization algorithm, then what's chosen is not only a direction but a whole path.

I'm wondering if our disagreement is conceptual or semantic. Optimizing a direction instead of an entire path is just a difference in time horizon in my model. But maybe this is a different use of the word "optimize"?

You write "Learning consists of setting the right weights between all the neurons in all the layers. This is analogous to my understanding of human intelligence as path-finding through reality"

• Learning is a thing you do once, and then you use the resulting neural network repeatedly. In contrast, if you search for a path, you usually use that path only once.

If I learn the optimal path to work, then I can use that multiple times. I'm not sure I agree with the distinction you are drawing here ... Some problems in life only need to be solved exactly once, but that's the same as any thing you learn only being applicable once. I didn't mean to claim the processes are identical, but that they share an underlying structure. Though indeed, this might be an empty intuitive leap with no useful implementation. Or maybe not a good matching at all.

I do not know what you mean by "mapping a utility function to world states". Is the following a correct paraphrasing of what you mean?

"An aligned AGI is one that tries to steer toward world states such that the neurally encoded utility function, if queried, would say 'these states are rather optimal' "

Yes, thank you.

I don't quite understand the analogy to hyperparameters here. To me, it seems like childbirth's meaning is in itself a reward that, by credit assignment, leads to a positive evaluation of the actions that led to it, even though in the experience the reward was mostly negative. It is indeed interesting figuring out what exactly is going on here (and the shard theory of human values might be an interesting frame for that, see also this interesting post looking at how the same external events can trigger different value updates), but I don't yet see how it connects to hyperparameters.

A hyperparameter is a parameter across parameters. So say with childbirth, you have a parameter 'pain' for physical pain, which is a direct physical signal, and you have a hyperparameter 'Satisfaction from hard work' that takes 'pain' as input as well as some evaluative cognitive process and outputs reward accordingly. Does that make sense?

What if instead of trying to build an AI that tries to decode our brain's utility function, we build the process that created our values in the first place and expose the AI to this process

Digging in to shard theory is still on my todo list. [bookmarked]

Many models that do not overfit also memorize much of the data set.

Is this on the sweet spot just before overfitting or should I be thinking of something else?

Thank you for your extensive comment! <3

I might have overloaded the phrase "computational" here. My intention was to point out what can be encoded by such a system. Maybe "coding" is a better word? E.g., neural coding. These systems can implement Turing machines so can potentially have the same properties of turing machines.

I see. I think I was confused since, in my mind, there are many Turing machines that simply do not "optimize" anything. They just compute a function.

I'm wondering if our disagreement is conceptual or semantic. Optimizing a direction instead of an entire path is just a difference in time horizon in my model. But maybe this is a different use of the word "optimize"?

I think I wanted to point to a difference in the computational approach of different algorithms that find a path through the universe. If you chain together many locally found heuristics, then you carve out a path through reality over time that may lead to some "desirable outcome". But the computation would be vastly different from another algorithm that thinks about the end result and then makes a whole plan of how to reach this. It's basically the difference between deontology and consequentialism. This post is on similar themes.
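The computational difference between chaining local choices and planning a whole path can be sketched on a toy grid. The grid layout, the greedy rule (always step to the neighbor closest to the goal), and the use of breadth-first search as the "planner" are all assumptions picked for illustration.

```python
from collections import deque

GRID = [
    "S.#.",
    "..#.",
    "..#G",
    "....",
]
START, GOAL = (0, 0), (2, 3)

def neighbors(cell):
    """Yield walkable cells one step down, right, up, or left of `cell`."""
    row, col = cell
    for d_row, d_col in ((1, 0), (0, 1), (-1, 0), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
            yield (r, c)

def distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_walk(start, goal, max_steps=50):
    """Local heuristic: always step to the neighbor closest to the goal."""
    path = [start]
    for _ in range(max_steps):
        current = path[-1]
        if current == goal:
            break
        best = min(neighbors(current), key=lambda cell: distance(cell, goal))
        if distance(best, goal) >= distance(current, goal):
            break  # no local improvement available: stuck
        path.append(best)
    return path

def plan_path(start, goal):
    """Whole-path planning: breadth-first search from start to goal."""
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        if current == goal:
            break
        for nxt in neighbors(current):
            if nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)
    if goal not in came_from:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = came_from[node]
    return path[::-1]

greedy_path = greedy_walk(START, GOAL)
planned_path = plan_path(START, GOAL)
print(greedy_path)
print(planned_path)
```

On this grid the greedy walker runs into the wall and stops, because every available move looks locally worse, while the planner accepts a temporary detour through the bottom row and reaches the goal.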

I'm not at all sure if we disagree about anything here, though.

If I learn the optimal path to work, then I can use that multiple times. I'm not sure I agree with the distinction you are drawing here ... Some problems in life only need to be solved exactly once, but that's the same as any thing you learn only being applicable once.

I would say that if you remember the plan and retrieve it later for repeated use, then you do this by learning and the resulting computation is not planning anymore. Planning is always the thing you do at the moment to find good results now, and learning is the thing you do to be able to use a solution repeatedly.

Part of my opinion also comes from the intuition that planning is the thing that derives its use from the fact that it is applied in complex environments in which learning by heart is often useless. The very reason why planning is useful for intelligent agents is that they cannot simply learn heuristics to navigate the world.

To be fair, it might be that I don't have the same intuitive connection between planning and learning in my head that you do, so if my comments are beside the point, then feel free to ignore :)

A hyperparameter is a parameter across parameters. So say with childbirth, you have a parameter pain on physical pain which is a direct physical signal, and you have a hyperparameter 'Satisfaction from hard work' that takes 'pain' as input as well as some evaluative cognitive process and outputs reward accordingly. Does that make sense?

Conceptually it does, thank you! I wouldn't call these parameters and hyperparameters, though. Low-level and high-level features might be better terms.

Again, I think the shard theory of human values might be an inspiration for these thoughts, as well as this post on AGI motivation which talks about how valence gets "painted" on thoughts in the world model of a brain-like AGI.

Is this on the sweet spot just before overfitting or should I be thinking of something else?

I personally don't have good models for this. Ilya Sutskever mentioned in a podcast that under some models of Bayesian updating, learning by heart is optimal and a component of perfect generalization. Also from personal experience, I think that people who generalize very well also often have lots of knowledge, though this may be confounded by other effects.