Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Tl;dr

In an earlier post, I introduced a metaphor for thinking about the epistemic landscape of AI alignment, and then described three epistemic strategies for making progress on the alignment problem. 

In this post, I will double-click on the third strategy: learning from intelligence-in-the-wild. In particular, I will explore in more detail a core assumption this epistemic bet is based on: namely, that intelligent behaviour, as exhibited by different types of systems, both natural and artificial, shares underlying principles which we can study and exploit.

An epistemic bet: Learning from Intelligence-in-the-Wild

Earlier, I defined the epistemic strategy of learning from intelligence-in-the-wild as follows: 

Finally, the third approach attempts to chart out the possibility space of intelligent behaviour by looking at how intelligent behaviour manifests in existing natural systems ([3] in Fig. 1). 

Instead of calling it a strategy, we could also call it an epistemic bet. This framing is helpful in emphasising the fact that in research, we are in the business of making bets. We cannot be entirely confident a given approach will bear fruit before we try it. But we can be more or less confident it will, and thus make informed bets. Furthermore, the term "bet" makes appeals to epistemic pluralism more intuitive (by reference to the reasons why diversifying, say, one’s investment portfolio is a good idea).

In the case of this specific bet, the hope is that by studying intelligent behaviour as it manifests in existing biological or social systems, and by recovering principles that govern complex systems across various scales and modalities of implementation, we can gain substantial insights into how to design intelligent behaviour (with certain desired properties) in artificial systems.

Premise: Intelligent behaviour across systems, scales and substrates

What reasons do we have to expect this strategy will be fruitful?

The promise of this epistemic bet is, in part, premised on treating intelligent behaviour[1] as a “real”, naturally occurring phenomenon, rather than, say, some theoretical or linguistic construct. It is further premised on the claim that intelligent behaviour - as we can observe it in different types of systems - shares underlying principles of functioning. As such, we can observe and investigate the phenomenon of intelligent behaviour in the real world and gain substantive (i.e. predictive and explanatory) knowledge of it that generalizes beyond specific substrates or modalities of implementation.

To summarise:

"Learning from intelligence-in-the-wild" -  core assumptions: 

  1. Epistemic access: We can improve our understanding of the nature and function of intelligent behaviour by looking at currently-existing systems that exhibit intelligent behaviour 
     
  2. Substantive understanding: Intelligent behaviour is governed by principles that apply (at least in part) across modalities and scales of implementation and that we can investigate scientifically.

Understanding intelligence: implementation-neutral principles and modalities of implementation-specificity 

In stipulating the existence of cross-system principles that govern intelligent behaviour, I do not mean to preclude that there also exist important differences between modalities and scales, i.e. aspects of intelligent behaviour that are implementation-specific/sensitive.

The idea is to study intelligent behaviour (and related phenomena) as a (mostly) cross-system phenomenon, and one that is (mostly) substrate-neutral. But in order to draw useful insights from this approach, “mostly” is enough; it is not necessary to assume perfect parallels between different systems. 

We also learn about the functioning of intelligent behaviour in artificial systems by seeing where and how its analogies with natural systems break. As such, we are interested not only in phenomena that are governed by substrate-independent principles, but also in the modalities of substrate-dependence as such. In other words, the degree to which intelligent behaviour is governed by the same set of principles across different scales and modalities of implementation is at the core of the question we are trying to answer.

"Paradigm" and "marginal" cases 

Another bit of conceptual tooling that is useful here is the difference between "paradigm" and "marginal" cases.

I am taking this from Peter Godfrey-Smith, who uses it to distinguish between "more paradigmatic" vs "more marginal" cases of Darwinian evolution in his book "Darwinian populations and natural selection". This bit of language allows us to combine a gradualist view of reality with nonetheless precise theories about the nature and functioning of (in this case) Darwinian evolution.

The "intelligence-in-the-wild" approach treats intelligence in much the same way as Godfrey-Smith treats Darwinian evolution. As we learn about how intelligent behaviour manifests in different natural systems, we learn about which principles of intelligent behaviour are paradigmatic, and how implementation-specific deviations from the paradigm make for more marginal cases of the phenomenon of interest.

This way of thinking about intelligent behaviour - a view that allows for paradigm and marginal cases - is also a view that is interested in intelligent behaviour as it manifests in real, physically-instantiated systems, as opposed to idealized ones. Importantly, whenever a system or process is implemented in the physical world - once it becomes thus realised - it ceases to be an idealized system.

Examples of this premise "in action"

This has been pretty theoretical so far. Let's add some concreteness to what research under this approach might look like. 

A rough sketch of cross-system, analogy-based reasoning about intelligent behaviour

Understanding intelligence as “real” in the sense outlined above implies that it can be insightful to compare and contrast the way intelligent behaviour manifests across different systems. Below, I will give a "quick-and-dirty" example of how we can think about intelligence as a cross-system phenomenon. 

To do so, let’s consider how learning behaviour[2] is implemented in different types of systems. (NB this example doesn't mean to imply [intelligence = learning]. However, since learning seems to be a relevant aspect of intelligent behaviour, it serves as a valid illustration of what I have in mind by "cross-system" reasoning.) 

As a first draft, we might end up with something like the table below:[3] 

System | “Goal” | Learning mechanism | Drivers
RL agents | Reward | Backpropagation | Policy gradient
“Darwinian Population”[4] | Inclusive fitness | Evolution by natural selection | Genetic variation
Organism | Survival | Homeostatic loops | Information transmitters (e.g., neurotransmitters)
Markets | Efficient allocation | Price mechanisms | (Technological/economic) innovation
Scientific progress[5] | Explanatory power | Falsification | Proposition of new theories, development of new arguments, and collection of new observations

The table suggests analogies between these different systems and how they implement learning. These analogies can be exploited for insight in different ways: 

  • They might simply act as an “inspirational crutch” by letting us come up with new questions to ask about our target system or form new hypotheses we can subsequently test. (For example, looking at the above table, we might ask ourselves whether the scientific process should or should not be understood as a goal-directed, “consequentialist” process, and how we might come to answer that question.)
  • They can be used to exploit a “knowledge gradient”. The idea is that if we have a more gears-level understanding of the functioning of system A (compared to system B), and if the analogy between A and B holds to a sufficient extent, then our knowledge of A may be helpful in making progress on understanding B; it may suggest something about the “shape” of the missing knowledge in B, or about where to look to find that knowledge, or even provide concrete insight into the mechanisms of B. (For example, insofar as one believes that there exist parallels in the functioning of biological and cultural evolution, our understanding of the intricacies of biological evolution can usefully inform "what to look for" when trying to improve (or critique) our understanding of the functioning of cultural evolution.)
  • When it comes not only to understanding but also to designing artefacts with specific functions, we can get ideas for ways to implement those functions by looking at how various other systems do it. And similarly, we can look at the ways other systems fail to reliably produce the desired function, to inform what failure scenarios to expect in our designed systems. (For example, insofar as the analogy between markets and ML systems holds, why do markets (currently) not fall (entirely) prey to Goodhart’s law?[6] Additionally, can we take inspiration from economic policies that aim to correct for negative market outcomes (e.g., monopolisation, rent-seeking, etc.) to find ways to make ML systems more robust?)

Caveat: How to, and how not to, use analogies?

A note of caution is appropriate: when trying to learn from analogies between different types of systems, it is important to be aware of the epistemic challenges and potential pitfalls that come with this approach. The details of how an analogy is established often matter tremendously with regard to its validity. Any analogy holds only until it ceases to. Actual cognitive work and epistemological vigilance are required, as with any research approach.[7] While an interesting-looking analogy can provide us with valuable inspiration, it can only ever be the start of a serious investigation. Sooner or later, intuitions have to be grounded in empiricism or formalisation.

On the upside, just as much as we might hope to learn from analogies, we can also hope to learn from disanalogies (if not more!). Valuable insights can be gained from noticing the ways in which analogies break and from exploring the “boundaries” of their validity/applicability. 

But can we say something more gears-y about valid or invalid, useful or useless, cases of analogy drawing? I think we can; and I have found Herbert Simon's discussion in The Sciences of the Artificial particularly insightful in this respect. 

“The inner system [of an artefact] is an organization of natural phenomena capable of attaining the goals in some range of environments, but ordinarily, there will be many functionally equivalent natural systems capable of doing this. [...] The outer environment determines the conditions of goal attainment. If the inner system is properly designed, it will be adapted to the outer environment, so that its behaviour will be determined in large part by the behaviour of the latter.” (p. 11-12)

In short, Simon is arguing that many different “inner environments” can exhibit “functional equivalence” in their relationship to the “outer environment”. Simon goes on:

“We are seldom interested in explaining or predicting phenomena in all their particularity; we are usually interested only in a few properties abstracted from the complex reality. [...] We do not have to know, or guess at, all the internal structure of the systems but only that part of it that is crucial to the abstraction. [...] Resemblance in behaviour of systems without identity of the inner systems is particularly feasible if the aspects in which we are interested in only arise out of the organization of the parts, independently of all but a few properties of the individual components.” (p. 15-17). 

Here, Simon discusses which abstractions (away from implementation details) can be drawn without losing information relevant to the function or behaviour in question. 

Putting these pieces together, we can say that an analogy is exploitable to the extent that:

  1. the proposed analogy is based on a functional equivalence, and 
  2. relevant abstractions (away from implementation details and towards organizational/structural properties of the artefacts) can successfully be drawn
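As a toy illustration of these two conditions, the following sketch simulates two controllers with deliberately different "inner" mechanisms acting in the same "outer environment" (a room that leaks heat). Everything here is a hypothetical assumption made for illustration: the room dynamics, the constants, and the names `run`, `bang_bang`, `proportional` and `functionally_equivalent` do not come from Simon's book. The point is only that, once we abstract to "keeps the room near the setpoint", the two inner systems become interchangeable.

```python
from typing import Callable

Controller = Callable[[float, float], float]  # (temperature, setpoint) -> heating power


def run(controller: Controller, setpoint: float = 20.0, outside: float = 5.0,
        start: float = 10.0, steps: int = 500) -> float:
    """Simulate a toy room and return the mean temperature over the last 100 steps."""
    temp = start
    history = []
    for _ in range(steps):
        heat = controller(temp, setpoint)             # the inner system acts
        temp += 0.1 * heat - 0.05 * (temp - outside)  # the outer environment responds
        history.append(temp)
    tail = history[-100:]
    return sum(tail) / len(tail)


def bang_bang(temp: float, setpoint: float) -> float:
    """Inner system 1: heater is either fully on or fully off."""
    return 10.0 if temp < setpoint else 0.0


def proportional(temp: float, setpoint: float) -> float:
    """Inner system 2: heating power varies smoothly with the error.

    The 7.5 offset is the power that exactly balances heat loss at the
    default setpoint in this toy room (an assumption of the sketch).
    """
    return max(0.0, min(10.0, 7.5 + 2.0 * (setpoint - temp)))


def functionally_equivalent(a: Controller, b: Controller,
                            setpoint: float = 20.0, tol: float = 1.0) -> bool:
    """True if both controllers hold the room within `tol` of the setpoint."""
    return (abs(run(a, setpoint) - setpoint) < tol
            and abs(run(b, setpoint) - setpoint) < tol)
```

Calling `functionally_equivalent(bang_bang, proportional)` compares the two only at the level of the abstraction (mean temperature near the setpoint), even though `bang_bang(19.0, 20.0)` and `proportional(19.0, 20.0)` return different heating powers. Tightening `tol` is one crude way to probe where the analogy between the two systems starts to break.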

When such abstraction is possible, we can look at a class of behaviours and ignore a large part of their particularities while still being able to predict or understand said class of behaviours. Think about electricity: 

Electric energy that entered my house from the early atomic generating station at Shippingport did not "simulate" energy generated by means of a coal plant or a windmill. Maxwell's equations hold for both. The more willing we are to abstract from the detail of a set of phenomena, the easier it becomes to simulate the phenomena. Moreover, we do not have to know, or guess at, all the internal structure of the system but only that part of it that is crucial to the abstraction. (p. 16)

According to the view I am trying to defend here, intelligent behaviour - like electricity - is such a robust abstraction. And the fact that it is abstractable in this way makes the sort of cross-systems thinking that is fundamental to the Intelligence-in-the-wild research bet epistemically permissible and promising. 

Cross-system research on intelligent behaviour in the literature: The Levin Lab

An excellent example of how our perspective on the "intelligent behaviour" phenomenon cashes out in concrete research endeavours is the body of work pursued by Michael Levin and his lab.

I am mentioning his work here to show that this research program is not restricted to mere philosophizing, but ought to grow out of an engaged dialogue between empiricism, engineering and theory. Levin's work is most impressive in that regard.

The following extract captures the high-level framing and motivations for their research and shows their commitment to a substrate-independent understanding of the intelligence-phenomenon/a. 

“The fields of basal cognition, Buddhist philosophy, computer science, and cognitive science are all concerned with fundamental questions around intelligence. What is unique about certain configurations of matter that enable them to exhibit intelligent behavior? How do the kinds and degrees of intelligence differ across beings? What processes drive the expansion of intelligence on evolutionary time scales, and what causes changes in the intelligence of a being during its lifespan? How can we understand intelligence in a way that would enable us to create novel instances, as well as improve our own intelligence for life-positive outcomes for all?”

And from the same paper: 

One way to think about a general, substrate-independent definition of “Intelligence” is centered on goal-directed activity [11, 12]: what is common to all intelligent systems, regardless of their composition or origin, is the ability to display a degree of competency in reaching a goal (in some problem space) despite changing circumstances and novel perturbations. [...] Evolution enables the scaling of intelligence by exploiting biophysical mechanisms that enable progressively larger goal states (and thus progressively more complex causes of stress) to be represented and pursued.

[...] The emphasis on functional problem-solving, learning, and creative responses to challenges enables a focus on the central invariant of intelligence, not contingent facts and frozen accidents of the evolutionary journey of life on Earth. Given that intelligent behavior does not require traditional brains [8, 10], and can take place in many spaces besides the familiar 3D space of motile behavior (e.g., physiological, metabolic, anatomical, and other kinds of problem spaces), how can we develop rigorous formalisms for recognizing, designing, and relating to truly diverse intelligences? 

(From “Biology, Buddhism, and AI: Care as the Driver of Intelligence”, p. 3-4)
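To make the quoted goal-directed framing a little more concrete, here is a minimal sketch of one way "competency in reaching a goal despite perturbations" could be operationalised. This is my own toy construction, not a formalism from the paper: the one-dimensional state, the fixed perturbation schedule, and the names `competency`, `corrective` and `inert` are all hypothetical.

```python
from typing import Callable, Sequence

Policy = Callable[[float, float], float]  # (state, goal) -> change applied to state


def competency(policy: Policy, goal: float = 0.0,
               perturbations: Sequence[float] = (5.0, -8.0, 4.0),
               settle_steps: int = 50, tol: float = 0.1) -> float:
    """Fraction of perturbations after which the system is back within `tol` of the goal."""
    state = goal
    recovered = 0
    for kick in perturbations:
        state += kick                      # novel perturbation from the environment
        for _ in range(settle_steps):
            state += policy(state, goal)   # the system responds (or fails to)
        if abs(state - goal) <= tol:
            recovered += 1
    return recovered / len(perturbations)


def corrective(state: float, goal: float) -> float:
    """Moves halfway back towards the goal at every step."""
    return 0.5 * (goal - state)


def inert(state: float, goal: float) -> float:
    """Never reacts to its own state."""
    return 0.0
```

Under these assumptions, `competency(corrective)` evaluates to 1.0 and `competency(inert)` to 0.0. Note that the measure only looks at behaviour relative to a goal in some problem space; it says nothing about what the policy is made of, which is the substrate-independent flavour of the definition quoted above.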

The group does far more than just philosophize about intelligence, however. Their work is validated by concrete and tangible progress in synthetic biology, biorobotics and biomedicine. While it is outside of the scope of this post to give a comprehensive summary of their research program, I will drop a handful of examples here, and - hopefully - that will generate interest for people to explore Levin's work further. 

At a high level, Levin is interested in "how evolution uses multiscale competency architecture to evolve bodies [...] that solve a wide class of problems, including novel ones". The group does research on the dynamics of information processing in biological structures, understanding information processing temporally (e.g. memory, learning) as well as spatially (e.g. bioelectric control of tissue/organism morphology), and as implementable both neurologically and somatically.

Some concrete snapshots:

Most impressive, maybe, is how Levin's thinking and work span empirical science, engineering, bold theorizing and even visionary ideas about future forms of life and intelligent agents (see e.g. his closing words of this talk). His 'Technological Approach to Mind Everywhere' (Levin, 2022) framework "seeks to establish a way to recognize, study, and compare truly diverse intelligences in the space of possible agents". Citing further: "The goal of this project is to identify deep invariants between cognitive systems of very different types of agents, and abstract away from inessential features such as composition or origin, which were sufficient heuristics with which to recognize agency in prior decades but will surely be insufficient in the future."

(Throughout this section, I mostly linked to popular outlets of this research. See here for an overview of the underlying academic publications.)

Summary

The epistemic bet of learning from intelligence-in-the-wild is premised on the assumption that there exist cross-system (-scale/-substrate) principles of intelligent behaviour, which we can learn about by looking at currently-existing systems that exhibit intelligent behaviour.

Acknowledgements

I thank TJ/particlemania for discussing and improving many of the ideas in this post; I also thank Adam Shimim, Igor Krawczuk, Eddie Jean, and Justis for useful critiques and comments on earlier drafts.

 

  1. ^

    When trying to understand how intelligent behaviour functions, it is important to have in mind a working definition of intelligence. Arguing for such a working definition is not the aim of the present post. That said, good starting points for thinking about this are, for example: Chollet (2019) "On the Measure of Intelligence", or Legg, Hutter (2005) "A Collection of Definitions of Intelligence".

  2. ^

    Other phenomena or behaviours (other than "learning behaviour") we might be interested in include things like: agency, adaptation, corrigibility, modularity, (the emergence of) mesa-optimisation, (the emergence of) pathological behaviours, robustness, self-improvement, (the emergence and the dynamics of) uni- vs multi-agency, etc.

    Note: I am not saying "learning = intelligence". 

  3. ^

    To clarify this point upfront: I am not bullish on this table being “the correct” representation of the analogy it is trying to evoke, or that the analogy is entirely valid in the first place. With respect to the former, exploring (and disagreeing about) different ways to draw the analogy is exactly what this entire research exercise is about; and with regard to the latter, for now, let me just say that analogies don’t need to be identities in order to be interesting or productive. In the context of this post, I am only using it for illustrative purposes, not to make particular object-level claims.

  4. ^

    Term coined by philosopher of biology Peter Godfrey-Smith to describe the entity which Darwinian evolution acts upon. “The central concept used is that of a ‘Darwinian population’, a collection of things with the capacity to undergo change by natural selection.” (Godfrey-Smith, Peter. Darwinian populations and natural selection. Oxford University Press, 2009.)

  5. ^

    I am going here for a Popperian view on scientific progress. While I think this is a reasonable (and interesting) choice, there exist different (plausible) proposals aiming to explain the logic of scientific progress. I won’t further justify my choice, as the goal of this post is not to argue for which theory of scientific progress is correct, and I am invoking it only for illustrative purposes. 

  6. ^

    The claim that markets don't currently fall entirely prey to Goodhart's law might not seem obvious to some readers. A reason for thinking this is that current markets do, in fact, provide real value to people. (That does not preclude that they also, in many places and ways, fail to provide value to humans where they should, e.g. marketing (one may claim).) In "Ascended Economy?", Scott Alexander explores a hypothetical scenario that I would characterise as one where markets fall completely prey to Goodhart's law; it's a scenario where feedback loops that increase profit get fully decoupled from feedback loops that promote human values/increase human welfare. For example, imagine AI-governed firm A, which manufactures and sells machines to mine iron, and AI-governed firm B, which mines and sells iron. A buys iron from B in order to manufacture more iron-mining machines; and B buys iron-mining machines in order to mine and sell more iron. Together, they form a fully-decoupled economic "bubble" that does really well in terms of increasing profits, but has ceased to (be incentivised to) produce anything of value for human beings. This A-B economic "bubble" doesn't need humans, or need to pay attention to human concerns or interests, in order to increase its profits - the metric the firms are optimizing for. It faces what one might describe as a "value grounding problem". Insofar as current markets do still find some grounding in human values/wellbeing, those markets are "not fully falling prey to Goodhart's law".

  7. ^

    Although more or less vigilance is needed depending on the shape of the problem. 
