Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

[Metadata: crossposted from First completed 26 June 2022. I'm likely to not respond to comments promptly.]

A high-level confusion that I have, and that seems to be on the way towards understanding alignment, is the relationship between values and understanding. This essay gestures at the idea of structure in general (mainly by listing examples).

Why do we want AGI at all?

We want AGI in order to understand stuff that we haven't yet understood.

(This is not a trivial claim. It might be false. It could be that to secure the future of humane existence, something other than understanding is necessary or sufficient; e.g. it's conceivable that solving some large combinatorial problem, akin to playing Go well or designing a protein by raw search with an explicit criterion, would end the acute risk period. But I don't know how to point at such a thing--plans I know how to point at seem to centrally involve understanding that we don't already have.)

Elements and structure

Understanding implies some kind of structure. (This is a trivial claim, or a definition: structure is what a mind is or participates in, when it understands.) Structure is made of elements. "Structure" is the mass noun of, or continuous substance version of, "element". The point of the word "element" is just to abbreviate "any of that pattern-y, structure-y stuff, in a mind or in the world in general".

Elements. An element (of a mind) is anything that combines to constitute the mind, at any level of organization or description.

  • Examples of elements. Any instance within a mind of any of the following categories is an element: features, aspects, properties, parts, components, subagents, pieces, inputs, algorithms, code, processes, concepts, ideas, skills, methods, procedures, values, goals, architecture, modules, thoughts, propositions, beliefs, probabilities, principles, rules, axioms, heuristics, plans, operations, connections, associations, metaphors, abstractions, memories, arguments, reasons, purposes, modes, emotions, tendencies, organs, ingredients, functions, dynamics, structures, data, types, languages, proofs, justifications, motives, images, searches, knowledge, computations, rewards, reinforcement, specifications, information, intuitions, ideologies, protocols, stimuli, responses, domains, gradients, objective functions, optimizers, satisficers, control systems, basins of attraction, tasks, attitudes, stances, dispositions, words, terms, definitions, nexi, drives, perceptions, grammar, criteria, possibilities, combinations, categories, inferences, actions.

  • How elements are. Mental elements overlap, crisscross, lie on spectra, control, use, associate with, repel, and super- or sub-vene each other; are created, grown, modified, duplicated, refactored, deleted; participate together in doing tasks; and generally do basically anything that happens in a mind.

  • "Element" is not really a type. Example: "doing task X" is an element, and then the above point doesn't really make much sense; does "doing task X" participate in doing a task, or what? Example: some elements are supposed to be "ideal" (a priori, objective, real by themselves), such as propositions (distinct from representations/instances/applications of propositions), and then it doesn't make sense to say that the element was created. So statements about elements are duck-typed, so to speak; when applied to specific elements, they may or may not make sense.

  • Cosmos. The cosmos is the totality of all elements, mental or not.

Novelty, creativity

Acquiring elements

A mind's internal process of creating elements is creativity. The result of creativity is novelty: elements that are new to the mind.

Novelty can be encountered or acquired in ways other than creativity. Other minds are a source of novelty, encountered and potentially acquired but not created. Learning is a clear example of acquiring novelty, and most learning is only somewhat creative, being heavily alloyed with copying from another mind. Learning to do something on your own is creativity. Creativity is the "creative edge" or "creative froth" of thought; search, trying things out, program search, combinatorial thinking, tweaking ideas. Evolution and automated proof search are creative non-minds: they create novel structures, without having the context in which those structures are fully themselves. An example of encountering novelty without acquiring it is if a superintelligent AGI kills you by understanding stuff that you don't understand, or if you see a car with an internal combustion engine go fast without knowing about PV=nRT and gears (even if you've already seen cars before; novelty is perennially novel until it's acquired).

Elements can be acquired by:

  • thinking

  • invention / design

  • discovery

  • learning

  • abduction

  • induction

  • deduction

  • inference

  • metaphor (mapping between elements of elements)

  • emergence (from some dynamical system of elements, e.g. gradient descent, or e.g. the ambient pressures placed on elements by demands to be useful in their context)

  • copying internally (copying code, or more loosely, distillation)

  • copying from another mind (e.g. reading an idea or imitating a behavior, or copying code)

  • "direct impressions" (as in, getting a picture of an object by looking at the object)

  • mutation, trial and error, ratcheting towards solutions by component-wise improvements

  • search; Ariadne's thread, exhaustive search by synthesizing all possibilities and eliminating failures (Wiki)

  • combining elements
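One route from the list above, mutation with trial and error and component-wise ratcheting, can be sketched as a toy hill-climber. This is a minimal sketch; the task, names, and parameters are all illustrative, not from the post:

```python
import random

def ratchet(candidate, score, mutate, steps=5000, seed=0):
    """Acquire a better element by trial and error: keep a mutation
    only when it improves the score (component-wise ratcheting)."""
    rng = random.Random(seed)
    best, best_score = candidate, score(candidate)
    for _ in range(steps):
        trial = mutate(best, rng)
        trial_score = score(trial)
        if trial_score > best_score:   # ratchet: never go backwards
            best, best_score = trial, trial_score
    return best

# Toy task: recover a target string by mutating one component at a time.
TARGET = "structure"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def score(s):
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rng):
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]

print(ratchet("aaaaaaaaa", score, mutate))
```

Note that the ratchet only ever accepts component-wise improvements, so it acquires the target without ever representing it; this is the sense in which evolution-like processes create novel structure without the context in which that structure is fully itself.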

Peirce's abduction

Charles Sanders Peirce described three kinds of inference:

  • Abduction: the creation of hypotheses (theories, ideas, concepts).

  • Deduction: making explicit the consequences of hypotheses, e.g. their predictions or logical implications; making explicit the analytic content of concepts; world-building.

  • Induction: selection between hypotheses, e.g. by comparing deductions from those hypotheses against data and eliminating ones that don't fit.
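The three kinds of inference can be put in a toy loop (the example domain and all function names here are mine, not Peirce's): abduce a pool of candidate hypotheses, deduce each one's predictions, and induce by eliminating the hypotheses whose predictions contradict the data.

```python
# Toy Peircean loop: abduce hypotheses, deduce predictions, induce by selection.

def abduce():
    """Abduction: propose hypotheses 'y = a*x + b' (here by brute
    enumeration over a small range; real abduction is more creative)."""
    return [(a, b) for a in range(-5, 6) for b in range(-5, 6)]

def deduce(hypothesis, x):
    """Deduction: make a hypothesis's prediction explicit."""
    a, b = hypothesis
    return a * x + b

def induce(hypotheses, data):
    """Induction: keep only hypotheses whose deductions fit the data."""
    return [h for h in hypotheses if all(deduce(h, x) == y for x, y in data)]

data = [(0, 1), (1, 3), (2, 5)]   # generated by y = 2*x + 1
print(induce(abduce(), data))     # → [(2, 1)]
```

Brute enumeration is of course a degenerate stand-in for abduction; it sidesteps exactly the creative part, which is why abduction is the hardest of the three to mechanize.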

All three of these kinds of inference involve novelty. They are interwoven with each other. For example:

  • Abduction usually involves a pre-existing store of concepts, and those concepts are there because they've been selected to be somewhat deductively and inductively valid.
  • Deduction is guided by language, and that language had to be abducted previously.
  • A large swath of abduction can in some sense in principle be reduced to the idea of universal computation plus inference; see e.g. Solomonoff induction and universal Garrabrant induction.

But overall, abduction is the most creative form of inference: abductive reasoning always involves self-generated novelty, and if all the elements generated by abduction fail to be novel to the reasoner, then it was a failed abduction.

We could add non-linguistic elements to Peirce's scheme of inference:

  • hypothesis, concept ⟶ disposition to act, element

  • truth of proposition, descriptiveness of concept, prediction ⟶ (respectively:) success of disposition to act, usefulness of element, recommendation to act

  • abduction ⟶ creating a disposition to act, creating an element

  • deduction ⟶ exercising / applying a disposition to act

  • induction ⟶ selecting between elements or dispositions to act, e.g. by comparing outcomes from applying them to guide action
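The generalized scheme can be sketched as a bandit-style loop (a loose, illustrative analogy, not a claim about minds): dispositions to act are arm choices, applying a disposition is the analogue of deduction, and selecting dispositions by their outcomes is the analogue of induction.

```python
import random

def bandit_loop(payoffs, steps=2000, eps=0.1, seed=0):
    """Select among dispositions to act (arms) by the outcomes of
    applying them; occasionally try a fresh disposition (exploration)."""
    rng = random.Random(seed)
    arms = list(payoffs)
    totals = {a: 0.0 for a in arms}   # accumulated reward per disposition
    counts = {a: 0 for a in arms}

    def avg(a, default):
        return totals[a] / counts[a] if counts[a] else default

    for _ in range(steps):
        if rng.random() < eps:                      # try something new
            arm = rng.choice(arms)
        else:                                       # select the best so far
            arm = max(arms, key=lambda a: avg(a, float("inf")))
        # "Deduction": apply the disposition and observe its success.
        reward = 1.0 if rng.random() < payoffs[arm] else 0.0
        totals[arm] += reward
        counts[arm] += 1
    return max(arms, key=lambda a: avg(a, 0.0))

print(bandit_loop({"left": 0.2, "right": 0.8}))
```

The point of the analogy is that nothing linguistic appears anywhere: dispositions are selected by success of action rather than by truth of propositions.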

Measuring structuredness

This section lists some theories that sift out the essence of some kinds of structure and compare structure with structure. This isn't trying to be comprehensive or to demarcate anything; it's a collection intended to gesture at what structure is by describing some of the gross contours of the universe of structure. For some coordinates of structure, see this list of directions in the space of concepts.

Examples part 1: Compressibility (prediction, surprise)

Theme: structuredness correlates with locating or being a small target in a large space.

  • Information, optimization power. How much something cuts down a search space. Cf. Eliezer's notion of "lawful creativity" (LessWrong)

  • Compression, retrodiction. How much something encodes stuff compactly.

  • Algorithmic complexity, sophistication. How incompressible / hard to describe something is (without just being random, in the case of sophistication).

  • Surprise, anti-inductivity. How much something is unpredicted; how much it becomes more unpredicted / unpredictable because it's being predicted.
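A crude runnable proxy for the compressibility theme, with zlib standing in for true algorithmic complexity (which is uncomputable); the strings are arbitrary examples:

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a computable (and crude)
    stand-in for algorithmic complexity."""
    return len(zlib.compress(data, 9))

patterned = b"abcabcabc" * 1000                         # highly structured
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(9000))  # same length, no pattern

# The patterned string compresses to a tiny fraction of its length;
# the pseudorandom noise barely compresses at all.
print(compressed_size(patterned), compressed_size(noise))
```

The patterned string is a small target in the space of 9000-byte strings, which is exactly what its short description length registers.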

Examples part 2: Definability (computational strength, quantifier complexity, expressive strength)

Theme: structuredness correlates with being able to describe / point at / compute / subsume many things.

  • Language theory, automata theory. Regular expressions == finite automata; context-free grammars == pushdown automata; context-sensitive grammars == linear bounded automata. (Wiki)

  • Computational complexity theory, descriptive complexity theory. If you can solve computational problem X, does that imply that you can solve problem Y? Computational reducibility (with some resource constraints), complexity classes. Some complexity classes are equivalent to the expressiveness of certain kinds of logical formulae: (Wiki). E.g. the polynomial hierarchy levels Σ^p_k / Π^p_k are equivalent to structures defined by Σ^1_k / Π^1_k formulae in second-order logic.

  • Recursion theory, arithmetical hierarchy. If you are given a solution to (uncomputable) problem X, can you use it to solve problem Y? Turing reducibility, Turing degrees (and other reductions and degrees). Given the n-times-iterated Turing jump 0^(n), its relatively recursive / recursively enumerable / co-recursively enumerable sets correspond to sets definable by Δ^0_{n+1} / Σ^0_{n+1} / Π^0_{n+1} formulae in first-order arithmetic (Post's theorem). (Wiki) The first ω levels of the Borel hierarchy correspond, by dropping computability, to the arithmetical hierarchy. (Wiki)

[Entering Higher Recursion Theory Zone, which I don't understand so well]

  • Hyperarithmetical hierarchy, Borel sets. Extends the Turing jump to computable ordinals, extending the arithmetical hierarchy up to ω_1^CK (the Church–Kleene ordinal). (Also, defines the hyperjump of a set X as the sum of all α-iterated Turing jumps of X for X-computable ordinals α; hyperarithmetical reducibility, hyperdegrees.) Analogous to the Borel hierarchy, allowing arbitrary countable trees of unions and intersections of open sets. (Wiki) (Wiki) (This maybe corresponds to some infinitary logic; Δ^1_1 is maybe related to but not isomorphic with Borel sets (Link).)

  • Analytic hierarchy, projective hierarchy. Sets definable by Σ^1_n / Π^1_n / Δ^1_n formulae of second-order arithmetic. Δ^1_1 is equal to the hyperarithmetical hierarchy. Corresponds to the projective hierarchy of sets. (Wiki) (Wiki) I don't understand this stuff but there might be computational interpretations of this, e.g. in terms of games (Link) or in terms of what sounds like a kind of "uniform relative -ness" (Libgen) or maybe here (Libgen). See also infinite time Turing machines.

  • Much higher recursion theory. Apparently recursion theory can be generalized somehow to larger ordinals, and this is maybe related to constructible sets somehow, and maybe related to the analytic hierarchy? IDK. (Wiki) (Wiki)

  • Constructibility. Gödel's L. A set theoretic universe constructed in stages consisting of sets that can be defined in reference to the previous stages. Ranks structures in terms of how deep their construction is. (Wiki)

  • Logical interpretability. Roughly, given two theories T and S, is it the case that there's a set of formulas that, given any model of T, define a model of S as a set of tuples of elements of the model (providing interpretations for symbols of S)? (Link)
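The first bullet in this list (regular expressions == finite automata) can be made concrete with a hand-built DFA accepting the same language as the regex `(ab)*`; the state encoding below is an arbitrary choice:

```python
import re

# DFA for the regular language (ab)*: state 0 is the sole accepting state;
# missing transitions go to an implicit dead state (None).
DFA = {
    (0, "a"): 1, (0, "b"): None,
    (1, "a"): None, (1, "b"): 0,
}

def dfa_accepts(s: str) -> bool:
    state = 0
    for ch in s:
        state = DFA.get((state, ch))
        if state is None:              # dead state: reject immediately
            return False
    return state == 0

# Check the DFA against Python's regex engine on a few strings.
pattern = re.compile(r"(ab)*\Z")
for s in ["", "ab", "abab", "aba", "ba", "abb"]:
    assert dfa_accepts(s) == bool(pattern.match(s))
print("DFA and regex agree")
```

The equivalence is the interesting part: each formalism measures the same level of structuredness, one by what it can describe, the other by what it can compute.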

Examples part 3: Provability (logical strength)

Theme: structuredness correlates with deductively implying many things (while being consistent).

  • Proof theoretic ordinal analysis. Given theory T, how complex is it to perform induction on the structure of proofs in T to show that T-proofs can reduce to elementary proofs with weaker assumptions, and therefore T is consistent if the weaker assumptions are true? For what ordinal α is the well-foundedness of α necessary or sufficient to show that T is consistent, given only a weak metatheory? (Wiki)

  • Reverse mathematics. Continuous with set theory work, but in weaker systems and concerning more everyday mathematics: Which theorems imply which basic axioms? And from that, which theorems imply other (not obviously related) theorems? (Wiki)

  • Consistency and soundness strength. Which theories prove which other theories consistent or sound? For strong theorems / hypotheses, which axioms are necessary and sufficient to prove them? E.g., (Wiki) and comparisons with strong combinatorial principles.

Remarks on examples of structure

The examples above are somewhat arranged in order of complexity of the structure they describe. Complexity is correlated with "depth", but is not the same; simple things are often "deep", and things that are complex in some sense can be "shallow".

The above list is heavily biased towards things that I'm aware of, things that have some interesting developed theory, and things that fit into hierarchies and uniform comparisons. What other measurements or notions of structure-in-general are there? There are notions of simulation, e.g. "bisimulation", but I'm not aware of very interesting general results there. There's the informal notion of "deep" mathematics, or "deep" insights in general, which have the flavor of retrodiction and the flavor of being generally useful and implying many other things. See Penelope Maddy's work.

It's not necessarily interesting to try specifically to "measure structure", but speaking vaguely, I would like to know how different kinds or dimensions of structure relate. E.g., when someone learns a skill, in what senses are they accessing / using / participating in / creating propositions? (More concretely, what other skills must they be enabling themselves to also learn easily?) Algorithmic complexity theory and computability/definability theory touch on the complexity of "concepts" in some sense, but there's a lot left to ask about; when / how / in what senses does a mind come to understand something, and how can you tell, and what does that imply about what the mind can, can't, will, or won't do?


To interface with a mind, we have to understand what it understands. Understanding is some kind of structure. Minds are made of elements. Structure is elements. Structure that's new to a mind is novelty. Creativity is the process of generating novelty. Structuredness correlates with compression, expression, and implication.


4 comments

Alright, fair warning, this is an out there kind of comment. But I think there's some kind of there there, so I'll make it anyway.

Although I don't have much of anything new to say about it lately, I spent several years really diving into developmental psychology, and my take on most of it is that it's an attempt to map changes in the order of complexity of the structure thoughts can take on. I view the stages of human psychological development as building up the mental infrastructure to be able to hold up to three levels of fully-formed structure (yes, this is kind of handwavy about what a fully-formed structure is) in your mind simultaneously without effort (i.e. your System 1 can do this). My most recent post exploring this idea in detail is here.

This fact about how humans think and develop seems an important puzzle piece in understanding how, among other things, we address your questions around understanding what other minds understand.

For example, as people move through different phases of psychological development, one of the key skills they gain is better cognitive empathy. I think this comes from being able to hold more complex structures in their mind and thus be able to model other minds more richly. An interesting question I don't know the answer to is if you get more cognitive empathy past the end of where human psychological development seems to stop. Like, if an AI could hold 4 or 5 levels simultaneously instead of just 3, would it understand more than us, or just be faster? I might compare it to a stack-based computer. A 3-register stack is sufficient to run arbitrary computations, but if you've ever used an RPN calculator you know that having 4 or more registers sure makes life easier even if you know you could always do it with just 3.

I don't know that I really have a lot of answers here, but hopefully these are somewhat useful puzzle pieces you can work on fitting together with other things you're looking at.

An interesting question I don't know the answer to is if you get more cognitive empathy past the end of where human psychological development seems to stop.

Why isn't the answer obviously "yes"? What would it look like for this not to be the case? (I'm generally somewhat skeptical of descriptions like "just faster" if the faster is like multiple orders of magnitude and sure seems to result from new ideas rather than just a bigger computer.)

So there's different notions of more here.

There's more in the sense I'm thinking of, in that it's not clear additional levels of abstraction enable deeper understanding given enough time. If 3 really is all the levels you need, because that's how many it takes to think about any number of levels of depth (again by swapping out levels in your "abstraction registers"), additional levels end up being in the same category.

And then there's more like doing things faster which makes things cheaper. I'm more skeptical of scaling than you are perhaps. I do agree that many things become cheap at scale that are too expensive to do otherwise, and that does produce a real difference.

I'm doubtful in my comment of the former kind of more. The latter type seems quite likely.

I think the claim at the start doesn't nearly cover the reasons we want AGI. As I see it, the main reason we want AGI is that there's a lot of stuff that we can already do, but we want it done faster and more cheaply without sacrificing much flexibility or reliability.

The trouble is that we've picked most of the low-hanging fruit and many of those tasks that remain need something approximating human intelligence. They sometimes also need ability to work with human social and legal contexts, and to be fine-tuned without an (expensive!) army of programmers specifying every rule.

There's some possibility that creating an entity that thinks like a human may tie into our drive to reproduce.

It's also just an extremely interesting problem from a technical point of view.

But sure, structure and understanding do appear to be major factors in intelligence of a human-like nature and it's interesting to try to classify and define such things.
