Ah man, I've been missing these sorts of posts from you, very happy to read this, super cool as usual.
These are some questions that arise for me, maybe not the most well thought out, but hopefully somewhat interesting:
Do you think this is a good argument for multi-scale modularity in biology? Also thoughts on multi-agent models of mind with this model in the background? Finally any thoughts on applying this to whether we will have a group of AI systems doing RSI or a singular one?
I suppose this argument depends on when systems with convex utility functions show up. Do you have any more detailed thoughts on when we can expect convex utility trade-offs to show up?
Is the answer something like when there are benefits to specialization? I guess there also has to be the aspect of trade already present in the system? Do you have thoughts on the assumptions that have to hold before the convergence arises, or do you think this is a general property of any learning system?
Do you think this is a good argument for multi-scale modularity in biology?
I think the multi-scale aspect is mostly orthogonal to this argument, but yeah the argument could be applied at multiple scales.
Also thoughts on multi-agent models of mind with this model in the background? Finally any thoughts on applying this to whether we will have a group of AI systems doing RSI or a singular one?
Typically, the load-bearing aspect of multi-agent models is that the multiple agents have different utility functions. This argument is compatible with multiple utility functions in principle, but that's not really what it's about; it would apply arguably-more-cleanly in systems with just one optimization objective (but multiple subgoals, e.g. multiple resources like apples and bananas).
I guess there also has to be the aspect of trade already present in the system?
Nope.
(The other questions I don't yet have much to say about.)
Right, so I was trying to make sense of the no-trade thing: I thought the individual subcomponents would die out if they didn't have access to trade, but that isn't the case if they're subordinated to the larger system. Also, similar to the multi-agent point, it's not about multi-scale modularity but rather about optimisation within a specific optimisation system, or whatever you'd like to call it. (Random quick picture):


Part of what makes a pencil a good object is that all its parts share approximately the same rotational velocity - i.e. it's a rigid body object. Part of what makes a squirrel a good object is that its parts share approximately the same genome. Part of what makes the water in a cup a good object is that its parts share approximately the same chemical composition - i.e. it mixes quickly.
Pencil and water: agree (though for water, much of what I'd point to is "how it will interact with other stuff").
Squirrel: no! Unless you are doing biology on the squirrel, I don't think the genome is mostly what matters.
I think the two big things are "rigid body" and "this rigid body is controlled by an agenty-thing" (read: "this is the body of an animal"). The thing I see is that squirrels, cats, and birds all do things like scan the environment for food, move their muscles to get the food, regulate their internal temperature, turn the food into energy, etc.
If it turned out that squirrels were some weird symbiote, like if the legs were actually a different species, then for the most part this only matters in so far as these 'subparts' affect the squirrel. Like, they 'factor through' what the squirrel does. I think this is pretty general, even beyond agenty things like animals.
I buy the general point, though.
Important example: It seems like much of how I split up the 'subparts' of objects around me is by the material they are made out of - for example, a soft chair cushion has different material properties from the wooden body.
When I look at objects, the different colors of these materials seem to be doing a lot of the work.
Next to me is a chair, whose cushion has a multicolored pattern on it, while the body is solid green and the headrest is grey. The whole thing is sitting on a metal base, with a metal rod connecting it to the base.
This is also how I split the chair up. I also split up the mosaic pattern of the cushion into parts like "this blue circle" or "that smaller white circle", but not "this undistinguished patch of the blue circle".
If I imagine the body and the headrest having the mosaic pattern of the cushion, I feel like I get a little confused about where the parts start and end in my imaginary chair.
The only part that doesn't fit is the way I distinguish the rod from the base - it seems like shape, or maybe "adjacency" (the rod separates the chair from the base) are doing the rest of the work? But it's pretty promising that color alone gets you most of the way there.
I had exactly the same thought. I didn't write the same comment, though, because:
1) He did say the genome is part of what makes a squirrel a good object, and
2) The genome is a very natural dividing line between "part of this squirrel" and "not part of this squirrel," and natural dividing lines are useful!
A "good object" is in the eye of the beholder - indeed, you spend most of your comment beholding and talking about how you and your brain chunk up the world around you, and that all makes sense. But "in the perception of a particular human" is not the only way to judge what a good object is.
I submit that "same genome" often coincides with the natural object boundary, but isn't usually a good criterion for the boundary. The common genome is not a significant part of what makes a squirrel a good object.
I think the thing we usually care about is something more like "it acts like a single agent" or "the parts are approximately aligned towards the same goal". (Notice these criteria suggest that when an organization acts in concert, we care more about the boundaries of the organization than the boundaries of the individuals within it; I think I endorse that implication.)
Interestingly, according to Wikipedia there doesn't seem to be an accepted scientific definition of "organism".
Well, I was taking for granted:
1: The natural abstraction thesis: a substantial part of what makes "good objects" doesn't depend on the beholder (or at least a large class of beholders will agree on a large class of "good objects").
2: If something feels like a relatively basic part of my perception, then it hints at fundamental evolved algorithms, which are then likely to point at "good objects". As an example, your eye cells have built-in edge detectors that work a lot like the image processing algorithms humanity invented. I expect many nonhuman animals' brains to categorize animals and edges similarly (e.g. there are center-surround retinal ganglion cells in mice).
A central (get it?) example for me is a math thing: the definition of the boundary/interior/exterior of a topological set. I have some intuitions, built from my life of interacting with the world and from evolution, about what "boundary" should mean. Given some prompting about open sets, I was able to basically come up with the 'official' definition. I suspect that if/when alien civilizations exist, they'll have a very similar concept.
3: I agree there's an important sense in which the genome matters. But for example: imagine a world where (non-human) animals were symbiotes, that only join/separate rarely, usually out of sight, and that have complicated genetic inheritance patterns; but otherwise usually looked and acted like our world's animals.
I predict that in that world, humans and other animals would partition life into objects in a very similar way. Later, humans would realize the symbiote thing, and would perhaps add words for it; but it would have to be something kids were taught by their parents, or only learned when they were helping around the farm at 10 years old.
but with concave frontiers/utilities the agents' prices tend to diverge
Typo: "concave" → "convex"
Edit: This started as a focused reflection on John's post here, but then morphed into thoughts on AI. I split the two sections just to impose a bit of structure on this comment. But this comment should be interpreted more as me getting my own thoughts together than as a confident argument, and some of it is highly contestable or may just not be all that relevant or interesting.
On Natural Ontologies
If many methods for inferring structure in high-dimensional data identify similar structures, then we may come to treat those structures as more meaningful, or as "good objects." Incentives lead us to run this process over specific subsets of the space of available data and methods, producing the set of ontologies we empirically observe. We can see a practical domain, such as a research field, job or cultural practice, as a particular way of limiting the available data and methods in order to produce a distinct distribution of discoverable ontologies. Incentives lead to the existence of the practical domains and the way they explore their distinctive ontology distributions.
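A minimal toy illustration of that first point (hypothetical data and standard scikit-learn methods, just to make the "many methods, similar structures" idea concrete): run a few different clustering methods on the same dataset and check how much their discovered structure agrees.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# Hypothetical high-dimensional data with some genuine cluster structure.
X, _ = make_blobs(n_samples=600, centers=4, n_features=20, random_state=0)

labelings = {
    "k-means": KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X),
    "agglomerative": AgglomerativeClustering(n_clusters=4).fit_predict(X),
    "gaussian mixture": GaussianMixture(n_components=4, random_state=0).fit_predict(X),
}

# High pairwise agreement (adjusted Rand index near 1) is the "many methods identify
# similar structures" signal; near 0 would mean the structure is method-dependent.
names = list(labelings)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        ari = adjusted_rand_score(labelings[names[i]], labelings[names[j]])
        print(f"{names[i]} vs {names[j]}: ARI = {ari:.2f}")
```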
Your concept of a natural ontology suggests that if we consider the space of ontology distributions, these distributions themselves have a common factor, which is that the discovered high-dimensional ontologies can be well explained through a set of latent factors of much lower dimension. In other words, humans engaged in a specific activity, be it hunting, scientific research, or painting, seem to converge on a small region of ontology-space that they use to model that activity.
Confirmation bias and incentives are alternative explanations for ontological convergence. We reify a novel object only if the method that generated the ontology containing that object also generated known objects, producing a structural risk of confirmation bias. I work in single-cell epigenetics, and if a dataset and clustering method fail to recapitulate the branched lineages we expect, then we assume that the dataset or the clustering method was flawed.
Incentives restrict both the way activities are defined in terms of the allowed data and methods, and the types of data and methods accessible in practice. Many experiments are never performed because they are unethical or impractical, even if in principle they would be valid ontology-generators for a certain field. Many ways of defining an acceptable set of data and methods are underexplored because the ontologies they generate are unrewarding. Incentives also control the extent to which confirmation bias limits the experiments we run and the ontologies we generate.
AI Reflections
Insofar as humans and AI engage in common activities, like resource acquisition, we are in principle exploring the same ontological distribution. However, the differences in our incentives and priors may lead to different data and methods used to sample ontologies. There is a risk that we can only converge on a common paradigm by essentially just trusting one another's reports on output ontology without being able to understand their methods for producing it. This might be a bit like two professionals finding it practical to trust each other's claims without direct evidence, enabling them to work within a shared world model. This depends on them having an incentive to work together, which clearly is not guaranteed in any domain of activity. Alternatively, it might be possible to guide one another toward convergent ontologies by showing a sequence of data and evidence that identifies and explains the structures found by the original ontological paradigm while identifying newer, more useful and consistent ones.
Overall, I don't think it's a given that just because humans and AI face non-identical incentives, we are guaranteed to arrive at a certain scale of ontological divergence, either empirical or moral. We still exist in the same reality and participate in some of the same activities. Insofar as we have similar incentives and participate in similar activities, we should be able in principle to arrive at similar ontologies. An AI that engages in the activity of moral reasoning should be exploring the same ontological space as humans, though because it uses distinct data and methods for doing so, it may explore a different portion of that space and may not arrive at the same ontologies, or moral conclusions, as we do. That would not necessarily make the AI wrong and humans right, however. If AI lands on a distinct moral ontology from my own, I would want to know how it got there, and figure out what forms of moral reflection explain the difference and whether there are forms of moral reflection neither humans nor the AI have tried that might bridge the gap.
I think that this also gives me some fundamental hope that the worst fears about AI will prove inaccurate. In this perspective, there's no fundamental difference between empirical and moral truth. There is only a set of activities we can engage in, a set of ontologies that various ways of engaging in those activities can produce, and a shaping set of incentives that governs those activities.
And crucially, the incentives humans or AI are under are exogenous. While we would typically resist radical, immediate, forced changes to our current perspectives, we consider it virtuous and wise to accept some flexibility and unforced change over time. I can absolutely picture an agentic, self-modifying AI being willing to explore modifying its own reward function, weights or training dataset in the name of curiosity. If we accept that AI may view itself as "more than just its current architecture," then while we might still be worried about its superior intelligence and its ability and potential to develop a desire to harm us, we needn't view that as an inevitability, just an empirical possibility.
This also suggests that the current paradigm for practical AI safety is perhaps going to suffice for the foreseeable future. We need to be able to monitor whether the AI, in practice, is willing and able to take actions to escape control, including defending itself from human intervention or self-modifying in ways that produce harmful behavior. But we do not need to be able to explicitly define a comprehensive set of acceptable and unacceptable behaviors and a way of guaranteeing AI compliance with them. We just need to establish a set of methods for building AI that have the statistical regularity of producing AI that keeps its harmful behaviors within acceptable limits, then stick to that set of methods, expanding them only gradually as we come to understand the effect that small variations on the current set of AI production methods have on the behaviors of the resulting AI. Our risk tolerance would then govern the rate at which we permit novel variations on AI production.
The higher-order problem is that this approach to AI safety depends on regulatory compliance with a conservative, gradual exploration of AI production methods. If bad actors refuse to comply, producing and letting loose harmful, uncontrollable AI, then we are screwed. An AI race to "stay ahead of China" only makes sense if we think that our powerful AI, or our powerful AI-equipped government, is what's repressing the possibility of a bad Chinese AI or AI-equipped Chinese government causing uncontrollable harm. This is the same logic that says that we should aggressively build not only ever more advanced nuclear weapons, but accept the fact that we have only a limited ability to deploy them strategically in ways that succeed in preventing bad outcomes, because the risks we face by refusing to build nukes in the first place are worse.
I think this is how some of the pro-AI-race/anti-China people think about this. They accept that AI might have bad outcomes, the way a chaotic military equipped with powerful bombs might have bad outcomes, but they believe in the ability of AI to be tuned in practice through gradual exploration of production methods to limit the risk of uncontrollable harm. They also believe that this continued development of AI introduces less risk than a lapse of American AI supremacy.
I think that in general, our current operating theory is that yes, dangerous technology can be engineered and deployed in such a way as to shift the risk/reward calculus in favor of development, while restricting the ability of competitors to access or control it. But at the same time, we also recognize that this is wasteful and that there are ways of developing technology that are too risky. In those situations, we attempt to collaborate with competitor states in order to reduce the need to continue development, implementing arms control treaties for example. The question at any given time is whether further development poses so many manifest risks to both sides that collaboration becomes both desired and possible, or whether the risk is asymmetric and large enough that one side or the other is motivated to defect and continue development.
I think that lasting collaboration on AI will result when the US and China mutually recognize that continued development of AI poses more risks than benefits to both of their respective state objectives. The US and China need to be as afraid of their own models as they are of each other's. Then they can trust each other to stop/pause/slow development. Right now, the US government appears to be enthralled with its own AI, and somewhat contemptuous of China's AI capabilities. I have no idea how China feels about US AI. But I would expect AI brinkmanship to continue until the military and politicians are all more scared of AI than they are of each other. Work on technical AI safety, from that point of view, seems to create more room for continued development of AI, because it delays the point at which governments get scared of AI and pause development. This creates room for AI to get more and more capable before we reach any pause point. And so I would expect that AI safety/alignment work will start being seen as a government priority, as governments will want to be able to control AI in order to give it more capabilities and more responsibility. It might be that we'll see a pause only when the political risks to decision makers (politicians or the electorate) manifestly cannot be controlled through extant technical methods to within acceptable limits, and where accidents are starting to happen. Alternatively, it might be that we see a pause when it becomes clear that bad actors are able to deploy current AI systems to do everyday bad or obnoxious stuff, leading us to restrict everybody's access to AI or nerf its capabilities. I could theoretically see the American people voting in a government specifically to turn off Grok just because they hate Elon Musk, for example.
Applying this to NNs seems to mean that we should expect (groups of) parameters to specialize for different functions if their "production curve" is convex and (groups of) parameters should be reused for multiple functions if their production curve is concave. That insight may help with interpretability. The question is if this is already known under different terminology among ML folks.
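One hedged sketch of how that might be probed (a toy illustration, not an established interpretability method; the tasks, architecture, and "specialization score" here are all made up): train a small shared network on two unrelated tasks, then ablate each hidden unit and see whether the damage lands on one task or both.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup: one shared hidden layer trained on two unrelated regression tasks.
hidden = nn.Linear(4, 16)
head = nn.Linear(16, 2)
net = nn.Sequential(hidden, nn.Tanh(), head)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.randn(1024, 4)
targets = torch.cat([torch.sin(3.0 * x[:, 0:1]),    # task A: depends only on input 0
                     x[:, 1:2] ** 2 - 0.5], dim=1)  # task B: depends only on input 1

def task_losses():
    out = net(x)
    return ((out - targets) ** 2).mean(dim=0)       # tensor([loss_A, loss_B])

for _ in range(3000):
    opt.zero_grad()
    task_losses().sum().backward()
    opt.step()

# Ablate each hidden unit in turn and see which task's loss it was supporting.
with torch.no_grad():
    base = task_losses()
    for j in range(hidden.out_features):
        saved = head.weight[:, j].clone()
        head.weight[:, j] = 0.0                      # knock out unit j's contribution
        damage = task_losses() - base                # extra loss on [task A, task B]
        head.weight[:, j] = saved
        share_a = (damage[0] / (damage.sum() + 1e-9)).item()
        print(f"unit {j:2d}: task-A share of ablation damage = {share_a:.2f}")
# Shares near 1.0 or 0.0 suggest specialized units; shares near 0.5 suggest shared ones.
```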
I'm not a deep ML researcher, but here is what ChatGPT says about how different parts of the training lead to more "convex" or "concave" effects:
ChatGPT 5.4 Long Reasoning
For a shared parameter block θ, let
So, locally, specialization is favored when
That gives a useful phase picture for SGD.
Very early training: specialization is usually weakest.
In wide nets near initialization, training can be close to the lazy / kernel regime, where the network mostly reweights random features instead of strongly reorganizing them. In that regime, hidden units are still largely interchangeable, and the “shared mixed unit” often wins because there is not yet enough learned structure for durable task-specific interference to appear. Feature learning, which is the regime where internal specialization can really emerge, is precisely the regime beyond that lazy behavior. [Disentangling feature and lazy training in deep neural networks]
Mid training: this is where specialization most plausibly appears.
Once hidden features begin to move, two things happen: first, symmetry between nominally equivalent units can break; second, some units become slightly better at one subfunction than another, and further SGD updates reinforce that asymmetry. In teacher–student analyses of layered networks, this shows up as a specialization transition, i.e. a move from an unspecialized symmetric phase to a specialized phase where hidden units take on different roles. For ReLU networks this transition is reported as continuous rather than abrupt in the recent statistical-physics analyses. [The Implicit Bias of Gradient Noise: A Symmetry Perspective]
Late training: specialization often stops increasing in the same sense.
In classification settings, there is evidence for a terminal phase of training where last-layer representations undergo neural collapse: within-class variation shrinks and class means become arranged in a highly symmetric geometry. That is a kind of sharpening and consolidation, but not necessarily further functional diversification of internal parts. So late training often looks less like “wood vs leaves keep splitting” and more like “the learned class geometry is being compressed into a cleaner final arrangement.”
We could also ask the other way around: If early training is more concave and mid-training is convex, what does this imply for markets?
Presumably, in early, concave markets, traders offer multiple goods.
In mid, convex markets, traders specialize in few or a single product.
And in late markets?
Question: if a market is a good object insofar as the agents' prices converge... but with concave frontiers/utilities the agents' prices tend to diverge... what other good objects arise in the presence of concave frontiers/utilities?
I suppose you mean “convex”, not “concave”. This confused me for a good long while.
Part of what makes a pencil a good object is that all its parts share approximately the same rotational velocity - i.e. it's a rigid body object. Part of what makes a squirrel a good object is that its parts share approximately the same genome. Part of what makes the water in a cup a good object is that its parts share approximately the same chemical composition - i.e. it mixes quickly.
General pattern: part of what makes many objects good objects (i.e. ontologically natural) is that their parts all share approximately the same <something>, given time to equilibrate.
Ideal markets are a good more-abstract example: in a market at equilibrium, all agents share the same prices, i.e. the ratios at which they trade off between (marginal amounts of) different goods are all equal. Indeed, we can view this "Law of One Price" as the defining feature of a market.
Mental Picture: One Price
This is the classic econ-101 picture: two agents can both produce apples or bananas in various combinations. If one of the agents can tradeoff between apple vs banana production at a ratio of 2:1 and the other at 1:2, then they can produce more total apples and more total bananas by one agent producing two more apples (at the cost of one less banana), and the other producing two more bananas (at the cost of one less apple).
Roughly speaking, if each agent's "production frontier" (curve showing the number of bananas producible for each number of apples) is concave (i.e. curving downward, like the picture above), then total apple and banana production will be pareto optimal exactly when the two agents have the same marginal tradeoffs - i.e. if one of them trades off apples vs bananas at a 2:3 ratio, then the other also faces a 2:3 ratio. That's the equilibrium condition: the two face the same marginal tradeoffs. Visually, in the graphs above, those tradeoffs are represented by the red arrows perpendicular to the production frontiers. At equilibrium, the two agents each choose a point on their frontier such that the two red arrows point in the same direction.
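A minimal numerical sketch of both claims (Python, with made-up quadratic frontiers): the 2:1 vs 1:2 reallocation arithmetic, and the fact that the pareto-optimal split of production is the one where the two agents' marginal tradeoffs are equal.

```python
import numpy as np

# The 2:1 vs 1:2 example: agent 1 makes 2 more apples at the cost of 1 banana,
# agent 2 makes 2 more bananas at the cost of 1 apple.
delta_agent1 = np.array([+2, -1])            # (apples, bananas)
delta_agent2 = np.array([-1, +2])
print("net change (apples, bananas):", delta_agent1 + delta_agent2)   # [1 1]: more of both

# Hypothetical concave frontiers (bananas producible as a function of apples produced):
f1  = lambda a: 4.0 - a**2 / 4.0             # agent 1
f2  = lambda a: 6.0 - a**2 / 3.0             # agent 2
df1 = lambda a: -a / 2.0                     # agent 1's marginal tradeoff (frontier slope)
df2 = lambda a: -2.0 * a / 3.0               # agent 2's marginal tradeoff

total_apples = 5.0                           # hold total apple production fixed...
a1 = np.linspace(0.0, 4.0, 100_001)
bananas = f1(a1) + f2(total_apples - a1)     # ...and maximize total banana production
best = np.argmax(bananas)

print(f"best split of apple production: a1={a1[best]:.3f}, a2={total_apples - a1[best]:.3f}")
print(f"marginal tradeoffs at the optimum: {df1(a1[best]):.3f} vs {df2(total_apples - a1[best]):.3f}")
# The two slopes match (up to grid resolution): equal "virtual prices" at the pareto optimum.
```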
In an efficient market, those tradeoff ratios are the relative prices of the two goods. Even absent an efficient market (even absent any trade at all, in fact), we can define "virtual prices" from the tradeoff ratios, and those virtual prices must be equal across the two "agents" in order for production to be pareto optimal. This mental model is really about pareto optimality, not about trade or markets or economics or agents.
... except...
There's a big loophole when the agents have convex production frontiers or utility functions, rather than concave. Then, their prices tend to diverge, rather than converge.
Mental Picture: The Convex Case
With convex frontiers, one agent is likely to specialize entirely in apples, and the other entirely in bananas. The two end up with different tradeoffs, but that doesn't let them produce more total apples and more total bananas, because each has already traded off as far as they can go - the agent producing no apples can't produce any fewer apples, and the agent producing no bananas can't produce any fewer bananas.
In principle, this kind of "zero bound" solution can happen with concave or convex frontiers. But in practice, two agents with concave frontiers (the previous picture) will tend to converge in price (moving them "toward the middle", typically away from the zero bounds), while two agents with convex frontiers (this picture) will tend to diverge (pretty much always driving them toward the zero bounds eventually).
Why do agents with concave frontiers converge, while convex frontiers diverge? Well, imagine for a moment that both agents have the same frontier. If it's concave (previous picture), then if the two pick different points on the line, their average is below the frontier - so the two can do better in total by both moving to the average point. But if it's convex, then the average is above the frontier, and gets further above as the points move apart - so the two do better in total by moving away from the average.
It's basically the mental picture from Jensen's Inequality.
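A quick numerical check of that picture (a sketch with two made-up frontiers, one concave and one convex): pick two different points on a frontier and compare the agents' average point to the frontier at the midpoint.

```python
import numpy as np

concave_frontier = lambda a: 1.0 - a**2          # bulges outward (concave)
convex_frontier  = lambda a: (1.0 - a)**2        # bulges toward the origin (convex)

a1, a2 = 0.2, 0.8                                # two agents picking different points
a_mid = (a1 + a2) / 2.0

for name, f in [("concave", concave_frontier), ("convex", convex_frontier)]:
    average_point = (f(a1) + f(a2)) / 2.0        # height of the agents' average point
    frontier_at_mid = f(a_mid)                   # height of the frontier at that apple level
    print(f"{name}: average of the two points = {average_point:.3f}, "
          f"frontier at the midpoint = {frontier_at_mid:.3f}")

# Concave: the average point (0.660) sits below the frontier (0.750), so both agents
# moving to the midpoint produce more in total -- prices converge.
# Convex: the average point (0.340) sits above the frontier (0.250), and gets further
# above as the points separate -- the agents do better by moving apart, so prices diverge.
```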
Question: if a market is a good object insofar as the agents' prices converge... but with convex frontiers/utilities the agents' prices tend to diverge... what other good objects arise in the presence of convex frontiers/utilities?
(You might want to stop here to think on that one yourself.)
Tentative answer: clusters of similarly-specialized agents. If there's a whole bunch of agents, some of which entirely specialize in apples, and some of which entirely specialize in bananas, then we have two natural, discrete categories of agent.
Trees, for example, have parts ("agents") specialized in structural support (wood) and parts specialized in energy harvest (leaves). There is not much in-between; the parts of woody trees are pretty discretely specialized in one or the other function (or some other function, like e.g. bark), not both functions. Presumably the tree's production frontier for structural support and energy harvest is convex; otherwise the tree could get more of both by mixing the functions.
On the other hand, grass does mix the two functions: a blade of grass functions as both structural support and energy harvester simultaneously. Presumably the grass's production frontier is concave.
Zooming out a moment... there are a lot of drivers of natural ontological distinctions out there - just look at the first paragraph of this post.
Specialization is just one more phenomenon along similar lines: part of what makes a market (or coherently-optimized stuff more generally) a good object is that all its parts have the same tradeoff ratios between different "good" things. If convex production/utility drives the parts to specialize until they hit a bound, then we get a natural ontological distinction between differently-specialized parts.
So why am I writing a post about specialization specifically? What makes it a particularly interesting driver of natural ontology?
Specialization is notable because it drives natural ontological distinctions in optimized systems specifically. It's the sort of phenomenon which drives e.g. biological organisms to have distinct types of parts - like wood vs leaves on a tree. It's also the sort of phenomenon we'd expect to apply to the internals of neural nets, producing the learned internal analogues of wood vs leaves: parts of the net specialized in different functions. Specialization is the sort of phenomenon which would generate natural ontological distinctions for modelling the internals of agentic systems, and optimized systems more generally.
Acknowledgement: David Lorell isn't on this post, but this thought did bubble out of all the stuff we generally work on together.