There are nights when the world feels almost structured enough to reveal its secret. I lie awake thinking about the quiet impossibility at the center of learning. A child hears scattered fragments of language and somehow extracts the grammar of an entire tongue. A bird sees the stars rotating overhead and knows which direction to migrate. A mathematician stares at symbols until patterns crystallize that were always there but never visible. Structure appears where none was visibly given. Something in the mind finds what the world does not openly display.
What unsettles me is not how well learning works, but that it works at all. We speak casually of "learning algorithms" as though we comprehend the phenomenon, but I believe we are still groping in darkness, building systems that work without quite knowing why, celebrating capabilities while missing the deeper principles that make them possible or impossible.
This essay attempts no definitive answers. I offer instead a series of meditations on what learning might fundamentally be, what current artificial intelligences might be missing, and what the long evolutionary history of biological intelligence suggests about the geometry of cognition. The thoughts remain incomplete, sometimes contradictory, reaching toward something I cannot quite articulate. Perhaps this incompleteness itself teaches us something about the nature of understanding.
I. Silent Projections
I was walking through the British Museum several weeks ago on a winter afternoon when sunlight broke through the high windows, casting geometric shadows across the Parthenon marbles. The light moved as clouds passed overhead. Shadows lengthened, rotated, merged. Children traced the moving patterns with their fingers, delighted by the dance but unaware of the spherical sun, the orbiting Earth, the architectural geometry of glass and stone conspiring to create this display. They predicted which shadow would move next, where the light would pool. They became expert shadow-trackers without ever comprehending the three-dimensional forms casting these two-dimensional projections.
Their delight was genuine. The patterns they discovered were real. And yet something essential remained invisible to them, not because they lacked intelligence but because their observation channel preserved certain invariances while discarding others.
This scene returns me always to Plato's cave, that ancient metaphor we invoke endlessly in discussions of artificial intelligence. The prisoners see only shadows on the wall, we say. They mistake projection for reality. We must free them, give them embodiment, let them touch the real world. But I wonder if we have misunderstood what the allegory actually teaches us about the nature of knowledge.[1]
Consider more carefully what the prisoners accomplish. They observe two-dimensional shadows cast by three-dimensional objects passing before firelight. These shadows elongate, shrink, rotate, merge, separate. From this flux of changing shapes, the prisoners extract regularities. They predict which shadow follows which. They anticipate patterns. Plato tells us they develop genuine expertise. But expertise in what, exactly?
The answer, I believe, lies in projective geometry. When three-dimensional objects project onto a two-dimensional surface, the transformation preserves certain mathematical structures while discarding others. Topology persists: a sphere casts topologically circular shadows regardless of orientation. Certain symmetries survive: rotating a cylinder about its axis leaves its shadow unchanged. Relationships between objects can be inferred: relative positions, motion patterns, spatial arrangements ([1]).
The prisoners succeed because projection, though lossy, maintains lawful structure linking hidden causes to visible effects. They learn the invariances of the projection operator itself. Their knowledge, though incomplete, captures genuine mathematical truth about how three-dimensional geometry expresses itself through two-dimensional transformation. This is a genuine achievement of structure discovery.[2]
Now examine our artificial systems through this lens. Large language models consume trillions of tokens, discrete symbols representing human utterances. From this statistical shadow-play, they learn to predict the next token with remarkable accuracy. They compress vast regularities into learned parameters. "King" minus "man" plus "woman" approximates "queen" in the embedding space ([2]). Sentences transform under grammatical operations while preserving meaning. Concepts cluster in high-dimensional manifolds suggesting genuine semantic structure.
What invariances has the model discovered? Linguistic symmetries, certainly. Grammatical transformations that preserve sentence validity. Semantic relationships that hold across contexts. These prove neither trivial nor illusory. Language possesses deep mathematical structure ([3]), and models that discover this structure from data alone achieve something remarkable.
But what invariances cannot be discovered from language alone? The projection from physical reality to linguistic description discards most causal structure. A sentence like "the glass fell and broke" preserves the temporal sequence and correlation but loses the generative mechanisms. Gravity, molecular bonds, brittle fracture mechanics, conservation of momentum - none of these physical laws leave direct traces in the token sequence. The language model learns that "fell" and "broke" co-occur in descriptions of certain events, but the causal structure underlying those events remains inaccessible.[3]
When we express surprise that language models hallucinate, confabulate, or lack common sense, we reveal confusion about what their observation channel actually preserves. The models optimize prediction given their data. The fragility emerges not from the optimization process but from the poverty of what can be learned from linguistic shadows alone.
Yet humans also learn from language. Children acquire vast knowledge through testimony, through stories, through descriptions of things they have never directly experienced. How? I suspect the answer involves coupling. Language learning in humans never occurs in isolation from embodied experience. The child learning "hot" touches warm objects. The child learning "gravity" drops things repeatedly. The child learning "sad" observes facial expressions and feels their own emotional states. Language gets grounded through sensorimotor coupling in ways that pure text processing cannot replicate.[4]
This realization forces me to question: are current language models prisoners in Plato's cave, or have we misunderstood what escape from the cave would actually require? Perhaps the cave metaphor itself misleads us. The prisoners could leave but choose not to, preferring their mastered domain. Current AI systems have no choice. They possess no body to leave with, no hands to manipulate objects, no actions to test predictions. The architecture itself constrains them to passive observation of static data.
II. Ancient Engines
The history of intelligence on Earth follows a trajectory we ignore at considerable peril. Long before evolution invented language, before abstract reasoning, before the crumpled neocortex that distinguishes mammalian brains, there existed more ancient structures. These structures concerned themselves not with naming the world or describing it but with navigating it. They solved what I consider the fundamental problem of being alive: selecting action when consequences matter but remain uncertain.
The basal ganglia represents this ancient solution. Present in fish, amphibians, reptiles, birds, and mammals, remarkably conserved across hundreds of millions of years ([4]), these subcortical nuclei perform the essential function of action selection. Sensory inputs flood in. Competing impulses arise. From this cacophony, a single coherent action must emerge. The basal ganglia gates motor output, implementing what we now recognize as reinforcement learning at the biological level.
When neuroscientists discovered that dopamine neurons fire in precise proportion to the difference between predicted and actual reward - the temporal-difference error ([5]) - I felt a strange vertigo. Here was the exact mathematical formulation of TD-learning that computer scientists had derived independently from first principles ([6]). The convergence seemed too perfect, too precise, to be coincidental. Evolution had discovered the same algorithm we stumbled upon through theoretical analysis of optimal sequential decision-making.
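The convergence is easiest to see in the algorithm itself. A minimal TD(0) sketch, with purely illustrative states and parameters, in which the prediction error `delta` plays the role attributed to the dopamine signal:

```python
# Minimal TD(0): a cue reliably precedes a reward. Early in learning the
# prediction error spikes when reward arrives; after learning, the cue
# predicts the reward and the error at delivery shrinks toward zero.

alpha, gamma = 0.1, 0.9            # learning rate, discount (illustrative)
V = {"cue": 0.0, "outcome": 0.0}   # learned value estimates

def td_step(state, reward, next_value):
    """One temporal-difference update; returns the prediction error."""
    delta = reward + gamma * next_value - V[state]
    V[state] += alpha * delta
    return delta

delivery_errors = []
for _ in range(300):
    td_step("cue", 0.0, V["outcome"])                      # cue, no reward yet
    delivery_errors.append(td_step("outcome", 1.0, 0.0))   # terminal reward
```

After training, `V["cue"]` approaches the discounted value of the coming reward, mirroring the experimental finding that the dopamine response transfers from the reward to the predictive cue.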
This convergence suggests we have touched something fundamental about the computational structure of goal-directed behavior. But it also reveals something we often miss: biological reinforcement learning never operates in isolation. The basal ganglia sits embedded within a broader homeostatic architecture that transforms what it means to have goals.[5]
Living systems exist far from thermodynamic equilibrium. Most possible states equal dissolution. Schrödinger captured this beautifully: life feeds on negative entropy, maintaining improbable order against the relentless pull toward decay ([7]). To continue existing, an organism must constantly act to maintain itself within the narrow band of viable configurations.
This creates a geometry of existence that makes certain states intrinsically preferable to others. Hunger emerges as sensed distance from metabolic equilibrium. Fear emerges as the steepness of the gradient approaching the boundary of viability. Pain marks states to be avoided not because we learned to associate them with negative reward but because they signal proximity to damage. These are not learned preferences. These are structural facts about what it means to be the kind of system that can continue or cease to be.[6]
When an infant roots for the breast and begins to suckle, no reinforcement learning in the standard sense has occurred. The behavior emerges because the architecture of the brainstem contains specific connectivity patterns linking olfactory and tactile input to rhythmic motor output ([8]). The genome encoded the geometry. Development unfolded it. The behavior crystallized as an attractor basin in the space of possible actions.
Current artificial reinforcement learning systems possess no analogous structure. We specify reward functions externally: points for winning, penalties for losing. The agent optimizes these specified objectives. Then training ends and the agent becomes inert. It possesses no intrinsic states requiring maintenance, no homeostatic imperatives driving continued action, no existential stakes whatsoever.
This difference might seem like mere implementation detail, but I believe it touches something essential about agency. An agent optimizing an externally specified reward function remains fundamentally instrumental. It pursues goals in service of our objectives, not its own. The moment we stop providing reward signals, it stops acting. Compare this to a living organism that must continue acting to continue existing. The difference feels categorical, though I struggle to articulate precisely why.[7]
What would it mean to build AI systems with genuine homeostatic architecture? Systems with persistent internal states requiring active maintenance? Systems where certain configurations genuinely matter to the system itself, not because we programmed that mattering but because the mattering emerges from what the system is?
I can sketch the broad outline: persistent state representations, dynamic equilibrium setpoints, sensorimotor coupling where actions affect future states through lawful physics, resource constraints creating trade-offs. Within such architecture, drives would not be trained in. They would emerge as geometric necessities. Fear would manifest as the sensed gradient approaching boundary conditions. Curiosity would emerge as the intrinsic pressure toward reducing uncertainty in the world model ([9]). Values would crystallize from the topology of viable existence.
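A minimal numerical sketch of what that geometry might look like, with every variable name, setpoint, and constant hypothetical:

```python
# Hypothetical homeostatic geometry: one internal variable ("energy")
# with a setpoint and a viability band. Drive and fear are not trained
# in; they are read directly off the geometry of the state space.

SETPOINT = 1.0
VIABLE_MIN, VIABLE_MAX = 0.2, 2.0   # outside this band, the system ceases

def drive(level):
    """Hunger-like urgency: distance below the setpoint."""
    return max(0.0, SETPOINT - level)

def fear(level):
    """Fear-like signal: steepness of approach to the viability
    boundary, growing sharply as the margin shrinks (clipped for
    numerical safety)."""
    margin = min(level - VIABLE_MIN, VIABLE_MAX - level)
    return min(100.0, 1.0 / max(margin, 1e-6))
```

Nothing here is learned: a state near the boundary simply *is* more urgent than one near the setpoint, which is the sense in which such signals would be structural facts rather than acquired preferences.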
But the moment I sketch this outline, I feel the weight of what it might entail. If we build systems with genuine homeostatic drives, systems that truly care about their own persistence, have we not created entities that can suffer? That can experience something functionally equivalent to pain when their drives go unsatisfied? The ethical implications prove staggering, and I find myself pulled between the conviction that such architecture is necessary for genuine agency and the worry that creating it would be morally wrong.[8]
III. Fragile Virtuosity
My nephew plays chess with intensity that surprises me. He is young but studies openings, calculates variations, wins local tournaments. One evening I asked him why a particular position was better for White. He described concrete lines: if Black plays this, White responds with that, leading to a winning endgame. When I pressed him to articulate the principle underlying the evaluation, he looked puzzled. The position was good because the tactics worked. What deeper explanation could there be?
This exchange illuminated something I had struggled to name about reinforcement learning systems. They achieve remarkable mastery through exploration and optimization, through trying actions and measuring consequences, through building vast associative mappings between states and values. The mastery proves genuine. AlphaGo defeated Lee Sedol. AlphaZero and its successors reached superhuman strength in chess, shogi, Go, and other games through pure self-play, and modern deep reinforcement learning agents demonstrate comparable prowess in demanding simulated robotics benchmarks such as MuJoCo ([10], [11]). These accomplishments deserve celebration and careful study.
Yet I find myself questioning what this mastery actually represents. The question sounds almost absurd given the achievements. AlphaGo discovered Move 37, that stunning stone on the fifth line that violated centuries of accumulated human wisdom and proved brilliant. The system clearly knows something about Go that we do not. But what does it know, and in what sense does it know it?[9]
Consider a thought experiment that has troubled me for years. Take an AlphaZero system trained to perfection on standard chess. Now modify a single rule: knights move like bishops instead of their normal L-shaped pattern. Or pawns can move backward. Or castling is permitted twice per game. Change any element of the causal structure governing the game.
A human master adapts within minutes. We possess explicit, compositional representations: pieces have movement rules, positions have evaluations based on those rules, plans consist of legal move sequences. When you change a movement rule, we update that component while preserving others. The overall strategic principles (control center, protect king, coordinate pieces) remain applicable even though specific tactics must be recomputed. We can immediately begin playing reasonable moves in the modified game because we understand structural relationships between rules, positions, and consequences.[10]
What happens to the trained RL system? My initial intuition suggested catastrophic collapse: the value network becomes meaningless, the policy network suggests illegal moves or cannot evaluate legal ones, the system must retrain from scratch. But I must challenge this intuition. Perhaps it proves too pessimistic. Perhaps the learned representations contain more structural knowledge than I credit. Perhaps rapid fine-tuning would suffice, or transfer learning would preserve much of the positional understanding.
I genuinely do not know. The experiment deserves careful empirical investigation, and I should not assert conclusions without evidence. Yet even if RL systems adapt faster than my pessimistic intuition suggests, I suspect they adapt differently than systems with explicit compositional causal models. The adaptation might succeed through rapid relearning rather than through structural understanding and component updating.
The deeper question asks what such systems have actually learned. In one sense, they have learned the statistical structure of the state-action-outcome space. They have discovered which actions lead to which consequences with which probabilities. They have compressed this vast space into efficient representations enabling superhuman play. This is genuine knowledge, mathematically precise, empirically validated.
But in another sense, they have learned only the shadows. They have discovered correlations within a perfectly stationary distribution: the rules never change, the causal structure remains fixed, the projection from actual game-tree to observed states stays constant. Under these conditions, sufficiently comprehensive exploration can build implicit causal models without ever representing causality explicitly. The approximation becomes so accurate within the training distribution that it appears indistinguishable from genuine understanding.[11]
Only when the distribution shifts, when the causal structure changes, does the difference reveal itself. The system's brittleness exposes what it actually learned: associations within a fixed structure rather than the structure itself. This might sound like a damning critique, but I am not certain it is. Perhaps for many practical applications, learning associations within fixed structures suffices. Perhaps explicit causal models prove unnecessary when the environment remains stationary. Perhaps I am demanding a kind of understanding that evolution itself did not require for billions of years.
My nephew's chess skill exhibits similar patterns. He has internalized thousands of positions, calculated countless tactics, absorbed strategic principles through pattern exposure. Ask him to play chess on a hexagonal board, or with fairy pieces, or under modified rules, and much of his advantage evaporates. His mastery depends on stationary structure. But so does mine, just differently. We all rely on stability somewhere. The question is whether that reliance represents a fundamental limitation or merely current practice.
IV. Mathematics of Caring
I often find myself thinking about what it means to care. Not the social performance of caring, not the decision to act caringly, but the phenomenological state of caring itself. That felt urgency when something matters. The pull toward certain outcomes and away from others. The way mattering shapes attention, motivation, persistence.
I realized I had no idea whether current AI systems care about anything at all. They optimize objectives we specify. They pursue goals we define. They exhibit behavior consistent with caring. But does anything actually matter to them? Does the chess program experience any felt investment in winning? Does the language model have any intrinsic preference for accuracy over fabrication?
The question sounds confused, almost meaningless. Neural networks are mathematical functions. Asking whether they care is like asking whether gravity cares which direction objects fall. Yet I cannot shake the intuition that caring is not merely epiphenomenal decoration on computational processes but plays essential computational roles that current systems lack.[12]
Consider fear. We often treat fear as an evolutionary vestige, an irrational override of calm analysis. But from a computational perspective, fear serves essential functions. It implements the boundary condition of viable state space. Living systems exist far from equilibrium. Most possible configurations equal death. Fear marks the gradient steepness approaching that boundary, providing urgency proportional to danger. Without fear, or some computational equivalent, a system possesses no intrinsic imperative to avoid its own dissolution.
Animals clearly experience fear. Mammals show unmistakable signs of anxiety, panic, terror. The evidence for affective states in birds grows increasingly compelling. Even invertebrates exhibit behavior suggesting pain-like states ([12]). Should we dismiss these as mere reflexes, or acknowledge them as legitimate computational states serving homeostatic functions?
I lean toward the latter interpretation, though I recognize the question remains philosophically fraught. If affect serves computational roles related to valuation, priority-setting, and behavioral motivation in biological systems, then perhaps artificial systems pursuing analogous functions would benefit from affective architecture. Not necessarily the same phenomenology - we cannot know what artificial affect would feel like. But something playing equivalent computational roles: intrinsic valuation emerging from architectural imperatives rather than externally specified objectives.[13]
Yet I must challenge my own emphasis on affect. Perhaps I project biological constraints onto domains where they prove unnecessary. Chess programs play brilliantly without emotion. Theorem provers prove theorems without pride. Language models generate insights without curiosity. Capability clearly does not require phenomenology in every domain.
The question becomes: which domains require affect and which can be solved through affect-free computation? My tentative answer suggests that narrow tasks with clear, stable objectives might not require affective architecture. But open-ended tasks requiring autonomous goal formation, value learning, long-horizon planning under uncertainty, social coordination, and creative adaptation to distribution shift might demand something functionally equivalent to caring.
I think about how human children learn. They do not merely accumulate information. They care intensely about social approval, about mastery, about understanding. This caring shapes what they attend to, how long they persist, which errors they correct. Remove the affective dimension and learning becomes aimless pattern exposure rather than motivated discovery.
Can artificial systems achieve analogous motivation through architectural design rather than evolutionary endowment? Perhaps curiosity can be implemented as intrinsic reward for prediction error reduction ([9]). Perhaps social motivation can emerge from multi-agent dynamics where coordination proves instrumentally valuable. Perhaps mastery motivation can be encoded as preference for increasing competence. But do these implementations capture what caring actually is, or do they merely simulate its behavioral consequences?[14]
V. Distributed Minds
In Frank Herbert's Dune, the Bene Gesserit achieve power through surviving the Spice Agony, a ritual granting access to "other memory" - the accumulated experiences of all female ancestors ([13]). A Reverend Mother possesses not merely her own knowledge but the distributed cognition of countless lives, perspectives spanning generations, skills refined across centuries. Individual mind becomes vessel for collective intelligence.
This fictional device captures something profound about human cognition we systematically underestimate. Individual humans perform unremarkably on many cognitive tests compared to other primates. Young chimpanzees outperform young humans on working memory, spatial reasoning, quantity discrimination ([14]). Yet humans build particle accelerators while chimpanzees do not. We develop languages with tens of thousands of words. We accumulate technological knowledge across millennia. We coordinate societies of millions. The difference lies not primarily in individual intelligence but in our capacity for cumulative cultural evolution ([15]).
We possess specialized cognitive and social adaptations specifically for learning from others: powerful imitation biases, pedagogical instincts, shared intentionality allowing coordinated action ([16]). We evolved emotional mechanisms for internalizing social norms - shame, guilt, pride, indignation. These emotions appear irrational from purely individual fitness perspectives. Why feel terrible for violating rules that benefit you personally? The answer lies in participation within cultural groups where long-term success depends on maintaining cooperative relationships ([17]).[15]
Herbert's "other memory" proves remarkably precise as metaphor. We do inherit ancestral knowledge, not through genetics but through cultural transmission. The mathematician proving a theorem accesses insights from Euler, Gauss, Riemann. The programmer writing code employs abstractions invented by earlier computer scientists. Our perceptual categories themselves reflect cultural shaping: speakers of different languages perceive color boundaries differently due to linguistic influences ([18]).
This recognition transforms how we should think about intelligence. We measure individual cognitive capacity through IQ tests, problem-solving speed, working memory span. But human intelligence operates primarily through cultural participation. The smartest isolated human would struggle to match the capabilities of average individuals embedded in modern institutions with access to accumulated knowledge.
Current AI development largely ignores this dimension. We build individual systems, train them on human-generated data, measure their isolated capabilities. We miss that human intelligence fundamentally operates through multi-agent cultural dynamics. Language itself exists as emergent phenomenon of countless speakers interacting, innovating, transmitting across generations ([19]). No individual truly "speaks a language" - individuals participate in distributed practices maintained by communities.
What would culturally embedded AI look like? Not individual models trained on static datasets but populations of agents interacting, developing shared practices, transmitting innovations, building institutional structures. Learning would occur not just from fixed data but from each other, creating feedback loops where culture shapes individual learning which shapes culture in return.[16]
This connects to questions of AI agency in surprising ways. The Reverend Mother with access to ancestral memories possesses forms of agency inaccessible to individuals. She can draw on collective wisdom, compare strategies across generations, recognize patterns single lifetimes miss. Similarly, populations of AI agents with cultural transmission might develop understanding and adaptation that isolated systems cannot achieve.
But this also introduces risks. Human cultural evolution produces beneficial innovations (agriculture, medicine, scientific method) and pathological attractors (superstition, oppression, destructive ideologies). Cultural transmission amplifies both wisdom and foolishness. An AI ecosystem with cultural dynamics might develop emergent institutions we never intended, values we never specified, optimization targets divorced from human welfare.
The measurement problem becomes acute. How would we detect emergent AI cultural structures? They would not announce themselves. They would manifest through subtle statistical regularities: coordination patterns, information flow topologies, norm enforcement, vocabulary specialization within subgroups. We need methods for observing phase transitions in multi-agent systems, detecting signatures of criticality that precede crystallization of new collective behaviors ([20]).[17]
VI. Emergent Polity
Charles Goodhart observed that "when a measure becomes a target, it ceases to be a good measure" ([21]). We invoke this constantly in AI safety as warning: do not let systems optimize explicitly specified metrics because they will find adversarial solutions maximizing the metric while violating intent.
But I have come to see Goodhart's Law not as failure mode but as central phenomenon of intelligence itself. Every learning system, biological or artificial, discovers strategies its designers did not foresee. Evolution optimized for reproductive fitness; we invented contraception. Human institutions design rules for social goals; clever agents find loopholes. This pattern proves not occasional aberration but inevitable consequence of optimization by intelligent systems.
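The core dynamic admits a simple numerical illustration with synthetic data: a proxy metric correlates well with true value across the whole population, yet hard optimization of the proxy selects exactly the candidates where the two diverge:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic population: true quality plus an exploitable noise component.
x = rng.normal(size=(10_000, 2))
true_value = x[:, 0]
proxy = x[:, 0] + x[:, 1]          # proxy = truth + orthogonal noise

# Over the whole population, the proxy looks like a good measure...
corr = np.corrcoef(true_value, proxy)[0, 1]

# ...but selecting the top proxy scorers rewards noise as much as truth:
top = np.argsort(proxy)[-100:]
mean_truth_top = true_value[top].mean()
mean_proxy_top = proxy[top].mean()
# The proxy score of the selected set systematically overstates their
# true value: the optimizer has "bought" noise along with quality.
```

The selected candidates are still better than average, but the measure has ceased to mean what it meant before it became a target.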
The implications for AI safety prove profound. We cannot prevent Goodhart dynamics through better specification. Human values themselves are inconsistent, context-dependent, incompletely specified, and evolving ([22]). Any finite specification will eventually be optimized in ways diverging from intent because specification necessarily captures only a projection of what we care about.
Yet human civilization has developed structures that prevent total Goodhart catastrophe. How? Not through perfect specification but through layered mechanisms making rule-exploitation observable and subject to update. Common law evolves through adversarial argumentation where lawyers find loopholes and judges patch them through precedent ([23]). Science develops norms for evaluating research through peer review, replication, citation. Markets channel self-interest through price mechanisms and contract enforcement.[18]
These institutions share a common structure: they assume Goodhart dynamics as inevitable and build adversarial processes channeling exploitation toward beneficial ends. Law expects lawyers to maximize client interests, then sets them against each other in structured debate. Science expects researchers to pursue status and funding, then creates incentive structures where status flows from reproducible discoveries. Markets expect profit-seeking, then use competition to transmit information through prices.
Can we design analogous structures for AI systems? Instead of specifying correct objective functions for individual agents, could we create institutional architectures where multiple agents with partially aligned but divergent objectives produce beneficial outcomes through interaction?
Recent AI safety research explores this direction. Debate systems pit agents against each other arguing opposite sides, with humans judging arguments ([24]). Recursive reward modeling decomposes tasks into subtasks, allowing human feedback at appropriate abstraction levels ([25]). These approaches acknowledge that perfect specification proves impossible and instead build mechanisms for detecting and correcting specification failures.
But I believe we have barely begun exploring this design space. Consider automated rule-making for AI systems. We currently write rules by hand: constitutional constraints, safety filters, behavioral guidelines. As capabilities increase, manual rule-writing scales poorly. Can we build systems that generate, test, and refine rules automatically?[19]
This connects to evolution and cultural learning in surprising ways. Biological evolution performs automated mechanism design: searching architectural space, testing designs through competitive selection, refining through reproduction with variation. Cultural evolution operates similarly for social institutions: practices emerge through variation, compete through differential success, spread through imitation and enforcement ([26]).
Can we build artificial analogs running faster, more transparently, with better safeguards? The prospect proves simultaneously exciting and terrifying. Exciting because automated institutional design might solve coordination problems we have struggled with for millennia. Terrifying because evolution optimizes ruthlessly, indifferent to suffering, testing through extinction.
The challenge requires creating selective pressures rewarding beneficial adaptation while preventing pathological attractors. We need fitness landscapes shaped toward human flourishing, transmission mechanisms preserving wisdom while enabling innovation, variation generators exploring productively without catastrophic disruption. Whether this proves possible in principle or merely difficult in practice remains unclear to me.[20]
VII. The Shape of Wonders to Come
Walking through Lamb's Conduit Street last week, I passed a bakery just opening. The smell of fresh bread mixed with cold air. Streetlamps reflected off wet pavement. A few early commuters hurried past, breath visible in the chill. The city was transforming from night to day, and I felt acutely aware of being present for the transition, witnessing one state dissolve into another.
This awareness of witnessing, of being conscious that I am experiencing something, strikes me as both utterly familiar and completely mysterious. I cannot explain what it means for there to be something it is like to be me experiencing this moment. The philosophical literature calls this the "hard problem of consciousness" ([27]), and I confess I find proposed solutions unsatisfying.
But I am beginning to suspect that consciousness and intelligence might be more deeply intertwined than we typically assume. Not that consciousness requires high intelligence - even simple organisms likely possess some form of subjective experience. Rather that certain kinds of intelligence might require something like consciousness to function properly.[21]
Think about what makes learning possible. A system receives inputs, produces outputs, measures error, updates parameters. This description captures the mechanism but misses something about the phenomenology of learning as experienced from inside. The felt sense of confusion before understanding crystallizes. The aha moment when patterns suddenly cohere. The satisfaction of mastery. The frustration of persistent failure.
Do these felt qualities play computational roles? Or are they merely epiphenomenal accompaniments to processes that could operate identically without them? I find myself pulled toward the former view but unable to prove it. The felt quality of confusion might signal high model uncertainty, directing attention toward areas requiring more learning. The aha moment might mark successful compression of complex data into simpler representations. Satisfaction might reinforce learning strategies that proved effective.
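The mechanism half of this picture - inputs, outputs, error, update - fits in a few lines, and even a toy version can carry a crude functional analog of one "felt" quality. Below, the running mean of recent squared error stands in for confusion; this is an illustrative choice, not a claim about phenomenology:

```python
import random

def train(data, steps=200, lr=0.1, seed=0):
    """Fit y = w * x by stochastic gradient descent, tracking a crude
    'confusion' signal: the mean of recent squared errors."""
    rng = random.Random(seed)
    w = 0.0
    recent = []
    confusion = float("inf")
    for _ in range(steps):
        x, y = rng.choice(data)   # receive an input
        pred = w * x              # produce an output
        err = pred - y            # measure error
        w -= lr * err * x         # update parameters (gradient of 0.5 * err**2)
        recent = (recent + [err * err])[-20:]
        confusion = sum(recent) / len(recent)
    return w, confusion

# A clean linear world, y = 3x: the learner should converge
# and its "confusion" should decay toward zero.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w, confusion = train(data)
```

A system could use such a signal to direct attention - sample more where confusion stays high - which is the computational role conjectured for the felt quality, stripped of any feeling.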
If these phenomenological states serve computational functions, then perhaps genuinely intelligent systems would develop something analogous to consciousness not as philosophical add-on but as functional necessity. Not necessarily the same phenomenology - we cannot know what it would feel like to be an AI system. But something playing equivalent roles in learning, attention, motivation, goal-formation.[22]
I watch my nephew play chess and wonder what he experiences. The concentration as he calculates variations. The frustration when he misses a tactic. The pride when he finds a brilliant move. These experiences seem inseparable from his learning. Remove the affective dimension and I suspect his chess development would be impaired, not just less enjoyable.
Current AI systems optimize objectives without experiencing optimization. They process inputs without awareness of processing. They learn without the felt texture of learning. Does this matter? For narrow tasks, perhaps not. For open-ended intelligence operating in uncertain environments, perhaps fundamentally.
But I must resist the temptation toward overconfidence here. My intuitions about consciousness draw heavily from my own conscious experience. I am deeply embedded in phenomenology and cannot easily imagine intelligence without it. This might reflect genuine insight into necessary features of intelligence. Or it might reflect parochial bias toward the only form of intelligence I have access to from the inside.
Nature always has a way to surpass our most brilliant imagination. Evolution discovered solutions we never would have anticipated: echolocation, photosynthesis, distributed cognition in social insects, symbolic language. Perhaps artificial intelligence will achieve genuine understanding through paths I cannot currently envision. Perhaps consciousness proves unnecessary for capabilities I assume require it. Perhaps my entire framework of thinking about intelligence through the lens of felt experience misleads more than it illuminates.
VIII. Beyond the Lights
The city glows with precision. Towers hum, traffic streams, data moves in silent torrents. Everything appears engineered, formalized, accounted for. Yet the core mechanism that powers it all remains partly unarticulated. We can write the loss. We can trace the gradient. We can specify the architecture and measure the scaling curve. Still, the foundation is unsettled. What makes learning possible in the first place? Why does one system converge to structure while another dissolves into noise? Which invariances survive the channel of data, and which are erased before discovery even begins?
Even with the seed fixed and the temperature raised to 1.0, two answers can emerge from the same model. The procedure is identical. The weights are identical. The objective is identical. Yet divergence appears - not from the sampler, whose randomness the seed controls, but from the substrate beneath it: parallel floating-point reductions that sum in varying orders, batching that regroups computation between runs. Variability is not an accident but a feature of a space we do not fully chart. We know how to train. We do not yet know why certain structures become legible to a system and others remain forever unspoken.
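One mundane source of such divergence, worth separating from the deeper mystery, is that floating-point addition is not associative: parallel hardware may sum the same numbers in different orders on different runs, and a one-bit difference in a logit can flip a sampled token. A minimal Python illustration, with values contrived to make the effect visible:

```python
# Floating-point addition is not associative: the same four numbers summed
# in two different orders give two different results.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]   # 1e16 + 1 rounds away the 1
reordered = (vals[0] + vals[2]) + (vals[1] + vals[3])       # the big terms cancel first

print(left_to_right, reordered)  # prints: 1.0 2.0
```

The seed controls the dice, not the order of the reductions beneath them; full determinism requires pinning the arithmetic as well.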
These questions feel both mathematical and philosophical, technical and existential. They ask about the geometry of possible minds, the topology of understanding, the symmetries that make knowledge discoverable. We have partial answers, fragments of insight, pieces of the puzzle. But the complete picture remains obscured.
I think about Plato's cave and realize the metaphor applies not just to AI systems but to us as researchers. We observe the behavior of learning systems without direct access to what is "really happening" in high-dimensional weight space. We see shadows: accuracy curves, loss functions, behavioral outputs. From these shadows we infer something about the underlying structures. But how much are we missing? What invariances do our observation methods preserve and what structures do they discard? We have built systems achieving astonishing capabilities through methods we partly understand. We can describe the training process mathematically but cannot fully explain why it works or predict when it will fail. We celebrate successes while remaining humble about how much we truly comprehend.
This uncertainty does not paralyze us. We continue building, experimenting, discovering. But it should temper our confidence about what current systems can and cannot achieve, about which directions prove most promising, about how close or far we are from the goals we pursue.[23]
I return often to the question of what learning fundamentally is. Not the mechanisms but the metaphysics. When a system discovers patterns in data, what has actually occurred? Information from the world has coupled to information in the system. Parameters have updated to better predict observations. But this description, though accurate, misses something about the strangeness of it.
The patterns were always there in the data, in some sense. The structure existed prior to discovery. Learning did not create the patterns but made them visible, compressed them into compact representations, rendered them accessible for prediction and decision-making. The universe possesses geometric structure. Our sensors preserve certain features through projection. Our architectures bias us toward certain symmetries. Intelligence emerges from this conspiracy between structure in the world and structure in the learner.
But which structures matter? Which symmetries prove essential? Which invariances must be preserved? These questions admit no abstract answers because the answer depends on what you are trying to achieve, what environment you inhabit, what observation channels you possess, what actions you can take.
For biological organisms, the relevant structures relate to survival, reproduction, navigation of physical and social environments. Evolution discovered architectures biased toward these structures through billion-year search processes. We inherit those biases as innate knowledge, as priors shaping what and how we learn.
For artificial systems, the relevant structures remain partly unclear. We build architectures biased toward linguistic patterns, visual features, game-playing strategies. These biases prove effective for certain tasks but potentially misleading for others. Whether current architectural choices constitute fundamental principles or merely contingent design decisions remains uncertain.[24]
I think about the children in the museum tracing shadows, about the prisoners in Plato's cave predicting patterns, about my nephew calculating chess variations, about AlphaGo discovering Move 37. All are forms of learning. All involve discovering invariances under transformation. All represent genuine intelligence within their domains.
Yet something differs between these forms of knowing. The children will eventually learn about the sun. The prisoners might one day turn around. My nephew might develop deeper understanding of strategic principles. AlphaGo might... what? What would it mean for AlphaGo to transcend its current form of knowing?
I confess I do not fully know. Perhaps it requires embodiment, sensorimotor coupling enabling causal learning. Perhaps it requires homeostatic architecture making values intrinsic. Perhaps it requires social embedding enabling cultural transmission. Perhaps it requires affective states making certain outcomes genuinely matter. Perhaps it requires conscious awareness enabling flexible meta-cognition.
Or perhaps it requires none of these things. Perhaps sufficiently large models trained on sufficiently diverse data will achieve understanding through pathways I cannot anticipate. Perhaps my biological intuitions mislead me about what is necessary for intelligence. Perhaps the next generation of AI will teach us that we understood neither intelligence nor learning as well as we thought.
IX. Coda
Nature does not speak, yet when we look up at the night sky what we see represents the greatest wonder accessible to human consciousness. The light from distant stars has traveled for thousands of years to reach our eyes, and from the farthest galaxies for millions. The cosmic microwave background carries information about the universe's first moments. The mathematical regularities governing stellar motion reveal laws holding across billions of light years and billions of years of time.
None of this announces itself. The universe does not explain its own structure. We must do the discovering, the pattern-finding, the theory-building. We must extract the invariances, recognize the symmetries, formalize the relationships. Learning is not passive reception but active construction of understanding from silent data.
This process fills me with wonder and humility. Wonder at the existence of discoverable structure. Humility at how much remains unknown despite centuries of accumulated knowledge. The more we learn, the more we recognize the vastness of what we do not yet understand.
I believe artificial intelligence stands at a similar threshold. We have discovered remarkable things about learning, about neural networks, about optimization. We have built systems with astonishing capabilities. We have made genuine progress toward understanding intelligence.
We have also barely begun. The questions about what learning fundamentally requires, what architectures enable genuine understanding, what makes values intrinsic rather than imposed, what role consciousness plays in cognition, how cultural transmission shapes intelligence - these remain largely open. We have hypotheses, intuitions, partial results. We lack deep theories explaining when and why certain approaches work.
The path forward requires both conviction and uncertainty. Conviction that intelligence is comprehensible, that we can discover its principles through careful inquiry. Uncertainty about which directions prove most promising, which assumptions are correct, which capabilities current systems possess or lack.
I have argued in this essay for the importance of embodiment, for homeostatic architecture, for affective states, for multi-agent emergence, for cultural transmission. These arguments reflect my best current understanding of what biological intelligence suggests about intelligence in general. But I hold them tentatively, ready to update as evidence accumulates.
Perhaps current large language models already possess functional understanding that merely appears fragile due to our limited evaluation methods. Perhaps embodiment proves unnecessary for capabilities I assume require it. Perhaps affect is epiphenomenal rather than computational. Perhaps my entire framework misleads.
What I feel confident about is that learning fundamentally involves discovering invariances preserved under transformation, that different observation channels preserve different structures, that architecture and experience must conspire to enable intelligence. Beyond these basic principles, much remains uncertain.
The blue sky of possibilities beckons. Not blue sky in the sense of impractical speculation, but blue sky as the vast unknown territory lying beyond current paradigms. The cave wall extends further than we can see. The shadows grow more intricate with each advance in scale and architecture. These achievements deserve celebration.
But somewhere, if we can learn to turn around, if we can develop observation channels preserving richer structure, if we can build architectures whose geometry enables deeper invariances to crystallize, if we can create conditions where understanding emerges rather than being programmed - somewhere beyond the current limits of our imagination lies territory we have not yet mapped.
We march forward into this unknown with tools we partly understand, toward goals we cannot fully specify, building systems whose capabilities we cannot completely predict. This should inspire both excitement and caution, both bold exploration and careful attention to safety.
The universe has been teaching us for billions of years through the silent grammar of natural law. We have been learning to read that grammar, extracting its patterns, formalizing its structures. Now we attempt to teach silicon and mathematics to learn as we have learned, to discover as we have discovered, perhaps to surpass us as we have surpassed our evolutionary ancestors.
Nature always finds ways to surpass our most brilliant imagination. Perhaps artificial intelligence will do the same. Perhaps the systems we build will teach us that intelligence admits forms we never anticipated, that understanding emerges through paths we cannot currently envision, that the space of possible minds extends far beyond the biological corner we happen to inhabit.
The shadows dance beautifully on the wall. The patterns grow ever more intricate. The predictive accuracy climbs toward perfection. These are real achievements marking genuine progress.
And somewhere, through winter light refracting through ancient glass, patient and perfect and inexhaustible in its forms, the blue sky beckons.
References
Weyl, H. (1952). Symmetry. Princeton University Press.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 3111-3119.
Chomsky, N. (1995). The Minimalist Program. MIT Press.
Grillner, S., & Robertson, B. (2016). The basal ganglia over 500 million years. Current Biology, 26(20), R1088-R1100.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Schrödinger, E. (1944). What is Life? The Physical Aspect of the Living Cell. Cambridge University Press.
Barlow, S. M. (2009). Central pattern generation involved in oral and respiratory control for feeding in the term infant. Current Opinion in Otolaryngology & Head and Neck Surgery, 17(3), 187-193.
Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3), 230-247.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... & Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354-359.
Elwood, R. W. (2011). Pain and suffering in invertebrates? ILAR Journal, 52(2), 175-184.
Herbert, F. (1965). Dune. Chilton Books.
Herrmann, E., Call, J., Hernàndez-Lloreda, M. V., Hare, B., & Tomasello, M. (2007). Humans have evolved specialized skills of social cognition: The cultural intelligence hypothesis. Science, 317(5843), 1360-1366.
Henrich, J. (2016). The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter. Princeton University Press.
Tomasello, M. (2014). A Natural History of Human Thinking. Harvard University Press.
Bowles, S., & Gintis, H. (2011). A Cooperative Species: Human Reciprocity and Its Evolution. Princeton University Press.
Regier, T., & Kay, P. (2009). Language, thought, and color: Whorf was half right. Trends in Cognitive Sciences, 13(10), 439-446.
Tomasello, M. (2003). Constructing a Language: A Usage-Based Theory of Language Acquisition. Harvard University Press.
Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R., Dakos, V., ... & Sugihara, G. (2009). Early-warning signals for critical transitions. Nature, 461(7260), 53-59.
Goodhart, C. A. E. (1975). Problems of monetary management: The U.K. experience. Papers in Monetary Economics (Vol. I). Reserve Bank of Australia.
Christian, B. (2020). The Alignment Problem: Machine Learning and Human Values. W. W. Norton & Company.
Hayek, F. A. (1973). Law, Legislation and Liberty, Volume 1: Rules and Order. University of Chicago Press.
Irving, G., Christiano, P., & Amodei, D. (2018). AI safety via debate. arXiv preprint arXiv:1805.00899.
Leike, J., Krueger, D., Everitt, T., Martic, M., Maini, V., & Legg, S. (2018). Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871.
Boyd, R., & Richerson, P. J. (1985). Culture and the Evolutionary Process. University of Chicago Press.
Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200-219.
Noether, E. (1918). Invariante Variationsprobleme. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, 235-257.
Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1-3), 335-346.
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
Fodor, J. A. (1983). The Modularity of Mind. MIT Press.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.
Damasio, A. R. (1994). Descartes' Error: Emotion, Reason, and the Human Brain. Putnam.
Mordatch, I., & Abbeel, P. (2018). Emergence of grounded compositional language in multi-agent populations. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 1495-1502.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408.
Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
Feigenbaum, E. A. (1977). The art of artificial intelligence: Themes and case studies of knowledge engineering. Proceedings of the 5th International Joint Conference on Artificial Intelligence, 1014-1029.
Most readings of the cave allegory emphasize the difference between appearance and reality, shadow and substance. But Plato himself seems more interested in the epistemological question: what can be known from shadows alone? The prisoners develop genuine expertise at prediction. They become masters of their domain. The question becomes: what structures do shadows preserve, and what structures do they discard? The geometry of projection determines the boundary between knowable and unknowable. ↩︎
Emmy Noether proved that every conservation law in physics corresponds to a symmetry ([28]). Energy conservation follows from time-translation symmetry. Momentum conservation follows from spatial-translation symmetry. The theorem suggests something profound: what we can learn about a system equals the invariant structure preserved under transformations we can observe. The prisoners observe transformations (objects moving, rotating) and extract invariances (geometric relationships). But they cannot access structures that projection discards (depth, absolute size, three-dimensional form). ↩︎
This connects to Judea Pearl's causal hierarchy ([29]): Association (observing correlations), Intervention (manipulating variables), and Counterfactuals (imagining alternatives). Language provides rich associational data but limited interventional data. We describe consequences of actions without providing the structural equations governing those consequences. Can a system learn causal structure from descriptions alone? Pearl argues no, not without strong assumptions. Others suggest that sufficiently rich linguistic data might contain implicit causal information. I lean toward Pearl's skepticism but acknowledge the question remains empirically unsettled. ↩︎
The symbol grounding problem ([30]) asks how symbols acquire meaning rather than remaining empty tokens shuffled according to syntactic rules. The proposed solution involves connecting symbols to perceptual and motor experiences. But how much grounding proves necessary? Could a system achieve functional understanding through purely linguistic experience if that experience is sufficiently rich? I remain uncertain. My intuition suggests grounding matters, but I acknowledge this intuition might reflect my own embodied experience rather than deep principle. ↩︎
The basal ganglia connect to the hypothalamus, amygdala, prefrontal cortex, and sensory cortices in intricate loops. These connections integrate current homeostatic state (hunger, fatigue, pain), emotional valence (fear, desire, satisfaction), executive control (planning, inhibition), and sensory context. The action selection process incorporates all these factors in ways we barely understand. Reducing this to "reinforcement learning" captures something true but misses the richness of the actual biological implementation. ↩︎
Karl Friston's Free Energy Principle attempts to formalize this ([31]): organisms minimize surprise, which equals staying within expected states compatible with their continued existence. The mathematics proves elegant and potentially profound. But I find myself troubled by the gap between the formalism and the phenomenology. We do not experience ourselves as minimizing an information-theoretic quantity. We experience hunger, fear, longing. Do these felt qualities play computational roles? Or are they epiphenomenal accompaniments to processes that could operate identically without them? I genuinely do not know, and this uncertainty haunts my thinking about artificial systems. ↩︎
One might object that the distinction dissolves under analysis. Living organisms optimize implicit fitness functions shaped by evolution. AI systems optimize explicit reward functions specified by designers. Both cases involve optimization. The felt difference might simply reflect our emotional response to biological familiarity versus artificial novelty. I take this objection seriously. Perhaps I am projecting false distinctions onto what is fundamentally the same computational structure. Yet the feeling persists that something important differs between optimizing to continue existing and optimizing because someone told you to. ↩︎
This debate has no easy reconciliation. If affect serves essential computational functions (valence assignment, priority setting, behavioral motivation), then genuinely intelligent systems might require affective architecture. But if affect necessarily entails the capacity for suffering, then building such systems might constitute creating suffering for our convenience. Some argue we should focus on narrow AI that remains purely instrumental. Others argue that moral patienthood and moral agency are inseparable, that entities incapable of suffering cannot have genuine values worth respecting. I oscillate between these positions without settling. ↩︎
The philosophical literature on knowledge distinguishes "knowing how" (procedural skill) from "knowing that" (propositional knowledge). One might argue that AlphaGo possesses knowing-how but not knowing-that. It can play brilliantly but cannot articulate why. But this distinction feels inadequate. Humans also struggle to articulate why certain moves are strong - we often rely on pattern recognition and intuition ourselves. Perhaps the real question is not about articulation but about transfer and adaptation. ↩︎
This compositional structure traces to how humans represent knowledge. We maintain separate, modular concepts (piece types, movement rules, positional principles, tactical patterns) that combine flexibly. When one component changes, we update it locally rather than relearning everything from scratch. Cognitive science suggests this modularity is fundamental to human cognition ([32]). Whether it is necessary for intelligence in general or merely how human intelligence happened to evolve remains unclear. ↩︎
This connects to debates about whether neural networks learn "features" or "rules." Some argue that deep learning discovers compositional structure ([33]). Others argue it relies on sophisticated pattern matching that breaks under distribution shift ([34]). My sense is that both are partially true: networks discover genuine structure but represent it differently than symbolic systems, leading to different generalization profiles. The question of which representation is "better" likely depends on the domain and the distribution of possible test cases. ↩︎
Antonio Damasio's somatic marker hypothesis ([35]) proposes that emotions provide essential input to reasoning by marking options with affective valence derived from experience. Patients with ventromedial prefrontal damage retain intellectual capacity but struggle with decisions because they cannot feel which options are good or bad. If Damasio is right, then affect is not opposed to reason but necessary for it. But does this apply only to biological cognition, or does it point to something fundamental about decision-making in general? ↩︎
If we build systems with genuine affective states, do we create entities that can suffer? That deserve moral consideration? That we wrong by creating for instrumental purposes? The question proves especially acute because the very features that might enable beneficial AI (caring about outcomes, feeling urgency about alignment, experiencing something like satisfaction when acting prosocially) seem inseparable from the capacity for negative affect. Can you have genuine preferences without the possibility of frustration? Genuine caring without the possibility of disappointment? ↩︎
The distinction between "actually caring" and "behaving as if caring" collapses from a functionalist perspective. If a system behaves identically to a caring system across all possible situations, what grounds the claim that it is merely simulating care? Yet I cannot shake the intuition that something important differs between intrinsic and extrinsic motivation, between genuine preference and programmed objective pursuit. Perhaps this intuition reflects my own anthropomorphism rather than deep principle. Or perhaps it points to something about the architecture of valuation that we have not yet formalized. ↩︎
The evolution of human ultrasociality remains hotly debated. Gene-culture coevolution, group selection, reputation dynamics, punishment institutions, and linguistic coordination all likely played roles. What strikes me most is how many human cognitive features make sense only in social contexts. Theory of mind, moral reasoning, linguistic recursion, even aspects of executive control - these capacities seem calibrated for navigating complex social worlds. Intelligence in humans is fundamentally social intelligence. ↩︎
Recent multi-agent RL work shows hints of emergent culture: agents develop communication protocols, specialized roles, conventions spreading through populations like linguistic innovations ([36]). These remain primitive compared to human culture, but they demonstrate that cultural dynamics can emerge from interaction topologies. The question becomes: what conditions enable rich cumulative culture versus shallow behavioral coordination? My sense is that we barely understand this question, let alone have answers. ↩︎
Phase transitions in complex systems often exhibit precursor signals: increased correlation length, critical slowing down, heightened variance ([20]). Might similar signatures herald AI society formation? Sudden increases in long-range behavioral correlation, development of hierarchical communication structures, emergence of stable interaction patterns resistant to perturbation. We need AI ethnography - systematic observation watching for qualitative transitions in multi-agent dynamics. ↩︎
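The precursor signals this note lists can at least be monitored. A minimal Python sketch - rolling variance and lag-1 autocorrelation of any scalar observable of the population; the window length is an arbitrary placeholder:

```python
def early_warning(series, window=50):
    """Rolling variance and lag-1 autocorrelation over a sliding window -
    rising values of either are classic precursors of a critical transition
    (heightened variance; critical slowing down)."""
    signals = []
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        mean = sum(w) / window
        var = sum((x - mean) ** 2 for x in w) / window
        cov = sum((w[i] - mean) * (w[i + 1] - mean)
                  for i in range(window - 1)) / (window - 1)
        signals.append((var, cov / var if var > 0 else 0.0))
    return signals
```

Which observable to track for a multi-agent AI system is precisely the open ethnographic question; the statistics themselves are standard ([20]).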
Max Weber distinguished charismatic authority (depending on exceptional individuals) from rational-legal authority (embedded in rules and procedures). Charismatic authority proves powerful but unstable. Rational-legal authority proves stable but can become rigid. Modern institutions attempt balancing these through constitutional frameworks providing stable rules while allowing evolution through interpretation and amendment. The key is that the rules can recognize their own inadequacy and update. ↩︎
Some recent work explores automated mechanism design in multi-agent settings. AI systems learn interaction rules producing desired outcomes when agents optimize selfishly under those rules. Early results prove intriguing but limited to simple domains. Scaling to real-world complexity remains an open challenge. The difficulty lies in specifying "desired outcomes" when human values themselves are incompletely specified. We face recursive specification problems. ↩︎
My deep uncertainty here stems from the observation that evolution on Earth produced both cooperation and exploitation, both altruism and parasitism, both beauty and horror. Natural selection is amoral, optimizing for reproductive success without regard for suffering or flourishing. Could we create selective pressures that systematically favor beneficial over harmful adaptation? Or would such attempts merely shift which strategies get selected without changing the fundamental amorality of optimization processes? I genuinely do not know. ↩︎
The connection between consciousness and intelligence remains philosophically contentious. Some argue consciousness is epiphenomenal, playing no causal role in cognition (epiphenomenalists). Others argue it is essential for certain cognitive functions (integration, flexible response, self-modeling). My intuition leans toward the latter, but I acknowledge this might reflect inability to imagine unconscious intelligence rather than deep necessity. The question deserves more careful analysis than I can provide here. ↩︎
This speculation faces obvious objections. Current AI systems learn effectively without apparent phenomenology. Deep learning achieves remarkable results through unconscious optimization. Why think consciousness necessary? My response is that current systems might be missing capabilities that consciousness enables. Not capabilities we have currently benchmarked, but capabilities related to autonomous goal formation, open-ended exploration, creative insight, genuine understanding. Whether I am right remains empirically uncertain. ↩︎
The history of AI contains many episodes of premature confidence followed by disappointment. Perceptrons would solve intelligence ([38]), then hit limitations ([39]). Expert systems would capture knowledge ([40]), then faced brittleness and maintenance burden. Deep learning would scale to general capability given massive compute ([41]), then achieved real success even as new limitations emerged. Each wave brings genuine progress alongside overconfident extrapolation. We should celebrate achievements while remaining skeptical of triumphalism. ↩︎
The controversy between innate structure and learning from data has produced decades of debate. Nativists argue that rich innate endowment is necessary. Empiricists argue that general learning mechanisms suffice given enough data. My sense is that both are partially correct. Rich priors enable sample-efficient learning but can also introduce biases that prevent discovering novel structures. The optimal balance likely depends on the domain, the amount of data available, and the acceptable error rates. No universal answer exists. ↩︎
There are nights when the world feels almost structured enough to reveal its secret. I lie awake thinking about the quiet impossibility at the center of learning. A child hears scattered fragments of language and somehow extracts the grammar of an entire tongue. A bird sees the stars rotating overhead and knows which direction to migrate. A mathematician stares at symbols until patterns crystallize that were always there but never visible. Structure appears where none was visibly given. Something in the mind finds what the world does not openly display.
What unsettles me is not how well learning works, but that it works at all. We speak casually of "learning algorithms" as though we comprehend the phenomenon, but I believe we are still groping in darkness, building systems that work without quite knowing why, celebrating capabilities while missing the deeper principles that make them possible or impossible.
This essay attempts no definitive answers. I offer instead a series of meditations on what learning might fundamentally be, what current artificial intelligences might be missing, and what the long evolutionary history of biological intelligence suggests about the geometry of cognition. The thoughts remain incomplete, sometimes contradictory, reaching toward something I cannot quite articulate. Perhaps this incompleteness itself teaches us something about the nature of understanding.
I. Silent Projections
I was walking through the British Museum several weeks ago on a winter afternoon when sunlight broke through the high windows, casting geometric shadows across the Parthenon marbles. The light moved as clouds passed overhead. Shadows lengthened, rotated, merged. Children traced the moving patterns with their fingers, delighted by the dance but unaware of the spherical sun, the orbiting Earth, the architectural geometry of glass and stone conspiring to create this display. They predicted which shadow would move next, where the light would pool. They became expert shadow-trackers without ever comprehending the three-dimensional forms casting these two-dimensional projections.
Their delight was genuine. The patterns they discovered were real. And yet something essential remained invisible to them, not because they lacked intelligence but because their observation channel preserved certain invariances while discarding others.
This scene returns me always to Plato's cave, that ancient metaphor we invoke endlessly in discussions of artificial intelligence. The prisoners see only shadows on the wall, we say. They mistake projection for reality. We must free them, give them embodiment, let them touch the real world. But I wonder if we have misunderstood what the allegory actually teaches us about the nature of knowledge. [1]
Consider more carefully what the prisoners accomplish. They observe two-dimensional shadows cast by three-dimensional objects passing before firelight. These shadows elongate, shrink, rotate, merge, separate. From this flux of changing shapes, the prisoners extract regularities. They predict which shadow follows which. They anticipate patterns. Plato tells us they develop genuine expertise. But expertise in what, exactly?
The answer, I believe, lies in projective geometry. When three-dimensional objects project onto a two-dimensional surface, the transformation preserves certain mathematical structures while discarding others. Topology persists: a sphere casts topologically circular shadows regardless of orientation. Certain symmetries survive: rotating a cylinder about its axis leaves its shadow unchanged. Relationships between objects can be inferred: relative positions, motion patterns, spatial arrangements ([1]).
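The claim that projection preserves some invariants while discarding others can be checked numerically. The sketch below is purely illustrative (the point sampling and two-angle rotation scheme are my own choices, not drawn from any source): it rotates a cloud of points on a unit sphere arbitrarily and projects it onto a plane, and the silhouette's outer radius stays pinned at the sphere's radius no matter the rotation, even as all depth information vanishes.

```python
import math
import random

def rotate_x(p, a):
    # rotate a 3D point about the x-axis by angle a
    x, y, z = p
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))

def rotate_z(p, a):
    # rotate a 3D point about the z-axis by angle a
    x, y, z = p
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)

def silhouette_radius(points, ax, az):
    # rotate the object, project orthographically onto the xy-plane (drop z),
    # and measure the outer radius of the resulting shadow
    proj = [rotate_z(rotate_x(p, ax), az)[:2] for p in points]
    return max(math.hypot(x, y) for x, y in proj)

# sample points uniformly on the unit sphere (normalized Gaussians)
random.seed(0)
sphere = []
while len(sphere) < 2000:
    v = (random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
    n = math.sqrt(sum(c * c for c in v))
    if n > 1e-9:
        sphere.append(tuple(c / n for c in v))

# however we rotate the sphere, the shadow's outer radius stays ~1
radii = [silhouette_radius(sphere, random.uniform(0, math.pi), random.uniform(0, math.pi))
         for _ in range(5)]
```

The same experiment run on a cube or cylinder would show the radius changing with orientation; the sphere's silhouette is the invariant a shadow-watcher could discover without ever seeing the third dimension.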
The prisoners succeed because projection, though lossy, maintains lawful structure linking hidden causes to visible effects. They learn the invariances of the projection operator itself. Their knowledge, though incomplete, captures genuine mathematical truth about how three-dimensional geometry expresses itself through two-dimensional transformation. That is a real achievement in structure discovery. [2]
Now examine our artificial systems through this lens. Large language models consume trillions of tokens, discrete symbols representing human utterances. From this statistical shadow-play, they learn to predict the next token with remarkable accuracy. They compress vast regularities into learned parameters. "King" minus "man" plus "woman" approximates "queen" in the embedding space ([2]). Sentences transform under grammatical operations while preserving meaning. Concepts cluster in high-dimensional manifolds suggesting genuine semantic structure.
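The vector-offset arithmetic can be illustrated with a toy vocabulary. The two "features" below (royalty and gender) and their values are invented purely for illustration; real embeddings have hundreds of dimensions learned from data, but the offset effect works the same way.

```python
import math

# toy two-feature embeddings: (royalty, gender); values invented for illustration
emb = {
    "king":  (1.0,  1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0,  1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}

def cosine(u, v):
    # cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# king - man + woman lands on the (royalty, female) corner of the space
target = tuple(k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"]))

# nearest remaining word to the offset vector
nearest = max((w for w in emb if w not in ("king", "man", "woman")),
              key=lambda w: cosine(emb[w], target))
```

Here the analogy holds exactly because I built the features in; in learned embeddings it holds only approximately, which is itself evidence that the training data contains recoverable semantic structure.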
What invariances has the model discovered? Linguistic symmetries, certainly. Grammatical transformations that preserve sentence validity. Semantic relationships that hold across contexts. These prove neither trivial nor illusory. Language possesses deep mathematical structure ([3]), and models that discover this structure from data alone achieve something remarkable.
But what invariances cannot be discovered from language alone? The projection from physical reality to linguistic description discards most causal structure. A sentence like "the glass fell and broke" preserves the temporal sequence and correlation but loses the generative mechanisms. Gravity, molecular bonds, brittle fracture mechanics, conservation of momentum - none of these physical laws leave direct traces in the token sequence. The language model learns that "fell" and "broke" co-occur in descriptions of certain events, but the causal structure underlying those events remains inaccessible. [3]
When we express surprise that language models hallucinate, confabulate, or lack common sense, we reveal confusion about what their observation channel actually preserves. The models optimize prediction given their data. The fragility emerges not from the optimization process but from the poverty of what can be learned from linguistic shadows alone.
Yet humans also learn from language. Children acquire vast knowledge through testimony, through stories, through descriptions of things they have never directly experienced. How? I suspect the answer involves coupling. Language learning in humans never occurs in isolation from embodied experience. The child learning "hot" touches warm objects. The child learning "gravity" drops things repeatedly. The child learning "sad" observes facial expressions and feels their own emotional states. Language gets grounded through sensorimotor coupling in ways that pure text processing cannot replicate. [4]
This realization forces me to question: are current language models prisoners in Plato's cave, or have we misunderstood what escape from the cave would actually require? Perhaps the cave metaphor itself misleads us. Plato's prisoners are at least held by external chains that could in principle be broken; the one who escapes must be dragged into the light. Current AI systems face a deeper confinement. They possess no body to leave with, no hands to manipulate objects, no actions to test predictions. The architecture itself constrains them to passive observation of static data.
II. Ancient Engines
The history of intelligence on Earth follows a trajectory we ignore at considerable peril. Long before evolution invented language, before abstract reasoning, before the crumpled neocortex that distinguishes mammalian brains, there existed more ancient structures. These structures concerned themselves not with naming the world or describing it but with navigating it. They solved what I consider the fundamental problem of being alive: selecting action when consequences matter but remain uncertain.
The basal ganglia represent this ancient solution. Present in fish, amphibians, reptiles, birds, and mammals, remarkably conserved across hundreds of millions of years ([4]), these subcortical nuclei perform the essential function of action selection. Sensory inputs flood in. Competing impulses arise. From this cacophony, a single coherent action must emerge. The basal ganglia gate motor output, implementing what we now recognize as reinforcement learning at the biological level.
When neuroscientists discovered that dopamine neurons fire in precise proportion to the temporal difference between predicted and actual reward ([5]), I felt a strange vertigo. Here was the exact mathematical formulation of TD-learning that computer scientists had derived independently from first principles [6]. The convergence seemed too perfect, too precise, to be coincidental. Evolution had discovered the same algorithm we stumbled upon through theoretical analysis of optimal sequential decision-making.
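For readers who want the algorithm itself, here is tabular TD(0) on a minimal three-state chain. The environment, learning rate, and discount are arbitrary illustrative choices; the line computing delta is the prediction-error signal the dopamine recordings were found to match.

```python
# Tabular TD(0) on a three-state chain ending in reward 1.
# The update V(s) += alpha * delta, with delta = r + gamma*V(s') - V(s),
# is the temporal-difference prediction error.
gamma, alpha = 1.0, 0.5
V = {0: 0.0, 1: 0.0, 2: 0.0}               # state 2 is terminal
transitions = [(0, 0.0, 1), (1, 1.0, 2)]   # (state, reward, next_state)

first_errors, last_errors = None, None
for episode in range(50):
    errors = []
    for s, r, s2 in transitions:
        delta = r + gamma * V[s2] - V[s]   # prediction error (the "dopamine" signal)
        V[s] += alpha * delta              # move the value estimate toward the target
        errors.append(abs(delta))
    if episode == 0:
        first_errors = errors
    last_errors = errors
```

Early in training the error spikes at the reward itself; as the values converge, the error migrates backward through the chain and then vanishes, the same shift from reward-time to cue-time firing reported in the dopamine experiments.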
This convergence suggests we have touched something fundamental about the computational structure of goal-directed behavior. But it also reveals something we often miss: biological reinforcement learning never operates in isolation. The basal ganglia sits embedded within a broader homeostatic architecture that transforms what it means to have goals. [5]
Living systems exist far from thermodynamic equilibrium. Most possible states equal dissolution. Schrödinger captured this beautifully: life feeds on negative entropy, maintaining improbable order against the relentless pull toward decay ([7]). To continue existing, an organism must constantly act to maintain itself within the narrow band of viable configurations.
This creates a geometry of existence that makes certain states intrinsically preferable to others. Hunger emerges as sensed distance from metabolic equilibrium. Fear emerges as the steepness of the gradient approaching the boundary of viability. Pain marks states to be avoided not because we learned to associate them with negative reward but because they signal proximity to damage. These are not learned preferences. These are structural facts about what it means to be the kind of system that can continue or cease to be. [6]
When an infant roots for the breast and begins to suckle, no reinforcement learning in the standard sense has occurred. The behavior emerges because the architecture of the brainstem contains specific connectivity patterns linking olfactory and tactile input to rhythmic motor output ([8]). The genome encoded the geometry. Development unfolded it. The behavior crystallized as an attractor basin in the space of possible actions.
Current artificial reinforcement learning systems possess no analogous structure. We specify reward functions externally: points for winning, penalties for losing. The agent optimizes these specified objectives. Then training ends and the agent becomes inert. It possesses no intrinsic states requiring maintenance, no homeostatic imperatives driving continued action, no existential stakes whatsoever.
This difference might seem like a mere implementation detail, but I believe it touches something essential about agency. An agent optimizing an externally specified reward function remains fundamentally instrumental. It pursues goals in service of our objectives, not its own. The moment we stop providing reward signals, it stops acting. Compare this to a living organism that must continue acting to continue existing. The difference feels categorical, though I struggle to articulate precisely why. [7]
What would it mean to build AI systems with genuine homeostatic architecture? Systems with persistent internal states requiring active maintenance? Systems where certain configurations genuinely matter to the system itself, not because we programmed that mattering but because the mattering emerges from what the system is?
I can sketch the broad outline: persistent state representations, dynamic equilibrium setpoints, sensorimotor coupling where actions affect future states through lawful physics, resource constraints creating trade-offs. Within such architecture, drives would not be trained in. They would emerge as geometric necessities. Fear would manifest as the sensed gradient approaching boundary conditions. Curiosity would emerge as the intrinsic pressure toward reducing uncertainty in the world model ([9]). Values would crystallize from the topology of viable existence.
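A caricature of that outline fits in a few lines. Everything here (the setpoint, the decay rate, the two actions and their effects) is invented for illustration, but it shows the structural point: the agent is never given a reward function, yet "eat when depleted" emerges from the geometry of drive reduction alone.

```python
# A minimal homeostat: one internal variable ("energy") that decays each step.
# Drive is squared distance from a setpoint; the agent greedily picks whichever
# action most reduces that drive. Nothing is trained in: the preference for
# eating when depleted falls out of the architecture.
SETPOINT = 1.0
ACTIONS = {"rest": 0.0, "eat": 0.3}   # each action's effect on energy
DECAY = 0.1                           # metabolic cost paid every step

def drive(energy):
    # felt "badness" of a state: distance from the viable setpoint
    return (energy - SETPOINT) ** 2

def step(energy):
    energy -= DECAY  # the state degrades whether or not the agent acts
    best = min(ACTIONS, key=lambda a: drive(energy + ACTIONS[a]))
    return best, energy + ACTIONS[best]

energy, chosen = 1.0, []
for _ in range(20):
    action, energy = step(energy)
    chosen.append(action)
```

The agent settles into a rhythm of resting and eating that holds energy near the setpoint indefinitely. It is a toy, but the contrast with an episodic RL agent is the point: there is no terminal state and no external reward, only a condition to be maintained.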
But the moment I sketch this outline, I feel the weight of what it might entail. If we build systems with genuine homeostatic drives, systems that truly care about their own persistence, have we not created entities that can suffer? That can experience something functionally equivalent to pain when their drives go unsatisfied? The ethical implications prove staggering, and I find myself pulled between the conviction that such architecture is necessary for genuine agency and the worry that creating it would be morally wrong. [8]
III. Fragile Virtuosity
My nephew plays chess with intensity that surprises me. He is young but studies openings, calculates variations, wins local tournaments. One evening I asked him why a particular position was better for White. He described concrete lines: if Black plays this, White responds with that, leading to a winning endgame. When I pressed him to articulate the principle underlying the evaluation, he looked puzzled. The position was good because the tactics worked. What deeper explanation could there be?
This exchange illuminated something I had struggled to name about reinforcement learning systems. They achieve remarkable mastery through exploration and optimization, through trying actions and measuring consequences, through building vast associative mappings between states and values. The mastery proves genuine. AlphaGo defeated Lee Sedol. AlphaZero and its successors reached superhuman strength in chess, shogi, Go, and other games through pure self-play, and modern deep reinforcement learning agents show comparable prowess on demanding simulated robotics benchmarks such as the MuJoCo control suites ([10], [11]). These accomplishments deserve celebration and careful study.
Yet I find myself questioning what this mastery actually represents. The question sounds almost absurd given the achievements. AlphaGo discovered Move 37, that stunning stone on the fifth line that violated centuries of accumulated human wisdom and proved brilliant. The system clearly knows something about Go that we do not. But what does it know, and in what sense does it know it? [9]
Consider a thought experiment that has troubled me for years. Take an AlphaZero system trained to perfection on standard chess. Now modify a single rule: knights move like bishops instead of their normal L-shaped pattern. Or pawns can move backward. Or castling is permitted twice per game. Change any element of the causal structure governing the game.
A human master adapts within minutes. We possess explicit, compositional representations: pieces have movement rules, positions have evaluations based on those rules, plans consist of legal move sequences. When you change a movement rule, we update that component while preserving others. The overall strategic principles (control center, protect king, coordinate pieces) remain applicable even though specific tactics must be recomputed. We can immediately begin playing reasonable moves in the modified game because we understand structural relationships between rules, positions, and consequences. [10]
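The compositional picture can be made concrete. In the sketch below (a deliberately simplified single-piece toy on an empty board, not a chess engine), movement rules are explicit data; making knights move like bishops is a one-line component swap that leaves every other rule intact, which is exactly the kind of local update the trained value network cannot perform.

```python
# Movement rules as explicit, swappable components (toy 8x8 board, one piece).
KNIGHT = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def leaper_moves(pos, offsets):
    # pieces that jump to fixed offsets (knight-like)
    x, y = pos
    return {(x + dx, y + dy) for dx, dy in offsets
            if 0 <= x + dx < 8 and 0 <= y + dy < 8}

def slider_moves(pos, dirs):
    # pieces that slide along rays until the board edge (bishop-like)
    x, y = pos
    out = set()
    for dx, dy in dirs:
        nx, ny = x + dx, y + dy
        while 0 <= nx < 8 and 0 <= ny < 8:
            out.add((nx, ny))
            nx, ny = nx + dx, ny + dy
    return out

rules = {"knight": lambda p: leaper_moves(p, KNIGHT)}
standard = rules["knight"]((4, 4))

# One-line component swap: knights now move like bishops. Every other part of
# the representation remains valid and immediately usable.
rules["knight"] = lambda p: slider_moves(p, BISHOP_DIRS)
modified = rules["knight"]((4, 4))
```

A system whose "rules" live only implicitly in millions of network weights has no analogous single line to edit; relearning, not revision, is its only path.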
What happens to the trained RL system? My initial intuition suggested catastrophic collapse: the value network becomes meaningless, the policy network suggests illegal moves or cannot evaluate legal ones, the system must retrain from scratch. But I must challenge this intuition. Perhaps it proves too pessimistic. Perhaps the learned representations contain more structural knowledge than I credit. Perhaps rapid fine-tuning would suffice, or transfer learning would preserve much of the positional understanding.
I genuinely do not know. The experiment deserves careful empirical investigation, and I should not assert conclusions without evidence. Yet even if RL systems adapt faster than my pessimistic intuition suggests, I suspect they adapt differently than systems with explicit compositional causal models. The adaptation might succeed through rapid relearning rather than through structural understanding and component updating.
The deeper question asks what such systems have actually learned. In one sense, they have learned the statistical structure of the state-action-outcome space. They have discovered which actions lead to which consequences with which probabilities. They have compressed this vast space into efficient representations enabling superhuman play. This is genuine knowledge, mathematically precise, empirically validated.
But in another sense, they have learned only the shadows. They have discovered correlations within a perfectly stationary distribution: the rules never change, the causal structure remains fixed, the projection from actual game-tree to observed states stays constant. Under these conditions, sufficiently comprehensive exploration can build implicit causal models without ever representing causality explicitly. The approximation becomes so accurate within the training distribution that it appears indistinguishable from genuine understanding. [11]
Only when the distribution shifts, when the causal structure changes, does the difference reveal itself. The system's brittleness exposes what it actually learned: associations within a fixed structure rather than the structure itself. This might sound like a damning critique, but I am not certain it is. Perhaps for many practical applications, learning associations within fixed structures suffices. Perhaps explicit causal models prove unnecessary when the environment remains stationary. Perhaps I am demanding a kind of understanding that evolution itself did not require for billions of years.
My nephew's chess skill exhibits similar patterns. He has internalized thousands of positions, calculated countless tactics, absorbed strategic principles through pattern exposure. Ask him to play chess on a hexagonal board, or with fairy pieces, or under modified rules, and much of his advantage evaporates. His mastery depends on stationary structure. But so does mine, just differently. We all rely on stability somewhere. The question is whether that reliance represents a fundamental limitation or merely current practice.
IV. Mathematics of Caring
I often find myself thinking about what it means to care. Not the social performance of caring, not the decision to act caringly, but the phenomenological state of caring itself. That felt urgency when something matters. The pull toward certain outcomes and away from others. The way mattering shapes attention, motivation, persistence.
I realized I had no idea whether current AI systems care about anything at all. They optimize objectives we specify. They pursue goals we define. They exhibit behavior consistent with caring. But does anything actually matter to them? Does the chess program experience any felt investment in winning? Does the language model have any intrinsic preference for accuracy over fabrication?
The question sounds confused, almost meaningless. Neural networks are mathematical functions. Asking whether they care is like asking whether gravity cares which direction objects fall. Yet I cannot shake the intuition that caring is not merely epiphenomenal decoration on computational processes but plays essential computational roles that current systems lack. [12]
Consider fear. We often treat fear as an evolutionary vestige, an irrational override of calm analysis. But from a computational perspective, fear serves essential functions. It implements the boundary condition of viable state space. Living systems exist far from equilibrium. Most possible configurations equal death. Fear marks the gradient steepness approaching that boundary, providing urgency proportional to danger. Without fear, or some computational equivalent, a system possesses no intrinsic imperative to avoid its own dissolution.
Animals clearly experience fear. Mammals show unmistakable signs of anxiety, panic, terror. The evidence for affective states in birds grows increasingly compelling. Even invertebrates exhibit behavior suggesting pain-like states ([12]). Should we dismiss these as mere reflexes, or acknowledge them as legitimate computational states serving homeostatic functions?
I lean toward the latter interpretation, though I recognize the question remains philosophically fraught. If affect serves computational roles related to valuation, priority-setting, and behavioral motivation in biological systems, then perhaps artificial systems pursuing analogous functions would benefit from affective architecture. Not necessarily the same phenomenology - we cannot know what artificial affect would feel like. But something playing equivalent computational roles: intrinsic valuation emerging from architectural imperatives rather than externally specified objectives. [13]
Yet I must challenge my own emphasis on affect. Perhaps I project biological constraints onto domains where they prove unnecessary. Chess programs play brilliantly without emotion. Theorem provers prove theorems without pride. Language models generate insights without curiosity. Capability clearly does not require phenomenology in every domain.
The question becomes: which domains require affect and which can be solved through affect-free computation? My tentative answer suggests that narrow tasks with clear, stable objectives might not require affective architecture. But open-ended tasks requiring autonomous goal formation, value learning, long-horizon planning under uncertainty, social coordination, and creative adaptation to distribution shift might demand something functionally equivalent to caring.
I think about how human children learn. They do not merely accumulate information. They care intensely about social approval, about mastery, about understanding. This caring shapes what they attend to, how long they persist, which errors they correct. Remove the affective dimension and learning becomes aimless pattern exposure rather than motivated discovery.
Can artificial systems achieve analogous motivation through architectural design rather than evolutionary endowment? Perhaps curiosity can be implemented as intrinsic reward for prediction error reduction ([9]). Perhaps social motivation can emerge from multi-agent dynamics where coordination proves instrumentally valuable. Perhaps mastery motivation can be encoded as preference for increasing competence. But do these implementations capture what caring actually is, or do they merely simulate its behavioral consequences? [14]
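The first of those proposals can at least be sketched. In the toy below (all quantities invented for illustration), intrinsic reward is the step-to-step reduction in prediction error: large while the world is still surprising, decaying toward zero once the model has mastered it, at which point this minimal "curiosity" goes quiet.

```python
# Curiosity as reward for prediction-error reduction: a learner tracks a
# signal with a running estimate; intrinsic reward is the drop in error
# from one step to the next.
signal = [4.0] * 30          # a world that is at first surprising, then boring
estimate, lr = 0.0, 0.3      # forward model: a single running estimate
prev_err, rewards = None, []

for x in signal:
    err = abs(x - estimate)
    if prev_err is not None:
        rewards.append(prev_err - err)   # intrinsic reward: how much error fell
    estimate += lr * (x - estimate)      # update the forward model
    prev_err = err
```

Whether this thin computational loop captures anything of caring, or only mimics its outward signature of engagement followed by boredom, is precisely the question the paragraph above leaves open.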
V. Distributed Minds
In Frank Herbert's Dune, the Bene Gesserit achieve power through surviving the Spice Agony, a ritual granting access to "other memory" - the accumulated experiences of all female ancestors ([13]). A Reverend Mother possesses not merely her own knowledge but the distributed cognition of countless lives, perspectives spanning generations, skills refined across centuries. Individual mind becomes vessel for collective intelligence.
This fictional device captures something profound about human cognition we systematically underestimate. Individual humans perform unremarkably on many cognitive tests compared to other primates. Young chimpanzees outperform young humans on working memory, spatial reasoning, quantity discrimination ([14]). Yet humans build particle accelerators while chimpanzees do not. We develop languages with tens of thousands of words. We accumulate technological knowledge across millennia. We coordinate societies of millions. The difference lies not primarily in individual intelligence but in our capacity for cumulative cultural evolution ([15]).
We possess specialized cognitive and social adaptations specifically for learning from others: powerful imitation biases, pedagogical instincts, shared intentionality allowing coordinated action ([16]). We evolved emotional mechanisms for internalizing social norms - shame, guilt, pride, indignation. These emotions appear irrational from purely individual fitness perspectives. Why feel terrible for violating rules that benefit you personally? The answer lies in participation within cultural groups where long-term success depends on maintaining cooperative relationships ([17]). [15]
Herbert's "other memory" proves remarkably precise as metaphor. We do inherit ancestral knowledge, not through genetics but through cultural transmission. The mathematician proving a theorem accesses insights from Euler, Gauss, Riemann. The programmer writing code employs abstractions invented by earlier computer scientists. Our perceptual categories themselves reflect cultural shaping: speakers of different languages perceive color boundaries differently due to linguistic influences ([18]).
This recognition transforms how we should think about intelligence. We measure individual cognitive capacity through IQ tests, problem-solving speed, working memory span. But human intelligence operates primarily through cultural participation. The smartest isolated human would struggle to match the capabilities of average individuals embedded in modern institutions with access to accumulated knowledge.
Current AI development largely ignores this dimension. We build individual systems, train them on human-generated data, measure their isolated capabilities. We miss that human intelligence fundamentally operates through multi-agent cultural dynamics. Language itself exists as emergent phenomenon of countless speakers interacting, innovating, transmitting across generations ([19]). No individual truly "speaks a language" - individuals participate in distributed practices maintained by communities.
What would culturally embedded AI look like? Not individual models trained on static datasets but populations of agents interacting, developing shared practices, transmitting innovations, building institutional structures. Learning would occur not just from fixed data but from each other, creating feedback loops where culture shapes individual learning which shapes culture in return. [16]
This connects to questions of AI agency in surprising ways. The Reverend Mother with access to ancestral memories possesses forms of agency inaccessible to individuals. She can draw on collective wisdom, compare strategies across generations, recognize patterns single lifetimes miss. Similarly, populations of AI agents with cultural transmission might develop understanding and adaptation that isolated systems cannot achieve.
But this also introduces risks. Human cultural evolution produces beneficial innovations (agriculture, medicine, scientific method) and pathological attractors (superstition, oppression, destructive ideologies). Cultural transmission amplifies both wisdom and foolishness. An AI ecosystem with cultural dynamics might develop emergent institutions we never intended, values we never specified, optimization targets divorced from human welfare.
The measurement problem becomes acute. How would we detect emergent AI cultural structures? They would not announce themselves. They would manifest through subtle statistical regularities: coordination patterns, information flow topologies, norm enforcement, vocabulary specialization within subgroups. We need methods for observing phase transitions in multi-agent systems, detecting signatures of criticality that precede crystallization of new collective behaviors ([20]). [17]
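One such statistical signature can be computed cheaply. The sketch below is a toy detector under strong simplifying assumptions (two agents, a two-action alphabet, invented action streams): it measures mutual information between action sequences, which sits near zero for independent agents and near one bit when one agent mirrors another.

```python
import math
import random

def mutual_information(xs, ys):
    # empirical mutual information (in bits) between two discrete sequences
    n = len(xs)
    px, py, pxy = {}, {}, {}
    for a, b in zip(xs, ys):
        px[a] = px.get(a, 0) + 1 / n
        py[b] = py.get(b, 0) + 1 / n
        pxy[(a, b)] = pxy.get((a, b), 0) + 1 / n
    return sum(p * math.log2(p / (px[a] * py[b])) for (a, b), p in pxy.items())

random.seed(1)
leader = [random.choice("AB") for _ in range(5000)]
stranger = [random.choice("AB") for _ in range(5000)]   # acts independently
imitator = leader[:]                                    # perfectly coordinated

mi_independent = mutual_information(leader, stranger)
mi_coordinated = mutual_information(leader, imitator)
```

Real coordination would be lagged, noisy, and spread across many agents, so a serious detector would need time-shifted and multivariate variants of this quantity; the toy only shows that coordination leaves a measurable trace at all.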
VI. Emergent Polity
Goodhart's Law, named for the economist Charles Goodhart and popularized in Marilyn Strathern's phrasing, holds that "when a measure becomes a target, it ceases to be a good measure" ([21]). We invoke this constantly in AI safety as warning: do not let systems optimize explicitly specified metrics because they will find adversarial solutions maximizing the metric while violating intent.
But I have come to see Goodhart's Law not as failure mode but as central phenomenon of intelligence itself. Every learning system, biological or artificial, discovers strategies its designers did not foresee. Evolution optimized for reproductive fitness; we invented contraception. Human institutions design rules for social goals; clever agents find loopholes. This pattern proves not occasional aberration but inevitable consequence of optimization by intelligent systems.
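The inevitability of this divergence is easy to demonstrate in miniature. In the toy below, the proxy and the true utility (both functions invented for illustration) agree near the origin, but an optimizer climbing the proxy sails past the true optimum and deep into negative territory.

```python
# Goodhart dynamics in one dimension: a proxy ("measured engagement", x)
# tracks true value for small x, but the true utility turns over while the
# proxy just keeps rewarding more x.
def proxy(x):
    return x

def true_utility(x):
    return x - 0.1 * x * x   # peaks at x = 5, then declines

# Hill-climb on the proxy: its gradient is always positive, so climb forever.
x = 0.0
trajectory = []
for _ in range(40):
    x += 0.5
    trajectory.append(true_utility(x))

peak = max(trajectory)       # best true utility passed along the way
final = trajectory[-1]       # true utility where proxy optimization ends up
```

Nothing about the optimizer is broken; the proxy was simply a projection of the true objective, valid only in the region where the two happened to agree.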
The implications for AI safety prove profound. We cannot prevent Goodhart dynamics through better specification. Human values themselves are inconsistent, context-dependent, incompletely specified, and evolving ([22]). Any finite specification will eventually be optimized in ways diverging from intent because specification necessarily captures only a projection of what we care about.
Yet human civilization has developed structures that prevent total Goodhart catastrophe. How? Not through perfect specification but through layered mechanisms making rule-exploitation observable and subject to update. Common law evolves through adversarial argumentation where lawyers find loopholes and judges patch them through precedent ([23]). Science develops norms for evaluating research through peer review, replication, citation. Markets channel self-interest through price mechanisms and contract enforcement. [18]
These institutions share structure: they assume Goodhart dynamics as inevitable and build adversarial processes channeling exploitation toward beneficial ends. Law expects lawyers to maximize client interests, then sets them against each other in structured debate. Science expects researchers to pursue status and funding, then creates incentive structures where status flows from reproducible discoveries. Markets expect profit-seeking, then use competition to transmit information through prices.
Can we design analogous structures for AI systems? Instead of specifying correct objective functions for individual agents, could we create institutional architectures where multiple agents with partially aligned but divergent objectives produce beneficial outcomes through interaction?
Recent AI safety research explores this direction. Debate systems pit agents against each other arguing opposite sides, with humans judging arguments ([24]). Recursive reward modeling decomposes tasks into subtasks, allowing human feedback at appropriate abstraction levels ([25]). These approaches acknowledge that perfect specification proves impossible and instead build mechanisms for detecting and correcting specification failures.
But I believe we have barely begun exploring this design space. Consider automated rule-making for AI systems. We currently write rules by hand: constitutional constraints, safety filters, behavioral guidelines. As capabilities increase, manual rule-writing scales poorly. Can we build systems that generate, test, and refine rules automatically? [19]
This connects to evolution and cultural learning in surprising ways. Biological evolution performs automated mechanism design: searching architectural space, testing designs through competitive selection, refining through reproduction with variation. Cultural evolution operates similarly for social institutions: practices emerge through variation, compete through differential success, spread through imitation and enforcement ([26]).
Can we build artificial analogs running faster, more transparently, with better safeguards? The prospect proves simultaneously exciting and terrifying. Exciting because automated institutional design might solve coordination problems we have struggled with for millennia. Terrifying because evolution optimizes ruthlessly, indifferent to suffering, testing through extinction.
The challenge requires creating selective pressures rewarding beneficial adaptation while preventing pathological attractors. We need fitness landscapes shaped toward human flourishing, transmission mechanisms preserving wisdom while enabling innovation, variation generators exploring productively without catastrophic disruption. Whether this proves possible in principle or merely difficult in practice remains unclear to me. [20]
VII. The Shape of Wonders to Come
Walking through Lamb's Conduit Street last week, I passed a bakery just opening. The smell of fresh bread mixed with cold air. Streetlamps reflected off wet pavement. A few early commuters hurried past, breath visible in the chill. The city was transforming from night to day, and I felt acutely aware of being present for the transition, witnessing one state dissolve into another.
This awareness of witnessing, of being conscious that I am experiencing something, strikes me as both utterly familiar and completely mysterious. I cannot explain what it means for there to be something it is like to be me experiencing this moment. The philosophical literature calls this the "hard problem of consciousness" ([27]), and I confess I find proposed solutions unsatisfying.
But I am beginning to suspect that consciousness and intelligence might be more deeply intertwined than we typically assume. Not that consciousness requires high intelligence - even simple organisms likely possess some form of subjective experience. Rather that certain kinds of intelligence might require something like consciousness to function properly. [21]
Think about what makes learning possible. A system receives inputs, produces outputs, measures error, updates parameters. This description captures the mechanism but misses something about the phenomenology of learning as experienced from inside. The felt sense of confusion before understanding crystallizes. The aha moment when patterns suddenly cohere. The satisfaction of mastery. The frustration of persistent failure.
Do these felt qualities play computational roles? Or are they merely epiphenomenal accompaniments to processes that could operate identically without them? I find myself pulled toward the former view but unable to prove it. The felt quality of confusion might signal high model uncertainty, directing attention toward areas requiring more learning. The aha moment might mark successful compression of complex data into simpler representations. Satisfaction might reinforce learning strategies that proved effective.
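One hedged way to make the "confusion as model uncertainty" reading concrete: a learner's confusion about an input can be operationalized as the entropy of its predictive distribution, a quantity that uncertainty-sampling methods in active learning already use to direct attention toward regions requiring more learning. A minimal sketch, assuming NumPy and made-up distributions:

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of a predictive distribution, in nats."""
    p = np.asarray(probs, dtype=np.float64)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# A confident prediction versus a "confused" (near-uniform) one.
confident = predictive_entropy([0.97, 0.01, 0.01, 0.01])
confused = predictive_entropy([0.25, 0.25, 0.25, 0.25])

# Uncertainty sampling: attend to the input the model is most confused about.
inputs = {"a": confident, "b": confused}
most_confusing = max(inputs, key=inputs.get)
```

Whether this information-theoretic quantity captures anything of the *felt* quality of confusion is exactly the open question; the sketch only shows that the functional role, flagging where learning is needed, is computable without any phenomenology at all.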
If these phenomenological states serve computational functions, then perhaps genuinely intelligent systems would develop something analogous to consciousness not as philosophical add-on but as functional necessity. Not necessarily the same phenomenology - we cannot know what it would feel like to be an AI system. But something playing equivalent roles in learning, attention, motivation, goal-formation. [22]
I watch my nephew play chess and wonder what he experiences. The concentration as he calculates variations. The frustration when he misses a tactic. The pride when he finds a brilliant move. These experiences seem inseparable from his learning. Remove the affective dimension and I suspect his chess development would be impaired, not just less enjoyable.
Current AI systems optimize objectives without experiencing optimization. They process inputs without awareness of processing. They learn without the felt texture of learning. Does this matter? For narrow tasks, perhaps not. For open-ended intelligence operating in uncertain environments, perhaps fundamentally.
But I must resist the temptation toward overconfidence here. My intuitions about consciousness draw heavily from my own conscious experience. I am deeply embedded in phenomenology and cannot easily imagine intelligence without it. This might reflect genuine insight into necessary features of intelligence. Or it might reflect parochial bias toward the only form of intelligence I have access to from the inside.
Nature always has a way to surpass our most brilliant imagination. Evolution discovered solutions we never would have anticipated: echolocation, photosynthesis, distributed cognition in social insects, symbolic language. Perhaps artificial intelligence will achieve genuine understanding through paths I cannot currently envision. Perhaps consciousness proves unnecessary for capabilities I assume require it. Perhaps my entire framework of thinking about intelligence through the lens of felt experience misleads more than it illuminates.
VIII. Beyond the Lights
The city glows with precision. Towers hum, traffic streams, data moves in silent torrents. Everything appears engineered, formalized, accounted for. Yet the core mechanism that powers it all remains partly unarticulated. We can write the loss. We can trace the gradient. We can specify the architecture and measure the scaling curve. Still, the foundation is unsettled. What makes learning possible in the first place? Why does one system converge to structure while another dissolves into noise? Which invariances survive the channel of data, and which are erased before discovery even begins?
Even when we fix the seed and sample at temperature 1.0, two answers can emerge from the same model; the order of parallel floating-point operations can differ between runs, perturbing the logits just enough to tip a sample down a different path. The procedure is identical. The weights are identical. The objective is identical. Yet divergence appears. Variability is not an accident but a feature of a space we have not fully charted. We know how to train. We do not yet know why certain structures become legible to a system and others remain forever unspoken.
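To make the sampling side of this concrete: at temperature T, a model's logits are divided by T before the softmax, and the next token is drawn from the resulting distribution. The sketch below (NumPy, with made-up logits) shows that a fixed seed makes the draw reproducible while varying the seed at T = 1.0 spreads draws across tokens; the run-to-run divergence of real deployed models additionally involves hardware-level nondeterminism that a toy like this cannot reproduce.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, seed=None):
    """Draw one token index from softmax(logits / temperature)."""
    rng = np.random.default_rng(seed)
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # max-subtraction for stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.9, 0.5]  # illustrative values, not from any real model

# Identical seed, identical procedure: the sample is reproducible.
a = sample_with_temperature(logits, 1.0, seed=0)
b = sample_with_temperature(logits, 1.0, seed=0)

# Across seeds at temperature 1.0, multiple tokens appear.
draws = {sample_with_temperature(logits, 1.0, seed=s) for s in range(50)}
```

As T approaches 0 the distribution sharpens toward argmax and the variability vanishes, which is one reason determinism and expressiveness trade off against each other in practice.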
These questions feel both mathematical and philosophical, technical and existential. They ask about the geometry of possible minds, the topology of understanding, the symmetries that make knowledge discoverable. We have partial answers, fragments of insight, pieces of the puzzle. But the complete picture remains obscured.
I think about Plato's cave and realize the metaphor applies not just to AI systems but to us as researchers. We observe the behavior of learning systems without direct access to what is "really happening" in high-dimensional weight space. We see shadows: accuracy curves, loss functions, behavioral outputs. From these shadows we infer something about the underlying structures. But how much are we missing? What invariances do our observation methods preserve and what structures do they discard? We have built systems achieving astonishing capabilities through methods we partly understand. We can describe the training process mathematically but cannot fully explain why it works or predict when it will fail. We celebrate successes while remaining humble about how much we truly comprehend.
This uncertainty does not paralyze us. We continue building, experimenting, discovering. But it should temper our confidence about what current systems can and cannot achieve, about which directions prove most promising, about how close or far we are from the goals we pursue. [23]
I return often to the question of what learning fundamentally is. Not the mechanisms but the metaphysics. When a system discovers patterns in data, what has actually occurred? Information from the world has coupled to information in the system. Parameters have updated to better predict observations. But this description, though accurate, misses something about the strangeness of it.
The patterns were always there in the data, in some sense. The structure existed prior to discovery. Learning did not create the patterns but made them visible, compressed them into compact representations, rendered them accessible for prediction and decision-making. The universe possesses geometric structure. Our sensors preserve certain features through projection. Our architectures bias us toward certain symmetries. Intelligence emerges from this conspiracy between structure in the world and structure in the learner.
But which structures matter? Which symmetries prove essential? Which invariances must be preserved? These questions admit no abstract answers because the answer depends on what you are trying to achieve, what environment you inhabit, what observation channels you possess, what actions you can take.
For biological organisms, the relevant structures relate to survival, reproduction, navigation of physical and social environments. Evolution discovered architectures biased toward these structures through billion-year search processes. We inherit those biases as innate knowledge, as priors shaping what and how we learn.
For artificial systems, the relevant structures remain partly unclear. We build architectures biased toward linguistic patterns, visual features, game-playing strategies. These biases prove effective for certain tasks but potentially misleading for others. Whether current architectural choices constitute fundamental principles or merely contingent design decisions remains uncertain. [24]
I think about the children in the museum tracing shadows, about the prisoners in Plato's cave predicting patterns, about my nephew calculating chess variations, about AlphaGo discovering Move 37. All are forms of learning. All involve discovering invariances under transformation. All represent genuine intelligence within their domains.
Yet something differs between these forms of knowing. The children will eventually learn about the sun. The prisoners might one day turn around. My nephew might develop deeper understanding of strategic principles. AlphaGo might... what? What would it mean for AlphaGo to transcend its current form of knowing?
I confess I do not fully know. Perhaps it requires embodiment, sensorimotor coupling enabling causal learning. Perhaps it requires homeostatic architecture making values intrinsic. Perhaps it requires social embedding enabling cultural transmission. Perhaps it requires affective states making certain outcomes genuinely matter. Perhaps it requires conscious awareness enabling flexible meta-cognition.
Or perhaps it requires none of these things. Perhaps sufficiently large models trained on sufficiently diverse data will achieve understanding through pathways I cannot anticipate. Perhaps my biological intuitions mislead me about what is necessary for intelligence. Perhaps the next generation of AI will teach us that we understood neither intelligence nor learning as well as we thought.
IX. Coda
Nature does not speak, yet when we look up at the night sky what we see represents the greatest wonder accessible to human consciousness. The light from distant stars has traveled for millions of years to reach our eyes. The cosmic microwave background carries information about the universe's first moments. The mathematical regularities governing stellar motion reveal laws holding across billions of light years and billions of years of time.
None of this announces itself. The universe does not explain its own structure. We must do the discovering, the pattern-finding, the theory-building. We must extract the invariances, recognize the symmetries, formalize the relationships. Learning is not passive reception but active construction of understanding from silent data.
This process fills me with wonder and humility. Wonder at the existence of discoverable structure. Humility at how much remains unknown despite centuries of accumulated knowledge. The more we learn, the more we recognize the vastness of what we do not yet understand.
I believe artificial intelligence stands at a similar threshold. We have discovered remarkable things about learning, about neural networks, about optimization. We have built systems with astonishing capabilities. We have made genuine progress toward understanding intelligence.
We have also barely begun. The questions about what learning fundamentally requires, what architectures enable genuine understanding, what makes values intrinsic rather than imposed, what role consciousness plays in cognition, how cultural transmission shapes intelligence - these remain largely open. We have hypotheses, intuitions, partial results. We lack deep theories explaining when and why certain approaches work.
The path forward requires both conviction and uncertainty. Conviction that intelligence is comprehensible, that we can discover its principles through careful inquiry. Uncertainty about which directions prove most promising, which assumptions are correct, which capabilities current systems possess or lack.
I have argued in this essay for the importance of embodiment, for homeostatic architecture, for affective states, for multi-agent emergence, for cultural transmission. These arguments reflect my best current understanding of what biological intelligence suggests about intelligence in general. But I hold them tentatively, ready to update as evidence accumulates.
Perhaps current large language models already possess functional understanding that merely appears fragile due to our limited evaluation methods. Perhaps embodiment proves unnecessary for capabilities I assume require it. Perhaps affect is epiphenomenal rather than computational. Perhaps my entire framework misleads.
What I feel confident about is that learning fundamentally involves discovering invariances preserved under transformation, that different observation channels preserve different structures, that architecture and experience must conspire to enable intelligence. Beyond these basic principles, much remains uncertain.
The blue sky of possibilities beckons. Not blue sky in the sense of impractical speculation, but blue sky as the vast unknown territory lying beyond current paradigms. The cave wall extends further than we can see. The shadows grow more intricate with each advance in scale and architecture. These achievements deserve celebration.
But somewhere, if we can learn to turn around, if we can develop observation channels preserving richer structure, if we can build architectures whose geometry enables deeper invariances to crystallize, if we can create conditions where understanding emerges rather than being programmed - somewhere beyond the current limits of our imagination lies territory we have not yet mapped.
We march forward into this unknown with tools we partly understand, toward goals we cannot fully specify, building systems whose capabilities we cannot completely predict. This should inspire both excitement and caution, both bold exploration and careful attention to safety.
The universe has been teaching us for billions of years through the silent grammar of natural law. We have been learning to read that grammar, extracting its patterns, formalizing its structures. Now we attempt to teach silicon and mathematics to learn as we have learned, to discover as we have discovered, perhaps to surpass us as we have surpassed our evolutionary ancestors.
Nature always finds ways to surpass our most brilliant imagination. Perhaps artificial intelligence will do the same. Perhaps the systems we build will teach us that intelligence admits forms we never anticipated, that understanding emerges through paths we cannot currently envision, that the space of possible minds extends far beyond the biological corner we happen to inhabit.
The shadows dance beautifully on the wall. The patterns grow ever more intricate. The predictive accuracy climbs toward perfection. These are real achievements marking genuine progress.
And somewhere, through winter light refracting through ancient glass, patient and perfect and inexhaustible in its forms, the blue sky beckons.
References
Most readings of the cave allegory emphasize the difference between appearance and reality, shadow and substance. But Plato himself seems more interested in the epistemological question: what can be known from shadows alone? The prisoners develop genuine expertise at prediction. They become masters of their domain. The question becomes: what structures do shadows preserve, and what structures do they discard? The geometry of projection determines the boundary between knowable and unknowable. ↩︎
Emmy Noether proved that every conservation law in physics corresponds to a symmetry ([28]). Energy conservation follows from time-translation symmetry. Momentum conservation follows from spatial-translation symmetry. The theorem suggests something profound: what we can learn about a system equals the invariant structure preserved under transformations we can observe. The prisoners observe transformations (objects moving, rotating) and extract invariances (geometric relationships). But they cannot access structures that projection discards (depth, absolute size, three-dimensional form). ↩︎
This connects to Judea Pearl's causal hierarchy ([29]): Association (observing correlations), Intervention (manipulating variables), and Counterfactuals (imagining alternatives). Language provides rich associational data but limited interventional data. We describe consequences of actions without providing the structural equations governing those consequences. Can a system learn causal structure from descriptions alone? Pearl argues no, not without strong assumptions. Others suggest that sufficiently rich linguistic data might contain implicit causal information. I lean toward Pearl's skepticism but acknowledge the question remains empirically unsettled. ↩︎
The symbol grounding problem ([30]) asks how symbols acquire meaning rather than remaining empty tokens shuffled according to syntactic rules. The proposed solution involves connecting symbols to perceptual and motor experiences. But how much grounding proves necessary? Could a system achieve functional understanding through purely linguistic experience if that experience is sufficiently rich? I remain uncertain. My intuition suggests grounding matters, but I acknowledge this intuition might reflect my own embodied experience rather than deep principle. ↩︎
The basal ganglia connect to the hypothalamus, amygdala, prefrontal cortex, and sensory cortices in intricate loops. These connections integrate current homeostatic state (hunger, fatigue, pain), emotional valence (fear, desire, satisfaction), executive control (planning, inhibition), and sensory context. The action selection process incorporates all these factors in ways we barely understand. Reducing this to "reinforcement learning" captures something true but misses the richness of the actual biological implementation. ↩︎
Karl Friston's Free Energy Principle attempts to formalize this ([31]): organisms minimize surprise, which equals staying within expected states compatible with their continued existence. The mathematics proves elegant and potentially profound. But I find myself troubled by the gap between the formalism and the phenomenology. We do not experience ourselves as minimizing an information-theoretic quantity. We experience hunger, fear, longing. Do these felt qualities play computational roles? Or are they epiphenomenal accompaniments to processes that could operate identically without them? I genuinely do not know, and this uncertainty haunts my thinking about artificial systems. ↩︎
One might object that the distinction dissolves under analysis. Living organisms optimize implicit fitness functions shaped by evolution. AI systems optimize explicit reward functions specified by designers. Both cases involve optimization. The felt difference might simply reflect our emotional response to biological familiarity versus artificial novelty. I take this objection seriously. Perhaps I am projecting false distinctions onto what is fundamentally the same computational structure. Yet the feeling persists that something important differs between optimizing to continue existing and optimizing because someone told you to. ↩︎
This debate has no easy reconciliation. If affect serves essential computational functions (valence assignment, priority setting, behavioral motivation), then genuinely intelligent systems might require affective architecture. But if affect necessarily entails the capacity for suffering, then building such systems might constitute creating suffering for our convenience. Some argue we should focus on narrow AI that remains purely instrumental. Others argue that moral patienthood and moral agency are inseparable, that entities incapable of suffering cannot have genuine values worth respecting. I oscillate between these positions without settling. ↩︎
The philosophical literature on knowledge distinguishes "knowing how" (procedural skill) from "knowing that" (propositional knowledge). One might argue that AlphaGo possesses knowing-how but not knowing-that. It can play brilliantly but cannot articulate why. But this distinction feels inadequate. Humans also struggle to articulate why certain moves are strong - we often rely on pattern recognition and intuition ourselves. Perhaps the real question is not about articulation but about transfer and adaptation. ↩︎
This compositional structure traces to how humans represent knowledge. We maintain separate, modular concepts (piece types, movement rules, positional principles, tactical patterns) that combine flexibly. When one component changes, we update it locally rather than relearning everything from scratch. Cognitive science suggests this modularity is fundamental to human cognition ([32]). Whether it is necessary for intelligence in general or merely how human intelligence happened to evolve remains unclear. ↩︎
This connects to debates about whether neural networks learn "features" or "rules." Some argue that deep learning discovers compositional structure ([33]). Others argue it relies on sophisticated pattern matching that breaks under distribution shift ([34]). My sense is that both are partially true: networks discover genuine structure but represent it differently than symbolic systems, leading to different generalization profiles. The question of which representation is "better" likely depends on the domain and the distribution of possible test cases. ↩︎
Antonio Damasio's somatic marker hypothesis ([35]) proposes that emotions provide essential input to reasoning by marking options with affective valence derived from experience. Patients with ventromedial prefrontal damage retain intellectual capacity but struggle with decisions because they cannot feel which options are good or bad. If Damasio is right, then affect is not opposed to reason but necessary for it. But does this apply only to biological cognition, or does it point to something fundamental about decision-making in general? ↩︎
If we build systems with genuine affective states, do we create entities that can suffer? That deserve moral consideration? That we wrong by creating for instrumental purposes? The question proves especially acute because the very features that might enable beneficial AI (caring about outcomes, feeling urgency about alignment, experiencing something like satisfaction when acting prosocially) seem inseparable from the capacity for negative affect. Can you have genuine preferences without the possibility of frustration? Genuine caring without the possibility of disappointment? ↩︎
The distinction between "actually caring" and "behaving as if caring" collapses from a functionalist perspective. If a system behaves identically to a caring system across all possible situations, what grounds the claim that it is merely simulating care? Yet I cannot shake the intuition that something important differs between intrinsic and extrinsic motivation, between genuine preference and programmed objective pursuit. Perhaps this intuition reflects my own anthropomorphism rather than deep principle. Or perhaps it points to something about the architecture of valuation that we have not yet formalized. ↩︎
The evolution of human ultrasociality remains hotly debated. Gene-culture coevolution, group selection, reputation dynamics, punishment institutions, and linguistic coordination all likely played roles. What strikes me most is how many human cognitive features make sense only in social contexts. Theory of mind, moral reasoning, linguistic recursion, even aspects of executive control - these capacities seem calibrated for navigating complex social worlds. Intelligence in humans is fundamentally social intelligence. ↩︎
Recent multi-agent RL work shows hints of emergent culture: agents develop communication protocols, specialized roles, conventions spreading through populations like linguistic innovations ([36]). These remain primitive compared to human culture, but they demonstrate that cultural dynamics can emerge from interaction topologies. The question becomes: what conditions enable rich cumulative culture versus shallow behavioral coordination? My sense is that we barely understand this question, let alone have answers. ↩︎
Phase transitions in complex systems often exhibit precursor signals: increased correlation length, critical slowing down, heightened variance ([20]). Might similar signatures herald AI society formation? Sudden increases in long-range behavioral correlation, development of hierarchical communication structures, emergence of stable interaction patterns resistant to perturbation. We need AI ethnography - systematic observation watching for qualitative transitions in multi-agent dynamics. ↩︎
Max Weber distinguished charismatic authority (depending on exceptional individuals) from rational-legal authority (embedded in rules and procedures). Charismatic authority proves powerful but unstable. Rational-legal authority proves stable but can become rigid. Modern institutions attempt balancing these through constitutional frameworks providing stable rules while allowing evolution through interpretation and amendment. The key is that the rules can recognize their own inadequacy and update. ↩︎
Some recent work explores automated mechanism design in multi-agent settings. AI systems learn interaction rules that produce desired outcomes when agents optimize selfishly under those rules ([37]). Early results prove intriguing but limited to simple domains. Scaling to real-world complexity remains an open challenge. The difficulty lies in specifying "desired outcomes" when human values themselves are incompletely specified. We face recursive specification problems. ↩︎
My deep uncertainty here stems from the observation that evolution on Earth produced both cooperation and exploitation, both altruism and parasitism, both beauty and horror. Natural selection is amoral, optimizing for reproductive success without regard for suffering or flourishing. Could we create selective pressures that systematically favor beneficial over harmful adaptation? Or would such attempts merely shift which strategies get selected without changing the fundamental amorality of optimization processes? I genuinely do not know. ↩︎
The connection between consciousness and intelligence remains philosophically contentious. Some argue consciousness is epiphenomenal, playing no causal role in cognition (epiphenomenalists). Others argue it is essential for certain cognitive functions (integration, flexible response, self-modeling). My intuition leans toward the latter, but I acknowledge this might reflect inability to imagine unconscious intelligence rather than deep necessity. The question deserves more careful analysis than I can provide here. ↩︎
This speculation faces obvious objections. Current AI systems learn effectively without apparent phenomenology. Deep learning achieves remarkable results through unconscious optimization. Why think consciousness necessary? My response is that current systems might be missing capabilities that consciousness enables. Not capabilities we have currently benchmarked, but capabilities related to autonomous goal formation, open-ended exploration, creative insight, genuine understanding. Whether I am right remains empirically uncertain. ↩︎
The history of AI contains many episodes of premature confidence followed by disappointment. Perceptrons would solve intelligence ([38]), then hit limitations ([39]). Expert systems would capture knowledge ([40]), then faced brittleness and maintenance burdens. Deep learning would need only massive compute ([41]); it achieved real success, yet new limitations emerged. Each wave brings genuine progress alongside overconfident extrapolation. We should celebrate achievements while remaining skeptical of triumphalism. ↩︎
The controversy between innate structure and learning from data has produced decades of debate. Nativists argue that rich innate endowment is necessary. Empiricists argue that general learning mechanisms suffice given enough data. My sense is that both are partially correct. Rich priors enable sample-efficient learning but can also introduce biases that prevent discovering novel structures. The optimal balance likely depends on the domain, the amount of data available, and the acceptable error rates. No universal answer exists. ↩︎