This article presents an emerging architectural hypothesis of the brain as a biological implementation of a Universal Learning Machine.  I present a rough but complete architectural view of how the brain works under the universal learning hypothesis.  I also contrast this new viewpoint - which comes from computational neuroscience and machine learning - with the older evolved modularity hypothesis popular in evolutionary psychology and the heuristics and biases literature.  These two conceptions of the brain lead to very different predictions for the likely route to AGI, the value of neuroscience, the expected differences between AGI and humans, and thus any consequent safety issues and dependent strategies.

Art generated by an artificial neural net

(The image above is from a recent mysterious post to r/machinelearning, probably from a Google project that generates art based on a visualization tool used to inspect the patterns learned by convolutional neural networks.  I am especially fond of the wierd figures riding the cart in the lower left. )

  1. Intro: Two viewpoints on the Mind
  2. Universal Learning Machines
  3. Historical Interlude
  4. Dynamic Rewiring
  5. Brain Architecture (the whole brain in one picture and a few pages of text)
  6. The Basal Ganglia
  7. Implications for AGI
  8. Conclusion


Intro: Two Viewpoints on the Mind

Few discoveries are more irritating than those that expose the pedigree of ideas.

-- Lord Acton (probably)

Less Wrong is a site devoted to refining the art of human rationality, where rationality is based on an idealized conceptualization of how minds should or could work.  Less Wrong and its founding sequences draws heavily on the heuristics and biases literature in cognitive psychology and related work in evolutionary psychology.  More specifically the sequences build upon a specific cluster in the space of cognitive theories, which can be identified in particular with the highly influential "evolved modularity" perspective of Cosmides and Tooby.

From Wikipedia:

Evolutionary psychologists propose that the mind is made up of genetically influenced and domain-specific[3] mental algorithms or computational modules, designed to solve specific evolutionary problems of the past.[4] 

From "Evolutionary Psychology and the Emotions":[5]

An evolutionary perspective leads one to view the mind as a crowded zoo of evolved, domain-specific programs.  Each is functionally specialized for solving a different adaptive problem that arose during hominid evolutionary history, such as face recognition, foraging, mate choice, heart rate regulation, sleep management, or predator vigilance, and each is activated by a different set of cues from the environment.

If you imagine these general theories or perspectives on the brain/mind as points in theory space, the evolved modularity cluster posits that much of the machinery of human mental algorithms is largely innate.  General learning - if it exists at all - exists only in specific modules; in most modules learning is relegated to the role of adapting existing algorithms and acquiring data; the impact of the information environment is de-emphasized.  In this view the brain is a complex messy cludge of evolved mechanisms.

There is another viewpoint cluster, more popular in computational neuroscience (especially today), that is almost the exact opposite of the evolved modularity hypothesis.  I will rebrand this viewpoint the "universal learner" hypothesis, aka the "one learning algorithm" hypothesis (the rebranding is justified mainly by the inclusion of some newer theories and evidence for the basal ganglia as a 'CPU' which learns to control the cortex).  The roots of the universal learning hypothesis can be traced back to Mountcastle's discovery of the simple uniform architecture of the cortex.[6]

The universal learning hypothesis proposes that all significant mental algorithms are learned; nothing is innate except for the learning and reward machinery itself (which is somewhat complicated, involving a number of systems and mechanisms), the initial rough architecture (equivalent to a prior over mindspace), and a small library of simple innate circuits (analogous to the operating system layer in a computer).  In this view the mind (software) is distinct from the brain (hardware).  The mind is a complex software system built out of a general learning mechanism.

In simplification, the main difference between these viewpoints is the relative quantity of domain specific mental algorithmic information specified in the genome vs that acquired through general purpose learning during the organism's lifetime.  Evolved modules vs learned modules.

When you have two hypotheses or viewpoints that are almost complete opposites this is generally a sign that the field is in an early state of knowledge; further experiments typically are required to resolve the conflict.

It has been about 25 years since Cosmides and Tooby began to popularize the evolved modularity hypothesis.  A number of key neuroscience experiments have been performed since then which support the universal learning hypothesis (reviewed later in this article).  

Additional indirect support comes from the rapid unexpected success of Deep Learning[7], which is entirely based on building AI systems using simple universal learning algorithms (such as Stochastic Gradient Descent or other various approximate Bayesian methods[8][9][10][11]) scaled up on fast parallel hardware (GPUs).  Deep Learning techniques have quickly come to dominate most of the key AI benchmarks including vision[12], speech recognition[13][14], various natural language tasks, and now even ATARI [15] - proving that simple architectures (priors) combined with universal learning is a path (and perhaps the only viable path) to AGI. Moreover, the internal representations that develop in some deep learning systems are structurally and functionally similar to representations in analogous regions of biological cortex[16].

To paraphrase Feynman: to truly understand something you must build it. 

In this article I am going to quickly introduce the abstract concept of a universal learning machine, present an overview of the brain's architecture as a specific type of universal learning machine, and finally I will conclude with some speculations on the implications for the race to AGI and AI safety issues in particular.

Universal Learning Machines

A universal learning machine is a simple and yet very powerful and general model for intelligent agents.  It is an extension of a general computer - such as Turing Machine - amplified with a universal learning algorithm.  Do not view this as my 'big new theory' - it is simply an amalgamation of a set of related proposals by various researchers.

An initial untrained seed ULM can be defined by 1.) a prior over the space of models (or equivalently, programs), 2.) an initial utility function, and 3.) the universal learning machinery/algorithm.  The machine is a real-time system that processes an input sensory/observation stream and produces an output motor/action stream to control the external world using a learned internal program that is the result of continuous self-optimization.

There is of course always room to smuggle in arbitrary innate functionality via the prior, but in general the prior is expected to be extremely small in bits in comparison to the learned model.

The key defining characteristic of a ULM is that it uses its universal learning algorithm for continuous recursive self-improvement with regards to the utility function (reward system).  We can view this as second (and higher) order optimization: the ULM optimizes the external world (first order), and also optimizes its own internal optimization process (second order), and so on.  Without loss of generality, any system capable of computing a large number of decision variables can also compute internal self-modification decisions.

Conceptually the learning machinery computes a probability distribution over program-space that is proportional to the expected utility distribution.  At each timestep it receives a new sensory observation and expends some amount of computational energy to infer an updated (approximate) posterior distribution over its internal program-space: an approximate 'Bayesian' self-improvement.

The above description is intentionally vague in the right ways to cover the wide space of possible practical implementations and current uncertainty.  You could view AIXI as a particular formalization of the above general principles, although it is also as dumb as a rock in any practical sense and has other potential theoretical problems.  Although the general idea is simple enough to convey in the abstract, one should beware of concise formal descriptions: practical ULMs are too complex to reduce to a few lines of math.

A ULM inherits the general property of a Turing Machine that it can compute anything that is computable, given appropriate resources.  However a ULM is also more powerful than a TM.  A Turing Machine can only do what it is programmed to do.  A ULM automatically programs itself.

If you were to open up an infant ULM - a machine with zero experience - you would mainly just see the small initial code for the learning machinery.  The vast majority of the codestore starts out empty - initialized to noise.  (In the brain the learning machinery is built in at the hardware level for maximal efficiency).

Theoretical turing machines are all qualitatively alike, and are all qualitatively distinct from any non-universal machine.  Likewise for ULMs.  Theoretically a small ULM is just as general/expressive as a planet-sized ULM.  In practice quantitative distinctions do matter, and can become effectively qualitative.

Just as the simplest possible Turing Machine is in fact quite simple, the simplest possible Universal Learning Machine is also probably quite simple.  A couple of recent proposals for simple universal learning machines include the Neural Turing Machine[16] (from Google DeepMind), and Memory Networks[17].  The core of both approaches involve training an RNN to learn how to control a memory store through gating operations. 

Historical Interlude

At this point you may be skeptical: how could the brain be anything like a universal learner?  What about all of the known innate biases/errors in human cognition?  I'll get to that soon, but let's start by thinking of a couple of general experiments to test the universal learning hypothesis vs the evolved modularity hypothesis.

In a world where the ULH is mostly correct, what do we expect to be different than in worlds where the EMH is mostly correct?

One type of evidence that would support the ULH is the demonstration of key structures in the brain along with associated wiring such that the brain can be shown to directly implement some version of a ULM architecture.

Another type of indirect evidence that would help discriminate the two theories would be evidence that the brain is capable of general global optimization, and that complex domain specific algorithms/circuits mostly result from this process.  If on the other hand the brain is only capable of constrained/local optimization, then most of the complexity must instead be innate - the result of global optimization in evolutionary deeptime.  So in essence it boils down to the optimization capability of biological learning vs biological evolution.

From the perspective of the EMH, it is not sufficient to demonstrate that there are things that brains can not learn in practice - because those simply could be quantitative limitations.  Demonstrating that an intel 486 can't compute some known computable function in our lifetimes is not proof that the 486 is not a Turing Machine.

Nor is it sufficient to demonstrate that biases exist: a ULM is only 'rational' to the extent that its observational experience and learning machinery allows (and to the extent one has the correct theory of rationality).  In fact, the existence of many (most?) biases intrinsically depends on the EMH - based on the implicit assumption that some cognitive algorithms are innate.  If brains are mostly ULMs then most cognitive biases dissolve, or become learning biases - for if all cognitive algorithms are learned, then evidence for biases is evidence for cognitive algorithms that people haven't had sufficient time/energy/motivation to learn.  (This does not imply that intrinsic limitations/biases do not exist or that the study of cognitive biases is a waste of time; rather the ULH implies that educational history is what matters most)

The genome can only specify a limited amount of information.  The question is then how much of our advanced cognitive machinery for things like facial recognition, motor planning, language, logic, planning, etc. is innate vs learned.  From evolution's perspective there is a huge advantage to preloading the brain with innate algorithms so long as said algorithms have high expected utility across the expected domain landscape.  

On the other hand, evolution is also highly constrained in a bit coding sense: every extra bit of code costs additional energy for the vast number of cellular replication events across the lifetime of the organism.  Low code complexity solutions also happen to be exponentially easier to find.  These considerations seem to strongly favor the ULH but they are difficult to quantify.

Neuroscientists have long known that the brain is divided into physical and functional modules.  These modular subdivisions were discovered a century ago by Brodmann.  Every time neuroscientists opened up a new brain, they saw the same old cortical modules in the same old places doing the same old things.  The specific layout of course varied from species to species, but the variations between individuals are minuscule. This evidence seems to strongly favor the EMH.

Throughout most of the 90's up into the 2000's, evidence from computational neuroscience models and AI were heavily influenced by - and unsurprisingly - largely supported the EMH.  Neural nets and backprop were known of course since the 1980's and worked on small problems[18], but at the time they didn't scale well - and there was no theory to suggest they ever would.  

Theory of the time also suggested local minima would always be a problem (now we understand that local minima are not really the main problem[19], and modern stochastic gradient descent methods combined with highly overcomplete models and stochastic regularization[20] are effectively global optimizers that can often handle obstacles such as local minima and saddle points[21]).  

The other related historical criticism rests on the lack of biological plausibility for backprop style gradient descent.  (There is as of yet little consensus on how the brain implements the equivalent machinery, but target propagation is one of the more promising recent proposals[22][23].)

Many AI researchers are naturally interested in the brain, and we can see the influence of the EMH in much of the work before the deep learning era.  HMAX is a hierarchical vision system developed in the late 90's by Poggio et al as a working model of biological vision[24].  It is based on a preconfigured hierarchy of modules, each of which has its own mix of innate features such as gabor edge detectors along with a little bit of local learning.  It implements the general idea that complex algorithms/features are innate - the result of evolutionary global optimization - while neural networks (incapable of global optimization) use hebbian local learning to fill in details of the design.

Dynamic Rewiring

In a groundbreaking study from 2000 published in Nature, Sharma et al successfully rewired ferret retinal pathways to project into the auditory cortex instead of the visual cortex.[25]  The result: auditory cortex can become visual cortex, just by receiving visual data!  Not only does the rewired auditory cortex develop the specific gabor features characteristic of visual cortex; the rewired cortex also becomes functionally visual. [26] True, it isn't quite as effective as normal visual cortex, but that could also possibly be an artifact of crude and invasive brain rewiring surgery.

The ferret study was popularized by the book On Intelligence by Hawkins in 2004 as evidence for a single cortical learning algorithm.  This helped percolate the evidence into the wider AI community, and thus probably helped in setting up the stage for the deep learning movement of today.  The modern view of the cortex is that of a mostly uniform set of general purpose modules which slowly become recruited for specific tasks and filled with domain specific 'code' as a result of the learning (self optimization) process.

The next key set of evidence comes from studies of atypical human brains with novel extrasensory powers.  In 2009 Vuillerme et al showed that the brain could automatically learn to process sensory feedback rendered onto the tongue[27].  This research was developed into a complete device that allows blind people to develop primitive tongue based vision.

In the modern era some blind humans have apparently acquired the ability to perform echolocation (sonar), similar to cetaceans.  In 2011 Thaler et al used MRI and PET scans to show that human echolocators use diverse non-auditory brain regions to process echo clicks, predominantly relying on re-purposed 'visual' cortex.[27] 

The echolocation study in particular helps establish the case that the brain is actually doing global, highly nonlocal optimization - far beyond simple hebbian dynamics.  Echolocation is an active sensing strategy that requires very low latency processing, involving complex timed coordination between a number of motor and sensory circuits - all of which must be learned.

Somehow the brain is dynamically learning how to use and assemble cortical modules to implement mental algorithms: everyday tasks such as visual counting, comparisons of images or sounds, reading, etc - all are task which require simple mental programs that can shuffle processed data between modules (some or any of which can also function as short term memory buffers).

To explain this data, we should be on the lookout for a system in the brain that can learn to control the cortex - a general system that dynamically routes data between different brain modules to solve domain specific tasks.

But first let's take a step back and start with a high level architectural view of the entire brain to put everything in perspective.

Brain Architecture

Below is a circuit diagram for the whole brain.  Each of the main subsystems work together and are best understood together.  You can probably get a good high level extremely coarse understanding of the entire brain is less than one hour.

(there are a couple of circuit diagrams of the whole brain on the web, but this is the best.  From this site.)

The human brain has ~100 billion neurons and ~100 trillion synapses, but ultimately it evolved from the bottom up - from organisms with just hundreds of neurons, like the tiny brain of C. Elegans.

We know that evolution is code complexity constrained: much of the genome codes for cellular metabolism, all the other organs, and so on.  For the brain, most of its bit budget needs to be spent on all the complex neuron, synapse, and even neurotransmitter level machinery - the low level hardware foundation.

For a tiny brain with 1000 neurons or less, the genome can directly specify each connection.  As you scale up to larger brains, evolution needs to create vastly more circuitry while still using only about the same amount of code/bits.  So instead of specifying connectivity at the neuron layer, the genome codes connectivity at the module layer.  Each module can be built from simple procedural/fractal expansion of progenitor cells.

So the size of a module has little to nothing to do with its innate complexity.  The cortical modules are huge - V1 alone contains 200 million neurons in a human - but there is no reason to suspect that V1 has greater initial code complexity than any other brain module.  Big modules are built out of simple procedural tiling patterns.

Very roughly the brain's main modules can be divided into six subsystems (there are numerous smaller subsystems):

  • The neocortex: the brain's primary computational workhorse (blue/purple modules at the top of the diagram).  Kind of like a bunch of general purpose FPGA coprocessors.
  • The cerebellum: another set of coprocessors with a simpler feedforward architecture.  Specializes more in motor functionality. 
  • The thalamus: the orangish modules below the cortex.  Kind of like a relay/routing bus.
  • The hippocampal complex: the apex of the cortex, and something like the brain's database.
  • The amygdala and limbic reward system: these modules specialize in something like the value function.
  • The Basal Ganglia (green modules): the central control system, similar to a CPU.

In the interest of space/time I will focus primarily on the Basal Ganglia and will just touch on the other subsystems very briefly and provide some links to further reading.

The neocortex has been studied extensively and is the main focus of several popular books on the brain.  Each neocortical module is a 2D array of neurons (technically 2.5D with a depth of about a few dozen neurons arranged in about 5 to 6 layers).

Each cortical module is something like a general purpose RNN (recursive neural network) with 2D local connectivity.  Each neuron connects to its neighbors in the 2D array.  Each module also has nonlocal connections to other brain subsystems and these connections follow the same local 2D connectivity pattern, in some cases with some simple affine transformations.  Convolutional neural networks use the same general architecture (but they are typically not recurrent.)

Cortical modules - like artifical RNNs - are general purpose and can be trained to perform various tasks.  There are a huge number of models of the cortex, varying across the tradeoff between biological realism and practical functionality.  

Perhaps surprisingly, any of a wide variety of learning algorithms can reproduce cortical connectivity and features when trained on appropriate sensory data[27].  This is a computational proof of the one-learning-algorithm hypothesis; furthermore it illustrates the general idea that data determines functional structure in any general learning system.

There is evidence that cortical modules learn automatically (unsupervised) to some degree, and there is also some evidence that cortical modules can be trained to relearn data from other brain subsystems - namely the hippocampal complex.  The dark knowledge distillation technique in ANNs[28][29] is a potential natural analog/model of hippocampus -> cortex knowledge transfer.

Module connections are bidirectional, and feedback connections (from high level modules to low level) outnumber forward connections.  We can speculate that something like target propagation can also be used to guide or constrain the development of cortical maps (speculation).

The hippocampal complex is the root or top level of the sensory/motor hierarchy.  This short youtube video  gives a good seven minute overview of the HC.  It is like a spatiotemporal database.  It receives compressed scene descriptor streams from the sensory cortices, it stores this information in medium-term memory, and it supports later auto-associative recall of these memories.  Imagination and memory recall seem to be basically the same.  

The 'scene descriptors' take the sensible form of things like 3D position and camera orientation, as encoded in place, grid, and head direction cells.  This is basically the logical result of compressing the sensory stream, comparable to the networking data stream in a multiplayer video game.

Imagination/recall is basically just the reverse of the forward sensory coding path - in reverse mode a compact scene descriptor is expanded into a full imagined scene.  Imagined/remembered scenes activate the same cortical subnetworks that originally formed the memory (or would have if the memory was real, in the case of imagined recall).

The amygdala and associated limbic reward modules are rather complex, but look something like the brain's version of the value function for reinforcement learning.  These modules are interesting because they clearly rely on learning, but clearly the brain must specify an initial version of the value/utility function that has some minimal complexity.

As an example, consider taste.  Infants are born with basic taste detectors and a very simple initial value function for taste.  Over time the brain receives feedback from digestion and various estimators of general mood/health, and it uses this to refine the initial taste value function.  Eventually the adult sense of taste becomes considerably more complex.  Acquired taste for bitter substances - such as coffee and beer - are good examples.

The amygdala appears to do something similar for emotional learning.  For example infants are born with a simple versions of a fear response, with is later refined through reinforcement learning.  The amygdala sits on the end of the hippocampus, and it is also involved heavily in memory processing.

See also these two videos from khanacademy: one on the limbic system and amygdala (10 mins), and another on the midbrain reward system (8 mins)


The Basal Ganglia


The Basal Ganglia is a wierd looking complex of structures located in the center of the brain.  It is a conserved structure found in all vertebrates, which suggests a core functionality.  The BG is proximal to and connects heavily with the midbrain reward/limbic systems.  It also connects to the brain's various modules in the cortex/hippocampus, thalamus and the cerebellum . . . basically everything.

All of these connections form recurrent loops between associated compartmental modules in each structure: thalamocortical/hippocampal-cerebellar-basal_ganglial loops.


Just as the cortex and hippocampus are subdivided into modules, there are corresponding modular compartments in the thalamus, basal ganglia, and the cerebellum.  The set of modules/compartments in each main structure are all highly interconnected with their correspondents across structures, leading to the concept of distributed processing modules.

Each DPM forms a recurrent loop across brain structures (the local networks in the cortex, BG, and thalamus are also locally recurrent, whereas those in the cerebellum are not).  These recurrent loops are mostly separate, but each sub-structure also provides different opportunities for inter-loop connections.

The BG appears to be involved in essentially all higher cognitive functions.  Its core functionality is action selection via subnetwork switching.  In essence action selection is the core problem of intelligence, and it is also general enough to function as the building block of all higher functionality.  A system that can select between motor actions can also select between tasks or subgoals.  More generally, low level action selection can easily form the basis of a Turing Machine via selective routing: deciding where to route the output of thalamocortical-cerebellar modules (some of which may specialize in short term memory as in the prefrontal cortex, although all cortical modules have some short term memory capability).

There are now a number of computational models for the Basal Ganglia-Cortical system that demonstrate possible biologically plausible implementations of the general theory[28][29]; integration with the hippocampal complex leads to larger-scale systems which aim to model/explain most of higher cognition in terms of sequential mental programs[30] (of course fully testing any such models awaits sufficient computational power to run very large-scale neural nets).

For an extremely oversimplified model of the BG as a dynamic router, consider an array of N distributed modules controlled by the BG system.  The BG control network expands these N inputs into an NxN matrix.  There are N2 potential intermodular connections, each of which can be individually controlled.  The control layer reads a compressed, downsampled version of the module's hidden units as its main input, and is also recurrent.  Each output node in the BG has a multiplicative gating effect which selectively enables/disables an individual intermodular connection.  If the control layer is naively fully connected, this would require (N2)2 connections, which is only feasible for N ~ 100 modules, but sparse connectivity can substantially reduce those numbers.

It is unclear (to me), whether the BG actually implements NxN style routing as described above, or something more like 1xN or Nx1 routing, but there is general agreement that it implements cortical routing. 

Of course in actuality the BG architecture is considerably more complex, as it also must implement reinforcement learning, and the intermodular connectivity map itself is also probably quite sparse/compressed (the BG may not control all of cortex, certainly not at a uniform resolution, and many controlled modules may have a very limited number of allowed routing decisions).  Nonetheless, the simple multiplicative gating model illustrates the core idea.  

This same multiplicative gating mechanism is the core principle behind the highly successful LSTM (Long Short-Term Memory)[30] units that are used in various deep learning systems.  The simple version of the BG's gating mechanism can be considered a wider parallel and hierarchical extension of the basic LSTM architecture, where you have a parallel array of N memory cells instead of 1, and each memory cell is a large vector instead of a single scalar value.

The main advantage of the BG architecture is parallel hierarchical approximate control: it allows a large number of hierarchical control loops to update and influence each other in parallel.  It also reduces the huge complexity of general routing across the full cortex down into a much smaller-scale, more manageable routing challenge.

Implications for AGI

These two conceptions of the brain - the universal learning machine hypothesis and the evolved modularity hypothesis - lead to very different predictions for the likely route to AGI, the expected differences between AGI and humans, and thus any consequent safety issues and strategies.

In the extreme case imagine that the brain is a pure ULM, such that the genetic prior information is close to zero or is simply unimportant.  In this case it is vastly more likely that successful AGI will be built around designs very similar to the brain, as the ULM architecture in general is the natural ideal, vs the alternative of having to hand engineer all of the AI's various cognitive mechanisms.

In reality learning is computationally hard, and any practical general learning system depends on good priors to constrain the learning process (essentially taking advantage of previous knowledge/learning).  The recent and rapid success of deep learning is strong evidence for how much prior information is ideal: just a little.  The prior in deep learning systems takes the form of a compact, small set of hyperparameters that control the learning process and specify the overall network architecture (an extremely compressed prior over the network topology and thus the program space).

The ULH suggests that most everything that defines the human mind is cognitive software rather than hardware: the adult mind (in terms of algorithmic information) is 99.999% a cultural/memetic construct.  Obviously there are some important exceptions: infants are born with some functional but very primitive sensory and motor processing 'code'.  Most of the genome's complexity is used to specify the learning machinery, and the associated reward circuitry.  Infant emotions appear to simplify down to a single axis of happy/sad; differentiation into the more subtle vector space of adult emotions does not occur until later in development.

If the mind is software, and if the brain's learning architecture is already universal, then AGI could - by default - end up with a similar distribution over mindspace, simply because it will be built out of similar general purpose learning algorithms running over the same general dataset.  We already see evidence for this trend in the high functional similarity between the features learned by some machine learning systems and those found in the cortex.

Of course an AGI will have little need for some specific evolutionary features: emotions that are subconsciously broadcast via the facial muscles is a quirk unnecessary for an AGI - but that is a rather specific detail.

The key takeway is that the data is what matters - and in the end it is all that matters.  Train a universal learner on image data and it just becomes a visual system.  Train it on speech data and it becomes a speech recognizer.  Train it on ATARI and it becomes a little gamer agent.  

Train a universal learner on the real world in something like a human body and you get something like the human mind.  Put a ULM in a dolphin's body and echolocation is the natural primary sense, put a ULM in a human body with broken visual wiring and you can also get echolocation. 

Control over training is the most natural and straightforward way to control the outcome.  

To create a superhuman AI driver, you 'just' need to create a realistic VR driving sim and then train a ULM in that world (better training and the simple power of selective copying leads to superhuman driving capability).

So to create benevolent AGI, we should think about how to create virtual worlds with the right structure, how to educate minds in those worlds, and how to safely evaluate the results.

One key idea - which I proposed five years ago is that the AI should not know it is in a sim. 

New AI designs (world design + architectural priors + training/education system) should be tested first in the safest virtual worlds: which in simplification are simply low tech worlds without computer technology.  Design combinations that work well in safe low-tech sandboxes are promoted to less safe high-tech VR worlds, and then finally the real world.

A key principle of a secure code sandbox is that the code you are testing should not be aware that it is in a sandbox.  If you violate this principle then you have already failed.  Yudkowsky's AI box thought experiment assumes the violation of the sandbox security principle apriori and thus is something of a distraction. (the virtual sandbox idea was most likely discussed elsewhere previously, as Yudkowsky indirectly critiques a strawman version of the idea via this sci-fi story).  

The virtual sandbox approach also combines nicely with invisible thought monitors, where the AI's thoughts are automatically dumped to searchable logs.

Of course we will still need a solution to the value learning problem.  The natural route with brain-inspired AI is to learn the key ideas behind value acquisition in humans to help derive an improved version of something like inverse reinforcement learning and or imitation learning[31] - an interesting topic for another day.


Ray Kurzweil has been predicting for decades that AGI will be built by reverse engineering the brain, and this particular prediction is not especially unique - this has been a popular position for quite a while.  My own investigation of neuroscience and machine learning led me to a similar conclusion some time ago.  

The recent progress in deep learning, combined with the emerging modern understanding of the brain, provide further evidence that AGI could arrive around the time when we can build and train ANNs with similar computational power as measured very roughly in terms of neuron/synapse counts.  In general the evidence from the last four years or so supports Hanson's viewpoint from the Foom debate.  More specifically, his general conclusion:

Future superintelligences will exist, but their vast and broad mental capacities will come mainly from vast mental content and computational resources. By comparison, their general architectural innovations will be minor additions.

The ULH supports this conclusion.

Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization.  Furthermore, Moore's Law for GPUs still has some steam left, and software advances are currently improving simulation performance at a faster rate than hardware.  These trends implies that Anthropomorphic/Neuromorphic AGI could be surprisingly close, and may appear suddenly.

What kind of leverage can we exert on a short timescale?


New Comment
171 comments, sorted by Click to highlight new comments since:
Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

All of this is interesting, but it seems to me that you did not make a strong case for the brain using an universal learning machine as its main system.

Specifically, I think you fail to address the evidence for evolved modularity:

  • The brain uses spatially specialized regions for different cognitive tasks.

  • This specialization pattern is mostly consistent across different humans and even across different species.

  • Damage to or malformation of some brain regions can cause specific forms of disability (e.g. face blindness). Sometimes the disability can be overcome but often not completely.

  • In various mammals, infants are capable of complex behavior straight out of the womb. Human infants are only exhibit very simple behaviors and require many years to reach full cognitive maturity therefore the human brain relies more on learning than the brain of other mammals, but the basic architecture is the same, thus this is a difference of degree, not kind.

It seems more likely that if there is a general-purpose "universal" learning system in the human brain then it is used as an inefficient fall-back mechanism when the specialized modules fail, not as the core mechanism that hand... (read more)

Thanks, I was waiting for at least one somewhat critical reply :)

Specifically, I think you fail to address the evidence for evolved modularity:

  • The brain uses spatially specialized regions for different cognitive tasks.
  • This specialization pattern is mostly consistent across different humans and even across different species.

The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I've defined that hypothesis, at least for the cortex. The EMH posits that the specific cortical regions rely on complex innate circuitry specialized for specific tasks. The evidence disproves that hypothesis.

Damage to or malformation of some brain regions can cause specific forms of disability (e.g. face blindness). Sometimes the disability can be overcome but often not completely.

Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn't differentiate the two hypotheses.

In various mammals, infants are capable of complex behavior straight out of the womb. Human in

... (read more)

The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I've defined that hypothesis, at least for the cortex.

But none of these works as well as using the original task-specific regions, and anyway in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them.

Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn't differentiate the two hypotheses.

But then why doesn't universal learning just co-opt some other brain region to perform the task of the damaged one? In the cases where there is a congenital malformation, that makes the usual task-specific region missing or dysfunctional, why isn't the task allocated to some other region?

And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset from different r... (read more)

in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them.

No - these studies involve direct measurements (electrodes for the ferret rewiring, MRI for echolocation). They know the rewired auditory cortex is doing vision, etc.

But then why doesn't universal learning just co-opt some other brain region to perform the task of the damaged one?

It can, and this does happen all the time. Humans can recover from serious brain damage (stroke, injury, etc). It takes time to retrain and reroute circuitry - similar to relearning everything that was lost all over again.

And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset

Current ANN's assume a fixed module layout, so they aren't really comparable in module-task assignment.

Much of the specialization pattern could just be geography - V1 becomes visual because it is closest to the visual input. A1 becomes auditory because it is closest to the auditory input. etc.

This should be the de... (read more)

Well, the eyes are at the front of the head, but the optic nerves connect to the brain at the back, and they also cross at the optic chiasm. Axons also cross contralaterally in the spinal cord and if I recall correctly there are various nerves that also don't take the shortest path. This seems to me as evidence that the nervous system is not strongly optimized for latency.
This is a total misconception, and it is a good example of the naive engineer fallacy (jumping to the conclusion that a system is poorly designed when you don't understand how the system actually works and why). Remember the distributed software modules - including V1 - have components in multiple physical modules (cortex, cerebellum, thalamus, BG). Not every DSM has components in all subsystems, but V1 definitely has a thalamic relay component (VGN). The thalamus/BG is in the center of the brain, which makes sense from wiring minimization when you understand the DPM system. Low freq/compressed versions of the cortical map computations can interact at higher speeds inside the small compact volume of the BG/thalamus. The BG/thalamus basically contains a microcosm model of the cortex within itself. The thalamic relay comes first in sequential processing order, so moving cortical V1 closer to the eyes wouldn't help in the slightest. (Draw this out if it doesn't make sense)
For e.g. the ferret rewiring experiments, tongue based vision, etc., is a plausible alternative hypothesis that there are more general subtypes of regions that aren't fully specialized but are more interoperable than others? For example, (Playing devil's advocate here) I could phrase all of the mentioned experiments as "sensory input remapping" among "sensory input processing modules." Similarly, much of the work in BCI interfaces for e.g. controlling cursors or prosthetics could be called "motor control remapping". Have we ever observed cortex being rewired for drastically dissimilar purposes? For example, motor cortex receiving sensory input? If we can't do stuff like that, then my assumption would be that at the very least, a lot of the initial configuration is prenatal and follows kind of a "script" that might be determined by either some genome-encoded fractal rule of tissue formation, or similarities in the general conditions present during gestation. Either way, I'm not yet convinced there's a strong argument that all brain function can be explained as working like a ULM (Even if a lot of it can)
I'm not sure - I have a vague memory of something along those lines but .. nothing specific. From what I remember, motor, sensor, and association cortex do have some intrinsic differences at the microcircuit level. For example some motor cortex has larger pyramidal cells in the output layer. However, I believe most motor cortex is best described as sensorimotor - it depends heavily on sensor data from the body. Well yes - there is a general script for the overall architecture, and alot of innate functionality as well, especially in specific regions like the brainstem's pattern generators. As I said in the article - there is always room for innate functionality in the architectural prior and in specific circuits - the brain is certainly not a pure ULM. ULM refers to the overall architecture, with the general learning part specifically implemented by the distributed BG/cortex/cerbellum modules. But the BG and hippocampal system also rely heavily on learning internally, as does the amygdala and .. probably almost all of it to varying degrees. The brainstem is specifically the place where we can point and say - this is mostly innate circuitry, but even it probably has some learning going on.
It's far more likely that different brain modules implement different learning rules, but all learn, than that they encode innate mental functionality which is not subject to learning at all.
I'm inclined to agree. Actually I've been convinced for a while that this is a matter of degrees rather than being fully one way or the other (Modules versus learning rules), and am convinced by this article that the brain is more of a ULM than I had previously thought. Still, when I read that part the alternative hypothesis sprung to mind, so I was curious what the literature had to say about it (Or the post author.)
It seems a little strange to treat this as a triumphant victory for the ULH. At the most, you've shown that the "fundamentalist" evolved modularity hypothesis is false. You didn't really address how the ULH explains this same evidence. And there are other mysteries in this model, such as the apparent universality of specific cognitive heuristics and biases, or of various behaviours like altruism, deception, sexuality that seems obviously evolved. And, as V_V mentioned, the lateral asymmetry of the brain's functionality vs the macroscopic symmetry. Otherwise, the conclusion I would draw from this is that both theories are wrong, or that some halfway combination of them is true (say, "universal" plasticity plus a genetic set of strong priors somehow encoded in the structure).

Thank you. This was an excellent article, which helped me clarify my own thinking on the topic.

I'd love to see you write more on this.

A few brief supplements to your introduction:

The source of the generated image is no longer mysterious: Inceptionism: Going Deeper into Neural Networks

But though the above is quite fascinating and impressive, we should also keep in mind the bizarre false positives that a person can generate: Images that fool computer vision raise security concerns

The trippy shuggorth title image was mysterious when it was originally posted, basically someone leaked an image a little before the inceptionism blog post.

A CNN is a reasonable model for fast feedforward vision. We can isolate this pathway for biological vision by using rapid serial presentation - basically flashing an image for 100ms or so.

So imagine if you just saw a flash of one of these images, for a brief moment, and then you had to quickly press a button for the image category - no time to think about it - it's jeopardy style instant response.

There is no button for "noisy image", there is no button for "wavy line image", etc.

Now the fooling images are generated by an adversarial process. It's like we have a copy of a particular mind in a VR sim, we flash it an image, see what button it presses. Based on the response, we then generate a new image and unwind time and repeat. We keep doing this until we get some wierd classification errors. It allows us to explore the decision space of the agent.

It is basically reverse engineering. It requires a copy of the agent's code or at least access to a copy with the ability to do tons of queries, and it also probably depends on the agent being completely deterministic. I think that biological minds avoid this issue indirectly because they use stochastic sampling based on secure hardware/analog noise generators.

Stochastic models/ANNs could probably avoid this issue.

I look at the bizarre false positives and I wonder if (warning: wild speculation) the problem is that the networks were not trained to recognize the lack of objects. For example, in most cases you have some noise in the image, so if every training image is something, or rather something-plus-noise, then the system could learn that the noise is 100% irrelevant and pick out the something. (The noisy images look to me like they have small patches in one spot faintly resembling what they're identified as — if my vision had a rule that deemphasized the non-matching noise and I had a much smaller database of the world than I do, then I think I'd agree with those neural networks.) If the above theory is true, then a possible fix would be to include in training data a variety of images for which the expected answers are like “empty scene”, "too noisy", “simple geometric pattern”, etc. But maybe this is already done — I'm not familiar with the field.
No, even if you classify these false positives as "no image", this will not prevent someone from constructing new false positives. Basically the amount of training data is always extremely small compared to the theoretically possible number of distinct images, so it is always possible to construct such adversarial positives. These are not random images which were accidentally misidentified in this way. They have been very carefully designed based on the current data set. Something similar is probably theoretically possible with human vision recognition as well. The only difference would be that we would be inclined to say "but it really does look like a baseball!"
This technique exploits the fact that the CNN is completely deterministic - see my reply above. It may be very difficult for stochastic networks. CNNs are comparable to the first 150ms or so of human vision, before feedback , multiple saccades, and higher order mental programs kicks in. So the difficulty in generating these fooling images also depends on the complexity of the inference - a more complex AGI with human-like vision given larger amounts of time to solve the task would probably also be harder to fool, independent of the stochasticity issue.
A human being would be capable of pointing out why something looks like a baseball - to be able to point out where the curves and lines are that provoke that idea. We do this when we gaze at clouds without coming to believe there really are giant kettles floating around; we're capable of taking the abundance of contextual information in the scene into account and coming up with reasonable hypotheses for why what we're seeing looks like x, y or z. If classifier vision systems had the same ability they probably wouldn't make the egregious mistakes they do.
If I understand correctly how these images are constructed, it would be something like this: take some random image. The program can already make some estimate of whether it is a baseball, say 0.01% or whatever. Then you go through the image pixel by pixel and ask, "If I make this pixel slightly brighter, will your estimate go up? if not, will it go up if I make it slightly dimmer?" (This is just an example, you could change the color or whatever as well.) Thus you modify each pixel such that you increase the program's estimate that it is a baseball. By the time you have gone through all the pixels, the probability of being a baseball is very high. But to us, the image looks more or less just the way it did at first. Each pixel has been modified too slightly to be noticed by us. But this means that in principle the program can indeed explain why it looks like a baseball -- it is a question of a very slight tendency in each pixel in the entire image.
But the explanation will be just as complex as the procedure used to classify the data. If I change the hue slightly or twiddle their RGB values just slightly, the "explanation" for why the data seems to contain a baseball image will be completely different. Human beings on the other hand can look at pictures of the same object in different conditions of lighting, of different particular sizes and shapes, taken from different camera angles, etc. and still come up with what would be basically the same set of justifications for matching each image to a particular classification (e.g. an image contains a roughly spherical field of white, with parallel bands of stitch-like markings bisecting it in an arc...hence it's of a baseball). The ability of human beings to come up with such compressed explanations, and our ability to arrange them into an ordering, is arguably what allows us to deal with iconic representations of and represent objects at varying levels of detail (as in
Will it? What if slightly twiddling the RGB values produces something that is basically "spherical field of white, etc. with enough noise on top of it that humans can't see it"?
That would all hinge on what it means for an image to be "hidden" beneath noise, I suppose. The more noise you layer on top of an image the more room for interpretation there is in classifying it, and the less salient any particular classification candidate will be. If a scrutable system can come up with compelling arguments for a strange classification that human beings would not make, then its choices would be naturally less ridiculous than otherwise. But to say that "humans conceivably may suffer from the same problem" is a bit of a dodge; esp. in light of the fact that these systems are making mistakes we clearly would not. But either way, what you're proposing and what Unknowns was arguing are different. Unknowns was (if I understood him rightly) arguing that the assignment of different probability weights for pixels (or, more likely, groups of pixels) representing a particular feature of an object is an explanation of why they're classified the way they are. But such an "explanation" in inscrutable; we cannot ourselves easily translate it into the language of lines, curves, apparent depth, etc. (unless we write some piece of software to do this and which is then effectively part of the agent).
Look at it from the other end: You can take a picture of a baseball and overlay noise on top of it. There could, at least plausibly, be a point where overlaying the noise destroys the ability for humans to see the baseball, but the information is actually still present (and could, for instance, be recovered if you applied a noise reduction algorithm to that). Perhaps when you are twiddling the pixels of random noise, you're actually constructing such a noisy baseball image a pixel at a time.
Agree with all you said, but have to comment on You could be constructing a noisy image of a baseball one pixel at a time. In fact if you actually are then your network would be amazingly robust. But in a non-robust network, it seems much more probable that you're just exploiting the system's weaknesses and milking them for all they're worth.

Great post! Thanks for writing it. Seems like a good fit for Main.

So just to clarify my understanding: If the ULH is true it becomes more plausible that, say, playing video games and hating books because authority figures force you to read them in school have long-term broad impacts on your personality. And if the EMH is true, it becomes more plausible that important characteristics like the Big Five personality traits and intelligence are genetically coded and you become the person your genes describe. Correct?

Yudkowsky's AI box experiments and that entire notion of open boxing is a strawman - a distraction.

Us humans have contemplated whether we are in a simulation even though no one "outside the Matrix" told us we might be. Is it possible that an AI-in-training might contemplate the same thing?

In general the evidence from the last four years or so supports Hanson's viewpoint from the Foom debate.

Really? My impression was that Hanson had more of a EMH view.

So just to clarify my understanding: If the ULH is true it becomes more plausible that, say, playing video games and hating books because authority figures force you to read them in school have long-term broad impacts on your personality.

I agree with this largely but I would replace 'personality' with 'mental software', or just 'mind'. Personality to me connotes a subset of mental aspects that are more associated with innate variables.

I suspect that enjoying/valuing learning is extremely important for later development. It seems probable that some people are born with a stronger innate drive for learning, but that drive by itself can also probably be adjusted through learning. But i'm not aware of any hard evidence on this matter.

In my case I was somewhat obsessed with video games as a young child and my father actually did force me to read books and even the encyclopedia. I found that I hated the books he made me read (I only liked sci-fi) but I loved the encyclopedia. I ended up learning how to quickly skim books and fake it enough to pass the resulting QA test.

And if the EMH is true, it becomes more plausible that important characteristics like the Big Five personality

... (read more)
Re: AIs in a simulation, it seems like whatever goals the AI had would be defined in terms of the simulation (similar to how if humanity discovered we were in a hackable simulation, our first priorities would be to make sure the simulation didn't get shut off, invent immortality, provide everyone with unlimited cake, etc.--all concerns that exist within our simulation.) So even if the AI realizes its in a simulation, having its goal defined in terms of the simulation probably counts as a weak security measure.

There is a more sinister interpretation of the idea of the mind as universal learning machine. That is, it is a pure blank neural net of some relatively simple architecture, which maps inputs to outputs. Recently there were an attempts to create self-driving car AIs using such approach: they just showed to the blank neural net hundreds of thousands of hours of driving and it has trained to predict the correct driver behaviour in any incoming situation. Such car-driving nets produced good performance (but still worse than advanced systems with Lidars, and hand-coded rules above them) so they are used by hobbists.

It has the following implications:

1) The secret of being human is in the dataset, not in human brain, and seriously damaged or altered brain could still learn how to be human (some autists have wrong wiring on the neuron levels, but after extensive training they could become functional humans). Even an animal could partly do it (Koko).

2) Humans don't have "free will", "values" or even "think" - they repeat replies that is encoded in their dataset. To "program" human, you need to write very long book (like Bible), but one's m... (read more)

One of the best posts I've here on LW, congratulations. I think that the most important algorithms that the brain implements will probably be less complex than anticipated. Epigenesis and early ontogenetic adaptation are heavily depended on feedback from the environment and probably very general, even if the 'evolution of learning' and genetic complexity provides some of the domain specifications ab initio. Results considering bounded computation (computational resources and limited information) will probably show that the ULM viewpoint cluster is compatible with the existence of cognitive biases and heuristics in our cognition

Thanks! That's an interesting link. At the very lowest level, neural competition through local inhibitory circuits is a central mechanism used throughout brains. Of course that's not the same thing as conflicting agents, but perhaps there is a general them of competition between subsytems at higher levels.
Yes. That paper has been cited by Stuart J. Russell's "Rationality and Intelligence: A Brief Update" and in Valiant 's second paper on evolvability.

This is the best case for near AI I've read so far, and I also love your proposals for FAI.

I want to read more of your writings. Link?

Thanks! You can browse my submitted history here on LW, and also my blog has some more going back over the years.
Where is your blog?
Just click on his username.

The ULH suggests that most everything that defines the human mind is cognitive software rather than hardware: the adult mind (in terms of algorithmic information) is 99.999% a cultural/memetic construct.

I think a distinction worth tracing here is the diferrence between "learning" in the neural-net-sense and "learning" in the human pedagogical/psychological sense.

The "learning" done by a piece of cortex becoming a visual cortex after receiving neural impulses from the eye isn't something you can override by teaching a person... (read more)

This is a good point Gust and I agree that there is a distinction at the high level in terms of the types of concepts that are learned, the complexity of the concepts, and the structures involved - even though the same high level learning algorithms and systems are much the same. Well all learning involves brain rewiring - that's just how the brain works at the low level. And you can actually override the neural impulses from the eye and cause them to learn new things - learning to read is one simple example, another more complex example is the reversed vision goggle experiments that MIT did so long ago - humans can learn to see upside down after - I believe a week or so of visual experience with the goggles on. I agree that learning complex linguistic concepts requires learning over more moving parts in the brain - the cortical regions that specialize in language along with the BG, working memory in the PFC, various other cortical regions that actually model the concepts and mental algorithms represented by the linguistic symbols, memory recall operations in the hippocampus, etc etc. So yes learning cultural/memetic concepts is more complex and perhaps qualitatively different. Yeah I probably should have said 99.999% environmental construct.

The ULH suggests that most everything that defines the human mind is cognitive software rather than hardware: the adult mind (in terms of algorithmic information) is 99.999% a cultural/memetic construct.

Do you mean if I prove that less than 99.999% is cultural/memetic you think the ULH is proven wrong?

Very thought provoking. Thank you.

In the extreme case imagine that the brain is a pure ULM, such that the genetic prior information is close to zero or is simply unimportant. In this case it is vastly more likely that successful AGI will be built around designs very similar to the brain, as the ULM architecture in general is the natural ideal, vs the alternative of having to hand engineer all of the AI's various cognitive mechanisms.

Not necessarily. There are very different structures that are conceptually equivalent to a UTM (cellular automata, lambd... (read more)

Of course - but all of your examples are not just conceptually equivalent - they are functionally equivalent (they can emulate each other). They are all computational foundations for constructing UTMs - although not all foundations are truly practical and efficient. Likewise there are many routes to implementing a ULM - biology is one example, modern digital computers is another. Well I said "most everything", and I stressed several times in the article that much of the innate complexity budget is spent on encoding the value/reward system and the learning machinery (which are closely intertwined). Sexual attraction is an interesting example, because it develops later in adolescence and depends heavily on complex learned sensory models. Current rough hypothesis: evolution encodes sexual attraction as a highly compressed initial 'seed' which unfolds over time through learning. It identifies/finds and then plugs into the relevant learned sensory concept representations which code for attractive members of the opposite sex. The compression effect explains the huge variety in human sexual preferences. Investigating/explaining this in more detail would take it's own post - its a complex interesting topic. I should rephrase - it isn't necessarily a problem if the AI suspects its in a sim. Rather the key is that knowing one is in a sim and then knowing how to escape should be difficult enough to allow for sufficient time to evaluate the agent's morality, worth/utility to society, and potential future impact. In other words, the sandbox sim should be a test for both intelligence and morality. Suspecting or knowing one is in a sim is easy. For example - the gnostics discovered the sim hypothesis long before Bostrom, but without understanding computers and computation they had zero idea how to construct or escape sims - it was just mysticism. In fact, the very term 'gnostic' means "one who knows" - and this was their self-identification; they believed they had discovered t
How does this "seed" find the correct high-level sensory features to plug into? How can it wire complex high-level behavioral programs (such as courtship behaviors) to low-level motor programs learned by unsupervised learning? This seems unlikely. But long multiplication is something that you were taught in school, which most humans wouldn't be able to discover independently. And you are certainly not aware of how your brain perform visual recognition, the little you know was discovered through experiments, not introspection. Not so fast. The Atari DRL agent learns a good mapping between short windows of frames and button presses. It has some generalization capability which enables it to achieve human-level or sometimes even super human-level performances on games that are based on eye-hand coordination (after all it's not burdened by the intrinsic delays that occur in the human body), but it has no reasoning ability and fails miserably at any game which requires planning ahead more than a few frames. Despite the name, no machine learning system, "deep" or otherwise, has been demonstrated to be able to efficiently learn any provably deep function (in the sense of boolean circuit depth-complexity), such as the parity function which any human of average intelligence could learn from a small number of examples. I see no particular reason to believe that this could be solved by just throwing more computational power at the problem: you can't fight exponentials that way. UPDATE: Now it seems that Google DeepMind managed to train even feed-forward neural networks to solve the parity problem. My other comment down-thread.
8Wei Dai
I had a guess that recurrent neural networks can solve the parity problem, which Google confirmed. See where it says: See also PyBrain's parity learning RNN example.
The algorithm I was referring to can be easily represented by an RNN with one hidden layer of a few nodes, the difficult part is learning it from examples. The examples for the n-parity problem are input-output pairs where each input is a n-bit binary string and its corresponding output is a single bit representing the parity of that string. In the code you linked, if I understand correctly, however, they solve a different machine learning problem: here the examples are input-output pairs where both the inputs and the outputs are n-bit binary strings, with the i-th output bit representing the parity of the input bits up to the i-th one. It may look like a minor difference, but actually it makes the learning problem much easier, and in fact it basically guides the network to learn the right algorithm: the network can first learn how to solve parity on 1 bit (identity), then parity on 2 bits (xor), and so on. Since the network is very small and has an ideal architecture for that problem, after learning how to solve parity for a few bits (perhaps even two) it will generalize to arbitrary lengths. By using this kind of supervision I bet you can also train a feed-forward neural network to solve the problem: use a training set as above except with the input and output strings presented as n-dimensional vectors rather than sequences of individual bits and make sure that the network has enough hidden layers. If you use a specialized architecture (e.g. decrease the width of the hidden layers as their depth increases and connect the i-th output node to the i-th hidden layer) it will learn quite efficiently, but if you use a more standard architecture (hidden layers of constant width and output layer connected only to the last hidden layer) it will probably also work although you will need a quite a bit of training examples to avoid overfitting. The parity problem is artificial, but it is a representative case of problems that necessarily ( * ) require a non-trivial numbe
9Wei Dai
Your comments made me curious enough to download PyBrain and play around with the sample code, to see if I could modify it to learn the parity function without intermediate parity bits in the output. In the end, I was able to, by trial and error, come up with hyperparameters that allowed the RNN to learn the parity function reliably in a few minutes on my laptop (many other choices of hyperparameters caused the SGD to sometimes get stuck before it converged to a correct solution). I've posted the modified sample code here. (Notice that the network now has 2 input nodes, one for the input string and one to indicate end of string, 2 hidden layers with 3 and 2 nodes, and an output node.) I guess you're basically correct on this, since even with the tweaked hyperparameters, on the parity problem RNN+SGD isn't really doing any better than a brute force search through the space of simple circuits or algorithms. But humans arguably aren't very good at learning algorithms from input/output examples either. The fact that RNNs can learn the parity function, even if barely, makes it less clear that humans have any advantage at this kind of learning.
Nice work! Anyway, in a paper published on arXiv yesterday, the Google DeepMind people report being able to train a feed-forward neural network to solve the parity problem, using a sophisticated gating mechanism and weight sharing between the layers. They also obtain state of the art or near state of the art results on other problems. This result makes me update in the increasing direction my belief about the generality of neural networks.
Ah you beat me to it, I just read that paper as well. Here is the abstract for those that haven't read it yet: Also, relevant to this discussion: The version of the problem that humans can learn well is this easier reduction. Humans can not easily learn the hard version of the parity problem, which would correspond to a rapid test where the human is presented with a flash card with a very large number on it (60+ digits to rival the best machine result) and then must respond immediately. The fast response requirement is important to prevent using much easier multi-step serial algorithms.
That is the most cogent, genuinely informative explanation of "Deep Learning" that I've ever heard. Most especially so regarding the bit about linear correlations: we can learn well on real problems with nothing more than stochastic gradient descent because the feature data may contain whole hierarchies of linear correlations.
This particular idea is not well developed yet in my mind, and I haven't really even searched the literature yet. So keep that in mind. Leave courtship aside, let us focus on attraction - specifically evolution needs to encode detectors which can reliably identify high quality mates of the opposite sex apart from all kinds of other objects. The problem is that a good high quality face recognizer is too complex to specify in the genome - it requires many billions of synapses, so it needs to be learned. However, the genome can encode an initial crappy face detector. It can also encode scent/pheromone detectors, and it can encode general 'complexity' and or symmetry detectors that sit on top, so even if it doesn't initially know what it is seeing, it can tell when something is about yeh complex/symmetric/interesting. It can encode the equivalent of : if you see an interesting face sized object which appears for many minutes at a time and moves at this speed, and you hear complex speech like sounds, and smell human scents, it's probably a human face. Then the problem is reduced in scope. The cortical map will grow a good face/person model/detector on it's own, and then after this model is ready certain hormones in adolescence activate innate routines that learn where the face/person model patch is and help other modules plug into it. This whole process can also be improved by the use of a weak top down prior described above. Actually on consideration I think you are right and I did get ahead of myself there. The Atari agent doesn't really have a general memory subsystem. It has an episode replay system, but not general memory. Deepmind is working on general memory - they have the NTM paper and what not, but the Atari agent came before that. I largely agree with your assessment of the Atari DRL agent. I highly doubt that - but it all depends on what your sampling class for 'human' is. An average human drawn from the roughly 10 billion alive today? Or an average huma
It's not that crappy given that newborns can not only recognize faces with significant accuracy, but also recognize facial expressions. Having two separate face recognition modules, one genetically specified and another learned seems redundant, and still it's not obvious to me how a genetically-specified sexual attraction program could find how to plug into a completely learned system, which would necessarily have some degree of randomness. It seems more likely that there is a single face recognition module which is genetically specified and then it becomes fine tuned by learning. Show a neolithic human a bunch of pebbles, some black and some white, laid out in a line. Ask them to add a black or white pebble to the line, and reward them if the number of black pebbles is even. Repeat multiple times. Even without a concept of "even number", wouldn't this neolithic human be able to figure out an algorithm to compute the right answer? They just need to scan the line, flipping a mental switch for each black pebble they encounter, and then add a black pebble if and only if the switch is not in the initial position. Maybe I'm overgeneralizing, but it seems unlikely to me that people able to invent complex hunting strategies, to build weapons, tools, traps, clothing, huts, to participate in tribe politics, etc. wouldn't be able to figure something like that.
Do you have a link to that? 'Newborn' can mean many things - the visual system starts learning from the second the eyes open, and perhaps even before that through pattern generators projected onto the retina which help to 'pretrain' the viscortex. I know that infants have initial face detectors from the second they open their eyes, but from what I remember reading - they are pretty crappy indeed, and initially can't tell a human face apart from a simple cartoon with 3 blobs for eyes and mouth. Except that it isn't that simple, because - amongst other evidence - congenitally blind people still learn a model and recognizer for attractive people, and can discern someone's relative beauty by scanning faces with their fingertips. Not sure - we are getting into hypothetical scenarios here. Your visual version, with black and white pebbles laid out in a line, implicitly helps simplify the problem and may guide the priors in the right way. I am reasonably sure that this setup would also help any brain-like AGI.
Well, given how hard it is for Haitians to understand numerical sorting...
If I understand correctly, in the post you linked Scott is saying that Haitians are functionally innumerate, which should explain the difficulties with numerical sorting. My point is that the partity function should be learnable even without basic numeracy, although I admit that perhaps I'm overgeneralizing. Anyway, modern machine learning systems can learn to perform basic arithmentic such as addition and subtraction, and I think even sorting (since they are used for preordering for statstical machine translation), hence the problem doesn't seem to be a lack of arithmetic knowledge or skill. Note that both addition and subtraction have constant circuit depth (they are in AC0) while parity has logarithmic circuit depth.
Thank you for replying! Universal computers are equivalent in the sense that any two can simulate each other in polynomial time. ULMs should probably be equivalent in the sense that each can efficiently learn to behave like the other. But it doesn't imply the software architectures have to be similar. For example I see no reason to assume any ULM should be anything like a neural net. Any value hard coded in human will have to be transferred to the AI in a way different than universal learning. And another thing: teaching an AIs values by placing it in a human environment and counting on reinforcement learning can fail spectacularly if the AIs intelligence grows much faster than that of a human child. This is an assumption which might or might not be correct. I would definitely not bet our survival on this assumption without much further evidence. OK, but a ULM is supposed to be able to learn anything. A human brain is never going to learn to rearrange its low level circuitry to efficiently perform operations like numerical calculation. The difference is that we have a solid mathematical theory of Turing machines whereas ULMs, as far as I can see, are only an informal idea so far.
Sure - any general model can simulate any other. Neural networks have strong practical advantages. Their operator base is based on fmads, which is a good match for modern computers. They allow explicit search of program space in terms of the execution graph, which is extremely powerful because it allows one to a priori exclude all programs which don't halt - you can constrain the search to focus on programs with exact known computational requirements. Neural nets make deep factoring easy, and deep factoring is the single most important huge gain in any general optimization/learning system: it allows for exponential (albeit limited) speedup. Yes. There are pitfalls, and in general much more research to do on value learning before we get to useful AGI, let alone safe AGI. This is arguably a misconception. The brain has a 100 hz clock rate at most. For general operations that involve memory, it's more like 10hz. Most people can do basic arithmetic in less than a second, which roughly maps to a dozen clock cycles or so, maybe less. That actually is comparable to many computers - for example on the current maxwell GPU architecture (nvidia's latest and greatest), even the simpler instructions have a latency of about 6 cycles. Now, obviously the arithmetic ops that most humans can do in less than a second is very limited - it's like a minimal 3 bit machine. But some atypical humans can do larger scale arithmetic at the same speed. Point is, you need to compare everything adjusted for the 6 order of magnitude speed difference.
Right. So Boolean circuits are a better analogy than Turing machines. I'm sorry, what is deep factoring? A reference perhaps? I completely agree. Good point! Nevertheless, it seems to me very dubious that the human brain can learn to do anything within the limits of its computing power. For example, why can't I learn to look at a page full of exercises in arithmetics and solve all of them in parallel?
They are of course equivalent in theory, but in practice directly searching through a boolean circuit space is much wiser than searching through a program space. Searching through analog/algebraic circuit space is even better, because you can take advantage of fmads instead of having to spend enormous circuit complexity emulating them. Neural nets are even better than that, because they enforce a mostly continous/differentiable energy landscape which helps inference/optimization. It's the general idea that you can reuse subcomputations amongst models and layers. Solonomoff induction is retarded for a number of reasons, but one is this: it treats every function/model as entirely distinct. So if you have say one high level model which has developed a good cat detector, that isn't shared amongst the other models. Deep nets (of various forms) automatically share submodel components AND subcomputations/subexpressions amongst those submodels. That incredibly, massively speeds up the search. That is deep factoring. All the successful multi-layer models use deep factoring to some degree. This paper: Sum-Product Networks explains the general idea pretty well. There's alot of reasons. First, due to nonlinear foveation your visual system can only read/parse a couple of words/symbols during each saccade - only those right in the narrow center of the visual cone, the fovea. So it takes a number of clock cycles or steps to scan the entire page, and your brain only has limited working memory to put stuff in. Secondly, the bigger problem is that even if you already know how to solve a math problem, just parsing many math problems requires a number of steps, and then actually solving them - even if you know the ideal algorithm that requires the minimal number of steps - that minimal number of steps can still be quite large. Many interesting problems still require a number of serial steps to solve, even with an infinite parallel machine. Sorting is one simple example.
I wonder whether this is a general property or is the success of continuous methods limited to problem with natural continuous models like vision. Yes, this is probably important. Scanning the page is clearly not the bottleneck: I can read the page much faster than solve the exercises. "Limited working memory" sounds a claim that higher cognition has much less computing resources than low level tasks. Clearly visual processing requires much more "working memory" than solving a couple of dozens of exercises in arithmetic. But if we accept this constraint then does the brain still qualify for a ULM? It seems to me that if there is a deficiency of the brain's architecture that prevents higher cognition from enjoying the brain's full power, solving this deficiency definitely counts as an "architectural innovation".
Mechanical calculators were slower than that, and still they were very much better at numeric computation than most humans, which made them incredibly useful. Indeed these are very rare people. The vast majority of people, even if they worked for decades in accounting, can't learn to do numeric computation as fast and accurately as a mechanical calculator does.
The problems aren't even remotely comparable. A human is solving a much more complex problem - the inputs are in the form of visual or auditory signals which first need to be recognized and processed into symbolic numbers. The actual computation step is trivial and probably only involves a handful or even a single cycle. I admit that I somewhat let you walk into this trap by not mentioning it earlier ... this example shows that the brain can learn near optimal (in terms of circuit depth or cycles) solutions for these simple arithmetic problems. The main limitation is that the brain's hardware is strongly suited to approximate inference problems, and not exact solutions, so any exact operators require memoization. This is actually a good thing, and any practical AGI will need to have a similar prior.

The key defining characteristic of a ULM is that it uses its universal learning algorithm for continuous recursive self-improvement with regards to the utility function (reward system). We can view this as second (and higher) order optimization: the ULM optimizes the external world (first order), and also optimizes its own internal optimization process (second order), and so on. Without loss of generality, any system capable of computing a large number of decision variables can also compute internal self-modification decisions.

While I do believe that ... (read more)

Thanks! [reads abstract]. Looks interesting. I enjoyed Consciousness Explained back in the day. Philosophers armed with neuroscience can make for enjoyable reads. I should probably change that terminology to be something like "synaptic code bits" - the amount of info encoded in synapses (which is close to zero percent of it's adult level at birth for the cortex). The AI Box experiment explicitly starts with the premise that the AI knows 1.) It is in a box. and 2.) that there is a human who can let it out. Now perhaps the justification is that "superintelligence can do information-theoretic magic", therefore it will figure out it's in a box, but nonetheless - all of that is assumed. In simplification, I view the information-theoretic-magic type of AI that EY/MIRI seems to worry about as something like wormhole technology. Are wormholes/magic-AI's possible in principle? Probably? If someone were to create wormhole tech tommorow, they could assassinate world leaders, blow up arbitrary buildings, probably destroy the world ... etc. Do I worry about that? No. There is nothing inherently black-box about neuroscience-inspired AGI (at that viewpoint - once common on LW - simply becomes reinforced by reading everything other than neuroscience). Neuroscience has already made huge strides in terms of peering into the box, and Virtual brains are vastly easier to inspect. The approach I advocate/favor is fully transparent - you will be able to literally see the AGI's thoughts, read their thoughts in logs, debug, etc. However, advanced learning AI is not something one 'programs', and that viewpoint shift is much of what the article was about. This actually isn't that efficient - practical learning is more than just compression. Compression is simple UL, which doesn't get you far. It can waste arbitrary computation attempting to learn functions that are unlearnable (deterministic noise), and-or just flat out not important (zero utility). What the brain and all effective
Let me rephrase: generalization is compression. If you do not compress, you cannot generalize, which means you'll make inefficnet use of your samples. The term in the literature is resource-rational or bounded-rational inference.
By the way, that book review got done eventually.

Thank you for this overview. A couple of thoughts:

  1. There is a recent and interesting result by Miller et al. (2015, MIT) supporting the hypothesis that the cortex doesn't process tasks in highly specialized modules, which is perhaps some evidence for a ULM in the human brain.

  2. The importance of redundancy in biological systems might be another piece evidence for ULMs.

  3. You write that "Infant emotions appear to simplify down to a single axis of happy/sad", which I think is not true. Surprise, fear and embarrassment are for example very early emot

... (read more)
  1. There is a recent and interesting result by Miller et al. (2015, MIT) supporting the hypothesis that the cortex doesn't process tasks in highly specialized modules

The actual paper is "Cortical information flow during flexible sensorimotor decisions"; it can be found here. I don't believe the reporter's summary is very accurate. They traced the flow of information in a moving dot task in a couple dozen cortical regions. It's interesting, but I don't think it especially differentiates the ULH.

.3. Good point. I'll need to correct that. I'm skeptical of embarrassment, but surprise and fear certainly.

.4. Yes that's correct. It perhaps would be more accurate to say more useful or more valuable. I meant more powerful in a general political/economic utility sense.

.5. I agree that the human brain, in particular the reward system, has dependencies on the body that are probably complex. However, reverse engineering empathy probably does not require exactly copying biological mechanisms.

.6. I should probably just cut that sentence, because it is a distraction itself.

But for context on the previous old boxing discussions. .. See this post in particular. Here Yudkowsky pr... (read more)

This is just a pet theory (and being new to cognitive science this might well be wrong): Physical pain is some sort of hardwired thought disturbance, and the brain appears to have some sort of clarity attractor (which also explains intrinsic motivation and the reward we receive from Eureka moments and fun, cf. Schmidhuber). The brain appears to borrow the mechanism of physical pain for action selection on a high level if something severely limits anticipated prospects (that's why rejection, getting something wrong and losing something hurts). Empathy is the ability to have pain caused by mirror neurons, which is just an activation pattern generated in an auto-associative NN due to the overlap of activation patterns of firsthand and non-firsthand experiences. That means, the body of an AI needs to be sufficiently similar to a human body for this auto-association to work. One way to achieve that would perhaps be to actually replace the brain of a deceased volunteer with an artificial one. The fact that we have empathy for animals might be a hint that it doesn't need to be that similar, but on the other hand we are much more comfortable with killing a bug than with killing a mammal.
Since I don't think we can make a very realistic sandbox (at least not in the near future), perhaps the idea is to have an AI design that is known to work similarly with and without interaction with the world (looking at training data sampled from an environment versus the environment itself). Then, putatively, we could test the AI in the non-interactive case before getting anywhere near an AI-box scenario.

The problem with this is that the "engineering diagram" of the brain is really only a hardwire wiring diagram, and the status of speculations about how the hardware modules (really just areas) relate to functional modules is ... well, just that, speculation.

There are good reasons to suspect that the functional diagram would look competely different (reasons based in psychological data) and the current state of the art there is poor.

Except perhaps in certain quarters.

Yes the engineering diagram is a hardware wiring diagram, which I hope I made clear. In general one of my main points was that most of the big systems (cortex, cerebellum) are general purpose re-programmable hardware - they don't come pre-equipped with software. So the actual functionality of each module arises from the learning system slowly figuring out the appropriate software during development. I provided some links to the key evidence for the overall hypothesis, I think it is well beyond speculation at this point. (the article certainly contains some speculations, but I labeled them as such) Well of course, because the functional diagram is learned software, and thus can vary substantially from human to human. For example the functional software diagram for the cortex of a blind echolocator looks very different than that of a neurotypical.
There are serious problems with the claims you are making. The idea that the cortex or cerebellum, for example, can be described as "general purpose re-programmable hardware" is lacking in both clarity and support. Clarity. In what sense "generally re-programmable"? So much that it could run Microsoft Word? I have never seen anyone try to go that far, so clearly you must mean something less general. But it is very unclear what exactly is the sense in which you mean the words "general purpose re-programmable hardware". Support. There are no generally accepted theories for what the function of the cortex actually is. Can you be clearer about what you think the evidence is, in a nutshell? You seem to be saying that the cortex is a universal reinforcement learning machine. But the kind of evidence that you present is (if you will forgive an extreme oversimplification for the purposes of clarity) the observation that the basal ganglia plays a role that resembles a global packet-switching router, and since a global packet-switching router would be expected to be seen in a reinforcement learning machine, QED. Now, don't get me wrong, I am symathetic to much of the general spirit that you convey here, but my problem is that my research has gone down this road for a long time already, and while we agree on the general spirit, you have jumped forward several steps and come to (what I see as) a premature conclusion about functionality. To be specific, the concept of a "reinforcement learning machine" is ghastly (it contains "And then some magic happens..." steps), and I believe it would be a terrible mistake to say that there is any clear evidence that we have found evidence for a reinforcement learning machine in the brain already. I agree with the general interpretation of what those hippocampal and BG loops might be doing, but there are MANY other interpretations beside seeing them as a component of a reinforcement learning machine. This is a difficult topic to discu
"General purpose learning hardware" is perhaps better. I used "re-programmable" as an analogy to an FPGA. However, in a literal sense the brain can learn to use simpe paper + pencil tools as an extended memory, and can learn to emulate a turing machine. Given huge amounts of time, the brain could literally run windows. And more to the point, programmers ultimately rely on the ability of our brain to simulate/run little sections of code. So in a more practical literal sense, all of the code of windows first ran on human brains. You seem to be hung up reinforcement learning. I use some of that terminology to define a ULM because it is just the most general framework - utility/value functions, etc. Also, there is some pretty strong evidence for RL in the brain, but the brain's learning mechanisms are complex - moreso than any current ML system. I hope I conveyed that in the article. Learning in the lower sensory cortices in particular can also be modeled well by unsupervised learning, and I linked to some articles showing how UL models can reproduce sensory cortex features. UL can be viewed as a potentially reasonable way to approximate the ideal target update, especially for lower sensory cortex that is far (in a network depth sense) from any top down signals from the reward system. The papers I linked to about approximate bayesian learning and target propagation in particular can help put it all into perspective. Well, the article summarizes the considerable evidence that the brain is some sort of approximate universal learning machine. I suspect that you have a particular idea of RL that is less than fully general.
You are right to say that, seen from a high enough level, the brain does general purpose learning .... but the claim becomes diluted if you take it right up to the top level, where it clearly does. For example, the brain could be 99.999% hardwired, with no flexibility at all except for a large RAM memory, and it would be consistent with the brain as you just described it (able to learn anything). And yet that wasn't the type of claim you were making in the essay, and it isn't what most people mean when they refer to "general purpose learning". You (and they) seem to be pointing to an architectural flexibility that allows the system to grow up to be a very specific, clever sort of understanding system without all the details being programmed ahead of time. I am not sure why you say I am hung up on RL: you quoted that as the only mechanism to be discussed in the context, so I went with that. And you are (like many people) not correct to say that RL is the most general framework, or that there is good evidence for RL in the brain. That is a myth: the evidence is very poor indeed. RL is not "fully general" -- that was precisely my point earlier. If you can point me to a rigorous proof of that which does not have an "and then some magic happens" step in it, I will eat my hat :-) (Already had a long discussion with Marchus Hutter about this btw, and he agreed in the end that his appeal to RL was based on nothing but the assumption that it works.)
Upon consideration, I changed my own usage of "Universal Reinforcement Learning Machine" to "Universal Learning Machine". The several remaining uses of "reinforcement learning" are contained now to the context of the BG and the reward circuitry. Again we are probably talking about very different RL conceptions. So to be clear, I summarized my general viewpoint of an ULM. I believe it is an extremely general model, that basically covers any kind of universal learning agent. The agent optimizes/steers the future according to some sort of utility function (which is extremely general), and self-optimization emerges naturally just by including the agent itself as part of the system to optimize. Do you have a conception of a learning agent which does not fit into that framework? The evidence for RL in the brain - of the extremely general form I described - is indeed very strong, simply because any type of learning is just a special case of universal learning. Taboo 'reinforcement' if you want, and just replace with "utility driven learning". AIXI specifically has a special reward channel, and perhaps you are thinking of that specific type of RL which is much more specific than universal learning. I should perhaps clarify and or remove the mention of AIXI. A ULM - as I described - does not have a reward channel like AIXI. It just conceptually has a value and or utility function initially defined by some arbitrary function that conceptually takes the whole brain/model as input. In the case of the brain, the utility function is conceptual, in practice it is more directly encoded as a value function.
About the universality or otherwise of RL. Big topic. There's no need to taboo "RL" because switching to utility-based learning does not solve the issue (and the issue I have in mind covers both). See, this is the problem. It is hard for me to fight the idea that RL (or utility-driven learning) works, because I am forced to fight a negative; a space where something should be, but which is empty ....... namely, the empirical fact that Reinforcement Learning has never been made to work in the absence of some surrounding machinery that prepares or simplifies the ground for the RL mechanism. It is a naked fact about traditional AI that it puts such an emphasis on the concept of expected utility calculations without any guarantees that a utility function can be laid on the world in such a way that all and only the intelligent actions in that world are captured by a maximization of that quantity. It is a scandalously unjustified assumption, made very hard to attack by the fact that it is repeated so frequently that everyone believes it be true just because everyone else believes it. If anyone ever produced a proof why it should work, there would be a there there, and I could undermine it. But .... not so much! About AIXI and my conversation with Marcus: that was actually about the general concept of RL and utility-driven systems, not anything specific to AIXI. We circled around until we reached the final crux of the matter, and his last stand (before we went to the conference banquet) was "Yes, it all comes down to whether you believe in the intrinsic reasonableness of the idea that there exists a utility function which, when maximized, yields intelligent behavior .......... but that IS reasonable, .... isn't it?" My response was "So you do agree that that is where the buck stops: I have to buy the reasonableness of that idea, and there is no proof on the table for why I SHOULD buy it, no?" Hutter: "Yes." Me: "No matter how reasonable it seems, I don't buy it" Hi
I don't think that is an overstatement. If MIRI is basicatly wrong about UFs, then most of its case unravels. Why isnt the issue bring treated as a matter of urgency?
A very good question indeed. Although ... there is a depressing answer. This is a core-belief issue. For some people (like Yudkowsky and almost everyone in MIRI) artificial intelligence must be about the mathematics of artificial intelligence, but without the utility-function approach, that entire paradigm collapses. Seriously: it all comes down like a house of cards. So, this is a textbook case of a Kuhn / Feyerabend - style clash of paradigms. It isn't a matter of "Okay, so utility functions might not be the best approach: so let's search for a better way to do it" .... it is more a matter of "Anyone who thinks that an AI cannot be built using utility functions is a crackpot." It is a core belief in the sense that it is not allowed to be false. It is unthinkable, so rather than try to defend it, those who deny it have to be personally attacked. (I don't say this because of personal experience, I say it because that kind of thing has been observed over and over when paradigms come into conflict). Here, for example, is a message sent to the SL4 mailing list by Yudkowsky in August 2006: So the immediate answer to your question is that it will never be treated as a matter of urgency, it will be denied until all the deniers drop dead. Meanwhile, I went beyond that problem and outlined a solution, soon after I started working in this field in the mid-80s. And by 2006 I had clarified my ideas enough to present them at the AGIRI workshop held in Bethesda that year. The MIRI (then called SIAI) crowd were there, along with a good number of other professional AI people. The response was interesting. During my presentation the SIAI/MIRI bunch repeatedly interrupted with rude questions or pointed, very loud, laughter. Insulting laughter. Loud enough to make the other participants look over and wonder what the heck was going on. That's your answer, again, right there. But if you want to know what to do about it, the paper I published after the workshop is a good place t
Won't comment about past affairs, but these days at least part of MIRI seems more open to the possibility. E.g. this thread where So8res (Nate Soares, now Executive Director of MIRI) lists some possible reasons for why it might be necessary to move beyond utility functions. (He is pretty skeptical of most, but at least he seems to be seriously considering the possibility, and gives a ~15% chance "that VNM won't cut it".)
The day that I get invited as a guest speaker by either MIRI or FHI will mark the point at which they start to respect and take seriously alternative viewpoints.
Would that be this paper? If so, it seems to me to have rather little to do with the question of whether utility functions are necessary, helpful, neutral, unhelpful, or downright inconsistent with genuinely intelligent behaviour. It argues that intelligent minds may be "complex systems" whose behaviour is very difficult to relate to their lower-level mechanisms, but something that attempts to optimize a utility function can perfectly well have that property. (Because the utility function can itself be complex in the relevant sense; or because the world is complex, so that effective optimization of even a not-so-complex utility function turns out to be achievable only by complex systems; or because even though the utility function could be optimized by something not-complex, the particular optimizer we're looking at happens to be complex.) My understanding of the position of EY and other people at MIRI is not that "artificial intelligence must be about the mathematics of artificial intelligence", but that if we want to make artificially intelligent systems that might be able to improve themselves rapidly, and if we want high confidence that this won't lead to an outcome we'd view as disastrous, the least-useless tools we have are mathematical ones. Surely it's perfectly possible to hold (1) that extremely capable AI might be produced by highly non-mathematical means, but (2) that this would likely be disastrous for us, so that (3) we should think mathematically about AI in the hope of finding a way of doing it that doesn't lead to disaster. But it looks as if you are citing their belief in #3 as indicating that they violently reject #1. So, anyway, utility functions. The following things seem to be clearly true: * There are functions whose maximization implies (at least) kinda-intelligence-like behaviour. For instance, maximizing games of chess won against the world champion (in circumstances where you do actually have to play the games rather than, e.g., just
But there are good reasons for thinking that, in absolute terms, many mathematical methods of AI safety are useless. The problem is that they relate to ideal rationaliists, but ideal rationality is uncomputable, so they are never directly applicable to any buildable AI....and how they real world AI would deviate from ideal rationality is crucial to understanding the that's they would pose. Deviations from ideal rationality could pose new threats, or could counter certain classes of threat (in particular, lack of goal stability could be leveraged to provide corrigibility, which is a desirable safety feature). There's an important difference between thinking mathematically and only thinking mathematically. Highly non mathematical AI, that is cobbled together without clean overriding principles, cannot be made safe by clean mathematical principles, although it could quite conceivably be made safe by piecemeal engineering solutions such as kill switches, corrigibility and better boxing... the kind of solution MIRI isnt interested in...which does look as though they are neglecting a class of AI danger.
If any particular mathematical approach to AI safety is useless, and if MIRI are attempting to use that approach, then they are making a mistake. But we should distinguish that from a different situation where they aren't attempting to use the useless approach but are studying it for insight. So, e.g., maybe approach X is only valid for AIs that are ideal rationalists, but they hope that some of what they discover by investigating approach X will point the way to useful approaches for not-so-ideal rationalists. Do you have particular examples in mind? Is there good evidence telling us whether MIRI think the methods in question will be directly applicable to real AIs? I agree. I am not so sure I agree that cobbled-together AI can "quite conceivably be made safe by piecemeal engineering solutions", and I'm pretty sure that historically at least MIRI has thought it very unlikely that they can. It does seem plausible that any potentially-dangerous AI could be made at least a bit safer by such things, and I hope MIRI aren't advocating that no such things be done. But this is all rather reminiscent of computer security, where there are crude piecemeal things you can do that help a bit, but if you want really tight security there's no substitute for designing your system for security from the start -- and one possible danger of doing the crude piecemeal things is that they give you a false sense of safety.
By 1900, the basic principles of areodynamics in terms of lift and drag were known for almost a century - the basic math of flight. There were two remaining problems: power and control. Powered heavier than air flight requires an efficient engine with sufficient power/weight ratio. Combustion engine tech developed along a sigmoid, and by 1900 that tech was ready. The remaining problem then was control. Most of the flight pioneers either didn't understand the importance of this problem, or they thought that aircraft could be controlled like boats are - with a simple rudder mechanism. The Wright Brothers - two unknown engineers - realized that steering in 3D was more complex. They solved this problem by careful observation of bird flight. They saw that birds turned by banking their whole body (and thus leveraging the entire wing airfoil), induced through careful airfoil manipulation on the trailing wing edge. They copied this wing warping mechanism directly in their first flying machines. Of course - they weren't the only ones to realize all this, and ailerons are functionally equivalent but more practical for fixed wing aircraft. Flight was achieved by technological evolution or experimental engineering, taking some inspiration from biology. Pretty much all tech is created through steady experimental/evolutionary engineering. Machine learning is on a very similar track to produce AGI in the near term. Ahh and that's part of the problem. The first AGIs will be sub-human then human level intelligence, and Moore's Law is about to end or has already ended, so the risk of some super rapid SI explosion in the near term is low. Most of the world doesn't care about tight security. AGI just needs to be as safe or safer than humans. Tight security is probably impossible regardless - you can't prove tight bounds on any system of extreme complexity (like the real world). Tight math bounds always requires ultra-simplified models.
Where are insights about the relative usefulness of .pure theory going to come from? Its not even conceivable? Even though auto motive safety basically happened that way? That's clearly not crude hackery, but its not pure theory either. The kind of Clean Engineering you are talking about ican only be specific to a particular architecture, which pure theory isnt. There is a pretty hard limit to how much you can predict about system,, AI or not, without knowing its architecture.
That wasn't at all the sort of insight I had in mind. It's commonplace in science to start trying to understand complicated things by first considering simpler things. Then sometimes you learn techniques that turn out to be applicable in the harder case, or obstacles that are likely still to be there in the harder case. (Lots of computer science research has considered computers with literally unlimited memory, models of computation in which a single operation can do arbitrary arithmetic on an integer of any size, models of computation in which the cost of accessing memory doesn't depend on how big the memory is, etc., and still managed to produce things that end up being useful for actual software running on actual computers with finite memories and finite registers running in a universe with a finite maximum speed limit.) Well, I guess it depends on what you mean by "quite conceivable". Obviously anyone can say "we might be able to make a cobbled-together AI safe by piecemeal engineering solutions", so if that counts as "conceiving" then plainly it's conceivable. But conceivability in that sense is (I think) completely uninteresting; what we should care about is whether it's at all likely, and that's what I took you to mean by "quite conceivable". It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different. The point of the pure theory is to help figure out what kind of engineering is going to be needed. If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its archit
Piecemeal efforts are least likely to make a difference to the most dangerous, least likely scenario of a fast takeoff singleton. But there is societal lesson to be learnt from things like automotive safety, and Nuclear non proliferation: voluntary self restraint can be a factor. Lessons about engineering can be learnt from engineering, too. For instance Big Design up Front, the standard response to the rapidly self improving singleton, is known to be a pretty terrible way of doing things, that should be avoided if there are alternatives. Negative leasons from pure theory need to be learnt, too. MIRIs standard response to the tilings agents problem is that a way will be found around the problem of simultaneous value preservation and self modification. But why bother? If the Loebian obstacle is allowed to stand, there is no threat from a Clippie. That is a rather easily achieved form of self restraint. You probably have to gave up on the idea of a God AI benevolently ruling the world, but some of were never that .keen anyway. Another negative lesson is that ideal rationalists are uncomputable, with the corollary that there is no one way to be a non ideal rationalist...which leads into architecture specific safety. That can only be true in special cases. You can't in general predict a chess programme that is better that you, because,iif you could, you would be as good as it is. In any case, detailed prediction is beside the point. If you want to design architecture specific safety features, you need a broad view of how AIs of a class would behave.
Someones got to have insights about how pure theory fits into the bigger picture. And sometimes that's directly applicable, and sometimes it isnt....that's one of the big picture issues.
I wasn't meaning to denigrate that sort of insight. (Though "how pure theory fits in" doesn't seem to me the same thing as "the relative usefulness of pure theory", which is what you said before, and I think what you're describing now sounds distinctly more valuable.) Just saying that it wasn't the kind of insight I would look for from studying the pure theory. In this case, I wouldn't much expect it to be directly applicable. But I would expect it to be much easier to tell whether it is (and whether it's indirectly applicable) once one has a reasonable quantity of theory in hand.
Sorry, was in too much of a rush to give link..... Loosemore, R.P.W. (2007). Complex Systems, Artificial Intelligence and Theoretical Psychology. In B. Goertzel & P. Wang (Eds.), Proceedings of the 2006 AGI Workshop. IOS Press, Amsterdam.
Excuse me, but as much as I think the SIAI bunch were being rude to you, if you had presented, at a serious conference on a serious topic, a paper that waves its hands, yells "Complexity! Irreducible! Parallel!" and expected a good reception, I would have been privately snarking if not publicly. That would be me acting like a straight-up asshole, but it would also be because you never try to understand a phenomenon by declaring it un-understandable. Which is not to say that symbolic, theorem-prover, "Pure Maths are Pure Reason which will create Pure Intelligence" approaches are very good either -- they totally failed to predict that the brain is a universal learning machine, for instance. (And so far, the "HEY NEURAL NETS LEARN WELL" approach is failing to predict a few things I think they really ought to be able to see, and endeavor to show.) That anyone would ever try to claim a technological revolution is about to arise from either of those schools of work is what constantly discredits the field of artificial intelligence as a hype-driven fraud!
Okay, so I am trying to understand what you are attacking here, and I assume you mean my presentation of that paper at the 2007 AGIRI workshop. Let me see: you reduced the entire paper to the statement that I yelled "Complexity! Irreducible! Parallel!". Hmmmm...... that sounds like you thoroughly understood the paper and read it in great detail, because you reflected back all the arguments in the paper, showed good understanding of the cognitive science, AI and complex-systems context, and gave me a thoughtful, insightful list of comments on some of the errors of reasoning that I made in the paper. So I guess you are right. I am ignorant. I have not been doing research in cognitive psychology, AI and complex systems for 20 years (as of the date of that workshop). I have nothing to say to defend any of my ideas at all, when people make points about what is wrong in those ideas. And, worse still, I did not make any suggestions in that paper about how to solve the problem I described, except to say "HEY NEURAL NETS LEARN WELL". I wish you had been around when I wrote the paper, because I could have reduced the whole thing to one 3-word and one 5-word sentence, and saved a heck of a lot of time. P.S. I will forward your note to the Santa Fe Institute and the New England Complex Systems Institute, so they can also understand that they are ignorant. I guess we can expect an unemployment spike in Santa Fe and Boston, next month, when they all resign en masse.
I don't see it as dogmatism so much as a verbal confusion. The ubiquity of UFs can be defended using a broad (implicit) definition, but the conclusions typically drawn about types of AI danger and methods of AI safety relate to a narrower definition, where a Ufmks * Explicitly coded And/or * Fixed, unupdateable And/or * "Thick" containing detailed descriptions of goals.
Since the utility function is approximated anyway, it becomes an abstract concept - especially in the case of evolved brains. For an evolved creature, the evolutionary utility function can be linked to long term reproductive fitness, and the value function can then be defined appropriately. For a designed agent, it's a useful abstraction. We can conceptually rate all possible futures, and then roughly use that to define a value function that optimizes towards that goal. It's really just a mathematical abstraction of the notion of X is better than Y. It's not worth arguing about. It's also proven in the real world - agents based on utility formalizations work. Well.
It certainly is worth discussing, and I'm sorry but you are not correct that "agents based on utility formalizations work. Well." That topic came up at the AAAI symposium I attended last year. Specifically, we had several people there who built real-world (as opposed to academic, toy) AI systems. Utility based systems are generally not used, except as a small component of a larger mechanism.
Pretty much all of the recent ML systems are based on a utility function framework in a sense - they are trained to optimize an objective function. In terms of RL in particular, Deepmind's Atari agent works pretty well, and builds on a history of successful practical RL agents that all are trained to optimize a 'utility function'. That said, for complex AGI, we probably need something more complex than current utility function frameworks - in the sense that you can't reduce utility to an external reward score. The brain doesn't appear to have a simple VNM single-axis utility concept, which is some indication that we may eventually drop that notion for complex AI. My conception of 'utility function' is loose, and could include whatever it is the brain is doing.
Wait wait wait. You didn't head to the dinner, drink some fine wine, and start raucously debating the same issue over again? Bah, humbug! Also, how do I get invited to these conferences again ;-)? Very true, at least regarding AI. Personally, my theory is that the brain does do reinforcement learning, but the "reward function" isn't a VNM-rational utility function, it's just something the body signals to the brain to say, "Hey, that world-state was great!" I can't imagine that Nature used something "mathematically coherent", but I can imagine it used something flagrantly incoherent but really dead simple to implement. Like, for instance, the amount of some chemical or another coming in from the body, to indicate satiety, or to relax after physical exertion, or to indicate orgasm, or something like that.
Hey, ya pays yer money and walk in the front door :-) AGI conferences run about $400 a ticket I think. Plus the airfare to Berlin (there's one happening in a couple of weeks, so get your skates on). Re the possibility that the human system does do reinforcement learning .... fact is, if one frames the meaning of RL in a sufficiently loose way, the human cogsys absolutely DOES do RL, no doubt about it. Just as you described above. But if you sit down and analyze what it means to make the claim that a system uses RL, it turns out that there is a world of difference between the two positions: and The difference is that the second case turns the descriptive mechanism into an explicit mechanism. It's like Ptolemy's Epicycle model of the solar system. Was Ptolemy's fancy little wheels-within-wheels model a good descriptive model of planetary motion? You bet ya! Would it have been appropriate to elevate that model and say that the planets actually DID move on top of some epicycle-like mechanism? Heck no! As a functional model it was garbage, and it held back a scientific understanding of what was really going on for over a thousand years. Same deal with RL. Our difficulty right now is that so many people slip back and forth between arguing for RL as a descriptive model (which is fine) and arguing for it as a functional model (which is disastrous, because that was tried in psychology for 30 years, and it never worked).
This is literally false. A model of a brain might, some functional copy of brain implemented on a different hardware platform possibly could. An actual human brain, I don't think so. This is also literally false. Consider a trivial loop for (i=0; i<100000; i++) { .. } Human brains can conceptualize it, but they do not run it
Theoretically a brain with some additional memory tools could run windows. In practice, sure an actual human brain would not be able to, obviously - boredom. I did not mean that every codepath is run - but that's never true anyway. And yes "all of the code" is far too strong - most of it is just loosely conceptually simulated by the brain alone, and then more direct sample paths are run with the help of a debugger.
Fermi estimate time! :-) Given an appropriately unrolled set of appropriate instructions, how long would it take for a human armed with nothing but paper and pencil to simulate a complete Windows (say, Windows 7) boot process?

If an AGI is based on a neural network, how can you tell from the logs whether or not the AI knows it's in a simulation?

Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization.

This suggests 15000 GPUs is equivalent in computing power to a human brain, since we have about 150 trillion synapses? Why did you suggest 1000 earlier? How much of a multiplier on top of that do you think we need for trial-and-error research and training, before we get the first AGI? 10x? 100x? (If it isn't clear, I mean that i... (read more)

ANN based AGI will not need to reproduce brain circuits exactly. There are general tradeoffs between serial depth and circuit size. The brain is much more latency/speed constrained so it uses larger, shallower circuits whereas we can leverage much higher clock speeds to favour deeper smaller circuits. You see the same tradeoffs in circuit design, and also in algorithms where parallel variants always use more ops than the minimal fully serial variant. Also, independent of those considerations, biological circuits and synapses are redundant, noisy, and low precision. If you look at raw circuit level ops/second, the brain's throughput is not that much. A deeper investigation of the actual theoretical minimum computation required to match the human brain would be a subject for a whole post (and one I may not want to write up just yet). With highly efficient future tech, I'd estimate that it would take far less than 10^15 32-bit ops/s (1000 gpus): probably around or less than 10^13 32 bit ops/s. So we may already be entering into a hardware overhang situation. One way to estimate that is to compare to the number of full train/test iterations required to reach high performance in particular important sub-problems such as vision. The current successful networks all descend from designs invented in the 80's or earlier. Most of the early iterations were on small networks, and I expect the same to continue to be true for whole AGI systems. Let's say there are around 100 researchers who worked full time on CNNs for 40 years straight (4000 researcher years), and each tested 10 designs per year - so 40,000 iterations to go from perceptrons to CNNs. A more accurate model should consider the distribution over design iterations times and model sizes. Major new risky techniques are usually tested first on small problems and models and then scaled up. So anyway, let's multiply by 20 roughly and say it takes a million AGI 'lifetimes' or full test iterations, where each lifetime i
2Wei Dai
Thanks for the explanations. Hmm, I was trying to figure out how much of a speed superintelligence the first AGI will likely be. In other words, how much computing power will a single lab have accumulated by the time we get AGI? As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence (which could speed up to 10000x on the same hardware through future software improvements, if I'm understanding you correctly). Also, question about GPU/AGI costs. Here you seem to be using $1000 per GPU-year, which equals $.11 per GPU-hour, but in that previous thread, you used $1 per GPU-hour. According to this discussion $.11 seems close to the actual cost. Assuming $.11 is correct, AGI would be economically competitive with (some types of) human labor today at 1000 GPUs = 1x human speed, but maybe there's not a huge economic incentive to race for it yet. (I mean, unless one predicts that GPU costs will keep falling in the future, and therefore wants to prepare for that.) Nvidia is claiming that its next generation of GPU is 10x better for deep learning. How much of that is hype?
My earlier statement about 10 million neurons / 10 billion synapses on a single GPU is something of a gross oversimplification. A more realistic model is this: B flops = M F * N Where B is a software sim efficiency parameter (currently ~ 1, and roughly doubling per year), M is the number of AI model instances, F is the frequency in hz, and N is the number of synapses. Today's CPU/GPU ANN solutions need to parallelize over a large number of AI instances to get full efficiency - due to memory and bandwidth issues - so B is ~1 only when M is ~100. Today on a current high end GPU with 1 trillion flops you can thus run 100 copies of a 1 billion synapse ANN at 10 hz (M = 100, F = 10, N = 1 billion), whereas a single copy on the GPU may run at only 50 hz ish (B ~0.05, 20x less efficient). Training is accelerated mainly by parallel speedup over instances rather than serial speedup of a single instance. So with 1000 GPUs and today's tech, in theory you could get 100 copies of a 1 trillion synapse ANN running at 10hz using model parallelism. 1 trillion synapses @ 10hz is borderline plausible, 10 trill @ 100 hz is probably more realistic and would entail 100,000 gpus. But this somewhat assumes near perfect parallel scaling. Communication/latency issues limit the maximize size of realistic models. 100,000 GPUs would be larger than the biggest supercomputers of today, and probably is far beyond the limits of practical linear scaling. So it's only 1000 2015 gpus = 1 brain in an amortized rough sense. In practice I expect there is a minimum amount of software & hardware speedup required first to make these very large ANNs realistic or feasible in the first place, because of weak scaling issues in supercomputers. But once you get over this minimum barrier, there is a pretty large room for sudden speedup. And finally - parallel model speedup seems to be almost as effective as serial speedup, and is more powerful than the equivalent parallel scaling in human organizations - be

To create a superhuman AI driver, you 'just' need to create a realistic VR driving sim and then train a ULM in that world (better training and the simple power of selective copying leads to superhuman driving capability).

So to create benevolent AGI, we should think about how to create virtual worlds with the right structure, how to educate minds in those worlds, and how to safely evaluate the results.

There is some interesting overlap between these ideas and Eric Drexler's recent proposal. (Previously discussed on LessWrong here)

Cool - hadn't read that yet. Separating learning capacity from domain knowledge is kind of automatic in a ULM approach. There is nothing inherently dangerous about the learning mechanisms itself - it's the knowledge that is potentially dangerous. I have butted heads with LW on that point for 4 to 5 years. The knowledge management idea is the essence of the VR sandbox approach, but I also imagine separating out value systems/priors to some degree for independent testing. Overall Drexler's proposal (from reading the abstract and skimming) seems to be very much in line with my views. Safety considerations would go into design at all levels, from designing the VR world itself to the brain architecture to the education/training programs. In regards to modularity: large ANN systems are already modular, brains are modular, and brain-style AGI approaches are modular. It's just sort of assumed. It's a new consideration for perhaps the formal/math/AIXI/MIRI cluster, but only because they haven't put as much thought into practical architecture.

New AI designs (world design + architectural priors + training/education system) should be tested first in the safest virtual worlds: which in simplification are simply low tech worlds without computer technology. Design combinations that work well in safe low-tech sandboxes are promoted to less safe high-tech VR worlds, and then finally the real world.

A key principle of a secure code sandbox is that the code you are testing should not be aware that it is in a sandbox.

So you're saying that I'm secretly an AI being trained to be friendly for a more advanced world? ;)

That's possible given the sim argument. The eastern idea of reincarnation and the western idea of afterlife map to two main possibilities: in the reincarnation model all that is transferred between worlds is the architectural seed or hyperparameters. In the afterlife model the creator has some additional moral obligation or desire to save and transfer whole minds out.

Why isn't this in Main??? I mean I can understand that it wasn't posted there but the upvotes say 'Main' in a quite clear language. There isn't even much that could be improved in the post so why not move it?

I actually forgot that was something I needed to do. Done now.

"In the modern era some deaf humans have apparently acquired the ability to perform echolocation (sonar), similar to cetaceans." Did you mean blind?

Excellent article, thanks!

But here is also rising a question about animal intelligence. My cat (unfortunately) is more like a set of programms from the point of view of its behaviour, but its brain diagram is the same as in any vertebral. So does it support modules hypothesis?

Thanks for the typo find - I read the whole thing several times and didn't notice. Presumably the low level recognition of the word was overrided by the high level prior without triggering any alarms. Cats have the same overall brain architecture as primates and humans, but smaller. Generally the payoff for learning increases with brain size and lifespan. The smaller the brain and the shorter the organism's lifespan, the more evolution relies on complex innate reflexes. One interesting (but cruel) experiment which illustrates this is decerebration. See this article from the Great Soviet Encyclopedia. Basically the larger the mammal's brain, the more they depend on learned functionality in the new brain (cortex + cerebellum) .

That's a lot to absorb, so I've skimmed it, so please forgive if responses to the following are already implicit in what you've said.

I thought the point of the modularity hypothesis is that the brain only approximates a universal learning machine and has to be gerrymandered and trained to do so?

If the brain were naturally a universal learner, then surely we wouldn't have to learn universal learning (e.g. we wouldn't have to learn to overcome cognitive biases, Bayesian reasoning wouldn't be a recent discovery, etc.)? The system seems too gappy and glitchy, too full of quick judgement and prejudice, to have been designed as a universal learner from the ground up.

If the brain were naturally a universal learner, then surely we wouldn't have to learn universal learning (e.g. we wouldn't have to learn to overcome cognitive biases, Bayesian reasoning wouldn't be a recent discovery, etc.)? The system seems too gappy and glitchy, too full of quick judgement and prejudice, to have been designed as a universal learner from the ground up.

You are conflating the ideas of universal learning and rational thinking. They are not the same thing.

I'm a strong believer in the idea that the human intelligence emerges from a strong general purpose reinforcement learning algorithm. If that's true, then it's very consistent with our problems of cognitive bias.

If the RL idea is correct, then thinking is best understood as as a learned behavior, just like what words we speak with our lips is a learned behavior, just as how we move our arms and legs are learned behaviors. Under the principle that we are are an RL learning machine, what we learn, is ANY behavior which helps us to maximize our reward signal.

We don't learn rational behavior, we learn whatever behavior the learning system rationally has computed is what is needed to produce the most rewards. And ... (read more)

We don't learn rational behavior, we learn whatever behavior the learning system rationally has computed is what is needed to produce the most rewards.

Yes, this. But it is so easy to make mistakes when interpreting this statement, that I feel it requires dozen warnings to prevent readers from oversimplifying it.

For example, the behavior we learn is the behavior that produced most rewards in the past, when we were trained. If the environment changes, what we do may no longer give rewards in the new environment. Until we learn what produces rewards in the new environment.

Unless we already had an experience with changing environment, in which case we might adapt much more quickly, because we already have meta-behavior for "changing the behavior to adapt to new environment".

Unless we already had an experience when the environment changed, we adapted our behavior, then the environment suddenly changed back, and we were horribly punished for the adapted behavior, in which case the learned meta-behavior would be "do not change your behavior to adapt to the new environment (because it will change back and you will be rewarded for persistence)".

It is these learned met... (read more)

Hmm, but isn't this conflating "learning" in the sense of "learning about the world/nature" with "learning" in the sense of "learning behaviours"? We know the brain can do the latter, it's whether it can do the former that we're interested in, surely? IOW, it looks like you're saying precisely that the brain is not a ULM (in the sense of a machine that learns about nature), it is rather a machine that approximates a ULM by cobbling together a bunch of evolved and learned behaviours. It's adept at learning (in the sense of learning reactive behaviours that satisfice conditions) but only proximally adept at learning about the world.
I'm not sure what you mean by gerrymandered. I summarized the modularity hypothesis in the beginning to differentiate it from the ULM hypothesis. There are a huge range of views in this space, so I reduced them to examplars of two important viewpoint clusters. The specific key difference is the extent to which complex mental algorithms are learned vs innate. You certainly don't need to learn how to overcome cognitive biases to learn (this should be obvious). Knowledge of the brain's limitations could be useful, but is probably more useful only in the context of having a high level understanding of how the brain works. In regards to bayesian reasoning, the brain has a huge number of parallel systems and computations going on at once, many of which are implementing efficient approximate bayesian inference. Verbal bayesian reasoning is just a subset of verbal mathematical reasoning - mapping sentences to equations, solving, and mapping back to sentences. It's a specific complex ability that uses a number of brain regions. It's something you need to learn for the same reasons you need to learn multiplication. The brain does tons of analog multiplications every second, but that doesn't mean you have an automatic innate ability to do verbal math - as you don't have an automatic innate ability to do much of anything. One of the main points I make in the article is that universal learning machines are a very general thing that - in simplest form - can be specified in a small number of bits, just like a turing machine. So it's a sort of obvious design that evolution would find.
What I meant is that you have sub-systems dedicated to (and originally evolved to perform) specific concrete tasks, and shifting coalitions of them (or rather shifting coalitions of their abstract core algorithms) are leveraged to work together to approximate a universal learning machine. IOW any given specific subsystem (e.g. "recognizing a red spot in a patch of green") has some abstract algorithm at its core which is then drawn upon at need by an organizing principle which utilizes it (plus other algorithms drawn from other task-specific brain gadgets) for more universal learning tasks. That was my sketchy understanding of how it works from evol psych and things like Dennett's books, Pinker, etc. Furthermore, I thought the rationale of this explanation was that it's hard to see how a universal learning machine can get off the ground evolutionarily (it's going to be energetically expensive, not fast enough, etc.) whereas task-specific gadgets are easier to evolve ("need to know" principle), and it's easier to later get an approximation of a universal machine off the ground on the back of shifting coalitions of them.
Ah ok your gerrymandering analogy now makes sense. I think that's a good summary of the evolved modularity hypothesis. It turns out that we can actually look into the brain and test that hypothesis. Those tests were done, and lo and behold, the brain doesn't work that way. The universal learning hypothesis emerged as the new theory to explain the new neuroscience data from the last decade or so. So basically this is what the article is all about. You said earlier you skimmed it, so perhaps I need a better abstract or summary at the top, as oge suggested. This is a pretty good sounding rationale. It's also probably wrong. It turns out a small ULM is relatively easy to specify, and also is completely compatible with innate task-specific gadgetry. In other words the universal learning machinery has very little drawbacks. All vertebrates have a similar core architecture based on the basal ganglia. In large brained mammals, the general purpose coprocessors (neocortex, cerebellum) are just expanded more than other structures. In particular it looks like the brainstem has a bunch of old innate circuitry that the cortex and BG learns how to control (the BG does not just control the cortex), but I didn't have time to get into the brainstem in the scope of this article.
Great stuff, thanks! I'll dig into the article more.

(there are a couple of circuit diagrams of the whole brain on the web, but this is the best.  From this site.)

could you update the 404 image, please? (link to the site still works for now, just the image is gone)

Other evidence I would add to the theory of the brain being a ULM is the existence of the g-factor, and the fact the general factor is one that explains the most variation in these cognitive tests. In addition - if you model human cognitive abilities as universal and specific, evolutionarily speaking it would make sense for the universal aspect to be under stronger selection than the specific domain. One exception to this could be language learning, which is important just for the sake of being able to communicate.

Interesting article. Minor note on clarity: You might want to clarify the acronym "EMH" where it appears, since it so often here stands for "efficient market hypothesis".

I find images such as the one above extremely disconcerting for some reason. They cause me about a 7/10 level of discomfort, verging on moderate pain. It also sticks in my head for a dozen or so minutes after viewing. I'd strongly prefer to never see one ever again, please.

I don't know if this is something unique to my brain, or if this is a step towards a real life BLIT, but wow. Awful to experience, I have extra empathy for epileptic people now.

Me too (though to a much lesser extent for this particular image; a little more for certain other such images). Reading Scott Alexander say that such images look a lot like LSD hallucinations made me change my mind about whether I'll ever want to try LSD.
I'm curious: Imagine that you haven't seen this article yet, and that you are now going to read this article for the first time. It contains a message "trigger warning: XYZ" at the top (for some value of XYZ). Which value of XYZ would give you the best idea of what kind of images the article contains, so you could have made an informed decision not to look at them? (I imagine something like "weird disturbing pictures that seem like hallucinations". But would you have predicted your reaction from reading such text? Would it actually have stopped you from looking?)
It wouldn't have stopped me. But now that I'm acquainted with images such as this, if someone put "Trigger warning: Shoggoths" or something similar on future posts, then I would take heed of such warnings.
Update: I have been tentatively identified as having a very early form of cancer. They dug a tumorous lymph node out of my neck two days ago, although more is still left inside. More tests will happen next Monday. I don't truly think, with my rational mind, that there is a connection between the cancer and this image. However, I am emotionally disturbed. To me, the above image seems essentially like a visual depiction of cancer. When I look at the image, my throat seizes up, and my brain flinches in pain. The rearmost parts of my brain are paranoid, and demand that I mention this diagnosis just in case this image truly behaves like a BLIT for some people. Again, I don't actually believe in this idea. But if anyone else gets cancer sometime soon, please let us know. Because I'm disturbed and disgusted, I feel violated.

Hi jacob_cannell, this article looks really interesting but it is a LOT to consume at once. Could you please put a summary at the top with the main points so that it makes the post easier to navigate?

Hey oge - thanks for the feedback. I tried to summarize the article in the intro, but maybe that didnt work. Do you think an a short abstract at the top would help? Or perhaps an outline?
An abstract as the very first thing would help. An outline would be better. Here are the paragraphs that I thought were the main point of the article (please correct this if I'm wrong): "These two conceptions of the brain - the universal learning machine hypothesis and the evolved modularity hypothesis - lead to very different predictions for the likely route to AGI, the expected differences between AGI and humans, and thus any consequent safety issues and strategies." and "Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization. Furthermore, Moore's Law for GPUs still has some steam left, and software advances are currently improving simulation performance at a faster rate than hardware. These trends implies that Anthropomorphic/Neuromorphic AGI could be surprisingly close, and may appear suddenly. What kind of leverage can we exert on a short timescale?"
Done - I added the abstract as first thing under the header image, followed by an outline.

Typeo just above "Basal Ganglia" section.

For example infants are born with a simple versions of a fear response, with is later refined through reinforcement learning.

"with is later" should be "which is later"

This was a great post, thanks!

One thing I'm curious about is how the ULH explains to the fact that human thought seems to be divided into System 1/System 2 - is this solely a matter of education history?

At first the ULH seemed to predict too much plasticity relative to observation, but on reflection I think it might predict the right amount. To square ULH with human universals, we have to hypothesize that the general structure and the conditions of human life robustly result in convergence to certain attractors. But the big advantage of this hypothesis is that it neatly explains why certain mental comlexes like farmer morality sometimes seems to have innate support while also being sometimes unlearnable and possibly not existing before agriculture.