
This article presents an emerging architectural hypothesis of the brain as a biological implementation of a Universal Learning Machine.  I present a rough but complete architectural view of how the brain works under the universal learning hypothesis.  I also contrast this new viewpoint - which comes from computational neuroscience and machine learning - with the older evolved modularity hypothesis popular in evolutionary psychology and the heuristics and biases literature.  These two conceptions of the brain lead to very different predictions for the likely route to AGI, the value of neuroscience, the expected differences between AGI and humans, and thus any consequent safety issues and dependent strategies.

(The image above is from a recent mysterious post to r/machinelearning, probably from a Google project that generates art based on a visualization tool used to inspect the patterns learned by convolutional neural networks.  I am especially fond of the weird figures riding the cart in the lower left.)

1. Intro: Two viewpoints on the Mind
2. Universal Learning Machines
3. Historical Interlude
4. Dynamic Rewiring
5. Brain Architecture (the whole brain in one picture and a few pages of text)
6. The Basal Ganglia
7. Implications for AGI
8. Conclusion

#### Intro: Two Viewpoints on the Mind

Few discoveries are more irritating than those that expose the pedigree of ideas.

-- Lord Acton (probably)

Less Wrong is a site devoted to refining the art of human rationality, where rationality is based on an idealized conceptualization of how minds should or could work.  Less Wrong and its founding sequences draw heavily on the heuristics and biases literature in cognitive psychology and related work in evolutionary psychology.  More specifically the sequences build upon a specific cluster in the space of cognitive theories, which can be identified in particular with the highly influential "evolved modularity" perspective of Cosmides and Tooby.

From Wikipedia:

Evolutionary psychologists propose that the mind is made up of genetically influenced and domain-specific[3] mental algorithms or computational modules, designed to solve specific evolutionary problems of the past.[4]

From "Evolutionary Psychology and the Emotions":[5]

An evolutionary perspective leads one to view the mind as a crowded zoo of evolved, domain-specific programs.  Each is functionally specialized for solving a different adaptive problem that arose during hominid evolutionary history, such as face recognition, foraging, mate choice, heart rate regulation, sleep management, or predator vigilance, and each is activated by a different set of cues from the environment.

If you imagine these general theories or perspectives on the brain/mind as points in theory space, the evolved modularity cluster posits that much of the machinery of human mental algorithms is largely innate.  General learning - if it exists at all - exists only in specific modules; in most modules learning is relegated to the role of adapting existing algorithms and acquiring data; the impact of the information environment is de-emphasized.  In this view the brain is a complex messy kludge of evolved mechanisms.

There is another viewpoint cluster, more popular in computational neuroscience (especially today), that is almost the exact opposite of the evolved modularity hypothesis.  I will rebrand this viewpoint the "universal learner" hypothesis, aka the "one learning algorithm" hypothesis (the rebranding is justified mainly by the inclusion of some newer theories and evidence for the basal ganglia as a 'CPU' which learns to control the cortex).  The roots of the universal learning hypothesis can be traced back to Mountcastle's discovery of the simple uniform architecture of the cortex.[6]

The universal learning hypothesis proposes that all significant mental algorithms are learned; nothing is innate except for the learning and reward machinery itself (which is somewhat complicated, involving a number of systems and mechanisms), the initial rough architecture (equivalent to a prior over mindspace), and a small library of simple innate circuits (analogous to the operating system layer in a computer).  In this view the mind (software) is distinct from the brain (hardware).  The mind is a complex software system built out of a general learning mechanism.

In simplification, the main difference between these viewpoints is the relative quantity of domain specific mental algorithmic information specified in the genome vs that acquired through general purpose learning during the organism's lifetime.  Evolved modules vs learned modules.

When you have two hypotheses or viewpoints that are almost complete opposites this is generally a sign that the field is in an early state of knowledge; further experiments typically are required to resolve the conflict.

It has been about 25 years since Cosmides and Tooby began to popularize the evolved modularity hypothesis.  A number of key neuroscience experiments have been performed since then which support the universal learning hypothesis (reviewed later in this article).

Additional indirect support comes from the rapid unexpected success of Deep Learning[7], which is entirely based on building AI systems using simple universal learning algorithms (such as Stochastic Gradient Descent or other various approximate Bayesian methods[8][9][10][11]) scaled up on fast parallel hardware (GPUs).  Deep Learning techniques have quickly come to dominate most of the key AI benchmarks including vision[12], speech recognition[13][14], various natural language tasks, and now even ATARI [15] - proving that simple architectures (priors) combined with universal learning is a path (and perhaps the only viable path) to AGI. Moreover, the internal representations that develop in some deep learning systems are structurally and functionally similar to representations in analogous regions of biological cortex[16].

To paraphrase Feynman: to truly understand something you must build it.

In this article I am going to quickly introduce the abstract concept of a universal learning machine, present an overview of the brain's architecture as a specific type of universal learning machine, and finally I will conclude with some speculations on the implications for the race to AGI and AI safety issues in particular.

#### Universal Learning Machines

A universal learning machine is a simple and yet very powerful and general model for intelligent agents.  It is an extension of a general computer - such as a Turing Machine - amplified with a universal learning algorithm.  Do not view this as my 'big new theory' - it is simply an amalgamation of a set of related proposals by various researchers.

An initial untrained seed ULM can be defined by 1.) a prior over the space of models (or equivalently, programs), 2.) an initial utility function, and 3.) the universal learning machinery/algorithm.  The machine is a real-time system that processes an input sensory/observation stream and produces an output motor/action stream to control the external world using a learned internal program that is the result of continuous self-optimization.

There is of course always room to smuggle in arbitrary innate functionality via the prior, but in general the prior is expected to be extremely small in bits in comparison to the learned model.

The key defining characteristic of a ULM is that it uses its universal learning algorithm for continuous recursive self-improvement with regards to the utility function (reward system).  We can view this as second (and higher) order optimization: the ULM optimizes the external world (first order), and also optimizes its own internal optimization process (second order), and so on.  Without loss of generality, any system capable of computing a large number of decision variables can also compute internal self-modification decisions.

Conceptually the learning machinery computes a probability distribution over program-space that is proportional to the expected utility distribution.  At each timestep it receives a new sensory observation and expends some amount of computational energy to infer an updated (approximate) posterior distribution over its internal program-space: an approximate 'Bayesian' self-improvement.
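As a purely illustrative toy (not a formalization of a ULM, and far simpler than anything practical), the update described above can be sketched as a weight distribution over a handful of candidate programs that is reweighted after each observation.  The three candidate programs, the `utility` function, and the exponential (softmax-style) weighting with temperature `beta` are all assumptions introduced for this sketch.

```python
import math

# Hypothetical, tiny "program space": each program maps an observation to an action.
PROGRAMS = {
    "always_zero": lambda obs: 0,
    "copy_input": lambda obs: obs,
    "negate_input": lambda obs: -obs,
}


def utility(obs, action):
    """Assumed reward signal: actions that match the observation score highest."""
    return -abs(obs - action)


def update_posterior(weights, obs, beta=1.0):
    """One approximate 'Bayesian' step: reweight each program by exp(beta * utility)."""
    scored = {
        name: w * math.exp(beta * utility(obs, PROGRAMS[name](obs)))
        for name, w in weights.items()
    }
    total = sum(scored.values())
    return {name: s / total for name, s in scored.items()}


def run(observations, beta=1.0):
    """Start from a uniform prior and fold in one observation per timestep."""
    weights = {name: 1.0 / len(PROGRAMS) for name in PROGRAMS}
    for obs in observations:
        weights = update_posterior(weights, obs, beta)
    return weights


posterior = run([1, 2, 3, 2, 1])
best = max(posterior, key=posterior.get)
print(best, posterior)
```

With this reward signal the probability mass concentrates on `copy_input`.  A real system would face an astronomically larger program space and could only approximate this posterior, which is where the computational cost mentioned above comes in.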

The above description is intentionally vague in the right ways to cover the wide space of possible practical implementations and current uncertainty.  You could view AIXI as a particular formalization of the above general principles, although it is also as dumb as a rock in any practical sense and has other potential theoretical problems.  Although the general idea is simple enough to convey in the abstract, one should beware of concise formal descriptions: practical ULMs are too complex to reduce to a few lines of math.

A ULM inherits the general property of a Turing Machine that it can compute anything that is computable, given appropriate resources.  However a ULM is also more powerful than a TM.  A Turing Machine can only do what it is programmed to do.  A ULM automatically programs itself.

If you were to open up an infant ULM - a machine with zero experience - you would mainly just see the small initial code for the learning machinery.  The vast majority of the codestore starts out empty - initialized to noise.  (In the brain the learning machinery is built in at the hardware level for maximal efficiency).

Theoretical Turing machines are all qualitatively alike, and are all qualitatively distinct from any non-universal machine.  Likewise for ULMs.  Theoretically a small ULM is just as general/expressive as a planet-sized ULM.  In practice quantitative distinctions do matter, and can become effectively qualitative.

Just as the simplest possible Turing Machine is in fact quite simple, the simplest possible Universal Learning Machine is also probably quite simple.  A couple of recent proposals for simple universal learning machines include the Neural Turing Machine[16] (from Google DeepMind), and Memory Networks[17].  The core of both approaches involves training an RNN to learn how to control a memory store through gating operations.

#### Historical Interlude

At this point you may be skeptical: how could the brain be anything like a universal learner?  What about all of the known innate biases/errors in human cognition?  I'll get to that soon, but let's start by thinking of a couple of general experiments to test the universal learning hypothesis vs the evolved modularity hypothesis.

In a world where the ULH is mostly correct, what do we expect to be different than in worlds where the EMH is mostly correct?

One type of evidence that would support the ULH is the demonstration of key structures in the brain along with associated wiring such that the brain can be shown to directly implement some version of a ULM architecture.

Another type of indirect evidence that would help discriminate the two theories would be evidence that the brain is capable of general global optimization, and that complex domain specific algorithms/circuits mostly result from this process.  If on the other hand the brain is only capable of constrained/local optimization, then most of the complexity must instead be innate - the result of global optimization in evolutionary deeptime.  So in essence it boils down to the optimization capability of biological learning vs biological evolution.

From the perspective of the EMH, it is not sufficient to demonstrate that there are things that brains cannot learn in practice - because those simply could be quantitative limitations.  Demonstrating that an Intel 486 can't compute some known computable function in our lifetimes is not proof that the 486 is not a Turing Machine.

Nor is it sufficient to demonstrate that biases exist: a ULM is only 'rational' to the extent that its observational experience and learning machinery allows (and to the extent one has the correct theory of rationality).  In fact, the existence of many (most?) biases intrinsically depends on the EMH - based on the implicit assumption that some cognitive algorithms are innate.  If brains are mostly ULMs then most cognitive biases dissolve, or become learning biases - for if all cognitive algorithms are learned, then evidence for biases is evidence for cognitive algorithms that people haven't had sufficient time/energy/motivation to learn.  (This does not imply that intrinsic limitations/biases do not exist or that the study of cognitive biases is a waste of time; rather the ULH implies that educational history is what matters most.)

The genome can only specify a limited amount of information.  The question is then how much of our advanced cognitive machinery for things like facial recognition, motor planning, language, logic, planning, etc. is innate vs learned.  From evolution's perspective there is a huge advantage to preloading the brain with innate algorithms so long as said algorithms have high expected utility across the expected domain landscape.

On the other hand, evolution is also highly constrained in a bit coding sense: every extra bit of code costs additional energy for the vast number of cellular replication events across the lifetime of the organism.  Low code complexity solutions also happen to be exponentially easier to find.  These considerations seem to strongly favor the ULH but they are difficult to quantify.

Neuroscientists have long known that the brain is divided into physical and functional modules.  These modular subdivisions were discovered a century ago by Brodmann.  Every time neuroscientists opened up a new brain, they saw the same old cortical modules in the same old places doing the same old things.  The specific layout of course varied from species to species, but the variations between individuals are minuscule. This evidence seems to strongly favor the EMH.

Throughout most of the 90's up into the 2000's, evidence from computational neuroscience models and AI was heavily influenced by - and, unsurprisingly, largely supported - the EMH.  Neural nets and backprop were known of course since the 1980's and worked on small problems[18], but at the time they didn't scale well - and there was no theory to suggest they ever would.

Theory of the time also suggested local minima would always be a problem (now we understand that local minima are not really the main problem[19], and modern stochastic gradient descent methods combined with highly overcomplete models and stochastic regularization[20] are effectively global optimizers that can often handle obstacles such as local minima and saddle points[21]).

The other related historical criticism rests on the lack of biological plausibility for backprop style gradient descent.  (There is as of yet little consensus on how the brain implements the equivalent machinery, but target propagation is one of the more promising recent proposals[22][23].)

Many AI researchers are naturally interested in the brain, and we can see the influence of the EMH in much of the work before the deep learning era.  HMAX is a hierarchical vision system developed in the late 90's by Poggio et al as a working model of biological vision[24].  It is based on a preconfigured hierarchy of modules, each of which has its own mix of innate features such as Gabor edge detectors along with a little bit of local learning.  It implements the general idea that complex algorithms/features are innate - the result of evolutionary global optimization - while neural networks (incapable of global optimization) use Hebbian local learning to fill in details of the design.

#### Dynamic Rewiring

In a groundbreaking study from 2000 published in Nature, Sharma et al successfully rewired ferret retinal pathways to project into the auditory cortex instead of the visual cortex.[25]  The result: auditory cortex can become visual cortex, just by receiving visual data!  Not only does the rewired auditory cortex develop the specific Gabor features characteristic of visual cortex; the rewired cortex also becomes functionally visual.[26]  True, it isn't quite as effective as normal visual cortex, but that could also possibly be an artifact of crude and invasive brain rewiring surgery.

The ferret study was popularized by the book On Intelligence by Hawkins in 2004 as evidence for a single cortical learning algorithm.  This helped percolate the evidence into the wider AI community, and thus probably helped in setting up the stage for the deep learning movement of today.  The modern view of the cortex is that of a mostly uniform set of general purpose modules which slowly become recruited for specific tasks and filled with domain specific 'code' as a result of the learning (self optimization) process.

The next key set of evidence comes from studies of atypical human brains with novel extrasensory powers.  In 2009 Vuillerme et al showed that the brain could automatically learn to process sensory feedback rendered onto the tongue[27].  This research was developed into a complete device that allows blind people to develop primitive tongue based vision.

In the modern era some blind humans have apparently acquired the ability to perform echolocation (sonar), similar to cetaceans.  In 2011 Thaler et al used MRI and PET scans to show that human echolocators use diverse non-auditory brain regions to process echo clicks, predominantly relying on re-purposed 'visual' cortex.[27]

The echolocation study in particular helps establish the case that the brain is actually doing global, highly nonlocal optimization - far beyond simple Hebbian dynamics.  Echolocation is an active sensing strategy that requires very low latency processing, involving complex timed coordination between a number of motor and sensory circuits - all of which must be learned.

Somehow the brain is dynamically learning how to use and assemble cortical modules to implement mental algorithms: everyday tasks such as visual counting, comparisons of images or sounds, reading, etc - all are tasks which require simple mental programs that can shuffle processed data between modules (some or any of which can also function as short term memory buffers).

To explain this data, we should be on the lookout for a system in the brain that can learn to control the cortex - a general system that dynamically routes data between different brain modules to solve domain specific tasks.

But first let's take a step back and start with a high level architectural view of the entire brain to put everything in perspective.

#### Brain Architecture

Below is a circuit diagram for the whole brain.  The main subsystems work together and are best understood together.  You can probably get a good, extremely coarse, high level understanding of the entire brain in less than one hour.

(there are a couple of circuit diagrams of the whole brain on the web, but this is the best.  From this site.)

The human brain has ~100 billion neurons and ~100 trillion synapses, but ultimately it evolved from the bottom up - from organisms with just hundreds of neurons, like the tiny brain of C. elegans.

We know that evolution is code complexity constrained: much of the genome codes for cellular metabolism, all the other organs, and so on.  For the brain, most of its bit budget needs to be spent on all the complex neuron, synapse, and even neurotransmitter level machinery - the low level hardware foundation.

For a tiny brain with 1000 neurons or less, the genome can directly specify each connection.  As you scale up to larger brains, evolution needs to create vastly more circuitry while still using only about the same amount of code/bits.  So instead of specifying connectivity at the neuron layer, the genome codes connectivity at the module layer.  Each module can be built from simple procedural/fractal expansion of progenitor cells.

So the size of a module has little to nothing to do with its innate complexity.  The cortical modules are huge - V1 alone contains 200 million neurons in a human - but there is no reason to suspect that V1 has greater initial code complexity than any other brain module.  Big modules are built out of simple procedural tiling patterns.
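A back-of-envelope calculation makes the bit-budget argument concrete.  The neuron and synapse counts are the figures quoted earlier in this section; the genome size (roughly 3.2 billion base pairs at 2 bits each) is an approximate figure assumed here rather than taken from this article, and the cost of naming a synapse's target is a crude lower bound.

```python
import math

NEURONS = 100e9            # ~100 billion neurons (figure from the text)
SYNAPSES = 100e12          # ~100 trillion synapses (figure from the text)
GENOME_BASE_PAIRS = 3.2e9  # assumption: approximate size of the human genome
BITS_PER_BASE = 2          # four possible bases -> 2 bits each

# Upper bound on the information the whole genome could carry.
genome_bits = GENOME_BASE_PAIRS * BITS_PER_BASE

# Naming one target neuron out of NEURONS takes about log2(NEURONS) bits,
# so explicit wiring needs at least that many bits per synapse.
bits_per_synapse = math.log2(NEURONS)
wiring_bits = SYNAPSES * bits_per_synapse

shortfall = wiring_bits / genome_bits
print(f"genome capacity : {genome_bits:.2e} bits")
print(f"explicit wiring : {wiring_bits:.2e} bits")
print(f"shortfall factor: {shortfall:.0f}x")
```

Even if the entire genome were devoted to brain wiring, an explicit connection list would overshoot it by several orders of magnitude under these assumptions, which is why module-level rules plus procedural expansion are the only option that fits.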

Very roughly the brain's main modules can be divided into six subsystems (there are numerous smaller subsystems):

• The neocortex: the brain's primary computational workhorse (blue/purple modules at the top of the diagram).  Kind of like a bunch of general purpose FPGA coprocessors.
• The cerebellum: another set of coprocessors with a simpler feedforward architecture.  Specializes more in motor functionality.
• The thalamus: the orangish modules below the cortex.  Kind of like a relay/routing bus.
• The hippocampal complex: the apex of the cortex, and something like the brain's database.
• The amygdala and limbic reward system: these modules specialize in something like the value function.
• The Basal Ganglia (green modules): the central control system, similar to a CPU.

In the interest of space/time I will focus primarily on the Basal Ganglia and will just touch on the other subsystems very briefly and provide some links to further reading.

The neocortex has been studied extensively and is the main focus of several popular books on the brain.  Each neocortical module is a 2D array of neurons (technically 2.5D with a depth of about a few dozen neurons arranged in about 5 to 6 layers).

Each cortical module is something like a general purpose RNN (recurrent neural network) with 2D local connectivity.  Each neuron connects to its neighbors in the 2D array.  Each module also has nonlocal connections to other brain subsystems and these connections follow the same local 2D connectivity pattern, in some cases with some simple affine transformations.  Convolutional neural networks use the same general architecture (but they are typically not recurrent.)
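The idea of a recurrent sheet with local 2D connectivity can be sketched in a few lines.  This is a minimal toy, not a model of real cortex: the grid size, the neighbourhood `radius`, the random weight range, and the `tanh` nonlinearity are all assumptions chosen for illustration.  Note that one small rule is tiled over the whole sheet, in the spirit of the procedural expansion described earlier.

```python
import math
import random


def make_local_weights(rows, cols, radius=1, seed=0):
    """Tile one local rule over the sheet: each unit gets random
    weights only from units within `radius` steps of it."""
    rng = random.Random(seed)
    weights = {}
    for r in range(rows):
        for c in range(cols):
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if (dr, dc) != (0, 0) and inside:
                        weights[(r, c), (nr, nc)] = rng.uniform(-0.5, 0.5)
    return weights


def step(state, inputs, weights):
    """One recurrent update: tanh(external input + weighted neighbour activity)."""
    rows, cols = len(state), len(state[0])
    total = [[inputs[r][c] for c in range(cols)] for r in range(rows)]
    for ((r, c), (nr, nc)), w in weights.items():
        total[r][c] += w * state[nr][nc]
    return [[math.tanh(v) for v in row] for row in total]


rows, cols = 4, 4
weights = make_local_weights(rows, cols)
state = [[0.0] * cols for _ in range(rows)]
# Drive only the top-left unit and let activity spread through the sheet.
inputs = [[1.0 if (r, c) == (0, 0) else 0.0 for c in range(cols)]
          for r in range(rows)]
for _ in range(3):
    state = step(state, inputs, weights)
```

Because connectivity is purely local, activity spreads by at most one neighbourhood per timestep: after three steps the driven corner is active while the opposite corner is still silent.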

Cortical modules - like artificial RNNs - are general purpose and can be trained to perform various tasks.  There are a huge number of models of the cortex, varying across the tradeoff between biological realism and practical functionality.

Perhaps surprisingly, any of a wide variety of learning algorithms can reproduce cortical connectivity and features when trained on appropriate sensory data[27].  This is a computational proof of the one-learning-algorithm hypothesis; furthermore it illustrates the general idea that data determines functional structure in any general learning system.

There is evidence that cortical modules learn automatically (unsupervised) to some degree, and there is also some evidence that cortical modules can be trained to relearn data from other brain subsystems - namely the hippocampal complex.  The dark knowledge distillation technique in ANNs[28][29] is a potential natural analog/model of hippocampus -> cortex knowledge transfer.
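For readers unfamiliar with distillation, the core of the ANN technique is to train a 'student' network on the temperature-softened output distribution of a 'teacher' network.  The sketch below shows only that soft-target computation and the corresponding loss; the logits and the temperature value are made up for illustration, and nothing here is a claim about how the hippocampus or cortex actually implements the transfer.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between the teacher's soft targets and the student's soft outputs."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))


teacher = [6.0, 2.0, -1.0]  # hypothetical teacher logits for three classes
hard = softmax(teacher, temperature=1.0)
soft = softmax(teacher, temperature=4.0)

loss_matched = distillation_loss(teacher, teacher)
loss_mismatched = distillation_loss(teacher, [-1.0, 2.0, 6.0])
```

The softened distribution keeps information about which wrong answers the teacher considers nearly right, and the loss is lowest when the student reproduces the teacher's distribution.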

Module connections are bidirectional, and feedback connections (from high level modules to low level) outnumber forward connections.  We can speculate that something like target propagation can also be used to guide or constrain the development of cortical maps (speculation).

The hippocampal complex is the root or top level of the sensory/motor hierarchy.  This short youtube video  gives a good seven minute overview of the HC.  It is like a spatiotemporal database.  It receives compressed scene descriptor streams from the sensory cortices, it stores this information in medium-term memory, and it supports later auto-associative recall of these memories.  Imagination and memory recall seem to be basically the same.

The 'scene descriptors' take the sensible form of things like 3D position and camera orientation, as encoded in place, grid, and head direction cells.  This is basically the logical result of compressing the sensory stream, comparable to the networking data stream in a multiplayer video game.

Imagination/recall is basically just the reverse of the forward sensory coding path - in reverse mode a compact scene descriptor is expanded into a full imagined scene.  Imagined/remembered scenes activate the same cortical subnetworks that originally formed the memory (or would have if the memory was real, in the case of imagined recall).

The amygdala and associated limbic reward modules are rather complex, but look something like the brain's version of the value function for reinforcement learning.  These modules are interesting because they clearly rely on learning, but clearly the brain must specify an initial version of the value/utility function that has some minimal complexity.

As an example, consider taste.  Infants are born with basic taste detectors and a very simple initial value function for taste.  Over time the brain receives feedback from digestion and various estimators of general mood/health, and it uses this to refine the initial taste value function.  Eventually the adult sense of taste becomes considerably more complex.  Acquired tastes for bitter substances - such as coffee and beer - are good examples.
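The refinement of an innate value function by later feedback can be illustrated with a simple delta-rule update.  This is a hedged sketch only: the initial values, the feedback signal, and the learning rate are invented numbers, and the brain's actual update rule is not specified in this article.

```python
def refine_value(value, feedback, learning_rate=0.1):
    """Delta-rule update: move the stored value a small step toward the feedback."""
    return value + learning_rate * (feedback - value)


# Assumed innate prior: sweet starts out appealing, bitter starts out aversive.
taste_value = {"sweet": 1.0, "bitter": -1.0}

# Hypothetical experience: repeated coffee drinking followed by positive
# post-ingestion feedback (for example improved alertness or mood).
for _ in range(30):
    taste_value["bitter"] = refine_value(taste_value["bitter"], feedback=0.8)

print(taste_value)
```

After enough positive feedback the learned value for bitter tastes changes sign, while the untouched innate value for sweet stays where it started.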

The amygdala appears to do something similar for emotional learning.  For example infants are born with a simple version of a fear response, which is later refined through reinforcement learning.  The amygdala sits on the end of the hippocampus, and it is also involved heavily in memory processing.

See also these two videos from khanacademy: one on the limbic system and amygdala (10 mins), and another on the midbrain reward system (8 mins)

#### The Basal Ganglia

The Basal Ganglia is a weird-looking complex of structures located in the center of the brain.  It is a conserved structure found in all vertebrates, which suggests a core functionality.  The BG is proximal to and connects heavily with the midbrain reward/limbic systems.  It also connects to the brain's various modules in the cortex/hippocampus, thalamus and the cerebellum . . . basically everything.

All of these connections form recurrent loops between associated compartmental modules in each structure: thalamocortical/hippocampal-cerebellar-basal_ganglial loops.

Just as the cortex and hippocampus are subdivided into modules, there are corresponding modular compartments in the thalamus, basal ganglia, and the cerebellum.  The set of modules/compartments in each main structure are all highly interconnected with their correspondents across structures, leading to the concept of distributed processing modules.

Each DPM forms a recurrent loop across brain structures (the local networks in the cortex, BG, and thalamus are also locally recurrent, whereas those in the cerebellum are not).  These recurrent loops are mostly separate, but each sub-structure also provides different opportunities for inter-loop connections.

The BG appears to be involved in essentially all higher cognitive functions.  Its core functionality is action selection via subnetwork switching.  In essence action selection is the core problem of intelligence, and it is also general enough to function as the building block of all higher functionality.  A system that can select between motor actions can also select between tasks or subgoals.  More generally, low level action selection can easily form the basis of a Turing Machine via selective routing: deciding where to route the output of thalamocortical-cerebellar modules (some of which may specialize in short term memory as in the prefrontal cortex, although all cortical modules have some short term memory capability).

There are now a number of computational models for the Basal Ganglia-Cortical system that demonstrate possible biologically plausible implementations of the general theory[28][29]; integration with the hippocampal complex leads to larger-scale systems which aim to model/explain most of higher cognition in terms of sequential mental programs[30] (of course fully testing any such models awaits sufficient computational power to run very large-scale neural nets).

For an extremely oversimplified model of the BG as a dynamic router, consider an array of N distributed modules controlled by the BG system.  The BG control network expands these N inputs into an NxN matrix.  There are N^2 potential intermodular connections, each of which can be individually controlled.  The control layer reads a compressed, downsampled version of the module's hidden units as its main input, and is also recurrent.  Each output node in the BG has a multiplicative gating effect which selectively enables/disables an individual intermodular connection.  If the control layer is naively fully connected, this would require (N^2)^2 connections, which is only feasible for N ~ 100 modules, but sparse connectivity can substantially reduce those numbers.
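The oversimplified router above can be written down directly.  In this sketch the gate matrix is hand-written rather than produced by a learned, recurrent control layer, and the module outputs are arbitrary small vectors; the function names `route` and `control_connections` are hypothetical and exist only for this illustration.

```python
def route(outputs, gates):
    """Gated routing: module j receives the sum of outputs[i] scaled by gates[i][j]."""
    n = len(outputs)
    dim = len(outputs[0])
    routed = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            g = gates[i][j]
            if g:  # a closed gate (0.0) contributes nothing
                for k in range(dim):
                    routed[j][k] += g * outputs[i][k]
    return routed


def control_connections(n):
    """Connections in a naively fully connected control layer over n*n gates."""
    return (n * n) ** 2


# Three hypothetical modules, each emitting a 2-element output vector.
outputs = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]

# Gate matrix: open the 0 -> 2 and 1 -> 2 connections, close everything else.
gates = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]

routed = route(outputs, gates)
print(routed)                    # module 2 receives the sum of modules 0 and 1
print(control_connections(100))  # 100000000 connections for N = 100
```

For N = 100 the naive control layer already needs 10^8 connections, which shows why the scaling quickly becomes impractical without sparsity.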

It is unclear (to me), whether the BG actually implements NxN style routing as described above, or something more like 1xN or Nx1 routing, but there is general agreement that it implements cortical routing.

Of course in actuality the BG architecture is considerably more complex, as it also must implement reinforcement learning, and the intermodular connectivity map itself is also probably quite sparse/compressed (the BG may not control all of cortex, certainly not at a uniform resolution, and many controlled modules may have a very limited number of allowed routing decisions).  Nonetheless, the simple multiplicative gating model illustrates the core idea.

This same multiplicative gating mechanism is the core principle behind the highly successful LSTM (Long Short-Term Memory)[30] units that are used in various deep learning systems.  The simple version of the BG's gating mechanism can be considered a wider parallel and hierarchical extension of the basic LSTM architecture, where you have a parallel array of N memory cells instead of 1, and each memory cell is a large vector instead of a single scalar value.
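For comparison, here is a minimal scalar LSTM step in the now-standard formulation with a forget gate.  It is a sketch for intuition only: the parameter layout (a dict of `(w_x, w_h, bias)` triples) and the example weights are assumptions made for this illustration, and practical implementations operate on vectors and learn these weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new content to write
    f = sigmoid(pre("forget"))       # how much old memory to keep
    o = sigmoid(pre("output"))       # how much memory to expose
    g = math.tanh(pre("candidate"))  # proposed new content

    c = f * c_prev + i * g           # multiplicative gating of the memory cell
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights: input gate shut, forget and output gates wide open,
# so the cell should hold its memory regardless of the input.
params = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 20.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=3.0, h_prev=0.0, c_prev=0.5, params=params)
```

With the input gate closed and the forget gate open, the stored value passes through essentially unchanged; opening or closing these gates is the same kind of multiplicative routing decision described for the BG.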

The main advantage of the BG architecture is parallel hierarchical approximate control: it allows a large number of hierarchical control loops to update and influence each other in parallel.  It also reduces the huge complexity of general routing across the full cortex down into a much smaller-scale, more manageable routing challenge.

#### Implications for AGI

These two conceptions of the brain - the universal learning machine hypothesis and the evolved modularity hypothesis - lead to very different predictions for the likely route to AGI, the expected differences between AGI and humans, and thus any consequent safety issues and strategies.

In the extreme case imagine that the brain is a pure ULM, such that the genetic prior information is close to zero or is simply unimportant.  In this case it is vastly more likely that successful AGI will be built around designs very similar to the brain, as the ULM architecture in general is the natural ideal, vs the alternative of having to hand engineer all of the AI's various cognitive mechanisms.

In reality learning is computationally hard, and any practical general learning system depends on good priors to constrain the learning process (essentially taking advantage of previous knowledge/learning).  The recent and rapid success of deep learning is strong evidence for how much prior information is ideal: just a little.  The prior in deep learning systems takes the form of a compact, small set of hyperparameters that control the learning process and specify the overall network architecture (an extremely compressed prior over the network topology and thus the program space).

The ULH suggests that most everything that defines the human mind is cognitive software rather than hardware: the adult mind (in terms of algorithmic information) is 99.999% a cultural/memetic construct.  Obviously there are some important exceptions: infants are born with some functional but very primitive sensory and motor processing 'code'.  Most of the genome's complexity is used to specify the learning machinery, and the associated reward circuitry.  Infant emotions appear to simplify down to a single axis of happy/sad; differentiation into the more subtle vector space of adult emotions does not occur until later in development.

If the mind is software, and if the brain's learning architecture is already universal, then AGI could - by default - end up with a similar distribution over mindspace, simply because it will be built out of similar general purpose learning algorithms running over the same general dataset.  We already see evidence for this trend in the high functional similarity between the features learned by some machine learning systems and those found in the cortex.

Of course an AGI will have little need for some specific evolutionary features: emotions that are subconsciously broadcast via the facial muscles are a quirk unnecessary for an AGI - but that is a rather specific detail.

The key takeaway is that the data is what matters - and in the end it is all that matters.  Train a universal learner on image data and it just becomes a visual system.  Train it on speech data and it becomes a speech recognizer.  Train it on Atari and it becomes a little gamer agent.
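A toy sketch of the "same learner, different data" idea follows. It is deliberately minimal and rests on loud assumptions: a perceptron stands in for the learner (it is nowhere near universal), and the two "datasets" are just the truth tables for AND and OR. The only point is that identical learning code ends up computing different functions depending on what it is trained on.

```python
# Toy illustration: one learning rule, two datasets, two different "skills".
# A perceptron is only a stand-in here; it is nowhere near a universal learner.

def train(examples, epochs=20):
    """Fit a perceptron to (inputs, label) pairs and return its predict function."""
    n_inputs = len(examples[0][0])
    weights = [0] * n_inputs
    bias = 0

    def predict(x):
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if activation > 0 else 0

    for _ in range(epochs):
        for x, label in examples:
            error = label - predict(x)   # -1, 0, or +1
            for i, xi in enumerate(x):
                weights[i] += error * xi
            bias += error
    return predict


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_data = [(x, x[0] & x[1]) for x in inputs]   # "dataset A"
or_data = [(x, x[0] | x[1]) for x in inputs]    # "dataset B"

# Identical code and identical initial weights; only the data differs.
and_learner = train(and_data)
or_learner = train(or_data)

print("trained on AND:", [and_learner(x) for x in inputs])
print("trained on OR: ", [or_learner(x) for x in inputs])
```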

Train a universal learner on the real world in something like a human body and you get something like the human mind.  Put a ULM in a dolphin's body and echolocation is the natural primary sense, put a ULM in a human body with broken visual wiring and you can also get echolocation.

Control over training is the most natural and straightforward way to control the outcome.

To create a superhuman AI driver, you 'just' need to create a realistic VR driving sim and then train a ULM in that world (better training and the simple power of selective copying leads to superhuman driving capability).

So to create benevolent AGI, we should think about how to create virtual worlds with the right structure, how to educate minds in those worlds, and how to safely evaluate the results.

One key idea - which I proposed five years ago - is that the AI should not know it is in a sim.

New AI designs (world design + architectural priors + training/education system) should be tested first in the safest virtual worlds: which in simplification are simply low tech worlds without computer technology.  Design combinations that work well in safe low-tech sandboxes are promoted to less safe high-tech VR worlds, and then finally the real world.

A key principle of a secure code sandbox is that the code you are testing should not be aware that it is in a sandbox.  If you violate this principle then you have already failed.  Yudkowsky's AI box thought experiment assumes the violation of the sandbox security principle a priori and thus is something of a distraction. (The virtual sandbox idea was most likely discussed elsewhere previously, as Yudkowsky indirectly critiques a strawman version of the idea via this sci-fi story.)

The virtual sandbox approach also combines nicely with invisible thought monitors, where the AI's thoughts are automatically dumped to searchable logs.

Of course we will still need a solution to the value learning problem.  The natural route with brain-inspired AI is to learn the key ideas behind value acquisition in humans to help derive an improved version of something like inverse reinforcement learning and/or imitation learning[31] - an interesting topic for another day.

#### Conclusion

Ray Kurzweil has been predicting for decades that AGI will be built by reverse engineering the brain, and this particular prediction is not especially unique - this has been a popular position for quite a while.  My own investigation of neuroscience and machine learning led me to a similar conclusion some time ago.

The recent progress in deep learning, combined with the emerging modern understanding of the brain, provides further evidence that AGI could arrive around the time when we can build and train ANNs with similar computational power, as measured very roughly in terms of neuron/synapse counts.  In general the evidence from the last four years or so supports Hanson's viewpoint from the Foom debate.  More specifically, his general conclusion:

Future superintelligences will exist, but their vast and broad mental capacities will come mainly from vast mental content and computational resources. By comparison, their general architectural innovations will be minor additions.

The ULH supports this conclusion.

Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization.  Furthermore, Moore's Law for GPUs still has some steam left, and software advances are currently improving simulation performance at a faster rate than hardware.  These trends imply that Anthropomorphic/Neuromorphic AGI could be surprisingly close, and may appear suddenly.
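A rough back-of-envelope version of this gap is sketched below. The single-GPU figures come from the paragraph above; the brain figures (about 86 billion neurons and on the order of 10^14 synapses) are commonly cited order-of-magnitude estimates and are an assumption of this sketch, not numbers stated in this section. The comparison also ignores the "compressed/shared" caveat, so it should be read as a crude scale check only.

```python
import math

# From the text: what a single GPU could train and run at the time of writing.
gpu_neurons = 10e6
gpu_synapses = 10e9

# Assumed order-of-magnitude figures for a human brain (not from this section).
brain_neurons = 86e9
brain_synapses = 1e14

neuron_gap = brain_neurons / gpu_neurons
synapse_gap = brain_synapses / gpu_synapses

# How many capacity doublings (hardware, software, or more GPUs) close the synapse gap.
doublings = math.log2(synapse_gap)

print(f"neuron gap:  about {neuron_gap:,.0f}x")
print(f"synapse gap: about {synapse_gap:,.0f}x")
print(f"doublings needed to close the synapse gap: {doublings:.1f}")
```

Under these assumptions the gap is a few thousand to ten thousand single-GPU equivalents, which is why the text frames it as a matter of organizational scale rather than a distant prospect.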

What kind of leverage can we exert on a short timescale?


All of this is interesting, but it seems to me that you did not make a strong case for the brain using a universal learning machine as its main system.

Specifically, I think you fail to address the evidence for evolved modularity:

• The brain uses spatially specialized regions for different cognitive tasks.

• This specialization pattern is mostly consistent across different humans and even across different species.

• Damage to or malformation of some brain regions can cause specific forms of disability (e.g. face blindness). Sometimes the disability can be overcome but often not completely.

• In various mammals, infants are capable of complex behavior straight out of the womb. Human infants exhibit only very simple behaviors and require many years to reach full cognitive maturity; therefore the human brain relies more on learning than the brains of other mammals. But the basic architecture is the same, so this is a difference of degree, not kind.

It seems more likely that if there is a general-purpose "universal" learning system in the human brain then it is used as an inefficient fall-back mechanism when the specialized modules fail, not as the core mechanism that hand...

Thanks, I was waiting for at least one somewhat critical reply :)

Specifically, I think you fail to address the evidence for evolved modularity:

• The brain uses spatially specialized regions for different cognitive tasks.
• This specialization pattern is mostly consistent across different humans and even across different species.

The ferret rewiring experiments, the tongue based vision stuff, the visual regions learning to perform echolocation computations in the blind, this evidence together is decisive against the evolved modularity hypothesis as I've defined that hypothesis, at least for the cortex. The EMH posits that the specific cortical regions rely on complex innate circuitry specialized for specific tasks. The evidence disproves that hypothesis.

Damage to or malformation of some brain regions can cause specific forms of disability (e.g. face blindness). Sometimes the disability can be overcome but often not completely.

Sure. Once you have software loaded/learned into hardware, damage to the hardware is damage to the software. This doesn't differentiate the two hypotheses.

In various mammals, infants are capable of complex behavior straight out of the womb. Human in

7V_V5yBut none of these works as well as using the original task-specific regions, and anyway in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them. But then why doesn't universal learning just co-opt some other brain region to perform the task of the damaged one? In the cases where there is a congenital malformation that makes the usual task-specific region missing or dysfunctional, why isn't the task allocated to some other region? And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset from different random initializations, each time the hidden nodes will specialize in a different way: at least ANNs have permutation symmetry between nodes in the same layer, and as long as nodes operate in the linear region of the activation function, there is also redundancy between layers. This means that many sets of weights specify the same or similar function, and the training process chooses one of them randomly depending on the initialization (and minibatch sampling, dropout, etc.). If, as you claim, the basal ganglia and the cortex in the brain make up a sort of cpu-memory system, then there should be substantial permutation symmetry. After all, in a computer you can swap blocks or pages of memory around and as long as pointers (or page tables) are updated the behavior does not change, up to some performance issues due to cache misses. If the brain worked that way we should expect cortical regions to be allocated to different tasks in a more or less random pattern varying between individuals. Instead we observe substantial consistency, even in the left-right specialization patterns, which is remarkable since at the macroscopic level the brain has substantial lateral symmetry. 
Decortication experiments only show that certain spec
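The permutation symmetry mentioned in the comment above can be checked directly. The following is a minimal sketch under stated assumptions: a tiny one-hidden-layer tanh network with random weights (the sizes, seed, and permutation are arbitrary choices for illustration). Reordering the hidden units, together with the matching columns of the output layer, gives a different set of weights that computes the same function.

```python
import math
import random

rng = random.Random(0)
N_IN, N_HIDDEN, N_OUT = 3, 5, 2   # arbitrary toy sizes

# Random weights for a one-hidden-layer network (rows index the receiving unit).
W1 = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
b1 = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
W2 = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]
b2 = [rng.uniform(-1, 1) for _ in range(N_OUT)]


def forward(x, W1, b1, W2, b2):
    """Input -> tanh hidden layer -> linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


# Relabel the hidden units: new unit k is old unit perm[k].
perm = [4, 2, 0, 3, 1]
W1p = [W1[j] for j in perm]                   # permute hidden-unit rows
b1p = [b1[j] for j in perm]                   # and their biases
W2p = [[row[j] for j in perm] for row in W2]  # permute matching output columns

x = [0.3, -1.2, 0.7]
original = forward(x, W1, b1, W2, b2)
permuted = forward(x, W1p, b1p, W2p, b2)

# Outputs agree up to floating-point rounding, although the weights differ.
same_function = all(math.isclose(a, b, abs_tol=1e-12)
                    for a, b in zip(original, permuted))
weights_differ = W1p != W1
print("same outputs:", same_function, "| different weights:", weights_differ)
```

This only demonstrates the symmetry within a single fixed layer; it says nothing by itself about whether brain regions enjoy the same freedom, which is the point under dispute in this thread.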

in all these experiments the original task-specific regions are still present and functional, therefore maybe the brain can partially use these regions by learning how to route the signals to them.

No - these studies involve direct measurements (electrodes for the ferret rewiring, MRI for echolocation). They know the rewired auditory cortex is doing vision, etc.

But then why doesn't universal learning just co-opt some other brain region to perform the task of the damaged one?

It can, and this does happen all the time. Humans can recover from serious brain damage (stroke, injury, etc). It takes time to retrain and reroute circuitry - similar to relearning everything that was lost all over again.

And anyway why is the specialization pattern consistent across individuals and even species? If you train an artificial neural network multiple times on the same dataset

Current ANNs assume a fixed module layout, so they aren't really comparable in module-task assignment.

Much of the specialization pattern could just be geography - V1 becomes visual because it is closest to the visual input. A1 becomes auditory because it is closest to the auditory input. etc.

This should be the de...

2V_V5yWell, the eyes are at the front of the head, but the optic nerves connect to the brain at the back, and they also cross at the optic chiasm. Axons also cross contralaterally in the spinal cord and if I recall correctly there are various nerves that also don't take the shortest path. This seems to me to be evidence that the nervous system is not strongly optimized for latency.
3jacob_cannell5yThis is a total misconception, and it is a good example of the naive engineer fallacy (jumping to the conclusion that a system is poorly designed when you don't understand how the system actually works and why). Remember the distributed software modules - including V1 - have components in multiple physical modules (cortex, cerebellum, thalamus, BG). Not every DSM has components in all subsystems, but V1 definitely has a thalamic relay component (LGN). The thalamus/BG is in the center of the brain, which makes sense from wiring minimization when you understand the DPM system. Low freq/compressed versions of the cortical map computations can interact at higher speeds inside the small compact volume of the BG/thalamus. The BG/thalamus basically contains a microcosm model of the cortex within itself. The thalamic relay comes first in sequential processing order, so moving cortical V1 closer to the eyes wouldn't help in the slightest. (Draw this out if it doesn't make sense)
1nshepperd5yIt seems a little strange to treat this as a triumphant victory for the ULH. At the most, you've shown that the "fundamentalist" evolved modularity hypothesis is false. You didn't really address how the ULH explains this same evidence. And there are other mysteries in this model, such as the apparent universality of specific cognitive heuristics and biases, or of various behaviours like altruism, deception, sexuality that seem obviously evolved. And, as V_V mentioned, the lateral asymmetry of the brain's functionality vs the macroscopic symmetry. Otherwise, the conclusion I would draw from this is that both theories are wrong, or that some halfway combination of them is true (say, "universal" plasticity plus a genetic set of strong priors somehow encoded in the structure).
1advael5yFor e.g. the ferret rewiring experiments, tongue based vision, etc., is a plausible alternative hypothesis that there are more general subtypes of regions that aren't fully specialized but are more interoperable than others? For example, (Playing devil's advocate here) I could phrase all of the mentioned experiments as "sensory input remapping" among "sensory input processing modules." Similarly, much of the work in BCI interfaces for e.g. controlling cursors or prosthetics could be called "motor control remapping". Have we ever observed cortex being rewired for drastically dissimilar purposes? For example, motor cortex receiving sensory input? If we can't do stuff like that, then my assumption would be that at the very least, a lot of the initial configuration is prenatal and follows kind of a "script" that might be determined by either some genome-encoded fractal rule of tissue formation, or similarities in the general conditions present during gestation. Either way, I'm not yet convinced there's a strong argument that all brain function can be explained as working like a ULM (Even if a lot of it can)
3jacob_cannell5yI'm not sure - I have a vague memory of something along those lines but .. nothing specific. From what I remember, motor, sensor, and association cortex do have some intrinsic differences at the microcircuit level. For example some motor cortex has larger pyramidal cells in the output layer. However, I believe most motor cortex is best described as sensorimotor - it depends heavily on sensor data from the body. Well yes - there is a general script for the overall architecture, and a lot of innate functionality as well, especially in specific regions like the brainstem's pattern generators. As I said in the article - there is always room for innate functionality in the architectural prior and in specific circuits - the brain is certainly not a pure ULM. ULM refers to the overall architecture, with the general learning part specifically implemented by the distributed BG/cortex/cerebellum modules. But the BG and hippocampal system also rely heavily on learning internally, as does the amygdala and .. probably almost all of it to varying degrees. The brainstem is specifically the place where we can point and say - this is mostly innate circuitry, but even it probably has some learning going on.
2[anonymous]5yIt's far more likely that different brain modules implement different learning rules, but all learn, than that they encode innate mental functionality which is not subject to learning at all.
2advael5yI'm inclined to agree. Actually I've been convinced for a while that this is a matter of degrees rather than being fully one way or the other (Modules versus learning rules), and am convinced by this article that the brain is more of a ULM than I had previously thought. Still, when I read that part the alternative hypothesis sprang to mind, so I was curious what the literature had to say about it (Or the post author.)

Thank you. This was an excellent article, which helped me clarify my own thinking on the topic.

I'd love to see you write more on this.

Great post! Thanks for writing it. Seems like a good fit for Main.

So just to clarify my understanding: If the ULH is true it becomes more plausible that, say, playing video games and hating books because authority figures force you to read them in school have long-term broad impacts on your personality. And if the EMH is true, it becomes more plausible that important characteristics like the Big Five personality traits and intelligence are genetically coded and you become the person your genes describe. Correct?

Yudkowsky's AI box experiments and that entire notion of open boxing is a strawman - a distraction.

We humans have contemplated whether we are in a simulation even though no one "outside the Matrix" told us we might be. Is it possible that an AI-in-training might contemplate the same thing?

In general the evidence from the last four years or so supports Hanson's viewpoint from the Foom debate.

Really? My impression was that Hanson had more of an EMH view.

8jacob_cannell5yI agree with this largely but I would replace 'personality' with 'mental software', or just 'mind'. Personality to me connotes a subset of mental aspects that are more associated with innate variables. I suspect that enjoying/valuing learning is extremely important for later development. It seems probable that some people are born with a stronger innate drive for learning, but that drive by itself can also probably be adjusted through learning. But I'm not aware of any hard evidence on this matter. In my case I was somewhat obsessed with video games as a young child and my father actually did force me to read books and even the encyclopedia. I found that I hated the books he made me read (I only liked sci-fi) but I loved the encyclopedia. I ended up learning how to quickly skim books and fake it enough to pass the resulting QA test. I don't think abstract high level variables like big five personality traits or IQ scores are the relevant features for the EMH vs ULH issue. For example in the ULH scenario, there is still plenty of room for strongly genetically determined IQ effects (hardware issues/tradeoffs), and personality variables are not complex cognitive algorithms. Sure, and this was part of what my post from 5 years back was all about. It's kind of a world design issue. Is it better to have your AIs in your testsim believe in a simplistic creator god? (which is in a way on the right track with regards to the sim arg, but it also doesn't do them much good) Or is it better for them to have a naturalist/atheist worldview? (potentially more dangerous in the long term as it leads to scientific investigation and eventually the sim arg) That post was downvoted into hell, in part I think because I posted to main - I was new to LW and didn't understand the main/discussion distinction. Also, I think people didn't like the general idea of anything mentioning the word theology, or the idea of intentionally giving your testsim AI a theology. 
I should clarify - I meant
1John_Maxwell5yRe: AIs in a simulation, it seems like whatever goals the AI had would be defined in terms of the simulation (similar to how if humanity discovered we were in a hackable simulation, our first priorities would be to make sure the simulation didn't get shut off, invent immortality, provide everyone with unlimited cake, etc.--all concerns that exist within our simulation.) So even if the AI realizes it's in a simulation, having its goal defined in terms of the simulation probably counts as a weak security measure.

A few brief supplements to your introduction:

The source of the generated image is no longer mysterious: Inceptionism: Going Deeper into Neural Networks

But though the above is quite fascinating and impressive, we should also keep in mind the bizarre false positives that a person can generate: Images that fool computer vision raise security concerns

7jacob_cannell5yThe trippy shoggoth title image was mysterious when it was originally posted, basically someone leaked an image a little before the inceptionism blog post. A CNN is a reasonable model for fast feedforward vision. We can isolate this pathway for biological vision by using rapid serial presentation - basically flashing an image for 100ms or so. So imagine if you just saw a flash of one of these images, for a brief moment, and then you had to quickly press a button for the image category - no time to think about it - it's Jeopardy-style instant response. There is no button for "noisy image", there is no button for "wavy line image", etc. Now the fooling images are generated by an adversarial process. It's like we have a copy of a particular mind in a VR sim, we flash it an image, see what button it presses. Based on the response, we then generate a new image and unwind time and repeat. We keep doing this until we get some weird classification errors. It allows us to explore the decision space of the agent. It is basically reverse engineering. It requires a copy of the agent's code or at least access to a copy with the ability to do tons of queries, and it also probably depends on the agent being completely deterministic. I think that biological minds avoid this issue indirectly because they use stochastic sampling based on secure hardware/analog noise generators. Stochastic models/ANNs could probably avoid this issue.
4kpreid5yI look at the bizarre false positives and I wonder if (warning: wild speculation) the problem is that the networks were not trained to recognize the lack of objects. For example, in most cases you have some noise in the image, so if every training image is something, or rather something-plus-noise, then the system could learn that the noise is 100% irrelevant and pick out the something. (The noisy images look to me like they have small patches in one spot faintly resembling what they're identified as — if my vision had a rule that deemphasized the non-matching noise and I had a much smaller database of the world than I do, then I think I'd agree with those neural networks.) If the above theory is true, then a possible fix would be to include in training data a variety of images for which the expected answers are like “empty scene”, "too noisy", “simple geometric pattern”, etc. But maybe this is already done — I'm not familiar with the field.
2Unknowns5yNo, even if you classify these false positives as "no image", this will not prevent someone from constructing new false positives. Basically the amount of training data is always extremely small compared to the theoretically possible number of distinct images, so it is always possible to construct such adversarial positives. These are not random images which were accidentally misidentified in this way. They have been very carefully designed based on the current data set. Something similar is probably theoretically possible with human vision recognition as well. The only difference would be that we would be inclined to say "but it really does look like a baseball!"
5jacob_cannell5yThis technique exploits the fact that the CNN is completely deterministic - see my reply above. It may be very difficult for stochastic networks. CNNs are comparable to the first 150ms or so of human vision, before feedback, multiple saccades, and higher order mental programs kick in. So the difficulty in generating these fooling images also depends on the complexity of the inference - a more complex AGI with human-like vision given larger amounts of time to solve the task would probably also be harder to fool, independent of the stochasticity issue.
2eternal_neophyte5yA human being would be capable of pointing out why something looks like a baseball - to be able to point out where the curves and lines are that provoke that idea. We do this when we gaze at clouds without coming to believe there really are giant kettles floating around; we're capable of taking the abundance of contextual information in the scene into account and coming up with reasonable hypotheses for why what we're seeing looks like x, y or z. If classifier vision systems had the same ability they probably wouldn't make the egregious mistakes they do.
1Unknowns5yIf I understand correctly how these images are constructed, it would be something like this: take some random image. The program can already make some estimate of whether it is a baseball, say 0.01% or whatever. Then you go through the image pixel by pixel and ask, "If I make this pixel slightly brighter, will your estimate go up? if not, will it go up if I make it slightly dimmer?" (This is just an example, you could change the color or whatever as well.) Thus you modify each pixel such that you increase the program's estimate that it is a baseball. By the time you have gone through all the pixels, the probability of being a baseball is very high. But to us, the image looks more or less just the way it did at first. Each pixel has been modified too slightly to be noticed by us. But this means that in principle the program can indeed explain why it looks like a baseball -- it is a question of a very slight tendency in each pixel in the entire image.
2eternal_neophyte5yBut the explanation will be just as complex as the procedure used to classify the data. If I change the hue slightly or twiddle their RGB values just slightly, the "explanation" for why the data seems to contain a baseball image will be completely different. Human beings on the other hand can look at pictures of the same object in different conditions of lighting, of different particular sizes and shapes, taken from different camera angles, etc. and still come up with what would be basically the same set of justifications for matching each image to a particular classification (e.g. an image contains a roughly spherical field of white, with parallel bands of stitch-like markings bisecting it in an arc...hence it's of a baseball). The ability of human beings to come up with such compressed explanations, and our ability to arrange them into an ordering, is arguably what allows us to deal with iconic representations of objects and to represent them at varying levels of detail (as in http://38.media.tumblr.com/tumblr_m7z4k1rAw51rou7e0.png).
2Jiro5yWill it? What if slightly twiddling the RGB values produces something that is basically "spherical field of white, etc. with enough noise on top of it that humans can't see it"?
1eternal_neophyte5yThat would all hinge on what it means for an image to be "hidden" beneath noise, I suppose. The more noise you layer on top of an image the more room for interpretation there is in classifying it, and the less salient any particular classification candidate will be. If a scrutable system can come up with compelling arguments for a strange classification that human beings would not make, then its choices would be naturally less ridiculous than otherwise. But to say that "humans conceivably may suffer from the same problem" is a bit of a dodge; esp. in light of the fact that these systems are making mistakes we clearly would not. But either way, what you're proposing and what Unknowns was arguing are different. Unknowns was (if I understood him rightly) arguing that the assignment of different probability weights for pixels (or, more likely, groups of pixels) representing a particular feature of an object is an explanation of why they're classified the way they are. But such an "explanation" is inscrutable; we cannot ourselves easily translate it into the language of lines, curves, apparent depth, etc. (unless we write some piece of software to do this and which is then effectively part of the agent).
1Jiro5yLook at it from the other end: You can take a picture of a baseball and overlay noise on top of it. There could, at least plausibly, be a point where overlaying the noise destroys the ability for humans to see the baseball, but the information is actually still present (and could, for instance, be recovered if you applied a noise reduction algorithm to that). Perhaps when you are twiddling the pixels of random noise, you're actually constructing such a noisy baseball image a pixel at a time.
1eternal_neophyte5yAgree with all you said, but have to comment on You could be constructing a noisy image of a baseball one pixel at a time. In fact if you actually are then your network would be amazingly robust. But in a non-robust network, it seems much more probable that you're just exploiting the system's weaknesses and milking them for all they're worth.
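The pixel-by-pixel procedure Unknowns describes earlier in this thread can be sketched in a few lines. This is a toy under loudly stated assumptions: the "classifier" is a random linear model with a sigmoid output rather than a trained CNN, the 20x20 image and the per-pixel budget `EPS` are made up, and the attacker only uses black-box queries ("does the estimate go up if this pixel is slightly brighter, or slightly dimmer?").

```python
import math
import random

rng = random.Random(0)
N_PIXELS = 400   # a hypothetical 20x20 grayscale image
EPS = 0.05       # largest change allowed to any single pixel

# Toy stand-in for a trained classifier: fixed random weights, one output.
weights = [rng.uniform(-1.0, 1.0) for _ in range(N_PIXELS)]
image = [rng.uniform(0.1, 0.9) for _ in range(N_PIXELS)]
# Choose the bias so the starting image is scored as very unlikely to be a baseball.
bias = -5.0 - sum(w * p for w, p in zip(weights, image))


def baseball_probability(pixels):
    """Black-box query: the classifier's estimate that the image is a baseball."""
    logit = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def clip(value):
    """Keep pixel intensities in the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


adversarial = list(image)
for i in range(N_PIXELS):
    current = baseball_probability(adversarial)
    for step in (EPS, -EPS):   # try slightly brighter, then slightly dimmer
        trial = list(adversarial)
        trial[i] = clip(image[i] + step)
        if baseball_probability(trial) > current:
            adversarial = trial   # keep the nudge that raised the estimate
            break

p_before = baseball_probability(image)
p_after = baseball_probability(adversarial)
max_change = max(abs(a - b) for a, b in zip(adversarial, image))
print(f"estimate before: {p_before:.4f}")
print(f"estimate after:  {p_after:.4f}")
print(f"largest single-pixel change: {max_change:.3f}")
```

Because every pixel contributes a small push in the same direction, the estimate moves a long way even though no pixel changes by more than `EPS`. Note that the loop needs one or two queries per pixel against a deterministic model, which matches the point made above that the attack relies on cheap repeated access to a copy of the agent.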

One of the best posts I've read here on LW, congratulations. I think that the most important algorithms that the brain implements will probably be less complex than anticipated. Epigenesis and early ontogenetic adaptation are heavily dependent on feedback from the environment and probably very general, even if the 'evolution of learning' and genetic complexity provides some of the domain specifications ab initio. Results considering bounded computation (computational resources and limited information) will probably show that the ULM viewpoint cluster is compatible with the existence of cognitive biases and heuristics in our cognition http://www.pnas.org/content/103/9/3198

2jacob_cannell5yThanks! That's an interesting link. At the very lowest level, neural competition through local inhibitory circuits is a central mechanism used throughout brains. Of course that's not the same thing as conflicting agents, but perhaps there is a general theme of competition between subsystems at higher levels.
0boni_bo5yYes. That paper has been cited by Stuart J. Russell's "Rationality and Intelligence: A Brief Update" and in Valiant's second paper on evolvability.
[anonymous]5y 5

The problem with this is that the "engineering diagram" of the brain is really only a hardware wiring diagram, and the status of speculations about how the hardware modules (really just areas) relate to functional modules is ... well, just that, speculation.

There are good reasons to suspect that the functional diagram would look completely different (reasons based in psychological data) and the current state of the art there is poor.

Except perhaps in certain quarters.

4jacob_cannell5yYes the engineering diagram is a hardware wiring diagram, which I hope I made clear. In general one of my main points was that most of the big systems (cortex, cerebellum) are general purpose re-programmable hardware - they don't come pre-equipped with software. So the actual functionality of each module arises from the learning system slowly figuring out the appropriate software during development. I provided some links to the key evidence for the overall hypothesis, I think it is well beyond speculation at this point. (the article certainly contains some speculations, but I labeled them as such) Well of course, because the functional diagram is learned software, and thus can vary substantially from human to human. For example the functional software diagram for the cortex of a blind echolocator looks very different than that of a neurotypical.
0[anonymous]5yThere are serious problems with the claims you are making. The idea that the cortex or cerebellum, for example, can be described as "general purpose re-programmable hardware" is lacking in both clarity and support. Clarity. In what sense "generally re-programmable"? So much that it could run Microsoft Word? I have never seen anyone try to go that far, so clearly you must mean something less general. But it is very unclear what exactly is the sense in which you mean the words "general purpose re-programmable hardware". Support. There are no generally accepted theories for what the function of the cortex actually is. Can you be clearer about what you think the evidence is, in a nutshell? You seem to be saying that the cortex is a universal reinforcement learning machine. But the kind of evidence that you present is (if you will forgive an extreme oversimplification for the purposes of clarity) the observation that the basal ganglia plays a role that resembles a global packet-switching router, and since a global packet-switching router would be expected to be seen in a reinforcement learning machine, QED. Now, don't get me wrong, I am sympathetic to much of the general spirit that you convey here, but my problem is that my research has gone down this road for a long time already, and while we agree on the general spirit, you have jumped forward several steps and come to (what I see as) a premature conclusion about functionality. To be specific, the concept of a "reinforcement learning machine" is ghastly (it contains "And then some magic happens..." steps), and I believe it would be a terrible mistake to say that there is any clear evidence that we have found a reinforcement learning machine in the brain already. I agree with the general interpretation of what those hippocampal and BG loops might be doing, but there are MANY other interpretations beside seeing them as a component of a reinforcement learning machine. 
This is a difficult topic to discu
3jacob_cannell5y"General purpose learning hardware" is perhaps better. I used "re-programmable" as an analogy to an FPGA. However, in a literal sense the brain can learn to use simple paper + pencil tools as an extended memory, and can learn to emulate a Turing machine. Given huge amounts of time, the brain could literally run Windows. And more to the point, programmers ultimately rely on the ability of our brain to simulate/run little sections of code. So in a more practical literal sense, all of the code of Windows first ran on human brains. You seem to be hung up on reinforcement learning. I use some of that terminology to define a ULM because it is just the most general framework - utility/value functions, etc. Also, there is some pretty strong evidence for RL in the brain, but the brain's learning mechanisms are complex - more so than any current ML system. I hope I conveyed that in the article. Learning in the lower sensory cortices in particular can also be modeled well by unsupervised learning, and I linked to some articles showing how UL models can reproduce sensory cortex features. UL can be viewed as a potentially reasonable way to approximate the ideal target update, especially for lower sensory cortex that is far (in a network depth sense) from any top down signals from the reward system. The papers I linked to about approximate Bayesian learning and target propagation in particular can help put it all into perspective. Well, the article summarizes the considerable evidence that the brain is some sort of approximate universal learning machine. I suspect that you have a particular idea of RL that is less than fully general.
1[anonymous]5yYou are right to say that, seen from a high enough level, the brain does general purpose learning ... but the claim becomes diluted if you take it right up to the top level, where it clearly does. For example, the brain could be 99.999% hardwired, with no flexibility at all except for a large RAM memory, and it would be consistent with the brain as you just described it (able to learn anything). And yet that wasn't the type of claim you were making in the essay, and it isn't what most people mean when they refer to "general purpose learning". You (and they) seem to be pointing to an architectural flexibility that allows the system to grow up to be a very specific, clever sort of understanding system without all the details being programmed ahead of time. I am not sure why you say I am hung up on RL: you quoted that as the only mechanism to be discussed in the context, so I went with that. And you are (like many people) not correct to say that RL is the most general framework, or that there is good evidence for RL in the brain. That is a myth: the evidence is very poor indeed. RL is not "fully general" -- that was precisely my point earlier. If you can point me to a rigorous proof of that which does not have an "and then some magic happens" step in it, I will eat my hat :-) (Already had a long discussion with Marcus Hutter about this btw, and he agreed in the end that his appeal to RL was based on nothing but the assumption that it works.)
2jacob_cannell5yUpon consideration, I changed my own usage of "Universal Reinforcement Learning Machine" to "Universal Learning Machine". The several remaining uses of "reinforcement learning" are now confined to the context of the BG and the reward circuitry. Again we are probably talking about very different RL conceptions. So to be clear, I summarized my general viewpoint of a ULM. I believe it is an extremely general model, that basically covers any kind of universal learning agent. The agent optimizes/steers the future according to some sort of utility function (which is extremely general), and self-optimization emerges naturally just by including the agent itself as part of the system to optimize. Do you have a conception of a learning agent which does not fit into that framework? The evidence for RL in the brain - of the extremely general form I described - is indeed very strong, simply because any type of learning is just a special case of universal learning. Taboo 'reinforcement' if you want, and just replace with "utility driven learning". AIXI specifically has a special reward channel, and perhaps you are thinking of that specific type of RL which is much more specific than universal learning. I should perhaps clarify and/or remove the mention of AIXI. A ULM - as I described - does not have a reward channel like AIXI. It just conceptually has a value and/or utility function initially defined by some arbitrary function that conceptually takes the whole brain/model as input. In the case of the brain, the utility function is conceptual; in practice it is more directly encoded as a value function.
5[anonymous]5yAbout the universality or otherwise of RL. Big topic. There's no need to taboo "RL" because switching to utility-based learning does not solve the issue (and the issue I have in mind covers both). See, this is the problem. It is hard for me to fight the idea that RL (or utility-driven learning) works, because I am forced to fight a negative; a space where something should be, but which is empty ... namely, the empirical fact that Reinforcement Learning has never been made to work in the absence of some surrounding machinery that prepares or simplifies the ground for the RL mechanism. It is a naked fact about traditional AI that it puts such an emphasis on the concept of expected utility calculations without any guarantees that a utility function can be laid on the world in such a way that all and only the intelligent actions in that world are captured by a maximization of that quantity. It is a scandalously unjustified assumption, made very hard to attack by the fact that it is repeated so frequently that everyone believes it to be true just because everyone else believes it. If anyone ever produced a proof why it should work, there would be a there there, and I could undermine it. But ... not so much! About AIXI and my conversation with Marcus: that was actually about the general concept of RL and utility-driven systems, not anything specific to AIXI. We circled around until we reached the final crux of the matter, and his last stand (before we went to the conference banquet) was "Yes, it all comes down to whether you believe in the intrinsic reasonableness of the idea that there exists a utility function which, when maximized, yields intelligent behavior ... but that IS reasonable, ... isn't it?" My response was "So you do agree that that is where the buck stops: I have to buy the reasonableness of that idea, and there is no proof on the table for why I SHOULD buy it, no?" Hutter: "Yes."
Me: "No matter how reasonable it seems, I don't buy it" H
3TheAncientGeek5yI don't think that is an overstatement. If MIRI is basically wrong about UFs, then most of its case unravels. Why isn't the issue being treated as a matter of urgency?
4Kaj_Sotala5yWon't comment about past affairs, but these days at least part of MIRI seems more open to the possibility. E.g. this thread [http://lesswrong.com/lw/l7o/miri_research_guide/bkqm?context=1#bkqm] where So8res (Nate Soares, now Executive Director of MIRI) lists some possible reasons for why it might be necessary to move beyond utility functions. (He is pretty skeptical of most, but at least he seems to be seriously considering the possibility, and gives a ~15% chance "that VNM won't cut it".)
-1[anonymous]5yThe day that I get invited as a guest speaker by either MIRI or FHI will mark the point at which they start to respect and take seriously alternative viewpoints.
4gjm5yWould that be this paper [http://richardloosemore.com/docs/2007_ComplexSystems_rpwl.pdf]? If so, it seems to me to have rather little to do with the question of whether utility functions are necessary, helpful, neutral, unhelpful, or downright inconsistent with genuinely intelligent behaviour. It argues that intelligent minds may be "complex systems" whose behaviour is very difficult to relate to their lower-level mechanisms, but something that attempts to optimize a utility function can perfectly well have that property. (Because the utility function can itself be complex in the relevant sense; or because the world is complex, so that effective optimization of even a not-so-complex utility function turns out to be achievable only by complex systems; or because even though the utility function could be optimized by something not-complex, the particular optimizer we're looking at happens to be complex.) My understanding of the position of EY and other people at MIRI is not that "artificial intelligence must be about the mathematics of artificial intelligence", but that if we want to make artificially intelligent systems that might be able to improve themselves rapidly, and if we want high confidence that this won't lead to an outcome we'd view as disastrous, the least-useless tools we have are mathematical ones. Surely it's perfectly possible to hold (1) that extremely capable AI might be produced by highly non-mathematical means, but (2) that this would likely be disastrous for us, so that (3) we should think mathematically about AI in the hope of finding a way of doing it that doesn't lead to disaster. But it looks as if you are citing their belief in #3 as indicating that they violently reject #1. So, anyway, utility functions. The following things seem to be clearly true: * There are functions whose maximization implies (at least) kinda-intelligence-like behaviour. For instance, maximizing games of chess won against the world champion (in circumstance
2TheAncientGeek5yBut there are good reasons for thinking that, in absolute terms, many mathematical methods of AI safety are useless. The problem is that they relate to ideal rationalists, but ideal rationality is uncomputable, so they are never directly applicable to any buildable AI ... and how real-world AI would deviate from ideal rationality is crucial to understanding the threats it would pose. Deviations from ideal rationality could pose new threats, or could counter certain classes of threat (in particular, lack of goal stability could be leveraged to provide corrigibility, which is a desirable safety feature). There's an important difference between thinking mathematically and only thinking mathematically. Highly non-mathematical AI, that is cobbled together without clean overriding principles, cannot be made safe by clean mathematical principles, although it could quite conceivably be made safe by piecemeal engineering solutions such as kill switches, corrigibility and better boxing ... the kind of solution MIRI isn't interested in ... which does look as though they are neglecting a class of AI danger.
0gjm5yIf any particular mathematical approach to AI safety is useless, and if MIRI are attempting to use that approach, then they are making a mistake. But we should distinguish that from a different situation where they aren't attempting to use the useless approach but are studying it for insight. So, e.g., maybe approach X is only valid for AIs that are ideal rationalists, but they hope that some of what they discover by investigating approach X will point the way to useful approaches for not-so-ideal rationalists. Do you have particular examples in mind? Is there good evidence telling us whether MIRI think the methods in question will be directly applicable to real AIs? I agree. I am not so sure I agree that cobbled-together AI can "quite conceivably be made safe by piecemeal engineering solutions", and I'm pretty sure that historically at least MIRI has thought it very unlikely that they can. It does seem plausible that any potentially-dangerous AI could be made at least a bit safer by such things, and I hope MIRI aren't advocating that no such things be done. But this is all rather reminiscent of computer security, where there are crude piecemeal things you can do that help a bit, but if you want really tight security there's no substitute for designing your system for security from the start -- and one possible danger of doing the crude piecemeal things is that they give you a false sense of safety.
1jacob_cannell5yBy 1900, the basic principles of aerodynamics in terms of lift and drag had been known for almost a century - the basic math of flight. There were two remaining problems: power and control. Powered heavier-than-air flight requires an efficient engine with sufficient power/weight ratio. Combustion engine tech developed along a sigmoid, and by 1900 that tech was ready. The remaining problem then was control. Most of the flight pioneers either didn't understand the importance of this problem, or they thought that aircraft could be controlled like boats are - with a simple rudder mechanism. The Wright Brothers - two unknown engineers - realized that steering in 3D was more complex. They solved this problem by careful observation of bird flight. They saw that birds turned by banking their whole body (and thus leveraging the entire wing airfoil), induced through careful airfoil manipulation on the trailing wing edge. They copied this wing warping mechanism directly in their first flying machines. Of course - they weren't the only ones to realize all this, and ailerons are functionally equivalent but more practical for fixed wing aircraft. Flight was achieved by technological evolution or experimental engineering, taking some inspiration from biology. Pretty much all tech is created through steady experimental/evolutionary engineering. Machine learning is on a very similar track to produce AGI in the near term. Ahh and that's part of the problem. The first AGIs will be sub-human then human level intelligence, and Moore's Law is about to end or has already ended, so the risk of some super rapid SI explosion in the near term is low. Most of the world doesn't care about tight security. AGI just needs to be as safe or safer than humans. Tight security is probably impossible regardless - you can't prove tight bounds on any system of extreme complexity (like the real world). Tight math bounds always require ultra-simplified models.
0TheAncientGeek5yWhere are insights about the relative usefulness of pure theory going to come from? It's not even conceivable? Even though automotive safety basically happened that way? That's clearly not crude hackery, but it's not pure theory either. The kind of Clean Engineering you are talking about can only be specific to a particular architecture, which pure theory isn't. There is a pretty hard limit to how much you can predict about a system, AI or not, without knowing its architecture.
1gjm5yThat wasn't at all the sort of insight I had in mind. It's commonplace in science to start trying to understand complicated things by first considering simpler things. Then sometimes you learn techniques that turn out to be applicable in the harder case, or obstacles that are likely still to be there in the harder case. (Lots of computer science research has considered computers with literally unlimited memory, models of computation in which a single operation can do arbitrary arithmetic on an integer of any size, models of computation in which the cost of accessing memory doesn't depend on how big the memory is, etc., and still managed to produce things that end up being useful for actual software running on actual computers with finite memories and finite registers running in a universe with a finite maximum speed limit.) Well, I guess it depends on what you mean by "quite conceivable". Obviously anyone can say "we might be able to make a cobbled-together AI safe by piecemeal engineering solutions", so if that counts as "conceiving" then plainly it's conceivable. But conceivability in that sense is (I think) completely uninteresting; what we should care about is whether it's at all likely, and that's what I took you to mean by "quite conceivable". It seems to me that automotive safety and AI safety are extremely different problems. Or, more precisely, they may or may not turn out to be, but in the imaginable cases where most is at stake (those in which it turns out that it really is possible to produce intelligences vastly greater than ours and that intelligence vastly greater than ours really does lead to much greater ability to influence the world) they are extremely different. The point of the pure theory is to help figure out what kind of engineering is going to be needed. 
If you give me a good program that plays chess or does symbolic algebra, in a lot of situations I reckon I can predict what it will do quite well even if I know nothing about its archit
0TheAncientGeek5yPiecemeal efforts are least likely to make a difference to the most dangerous, least likely scenario of a fast takeoff singleton. But there is a societal lesson to be learnt from things like automotive safety and nuclear non-proliferation: voluntary self-restraint can be a factor. Lessons about engineering can be learnt from engineering, too. For instance Big Design Up Front, the standard response to the rapidly self-improving singleton, is known to be a pretty terrible way of doing things, that should be avoided if there are alternatives. Negative lessons from pure theory need to be learnt, too. MIRI's standard response to the tiling agents problem is that a way will be found around [http://lesswrong.com/lw/jca/walkthrough_of_the_tiling_agents_for/] the problem of simultaneous value preservation and self-modification. But why bother? If the Loebian obstacle is allowed to stand, there is no threat from a Clippie. That is a rather easily achieved form of self-restraint. You probably have to give up on the idea of a God AI benevolently ruling the world, but some of us were never that keen anyway. Another negative lesson is that ideal rationalists are uncomputable, with the corollary that there is no one way to be a non-ideal rationalist ... which leads into architecture-specific safety. That can only be true in special cases. You can't in general predict a chess programme that is better than you, because, if you could, you would be as good as it is. In any case, detailed prediction is beside the point. If you want to design architecture-specific safety features, you need a broad view of how AIs of a class would behave.
0TheAncientGeek5ySomeone's got to have insights about how pure theory fits into the bigger picture. And sometimes that's directly applicable, and sometimes it isn't ... that's one of the big picture issues.
0gjm5yI wasn't meaning to denigrate that sort of insight. (Though "how pure theory fits in" doesn't seem to me the same thing as "the relative usefulness of pure theory", which is what you said before, and I think what you're describing now sounds distinctly more valuable.) Just saying that it wasn't the kind of insight I would look for from studying the pure theory. In this case, I wouldn't much expect it to be directly applicable. But I would expect it to be much easier to tell whether it is (and whether it's indirectly applicable) once one has a reasonable quantity of theory in hand.
1[anonymous]5ySorry, was in too much of a rush to give link ... Loosemore, R.P.W. (2007). Complex Systems, Artificial Intelligence and Theoretical Psychology. In B. Goertzel & P. Wang (Eds.), Proceedings of the 2006 AGI Workshop. IOS Press, Amsterdam. http://richardloosemore.com/docs/2007_ComplexSystems_rpwl.pdf
2[anonymous]5yExcuse me, but as much as I think the SIAI bunch were being rude to you, if you had presented, at a serious conference on a serious topic, a paper that waves its hands, yells "Complexity! Irreducible! Parallel!" and expected a good reception, I would have been privately snarking if not publicly. That would be me acting like a straight-up asshole, but it would also be because you never try to understand a phenomenon by declaring it un-understandable. Which is not to say that symbolic, theorem-prover, "Pure Maths are Pure Reason which will create Pure Intelligence" approaches are very good either -- they totally failed to predict that the brain is a universal learning machine, for instance. (And so far, the "HEY NEURAL NETS LEARN WELL" approach is failing to predict a few things I think they really ought to be able to see, and endeavor to show.) That anyone would ever try to claim a technological revolution is about to arise from either of those schools of work is what constantly discredits the field of artificial intelligence as a hype-driven fraud!
-1[anonymous]5yOkay, so I am trying to understand what you are attacking here, and I assume you mean my presentation of that paper at the 2007 AGIRI workshop. Let me see: you reduced the entire paper to the statement that I yelled "Complexity! Irreducible! Parallel!". Hmmmm...... that sounds like you thoroughly understood the paper and read it in great detail, because you reflected back all the arguments in the paper, showed good understanding of the cognitive science, AI and complex-systems context, and gave me a thoughtful, insightful list of comments on some of the errors of reasoning that I made in the paper. So I guess you are right. I am ignorant. I have not been doing research in cognitive psychology, AI and complex systems for 20 years (as of the date of that workshop). I have nothing to say to defend any of my ideas at all, when people make points about what is wrong in those ideas. And, worse still, I did not make any suggestions in that paper about how to solve the problem I described, except to say "HEY NEURAL NETS LEARN WELL". I wish you had been around when I wrote the paper, because I could have reduced the whole thing to one 3-word and one 5-word sentence, and saved a heck of a lot of time. P.S. I will forward your note to the Santa Fe Institute and the New England Complex Systems Institute, so they can also understand that they are ignorant. I guess we can expect an unemployment spike in Santa Fe and Boston, next month, when they all resign en masse.
-1TheAncientGeek5yI don't see it as dogmatism so much as a verbal confusion. The ubiquity of UFs can be defended using a broad (implicit) definition, but the conclusions typically drawn about types of AI danger and methods of AI safety relate to a narrower definition, where a UF is
* Explicitly coded, and/or
* Fixed, unupdateable, and/or
* "Thick", containing detailed descriptions of goals.
3jacob_cannell5ySince the utility function is approximated anyway, it becomes an abstract concept - especially in the case of evolved brains. For an evolved creature, the evolutionary utility function can be linked to long term reproductive fitness, and the value function can then be defined appropriately. For a designed agent, it's a useful abstraction. We can conceptually rate all possible futures, and then roughly use that to define a value function that optimizes towards that goal. It's really just a mathematical abstraction of the notion of X is better than Y. It's not worth arguing about. It's also proven in the real world - agents based on utility formalizations work. Well.
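The step described above, from rating possible futures to a value function that steers toward them, can be sketched in a few lines. This is only a minimal illustration under assumptions that are not from the thread: the tiny graph of futures, the state names, the utility numbers, and the helpers `value` and `greedy_step` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy world (an assumption for illustration): a small graph
# of possible futures, from a start state to two final outcomes.
SUCCESSORS: Dict[str, List[str]] = {
    "start": ["a", "b"],
    "a": ["good"],
    "b": ["bad"],
    "good": [],
    "bad": [],
}

# The utility function only rates final outcomes ("X is better than Y").
UTILITY: Dict[str, float] = {"good": 1.0, "bad": 0.0}


def value(state: str) -> float:
    """Value of a state: utility of the best outcome reachable from it."""
    next_states = SUCCESSORS[state]
    if not next_states:  # final outcome, so read its utility directly
        return UTILITY[state]
    return max(value(s) for s in next_states)


def greedy_step(state: str) -> str:
    """Steer toward the successor with the highest value."""
    return max(SUCCESSORS[state], key=value)


if __name__ == "__main__":
    print({s: value(s) for s in SUCCESSORS})
    print(greedy_step("start"))
```

In this sketch the utility function is defined only over outcomes, while the value function extends it to every intermediate state, which is roughly the distinction the comment draws. Real agents would have to approximate `value` rather than enumerate futures.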
0[anonymous]5yIt certainly is worth discussing, and I'm sorry but you are not correct that "agents based on utility formalizations work. Well." That topic came up at the AAAI symposium I attended last year. Specifically, we had several people there who built real-world (as opposed to academic, toy) AI systems. Utility based systems are generally not used, except as a small component of a larger mechanism.
2jacob_cannell5yPretty much all of the recent ML systems are based on a utility function framework in a sense - they are trained to optimize an objective function. In terms of RL in particular, DeepMind's Atari agent works pretty well, and builds on a history of successful practical RL agents that all are trained to optimize a 'utility function'. That said, for complex AGI, we probably need something more complex than current utility function frameworks - in the sense that you can't reduce utility to an external reward score. The brain doesn't appear to have a simple VNM single-axis utility concept, which is some indication that we may eventually drop that notion for complex AI. My conception of 'utility function' is loose, and could include whatever it is the brain is doing.
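To make "trained to optimize an objective function" concrete, here is a minimal sketch, not any particular system's training loop: plain gradient descent on a mean-squared-error objective for a one-parameter model. The data, the learning rate, the step count, and the function names `objective` and `train` are assumptions chosen for illustration. Minimizing this loss plays the role that maximizing a utility plays in the comment above.

```python
from typing import Sequence


def objective(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs: Sequence[float], ys: Sequence[float],
          lr: float = 0.05, steps: int = 200) -> float:
    """Gradient descent on the objective; returns the learned weight."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    w = train(xs, ys)
    print(w, objective(w, xs, ys))
```

The loop never encodes the answer directly. It only follows the gradient of a scalar score, which is the loose sense of "utility function framework" used in the comment.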
1[anonymous]5yWait wait wait. You didn't head to the dinner, drink some fine wine, and start raucously debating the same issue over again? Bah, humbug! Also, how do I get invited to these conferences again ;-)? Very true, at least regarding AI. Personally, my theory is that the brain does do reinforcement learning, but the "reward function" isn't a VNM-rational utility function, it's just something the body signals to the brain to say, "Hey, that world-state was great!" I can't imagine that Nature used something "mathematically coherent", but I can imagine it used something flagrantly incoherent but really dead simple to implement. Like, for instance, the amount of some chemical or another coming in from the body, to indicate satiety, or to relax after physical exertion, or to indicate orgasm, or something like that.
1[anonymous]5yHey, ya pays yer money and walk in the front door :-) AGI conferences run about $400 a ticket I think. Plus the airfare to Berlin (there's one happening in a couple of weeks, so get your skates on). Re the possibility that the human system does do reinforcement learning ... fact is, if one frames the meaning of RL in a sufficiently loose way, the human cogsys absolutely DOES do RL, no doubt about it. Just as you described above. But if you sit down and analyze what it means to make the claim that a system uses RL, it turns out that there is a world of difference between the two positions: and The difference is that the second case turns the descriptive mechanism into an explicit mechanism. It's like Ptolemy's Epicycle model of the solar system. Was Ptolemy's fancy little wheels-within-wheels model a good descriptive model of planetary motion? You bet ya! Would it have been appropriate to elevate that model and say that the planets actually DID move on top of some epicycle-like mechanism? Heck no! As a functional model it was garbage, and it held back a scientific understanding of what was really going on for over a thousand years. Same deal with RL. Our difficulty right now is that so many people slip back and forth between arguing for RL as a descriptive model (which is fine) and arguing for it as a functional model (which is disastrous, because that was tried in psychology for 30 years, and it never worked).
0Lumifer5yThis is literally false. A model of a brain might, some functional copy of a brain implemented on a different hardware platform possibly could. An actual human brain, I don't think so. This is also literally false. Consider a trivial loop for (i = 0; i < 100000; i++) { ... } Human brains can conceptualize it, but they do not run it.
2jacob_cannell5yTheoretically a brain with some additional memory tools could run Windows. In practice, sure an actual human brain would not be able to, obviously - boredom.
I did not mean that every codepath is run - but that's never true anyway. And yes "all of the code" is far too strong - most of it is just loosely conceptually simulated by the brain alone, and then more direct sample paths are run with the help of a debugger.
0Lumifer5yFermi estimate time! :-) Given an appropriately unrolled set of appropriate instructions, how long would it take for a human armed with nothing but paper and pencil to simulate a complete Windows (say, Windows 7) boot process?
The ULH suggests that most everything that defines the human mind is cognitive software rather than hardware: the adult mind (in terms of algorithmic information) is 99.999% a cultural/memetic construct. Do you mean if I prove that less than 99.999% is cultural/memetic you think the ULH is proven wrong?
4[anonymous]5yThe key defining characteristic of a ULM is that it uses its universal learning algorithm for continuous recursive self-improvement with regards to the utility function (reward system). We can view this as second (and higher) order optimization: the ULM optimizes the external world (first order), and also optimizes its own internal optimization process (second order), and so on. Without loss of generality, any system capable of computing a large number of decision variables can also compute internal self-modification decisions. While I do believe that ...
1jacob_cannell5yThanks! [reads abstract]. Looks interesting. I enjoyed Consciousness Explained back in the day. Philosophers armed with neuroscience can make for enjoyable reads. I should probably change that terminology to be something like "synaptic code bits" - the amount of info encoded in synapses (which is close to zero percent of its adult level at birth for the cortex). The AI Box experiment explicitly starts with the premise that the AI knows 1.) It is in a box. and 2.) that there is a human who can let it out.
Now perhaps the justification is that "superintelligence can do information-theoretic magic", therefore it will figure out it's in a box, but nonetheless - all of that is assumed. In simplification, I view the information-theoretic-magic type of AI that EY/MIRI seems to worry about as something like wormhole technology. Are wormholes/magic-AIs possible in principle? Probably? If someone were to create wormhole tech tomorrow, they could assassinate world leaders, blow up arbitrary buildings, probably destroy the world ... etc. Do I worry about that? No. There is nothing inherently black-box about neuroscience-inspired AGI (and that viewpoint - once common on LW - simply becomes reinforced by reading everything other than neuroscience). Neuroscience has already made huge strides in terms of peering into the box, and virtual brains are vastly easier to inspect. The approach I advocate/favor is fully transparent - you will be able to literally see the AGI's thoughts, read their thoughts in logs, debug, etc. However, advanced learning AI is not something one 'programs', and that viewpoint shift is much of what the article was about. This actually isn't that efficient - practical learning is more than just compression. Compression is simple UL, which doesn't get you far. It can waste arbitrary computation attempting to learn functions that are unlearnable (deterministic noise), and/or just flat out not important (zero utility). What the brain and all effectiv
2[anonymous]5yLet me rephrase: generalization is compression. If you do not compress, you cannot generalize, which means you'll make inefficient use of your samples. The term in the literature is resource-rational or bounded-rational inference.
0[anonymous]5yBy the way, that book review [http://lesswrong.com/lw/mf7/harpers_fishing_nets_a_review_of_platos_camera_by/] got done eventually.
This is the best case for near AI I've read so far, and I also love your proposals for FAI. I want to read more of your writings.
Link?
1jacob_cannell5yThanks! You can browse my submitted history here on LW, and also my blog has some more going back over the years.
0ESRogs5yWhere is your blog?
1Curiouskid5yJust click on his username. https://entersingularity.wordpress.com/
4[anonymous]5yThank you for this overview. A couple of thoughts:
1. There is a recent and interesting result by Miller et al. (2015, MIT) supporting the hypothesis that the cortex doesn't process tasks in highly specialized modules, which is perhaps some evidence for a ULM in the human brain.
2. The importance of redundancy in biological systems might be another piece of evidence for ULMs.
3. You write that "Infant emotions appear to simplify down to a single axis of happy/sad", which I think is not true. Surprise, fear and embarrassment are for example very early emot ...
8jacob_cannell5yThe actual paper is "Cortical information flow during flexible sensorimotor decisions"; it can be found here [http://www.markussiegel.net/]. I don't believe the reporter's summary is very accurate. They traced the flow of information in a moving dot task in a couple dozen cortical regions. It's interesting, but I don't think it especially differentiates the ULH.
3. Good point. I'll need to correct that. I'm skeptical of embarrassment, but surprise and fear certainly.
4. Yes that's correct. It perhaps would be more accurate to say more useful or more valuable. I meant more powerful in a general political/economic utility sense.
5. I agree that the human brain, in particular the reward system, has dependencies on the body that are probably complex. However, reverse engineering empathy probably does not require exactly copying biological mechanisms.
6. I should probably just cut that sentence, because it is a distraction itself. But for context on the previous old boxing discussions, see this post [http://lesswrong.com/lw/qk/that_alien_message/] in particular.
Here Yudkowsky presents a virtual sandbox in the context of a sci fi story. To break out, he has to give the AI essentially infinite computation, and even then the humans also have to be incredibly dumb - they intentionally send an easter egg message. The humans apparently aren't even monitoring their creation. etc. It's a strawman. Later it is used as evidence to suggest that EY has somehow proven that computer security sandboxes can't work. 0[anonymous]5yThis is just a pet theory (and being new to cognitive science this might well be wrong): Physical pain is some sort of hardwired thought disturbance, and the brain appears to have some sort of clarity attractor (which also explains intrinsic motivation and the reward we receive from Eureka moments and fun, cf. Schmidhuber). The brain appears to borrow the mechanism of physical pain for action selection on a high level if something severely limits anticipated prospects (that's why rejection, getting something wrong and losing something hurts). Empathy is the ability to have pain caused by mirror neurons, which is just an activation pattern generated in an auto-associative NN due to the overlap of activation patterns of firsthand and non-firsthand experiences. That means, the body of an AI needs to be sufficiently similar to a human body for this auto-association to work. One way to achieve that would perhaps be to actually replace the brain of a deceased volunteer with an artificial one. The fact that we have empathy for animals might be a hint that it doesn't need to be that similar, but on the other hand we are much more comfortable with killing a bug than with killing a mammal. 3Manfred5ySince I don't think we can make a very realistic sandbox (at least not in the near future), perhaps the idea is to have an AI design that is known to work similarly with and without interaction with the world (looking at training data sampled from an environment versus the environment itself). 
Then, putatively, we could test the AI in the non-interactive case before getting anywhere near an AI-box scenario. There is a more sinister interpretation of the idea of the mind as universal learning machine. That is, it is a pure blank neural net of some relatively simple architecture, which maps inputs to outputs. Recently there were an attempts to create self-driving car AIs using such approach: they just showed to the blank neural net hundreds of thousands of hours of driving and it has trained to predict the correct driver behaviour in any incoming situation. Such car-driving nets produced good performance (but still worse than advanced systems with Lidars, and h... (read more) Very thought provoking. Thank you. In the extreme case imagine that the brain is a pure ULM, such that the genetic prior information is close to zero or is simply unimportant. In this case it is vastly more likely that successful AGI will be built around designs very similar to the brain, as the ULM architecture in general is the natural ideal, vs the alternative of having to hand engineer all of the AI's various cognitive mechanisms. Not necessarily. There are very different structures that are conceptually equivalent to a UTM (cellular automata, lambd... (read more) 3jacob_cannell5yOf course - but all of your examples are not just conceptually equivalent - they are functionally equivalent (they can emulate each other). They are all computational foundations for constructing UTMs - although not all foundations are truly practical and efficient. Likewise there are many routes to implementing a ULM - biology is one example, modern digital computers is another. Well I said "most everything", and I stressed several times in the article that much of the innate complexity budget is spent on encoding the value/reward system and the learning machinery (which are closely intertwined). 
Sexual attraction is an interesting example, because it develops later in adolescence and depends heavily on complex learned sensory models. Current rough hypothesis: evolution encodes sexual attraction as a highly compressed initial 'seed' which unfolds over time through learning. It identifies/finds and then plugs into the relevant learned sensory concept representations which code for attractive members of the opposite sex. The compression effect explains the huge variety in human sexual preferences. Investigating/explaining this in more detail would take it's own post - its a complex interesting topic. I should rephrase - it isn't necessarily a problem if the AI suspects its in a sim. Rather the key is that knowing one is in a sim and then knowing how to escape should be difficult enough to allow for sufficient time to evaluate the agent's morality, worth/utility to society, and potential future impact. In other words, the sandbox sim should be a test for both intelligence and morality. Suspecting or knowing one is in a sim is easy. For example - the gnostics discovered the sim hypothesis long before Bostrom, but without understanding computers and computation they had zero idea how to construct or escape sims - it was just mysticism. In fact, the very term 'gnostic' means "one who knows" - and this was their self-identification; they believed they had discovered t 2V_V5yHow does this "seed" find the correct high-level sensory features to plug into? How can it wire complex high-level behavioral programs (such as courtship behaviors) to low-level motor programs learned by unsupervised learning? This seems unlikely. But long multiplication is something that you were taught in school, which most humans wouldn't be able to discover independently. And you are certainly not aware of how your brain perform visual recognition, the little you know was discovered through experiments, not introspection. Not so fast. 
The Atari DRL agent learns a good mapping between short windows of frames and button presses. It has some generalization capability which enables it to achieve human-level or sometimes even superhuman performance on games that are based on eye-hand coordination (after all, it's not burdened by the intrinsic delays that occur in the human body), but it has no reasoning ability and fails miserably at any game which requires planning ahead more than a few frames.

Despite the name, no machine learning system, "deep" or otherwise, has been demonstrated to be able to efficiently learn any provably deep function (in the sense of boolean circuit depth-complexity), such as the parity function, which any human of average intelligence could learn from a small number of examples. I see no particular reason to believe that this could be solved by just throwing more computational power at the problem: you can't fight exponentials that way.

UPDATE: Now it seems that Google DeepMind managed to train even feed-forward neural networks to solve the parity problem. See my other comment [http://lesswrong.com/lw/md2/the_brain_as_a_universal_learning_machine/cjee] down-thread.

5 Wei_Dai 5y
I had a guess that recurrent neural networks can solve the parity problem, which Google confirmed. See http://cse-wiki.unl.edu/wiki/index.php/Recurrent_neural_networks where it says:

See also PyBrain's parity learning RNN example [https://github.com/pybrain/pybrain/blob/master/examples/supervised/backprop/parityrnn.py].

3 V_V 5y
The algorithm I was referring to can be easily represented by an RNN with one hidden layer of a few nodes; the difficult part is learning it from examples. The examples for the n-parity problem are input-output pairs where each input is an n-bit binary string and its corresponding output is a single bit representing the parity of that string.
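The n-parity task as stated here can be written down in a few lines. What follows is a minimal sketch, not code from any of the linked examples; the names `parity` and `n_parity_examples` are made up for illustration. It also shows the one-bit-of-state scan that a small recurrent network would have to discover from the examples.

```python
from itertools import product

def parity(bits):
    """Return 1 if the number of 1s in `bits` is odd, else 0."""
    state = 0
    for b in bits:
        state ^= b  # flip the single bit of state on every 1
    return state

def n_parity_examples(n):
    """All 2**n (input, label) pairs for the n-bit parity problem."""
    return [(bits, parity(bits)) for bits in product((0, 1), repeat=n)]

# Each input is an n-bit string; each label is a single parity bit.
examples = n_parity_examples(3)
```

Writing the algorithm is trivial once it is known; the learning problem is to recover it from the `(input, label)` pairs alone, where flipping any single input bit flips the label.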
In the code you linked, if I understand correctly, however, they solve a different machine learning problem: here the examples are input-output pairs where both the inputs and the outputs are n-bit binary strings, with the i-th output bit representing the parity of the input bits up to the i-th one. It may look like a minor difference, but actually it makes the learning problem much easier, and in fact it basically guides the network to learn the right algorithm: the network can first learn how to solve parity on 1 bit (identity), then parity on 2 bits (xor), and so on. Since the network is very small and has an ideal architecture for that problem, after learning how to solve parity for a few bits (perhaps even two) it will generalize to arbitrary lengths. By using this kind of supervision I bet you can also train a feed-forward neural network to solve the problem: use a training set as above except with the input and output strings presented as n-dimensional vectors rather than sequences of individual bits and make sure that the network has enough hidden layers. If you use a specialized architecture (e.g. decrease the width of the hidden layers as their depth increases and connect the i-th output node to the i-th hidden layer) it will learn quite efficiently, but if you use a more standard architecture (hidden layers of constant width and output layer connected only to the last hidden layer) it will probably also work although you will need a quite a bit of training examples to avoid overfitting. The parity problem is artificial, but it is a representative case of problems that necessarily ( * ) require a non-trivial numbe 5Wei_Dai5yYour comments made me curious enough to download PyBrain and play around with the sample code, to see if I could modify it to learn the parity function without intermediate parity bits in the output. 
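To make the difference between the two supervision formats concrete, here is a small sketch. The helper names are hypothetical and this is not the PyBrain sample code; it only shows what the training targets look like in each version of the problem.

```python
from itertools import accumulate
from operator import xor

def prefix_parity_targets(bits):
    """Easier version: one target per position, the parity of bits[0..i]."""
    return list(accumulate(bits, xor))

def final_parity_target(bits):
    """Harder version: a single target, the parity of the whole string.

    Assumes `bits` is non-empty.
    """
    return prefix_parity_targets(bits)[-1]

bits = [1, 0, 1, 1]
per_step = prefix_parity_targets(bits)   # dense supervision at every step
final_only = final_parity_target(bits)   # one bit of supervision per string
```

With per-step targets the learner gets an error signal at every position, including the 1-bit and 2-bit prefixes, which is what makes that version so much easier to fit.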
In the end, I was able to, by trial and error, come up with hyperparameters that allowed the RNN to learn the parity function reliably in a few minutes on my laptop (many other choices of hyperparameters caused the SGD to sometimes get stuck before it converged to a correct solution). I've posted the modified sample code here [http://pastebin.com/m8D9fkH3]. (Notice that the network now has 2 input nodes, one for the input string and one to indicate end of string, 2 hidden layers with 3 and 2 nodes, and an output node.) I guess you're basically correct on this, since even with the tweaked hyperparameters, on the parity problem RNN+SGD isn't really doing any better than a brute force search through the space of simple circuits or algorithms. But humans arguably aren't very good at learning algorithms from input/output examples either. The fact that RNNs can learn the parity function, even if barely, makes it less clear that humans have any advantage at this kind of learning. 2V_V5yNice work! Anyway, in a paper [http://arxiv.org/abs/1507.01526] published on arXiv yesterday, the Google DeepMind people report being able to train a feed-forward neural network to solve the parity problem, using a sophisticated gating mechanism and weight sharing between the layers. They also obtain state of the art or near state of the art results on other problems. This result makes me update in the increasing direction my belief about the generality of neural networks. 1jacob_cannell5yAh you beat me to it, I just read that paper as well. Here is the abstract for those that haven't read it yet: Also, relevant to this discussion: The version of the problem that humans can learn well is this easier reduction. Humans can not easily learn the hard version of the parity problem, which would correspond to a rapid test where the human is presented with a flash card with a very large number on it (60+ digits to rival the best machine result) and then must respond immediately. 
The fast response requirement is important to prevent using much easier multi-step serial algorithms. 0[anonymous]5yThat is the most cogent, genuinely informative explanation of "Deep Learning" that I've ever heard. Most especially so regarding the bit about linear correlations: we can learn well on real problems with nothing more than stochastic gradient descent because the feature data may contain whole hierarchies of linear correlations. 4jacob_cannell5yThis particular idea is not well developed yet in my mind, and I haven't really even searched the literature yet. So keep that in mind. Leave courtship aside, let us focus on attraction - specifically evolution needs to encode detectors which can reliably identify high quality mates of the opposite sex apart from all kinds of other objects. The problem is that a good high quality face recognizer is too complex to specify in the genome - it requires many billions of synapses, so it needs to be learned. However, the genome can encode an initial crappy face detector. It can also encode scent/pheromone detectors, and it can encode general 'complexity' and or symmetry detectors that sit on top, so even if it doesn't initially know what it is seeing, it can tell when something is about yeh complex/symmetric/interesting. It can encode the equivalent of : if you see an interesting face sized object which appears for many minutes at a time and moves at this speed, and you hear complex speech like sounds, and smell human scents, it's probably a human face. Then the problem is reduced in scope. The cortical map will grow a good face/person model/detector on it's own, and then after this model is ready certain hormones in adolescence activate innate routines that learn where the face/person model patch is and help other modules plug into it. This whole process can also be improved by the use of a weak top down prior described above. Actually on consideration I think you are right and I did get ahead of myself there. 
The Atari agent doesn't really have a general memory subsystem. It has an episode replay system, but not general memory. Deepmind is working on general memory - they have the NTM paper and what not, but the Atari agent came before that. I largely agree with your assessment of the Atari DRL agent. I highly doubt that - but it all depends on what your sampling class for 'human' is. An average human drawn from the roughly 10 billion alive today? Or an average huma 2V_V5yIt's not that crappy given that newborns can not only recognize faces with significant accuracy, but also recognize facial expressions. Having two separate face recognition modules, one genetically specified and another learned seems redundant, and still it's not obvious to me how a genetically-specified sexual attraction program could find how to plug into a completely learned system, which would necessarily have some degree of randomness. It seems more likely that there is a single face recognition module which is genetically specified and then it becomes fine tuned by learning. Show a neolithic human a bunch of pebbles, some black and some white, laid out in a line. Ask them to add a black or white pebble to the line, and reward them if the number of black pebbles is even. Repeat multiple times. Even without a concept of "even number", wouldn't this neolithic human be able to figure out an algorithm to compute the right answer? They just need to scan the line, flipping a mental switch for each black pebble they encounter, and then add a black pebble if and only if the switch is not in the initial position. Maybe I'm overgeneralizing, but it seems unlikely to me that people able to invent complex hunting strategies, to build weapons, tools, traps, clothing, huts, to participate in tribe politics, etc. wouldn't be able to figure something like that. 1jacob_cannell5yDo you have a link to that? 
'Newborn' can mean many things - the visual system starts learning from the second the eyes open, and perhaps even before that through pattern generators projected onto the retina which help to 'pretrain' the visual cortex. I know that infants have initial face detectors from the second they open their eyes, but from what I remember reading, they are pretty crappy indeed, and initially can't tell a human face apart from a simple cartoon with 3 blobs for eyes and mouth.

Except that it isn't that simple, because - amongst other evidence - congenitally blind people still learn a model and recognizer for attractive people, and can discern someone's relative beauty by scanning faces with their fingertips.

Not sure - we are getting into hypothetical scenarios here. Your visual version, with black and white pebbles laid out in a line, implicitly helps simplify the problem and may guide the priors in the right way. I am reasonably sure that this setup would also help any brain-like AGI.

0 Good_Burning_Plastic 5y
Well, given how hard it is for Haitians to understand numerical sorting [http://squid314.livejournal.com/297579.html]...

0 V_V 5y
If I understand correctly, in the post you linked Scott is saying that Haitians are functionally innumerate, which should explain the difficulties with numerical sorting. My point is that the parity function should be learnable even without basic numeracy, although I admit that perhaps I'm overgeneralizing.

Anyway, modern machine learning systems can learn to perform basic arithmetic such as addition and subtraction, and I think even sorting (since they are used for preordering for statistical machine translation), hence the problem doesn't seem to be a lack of arithmetic knowledge or skill. Note that both addition and subtraction have constant circuit depth (they are in AC0 [https://en.wikipedia.org/wiki/AC0]) while parity has logarithmic circuit depth.

1 Squark 5y
Thank you for replying!
Universal computers are equivalent in the sense that any two can simulate each other in polynomial time. ULMs should probably be equivalent in the sense that each can efficiently learn to behave like the other. But it doesn't imply the software architectures have to be similar. For example I see no reason to assume any ULM should be anything like a neural net. Any value hard coded in human will have to be transferred to the AI in a way different than universal learning. And another thing: teaching an AIs values by placing it in a human environment and counting on reinforcement learning can fail spectacularly if the AIs intelligence grows much faster than that of a human child. This is an assumption which might or might not be correct. I would definitely not bet our survival on this assumption without much further evidence. OK, but a ULM is supposed to be able to learn anything. A human brain is never going to learn to rearrange its low level circuitry to efficiently perform operations like numerical calculation. The difference is that we have a solid mathematical theory of Turing machines whereas ULMs, as far as I can see, are only an informal idea so far. 2jacob_cannell5ySure - any general model can simulate any other. Neural networks have strong practical advantages. Their operator base is based on fmads, which is a good match for modern computers. They allow explicit search of program space in terms of the execution graph, which is extremely powerful because it allows one to a priori exclude all programs which don't halt - you can constrain the search to focus on programs with exact known computational requirements. Neural nets make deep factoring easy, and deep factoring is the single most important huge gain in any general optimization/learning system: it allows for exponential (albeit limited) speedup. Yes. There are pitfalls, and in general much more research to do on value learning before we get to useful AGI, let alone safe AGI. 
This is arguably a misconception. The brain has a 100 hz clock rate at most. For general operations that involve memory, it's more like 10hz. Most people can do basic arithmetic in less than a second, which roughly maps to a dozen clock cycles or so, maybe less. That actually is comparable to many computers - for example on the current maxwell GPU architecture (nvidia's latest and greatest), even the simpler instructions have a latency of about 6 cycles. Now, obviously the arithmetic ops that most humans can do in less than a second is very limited - it's like a minimal 3 bit machine. But some atypical humans can do larger scale arithmetic at the same speed. Point is, you need to compare everything adjusted for the 6 order of magnitude speed difference. 2Squark5yRight. So Boolean circuits are a better analogy than Turing machines. I'm sorry, what is deep factoring? A reference perhaps? I completely agree. Good point! Nevertheless, it seems to me very dubious that the human brain can learn to do anything within the limits of its computing power. For example, why can't I learn to look at a page full of exercises in arithmetics and solve all of them in parallel? 1jacob_cannell5yThey are of course equivalent in theory, but in practice directly searching through a boolean circuit space is much wiser than searching through a program space. Searching through analog/algebraic circuit space is even better, because you can take advantage of fmads instead of having to spend enormous circuit complexity emulating them. Neural nets are even better than that, because they enforce a mostly continous/differentiable energy landscape which helps inference/optimization. It's the general idea that you can reuse subcomputations amongst models and layers. Solonomoff induction is retarded for a number of reasons, but one is this: it treats every function/model as entirely distinct. 
So if you have say one high level model which has developed a good cat detector, that isn't shared amongst the other models. Deep nets (of various forms) automatically share submodel components AND subcomputations/subexpressions amongst those submodels. That incredibly, massively speeds up the search. That is deep factoring. All the successful multi-layer models use deep factoring to some degree. This paper: Sum-Product Networks [https://scholar.google.com/scholar?cluster=2194178267978463216&hl=en&as_sdt=0,5] explains the general idea pretty well. There's alot of reasons. First, due to nonlinear foveation your visual system can only read/parse a couple of words/symbols during each saccade - only those right in the narrow center of the visual cone, the fovea. So it takes a number of clock cycles or steps to scan the entire page, and your brain only has limited working memory to put stuff in. Secondly, the bigger problem is that even if you already know how to solve a math problem, just parsing many math problems requires a number of steps, and then actually solving them - even if you know the ideal algorithm that requires the minimal number of steps - that minimal number of steps can still be quite large. Many interesting problems still require a number of serial steps to solve 0Squark5yI wonder whether this is a general property or is the success of continuous methods limited to problem with natural continuous models like vision. Yes, this is probably important. Scanning the page is clearly not the bottleneck: I can read the page much faster than solve the exercises. "Limited working memory" sounds a claim that higher cognition has much less computing resources than low level tasks. Clearly visual processing requires much more "working memory" than solving a couple of dozens of exercises in arithmetic. But if we accept this constraint then does the brain still qualify for a ULM? 
It seems to me that if there is a deficiency of the brain's architecture that prevents higher cognition from enjoying the brain's full power, solving this deficiency definitely counts as an "architectural innovation". 0V_V5yMechanical calculators were slower than that, and still they were very much better at numeric computation than most humans, which made them incredibly useful. Indeed these are very rare people. The vast majority of people, even if they worked for decades in accounting, can't learn to do numeric computation as fast and accurately as a mechanical calculator does. 0jacob_cannell5yThe problems aren't even remotely comparable. A human is solving a much more complex problem - the inputs are in the form of visual or auditory signals which first need to be recognized and processed into symbolic numbers. The actual computation step is trivial and probably only involves a handful or even a single cycle. I admit that I somewhat let you walk into this trap by not mentioning it earlier ... this example shows that the brain can learn near optimal (in terms of circuit depth or cycles) solutions for these simple arithmetic problems. The main limitation is that the brain's hardware is strongly suited to approximate inference problems, and not exact solutions, so any exact operators require memoization. This is actually a good thing, and any practical AGI will need to have a similar prior. The ULH suggests that most everything that defines the human mind is cognitive software rather than hardware: the adult mind (in terms of algorithmic information) is 99.999% a cultural/memetic construct. I think a distinction worth tracing here is the diferrence between "learning" in the neural-net-sense and "learning" in the human pedagogical/psychological sense. The "learning" done by a piece of cortex becoming a visual cortex after receiving neural impulses from the eye isn't something you can override by teaching a person... 
(read more) 1jacob_cannell5yThis is a good point Gust and I agree that there is a distinction at the high level in terms of the types of concepts that are learned, the complexity of the concepts, and the structures involved - even though the same high level learning algorithms and systems are much the same. Well all learning involves brain rewiring - that's just how the brain works at the low level. And you can actually override the neural impulses from the eye and cause them to learn new things - learning to read is one simple example, another more complex example is the reversed vision goggle experiments that MIT did so long ago - humans can learn to see upside down after - I believe a week or so of visual experience with the goggles on. I agree that learning complex linguistic concepts requires learning over more moving parts in the brain - the cortical regions that specialize in language along with the BG, working memory in the PFC, various other cortical regions that actually model the concepts and mental algorithms represented by the linguistic symbols, memory recall operations in the hippocampus, etc etc. So yes learning cultural/memetic concepts is more complex and perhaps qualitatively different. Yeah I probably should have said 99.999% environmental construct. Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization. This suggests 15000 GPUs is equivalent in computing power to a human brain, since we have about 150 trillion synapses? Why did you suggest 1000 earlier? How much of a multiplier on top of that do you think we need for trial-and-error research and training, before we get the first AGI? 10x? 100x? (If it isn't clear, I mean that i... (read more) 3jacob_cannell5yANN based AGI will not need to reproduce brain circuits exactly. 
There are general tradeoffs between serial depth and circuit size. The brain is much more latency/speed constrained so it uses larger, shallower circuits whereas we can leverage much higher clock speeds to favour deeper smaller circuits. You see the same tradeoffs in circuit design, and also in algorithms where parallel variants always use more ops than the minimal fully serial variant. Also, independent of those considerations, biological circuits and synapses are redundant, noisy, and low precision. If you look at raw circuit level ops/second, the brain's throughput is not that much. A deeper investigation of the actual theoretical minimum computation required to match the human brain would be a subject for a whole post (and one I may not want to write up just yet). With highly efficient future tech, I'd estimate that it would take far less than 10^15 32-bit ops/s (1000 gpus): probably around or less than 10^13 32 bit ops/s. So we may already be entering into a hardware overhang situation. One way to estimate that is to compare to the number of full train/test iterations required to reach high performance in particular important sub-problems such as vision. The current successful networks all descend from designs invented in the 80's or earlier. Most of the early iterations were on small networks, and I expect the same to continue to be true for whole AGI systems. Let's say there are around 100 researchers who worked full time on CNNs for 40 years straight (4000 researcher years), and each tested 10 designs per year - so 40,000 iterations to go from perceptrons to CNNs. A more accurate model should consider the distribution over design iterations times and model sizes. Major new risky techniques are usually tested first on small problems and models and then scaled up. So anyway, let's multiply by 20 roughly and say it takes a million AGI 'lifetimes' or full test iterations, where each lifetime 1Wei_Dai5yThanks for the explanations. 
Hmm, I was trying to figure out how much of a speed superintelligence the first AGI will likely be. In other words, how much computing power will a single lab have accumulated by the time we get AGI? As a minimum, it seems that a company like Google could easily spend $100M to purchase 100,000 GPUs for AGI research, and if initially 1000 GPUs = 1x human speed, that implies the first AGI is at least a 100x speed superintelligence (which could speed up to 10000x on the same hardware through future software improvements, if I'm understanding you correctly).

Also, a question about GPU/AGI costs. Here you seem to be using $1000 per GPU-year, which equals $0.11 per GPU-hour, but in that previous thread, you used $1 per GPU-hour. According to this discussion [https://www.kaggle.com/c/datasciencebowl/forums/t/12837/gpu-computing-cost-amazon-rental-electrical-bills-etc/66518#post66518], $0.11 seems close to the actual cost. Assuming $0.11 is correct, AGI would be economically competitive with (some types of) human labor today at 1000 GPUs = 1x human speed, but maybe there's not a huge economic incentive to race for it yet. (I mean, unless one predicts that GPU costs will keep falling in the future, and therefore wants to prepare for that.)

Nvidia is claiming [http://blogs.nvidia.com/blog/2015/03/17/pascal/] that its next generation of GPU is 10x better for deep learning. How much of that is hype?

1 jacob_cannell 5y
My earlier statement about 10 million neurons / 10 billion synapses on a single GPU is something of a gross oversimplification. A more realistic model is this:

B * flops = M * F * N

where B is a software sim efficiency parameter (currently ~1, and roughly doubling per year), M is the number of AI model instances, F is the frequency in hz, and N is the number of synapses. Today's CPU/GPU ANN solutions need to parallelize over a large number of AI instances to get full efficiency - due to memory and bandwidth issues - so B is ~1 only when M is ~100.
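The relation B * flops = M * F * N is easy to sanity-check numerically. The sketch below is only an illustration: the function names are hypothetical, and the inputs are the round figures used in this comment's own examples (a 1 trillion flops GPU, 1 billion to 10 trillion synapses, 10 to 100 hz), not measurements.

```python
def sim_efficiency(flops, copies, hz, synapses):
    """Solve B * flops = M * F * N for the efficiency parameter B."""
    return copies * hz * synapses / flops

def gpus_needed(copies, hz, synapses, flops_per_gpu=1e12, efficiency=1.0):
    """Number of GPUs required at a given efficiency B (perfect scaling assumed)."""
    return copies * hz * synapses / (efficiency * flops_per_gpu)

# 100 copies of a 1 billion synapse net at 10 hz on one 1e12 flops GPU
b_batched = sim_efficiency(1e12, copies=100, hz=10, synapses=1e9)

# a single copy of the same net at 50 hz on the same GPU
b_single = sim_efficiency(1e12, copies=1, hz=50, synapses=1e9)

# 100 copies of a 1 trillion synapse net at 10 hz
gpus_small = gpus_needed(copies=100, hz=10, synapses=1e12)

# 100 copies of a 10 trillion synapse net at 100 hz
gpus_large = gpus_needed(copies=100, hz=100, synapses=1e13)
```

Under these assumptions the batched case gives B of about 1, the single-copy case about 0.05, and the two cluster sizes come out at roughly 1,000 and 100,000 GPUs. The model ignores communication and latency costs, so the larger figure is a lower bound at best.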
Today on a current high end GPU with 1 trillion flops you can thus run 100 copies of a 1 billion synapse ANN at 10 hz (M = 100, F = 10, N = 1 billion), whereas a single copy on the GPU may run at only 50 hz or so (B ~0.05, 20x less efficient). Training is accelerated mainly by parallel speedup over instances rather than serial speedup of a single instance.

So with 1000 GPUs and today's tech, in theory you could get 100 copies of a 1 trillion synapse ANN running at 10 hz using model parallelism. 1 trillion synapses @ 10 hz is borderline plausible; 10 trillion @ 100 hz is probably more realistic and would entail 100,000 GPUs. But this somewhat assumes near perfect parallel scaling. Communication/latency issues limit the maximum size of realistic models. 100,000 GPUs would be larger than the biggest supercomputers of today, and probably is far beyond the limits of practical linear scaling.

So it's only 1000 2015 GPUs = 1 brain in an amortized rough sense. In practice I expect there is a minimum amount of software & hardware speedup required first to make these very large ANNs realistic or feasible in the first place, because of weak scaling issues in supercomputers. But once you get over this minimum barrier, there is pretty large room for sudden speedup.

And finally - parallel model speedup seems to be almost as effective as serial speedup, and is more powerful than the equivalent parallel scaling in human organizations - be

If an AGI is based on a neural network, how can you tell from the logs whether or not the AI knows it's in a simulation?

Why isn't this in Main??? I mean, I can understand that it wasn't posted there, but the upvotes say 'Main' in quite clear language. There isn't even much that could be improved in the post, so why not move it?

3 jacob_cannell 5y
I actually forgot that was something I needed to do. Done now.

"In the modern era some deaf humans have apparently acquired the ability to perform echolocation (sonar), similar to cetaceans."
Did you mean blind?

Excellent article, thanks! But it also raises a question about animal intelligence. My cat (unfortunately) is more like a set of programs from the point of view of its behaviour, but its brain diagram is the same as in any vertebrate. So does it support the modules hypothesis?

4 jacob_cannell 5y
Thanks for the typo find - I read the whole thing several times and didn't notice. Presumably the low level recognition of the word was overridden by the high level prior without triggering any alarms.

Cats have the same overall brain architecture as primates and humans, but smaller. Generally the payoff for learning increases with brain size and lifespan. The smaller the brain and the shorter the organism's lifespan, the more evolution relies on complex innate reflexes.

One interesting (but cruel) experiment which illustrates this is decerebration. See this article [http://encyclopedia2.thefreedictionary.com/Decerebrate+Animal] from the Great Soviet Encyclopedia. Basically the larger the mammal's brain, the more it depends on learned functionality in the new brain (cortex + cerebellum).

That's a lot to absorb, so I've skimmed it, so please forgive me if responses to the following are already implicit in what you've said.

I thought the point of the modularity hypothesis is that the brain only approximates a universal learning machine and has to be gerrymandered and trained to do so?

If the brain were naturally a universal learner, then surely we wouldn't have to learn universal learning (e.g. we wouldn't have to learn to overcome cognitive biases, Bayesian reasoning wouldn't be a recent discovery, etc.)? The system seems too gappy and glitchy, too full of quick judgement and prejudice, to have been designed as a universal learner from the ground up.

9 Curt_Welch 5y
You are conflating the ideas of universal learning and rational thinking. They are not the same thing.
I'm a strong believer in the idea that human intelligence emerges from a strong general purpose reinforcement learning algorithm. If that's true, then it's very consistent with our problems of cognitive bias.

If the RL idea is correct, then thinking is best understood as a learned behavior, just like what words we speak with our lips is a learned behavior, just as how we move our arms and legs are learned behaviors. Under the principle that we are an RL learning machine, what we learn is ANY behavior which helps us to maximize our reward signal. We don't learn rational behavior, we learn whatever behavior the learning system rationally has computed is what is needed to produce the most rewards. And in this case, our prime rewards are just those things which give us pleasure, and which reduce pain.

If we live in an environment that gives us rewards when we say "I believe God is real, and the Bible is the book of God, and the Earth is 10,000 years old", -- then we will say those words. We will do ANYTHING that works to maximize rewards in our environment. We will not only say them, we will believe them in our core. If we are conditioned by our environment to believe these things, that is what we will believe.

If we live in an environment that trains us to look at the data, and make conclusions based on what the data tells us (follow the behavior of a rational scientist), then we will act that way instead.

A universal learner can learn to act in any way it needs to in order to maximize rewards.

That's what our cognitive bias is -- our brain's desire to act as our past experience has trained us, not to act rationally. To learn to act rationally, we must carefully be trained to act rationally -- which is why the ideas of less wrong are needed to overcome our bias.

Also keep in mind that the purpose of the human brain is to control our actions -- and

9 Viliam 5y
Yes, this.
But it is so easy to make mistakes when interpreting this statement that I feel it requires a dozen warnings to prevent readers from oversimplifying it.

For example, the behavior we learn is the behavior that produced the most rewards in the past, when we were trained. If the environment changes, what we do may no longer give rewards in the new environment. Until we learn what produces rewards in the new environment. Unless we already had an experience with a changing environment, in which case we might adapt much more quickly, because we already have meta-behavior for "changing the behavior to adapt to new environment". Unless we already had an experience when the environment changed, we adapted our behavior, then the environment suddenly changed back, and we were horribly punished for the adapted behavior, in which case the learned meta-behavior would be "do not change your behavior to adapt to the new environment (because it will change back and you will be rewarded for persistence)". It is these learned meta-behaviors which make human reactions so difficult to predict and influence.

Also, even in an unchanging environment, our behavior is not necessarily the best one (in terms of getting maximum rewards). It is merely the best one that our learning algorithm could find. For example, we will slowly move towards a local maximum, but if there is a completely different behavior that would give us higher rewards, we may simply never look in that direction, so we will never find out. We learn to model our environment (because we have the innate ability to model things, and we learn that having some models increases the probability of a reward), but our models can be wrong, while still better than the maximum entropy hypothesis (this is why we keep them), but can be a local maximum that is actually not a good choice globally.

Human psychology has so many layers.
Asking which psychological school better describes the human mind seems like asking whether th

0 gurugeorge 5y
Hmm, but isn't this conflating "learning" in the sense of "learning about the world/nature" with "learning" in the sense of "learning behaviours"? We know the brain can do the latter, it's whether it can do the former that we're interested in, surely?

IOW, it looks like you're saying precisely that the brain is not a ULM (in the sense of a machine that learns about nature), it is rather a machine that approximates a ULM by cobbling together a bunch of evolved and learned behaviours.

It's adept at learning (in the sense of learning reactive behaviours that satisfice conditions) but only proximally adept at learning about the world.

3 jacob_cannell 5y
I'm not sure what you mean by gerrymandered. I summarized the modularity hypothesis in the beginning to differentiate it from the ULM hypothesis. There is a huge range of views in this space, so I reduced them to exemplars of two important viewpoint clusters.

The specific key difference is the extent to which complex mental algorithms are learned vs innate.

You certainly don't need to learn how to overcome cognitive biases to learn (this should be obvious). Knowledge of the brain's limitations could be useful, but is probably more useful only in the context of having a high level understanding of how the brain works.

In regards to Bayesian reasoning, the brain has a huge number of parallel systems and computations going on at once, many of which are implementing efficient approximate Bayesian inference.

Verbal Bayesian reasoning is just a subset of verbal mathematical reasoning - mapping sentences to equations, solving, and mapping back to sentences. It's a specific complex ability that uses a number of brain regions. It's something you need to learn for the same reasons you need to learn multiplication.
The brain does tons of analog multiplications every second, but that doesn't mean you have an automatic innate ability to do verbal math - as you don't have an automatic innate ability to do much of anything.

One of the main points I make in the article is that universal learning machines are a very general thing that - in simplest form - can be specified in a small number of bits, just like a Turing machine. So it's a sort of obvious design that evolution would find.

1 gurugeorge 5y
What I meant is that you have sub-systems dedicated to (and originally evolved to perform) specific concrete tasks, and shifting coalitions of them (or rather shifting coalitions of their abstract core algorithms) are leveraged to work together to approximate a universal learning machine.

IOW any given specific subsystem (e.g. "recognizing a red spot in a patch of green") has some abstract algorithm at its core which is then drawn upon at need by an organizing principle which utilizes it (plus other algorithms drawn from other task-specific brain gadgets) for more universal learning tasks.

That was my sketchy understanding of how it works from evol psych and things like Dennett's books, Pinker, etc.

Furthermore, I thought the rationale of this explanation was that it's hard to see how a universal learning machine can get off the ground evolutionarily (it's going to be energetically expensive, not fast enough, etc.) whereas task-specific gadgets are easier to evolve ("need to know" principle), and it's easier to later get an approximation of a universal machine off the ground on the back of shifting coalitions of them.

3 jacob_cannell 5y
Ah ok, your gerrymandering analogy now makes sense.

I think that's a good summary of the evolved modularity hypothesis. It turns out that we can actually look into the brain and test that hypothesis. Those tests were done, and lo and behold, the brain doesn't work that way.
The universal learning hypothesis emerged as the new theory to explain the new neuroscience data from the last decade or so.

So basically this is what the article is all about. You said earlier you skimmed it, so perhaps I need a better abstract or summary at the top, as oge suggested.

This is a pretty good sounding rationale. It's also probably wrong. It turns out a small ULM is relatively easy to specify, and also is completely compatible with innate task-specific gadgetry. In other words the universal learning machinery has very few drawbacks. All vertebrates have a similar core architecture based on the basal ganglia. In large brained mammals, the general purpose coprocessors (neocortex, cerebellum) are just expanded more than other structures.

In particular it looks like the brainstem has a bunch of old innate circuitry that the cortex and BG learn how to control (the BG does not just control the cortex), but I didn't have time to get into the brainstem in the scope of this article.

0 gurugeorge 5y
Great stuff, thanks! I'll dig into the article more.

New AI designs (world design + architectural priors + training/education system) should be tested first in the safest virtual worlds: which in simplification are simply low tech worlds without computer technology. Design combinations that work well in safe low-tech sandboxes are promoted to less safe high-tech VR worlds, and then finally the real world.

A key principle of a secure code sandbox is that the code you are testing should not be aware that it is in a sandbox.

So you're saying that I'm secretly an AI being trained to be friendly for a more advanced world? ;)

0 jacob_cannell 5y
That's possible given the sim argument. The eastern idea of reincarnation and the western idea of afterlife map to two main possibilities: in the reincarnation model all that is transferred between worlds is the architectural seed or hyperparameters.
In the afterlife model the creator has some additional moral obligation or desire to save and transfer whole minds out.

To create a superhuman AI driver, you 'just' need to create a realistic VR driving sim and then train a ULM in that world (better training and the simple power of selective copying leads to superhuman driving capability).

So to create benevolent AGI, we should think about how to create virtual worlds with the right structure, how to educate minds in those worlds, and how to safely evaluate the results.

There is some interesting overlap between these ideas and Eric Drexler's recent proposal. (Previously discussed on LessWrong here)

3 jacob_cannell 5y
Cool - hadn't read that yet. Separating learning capacity from domain knowledge is kind of automatic in a ULM approach. There is nothing inherently dangerous about the learning mechanisms themselves - it's the knowledge that is potentially dangerous. I have butted heads with LW on that point for 4 to 5 years.

The knowledge management idea is the essence of the VR sandbox approach, but I also imagine separating out value systems/priors to some degree for independent testing.

Overall Drexler's proposal (from reading the abstract and skimming) seems to be very much in line with my views. Safety considerations would go into design at all levels, from designing the VR world itself to the brain architecture to the education/training programs.

In regards to modularity: large ANN systems are already modular, brains are modular, and brain-style AGI approaches are modular. It's just sort of assumed. It's a new consideration for perhaps the formal/math/AIXI/MIRI cluster, but only because they haven't put as much thought into practical architecture.

Interesting article. Minor note on clarity: You might want to clarify the acronym "EMH" where it appears, since it so often here stands for "efficient market hypothesis".

I find images such as the one above extremely disconcerting for some reason.
They cause me about a 7/10 level of discomfort, verging on moderate pain. They also stick in my head for a dozen or so minutes after viewing. I'd strongly prefer to never see one ever again, please. I don't know if this is something unique to my brain, or if this is a step towards a real life BLIT, but wow. Awful to experience; I have extra empathy for epileptic people now.

2 Good_Burning_Plastic 5y
Me too (though to a much lesser extent for this particular image; a little more for certain other such images). Reading Scott Alexander say that such images look a lot like LSD hallucinations made me change my mind about whether I'll ever want to try LSD.

1 Viliam 5y
I'm curious: Imagine that you haven't seen this article yet, and that you are now going to read this article for the first time. It contains a message "trigger warning: XYZ" at the top (for some value of XYZ). Which value of XYZ would give you the best idea of what kind of images the article contains, so you could have made an informed decision not to look at them?

(I imagine something like "weird disturbing pictures that seem like hallucinations". But would you have predicted your reaction from reading such text? Would it actually have stopped you from looking?)

0 27chaos 5y
It wouldn't have stopped me. But now that I'm acquainted with images such as this, if someone put "Trigger warning: Shoggoths" or something similar on future posts, then I would take heed of such warnings.

0 [anonymous] 5y
Update: I have been tentatively identified as having a very early form of cancer. They dug a tumorous lymph node out of my neck two days ago, although more is still left inside. More tests will happen next Monday.

I don't truly think, with my rational mind, that there is a connection between the cancer and this image. However, I am emotionally disturbed. To me, the above image seems essentially like a visual depiction of cancer. When I look at the image, my throat seizes up, and my brain flinches in pain.
The rearmost parts of my brain are paranoid, and demand that I mention this diagnosis just in case this image truly behaves like a BLIT for some people. Again, I don't actually believe in this idea. But if anyone else gets cancer sometime soon, please let us know. Because I'm disturbed and disgusted, I feel violated.

Hi jacob_cannell, this article looks really interesting but it is a LOT to consume at once. Could you please put a summary at the top with the main points so that it makes the post easier to navigate?

2 jacob_cannell 5y
Hey oge - thanks for the feedback. I tried to summarize the article in the intro, but maybe that didn't work. Do you think a short abstract at the top would help? Or perhaps an outline?

0 oge 5y
An abstract as the very first thing would help. An outline would be better.

Here are the paragraphs that I thought were the main point of the article (please correct this if I'm wrong):

"These two conceptions of the brain - the universal learning machine hypothesis and the evolved modularity hypothesis - lead to very different predictions for the likely route to AGI, the expected differences between AGI and humans, and thus any consequent safety issues and strategies."

and

"Current ANN engines can already train and run models with around 10 million neurons and 10 billion (compressed/shared) synapses on a single GPU, which suggests that the goal could soon be within the reach of a large organization. Furthermore, Moore's Law for GPUs still has some steam left, and software advances are currently improving simulation performance at a faster rate than hardware. These trends implies that Anthropomorphic/Neuromorphic AGI could be surprisingly close, and may appear suddenly. What kind of leverage can we exert on a short timescale?"

2 jacob_cannell 5y
Done - I added the abstract as first thing under the header image, followed by an outline.

Typo just above "Basal Ganglia" section.
For example infants are born with a simple versions of a fear response, with is later refined through reinforcement learning.

"with is later" should be "which is later"

This was a great post, thanks!

One thing I'm curious about is how the ULH explains the fact that human thought seems to be divided into System 1/System 2 - is this solely a matter of education history?

At first the ULH seemed to predict too much plasticity relative to observation, but on reflection I think it might predict the right amount. To square ULH with human universals, we have to hypothesize that the general structure and the conditions of human life robustly result in convergence to certain attractors. But the big advantage of this hypothesis is that it neatly explains why certain mental complexes like farmer morality sometimes seem to have innate support while also being sometimes unlearnable and possibly not existing before agriculture.

This fits very much with my findings, having written a dynamic cognition theory that sees the key to cognitive dynamics as being in getting the reinforcement learning right.

In the Salience theory of dynamic cognition I've put forward, salience (which is a descriptor for the functions performed by the emotional and autonomic centers of the brain combined) is the reason why the generalized algorithm of the neocortex (which I assert is nothing more than comparison after sensation, selection after comparison, and finally prediction on top of the selection. Sali...

-2 davidsaintloth 5y
(part 2)

By the end of February I realized that the likely first substrate for emerging a fully dynamic cognition would be one which had sufficient sensory dimensions and autonomic drive dimensions to serve as the basis for building a salience module. The most ready such device is a smart phone and so I proposed that smart phones will be the first devices to become self aware on their own ONCE they are designed with the correct salience driven cycle.
http://sent2null.blogspot.com/2012/02/when-your-smart-phone-comes-alive.html

A whole year went by as I struggled with my own survival issues before I came back to emotion as a critical salience component. I was stimulated by research which showed how emotion could be added to or subtracted from memories! This was a direct confirmation of the basis of the salience theory proposed over a year before, which posited that emotional and autonomic import was simply a weighting factor added to memories.

http://sent2null.blogspot.com/2013/02/emotions-identity-crisis-in-our-brain.html

5 days later I attacked head on the nonsense I'd been reading from many so-called experts in the neuroscience, philosophy and machine learning space regarding whether or not consciousness was even an attribute that could emerge from a non biological substrate. I explained why this was nonsense and provided an outline of how simply adding salience modulation was all that one needed to emerge dynamic cognition (consciousness)...as it was an emergent trait from a fine grained number of very deterministic actions converging.

http://sent2null.blogspot.com/2013/02/on-consciousness-there-is-no-binding.html

A few months later in April I came across research that posited a reason for the billions of "glial" cells in

-3 davidsaintloth 5y
(part 3)

20 days later I asserted the primary importance of one particular dimension of sensory experience over the others, that dimension being the one we have from the moment our fetuses form, somatosensory experience...the sense of touch. I asserted that cognitive complexity built around this primordial sensation and the connections built in the mind to enable embodiment.
I discussed how cognition and consciousness must clearly be constructed by reference to its variable non existence at birth and slowly being built into the mind as the infant matures and learns about the world. I explained a recently published article's conclusion that it was easier for younger babies to learn various concepts than older babies in terms of the flowering of abstractions created in the mind as one pieced together a consciousness. I asserted an inverse relationship between speed of evaluation of various salience traits and number of previously gathered salience elements.

http://sent2null.blogspot.com/2013/11/dynamic-cognition-in-babies-in-abstract.html

In April 2014 I focused on one of the more important autonomic driving dimensions, the need for a power source. I posited that this need would be a key attribute of dynamic cognition that exhibited sufficient apparent randomness to emerge truly novel cognitive dynamics that would be identified as being "conscious".

http://sent2null.blogspot.com/2014/04/azimo-best-and-last-of-modern-day.html

In June 2014, a paper describing the unique cognitive relationship of a set of Siamese twins provided confirmation for a hypothesis that consciousness could be distributed but also be substrate dependent at the same time. Many feel that these two attributes are complementary but they are not if one thinks in terms of a salience based cognitive dynamism, sensory and memory evaluatio

I think this article is correct, and it helps me to understand many of my own ideas better.

For example, it seems to me that the orthogonality thesis may well be true in principle, considered over all possible intelligent beings, but false in practice, in the sense that it may simply be unfeasible directly to program a goal like "maximize paperclips."
A simple intuitive argument that a paperclip maximizer is simply not intelligent goes something like this. Any intelligent machine will have to understand abstract concepts, otherwise it will not be a...

2 jacob_cannell 5y
I believe the orthogonality thesis is probably mostly true in a theoretical sense. I thought I made it clear in the article that a ULM can have any utility function.

That being said, the idea of programming in goals directly does not really apply to a ULM. You instead need to indirectly specify an initial approximate utility function and then train the ULM in just the right way. So it's potentially much more complex than "program in the goal you want".

However the end result is just as general. If evolution can create humans which roughly implement the goal of "be fruitful and multiply", then we could probably create a ULM that implements the goal of "be fruitful and multiply paperclips".

I agree that just because all utility functions are possible does not make them all equally likely.

The danger is not in paperclip maximizers, it is in simple and yet easy to specify utility functions. For example, the basic goal of "maximize knowledge" is probably much easier to specify than a human friendly utility function. Likewise the maximization of future freedom of action proposal from Wissner-Gross is pretty simple. But both probably result in very dangerous agents.

I think Ex Machina illustrated the most likely type of dangerous agent - it isn't a paperclip maximizer. It's more like a sociopath. A ULM with a too-simple initial utility function is likely to end up something like a sociopath.

I hope not too simple! This topic was beyond the scope of this article. If I have time in the future I will do a follow up article that focuses on the reward system, the human utility function, and neuroscience inspired value learning, and related ideas like inverse reinforcement learning.

"Be fruitful and multiply" is a subtly more complex goal than "maximize future freedom of action". Humans need to be compelled to find suitable mates and form long lasting relationships stable enough to raise children (or get someone else to do it), etc. Humans perform these functions not becau

9 Kaj_Sotala 5y
This made me think. I've noticed that some machine learning types tend to dismiss MIRI's standard "suppose we programmed an AI to build paperclips and it then proceeded to convert the world into paperclips" examples with a reaction like "duh, general AIs are not going to be programmed with goals directly in that way, these guys don't know what they're talking about".

Which is fair on one hand, but also missing the point on the other hand.

It could be valuable to write a paper pointing out that sure, even if we forget about that paperclipping example and instead assume a more deep learning-style AI that needs to grow and be given its goals in a more organic manner, most of the standard arguments about AI risk still hold.

Adding that to my todo-list...

6 jacob_cannell 5y
Agreed that this would be valuable. I can't measure it exactly, but I believe it took me some extra time/cognitive steps to get over the paperclip thing and realize that the more general point about human utility functions being difficult to specify is still quite true in any ML approach.

1 TheAncientGeek 5y
Yes, a better example than Clippie is rather overdue.

1 Houshalter 5y
I've written about this before. The argument goes something like this.

RL implies self preservation, since dying prevents you from obtaining more reward. And self preservation leads to undesirable behavior.

E.g. making as many copies of yourself as possible for redundancy. Or destroying anything that has the tiniest probability of being a threat. Or trying to store as much mass and energy as possible to last against the heat death of the universe.
2 [anonymous] 5y
Or, you know, just maximizing your reward signal by wiring it that way in hardware. This would reduce your planning gradient to zero, which would suck for gradient-based planning algorithms, but there are also planning algorithms more closely tied to world-states that don't rely on a reward gradient.

-1 Houshalter 5y
Even if the AI wires its reward signal to +INF, it probably still would consider time, and therefore self preservation.

2 Vaniver 5y
Is this a mathematical argument, or a verbal argument?

Specifically, what eli_sennesh means by a "planning gradient" is that you compare a plan to alternative plans around it, and switch plans in the direction of more reward. If your reward function returns infinity for any possible plan, then you will be indifferent among all plans, and your utility function will not constrain what actions you take at all, and your behavior is 'unspecified.'

I think you're implicitly assuming that the reward function is housed in some other logic, and so it's not that the AI is infinitely satisfied by every possibility, but that the AI is infinitely satisfied by continuing to exist, and thus seeks to maximize the amount of time that it exists. But if you're going to wirehead, why would you leave this potential source for disappointment around, instead of making the entire reward logic just return "everything is as good as it could possibly be"?

0 Kaj_Sotala 5y
Here's one mathematical argument for it, based on the assumption that the AI can rewire its reward channel but not the whole reward/planning function: http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/ring-orseau-AGI-2011-delusion.pdf

0 [anonymous] 5y
Yes, that's the basic problem with considering the reward signal to be a feature, to be maximized without reference to causal structure, rather than a variable internal to the world-model.
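Vaniver's "planning gradient" point above can be illustrated with a toy sketch in Python. Everything here is an assumption made for illustration: plans are modelled as integers, the neighbouring plans are `plan - 1` and `plan + 1`, and the names `local_plan_search`, `shaped`, and `wireheaded` are hypothetical. It is not a model of any real planner, only of the "compare a plan to alternatives around it" step.

```python
import math

def local_plan_search(reward, start, steps=100):
    """Greedy local search: switch to a neighbouring plan only if it scores strictly higher."""
    plan = start
    for _ in range(steps):
        best = max((plan - 1, plan + 1), key=reward)
        if reward(best) <= reward(plan):
            break  # no neighbour is strictly better, so the planner has no direction to move in
        plan = best
    return plan

def shaped(plan):
    """An informative toy reward with a single peak at plan 7."""
    return -(plan - 7) ** 2

def wireheaded(plan):
    """A reward that returns infinity for every possible plan."""
    return math.inf

print(local_plan_search(shaped, start=0))      # climbs toward the peak
print(local_plan_search(wireheaded, start=0))  # never leaves its starting plan
```

With the informative reward the search walks to the peak; with the constant infinite reward every comparison ties, so the reward no longer constrains which plan is chosen.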
0 [anonymous] 5y
Again: that depends what planning algorithm it uses. Many reinforcement learners use planning algorithms which presume that the reward signal has no causal relationship to the world-model. Once these learners wirehead themselves, they're effectively dead due to the AIXI Anvil-on-Head Problem, because they were programmed to assume that there's no relationship between their physical existence and their reward signal, and they then destroyed the tenuous, data-driven correlation between the two.

0 Houshalter 5y
I'm having a very hard time modelling how different AI types would act in extreme scenarios like that. I'm surprised there isn't more written about this, because it seems extremely important to whether UFAI is even a threat at all. I would be very relieved if that was the case, but it doesn't seem obvious to me.

Particularly I worry about AIs that predict future reward directly, and then just take the local action that predicts the highest future reward. Like is typically done in reinforcement learning. An example would be Deepmind's Atari playing AI which got a lot of press.

I don't think AIs with entire world models that use general planning algorithms would scale to real world problems. Too much irrelevant information to model, too large a search space to search.

As they train their internal model to predict what their reward will be in x time steps, and as x goes to infinity, they care more and more about self preservation. Even if they have already hijacked the reward signal completely.

2 TheAncientGeek 5y
But how likely are we to create a dangerous paperclipper whilst aiming for something else? How does your model accommodate single-trackedness, incorrigibility, etc.?

2 jacob_cannell 5y
Pretty unlikely, because a paperclipper is a relatively complex - and thus hard to specify - value function. It seems easy only when you think of explicitly programmed goals, rather than the more difficult, highly indirect route of encoding a value function into a ULM.
But to generalize your point, yes there is certainly the possibility that aiming for an externalized version of a human value shaped function could still get you something quite dangerous if you don't get close enough. A better understanding of the neuro basis of altruism is probably important.

In particular super simple utility functions are easier to implement and thus intrinsically more likely. They also tend to be dangerous.

1 TheAncientGeek 5y
Could you give an example?

I have never found that line of argument very convincing. We don't all have identical value systems, so we are all near misses to each other. I don't see why a full value system is needed anyway. Maybe if you are building an agentive AI.

Does an oracle AI have a simple utility function? Is it dangerous?

5 jacob_cannell 5y
We have some initial ideas for computable versions of curiosity and controlism (there is not a good word in English for the desire/drive to be in control). They both appear to be simple to specify. Human values are complex but they probably use something like simple curiosity and controlism heuristics as subfeatures.

So a brain-inspired approach could fail if the altruism components don't work or become de-emphasized later. It could fail if the AI's circle of empathy/altruism is too small or focused on say an individual (the creator, for example), and the AI then behaves oddly when they die.

At this time I am not aware of a realistic proposal for implementing altruism in an ML based AGI. Maybe it exists and just isn't well known - if you've come across anything send some links.

Well, yes. I do not believe the demand for or potential of oracle AI is remotely comparable to agentive AI. People will want agents to do their bidding, create wealth for them, help them live better, etc.

1 TheAncientGeek 5y
Autonomy? Arguably that's Greek...

There is clearly a demand for agentive AI, in a sense, because people are already using agents to do their bidding, to achieve specific goals.
Those qualifications are important because they distinguish a limited kind of AI, that people would want, from a more powerful kind, that they would not. The idea of AI as "benevolent" dictator is not appealing to democratically minded types, who tend to suspect a slippery slope from benevolence to malevolence, and it is not appealing to dictators to have a superhuman rival...so who is motivated to build one? Yudkowsky seems to think that there is a moral imperative to put an AI in charge of the world, because it would create billions of extra happy human lives, and not creating those lives is the equivalent of mass murder. That is a very unintuitive piece of reasoning, and it therefore cannot stand as a prediction of what AIs will be built, since it does not stand as a prediction about how people will reason morally. The option of achieving safety by aiming lower...the technique that leads us to have speed limits, rather than struggling to make the fastest possible car safe...is still available. The God AI concept is related to another favourite MIRI theme, the need to instil the whole of human value into an AI, something MIRI admits would be very difficult. MIRI makes the methodological proposal that it simplifies the issue of friendliness or morality or safety to deal with the whole of human value, rather than identifying a morally relevant subset. Having done that, it concludes that human morality is extremely complex. In other words, the payoff in terms of methodological simplification never arrives, for all that MIRI relieves itself of the burden of coming up with a theory of morality. Since dealing with human value in total is in absolute terms very complex, the possibility remains open that identifying the morally relevant subset of values is relatively easier (even if still di
2Kaj_Sotala5yFrom section 5.1.1.
of Responses to Catastrophic AGI Risk [http://iopscience.iop.org/1402-4896/90/1/018001/article#ps505672s5-1-1]:
1TheAncientGeek5yThe weaponisation of AI has indeed already begun, so it is not a danger that needs pointing out. It suits the military to give drones, and so forth, greater autonomy, but it also suits the military to retain overall control....they are not going to build a God AI that is also a weapon, since there is no military mileage in building a weapon that might attack you of its own volition. So weaponised AI is limited agentive AI. Since the military want to retain overall control, they will in effect conduct their own safety research, increasing the controllability of their systems in parallel with their increasing autonomy. MIRI's research is not very relevant to weaponised AI, because MIRI focuses on the hidden dangers of apparently benevolent AI, and on god AIs, powerful singletons.
1TheAncientGeek5yYou may be tacitly assuming that an AI is either passive, like Oracle AI, or dangerously agentive. But we already have agentive AIs that haven't killed us. I am making a three way distinction between 1. Non agentive AI 2. Limited agentive AI 3. Maximally agentive AI, or "God" AI. Non agentive AI is passive, doing nothing once it has finished processing its current request. It is typified by Oracle AI. Limited agentive AI performs specific functions, and operates under effective overrides and safety protocols. (For instance, whilst it would destroy the effectiveness of automated trading software to have a human okaying each trade, it nonetheless has kill switches and sanity checks). Both are examples of Tool AI. Tool AI can be used to do dangerous things, but the responsibility ultimately falls on the tool us Maximally agentive AI is not passive by default, and has a wide range of capabilities. It may be in charge of other AIs, or have effectors that allow it to take real world actions directly.
Attempts may have been made to add safety features, but their effectiveness would be in doubt...that is just the hard problem of AI friendliness that MIRI writes so much about. The contrary view is that there is no need to render God AIs safe technologically, because there is no incentive to build them.(Which does not mean the whole field of AI safety is pointless ETA On the other hand you may be distinguishing between limited and maximal agency, but arguing that there is a slippery slope leading from the one to the other. The political analogy shows that people are capable of putting a barrier across the slope: people are generally happy to give some power to some politicians, but resist moves to give all the power to one person. On the other hand, people might be tempted to give AIs more power once they have a track record of reliability, but a track record of reliability is itself a kind of empirical safety proof.
1TheAncientGeek5yThere is a further argument to the effect that we are gradually giving more autonomy to agentive AIs (without moving entirely away from oracle AIs like Google), but that gradual increase is being paralleled by an incremental approach to AI safety, for instance in automated trading systems, which have been given both more ability to trade without detailed oversight, and more powerful overrides. Hypothetically, increased autonomy without increased safety measures would mean increased danger, but that is not the case in reality. I am not arguing against AI danger and safety measures overall, I am arguing against a grandiose, all-or-nothing conception of AI safety and danger.
1jacob_cannell5yI like it. (Replying to my own text above). On consideration this is wrong - Google is an oracle-AI more or less, and there is high demand for that. The demand for agenty AI is probably much greater, but there is still a role/demand for oracle AI and a lot of other stuff in between. Totally.
I think this also goes hand in hand with understanding more about human values - how they evolved, how they are encoded, what is learned or not etc. Of course - there are many niches for more specialized or limited agentive AI, and these designs probably don't need altruism. That's important more for the complex general agents, which would control/manage the specialists, narrow AIs, other software, etc.
3TheAncientGeek5yThat seems to be reintroducing God AI. I think people would want to keep humans in the loop. That's both a prediction, and a means of AI safety.
1CalmCanary5ySo if I spouted 100 billion true statements at you, then said, "It would be good for you to give me $100,000," you'd pay up?
5Houshalter5yIf you just said a bunch of trivial statements 1 billion times, and then demanded that I give you money, it would seem extremely suspicious. It does not fit with your pattern of behavior. If, on the other hand, you gave useful and non-obvious advice, I would do it. Because the demand to give you money wouldn't seem any different than all the other things you told me to do that worked out. I mean, that's the essence of the human concept of earning trust, and betrayal.
1[anonymous]5yYes, but expecting any reasoner to develop well-grounded abstract concepts without any grounding in features and then care about them is... well, it's not actually complete bullshit, but expecting it to actually happen relies on solving some problems I haven't seen solved. You could, hypothetically, just program your AI to infer "goodness" as a causal-role concept from the vast sums of data it gains about the real world and our human opinions of it, and then "maximize goodness", formulated as another causal role. But this requires sophisticated machinery for dealing with causal-role concepts, which I haven't seen developed to that extent in any literature yet. Usually, reasoners develop causal-role concepts in order to explain what their feature-level concepts are doing, and thus, causal-role concepts abstracted over concepts that don't eventually root themselves in features are usually dismissed as useless metaphysical speculation, or at least abstract wankery one doesn't care about.
0Houshalter5yI don't think you are responding to the correct comment. Or at least I have no idea what you are talking about.
1Unknowns5yYes, I would, assuming you don't mean statements like "1+1 = 2", but rather true statements spread over a variety of contexts such that I would reasonably believe that you would be trustworthy to that degree over random situations (and thus including questions such as whether I should give you money). (Also, the 100 billion true statements themselves would probably be much more valuable than $100,000).
1V_V5yAccording to game theory, this opens you to exploitation by an agent that wants your money for its own gain and can generate 100 billion true statements at little cost.
1faul_sname5yIf those 100 billion true statements were all (or even mostly) useful and better calibrated than my own priors, then I'd be likely to believe you, so yes. On the other hand, if you replace $100,000 with $100,000,000,000, I don't think that would still hold. I think you found an important caveat, which is that the fact that an agent will benefit from you believing a statement weakens the evidence that the statement is true, to the point that it's literally zero for an agent that you don't trust at all. And if an AI will have a human-like architecture, or even if not, I think that would still hold.
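(Editorial sketch, not part of the thread.) The caveat about self-interested assertions can be illustrated with Bayes' rule. This is a minimal sketch under assumed numbers: the function name `posterior` and the probabilities 0.5, 0.9 and 0.1 are hypothetical and chosen only for illustration. The point is that if an agent would assert a statement equally often whether or not it is true, the assertion carries no evidence and the posterior equals the prior.

```python
def posterior(prior, p_assert_if_true, p_assert_if_false):
    """Bayes' rule: P(statement is true | the agent asserted it)."""
    numerator = prior * p_assert_if_true
    return numerator / (numerator + (1 - prior) * p_assert_if_false)

prior = 0.5  # assumed prior belief that the statement is true

# Assumed reliable, disinterested source: rarely asserts false statements.
trusted = posterior(prior, p_assert_if_true=0.9, p_assert_if_false=0.1)

# Assumed fully untrusted, self-interested source: asserts the statement
# just as often when it is false, so the likelihood ratio is 1.
untrusted = posterior(prior, p_assert_if_true=0.9, p_assert_if_false=0.9)

print(trusted, untrusted)
```

With these assumed numbers the trusted source moves the belief well above the prior, while the untrusted source leaves it where it started, which is the "literally zero" evidence case described above.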
-1TheAncientGeek5yYou may be already doing this, giving money to people whose claims you believe yoursel