Produced as part of the SERI ML Alignment Theory Scholars Program with support from Cavendish Labs. Many thanks go to the following for reading and commenting: Dennis Akar, Walter Laurito, Soroush Pour, Kaarel Hänni.

Goals of post:
Aim to answer these questions:
(Future post: 2. Potential ways to create further convergence, experimental ideas)
In machine learning, when referring to the inductive bias of a particular architecture and training process, we are pointing to the distribution of models it produces. For example, if a CNN is trained to classify a training set of red apples vs green pears, is it more likely to become a red-vs-green-object classifier or a round-vs-pointy-object classifier (or something else entirely)? The term ‘bias’ here refers to any basis for choosing one generalization over another, beyond strictly following the observed training instances.
Inductive bias can be thought of as the set of rules governing what kind of algorithm is likely to end up implemented in the model architecture, including patterns in how it will generalize to test data and out-of-distribution inputs. It is this bias that allows the model to generalize at all. Occam's razor is a classic example of an inductive bias: the simplest explanation consistent with the data is generally the most likely to be correct.
Several known factors contribute to a model’s inductive bias: choice of model architecture, training data, feature selection and preprocessing, and optimization algorithms such as Stochastic Gradient Descent (SGD) can all introduce different biases to different degrees.
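As a minimal illustration of the preprocessing point, here is a toy sketch (all data, feature names, and scalings are invented) in which the same 1-nearest-neighbour learner generalizes as a color classifier or a shape classifier depending purely on how its input features are rescaled:

```python
# Toy features: (redness, roundness), both in [0, 1]; data is hypothetical.
TRAIN = [((0.9, 0.9), "apple"), ((0.8, 0.95), "apple"),
         ((0.1, 0.1), "pear"), ((0.2, 0.05), "pear")]

def nn_predict(x, scale):
    """1-nearest-neighbour after per-feature rescaling; the rescaling acts
    as a preprocessing-induced inductive bias."""
    def dist(p):
        return sum((s * (a - b)) ** 2 for s, a, b in zip(scale, p, x))
    return min(TRAIN, key=lambda t: dist(t[0]))[1]

ood = (0.1, 0.9)  # a green, round object: an unseen combination of cues

color_biased = nn_predict(ood, scale=(10, 1))  # emphasise colour -> "pear"
shape_biased = nn_predict(ood, scale=(1, 10))  # emphasise shape  -> "apple"
```

The training data never disambiguates the two cues; only the preprocessing choice determines how the learner generalizes.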
Humans are great at adapting quickly to a variety of situations, even ones they haven’t seen before. This could be the result of robust inductive biases in abstract structured knowledge. The use of inductive bias in human decision making is crucial to allowing us to draw conclusions from available information, make efficient decisions in uncertain situations, and generalize. Inductive biases can manifest in multiple ways, including limitations on memory and learning, general principles favoring simplicity, and limitations on people's hypotheses shaped by cultural transmission and patterns in their environments.
How can we computationally measure psychological processes in the brain in a way that translates to models? What can we learn from the way humans develop learning preferences in childhood? Once a bias is learned, does it affect our ability to learn other things?
In order to answer these questions, we will take a look at computational neuroscience research, the biases that result from early-childhood learning, and how the brain develops.
Some studies suggest that certain human behaviors are not learned from the ground up but may instead emerge from innate, evolution-afforded skills that shape how we learn. Human inductive bias development could be influenced in part by these innate capacities.
Studies have shown that behavioral imitation is very important when learning a new skill (Aitken, 2018). During infancy, humans imitate facial movements and gestures (Meltzoff et al., 1977). Observations from imitation studies point to innate cortical structures that support learning through imitation, such as mirror neurons and the mirror system (Rizzolatti et al., 2001). These neurons show the same activation pattern whether a person is performing an action or merely observing it being performed. Imitation is central to infant learning; however, humans continue to use this way of learning throughout their adult lives (Möttönen et al., 2005). Human inductive bias development could be influenced in part by this innate capacity for imitation and learning through observation, as evidenced by the presence of mirror neurons and the mirror system. This bias towards imitation shapes how humans generalize from observed actions and incorporate them into their behavior.
Humans continually reevaluate and modify their belief systems as new information and evidence arrive in the sensorimotor stream; this is known as metacognition. Understanding the role of imitation and mirror systems in human learning can inform the design of AI systems that update and improve over time, similar to humans' ability to modify their beliefs based on new information. In a study looking at how humans adapt to their surroundings and learn to predict future events based on past experience, findings revealed that people's internal models were influenced by two factors. First, there was the "ideal observer" model, which represents a perfect understanding of the task. Second, there was a simpler "Markov model" that only considered the most recent experience. Initially, people relied more on the simpler Markov model, but with greater task familiarity, aspects of the ideal observer model became more prevalent. This suggests that our mental models of the world are not fixed but instead adapt and become more sophisticated over time. However, the rate and extent of this sophistication vary among individuals. So while people generally have a learning bias that follows a certain pattern, the strength and persistence of this pattern can differ from person to person, and we tend to form approximate models that evolve over time.
Humans also display a shape bias, beginning in childhood: they lean towards categorizing objects by shape instead of by other cues such as texture or color. There are debates surrounding the origin of and reasoning behind this phenomenon, such as whether the bias arises from learned associations between objects and words or from a general view that shape is a consistent cue to an object’s category. While daily exposure to shapes plays a role in shaping our perceptual biases, these studies suggest that the shape bias goes beyond mere exposure and is primarily the result of children’s beliefs about object categories, not just associative learning.
A probabilistic model outlines how we deal with problems where we need to make educated guesses based on data. It includes formally defining the problem through hypotheses, how those hypotheses relate to the data we observe, and the initial plausibility of each hypothesis, or its prior probability. Through this process we can clearly surface any assumptions and explore their effects. The hypotheses can take a variety of forms, such as weights in a neural network or hierarchical symbolic structures, and each assigns a likelihood to specific outcomes.
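These ingredients — hypotheses, priors, and a likelihood relating hypotheses to observations — can be sketched in a few lines. The hypothesis names, priors, and likelihoods below are all invented toy values:

```python
from fractions import Fraction

# Toy hypothesis space (everything here is hypothetical): p_obs is the
# probability that a single observation is consistent with the hypothesis.
hypotheses = {
    "shape-rule": {"prior": Fraction(2, 3), "p_obs": Fraction(9, 10)},
    "color-rule": {"prior": Fraction(1, 3), "p_obs": Fraction(1, 2)},
}

def posterior(hyps, n_consistent):
    """Bayes' rule: P(h | data) is proportional to P(data | h) * P(h)."""
    unnorm = {h: v["prior"] * v["p_obs"] ** n_consistent
              for h, v in hyps.items()}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}

post = posterior(hypotheses, n_consistent=5)
```

Because the prior and likelihood are stated explicitly, we can see exactly which assumption drives the conclusion — the transparency the paragraph above describes.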
Probabilistic models of cognition provide solutions to inductive problems in cognitive science. These problems involve making uncertain guesses based on incomplete or noisy information, and probabilistic frameworks help us understand how people tackle them. In contrast, connectionism, another approach in cognitive science, relies on ‘graded, continuous vector spaces without explicit structure and are primarily shaped by experience through gradual error-driven learning algorithms’ — in other words, continuous representations shaped by learning from experience. This differs from traditional research that primarily uses structured concepts like rules and logic to explain higher-level cognitive functions. While connectionist models and Bayesian models are distinct approaches, it’s possible they could be connected as a means to get closer to brain-like models. Many connectionist algorithms used for learning and inference can be interpreted from a Bayesian perspective, suggesting that certain types of probabilistic reasoning might be implemented in the brain. Some researchers focus on explicitly incorporating probabilistic ideas into connectionist models while preserving their unique hierarchical structure, while others aim to implement core Bayesian computations and models using biologically plausible mechanisms. However, there is still much to learn about how the brain represents and implements structured knowledge for probabilistic inference, which presents a significant challenge for theoretical neuroscience.
However, it’s possible to add probabilistic aspects to connectionist models through Bayesian neural networks. In work by Griffiths et al., probabilistic priors are introduced over the model's weights, and Bayesian inference is employed to calculate the posterior distribution of the weights given the available data. They also argue that ‘probabilistic inference over structured representations is crucial for explaining the use and origins of human concepts, language, or intuitive theories’. However, there is limited understanding of how these structured representations are implemented in neural systems. The claim is also made that the primary challenge in theoretical neuroscience is not understanding how the brain carries out probabilistic inference, but rather how it represents the structured knowledge on which such inference is based.
When studying human cognition and neural pathways, there are three levels of analysis to consider. The first level is the "computational" level, which focuses on understanding the problem faced by the mind and how it can be solved functionally. The second level is the "algorithmic" level, which describes the specific processes executed by the mind to achieve this solution. The third level is the "hardware" level, which specifies how these processes are implemented in the brain.
When it comes to modeling inductive bias, there are two main approaches: a bottom-up, mechanism-first model, or a top-down, function-first strategy. The bottom-up approach begins by identifying the neural or psychological mechanisms believed to be responsible for cognition and then seeks to explain behavior in terms of those mechanisms. On the other hand, probabilistic models of cognition follow a top-down approach, starting by considering the function that a particular cognitive aspect serves and explaining behavior based on performing that function. While there are still debates as to which approach is the correct one, each presents valid arguments and further study is needed.
A potential way to reveal a human's inductive bias is through iterated learning from tasks based on Bayesian Inference. This technique is based on methods found in mechanism design in theoretical economics. In this framework, participants are viewed as rational and as always trying to maximize their own benefits or ‘utility’.
The goal here is to reveal and analyze the inductive biases of human learners during category learning. In order to measure this, participant responses in one trial were used to create the next set of stimuli for either that same participant or someone else. By analyzing this iterated learning process and assuming that learners think like Bayesian agents, we can predict and reveal their inductive biases in the form of a probability distribution over different explanations.
When we study how learners pass information to each other through iterated learning, we notice that their beliefs tend to settle on their initial assumptions, or prior probabilities. As more rounds of learning are completed, the likelihood of people choosing a specific idea should become very close to their original belief in that idea. This way, we can figure out which hypotheses people tend to favor.
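This convergence-to-the-prior result can be illustrated with a small simulation (all probabilities here are arbitrary toy values): each generation samples a hypothesis from its Bayesian posterior given data generated by the previous generation, and across many independent chains the final hypotheses approximately recover the prior:

```python
import random

random.seed(0)

PRIOR = {"A": 0.7, "B": 0.3}   # toy prior over two hypotheses

def likelihood(d, h):
    """Noisy channel: a datum usually matches the hypothesis that produced it."""
    return 0.8 if d == h else 0.2

def generate(h):
    """A teacher holding hypothesis h produces a (noisy) datum."""
    return h if random.random() < 0.8 else ("B" if h == "A" else "A")

def sample_posterior(d):
    """Sample h from P(h | d), proportional to P(d | h) * P(h)."""
    w = {h: PRIOR[h] * likelihood(d, h) for h in PRIOR}
    r, acc = random.random() * sum(w.values()), 0.0
    for h, wh in w.items():
        acc += wh
        if r <= acc:
            return h
    return h  # guard against floating-point edge cases

# Many independent chains of teacher -> learner transmission.
counts = {"A": 0, "B": 0}
for _ in range(5000):
    h = "B"                    # start every chain away from the prior's mode
    for _ in range(20):        # 20 generations of iterated learning
        h = sample_posterior(generate(h))
    counts[h] += 1

frac_A = counts["A"] / 5000    # should be close to PRIOR["A"] = 0.7
```

Even though every chain starts at "B", the fraction of chains ending at "A" approaches the prior probability of "A" — which is exactly why iterated learning can be used to read off a learner's prior.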
The following section will focus on studies that aim to understand the ways we can measure and accurately identify inductive bias in a model and possible ways this initial machine bias can be shifted to match that of a human.
It's often hard to pinpoint exactly what these inductive biases are and how to adjust them during the system's design process. In addition to classification task results, the following are a few potential frameworks that can be used to measure and identify these biases depending on the context.
One such framework, proposed by Li et al., uses meta-learned Gaussian process (GP) hyperparameters as a means of quantifying inductive biases in a model — specifically, adjusting the hyperparameters of the Gaussian process to fit the model's predictions. This framework was shown to accurately capture the inductive biases of neural network models via GP kernel hyperparameters.
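A rough sketch of the idea, assuming all we have is a candidate model's predictions on a 1-D task (the data, grid, and "models" below are invented): fit an RBF kernel lengthscale by maximizing the GP log marginal likelihood, then read the fitted lengthscale as a smoothness bias.

```python
import numpy as np

def rbf_kernel(x, lengthscale):
    """Squared-exponential kernel matrix for 1-D inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def log_marginal_likelihood(x, y, lengthscale, noise=1e-2):
    """GP evidence: -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    K = rbf_kernel(x, lengthscale) + noise * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(x) * np.log(2 * np.pi)

def fitted_lengthscale(x, y):
    """Grid-search the lengthscale that best explains the predictions."""
    grid = np.linspace(0.02, 0.5, 50)
    scores = [log_marginal_likelihood(x, y, l) for l in grid]
    return grid[int(np.argmax(scores))]

x = np.linspace(0.0, 1.0, 20)
y_smooth = np.sin(2 * np.pi * x)    # stand-in predictions of a "smooth" learner
y_wiggly = np.sin(12 * np.pi * x)   # stand-in predictions of a "wiggly" learner

ls_smooth = fitted_lengthscale(x, y_smooth)
ls_wiggly = fitted_lengthscale(x, y_wiggly)
```

A learner whose predictions vary slowly is assigned a larger lengthscale than one whose predictions oscillate, giving a single interpretable number summarizing one axis of its inductive bias.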
In an alternative framework proposed by Kharitonov & Chaabouni, the inductive biases of standard seq2seq models (Transformer, LSTM, and CNN) were studied in order to understand how these models develop inductive biases through arithmetic, hierarchical, and compositional “reasoning”. To probe how a model (M) develops inductive biases, a training dataset containing input/output pairs and a separate set of probe inputs was used. In this paradigm, there are two possible rules (C1 and C2) that could explain the training data. After training the model on the data, its bias toward either C1 or C2 was determined by comparing its outputs on the probe inputs with the outputs of the respective rules.
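The probe logic can be sketched as follows, with two invented arithmetic rules standing in for C1 and C2 and a stand-in function in place of a trained seq2seq model:

```python
def c1(x):                 # candidate rule C1: plain addition
    return x + 2

def c2(x):                 # candidate rule C2: addition modulo 10
    return (x + 2) % 10

train_inputs = list(range(0, 8))    # C1 and C2 agree here, so the training
probe_inputs = list(range(8, 14))   # data cannot distinguish them; these can

def model(x):                       # stand-in for a trained seq2seq model M
    return x + 2                    # (assume it happened to generalize like C1)

def agreement(m, rule, inputs):
    """Fraction of probe inputs where the model's output matches the rule's."""
    return sum(m(x) == rule(x) for x in inputs) / len(inputs)

bias_c1 = agreement(model, c1, probe_inputs)   # 1.0: fully C1-consistent
bias_c2 = agreement(model, c2, probe_inputs)   # 0.0
```

The key design point is that both rules perfectly explain the training data; only behavior on the probe inputs reveals which rule the learner preferred.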
Dataset description length was used as a sensitive measure of inductive bias, in light of its connection to Solomonoff's theory of induction and Minimum Description Length. By identifying regularities in the data and using them to compress the data efficiently, a deeper understanding of the data's underlying structure and patterns is gained. This method was used to measure how much the model relies on certain assumptions when making its guesses. It was found that the inductive biases of different seq2seq models differ from each other in distinctive ways. It was also demonstrated that some seq2seq learners show strong human-like biases and effectively apply these biases to learn language-related behaviors with great accuracy. LSTM-s2s exhibited a preference for the hierarchical bias, which has been suggested to play a significant role in how children acquire syntax.
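The description-length idea itself is easy to illustrate: under an ideal code, a model assigns −log₂ p(x) bits to each observation, so a model whose bias matches the data's regularity compresses it into fewer bits (the data and probabilities below are toy values):

```python
import math

def codelength_bits(seq, p_one):
    """Ideal Shannon codelength of a binary sequence under a Bernoulli(p_one)
    model: each symbol x costs -log2 p(x) bits."""
    return sum(-math.log2(p_one if x == 1 else 1.0 - p_one) for x in seq)

seq = [1] * 9 + [0]   # toy data with a strong regularity (mostly 1s)

bits_uniform = codelength_bits(seq, 0.5)   # learner with no matching bias
bits_biased  = codelength_bits(seq, 0.9)   # learner whose bias fits the data
```

A learner whose inductive bias matches the data's regularity needs fewer bits to describe it, which is the sense in which description length serves as a sensitive measure of bias.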
ImageNet trained CNNs on the other hand, are strongly biased towards recognizing textures rather than shapes, which is different from human behavioral data. However, in this study it was shown that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on ‘Stylized-ImageNet’, a stylized version of ImageNet that uses style transfer to remove texture cues, forcing the models to pay more attention to shapes. This suggests that the texture bias isn't inherent to CNNs, but rather a result of the data they're usually trained on.
It was also found that the shape-based strategy was often more robust to various image distortions compared to the texture-based approach. This is also important for neuroscience communities, as CNNs are generally used as computational human vision models of object recognition and shape perception.
Co-training is a semi-supervised machine learning technique in which two models are trained simultaneously on different views of the same data. Each model initially learns from a small set of labeled data and then makes predictions on unlabeled data, with only confident predictions shared between the models and added to their training sets. This process repeats, with each model learning from the confident predictions of the other, allowing them to teach each other.
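A minimal sketch of this loop, using two one-feature threshold learners on synthetic data (everything here is invented for illustration; real co-training uses richer classifiers and views):

```python
# Each example is (view_a, view_b, label); only the first two are labeled.
labeled   = [(0.1, 0.2, 0), (0.9, 0.8, 1)]
unlabeled = [(0.9, 0.55, None), (0.55, 0.1, None)]

def fit_threshold(points):
    """Midpoint threshold between the largest class-0 and smallest class-1 value."""
    lo = max(v for v, y in points if y == 0)
    hi = min(v for v, y in points if y == 1)
    return (lo + hi) / 2

def confident(v, t, margin=0.2):
    """A prediction counts as confident if it is far from the threshold."""
    return abs(v - t) > margin

train_a = [(a, y) for a, _, y in labeled]   # view-A learner's training set
train_b = [(b, y) for _, b, y in labeled]   # view-B learner's training set

for _ in range(3):                          # a few co-training rounds
    ta, tb = fit_threshold(train_a), fit_threshold(train_b)
    for a, b, _ in unlabeled:
        # Each learner hands its confident pseudo-labels to the other view.
        if confident(a, ta) and (b, int(a > ta)) not in train_b:
            train_b.append((b, int(a > ta)))
        if confident(b, tb) and (a, int(b > tb)) not in train_a:
            train_a.append((a, int(b > tb)))
```

Each unlabeled example is confidently classifiable from only one view, so after a few rounds each learner's threshold has shifted to incorporate an example labeled for it by the other view.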
In a study conducted by Kumar et al., the inductive bias of the model was altered by co-training meta-learning agents on two types of auxiliary tasks: predicting representations from natural language task descriptions and predicting representations from programs induced to generate such tasks. The idea behind this approach is that human-generated language descriptions and program induction models with learned primitives contain abstract concepts that can compress description length, leading to more human-like inductive bias, as measured through the agent's improved performance on a human-generated game board and worsened performance on a machine-generated board. The co-training process leverages the abstraction supported by these representations to guide the agents toward better alignment with human strategies.
By co-training on these representations, they found that their approach resulted in more human-like behavior in downstream meta-reinforcement learning agents compared to less abstract controls, such as synthetic language descriptions or program induction without learned primitives. This suggests that the abstraction supported by the chosen representations plays a crucial role in shaping the inductive biases of the model. The authors argue that language descriptions and program abstractions can act as repositories for human inductive biases.
Directly transferring these insights into models might not be straightforward. Humans and AI have different constraints and capabilities, and what works well for human cognition might not necessarily be optimal or even feasible for a model. The challenge lies in interpreting and applying these findings in a way that admits limitations while critically examining the root of why we would transfer a specific human-like bias to a model.
The following studies look at comparisons of inductive bias between models and humans, specifically in a shape/color/texture identification task. This paradigm has been commonly used due to our understanding of how humans establish a shape bias early in development. When training a model, different methods employed in training such as co-training and using alternative training data can potentially alter the model’s inductive bias.
In one study, Ritter et al. (2017) used Deep Neural Networks (DNNs) to study the categorization biases in these models, drawing inspiration from developmental psychology.
Performance-optimized DNNs trained on the ImageNet object recognition dataset showed a shape bias similar to humans. However, the strength of this shape bias varied significantly among models that are structurally identical but initialized with different seeds, and even changed over time during training, despite nearly equivalent performance on classification tasks. The biases arose from the model's architecture and the dataset, which interact through the optimization process.
These findings revealed that one-shot learning models exhibited a shape bias similar to that observed in humans, preferring to categorize objects based on shape rather than color.
A study by Colunga and Sims (2005) showed how a simple recurrent neural network, trained with Hebbian learning, acquired a shape bias for solid objects and a material bias for non-solid objects.
Feinman & Lake (2018) measured the development and influence of the shape bias in convolutional neural networks (CNNs). The study demonstrated that the model exhibited a preference for identifying objects by shape over color or texture, even when presented with less data than expected. Using simple and synthetic images, they studied how the model learned to recognize shapes similarly to humans. Understanding shape in these complex images requires the model to make more complex generalizations and abstractions, resembling human categorization.
Additionally, the development of the shape bias is a benchmark sign used to predict the beginning of vocabulary acceleration in children. If the model’s shape bias is any indication, would this same phenomenon be seen in CNNs? In this same study, the relationship between shape-based choices and noun recognition was investigated and found to show a strong positive correlation. Early stages of model "word learning" were defined as the period when the model knows less than two-thirds of the total number of nouns it is trained to recognize. In this task it was shown that the more the model leaned towards recognizing shapes, the more nouns it could correctly identify.
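One common way such a shape-bias score is operationalized (the trial format and stand-in model below are invented for illustration) is as the fraction of match-to-sample trials on which the model picks the shape match over the color match:

```python
# Each trial: a probe plus one candidate matching it in shape, one in color.
# Objects are (shape, color) tuples; all trial data here is made up.
trials = [
    {"probe": ("round", "red"),   "shape_match": ("round", "green"),
     "color_match": ("pointy", "red")},
    {"probe": ("pointy", "blue"), "shape_match": ("pointy", "red"),
     "color_match": ("round", "blue")},
]

def shape_bias(choose, trials):
    """Fraction of trials where the model picks the shape match
    (0.5 = no bias, 1.0 = pure shape bias)."""
    return sum(choose(t) == t["shape_match"] for t in trials) / len(trials)

def stand_in_model(trial):
    """Stand-in for a trained CNN: always matches the probe by shape."""
    probe_shape = trial["probe"][0]
    candidates = (trial["shape_match"], trial["color_match"])
    return next(c for c in candidates if c[0] == probe_shape)

score = shape_bias(stand_in_model, trials)   # 1.0 for this stand-in
```

Tracking this score across training checkpoints is what allows the correlation with noun recognition described above to be measured.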
There are a few strategies that have been suggested to instill a human-inspired inductive bias in a model where this had originally diverged, such as through co-training and meta-learning, altering datasets for training, using human-generated language descriptions, and testing one-shot learning.
The aim of this work lies in its usefulness as a piece of the larger puzzle or as a stepping stone for future related research. In future posts I plan to outline a stronger case relating to the usefulness in alignment research, if results from my own studies point in that direction. Ultimately I’m unsure of how exactly this would aid in solving an alignment issue directly at this time. However, I’m wondering if by aligning a model’s inductive bias with that of a human’s, we may get closer to the development of a system that more closely addresses human focused needs. My intuition here is that while there may be multiple different options in which to steer a model’s inductive bias, the machine-like ones may generalize in machine-like ways in other tasks, while the human-like ones may generalize in human-like ways.
I could also see a possibility of greater goal generalization but am unsure of what types of human-like biases this would require. Failure modes spring up when considering how to instill bias without sacrificing model performance and in deciding which human bias we want to attempt to instill. It’s important to avoid instilling every bias, as some of them would be harmful, prejudiced, or discriminatory.
Human cognition is a multi-faceted process that is difficult to replicate fully. Even so, the human inductive biases that we are able to measure and attempt to recreate in a model make this an interesting and potentially fruitful research avenue.