Neural nets as a model for how humans make and understand visual art

by Owain_Evans, 9th Nov 2019



This is a new paper relating experimental results in deep learning to human psychology and cognitive science. I'm excited to get feedback and comments. I've included some excerpts below.


This paper is about the cognitive science of visual art. Artists create physical artifacts (such as sculptures or paintings) which depict people, objects, and events. These depictions are usually stylized rather than photo-realistic. How is it that humans are able to understand and create stylized representations? Does this ability depend on general cognitive capacities or an evolutionary adaptation for art? What role is played by learning and culture?

Machine learning can shed light on these questions. It’s possible to train convolutional neural networks (CNNs) to recognize objects without training them on any visual art. If such CNNs can generalize to visual art (by creating and understanding stylized representations), then CNNs provide a model for how humans could understand art without innate adaptations or cultural learning. I argue that Deep Dream and Style Transfer show that CNNs can create a basic form of visual art, and that humans could create art by similar processes. This suggests that artists make art by optimizing for effects on the human object-recognition system. Physical artifacts are optimized to evoke real-world objects for this system (e.g. to evoke people or landscapes) and to serve as superstimuli for this system.

From "Introduction"

In a psychology study in the 1960s, two professors kept their son from seeing any pictures or photos until the age of 19 months. On viewing line-drawings for the first time, the child immediately recognized what was depicted. Yet aside from this study, we have limited data on humans with zero exposure to visual representations.


For the first time in history, there are algorithms [convolutional neural nets] for object recognition that approach human performance across a wide range of datasets. This enables novel computational experiments akin to depriving a child of visual art. It’s possible to train a network to recognize objects (e.g. people, horses, chairs) without giving it any exposure to visual art and then test whether it can understand and create artistic representations.
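To make the "deprivation" protocol concrete, here is a minimal sketch in Python. It is not the paper's experiment: a nearest-centroid classifier stands in for a CNN, 2-D feature vectors stand in for images, and the hypothetical `stylize` function (posterizing each feature to a few levels) stands in for an artistic style. What it illustrates is the structure of the test: the model is fit only on "photos" and is then evaluated only on stylized inputs it never saw.

```python
import random

def nearest_centroid_fit(examples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for x, label in examples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest (squared Euclidean distance) to x."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)))

def accuracy(centroids, examples):
    return sum(predict(centroids, x) == label for x, label in examples) / len(examples)

def stylize(x, levels=3):
    """Hypothetical 'artistic style': posterize each feature to a few levels."""
    return [round(v * (levels - 1)) / (levels - 1) for v in x]

def sample(center, rng, noise=0.1):
    """Hypothetical 'photo': a noisy feature vector near a class center."""
    return [c + rng.gauss(0, noise) for c in center]

rng = random.Random(0)
centers = {"person": [0.2, 0.8], "horse": [0.8, 0.3]}

# Training data: photos only. No stylized input is ever shown to the model.
photos = [(sample(c, rng), label) for label, c in centers.items() for _ in range(50)]
centroids = nearest_centroid_fit(photos)

# Test data: fresh samples rendered in the hypothetical style.
art = [(stylize(sample(c, rng)), label) for label, c in centers.items() for _ in range(50)]
print("accuracy on stylized inputs:", accuracy(centroids, art))
```

The accuracy printed here says nothing about real CNNs or real paintings; the point is only that "generalizing to art" is measured on a distribution the model was never trained on.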

From "Part 1: Creating art with networks for object recognition"

Figure 2. Outputs from testing whether a conv-net model can generalize to paintings. Results are fairly impressive overall. However, in the Picasso painting on the right, the people are classified as "stop sign, frisbee".
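Labels like "stop sign, frisbee" are typically a classifier's top few classes for an image region. A minimal sketch of how such a label string can be produced from raw scores, assuming a softmax output layer; the class list and logits below are made up for illustration and are not taken from the paper or its model.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_labels(logits, labels, k=2):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(labels)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]

labels = ["person", "stop sign", "frisbee", "horse"]
logits = [1.0, 2.5, 2.0, -1.0]  # hypothetical scores for one region of a painting
print(", ".join(name for name, _ in top_k_labels(logits, labels)))
# prints "stop sign, frisbee"
```

A misclassification like the Picasso case simply means the wrong classes end up with the highest scores for that region.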

Figure 12. Diagram showing how Deep Dream and Style Transfer could be combined. This generates an image that is a superstimulus for the conv net (due to the Deep Dream loss) and has the style (i.e. low-level textures) of the style image. Black arrows show the forward pass of the conv net. Red arrows show the backward pass, which is used to optimize the image in the center by gradient descent.
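The optimization in Figure 12 can be sketched in a few lines. This is a toy under loud assumptions, not the paper's code: a single hypothetical 1-D convolution-plus-ReLU layer stands in for the conv net, the Deep Dream loss is taken to be the negative mean squared activation of one channel, the style loss compares Gram matrices of channel activations (the statistic used in Gatys-style Style Transfer), the image starts from a content image, and gradients are estimated by finite differences rather than by the backward pass shown by the red arrows.

```python
import random

def conv_relu(image, filters):
    """Toy stand-in for one conv-net layer: 1-D convolution with each filter, then ReLU."""
    maps = []
    for f in filters:
        k = len(f)
        maps.append([max(0.0, sum(f[j] * image[i + j] for j in range(k)))
                     for i in range(len(image) - k + 1)])
    return maps

def gram(maps):
    """Style statistics: channel-by-channel correlations averaged over positions."""
    n = len(maps[0])
    return [[sum(a * b for a, b in zip(mi, mj)) / n for mj in maps] for mi in maps]

def combined_loss(image, style_gram, filters, dream_channel, style_weight):
    maps = conv_relu(image, filters)
    # Deep Dream term: negative mean squared activation of one channel, so that
    # minimizing the loss makes the image a stronger stimulus for that channel.
    target = maps[dream_channel]
    dream = -sum(v * v for v in target) / len(target)
    # Style term: squared difference between this image's Gram matrix and the style image's.
    g = gram(maps)
    style = sum((a - b) ** 2 for row_a, row_b in zip(g, style_gram)
                for a, b in zip(row_a, row_b))
    return dream + style_weight * style

def numerical_grad(f, x, eps=1e-4):
    """Central finite differences; a real implementation would backpropagate instead."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def optimize_image(content, style_image, filters, dream_channel=0,
                   style_weight=1.0, steps=50, lr=0.1):
    style_gram = gram(conv_relu(style_image, filters))

    def loss(img):
        return combined_loss(img, style_gram, filters, dream_channel, style_weight)

    image = list(content)  # assumption: start from the content image
    current = loss(image)
    for _ in range(steps):
        grad = numerical_grad(loss, image)
        step = lr
        while step > 1e-6:
            # Gradient-descent step, with pixels clipped to the valid range [0, 1].
            candidate = [min(1.0, max(0.0, p - step * g)) for p, g in zip(image, grad)]
            value = loss(candidate)
            if value < current:  # accept only steps that lower the loss
                image, current = candidate, value
                break
            step /= 2  # otherwise retry with a smaller step
    return image, current

random.seed(0)
filters = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
content = [random.random() for _ in range(12)]
style = [random.random() for _ in range(12)]
result, final_loss = optimize_image(content, style, filters)
```

Accepting a step only when it lowers the loss (and halving the step size otherwise) is a design choice that keeps this toy stable; clipping to [0, 1] keeps the Deep Dream term from growing without bound by simply inflating pixel values.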

Figure 13. Diagram showing how the process in Figure 12 can be extended to humans. This is the "Sensory Optimization" model for creating visual art. For humans, the input is a binocular stream of visual perception (represented here as "content video" frames). The goal is to capture the content of this input in a different physical medium (woodcut print) and with a different style. The optimization is not by gradient descent but by a much slower process of black-box search that draws on human general intelligence.
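The contrast with gradient descent can be illustrated with a minimal black-box search, under stated assumptions: the hypothetical `recognizer` is a stand-in for the human object-recognition system, the medium is binary like a woodcut (each cell inked or cut away), and the search is simple hill climbing that flips one cell at a time and keeps the flip only if the recognizer's response moves closer to its response to the content frame. The search only ever queries `score`, never a gradient, which is what makes it black-box; this is an illustration, not a model of how human artists actually search.

```python
import random

def recognizer(image, filters):
    """Hypothetical stand-in for the object-recognition system: ReLU filter responses."""
    return [max(0.0, sum(w * p for w, p in zip(f, image))) for f in filters]

def score(artifact, content, filters):
    """Higher is better: how closely the artifact's response matches the content's."""
    a, c = recognizer(artifact, filters), recognizer(content, filters)
    return -sum((x - y) ** 2 for x, y in zip(a, c))

def blackbox_search(content, filters, iters=2000, seed=0):
    rng = random.Random(seed)
    # Woodcut-like medium: each cell is either inked (1) or cut away (0).
    artifact = [rng.randint(0, 1) for _ in content]
    best = score(artifact, content, filters)
    for _ in range(iters):
        i = rng.randrange(len(artifact))
        artifact[i] = 1 - artifact[i]  # propose: flip one cell
        s = score(artifact, content, filters)
        if s >= best:
            best = s  # keep the flip
        else:
            artifact[i] = 1 - artifact[i]  # undo the flip
    return artifact, best

rng = random.Random(1)
content = [rng.random() for _ in range(16)]  # one hypothetical "content video" frame
filters = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(4)]
woodcut, final_score = blackbox_search(content, filters)
```

Because proposals are only evaluated, never differentiated, the same loop would work with any scorer, including one that is a person looking at the print.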

Figure 14. Semi-abstract images that are classified as “toilet”, “house tick”, and “pornographic” (“NSFW”) by recognition nets. From Tom White’s “Perception Engines” and “Synthetic Abstractions” (with permission from the artist).
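Images like these come from optimizing an artifact against recognition nets. The sketch below shows one plausible ingredient under assumptions (it is not a description of Tom White's actual pipeline): scoring candidate images by the average probability that an ensemble of classifiers assigns to a target class, and keeping the highest-scoring candidate. The toy linear-softmax classifiers and the candidate images are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_classifier(weights):
    """Hypothetical toy recognition net: a linear layer (one weight row per class) + softmax."""
    def classify(image):
        return softmax([sum(w * p for w, p in zip(row, image)) for row in weights])
    return classify

def ensemble_prob(image, classifiers, target):
    """Mean probability that the classifiers assign to class index `target`."""
    return sum(clf(image)[target] for clf in classifiers) / len(classifiers)

def most_recognizable(candidates, classifiers, target):
    """Keep the candidate image that the whole ensemble finds most target-like."""
    return max(candidates, key=lambda img: ensemble_prob(img, classifiers, target))

nets = [make_classifier([[1.0, 0.0], [0.0, 1.0]]),
        make_classifier([[2.0, -1.0], [-1.0, 2.0]])]
candidates = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
best = most_recognizable(candidates, nets, target=0)
```

Averaging over several nets is meant to favor images that evoke the class for many recognizers, rather than images that exploit the quirks of a single one.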