Nevertheless, I shall take advantage of your kindness in assuming we agree that a science cannot be conditioned upon empiricism.
— Jacques Lacan, “The Subversion of the Subject and the Dialectic of Desire in the Freudian Unconscious”[1]
Freud developed the first modern theory of the unconscious. His writings on drives, dreams and the id were instrumental in developing modern practices of psychology and neuroscience. Modern researchers are unlikely to use concepts like the superego or the Oedipus complex, because empirical work has since advanced our understanding of the mind and does not support many of Freud’s specific claims. Freud pushed us in the right direction, but he lacked the empirical foundation to make precise claims.
Freudian psychoanalysis mapped well to our narrative claims about the mind. Particularly coming out of the prudishness of late-19th-century Europe, it was intuitive to learn we all had unconscious drives that did not track with societal norms. Stefan Zweig, describing the austere Austro-Hungarian norms of his youth, wrote:
The more a woman was supposed to appear as a “lady,” the less her natural forms were permitted to be discernible; fundamentally, fashion, with this deliberate guiding principle of hers, was merely serving obediently the general moral tendency of the time, whose chief concern was covering and concealing.[2]
— Stefan Zweig, Die Welt von Gestern (1942)
Leaving this period of social conservatism for the competing liberalisation and violence of the first half of the 20th century, it should not be surprising if the public was eager to learn about our suppressed tendencies, nor if they continued to explore this narrative framework far past its utility.
In 1952, Jacques Lacan began a series of seminars on his idiosyncratic psychoanalytic theory, which would later be published as “Écrits” in 1966. Lacan builds on the work of Freud and connects it with the structuralist linguistics developed by Ferdinand de Saussure, and the result is famously dense. He constructs a new vocabulary to situate the self beside language and sexual desire/anxiety, introducing terms like big Other and little other with distinct symbols (A, a) and “algebraic” relationships. This leads to graphs like
Lacan, Écrits (Seuil, 1966), 817
and equations like
Lacan, Écrits (Seuil, 1966), 819
Both examples come from “The Subversion of the Subject and the Dialectic of Desire,” which is generally understood to be one of the best introductory texts on Lacan.
Recall that, by the time of the publication of “Écrits,” the “cognitive revolution” had been underway for nearly a decade. This would eventually mature into cognitive science, and our modern understanding of psychology, neuroscience, linguistics and artificial intelligence.
Understanding Lacan’s work presents a unique intellectual challenge for his readers. Working through his formalizations can be fun, in the same way it is exciting to figure out a complicated puzzle. The use of symbols and algebraic notation provides the allure of sophistication that prose alone cannot. Making statements like “castration represents the disavowal of the Real by the Imaginary” is a helpful signal of one’s own intellect.
Theory does not require strictly empirical work, but building theories on top of theories creates castles in the sky, with little predictive power for reality.
In college, I spent a lot of time studying the French intellectual tradition. Not everyone in the tradition is so separated from reality. Writers like Albert Camus articulated a meaningful engagement with life in a post-theistic society; intellectuals like Foucault turned to history to reveal our latent power structures. But obscurantist authors like Lacan and Derrida have spawned numerous imitators. As a former Arrogant Student, I can attest that writing in reference to this dense terminology can make one feel satisfyingly smart. But I think the primary motivation behind authors in this tradition has been to drape their writing in a pretense of rigor, stealing the formalisms of science to write without its methodologies.
Theory, to remain rigorous, needs to conceptualize and assess its own ties to reality. If you remain too bound to present understanding, you will never advance knowledge. But if you build upon layers of abstraction, without careful justification, you will deviate from an accurate model of ground truth.
As I have begun to research superintelligence alignment in the past year, particularly agent foundations, I have sometimes had the uncanny feeling that I was back in college studying continental philosophy. Reading about functional decision theory and deep deceptiveness certainly makes me feel like I am learning about the challenges of alignment. But much of the material in the field has been written in such a way that it is difficult to unpack the chain of assumptions leading to a single argument. Rather than dealing with crisp theoretical formulations which make explicit claims, one argues through a series of metaphors, with an ill-defined link back to reality.
I believe this is partly due to the specific writing style of Eliezer Yudkowsky, which tends to be verbose and prone to metaphor and other narrative approaches to argumentation. Since he is a foundational figure in alignment research, it makes sense that he would continue to hold sway over present styles of authorship. But if Yudkowsky’s style is well-suited for introducing an online, technical audience to alignment problems, it is poorly suited for arguing those claims robustly against specific, material considerations.
A lot of commentary on AI risk and alignment theory takes place on LessWrong. As a community blog intended to provoke new directions of thought, LessWrong is fantastic! An important first step in developing new theories is to work on concepts, while formalization and evidence come later. But a lot of actual alignment work also occurs directly on LessWrong, or the associated AI Alignment Forum. Because of this dual use, I feel that there has been a mode collapse, where it is difficult to distinguish between metaphorical/conceptual work and rigorous research.
Consider this example from Andrew Critch’s boundaries sequence (which I liked a lot, and which is fresh in my mind). Critch introduces a cross-disciplinary concept of “boundaries,” which he suggests are an important precept in modeling agent behavior and preferences. Early in the sequence, he uses the example of an employee who can/cannot maintain a good boundary between their personal life and work.
Critch, «Boundaries» Part 2 (2022)
This is a nice conceptual framing! It does seem that “boundaries” are a useful emergent property of a number of different theories, which merits further thinking. I don’t have a good formal conception of what boundaries might look like, but I can intuit how they will behave in practice.
Later in the sequence, Critch attempts to formalize boundaries using an “approximate directed (dynamic) Markov blanket.”
Critch, «Boundaries» Part 3a (2022)
Critch’s formalization does not come out of thin air. Active inference, for example, also uses Markov blankets to formalize the boundary of a single agent. But I worry that we have now overformalized a conceptual theory, or attempted to turn metaphor into math! While formalization does allow for greater specificity and may feel intuitive to individuals coming from a background heavy in math, I worry that this brings an unjustified level of precision to a solution that is inherently approximate.
Does this bring us closer to a transferable understanding of boundaries? Or have we allowed ourselves to get lost in unnecessary details?
Far more problematic to me are arguments like the sharp left turn, which become load-bearing in some misalignment scenarios, but are poorly specified in the work of Nate Soares and others. As Paul Christiano astutely comments:
More concretely, you talk about novel mechanisms by which AI systems gain capabilities, but I think you haven’t said much concrete about why existing alignment work couldn’t address these mechanisms. This looks to me like a pretty unproductive stance; I suspect you are wrong about the shape of the problem, but if you are right then I think your main realistic path to impact involves saying something more concrete about why you think this.
I worry that some alignment researchers have become averse to empiricism and rigorous justification of claims. LessWrong provides a low-stakes environment to share sketches of ideas that will be formalized elsewhere. But some never seem to escape the stylistic pull of the internet blog format. Mixing informal and formal techniques creates mimetic risk for the alignment researcher.
Writers are usually upfront when their thoughts are cursory or speculative. The “epistemic status” label is meant to be an honest acknowledgement that different posts come with different strengths of claims. But in practice, the label functions as a permission slip to say whatever one wants. As long as you’re honest that your claims are groundless, then there’s nothing to stop you from making your claims! As Slavoj Žižek put it in a defense of psychoanalysis:
Isn’t my virtual persona in a way ‘more real than reality’? Isn’t it precisely because I am aware that this is ‘just a game’ that in it I can do what I would never be able to in the real world?
Can’t writing on LessWrong also feel like a game, where the objective is to cleverly restate your idea with mathematics or tie it back to a niche concept from The Sequences?
There is a lot of good discourse in the community. I do not mean to devalue how important some of the problems raised by agent foundations can be. But we should be careful not to reside too long in the “virtual” world of speculative discourse, which leads to play and clever anecdotes, rather than scientific advancement and technological change.
[1] Écrits, trans. Bruce Fink (Norton, 2006), 672
[2] Translation by Claude