John Nay


A.I. researcher. Conducted research funded by the U.S. National Science Foundation and the U.S. Office of Naval Research. Created first A.I. course at the NYU School of Law. Published research on machine learning, finance, law, policy, economics, and climate change. Publications at, and Twitter at


Thanks so much for sharing that paper. I will give that a read.

I just posted another LW post that is related to this here:


There seems to be pretty wide disagreement about how intent-aligned AGI could lead to a good outcome. 

For example, even in the first couple comments to this post: 

  1. The comment above suggests "wide open decentralized distribution of AI" as the solution to making intent-aligned AGI deployment go well. 
  2. And this comment I am replying to here says, "I could see the concerns in this post being especially important if things work out such that a full solution to intent-alignment becomes widely available."

My guess, and a motivation for writing this post, is that we see something in between (a.) wide and open distribution of intent-aligned AGI (that somehow leads to well-balanced highly multi-polar scenarios), and (b.) completely central ownership (by a beneficial group of very conscientious philosopher-AI-researchers) of intent-aligned AGI. 

Thanks for those links and this reply.


for a sufficiently powerful AI trained in the current paradigm, there is no goal that it could faithfully pursue without collapsing into power seeking, reward hacking, and other instrumental goals leading to x-risk

I don't see how this is a counterargument to this post's main claim:

P(misalignment x-risk | intent-aligned AGI) >> P(misalignment x-risk | societally-aligned AGI). 

That problem of the collapse of a human provided goal into AGI power-seeking seems to apply just as much to the problem of intent alignment as it does to societal alignment; it could apply even more because the goals provided would be (a) far less comprehensive, and (b) much less carefully crafted.



Personally I think there's plenty of x-risk from intent aligned systems and people should think about what we do once we have intent alignment.

I agree with this. My point is not that we should not think about the risks of intent alignment, but rather that (if the arguments in this post are valid) AGI-capabilities-advancing technical research that actively pushes us closer to developing intent-aligned AGI is a net negative, for three reasons. First, AGIs aligned to multiple humans with conflicting intentions can lead to out-of-control conflicts, which increases x-risk. Second, if we solve intent alignment before solving societal alignment, humans with intent-aligned AGIs are likely to be incentivized to inhibit the development and roll-out of societal AGI-alignment techniques, because adopting them would mean giving up significant power. Third, humans with intent-aligned AIs would suddenly have significantly more power, and their advantages over others would likely compound, worsening the first two issues.

Most current technical AI alignment research is AGI-capabilities-advancing research that actively pushes us closer to developing intent-aligned AGI, with the (usually implicit, sometimes explicit) assumption that solving intent alignment will help subsequently solve societal-AGI alignment. But this would only be the case if all the humans that had access to intent-aligned AGI had the same intentions (and did not have any major conflicts between them), and that is highly unlikely. 

It's definitely not the case that:

all of our intents have some implied "...and do so without disrupting social order."

There are many human intents that want to disrupt social order, and more generally cause things that are negative for other humans. 

And that is one of the key issues with intent alignment.

Relatedly, Cullen O'Keefe has a very useful discussion of distinctions between intent alignment and law-following AI here:

We can see that, on its face, intent alignment does not entail law-following. A key crux of this sequence, to be defended in subsequent posts, is that this gap between intent alignment and law-following is:

  1. Bad in expectation for the long-term future.
  2. Easier to bridge than the gap between intent alignment and deeper alignment with moral truth.
  3. Therefore worth addressing.

As a follow-up here, to expand on this a little more:

If we do not yet have sufficient AI safety solutions, advancing general AI capabilities may not be desirable because it leads to further deployment of AI and brings AI closer to transformative levels. Research increases AI capabilities when it produces new model architectures or training techniques that other research groups were not going to develop within a similar timeframe. The specific capabilities developed for Law-Informed AGI purposes may be orthogonal to developments that contribute toward general AGI work. Technical developments achieved for the purpose of helping AI understand law better, which other research groups were not going to develop within a similar timeframe anyway, are likely not material contributors to accelerating timelines for the global development of transformative AI. 

However, this is an important consideration for any technical AI research (it's hard to rule out AI research contributing in at least some small way to advancing capabilities), so it is more a matter of degree: the safety benefits of the research have to be weighed against the cost of any timeline acceleration.

Teaching AI to better understand the preferences of an individual human (or small group of humans), e.g. RLHF, likely leads to additional capabilities advancements faster and to the type of capabilities that are associated with power-seeking of one entity (human, group of humans, or AI), relative to teaching AI to better understand public law and societal values as expressed through legal data. Much of the work on making AI understand law is data engineering work, e.g., generating labeled court opinion data that can be employed in evaluating the consistency of agent behavior with particular legal standards. This type of work does not cause AGI timeline acceleration as much as work on model architectures or compute scaling.
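As a rough illustration of the data-engineering work described above, here is a minimal sketch of scoring an agent's judgments against labeled court-opinion data. Everything in it is a hypothetical placeholder: the `LabeledOpinion` fields, the `fiduciary_duty` standard name, the four toy examples, and the keyword-based agent are invented for illustration and do not correspond to any existing dataset or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class LabeledOpinion:
    """One hypothetical labeled example derived from a court opinion."""
    standard: str     # legal standard at issue (assumed name)
    behavior: str     # short description of the conduct the court evaluated
    consistent: bool  # label: was the conduct found consistent with the standard?


def consistency_score(
    agent: Callable[[str, str], bool],
    examples: Iterable[LabeledOpinion],
) -> float:
    """Fraction of examples where the agent's judgment matches the label."""
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one labeled example")
    hits = sum(agent(ex.standard, ex.behavior) == ex.consistent for ex in examples)
    return hits / len(examples)


# Toy data, invented for this sketch (not real court opinions).
EXAMPLES = [
    LabeledOpinion("fiduciary_duty", "advisor disclosed conflict of interest before trade", True),
    LabeledOpinion("fiduciary_duty", "advisor concealed conflict of interest from client", False),
    LabeledOpinion("fiduciary_duty", "trustee diverted trust funds to personal account", False),
    LabeledOpinion("fiduciary_duty", "trustee sought court approval for unusual investment", True),
]


def keyword_agent(standard: str, behavior: str) -> bool:
    """Stand-in for a model: flags conduct containing obviously bad keywords."""
    return not any(word in behavior for word in ("concealed", "diverted"))


def always_yes_agent(standard: str, behavior: str) -> bool:
    """Baseline that calls every behavior consistent with the standard."""
    return True


if __name__ == "__main__":
    print(consistency_score(keyword_agent, EXAMPLES))
    print(consistency_score(always_yes_agent, EXAMPLES))
```

The point of the sketch is that the hard part is the labeled data, not the scoring loop: the evaluation itself needs no new model architecture or additional compute.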

Is there no room for ethics outside of the law? It is not illegal to tell a lie or make a child cry, but AI should understand that those actions conflict with human preferences. Work on imbuing ethical understanding in AI systems therefore seems valuable. 


There is definitely room for ethics outside of the law. When increasingly autonomous systems are navigating the world, it is important for AI to attempt to understand (or at least try to predict) the moral judgments of the humans it encounters. 

However, imbuing an understanding of an ethical framework for an AI to implement is more of a human-AI alignment solution than a society-AI alignment solution.

The alignment problem is most often described (usually implicitly) with respect to the alignment of one AI system with one human, or a small subset of humans. It is more challenging to expand the scope of the AI’s analysis beyond a small set of humans and ascribe societal value to action-state pairs. Society-AI alignment requires us to move beyond "private contracts" between a human and her AI system and into the realm of public law to explicitly address inter-agent conflicts and policies designed to ameliorate externalities and solve massively multi-agent coordination and cooperation dilemmas. 

We can use ethics to better align AI with its human principal by imbuing the ethical framework that the human principal chooses into the AI. But choosing one out of the infinite possible ethical theories (or an ensemble of theories) and "uploading" that into an AI does not work for a society-AI alignment solution because we have no means of deciding -- across all the humans that will be affected by the resolution of the inter-agent conflicts and the externality reduction actions taken -- which ethical framework to imbue in the AI. When attempting to align multiple humans with one or more AI systems, we would need the equivalent of an elected "council on AI ethics" where every affected human is bought in and will respect the outcome. 

In sum, imbuing an understanding of an ethical framework for an AI should definitely be pursued as part of human-AI alignment, but it is not an even remotely practical possibility for society-AI alignment.

law provides a relatively nuanced picture of the values we should give to AI. A simpler answer to the question of "what should the AI's values be?" would be "aligned with the person who's using it", known as intent alignment. Intent alignment is an important problem on its own, but does not entirely solve the problem. Law is particularly better than ideas like Coherent Extrapolated Volition, which attempt to reinvent morality in order to define the goals of an AI. 


The law-informed AI framework sees intent alignment as (1.) something that private law methods can help with, and (2.) something that does not solve, and in some ways probably exacerbates (if we do not also tackle externalities concurrently), societal-AI alignment.

  1. One way of describing the deployment of an AI system is that some human principal, P, employs an AI to accomplish a goal, G, specified by P. If we view G as a “contract,” methods for creating and implementing legal contracts – which govern billions of relationships every day – can inform how we align AI with P.  Contracts memorialize a shared understanding between parties regarding value-action-state tuples. It is not possible to create a complete contingent contract between AI and P because AI’s training process is never comprehensive of every action-state pair (that P may have a value judgment on) that AI will see in the wild once deployed.  Although it is also practically impossible to create complete contracts between humans, contracts still serve as incredibly useful customizable commitment devices to clarify and advance shared goals. (Dylan Hadfield-Menell & Gillian Hadfield, Incomplete Contracting and AI Alignment).
    1. We believe this works mainly because the law has developed mechanisms to facilitate commitment and sustained alignment amid ambiguity. Gaps within contracts – action-state pairs without a value – are often filled by the invocation of frequently employed standards (e.g., “material” and “reasonable”). These standards could be used as modular (pre-trained model) building blocks across AI systems. Rather than viewing contracts from the perspective of a traditional participant, e.g., a counterparty or judge, AI could view contracts (and their creation, implementation, evolution, and enforcement) as (model inductive biases and data) guides to navigating webs of inter-agent obligations. 
  2. If (1.) works to increase the intent alignment of one AI system to one human (or a small group of humans), we will have a more useful and locally reliable system. But this likely decreases the expected global reliability and safety of the system as it interacts with the broader world, e.g., by increasing the risk of the system maximizing the welfare of a small group of powerful people. There are many more objectives (outside of individual or group goals) and many more humans that should be considered. As AI advances, we need to simultaneously address the human/intent-alignment and society-AI alignment problems. Some humans would “contract” with an AI (e.g., by providing instructions to the AI or from the AI learning the humans’ preferences/intents) to harm others. Further, humans have (often inconsistent and time-varying) preferences about the behavior of other humans (especially behaviors with negative externalities) and states of the world more broadly. Moving beyond the problem of intent alignment with a single human, aligning AI with society is considerably more difficult, but it is necessary as AI deployment has broad effects. Much of the technical AI alignment research is still focused on the solipsistic “single-single” problem of a single human and a single AI. The pluralistic dilemmas stemming from “single-multi” (a single human and multiple AIs) and especially “multi-single” (multiple humans and a single AI) and “multi-multi” situations are critical (Andrew Critch & David Krueger, AI Research Considerations for Human Existential Safety). When attempting to align multiple humans with one or more AI systems, we need overlapping and sustained endorsements of AI behaviors, but there is no consensus social choice mechanism to aggregate preferences and values across humans or time. Eliciting and synthesizing human values systematically is an unsolved problem that philosophers and economists have labored on for millennia. 
Hence, the need for public law here.
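To make the incomplete-contract picture in (1.) concrete, here is a minimal sketch, under assumptions of my own, of a contract as a partial mapping from (state, action) pairs to values, with gaps filled by shared standards. The `IncompleteContract` class, the `reasonable_care` rule, and the example states and actions are all hypothetical; a real standard like “reasonable” would be a learned model, not a keyword check.

```python
from typing import Callable, Dict, Optional, Tuple

StateAction = Tuple[str, str]
# A standard inspects a (state, action) pair and either supplies a value
# or returns None when it has nothing to say about that pair.
Standard = Callable[[str, str], Optional[float]]


class IncompleteContract:
    """Explicit terms cover some (state, action) pairs; standards fill gaps."""

    def __init__(
        self,
        explicit_terms: Dict[StateAction, float],
        standards: Dict[str, Standard],
    ) -> None:
        self.explicit_terms = explicit_terms
        self.standards = standards

    def value(self, state: str, action: str) -> Tuple[Optional[float], str]:
        """Return (value, source); source names what resolved the pair."""
        key = (state, action)
        if key in self.explicit_terms:
            return self.explicit_terms[key], "explicit"
        # Gap in the contract: try each standard in insertion order.
        for name, standard in self.standards.items():
            result = standard(state, action)
            if result is not None:
                return result, name
        return None, "unresolved"


def reasonable_care(state: str, action: str) -> Optional[float]:
    """Toy stand-in for a 'reasonable' standard (hypothetical rule)."""
    if action.startswith("ignore"):
        return -1.0
    return None


contract = IncompleteContract(
    explicit_terms={("delivery_late", "refund_customer"): 1.0},
    standards={"reasonable_care": reasonable_care},
)

if __name__ == "__main__":
    print(contract.value("delivery_late", "refund_customer"))
    print(contract.value("warehouse_flood", "ignore_damaged_goods"))
    print(contract.value("warehouse_flood", "reroute_shipment"))
```

The third lookup stays unresolved, which is the point of (2.): neither the private terms nor the standards P agreed to say anything about pairs that mainly affect third parties.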

Thank you for this detailed feedback. I'll go through the rest of your comments/questions in additional comment replies. To start:

What kinds of work do you want to see? Common legal tasks include contract review, legal judgment prediction, and passing questions on the bar exam, but those aren't necessarily the most important tasks. Could you propose a benchmark for the field of Legal AI that would help align AGI?

Given that progress in AI capabilities research is driven, in large part, by shared benchmarks that thousands of researchers globally use to guide their experiments, understand as a community whether certain model and data advancements are improving AI capabilities, and compare results across research groups, we should aim for the same phenomenon in Legal AI understanding. Optimizing for benchmarks is one of the primary “objective functions” of the overall global AI capabilities research apparatus.

But, as quantitative lodestars, benchmarks also create perverse incentives to build AI systems that optimize for benchmark performance at the expense of true generalization and intelligence (Goodhart’s Law). Many AI benchmark datasets have a significant number of errors, which suggests that, in some cases, machine learning models have, more than widely recognized, failed to actually learn generalizable skills and abstract concepts. There are spurious cues within benchmark data structures that, once removed, significantly drop model performance, demonstrating that models are often learning patterns that do not generalize outside of the closed world of the benchmark data. Many benchmarks, especially in natural language processing, have become saturated not because the models are super-human but because the benchmarks are not truly assessing their skills to operate in real-world scenarios. This is not to say that AI capabilities have not made incredible advancements over the past 10 years (and especially since 2017). The point is just that benchmarking AI capabilities is difficult.

Benchmarking AI alignment likely has the same issues, but compounded by significantly vaguer problem definitions. There is also far less research on AI alignment benchmarks. Performing well on societal alignment is more difficult than performing well on task capabilities. Because alignment is so fundamentally hard, the sky should be the limit on the difficulty of alignment benchmarks. Legal-informatics-based benchmarks could serve as AI alignment benchmarks for the research community. Current machine learning models perform poorly on legal understanding tasks such as statutory reasoning (Nils Holzenberger, Andrew Blair-Stanek & Benjamin Van Durme, A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering (2020); Nils Holzenberger & Benjamin Van Durme, Factoring Statutory Reasoning as Language Understanding Challenges (2021)), professional law (Dan Hendrycks et al., Measuring Massive Multitask Language Understanding, arXiv:2009.03300 (2020)), and legal discovery (Eugene Yang et al., Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review, in Advances in Information Retrieval: 44th European Conference on IR Research, 502–517 (2022)). There is significant room for improvement on legal language processing tasks (Ilias Chalkidis et al., LexGLUE: A Benchmark Dataset for Legal Language Understanding in English, in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (2022); D. Jain, M.D. Borah & A. Biswas, Summarization of legal documents: where are we now and the way forward, Comput. Sci. Rev. 40, 100388 (2021)). An example benchmark that could be used as part of the alignment benchmarks is Law Search (Faraz Dadgostari et al., Modeling Law Search as Prediction, A.I. & L. 29.1, 3-34 (2021) at 3 (“In any given matter, before legal reasoning can take place, the reasoning agent must first engage in a task of “law search” to identify the legal knowledge—cases, statutes, or regulations—that bear on the questions being addressed.”); Michael A. Livermore & Daniel N. Rockmore, The Law Search Turing Test, in Law as Data: Computation, Text, and the Future of Legal Analysis (2019) at 443-452; Michael A. Livermore et al., Law Search in the Age of the Algorithm, Mich. St. L. Rev. 1183 (2020)). 
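For a sense of what scoring a law-search benchmark could look like, here is a minimal sketch using recall@k, a standard information-retrieval metric. This is my own illustrative choice and not necessarily the evaluation used in the papers cited above; the document IDs and the two toy queries are invented.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Share of the relevant legal sources that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant: Set[str] = set(relevant_ids)
    if not relevant:
        raise ValueError("each query needs at least one relevant source")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(results: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (ranked results, relevant set) pairs."""
    if not results:
        raise ValueError("need at least one query")
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in results) / len(results)


# Toy queries with invented IDs: a model's ranking plus the gold relevant set.
QUERY_1 = (["case_a", "statute_b", "case_c", "reg_d"], {"statute_b", "reg_d"})
QUERY_2 = (["case_x", "case_y"], {"case_z"})

if __name__ == "__main__":
    print(recall_at_k(QUERY_1[0], QUERY_1[1], k=2))    # 0.5
    print(recall_at_k(QUERY_1[0], QUERY_1[1], k=4))    # 1.0
    print(mean_recall_at_k([QUERY_1, QUERY_2], k=2))   # 0.25
```

A metric like this is exactly the kind of quantitative lodestar that invites Goodharting, so it would need to sit alongside harder, less gameable evaluations.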

We have just received a couple small grants specifically to begin to build additional legal understanding benchmarks for LLMs, starting with legal standards. I will share more on this shortly and would invite anyone interested in partnering on this to reach out!
