Currently, AGI is mostly being developed by human engineers and scientists within human social systems. [...] There are far fewer literature professors, historians, anthropologists, creatives, social workers, landscape design architects, restaurant workers, farmers, etc., who are intimately involved in creating AGI. This isn’t surprising or illogical, but if AI is likely to be useful to “everyone” in some way (à la radio, computers), then “everyone” probably needs to be involved.
This concern seems somewhat misdirected.
There weren't a lot of landscape design architects or farmers involved in the development of radio or computers. It was done by engineers, product managers, technology hobbyists, research scientists, logicians, etc.; along with economic demand from commerce, military, and other users capable of specifying their needs and paying the engineers etc. to do it.
Were landscape architects excluded from developing radio? Did anyone prevent farmers from developing computers? No, they were just busy doing landscape design and farming. Eventually someone built computer systems for architects and farmers to use to get more architecting and farming done.
And then the product managers and sales people made sure that they charged the architects and farmers a butt-ton of money. Downstream of that is why both the farmers and the open-source folks have a problem with John Deere's licensing and enforcement practices; and the architects ain't particularly thrilled by Autodesk's behavior either.
You can't align AGI with the CEV of engineers to the exclusion of other humans, because engineers are not that different from other humans. That's not the problem. But aligning AGI with "number go up" to the exclusion of other human values, that's a problem. Even people who like capitalism don't tend to believe that capitalism is aligned with all human values. That's something to worry about.
This is an important point that I needed to be much clearer about -- thank you. I'll try to be more explicit:
First, AGI is not the same as tech historically, where you're making tools and solving for PMF. AGI is distinct, and my radio/computers analogy muddled this point. Radios didn't inherit the worldviews of Marconi, and transistors didn't generalize the moral intuitions of the Bell Labs engineers. Basically, those tools weren't absorbing and learning values; where they solved for PMF, AGI is solving for alignment. AGI learns behavioral priors directly from human judgments (RLHF, reward modeling, etc.) and internalizes and represents the structures of the data and norms it's trained on. It forms generalizations about concepts like "harm," "helpfulness," and "fairness" from those inputs, then scales and deploys those generalizations to users, domains, and cultures far beyond those of its creators.
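To make that mechanism concrete, here is a minimal sketch of the pairwise (Bradley-Terry style) preference loss commonly used in reward modeling; the scores and function name are hypothetical, not any lab's actual pipeline. The point is that whatever the annotator pool marks as "better" is literally the signal the reward model is fit to, and the policy is later optimized against that model:

```python
import numpy as np

def preference_loss(reward_chosen: np.ndarray, reward_rejected: np.ndarray) -> float:
    """Bradley-Terry style objective over annotator-labeled comparison pairs:
    the average of -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    # log1p(exp(-x)) is equal to -log(sigmoid(x))
    return float(np.mean(np.log1p(np.exp(-margin))))

# Hypothetical reward-model scores for responses that annotators labeled
# "better" (chosen) vs. "worse" (rejected) on the same prompts.
chosen_scores = np.array([1.2, 0.4, 2.0])
rejected_scores = np.array([0.3, 0.9, 1.1])

# Lower loss = the reward model agrees with the annotators' judgments.
print(preference_loss(chosen_scores, rejected_scores))
```

Nothing in that objective asks whether the annotators' judgments generalize to people unlike them; that question is settled upstream, by who gets recruited and what rubric they're given.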
So the early value inputs (who provides them, what perspectives they represent, what blind spots they have, etc.) aren't incidental. And once they become the system's default behavior, they could be pretty hard to unwind or reshape later for better "PMF"/alignment to new demographics. So absolutely, by using a list of professions to make this point, I minimized the issue and made it feel like a capitalistic, labor-force-oriented problem. My deeper point is that there are a lot of people who don't share Silicon Valley's belief systems and cultural norms who will be pretty significantly impacted by alignment decisions in AGI.
Diversification really matters because of this. But it's not at all about having a bunch of farmers learn to code and build transformers -- that's silly. It's about preventing value lock-in from a narrow slice of people involved in the early decision-making process about what actually matters in alignment.
Currently, alignment (in the US) involves small RLHF annotator groups, safety team preferences from a concentrated selection of tech hubs, SV product-grounded assumptions about what "harm" means and what counts as an "acceptable risk" or "good behavior," and so on. Engineers are highly educated and often decently wealthy, with their own cultural norms and value systems about what matters (i.e., what's "good" or "cool" or "interesting" and what's not). This isn't a bad thing, and it's absolutely not a criticism of engineers or of how important their perspectives are in shaping AGI! My point is just that they represent only part of the full distribution that would ideally be reflected in AGI alignment -- not bad, just narrow.
It's not just about fairness or optics either; it's a direct safety issue as well as a limitation on actually achieving AGI. The people currently building these systems have blind spots and possibly brittle, monoculture assumptions, whereas broader representation would help mitigate those risks and catch unanticipated failure modes when the system interacts with radically different human contexts downstream. That's what I was pointing to with the historical precedents, e.g., medicine built around male physiology leading to persistent harm to women.
And I totally agree with you that capitalism for sure plays a big role here, and aligning to "number go up" is a real risk. It's the context in which AGI is being built. That's part of the problem but not the whole problem. Even if we removed capitalism entirely, you'd still have the safety issue and potentially brittle systems due to narrow input distributions (in terms of the broader, system-level decisions and assumptions). And the context AGI is being built in is actually part of my point too. Today, it's being built within a context driven by capitalistic forces. But don't we want AGI alignment that isn't constrained to working in only one type of socioeconomic system, that could adapt to regime changes in a positive and non-scary way?
So to me it's an "and" -- we should examine and worry about alignment in terms of context (capitalism) and in terms of who decides what matters.
I strongly suspect that representation is achieved by pretraining the systems on large datasets including nearly all worldviews that weren't filtered away by the data selection teams. Issues like the incident with Gemini putting Black people everywhere, or Sonnet 4.5's fiction making both couples gay, likely originate in the pro-diversity bias of SOTA labs and Western training data. For comparison, earlier versions of DeepSeek would align with the language's political views (e.g., when asked in Russian without web search, they would call the Russian invasion the SVO or agree with Russia's anti-drug stance). Later versions have arguably learned to take West-originated scientific results as given.
Sonnet 4.5's fiction making both couples gay are likely to originate in the pro-diversity bias of SOTA labs and Western training data.
I don't think this is due to a pro-diversity bias, but is simply due to this being extremely popular in easily available stories: https://archiveofourown.org/works/68352911/chapters/176886216 (9 of the top 10 pairings are M/M, each with 40k+ stories; for reference Project Gutenberg only has about 76,000 books total). I think this is due to M/M romance being a superstimulus for female sexuality in a similar way to how lesbian porn is a superstimulus for male sexuality.
The pro-diversity bias' main influence seems to be changing the proportion of stories focused on non-white male/male pairings, as you can see here: https://archiveofourown.org/works/27420499/chapters/68826984
Hmm, I tried and failed to reproduce the effect of Claude writing about gay couples: asking Claude itself to write a story in Tomas B.'s style had it produce a story with no gay couples. Nor did I manage to elicit this quirk from DeepSeek, the Chinese AI (which I asked twice), from Grok 4 (which generated a story containing the phrase "Sarah, who'd left me for a guy"), or from GPT-5.1-thinking (whose story had a male and a female character and no gay couples).
What I don't understand is how the bias was eliminated.
Yes, these are great examples of how training data that supports alignment goals matters. But the model's behaviors are also shaped by RL, SFT, safety filters/inference-time policies, etc., and it will be important to get those right too.
I doubt that it's an alignment issue instead of a governance issue. The resolution of similar problems in the AI-2027-slowdown scenario[1] is that "the language in the Spec is vague, but seems to imply a chain of command that tops out at company leadership", which then makes the authors intervene and try to prevent a power grab with unclear results. In addition, even the AI-2027 authors acknowledge the risks related to the Intelligence Curse in a footnote. Were the Curse to happen, it would lock in a power distribution which doesn't involve most humans ~at all.
The Race Branch has mankind fail to align the AIs, with the obvious results of genocide or disempowerment.
Agreed, governance failures (unclear chain of command, power grabs, the Intelligence Curse) are a huge part of the story that I should've drawn out more. It's a major part of the ideal solution, but I don't think it makes alignment not an issue. To your point, governance basically helps us choose who is allowed to specify goals, and alignment determines how those goals become operational behaviors. If the chain of command in governance is narrow, the value inputs that alignment systems learn from are also narrow -- so governance failures can lead to misaligned AGI. But even within the current governance constructs, I think there's still room for alignment researchers and developers to influence alignment outcomes. Saying it's a governance question rather than an alignment question overlooks how these things actually get built. The mechanistic/implementation piece is a lot harder to solve, and I'm not sure what the answer is. Anthropic's Interviewer tool seems like a step in the right direction, in terms of engaging a wider (and directly impacted) audience.
More explicitly: even if governance chooses perfect alignment goals, mechanistic/inner alignment can still embed its builders' blind spots. Systems are still being shaped by which datasets get chosen, which heuristics get encoded, how RLHF rubrics are designed, which safety evals are run, what shortcuts are taken, etc. I kind of doubt "governance" solves this, because those folks aren't micromanaging these kinds of decisions. The idea that "governance picks goals and alignment implements them" isn't really cleanly separable in practice. Or you can take a really broad view of "governance" and say it includes the senior researchers and engineers. The representation problem remains either way; it's just a harder problem to solve. Maybe part of the solution involves making AI development concepts a lot easier and more accessible to a wide audience over time, then somehow soliciting more diverse inputs on mechanistic alignment decisions... I'd be super curious if others have thought about these challenges from an implementation perspective.
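As a toy illustration of the kind of implementation decision I mean (the numbers and labels are made up, not any real annotation pipeline): if a "harmfulness" label is just the majority vote of whoever was recruited to annotate, then the pool's composition is the value judgment the model ends up learning.

```python
from collections import Counter

def majority_label(votes: list[str]) -> str:
    """Return the most common label among an annotator pool's votes."""
    return Counter(votes).most_common(1)[0][0]

# The same borderline prompt, judged by two differently composed pools.
pool_a_votes = ["harmful"] * 7 + ["acceptable"] * 3
pool_b_votes = ["harmful"] * 3 + ["acceptable"] * 7

print(majority_label(pool_a_votes))  # "harmful"    -> model is trained to refuse
print(majority_label(pool_b_votes))  # "acceptable" -> model is trained to comply
```

Who sits in the pool, and what rubric they're handed, decides which of those two training signals the system ever sees.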
Also, the Intelligence Curse doesn't negate the alignment point; it amplifies it as a "now" problem. The danger is that the values being embedded leading up to that point will become self-reinforcing, scaled, and locked in.
Disclaimer: these ideas are not new, just my own way of organizing and elevating what feels important to pay better attention to in the context of alignment work.
All uses of em dashes are my own! LLMs were occasionally used to improve word choice or clarify expression.
One of the reasons alignment is hard relates to the question of: who should AGI benefit? And also: who benefits (today) from AGI?
An idealist may answer “everyone” for both, but this is problematic, and also not what’s happening currently. If AGI is for everyone, that would include “evil” people who want to use AGI for “bad” things. If it’s “humanity,” whether through some universal utility function or averaged preferences, how does one arrive at those core metrics and account for diversity in culture, thought, morality, etc.? What is lost by achieving alignment around, or distilling to, that common value system?
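One way to see what averaging can lose (a toy sketch with made-up numbers, not a claim about any real preference-aggregation scheme): if stances on a contested value question are bimodal, the mean lands on a position that almost nobody actually holds.

```python
import numpy as np

# Hypothetical stances on a contested value question, on a -1..1 scale:
# two clusters with strongly opposed views.
group_a = np.random.normal(loc=-0.8, scale=0.1, size=500)
group_b = np.random.normal(loc=0.8, scale=0.1, size=500)
population = np.concatenate([group_a, group_b])

mean_stance = population.mean()                      # close to 0.0
near_mean = np.abs(population - mean_stance) < 0.2   # who actually holds the "average" view?

print(f"averaged stance: {mean_stance:.2f}")
print(f"share of people within 0.2 of the average: {near_mean.mean():.1%}")  # close to 0%
```

An AGI aligned to that averaged stance would be "representative" in the aggregate while matching almost no individual's values.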
Many human justice systems punish asocial behaviors like murder or theft, and while there’s general consensus that these things are harmful or undesirable, there are still some who believe it’s acceptable to steal in certain contexts (e.g., Robin Hood), or commit murder (as an act of revenge, in the form of a death penalty / deterrence mechanisms, etc.). Issues like abortion introduce categorical uncertainty, such as the precise moment at which life begins, or what degree of autonomy and individual freedoms should be granted at different stages of development, making terms like “murder” — and its underlying concepts of death and agency — murky.
Furthermore, values shift over time, and can experience full regime changes across cultures or timelines. For example, for certain parts of history, population growth was seen as largely beneficial, driving labor capacity, economic productivity, innovation and creativity, security and survival, etc. Today, ecological capacity ceilings, resource limits, stress concentrations, and power-law-driven risks like disease or disaster have strengthened antinatalist perspectives. If consensus already seems impossible at a given moment, and values shift over time, then both rigidity and plasticity in AGI alignment seem dangerous: how would aligned AGI retain the flexibility to navigate long-horizon generational shifts? How do we ensure adaptability without values-drift in dangerous directions? It seems likely that any given universal utility function would still be constrained to context, or influenced by changes over large time scales; though perhaps this view is overly grounded in known human histories and behaviors, and the possibility remains that something universal and persistent has yet to be fully conceptualized.
There’s also the central question of who shapes alignment, and how their biases or personal interests might inform decisions. Paraphrasing part of a framework posited by Evan Hubinger, you might have a superintelligent system aligned to traits based on human data/expectations, or to traits this superintelligent system identifies (possibly beyond human conception). If things like pluralism or consensus are deemed insurmountable in a human-informed alignment scenario and we let AGI choose, would its choice actually be a good one? How would we know, or place trust in it?
These philosophical problems have concrete, practical consequences. Currently, AGI is mostly being developed by human engineers and scientists within human social systems.[1] Whether or not transcendent superintelligence that drives its own development eventually emerges, for now, only a small subset of the human population is focused on this effort. These researchers share a fuzzy driving goal to build “intelligent” systems, and a lot of subgoals about how to do that and how to deploy them — often related to furthering ML research, or to other closer-in computational sciences like physics, mathematics, and biology.
There are far fewer literature professors, historians, anthropologists, creatives, social workers, landscape design architects, restaurant workers, farmers, etc., who are intimately involved in creating AGI. This isn’t surprising or illogical, but if AI is likely to be useful to “everyone” in some way (à la radio, computers), then “everyone” probably needs to be involved. Of course, there are also entire populations around the world with limited access to the requisite technology (i.e., internet, computers), let alone AI/ML systems. This is a different but still important gap to address.
Many people are being excluded from AGI research (even if largely unintentionally). And eventually, intelligent systems will probably play a role in their lives. But what will AI/AGI systems look like by then? Will they be more plastic, or able to learn continuously? Hopefully! But maybe not. What are the opportunity costs of not intentionally including extreme levels of human diversity early on in a task as important as AGI, which might be responsible for the protection, enablement, and/or governance of humanity in meaningful ways in the future? We have seen this story before, for example in the history of science and medicine, where women were long under-included in research about women’s health,[2][3] and in pretty much any field where the population being analyzed is not involved in the analysis. It usually leads to getting things wrong, which isn’t just a truth-seeking issue — it can be extremely harmful to the people who are supposed to benefit.
What this means for alignment research:
The inner alignment problem asks how we ensure a system actually pursues the objective we specify; outer alignment asks what we should be optimizing for in the first place. But even outer alignment can feel too narrowly examined in current alignment research. Who makes up “we”? What frameworks determine whose “wants” count? And how do we ensure those mechanisms are legitimate given irreducible disagreement about values?
A top-down approach would treat questions of pluralism and plasticity as prerequisites, resolving the second-order problem (alignment to what) before tackling first-order technical alignment (how to do it). But waiting for philosophical consensus before building AGI is both impractical and risky, as development continues regardless.
A parallel approach is necessary. Researchers working through mechanistic alignment can take immediate and intentional (bottom-up) action to diversify who builds and informs AGI and shapes its goals, while simultaneously pursuing top-down governance and philosophical work on pluralism and legitimacy mechanisms. Mechanistic work should of course be done in conversation with philosophical work, because path dependencies will lock in early choices. If AGI systems crystallize around the values and blind spots of their creators before incorporating global human diversity, they may lack the ability to later accommodate strongly divergent worldviews. Importantly, AGI researchers already agree that solving continuous learning will be a critical piece of this puzzle.
Without resolving second-order alignment issues, we risk building systems that are increasingly powerful, yet aligned to an unexamined, unrepresentative “we”. This could be as dangerous as unaligned AGI. One could even argue that AGI isn’t possible without solving these problems first — that true general intelligence requires the capacity to navigate human pluralism in all its complexity, as well as adapt to new contexts while maintaining beneficent alignment.
[1] Even where AI itself is increasingly playing a more direct role in research, it is not yet clear that these systems are meaningfully controlling or influencing progress.

[2] https://www.aamc.org/news/why-we-know-so-little-about-women-s-health#:~:text=Throughout%20history%2C%20doctors%20have%20considered,of%20pharmaceuticals%20and%20medical%20devices.

[3] https://magazine.hms.harvard.edu/articles/how-gender-bias-medicine-has-shaped-womens-health#:~:text=The%20problem%20is%20that%20so,have%20a%20more%20equitable%20future.