If your plan for making AI go well requires figuring out once-and-for-all the correct ontology and meta-ethics in the next year, that plan is approximately hopeless, and you should switch to lobbying, which is merely very unlikely to work, rather than predictably guaranteed not to.
To be fair, other alignment plans that omit the "solve ontology and ethics" step don't do so because they've figured out how to avoid needing to solve those problems. They just pretend those problems don't exist, which is even more hopeless.
I don’t think that’s true at all. The point is not to solve ontology and ethics, but to design an AGI that is safe even though your understanding of ontology and (your own) ethics is and likely always will be wrong / shifting. Many agendas attempt to do this.
Could you point me at an agenda you think is a good example, i.e. one that, if solved, would make AI safe even with minimal understanding of ontology and ethics? Specifically, an agenda that isn't super heavy on corrigibility: I can see how corrigibility can avoid having to solve ethics, but my impression is that most agendas aren't aiming at strong corrigibility.
If you think that corrigibility is an answer to your question, why rule it out? My favorite approaches have that flavor, such as the work of Michael Cohen on “unambitious” versions of AIXI.
The natural abstractions agenda is another example, which directly grapples with ontological crisis.
Disagree, I think "figuring out once-and-for-all the correct ontology and meta-ethics in the next year" is a much better plan than lobbying, because after a year you'll probably have made interesting conceptual progress, which you can then build on later.
I agree it's a terrible plan conditional on nothing you do after the next year mattering, but it seems to me like Mitchell is making two mistakes that kinda cancel out, and you're telling him to get rid of one mistake in a way which makes the overall plan worse.
I see you assuming that doing lobbying for a year gives you nothing that you can build on. I don't agree. If you do lobbying for a year you will at least get better at lobbying.
Let me belatedly express thanks for the support.
About this:
I agree it's a terrible plan conditional on nothing you do after the next year mattering
I take this to mean that, if frontier AI (e.g. perhaps scaffolded Mythos) really were going to take over the world later this year, it would be better to work on a global pause on frontier AI than to try to solve CEV-level alignment.
This is like choosing which miracle to aim at: the political miracle of halting the worldwide capabilities race in AI, or the scientific miracle of resolving all the safety issues in time. I don't know which is objectively more likely to be achieved within eight months starting from today. But the second feels more thinkable to me, partly because I'm simply more familiar with the technical world and its intellectual crown jewels, the big sweeping theories and frameworks which are our best evidence that the human mind can achieve breakthrough solutions to fundamental problems. Such breakthroughs typically have a big distinct idea at their heart, which then has massive ramifications. To have our scientific miracle, above all, we need the right ideas.
As for the political miracle... the volatility of the political and geopolitical world is such that I do not absolutely rule out a surprise last-minute coalition that stops AI progress. But it would take a remarkably potent political idea to overcome the wealth-seeking and power-seeking forces that are driving the race.
Like I said, I think you're making two mistakes that cancel out, so I don't want to try to argue you out of the second mistake. I think the things you're focusing on are important questions, which I'm also working on myself. I will have some posts coming out explaining my perspective on them soon; in the meantime, the best summary I have is this post.
The main thing I want to point at is that "suppose this is the final year before humanity loses control to AI. What should I do, where should I focus?" is just a bizarre starting point. I expect that if you carefully scrutinize the reasons why you are making your research plan contingent on that supposition, you will find that they are significantly confused.
For example, some people (especially EAs) implicitly reason "there's a 10% chance of AGI takeover by year X. But a 10% x-risk is really bad! Therefore I should focus my efforts on preventing AGI takeover by year X." This logic clearly doesn't stand up even on its own terms. I don't think you're making quite that mistake but probably something in the same broad family.
(Probably won't reply further, since I'm working on some posts that analyze these kinds of mistakes more generally, which seems more productive.)
Having the right simple idea (Darwin, Turing) can be the kernel of everything. The correct ontology could be something like the correct understanding of entanglement in quantum gravity, plus a precisely stated panprotopsychism that implies consciousness at the human level. The correct meta-ethics may be one of the known proposals (example by a CEV theorist), grounded in the correct ontology's account of intentionality.
I mention these concrete proposals, not out of commitment to their correctness, but as examples of what the kernel of an answer could look like.
I think that if you're trying to give your AI ethics at the outset or do something like writing down a CEV utility function, something's already gone deeply wrong[1], in the same way that capabilities-wise you shouldn't be hardcoding quantum mechanics into the AI. It's supposed to be superintelligent - it should learn what you/humans care about. The thing you need to figure out is how to get the AI to care about the thing that humans care about, which it then tries to learn and then optimize for.
Under this model, most of the difficulty would also be present in trying to get an AI that cares about the thing that some agent cares about, e.g. some aliens, or possibly chimpanzees and other animals, to the extent they have goals.
(in fact I think that somewhere from half to most of the problem is getting an AI that cares about any not super easy goal about the real world at all, like the classic "maximize the number of diamond atoms". If you knew how to actually build a literal paperclip maximizer, I would expect that you've figured out much of alignment)
[1] This includes it being the last-resort alternative to other actors doing even dumber stuff. I count corrigibility as an example here: hopefully it's easier, albeit worse.
I've been meaning to ask - in what sense are some states of entangled electrons more objectively different from other states of entangled electrons, than some microstates are objectively different from other microstates when it comes to their function (in the sense of functionalism)?
For the reader who is unfamiliar: This refers to a position I have previously taken with respect to ontology of mind. I will put a version of it here in quotes, for future reference. I apologize in advance for the length and complexity of my discussion below.
I have said: Qualia exist objectively, but functional states are inherently vague when it comes to microphysical details. There are edge cases, and a sorites problem, which prevent the definition of a functional state from being made microphysically exact for all physical states in a non-arbitrary way. But it needs to be exact if it is to be part of a psychophysical bridge law that specifies, for each possible physical world, what qualia that world contains.
In a dilemma between computation-based and substrate-based theories of consciousness, I have therefore preferred the latter, in the sense that I prefer a theory of qualia which is based on states and properties whose existence is just as objective as that of the qualia themselves.
At the same time, I have also said: Conscious states are complex unities in which numerous qualia are united in some way; perhaps they correspond to entangled states, these being complex in various ways, while also not being tensor products (tensor products being the standard way to construct a mereological sum in quantum mechanics).
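To make the contrast between tensor products and entangled states concrete, here is a minimal sketch in Python with NumPy, assuming the simplest case of two qubits. A pure state of a two-part system is a tensor product exactly when its Schmidt rank (the number of nonzero singular values of its coefficient matrix) is 1; `schmidt_rank` is a hypothetical helper written for this illustration, not a library function.

```python
import numpy as np

def schmidt_rank(state, dim_a, dim_b, tol=1e-10):
    """Number of nonzero Schmidt coefficients of a pure bipartite state.

    The state is a tensor product |a> (x) |b> exactly when this is 1;
    any larger value means the two parts are entangled.
    """
    # Reshape the amplitude vector into a dim_a x dim_b coefficient matrix;
    # its singular values are the Schmidt coefficients.
    coeffs = np.asarray(state, dtype=complex).reshape(dim_a, dim_b)
    singular_values = np.linalg.svd(coeffs, compute_uv=False)
    return int(np.sum(singular_values > tol))

zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)

# A product state: |0> (x) (|0> + |1>)/sqrt(2)
product = np.kron(zero, (zero + one) / np.sqrt(2))
# A Bell state: (|00> + |11>)/sqrt(2), which is not any tensor product
bell = (np.kron(zero, zero) + np.kron(one, one)) / np.sqrt(2)

print(schmidt_rank(product, 2, 2))  # 1
print(schmidt_rank(bell, 2, 2))     # 2
```

The same test works for any finite dimensions, which is why it is a convenient operational stand-in for "is this whole just a mereological sum of its parts?"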
But @green_leaf is asking, how are entangled states any more objective than functional states?
This is a valid question because, if you look at things from the perspective of a wavefunction of the universe, everything is entangled with everything else! There is a risk that constructing exact states for subsystems of the universe will once again require arbitrary choices.
But first a few words on distinctness of states and exactness of properties in quantum theory.
First of all, let me emphasize that according to the Copenhagen interpretation of quantum theory, wavefunctions are not "elements of physical reality"; they simply codify the knowledge of an observer. The elements of physical reality are the "observables". Schrödinger and Einstein criticized this framework as necessarily incomplete as a description of reality.
The most elegant heir to the Copenhagen interpretation is what I'll call the Hartle multiverse model (though Gell-Mann and Omnès worked on it too). This has a wavefunction of the universe, then a set of observables (e.g. field values and/or field momenta at particular space-time locations) whose possible values define a set of possible histories. If these histories all satisfy the technical property of being mutually decoherent, then each history inherits an a priori probability from the universal wavefunction, and you can derive ordinary quantum mechanics from conditional probabilities within this ensemble of possible histories.
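For readers who want to see the decoherence condition in action, here is a toy sketch in Python with NumPy, nothing like a wavefunction of the universe. It assumes a single qubit, a zero Hamiltonian (so Heisenberg-picture projectors do not change in time), and histories made of a z-basis projector at a first time followed by an x-basis projector at a second time. It uses the standard consistent-histories definitions: class operators C = P(t2) P(t1), decoherence functional D(α, β) = Tr(C_α ρ C_β†), the strict condition that all off-diagonal elements vanish, and probabilities read off the diagonal. The helper names are illustrative only.

```python
import itertools
import numpy as np

# Single-qubit projectors: z basis at the first time, x basis at the second.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)

def proj(v):
    return np.outer(v, v.conj())

Z = [proj(ket0), proj(ket1)]
X = [proj(plus), proj(minus)]
# Each history is a pair (z outcome, x outcome)
HISTORIES = list(itertools.product(range(2), range(2)))

def class_operator(history):
    a, b = history
    return X[b] @ Z[a]  # later projector on the left

def decoherence_functional(rho):
    # D[i, j] = Tr(C_i rho C_j^dagger) over all pairs of histories
    n = len(HISTORIES)
    d = np.zeros((n, n), dtype=complex)
    for i, h in enumerate(HISTORIES):
        for j, k in enumerate(HISTORIES):
            c_h, c_k = class_operator(h), class_operator(k)
            d[i, j] = np.trace(c_h @ rho @ c_k.conj().T)
    return d

def is_decoherent(d, tol=1e-12):
    # Strict decoherence: every off-diagonal element vanishes
    off_diagonal = d - np.diag(np.diag(d))
    return bool(np.all(np.abs(off_diagonal) < tol))

d_zero = decoherence_functional(proj(ket0))
d_plus = decoherence_functional(proj(plus))
print(is_decoherent(d_zero), np.real(np.diag(d_zero)))  # True, then the history probabilities
print(is_decoherent(d_plus))                            # False
```

The same set of histories decoheres for the initial state |0> (probabilities 1/2, 1/2, 0, 0) but not for |+>, so whether an ensemble of histories is admissible depends on the state as well as on the chosen observables.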
This formalism in itself is not yet a full-fledged ontological interpretation of quantum theory. For that, I add the further postulate that these decoherent histories are maximally fine-grained - you can't add any more observables while retaining the decoherence condition. This does not yet single out a unique ensemble - Dowker and Kent pointed out that there's a vast number of choices for the maximally fine-grained observables.
But a few extra postulates might suffice to single out a unique ensemble. Maybe a rule, similar to a cellular automaton rule, that determines the observables. Maybe a principle that the a priori probabilities must all be equal. At this point you'd have a multiverse theory with no ambiguity about what is posited to exist, and no problem of some worlds having a larger probability measure than others.
That's all a digression but I'll return to it later.
Strictly speaking, according to the Copenhagen interpretation, wavefunctions are not fundamental physical entities; they are just epistemic states. However, most quantum physicists talk like de facto wavefunction realists, and any choice of definite values for the observables can be encoded in a wavefunction, the corresponding eigenstate. So I'll talk as a wavefunction realist from now on.
Returning finally to distinctness and exactness... An eigenfunction of an observable is definitely exact. If the observable also has a discrete spectrum of possible values, such as the energies of an electron bound to an atomic nucleus, the eigenfunctions will also be inarguably distinct: the different orbitals in an atom are separated from each other by a quantum jump in the energy.
However, it's the exactness of the state that I was after. I have no problem with a continuum of quantum states being mapped onto a continuum of qualic states. I have a problem with psychophysical mappings which get microphysically vague on the physical side, because if we ask about an edge case, what qualia are present, there's no definite answer. At worst, you could even end up with no definite answer about whether or not a given possible physical world contains a person, a conscious being.
Now let us consider entangled wavefunctions. They give us a whole new set of properties which, in principle, could be part of a psychophysical correspondence between quantum and quale. There are not only the various measures of entanglement, which quantify how much entanglement is present; there are the different forms of multipartite entanglement (e.g. Borromean states, a form of tripartite entanglement analogous to the Borromean rings, no two of which are linked, but which as a trio cannot be separated). I'm not really sure how rich these possibilities are, but they are a novel kind of physical property on which conscious states might supervene.
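The Borromean analogy can be checked numerically on the three-qubit GHZ state, the usual textbook example of it (assuming it is representative of the "Borromean states" meant here). The sketch below, in Python with NumPy, shows that the whole state is entangled across the split A|BC, yet after tracing out one qubit the remaining pair is separable. For two qubits, separability can be tested exactly with the Peres-Horodecki positive-partial-transpose criterion; by the symmetry of GHZ, the same holds for every pair. All function names are hypothetical helpers.

```python
import numpy as np

def schmidt_rank(state, dim_a, dim_b, tol=1e-10):
    # Number of nonzero singular values of the dim_a x dim_b coefficient matrix
    s = np.linalg.svd(np.asarray(state).reshape(dim_a, dim_b), compute_uv=False)
    return int(np.sum(s > tol))

def reduced_ab(state):
    # rho_AB = Tr_C |psi><psi| for a three-qubit state with amplitudes ordered (a, b, c)
    psi = np.asarray(state).reshape(4, 2)  # rows: (a, b), columns: c
    return psi @ psi.conj().T

def min_partial_transpose_eigenvalue(rho_ab):
    # Partial transpose on qubit B: entry (a,b; a',b') becomes (a,b'; a',b).
    # For two qubits, a negative eigenvalue means entangled (Peres-Horodecki).
    r = rho_ab.reshape(2, 2, 2, 2)  # indices a, b, a', b'
    pt = r.transpose(0, 3, 2, 1).reshape(4, 4)
    return float(np.min(np.linalg.eigvalsh(pt)))

# GHZ state (|000> + |111>)/sqrt(2)
ghz = np.zeros(8, dtype=complex)
ghz[0b000] = ghz[0b111] = 1 / np.sqrt(2)

# For contrast: a Bell pair on A,B with C left in |0>, i.e. (|000> + |110>)/sqrt(2)
bell_ab_c0 = np.zeros(8, dtype=complex)
bell_ab_c0[0b000] = bell_ab_c0[0b110] = 1 / np.sqrt(2)

print(schmidt_rank(ghz, 2, 4))                                   # 2: A entangled with the pair B,C
print(min_partial_transpose_eigenvalue(reduced_ab(ghz)))         # 0 up to rounding: A,B separable
print(min_partial_transpose_eigenvalue(reduced_ab(bell_ab_c0)))  # -0.5 up to rounding: A,B entangled
```

No two rings are linked, but the trio cannot be separated: the entanglement lives only in the three-party whole.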
However, I already mentioned the issue that validates @green_leaf's question: if the universal wavefunction is the ultimate objective description of the physical world, then everything is entangled with everything else. For example, all occurrences of any given species of fermion, such as all electrons, are antisymmetrically entangled with each other. This is implied by the spin-statistics theorem, and this is what implements the Pauli exclusion principle, that keeps the electrons (in atoms and molecules) in their separate orbitals. Wavefunctions describing just a few entangled entities, such as show up in quantum chemistry and quantum computing, are truncations of this universal entanglement, and have no particular claim to objective significance. There is a psychophysical sorites problem, not just for functionalism, but for "wavefunctionalism".
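The link between antisymmetry and the exclusion principle can be seen in a two-fermion toy model. The sketch below, in Python with NumPy, assumes three orthonormal single-particle "orbitals" (not a real atom) and builds the antisymmetrized state (φ⊗χ − χ⊗φ)/√2. Exchanging the particles flips its sign; putting both particles in the same orbital gives the zero vector; and for distinct orbitals the state has Schmidt rank 2, i.e. it is not a tensor product in the formal sense used above (whether this should count as physically useful entanglement is debated). The helper names are hypothetical.

```python
import numpy as np

def antisymmetrize(phi, chi):
    """Normalized (phi (x) chi - chi (x) phi); returns the zero vector if it vanishes."""
    psi = np.kron(phi, chi) - np.kron(chi, phi)
    norm = np.linalg.norm(psi)
    return psi / norm if norm > 1e-12 else psi

def swap(state, dim):
    # Exchange the two particles: amplitude (i, j) -> (j, i)
    return state.reshape(dim, dim).T.reshape(-1)

def schmidt_rank(state, dim_a, dim_b, tol=1e-10):
    s = np.linalg.svd(state.reshape(dim_a, dim_b), compute_uv=False)
    return int(np.sum(s > tol))

# Three orthonormal single-particle orbitals (a toy basis)
orbitals = np.eye(3)
psi = antisymmetrize(orbitals[0], orbitals[1])
same = antisymmetrize(orbitals[0], orbitals[0])

print(np.allclose(swap(psi, 3), -psi))  # True: exchange flips the sign
print(schmidt_rank(psi, 3, 3))          # 2: not a tensor product
print(np.linalg.norm(same))             # 0.0: two fermions cannot share one orbital
```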
It is possible that dynamics within the universal wavefunction does produce localized temporary examples of complete disentanglement. Maybe a natural mereology could be built on this. But otherwise, my only counter-proposal would be a version of the maximally fine-grained Hartle multiverse which, to my knowledge, has never been investigated: one in which the observables, the elements of physical reality, are "multipartite" in some way. Since in fundamental physics we deal with quantum fields, I think the logical candidates are observables associated with extended objects, like "Wilson loops" and "surface operators". Interestingly, Lee Smolin worked both on a version of loop quantum gravity in which the physical states are eigenfunctions of gravitational Wilson loops, and on a version of "quantum causal histories" which might be sufficiently general to allow for a Hartle multiverse with multipartite observables. It would be interesting to implement something like these in a well-explored modern framework like AdS/CFT.
If something like this turns out to be viable, not just as physics but as psychophysics, then functionalism's emphasis on causality and representation will still be relevant! It's just that to produce specific conscious states, causal structure alone would not be enough; the substrate would need to be these fundamental extended observables, and not virtual state machines running at a more coarse-grained level of description.
Okay, so I want to make a prediction. My prediction is based on this. Humans are just things made of stuff, and that stuff is governed by laws of nature. Where does ethics come from? From biological limitations and interactions with the environment. What we call ethics as a philosophy is a mix of good writing and looking for patterns in a complex thing and building on them. The figuring-out-ethics thing is kind of like figuring out tarot. What is the decision process? It isn't explainable in simple words. There is just no low-level explanation for it. The brain is super complex.
Are you assuming there exists some kind of one true ethics, or is it all subjective? Or is that one of the things you want to research?
I think ethics is an emergent rational order imposed upon impulses that arise from expectations of pain and pleasure, from emotions like empathy and moral disgust, and perhaps from other sources. It will be hard to know whether there is one true ethics as the natural convergent ideal, without first understanding morality as a phenomenon of consciousness, including how it relates to non-moral factors in human decision making.
I think Schopenhauer or Nietzsche at one point lists saint, artist, and scholar as three distinct human ideals. Note that only the saint is governed by morality; the artist is ruled by aesthetics, and the scholar, we could say, by epistemology. I do suspect that moral, aesthetic, and epistemic normativity are all distinct factors in human psychology (and this is not meant to be an exhaustive list), and that understanding the "human decision procedure" requires identifying their psychological and ontological roots (e.g. how they emerge in an ontology like @algekalipso's "process-topological monads"), and that this kind of understanding is necessary to obtain the correct metaethics, and to carry out a process like Coherent Extrapolated Volition.
Also, there is probably a different decision-theoretic ontology for entities like non-conscious AIs, in which the basic cause and effect of "decision making" doesn't even involve qualia, emotion, or conscious thought.
It will be hard to know whether there is one true ethics as the natural convergent ideal, without first understanding morality as a phenomenon of consciousness, including how it relates to non-moral factors in human decision making.
I don't understand this point. Why do you need to analyze consciousness to understand ethics? I think I'm missing some crucial information here.
Moral feelings are states of consciousness, and moral judgments are acts of consciousness, and these distinctive mental phenomena are the entire basis of morality and ethics. Without them, no one would believe there are such things as moral right and wrong. Examining their nature, how fallible they are, and what they imply about the ontology of the moral, seems to me an essential ingredient in understanding what morality is, and how it relates to the rest of reality. Moral faculties, insofar as they exist, are a specific capability of the conscious mind.
You could try to be a pure materialist and "heterophenomenologist" about this; or you could be a pure moral philosopher who takes the basic structures of subjectivity for granted. But an investigator strictly following the first approach is artificially denying themselves the use of their own first-person knowledge and experience, and everything that can provide; and we are ultimately aiming to make AIs into ideal moral agents, so the second approach also needs something more. I conclude that we need to solve the problems of consciousness in order to really know what we're doing.
Since the start of 2026, I've been thinking: suppose this is the final year before humanity loses control to AI. What should I do, where should I focus? I now have an answer. The plan is to tackle three questions:
What is the correct ontology?
What is the correct ethics?
What are ontology and ethics in an AI?
A few comments about my perspective on these questions...
What is the correct ontology?
The standard scientific answer would be to say that the world consists of fundamental physics and everything made from that. That answer defines a possible research program.
However, we also know that we don't know how to understand anything to do with consciousness in terms of that framework. This is a huge gap, since the entirety of our experience occurs within consciousness.
This suggests that in addition to (1) the purely physics-based research program, we also need (2) a program to understand the entirety of experience as conscious experience, and (3) research programs that take the fuzzy existing ideas about how consciousness and the physical world are related, and develop them rigorously and in a way that incorporates the whole of (1) and (2).
In addition to these, I see a need for a fourth research program which I'll just call (4) philosophical metaphysics. Metaphysics in philosophy covers topics like, what is existence, what is causality, what are properties, what are numbers - and so on. Some of these questions also arise within the first three ontological research programs, but it's not yet clear how it will all fit together, so metaphysics gets its own stream for now.
What is the correct ethics?
In terms of AI, this is meant to bear upon the part of alignment where we ask questions like, what should the AI be aligned with, what should its values be?
But I'm not even sure that "ethics" is exactly the right framework. I could say that ethics is about decision-making that involves "the Good", but is that the only dimension of decision-making that we need to care about? Classically in philosophy, in addition to the Good, people might also talk about the True and even the Beautiful. Could it be that a correct theory of human decision-making would say that there are multiple kinds of norms behind our decisions, and it's a mistake to reduce it all to ethics?
This is a bit like saying that we need to know the right metaethics as well as the right ethics. Perhaps we could boil it down to these two questions, which define two ethical research programs:
(1) What is the correct ontology of human decision-making?
(2) Based on (1), what is the ideal to which AI should be aligned?
What are ontology and ethics in an AI?
My assumption is that humanity will lose control of the world to some superintelligent decision-making system: it might be an AI, it might be an infrastructure of AIs. The purpose of this 2026 research agenda is to increase the chances that this superintelligence will be human-friendly, or that it will be governed by the values that it should be governed by.
Public progress in the research programs above has a chance of reaching the architects of superintelligence (i.e. everyone working on frontier AI) and informing their thinking and their design choices. However, it's no good if we manage to identify the right ontology and the right ethics, but don't know how to impart them to an AI. Knowing how to do so is the purpose of this third and final leg of the 2026 research agenda.
We could say that there are three AI research programs here:
(1) Understand the current and forthcoming frontier AI architectures (both single agent and multi-agent)
(2) Understand in terms of their architecture, what the ontology of such an AI would be
(3) Understand in terms of their architecture, what the ethics or decision process of such an AI would be
Final comments
Of course, this research plan is provisional. For example, if epistemology proved to require top-level attention, a fourth leg might have to be added to the agenda, built around the question "What is the correct epistemology?"
It is also potentially vast. Fortunately, in places it significantly overlaps with major recognized branches of human knowledge. One hopes that specific important new questions will emerge as the plan is refined.
A rather specific but very timely question is how a human-AI hivemind like Moltbook could contribute to a broad fundamental research program like this. I expect that the next few months will provide some answers to that question.