I recently read "Agent foundations: not really math, not really science" by Alex_Altair. I am continually building my awareness of who is out there doing what, so I had not previously heard of Alex or Dovetail. From their webpage:
Dovetail is a research group whose mission is to help humanity safely navigate the creation of powerful AI systems through foundational mathematics research that gives us an understanding of the nature of AI agents.
I really like this mission statement. I share the sentiment that AI technology is important enough that we should be trying to understand it in really deep and mathematical ways, not merely following empirical gradients towards capable systems, as so much of ML research seems to be doing.
So, in their post, Alex seems to be prompting for exploration of what Agent Foundations (AF) is, especially in the context of math, science, and possibly philosophy. From the discussion in the comments, this does not seem well settled. Although I do not consider myself an expert in math, science, philosophy, or communication, I am trying to become one, and I have something of an answer that can at least help frame the discussion, if nothing else.
I'll first describe my view of philosophy, math, and science and then work towards explaining where AF fits in and finally offer a potential solution to the problem that AF poses.
Imo, the focus of philosophy is to build reason and knowledge from the ground up, so within philosophy we have critical thinking, logic, phenomenology, epistemology, and other, more exotic topics that I do not feel qualified to discern from nonsense. Two particularly relevant and important philosophical topics are inductive and deductive reasoning.
Induction is basically noticing trends and supposing that those trends might continue. It is built into human reasoning at a very low level, but there is great nuance to be explored and made explicit. As an example of induction, it is inductive reasoning to see many white swans and conclude that swans are usually, or even always, white.
I view statistics and probability as the spiritual successor to induction. Induction done well.
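To make "induction done well" concrete, one classic formalization is Laplace's rule of succession, which turns a run of observations into a calibrated probability rather than a certainty. A minimal sketch in Python, with made-up swan counts:

```python
def rule_of_succession(successes: int, trials: int) -> float:
    """Laplace's rule of succession: estimated probability that the
    next observation is a 'success', given past observations.
    Equivalent to a Bayesian update starting from a uniform prior."""
    return (successes + 1) / (trials + 2)

# Hypothetical data: 100 swans observed, all of them white.
p_next_white = rule_of_succession(successes=100, trials=100)
print(f"P(next swan is white) = {p_next_white:.3f}")  # ~0.990, notably not 1.0
```

The point of the formalism is exactly the nuance mentioned above: seeing many white swans should raise your confidence, but never to certainty.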
As an aside, this is a great time to tell you my favourite epistemology joke: There is a cult of people who believe that something happening in the past makes it less likely to happen in the future. When asked why they have this ridiculous belief in spite of overwhelming evidence to the contrary, the response is "Well it's never worked before!"
(I apologize, but I'll provide an explanation for those of you who didn't laugh... I dislike the ambiguity of jokes. There's an idiom, "it's always worked before", often used in relation to Black Swans. The cultist, reasoning from within a worldview where things happening makes them less likely to continue, concludes that the trend of trends continuing is itself unlikely to continue, and uses a reversed version of the idiom. What is funny to me is both the switching of the idiom and that the cultist's view is internally consistent while simultaneously being obviously ridiculous.)
Sorry, I'll get back to the main thread now.
Deduction is different from induction. Deduction does not look out at the external world to determine something, but instead looks at "models", particularly logical models with rules for valid deduction. For example, suppose I have a 20' rope and a 30' rope and I want to know if they will span a 40' distance once I tie them together. I assume everyone reading will almost instantly notice that 20+30=50>40, so the ropes will span the distance. The reasoning that has been trained into you to the point of instinct is deductive reasoning, using rules like addition, equivalence, and comparison.
Other examples of deduction include things like "Dude is mortal. Mortals die if they consume poison. Dude consumed poison. Therefore dude will die." This style of propositional logic is closer to how mathematical truths are proven, but is further from everyday deductive reasoning.
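Both examples can be checked mechanically, because the conclusions follow from the rules alone, without ever consulting the world. A toy sketch (the encoding is mine, purely for illustration):

```python
# The rope example: the deduction uses only the rules of arithmetic
# and comparison, never an observation of actual ropes.
rope_a_ft, rope_b_ft, gap_ft = 20, 30, 40
assert rope_a_ft + rope_b_ft > gap_ft  # 20 + 30 = 50 > 40, so the ropes span it

# The syllogism, as one application of modus ponens: from the premises
# "Dude is mortal" and "Dude consumed poison", and the rule that mortals
# who consume poison die, conclude "Dude will die".
facts = {"mortal(dude)", "consumed_poison(dude)"}
rules = [({"mortal(dude)", "consumed_poison(dude)"}, "will_die(dude)")]
for premises, conclusion in rules:
    if premises <= facts:  # all premises already established
        facts.add(conclusion)
assert "will_die(dude)" in facts
```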
I think mathematics is the spiritual successor to deduction. Deduction done well.
Ok, so I've talked about philosophy as a foundation, and mathematics as a truly impressive extension to deductive reasoning. Where does science fit? Science, as the process of forming and testing hypotheses, is a fancy form of induction that also makes use of deduction. The practitioner of the scientific method observes the object of their curiosity and forms hypotheses; usually these are mathematical models that use the deductive rules of math to map some observations onto some predictions. The models can then be falsified by using them to generate predictions and checking whether those predictions are accurate.
There is obviously a lot more detail to science, but this serves as a grounding context.
The math that is used to build models for making predictions is drawn from the large body of mathematical theory, allowing scientific models to make much more impressive deductive predictions than would otherwise be possible.
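As a cartoon of that loop, here is a sketch in Python with entirely made-up numbers: induce a simple proportional model from observations, use the model's math to deduce a prediction, then let a new measurement confirm or falsify it.

```python
# Induction step: observed (weight, spring extension) pairs, all invented.
observations = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Hypothesis: extension = k * weight. Fit k by least squares through the
# origin. The hypothesis itself is a deductive, mathematical model.
k = sum(x * y for x, y in observations) / sum(x * x for x, _ in observations)

# Deduction step: the model predicts the extension for a new weight.
predicted = k * 4.0

# Falsification step: compare the prediction to a fresh measurement.
measured = 8.1
tolerance = 0.5  # chosen arbitrarily for this sketch
if abs(predicted - measured) > tolerance:
    print("Model falsified; form a new hypothesis.")
else:
    print(f"Model survives: predicted {predicted:.2f}, measured {measured}")
```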
Mathematical theory is itself built up using logical, deductive reasoning. Theorems are proven by agreed upon rules of logic which are themselves subject to logical scrutiny.
This involves a great deal of work, so when building theory, mathematicians want to use as few foundational definitions and assumptions as possible as a core, and to build everything else from that core using valid logic. They also don't want the theory to be useful in only one domain, but in all domains where it can be applied. This means all math is abstract: we do not speak of counting coins or apples or cars, but of counting numbers, which may represent any object with the quantitative properties of the set of numbers we are using.
In the terminology of math, these foundational assumptions are called "axioms". Definitions are then created to more easily refer to structures built out of the axioms. One valuable goal mathematicians work towards is proving the equivalence of different definitions. I think much of this work is like building interfaces that allow people to make use of the mathematical structures. In this way, many different ways of thinking about a mathematical structure can be connected, allowing its application in diverse contexts.
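As a trivial illustration of definitions-as-interfaces, here is a sketch with two characterizations of evenness; the function names and the spot-check are mine, purely for illustration. Mathematics proves such equivalences once and for all; the code merely spot-checks one on a small range.

```python
def even_by_remainder(n: int) -> bool:
    """Even means: division by 2 leaves no remainder."""
    return n % 2 == 0

def even_by_witness(n: int) -> bool:
    """Even means: n = 2k for some integer k (searched over a finite range)."""
    return any(n == 2 * k for k in range(-abs(n) - 1, abs(n) + 2))

# Each definition is a different "interface" to the same structure; a user
# can apply whichever is easier in their context, because they agree.
assert all(even_by_remainder(n) == even_by_witness(n) for n in range(-50, 51))
```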
So how do people go about finding the correct axioms and definitions, the ones that will be useful models for thinking about the kinds of things they are interested in? In my own understanding, the process is science, or at least something similar.
In the usual conception of science, it is not something that can be carried out in one's imagination; rather, the real world must be examined. This focus on concrete details gathers solid data and produces solid results, but it is slow and expensive to run experiments in the real world. Additionally, many mathematical inquiries relate to logical structures, which do not exist in the real world but only within minds.
So when doing science to find the axioms and definitions, researchers conduct thought experiments. These are fast and cheap enough to explore the very large and complicated space of possibilities, and they can be performed on abstract deductive structures, which is impossible outside of a mind.
Incidentally, I suspect this is how most scientists approach hypothesis formation. Only those with legendary intuition may have workable hypotheses appear fully formed in their minds. The rest of us must consider and refine possibilities before arriving at something reasonable enough to be worth testing with real-world experiments.
I think this is the process that AF is engaged in. Trying to figure out models and definitions to use for thinking about the problems we are trying to think about.
I think sometimes people doing thought experiment work encounter people who are doing real experiment work, and the people doing real experiments will sometimes scoff at the work being done in the mind. "Imagine how much easier my job would be if I could just think about what the answer ought to be rather than checking if it's true or not," they say. Is focusing on thought experiments really justified?
I think it is, both because it is the only way to perform experiments quickly and abstractly enough to develop definitions and logical models in very large possibility spaces, and also because human minds are trained by experience and schooling to approximate the real world.
This does mean, of course, that one person performing thought experiments is not the same as another person doing so. It is good to have different people with different perspectives, but it is also possible for some people to be better at performing thought experiments than others. What skills are valuable for performing thought experiments well?
I suggest: intuition, experience, deduction, examples, details, and targeting.
In my view, intuition is like what a feed-forward neural net is doing: the ability to train your mind to jump directly from some antecedent to its consequences. Intuition is trained by experience, and so experience is very valuable, but there also seems to be some mysterious luck: some people can more frequently jump to conclusions that are more likely to be correct. (How lucky it must be to have a good model architecture and well-tuned hyperparameters.)
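To gesture at that analogy concretely, here is the shape of a single feed-forward pass in Python: a fixed, learned mapping that jumps straight from input to output in one shot, with no deliberation in between. The architecture and weights below are arbitrary placeholders, not anything trained.

```python
import math

def forward(x, weights, biases):
    """One feed-forward pass: each layer maps its input directly to its
    output through a fixed transformation, like intuition's single jump."""
    for w, b in zip(weights, biases):
        x = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
             for row, bi in zip(w, b)]
    return x

# Placeholder parameters for a tiny 2 -> 3 -> 1 network.
weights = [
    [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]],  # layer 1: 3 neurons, 2 inputs each
    [[0.4, -0.6, 0.8]],                      # layer 2: 1 neuron, 3 inputs
]
biases = [[0.0, 0.1, -0.1], [0.2]]

print(forward([1.0, 2.0], weights, biases))  # input -> answer, in one shot
```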
In addition to intuition from experience, it is good to have a solid understanding of deduction, with logical rules for reasoning about whatever type of thing you are trying to reason about in whatever thought experiments you are performing. This is good for verifying intuition, and for making deductive leaps which would not be possible by intuition alone.
Another good way to verify intuitive and deductive thoughts is to have collected examples and details pertaining to your focus. This is related to experience, memory, and organizational record keeping skills. The models you are trying to create need to fit the details of relevant examples.
Finally, when performing thought experiments, it is valuable to have a good sense of why you are performing them. Sometimes aimless daydreaming can be fruitful, and I certainly find it enjoyable, but for the purpose of finding valuable results on a specific topic, one must have a good sense of why they are experimenting in their mind and what they are looking for, and must be able to keep their mind exploring in the direction of the target.
So what of the people exploring AF? My sense is that they are generally experienced and skilled in the ways described. I have the impression that a lack of real-world ML experience is sometimes suggested as an issue. I think this criticism can be justified, especially for someone with no alternative experience; however, it is my view that ML is not the object of interest of AF. Surely it is relevant, and if nobody working on AF had ML experience I would see that as a significant problem, but as it stands I feel there are more issues with deduction and targeting than with experience.
The target of AF exploration should be driven by questions like the following.
You have probably heard that there is an argument that Artificial Super-Intelligence (ASI) could bring about human extinction, and that human extinction is very bad, and therefore we should avoid ASI. There's a lot of hidden detail in the phrase "could bring about". Let's explore that.
I feel that one massively neglected detail in most people's understanding of the argument is that it must rely on deductive rather than inductive reasoning. Once a system becomes an ASI and is out of our control, we cannot learn from what we did and try again; we know this from deductive reasoning about what is meant by ASI. This means we cannot perform real-world experiments on ASI. This is a very bad situation, because real-world experiments are the gold standard for true understanding. Humanity has always relied on trial and error for making progress. To get a complicated new technology correct on the first attempt is an almost unprecedented hurdle.
The lack of access to real-world experiments means we must rely only on deductive reasoning, extrapolating from models built with induction from phenomena other than ASI. And so it cannot be by empirical testing that we will show that ASI is possible, or how close we are to it, or that it is safe or unsafe. The discussion of ASI is explicitly the domain of deductive reasoning.
People may notice that those predicting trouble with ASI are relying on deductive reasoning, and that deductive reasoning isn't nearly as reliable as empiricism. That may be true, but all discussion of ASI is based on deductive reasoning, and deductive reasoning is the only reasoning safe to use to explore and develop ASI. If we can show that some aspect of ASI must behave like another phenomenon that we can study empirically, then that allows us valuable use of empiricism, but showing such an equivalence must be done with logical deduction.
This is why it is so important to develop solid deductive models around AF, because for the first time ever, we must get our models correct using only deduction.
So with all that in mind, AF is Paradigmization. It is the process of using thought experiments to try to get us closer to a model that will allow us to reason about ASI. It is the process of creating a paradigm.
We do not currently have any paradigm that allows us to reason about ASI. "Intelligence" is not something we have a deductive model of. Certainly not any model with the mathematical rigour required to correctly reason about AI, especially as it becomes "Super" AI, something that has never existed and that we only suspect could exist through deductive speculation.
So, since we do not yet have a finished model of what we are trying to focus on, "intelligence" or "agents" may not be the correct objects of focus, and indeed I think they are not. There are many relevant fields of study from which to draw inspiration, many of which I believe will be described by the more general model we are seeking, but I don't think any of them already is that model.
In particular, there are several existing models of agents that I would point out as not being the correct paradigm.
Rather, I have a candidate to suggest which might be the model AF is seeking. The paradigm I currently favour is the model of "Outcome Influencing Systems (OISs)", which I have been developing. That is to say, systems that influence which outcome comes to pass.
For a more in-depth introduction, look for my upcoming explainer post. For now I will briefly say that the concern with agents is not really that they will be too smart or too capable of maximizing utility, but that they will influence our world towards outcomes that are contrary to our own preferences. That is the core issue, and so it makes sense to me that it be central in defining our model.