Research Lead at CORAL. Director of AI research at ALTER. PhD student in Shay Moran's group in the Technion (my PhD research and my CORAL/ALTER research are one and the same). See also Google Scholar and LinkedIn.
E-mail: {first name}@alter.org.il
Here's a feature proposal.
The problem: At present, when a post has 0 reviews, there is an incentive against writing critical reviews. Writing such a review enables the post to enter the voting phase, which you don't especially want to happen if you think the post is undeserving. This seems perverse: critical reviews are valuable, especially if someone later writes a positive review, enabling the post to enter voting anyway. (In principle, you can "lie in ambush" until someone writes a positive review and only then write your negative review, but that requires annoying logistics.)
My suggestion: Allow flagging reviews as "critical" in the UI. (One option is to consider a review "critical" whenever your own vote for the post is negative, another is to have a separate checkbox.) Such reviews would not count for enabling the post to enter voting.
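A minimal sketch of the proposed rule, assuming a hypothetical `Review` record with an `is_critical` flag and a threshold of one qualifying review. The names, the data model and the threshold are illustrative assumptions, not the site's actual code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    author: str
    # Hypothetical flag: set via a checkbox, or inferred from the
    # reviewer's own negative vote on the post.
    is_critical: bool


def enters_voting_phase(reviews: List[Review], required: int = 1) -> bool:
    """Return True if enough non-critical reviews exist for the post to enter voting.

    Critical reviews are still stored and shown; they simply do not count
    towards the threshold.
    """
    qualifying = sum(1 for review in reviews if not review.is_critical)
    return qualifying >= required
```

Under this rule, a post whose only review is critical stays out of voting, while adding a single non-critical review lets it in regardless of how many critical reviews it has.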
This work[1] was the first[2] foray into proving non-trivial regret bounds in the robust (infra-Bayesian) setting. The specific bound I got was slightly improved in a later paper by Diffractor and me. This work studied a variant of linear bandits, for the usual reasons linear models are often studied in learning theory: it is a conveniently simple setting where we actually know how to prove things, even with computationally efficient algorithms. (Although we still don't have a computationally efficient algorithm for the robust version: not because it's very difficult, but (probably) just because nobody got around to solving it.) As such, this work was useful as a toy-model test that infra-Bayesianism doesn't run into statistical intractability issues. As to whether linear-model algorithms or their direct descendants will actually play a role in the ultimate theory of learning, that is still an open question.
An abridged version was also published as a paper in JMLR.
Other than Tian et al, which technically is a robust regret bound, but was not framed by the authors as such (instead, their motivation was studying zero-sum games).
TLDR: This post introduces a novel and interesting game-theoretic solution concept and provides informal arguments for why robust (infra-Bayesian) reinforcement learning algorithms might be expected to produce this solution in the multi-agent setting. As such, it is potentially an important step towards understanding multi-agency.
Disclosure: This review is hardly impartial, since the post was written with my guidance and based on my own work.
Understanding multi-agency is, IMO, one of the most confusing and difficult challenges in the construction of a general theory of intelligent agents. I have a lot of uncertainty about what shape the solution should take even in the broadest brushstrokes, as I outlined in my recent five worlds taxonomy[1]. This is in contrast to uni-agency, where Formal Computational Realism (FCR) is, IMO, pretty close to at least nailing down the correct type signature and qualitative nature of the desiderata.
At the same time, understanding multi-agency seems quite important in the context of AI alignment. There are many sorts of multi-agent interactions that are potentially relevant:
This post tells a particular story of what multi-agent theory might look like. In this story, agents converge to a new type of solution concept described in the "stable cycles for multiplayer games" section (I call this solution "haggling equilibrium"). As opposed to Nash equilibria, the "typical" (but not every) haggling equilibrium in a two-player game is Pareto-efficient. This stands in contrast even to Nash equilibria in repeated games, where Pareto-efficiency is possible but, due to the folk theorem, very underdetermined.
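To make the contrast between Nash equilibria and Pareto-efficiency concrete, here is a small sketch on the one-shot Prisoner's Dilemma. It does not compute haggling equilibria (the post gives no algorithm that I could faithfully reproduce here); it only checks, over pure strategy profiles, which outcomes are Nash and which are Pareto-efficient. The payoff numbers and all function names are illustrative assumptions:

```python
from itertools import product

# PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff).
# Actions: 0 = cooperate, 1 = defect. Standard illustrative numbers.
PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}
ACTIONS = (0, 1)


def is_pure_nash(profile):
    """True if neither player gains by unilaterally deviating from `profile`."""
    a, b = profile
    row_ok = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in ACTIONS)
    col_ok = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok


def is_pareto_efficient(profile):
    """True if no other pure profile is weakly better for both players
    and strictly better for at least one of them."""
    u = PAYOFFS[profile]
    for other in product(ACTIONS, ACTIONS):
        v = PAYOFFS[other]
        if v[0] >= u[0] and v[1] >= u[1] and v != u:
            return False
    return True


nash = [p for p in product(ACTIONS, ACTIONS) if is_pure_nash(p)]
efficient = [p for p in product(ACTIONS, ACTIONS) if is_pareto_efficient(p)]
```

Here `nash` contains only mutual defection `(1, 1)`, while `efficient` contains every profile except mutual defection: the unique Nash equilibrium is exactly the outcome that fails Pareto-efficiency.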
Moreover, there is an argument that a particular type of robust RL algorithm (robust UCB) would converge to such equilibria under some assumptions. However, the argument is pretty informal and there is not even a rigorous conjecture at present. There are, broadly speaking, two possibilities for how the story might be completed:
With either possibility, the hope is that combining such a result with FCR would promote it to applying in more "exotic" contexts as well, such as one-shot games with transparent source code (along the lines of Demski's "logical time").
It is also interesting to study the notion of haggling equilibrium in itself, for example: is there always a Pareto-efficient haggling equilibrium? (True for two players, but I don't know the answer in general.)
To summarize, the ideas in this post are, AFAIK, novel (although somewhat similar ideas appeared in the literature in the guise of "aspiration-based" algorithms in multi-agent RL, see e.g. Crandall and Goodrich 2013) and might be key to understanding multi-agency. However, the jury is still very much out.
In the terminology of those five worlds, I consider Nihiland and Discordia to be quite unlikely, but Linguistica, Economica and Harmonia all seem plausible.
I propose a taxonomy of 5 possible worlds for multi-agent theory, inspired by Impagliazzo's 5 possible worlds of complexity theory (and also the Aaronson-Barak 5 worlds of AI):
For simplicity, I'm ignoring what is arguably an "orthogonal" axis: to what extent the "correct" multi-agent theory implies acausal cooperation even under favorable conditions. I believe that, outside of Nihiland and Discordia, it probably does, but the alternative hypothesis is also tenable.
On the border between Linguistica and Economica, there are worlds with strong guarantees for agents of the same type and medium-strength guarantees for agents of different types (where "medium-strength" is still stronger than "achieve maximin payoff": the latter is already guaranteed in infra-Bayesianism). This blurs the boundary, but I would consider this to be Linguistica if even slightly different types have much weaker guarantees (or if there is no useful notion of "slightly different types") and Economica if there is continuous graceful degradation like in Yudkowsky's subjective fairness proposal.
This post discusses an important point: it is impossible to be simultaneously perfectly priorist ("updateless") and learn. Learning requires eventually "passing to" something like a posterior, which is inconsistent with forever maintaining "entanglement" with a counterfactual world. This is somewhat similar to the problem of traps (irreversible transitions): being prudent about risking traps requires relying on your prior, which prevents you from learning every conceivable opportunity.
My own position on this cluster of questions is that you should be priorist/(infra-)Bayesian about physics but postist/learner/frequentist about logic. This idea is formally embodied in the no-regret criterion for Formal Computational Realism. I believe that this no-regret condition implies something like the OP's "Eventual Learning", but formally demonstrating it is future work.
Strictly speaking, there's no result saying you can't represent quantum phenomena by stochastic dynamics (a.k.a. hidden variables). Indeed, e.g. the de Broglie-Bohm interpretation does exactly that. What does exist is Bell's inequality, which implies that it's impossible to represent quantum phenomena by local hidden variables (local = the distribution is the limit of causal graphs in which variables are localized in spacetime and causal connections only run along future-directed timelike (not superluminal) separations). Now, our framework doesn't even fall in the domain of Bell's inequality, since (i) we have supracontributions (in this post called "ultracontributions") instead of ordinary probability distributions (ii) we have multiple co-existing "worlds". AFAIK, Bell-inequality-based arguments against local hidden variables support neither i nor ii. As such, it is conceivable that our interpretation is in some sense "local". On the other hand, I don't know that it's local and have no strong reason to believe it.
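For reference, here is a minimal sketch of what Bell's inequality rules out, in its CHSH form. It enumerates all deterministic local strategies (ordinary probabilistic mixtures of which are exactly the local hidden variable models for this experiment) and compares the best achievable CHSH value with the one obtained from the standard singlet-state correlation. The function names and the choice of measurement angles are illustrative assumptions. Note that the enumeration only covers ordinary probability distributions over local strategies, so it says nothing about supracontributions or multiple co-existing worlds:

```python
import math
from itertools import product


def chsh(E):
    """CHSH combination S = E(0,0) + E(0,1) + E(1,0) - E(1,1) for settings x, y in {0, 1}."""
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)


def max_local_deterministic():
    """Maximize |S| over deterministic local strategies.

    Alice's outcome a[x] depends only on her setting x, and Bob's outcome
    b[y] only on his setting y. S is linear in the correlations, so no
    probabilistic mixture of such strategies can exceed this maximum.
    """
    best = 0
    for a0, a1, b0, b1 in product((-1, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        best = max(best, abs(chsh(lambda x, y: a[x] * b[y])))
    return best


def singlet_chsh():
    """|S| for the singlet state, whose correlation at analyzer angles
    (alpha, beta) is -cos(alpha - beta)."""
    alice = (0.0, math.pi / 2)
    bob = (math.pi / 4, -math.pi / 4)
    return abs(chsh(lambda x, y: -math.cos(alice[x] - bob[y])))
```

`max_local_deterministic()` gives the classical bound of 2, while `singlet_chsh()` gives 2√2 (about 2.83), which is the gap that local hidden variables cannot close.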
The interpretation of quantum mechanics is a philosophical puzzle that has baffled physicists and philosophers for about a century. In my view, this confusion is a symptom of us lacking a rigorous theory of epistemology and metaphysics. At the same time, creating such a theory seems to me like a necessary prerequisite for solving the technical AI alignment problem. Therefore, once we created a candidate theory of metaphysics (Formal Computational Realism (FCR), formerly known as infra-Bayesian Physicalism), the interpretation of quantum mechanics stood out as a powerful test case. In the work presented in this post, we demonstrated that FCR indeed passes this test (at least to a first approximation).
What is so confusing about quantum mechanics? To understand this, let's take a look at a few of the most popular pre-existing interpretations.
The Copenhagen Interpretation (CI) proposes a mathematical rule for computing the probabilities of observation sequences, via postulating the collapse of the wavefunction. For every observation, you can apply the Born Rule to compute the probabilities of different results, and once a result is selected, the wavefunction is "collapsed" by projecting it to the corresponding eigenspace.
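The rule just described can be sketched for a finite-dimensional system as follows. This is a minimal illustration, assuming the state is normalized and already written in an eigenbasis of the measured observable; the function names and the list-based representation are my own choices for the example:

```python
import math


def born_probabilities(state, outcome_of):
    """Born rule: the probability of an outcome is the squared norm of the
    projection of the state onto that outcome's eigenspace.

    `state` is a list of complex amplitudes in an eigenbasis of the observable;
    `outcome_of[i]` is the measurement result attached to basis vector i
    (several basis vectors may share a result, giving a degenerate eigenspace).
    """
    probs = {}
    for amp, outcome in zip(state, outcome_of):
        probs[outcome] = probs.get(outcome, 0.0) + abs(amp) ** 2
    return probs


def collapse(state, outcome_of, result):
    """Project the state onto the eigenspace of `result` and renormalize."""
    projected = [amp if outcome == result else 0j
                 for amp, outcome in zip(state, outcome_of)]
    norm = math.sqrt(sum(abs(amp) ** 2 for amp in projected))
    if norm == 0:
        raise ValueError("result has probability zero in this state")
    return [amp / norm for amp in projected]


# Example: a three-level system where the first two basis vectors share a result.
state = [0.5, 0.5, 1 / math.sqrt(2)]
outcome_of = ["up", "up", "down"]
probs = born_probabilities(state, outcome_of)
post = collapse(state, outcome_of, "up")
```

In the example, both results have probability 1/2, and after observing "up" the state becomes an equal superposition of the first two basis vectors, with the "down" component removed. Computing probabilities for a sequence of observations amounts to repeating these two steps.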
CI seems satisfactory to a logical positivist: if all we need from a physical theory is computing the probabilities of observations, we have it. However, this is unsatisfactory for a decision-making agent if the agent's utility function depends on something other than its direct observations. For such an agent, CI offers no well-defined way to compute expected utility. Moreover, while normally decoherence ensures that the observations of all agents are in some sense "consistent", it is in principle possible to create a situation in which decoherence fails and CI will prescribe contradictory beliefs to different agents (as in the Wigner's friend thought experiment).
In CI, the wavefunction is merely a book-keeping device with no deep meaning of its own. In contrast, the Many Worlds Interpretation (MWI) takes a realist metaphysical stance, postulating that the wavefunction describes the objective physical state of the universe. This, in principle, admits meaningful unobservable quantities on which the values of agents can depend. However, the MWI has no mathematical rule for computing probabilities of observation sequences. If all "worlds" exist at the same time, there's no obvious reason to expect to see one of them rather than another. MWI proponents address this by handwaving into existence some "degree of reality" that some worlds possess more than others. However, the fundamental fact remains that there is no well-defined prescription for probabilities of observation sequences, unless we copy the prescription of CI. The latter, however, is inconsistent with the intent of MWI in cases where decoherence fails, such as Wigner's friend.
In principle, we can defend an MWI-based decision-theory in which the utility function is a self-adjoint operator on the Hilbert space and we are maximizing its expectation in the usual quantum-mechanical sense. Such a decision-theory can avoid the need for a well-defined probability distribution over observation sequences. However, it would leave us with an "ontological crisis": if our agent did not start out knowing quantum mechanics, how would it translate its values into this quantum mechanical form?[1]
The De Broglie-Bohm Interpretation (DBBI) proposes that in addition to the wavefunction, we should also postulate a classical trajectory following a time-evolution law that depends on the wavefunction. This results in a realist theory with a well-defined distribution over observation sequences. However, it comes with two major issues:
In my view, the real source of all the confusion is the lack of rigorous metaphysics: we didn't know (prior to this line of research), in full generality, what the type signature of a physical theory should be and how we should evaluate such a theory.
Enter Formal Computational Realism (FCR). According to FCR, the fundamental ontology in which all beliefs and values about the world should be expressed is computable logical facts plus the computational information content of the universe. The universe can contain information about computations (e.g., if someone calculated the 700th digit of pi, then the universe contains this information), and fundamentally this information is all there is[2]. Moreover, given an algorithmic description of a physical theory plus the epistemic state of an agent in relation to computable logical facts, it is possible to formally specify the computational information content that this physical theory implies, from the perspective of the agent. The latter operation is called the "bridge transform".
To apply this to quantum mechanics, we need to choose a particular algorithmic description. The choice we settled on is fairly natural: We imagine all possible quantum observables as having marginal distributions that obey the Born rule, with the joint distribution being otherwise completely ambiguous (in the sense that imprecise probability allows distributions to be ambiguous, i.e. we have "Knightian uncertainty" about it). The latter is a natural choice, because quantum mechanics has no prescription for the joint distribution of noncommuting observables. The combined values of all observables constitute the "state" that the physical theory computes, and the agent's policy is treated as an unknown logical fact on which the computation depends.
Applying the bridge transform to the above operationalization of quantum mechanics, we infer the computational information content of the universe according to quantum mechanics, and then use the latter to extract the probabilities of various agent experiences. What we discover is as follows:
As opposed to most pre-existing interpretations, the resulting formalism has precisely defined decision-theoretic prescriptions for an agent in any "weird" (i.e. not-decohering) situation such as Wigner's friend. This only requires the agent's values to be specified in the FCR ontology (and in particular allows the agent to assign value to their own experiences, in some arbitrary history-dependent way, and/or the experiences of particular other agents).
In conclusion, FCR passed a non-trivial test here. It was not obvious to me that it would: before Gergely figured out the details, I wasn't sure that it's going to work at all. As such, I believe this to be a milestone result. (With some caveats: e.g. it needs to be rechecked for the non-monotonic version of the framework.)
Note that de Blanc's proposal is inapplicable here, since the quantum ontology is not a Markov decision process.
To be clear, this is just a vague informal description; FCR is an actual rigorous mathematical framework.
I'm renaming Infra-Bayesian Physicalism to Formal Computational Realism (FCR), since the latter name is much more in line with the nomenclature in academic philosophy.
AFAICT, the closest pre-existing philosophical views are Ontic Structural Realism (see 1 2) and Floridi's Information Realism. In fact, FCR can be viewed as a rejection of physicalism, since it posits that a physical theory is meaningless unless it's conjoined with beliefs about computable mathematics.
The adjective "formal" is meant to indicate that it's a formal mathematical framework, not just a philosophical position. The previously used adjective "infra-Bayesian" now seems to me potentially confusing: On the one hand, it's true that the framework requires imprecise probability (hence "infra"), on the other hand it's a hybrid of frequentist and Bayesian.
To keep terminology consistent, Physicalist Superimitation should now be called Computational Superimitation (COSI).
I think that the problem is in the way you define the prior. Here is an alternative proposal:
Given a lambda-term , we can interpret it as defining a partial function . This function works by applying to the (appropriately encoded) inputs, beta-reducing, and then interpreting the result as an element of using some reasonable encoding. It's a partial function because the reduction can fail to terminate or the output can violate the expected format.
Given , we define the "corrected" function as follows. (The goal here is to make it monotonic in the last argument, and also ensure that probabilities sum to .) First, we write whenever (i) for all , and (ii) . If there is no such (i.e. when condition i fails) then is undefined. Now, we have two cases:
We can now define the semimeasure by
For , this semimeasure is lower-semicomputable. Conversely, any lower-semicomputable semimeasure is of this form. Mixing these semimeasures according to our prior over lambda terms gives the desired Solomonoff-like prior.
In this post, Abram Demski argues that existing AI systems are already "AGI". They are clearly general in a way previous generations of AI were not, and claiming that they are still not AGI smells of moving the goalposts.
Abram also helpfully edited the post to summarize and address some of the discussion in the comments. The commenters argued, and Abram largely agreed, that there are still important abilities that modern AI lacks. However, there is still the question of whether that should disqualify it from the moniker "AGI", or maybe we need new terminology.
I tend to agree with Abram that there's a sense in which modern AI is already "AGI", and also agree with the commenters that there might be something important missing. To put the latter in my own words: I think that there is some natural property in computational-system-space s.t.
To handwave in the direction of that property, I would say "the ability to effectively and continuously acquire deep knowledge and exploit this knowledge to construct and execute goal-directed plans over long lifetimes and consequence horizons".
It is IMO unclear whether modern AI is better thought of as having a positive but subhuman amount of this property, or as lacking it entirely (i.e. lacking some algorithmic component necessary for it). This question is hard to answer from our understanding of the algorithms, because foundation models "steal" some human cognitive algorithms in opaque ways, and we don't even understand deep learning itself. Clearly, a civilization composed of modern AI and no humans would not survive (not to mention progress), even if equipped with excellent robotic bodies. But the latter might be just a "coincidental" fact about how harsh our specific universe is.
Be that as it may, I think that the argument for more fine-grained terminology is strong. We can concede that modern AI is AGI, and have a new term for the thing modern AI might-not-yet-be. Maybe AGA: "Artificial General Agent"?