Dr. in math. AI Alignment and Safety researcher. Bayesian.
Science YouTuber, podcaster, writer.
Author of the books "The Equation of Knowledge", "Le Fabuleux Chantier" and "Turing à la plage".
Hi Apodosis, I did my PhD in Bayesian game theory, so this is a topic close to my heart ^^
There are plenty of fascinating things to explore in the study of interactions between Bayesians. One important finding of my PhD was that, essentially, Bayesians end up playing (stable) Bayes-Nash equilibria in repeated games, even if the only feedback they receive is their utility (and in particular even if the private information of other players remains private). I also studied Bayesian incentive-compatible mechanism design, i.e. coming up with rules that incentivize Bayesians' honesty.
The book also discusses interesting features of interactions between Bayesians, such as Aumann-Aaronson's agreement theorem or Bayesian persuasion (i.e. maximizing a Bayesian judge's probability of convicting a defendant by optimizing which investigations should be pursued). One research direction I'm interested in is that of Byzantine Bayesian agreement, i.e. how much a group of honest Bayesians can agree if they are infiltrated by a small number of malicious individuals, though I have not yet found the time to dig into this topic further.
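To make the Bayesian persuasion example concrete, here is a minimal sketch of the standard prosecutor-and-judge setup. It assumes a binary state (guilty or innocent), a judge who convicts whenever the posterior probability of guilt reaches a threshold, and a prosecutor who commits in advance to an investigation. The function names and the numbers (a prior of 0.3, a threshold of 0.5) are illustrative assumptions, not taken from the book.

```python
def optimal_investigation(prior_guilty: float, threshold: float = 0.5):
    """Return (q, conviction_probability) for a toy binary persuasion problem.

    Assumed setup (illustrative): the investigation always reports "guilty"
    when the defendant is guilty, and reports "guilty" with probability q
    when the defendant is innocent. The prosecutor picks the largest q that
    still leaves the judge's posterior at the conviction threshold.
    """
    if prior_guilty >= threshold:
        # The judge already convicts without any evidence.
        return 1.0, 1.0
    # Solve prior / (prior + (1 - prior) * q) = threshold for q.
    q = prior_guilty * (1 - threshold) / ((1 - prior_guilty) * threshold)
    conviction_probability = prior_guilty + (1 - prior_guilty) * q
    return q, conviction_probability


def posterior_after_guilty_report(prior_guilty: float, q: float) -> float:
    """Bayes' rule: probability of guilt given that the report says "guilty"."""
    return prior_guilty / (prior_guilty + (1 - prior_guilty) * q)


q, p_convict = optimal_investigation(prior_guilty=0.3, threshold=0.5)
print(f"false-positive rate q = {q:.3f}")
print(f"probability of conviction = {p_convict:.3f}")
print(f"posterior after a 'guilty' report = {posterior_after_guilty_report(0.3, q):.3f}")
```

Under these assumptions, the judge is a perfectly rational Bayesian and still convicts more often than the prior alone would suggest, because the prosecutor chooses what evidence gets produced.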
A more empirical challenge is to determine how well these Bayesian game theory models describe human (or AI) interactions. Clearly, we humans are not Bayesians. We have some systematic cognitive biases (and even powerful AIs may have systematic biases too, since they won't be running Bayes' rule exactly!). How can we best model and predict humans' divergence from Bayes' rule? There have been many spectacular advances in cognitive science in this regard (check out Josh Tenenbaum's work for instance), but there's definitely a lot more to do!
I promoted Bayes-up on my YouTube channel a couple of times 😋 (and on Twitter)
The YouTube algorithm is arguably an example of a "simple" manipulative algorithm. It's probably a combination of some reinforcement learning and a lot of supervised learning by now; but the following arguments apply even to supervised learning alone.
To maximize user engagement, it may recommend more addictive content (cat videos, conspiracy, ...) because it learned from previous examples that users who clicked on such content tended to stay longer on YouTube afterwards. This is massive user manipulation at scale.
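As a toy illustration of the supervised-learning argument (and not a description of YouTube's actual system), the sketch below fits a trivial predictor of session length from logged clicks and then recommends whatever it predicts will keep the user longest. The log entries, category names and function names are all made up for the example.

```python
from collections import defaultdict


def fit_mean_session_minutes(logs):
    """Supervised step: estimate the mean remaining session time per category.

    `logs` is a list of (category, minutes_spent_after_click) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, minutes in logs:
        totals[category] += minutes
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


def recommend(predicted_minutes):
    """Pick the category with the highest predicted engagement."""
    return max(predicted_minutes, key=predicted_minutes.get)


# Hypothetical logs: (clicked category, minutes the user stayed afterwards).
logs = [
    ("lecture", 5.0),
    ("lecture", 7.0),
    ("cat video", 20.0),
    ("conspiracy", 30.0),
    ("conspiracy", 50.0),
]

predicted = fit_mean_session_minutes(logs)
print(predicted)
print("recommended:", recommend(predicted))
```

Nothing in this code "wants" to manipulate anyone. The bias towards addictive content comes entirely from the data and from the choice of engagement as the objective.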
Is this an existential risk? Well, some of this addictive content is radicalizing and angering users. This arguably increases the risk of international tensions, which increases the risk of nuclear war. This may not be the most dramatic increase in existential risk; but it's one that seems to be already going on today!
More generally, I believe that a lot can be learned about complex algorithms, including AGI, by thinking much harder about the behavior and impact of the YouTube algorithm. In a sense, the YouTube algorithm is doing so many different tasks that it can be argued to be already quite "general" (audio, visual, text, preference learning, captioning, translating, recommending, planning...).
More on this algorithm here: https://robustlybeneficial.org/wiki/index.php?title=YouTube
This is probably more contentious. But I believe that the concept of "intelligence" is unhelpful and causes confusion. For instance, Legg-Hutter intelligence does not seem to require any "embodied intelligence".
I would rather stress two key properties of an algorithm: the quality of the algorithm's world model and its (long-term) planning capabilities. It seems to me (but maybe I'm wrong) that "embodied intelligence" is not very relevant to world model inference and planning capabilities.
By the way, I've just realized that the Wikipedia page on AI ethics begins with robots. 😤