Marv K

Comments

Intelligence allocation from a Mean Field Game Theory perspective
Marv K · 2y · 10

I agree on both counts. You're right that I should model the alignment of the system as well as its intelligence. I guess alignment could be thought of as minimizing the distance between high-dimensional vectors representing the players' and the AI's values. So each user (and the AI, too) could have a value vector associated with it; each user's cost function could then incorporate how much they care about their own alignment with the rest of the users, and the AI's cost function would need to be tuned so that it is sufficiently aligned by the time it reaches a critical intelligence threshold. That way, you could express how important it is that the AI is aligned as a function of its intelligence.
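As a rough illustration of what I mean, here is a minimal sketch; the function names and the specific functional forms (distance to the mean user value vector, penalty scaling linearly with intelligence) are just assumptions for this example, not anything derived from the mean field model itself:

```python
import numpy as np

# Hypothetical sketch: every player (and the AI) carries a value vector
# in a shared high-dimensional value space.

def misalignment(v_a, v_b):
    # Distance between two value vectors; smaller means better aligned.
    return np.linalg.norm(v_a - v_b)

def user_cost(v_user, all_user_values, care_weight):
    # A user's cost term for how much they care about their own alignment
    # with the rest of the users (here: distance to the mean value vector).
    mean_values = all_user_values.mean(axis=0)
    return care_weight * misalignment(v_user, mean_values)

def ai_cost(v_ai, all_user_values, intelligence, alpha=1.0):
    # The AI's alignment penalty is scaled by its intelligence, so the cost
    # function pushes it to be sufficiently aligned by the time it crosses
    # a critical intelligence threshold.
    mean_values = all_user_values.mean(axis=0)
    return alpha * intelligence * misalignment(v_ai, mean_values)
```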

Variational Bayesian methods
Marv K · 3y · 40

Nice writeup. I wasn't even aware that k-means clustering can be viewed through the Variational Bayes framework. In case more perspectives are useful to any readers: when I first tried to learn this, I found the Pyro Introduction very helpful; because it is split up over a lot of files, I put together these slides on Bayesian Neural Networks, which also start with a motivation for Variational Bayes.
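For readers who want to see the basic pattern in code, here is a minimal Pyro-style sketch of stochastic variational inference (a Normal guide over a single unknown mean; the toy data and learning rate are just placeholders, not taken from the Pyro Introduction):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def model(data):
    # Prior over the unknown mean, and a likelihood for the observations.
    mu = pyro.sample("mu", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(mu, 1.0), obs=data)

def guide(data):
    # Variational posterior q(mu) = Normal(loc, scale) with learnable parameters.
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0), constraint=constraints.positive)
    pyro.sample("mu", dist.Normal(loc, scale))

data = torch.randn(100) + 3.0  # toy observations with true mean ~3
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(2000):
    svi.step(data)
print(pyro.param("loc").item(), pyro.param("scale").item())
```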

Announcing the Alignment of Complex Systems Research Group
Marv K · 3y · 10

I've been thinking about alignment of subsystems in a very similar style and am really excited to see someone else thinking along these lines. I started a comment with my own thoughts on this approach, but it got out of hand quickly, so I made a separate post: https://www.lesswrong.com/posts/AZfq4jLjqsrt5fjGz/formalizing-alignment

I'd be keen on any sort of feedback.

Questions for a Theory of Narratives
Marv K · 3y · 10

Thanks for the pointers! The overviews in both sources are great. I especially like Rumelhart's Story Grammar. Though from what I gather from Mark Riedl's post, the field is mostly about the structure/grammar inherent to stories as objects that exist pretty much in a vacuum, and does not explicitly focus on connecting them to models of agents that communicate using these stories.

Posts

Intelligence allocation from a Mean Field Game Theory perspective · 13 points · 2y · 2 comments
Aligning alignment with performance · 2 points · 3y · 0 comments
Formalizing Alignment · 4 points · 3y · 0 comments
Abstraction sacrifices causal clarity · 2 points · 3y · 0 comments
Questions for a Theory of Narratives · 5 points · 3y · 4 comments