Alex Flint

Independent AI alignment researcher


The accumulation of knowledge

Wiki Contributions


If you pin down what a thing refers to according to what that thing was optimized to refer to, then don't you have to look at the structure of the one who did the optimizing in order to work out what a given thing refers to? That is, to work out what the concept "thermodynamics" refers to, it may not be enough to look at the time evolution of the concept "thermodynamics" on its own, I may instead need to know something about the humans who were driving those changes, and the goals held within their minds. But, if this is correct, then doesn't it raise another kind of homunculus-like regression where we were trying to directly explain semantics, but we ended up needing to inquire into yet another mind, the complete understanding of which would require further unpacking of the frames and concepts held in that mind, and the complete understanding of those frames and concepts requiring even further inquiry into a yet earlier mind that was responsible for doing the optimization of those frames and concepts?

There seems to be some real wisdom in this post but given the length and title of the post, you haven't offered much of an exit -- you've just offered a single link to a youtube channel for a trauma healer. If what you say here is true, then this is a bit like offering an alcoholic friend the sum total of one text message containing a single link to the homepage of alcoholics anonymous -- better than nothing, but not worthy of the bombastic title of this post.

friends and family significantly express their concern for my well being

What exact concerns do they have?

  1. You don't get to fucking assume any shit on the basis of "but... ah... come on". If you claim X and someone asks why, then congratulations now you're in a conversation. That means maybe possible shit is about to get real, like some treasured assumptions might soon be questioned. There are no sarcastic facial expressions or clever grunts that get you an out from this. You gotta look now at the thing itself.

I just want to acknowledge the very high emotional weight of this topic.

For about two decades, many of us in this community have been kind of following in the wake of a certain group of very competent people tackling an amazingly frightening problem. In the last couple of years, coincident with a quite rapid upsurge in AI capabilities, that dynamic has really changed. This is truly not a small thing to live through. The situation has real breadth -- it seems good to take it in for a moment, not in order to cultivate anxiety, but in order to really engage with the scope of it.

It's not a small thing at all. We're in this situation where we have AI capabilities kind of out of control. We're not exactly sure where any of the leader's we've previously relied on stand. We all have this opportunity now to take action. The opportunity is simply there. Nobody, actually, can take it away. But there is also the opportunity, truly available to everyone regardless of past actions, to falter, exactly when the world most needs us.

What matters, actually, is what, concretely, we do going forward.

That is correct. I know it seems little weird to generate a new policy on every timestep. The reason it's done that way is that the logical inductor needs to understand the function that maps prices to the quantities that will be purchased, in order to solve for a set of prices that "defeat" the current set of trading algorithms. That function (from prices to quantities) is what I call a "trading policy", and it has to be represented in a particular way -- as a set of syntax tree over trading primitives -- in order for the logical inductor to solve for prices. A trading algorithm is a sequence of such sets of syntax trees, where each element in the sequence is the trading policy for a different time step.

Normally, it would be strange to set up one function (trading algorithms) that generates another function (trading policies) that is different for every timestep. Why not just have the trading algorithm directly output the amount that it wants to buy/sell? The reason is that we need not just the quantity to buy/sell, but that quantity as a function of price, since prices themselves are determined by solving an optimization problem with respect to these functions. Furthermore, these functions (trading policies) have to be represented in a particular way. Therefore it makes most sense to have trading algorithms output a sequence of trading policies, one per timestep.

Thank you for this extraordinarily valuable report!

I believe that what you are engaging in, when you enter into a romantic relationship with either a person or a language model, is a kind of artistic creation. What matters is not whether the person on the "other end" of the relationship is a "real person" but whether the thing you create is of true benefit to the world. If you enter into a romantic relationship with a language model and produce something of true benefit to the world, then the relationship was real, whether or not there was a "real person" on the other end of it (whatever that would mean, even in the case of a human).

This is a relatively banal meta-commentary on reasons people sometimes give for doing worst-case analysis, and the differences between those reasons. The post reads like a list of things with no clear through-line. There is a gesture at an important idea from a Yudkowsky post (the logistic success curve idea) but the post does not helpfully expound that idea. There is a kind of trailing-off towards the end of the post as things like "planning fallacy" seem to have been added to the list with little time taken to place them in the context of the other things on the list. In the "differences between these arguments" section, the post doesn't clearly elucidate deep differences between the arguments, it just lists verbal responses that you might make if you are challenged on plausibility grounds in each case.

Overall, I felt that this post under-delivered on an important topic.

Many people believe that they already understand Dennett's intentional stance idea, and due to that will not read this post in detail. That is, in many cases, a mistake. This post makes an excellent and important point, which is wonderfully summarized in the second-to-last paragraph:

In general, I think that much of the confusion about whether some system that appears agent-y “really is an agent” derives from an intuitive sense that the beliefs and desires we experience internally are somehow fundamentally different from those that we “merely” infer and ascribe to systems we observe externally. I also think that much of this confusion dissolves with the realization that internally experienced thoughts, beliefs, desires, goals, etc. are actually “external” with respect to the parts of the mind that are observing them—including the part(s) of the mind that is modeling the mind-system as a whole as “being an agent” (or a “multiagent mind,” etc.). You couldn't observe thoughts (or the mind in general) at all if they weren't external to "you" (the observer), in the relevant sense.

The real point of the intentional stance idea is that there is no fact of the matter about whether something really is an agent, and that point is most potent when applied to ourselves. It is neither the case that we really truly are an agent, nor that we really truly are not an agent.

This post does an excellent job of highlighting this facet. However, I think this post could have been more punchy. There is too much meta-text of little value, like this paragraph:

In an attempt to be as faithful as possible in my depiction of Dennett’s original position, as well as provide a good resource to point back to on the subject for further discussion[1], I will err on the side of directly quoting Dennett perhaps too frequently, at least in this summary section.

In a post like this, do we need to be fore-warned that the author will err perhaps to frequently on the side of directly quoting Dennett, at least in the summary section? No, we don't need to know that. In fact the post does not contain all that many direct quotes.

At the top of the "takeaways" section, the author gives the following caveat:

Editorial note: To be clear, these “takeaways” are both “things Dan Dennett is claiming about the nature of agency with the intentional stance” and “ideas I’m endorsing in the context of deconfusing agency for AI safety.”

The word "takeaways" in the heading already tells us that this section will contain points extracted by the reader that may or may not be explicitly endorsed by the original author. There is no need for extra caveats, it just leads to a bad reading experience.

In the comments section, Rohin makes the following very good point:

I mostly agree with everything here, but I think it is understating the extent to which the intentional stance is insufficient for the purposes of AI alignment. I think if you accept "agency = intentional stance", then you need to think "well, I guess AI risk wasn't actually about agency".

Although we can "see through" agency as not-an-ontologically-fundamental-thing, nevertheless we face the practical problem of what to do about the (seemingly) imminent destruction of the world by powerful AI. What actually should we do about that? The intentional stance not only fails to tell us what to do, it also fails to tell us how any approach to averting AI risk can co-exist with the powerful deconstruction of agency offered by the intentional stance idea itself. If agency is in the eye of the beholder, then... what? What do we actually do about AI risk?

Load More