Aspiring monastic and AI safety researcher


Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment

Our choice is not between having humans run the world and having a benevolent god run the world.

Right, I agree that having a benevolent god run the world is not within our choice set.

Our choice is between having humans run the world, and having humans delegate the running of the world to something else (which is kind of just an indirect way of running the world).

Well just to re-state the suggestion in my original post: is this dichotomy between humans running the world or something else running the world really so inescapable? The child in the sand pit does not really run the world, and in an important way the parent also does not run the world -- certainly not from the perspective of the child's whole-life trajectory.

Thoughts on Iason Gabriel’s Artificial Intelligence, Values, and Alignment

Thank you for this, jbash.

Humans aren't fit to run the world, and there's no reason to think humans can ever be fit to run the world

My short response is: Yes, it would be very bad for present-day humanity to have more power than it currently does, since its current level of power is far out of proportion to its level of wisdom and compassion. But it seems to me that there are a small number of humans on this planet who have moved some way in the direction of being fit to run the world, and in time, more humans could move in this direction, and could move further. I would like to build the kind of AI that creates a safe container for movement in this direction, and then fades away as humans in fact become fit to run the world, however long or short that takes. If it turns out not to be possible then the AI should never fade away.

There's no point at which an AI with a practical goal system can tell anything recognizably human, "OK, you've grown up, so I won't interfere if you want to destroy the world, make life miserable for your peers, or whatever".

I think what it means to grow up is to not want to destroy the world or make life miserable for one's peers. I do not think that most biological full-grown humans today have "grown up".

Institutions may have fine-tuned how they restrict individual agency. They may have managed to do it more when it helps and less when it hurts. But they haven't given it up. Institutions don't make individual adults sovereign, not even over themselves and definitely not in any matter that affects others.

Well just to state my position on this without arguing for it: my sense is that institutions should make individual adults sovereign if and when they grow up in the sense I've alluded to above. Very few currently living humans meet this bar in my opinion. In particular, I do not think that I meet this bar.

Not unless you deliberately modify them to the point where the word "human" becomes unreasonable.

True, but whether or not the word "human" is a reasonable description of what a person becomes when they become fit to run the world, the question is really: can humans become fit to run the world, and should they? Based on the few individuals I've spent time with who seem, in my estimation, to have moved some way in the direction of being fit to run the world, I'd say: yes and yes.

Reflections on Larks’ 2020 AI alignment literature review

I very much agree with these two:

On the other hand, there are lots of people who really do want to help, for the right reason. So if growth is the goal, helping these people out seems like just an obvious thing to do

So I think there is a lot of room for growth, by just helping the people who are already involved and trying.

Reflections on Larks’ 2020 AI alignment literature review

Thank you for this thoughtful comment, Linda -- writing this reply has helped me to clarify my own thinking on growth and depth. My basic sense is this:

If I meet someone who really wants to help out with AI safety, I want to help them to do that, basically without reservation, regardless of their skill, experience, etc. My sense is that we have a huge and growing challenge in navigating the development of advanced AI, and there is just no shortage of work to do, though it can at first be quite difficult to find. So when I meet individuals, I will try to help them find out how to really help out. There is no need for me to judge whether a particular person really wants to help out or not; I simply help them see how they can help out, and those who want to help out will proceed. Those who do not want to help out will not proceed, and that's fine too -- there are plenty of good reasons for a person to not want to dive head-first into AI safety.

But it's different when I consider setting up incentives, which is what @Larks was writing about:

My basic model for AI safety success is this: Identify interesting problems. As a byproduct this draws new people into the field through altruism, nerd-sniping, apparent tractability. Solve interesting problems. As a byproduct this draws new people into the field through credibility and prestige.

I'm quite concerned about "drawing people into the field through credibility and prestige" and even about "drawing people into the field through altruism, nerd-sniping, and apparent tractability". The issue is not the people who genuinely want to help out, whom I consider to be a boon to the field regardless of their skill or experience. The issue is twofold:

  1. Drawing people who are not particularly interested in helping out into the field via incentives (credibility, prestige, etc).
  2. Tempting those who do really want to help out and are already actually helping out to instead pursue incentives (credibility, prestige, etc).

So I'm not skeptical of growth via helping individuals, I'm skeptical of growth via incentives.

Belief Functions And Decision Theory

Ah this is helpful, thank you.

So let's say I'm estimating the position of a train on a straight section of track as a single real number and I want to do an update each time I receive a noisy measurement of the train's position. Under the theory you're laying out here I might have, say, three Gaussians N(0, 1), N(1, 10), N(4, 6), and rather than updating a single pdf over the position of the train, I'm updating measures associated with each of these three pdfs. Is that roughly correct?

(I realize this isn't exactly a great example of how to use this theory since train positions are perfectly realizable, but I just wanted to start somewhere familiar to me.)
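
To make the train example concrete, here is a minimal sketch of the bookkeeping I have in mind. It is ordinary Bayesian updating applied to each Gaussian separately, not the update rule from the post, which I may well be misreading. The assumptions are mine: the second parameter of each N(·, ·) is read as a variance, the measurement noise is Gaussian with a known variance `noise_var`, each component carries an unnormalized weight starting at 1, and the function names are hypothetical.

```python
import math


def update_gaussian(mean, var, z, noise_var):
    """Conjugate update of a belief N(mean, var) after measuring z with Gaussian noise."""
    gain = var / (var + noise_var)          # how much to trust the measurement
    new_mean = mean + gain * (z - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def marginal_likelihood(mean, var, z, noise_var):
    """Density of measurement z predicted by the belief N(mean, var) plus noise."""
    s = var + noise_var
    return math.exp(-0.5 * (z - mean) ** 2 / s) / math.sqrt(2.0 * math.pi * s)


def update_components(components, z, noise_var):
    """Update every (weight, mean, var) component; weights are left unnormalized."""
    updated = []
    for weight, mean, var in components:
        new_weight = weight * marginal_likelihood(mean, var, z, noise_var)
        new_mean, new_var = update_gaussian(mean, var, z, noise_var)
        updated.append((new_weight, new_mean, new_var))
    return updated


# The three Gaussians from the example, each with an assumed initial weight of 1.
components = [(1.0, 0.0, 1.0), (1.0, 1.0, 10.0), (1.0, 4.0, 6.0)]

# One hypothetical noisy measurement of the train's position.
components = update_components(components, z=2.0, noise_var=1.0)

for weight, mean, var in components:
    print(f"weight={weight:.4f}  mean={mean:.4f}  var={var:.4f}")
```

The weights are deliberately left unnormalized so that their relative sizes show which Gaussian predicted the measurement best; whether the theory keeps, rescales, or otherwise transforms such weights is exactly the part I am unsure about.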

Do you by chance have any worked examples where you go through the update procedure for some concrete prior and observation? If not, do you have any suggestions for what would be a good toy problem where I could work through an update at a very concrete level?

Belief Functions And Decision Theory

Thank you for your work both in developing this theory and putting together this heroic write-up! It's really a lot of work to write all this stuff out.

I am interested in understanding the thing you're driving at here, but I'm finding it difficult to navigate because I don't have much of a sense for where the definitions are heading. I'm really looking for an explanation of what exactly is made possible by this theory, so that as I digest each of the definitions I have a sense for where this is all going.

My current understanding is that this is all in service of working in unrealizable settings where the true environment is not in your hypothesis class. Is that correct? What exactly does an infrabayesian agent do to cope with unrealizability?

Reflections on Larks’ 2020 AI alignment literature review

Yeah so to be clear, I do actually think strategy research is pretty important, I just notice that in practice most of the strategy write-ups that I actually read do not actually enlighten me very much, whereas it's not so uncommon to read technical write-ups that seem to really move our understanding forward. I guess it's more that doing truly useful strategy research is just ultra difficult. I do think that, for example, some of Bostrom's and Yudkowsky's early strategy write-ups were ultra useful and important.

Search versus design

Yes I agree with this. Another example is the way a two-by-four length of timber is a kind of "interface" between the wood mill and the construction worker. There is a lot of complexity in the construction of these at the wood mill, but the standard two-by-four means that the construction worker doesn't have to care. This is also a kind of factorization that isn't about decomposition into parts or subsystems.

Search versus design

Nice post, very much the type of work I'd like to see more of.

Thank you!

I'm not sure I'd describe this work as "notorious", even if some have reservations about it.

Oops, terrible word choice on my part. I edited the article to say "gained attention" rather than "gained notoriety".

I think this is incorrect - for example, "biological systems are highly modular, at multiple different scales". And I expect deep learning to construct minds which are also fairly modular. That also allows search to be more useful, because it can make changes which are comparatively isolated.

Yes I agree with this, but modularity is only a part of what is needed for comprehensibility. Chris Olah's work on circuits in convnets suggests that convnets trained on image recognition tasks are somewhat modular, but it's still very very difficult to tease them apart and understand them. Biological trees are modular in many ways, but we're still working on understanding how trees work after many centuries of investigation.

You might say that comprehensibility = modularity + stories. You need artifacts that decompose into subsystems, and you need stories about that decomposition and what the pieces do so that you're not left figuring it out from scratch.
