All of gsastry's Comments + Replies

(While I appreciate many of the investigations in this paper and think it is good to improve our understanding, I don’t think they let us tell what’s up with risk.) This could be the subject of a much longer post and maybe will be discussed in the comments.

Do you mean they don't tell us what's up with the difference in risks of the measured techniques, or that they don't tell us much about AI risk in general? (I'd at least benefit from learning more about your views here.)

Yes, I mean that those measurements don't really speak directly to the question of whether you'd be safer using RLHF or imitation learning.

Agreed on (1) and (2). I'm still interested in the counterfactual value of theoretical research in security. One reason is that the "reasoning style" of ELK seems quite similar to that of cryptography, and we at least have some track record with the development of computer security.

The military and information assurance communities, which are used to dealing with highly adversarial environments, do not search for solutions that render all failures an impossibility.

In information security, practitioners do not look for airtight guarantees of security, but instead try to increase security iteratively as much as possible. Even RSA, the centerpiece of internet encryption, is not provably completely unbreakable (perhaps a superintelligence could find a way to efficiently factor large numbers).

I take your point, and I like the analog...

That's true! Cryptography is amenable to rigorous guarantees and proofs. I gave the example of RSA because even there, the assumption is essentially "we don't know how to efficiently factor large numbers, and lots of people have tried, so we will assume it can't be done efficiently." There are two broader points, though:

  1. Information assurance is about far more than encryption. Encryption involves proofs about algorithms, but it doesn't guarantee information assurance (for instance, how do you keep private keys private?).
  2. We argue in our second post that theory is limited for deep learning, to a greater extent than it is in cryptography.

Of course, cryptography may itself be relevant to ML safety in the sense that it can be used to secure model weights, etc.
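To make the point about RSA's assumption concrete, here is a toy sketch (deliberately tiny primes, not usable cryptography) showing that RSA's security rests on factoring being hard: anyone who can factor the public modulus n can recompute the private exponent. The function names and the primes 1009 and 1013 are hypothetical choices for illustration; `pow(e, -1, phi)` needs Python 3.8 or later.

```python
def make_keys(p: int, q: int, e: int = 65537):
    """Toy RSA key generation from two primes p and q (insecure sizes)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: modular inverse of e mod phi
    return (n, e), d

def factor_by_trial_division(n: int):
    """Brute-force factoring; feasible here only because n is tiny."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no nontrivial factor")

def recover_private_key(public_key):
    """An attacker who can factor n rebuilds d from public data alone."""
    n, e = public_key
    p, q = factor_by_trial_division(n)
    return pow(e, -1, (p - 1) * (q - 1))

public, private = make_keys(1009, 1013)
ciphertext = pow(42, public[1], public[0])          # encrypt m = 42
stolen_d = recover_private_key(public)              # "break" the key by factoring
assert pow(ciphertext, stolen_d, public[0]) == 42   # attacker decrypts
```

With real key sizes, trial division (or any known classical algorithm) is hopeless, which is exactly the "lots of people have tried" assumption rather than a proof.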

Can you recommend some other posts in that reference class?

I agree with both your claims, but maybe with less confidence than you (I also agree with DanielFilan's point below).

Here are two places I can imagine MIRI's intuitions here coming from, and I'm interested in your thoughts on them:

(1) The "idealized reasoner is analogous to a Carnot engine" argument. It seems like you think advanced AI systems will be importantly disanalogous to this idea, and that's not obvious to me.

(2) 'We might care about expected utility maximization / theoretical rationality because there is an impo...

Rohin Shah (5 years ago)
(1) I am unsure whether there exists an idealized reasoner analogous to a Carnot engine (see Realism about rationality). Even if such a reasoner exists, it seems unlikely that we will a) figure out what it is, b) understand it in sufficient depth, and c) successfully use it to understand and improve ML techniques, before we get powerful AI systems through other means. Under short timelines, this cuts particularly deeply, because a) there's less time to do all of these things and b) it's more likely that advanced AI is built out of "messy" deep learning systems that seem less amenable to this sort of theoretical understanding.

(2) I certainly agree that, all else equal, advanced agents should act closer to ideal agents (assuming there is such a thing as an ideal agent). I also agree that advanced AI should be less susceptible to money pumps, from which I infer that their "preferences" (i.e. the world states they work to achieve) are transitive. I'm also on board that more advanced AI systems are more likely to be describable by some utility function whose expected value they maximize, per the VNM theorem. I don't agree that the utility function must be simple, or that the AI must internally reason by computing the expected utility of every action and then choosing the highest.

I would be extremely surprised if we built powerful AI such that when we say the English sentence "make paperclips" it acts in accordance with the utility function U(universe history) = number of paperclips in the last state of the universe history. I would be very surprised if we built powerful AI such that we hardcode in the above utility function and then design the AI to maximize its expected value.
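The money-pump argument for transitivity can be illustrated with a toy simulation: an agent with cyclic preferences (A over B, B over C, C over A) that pays a small fee for every trade up to something it prefers can be led around the cycle indefinitely, ending with the item it started with and less money. The items, fee, and function names below are hypothetical choices for illustration, and money is counted in integer units to avoid floating-point noise.

```python
# Cyclic (intransitive) preferences: each key is strictly preferred to its value.
PREFERS = {"A": "B", "B": "C", "C": "A"}

def prefers(x: str, y: str) -> bool:
    """True if the agent strictly prefers holding x over holding y."""
    return PREFERS.get(x) == y

def run_money_pump(start: str, money: int, fee: int, rounds: int):
    """An exploiter repeatedly offers the item the agent prefers to its current one."""
    upgrade = {worse: better for better, worse in PREFERS.items()}
    holding = start
    for _ in range(rounds):
        offer = upgrade[holding]
        # The agent accepts any trade to a preferred item it can afford.
        if prefers(offer, holding) and money >= fee:
            holding = offer
            money -= fee
    return holding, money
```

For example, `run_money_pump("A", money=100, fee=1, rounds=3)` returns `("A", 97)`: three "improvements" later the agent holds A again and is 3 units poorer. With transitive preferences the chain of upgrades would have to stop at a most-preferred item, so this exploit would not be available.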

I'm not sure what it means for this work to "not apply" to particular systems. It seems like the claim is that decision theory is a way to understand AI systems in general and reason about what they will do, just as we use other theoretical tools to understand current ML systems. Can you spell this out a bit more? (Note that I'm also not really sure what it means for decision theory to apply to all AI systems: I can imagine kludgy systems whose behavior seems really hard, in some sense, to understand with decision theory, but I'm not confident at all.)

I claim (with some confidence) that Updateless Decision Theory and Logical Induction don't have much to do with understanding AlphaGo or OpenAI Five, and you are better off understanding those systems using standard AI/ML thinking.

I further claim (with less confidence) that in a similar way, at the time that we build our first powerful AI systems, the results of Agent Foundations research at that time won't have much to do with understanding those powerful AI systems.

Does that explain what it means? And if so, do you disagree with either of the claims?

I'm not sure if this will be helpful or if you've already explored this connection, but the field of abstract interpretation tries to understand the semantics of a computer program without fully executing it. The theme of "trying to understand what a program will do just by examining its source code" is also present more broadly in static program analysis. If we can understand neural networks as typed functional programs, maybe there's something worth thinking about here.
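One existing point of contact is interval abstract interpretation (called "interval bound propagation" in the neural-network verification literature): instead of running the network on individual inputs, you push a box of possible inputs through each layer and get sound bounds on every output. The sketch below assumes a tiny fully connected ReLU network with made-up weights; the function names are hypothetical and it is an illustration, not a verification tool.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix rows, bias)

def affine(box: List[Interval], weights: List[List[float]], bias: List[float]) -> List[Interval]:
    """Sound interval bounds for y = W x + b, given each x_i in [lo_i, hi_i]."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (xlo, xhi) in zip(row, box):
            # A nonnegative weight keeps bound order; a negative weight swaps it.
            if w >= 0:
                lo += w * xlo
                hi += w * xhi
            else:
                lo += w * xhi
                hi += w * xlo
        out.append((lo, hi))
    return out

def relu(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in box]

def analyze(input_box: List[Interval], layers: List[Layer]) -> List[Interval]:
    """Propagate the input box through affine layers, with ReLU between them."""
    box = input_box
    for i, (W, b) in enumerate(layers):
        box = affine(box, W, b)
        if i < len(layers) - 1:
            box = relu(box)
    return box

# Hypothetical network: f(x1, x2) = relu(x1 - x2) + relu(x1 + x2).
NET: List[Layer] = [([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
```

On the input box x1, x2 in [0, 1], `analyze([(0.0, 1.0), (0.0, 1.0)], NET)` gives `[(0.0, 3.0)]`. The bound is sound but loose: the true maximum over the box is 2 (at x = (1, 0)), which illustrates the usual precision-versus-cost trade-off in choosing an abstract domain.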

Like some other commenters, I also highly recommend Impro if this post resonates with you.

Readers who are very interested in a more conceptual analysis of what decision making "is" in the narrative framework may want to check out Tempo (by Venkatesh Rao, who writes at Ribbonfarm). Rao takes as axiomatic the memetically derived idea that all our choices are between life scripts that end in our death, and looks at how to make these choices. It's more of an analytical book on strategy (with exercises) than a poetic exemplar of Mythic Mode, but...

I'm still confused about where to post stuff that I would have posted in the old LW's Open Threads. For example, "What are the best pieces of writing/advice on dealing with 'shoulds'?" is one thing that I'd want to post in an Open Thread. I have various other little questions/requests like this.

I don't understand the point about avoiding government involvement in the long run. It seems like your argument is that governments are incompetent at managing tech projects (maybe for structural reasons). This seems like a very strong claim to me, and it seems accurate only when there's poor incentive compatibility. For example, are you excluding things like the Manhattan Project?

I'd be interested in a list of well-managed government science and engineering projects if one exists. The Manhattan Project and the Apollo Project both belong on that list (despite both having their flaws: leaks to the USSR from the former, and the Apollo 1 disaster from the latter); what are other examples?

Are there plans for recurring Open Threads like the old LW? Or is there a substitute now where it's recommended to post comments that used to go in the Open Threads?

(see my recent reply to Chris) In the next couple of days, once it becomes much easier for people to permanently opt into "all personal blogposts in their feed", most use cases like Open Threads will probably be best handled as personal blogs. You can either simply post the sort of comment you'd place in an Open Thread as a blogpost of its own (there's no reason not to), or, if you like the format of an Open Thread, host an Open Thread on your personal blog. (Note that we haven't yet really examined the "short form content" idea, which is a separate concept that we've talked about in the past but probably won't get to in the immediate future.)

Has either one been fully specified/formalized?

Here's one attempt to further formalize the different decision procedures: (H/T linked by Luke)

Thanks. I have some followup questions :)

  1. What projects are you currently working on?/What confusing questions are you attempting to answer?
  2. Do you think that most people should be very uncertain about their values, e.g. altruism?
  3. Do you think that your views about the path to FAI are contrarian (amongst people working on FAI/AGI, e.g. your belief that most of the problems are philosophical in nature)? If so, why?
  4. Where do you hang out online these days? Anywhere other than LW?

Please correct me if I've misrepresented your views.

Wei Dai (10 years ago)
If you go through my posts on LW, you can read most of the questions that I've been thinking about in the last few years. I don't think any of the problems that I raised have been solved, so I'm still attempting to answer them. To give a general idea, these include questions in philosophy of mind, philosophy of math, decision theory, normative ethics, meta-ethics, and meta-philosophy. To give a specific example I've just been thinking about again recently: What is pain exactly (e.g., in a mathematical or algorithmic sense) and why is it bad? For example, can certain simple decision algorithms be said to have pain? Is pain intrinsically bad, or just because people prefer not to be in pain?

As a side note, I don't know if it's good from a productivity perspective to jump around amongst so many different questions. It might be better to focus on just a few, with the others in the back of one's mind. But now that I have so many unanswered questions, all of which I'm very interested in, it's hard to stay on any of them for very long. So reader beware. :)

Yes, but I tend not to advertise too much that people should be less certain about their altruism, since it's hard to see how that could be good for me regardless of what my values are or ought to be. I make an exception for people who might be in a position to build an FAI, since if they're too confident about altruism then they're likely to be too confident about many other philosophical problems, but even then I don't stress it too much.

I guess there is a spectrum of concern over philosophical problems involved in building an FAI/AGI, and I'm on the far end of that spectrum. I think most people building AGI mainly want short-term benefits like profits or academic fame, and do not care as much about the far reaches of time and space, in which case they'd naturally focus more on the immediate engineering issues. Among people working on FAI, I guess they either have not thought as much about philosophical proble

  1. What do you think are the most interesting philosophical problems within our grasp to be solved?
  2. Do you think that normative ethics won't be solved until we have an FAI? If so, why?
  3. You argued previously that metaphilosophy and singularity strategies are fields with low hanging fruit. Do you have any examples of progress in metaphilosophy?
  4. Do you have any role models?

What do you think are the most interesting philosophical problems within our grasp to be solved?

I'm not sure there is any. A big part of it is that metaphilosophy is essentially a complete blank, so we have no way of saying what counts as a correct solution to a philosophical problem, and hence no way of achieving high confidence that any particular philosophical problem has been solved, except maybe simple (and hence not very interesting) problems, where the solution is just intuitively obvious to everyone or nearly everyone. It's also been my experien...