Wei Dai offered 7 tips on how to answer really hard questions:

  • Don't stop at the first good answer.
  • Explore multiple approaches simultaneously.
  • Trust your intuitions, but don't waste too much time arguing for them.
  • Go meta.
  • Dissolve the question.
  • Sleep on it.
  • Be ready to recognize a good answer when you see it. (This may require actually changing your mind.)

Some others from the audience include:

I'd like to offer one more technique for tackling hard questions: Hack away at the edges.

General history books compress time so much that they often give the impression that major intellectual breakthroughs result from sudden strokes of insight. But when you read a history of just one breakthrough, you realize how much "chance favors the prepared mind." You realize how much of the stage had been set by others, by previous advances, by previous mistakes, by a soup of ideas crowding in around the central insight made later.

It's this picture of the history of mathematics and science that makes me feel quite comfortable working on hard problems by hacking away at their edges.

I don't know how to build Friendly AI. Truth be told, I doubt humanity will figure it out before going extinct. The whole idea might be impossible or confused. But I'll tell you this: I doubt the problem will be solved by getting smart people to sit in silence and think real hard about decision theory and metaethics. If the problem can be solved, it will be solved by dozens or hundreds of people hacking away at the tractable edges of Friendly AI subproblems, drawing novel connections, inching toward new insights, drawing from others' knowledge and intuitions, and doing lots of tedious, boring work.

Here's what happened when I encountered the problem of Friendly AI and decided I should for the time being do research on the problem rather than, say, trying to start a few businesses and donate money. I realized that I didn't see a clear path toward solving the problem, but I did see tons of apparently relevant research that could be done around the edges of the problem, especially with regard to friendliness content (because metaethics is my background). Snippets of my thinking process look like this:

Friendliness content is about human values. Who studies human values, besides philosophers? Economists and neuroscientists. Let's look at what they know. Wow, neuroeconomics is far more advanced than I had realized, and almost none of it has been mentioned by anybody researching Friendly AI! Let me hack away at that for a bit, and see if anything turns up.

Some people approach metaethics/CEV with the idea that humans share a concept of 'ought', and figuring out what that is will help us figure out what human values are. Is that the right way to think about it? Lemme see if there's research on what concepts are, how much they're shared between human brains, etc. Ah, there is! I'll hack away at this next.

CEV involves the modeling of human preferences. Who studies that? Economists do it in choice modeling, and AI programmers do it in preference elicitation. They even have models for dealing with conflicting desires, for example. Let me find out what they know...

CEV also involves preference extrapolation. Who has studied that? Nobody but philosophers, unfortunately, but maybe they've found something. They call such approaches "ideal preference" or "full information" accounts of value. I can check into that.

You get the idea.

This isn't the only way to solve hard problems, but when problems are sufficiently hard, then hacking away at their edges may be just about all you can do. And as you do, you start to see where the problem is more and less tractable. Your intuitions about how to solve the problem become more and more informed by regular encounters with it from all angles. You learn things from one domain that end up helping in a different domain. And, inch by inch, you make progress.

Of course you want to be strategic about how you're tackling the problem. But you also don't want to end up thinking in circles because the problem is too hard to even think strategically about how to tackle it.

You also shouldn't do 3 months of thinking and never write any of it down because you know what you've thought isn't quite right. Hacking away at a tough problem involves lots of wrong solutions, wrong proposals, wrong intuitions, and wrong framings. Maybe somebody will know how to fix what you got wrong, or maybe your misguided intuitions will connect to something they know and you don't and spark a useful thought in their head.

Okay, that's all. Sorry for the rambling!


Good post. I've mostly figured that I'm not smart enough to contribute anything useful to Friendliness research, but this gives me hope that I might be of some use after all. :-)

I'm not going to have the insight that solves reflective decision theory, either. But there is so much plausibly relevant stuff lying on the ground waiting to be picked up right now it's killing me. I could spend the next year doing just the useful 'hacking away at the edges' research that I happen to see lying on the ground right from where I'm standing now.

You write good book summaries, among other things. Wanna summarize the most useful parts of this book for the rest of us? :)

there is so much plausibly relevant stuff lying on the ground waiting to be picked up right now it's killing me.

Another useful technique is pointing out this low-hanging fruit to your fellow researchers (or to the community at large), in a hope that someone else (also) might have the skill set suitable for picking it up. A post with a detailed list of stuff that is "killing you" might be a good idea. Let us hack together :)

I would like to write up that list, but I've mentioned some of it already. If somebody wants to summarize the literature on choice modelling or AI preference elicitation/learning from the point of view of what's most likely to be useful for CEV, that would be awesome. But I am very, very skeptical that anybody will actually do that. If somebody else does that before I do, they will jump onto my very short list of uber-special super-people who actually complete useful projects out of their own motivation. Of course, many people have pretty decent reasons for not doing that kind of thing - they decided a while back to have a spouse and kids that now depend on them, or they work in their comparative advantage and donate to save the world, or whatever. But some people won't do things like that because they have decided they have a disease called "akrasia" and that they are helpless to defeat it.


Is that as math-heavy as it looks? :) I can summarize those books too, presuming that I understand the math involved, but it's slooow and requires special motivation... (and I still haven't even written up the rest of Kurzban's book)

Yeah, it's math heavy. There are definitely things that would be useful to summarize that would require less motivation to achieve. How 'bout Braitenberg's 'Vehicles'? People loved Yvain's Blue-Minimizing Robot, which is basically a robot version of the original 'Vehicles' article and book.

Ooh, Vehicles seems awesome. I'll see if I get around to it. (Might take a while.)


At worst, you could get a cerebral job and use the proceeds to hire someone to serve lemonade to Friendliness researchers.

Or hire someone to discourage Friendliness researchers from consuming lemonade or anything else that promotes negative health outcomes. Come to think of it that is quite close to something I've heard seriously proposed!

The distinction between science and philosophy is that scientists hack at the edges within reach, while philosophers guess what's in the center.

This principle is more general than stated here - it applies to warfare, economic infrastructures, personal improvement, and probably lots of other things.

Maybe less so to warfare. There have been cases of notable success by "striking at the center", such as the US Pacific theater in WW2 and MacArthur's invasion of Inchon.

I think for this to be analogous "the center" is the concentration of greatest military force, while the strikes you mention merely appear to be at the center because they are at the physical middle and places where supplies emanate from. Striking lines of communication and island hopping is hacking away at the edges of the enemy military.

(A lot of stuff seems potentially relevant only until you've studied the problem for a few years and learned that mostly it's actually not.)


I expect a lot of actually relevant stuff doesn't seem relevant until you've studied it in connection with the problem for a few years. But maybe you don't get that far, because it didn't seem relevant :(

Friendly AI is a monster problem partly because nearly everything any human experiences, believes, wants to believe or has any opinion at all on, is potentially relevant. You could be forgiven for thinking maybe there isn't a well-defined problem buried under all that mess after all. But there may be some useful sub-problems around the edges.

Personally, even if AI-that-goes-FOOM-catastrophically isn't very likely, I think we shouldn't even need that reason to study what sort of life and environment would be optimal for humans. It doesn't have to be about asking dangerous wishes of some technological genie-in-a-bottle. We already have supra-human entities such as governments and corporations making decisions with non-zero existential risk attached, and we probably want them to be a bit friendlier if possible.

Do you have specific examples in mind?

Machine learning (in particular, graphical models), more general AI, philosophy, game theory, algorithmic complexity, cognitive science, and neuroscience seem to be mostly useless (beyond the basics) for attacking the friendliness content problem. Pure mathematics seems potentially useful.

I would really, really like to know: What areas of pure mathematics stand out to you now?

He might have changed his mind since then, but in case you missed it: Recommended Reading for Friendly AI Research

I've looked over that list, but the problem is that it essentially consists of a list of items to catch you up to the state of the discussion as it was a year ago, along with a list of general mathematics texts.

I'm pretty well acquainted with mathematical logic; the main item on the list that I'm particularly weak in would be category theory, and I'm not sure why category theory is on the list. I've a couple of ideas about the potential use of category theory in, maybe, knowledge representation or something along those lines, but I have no clue how it could be brought to bear on the friendliness content problem.

The book list is somewhat obsolete (the list of LW posts is not), but I'm not ready to make the next iteration. The state of decision theory hasn't changed much since then.

Roughly, the central mystery seems to be the idea of acausal control. It feels like it might even be useful for inferring friendliness content, along the lines of what I described here. But we don't understand that idea. It first more or less explicitly appeared in UDT with its magical mathematical intuition module, and became more concrete in ADT, where proofs are used instead (at the cost of making it useless where complete proofs can't be expected, that is almost always outside very simple thought experiments).

The problem is this: given an action-definition and a utility-definition, the agent can find a function between their sets of possible values and use it as a "utility function". Other "utility functions" are correct as well; the agent just isn't capable of finding them, and somehow that's a good thing: that's why it works (see this post). What makes some of the functions "better" than others? Can we generalize this to inference of dependencies between facts other than action and utility-value? What particular properties of agents constructed in one of the standard ways allow them to be controlled by some, but not other, dependencies? What kinds of "facts" are relevant? What constitutes a "fact"? (In ADT, a "fact" is an axiomatic definition of a structure, which refers to some particular class of structures and not to other structures; decision theory then considers ways in which some of these "facts" can control other "facts", that is, make the structures defined by certain definitions be a certain way, given control over other structures that contain the agent's action.)

It feels like mathematics is the discipline for clarifying questions like this (and it's perhaps not useful to prioritize its areas, though some emphasis on foundations seems right). An important milestone would be to produce a useful problem statement about clarification of this idea of acausal dependence that can be communicated at least to mathematicians on LW.

Of the things on your list, I'm most surprised by cognitive science and maybe game theory, unless you're talking about the fields' current insights rather than their expected future insights. In that case, I'm still somewhat surprised game theory is on this list. I'd love to learn what led you to this belief.

It's possible I only know the basics, so feel free to say "read more about what the fields actually offer and it'll be obvious if you've been on Less Wrong long enough."

I agree on most of this, but would you mind explaining why you think neuroscience is "mostly useless?" My intuition is the opposite. Also agreed that pure mathematics seems useful.

Even if we knew everything about brains, right now we lack the conceptual/philosophical insight to turn that data into something useful. In turn, neuroscience is not even primarily concerned with getting such data; it develops its own generalizations that paint a picture of roughly how brains work, but this picture probably won't be detailed enough to capture the complexity of human (extrapolated) value, even if we knew how to interpret it, which we don't.

I was also wondering about neuroscience. If we take a CEV approach, wouldn't neuroscience be useful for actually determining the volitions to be extrapolated?

Agreed but would add algorithmic information theory, deep theoretical computer science, and maybe quantum information theory. There are some interesting questions about hypercomputation, getting information from context, and concrete semi-"physical" AI coordination problems. (Also reversible computing is just trippy as hell. Intuitions, especially "moral" intuitions, gawk at it.) These are of course secondary to study of updateless-like decision theories.

Also reversible computing is just trippy as hell. Intuitions, especially "moral" intuitions, gawk at it.

They do? Why? I haven't experienced moral trippiness myself. This may be because I haven't considered the same things you have or because my intuitions are eccentric. (Assume I mean 'eccentric in a different way to how your moral intuitions are eccentric' or not depending on whether you prefer to be seen as having typical moral intuitions or atypical ones.)

Of course you want to fail as quickly as you can, though you and I do seem to have slightly different intuitions about what is likely to end up being useful for friendliness content. Or rather, I have a slightly broader set of things that I think have a decent chance of being useful.

+1 just for the advice - I also do this, and it's the only way to get the right kind of insight to actually solve the problem. You want to climb what looks like an impossibly steep mountain, so you start by mapping the territory around the mountain - discovering where others have already been to, and potential ways up that don't work. Eventually you may well discover there's an EASY way up the mountain, or at least a practical one.

I've certainly found this to be a useful strategy when dealing with complicated problems in software development. Sometimes a problem is just too big, and I can't quite see how all the pieces need to fit together. If I allow myself to leave some important design problems unresolved while I work on the parts that I do understand well enough to write, I often find that the other pieces then fall into place straightforwardly.


How about trying to interact with a lot of humans that are different from you? This would result in you having more models of parts of humanity in your mind, each of which would control your behavior to some degree. It seems like a practical way to get closer to implementing CEV yourself.