The important thing to note about the problems you identified is how they differ from the problem domains of basic research. What happens to human evaluative judgement under the effects of intelligence augmentation? That's an experimental question. Can we trust a single individual to be enhanced? Almost certainly not. So perhaps we need to pick 100 or 1,000 people, wired into a shared infrastructure which enhances them in lock-step, and has incentives in place to ensure collaboration over competition, and consensus over partisanship in decision making protocols. Designing these protocols and safeguards takes a lot of work, but both the scale and the scope of that work are fairly well quantified. We can make a project plan and estimate with a high degree of accuracy how long and how much money it would take to design sufficiently safe oracle AI and intelligence augmentation projects.

FAI theory, on the other hand, is like the search for a grand unified theory of physics. We presume such a theory exists. We even have an existence proof of sorts (the human mind for FAI, the universe itself in physics). But the discovery of a solution is something that will or will not happen, and if it does it will be on an unpredictable time scale. Maybe it will take 5 years. Maybe 50, maybe 500. Who knows? After the rapid advances of the early 20th century, I'm sure most physicists thought a grand unified theory must be within reach; Einstein certainly did. Yet here we are nearly 100 years after the publication of the general theory of relativity, 85 years after most of the major discoveries of quantum mechanics, and in many ways we seem no closer to a theory of everything than we were some 40 years ago when the standard model was largely finalized.

It could be that at the very next MIRI workshop some previously unknown research associate solves the FAI problem conclusively. That'd be awesome. Or maybe she proves it impossible, which would be an equally good outcome because then we could at least refocus our efforts. Far worse, it might be that 50 years from now all MIRI has accumulated is a thoroughly documented list of dead-ends.

But that's not the worst case, because in reality UFAI will appear within the next decade or two, whether we want it to or not. So unless we are confident that we will solve the FAI problem and build out the solution before the competition, we'd better start investing heavily in alternatives.

The AI winter is over. Already multiple very well funded groups are rushing forward to generalize already super-human narrow AI techniques. AGI is finally a respectable field again, and there are multiple teams making respectable progress towards seed AI. And parallel hardware and software tools have finally gotten to the point where a basement AGI breakthrough is a very real and concerning possibility.

We don't have time to be dicking around doing basic research on whiteboards.

Aaaand there's the "It's too late to start researching FAI, we should've started 30 years ago, we may as well give up and die" to go along with the "What's the point of starting now, AGI is too far away, we should start 30 years later because it will only take exactly that amount of time according to this very narrow estimate I have on hand."

If your credible intervals on "How much time we have left" and "How much time it will take" do not overlap, then you either know a heck of a lot I don't, or y...

[anonymous] (6y): Ok, let me finally get around to answering this. FAI has definite subproblems. It is not a matter of scratching away at a chalkboard hoping to make some breakthrough in "philosophy" or some other proto-sensical field that will Elucidate Everything and make the problem solvable at all. FAI, right now, is a matter of setting researchers to work on one subproblem after another until they are all solved. In fact, when I do literature searches for FAI/AGI material, I often find that the narrow AI or machine-learning literature contains a round dozen papers nobody working explicitly on FAI has ever cited, or even appears to know about. This is my view: there is low-hanging fruit in applying existing academic knowledge to FAI problems. Where such low-hanging fruit does not exist, the major open problems can largely be addressed by recourse to higher-hanging fruit within mathematics, or even to empirical science. Since you believe it's all so wide-open, I'd like to know what you think of as "the FAI problem". If you have an Oracle AI you can trust, you can use it to solve FAI problems for you. This is a fine approach. Luckily, we don't need to dick around.

On Terminal Goals and Virtue Ethics

by Swimmer963, 18th Jun 2014 (4 min read, 207 comments)



A few months ago, my friend said the following thing to me: “After seeing Divergent, I finally understand virtue ethics. The main character is a cross between Aristotle and you.”

That was an impossible-to-resist pitch, and I saw the movie. The thing that resonated most with me–also the thing that my friend thought I had in common with the main character–was the idea that you could make a particular decision, and set yourself down a particular course of action, in order to make yourself become a particular kind of person. Tris didn’t join the Dauntless faction because she thought they were doing the most good in society, or because she thought her comparative advantage to do good lay there–she chose it because they were brave, and she wasn’t, yet, and she wanted to be. Bravery was a virtue that she thought she ought to have. If the graph of her motivations even went any deeper, the only node beyond ‘become brave’ was ‘become good.’

(Tris did have a concept of some future world-outcomes being better than others, and wanting to have an effect on the world. But that wasn't the causal reason why she chose Dauntless; as far as I can tell, it was unrelated.)

My twelve-year-old self had a similar attitude. I read a lot of fiction, and stories had heroes, and I wanted to be like them–and that meant acquiring the right skills and the right traits. I knew I was terrible at reacting under pressure–that in the case of an earthquake or other natural disaster, I would freeze up and not be useful at all. Being good at reacting under pressure was an important trait for a hero to have. I could be sad that I didn’t have it, or I could decide to acquire it by doing the things that scared me over and over and over again. So that someday, when the world tried to throw bad things at my friends and family, I’d be ready.

You could call that an awfully passive way to look at things. It reveals a deep-seated belief that I’m not in control, that the world is big and complicated and beyond my ability to understand and predict, much less steer–that I am not the locus of control. But this way of thinking is an algorithm. It will almost always spit out an answer, when otherwise I might get stuck in the complexity and unpredictability of trying to make a particular outcome happen.

Virtue Ethics

I find the different houses of the HPMOR universe to be a very compelling metaphor. It’s not because they suggest actions to take; instead, they suggest virtues to focus on, so that when a particular situation comes up, you can act ‘in character.’ Courage and bravery for Gryffindor, for example. It also suggests the idea that different people can focus on different virtues–diversity is a useful thing to have in the world. (I'm probably mangling the concept of virtue ethics here, not having any background in philosophy, but it's the closest term for the thing I mean.)

I’ve thought a lot about the virtue of loyalty. In the past, loyalty has kept me with jobs and friends that, from an objective perspective, might not seem like the optimal things to spend my time on. But the costs of quitting and finding a new job, or cutting off friendships, wouldn’t just have been about direct consequences in the world, like needing to spend a bunch of time handing out resumes or having an unpleasant conversation. There would also be a shift within myself, a weakening in the drive towards loyalty. It wasn’t that I thought everyone ought to be extremely loyal–it’s a virtue with obvious downsides and failure modes. But it was a virtue that I wanted, partly because it seemed undervalued. 

By calling myself a ‘loyal person’, I can aim myself in a particular direction without having to understand all the subcomponents of the world. More importantly, I can make decisions even when I’m rushed, or tired, or under cognitive strain that makes it hard to calculate through all of the consequences of a particular action.


Terminal Goals

The Less Wrong/CFAR/rationalist community puts a lot of emphasis on a different way of trying to be a hero–where you start from a terminal goal, like “saving the world”, and break it into subgoals, and do whatever it takes to accomplish it. In the past I’ve thought of myself as being mostly consequentialist, in terms of morality, and this is a very consequentialist way to think about being a good person. And it doesn't feel like it would work. 

There are some bad reasons why it might feel wrong–e.g. that it feels arrogant to think you can accomplish something that big–but I think the main reason is that it feels fake. There is strong social pressure in the CFAR/Less Wrong community to claim that you have terminal goals, that you’re working towards something big. My System 2 understands terminal goals and consequentialism, as a thing that other people do–I could talk about my terminal goals, and get the points, and fit in, but I’d be lying about my thoughts. My model of my mind would be incorrect, and that would have consequences for, for example, whether my plans actually worked.


Practicing the art of rationality

Recently, Anna Salamon brought up a question with the other CFAR staff: “What is the thing that’s wrong with your own practice of the art of rationality?” The terminal goals thing was what I thought of immediately–namely, the conversations I've had over the past two years, where other rationalists have asked me "so what are your terminal goals/values?" and I've stammered something and then gone to hide in a corner and try to come up with some. 

In Alicorn’s Luminosity, Bella says about her thoughts that “they were liable to morph into versions of themselves that were more idealized, more consistent - and not what they were originally, and therefore false. Or they'd be forgotten altogether, which was even worse (those thoughts were mine, and I wanted them).”

I want to know true things about myself. I also want to impress my friends by having the traits that they think are cool, but not at the price of faking it–my brain screams that pretending to be something other than what you are isn’t virtuous. When my immediate response to someone asking me about my terminal goals is “but brains don’t work that way!” it may not be a true statement about all brains, but it’s a true statement about my brain. My motivational system is wired in a certain way. I could think it was broken; I could let my friends convince me that I needed to change, and try to shoehorn my brain into a different shape; or I could accept that it works, that I get things done and people find me useful to have around and this is how I am. For now. I'm not going to rule out future attempts to hack my brain, because Growth Mindset, and maybe some other reasons will convince me that it's important enough, but if I do it, it'll be on my terms. Other people are welcome to have their terminal goals and existential struggles. I’m okay the way I am–I have an algorithm to follow.


Why write this post?

It would be an awfully surprising coincidence if mine was the only brain that worked this way. I’m not a special snowflake. And other people who interact with the Less Wrong community might not deal with it the way I do. They might try to twist their brains into the ‘right’ shape, and break their motivational system. Or they might decide that rationality is stupid and walk away.