The Inefficiency of Theoretical Discovery

by lukeprog1 min read3rd Nov 2013109 comments

29

Personal Blog

Previously: Why Neglect Big Topics.

Why was there no serious philosophical discussion of normative uncertainty until 1989, given that all the necessary ideas and tools were present at the time of Jeremy Bentham?

Why did no professional philosopher analyze I.J. Good’s important “intelligence explosion” thesis (from 19591) until 2010?

Why was reflectively consistent probabilistic metamathematics not described until 2013, given that the ideas it builds on go back at least to the 1940s?

Why did it take until 2003 for professional philosophers to begin updating causal decision theory for the age of causal Bayes nets, and until 2013 to formulate a reliabilist metatheory of rationality?

By analogy to financial market efficiency, I like to say that “theoretical discovery is fairly inefficient.” That is: there are often large, unnecessary delays in theoretical discovery.

This shouldn’t surprise us. For one thing, there aren’t necessarily large personal rewards for making theoretical progress. But it does mean that those who do care about certain kinds of theoretical progress shouldn’t necessarily think that progress will be hard. There is often low-hanging fruit to be plucked by investigators who know where to look.

Where should we look for low-hanging fruit? I’d guess that theoretical progress may be relatively easy where:

  1. Progress has no obvious, immediately profitable applications.
  2. Relatively few quality-adjusted researcher hours have been devoted to the problem.
  3. New tools or theoretical advances open up promising new angles of attack.
  4. Progress is only valuable to those with unusual views.

These guesses make sense of the abundant low-hanging fruit in much of MIRI’s theoretical research, with the glaring exception of decision theory. Our September decision theory workshop revealed plenty of low-hanging fruit, but why should that be? Decision theory is widely applied in multi-agent systems, and in philosophy it’s clear that visible progress in decision theory is one way to “make a name” for oneself and advance one’s career. Tons of quality-adjusted researcher hours have been devoted to the problem. Yes, new theoretical advances (e.g. causal Bayes nets and program equilibrium) open up promising new angles of attack, but they don’t seem necessary to much of the low-hanging fruit discovered thus far. And progress in decision theory is definitely not valuable only to those with unusual views. What gives?

Anyway, three questions:

  1. Do you agree about the relative inefficiency of theoretical discovery?
  2. What are some other signs of likely low-hanging fruit for theoretical progress?
  3. What’s up with decision theory having so much low-hanging fruit?

1 Good (1959) is the earliest statement of the intelligence explosion: “Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an ”explosion“ will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.” The term itself, “intelligence explosion,” originates with Good (1965). Technically, artist and philosopher Stefan Themerson wrote a "philosophical analysis" of Good's intelligence explosion thesis called Special Branch, published in 1972, but by "philosophical analysis" I have in mind a more analytic, argumentative kind of philosophical analysis than is found in Themerson's literary Special Branch ↩

29

109 comments, sorted by Highlighting new comments since Today at 12:25 AM
New Comment
Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

I vastly disagree. I will just state it for now, and hopefully this will be a commitment to explain it further when I have the time. Here are my unjustified assertions about the nature of philosophy regarding OP's topics:

  1. Philosophy has the most huge search space known to man, it encompasses everything (a) without a good clear-cut solution and (b) which has any hope to be solved (this rules out two extremes: science and religion).

  2. Philosophy, by its very nature, has few systematized methods for efficient search. Seems like we discovered logical and clear thinking recently, but that's almost about it.

  3. Because it is so difficult, philosophy is wrong 99,9% of the time.

  4. When philosophy is right, major breakthroughs are made, sciences are created, new reasoning tools, higher moral standards and so on.

  5. There's a massive and astronomical hindsight bias. Once solved a problem is no longer on the realm of philosophy and the solution tend to seem extremely obvious after 1 or 2 generations.

Thus, low hanging fruits in philosophy are nowhere to be found. Most of your examples were already found, they just need to be worked on. I chalenge you to present a yet unknown low hanging fruit, one all your peers don't know it already, one which would knock Nick's socks off.

I will second this. It's not that the process of theoretical discovery is inefficient due to any fault of its own, it's that the problem is intractable (e.g. we don't know how to do better than exhaustive search). So that linear looking search path from concept A to concept B did not take linear time to find...

3lukeprog7yI didn't focus my post on this, but since you object, I'll attempt to support my claim that "there are often large, unnecessary delays in theoretical discovery." What I'm saying is that we can do this search in less time than we typically do. That shouldn't be surprising: there are all kinds of inefficiencies in theoretical discovery. Young researchers are incentivized to work on things that will get them tenure quickly, they often choose subject areas due to practical career concerns or personal interest rather than likely fruitfulness or social importance, they often pursue research out of momentum rather than due to an EV calculation, etc. This is all quite understandable, but it does introduce inefficiencies. Do you disagree?
1IlyaShpitser7yNo, these are all spot-on criticisms (but I don't think they are specific to theoretical research). I certainly agree that many problems of mainstream academia can be solved via the older patronage model, or perhaps even the newer crowdsourcing model. I guess it is not clear to me that the failures in the OP's list are due to a structural fault, or due to more excusable issues, like: (a) "the scholarship coordination problem" (stuff gets forgotten and rediscovered over and over again, people don't read other disciplines, etc.) (b) the standard exponential search for insight
1lukeprog7yYeah, as I said to joaolkf [http://lesswrong.com/lw/iyo/the_inefficiency_of_theoretical_discovery/a06t], I think my title was misleading: I wasn't trying to contrast theoretical research with empirical research, but merely to look at these inefficiencies in the context of theoretical research, since that's what MIRI does. And you're probably right that for any given example of apparently unnecessary delay in progress, it can be pretty hard to tell which inefficiencies deserve the most blame.
0Kaj_Sotala7yNote that singling this out as a reason for inefficiency feels somewhat contradictory with the OP, where you suggested that there being no obvious profitable applications in the near-term was a reason for inefficiency. If people were choosing subjects areas based on "likely fruitfulness", then we should expect areas with useful near-term applications to be prioritized.
0joaolkf7yI would not disagree. But then the claim seems trivial. Your comment's second paragraph also applies to scientific research. Worse, in more applied areas graduate students have much less freedom to choose their own research topic and they seem to have a higher degree of overall social conformity. We can then reinstate the question of what exactly about theoretical or philosophical research that makes it so particularly slow and unproductive. I say it is a huge, unknown search space, with no good search process and the fact that every time we find something, we lose it to another area. (Plus the fact that in the past our claims had political/practical consequences, as MIRI's might have)
2lukeprog7yOh I see what's happening. Sorry, I think my title was accidentally misleading. My post wasn't trying to contrast the efficiency of theoretical research vs. empirical research. I just wanted to talk about those inefficiencies in the context of theoretical work specifically, since that's what MIRI does. (E.g. I wanted to focus on examples from theoretical research.) Anyway, the point about the large search space is an important one, and I hadn't been thinking of the inefficiencies coming from political consequences until you mentioned it.
0joaolkf7yA book about Einstein and Godel claims both of them were able to identify a problem that became suddenly relevant and trackable due to other developments. I think there are certain 'game changers' that reshape discovery space producing low-hanging fruits. But, I do not think these low hanging fruits stay there for long. The possibility of AGI and X-Risks made some of your examples relevant, and they were addressed shortly after those game changers arose. But otherwise, some of your points seem similar to those on the Einstein-Godel book I read.
0lukeprog7yWhich book?
0joaolkf7yhttp://www.ams.org/notices/200707/tx070700861p.pdf [http://www.ams.org/notices/200707/tx070700861p.pdf] It is more of a biography of their friendship. I don't think is worth reading. I almost summarized all his conclusions of the matter, except he applies it in more detail to history of science.
0joaolkf7yOk. But you did say relative inefficiency. Relative to what? And still, I think many of your low hanging fruits were retrospective. I'm not sure that they were really obviously important and easy to obtain before, say, 1995. One easy fix would be to just could about inviting some young possibly relevant philosophers for dinner and saying "do you see these 2 equally fun abstract problems? this one is more relevant because impacts the future of humanity!"
1lukeprog7yRelative to financial markets, to which I was analogizing.
2joaolkf7yOk, then the mistaken interpretation was my fault, you weren't relevantly using the theoretical/applied dimension anywhere. About decision theory. Perhaps utility maximizers were pulled towards game theory and thence economics and more narrow minded areas, while decision theory end up being maximized for oddness sometimes. That is, people who could attend to low hanging fruits were on areas where the background assumptions were unpopular,while people who could - perhaps - understand the background assumptions couldn't care less for utility.
2joaolkf7yIt would be strange if all the greatest minds of human history had indeed merely muddle in the swamp while ignoring all those beautiful low hanging fruits. Even great scientists or mathematicians often produce absurdities when delving into philosophy. Either this is a task completely useless and difficult, or useful and difficult. But theoretical discoveries don't seem easy at all. Optimistically, I believe there's a massive hindsight bias. But if this is true, philosophy is indeed a sad craft, gems continuously slip from our hands, while he are left with nothing but mud. On the other hand, I must say sometimes I feel we are slow and stubborn independently of the difficult nature of the huge search space. Perhaps my comment bellow is evidence of that [http://lesswrong.com/r/discussion/lw/iyo/the_inefficiency_of_theoretical_discovery/a05a] . I remember one time young-Nick said he convinced a big time philosopher of some point, I said "Well, that's it, you should erase everything on your CV and state just that. Forget your fancy PhD, convincing a philosopher triumphs all!". Old-Nick once told sometimes he felt philosophers where only reinstating some long held "truth" they had since early on, and they would build their publications, careers and life over it. How can one be so pathological stubborn? The weird part is that even though is difficult, hard to advance, its few breakthroughs not acknowledged, relatively low paying, it still is the most competitive academic career, by far. I say all of this, yet I'm pulling all nighters since January to mildly increase my odds of getting in a good philosophy PhD. Maybe only an anthropological/psychiatric study to find out what hell it's wrong with it. I'm sorry if I'm inadvertently psychologising the question. I have been around our kind from birth and can't help but relying on personal experience. Can't help also remembering the meme "It's a dirty job, but somebody's gotta do it". Is it some kind of necessary evil? Or
0ChristianKl7yWhich breakthrough did philosophy produce that aren't acknowledged?
0joaolkf7yScientific method, reason, utilitarianism, logic, subjective and objective probability. Although if asked some well-educated people would concede these might have come from philosophy, they often will still see philosophy as a failed, diseased, mislead and/or useless enterprise instead as one of the most fundamental and useful fields. A common overlapping pattern is to agree with a subset of philosophy's claims, say there's nothing more to be discussed and that, hence, philosophy is useless.
0somervta7yI would love to see a justification of 'reason', myself. What work(s) would you point to as having made the breakthrough on reason?
2joaolkf7ytaps out for now
0ChristianKl7yThere a lot of basic work in proability came from mathematicians like Bernoulli and Laplace. The same goes for the "scientific method". Most scientists just do whatever they feel make sense and that let's them use their toys. Can you point to breakthroughs by academic 20th/21th century philosophers?
2joaolkf7ytaps out for now

Luke,

I think you are mistaken about the relative efficiency / inefficiency of scientific research. I believe that research is comparably efficient to much of industry, and that many of the things that look like inefficiencies are actually trading off small local gains for large global gains. I've come to this conclusion as the result of years of doing scientific research, where almost every promising idea I've come up with (including some that I thought quite clever) had already been explored by someone else. In fact, the typical case for when I was able to make progress was when solving the problem required a combination of tools, each of which individually was relatively rare in the field.

For instance, my paper on stochastic verification required: (i) familiarity of sum-of-squares programming; (ii) the application of supermartingale techniques from statistics; and (iii) the ability to produce relatively non-trivial convex relaxations of a difficult optimization problem. In robotics, most people are familiar with convex optimization, and at least some are familiar with sum-of-squares programming and supermartingales. In fact, at least one other person had already published a fairl... (read more)

I agree that Luke here overstates the significance of my result, but I do think you miss the point a bit or are uncharitable. Regardless of whether making predictions about your own behavior is fundamentally difficult, we don't yet understand any formal framework that can capture reasoning of the form “my decisions are good because my beliefs correspond to reality.” Assuming there is a natural formal framework capturing human reasoning (I think the record so far suggests optimism) then there is something interesting that we don’t yet understand. It seems like you are applying the argument: “We know that humans can do X, so why do you think that X is an important problem?” The comment about undecidability issues not applying in practice also seems a bit unfair; for programs that do proof search we know that we cannot prove claims of the desired type based on simple Godelian arguments, and almost all interesting frameworks for reasoning are harder to prove things about than a simple proof search. (Of course the game is that we don’t want to prove things about the algorithms in question, we are happy to form justified beliefs about them in whatever way we can, including inductive infe... (read more)

0[anonymous]7yAnd the question is: who cares? The mechanism by which human beings predict their future behavior is not logical inference. Similar ad-hoc Bayesian extrapolation techniques can be used in any general AI without worry about Löbian obstacles. So why is it such a pressing issue? I don't wish to take away from the magnitude of your accomplishment. It is an important achievement. But in the long run I don't think it's going to be a very useful result in the construction of superhuman AGIs, specifically. And it's reasonable to ask why MIRI is assigning strategic importance to these issues.
0jsteinhardt7yI agree with this. Luke seems to be making a much stronger claim than the above, though. I agree that that would be a bad argument. That was not the argument I intended to make, though I can see why it has been interpreted that way and I should have put more effort into explaining myself. I am rather saying that human reasoning looks so far away from even getting close to running into issues with Godel / Lob, that it seems like a rather abstruse starting point for investigation. The rest of your comment seems most easily discussed in person, so I'll do that and hopefully we'll remember to update the thread with our conclusion.
0lukeprog7yWhat makes you say that? Did you see what I said about this here [http://lesswrong.com/lw/iyo/the_inefficiency_of_theoretical_discovery/a091]?
8[anonymous]7yAs someone with a reasonable acquaintance with program analysis, synthesis, and semantics... YIKES. Rice's Theorem is, so to speak, the biggest, nastiest rock we semantics folks have to crawl around on a regular basis. The way we generally do it is by constructing algorithms, semantic frameworks, and even entire programming languages in which undecidability cannot happen in the first place, thus restricting ourselves to analyzing something less than the set of all possible programs. Now, admittedly, in practice we've made a lot of progress this way, because in practice there are really four kinds of programs: ones that provably terminate by design, ones that provably don't terminate by design, provably undecidable programs (usually programs that rely on the halting behavior of some other program or logic for their own halting behavior), and just plain messed-up what-the-fuck programs. The last kind are mostly created only by mistake. The third kind come up in program analysis and semantics, but we can usually construct a proof that we're dealing with a formally undecidable problem there and set reasonable hard bounds on length of iteration or depth of recursion (or even find decidable subclasses of these problems that are decently useful to real people). The second kind is handled by writing coterminating programs over codata. The first kind is handled by writing terminating programs over data. Undecidability issues do come up in practice, and the current research frontier (MIRI's Lobian paper, Goedel Machines, AIXI) certainly indicates that these issues definitely come up in the study of Universal Artificial Intelligence. However, for most problems below the level of program analysis or universal induction, undecidability issues can be handled or contained productively by research effort.
3Kaj_Sotala7yUsing heuristic methods rather than formal proofs. These already often fail (humans even fail to take the effect of being in a different emotional state [https://www.rci.rutgers.edu/~gbc/psycDM/Loewenstein2005.pdf] properly into account), and that's without having to deal with the effects of radical self-modifications which might impact your whole reasoning and motivational system.
1jsteinhardt7yI don't think this is sufficient to dismiss my example. Whether or not we prove things, we certainly have some way of reasoning at least somewhat reliably about how we and others will behave. It seems important to ask why we expect AI to be fundamentally different; I don't think that drawing a distinction between heuristics and logical proofs is sufficient to do so, since many of the logical obstacles carry over to the heuristic case, and to the extent they don't this seems important and worth grappling with. Also note that, even if you did think it was sufficient, I gave you another example that was based purely in the realm of formal logic.

Jacob, have you seen Luke's interview with me, where I've tried to reply to some arguments of the sort you've given in this thread and elsewhere?

I don't think [the fact that humans' predictions about themselves and each other often fail] is sufficient to dismiss my example. Whether or not we prove things, we certainly have some way of reasoning at least somewhat reliably about how we and others will behave. It seems important to ask why we expect AI to be fundamentally different; I don't think that drawing a distinction between heuristics and logical proofs is sufficient to do so, since many of the logical obstacles carry over to the heuristic case, and to the extent they don't this seems important and worth grappling with.

Perhaps here is a way to get a handle on where we disagree: Suppose we make a whole-brain emulation of Jacob Steinhardt, and you start modifying yourself in an attempt to achieve superintelligence while preserving your values, so that you can save the world. You try to go through billions of (mostly small) changes. In this process, you use careful but imperfect human (well, eventually transhuman) reasoning to figure out which changes are sufficiently safe to ... (read more)

5jsteinhardt7yGlad to see you're doing well, Benja :) Here's a concrete way you could try to get stable self-modification: Suppose for concreteness that we have a C program, call it X, and that within the C program there is an array called "world_state" of length M and a floating point number called "utility". A simple instantiation of X would look something like: while(true){ action = chooseAction(worldState); world_state = propgateWorldState(worldState, action); utility = calculateUtility(worldState); } We would like to consider modifications to X where we replace chooseAction with some new method chooseAction2 to get a program X2. Suppose we want to ensure some condition such as: from the current world state, if we use X2 instead of X, then after some finite period of time the sequence of utilities we get from using chooseAction2 will always be larger than the corresponding sequence if we have used chooseAction. Abusing notation a bit, this is the same as verifying the statement: "there exists N such that for all n > N, utility2(n) > utility(n)" [although note that utility2 and utility have fairly complicated descriptions if you actually try to write them out]. Now I agree that reasoning about this for arbitrary choices of chooseAction and chooseAction2 will be quite difficult (probably undecidable although I haven't proved that). But the key point is that I get to choose chooseAction2, and there are many decision procedures that can prove such a statement in special cases. For instance, I could partition the space of world states into finitely many pieces, write down a transition function that over-approximates the possible transitions (for instance, by having a transition from Piece1 to Piece2 if any element of Piece1 can transition to any element of Piece2). Then I only need to reason about finite automata and those are trivially decidable. You could argue that this proof system is fairly weak, but again, the AI gets to tailor its choices of chooseAction2 to be easy

I thought the example was pretty terrible.

Glad to see you're doing well, Benja :)

Sorry for being curmudgeonly there -- I did afterwards wish that I had tempered that. The thing is that when you write something like

I also agree that the idea of "logical uncertainty" is very interesting. I spend much of my time as a grad student working on problems that could be construed as versions of logical uncertainty.

that sounds to me like you're painting MIRI as working on these topics just because it's fun, and supporting its work by arguments that are obviously naive to someone who knows the field, and that you're supporting this by arguments that miss the point of what MIRI is trying to say. That's why I found the example of program analysis so annoying -- people who think that the halting problem means that program analysis is impossible really are misinformed (actually Rice's theorem, really, but someone with this misconception wouldn't be aware of that), both about the state of the field and about why these theorems say what they say. E.g., yes, of course your condition is undecidable as long as there is any choice f(s) of chooseAction2(s) that satisfies it; proof: le... (read more)

8jsteinhardt7yDon't worry, I wasn't offended :) I don't think that MIRI is working on these topics just because they are fun, and I apologize for implying that. I should note here that I respect the work that you and Paul have done, and as I said at the beginning I was somewhat hesitant to start this discussion at all, because I was worried that it would have a negative impact on either you / Paul's reputation (regardless of whether my criticisms ended up being justified) or on our relationship. But in the end I decided that it was better to raise my objections in fairly raw form and deal with any damage later. What I would say is that the arguments start to look really fishy when one thinks about concrete instantiations of the problem. I'm not sure I understand what you're saying here, but I'm not convinced that this is the sort of reasoning I'd use. It seems like Paul's argument is similar to yours, though, and I'm going to talk to him in person in a few days, so perhaps the most efficient thing will be for me to talk to him and then report back. I don't think that "whole brain emulations can safely self-modify" is a good description of our disagreements. I think that this comment (the one you just made) does a better job of it. But I should also add that my real objection is something more like: "The argument in favor of studying Lob's theorem is very abstract and it is fairly unintuitive that human reasoning should run into that obstacle. Standard epistemic hygiene calls for trying to produce concrete examples to motivate this line of work. I have not seen this done by MIRI, and all of the examples I can think of, both from having done AI and verification work myself, and from looking at what my colleagues do in program analysis, points in the squarely opposite direction." When I say "failure to understand the surrounding literature", I am referring more to a common MIRI failure mode of failing to sanity-check their ideas / theories with concrete examples / evidence. I
6Benya7yGood to hear, and thanks for the reassurance :-) And yeah, I do too well know the problem of having too little time to write something polished, and I do certainly prefer having the discussion in fairly raw form to not having it at all. I'm not really sure what you mean by a "concrete instantiation". I can think of concrete toy models, of AIs using logical reasoning which know an exact description of their environment as a logical formula, which can't reason in the way I believe is what we want to achieve, because of the Löbian obstacle. I can't write down a self-rewriting AGI living in the real world that runs into the Löbian obstacle, but that's because I can't write down any AGI that lives in the real world. My reason for thinking that the Löbian obstacle may be relevant is that, as mentioned in the interview [http://intelligence.org/2013/08/04/benja-interview/] , I think that a real-world seed FAI will probably use (something very much like) formal proofs to achieve the high level of confidence it needs in most of its self-modifications. I feel that formally specified toy models + this informal picture of a real-world FAI are as close to thinking about concrete instantiations as I can get at this point. I may be wrong about this, but it seems to me that when you think about concrete instantiations, you look towards solutions that reason about the precise behavior of the program they're trying to verify -- reasoning like "this variable gets decremented in each iteration of this loop, and when it reaches zero we exit the loop, so we won't loop forever". But heuristically, while it seems possible to reason about the program you're creating in this way, our task is to ensure that we're creating a program which creates a program which creates a program which goes out to learn about the world and look for the most efficient way to use transistors it finds in the external environment to achieve its goals, and we want to verify that those transistors won't decide to
4jsteinhardt7yI don't have time to reply to all of this right now, but since you explicitly requested a reply to: The answer is yes, I think this is essentially right although I would probably want to add some hedges to my version of the statement (and of course the usual hedge that our intuitions probably conflict at multiple points but that this is probably the major one and I'm happy to focus in on it).
0Benya7yThanks!
2V_V7yThought experiment: We have an AI which controls a robotic arm inside a box. The box also contains various cooking tools and ingredients to make cake, and a big red button that kills a puppy. We prefer to cake to no cake, and we prefer our cake to be delicious and moist cake, but above all we prefer the companion cu-... puppy to stay alive. Therefore, we implement in the AI a "puppy safety module" (PSM) which vetoes any course of action proposed by the planning module if it determines that there is any non-negligible probability of the red button being depressed. When the PSM can't make an accurate prediction it will always err on the safe side: things like throwing tools on the box walls where they can bounce unpredictably and hit the red button don't get approved. Athena, the Greek goddess of wisdom, justice and math, came down from Mt. Olympus and debugged the code for PSM. She was also so kind to check all the compilers, OS and hardware for bugs and glitches. Do you agree that with this AI the puppy is most likely safe? Now, let's assume that the cakes made by this AI aren't very good. We don't really know how to program it to make better cakes, we try our best reinforcement learning algorithms but it doesn't improve much. Then somebody suggest we try to allow the AI to modify it own code: In addition to sending motor commands to the arm, the planning module can also edit a copy of the AI code and eventually replace the whole program with this successor. Of course, these actions must still be approved by the PSM, but they are otherwise unrestricted: in principle they can modify the PSM, or add a bypass, or get rid of it entirely. Do you think that this AI is unsafe for the puppy? Do you think it is safe but unable to improve itself in effective ways?
2Benya7ySince the PSM was designed without self-modification in mind, "safe but unable to improve itself in effective ways". (Not sure how this thought experiment helps the discussion along.)
2V_V7yCan you please motivate? Suppose that in the recesses of the code there is an instantiation of the bubble sort [http://en.wikipedia.org/wiki/Bubble_sort#In_practice] algorithm. The planner proposes to change it with, say, merge sort [http://en.wikipedia.org/wiki/Merge_sort]. Do you think that the PSM would generally disapprove such change? Do you think it would approve it, but it would still be unable to approve modifications that would be needed for significant improvement?
0V_V7yMIRI claim to have had an important insight on AI design (this so called "Löbian obstacle") that experts in relevant fields (AI, model checking, automated theorem proving, etc.) didn't have. MIRI attempted to communicate their insight, but so far the experts have mostly ignored MIRI claims or denied that they are likely to be important and relevant. What is the most likely explanation for that? It seems that we can narrow it to two hypotheses: A) MIRI's insight is really relevant and important to AI design, but communication with the experts failed because of some problem on MIRI's side, or on the experts' side (e.g. stubbornness, stupidity) or both (e.g. excessively different backgrounds). B) MIRI is mistaken about the value of their insight (possible psychological causes may include confirmation bias, Dunning–Kruger effect, groupthink, overconfident personalities, etc.). I would say that, barring evidence to the contrary, hypothesis B is the most likely explanation.
4asr7yI don't believe these options are exhaustive. "Relevant and valuable" are subjective and time-varying. The published work might be interesting and useful down the line, but not help the problems that AI researchers are working on right now. It usually takes a few years for the research community to assimilate strange new ideas -- sometimes much more than a few years. This isn't due to a failure on anybody's part -- it's due to the fact that scientists pick problems where they have a reasonable prospect of success within a few years. I would give MIRI at least a decade or two before evaluating whether their work had any mainstream traction.
0V_V7yMIRI stated goals are similar to those of mainstream AI research, and MIRI approach in particular includes as subgoals the goals of research fields such as model checking and automated theorem proving. Do you claim that MIRI is one or two decades ahead of mainstream researchers? If the answer is no, then how does MIRI (or MIRI donors) evaluate now whether these lines of work are valuable towards advancing their stated goals?
5asr7yResearch has both ultimate goals ("machines that think") and short-term goals ("machines that can parse spoken English"). My impression is that the MIRI agenda is relevant to the ultimate goal of AI research, but has only limited overlap with the things people are really working on in the short term. I haven't seen MIRI work that looked directly relevant to existing work on theorem proving or model checking. (I don't know much about automated theorem proving, but do know a bit about model checking.) It's not a matter of "ahead". Any research area is typically a bunch of separate tracks that proceed separately and eventually merge together or have interconnections. It might be several decades before the MIRI/self modifying AI track merges with the main line of AI or CS research. That's not necessarily a sign anything is wrong. It took decades of improvement before formal verification or theorem proving become part of the computer science toolkit. I would consider MIRI a success if it follows a similar trajectory. I can't imagine any really credible assurance that "this basic research is definitely useful," for any basic research. The ultimate goal "safe self modifying AI" is too remote to have any idea if we're on the right track. But if MIRI, motivated by that goal, does neat stuff, I think it's a safe bet that (A) the people doing the work are clueful, and (B) their work was at least potentially useful in dealing with AI risks. And potentially useful is the best assurance anybody can ever give. I'm a computer systems guy, not a theorist or AI researcher, but my opinion of MIRI has gradually shifted from "slightly crankish" to "there are some interesting questions here and MIRI might be doing useful work on them that nobody else is currently doing." My impression is a number of mainstream computer scientists have similar views. Eliezer recently gave a talk at MIT. If the audience threw food at the stage, I would consider that evidence for MIRI being crankish.
0Benya7yIt's definitely not a goal of mainstream AI, and not even a goal of most AGI researchers, to create self-modifying AI that provably preserves its goals. MIRI's work on this topic doesn't seem relevant to what mainstream AI researchers want to achieve. Zooming out from MIRI's technical work to MIRI's general mission, it's certainly true that MIRI's failure to convince the AI world of the importance of preventing unFriendly AI is Bayesian evidence against MIRI's perspective being correct. Personally, I don't find this evidence strong enough to make me think that preventing unFriendly AI isn't worth working on. Also, two more points why MIRI isn't that likely to produce research AI researchers will see as a direct boon to their field: One, stuff that's close to something people are already trying to do is more often already worked on; the stuff that people aren't working on seem more important for MIRI to work on. And two, AGI researchers in particular are particularly interested in results that get us closer to AGI, and MIRI is trying to work on topics that can be published about without bringing the world closer to AGI.
0lukeprog7yI wouldn't say MIRI has tried very hard yet to communicate about the Lobian obstacle to people in the relevant fields. E.g. we haven't published about the Lobian obstacle in a journal or conference proceedings. Part of the reason for that is that we don't expect experts in these fields to be very interested in it. The Lobian obstacle is aiming at better understanding of strongly self-modifying systems, which won't exist for at least 15 years, and probably longer than that.
0asr7yI agree the AI community won't be very interested. But it might be worth sending it to some theoretical computer science venue -- STOC, say -- instead. If nothing else, it would give useful information about how receptive academics are to the topic.
0[anonymous]7yI look forward to a clear, detailed explanation of MIRI's thinking on this subject. In particular this counter-intuitive result: deserves some technical elaboration.
0Benya7yMark, have you read Eliezer's article about the Löbian obstacle [http://intelligence.org/files/TilingAgents.pdf], and what was your reaction to it? I'm in the early stages of writing up my own work on the Löbian obstacle for publication, which will need to include its own (more condensed, rather than expanded) exposition of the Löbian obstacle; but I liked Eliezer's article, so it would be helpful to know why you didn't think it argued the point well enough.
2[anonymous]7yI have, although formal logic is not my field so please excuse me if I have misunderstood it. Eliezer does not demonstrate that overcoming the Löbian obstacle is necessary in the construction of tiling agents, he rather assumes it. No form of program verification is actually required, if you do not use the structure of a logical agent. Consider, for example, the GOLUM architecture[1] which is a form of tiling agent that proceeds by direct experimentation (simulation). It does not require an ability to prove logical facts about the soundness and behavior of its offspring, just an ability to run them in simulation. Of course logical program analysis helps in focusing in on the situations which give rise to differing behavior between the two programs, but there are no Gödelian difficulties there (even if there were you could fall back on probabilistic sampling of environments, searching for setups which trigger different results). The MIRI argument, as I understand it is: “a program which tried to predict the result of modifying itself runs into a Löbian obstacle; we need to overcome the Löbian obstacle to create self-modifying programs with steadfast goal systems.” (I hope I am not constructing a strawman in simplifying it as such.) The problem comes from the implicit assumption that the self-modifying agent will use methods of formal logic to reason about the future actions of its modified self. This need not be the case! There are other methods which work well in practice, converge on stable solutions under the right circumstances, and have been well explored in theory and in practice. I'm reminded of the apocryphal story of two space-age engineers that meet after the fall of the Soviet Union. The American, who oversaw a $1.5 million programme to develop the “Astronaut Pen” which would write in hard vacuum and microgravity environments, was curious to know how his Russian counterpart solved the same problem. “Simple,” he replied, “we used a pencil.” You could ex
0Will_Sawin7yDoesn't the non-apocryphal version of that story have some relevance? http://en.wikipedia.org/wiki/Space_Pen [http://en.wikipedia.org/wiki/Space_Pen] http://www.snopes.com/business/genius/spacepen.asp [http://www.snopes.com/business/genius/spacepen.asp] Using a space pencil could cause your spaceship to light on fire. Sometimes it pays to be careful.
4Kaj_Sotala7yI don't think the argument is that AI would be fundamentally different, but rather that "we can reason at least somewhat reliably when making predictions of agents who don't drastically self-modify, and of whom we have thousands of years of data to help build our predictions on" isn't good enough to deal with the case of a drastically self-modifying agent that could exhibit entirely novel behavior and cognitive dynamics even if it wasn't capable of self-modifying. "Somewhat reliably" is fine only as long as a single failure isn't enough to throw all the rest of your predictions to the trash bin. I don't know enough about your second example to feel confident commenting on it.
5jsteinhardt7yHumans seem pretty good at making correct predictions even if they have made incorrect predictions in the past. More generally, any agent for whom a single wrong prediction throws everything into disarray will probably not continue to function for very long. Fair enough. This is an admirable habit that is all too rare, so have an upvote :).
3Kaj_Sotala7yThat's basically my point. A human has to predict the answer to questions of the type "what would I do in situation X", and their overall behavior is the sum of their actions over all situations, so they can still get the overall result roughly correct as long as they are correct on average. An AI that's capable of self-modification also has to predict the answer to questions of the type "how would my behavior be affected if I modified my decision-making algorithm in this way", where the answer doesn't just influence the behavior in one situation but all the ones that follow. The effects of individual decisions become global rather than local. It needs to be able to make much more reliable predictions if it wants to have a chance of even remaining basically operational over the long term. Thanks. :)
0jmmcd7yAnd more important, its creators want to be sure that it will be very reliable before they switch it on.
2JGWeissman7yThese examples involve predictions generated by processes which are not purely logical systems, and which we don't understand enough to code into an AI. So it seems like Paul's idea could be progress towards having a process that makes such predictions about itself that we can understand at that level.
0jsteinhardt7yCan you clarify in what sense you think a computer program is not a purely logical system? Or am I misunderstanding you? ETA: In particular, I'm referring to the point where I said:
0JGWeissman7yI didn't meant that the computer program is not a purely logical system, but that the people proving facts about its behavior aren't purely logical systems.
0jsteinhardt7yProgram analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so.
2JGWeissman7yOk, then I did not understand exactly what you meant, but I still don't think this is a counterexample to the problem Paul's idea tries to get around. The problem is that logical systems have problems reasoning about their own behavior, not a claim that there is no other logical system that can reason about them. In particular, we are interested in if an optimization process can model itself as an optimization process, accurately predicting that its future decisions are likely to achieve outcomes that score well on its optimization criteria and the score will be better if it has more resources, and will become much worse if its representation of its criteria gets corrupted, (all using abstract reasoning in much less time than fully simulating its future decisions). Can program analysis do that?
2jsteinhardt7yETA: Also, I should note that this is a good question and I'm glad you asked it! If your question is whether a program analyzer can, when given itself as input, produce sensible results, the answer is yes. Program analyzers are meant to run on arbitrary code, so in particular they can be run on themselves as a special instance. (Actually, nothing particularly special happens in this case as far as I can tell.) Now, a key point is the formalism we are working in: a program analyzer takes in a program P and some specification S, and checks whether P obeys specification S (for instance, checking that it never accesses memory before it allocates it). Importantly, P is allowed to do any 3 of the following: * report that P satisfies S (with a proof) * report that P does not satisfy S (with a counterexample) * give up (i.e. report "unsure") So a key question is how often it gives up! But while there are many instances where modern program analysis tools do give up, there are also many where they don't. Furthermore if you are an AI and you propose a modification to your code, and your program analysis subroutine reports "unsure", you are free (and would be wise) to try a different modification instead. Researchers in the field of program analysis are extremely cognizant of the halting problem (which is closely related to Lob's theorem) and typically deal with it by over-approximating, e.g. by identifying conditions that would be sufficient for halting, although not necessarily necessary. As a result one obtains solubility at the cost of precision (although note that the approximation is always sound: if we report that P satisfies S, then P does indeed always satisfy S).
1JGWeissman7yThis is a good approach for dealing with the halting problem, but I think that Lob's theorem is not so closely related that getting around the halting problem means you get around Lob's theorem. The theoretical AI that would run into Lob's theorem would need more general proof producing capability than these relatively simple program analyzers. It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing a structure of a proof which can be filled in to a complete proof by supplying basic facts that the program can check. For example, I have a library that produces .Net assemblies, with byte code that targets a stack based virtual machine, and I want to verify that instructions to read elements off the stack get elements of the type is expecting. So I wrote my library to keep track of types that could be written to the stack at any execution point (represented by a stack of types). It is straightforward to compute the possible stack states after each instruction, given that instruction and the previous possible stack trace, and determine if the instruction is legal given the previous state (well, branching makes it more complicated, but not too much). So, in this case, I came up with the structure of tracking possible states after each instruction, and then it was straightforward to write my program to fill in that structure, but I did not, and don't know how to, write my program to come up with the proof structure. It is therefor easy to be confident that proof will have nice properties, because the space of possible proofs with this structure is much smaller than the space of all possible proofs. The theoretical AI that would run into Lob's theorem would be able to come up with proof structures, as an AI that could only use proof structures prepackaged by human programmers would have huge gaps in its capabilities. This may introduce difficulties not seen in simple program ana
0V_V7yI'm not sure I've understood what you have in mind here, but in the general case complete type checking in .NET (that is, proving that an assembly not only is syntactically well-formed but also never throws type-related exceptions at runtime) is undecidable because of Rice's theorem. In the general case, complete type checking is as difficult as proving arbitrary claims in first-order logic.
0JGWeissman7yI am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement) with a particular proof strategy that covers a huge space of useful, nicely structured programs. See also jsteinhardt's comment I was responding to, which discussed getting around the halting problem by allowing the checker to say "I don't know".
0V_V7yI'm not an expert on .NET, but is there anything that can throw a type exception other than an explicit cast or an explicit throw (or the standard library, I suppose)?
0JGWeissman7yThere are sequences of .Net instructions that result in the runtime throwing type exceptions, because it tries to read a value of a certain type of the stack, and it gets an incompatible value. This is the situation that my verifier guards against. The standard .Net runtime also includes a verifier that checks the same thing, and it will not run code that fails this validation unless it is explicitly trusted. So a verifiable .Net assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so. The compilers for the standard languages such as C# will generally only produce verifiable assemblies unless you explicitly mark parts of the source code as "unsafe", and unsafe, or unverifiable assemblies need special permissions to run at all. (There are other properties that both my verifier and the standard verifier check for. The reason I wrote my own is that it produces much more informative descriptions of problems it finds, and it is integrated into my assembly emitting libraries, so it detects problems as the assembly to be emitted is defined, and when run in the debugger, will easily show the compiler code and execution state that caused the problem.)
0V_V7yThanks for the clarification. IIUC, unverifiable code does not, or at least is not guaranteed to, politely throw an exception should a type error occur. It may crash the runtime or fail silently leaving the application in an incorrect state. Ok. I thought that you were considering assemblies that passed the standard .NET verification and you were trying to check for some stronger property (such as absence of runtime exceptions caused by downcasts). That would have been equivalent to arbitrary first-order logic inference. Since you are instead checking for decidable properties, your system is indeed not equivalent to arbitrary first-order logic inference. But as jsteinhardt says, it is actually possible to write verifiers that attempt to check for undecidable properties, provided that they have the option to give up.
0jsteinhardt7yMy mathematical logic is a bit rusty, but my impression is that the following are true: 1. Godel's theorem presents a strictly stronger obstacle than Lob's theorem. A reflectively consistent theory may still be incomplete, but any complete theory is necessarily reflectively consistent. 2. The undecidability of the halting problem is basically Godel's theorem stated in computational terms. If we could identify a subset L of Turing machines for whom the halting problem can be decided, as long as it was closed under operations such as inserting a (non-self-referential) sub-routine, then we would be able to verify any (non-self-referential) property of the program that was also expressible in L. This is a sketch of a claim rather than an actual claim that I've proved, though. Finally, I think it's worth pointing out an actual example of a program analysis tool since I think they are more powerful than you have in mind. The following slides [http://www.easychair.org/FLoC-06/aiken_sspv_floc06.pdf] are a good example of such a tool. At a high level, it gets around the problems you are worried about by constructing an over-approximation of the halting problem that is expressible in propositional logic (and thus decidable, in fact it is even in NP). More generally we can construct a sequence of approximations, each expressible in propositional logic, whose conjunction is no longer an approximation but in fact exactly the original statement.
0JGWeissman7yWhy do you say that? My understanding is that Godel's theorem says that a (sufficiently powerful) logical system has true statements it can't prove, but these statements are excessively complicated and probably not important. Is there some way you envision an AGI being limited in its capacity to achieve its goals by Godel's theorem, as we envision Lob's theorem blocking an AGI from trusting its future self to make effective decisions? (Besides where the goals are tailored to be blocked by the theorem: "Prove all true statements in my formal system") As near as I can tell, this is more powerful than other static analysis tools that I have seen (though maybe not, the Google Web Toolkit optimizer is pretty impressive), but it is well within what I expect to be possible, and doesn't get around Lob's theorem. (I can't be too confident in this assessment of its power because I don't see a clear claim of what sort of statements it can prove or how it works except that it seems to detect when inputs to a statement may have invalid values and that it uses brute force techniques to analyze functions and then associates a summary of the analysis with that function (constraints on valid inputs and guarantees on outputs?) so that its analysis of call sites can use the summary.) (This sort of program analyzer is interesting in its own right, and I would like to see a more complete explanation of it, but I don't think it's existence says anything about the problems posed by Lob's theorem.)
0jsteinhardt7yDo you agree or disagree that complete implies reflectively consistent? If you agree, then do you agree or disagree that this means avoidance of Godelian obstacles implies avoidance of Lobian obstacles? If you agree with both of those statements, I'm confused as to why: is a controversial statement.
0JGWeissman7yAh, so by "Godel's theorem presents a strictly stronger obstacle than Lob's theorem" you mean if you overcome Godelian obstacles you also overcome Lobian obstacles? I think I agree, but I am not sure that it is relevant, because the program analyzer examples don't overcome Godelian obstacles, they just cope with the Godelian obstacles, which does not similarly imply coping with or overcoming Lobian obstacles.
-2[anonymous]7yI'm guessing that he is using the standard AI terminology of a "logical program" as one which reasons by means of logical inference over a knowledgebase. This is not how human minds work, nor do many AI researchers view it as a viable path forward. The types of programs which have seen success in the AI revolution that is currently going on, often are not amenable to program analysis. They very often fall in the "undecidable" category, at least with respect to their anticipated behavior.
2asr7yThis depends which properties you care about. I suspect there isn't a small model of the AI program that duplicates its output behavior, but you can do a lot with off-the-shelf analysis. It's relatively easy to prove things like "this program will only ever use the following subset of available system calls," or "this program is well typed", or "the program cannot modify itself", or "can only modify these fixed specific parts of its state." I can also imagine a useful AI program where you can prove a bound on the run-time, of the form "this program will always terminate in at most X steps", for some actual fixed constant X. (Obviously you cannot do this for programs in general, but if the program only loops over its inputs for a fixed number of iterations or somesuch, you can do it.) These sorts of properties are far from a general proof "this program is Safe", but they are non-trivial and useful properties to verify.
0[anonymous]7yThe FAI-related things you would want to prove are of the form: “when given a world state characterized by X, input percepts characterized by Y, the program always generates outputs characterized by Z.” For many existing and popular AI architectures, we haven't the foggiest idea how to do this. (It's no surprise that Eliezer Yudkowsky favors Pearl's causal probabilistic graph models, where such analysis is at least known to be possible.)
0jsteinhardt7yTo the extent that X, Y, and Z can be written formally within the programming language, program analysis at least in principle is fully capable of proving such a statement. I apologize if this comes off as rude, but your statement that "we haven't the foggiest idea how to do this" is flat-out false. While there are certainly unique challenges to reasoning about the sort of code that gets written in machine learning, it seems to me that the main reason we don't have well-developed analysis tools is that most code doesn't look like machine learning code, and so there has been little pressure to develop such tools.
0jsteinhardt7yProgram analysis consists of writing computer programs that reason about other computer programs. Is the objection that these programs were written by a human? That seems like a strange objection to me if so. Edit: oops, replied to wrong comment.
0jsteinhardt7yCan you give an example? The main reason they are not amenable to program analysis is because the sorts of guarantees they are supposed to satisfy are probabilistic / statistical in nature, and we don't yet have good techniques for verifying such properties. I am pretty sure that the issue is not undecidability. I should also clarify that whether or not JGWeissman thought that I meant "logical program" instead of "computer program", my comment was intended to mean "computer program" and the field of program analysis studies "computer programs", not "logical programs".
0[anonymous]7yOk, throw it back at you: how do you prove the behavior of a deep belief network, or any other type of neural network currently in vogue, short of actually running it? If you do have some way of doing it, can I be coauthor? I didn't mean to imply that it was proved impossible, just that no one has the faintest idea how to do it - and not for lack of trying.
0jsteinhardt7yWhat does "prove the behavior" mean? If you mean "prove that it will classify this next image correctly" then of course the answer is no, although that is because it's not even necessarily true. If you mean "prove that the program's behavior always falls within some well-defined regime", then yes we can absolutely prove that sort of statement. Things that are at the boundary of what we can prove (i.e. I don't know of any existing formal tools that do it, but I'm pretty sure we could make one) would be something like "the weights of the network don't change too much if we change the input a small amount" or "the prediction error on the training set will decrease monotonically" or "the generalization error is bounded by assuming that the test data ends up being drawn from the same distribution as the training data". Presumably for friendliness you want something like "P and P' satisfy some contract relative to each other" (such as sufficiently similar behavior) which is the sort of thing we are already relatively good at proving. Here I'm imagining that P is the "old" version of a program and P' is a "modified" version of the program that we are considering using.
0lukeprog7yI understand "trade small local gains for large global gains" as a prescriptive principle, but does it work as a descriptive hypothesis? Why expect academics to be so much better than philanthropists at cause neutrality? When I speak to academics who aren't also EAs, they are basically never cause neutral, and they even joke about how ridiculously non-cause-neutral everybody in academia is, and how accidental everyone's choice of focus is, including their own.
0jsteinhardt7yI'm not talking about cause neutrality. My point is that even once the general problem has been decided, there are many possible approaches, and academics often do things that seem inefficient but are actually exploring the space of possible approaches (possibly by trying to better understand the objects they are studying).
2lukeprog7yWhat level of "general problem" do you have in mind? To a large degree I'm thinking about things like "Gosh, it took (unnecessary) centuries or decades for researchers to launch subfields to study normative uncertainty and intelligence explosion", and that could be a "lack of cause neutrality" problem. And maybe you're thinking instead on a smaller scale, and want to say something like " Given that people decide to work on X, they're relatively efficient in working on X, and exploring the space within X, even if they're completely missing normative uncertainty and intelligence explosion."
0lukeprog7yThis is an interesting hypothesis, and one I wasn't thinking of. But hard to measure! Out of curiosity, what gives you that impression? I tend to cite it because it is (along with the Lobian cooperation stuff) among the most important results to come out of MIRI's first couple workshops, not because I can already tell whether it's an important breakthrough in mathematical logic in general. As for the purpose and relevance of the Lobian obstacle work, it seems like there might still be a failure of communication there. Since you and Eliezer and I discussed this at length and there still seems to be an unbridged gap, I'm not sure which thing I can say to bridge the gap. Maybe this quote [http://lesswrong.com/lw/hmt/tiling_agents_for_selfmodifying_ai_opfai_2/94cg] from Paul? In the OP I actually gave program equilibrium as an example of new theoretical progress that opens up new lines of inquiry, e.g. the modal agents work (though of course there are other pieces contributing to modal agents, too). So yeah, I don't think the modal agents work is an example of inefficiency. The examples I gave in the OP for apparent inefficiency in decision theory research was philosophy's failure to formulate a reliabilist metatheory of instrumental rationality until 2013, even though reliabilist theories of epistemic rationality have been popular since the late 1960s, and also the apparently slow uptake of causal Bayes nets in the causal decision theory world.
2jsteinhardt7yIn this very post you placed it in a list next to normative uncertainty and the intelligence explosion. The implication seemed obvious to me but perhaps it was unintended. I seem to remember other comments / posts where similar sentiments were either expressed or implied, although a quick search doesn't turn them up, so perhaps I was wrong.
2lukeprog7yYeah, unintended, but I can see why one might infer that. Does my "philosophical edge" comment imply importance to you? I was [http://lesswrong.com/lw/hmt/tiling_agents_for_selfmodifying_ai_opfai_2/992o] merely trying to say that it's philosophical even though I'm thinking of it in terms of AI, and it's not obvious to me, like your first example, why one would read the comment as assigning particular importance to the result.
0jsteinhardt7yI think that the comment that I quoted is not by itself objectionable to me. If that's actually the only example I can come up with, then I think it would be unfair to criticize it, so I will update the parent to remove it.

Regardless of why the opportunity has presented itself, can we hope that the MIRI research team and associated researchers will use (or are using) the fact that "visible progress in decision theory is one way to “make a name” for oneself" and proceed to do so? Seems like pretty low-hanging status-fruit given the team's progress so far.

For MIRI, the hard part is writing up the results in a way that appeals to philosophers. That's a highly specialized skill, and not one we've focused on hiring for (at our current budget). We tried to pay Rachael Briggs $20k to do it, since she had two decision theory papers selected for the Philosopher's Annual, but it was too work-intensive even for her. I think it would drive Eliezer mad to write in that style. I suspect I could do it, but it would take a lot of my time. I might be able to persuade Preston Greene to do it some day. Or maybe Kenny Easwaran, who attended our September 2013 decision theory workshop.

If possible, I'd be curious to hear more details about why Briggs found it too work-intensive. Her giving up on it was definitely not an outcome I would have predicted.

2evgenit7ySeconded, I am also curious about why this is hard/how the style needed differs from how lukeprog and Eliezer write papers.
0satt7ySome [http://lesswrong.com/lw/hbd/new_report_intelligence_explosion_microeconomics/8vio] of [http://lesswrong.com/lw/hbd/new_report_intelligence_explosion_microeconomics/8vvw] the [http://lesswrong.com/lw/hbd/new_report_intelligence_explosion_microeconomics/8wlz] comments [http://lesswrong.com/lw/hbd/new_report_intelligence_explosion_microeconomics/8vqy] on [http://lesswrong.com/lw/hbd/new_report_intelligence_explosion_microeconomics/8vjg] Eliezer's "Intelligence Explosion Microeconomics" suggest that his style might be too digressive & chatty by typical journal standards.

Maybe the difficulties that you face are part of the answer to the question of why theoretical progress doesn't happen faster.

After a paper published in 2011: "[Original draft was available in 2003. Hurrah for academic publishing. One journal reviewed the manuscript for nearly two years before determining that it was too long. No wonder philosophy has not advanced farther in the past 2,500 years.] "

Just wanted to mention that Physics is not immune to this. Bell's theorem requires only a first-year college math skill, yet it took 30 odd years after EPR to formulate it. Not even Einstein himself was able do it. Event horizons and inescapability of singularity required virtually no new math beyond 1917, yet it took some 50 years for the physicists to understand the picture. There are clearly some mental blocks people have which take decades and new generations to overcome.

4[anonymous]7yThe idea of an event horizon goes back to John Michell in 1783. The derivation of a black hole and event horizon from the equations of General Relativity were done by Schwarzschild mere months after the publication of GR in 1915 (get your dates right!). (Cool tidbit: Schwarzschild was an artillery gunner during the Great War, and spent his time looking for solutions to the Einstein field equations when he wasn't calculating trajectories. He published the math behind GR black holes in a letter to Einstein from the front.)
2Alejandro17yWhat shminux means is that, even though the Schwarzschild metric was derived early as you say, its physical interpretation as a black hole was not understood till much later. According to Wikipedia [http://en.wikipedia.org/wiki/Black_hole#Golden_age], it was not until 1958 that the Schwarzschild radius was identified as an event horizon, a surface which causal influences could only cross in one direction. It was also in the 1950s that the maximal extension analytic extension of the Schwarzschild metric was constructed, and it was not till the 1960s that these results became widely known and progress in black hole physics really took off (along with Wheeler coining the term "black hole" sometime around 1964).
0IlyaShpitser7yHere's a test of hindsight bias: QM violates Bell inequalities, but obeys Tsirelson inequalities ( http://arxiv.org/pdf/1303.2849v1.pdf [http://arxiv.org/pdf/1303.2849v1.pdf]). What does that mean? (Not a rhetorical question, I really don't know!) In graphical model terms, Bell inequality violations mean there is no "hidden variable DAG model" underlying what we see. But maybe Tsirelson inequality points to some correct generalization of the "hidden variable DAG" concept to the quantum setting (???). To my knowledge, nobody knows what to make of this, although it doesn't take much math background to understand Tsirelson inequalities. -------------------------------------------------------------------------------- To be a little more precise, I can imagine an object that does not posit anything ontologically beyond the four variables related by the graph: A -> B <-> C <- D The distributions that live in this object will, in general, violate both Bell and Tsirelson inequalities. So this object is "not physical." I can also posit a hidden variable DAG (in other words I posit in addition to A,B,C,D another variable H): A -> B <- H -> C <- D This will obey Bell inequality. So this is "classically physical, but not physical." The question is, what should I posit beyond A,B,C,D to violate Bell, but obey Tsirelson? Whatever it is, it cannot be a hidden variable H. But maybe I can posit something more complicated, or "weird"? But obvious in hindsight?
0shminux7yI am not familiar with using DAG in QM, sorry. Just wanted to mention that you can trade the EPR-style non-locality for macroscopic many worlds. For all its failings, this approach pushes the strangeness of QM into a local event where the branches interact. In the EPR example, it is where you compare the measurement results from the two detectors. Thus it might be more productive to base any DAG model on an MWI picture, or at least on a setup where there are only a finite and small number of branches, not uncountably many of them, like in Schrodinger's cat or EPR, maybe something like this quantum bomb tester [http://en.wikipedia.org/wiki/Elitzur%E2%80%93Vaidman_bomb_tester].
7IlyaShpitser7yThe "non-DAG jargon" question is: "what are the ontological implications of Tsirelson inequalities?" My point is that this has the feel of one of those questions with an answer that will be very obvious (but only in hindsight).
[-][anonymous]7y 7

Charles Fort, Lo!: "If human thought is a growth, like all other growths, its logic is without foundation of its own, and is only the adjusting constructiveness of all other growing things. A tree can not find out, as it were, how to blossom, until comes blossom-time. A social growth cannot find out the use of steam engines, until comes steam-engine-time. For whatever is supposed to be meant by progress, there is no need in human minds for standards of their own: this is in the sense that no part of a growing plant needs guidance of its own devising, ... (read more)

Some examples of a different kind of inefficiency, from AntiFragile:

It struck me how lacking in imagination we are: we had been putting our suitcases on top of a cart with wheels, but nobody thought of putting tiny wheels directly under the suitcase. Can you imagine that it took close to six thousand years between the invention of the wheel (by, we assume, the Mesopotamians) and this brilliant implementation (by some luggage maker in a drab industrial suburb)? And billions of hours spent by travelers like myself schlepping luggage through corridors full

... (read more)
4Lumifer7yFor the perfectly valid reason that tiny wheels are useless unless you have a really even and smooth surface to roll them on. Imagine the usefulness of a wheeled suitcase on farm. Or even on cobblestones -- how long do you think these tiny wheels will survive?
2[anonymous]7yI actually found that out the hard way.
0lukeprog7yWe didn't have even and smooth surfaces to roll them on until 1970 [http://www.nytimes.com/2010/10/05/business/05road.html]?
5Lumifer7yTaleb complains about the gap of 6,000 years.

I think conformity effects play a huge role in this area. The large majority of modern philosophers all have similar educational and cultural backgrounds. They go to elite universities. They read the standard Western philosophical canon. They work very hard to publish a lot of papers in prestigious journals. They are friends with other academics and with other high achievers in the "standard" fields like finance, law, and medicine. Their parents were probably academics or from the upper middle class. They have spent most of their lives in a university setting.

[-][anonymous]7y 0

If I had to take an honest guess? Theoretical discovery will behave "inefficiently" when it requires a breadth-first (or at least, breadth-focused) search through the idea space before you can find things that "fit together". Only once you have a bunch of things which "fit together" can you look at the shape of the "hole in idea-space" they all border, dive to the bottom of that lake, and bring up an entirely new idea which links them or unifies them.

So:

1) Mostly agreed, as described above.

2) As described above. M... (read more)

It strikes me that there's a second set of reasons that read something like:

  1. There is significant motivation to get it wrong, or wrong answers are very obvious.

See: the free will problem.

Normative uncertainty does not seem particularly interesting. You can just use set multiplication over probable futures and evaluations of them, and wind up with a regular problem with certain evaluations. It even agrees with normal methods if you assume a single, certain evaluation.

Now, as a concept it's important - realizing that you might not know what you want is a tremendous source of uncertainty that is easy to overlook. I just don't think that any new concepts or mathematical tools are needed to tackle it.

Do you agree about the relative inefficiency of theoretical discovery?

In presence, yes. In degree, no. Even the efficient market hypothesis presumes (indeed, requires) some delay inefficiencies, and even restricting to fields where economic incentives are very, very high we see some pretty significant delay inefficiencies.

The modern containerization revolution is dependent on certain types of math and metallurgy being available, but the underlying tools were probably available before the first world war (and, indeed, would have been even more useful... (read more)