2018 Review Discussion

What is voting theory?

Voting theory, also called social choice theory, is the study of the design and evaluation of democratic voting methods (that's the activists' word; game theorists call them "voting mechanisms", engineers call them "electoral algorithms", and political scientists say "electoral formulas"). In other words, for a given list of candidates and voters, a voting method specifies a set of valid ways to fill out a ballot and, given a valid ballot from each voter, produces an outcome.

(An "electoral system" includes a voting method, but also other implementation details, such as how the candidates and voters are validated, how often elections happen and for what offices, etc. "Voting system" is an ambiguous term that can refer to a full electoral system, just to the voting method,...
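This "ballots in, outcome out" definition can be sketched in a few lines. The following is my own minimal illustration (plurality voting with an arbitrary alphabetical tie-break), not something from the post:

```python
from collections import Counter

def plurality(ballots):
    """A minimal voting method: a valid ballot names exactly one candidate;
    the outcome is the candidate named most often (ties broken alphabetically)."""
    tally = Counter(ballots)
    top = max(tally.values())
    return min(c for c, votes in tally.items() if votes == top)

# Each voter submits one valid ballot; the method maps the ballots to an outcome.
print(plurality(["A", "B", "A", "C", "B", "A"]))  # → A
```

Ranked or scored methods differ only in what counts as a valid ballot and in how the tally maps to an outcome.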

Basically, a utility monster is a person or group that derives orders of magnitude more utility from some activity, so much that it cancels out the rest of the population's preferences. An example: Confederate slave owners derived far more utility from owning slaves than non-slave-owners did from their preferences, so if there were an election over whether slavery should be illegal, the slave owners have a strategy available: become utility monsters with orders-of-magnitude stronger, more extreme preferences. Even at 1,000 slave owners to 1 million non-slave-owners, there is still a way for the slave owners to win with that strategy.

While this could be a solid example if argued better, in my view it is a somewhat badly argued description, and this example is one that it is extremely important to be correct about. There are several turns of phrase that I think are false under standard definitions of the words, despite themselves being standard turns of phrase; e.g., it is in my view not possible under <?natural law?> to own another being, so the "ownership" in the law of the enforcers of the time was misleading phrasing, and that "ownership" should not be agreed with today since we h... (read more)

(Cross-posted from Facebook.)

0: Tl;dr.

  • A problem with the obvious-seeming "wizard's code of honesty" aka "never say things that are false" is that it draws on high verbal intelligence and unusually permissive social embeddings. I.e., you can't always say "Fine" to "How are you?" This has always made me feel very uncomfortable about the privilege implicit in recommending that anyone else be more honest.
  • Genuinely consistent Glomarization (i.e., consistently saying "I cannot confirm or deny" whether or not there's anything to conceal) does not work in principle because there are too many counterfactual selves who might want to conceal something.
  • Glomarization also doesn't work in practice if the Nazis show up at your door asking if you have fugitive Jews in your attic.
  • If you would lie to Nazis about fugitive

I think that 

"Don't say things that you believe to be literally false in a context where people will (with reasonably high probability) persistently believe that you believe them to be true"

is actually in line with the "Bayesian honesty" component/formulation of the proposal. If one is known to lie indiscriminately, one's words carry no information content, and therefore don't increase other people's Bayesian probabilities of false statements. However, it seems this is not a behaviour that Eliezer finds morally satisfactory. (I agree with Rob Bensinger that this formulation is more practical in daily life.)
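The information-content point is a two-line application of Bayes' rule. This is my own sketch with hypothetical numbers: when a speaker's assertions are uncorrelated with the truth, the likelihood ratio is 1 and the listener's probability doesn't move.

```python
def posterior(prior, p_assert_given_true, p_assert_given_false):
    # Bayes' rule: P(H | assertion) from the prior and the two likelihoods.
    num = p_assert_given_true * prior
    return num / (num + p_assert_given_false * (1 - prior))

# Mostly honest speaker: the assertion strongly favors the hypothesis.
print(posterior(0.5, 0.9, 0.1))  # → 0.9
# Speaker who asserts regardless of truth: likelihood ratio 1, no update.
print(posterior(0.5, 0.9, 0.9))  # → 0.5
```

(Strictly, a *perfectly* consistent liar is still informative — you negate everything they say; it is assertions uncorrelated with truth that carry zero information.)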

[Epistemic status: Pretty good, but I make no claim this is original]

A neglected gem from Less Wrong: Why The Tails Come Apart, by commenter Thrasymachus. It explains why even when two variables are strongly correlated, the most extreme value of one will rarely be the most extreme value of the other. Take these graphs of grip strength vs. arm strength and reading score vs. writing score:

In a pinch, the second graph can also serve as a rough map of Afghanistan

Grip strength is strongly correlated with arm strength. But the person with the strongest arm doesn’t have the strongest grip. He’s up there, but a couple of people clearly beat him. Reading and writing scores are even less correlated, and some of the people with the best reading...
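The tails-come-apart effect is easy to reproduce by simulation. Here is a rough sketch of my own (illustrative model, not from the post): two traits share a common factor plus independent noise, giving a correlation of about 0.5 — and the top scorer on one trait is almost never the top scorer on the other.

```python
import random

random.seed(0)
n = 100_000
pairs = []
for _ in range(n):
    common = random.gauss(0, 1)          # shared factor (e.g. overall strength)
    x = common + random.gauss(0, 1)      # trait 1 (e.g. arm strength)
    y = common + random.gauss(0, 1)      # trait 2 (e.g. grip strength)
    pairs.append((x, y))

best_x = max(pairs, key=lambda p: p[0])
best_y = max(pairs, key=lambda p: p[1])
# Almost always False: the extremes of two correlated variables diverge.
print(best_x is best_y)
```

Despite the solid correlation, the sample maximum of `x` is decided largely by that individual's independent noise term, which says nothing about their `y`.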

Very nicely written. A good example of this might be the invention of genetic flaw correction, which could make morally controversial abortion a less desired option.

Wei Dai, one of the first people Satoshi Nakamoto contacted about Bitcoin, was a frequent Less Wrong contributor. So was Hal Finney, the first person besides Satoshi to make a Bitcoin transaction.

The first mention of Bitcoin on Less Wrong, a post called Making Money With Bitcoin, was in early 2011 - when it was worth 91 cents. Gwern predicted that it could someday be worth "upwards of $10,000 a bitcoin". He also quoted Moldbug, who advised that:

If Bitcoin becomes the new global monetary system, one bitcoin purchased today (for 90 cents, last time I checked) will make you a very wealthy individual...Even if the probability of Bitcoin succeeding is epsilon, a million to one, it's still worthwhile for anyone to buy at least a few bitcoins now...I

[copying the reply here because I don't like looking at the facebook popup]

(I usually do agree with Scott Alexander on almost everything, so it's only when he says something I particularly disagree with that I ever bother to broadcast it. Don't let that selection bias give you a misleading picture of our degree of general agreement. #long)

I think Scott Alexander is wrong that we should regret our collective failure to invest early in cryptocurrency. This is very low on my list of things to kick ourselves about. I do not consider it one of my life's regrets... (read more)

[I am not a sleep specialist. Please consult with one before making any drastic changes or trying to treat anything serious.]

Van Geijlswijk et al describe supplemental melatonin as “a chronobiotic drug with hypnotic properties”. Using it as a pure hypnotic – a sleeping pill – is like using an AK-47 as a club to bash your enemies’ heads in. It might work, but you’re failing to appreciate the full power and subtlety available to you.

Melatonin is a neurohormone produced by the pineal gland. In a normal circadian cycle, it’s lowest (undetectable, less than 1 pg/ml of blood) around the time you wake up, and stays low throughout the day. Around fifteen hours after waking, your melatonin suddenly shoots up to 10 pg/ml – a process called “dim...

The connection between sleep architecture and mental disorders such as depression and bipolar/mania now has more of a neurological foundation. During wake the brain tends to strengthen and grow new synapses which diverges from ideal normalized synaptic homeostasis; during sleep (and REM sleep in particular) the brain tends to prune synapses to restore normalized synaptic homeostasis. The synaptic sleep pruning is selective and related to longer term memory consolidation processes. Given that synapses are unsigned and how hebbian plasticity works it make... (read more)

Epistemic status: Fake Framework

When you walk into an improv scene, you usually have no idea what role you’re playing. All you have is some initial prompt — something like:

“You three are in a garden. The scene has to involve a stuffed bear somehow. Go!”

So now you’re looking to the other people there. Then someone jumps forward and adds to the scene: “Oh, there it is! I’m glad we finally found it!” Now you know a little bit about your character, and about the character of the person who spoke, but not enough to fully define anyone’s role.

You can then expand the scene by adding something: “It’s about time! We’re almost late now.” Now you’ve specified more about what’s going on, who you are, and who the other...

Fascinating essay. It put into words a lot of inferences I've been independently making, while also suggesting a fun angle (the distributed social computations) to look at it from.

I suspect the framework is less fake in certain aspects than might seem at first glance, too! As in, it actually corresponds to some mechanical realities of how human minds are implemented. In terms of the post I've linked, you could view "stopping" as momentarily ignoring the self-model you've derived and looking at the actual shards that implement your values (and then re-compi... (read more)

Here’s a pattern I’d like to be able to talk about. It might be known under a certain name somewhere, but if it is, I don’t know it. I call it a Spaghetti Tower. It shows up in large complex systems that are built haphazardly.

Someone or something builds the first Part A.

Later, someone wants to put a second Part B on top of Part A, either out of convenience (a common function, just somewhere to put it) or as a refinement to Part A.

Now, suppose you want to tweak Part A. If you do that, you might break Part B, since it interacts with bits of Part A. So you might instead build Part C on top of the previous ones.

And by the time your...

I also thought of tax, and I think it is probably a good example. What is especially confusing about tax is that, for some reason, whenever anyone has an idea they add on a new bit, rather than removing an old one.

ISAs are an obvious example of this. They basically introduce a subsystem that bypasses the normal taxation system in a limited way. But a similar effect could have been introduced much more simply by just raising the tax threshold in the main system.

I think Paul Christiano’s research agenda for the alignment of superintelligent AGIs presents one of the most exciting and promising approaches to AI safety. After being very confused about Paul’s agenda, chatting with others about similar confusions, and clarifying with Paul many times over, I’ve decided to write a FAQ addressing common confusions around his agenda.

This FAQ is not intended to provide an introduction to Paul’s agenda, nor is it intended to provide an airtight defense. This FAQ only aims to clarify commonly misunderstood aspects of the agenda. Unless otherwise stated, all views are my own views of Paul’s views. (ETA: Paul does not have major disagreements with anything expressed in this FAQ. There are many small points he might have expressed differently, but he endorses...

Yes, a value grounded in a factual error will get blown up by better epistemics, just as "be uncertain about the human's goals" will get blown up by your beliefs getting their entropy deflated to zero by the good ole process we call "learning about reality." But insofar as corrigibility is "chill out and just do some good stuff without contorting 4D spacetime into the perfect shape or whatever", there are versions of that which don't automatically get blown up by reality when you get smarter. As far as I can tell, some humans are living embodiments of the latter. I have some "benevolent libertarian" values pushing me toward Pareto-improving everyone's resource counts and letting them do as they will with their compute budgets. What's supposed to blow that one up?

This paragraph as a whole seems to make a lot of unsupported-to-me claims and seemingly equivocates between the two bolded claims, which are quite different. The first is that we (as adult humans with relatively well-entrenched values) would not want to defer to a strange alien. I agree. The second is that we wouldn't want to defer "even if we had great respect toward it and had been trained hard in childhood to act corrigibly towards it." I don't see why you believe that. Perhaps if we were otherwise socialized normally, we would end up unendorsing that value and not deferring? But I conjecture that if a person weren't raised with normal cultural influences, you could probably brainwash them into being aligned baby-eaters via reward shaping through brain-stimulation reward.

A utilitarian? Like, as Thomas Kwa asked, what are the type signatures of the utility functions you're imagining the AI to have? Your comment makes more sense to me if I imagine the utility function is computed over "conventional" objects-of-value (https://www.lesswrong.com/posts/dqSwccGTWyBgxrR58/turntrout-s-shortform-feed#cuTotpjqYkgcwnghp).
"Don't care" is quite strong. If you still hold this view -- why don't you care about 3? (Curious to hear from other people who basically don't care about 3, either.)

Yeah, "don't care" is much too strong. This comment was just meant in the context of the current discussion. I could instead say:

The kind of alignment agenda that I'm working on, and the one we're discussing here, is not relying on this kind of generalization of corrigibility. This kind of generalization isn't why we are talking about corrigibility.

However, I agree that there are lots of approaches to building AI that rely on some kind of generalization of corrigibility, and that studying those is interesting and I do care about how that goes.

In the contex... (read more)

"Often I compare my own Fermi estimates with those of other people, and that’s sort of cool, but what’s way more interesting is when they share what variables and models they used to get to the estimate."

– Oliver Habryka, at a model building workshop at FHI in 2016

One question that people in the AI x-risk community often ask is

"By what year do you assign a 50% probability of human-level AGI?"

We go back and forth with statements like "Well, I think you're not updating enough on AlphaGo Zero." "But did you know that person X has 50% in 30 years? You should weigh that heavily in your calculations."

However, 'timelines' is not the interesting question. The interesting parts are in the causal models behind the estimates. Some possibilities:

  • Do you

This is one of the most important reasons why hubris is so undervalued. People mistakenly think the goal is to generate precise probability estimates for frequently-discussed hypotheses (a goal in which deference can make sense). In a common-payoff-game research community, what matters is making new leaps in model space, not converging on probabilities. We (the research community) are bottlenecked by insight-production, not marginally better forecasts or decisions. Feign hubris if you need to, but strive to install it as a defense against model-dissolving deference.

(Cross-posted from Facebook.)

Now and then people have asked me if I think that other people should also avoid high school or college if they want to develop new ideas. This always felt to me like a wrong way to look at the question, but I didn't know a right one.

Recently I thought of a scary new viewpoint on that subject.

This started with a conversation with Arthur where he mentioned an idea by Yoshua Bengio about the software for general intelligence having been developed memetically. I remarked that I didn't think duplicating this culturally transmitted software would be a significant part of the problem for AGI development. (Roughly: low-fidelity software tends to be algorithmically shallow. Further discussion moved to comment below.)

But this conversation did get me thinking about...

I always thought of the internet in the same way that Night City is described in Neuromancer:

"Night City [is] like a deranged experiment in social Darwinism, designed by a bored researcher who kept one thumb permanently on the fast-forward button."


The internet is just societal evolution set on fast-forward.

What is the difference between a smart person who has read the sequences and considers AI x-risk important and interesting, but continues to be primarily a consumer of ideas, and someone who starts having ideas? I am not trying to set a really high bar here -- they don't have to be good ideas. They can't be off-the-cuff either, though. I'm talking about someone taking their ideas through multiple iterations.

A person does not need to research full-time to have ideas. Ideas can come during downtime. Maybe it is something you think about during your commute, and talk about occasionally at a lesswrong meetup.

There is something incomplete about my model of people doing this vs not doing this. I expect more people to have more ideas than they...

I think it's just a matter of some people (people like us) who find problem solving and coming up with ideas to be fun. Intellectually active individuals are so because they find it to be fun, while most others do not.

Circling is a practice, much like meditation is a practice.

There are many forms of it (again, like there are many forms of meditation). There are even life philosophies built around it. There are lots of intellectual, heady discussions of its theoretical underpinnings, often centered in Ken Wilber's Integral Theory. Subcultures have risen from it. It is mostly practiced in the US and Europe. It attracts lots of New Age-y, hippie, self-help-guru types. My guess is that the median age of practitioners is in their 30s. I sometimes refer to practitioners of Circling as relationalists (or just Circlers).

In recent years, Circling has caught the eye of rationalists, and that's why this post is showing up here, on LessWrong. I can hopefully direct people here who have...

Good job, well predicted. Even CFAR has degenerated into woo now.

One of the most pleasing things about probability and expected utility theory is that there are many coherence arguments that suggest that these are the “correct” ways to reason. If you deviate from what the theory prescribes, then you must be executing a dominated strategy. There must be some other strategy that never does any worse than your strategy, but does strictly better than your strategy with certainty in at least one situation. There’s a good explanation of these arguments here.

We shouldn’t expect mere humans to be able to notice any failures of coherence in a superintelligent agent, since if we could notice these failures, so could the agent. So we should expect that powerful agents appear coherent to us. (Note that it is possible that the...

I have no idea why I responded 'low' to 2. Does anybody think that's reasonable and fits in with what I wrote here, or did I just mean high?

"random utility-maximizer" is pretty ambiguous; if you imagine the space of all possible utility functions over action-observation histories and you imagine a uniform distribution over them (suppose they're finite, so this is doable), then the answer is low.

Heh, looking at my comment it turns out I said roughly the same thing 3 years ago.

This post adapts some internal notes I wrote for the Open Philanthropy Project, but they are merely at a "brainstorming" stage, and do not express my "endorsed" views nor the views of the Open Philanthropy Project. This post is also written quickly and not polished or well-explained.

My 2017 Report on Consciousness and Moral Patienthood tried to address the question of "Which creatures are moral patients?" but it did little to address the question of "moral weight," i.e. how to weigh the interests of different kinds of moral patients against each other:

For example: suppose we conclude that fishes, pigs, and humans are all moral patients, and we estimate that, for a fixed amount of money, we can (in expectation) dramatically improve the welfare of (a) 10,000 rainbow trout,

With a loguniform distribution, the mean moral weight is stable and roughly equal to 2.

This article was originally a post on my tumblr. I'm in the process of moving most of these kinds of thoughts and discussions here.

Okay. There’s a social interaction concept that I’ve tried to convey multiple times in multiple conversations, so I’m going to just go ahead and make a graph.

I’m calling this concept “Affordance Widths”.

Let’s say there’s some behavior {B} that people can do more of, or less of. And everyone agrees that if you don’t do enough of the behavior, bad thing {X} happens; but if you do too much of the behavior, bad thing {Y} happens.

Now, let’s say we have five different people: Adam, Bob, Charles, David, and Edgar. Each of them can do more or less {B}. And once they do too little, {X}


While the author here has been credibly accused of abuse, and so I have no desire to raise his social status, I see this concept as valuable. In fact, it is a good model of at least one element of the vaguely-defined concept of privilege.

Take general social assertiveness. Men are generally Bob, while women are more often Carol. However, it appears that women look at all the B men are getting away with without suffering Y, and see men as Adam. On the other hand, men see the amount of B women can afford not to do without suffering X, and see women as Alice. ... (read more)

Epistemic status: political, opinionated, personal, all the typical caveats for controversial posts.

I was talking with a libertarian friend of mine the other day about my growing discomfort with the political culture in the Bay Area, and he asked why I didn't just move.

It's a good question.  Peter Thiel just moved to L.A., citing the left-wing San Francisco culture as his reason.

But I like living in the Bay, and I don't plan to go anywhere in the near future. I could have said that I'm here for the tech industry, or here because my friends are, or any number of superficially "practical" reasons, but they didn't feel like my real motivation.

What I actually gave as the reason I stay was... aesthetics.

Wait, what?

Let's Talk About Design

I'm not a designer, so...

Could you expand on this?  What are the fnords?

(Cross-posted from Facebook.)

I've noticed that, by my standards and on an Eliezeromorphic metric, most people seem to require catastrophically high levels of faith in what they're doing in order to stick to it. By this I mean that they would not have stuck to writing the Sequences or HPMOR or working on AGI alignment past the first few months of real difficulty, without assigning odds in the vicinity of 10x what I started out assigning that the project would work. And this is not a kind of estimate you can get via good epistemology.

I mean, you can legit estimate 100x higher odds of success than the Modest and the Outside Viewers think you can possibly assign to "writing the most popular HP fanfiction on the planet out...

I recently got a chance to interview a couple people about this who'd done product management or similar at bay area tech companies.

They agreed that you can't run projects there unless you project near-certainty the project will succeed. However, they had a trick that had failed to occur to me prior to them saying it, which is to find a mid-scale objective that is all of: a) quite likely to have at least a bit of use in its own right; b) almost certainly do-able; and c) a stepping-stone for getting closer to the (more worthwhile but higher-failure-odds) g... (read more)

Epistemic status: trying to vaguely gesture at vague intuitions. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post. As of 2020, I consider this follow-up comment to be a better summary of the thing I was trying to convey with this post than the post itself. The core disagreement is about how much we expect the limiting case of arbitrarily high intelligence to tell us about the AGIs whose behaviour we're worried about.

There’s a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism...

That isn't analogous to rationalism versus the mainstream. The mainstream has already developed more complex models... it's rationalism that's saying, "no, just use Bayes for everything" (etc).

Alexander Gietelink Oldenziel (1y):
What? This seems obviously incorrect. The Pearl, Rubin, Spirtes, Glymour (and others') theory of causality is a very powerful framework that satisfies pretty much what one intuitively understands as 'causality'. It is moreover powerful enough to make definite computations and even the much-craved-for 'real applications'. It is a 'very correct' formalisation of 'normal' causality. I say 'very correct' instead of 'correct' because there are still areas of improvement - but this is more like GR improving on Newtonian gravity rather than Newtonian gravity being incorrect.
Got a link to the best overview/defense of that claim? I'm open to this argument but have some cached thoughts about Pearl's framework being unsatisfactory - would be useful to do some more reading and see if I still believe them.
Alexander Gietelink Oldenziel (1y):
There are some cases where Pearl and others' causality framework can be improved - supposedly Factored Sets will, although I personally don't understand it. I was recently informed that certain abductive counterfactual phrases due to David Lewis are not well captured by Pearl's system. I believe there are other gaps as well - all of this is actively being researched. What do you find unsatisfactory about Pearl?

All of this is beside the point, which is that there is a powerful, well-developed, highly elegant theory of causality with an enormous range of applications. Rubin's framework (which I am told is equivalent to Pearl's) is used throughout econometrics - indeed, econometrics is best understood as the science of causality.

I am not an expert - I am trying to learn much of this theory right now, and am probably not the best person to ask about it. That said: I am not sure to what degree you are already familiar with Pearl's theory of causality, but I recommend https://michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/ for an excellent introduction. There is also EY's https://www.lesswrong.com/posts/hzuSDMx7pd2uxFc5w/causal-diagrams-and-causal-models which you may or may not find convincing. For a much more leisurely argument for Pearl's viewpoint, I recommend his "Book of Why". In a pinch you could take a look at the book reviews under the causality tag on LW: https://www.lesswrong.com/tag/causality
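One way to see what Pearl's framework buys you is the distinction between conditioning and intervening. Below is my own toy structural causal model (not from the comment): a confounder Z drives both X and Y, and X has no causal effect on Y, so X and Y are perfectly correlated observationally even though the intervention do(X=1) leaves Y untouched.

```python
import random

random.seed(1)

def sample(intervene_x=None):
    # Structural equations: Z -> X, Z -> Y; do(X) severs the Z -> X edge.
    z = random.random() < 0.5
    x = z if intervene_x is None else intervene_x
    y = z                      # Y depends only on the confounder Z
    return x, y

obs = [sample() for _ in range(100_000)]
p_y_given_x1 = sum(y for x, y in obs if x) / sum(1 for x, y in obs if x)

do = [sample(intervene_x=True) for _ in range(100_000)]
p_y_do_x1 = sum(y for _, y in do) / len(do)

# Conditioning rides along the confounder; intervening does not.
print(round(p_y_given_x1, 2), round(p_y_do_x1, 2))  # ≈ 1.0 vs ≈ 0.5
```

"Correlation isn't causation" becomes a computable statement here: P(Y | X=1) and P(Y | do(X=1)) are simply different quantities, and the graph tells you when they coincide.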

I want to quickly draw attention to a concept in AI alignment: Robustness to Scale. Briefly, you want your proposal for an AI to be robust (or at least fail gracefully) to changes in its level of capabilities. I discuss three different types of robustness to scale: robustness to scaling up, robustness to scaling down, and robustness to relative scale.

The purpose of this post is to communicate, not to persuade. It may be that we want to bite the bullet of the strongest form of robustness to scale, and build an AGI that is simply not robust to scale, but if we do, we should at least realize that we are doing that.

Robustness to scaling up means that your AI system does not depend on not being...

Rereading this post while thinking about the approximations that we make in alignment, two points jump at me:

  • I'm not convinced that robustness to relative scale is as fundamental as the other two, because there is no reason to expect that in general the subcomponents will be significantly different in power, especially in settings like adversarial training where both parts are trained according to the same approach. That being said, I still agree that this is an interesting question to ask, and some proposal might indeed depend on a version of this.
  • Robustn
... (read more)

You are viewing Version 2 of this post: a major revision written for the LessWrong 2018 Review. The original version published on 9th November 2018 can be viewed here.

See my change notes for major updates between V1 and V2.

Combat Culture

I went to an orthodox Jewish high school in Australia. For most of my early teenage years, I spent one to three hours each morning debating the true meaning of abstruse phrases of Talmudic Aramaic. The majority of class time was spent sitting opposite your chavrusa (study partner, but linguistically the term has the same root as the word “friend”) arguing vehemently for your interpretation of the arcane words. I didn’t think in terms of probabilities back then, but if I had, I think at any point I...

mod note: this post probably shouldn't have been included in the 2020 review. It was behaving a bit weirdly because it had appeared in a previous review, and it'd be a fair amount of coding work to get it to seamlessly display the correct number of reviews. It's similar to a post of mine in that it was edited substantially for the 2018 review and re-published in 2020, which updated its postedAt date and let it bypass the intended filter of 'must have been published in 2020'.

I had previously changed the postedAt date on my post to be pre-2020 so that it wouldn't appear here, and just did the same for this one.

Wow, I really love that this has been updated and appendix'd. It's really nice to see how this has grown with community feedback and been polished from a rough concept. Creating common knowledge about how 'cultures' of communication can differ seems really valuable for a community focused on cooperatively finding truth.