# Nominated Posts for the 2019 Review

Posts need at least 2 nominations to continue into the Review Phase.
Nominate posts that you have personally found useful and important.

# 2019 Review Discussion

This post is eventually about partial agency. However, it's been a somewhat tricky point for me to convey; I take the long route. Epistemic status: slightly crazy.

I've occasionally said "Everything boils down to credit assignment problems."

What I really mean is that credit assignment pops up in a wide range of scenarios, and improvements to credit assignment algorithms have broad implications. For example:

• Politics.
• When politics focuses on (re-)electing candidates based on their track records, it's about credit assignment. The practice is sometimes derogatorily called "finger pointing", but the basic computation makes sense: figure out good and bad qualities via previous performance, and vote accordingly.
• When politics instead focuses on policy, it is still (to a degree) about credit assignment. Was raising the minimum wage responsible for reduced employment? Was it
...
• In between … well … in between, we're navigating treacherous waters …

Right, I basically agree with this picture. I might revise it a little:

• Early, the AGI is too dumb to hack its epistemics (provided we don't give it easy ways to do so!).
• In the middle, there's a danger zone.
• When the AGI is pretty smart, it sees why one should be cautious about such things, and it also sees why any modifications should probably be in pursuit of truthfulness (because true beliefs are a convergent instrumental goal) as opposed to other reasons.
• When the AGI is really smart, it

(cross posted from my personal blog)

Since middle school I've generally thought that I'm pretty good at dealing with my emotions, and a handful of close friends and family have made similar comments. Now I can see that though I was particularly good at never flipping out, I was decidedly not good at "healthy emotional processing". I'll explain later what I think "healthy emotional processing" is; for now I'm using quotes to indicate "the thing that's good to do with emotions". Here it goes...

## Relevant context

When I was a kid I adopted a strong, "Fix it or stop complaining about it" mentality. This applied to stress and worry as well. "Either address the problem you're worried about or quit worrying about it!" Also being a kid, I had a limited...

It seems like there's something missing here and I don't know how to add it. You make your childhood behavior of not being upset over things seem bad through framing, but you don't offer many (or maybe any) examples of it being ineffective. You mention that more recently you've been experiencing a sense of general malaise on the weekends, but that just doesn't sound too bad. Many people have malaise on the weekends and sometimes that's just because you're tired from the week and need to recuperate. I don't think moving away from a major life strategy is a ...

Authors: Megan Crawford, Finan Adamson, Jeffrey Ladish

Special Thanks to Georgia Ray for Editing

Biorisk

Most in the effective altruism community are aware of a possible existential threat from biological technology but not much beyond that. The form biological threats could take is unclear. Is the primary threat from state bioweapon programs? Or superorganisms accidentally released from synthetic biology labs? Or something else entirely?

If you’re not already an expert, you’re encouraged to stay away from this topic. You’re told that speculating about powerful biological weapons might inspire terrorists or rogue states, and simply articulating these threats won’t make us any safer. The cry of “Info hazard!” shuts down discussion by fiat, and the reasons cannot be explained since these might also be info hazards. If concerned, intelligent people cannot articulate...

Surprised about the answers to the second question. In conversations in EA circles I've had about biorisk, infohazards have never been brought up.

Perhaps there is some anchoring going on here?

Epistemic Status: I've really spent some time wrestling with this one. I am highly confident in most of what I say. However, this differs from section to section. I'll put more specific epistemic statuses at the end of each section.

Some of this post is generated from mistakes I've seen people make (or, heard people complain about) in applying conservation-of-expected-evidence or related ideas. Other parts of this post are based on mistakes I made myself. I think that I used a wrong version of conservation-of-expected-evidence for some time, and propagated some wrong conclusions fairly deeply; so, this post is partly an attempt to work out the right conclusions for myself, and partly a warning to those who might make the same mistakes.

All of the mistakes I'll argue against...
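For readers who want the principle itself in front of them, here is a minimal numeric sketch of conservation of expected evidence: before you look, the probability-weighted average of your possible posteriors equals your prior. The function names and the numbers (a prior of 0.3 and likelihoods of 0.8 and 0.1) are illustrative assumptions, not taken from the post.

```python
def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|not E)) using Bayes' rule."""
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    p_h_given_e = prior * p_e_given_h / p_e
    p_h_given_not_e = prior * (1 - p_e_given_h) / (1 - p_e)
    return p_e, p_h_given_e, p_h_given_not_e


def expected_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Average the two possible posteriors, weighted by how likely each observation is."""
    p_e, up, down = posteriors(prior, p_e_given_h, p_e_given_not_h)
    return p_e * up + (1 - p_e) * down


# Hypothetical numbers chosen only for illustration.
prior = 0.3
p_e, up, down = posteriors(prior, 0.8, 0.1)
average = expected_posterior(prior, 0.8, 0.1)
print(f"P(H|E)={up:.3f}, P(H|not E)={down:.3f}, expected posterior={average:.3f}")
```

Seeing E raises the probability of H and not seeing E lowers it, but the two shifts cancel in expectation. This only illustrates the basic identity; it does not capture the subtler mistakes the post goes on to discuss.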

Part 2 (and the dream algorithm) remind me of semi-decidability.

This post begins the Immoral Mazes sequence. See introduction for an overview of the plan. Before we get to the mazes, we need some background first.

Meditations on Moloch

Consider Scott Alexander’s Meditations on Moloch. I will summarize here.

Therein lie fourteen scenarios where participants can be caught in bad equilibria.

1. In an iterated prisoner’s dilemma, two players keep playing defect.
2. In a dollar auction, participants massively overpay.
3. A group of fishermen fail to coordinate on using filters that efficiently benefit the group, because they can’t punish those who profit by not using the filters.
4. Rats are caught in a permanent Malthusian trap where only those who do nothing but compete and consume survive. All others are outcompeted.
5. Capitalists serve a perfectly competitive market, and cannot pay a living wage.
6. The tying of all good
...
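As a rough illustration of what a "bad equilibrium" means in the first scenario, the sketch below checks a single round of a prisoner's dilemma for pure-strategy Nash equilibria. The payoff numbers (5, 3, 1, 0) are a conventional textbook choice and an assumption here, not something the post specifies.

```python
from itertools import product

# (row move, column move) -> (row payoff, column payoff); "C" = cooperate, "D" = defect.
# Hypothetical payoffs: temptation 5 > reward 3 > punishment 1 > sucker 0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
MOVES = ("C", "D")


def is_nash(row_move, col_move):
    """True if neither player can gain by unilaterally switching moves."""
    row_ok = all(
        PAYOFF[(row_move, col_move)][0] >= PAYOFF[(alt, col_move)][0] for alt in MOVES
    )
    col_ok = all(
        PAYOFF[(row_move, col_move)][1] >= PAYOFF[(row_move, alt)][1] for alt in MOVES
    )
    return row_ok and col_ok


equilibria = [pair for pair in product(MOVES, repeat=2) if is_nash(*pair)]
print(equilibria)
```

Under these assumed payoffs the only stable outcome is mutual defection, even though both players would do better under mutual cooperation. Repeating that stage-game equilibrium every round is one equilibrium of the iterated game, which is the trap the list item describes.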

There is a big difference between a universe with -hugenum value, and a universe with 0 value. Moloch taking over would produce a universe with 0 value, not -hugenum (not just because we might assume that pain and pleasure are equally energy-efficient).

When one then considers how much net value there is in the universe (or how much net disvalue!), I suspect Elua winning, while probably positive, isn't that great: Sure, sometimes someone learns something in the education system, but many other people also waste intellectual potential, or get bullied.

The issue, as it seems to me, is that almost every text you read on Buddhism does not attempt to do the actual work of translation. The first transmission of Buddhism to the west reified a bunch of translations of terms, such as concentration, equanimity, tranquility, mindfulness, suffering, etc. and works since then have mostly stuck to rearranging these words in different combinations and referencing the same metaphors that have been in use since the time of the Buddha. If these authors had true discernment they would realize that the umpteenth text on 'establishing the noble bases of tranquility secluded from sensuous ignorance' or what-have-you isn't helping anyone who didn't already get the message.

At this point I want to say that I think this approach is 'working'...

2 · jbkjr · 1mo: Any pointers on where I can find information about the specific techniques as originally taught by the Buddha?

[Epistemic Status: Scroll to the bottom for my follow-up thoughts on this from months/years later.]

Early this year, Conor White-Sullivan introduced me to the Zettelkasten method of note-taking. I would say that this significantly increased my research productivity. I’ve been saying “at least 2x”. Naturally, this sort of thing is difficult to quantify. The truth is, I think it may be more like 3x, especially along the dimension of “producing ideas” and also “early-stage development of ideas”. (What I mean by this will become clearer as I describe how I think about research productivity more generally.) However, it is also very possible that the method produces serious biases in the types of ideas produced/developed, which should be considered. (This would be difficult to quantify at the best of...

3 · manunamz · 1mo: First, I want to say thanks for posting such a thorough walk-through of your experience with zettelkasten. I ended up using it as a sort of check against my own research and it proved very useful. It seems zettelkasten, atomic notes, and inter-linking have become popular in the note-taking world, but I haven't seen many (any?) apps/tech that support both webs and trees, as you say here. So, I made a jekyll template [https://manunamz.github.io/jekyll-bonsai/] that does that. I am still working on it, there's lots to add, the code isn't open quite yet, but I'd love to get feedback from anyone who is interested.

I personally find the bouncy animation for the tree annoying -- I would prefer if it would sit in place, so that I feel like I can click around without losing my spot. (I know I don't really "lose my spot", but the bouncy animation makes it feel non-static.)

But I don't really like 'concept graph' visualizations in note-taking stuff anyway, so, my feedback wrt that may not be indicative of what potential users of it would think.

I found that I wanted to expand more things on one page, rather than having to visit each note individually to read them, since the...

[epistemic status: that's just my opinion, man. I have highly suggestive evidence, not deductive proof, for a belief I sincerely hold]

"If you see fraud and do not say fraud, you are a fraud." --- Nassim Taleb

I was talking with a colleague the other day about an AI organization that claims:

1. AGI is probably coming in the next 20 years.
2. Many of the reasons we have for believing this are secret.
3. They're secret because if we told people about those reasons, they'd learn things that would let them make an AGI even sooner than they would otherwise.

His response was (paraphrasing): "Wow, that's a really good lie! A lie that can't be disproven."

I found this response refreshing, because he immediately jumped to the most likely conclusion.

## Near predictions generate more funding

Generally, entrepreneurs who

...

Thanks!

1 · Teerth Aloke · 2mo: Ageing?
6 · Adele Lopez · 2mo: Really? My impression is that rapid AI timelines make things increasingly "hopeless" because there's less time to try to prevent getting paperclipped, and that this is the default view of the community.
2 · Teerth Aloke · 2mo: I tilt towards rapid timeline - but I promise, my brain is not turning into mush. I have no terminal disease.

An actual debate about instrumental convergence, in a public space! Major respect to all involved, especially Yoshua Bengio for great facilitation.

For posterity (i.e. having a good historical archive) and further discussion, I've reproduced the conversation here. I'm happy to make edits at the request of anyone in the discussion who is quoted below. I've improved formatting for clarity and fixed some typos. For people who are not researchers in this area who wish to comment, see the public version of this post here. For people who do work on the relevant areas, please sign up in the top right. It will take a day or so to confirm membership.

## Original Post

Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just...

4 · Richard_Ngo · 2mo: What's your specific critique of this? I think it's an interesting and insightful point.
7 · TurnTrout · 2mo: LeCun claims too much. It's true that the case of animals like orangutans points to a class of cognitive architectures which seemingly don't prioritize power-seeking. It's true that this is some evidence against power-seeking behavior being common amongst relevant cognitive architectures. However, it doesn't show that instrumental subgoals are much weaker drives of behavior than hardwired objectives. One reading of this "drives of behavior" claim is that it has to be tautological; by definition, instrumental subgoals are always in service of the (hardwired) objective. I assume that LeCun is instead discussing "all else equal, will statistical instrumental tendencies ('instrumental convergence') be more predictive of AI behavior than its specific objective function?". But "instrumental subgoals are much weaker drives of behavior than hardwired objectives" is not the only possible explanation of "the lack of domination behavior in non-social animals"! Maybe the orangutans aren't robust to scale. Maybe orangutans do implement non power-seeking cognition, but maybe their cognitive architecture will be hard or unlikely for us to reproduce in a machine - maybe the distribution of TAI cognitive architectures we should expect is far different from what orangutans are like. I do agree that there's a very good point in the neighborhood of the quoted argument. My steelman of this would be: (This is loose for a different reason, in that it presupposes a single relevant axis of variation between humans and orangutans. Is a personal computer more like a human, or more like an orangutan? But set that aside for the moment.) I think he's overselling the evidence. However, on reflection, I wouldn't pick out the point for such strong ridicule.

I feel like you can turn this point upside down. Even among primates that seem unusually docile, like orangutans, male-male competition can get violent and occasionally ends in death. Isn't that evidence that power-seeking is hard to weed out? And why wouldn't it be in an evolved species that isn't eusocial or otherwise genetically weird?

Epistemic status: Tentative. I’ve been practicing this on-and-off for a year and it’s seemed valuable, but it’s the sort of thing I might look back on and say “hmm, that wasn’t really the right frame to approach it from.”

In doublecrux, the focus is on “what observations would change my mind?”

In some cases this is (relatively) straightforward. If you believe minimum wage helps workers, or harms them, there are some fairly obvious experiments you might run. “Which places have instituted minimum wage laws? What happened to wages? What happened to unemployment? What happened to worker migration?”

The details will matter a lot. The results of the experiment might be weird and confusing. If I ran the experiment myself I’d probably get a lot of things wrong, misuse statistics and

...
1 · mike_hawke · 2mo: This resonates for me, and I sometimes end up with an 'ignorance is bliss' attitude toward the latter. ~~~ Can you say more about this? Did this aesthetic shift feel good/bad/neutral, either in the moment or upon reflection? I have such shifts occasionally, and it sometimes makes me feel... tired. Like I just get weary at the thought of permanently increasing the amount of nuance that I track. Rereading the excerpt, I feel like some part of me is insisting that adding nuance and caveats is costly and unsustainable.

Hmm, I mostly see the nuance as rich/positive. (there might be some sort of meta-aesthetic-preference here that could also be updated based on facts).

Anecdote: when I went to university studying Computer Animation, one of the things I studied was how to render images with realistic lighting. In the process of doing that, I came to be able to notice lots of things about lighting and surface-texture in my environment, which made them both more beautiful, and beautiful in a wider variety of ways.

There are some cases where I add nuance to something, and the nu...

### Technical Appendix: First safeguard?

This sequence is written to be broadly accessible, although perhaps its focus on capable AI systems assumes familiarity with basic arguments for the importance of AI alignment. The technical appendices are an exception, targeting the technically inclined.

Why do I claim that an impact measure would be "the first proposed safeguard which maybe actually stops a powerful agent with an imperfect objective from ruining things – without assuming anything about the objective"?

The safeguard proposal shouldn't have to say "and here we solve this opaque, hard problem, and then it works". If we have the impact measure, we have the math, and then we have the code.

...

If the question about accessibility hasn't been resolved, I think Ramana Kumar was talking about making the text readable for people with visual impairments.

[Previously in sequence: Epistemic Learned Helplessness]

I.

“Culture is the secret of humanity’s success” sounds like the most vapid possible thesis. The Secret Of Our Success by anthropologist Joseph Henrich manages to be an amazing book anyway.

Henrich wants to debunk (or at least clarify) a popular view where humans succeeded because of our raw intelligence. In this view, we are smart enough to invent neat tools that help us survive and adapt to unfamiliar environments.

Against such theories: we cannot actually do this. Henrich walks the reader through many stories about European explorers marooned in unfamiliar environments. These explorers usually starved to death. They starved to death in the middle of endless plenty. Some of them were in Arctic lands that the Inuit considered among their richest hunting grounds. Others...

1 · niplav · 2mo: For completion, here's the prediction on the naive theory, namely that intelligence is instrumentally useful and evolved because solving plans helps you survive: Elicit Prediction (forecast.elicit.org/binary/questions/WuB8J-hP1)

Strange indeed... but, here is a working version:

Related and required reading in life (ANOIEAEIB): The Copenhagen Interpretation of Ethics

Epistemic Status: Trying to be minimally judgmental

Spoiler Alert: Contains minor mostly harmless spoiler for The Good Place, which is the best show currently on television.

The Copenhagen Interpretation of Ethics (in parallel with the similarly named one in physics) is as follows:

The Copenhagen Interpretation of Ethics says that when you observe or interact with a problem in any way, you can be blamed for it. At the very least, you are to blame for not doing more. Even if you don’t make the problem worse, even if you make it slightly better, the ethical burden of the problem falls on you as soon as you observe it. In particular, if you interact with a problem and benefit from it, you

...

But what if A works with B and sees that B didn't go all the way they could to solve a problem? It happens all the time. CIE doesn't force A to peck B's brains out for acting badly; A is under no obligation to hand out punishment - at least if they do work together.

There has been considerable debate over whether development in AI will experience a discontinuity, or whether it will follow a more continuous growth curve. Given the lack of consensus and the confusing, diverse terminology, it is natural to hypothesize that much of the debate is due to simple misunderstandings. Here, I seek to dissolve some misconceptions about the continuous perspective, based mostly on how I have seen people misinterpret it in my own experience.

First, we need to know what I even mean by continuous takeoff. When I say it, I mean a scenario where the development of competent, powerful AI follows a trajectory that is roughly in line with what we would have expected by extrapolating from past progress. That is, there is no point at...

[Thanks to Marco G for proofreading and offering suggestions]

I.

Several studies have shown a genetic link between autism and intelligence; genes that contribute to autism risk also contribute to high IQ. But studies show autistic people generally have lower intelligence than neurotypical controls, often much lower. What is going on?

First, the studies. This study from UK Biobank finds a genetic correlation between genetic risk for autism and educational attainment (r = 0.34), and between autism and verbal-numerical reasoning (r = 0.19). This study of three large birth cohorts finds a correlation between genetic risk for autism and cognitive ability (beta = 0.07). This study of 45,000 Danes finds that genetic risk for autism correlates at about 0.2 with both IQ and educational attainment. These are just three randomly-selected...

Here are a few questions that might be interesting:

Is autism (or various ASDs) a scientifically valid definition (as defined in the DSM or ICD 10/11)?

What is the intra-site and inter-site diagnostic reliability for ASDs?

How consistently are ASD diagnostic tools applied?

Do ASDs have any unique traits?

Are separate, unrelated etiologies considered ASDs?

Does "clinical expertise" or opinion having more weight than diagnostic tools impact validity?

Are the definitions of ASDs consistent enough longitudinally for analysis?

Do ASDs vary significantly by c...

Followup to: Where to Draw the Boundary?

Figuring where to cut reality in order to carve along the joints—figuring which things are similar to each other, which things are clustered together: this is the problem worthy of a rationalist. It is what people should be trying to do, when they set out in search of the floating essence of a word.

Once upon a time it was thought that the word "fish" included dolphins ...

The one comes to you and says:

The list: {salmon, guppies, sharks, dolphins, trout} is just a list—you can't say that a list is wrong. You draw category boundaries in specific ways to capture tradeoffs you care about: sailors in the ancient world wanted a word to describe the swimming finned creatures that they saw in

...

I feel like you’re taking my attempts to explain my position and requiring that each one be a rigorous defense.

If someone has made a position clear, they need to move on to defending it at some stage, or else it's all just opinion.

You clearly think that some concepts lack objectivity ... that's been explained at great length with equations and diagrams ... and you think that the very existence of scientific objectivity is in danger. But between these two claims there are any number of intermediate steps that have not been explained or defended.

Much of the

We used to make land. We built long wharves for docking ships, and then over time filled in the areas between them. Later we built up mudflats wholesale to make even larger areas. Here's a map of Boston showing how much of the land wasn't previously dry:

In expensive areas, converting wetlands and shallow water into usable land is a very good thing on balance, and we should start doing it again. To take a specific example, we should make land out of the San Francisco Bay, at least South of the Dumbarton Bridge:

This is about 50 mi², a bit bigger than San Francisco. This would be enough new central land to bring rents down dramatically across the...

Much higher densities are possible but not a good idea.

Not with that attitude! Also, Manhattan daytime and Paris densities are already much higher (~200k / sq mi).

## Introduction

Taxes are typically meant to be proportional to money (or negative externalities, but that's not what I'm focusing on). But one thing money buys you is flexibility, which can be used to avoid taxes. Because of this, taxes aimed at the wealthy tend to end up hitting the well-off-or-rich-but-not-truly-wealthy harder, and tax cuts aimed at the poor end up helping the middle class. Examples (feel free to stop reading these when you get the idea, this is just the analogy section of the essay):

• Computer programmers typically have the option to work remotely in a low-tax state; teachers need to be where the classroom is.
• Estate taxes tend to hit families with single large assets (like a business) harder than those with diverse investments (who can simply sell assets
...

Or more completely: In the absence of malice or extreme negligence there's nothing criminal to punish at all and money damages should suffice. Given a 100x lower occurrence of accidents this should be insurable for ~1% the cost. The default answer is drivers remain financially responsible for damages (but insurance gets cheaper) and the driver can't be criminally negligent short of modifying/damaging the car in an obviously bad way (e.g. failing to fix a safety critical sensor in a reasonable amount of time that would have prevented the crash. ...

Eliezer Yudkowsky, listing advantages of a "wizard's oath" ethical code of "Don't say things that are literally false", writes—

Repeatedly asking yourself of every sentence you say aloud to another person, "Is this statement actually and literally true?", helps you build a skill for navigating out of your internal smog of not-quite-truths.

I mean, that's one hypothesis about the psychological effects of adopting the wizard's code.

A potential problem with this is that human natural language contains a lot of ambiguity. Words can be used in many ways depending on context. Even the specification "literally" in "literally false" is less useful than it initially appears when you consider that the way people ordinarily speak when they're being truthful is actually pretty dense

...
2 · Raemon · 4mo: New jargon term: SuperMetaHonest
2 · Ben Pace · 4mo: And if I’m honest about being SuperMetaHonest, then I’m: SuperDuperMetaHonest.
12 · Raemon · 4mo: If I wrote a sequence about it it'd be my SuperDuperMetaHonestEpistemicOpus

“SuperDuperMetaHonestEpistemicOpus”

“If you try to Glomarize you will be too verbocious”

[Epistemic status: Strong claims vaguely stated and weakly held. I expect that writing this and digesting feedback on it will lead to a much better version in the future. EDIT: So far this has stood the test of time. EDIT: As of September 2020 I think this is one of the most important things to be thinking about.]

This post attempts to generalize and articulate a problem that people have been thinking about since at least 2016. [Edit: 2009 in fact!] In short, here is the problem:

Consequentialists can get caught in commitment races, in which they want to make commitments as soon as possible. When consequentialists make commitments too soon, disastrous outcomes can sometimes result. The situation we are in (building AGI and letting it self-modify) may be...

3 · JesseClifton · 4mo: It seems like we can kind of separate the problem of equilibrium selection from the problem of “thinking more”, if “thinking more” just means refining one’s world models and credences over them. One can make conditional commitments of the form: “When I encounter future bargaining partners, we will (based on our models at that time) agree on a world-model according to some protocol and apply some solution concept (e.g. Nash or Kalai-Smorodinsky) to it in order to arrive at an agreement.” The set of solution concepts you commit to regarding as acceptable still poses an equilibrium selection problem. But, on the face of it at least, the “thinking more” part is handled by conditional commitments to act on the basis of future beliefs. I guess there’s the problem of what protocols for specifying future world-models you commit to regarding as acceptable. Maybe there are additional protocols that haven’t occurred to you, but which other agents may have committed to and which you would regard as acceptable when presented to you. Hopefully it is possible to specify sufficiently flexible methods for determining whether protocols proposed by your future counterparts are acceptable that this is not a problem.
3 · Daniel Kokotajlo · 4mo: If I read you correctly, you are suggesting that some portion of the problem can be solved, basically -- that it's in some sense obviously a good idea to make a certain sort of commitment, e.g. “When I encounter future bargaining partners, we will (based on our models at that time) agree on a world-model according to some protocol and apply some solution concept (e.g. Nash or Kalai-Smorodinsky) to it in order to arrive at an agreement.” So the commitment races problem may still exist, but it's about what other commitments to make besides this one, and when. Is this a fair summary? I guess my response would be "On the object level, this seems like maybe a reasonable commitment to me, though I'd have lots of questions about the details. We want it to be vague/general/flexible enough that we can get along nicely with various future agents with somewhat different protocols, and what about agents that are otherwise reasonable and cooperative but for some reason don't want to agree on a world-model with us? On the meta level though, I'm still feeling burned from the various things that seemed like good commitments to me and turned out to be dangerous, so I'd like to have some sort of stronger reason to think this is safe."
3 · JesseClifton · 4mo: Yeah I agree the details aren’t clear. Hopefully your conditional commitment can be made flexible enough that it leaves you open to being convinced by agents who have good reasons for refusing to do this world-model agreement thing. It’s certainly not clear to me how one could do this. If you had some trusted “deliberation module”, which engages in open-ended generation and scrutiny of arguments, then maybe you could make a commitment of the form “use this protocol, unless my counterpart provides reasons which cause my deliberation module to be convinced otherwise”. Idk. Your meta-level concern seems warranted. One would at least want to try to formalize the kinds of commitments we’re discussing and ask if they provide any guarantees, modulo equilibrium selection.

I think we are on the same page then. I like the idea of a deliberation module; it seems similar to the "moral reasoning module" I suggested a while back. The key is to make it not itself a coward or bully, reasoning about schelling points and universal principles and the like instead of about what-will-lead-to-the-best-expected-outcomes-given-my-current-credences.
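To make the "solution concept" part of this thread concrete, here is a small sketch comparing the two concepts named above, Nash and Kalai-Smorodinsky, on a toy bargaining problem. Everything specific is an assumption for illustration: the five feasible utility pairs, the disagreement point at (0, 0), and the function names. The Kalai-Smorodinsky version is a discrete approximation over a finite outcome list (the textbook definition works on a convex feasible set), and both functions assume each player can gain something over the disagreement point.

```python
def individually_rational(outcomes, d):
    """Keep only outcomes that leave both players at least as well off as disagreement."""
    return [u for u in outcomes if u[0] >= d[0] and u[1] >= d[1]]


def nash_solution(outcomes, d):
    """Pick the outcome maximizing the product of gains over the disagreement point."""
    rational = individually_rational(outcomes, d)
    return max(rational, key=lambda u: (u[0] - d[0]) * (u[1] - d[1]))


def kalai_smorodinsky_solution(outcomes, d):
    """Pick the outcome that best equalizes each player's share of their ideal gain.

    Discrete approximation: maximize the smaller of the two relative gains.
    """
    rational = individually_rational(outcomes, d)
    ideal = (max(u[0] for u in rational), max(u[1] for u in rational))

    def worst_relative_gain(u):
        return min(
            (u[0] - d[0]) / (ideal[0] - d[0]),
            (u[1] - d[1]) / (ideal[1] - d[1]),
        )

    return max(rational, key=worst_relative_gain)


# Hypothetical feasible utility pairs (player 1, player 2) and disagreement point.
outcomes = [(0.0, 10.0), (2.0, 4.0), (3.0, 3.0), (4.0, 1.5), (5.0, 0.0)]
disagreement = (0.0, 0.0)

nash_pick = nash_solution(outcomes, disagreement)
ks_pick = kalai_smorodinsky_solution(outcomes, disagreement)
print("Nash:", nash_pick, "Kalai-Smorodinsky:", ks_pick)
```

On this toy problem the two concepts select different outcomes, which is one way to see why agreeing on a world-model does not by itself settle the equilibrium selection problem: the parties still have to agree on which solution concept to apply.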