Kaj_Sotala's Comments

Does donating to EA make sense in light of the mere addition paradox?
So basically, the idea here is that it actually makes intuitive moral sense for most EA donors to donate to EA causes?

Not sure whether every EA would endorse this description, but it's how I think of it, yes.

Does donating to EA make sense in light of the mere addition paradox?
I can see it intuitively making sense, but barring a comprehensive moral system that can argue for the value of all human life, it seems intuition is not enough. As in, it also intuitively makes sense to put 10% of your income into low-yield bonds, so in case one of your family members or friends has a horrible (deadly or severely life-quality diminishing) problem you can help them.

Utilitarianism is not the only system that becomes problematic if you try to formalize it enough; the problem is that there is no comprehensive moral system that wouldn't either run into paradoxical answers, or be so vague that you'd need to fill in the missing gaps with intuition anyway.

Any decision that you make, ultimately comes down to your intuition (that is: decision-weighting systems that make use of information in your consciousness but which are not themselves consciously accessible) favoring one decision or the other. You can try to formulate explicit principles (such as utilitarianism) which explain the principles behind those intuitions, but those explicit principles are always going to only capture a part of the story, because the full decision criteria are too complex to describe.

So the answer to

So basically, I'm kinda stuck understanding under which moral precincts it actually makes sense to donate to EA charities?

is just "the kinds where donating to EA charities makes more intuitive sense than not donating"; often people describe these kinds of moral intuitions as "utilitarian", but few people would actually endorse all of the conclusions of purely utilitarian reasoning.

Becoming Unusually Truth-Oriented

Could you just take the description of the technique and discuss it in the context of recalling non-dream-related memories? As you note yourself, exactly the same steps seem to work for e.g. recalling events from the previous day.

Becoming Unusually Truth-Oriented

FWIW, when I have done similar practice on real-life memories rather than dreams, I have sometimes checked my recollection of past events with other people who were there, and they have agreed with my account. Of course they could be influenced by my recollection, but I have sometimes recalled details which I have reason to believe that they would otherwise remember much better than me. For example, a friend showed me an episode of a TV series that she had seen several times before, but which I had not. The next day I used this kind of a technique to bring up details about the plot which I didn't remember initially, and she confirmed that I remembered them correctly.

So if the technique seems to provide accurate recall rather than confabulation in a non-dream context, it would seem like a reasonable default guess that it would provide accurate recall in a dream context as well.

[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

I thought that the discussion of various fields having different tradeoffs with regard to disclosing vulnerabilities was particularly interesting:

The framework helps to explain why the disclosure of software vulnerabilities will often be beneficial for security. Patches to software are often easy to create, and can often be made in a matter of weeks. These patches fully resolve the vulnerability. The patch can be easily propagated: for downloaded software, the software is often automatically updated over the internet; for websites, the fix can take effect immediately. In addition, counterfactual possession is likely, because it is normally easier to find a software vulnerability (of which there is a constant supply) than to make a scientific discovery (see [3]). These factors combine to make a reasonable argument in favour of public disclosure of software vulnerabilities, at least after the vendor has been given time to prepare a patch.

Contrasting other fields will further bring into relief the comparatively defence-dominant character of software vulnerability knowledge. We can focus on the tractability of defensive solutions: for certain technologies, there is no low-cost, straightforward, effective defence.

First, consider biological research that provides insight into the manufacture of pathogens, such as a novel virus. A subset of viruses are very difficult to vaccinate for (there is still no vaccination for HIV) or otherwise prepare against. This lowers the defensive benefit of publication, by blocking a main causal pathway by which publication leads to greater protection. This contrasts with the case where an effective treatment can be developed within a reasonable time period, which could weigh in favour of publication [15].
Second, consider cases of hardware based vulnerabilities, such as with kinetic attacks or physical key security. Advances in drone hardware have enabled the disruption of airports and attacks on infrastructure such as oil facilities; these attacks presently lack a cheap, effective solution [18]. This arises in part from the large attack surface of physical infrastructure: the drone’s destination can be one of many possible points on the facility, and it can arrive there via a multitude of different trajectories. This means that the path of the drone cannot simply be blocked.

Moreover, in 2003 a researcher published details about a vulnerability in physical key systems [2]. Apartment buildings, offices, hotels and other large buildings often use systems where a single master-key can open all doors. The research showed how to derive the master-key from a single non-master key. The researcher wrote that there was “no simple or completely effective countermeasure that prevents exploitation of this vulnerability short of replacing a master keyed system with a non-mastered one” ([1]; see [2] for further discussion of counter-measures). The replacement of master-key systems is a costly solution insofar as master-key systems are useful, and changes are very difficult to propagate: physical key systems distributed across the world would need to be manually updated.

Finally, consider the policy question of whether one should have published nuclear engineering research, such as on uranium enrichment, in the 1960s. For countries like India and Pakistan, this would have increased, not decreased, their potential to destroy each others’ cities, due to the lack of defensive solutions: as with certain diseases, nuclear bombs cannot be adequately protected against. Moreover, for the minor protections against nuclear bombs that exist, these can be pursued without intricate knowledge as to how nuclear bombs are manufactured: there is low transferability of offensive into defensive knowledge. For example, a blueprint for the design of a centrifuge does not help one build a better defensive bunker. Overall, if both a potential defender and potential attacker are given knowledge that helps them build nuclear weapons, that knowledge is more useful for making an attack than protecting against an attack: the knowledge is offense-biased.

Differences across fields will shape the security value of publication, which can influence disclosure norms among security-minded scientists and policymakers. The Manhattan Project was more secretive than locksmiths and influenza researchers, who are in turn often more secretive than those finding vulnerabilities in software. Indeed, there was a culture clash between the researcher who published the flaw in the master-key system, above, who came from a computer security background, and the locksmiths who accused him of being irresponsible. The different disclosure cultures exist in the form of default practices, but also in common refrains - for example, language about the virtues of “studying” a problem, or the value of users being empowered by disclosure to “make decisions for themselves”. Such language embeds implicit answers to the framework given in this section, and therefore caution should be exercised when importing concepts and language from other fields.
The Catastrophic Convergence Conjecture

Human values are complicated and fragile

It's not clear to me whether you actually meant to suggest this as well, but this line of reasoning makes me wonder if many of our values are actually not that complicated and fragile after all, instead being connected to AU considerations. E.g. self-determination theory's basic needs of autonomy, competence and relatedness seem like different ways of increasing your AU, and the boredom example might not feel catastrophic because of some highly arbitrary "avoid boredom" bit in the utility function, but rather because looping a single experience over and over isn't going to help you maintain your ability to avoid catastrophes. (That is, our motivations and values optimize for maintaining AU among other things, even if that is not the thing that those values feel like from the inside.)

Confirmation Bias As Misfire Of Normal Bayesian Reasoning

See also Mercier & Sperber 2011 on confirmation bias:

... an absence of reasoning is to be expected when people already hold some belief on the basis of perception, memory, or intuitive inference, and do not have to argue for it. Say, I believe that my keys are in my trousers because that is where I remember putting them. Time has passed, and they could now be in my jacket, for example. However, unless I have some positive reason to think otherwise, I just assume that they are still in my trousers, and I don’t even make the inference (which, if I am right, would be valid) that they are not in my jacket or any of the other places where, in principle, they might be. In such cases, people typically draw positive rather than negative inferences from their previous beliefs. These positive inferences are generally more relevant to testing these beliefs. For instance, I am more likely to get conclusive evidence that I was right or wrong by looking for my keys in my trousers rather than in my jacket (even if they turn out not to be in my jacket, I might still be wrong in thinking that they are in my trousers). We spontaneously derive positive consequences from our intuitive beliefs. This is just a trusting use of our beliefs, not a confirmation bias (see Klayman & Ha 1987). [...]

One of the areas in which the confirmation bias has been most thoroughly studied is that of hypothesis testing, often using Wason’s rule discovery task (Wason 1960). In this task, participants are told that the experimenter has in mind a rule for generating number triples and that they have to discover it. The experimenter starts by giving participants a triple that conforms to the rule (2, 4, 6). Participants can then think of a hypothesis about the rule and test it by proposing a triple of their own choice. The experimenter says whether or not this triple conforms to the rule. Participants can repeat the procedure until they feel ready to put forward their hypothesis about the rule. The experimenter tells them whether or not their hypothesis is true. If it is not, they can try again or give up.

Participants overwhelmingly propose triples that fit with the hypothesis they have in mind. For instance, if a participant has formed the hypothesis “three even numbers in ascending order,” she might try 8, 10, 12. As argued by Klayman and Ha (1987), such an answer corresponds to a “positive test strategy” of a type that would be quite effective in most cases. This strategy is not adopted in a reflective manner, but is rather, we suggest, the intuitive way to exploit one’s intuitive hypotheses, as when we check that our keys are where we believe we left them as opposed to checking that they are not where it follows from our belief that they should not be. What we see here, then, is a sound heuristic rather than a bias.
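The positive test strategy described above can be illustrated with a small sketch (my own toy model, not from Mercier & Sperber): the experimenter's hidden rule is "any strictly ascending triple", while a hypothetical participant conjectures "even numbers ascending by 2". Every positive test the participant proposes passes, so nothing ever contradicts the (wrong) hypothesis; only a falsifying probe would reveal the mismatch.

```python
# Toy model of Wason's 2-4-6 rule discovery task (hypothetical names).

def hidden_rule(triple):
    # The experimenter's actual rule: any strictly ascending triple.
    a, b, c = triple
    return a < b < c

def participant_hypothesis(triple):
    # The participant's guess: even numbers ascending by 2 (like 2, 4, 6).
    a, b, c = triple
    return all(x % 2 == 0 for x in triple) and b - a == 2 and c - b == 2

# Positive test strategy: only propose triples that FIT the hypothesis.
positive_tests = [(8, 10, 12), (20, 22, 24), (100, 102, 104)]

# Every positive test is confirmed by the experimenter, so the wrong
# hypothesis is never challenged.
assert all(hidden_rule(t) for t in positive_tests)

# A falsifying probe (violating the hypothesis but possibly fitting the
# hidden rule) would immediately expose the mismatch:
falsifying_probes = [(1, 2, 3), (5, 10, 20)]
assert all(hidden_rule(t) and not participant_hypothesis(t)
           for t in falsifying_probes)
```

This mirrors Klayman and Ha's point: the positive test strategy only fails here because the hidden rule was deliberately chosen to be much broader than any hypothesis the initial example suggests.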

This heuristic misleads participants in this case only because of some very peculiar (and expressly designed) features of the task. What is really striking is the failure of attempts to get participants to reason in order to correct their ineffective approach. It has been shown that, even when instructed to try to falsify the hypotheses they generate, fewer than one participant in ten is able to do so (Poletiek 1996; Tweney et al. 1980). Since the hypotheses are generated by the participants themselves, this is what we should expect in the current framework: The situation is not an argumentative one and does not activate reasoning. However, if a hypothesis is presented as coming from someone else, it seems that more participants will try to falsify it and will give it up much more readily in favor of another hypothesis (Cowley & Byrne 2005). The same applies if the hypothesis is generated by a minority member in a group setting (Butera et al. 1992). Thus, falsification is accessible provided that the situation encourages participants to argue against a hypothesis that is not their own. [...]

When one is alone or with people who hold similar views, one’s arguments will not be critically evaluated. This is when the confirmation bias is most likely to lead to poor outcomes. However, when reasoning is used in a more felicitous context – that is, in arguments among people who disagree but have a common interest in the truth – the confirmation bias contributes to an efficient form of division of cognitive labor.

When a group has to solve a problem, it is much more efficient if each individual looks mostly for arguments supporting a given solution. They can then present these arguments to the group, to be tested by the other members. This method will work as long as people can be swayed by good arguments, and the results reviewed in section 2 show that this is generally the case. This joint dialogic approach is much more efficient than one where each individual on his or her own has to examine all possible solutions carefully. The advantages of the confirmation bias are even more obvious given that each participant in a discussion is often in a better position to look for arguments in favor of his or her favored solution (situations of asymmetrical information). So group discussions provide a much more efficient way of holding the confirmation bias in check. By contrast, the teaching of critical thinking skills, which is supposed to help us overcome the bias on a purely individual basis, does not seem to yield very good results (Ritchart & Perkins 2005; Willingham 2008).
Demons in Imperfect Search

Why are boys and girls born in roughly equal numbers? (Leaving aside crazy countries that use artificial gender selection technologies.) To see why this is surprising, consider that 1 male can impregnate 2, 10, or 100 females; it wouldn't seem that you need the same number of males as females to ensure the survival of the species. This is even more surprising in the vast majority of animal species where the male contributes very little to raising the children—humans are extraordinary, even among primates, for their level of paternal investment. Balanced gender ratios are found even in species where the male impregnates the female and vanishes into the mist.

Consider two groups on different sides of a mountain; in group A, each mother gives birth to 2 males and 2 females; in group B, each mother gives birth to 3 females and 1 male. Group A and group B will have the same number of children, but group B will have 50% more grandchildren and 125% more great-grandchildren. You might think this would be a significant evolutionary advantage.

But consider: The rarer males become, the more reproductively valuable they become—not to the group, but to the individual parent. Every child has one male and one female parent. Then in every generation, the total genetic contribution from all males equals the total genetic contribution from all females. The fewer males, the greater the individual genetic contribution per male. If all the females around you are doing what's good for the group, what's good for the species, and birthing 1 male per 10 females, you can make a genetic killing by birthing all males, each of whom will have (on average) ten times as many grandchildren as their female cousins.

So while group selection ought to favor more girls, individual selection favors equal investment in male and female offspring.
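The arithmetic in the quoted passage can be checked directly. In this toy model (my own sketch, not from the original text), every mother has 4 children and only daughters carry the lineage forward; group A mothers bear 2 daughters each, group B mothers bear 3.

```python
# Verify the grandchildren/great-grandchildren claims in the quote.

def descendants(daughters_per_mother, generation):
    """Children born in the given generation, starting from one mother.

    Each daughter becomes a mother of 4 children; sons are counted as
    children but do not found new broods in this simplified model.
    """
    mothers = 1
    for _ in range(generation - 1):
        mothers *= daughters_per_mother
    return mothers * 4

# Generation 1: both groups have the same number of children (4).
assert descendants(2, 1) == descendants(3, 1) == 4

# Generation 2: group B has 50% more grandchildren (12 vs 8).
a2, b2 = descendants(2, 2), descendants(3, 2)
assert (b2 - a2) / a2 == 0.5

# Generation 3: group B has 125% more great-grandchildren (36 vs 16).
a3, b3 = descendants(2, 3), descendants(3, 3)
assert (b3 - a3) / a3 == 1.25
```

So the quoted figures check out; the rest of the passage explains why this apparent group-level advantage is nonetheless unstable under individual selection.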

What can the principal-agent literature tell us about AI risk?

Curated. This post represents a significant amount of research, looking into the question of whether an established area of literature might be informative to concerns about AI alignment. It looks at that literature, examines its relevance in light of the questions that have been discussed so far, and checks the conclusions with existing domain experts. Finally, it suggests further work that might provide useful insights to these kinds of questions.

I do have the concern that currently, the post relies a fair bit on the reader trusting the authors to have done a comprehensive search - the post mentions having done "extensive searching", but besides the mention of consulting domain experts, does not elaborate on how that search process was carried out. This is a significant consideration since a large part of the post's conclusions rely on negative results (there not being papers which examine the relevant assumptions). I would have appreciated seeing some kind of a description of the search strategy, similar in spirit to the search descriptions included in systematic reviews. This would have allowed readers to both reproduce the search steps, as well as notice any possible shortcomings that might have led to relevant literature being missed.

Nonetheless, this is an important contribution, and I'm very happy both to see this kind of work done, as well as it being written up in a clear form on LW.

How to Frame Negative Feedback as Forward-Facing Guidance
When you're thinking about how to tell Fred that he's talking too much in staff meetings, start by asking yourself what it would look like if Fred were exceptionally awesome at that instead of deficient. This helps you visualize a complete forward-to-backward axis. Then you can frame your message to Fred in terms of moving forward on the spectrum toward awesomeness.

This somewhat reminds me of the approach used in e.g. solution-focused brief therapy, which starts by getting the client to describe what a better state would look like to them, and then proceeds to figure out steps that would lead there:

In a specific situation, the counselor may ask,
"If you woke up tomorrow, and a miracle happened so that you no longer easily lost your temper, what would you see differently?" "What would the first signs be that the miracle occurred?"
The client, in this example, (a child) may respond by saying,
"I would not get upset when somebody calls me names."
The counselor wants the client to develop positive goals, or what they will do—rather than what they will not do—to better ensure success. So, the counselor may ask the client, "What will you be doing instead when someone calls you names?"


"Suppose tonight, while you slept, a miracle occurred. When you awake tomorrow, what would be some of the things you would notice that would tell you life had suddenly gotten better?"
The therapist stays with the question even if the client describes an "impossible" solution, such as a deceased person being alive, and acknowledges that wish and then asks "how would that make a difference in your life?" Then as the client describes that he/she might feel as if they have their companion back again, the therapist asks "how would that make a difference?" With that, the client may say, "I would have someone to confide in and support me." From there, the therapist would ask the client to think of others in the client's life who could begin to be a confidant in a very small manner.