riceissa

I am Issa Rice. https://issarice.com/

Takeaways from safety by default interviews

What is the plan going forward for interviews? Are you planning to interview people who are more pessimistic?

Categorization of Meta-Ethical Theories (a flowchart)

In the first categorization scheme, I'm also not exactly sure what nihilism is referring to. Do you know? Is it just referring to Error Theory (and maybe incoherentism)?

Yes, Huemer writes: "Nihilism (a.k.a. 'the error theory') holds that evaluative statements are generally false."

Usually non-cognitivism would fall within nihilism, no?

I'm not sure how the term "nihilism" is typically used in philosophical writing, but if we take nihilism to mean error theory, then it looks like non-cognitivism wouldn't fall within nihilism (just as non-cognitivism doesn't fall within error theory in your flowchart).

I actually don't think either of these diagrams place Nihilism correctly.

For the first diagram, Huemer writes "if we say 'good' purports to refer to a property, some things have that property, and the property does not depend on observers, then we have moral realism." So for Huemer, nihilism fails the middle condition and is therefore classified as anti-realist. For the second diagram, see the quote below about dualism vs. monism.

I'm not super well acquainted with the monism/dualism distinction, but in the common conception don't they both generally assume that morality is real, at least in some semi-robust sense?

Huemer writes:

Here, dualism is the idea that there are two fundamentally different kinds of facts (or properties) in the world: evaluative facts (properties) and non-evaluative facts (properties). Only the intuitionists embrace this.

Everyone else is a monist: they say there is only one fundamental kind of fact in the world, and it is the non-evaluative kind; there aren't any value facts over and above the other facts. This implies that either there are no value facts at all (eliminativism), or value facts are entirely explicable in terms of non-evaluative facts (reductionism).

How special are human brains among animal brains?

It seems like "agricultural revolution" is used to mean both the beginning of agriculture ("First Agricultural Revolution") and the 18th century agricultural revolution ("Second Agricultural Revolution").

Categorization of Meta-Ethical Theories (a flowchart)

Michael Huemer gives two taxonomies of metaethical views in section 1.4 of his book Ethical Intuitionism:

As the preceding section suggests, metaethical theories are traditionally divided first into realist and anti-realist views, and then into two forms of realism and three forms of anti-realism:

              Naturalism
             /
      Realism
     /       \
    /         Intuitionism
   /
   \
    \              Subjectivism
     \            /
      Anti-Realism -- Non-Cognitivism
                  \
                   Nihilism


This is not the most illuminating way of classifying positions. It implies that the most fundamental division in metaethics is between realists and anti-realists over the question of objectivity. The dispute between naturalism and intuitionism is then seen as relatively minor, with the naturalists being much closer to the intuitionists than they are, say, to the subjectivists. That isn't how I see things. As I see it, the most fundamental division in metaethics is between the intuitionists, on the one hand, and everyone else, on the other. I would classify the positions as follows:

          Dualism -- Intuitionism
         /
        /                      Subjectivism
       /                      /
       \          Reductionism
        \        /            \
         \      /              Naturalism
          Monism
                \               Non-Cognitivism
                 \             /
                  Eliminativism
                               \
                                Nihilism

Open & Welcome Thread - March 2020

Do you have prior positions on relationships that you don’t want to get corrupted through the dating process, or something else?

I think that's one way of putting it. I'm fine with my prior positions on relationships changing because of better introspection (aided by dating), but not fine with my prior positions changing because they are getting corrupted.

Intelligence beyond your cone of tolerance is usually a trait that people pursue because they think it’s “ethical”

I'm not sure I understand what you mean. Could you try re-stating this in different words?

Open & Welcome Thread - March 2020

A question about romantic relationships: Let's say I currently think that a girl needs to have a certain level of smartness in order for me to date her long-term/marry her. Suppose I then start dating a girl and decide that actually, being smart isn't as important as I thought because the girl makes up for it in other ways (e.g. being very pretty/pleasant/submissive). I think this kind of change of mind is legitimate in some cases (e.g. because I got better at figuring out what I value in a woman) and illegitimate in other cases (e.g. because the girl I'm dating managed to seduce me and mess up my introspection). My question is: is this distinction real, and if so, is there any way for me to tell which situation I am in (legitimate vs. illegitimate change of mind) once I've already begun dating the girl?

This problem arises because I think dating is important for introspecting about what I want: there is a point after which I can no longer learn anything new about my preferences by thinking alone. But dating is also potentially a values-corrupting process: if I date someone who doesn't meet criteria I think I might hold, I could get trapped in a relationship.

I'm also curious to hear if people think this isn't a big problem (and if so, why).

What are some exercises for building/generating intuitions about key disagreements in AI alignment?

I have only a very vague idea of what you mean. Could you give an example of how one would do this?

Name of Problem?

I think that makes sense, thanks.

Name of Problem?

Just to make sure I understand, the first few expansions of the second one are:

• f(n)
• f(n+1)
• f((n+1) + 1)
• f(((n+1) + 1) + 1)
• f((((n+1) + 1) + 1) + 1)

Is that right? If so, wouldn't the infinite expansion look like f((((...) + 1) + 1) + 1) instead of what you wrote?
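To double-check the nesting pattern, here is a small Python sketch that generates the expansions as strings. It assumes the rule is simply "replace the argument x with x + 1, parenthesizing the previous argument once it is compound"; the function name `expansions` is hypothetical. Each step wraps the previous argument on the left, so the unbounded part ends up innermost, which is why the limit would seem to need the "..." on the inside.

```python
def expansions(k):
    """Return the first k expansions of f(n) under the (assumed) rewrite f(x) -> f(x + 1)."""
    arg = "n"
    out = ["f(n)"]
    for i in range(k - 1):
        # The first step appends "+1" directly; later steps wrap the
        # previous argument in parentheses before adding " + 1".
        arg = f"{arg}+1" if i == 0 else f"({arg}) + 1"
        out.append(f"f({arg})")
    return out


for line in expansions(5):
    print(line)
```

The printed lines match the five bullets in the question.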

Coherence arguments do not imply goal-directed behavior

I read the post and parts of the paper. Here is my understanding: conditions similar to those in Theorem 2 above don't exist, because Alex's paper doesn't take an arbitrary utility function and prove instrumental convergence; instead, the idea is to set the rewards for the MDP randomly (by sampling i.i.d. from some distribution) and then show that in most cases, the agent seeks "power" (states which allow the agent to obtain high rewards in the future). So it avoids the twitching robot not by saying that it can't make use of additional resources, but by saying that the twitching robot has an atypical reward function. So even though there aren't conditions similar to those in Theorem 2, there are still conditions analogous to them (in the structure of the argument "expected utility/reward maximization + X implies catastrophe"), namely X = "the reward function is typical". Does that sound right?

Writing this comment reminded me of Oliver's comment where X = "agent wasn't specifically optimized away from goal-directedness".
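Here is a toy sketch of the "random reward" idea as I understand it; it is not the paper's actual definitions or theorems. Everything in it is my own assumption: a small deterministic MDP with hypothetical state names, state-based rewards sampled i.i.d. from Uniform[0, 1], discount γ = 0.9, and plain value iteration. From `start`, the agent can either enter an absorbing `dead_end` or go to a `hub` from which three absorbing rooms are still reachable. Counting how often the optimal first move goes to `hub` gives a rough sense in which most sampled reward functions favor keeping options open, while a reward function that favors `dead_end` plays the role of the "atypical" case.

```python
import random

# Hypothetical toy deterministic MDP: each state lists the states reachable in one step.
TRANSITIONS = {
    "start": ["dead_end", "hub"],
    "dead_end": ["dead_end"],            # absorbing: no further options
    "hub": ["room1", "room2", "room3"],  # keeps three options open
    "room1": ["room1"],
    "room2": ["room2"],
    "room3": ["room3"],
}


def optimal_values(reward, gamma=0.9, iters=100):
    """Value iteration with state-based rewards:
    V(s) = reward[s] + gamma * max over successors t of V(t)."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(iters):
        values = {
            s: reward[s] + gamma * max(values[t] for t in succ)
            for s, succ in TRANSITIONS.items()
        }
    return values


def fraction_choosing_hub(samples=2000, gamma=0.9, seed=0):
    """Sample rewards i.i.d. from Uniform[0, 1] for every state and return the
    fraction of samples in which the optimal move from "start" is to "hub"."""
    rng = random.Random(seed)
    hub_count = 0
    for _ in range(samples):
        reward = {s: rng.random() for s in TRANSITIONS}
        values = optimal_values(reward, gamma)
        if values["hub"] > values["dead_end"]:
            hub_count += 1
    return hub_count / samples


print(f"fraction preferring the hub: {fraction_choosing_hub():.3f}")
```

With these settings the fraction should come out well above one half. Reward functions that send the agent into `dead_end` still exist (for example, one that puts all the reward there); they are just less likely under this sampling distribution.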