If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
My main "claims to fame":
Bowing out for now because I strongly suspect Tsvi has been strongly downvoting all or most of my comments in this thread. Maybe I'll pick it up later, in a different venue.
You may have missed my footnote, where I addressed this?
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but rather "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly, "don't roll your own crypto" doesn't mean you should never try to invent new cryptography, but rather that you shouldn't deploy it unless there has been extensive review and a consensus that it is likely to be secure.
By "metaethics," do you mean something like "a theory of how humans should think about their values"?
I feel like I've seen that kind of usage on LW a bunch, but it's atypical. In philosophy, "metaethics" has a thinner, less ambitious meaning: it answers questions like "What even are values? Are they stance-independent, yes or no?"
By "metaethics" I mean "the nature of values/morality", which I think is how it's used in academic philosophy. Of course the nature of values/morality has a strong influence on "how humans should think about their values" so these are pretty closely connected, but definitionally I do try to use it the same way as in philosophy, to minimize confusion. This post can give you a better idea of how I typically use it. (But as you'll see below, this is actually not crucial for understanding my post.)
Anyway, I'm asking about this because I found the following paragraph hard to understand:
So in the paragraph that you quoted (and the rest of the post), I was actually talking about philosophical fields/ideas in general, not just metaethics. While my title has "metaethics" in it, the text of the post talks generically about any "philosophical questions" that are relevant for AI x-safety. If we substitute metaethics (in my or the academic sense) into my post, then you can derive that I mean something like this:
Different metaethics (ideas/theories about the nature of values/morality) have different implications for what AI designs or alignment approaches are safe, and if you design an AI assuming that one metaethical theory is true, it could be disastrous if a different metaethical theory actually turns out to be true.
For example, if moral realism is true, then aligning the AI to human values would be pointless; what you really need to do is design the AI to be able to determine and follow objective moral truths. But this approach would be disastrous if moral realism is actually false. Similarly, if moral noncognitivism is true, then humans can't be wrong about their values, which implies that "how humans should think about their values" is of no importance. If you design an AI under this assumption, that would be disastrous if humans actually can be wrong about their values and really need AIs to help them think about their values and avoid moral errors.
I think in practice a lot of alignment researchers may not even have explicit metaethical theories in mind, but are implicitly making certain metaethical assumptions in their AI design or alignment approach. For example they may largely ignore the question of how humans should think about their values or how AIs should help humans think about their values, thus essentially baking in an assumption of noncognitivism.
You're conceding that morality/values might be (to some degree) subjective, but you're cautioning people against having strong views about "metaethics," which you take to be the question of not just what morality/values even are, but also a bit more ambitiously: how to best reason about them and how to (e.g.) have AI help us think about what we'd want for ourselves and others.
If we substitute "how humans/AIs should reason about values" (which I'm not sure has a name in academic philosophy but I think does fall under metaphilosophy, which covers all philosophical reasoning) into the post, then your conclusion here falls out, so yes, it's also a valid interpretation of what I'm trying to convey.
I hope that makes everything a bit clearer?
Conditional on True Convergent Goodness being a thing, companionate love would not be one of my top candidates for being part of it, as it seems too parochial to (a subset of) humans. My current top candidate would be something like "maximization of hedonic experiences" with a lot of uncertainty around:
Other top candidates include negative or negative-leaning utilitarianism, and preference utilitarianism (although this is a distant 3rd). And a lot of credence on "something we haven't thought of yet."
A lab leader who’s concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market!
This seems to imply that lab leaders would be easier to convince if there were no investors and no markets, in other words if they had more concentrated power.
If you spread out the power of AI more, won't all those decentralized nodes of spread out AI power still have to compete with each other in markets? If market pressures are the core problem, how does decentralization solve that?
I'm concerned that your proposed solution attacks "concentration of power" when the real problem you've identified is more like market dynamics. If so, it could fail to solve the problem or make it even worse.
My own perspective is that markets are a definite problem, and concentration of power per se is more ambiguous (I'm not sure if it's good or bad). To solve AI x-safety we basically have to bypass or override markets somehow, e.g., through international agreements and government regulations/bans.
A difference is that Tsvi is still plenty motivated to talk on a meta level (about why he banned me), as evidenced by this post. So he could have easily said "I no longer want to talk about the object level. I think you're doing a bad thing, [explanation ...], please change your behavior if you agree, or let me know why you don't (on the meta level)." Or "I'm writing up an explanation of what you're doing wrong in this thread. Let's pause this discussion until I finish it."
Or if he actually doesn't want to talk at all, he could have said "I'm getting really annoyed, so I'm disengaging," or "I think you're doing a bad thing here; here's a short explanation, but I don't want to discuss it further. Please stop it or I'll ban you."
Note that I'm not endorsing banning or threat of banning in an absolute sense, just suggesting that all of these are more "pro-social" than banning someone out of the blue with no warning. None of these involve asking him to "suck it up and keep talking to me" or otherwise impose a large cost on him.
Need: A way to load all comments and posts of a user. Right now it only loads the top N by karma.
Want: A "download" button, for some users who have up to hundreds of MB of content, too unwieldy to copy/paste. Ability to collate/sort in various ways, especially as flat list of mixed posts and comments, sorted by posting date from oldest to newest.
I explained what probably caused this here. I think the current "Popular Comments" feature might often cause this kind of decontextualized voting, and there should perhaps be a way to mitigate it, e.g., letting the author of the post or of the comment remove the comment from Popular Comments.