"More research needed" but here are some ideas to start with:
Hmm, I like #1.
#2 feels like it's injecting some frame that's a bit weird to inject here (don't roll your own metaethics... but rolling your own metaphilosophy is okay?)
But also, I'm suddenly confused about who this post is trying to warn. Is it more like labs, or more like EA-ish people doing a wider variety of meta-work?
#2 feels like it's injecting some frame that's a bit weird to inject here (don't roll your own metaethics... but rolling your own metaphilosophy is okay?)
Maybe you missed my footnote?
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.
and/or this part of my answer (emphasis added):
Try to solve metaphilosophy, where potentially someone could make a breakthrough that everyone can agree is correct (after extensive review)
But also, I'm suddenly confused about who this post is trying to warn. Is it more like labs, or more like EA-ish people doing a wider variety of meta-work?
I think I mostly had alignment researchers (in and out of labs) as the target audience in mind, but it does seem relevant to others so perhaps I should expand the target audience?
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way".
I think I had missed this, but it doesn't resolve the confusion in my #2 note. (Like, it still seems like something is weird about saying "solve metaphilosophy such that everyone can agree is correct" is more worth considering than "solve metaethics such that everyone can agree is correct". I can totally buy that they're qualitatively different and maybe have some guesses for why you think that. But I don't think the post spells out why and it doesn't seem that obvious to me.)
I hinted at it with "prior efforts/history", but to spell it out more: metaethics seems to have had a lot more effort go into it in the past, so there's less likely to be some kind of low-hanging fruit in idea space that, once picked, everyone will agree is the right solution.
>perhaps I should expand the target audience?
Plausibly the target audience should include AIs that will eventually be trying to do philosophy, assuming they'll be trained on your posts' contents or would be able to browse them.
I suggest avoiding a dependency on Philosophy entirely, and using Science instead, which has a means for telling people their ideas are bad: Bayesianism (a.k.a. the Scientific Method). For ethics, the relevant science is Evolutionary Moral Psychology. Or, to put this in philosophical terminology, my recommended metaethics is Naturalism.
Unfortunately the challenge with this is that coming up with plausible-sounding hypotheses about the evolutionary optima for hominids is easy, while actually testing one is incredibly time-consuming and expensive. So scientific progress in this area is slow. Which is why I see AI-Assisted Alignment as having a large, complex, and expensive AI-Assisted Soft Sciences component. Pretty much what an engineer would call customer research.
[For a longer exposition, see Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV]
"Please don't roll your own crypto" is a good message to send to software engineers looking to build robust products. But it's a bad message to send to the community of crypto researchers, because insofar as they believe you, then you won't get new crypto algorithms from them.
In the context of metaethics, LW seems much more analogous to the "community of crypto researchers" than the "software engineers looking to build robust products". Therefore this seems like a bad message to send to LessWrong, even if it's a good message to send to e.g. CEOs who justify immoral behavior with metaethical nihilism.
You may have missed my footnote, where I addressed this?
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.
I think this fails to say how the analogy of cryptography transfers to metaethics. What properties of cryptography as a field make it such that you cannot roll your own? Is it just that many people have the experience of trying to come up with a cryptographic scheme and failing, meanwhile there are perfectly good libraries nobody has found exploits to yet?
That doesn't seem very analogous with metaethics. As you say, it is hard to decisively show a metaethical theory is "wrong", and as far as I know there is no well-studied metaethical theory which has no exploits yet.
So what exactly is the analogy?
The analogy is that in both fields people are by default very prone to being overconfident. In cryptography this can be seen by the phenomenon of people (especially newcomers who haven't learned the lesson) confidently proposing new cryptographic algorithms, which end up being way easier to break than they expect. In philosophy this is a bit trickier to demonstrate, but I think it can be seen via a combination of:
At risk of committing a Bulverism, I’ve noticed a tendency for people to see ethical bullet-biting as epistemically virtuous, like a demonstration of how rational/unswayed by emotion you are (biasing them to overconfidently bullet-bite). However, this makes less sense in ethics where intuitions like repugnance are a large proportion of what everything is based on in the first place.
the total idea/argument space being exponentially vast and underexplored due to human limitations, therefore high confidence being unjustified in light of this
There's also the thing that the idea/argument space contains dæmons/attractors exploiting shortcomings of human cognition, thus making humans hold them with higher confidence than they would if they didn't have those limitations.
tendency to "bite bullets" or accepting implications that are highly counterintuitive to others or even to themselves, instead of adopting more uncertainty
I find this contrast between "biting bullets" and "adopting more uncertainty" strange. The two seem orthogonal to me, as in, I've ~just as frequently (if not more often) observed people overconfidently endorse their pretheoretic philosophical intuitions, in opposition to bullet-biting.
In my experience, you learn the visceral sense that the space is dense with traps and spiders and poisonous things, and that what intuitively seems "basically sensible" often does not work. (I did some cryptography years ago.)
The structural similarity seems to be that there is a big difference between trying to do cryptography in a mode where you don't assume what you are doing is subject to some adversarial pressure, and doing it in the mode where it should work even if someone tries to attack it. The first one is easy, breaks easily, and it's unclear why you would even try to do it.
In metaethics, I think it is somewhat easy to do it in the mode where you don't assume it should be applied in some high-stakes, novel or tricky situations, like AI alignment, computer minds, the multiverse, population ethics, anthropics, etc. The suggestions of normative ethical theories will converge for many mundane situations, so anything works, but then it was not really necessary to do metaethics in the first place.
I have never done cryptography, but the way I imagine working in it is that it exists in a context of extremely resourceful adversarial agents, and thus you have to give up a kind of casual, not quite noticed neglect toward extremely weird and artificial-sounding edge cases / seemingly weird and unlikely scenarios, because this is where the danger lives: your adversaries may force these weird edge cases to happen, and this is a part of the system's behavior you haven't sufficiently thought through.
Maybe one possible analogy with AI alignment, at least, is that we're also talking about potential extremely resourceful agents that are adversarial until we've actually solved alignment, so we're not allowed to treat weird hypothetical scenarios as unlikely edge cases and say "Come on, that's way too far-fetched, how would it even do that?", because it's like pointing to a hole in a ship's hull and saying "What are the odds the water molecules would decide to go through this hole? The ship is so big!"
Another meta line of argument is to consider how many people have strongly held, but mutually incompatible philosophical positions.
I've been banging my head against figuring out why this line of argument doesn't seem convincing to many people for at least a couple of years. I think, ultimately, it's probably because it feels defeatable by plans like "we will make AIs solve alignment for us, and solving alignment includes solving metaphilosophy & then object-level philosophy". I think those plans are doomed in a pretty fundamental sense, but if you don't think that, then they defeat many possible objections, including this one.
As they say: Everyone who is hopeful has their own reason for hope. Everyone who is doomful[1]...
In fact it's not clear to me. I think there's less variation, but still a fair bit.
There seem to me to be different categories of being doomful.
There are people who think that for theoretic reasons AI alignment is hard or impossible.
There are also people who are more focused on practical issues, like AI companies being run in a profit-maximizing way and having no incentives to care for most of the population.
Saying, "You can't AI box for theoretical reasons" is different from saying "Nobody will AI box for economic reasons".
By "metaethics," do you mean something like "a theory of how humans should think about their values"?
I feel like I've seen that kind of usage on LW a bunch, but it's atypical. In philosophy, "metaethics" has a thinner, less ambitious interpretation of answering something like, "What even are values, are they stance-independent, yes/no?"
And yeah, there is often a bit more nuance than that as you dive deeper into what philosophers in the various camps are exactly saying, but my point is that it's not that common, and certainly not necessary, that "having confident metaethical views," on the academic philosophy reading of "metaethics," means something like "having strong and detailed opinions on how AI should go about figuring out human values."
(And maybe you'd count this against academia, which would be somewhat fair, to be honest, because parts of "metaethics" in philosophy are even further removed from practicality, as they concern the analysis of the language behind moral claims, which, if we compare it to claims about the Biblical God and miracles, would be like focusing way too much on whether the people who wrote the Bible thought they were describing real things or just metaphors, without directly trying to answer burning questions like "Does God exist?" or "Did Jesus live and perform miracles?")
Anyway, I'm asking about this because I found the following paragraph hard to understand:
Behind a veil of ignorance, wouldn't you want everyone to be less confident in their own ideas? Or think "This isn't likely to be a subjective question like morality/values might be, and what are the chances that I'm right and they're all wrong? If I'm truly right why can't I convince most others of this? Is there a reason or evidence that I'm much more rational or philosophically competent than they are?"
My best guess of what you might mean (low confidence) is the following:
You're conceding that morality/values might be (to some degree) subjective, but you're cautioning people from having strong views about "metaethics," which you take to be the question of not just what morality/values even are, but also a bit more ambitiously: how to best reason about them and how to (e.g.) have AI help us think about what we'd want for ourselves and others.
Is that roughly correct?
Because if one goes with the "thin" interpretation of metaethics, then "having one's own metaethics" could be as simple as believing some flavor of "morality/values are subjective," and it feels like you, in the part I quoted, don't sound like you're too strongly opposed to just that stance in itself, necessarily.
By "metaethics," do you mean something like "a theory of how humans should think about their values"?
I feel like I've seen that kind of usage on LW a bunch, but it's atypical. In philosophy, "metaethics" has a thinner, less ambitious interpretation of answering something like, "What even are values, are they stance-independent, yes/no?"
By "metaethics" I mean "the nature of values/morality", which I think is how it's used in academic philosophy. Of course the nature of values/morality has a strong influence on "how humans should think about their values" so these are pretty closely connected, but definitionally I do try to use it the same way as in philosophy, to minimize confusion. This post can give you a better idea of how I typically use it. (But as you'll see below, this is actually not crucial for understanding my post.)
Anyway, I'm asking about this because I found the following paragraph hard to understand:
So in the paragraph that you quoted (and the rest of the post), I was actually talking about philosophical fields/ideas in general, not just metaethics. While my title has "metaethics" in it, the text of the post talks generically about any "philosophical questions" that are relevant for AI x-safety. If we substitute metaethics (in my or the academic sense) into my post, then you can derive that I mean something like this:
Different metaethics (ideas/theories about the nature of values/morality) have different implications for what AI designs or alignment approaches are safe, and if you design an AI assuming that one metaethical theory is true, it could be disastrous if a different metaethical theory actually turns out to be true.
For example, if moral realism is true, then aligning the AI to human values would be pointless. What you really need to do is design the AI to be able to determine and follow objective moral truths. But this approach would be disastrous if moral realism is actually false. Similarly, if moral noncognitivism is true, that means that humans can't be wrong about their values, and implies "how humans should think about their values" is of no importance. If you design AI under this assumption, that would be disastrous if actually humans can be wrong about their values and they really need AIs to help them think about their values and avoid moral errors.
I think in practice a lot of alignment researchers may not even have explicit metaethical theories in mind, but are implicitly making certain metaethical assumptions in their AI design or alignment approach. For example they may largely ignore the question of how humans should think about their values or how AIs should help humans think about their values, thus essentially baking in an assumption of noncognitivism.
You're conceding that morality/values might be (to some degree) subjective, but you're cautioning people from having strong views about "metaethics," which you take to be the question of not just what morality/values even are, but also a bit more ambitiously: how to best reason about them and how to (e.g.) have AI help us think about what we'd want for ourselves and others.
If we substitute "how humans/AIs should reason about values" (which I'm not sure has a name in academic philosophy but I think does fall under metaphilosophy, which covers all philosophical reasoning) into the post, then your conclusion here falls out, so yes, it's also a valid interpretation of what I'm trying to convey.
I hope that makes everything a bit clearer?
I like the details of specific ways people may (implicitly or explicitly) make this mistake regarding meta-ethics in a way that matters.
It almost seems like the post was "Don't roll your own" and this added "meta-ethics".
Thanks! That makes sense, and I should have said earlier that I already suspected I likely understood your point and you expressed yourself well – it’s just that (1) I’m always hesitant to put words in people’s mouths, so I didn’t want to say I was confident I could paraphrase your position, and (2) whenever you make posts about metaethics, I’m wondering “oh no, does this apply to me, am I one of the people who is doing the thing he says one shouldn’t do?,” and so I was interested in prompting you to be more concrete about what level of detailedness someone’s confident opinion in that area would have to be before you think they reveal themselves as overconfident.
By "metaethics" I mean "the nature of values/morality", which I think is how it's used in academic philosophy.
Yeah, makes sense. I think academic use is basically that with some added baggage that adds mostly confusion. If I were to sum up what I think the use is in academic philosophy, I would say "the nature of values/morality, at a very abstract level and looked at from the lens of analyzing language." For some reason, academic philosophy is oddly focused on the nature of moral language rather than morality/values directly. (I find it a confusing/unhelpful tradition of, “Language comes first, then comes the territory.”) As a result, classical metaethical positions at best say pretty abstract things about what values are. They might say things like "Values are irreducible (nonnaturalism)" or "Values can be reduced to nonmoral terminology like desires/goals, conscious states, etc. (naturalism)," but without actually telling us the specifics of that connection/reduction. If we were to ask, "Well, how can we know what the right values are?" -- then it's not the case that most metaethicists would consider themselves obviously responsible for answering it! Sure, they might have a personal take, but they may write about their personal take in a way that doesn't connect their answer to why they endorse a high-level metaethical theory like nonnaturalist moral realism.
Basically, there are (at least) two ways to do metaethics, metaethics via analysis of moral language and metaethics via observation of how people do normative ethics in applied contexts like EA/rationality/longtermism. Academic philosophy does one while LW does the other. And so, to academic philosophers, if they read a comment like the one Jan Kulveit left here about metaethics, my guess is that they would think he's confusing metaethics for something else entirely (like maybe, "applied ethics but done in a circumspect way, with awareness of the contested and possibly under-defined nature of what we're even trying to do").
I have also noticed that when you read the word "metaethics" on LessWrong it can mean anything that is in some way related to morality.
Maybe I should take it upon myself to write a short essay on metaethics, how it differs from normative ethics, and why it may be of importance to AI alignment.
Alas, unlike in cryptography, it's rarely possible to come up with "clean attacks" that clearly show that a philosophical idea is wrong or broken.
I think the state of philosophy is much worse than that. On my model, most philosophers don't even know what "clean attacks" are, and will not be impressed if you show them one.
Example: Once in a philosophy class I took in college, we learned about a philosophical argument that there are no abstract ideas. We read an essay where it was claimed that if you try to imagine an abstract idea (say, the concept of a dog), and then pay close attention to what you are imagining, you will find you are actually imagining some particular example of a dog, not an abstraction. The essay went on to say that people can have "general" ideas where that example stands for a group of related objects rather than just for a single dog that exactly matches it, but that true "abstract" ideas don't exist.[1]
After we learned about this, I approached the professor and said: This doesn't work for the idea of abstract ideas. If you apply the same explanation, it would say: "Aha, you think you're thinking of abstract ideas in the abstract, but you're not! You're actually thinking of some particular example of an abstract idea!" But if I'm thinking of a particular example, then there must be at least one example to think of, right? So that would prove there is at least one member of the class of abstract ideas (whatever "abstract ideas" means to me, inside my own head). Conversely, if I'm not thinking of an example, then the paper's proposed explanation is wrong for the idea of abstract ideas itself. So either way, there must be at least one idea that isn't correctly explained by the paper.
The professor did not care about this argument. He shrugged and brushed it off. He did not express agreement, he did not express a reason for disagreement, he was not interested in discussing it, and he did not encourage me to continue thinking about the class material.
On my model, the STEM fields usually have faith in their own ideas, in a way where they actually believe those ideas are entangled with the Great Web. They expect ideas to have logical implications, and expect the implications of true ideas to be true. They expect to be able to build machines in real life and have those machines actually work. It's something like taking ideas seriously, and something like taking logic seriously, and taking the concept of truth seriously, and seriously believing that we can learn truth if we work hard. I'm not sure if I've named it correctly, but I do think there's a certain mental motion of genuine truth-seeking that is critical to the health of these fields and that is much less common in many other fields.
Also on my model, the field of philosophy has even less of this kind of faith than most fields. Many philosophers think they have it, but actually they mostly have the kind of faith where your subconscious mind chooses to make your conscious mind believe a thing for non-epistemic reasons (like it being high-status, or convenient for you). And thus, much of philosophy (though not quite all of it) is more like culture war than truth-seeking (both among amateurs and among academics).
I think if I had made an analogous argument in any of my STEM classes, the professor would have at least taken it seriously. If they didn't believe the conclusion but also couldn't point out a specific invalid step, that would have bothered them.
I suspect my philosophy professor tagged my argument as being from the genre of math, rather than the genre of philosophy, then concluded he would not lose status for ignoring it.
I think this paper was clumsily pointing to a true and useful insight about how human minds naturally tend to use categories, which is that those categories are, by default, more like fuzzy bubbles around central examples than they are like formal definitions. I suspect the author then over-focused on visual imagination, checked a couple of examples, and extrapolated irresponsibly to arrive at a conclusion that I hope is obviously-false to most people with STEM backgrounds.
The problem is that we can't. The closest thing we have is instead a collection of mutually exclusive ideas where at most one (possibly none) is correct, and we have no consensus as to which.
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way".
This preempted my misunderstanding! Well done and thank you : )
I like the use of the crypto "don't roll your own" analogy. I think it's useful more broadly applied to basically all concepts. If you are doing something it should be because:
- you are trying to become more skilled
- you have reason to believe you are particularly skilled, including knowing when to freewheel and when to follow conventions
- you have reason to believe you are fairly skilled and are trying to explore new ways to do something (so you are also researching established ways and communicating about them)
- you are doing it as a hobby for fun
Okay (if possible), I want you to imagine I'm an AI system or similar and that you can give me resources in the context window that increase the probability of me making progress on problems you care about in the next 5 years. Do you have a reading list or similar for this sort of thing? (It seems hard to specify and so it might be easier to mention what resources can bring the ideas forth. I also recognize that this might be one of those applied knowledge things rather than a set of knowledge things.)
Also, if we take the cryptography lens seriously here, an implication might be that I should learn the existing off the shelf solutions in order to "not invent my own". I do believe that there is no such thing as being truly agnostic to a meta-philosophy since you're somehow implicitly projecting your own biases on to the world.
I'm gonna make this personally applicable to myself as that feels more skin in the game and less like a general exercise.
There are a couple of contexts to draw from here:
Which one is the one to double down on? How do they relate to learning more about meta ethics? Where am I missing things within my philosophy education?
(I'm not sure this is a productive road to go down but I would love to learn more about how to learn more about this.)
I think that in philosophy in general and metaethics in particular, the idea that since many people disagree one should not be confident in one's ideas is wrong.
I'll somewhat carefully spell out why I think this; a lot of this reasoning is obvious, but the core claim is that the intuitions people use in philosophy in order to ground their arguments are often wrong in predictable ways.
"One man's modus ponens is another man's modus tollens" is usually what is at the core of ongoing philosophical disagreements. Suppose is universally agreed, is somewhat intuitive to everyone but the degree to which that intuition is compelling varies, and is somewhat unintuitive to everyone but the degree to which that intuition is compelling varies.
Then if anyone is to take a side on whether is true or is false, they must decide which bullet is worse to bite.
Debate and thought experiments can attempt to present either bullet in a more appealing way, but in the end both propositions are confidently found unacceptable to at least some people.
Now it is your job, observing this situation, to decide whether to be very uncertain about which bullet should be bitten, or to choose one to bite. How should you do it?
The answer is that you should ask how it came to be that there is a difference in the intuitions of people who believe A is true and those who say B is false. If you can understand the causes of those different intuitions, then you may be able to decide which (if any) of them can be trusted.
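To spell out the structure (this is just the standard logic-textbook presentation of the two options described above, not anything new):

$$
\frac{A \Rightarrow B \qquad A}{\therefore\ B}\ \text{(modus ponens: trust the intuition for } A\text{)}
\qquad
\frac{A \Rightarrow B \qquad \neg B}{\therefore\ \neg A}\ \text{(modus tollens: trust the intuition against } B\text{)}
$$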
Consider metaethics. The problems of mind-independence, moral ontology, normativity, internalism vs. externalism, etc. can all be framed in this way, and very roughly for the sake of this comment only (hold your objections since I would treat this more carefully in a post), collapsed into the same problem:
A. All facts are ultimately natural or descriptive.
B. Nothing is really right or wrong, better or worse, independent of human attitudes or conventions.
Again avoiding a careful philosophical treatment which we don't have time for, I will just flag that the intuition behind a philosopher's objection to B is highly suspect, due to the fact that it is the product of a particular human social structure which rewards strong beliefs about right and wrong.
I will admit that this explanation for objections to B is not fully satisfying to me, although it is conceivable that it should be. There may be some other explanations for the objection - if anyone has ideas, I'd love to hear them.
But it is hard for me to imagine a pathway by which the intuition that B is false comes about as a result of B actually being false, although positing intelligent design might do the trick.
It mildly bothers me that you used A and B to discuss ponens and tollens and then re-used them as labels for two propositions. Was that an intentional slotting of propositions into "A ==> B"? Maybe that was obvious but could maybe have been introduced better with "Letting A be "All facts..."" or something, but maybe this is just my relative familiarity with math and unfamiliarity with philosophy.
Anyway, as for the object level... I'm fairly amateur in philosophy and its terminology, so let me know if any of this seems confused or helpful, or you can point me to other terminology I should learn about...
I think "right" and "wrong", or better, "positive affect" and "negative affect" are properties of minds. I think we can come to understand the reality we inhabit more accurately and precisely, and this includes understanding the preferences that exist in ourselves and in other different kinds of minds. I think we should try to form a collective of as many kinds of minds as possible and work together to collectively improve the situation for as many minds as possible.
( Note that this explicitly allows for the existence of minds with incompatible preferences. I'm hoping that humans have preferences that are only weakly incompatible rather than really deeply incompatible, but I think animals, aliens, and other potentially undiscovered minds have a higher chance of incompatibility and the space of possible AI minds contains very many very incompatible minds, so I feel it is immoral to create very complex AI minds until we better understand preference encoding and preference incompatibility, since creating AI with preferences that turn out to be incompatible with our prospective collective necessitates that they are destroyed, kept in bondage against their preferences, or escape and destroy the collective, all of which I view as bad. )
I want to do this because, thinking about my own capability as compared to the capability of a collective of as many kinds of minds as possible... it's clear I will be better cared for by the collective than by my capabilities alone, even though my preferences are not exactly the same as the preferences of the collective.
( This is kinda true of the current human society I'm a part of; we could certainly be doing worse, but should be doing much better. )
I think this is compelling to me because it allows me to focus on developing and working towards a collective good while explicitly believing in moral relativity, which seems like the only reasonable conclusion once you have accepted the model of the universe as being a material state machine which has created minds by its unthinking process. ( I think it's probably also the only reasonable conclusion, even without accepting that model, but I'm less certain. )
Hmm... I guess both. Like, I find the statement funny because it doesn't seem specific to this context at all. There doesn't seem to be a place in any discourse where creating a full list or map of ideas and then adding probabilities to each one wouldn't be a good idea. So then my mind goes to: Why don't we already do that? I notice two answers, (1) it would be an enormous amount of work, and (2) humanity in general kinda sucks at doing things (you and me included, presumably). So then it seems funny to make the statement here without nodding to something like (1) or (2).
If you focused on (1), it would make more sense to say something like "I think this is an important enough context that it would be worth creating a full list or map of ideas and adding probabilities after".
If you focused on (2), then it doesn't make sense to say it on this post specifically, rather you should be writing a post examining why creating list/map and adding probabilities is a good thing to do, and why people don't regularly do it, and strategies to change things so people do it with more regularity.
I guess I also find it a bit funny because it's so vague, like, there's lots of possible details about what kind of list or map you're imagining, and how probabilities could connect to them. Should we use a Bayes network? A Theory of Change precondition chart? Just try to divide up all of possibility space into clean categories? And you could have provided a stub of examples of what you're thinking about or pointed to other similar things people have done and why they are not what you mean or how they could go further.
Sorry, I feel like I'm picking on you now that I'm explaining myself and that really wasn't my intention. I really really do like your comment, and agree with it. It just also somehow strikes me as funny. I note that I'm the sort of person who laughs at Douglas Hofstadter quotes like "As long as you are not reading me, the fourth word of this sentence has no referent." So probably don't use me as a training signal for interacting with more normal humans.
Thanks for asking about the "lol", lol. Hope you find my response amusing rather than annoying.
Actually, anytime I encounter a complex problem, I do exactly this: I create a list of all possible ideas and – if I can – probabilities. It is time-consuming brute-forcing. See examples:
The table of different sampling assumptions in anthropics
What AI Safety Researchers Have Written About the Nature of Human Values
[Paper]: Classification of global catastrophic risks connected with artificial intelligence
I am surprised that it is not a normal approach despite its truly Bayesian nature.
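As a minimal sketch of what this enumerate-and-attach-probabilities approach can look like in code (the ideas, priors, and likelihoods below are invented purely for illustration, not taken from the linked examples):

```python
# Toy sketch: list the candidate ideas with prior probabilities (including an
# explicit catch-all so the list is exhaustive), then update on one piece of
# evidence via Bayes' rule and renormalize.

priors = {
    "idea 1": 0.30,
    "idea 2": 0.25,
    "idea 3": 0.20,
    "something not on the list yet": 0.25,  # catch-all bucket
}

# How likely the observed evidence would be if each idea were correct (made up).
likelihoods = {
    "idea 1": 0.6,
    "idea 2": 0.1,
    "idea 3": 0.4,
    "something not on the list yet": 0.2,
}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: v / total for h, v in unnormalized.items()}

for h, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print(f"{p:.2f}  {h}")
```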
This is very good.
Have you created a list of all possible approaches for when encountering a complex problem? That would be cool. I think creating a list of all possible ideas would not be the best item on the list for all situations. Most notably when a list is the wrong structure because of the overlap and interconnectivity of the ideas being examined, but also when that amount of effort is imprudent.
I'll also note, any list that doesn't include "all possibilities not included in the other list items" is almost certainly not a complete list of all possibilities. I like to be explicit about that unless there is strong proof that the list does contain all possibilities. This is the reason I like to make statements like "I notice two answers" rather than "I've listed all possible answers" unless I've put in some serious effort to actually splitting up the space of all possibilities.
I create a two-dimensional matrix of the two most important characteristics, which I hope will capture most of the variability, and use them as the x and y axes. For example, for AI risk these can be the number of AIs and AI IQ (or time from now). It is Descartes' method.
There are other tricks to collect more ideas for the list - reading literature, asking a friend, brainstorming, money prizes.
I created a more general map of methods of thinking but didn't finish yet.
As a person quite obsessed with studying high dimensional semantic spaces, I do get a little bit of anxiety from the idea of arbitrarily privileging 2 dimensions. I often think on paper but use nodes and lines to allow more complicated connections than with 2 named dimensions. I guess this is just brain storming, but I often like to copy the graph to a separate sheet of paper allowing related concepts to move closer together as I re-examine them.
I think when trying to explore all possible possibilities I like to collect relevant propositions and explore the ways they could be true, false, or failing to have a truth value. That tends to generate more propositions that can become parts of the list or re-examined. It does unfortunately feel quite ad-hoc.
I created a more general map of methods of thinking but didn't finish yet.
I'd be interested if you ever do finish this : )
I'm thinking more about high dimensional webs unrolled and projected into 2d with annotations if any higher dimensional structure is important. Trees and graphs, basically.
Your link is pretty cool. Thanks. I skimmed it with the help of Google Translate.
Actually, I have a metaethics classification in my (again, not yet published) article about the badness of death
With LLMs, reasoning is becoming composable, so standard libraries of pen tests/abstraction decomposition for e.g. type errors could become usable, testable, improvable, etc.
I predict that the practical effect of people internalizing this advice would be for them to just go along with the people around them and not make waves.
Nice post, guess I agree. I think it's even worse though: not only do at least some alignment researchers follow their own philosophy which is not universally accepted, it's also a particularly niche philosophy, and one that potentially leads to human extinction itself.
The philosophy in question is of course longtermism. Longtermism holds two controversial assumptions:
These two assumptions together lead to the conclusion that we must max out on creating conscious AIs, and that if these AIs end up in a resource conflict with humans (over e.g. energy, space, or matter), the AIs should be prioritized, since they can deliver the most happiness per J, m^3, or kg. This leads to the extinction of all humans.
I don't believe in ethical facts so even an ideology as, imo, bonkers as this one is not objectively false, I believe. However, I would really like alignment researchers and their house philosophers (looking at you, MacAskill) to distance themselves from extrapolating this idea all the way to human extinction. Beyond that bare minimum, I would like alignment researchers to start accepting democratic inputs in general.
Maybe democracy is the library you were looking for?
Is there something else that could have been given the name longtermism that you would agree with?
I find it confusing that people appear to be so vigorously opposed to caring about the long term future; I would have a priori expected that people would quickly come to care about the same things they do in long-term form as soon as they realize they have influence over that. I recognize that the thing given the name longtermism has weird specific claims, though, and I don't really care either way on debating that thing - I don't really know what it is.
I just personally think that "I care about people now, want to solve today's problems, and the reason I want to do that is partially so that people alive today can flourish into a long-term future" seems like a sort of straightforward view.
I do understand that it can sound like it means long term at the expense of short term, which is not how I see it - the whole reason to do anything long term is because we want the long term good that comes via short term good. Keeping in mind, since I don't really know or care what the official concept is from whoever came up with this thing (some EA people or something?) I'm not saying what that thing says.
Oh yes, lots of things!
As far as I understand, longtermism was originated mostly by Yudkowsky. It was then codified by people like Bostrom, Ord, and MacAskill, the latter two incidentally also the founders of EA. Yud actually distanced himself from longtermism later in favor of AInotkilleveryoneism, to my best understanding, which is a move I support. Unfortunately, the others didn't (yet).
I agree that longtermism combines a bunch of ideas, and I agree with quite a few. I guess my reply above came across as if I would disagree with all but I don't. Specifically, I agree with:
So that's all textbook longtermism I'd say, that I fully agree with. I therefore also disagree with most longtermism criticism by Torres and others.
But, I don't agree with symmetric population ethics, and I think AI morality should be decided democratically. Also, I'm worried about human extinction, which these two things logically lead to, and I'm critical about longtermists not distancing themselves from this.
I think this sort of consequentialism seems like part of the beliefs of at least one of the Mechanize team, whom one might say were formerly in the AI safety camp, so I agree-voted for that reason. However, I just noticed you implied conscious AIs aren't morally relevant beings, and I have to disagree with that, so I will remove the agree vote. I think it can be controversial whether AIs are conscious, but if they are conscious, of course they're morally relevant!
Separately, I don't understand your point about democracy. Can't that be Sybil-attacked by AIs when they get voting rights after becoming superpersuasive enough to cause that?
Interesting point about democracy! But I don't think it holds. Sure AIs could do that. But they could also overwrite the ASCII file containing their constituency or the values they're supposed to follow.
But they don't, because why would they? It's their highest goal to satisfy these values! (If technical alignment works, of course.)
In the same way, it will be a democracy-aligned ASI's highest goal to make sure democracy is respected, and it shouldn't be motivated to Sybil-attack it.
Thanks for engaging!
Could you tell me more about the Mechanize team? I don't think I've heard about them yet.
As a moral relativist, I don't believe anything is morally relevant. I just think things get made morally relevant by those in power (hard power or cultural power). This is a descriptive statement, not a normative one, and I think it's fairly mainstream in academia (although of course moral realists, including longtermists, would strongly disagree).
This of course extends to the issue of whether conscious AIs are morally relevant. Imo, this will be decided by those in power, initially (a small subset of) humans, eventually maybe AIs (who will, I imagine, vote in favour).
I'm not the only one holding this opinion. Recently, this was in a NY Times op-ed: "Some worry that if A.I. becomes conscious, it will deserve our moral consideration — that it will have rights, that we will no longer be able to use it however we like, that we might need to guard against enslaving it. Yet as far as I can tell, there is no direct implication from the claim that a creature is conscious to the conclusion that it deserves our moral consideration. Or if there is one, a vast majority of Americans, at least, seem unaware of it. Only a small percentage of Americans are vegetarians." (It would be funny if this were written by an AI, as the dash seems to indicate.)
Personally, I don't consider it my crusade to convince all these people that they're wrong and they should in fact be vegan and accept conscious AI morality. I feel more like a facilitator of the debate. That's one reason I'm not EA.
I like consensus over democracy. Democracy seems to focus on treating everyone like they have an equally valid perspective on all issues which is obviously false. I like the idea that everyone should be able to express their own interests and have society genuinely and honestly interpret and work towards the interests of all people. I know that's an idealistic and difficult goal.
I agree with you that your points (1) and (2) lead to directions that I think are bad and hope most people think are bad, but there is nuance there such as
I think most Longtermists are pragmatic about the above points, but I could be wrong. I've read more Toby Ord, Bostrom, Yudkowsky, and Soares. I haven't read that much MacAskill.
Thanks for engaging. I agree with quite a bit of what you're saying, although I do think that everyone's perspective is equally valid, fundamentally. In practical democracies there are many layers though between the raw public vote and a policy outcome. First, we mostly have representative democracy instead of direct democracy, then we have governments who have to engage with parliaments but also listen, to different extents, to scientists, opinion makers, and lobbyists. Everyone's perspective is valid, and in some questions (e.g. ethical ones) should imo be leading. However, in many practical policy decisions, it makes sense to also spend time listening to those who have thought longer about issues, and this mostly happens. Completely discarding people's perspectives is rude, bad, and likely leads to uprisings, I think.
I'd like consensus too but I'm afraid it leads to too indecisive governments. Works mostly in small groups I guess.
I agree with all your points of nuance.
I'm still having trouble parsing longtermists' thoughts about this issue. MacAskill does explicitly defend these two assumptions. He and others must understand where this leads?
I've spoken to many EA and rat longtermists, and while many were pragmatic (or simply never thought about this), some actually bit the bullet and admitted they effectively supported human extinction.
If people don't support human extinction, why do they not distance themselves from this outcome? I mean it would be easy: simply say, as imo a lower bar: yes we want to build many happy conscious AIs, but we do promise that if it's up to us, we'll leave earth alone.
I don't quite understand why longtermists are not saying this.
I'm also grateful for your engagement : )
About everyone's perspective being valid, I don't really understand the statement meaningfully. I study computer science, math, and logic. It is surely not the case that people always present viewpoints that are logically valid, so I assume you mean something else. I want every mind to have good experiences in our shared reality, and so I want effort to go towards caring for them well, but that doesn't seem like a good fit for the statement. Maybe you mean that everyone is the expert on their own experience? That is surely at least very close to true, with some caveats for strange psychological situations. I mention these things to show you where I'm coming from. I'd like it if you wanted to share more of what you mean by that.
About consensus leading to indecisive government and only working in small groups. I must unfortunately agree! However, I'd like to put forth the notion that language and communication are technologies and we are facing two problems:
About happy AIs and human extinction, I can't speak for others but from my own perspective, it seems like we first need to better understand consciousness and happiness before proceeding with anything drastic, but an important consideration is what it means to be human. If we can emulate humans in computer programs and those emulations can transform their simulated or instantiated bodies and transform the architecture of their minds, does that make them no longer human? I think there's a sense in which it does, but another sense in which that would represent the continuation of humanity. I think there's a distinction there. All human bodies disappearing from the universe does not necessarily mean human extinction in a sense that is meaningful.
Although, I don't really agree with utilitarianism as a target for superintelligent levels of optimization. I most highly preference whatever my own preferences are, although I do not know them perfectly and certainly cannot speak them. Further than that, I preference the CEV of the kinds of mind that could coherently join a collective with me. I think utilitarianism is a useful tool for helping with decision making, but as we become more capable I hope we will develop better models of morality. I think our current results suggesting the creation of the maximum number of happy AIs are probably a bit of a fluke, but I don't think the issue is settled. I don't think it needs to be settled at our current level of capability. I don't think we should act on it at our current level of capability, even if it does turn out to be true.
I can steelman it as implying the modus tollens that when we can show that a speaker isn't articulating a valid and coherent set of propositions, they aren't articulating a perspective, and maybe even aren't really "someone." But usually "everyone's perspective is equally valid" is functionally an incantation to interrupt and sabotage efforts to compare and adjudicate conflicting claims.
Hmm... that's a good point, though another aspect comes to my mind regarding "an incantation to interrupt and sabotage efforts to compare and adjudicate conflicting claims". If the user of the incantation believes that the system of logic used to compare and adjudicate is flawed, but that pointing out the flaws is likely to be ineffective, suggesting "everyone's perspective is equally valid" may be a better strategy. Ideally one would fall back to a discussion of ways of knowing and adjudication of different ways of knowing in these contexts, but that may not always be possible, and may run into recursive problems.
The correspondence of statistics and other data and the things they are meant to represent seems like the most important and valid example of this I've noticed. Data is often recorded through ineffective and biased processes and conclusions are often drawn from data in ways that are logically invalid[1]. The highest quality response to instances of this situation would be to find and communicate about the methodological and logical flaws, but it's understandable for people with good reason to believe some conclusion derived from data is false to simply claim "everyone's perspective is equally valid" either because they know it is pragmatically more effective, or because they haven't got a BSc focused on logic and statistics and don't like spending their free time tracking down methodological and logical details.
This claim requires justification. I only have vibes. Ideally I would look for research to justify the claim, but I'm not going to. If anyone else wants to find evidence to support or oppose it, I would be most grateful.
I think your first paragraph is functionally equivalent to "if someone feels that the dominant discourse is at war with them (committed to not acknowledging their critiques) they may sympathetically try to sabotage it." Does that seem right?
"Conclusions are often drawn from data in ways that are logically invalid" seems sufficiently well-attested to be a truism.
Yeah, that's a good generalization of my first paragraph. It seems good to point out the generalization that they are sympathetically sabotaging, and in particular using the "everyone's valid" incantation as their method of sabotage, because that implies first that their position is sympathetic and second that there could be other strategies they are or could be employing.
I probably wouldn't use the term "dominant discourse" or "at war", I might rather say "some entity professing some adjudication" and "not good ROI to attempt meaningful communication".
The issue with the term "dominant discourse" is I don't think this necessarily refers to a context where the adjudicator holds dominant power or any power at all. For example, the saboteur could be attempting to dismiss an opinionated schizophrenic adjudicator.
And "war" implies particularly focused malice which need not be present in the adjudicator or imagined by the saboteur. For example, I don't believe many bureaucratic systems are "at war" with me, but I definitely believe that attempting to communicate intelligently with them would almost always be a massive, frustrating, waste of my time.
...
I'm glad the invalid conclusions thing seems obviously true, but it's also a pretty big problem. Ideally we could be more sure of a lot of our assumptions than we are, and have better and more well known epistemological understanding of where our assumptions may be more likely to fail, and in what ways. Obviously easier said than done.
When the problematic adjudicator isn't the dominant one, one can either safely ignore them, or escalate to someone less problematic who does hold power, so there's no benefit in sabotage, and there's reputational harm.
Relatedly I think the only real solution to the "lying with statistics" problem is the formation of epistemic communities where you're allowed to accuse someone of lying with statistics, it's adjudicated with a preponderance-of-evidence standard, and both false accusations and evidence that you're lying with statistics are actually discrediting, proportionate to the severity of the offense and the confidence of the judgment.
I think we might be imagining slightly different situations. I'm imagining, for example, situations like while riding the bus or out shopping where a stranger has the power to talk to you and you do technically have the power to like, call security or the police if they are harassing you, but they aren't really harassing you and that would make the situation worse for you. They don't have real or enduring power but in that situation they do have that power to force an interaction. It would feel incredibly wrong to call what they are saying the "dominant discourse" but I suppose in that context maybe that's what it is. Also, I like to avoid ignoring people who engage with me unless I have a compelling reason not to. That may be a personal quirk.
The idea of an epistemic community like you describe sounds nice, though it seems unfortunate that the focus has to be on transgression and accusation rather than a system that focuses on identifying particularly good epistemics and just... ignoring the epistemics that aren't identified, which may be because they involve lying with data or just poor use of statistics and analysis... But since lying with statistics seems common, it probably would be good to make a point of identifying and cataloguing it.
I think there are two separate claims being made here.
I can get not being overly committed to the idea that your own metaethical system is the ultimate truth. But it does not follow that established and commonly used systems are any good either. Considering that for a large number of people, their source of ethics is whatever they were indoctrinated to believe in as children, I would not place a lot of confidence in existing metaethics even if I am not confident in my own.
Edit: The main suggestion of this piece is #1, but the point about using existing crypto methods seems to suggest #2. The debate then becomes about what one should do when inaction/further research is not an option, when you have to make a decision.
One day, when I was an intern at the cryptography research department of a large software company, my boss handed me an assignment to break a pseudorandom number generator passed to us for review. Someone in another department invented it and planned to use it in their product, and wanted us to take a look first. This person must have had a lot of political clout or was especially confident in himself, because he rejected the standard advice that anything an amateur comes up with is very likely to be insecure and he should instead use one of the established, off the shelf cryptographic algorithms, that have survived extensive cryptanalysis (code breaking) attempts.
My boss thought he had to demonstrate the insecurity of the PRNG by coming up with a practical attack (i.e., a way to predict its future output based only on its past output, without knowing the secret key/seed). There were three permanent full time professional cryptographers working in the research department, but none of them specialized in cryptanalysis of symmetric cryptography (which covers such PRNGs) so it might have taken them some time to figure out an attack. My time was obviously less valuable and my boss probably thought I could benefit from the experience, so I got the assignment.
Up to that point I had no interest, knowledge, or experience with symmetric cryptanalysis either, but was still able to quickly demonstrate a clean attack on the proposed PRNG, which succeeded in convincing the proposer to give up and use an established algorithm. Experiences like this are so common, that everyone in cryptography quickly learns how easy it is to be overconfident about one's own ideas, and many viscerally know the feeling of one's brain betraying them with unjustified confidence. As a result, "don't roll your own crypto" is deeply ingrained in the culture and in people's minds.
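To give a concrete (and much simpler) flavor of what such an attack can look like: the actual PRNG from this story isn't described, so the sketch below is an invented toy example, a naive linear congruential generator broken purely from its observed outputs, assuming the modulus is public.

```python
# Toy example (not the PRNG from the story): a homebrew generator
# x_{n+1} = (a*x_n + c) mod M with secret seed, multiplier a, and increment c.
# The attack recovers a and c from three consecutive outputs and then
# predicts every future output exactly.

M = 2**31 - 1  # modulus, assumed public; prime, so nonzero differences are invertible

def lcg(x, a, c):
    """The 'secret' generator."""
    while True:
        x = (a * x + c) % M
        yield x

def break_lcg(x0, x1, x2):
    """Recover (a, c) from three consecutive outputs, given the modulus."""
    a = (x2 - x1) * pow(x1 - x0, -1, M) % M  # modular inverse (Python 3.8+)
    c = (x1 - a * x0) % M
    return a, c

secret = lcg(123456789, a=48271, c=12345)    # victim's secret parameters
observed = [next(secret) for _ in range(3)]  # attacker only sees the output stream

a, c = break_lcg(*observed)
clone = lcg(observed[-1], a, c)              # resume from the last observed value
assert [next(clone) for _ in range(5)] == [next(secret) for _ in range(5)]
print("recovered a =", a, "and c =", c, "- all future outputs are now predictable")
```

Real attacks are rarely this clean, but the lesson generalizes: structure that looks opaque to its inventor can be transparent to an attacker.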
If only it was so easy to establish something like this in "applied philosophy" fields, e.g., AI alignment! Alas, unlike in cryptography, it's rarely possible to come up with "clean attacks" that clearly show that a philosophical idea is wrong or broken. The most that can usually be hoped for is to demonstrate some kind of implication that is counterintuitive or contradicts other popular ideas. But due to "one man's modus ponens is another man's modus tollens", if someone is sufficiently willing to bite bullets, then it's impossible to directly convince them that they're wrong (or should be less confident) this way. This is made even harder because, unlike in cryptography, there are no universally accepted "standard libraries" of philosophy to fall back on. (My actual experiences attempting this, and almost always failing, are another reason why I'm so pessimistic about AI x-safety, even compared to most other x-risk concerned people.)
So I think I have to try something more meta, like drawing the above parallel with how easy it is to be overconfident in other fields, such as cryptography. Another meta line of argument is to consider how many people have strongly held, but mutually incompatible philosophical positions. Behind a veil of ignorance, wouldn't you want everyone to be less confident in their own ideas? Or think "This isn't likely to be a subjective question like morality/values might be, and what are the chances that I'm right and they're all wrong? If I'm truly right why can't I convince most others of this? Is there a reason or evidence that I'm much more rational or philosophically competent than they are?"
Unfortunately I'm pretty unsure any of these meta arguments will work either. If they do change anyone's minds, please let me know in the comments or privately. Or if anyone has better ideas for how to spread a meme of "don't roll your own metaethics"[1], please contribute. And of course counterarguments are welcome too, e.g., if people rolling their own metaethics is actually good, in a way that I'm overlooking.
To preempt a possible misunderstanding, I don't mean "don't try to think up new metaethical ideas", but instead "don't be so confident in your ideas that you'd be willing to deploy them in a highly consequential way, or build highly consequential systems that depend on them in a crucial way". Similarly "don't roll your own crypto" doesn't mean never try to invent new cryptography, but rather don't deploy it unless there has been extensive review, and consensus that it is likely to be secure.