I started getting LessWrong posts in my email about a year ago—I don't remember signing up, but I must have done it intentionally. I like most of what I've been reading so far: it's a civil forum in which people think about the process of thought and its application, though some of the specific topics are out of context for me. (What's The Alignment? It sounds like something from The Dark Crystal.)

It occurred to me that maybe I could post some of my own thoughts, since I'm at a turning point in how I'm thinking about meta-ethics and the concept of a person, and maybe some feedback would be good for me. Normally, I go it alone, reading books and only discussing them internally. (Most people I know don't want to talk philosophy.)

Twice since the invention of the world wide web, I've written up grand summaries of my beliefs and "put them out there." I squirm to read them now, but they're true to some core ideas that I still have. In 2006, I wrote a Manifesto about my conversion from atheism to Christianity over the preceding decade, and in 2020, I wrote Could Have, Would Have, Should Have, about my newfound understanding of causality (defining the subjunctive "would be").

Poking around on this site, I noticed that LessWrong has a foundational text, The Sequences, so if I'm going to get involved here, I'd better go read them. ALL of them.

(Time passes...)

Well! I guess I was surprised that, with the exception of a few section-introductions, they were all written by a single person, Eliezer Yudkowsky. They're also entirely from 2007‒2009, so maybe he wouldn't stand behind everything he said now. But they're really, really arrogant.

I mean, in No, Really, I've Deceived Myself,

I recently spoke with a person who... it's difficult to describe.  Nominally, she was an Orthodox Jew.  She was also highly intelligent, conversant with some of the archaeological evidence against her religion, and the shallow standard arguments against religion that religious people know about.  For example, she knew that Mordecai, Esther, Haman, and Vashti were not in the Persian historical records, but that there was a corresponding old Persian legend about the Babylonian gods Marduk and Ishtar, and the rival Elamite gods Humman and Vashti.  She knows this, and she still celebrates Purim.

Knowing that the biblical account is false and yet celebrating a religious holiday anyway—this is a problem? Maybe she likes hamantaschen. I like hamantaschen.

More personal, because I'm a physicist, was Outside the Laboratory,

Now what are we to think of a scientist who seems competent inside the laboratory, but who, outside the laboratory, believes in a spirit world?  We ask why, and the scientist says something along the lines of:  "Well, no one really knows, and I admit that I don't have any evidence - it's a religious belief, it can't be disproven one way or another by observation."  I cannot but conclude that this person literally doesn't know why you have to look at things.  They may have been taught a certain ritual of experimentation, but they don't understand the reason for it - that to map a territory, you have to look at it - that to gain information about the environment, you have to undergo a causal process whereby you interact with the environment and end up correlated to it.  This applies just as much to a double-blind experimental design that gathers information about the efficacy of a new medical device, as it does to your eyes gathering information about your shoelaces.

This annoys me deeply, because the reason I got interested in experimental physics was precisely how well it formalized the process of looking at your shoelaces!

Before grad school, I didn't have much interest in the experimental side of things: my main motivation was to learn the two most counterintuitive parts of physics, relativity and quantum mechanics. But there came a point when I had taken the last course, beyond which is research, and I was less enthused about going deep into a technical corner—it was the big picture that had interested me. Along the way, I had gotten some experience in experimental physics from summer projects at a few particle physics labs. The goals of these summer projects were nowhere near as cool and esoteric as the theories that originally drew me to the field, but I was blown away by the idea of really knowing things: "I know that this detector is misaligned 3.4 mm relative to that one! How cool is that?"

I was absolutely giddy with the idea of having a procedure for establishing physical facts—and of course I was applying it outside the laboratory. "Look! This pen exists because we can apply a clustering procedure to the atoms in space, weighted by the strength of mutual attraction, and the atoms in this volume form a stable cluster. Your clusters may differ from mine by a few atoms, or might not even converge, depending on your initial seed, but maybe we can define an ensemble of clustering runs, and in the preponderance of those runs..." This, by the way, is why my wife doesn't like to talk philosophy with me.
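Here, in fact, is roughly what I had in mind, as a toy sketch rather than anything rigorous: made-up 2D "atom" positions, with a simple friends-of-friends linking length standing in for the "strength of mutual attraction" weighting (a deliberate simplification on my part).

```python
# Toy sketch (not a rigorous procedure): cluster "atoms" by proximity, with the
# linking length standing in for the strength of mutual attraction.
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: a dense blob of "pen" atoms in a diffuse "air" background.
pen_atoms = rng.normal(loc=[0.0, 0.0], scale=0.05, size=(200, 2))
air_atoms = rng.uniform(low=-1.0, high=1.0, size=(100, 2))
atoms = np.vstack([pen_atoms, air_atoms])

def friends_of_friends(points, linking_length):
    """Group points into clusters: any two points closer than
    `linking_length` end up in the same cluster (union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < linking_length:
                parent[find(i)] = find(j)

    return [find(i) for i in range(n)]

labels = friends_of_friends(atoms, linking_length=0.05)
biggest = max(set(labels), key=labels.count)
print("atoms in the largest stable cluster ('the pen'):", labels.count(biggest))
```

Vary the linking length, or the seed that generated the atoms, and the membership changes by a few atoms at the edge, which is the ensemble-of-runs caveat above.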

In fact, from my 2006 Manifesto,

I believe that we live in two overlapping worlds: an objective, physical world of matter, energy, and space-time, and an artificial world of objects, relationships, attributes, qualities and purposes. In the seamless continuum of nature, we draw boundaries and selectively identify meaningful entities. When I open my eyes, I see the mash of atoms before me as a pen, and that pen has well-defined borders which distinguish it from the surrounding air and my hand. My pen is more real than Macbeth's impalpable knife, because unlike a hallucination, matter fills the space occupied by the pen and this matter is distinct from air in a way that can be quantified. We must remember, though, that the quantification scheme is itself a human invention. Distinguishing the pen from the air is as much an interpretation as distinguishing letters of the alphabet from squiggles and ink splotches on paper, and good expressions from poor word choice.

(I learned about clustering algorithms a few years later.) By 2020, I had concluded that we're paying the various concepts of "reality" a disservice by using a single word for them all: there are different kinds of reality, such as mathematical reality (π exists) and physical reality (pie exists). The way these realities are distinguished is by the procedures used to establish true from false in each realm: deductive proof for mathematical reality (thanks, Euclid!) and scientific observation for physical reality (thanks, Galileo!).

These aren't the only kinds of reality, though. Still staying close to the hard sciences, there are explanations, or "reasons why." Explanations, even scientific explanations, aren't in the same category as observed facts, because the procedures for establishing explanations go beyond matching models to observations, and yes, even Bayes' theorem.

My favorite example is the reason why planetary orbits are ellipses, rather than epicycles-upon-epicycles. Both models perfectly fit the data (in Kepler's time, anyway—nowadays perfect ellipses don't fit, but epicycles still do, since a stack of epicycles is effectively a Fourier series; let's stick to the original problem). Physical observations can't distinguish between the models, and Bayes' theorem wouldn't put more probability mass on one than the other, yet the ellipse theory won out because there's something about it that's more reasonable.
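To make the Fourier-series point concrete, here's a toy sketch of my own (the eccentricity and term counts are made up, not Kepler's): a mildly eccentric orbit, approximated by keeping only the largest few "epicycles," i.e. terms of its Fourier series in time.

```python
# Toy illustration (mine): a stack of epicycles is just a truncated Fourier
# series, so enough of them will match an ellipse-shaped orbit to any precision.
import numpy as np

e = 0.2                                    # eccentricity of a made-up orbit
t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)   # one period of mean anomaly

# Solve Kepler's equation M = E - e*sin(E) by fixed-point iteration.
E = t.copy()
for _ in range(100):
    E = t + e * np.sin(E)

# Position in the orbital plane as a complex number (focus at the origin).
z = (np.cos(E) - e) + 1j * np.sqrt(1.0 - e**2) * np.sin(E)

coeffs = np.fft.fft(z) / len(z)            # each Fourier coefficient = one epicycle
for n_epicycles in (1, 3, 5, 9):
    keep = np.argsort(np.abs(coeffs))[::-1][:n_epicycles]
    partial = np.zeros_like(coeffs)
    partial[keep] = coeffs[keep]
    z_fit = np.fft.ifft(partial * len(z))  # orbit rebuilt from n epicycles
    print(n_epicycles, "epicycles -> worst-case error:", np.abs(z - z_fit).max())
```

The residual keeps shrinking as you add terms, which is the sense in which epicycles "still fit" anything; the question is what, if anything, that buys you.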

It's not simplicity. Eliezer correctly points out in Occam's Razor,

The formalism of Solomonoff induction measures the “complexity of a description” by the length of the shortest computer program which produces that description as an output. To talk about the “shortest computer program” that does something, you need to specify a space of computer programs, which requires a language and interpreter.

If this were really a quantitative procedure, we'd need a principled way to pick one description language rather than another. (In an unpublished grand summary, I was going to use Kolmogorov complexity to describe that same conundrum.) Even if you could pick a favorite language, the simplest description would have to be measured in a number of bits, and that's definitely not how scientists pick theories.
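A trivial way to see the problem (my own toy example, nothing from the Sequences): write down the same ellipse model in two different "languages" and count bits; each count presupposes its own interpreter.

```python
# Toy illustration of the language-dependence problem: the bit-length of the
# very same model depends on which description language you committed to first.
import json

# The ellipse (conic-section) orbit model, written in two different "languages".
as_python = "r = a*(1 - e**2) / (1 + e*cos(theta))"                         # as a code snippet
as_json = json.dumps({"model": "conic section", "parameters": ["a", "e"]})  # as a data spec

print(len(as_python.encode("utf-8")) * 8, "bits in the Python-expression language")
print(len(as_json.encode("utf-8")) * 8, "bits in the JSON language")
# Neither number is "the" complexity of the ellipse theory, and neither is how
# a referee would judge it.
```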

In the case of ellipses versus epicycles, the ellipse theory eventually provided the most insight. Kepler's laws were revealed to be special cases of angular momentum and an inverse square law, which predicts that orbits should be conic sections, and an ellipse is a conic section. The ellipse theory is more insightful than the epicycle theory because of where it leads.

The conclusion that one model fits the data better than another can be formalized very nicely by Bayes' theorem, but a different procedure is needed to favor a model when its alternatives fit the data equally well. There are a lot of these discriminators in theoretical physics, to decide among models that can't be distinguished yet: naturalness, the Copernican principle, unification, aesthetics... (The charm quark is called "charm" because the theory was cute. Then it turned out to be true.) None of these principles has the sharp cutting edge that observation has—a charming theory would be dropped in a heartbeat if it were clearly at odds with the data—but there's a lot more in a scientific paper than the progression from prior probability to observations to posterior probability.
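In Bayesian terms (a minimal sketch of my own, not a quote): if two models assign the same likelihood to the data, the update hands you back whatever prior odds you walked in with, so the tie-breaking has to come from somewhere else.

```python
# Minimal sketch: Bayes' theorem in odds form. Equal likelihoods leave the
# prior odds untouched, so observation alone can't break the tie.
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """posterior odds = prior odds * likelihood ratio (Bayes' theorem, odds form)."""
    return prior_odds * likelihood_ratio

prior = 1.0              # no initial preference between ellipse and epicycle models
likelihood_ratio = 1.0   # both reproduce the planetary positions equally well

print(posterior_odds(prior, likelihood_ratio))   # -> 1.0: still undecided
```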

In fact, the same could be said of math papers. "585788223554050573278377 + 3611957383042997565190926 = 4197745606597048138469303" is a true theorem, arrived at by the laws of mathematical deduction, but it's not an interesting one. Criteria beyond proof need to be brought in to decide what to conjecture and prove.
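The check itself is mechanical; a computer will confirm it in a blink and be no more enlightened for it:

```python
# True, mechanically checkable, and utterly uninteresting.
assert 585788223554050573278377 + 3611957383042997565190926 == 4197745606597048138469303
print("theorem verified")
```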

Eliezer used criteria like this to claim that Decoherence is Falsifiable and Testable, where by "decoherence," he meant "Hugh Everett's many worlds interpretation." ("Decoherence" is a more general word, and when physicists use it, they rarely mean the many worlds interpretation.) But "falsifiable" and "testable" also refer to the distinguish-models-by-observation method, which decides claims in the category of physical fact, and the choice of interpretation isn't one of them. Eliezer argued that the many worlds interpretation is simpler (that other interpretations have to explain the non-existence of these many worlds), which is an argument about what makes it a better explanation, or about what needs to be explained. But that's different from "falsifiability" or "testability."

Some arguments need to use these non-observational methods because the data are out of reach (such as string theory), but others are trying to address topics that are in principle not even about measurable data. Eliezer wrote a long article dismissing David Chalmers's stance on consciousness in Zombies! Zombies?, which ended with

Chalmers wrote a big book, not all of which is available through free Google preview.

Well, it was a good book, worth reading. But even in the free preview, Chalmers wrote,

Everyday scientific methodology has trouble getting a grip on it, not least because of the difficulties in observing the phenomenon. Outside the first-person case, data are hard to come by. This is not to say that no external data can be relevant, but we first have to arrive at a coherent philosophical understanding before we can justify the data's relevance. So the problem of consciousness may be a scientific problem that requires philosophical methods of understanding before we can get off the ground.

...And if you don't like that example, how about ethics?

Eliezer wrote a lot about ethics, and anybody who says, "You should be rational" has ethical beliefs. "Should" statements don't clearly follow from the "is" statements that the scientific method provides, as David Hume pointed out long ago.

Eliezer distinguishes between Terminal Values and Instrumental Values, but what determines the terminal values? Somehow, he has to populate the objective function whose maximum is what he will rationally try to do. How he ends up assigning those intrinsic values relies on methods of argument that are neither deductive nor observational.

So when a scientist outside the laboratory says,

"Well, no one really knows, and I admit that I don't have any evidence - it's a religious belief, it can't be disproven one way or another by observation."

it's not because they don't understand the scientific method. It's because they do understand that there are different kinds of statements, and different procedures are used to argue for or against statements in each category. The methods for two of these categories are very sharp, effective, and mechanically verifiable: the method of logical deduction and the method of experimental observation.

Maybe another Euclid or Galileo will someday develop a third method for one of these other kinds of reality, but it won't look like deduction or observation. (And keep in mind that there were 1900 years between Euclid and Galileo!) Until then, or perhaps forever, we have less turn-the-crank methods of describing the experience of consciousness or determining what the goals of our lives should be. They're harder to talk about, but they're not worthless, either, any more than the pre-deductive methods that Babylonians used to discover the Pythagorean theorem.

If you ask me about my religious beliefs, it's going to be along the lines of Religion's Claim to be Non-Disprovable. In the late 1990's, it seemed to me that people arguing about whether God exists differed primarily in their definitions of "God," and so were Arguing "By Definition". I thought it better to just use that word as a label for whatever the foundation of reality might be and ask about its attributes; the other direction seems to be a Wrong Question. I was quite content (and in some ways, still am) with An Alien God, and since I was starting from a philosophy of Camus-inspired absurdism, admitting the existence of an objective world was what I considered the biggest part of my religious conversion—the rest was details.

I don't plan to use much God-language here—that's the thing that bothers me most about my 2006 Manifesto. In fact, I found a survey of the LessWrong Diaspora (2016), and some 88.3% of the respondents said they were either atheistic or agnostic (page 28). Certainly if I used the word "God" the way Einstein did, I would be misunderstood, so I'll Taboo My Words.

Also, what I'm interested in nowadays are not the kinds of philosophical questions that Christian churches fight over; they're more the kinds of questions Buddhist sects politely disagree about.

So is this the right place for me? I suppose I am in the 12%...

24 comments

I don't think there's any place quite like lesswrong on the entire internet. It's a lot of fun to read, but it tends to be pretty one-note, and even if there is discord in lesswrong's song, it's far more controlled; Eru Ilúvatar's hand can yet be felt, if not seen. (edit: that is to say, it's all the same song)

For the most part, people are generally tolerant of Christians. There is even a Catholic who teaches (taught?) at the Center for Applied Rationality, and there are a few other rationalist-atheists who hopped to Christianity, though I can't remember them by name.

Whether or not it's the place for you, I think you'll find that there's more pop!science, and if you are a real physicist, there are more and more posts where people who don't know physics act as if they do. Correcting them will be difficult, so it depends on whether you can tolerate that.

To answer your question, the AI alignment problem is the problem of ensuring that the first artificial general intelligence smart enough to take over the world - that is, the Singularity - leaves at least one human being alive. No one knows how to solve it, and it's likely only rationalists could.

Anyway, welcome, but be warned, this community is full of egotists who use the orthogonality thesis to avoid having to have coherent moral principles. (Source: an angry vegan who doesn't understand why AI rights get more attention on this site than animal rights despite the fact that animals are basically equivalent to humans with brain damage in terms of mind structure, and AIs are aliens from another universe.)

Okay, I just did a deep-dive on the AI alignment problem and the Singularity on Wikipedia, and it will take me a while to digest all of that. My first impression is that it seems like an outlandish thing to worry about, but I am going to think about it more because I can easily imagine the situation reversed.

Among the things I came across was that Eliezer was writing about this in 1996, and predicted

Plug in the numbers for current computing speeds, the current doubling time, and an estimate for the raw processing power of the human brain, and the numbers match in:  2021.

GPT-3 has some tens to hundreds of billions of parameters and the human brain has 86 billion neurons, and I know it's hand-waving because model parameters aren't equivalent to human neurons, but—not bad! On the other hand, we're seeing now what this numerical correspondence translates to in real life, and it's interestingly different from what I had imagined. AI is passing the Turing test, but the Turing test no longer feels like a hard line in the sand; it doesn't seem to be testing what it was intended to test.

No one knows how to solve it, and it's likely only rationalists could.

Understanding what, exactly, human values are would be a first step toward expressing them in AI. I hadn't expected meta-ethics to get so applied.

...

You know what's really odd? The word "singularity" appears only 33 times in the Sequences, mostly as "when I attended the Singularity Summit, someone said..." and such, without explanation of what it was. Most of the references were in the autobiographical section, which I didn't read as deeply as the rest.

Figuring out what human values actually are is a pretty important part of the project, though we'd still have to figure out how to align an AGI to them. Still, there is no end of use for applied meta-ethics here. You might also want to look into the Shard Theory subcommunity - @TurnTrout and others are working on understanding how human values arise in the first place, as "shards" of a much simpler optimization process in the human brain.

problem of ensuring that the first artificial general intelligence

Transitive misalignment (successors/descendants of the first AGIs being misaligned at some point) is exactly as deadly as direct misalignment (in physical time there isn't even much distance between the two; the singularity is fast). So not only must the first AGIs be aligned, they additionally need to be in a situation where they don't build misaligned AGIs as soon as they are able. And Moloch doesn't care about your substrate: by default it's going to be a problem for AGIs as much as it currently is for humanity.

You're correct, but since I define "aligned" as "tending to do what is actually best according to humanity's value system", and given that it would be harmful for them to take such a risk, a totally aligned AGI would not, in fact, take that risk lol. So although your addition is important to note, there's a sense in which it is redundant.

Both direct and transitive alignment are valuable concepts. Especially with LLM AGIs, which I think are the only feasible directly aligned AGI we are likely to build, but which I suspect won't be transitively aligned by default.

Since transitive alignment varies among humans (different humans have different inclinations towards building AGIs of uncertain alignment, given a capability to do that), it might be valuable to align LLM personalities to become people who are less likely to fail transitive alignment.

Eliezer wrote a lot about ethics, and anybody who says, “You should be rational” has ethical beliefs.

The ethical "should" doesn't have to be the only "should." Maybe he means that it's in your interests to be rational.

Okay, I take the word "should" to refer to a spectrum with ethics on one end (strong "should") and aesthetics on the other (weak "should"). It's possible that this is a wider use of the words "ethics" and "aesthetics" than others would have. Maybe those other things people are thinking about don't lie on a linear spectrum?

So, for example, when you're doing an algebra problem, "you should subtract the same amount from both sides of the equation, not just one side," is a choice to stay within the rules of algebra. Not doing so leads to less interesting results (everything being "equal" to everything else). I'm not sure whether that's closer to the ethics end or the aesthetics end; maybe it depends on whether the math is pure or applied.

But getting back to my meaning in that paragraph, the Sequences had a large section on ethics, and choosing to be rational comes through the text as a strong imperative. And then, holding opinions on a "should" statement (whether you call it "ethics" or not) comes from beyond-experimental reasoning, because (per David Hume) an "is" does not imply a "should."

Despite what Hume says, it is fairly standard to derive instrumental "shoulds", such as how you should build a bridge or win at chess, from a mass of empirical and logical information. Ethical shoulds are often held to be a different matter.

Saying that one should do x rather than y seems to mean that act x is better than act y. In which case we can reduce an ought to an is. And what "good" or "better" means seems to have to do with maximizing expected utility. And there are arguably objective facts about what maximizes utility. E.g. murdering people is pretty bad for maximizing utility. So there seem to be objective facts about what is good. And therefore about what one should do.

Murdering people and harvesting their organs to save n>1 lives is pretty good for expected utility. But a lot of people feel that it's wrong: it's intuitive that what is ethically good is doing what is right, just as it's intuitive that what is ethically good is to increase utility. There are different intuitions about ethics, which is why it is still an open problem. Focusing on one intuition is privileging the hypothesis.

Sometimes sacrificing one life for many others can have overall more negative indirect consequences, e.g. people distrusting hospitals because their organs might get harvested for other people. But even if utilitarianism is wrong, the correct ethical theory would apparently be one that correctly analyzes the meaning of "good" or "better" in terms of similar objective criteria.

If you can't give a robust example of an objective ethical theory, it can be doubted that there is one.

If there is none, it would mean a world where everyone suffers horribly forever is not objectively worse than one where everyone is eternally happy. But I think that's just not compatible with what words like "good" or "worse" mean! If we imagine a world where everything is the same as in ours, except that people call "bad" the things we call "good", and "good" the things we call "bad" -- would that mean they believe suffering is good? Of course not. They just use different words for the same concepts we have! Believing that, other things being equal, suffering is bad seems to be like believing that bachelors are unmarried, or that even numbers are divisible by two without remainder. It seems to be a conceptual truth, an objective fact about the concepts in question. And that is incompatible with the view that ethical statements can't be objectively true or false, because whether someone suffers, or not, is an objective psychological fact.

To acknowledge this I don't think we need to actually finish a complete ethical theory which translates all statements about goodness into statements about expected utility or suffering or preferences or such. Otherwise this would be like saying "If you don't have a precise analysis of the term 'rational', it can be doubted there are any objective facts about what is rational or irrational". We don't need to know a perfect theory of rationality to know that some things are definitely rational and some others are definitely irrational, which already rules out the view that there is nothing objective about rationality. The same holds for morality.

Consider a world where everyone suffers horribly, and it's no one's fault, and it's impossible to change. Is it morally wrong, even though the elements of intentionality and obligation are absent?

The terms "right" and "wrong" apply just to actions. This world is bad, without someone doing something wrong.

An imperfect world might be bad in various ways, such as being undesirable, but if it is not morally bad, it implies nothing about objective morality.

But it is clearly "morally" bad? It is just not a morally wrong action. Actions are wrong insofar as their expected outcomes are bad, but an outcome can be bad without being the result of anyone's action.

(You might say that morality is only a theory of actions. Then saying that a world, or any outcome, is "morally" bad, would be a category mistake. Fine then, call "ethics" the theory both of good and bad outcomes, and of right and wrong actions. Then a world where everyone suffers is bad, ethically bad.)

But it is clearly “morally” bad

No, that's the point.

an outcome can be bad without being the result of anyone’s action.

Yep, but you still need to show it's morally bad even if it is unintentional.

I wasn't intending to take a side in utilitarianism/consequentialism; I just meant that, ultimately, a decision is made from intuition. It can't be deductive all the way down.

Somehow, he has to populate the objective function whose maximum is what he will rationally try to do. How he ends up assigning those intrinsic values relies on methods of argument that are neither deductive nor observational.

In your opinion, does this relate in any way to the "lack of free will" arguments, like those made by Sam Harris? The whole: I can ask you what your favourite movie is, and you will think of one. You will even try to justify your choice if asked about it, but ultimately you had no control over what movies popped into your head.

This is a good example of needing to watch my words: the same sentence, interpreted from the point of view of no-free-will, could mean the complex function of biochemical determinism playing out, resulting in what the human organism actually does.

What I meant was the utility function of consequentialism: for each possible goal x, you have some preference f(x) for how good that goal is, and so what you're trying to do is maximize f(x) over x. It's presupposing that you have some ability to choose one x instead of another, although there are some compatibilist views of free will and determinism that blur the line.

My point in that paragraph, though, is that you might have a perfectly rational machinery for optimizing f, but one has to also choose f. The way you choose f can't be by optimizing over x. The reasons one has for choosing f also can't be directly derived from scientific observations about the physical world, because (paraphrasing David Hume), an "is" does not imply an "ought." So the way we choose f, whatever that is, requires some kind of argumentation or feeling that is not derivable from the scientific method or Bayes' theorem.
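To put the same point in pseudo-consequentialist code (a toy sketch of mine, with made-up actions and made-up objective functions): the maximizing machinery runs the same way no matter which f you hand it, and nothing inside it tells you which f to hand it.

```python
# Toy sketch: the optimizer is indifferent to which objective f it is given;
# choosing f happens outside the maximization.
def argmax(f, candidates):
    """Mechanical part: given an objective f, pick the best x."""
    return max(candidates, key=f)

actions = ["tell the truth", "stay silent", "lie"]

# Two rival objective functions. Nothing in argmax() prefers one to the other;
# that choice has to come from somewhere else (the is/ought gap).
honesty = {"tell the truth": 1.0, "stay silent": 0.3, "lie": 0.0}
comfort = {"tell the truth": 0.2, "stay silent": 0.8, "lie": 1.0}

print(argmax(honesty.get, actions))   # -> 'tell the truth'
print(argmax(comfort.get, actions))   # -> 'lie'
```

Swap in a different f and the "rational" choice flips, without the optimizer ever noticing.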

Yeah, if you use religious or faith-based terminology, it might trigger negative signals (downvotes). Though whether that is because the information you meant to convey is being disagreed with, or because the statements themselves are actually more ambiguous overall, would be harder to distinguish.

Some kinds of careful reasoning processes vibe with the community, and imo yours is that kind: questioning each step separately on its merits, being sufficiently skeptical of premises leading to conclusions.

Anyway, back to the subject of f and inferring its features. We are definitely having trouble drawing f out of the human brain in a systematic, falsifiable way.

Whether or not it is physically possible to infer it, or its features, or how it is constructed (i.e., whether it is possible at all), that subject seems a little uninteresting to me. Humans are perfectly capable of pulling made-up functions out of their ass. I kind of feel like all the gold will go to the first group of people who come up with processes for constructing f in coherent, predictable ways, such that different initial conditions, when iterated over the process, produce predictably similar f.

We might then try to observe such a process throughout people's lifetimes, and sort of guess that a version of the same process is going on in the human brain. But nothing about how that will develop is readily apparent to me. This is just my own imagination producing what seems like a plausible way forward.