TAG — LessWrong

About Me

Scientist by training, coder by previous session, philosopher by inclination, musician against public demand.

https://theancientgeek.substack.com/?utm_source=substack&utm_medium=web&utm_campaign=substack_profile

Why I am not a Doomer

I'm specifically addressing the argument for a high probability of near extinction (doom) from AI...

Eliezer Yudkowsky: "Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die."

...not whether it is barely possible, or whether other, less bad outcomes (dystopias) are probable. I'm coming from the centre, not the other extreme.

Doom, complete or almost complete extinction of humanity, requires a less than superintelligent AI to become superintelligent either very fast, or very surreptitiously ... even though it is starting from a point where it does not have the resources to do either.

The "very fast" version is foom doom... Foom is rapid recursive self improvement (FOOM is supposed to represent a nuclear explosion).

The classic Foom Doom argument (https://www.greaterwrong.com/posts/kgb58RL88YChkkBNf/the-problem) involves an agentive AI that quickly becomes powerful through recursive self improvement, and has a value/goal system that is unfriendly and incorrigible.

The complete argument for Foom Doom is that:-

The AI will have goals/values in the first place (it won't be a passive tool like GPT*).
The values will be misaligned, however subtly, to be unfavorable to humanity.
That the misalignment cannot be detected or corrected.
That the AI can achieve value stability under self modification.
That the AI will self modify in a way too fast to stop.
That most misaligned values in the resulting ASI are highly dangerous (even goals that aren't directly inimical to humans can be a problem for humans, because the ASI might want to direct resources away from humans).
And that the AI will have extensive opportunities to wreak havoc: biological warfare (custom DNA can be ordered by email), crashing economic systems (trading can be done online), taking over weapon systems, weaponising other technology and so on.

It’s a conjunction of six or seven claims, not just one. (I say "complete argument" because pro-doomers almost always leave out some stages. I am not convinced that rapid self improvement and incorrigibility are both needed, but I am sure that one or the other is. Doomers need to reject the idea that misalignment can be fixed gradually, as you go along. A very fast-growing ASI, foom, is one way of doing that; the assumption that AIs will resist having their goals changed is another.)

Obviously the problem is that to claim a high overall probability of doom, each claim in the chain needs to have a high probability. It is not enough for some of the stages to be highly probable, all must be.
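To make the arithmetic concrete, here is a minimal sketch of how a conjunction behaves. The per-stage numbers are invented purely for illustration and are not estimates taken from anyone's argument. Each number is read as the probability of that stage given that the earlier stages hold, which is itself a simplifying assumption.

```python
from math import prod

def conjunction_probability(stage_probs):
    """Probability that every stage holds, reading each number as the
    probability of that stage given that all earlier stages hold."""
    for p in stage_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(stage_probs)

# Hypothetical numbers, chosen only to show the shape of the argument.
seven_strong_stages = [0.9] * 7
five_strong_two_weak = [0.9] * 5 + [0.5] * 2

print(round(conjunction_probability(seven_strong_stages), 3))   # 0.478
print(round(conjunction_probability(five_strong_two_weak), 3))  # 0.148
```

On these made-up figures, seven stages at 90% each already leave the conjunction below one half, and two merely even-odds stages pull it under 15%.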

There are some specific weak points.

Goal stability under self improvement is not a given: it is not possessed by all mental architectures, and may not be possessed by any, since no one knows how to engineer it, and humans appear not to have it.

The Orthogonality Thesis (https://www.lesswrong.com/w/orthogonality-thesis) is sometimes mistakenly called on to support goal stability. It implies that a lot of combinations of goals and intelligence levels are possible, but doesn't imply that all possible minds have goals, or that all goal driven agents have fixed, incorrigible goals. There are goalless and corrigible agents in mindspace, too. That's not just an abstract possibility. At the time of writing, 2025, our most advanced AIs, the Large Language Models, are non agentive and corrigible.

It is plausible that an agent would desire to preserve its goals, but the desire to preserve goals does not imply the ability to preserve goals. Therefore, no goal stable system of any complexity exists on this planet, and goal stability cannot be assumed as a default or given. So the orthogonality thesis is true of momentary combinations of goal and intelligence, given the provisos above, but not necessarily true of stable combinations.

Another thing that doesn't prove incorrigibility or goal stability is von Neumann rationality. Frequently appealed to in MIRI's early writings, it is an idealised framework for thinking about rationality that doesn't apply to humans, and therefore doesn't have to apply to any given mind.

There are arguments that AIs will become agentive because that's what humans want. Gwern Branwen's confusingly titled "Why Tool AIs Want to Be Agent AIs" (https://gwern.net/tool-ai) is an example. This is true, but in more than one sense:-

The basic idea is that humans want agentive AIs because they are more powerful. And people want power, but not at the expense of control. Power that you can't control is no good to you. Taking the brakes off a car makes it more powerful, but more likely to kill you. No army wants a weapon that will kill their own soldiers, no financial organisation wants a trading system that makes money for someone else, or gives it away to charity, or causes stock market crashes. The maximum amount of power and the minimum of control is an explosion.

One needs to look askance at what "agent" means as well. Among other things, it means an entity that acts on behalf of a human -- as in principal/agent (https://en.m.wikipedia.org/wiki/Principal–agent_problem). An agent is no good to its principal unless it has a good enough idea of its principal's goals. So while people will want agents, they won't want misaligned ones -- misaligned with themselves, that is. Like the Orthogonality Thesis, the argument is not entirely bad news.

Of course, evil governments and corporations controlling obedient superintelligences isn't a particularly optimistic scenario, but it's dystopia, not doom.

Yudkowsky's much repeated argument that safe, well-aligned behaviour is a small target to hit ... could actually be two arguments.

One would be the random potshot version of the Orthogonality Thesis, where there is an even chance of hitting any mind, and therefore a high chance of hitting an eldritch, alien mind. But equiprobability is only one way of turning possibilities into probabilities, and not a particularly realistic one. Random potshots aren't analogous to the probability density for the action of building a certain type of AI, without knowing much about what it would be.

While many of the minds in mindspace are indeed weird and unfriendly to humans, that does not make it likely that the AIs we will construct will be. We are deliberately seeking to build certain kinds of mind, for one thing, and have certain limitations, for another. Current LLMs are trained on vast corpora of human generated content, and inevitably pick up a version of human values from them.

Another interpretation of the Small Target Argument is, again, based on incorrigibility. Corrigibility means you can tweak an AI's goals gradually, as you go on, so there's no need to get them exactly right on the first try.

"it" isn't a single theory.

The argument that Everettian MW is favoured by Solomonoff induction is flawed.

If the program running the SWE outputs information about all worlds on a single output tape, they are going to have to be concatenated or interleaved somehow. Which means that to make use of the information, you have to identify the subset of bits relating to your world. That's extra complexity which isn't accounted for because it's being done by hand, as it were.
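A toy sketch of the bookkeeping involved, assuming a simple round-robin interleaving and a uniform code for picking out a world. This is not Solomonoff induction itself; the function names and the four tiny "worlds" are hypothetical, made up for illustration.

```python
from math import ceil, log2

def interleave(worlds):
    """Round-robin equal-length per-world bit strings onto one output tape."""
    return "".join("".join(bits) for bits in zip(*worlds))

def extract(tape, n_worlds, index):
    """Recover one world's bits. Note the extra inputs: n_worlds and index."""
    return tape[index::n_worlds]

def index_cost_bits(n_worlds):
    """Bits needed to say which world is yours, under a uniform code."""
    return ceil(log2(n_worlds)) if n_worlds > 1 else 0

worlds = ["0000", "1111", "0101", "0011"]  # toy "worlds", purely hypothetical
tape = interleave(worlds)

print(extract(tape, len(worlds), 2))   # 0101
print(index_cost_bits(len(worlds)))    # 2
```

The point of the sketch is only that `extract` cannot run without the index, and specifying the index costs bits that grow with the number of worlds on the tape.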

By far the best definition I’ve ever heard of the supernatural is Richard Carrier’s: A “supernatural” explanation appeals to ontologically basic mental things, mental entities that cannot be reduced to nonmental entities.

Physicalism, materialism, empiricism, and reductionism are clearly similar ideas, but not identical. Carrier's criterion captures something about a supernatural ontology, but nothing about supernatural epistemology. Surely the central claim of natural epistemology is that you have to look... you can't rely on faith, or clear ideas implanted in our minds by God.

it seems that we have very good grounds for excluding supernatural explanations a priori

But making reductionism aprioristic arguably makes it less scientific...at least, what you gain in scientific ontology, you lose in scientific epistemology.

I mean, what would the universe look like if reductionism were false?

We wouldn't have reductive explanations of some apparently high level phenomena ... Which we don't.

I previously defined the reductionist thesis as follows: human minds create multi-level models of reality in which high-level patterns and low-level patterns are separately and explicitly represented. A physicist knows Newton’s equation for gravity, Einstein’s equation for gravity, and the derivation of the former as a low-speed approximation of the latter. But these three separate mental representations, are only a convenience of human cognition. It is not that reality itself has an Einstein equation that governs at high speeds, a Newton equation that governs at low speeds, and a “bridging law” that smooths the interface. Reality itself has only a single level, Einsteinian gravity. It is only the Mind Projection Fallacy that makes some people talk as if the higher levels could have a separate existence—different levels of organization can have separate representations in human maps, but the territory itself is a single unified low-level mathematical object. Suppose this were wrong.

Suppose that the Mind Projection Fallacy was not a fallacy, but simply true.

Note that there are four possibilities here...

I assume a one level universe, all further details are correct.
I assume a one level universe, some details may be incorrect
I assume a multi level universe, all further details are correct.
I assume a multi level universe, some details may be incorrect.

How do we know that the MPF is actually fallacious, and what does it mean anyway?

If all forms of mind projection are wrong, then reductive physicalism is wrong, because quarks, or whatever is ultimately real, should not be mind projected, either.

If no higher level concept should be mind projected, then reducible higher level concepts shouldn't be... which is not EY's intention.

Well, maybe irreducible high level concepts are the ones that shouldn't be mind projected.

That certainly amounts to disbelieving in non reductionism... but it doesn't have much to do with mind projection. If some examples of mind projection are acceptable, and the unacceptable ones coincide with the ones forbidden by reductionism, then MPF is being used as a Trojan horse for reductionism.

And if reductionism is an obvious truth, it could have stood on its own as an apriori truth.

Suppose that a 747 had a fundamental physical existence apart from the quarks making up the 747. What experimental observations would you expect to make, if you found yourself in such a universe?

Science isn't 100% observation, it's a mixture of observation and explanation.

A reductionist ontology is a one level universe: the evidence for it is the success of reductive explanation, the ability to explain higher level phenomena entirely in terms of lower level behaviour. And the existence of explanations is aposteriori, without being observational data, in the usual sense. Explanations are abductive, not inductive or deductive.

As before, you should expect to be able to make reductive explanations of all high level phenomena in a one level universe... if you are sufficiently intelligent. It's like the Laplace's Demon illustration of determinism, only "vertical". If you find yourself unable to make reductive explanations of all phenomena, that might be because you lack the intelligence, or because you are in a non reductive multi level universe, or because you haven't had enough time...

Either way, it's doubtful and aposteriori, not certain and apriori.

If you can’t come up with a good answer to that, it’s not observation that’s ruling out “non-reductionist” beliefs, but a priori logical incoherence

I think I have answered that. I don't need observations to rule it out. Observations-rule-it-in and incoherence-rules-it-out aren't the only options.

People who live in reductionist universes cannot concretely envision non-reductionist universes.

Which is a funny thing to say, since science was non-reductionist till about 100 years ago.

One of the clinching arguments for reductionism was the Schrödinger equation, which showed that, in principle, the whole of chemistry is reducible to physics, while the rise of molecular biology showed the reducibility of biology to chemistry. Before that, educators would point to the de facto hierarchy of the sciences -- physics, chemistry, biology, psychology, sociology -- as evidence of a multi-layer reality.

Unless the point is about "concretely". What does it mean to concretely envision a reductionist universe? Perhaps it means you imagine all the prima facie layers, and also reductive explanations linking them. But then the non-reductionist universe would require less envisioning, because it's the same thing without the bridging explanations! Or maybe it means just envisioning huge arrays of quarks. Which you can't do. The reductionist world view, in combination with the limitations of the brain, implies that you pretty much have to use higher level, summarised concepts... and that they are not necessarily wrong.

But now we get to the dilemma: if the staid conventional normal boring understanding of physics and the brain is correct, there’s no way in principle that a human being can concretely envision, and derive testable experimental predictions about, an alternate universe in which things are irreducibly mental. Because, if the boring old normal model is correct, your brain is made of quarks, and so your brain will only be able to envision and concretely predict things that can be predicted by quarks.

"Your brain is made of quarks" is aposteriori, not apriori.
Your brain being made of quarks doesn't imply anything about computability. In fact, the computability of the ultimately correct version of quantum physics is an open question.
Incomputability isn't the only thing that implies irreducibility, as @ChronoDas points out.
Non reductionism is conceivable, or there would be no need to argue for reductionism.

The Deutsch-Yudkowsky argument for the Many Worlds Interpretation states that you can take the core of Quantum Mechanics -- the Schrödinger wave equation, and the projection postulate -- remove the projection postulate (also known as collapse and reduction), and end up with a simpler theory that is still adequate to explain observation. The idea is that entanglement can replace collapse: a scientist observing a superposed state becomes entangled with it, and effectively splits into two, each having made a definite observation.

Moreover Yudkowsky, following David Deutsch, holds the many worlds interpretation to be obviously correct, in contrast to the majority of philosophers and physicists, who regard the problem of interpreting QM as difficult and unsolved.

This has some problems.

(Which are to do with the specific argument, and the level of certainty ascribed to it. To say that you cannot be certain about a claim is not to say it is false. To point out that one argument for a claim does not work is likewise not to say that the claim itself is false. There could be better arguments for these versions of many worlds, or better many worlds theories, for that matter).

The Problems.

The first thing to note is that there is more than one quantum mechanical many worlds theory. What splitting is... how complete and irrevocable it is... varies between particular theories. So does the rate of splitting, so does the mechanism of splitting.

The second thing to note is that many worlders are pointing at something implied by the physical formalism and saying "that's a world"... but whether it qualifies as a world is a separate question, and a separate kind of question, from whether it is really there in the formalism. One would expect a world, or universe, to be large, stable, non-interacting, and so on. It's possible to have a theory that has collapse, without having worlds. A successful MWI needs to jump three hurdles: empirical correctness, mathematical correctness and conceptual correctness -- actually having worlds.

The third problem to note is that all outstanding issues with MWI are connected in some way with quantum mechanical basis... a subject about which Deutsch and Yudkowsky have little to say.

Coherence versus Decoherence

There is an approach to MWI based on coherent superpositions, and a version based on decoherence. These are (for all practical purposes) incompatible opposites, but are treated as interchangeable in Yudkowsky's writings.

Quantum superposition is a fundamental principle of quantum mechanics that states that linear combinations of solutions to the Schrödinger equation are also solutions of the Schrödinger equation. This follows from the fact that the Schrödinger equation is a linear differential equation in time and position. (WP)
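The linearity claim in that quotation can be spelled out in one line. As a sketch, assume two solutions ψ₁ and ψ₂ of the time-dependent Schrödinger equation with the same Hamiltonian Ĥ, and arbitrary complex constants a and b (the symbol names are chosen here for illustration):

```latex
i\hbar\,\partial_t \psi_1 = \hat{H}\psi_1, \qquad
i\hbar\,\partial_t \psi_2 = \hat{H}\psi_2
\;\Longrightarrow\;
i\hbar\,\partial_t\,(a\psi_1 + b\psi_2)
  = a\,\hat{H}\psi_1 + b\,\hat{H}\psi_2
  = \hat{H}\,(a\psi_1 + b\psi_2)
```

Both the time derivative and Ĥ are linear operators, so the combination aψ₁ + bψ₂ satisfies the same equation. Nothing in this step says anything about worlds; it only says that superpositions are solutions.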

Coherent superpositions are straightforwardly implied by the core mathematics of quantum mechanics. They are small scale in two senses: they can go down to the single particle level, and it is difficult to maintain large coherent superpositions even if you want to. They are also possibly observer dependent, reversible, and continue to interact (strictly speaking, interfere) after "splitting". The last point is particularly problematic, because if large scale coherent superpositions existed, that would create naked eye, macroscopic evidence, e.g. ghostly traces of a world where the Nazis won. All in all, a coherent superposition isn't a world you could live in.

I said complex coherent superpositions are difficult to maintain. What destroys them? Environmentally induced decoherence!

Interference phenomena are a well-known and crucial aspect of quantum mechanics, famously exemplified by the two-slit experiment. There are many situations, however, in which interference effects are artificially or spontaneously suppressed. The theory of decoherence is precisely the study of such situations. (SEP)

Decoherence tries to explain why we don't notice "quantum weirdness" in everyday life -- why the world of our experience is a more-or-less classical world. From the standpoint of decoherence, sure there might not be any objective fact about which slit an electron went through, but there is an objective fact about what you ate for breakfast this morning: the two situations are not the same!

The basic idea is that, as soon as the information encoded in a quantum state "leaks out" into the external world, that state will look locally like a classical state. In other words, as far as a local observer is concerned, there's no difference between a classical bit and a qubit that's become hopelessly entangled with the rest of the universe.

(http://scottaaronson.com/democritus)
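The "leaking out" idea in that quotation can be illustrated with a deliberately tiny model: one system qubit and a one-qubit "environment". Real decoherence involves enormous numbers of environmental degrees of freedom, so this is only a sketch under that simplifying assumption, and the function name is hypothetical. It computes the system's reduced density matrix by summing over the environment's states.

```python
from math import sqrt

def reduced_density_matrix(psi):
    """Trace out the environment qubit from a two-qubit pure state.

    psi[s][e] is the amplitude for system state s and environment state e.
    Returns the 2x2 density matrix of the system alone.
    """
    return [[sum(psi[i][e] * complex(psi[j][e]).conjugate() for e in range(2))
             for j in range(2)]
            for i in range(2)]

h = 1 / sqrt(2)
isolated = [[h, 0], [h, 0]]   # (|0> + |1>)|0> / sqrt(2): nothing has leaked out
entangled = [[h, 0], [0, h]]  # (|00> + |11>) / sqrt(2): fully entangled

rho_isolated = reduced_density_matrix(isolated)
rho_entangled = reduced_density_matrix(entangled)

# The off-diagonal entry carries the interference (coherence) terms.
print(abs(rho_isolated[0][1]))
print(abs(rho_entangled[0][1]))
```

In the isolated case the off-diagonal term is about 0.5; in the entangled case it is zero, leaving a matrix that is locally indistinguishable from a 50/50 classical bit. Note that the calculation itself is neutral on whether the other branch exists.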

Decoherence is the study of interactions between a quantum system (generally a very small number of microscopic particles like electrons, photons, atoms, molecules, etc. - often just a single particle) and the larger macroscopic environment, which is normally treated "classically," that is, by ignoring quantum effects, but which decoherence theorists study quantum mechanically. Decoherence theorists attribute the absence of macroscopic quantum effects like interference (which is a coherent process) to interactions between a quantum system and the larger macroscopic environment. (www.informationphilosopher.com)

Decoherent branches are necessarily large, since decoherence is a high level phenomenon. They are also stable, non interacting and irreversible... everything that would be intuitively expected of a "world". But there is no empirical evidence for them (in the plural), nor are they obviously supported by the core mathematics of quantum mechanics, the Schrödinger equation.

We have evidence of small scale coherent superposition, since a number of observed quantum effects depend on it, and we have evidence of decoherence, since complex superpositions are difficult to maintain. What we don't have evidence of is decoherence into multiple branches. From the theoretical perspective, decoherence is a complex, entropy like process which occurs when a complex system interacts with its environment. But without decoherence, MW doesn't match observation. So there is no theory of MW that is both simple and empirically adequate, contra Yudkowsky and Deutsch.

The original, Everettian, approach is based on coherence. (Yudkowsky says "Macroscopic decoherence, a.k.a. many-worlds, was first proposed in a 1957 paper by Hugh Everett III" ... but the paper doesn't mention decoherence.^[1]) As such, it fails to predict classical observations -- at all -- it fails to predict the appearance of a broadly classical universe. If everything is coherently superposed, so are observers... but the naturally expected experience of an observer in coherent superposition with themselves is that they function as a single observer making ambiguous, superposed observations ... not two observers each making an unambiguous, classical observation, and each unaware of the other. Such observers would only ever see superpositions of dead and living cats, etc.

(A popular but mistaken idea is that full splitting happens microscopically, at every elementary interaction. But that would make complex superpositions non-existent, whereas a number of instruments and technologies depend on them -- so it's empirically false.)

Later, post 1970s, many worlds theorists started to include decoherence to make the theory more empirically adequate, but inasmuch as it is additional structure, it places the simplicity of MWI in doubt. In the worst case, the complexity is SWE+decoherence+preferred basis, whereas in the best case, it's SWE alone, because decoherence is implicit in SWE, and preferred basis is implicit in decoherence. Decoherentists hope to show that the theory can be reduced to core QM, such as the Schrödinger equation, but it currently uses more complex math, the "reduced density matrix". The fact that this research is ongoing is strong evidence that the whole problem was not resolved by Everett's 1957 paper. In any case, without a single definitive mechanism of decoherence, there is no definitive answer to "how complex is MWI".

And single-universe decoherence is quite feasible. Decoherence adds something to many worlds, but many worlds doesn't add anything to decoherence.

So, coherent superpositions exist, but their components aren't worlds in any intuitive sense; and decoherent branches would be worlds in the intuitive sense, but decoherence isn't simple. Also, theoretically and observationally, decoherence could be a single world phenomenon. Those facts -- the fact that it doesn't necessarily involve multi way branching, and the fact that it is hard to evaluate its complexity because there is not a single satisfactory theory for it -- mean it is not a "slam dunk" in Yudkowsky's sense.

The Yudkowsky-Deutsch claim is that there is a single MW theory, which explains everything that needed explaining, and is obviously simpler than its rivals. But coherence doesn't save appearances, and decoherence, while more workable, is not known to be simple. So neither theory has both virtues.

[1] Which makes the term "Everett branch" rather confusing. The writer possibly means a decohered branch, under the mistaken assumption that Everett was talking about them. Everett's dissertation can be found here

My explanation was a bit confusing, sorry about that! I wasn’t intending for there to be an “original” Mary; she and everyone else only ever existed as a simulation. If we were to assume substrate independence, we’d be fine with saying that the denizens of Sim#1 are conscious.

I think the assumption the argument works from is that Consciousness Is Computation. The substrate independence of computation, which I don't doubt, doesn't prove anything about consciousness without that.

And while Sim#2 Mary is not a P-zombie to the alien, she very much is one to the people in Sim#2.

I guess you’re correct that the right terminology would be that she’s a C-zombie, but the people in the simulation can’t know that.

And since we can’t know for sure whether we ourselves are “really” physical, for all intents and purposes we can’t be sure that there is a distinction between P- and C- zombies.

It's about explanation. Dualism has more resources to explain consciousness than physicalism, which has more resources than computationalism, etc. That doesn't mean you should jump straight to the richest ontology, because that would be against Occam's Razor. What should you do? No one knows! But it is not an established fact that you can explain consciousness with algorithms alone.

Personally, I find physicalist theories of consciousness that don’t include substrate independence quite silly, but that’s a matter of taste, not a refutation.

Computationalism is a particular form of multiple realisability. Physicalism doesn't exclude it, or necessitate it. Other forms of multiple realisability are available.

My vague gesturing at an argument would be something like this: a brain in a vat is halfway between a physical person and a simulation of one

Err..why? A physical brain that happens to be in a vat is a physical brain, surely?

first by replacing each neuron with a chip, and then by replacing networks of chips with bigger chips running a network, and so on until the whole thing is a chip. Is it really the case that we’re losing the physics?

You are losing the specific physics. Computational substrate independence is a special case of substrate independence, but substrate independence in no case implies immateriality.

ETA

We used to think that the prediction of dark stars meant that Newton’s fact of gravity broke down when it came to light (which was true); but then again, we thought the same about the prediction of black holes and General Relativity. Nowadays, most physicists (probably) don’t believe that white holes exist, despite the fact that they’re just as predicted by GR as black holes, because they find the prospect absurd (in the absence of evidence).

You can be forced into a belief in counterintuitive conclusions by strong evidence or arguments ... and you should only believe them on the basis of strong evidence and arguments.

If substrate independence is true, we have no problem saying that Sim#1 Mary was conscious, and that everyone else is conscious in both Sim#1 and Sim#2. But, if we say that Sim#2 Mary is not conscious… then we have to grapple with the fact that she is a P-zombie.[6]

She is not exactly a p-zombie. The Mary in Sim #1 is not a p-zombie version of the original Mary, because she is only a functional duplicate, not a physical duplicate; and the Mary in Sim #2 is only a behavioural duplicate. So the question of "what difference explains the loss of consciousness" is easily answered -- all three are different.

And I don’t need to reinvent the wheel here, so I’ll just claim that belief in P-zombies is incoherent, and we don’t really have a good reason to say that she isn’t conscious. So Sim#2 Mary, a mere recording of Mary, must be cons– wait, what?

P-zombies aren't incoherent, they just contradict physicalism. And you are talking about c-zombies, anyway.

Physicalism has it that an exact atom-by-atom duplicate of a person will be a person and not a zombie, because there is no nonphysical element to go missing. That's the argument against p-zombies. But if it actually takes an atom-by-atom duplication to achieve human functioning, then the computational theory of mind will be false, because CTM implies that the same algorithm running on different hardware will be sufficient. Physicalism doesn't imply computationalism, and arguments against p-zombies don't imply the non existence of c-zombies -- unconscious duplicates that are identical computationally, but not physically.

I think this is folly. I think we’re engaging in a category error if we’re thinking of things this way — we’re not fully grappling with the consequences of substrate independence. Are the people in Sim#1 and Sim#2 conscious twice, like some kind of deja-vu they can’t experience? I really don’t think so.

There's no strong reason to think they are conscious once.

We say X is conscious if and only if there is such a thing as ⟨what it’s like to be X⟩. If when we run the automaton, we have reason to think that there is such a thing as what it’s like to be the simulated brain, but we also conclude that it shouldn’t matter whether or not you run the automaton

Something gets lost at each stage. Going from a physical embodiment to a computational simulation loses the physics; going from a computational simulation to a behavioural simulation loses the counterfactual possibilities of the computational simulation; going from a behavioural simulation that actually runs to a notional one loses actual occurrence. Any of those losses could affect consciousness.

I’ve come to a nearly delusional form of belief (it’s not like I’m exactly convinced), that isn’t even fully articulated here; I’ve come to really think that this whole thing is quite bogus, that there really is no difference between realism and solipsism and nihilism and a strange kind of theism

That should be taken as a reductio ad absurdum of the GAZP.

But you see the importance of the question, “How far can you generalize the Anti-Zombie Argument and have it still be valid?”

Clearly, the answer isn't "indefinitely".

@JBlack The problem with Dust theory is that it assumes that conscious states supervene on brain states instantaneously. There is no evidence for that. We should not be fooled by the "specious present". We seem to be conscious moment-by-moment, but the "moments" in question are rather coarse-grained, corresponding to the specious present of 0.025-0.25 second or so. It's quite compatible with the phenomenology that it requires thousands or millions of neural events or processing steps to achieve a subjective "instant" of consciousness. Which would mean you can't salami-slice someone's stream-of-consciousness too much without it vanishing; it would also mean that spontaneously occurring Boltzmann states are not conscious; and it also preserves the intuition that computation is a process -- that a computational state is defined as being a stage of a computation.

philosophy historically did not have the right tools to solve the problems. Theoretical computer science, and AI theory in particular, is a revolutionary method to reframe philosophical problems in a way that finally makes them tractable.

Theoretical computer science can tell you that you are not implementing some kind of perfect algorithm, because they tend not to be computable. It can't tell you what you should be implementing instead.

Naturalised ethics has been around for ages. It tends to tell you that de facto human ethics is an evolutionary kludge, not something mathematically clean.

The open question (https://en.wikipedia.org/wiki/Open-question_argument), the question of what the true ethics would be, is still open. Examining the de facto operation of the brain isn't going to answer it.

About “metaethics” vs “decision theory”, that strikes me as a wrong way of decomposing the problem. We need to create a theory of agents. Such a theory naturally speaks both about values and decision making, and it’s not really possible to cleanly separate the two. It’s not very meaningful to talk about “values” without looking at what function the values do inside the mind of an agent.

Even if you need to at least address values and decision theory, it doesn't follow that that's all you need. Something can be a truth without being the whole truth.

If you only look within the minds of agents, you are missing interactions between agents. Looking inwards excludes looking outwards.

Just as you can't understand money by microscopically examining coins and banknotes, you can't understand ethics just by homing in on internal psychological processes.

If you only look within the minds of agents, and only consider values and decision theory, you are likely to end up with something like ethical egoism ... not because it is true, but because you haven't even considered alternatives.

Humans already follow their actual Values, and will always do because their Values are the reason they do anything at all.

But I don't see how that says anything about ethics. Merely wanting to do something doesn't make it ethical, and being ethical need not make something intrinsically motivating. Extrinsic motivations, rewards and punishments, are ubiquitous ... unless you're on a desert island. So it's not a case of everyone always following their intrinsic motivations, and if it were, that's still on the "is" side of the is-ought divide.

It’s not very meaningful to talk about “decisions” without looking at the purpose of decisions.

It's not very meaningful to talk about ethics without looking at the purpose of ethics. Is ethics really just values, and nothing else? Is it really just decision making, like any other kind? Does it actually have no distinguishing characteristics?

First, “ethics” is a confusing term because, on my view, the colloquial meaning of “ethics” is inescapably intertwined with how human societies negotiate over norms. On the other hand, I want to talk purely about individual preferences, since I view it as more fundamental.

Fundamental to what? Ethics? Even if ethical behaviour is made of individual decisions, that doesn't mean it reduces to individual decisions made atomistically, without regard to social mores or other people's concerns.

The three word theory is that "Ethics is Values". That leaves a number of unanswered questions, such as: why is it all about me? Are all values relevant? Do I have the right to put someone in jail merely for going against my values?

It's prima facie unlikely that such a simple theory solves all the age old problems (at the least, it would require the supplementary assumption that values are hard to understand in themselves, in order to explain the persistence of ethical and metaethical puzzles). And it is easy to see the flaws.

The one thing that the three word theory is supremely good at explaining is motivation. Your values are what motivate you, so if your values are also your morals, you can't fail to be motivated by morality.

Is it all about me? Rationalists typically argue the case for the three word theory by asking the rhetorical question whether you would support an ethical system that had nothing to do with your wishes. That's a none/some/all confusion. I want ethics to have something to do with me, but that does not make it all about me, or mean all values are equally ethical.

For one thing, people can have preferences that are intuitively immoral. If a psychopath wants to murder, that does not make murder moral.

For another, values can conflict. Not all values conflict, but where they do, the three word theory doesn't tell you who wins or loses. If morality is (are) seven billion utility functions, then a legal system will be a poor match for it (them).

Not all decisions are individual. There's a whole set of questions about whether societal actions are justified, whether societies have rights over individuals, and so on ... which aren't answered by the simplistic three word theory.

For instance, societies have systems of punishment and reward, which, hopefully, have an ethical basis. Putting people in jail is just wanton cruelty if they have done nothing wrong. But if ethics just "is" subjective value, and values vary, as they obviously do, who lands in jail? It's easy enough to say the murderer and the thief, and to justify that by saying that murder and theft are against people's widely shared preferences ... but remember that the three word theory is "flat", and treats all values the same. Should the vanilla lover or the tutti frutti lover, the little endian or the big endian go to jail, if others don't share their preferences? Voting allows you to decide the issue, but it is not enough to justify it, because merely having a minority preference is not a crime.

One can go further and argue that such societal issues are the essence of ethics. If we consider the case of someone who is alone on a desert island, they have no need of the core rules of common-sense morality: no need of rules against murder because there is no one to murder, no need of rules against theft because there is no one to steal from, and so on ... in their situation ethics isn't even definable.

You can't solve philosophy without solving epistemology. And you can't solve epistemology because of the Problem of the Criterion, which is pretty much the same as the Münchhausen Trilemma.

"Moreover, its [philosophy's] central tool is intuition, and this displays a near-total ignorance of how brains work. As Michael Vassar observes, philosophers are "spectacularly bad" at understanding that their intuitions are generated by cognitive algorithms." -- Rob Bensinger, Philosophy, a diseased discipline.

What's the problem?

It's not that philosophers weirdly and unreasonably prefer intuition to empirical facts and mathematical/logical reasoning, it is that those things either don't go far enough, or are themselves based on intuition.

"Just use empiricism" doesn't work, because philosophy is about interpreting empirical data.

"Just use maths/logic" doesn't work, because those things are based on axioms justified by intuitive appeal.

"Just use reductionism" doesn't work, because it's not clear what lies at the bottom of the stack, or if anything does. Logic, epistemology and ontology have been held to be First Philosophy at different times. Logic, epistemology and ontology also seem to interact. Correct ontology depends on correct epistemology ... but what minds are capable of knowing depends on ontology. Logic possibly depends on ontology too, since quantum mechanics arguably challenges traditional bivalent logic.

Philosophers don't embrace intuitions because they think they are particularly reliable, but because they have reasoned that they can't do without them. (At least, the other options allowed by the Münchhausen trilemma, circularity and regress, are distinctly unattractive.) That is the essence of the Inconvenient Ineradicability of Intuition. An unfounded foundation is what philosophers mean by "intuition" ... and not a supernatural insight that could not have been produced by a material brain.

Humans haven't figured out metaethics well enough to show that moral realism is true. So there is a probability, not a certainty, that an AI will realise moral truths. The argument also requires the AI to be motivated by the truths it discovers, and it requires preserving human life to be an objective moral imperative. The latter point isn't obvious -- there's a standard Sci Fi plot where a powerful AI is tasked with solving the world's problems, and decides humans are the problem. Three uncertain premises have to be true simultaneously, so moral realism is far from a surefire solution to AI safety.

For these reasons, AI safety theorists focus on friendliness, the preservation of humans, as a direct goal, rather than objective goodness.


The Problems.

Coherence versus Decoherence