R:A-Z Glossary — LessWrong

This is a list of brief explanations and definitions for terms that Eliezer Yudkowsky uses in the book Rationality: From AI to Zombies, an edited version of the Sequences.

The glossary is a community effort, and you're welcome to improve on the entries here, or add new ones. See the Talk page for some ideas for unwritten entries.

A

a priori. Before considering the evidence. Similarly, "a posteriori" means "after considering the evidence"; compare prior and posterior probabilities.

In philosophy, "a priori" often refers to the stronger idea of something knowable in the absence of any experiential evidence (outside of the evidence needed to understand the claim).

affect heuristic. People's general tendency to reason based on things' felt goodness or badness.

affective death spiral. Yudkowsky's term for a halo effect that perpetuates and exacerbates itself over time.

AGI. See “artificial general intelligence.”

AI-Box Experiment. A demonstration by Yudkowsky that people tend to overestimate how hard it is to manipulate people, and therefore underestimate the risk of building an Unfriendly AI that can only interact with its environment by verbally communicating with its programmers. One participant role-plays an AI, while another role-plays a human whose job it is to interact with the AI without voluntarily releasing the AI from its “box”. Yudkowsky and a few other people who have role-played the AI have succeeded in getting the human supervisor to agree to release them, which suggests that a superhuman intelligence would have an even easier time escaping.

akrasia.

alien god. One of Yudkowsky's pet names for natural selection.

ambiguity aversion. Preferring options whose probabilities are known over options whose probabilities are unknown, even when the latter may offer larger gains.

amplitude. A quantity in a configuration space, represented by a complex number. Many sources misleadingly refer to quantum amplitudes as "probability amplitudes", even though they aren't probabilities. Amplitudes are physical, not abstract or formal. The complex number’s modulus squared (i.e., its absolute value multiplied by itself) yields the Born probabilities, but the reason for this is unknown.

amplitude distribution. See “wavefunction.”

anchoring. The cognitive bias of relying excessively on initial information after receiving relevant new information.

anthropics. Problems related to reasoning well about how many observers like you there are.

artificial general intelligence. Artificial intelligence that is "general-purpose" in the same sense that human reasoning is general-purpose. It's hard to crisply state what this kind of reasoning consists in—if we knew how to fully formalize it, we would already know how to build artificial general intelligence. However, we can gesture at (e.g.) humans' ability to excel in many different scientific fields, even though we did not evolve in an ancestral environment containing particle accelerators.

Aumann's Agreement Theorem.

availability heuristic. The tendency to base judgments on how easily relevant examples come to mind.

average utilitarianism.

B

backward chaining.

base rate.

Bayes's Theorem. The equation stating how to update a hypothesis H in light of new evidence E. In its simplest form, Bayes's Theorem says that a hypothesis' probability given the evidence, written P(H|E), equals the likelihood of the evidence given that hypothesis, multiplied by your prior probability P(H) that the hypothesis was true, divided by the prior probability P(E) that you would see that evidence regardless. I.e.:

P(H|E) = P(E|H) P(H) / P(E).

Also known as Bayes's Rule. See "odds ratio" for a simpler way to calculate a Bayesian update.
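
As a rough illustration of the formula above, the following Python sketch computes P(H|E) from a prior and two likelihoods. The function name `bayes_posterior` and the example numbers are assumptions made for illustration; they do not come from the book.

```python
def bayes_posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Return P(H|E) using Bayes's Theorem.

    P(E) is expanded with the law of total probability:
    P(E) = P(E|H) P(H) + P(E|not-H) P(not-H).
    """
    p_e = (likelihood_e_given_h * prior_h
           + likelihood_e_given_not_h * (1 - prior_h))
    return likelihood_e_given_h * prior_h / p_e


# Hypothetical numbers: a 1% prior, and evidence that is 80% likely if H is
# true but only 10% likely if H is false.
posterior = bayes_posterior(0.01, 0.8, 0.1)
print(round(posterior, 4))
```

Even fairly strong evidence leaves the posterior well below 50% here, because the prior was so low.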

Bayesian. (a) Optimally reasoned; reasoned in accordance with the laws of probability. (b) An optimal reasoner, or a reasoner that approximates optimal inference unusually well. (c) Someone who treats beliefs as probabilistic and treats probability theory as a relevant ideal for evaluating reasoners. (d) Related to probabilistic belief. (e) Related to Bayesian statistical methods.

Bayesian updating. Revising your beliefs in a way that's fully consistent with the information available to you. Perfect Bayesian updating is wildly intractable in realistic environments, so real-world agents have to rely on imperfect heuristics to get by. As an optimality condition, however, Bayesian updating helps make sense of the idea that some ways of changing one's mind work better than others for learning about the world.

beisutsukai. Japanese for "Bayes user." A fictional order of high-level rationalists, also known as the Bayesian Conspiracy.

Bell's Theorem.

Berkeleian idealism. The belief, espoused by George Berkeley, that things only exist in various minds (including the mind of God).

bias. (a) A cognitive bias. In Rationality: From AI to Zombies, this will be the default meaning. (b) A statistical bias. (c) An inductive bias. (d) Colloquially: prejudice or unfairness.

bit. (a) A binary digit, taking the value 0 or 1. (b) The logarithm (base 1/2) of a probability—the maximum information that can be communicated using a binary digit, averaged over the digit's states. Rationality: From AI to Zombies usually uses "bit" in the latter sense.

black box. Any process whose inner workings are mysterious or poorly understood.

Black Swan.

blind god. One of Yudkowsky's pet names for natural selection.

Blue and Green. Rival sports teams and political factions in ancient Rome.

Born rule.

C

calibration. Assigning probabilities to beliefs in a way that matches how often those beliefs turn out to be right. E.g., if your assignment of "70% confidence" to claims is well-calibrated, then you will get such claims right about 70% of the time.
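
One way to check calibration in practice is to group past claims by stated confidence and compare each group's confidence with how often its claims came true. The sketch below is a minimal version of that idea; the function name `calibration_table` and the sample record are hypothetical.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Map each stated probability to the observed frequency of true claims.

    `forecasts` is an iterable of (stated_probability, came_true) pairs.
    """
    outcomes = defaultdict(list)
    for stated, came_true in forecasts:
        outcomes[stated].append(came_true)
    return {stated: sum(hits) / len(hits) for stated, hits in outcomes.items()}


# Hypothetical record: ten claims made at 70% confidence, seven of them true.
record = [(0.7, True)] * 7 + [(0.7, False)] * 3
table = calibration_table(record)
print(table)
```

A well-calibrated forecaster's table has observed frequencies close to the stated probabilities.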

causal decision theory. The theory that the right way to make decisions is by picking the action with the best causal consequences.

causal graph. A directed acyclic graph in which an arrow going from node A to node B is interpreted as "changes in A can directly cause changes in B."

cognitive bias. A systematic error stemming from the way human reasoning works. This can be contrasted with errors due to ordinary ignorance, misinformation, brain damage, etc.

collapse.

comparative advantage. An ability to produce something at a lower cost than some other actor could. This is not the same as having an absolute advantage over someone: you may be a better cook than someone across the board, but that person will still have a comparative advantage over you at cooking some dishes. This is because your cooking skills make your time more valuable; the worse cook may have a comparative advantage at baking bread, for example, since it doesn’t cost them much to spend a lot of time on baking, whereas you could be spending that time creating a large number of high-quality dishes. Baking bread is more costly for the good cook than for the bad cook because the good cook is paying a larger opportunity cost, i.e., is giving up more valuable opportunities to be doing other things.

complex. (a) Colloquially, something with many parts arranged in a relatively specific way. (b) In information theory, something that's relatively hard to formally specify and that thereby gets a larger penalty under Occam's razor; measures of this kind of complexity include Kolmogorov complexity. (c) Complex-valued, i.e., represented by the sum of a real number and an imaginary number.

conditional independence.

conditional probability. The probability that a statement is true on the assumption that some other statement is true. E.g., the conditional probability P(A|B) means "the probability of A given that B is true."

configuration space.

confirmation bias. The cognitive bias of giving more weight to evidence that agrees with one's current beliefs.

conjunction. A sentence that asserts multiple things. "It's raining and I'm eating a sandwich" is a conjunction; its conjuncts are "It's raining" and "I'm eating a sandwich."

conjunction fallacy. The fallacy of treating a conjunction as though it were more likely than its conjuncts.

consequentialism. (a) The ethical theory that the moral rightness of actions depends only on what outcomes result. Consequentialism is normally contrasted with ideas like deontology, which says that morality is about following certain rules (e.g., "don't lie") regardless of the consequences. (b) Yudkowsky's term for any reasoning process that selects actions based on their consequences.

Copenhagen Interpretation.

correspondence bias. Drawing conclusions about someone's unique disposition from behavior that can be entirely explained by the situation in which it occurs. When we see someone else kick a vending machine, we think they are "an angry person," but when we kick the vending machine, it's because the bus was late, the train was early, and the machine ate our money.

Cox's Theorem.

cryonics. The low-temperature preservation of brains. Cryonics proponents argue that cryonics should see more routine use for people whose respiration and blood circulation have recently stopped (i.e., people who qualify as clinically deceased), on the grounds that future medical technology may be able to revive such people.

D

de novo. Entirely new; produced from scratch.

decibel.

decision theory. (a) The mathematical study of correct decision-making in general, abstracted from an agent's particular beliefs, goals, or capabilities. (b) A well-defined general-purpose procedure for arriving at decisions, e.g., causal decision theory.

decoherence.

deontology. The theory that moral conduct is about choosing actions that satisfy specific rules like "don't lie" or "don't steal."

directed acyclic graph. A graph that is directed (its edges have a direction associated with them) and acyclic (there's no way to follow a sequence of edges in a given direction to loop around from a node back to itself).
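
The "acyclic" condition can be checked mechanically. The following sketch uses a depth-first search that reports a cycle whenever it reaches a node that is still being explored. The adjacency-dictionary representation and the name `is_acyclic` are assumptions made for this example.

```python
def is_acyclic(edges):
    """Return True if the directed graph has no cycle.

    `edges` maps each node to the list of nodes its arrows point to.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for successor in edges.get(node, []):
            successor_state = state.get(successor, UNSEEN)
            if successor_state == IN_PROGRESS:
                return False  # followed arrows back to a node on the current path
            if successor_state == UNSEEN and not visit(successor):
                return False
        state[node] = DONE
        return True

    return all(state.get(node, UNSEEN) != UNSEEN or visit(node)
               for node in list(edges))


print(is_acyclic({"A": ["B"], "B": ["C"], "C": []}))  # a chain
print(is_acyclic({"A": ["B"], "B": ["A"]}))           # a two-node loop
```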

dukkha.

Dutch book.

E

edge. See “graph.”

élan vital. "Vital force." A term coined in 1907 by the philosopher Henri Bergson to refer to a mysterious force that was held to be responsible for life's "aliveness" and goal-oriented behavior.

entanglement. (a) Causal correlation between two things. (b) In quantum physics, the mutual dependence of two particles' states upon one another. Entanglement in sense (b) occurs when a quantum amplitude distribution cannot be factorized.

entropy. (a) In thermodynamics, the number of different ways a physical state may be produced (its Boltzmann entropy). E.g., a slightly shuffled deck has lower entropy than a fully shuffled one, because there are many more configurations a fully shuffled deck is likely to end up in. (b) In information theory, the expected value of the information contained in a message (its Shannon entropy). That is, a random variable’s Shannon entropy is how many bits of information one would be missing (on average) if one did not know the variable’s value.

Boltzmann entropy and Shannon entropy have turned out to be equivalent; that is, a system’s thermodynamic disorder corresponds to the number of bits needed to fully characterize it.
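
To make the information-theoretic sense concrete, here is a small Python sketch of Shannon entropy for a discrete distribution, measured in bits. The function name and the two example coins are illustrative assumptions.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits: the expected value of -log2(p)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])    # maximally uncertain two-way outcome
biased_coin = shannon_entropy([0.9, 0.1])  # more predictable, so fewer bits
print(fair_coin, biased_coin)
```

The fair coin carries exactly one bit per flip; the biased coin carries less, because its outcome is easier to guess in advance.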

epistemic. Concerning knowledge.

epistemology. (a) A world-view or approach to forming beliefs. (b) The study of knowledge.

eudaimonia.

Eurisko.

eutopia. Yudkowsky’s term for a utopia that’s actually nice to live in, as opposed to one that’s unpleasant or unfeasible.

Everett branch. A "world" in the many-worlds interpretation of quantum mechanics.

existential risk. Something that threatens to permanently and drastically reduce the value of the future, such as stable global totalitarianism or human extinction.

expected utility. The expected value of a utility function given some action. Roughly: how much an agent’s goals will tend to be satisfied by some action, given uncertainty about the action's outcome.

A sure $1 will usually lead to more utility than a 10% chance of $1 million. Yet in all cases, the 10% shot at $1 million has more expected utility, assuming you assign more than ten times as much utility to winning $1 million. Expected utility is an idealized mathematical framework for making sense of the idea "good bets don't have to be sure bets."
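
The comparison above can be worked through numerically. This sketch assumes, purely for illustration, that utility is linear in dollars; the function name `expected_value` is also an assumption.

```python
def expected_value(outcomes):
    """Sum of value * probability over (probability, value) pairs."""
    return sum(probability * value for probability, value in outcomes)


# Assumption for illustration: one dollar is worth one unit of utility.
sure_dollar = expected_value([(1.0, 1.0)])
long_shot = expected_value([(0.1, 1_000_000.0), (0.9, 0.0)])
print(sure_dollar, long_shot)
```

The long shot pays nothing nine times out of ten, yet its expected utility is far higher than that of the sure dollar.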

expected value. The sum of all possible values of a variable, each multiplied by its probability of being the true value.

F

FAI. See “friendly AI.”

falsificationism.

Fermi paradox. The puzzle of reconciling "on priors, we should expect there to be many large interstellar civilizations visible in the night sky" and "we see no clear signs of such civilizations."

Some reasons many people find it puzzling that there are no visible alien civilizations include: "the elements required for life on Earth seem commonplace"; "life had billions of years to develop elsewhere before we evolved"; "high intelligence seems relatively easy to evolve (e.g., many of the same cognitive abilities evolved independently in humans, octopuses, crows)"; and "although some goals favor hiddenness, many different possible goals favor large-scale extraction of resources, and we only require there to exist one old species of the latter type."

fitness. See “inclusive fitness.”

foozality. See "rationality."

frequentism. (a) The view that the Bayesian approach to probability—i.e., treating probabilities as belief states—is unduly subjective. Frequentists instead propose treating probabilities as frequencies of events. (b) Frequentist statistical methods.

Friendly AI. Artificial general intelligence systems that are safe and useful. "Friendly" is a deliberately informal descriptor, intended to signpost that "Friendliness" still has very little technical content and needs to be further developed. Although this remains true in many respects as of this writing (2018), Friendly AI research has become much more formally developed since Yudkowsky coined the term "Friendly AI" in 2001, and the research area is now more often called "AI alignment research."

Fun Theory.

G

graph. In graph theory, a mathematical object consisting of simple atomic objects ("vertices," or "nodes") connected by lines (or "edges"). When edges have an associated direction, they are also called "arrows."

gray goo.

Gricean implication.

group selection. Natural selection at the level of groups, as opposed to individuals. Historically, group selection used to be viewed as a more central and common part of evolution—evolution was thought to frequently favor self-sacrifice "for the good of the species."

H

halo effect. The tendency to assume that something good in one respect must be good in other respects.

halting oracle. An abstract agent that is stipulated to be able to reliably answer questions that no algorithm can reliably answer. Though it is provably impossible for finite rule-following systems (e.g., Turing machines) to answer certain questions (e.g., the halting problem), it can still be mathematically useful to consider the logical implications of scenarios in which we could access answers to those questions.

happy death spiral. See “affective death spiral.”

hedonic. Concerning pleasure.

heuristic. An imperfect method for achieving some goal. A useful approximation. Cognitive heuristics are innate, humanly universal brain heuristics.

hindsight bias. The tendency to exaggerate how well one could have predicted things that one currently believes.

humility. Not being arrogant or overconfident. Yudkowsky defines humility as "taking specific actions in anticipation of your own errors." He contrasts this with "modesty," which he views as a social posture for winning others' approval or esteem, rather than as a form of epistemic humility.

I

inclusive fitness. The degree to which a gene causes more copies of itself to exist in the next generation. Inclusive fitness is the property propagated by natural selection. Unlike individual fitness, which is a specific organism’s tendency to promote more copies of its genes, inclusive fitness is held by the genes themselves. Inclusive fitness can sometimes be increased at the expense of the individual organism’s overall fitness.

inductive bias. The set of assumptions a learner uses to derive predictions from a data set. The learner is "biased" in the sense that it's more likely to update in some directions than in others, but unlike with other conceptions of "bias", the idea of "inductive bias" doesn't imply any sort of error.

instrumental. Concerning usefulness or effectiveness.

instrumental value. A goal that is only pursued in order to further some other goal.

intelligence explosion. A scenario in which AI systems rapidly improve in cognitive ability because they see fast, consistent, sustained returns on investing work into such improvement. This could happen via AI systems using their intelligence to rewrite their own code, improve their hardware, or acquire more hardware, then leveraging their improved capabilities to find more ways to improve.

intentionality. The ability of things to represent, or refer to, other things. Not to be confused with "intent."

isomorphism. A two-way mapping between objects in a category. Informally, two things are often called "isomorphic" if they're identical in every relevant respect.

Iterated Prisoner’s Dilemma. A series of Prisoner’s Dilemmas between the same two players. Because players can punish each other for defecting on previous rounds, they will usually have more reason to cooperate than in the one-shot Prisoner’s Dilemma.

J

joint probability distribution. A probability distribution that assigns probabilities to combinations of claims. E.g., if the claims in question are "Is it cold?" and "Is it raining?", a joint probability distribution could assign probabilities to "it's cold and rainy," "it's cold and not rainy," "it's not cold but is rainy," and "it's neither cold nor rainy."

just-world fallacy. The cognitive bias of systematically overestimating how much reward people get for good deeds, and how much punishment they get for bad deeds.

K

koan. In Zen Buddhism, a short story or riddle aimed at helping the hearer break through various preconceptions.

Kolmogorov complexity. A formalization of the idea of complexity. Given a programming language, a computable string's Kolmogorov complexity is the length of the shortest computer program in that language that outputs the string.

L

likelihood. In Bayesian probability theory, how much probability a hypothesis assigns to a piece of evidence. Suppose we observe the evidence E = "Mr. Boddy was knifed," and our hypotheses are H_P = "Professor Plum killed Boddy" and H_W = "Mrs. White killed Boddy." If we think there's a 25% chance that Plum would use a knife in the worlds where he chose to kill Boddy, then we can say H_P assigns a likelihood of 25% to E.

Suppose that there's only a 5% chance Mrs. White would use a knife if she killed Boddy. Then we can say that the likelihood ratio between H_P and H_W is 25/5 = 5. This means that the evidence supports "Plum did it" five times as strongly as it supports "White did it," which tells us how to update upon observing E. (See "odds ratio" for a simple example.)

M

magisterium. Stephen Gould’s term for a domain where some community or field has authority. Gould claimed that science and religion were separate and non-overlapping magisteria. On his view, religion has authority to answer questions of "ultimate meaning and moral value" (but not empirical fact) and science has authority to answer questions of empirical fact (but not meaning or value).

many-worlds interpretation. The idea that the basic posits in quantum physics (complex-valued amplitudes) are objectively real and consistently evolve according to the Schrödinger equation. Opposed to anti-realist and collapse interpretations. Many-worlds holds that the classical world we seem to inhabit at any given time is a small component of an ever-branching amplitude.

The "worlds" of the many-worlds interpretation are not discrete or fundamental to the theory. Speaking of "many worlds" is, rather, a way of gesturing at the idea that the ordinary objects of our experience are part of a much larger whole that contains enormously many similar objects.

map and territory. A metaphor for the relationship between beliefs (or other mental states) and the real-world things they purport to refer to.

materialism. The belief that all mental phenomena can in principle be reduced to physical phenomena.

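maximum-entropy probability distribution. The probability distribution that assumes the least beyond what is known. When nothing is known except a finite set of possible outcomes, this is the uniform distribution, which assigns equal probability to every outcome.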

Maxwell’s Demon. A hypothetical agent that knows the location and speed of individual molecules in a gas. James Maxwell used this demon in a thought experiment to show that such knowledge could decrease a physical system’s entropy, “in contradiction to the second law of thermodynamics.” The demon’s ability to identify faster molecules allows it to gather them together and extract useful work from them. Leó Szilárd later pointed out that if the demon itself were considered part of the thermodynamic system, then the entropy of the whole would not decrease. The decrease in entropy of the gas would require an increase in the demon’s entropy. Szilárd used this insight to simplify Maxwell’s scenario into a hypothetical engine that extracts work from a single gas particle. Using one bit of information about the particle (e.g., whether it’s in the top half of a box or the bottom half), a Szilárd engine can generate up to kT ln(2) joules of energy, where T is the system’s temperature and k is Boltzmann’s constant.
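
For a sense of scale, the standard bound on the work a Szilárd engine can extract from one bit is kT ln 2. The sketch below evaluates it at 300 K, a room-temperature value chosen only for illustration; the variable names are likewise assumptions.

```python
import math

BOLTZMANN_CONSTANT = 1.380649e-23  # joules per kelvin (exact SI value)
temperature_kelvin = 300.0         # assumed room temperature

# Maximum work extractable from one bit of information: k * T * ln(2).
work_per_bit = BOLTZMANN_CONSTANT * temperature_kelvin * math.log(2)
print(f"{work_per_bit:.3e} J")
```

The result is on the order of 10^-21 joules, which is why the effect is negligible at everyday scales.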

meta level. A domain that is more abstract or derivative than the object level.

metaethics. A theory about what it means for ethical statements to be correct, or the study of such theories. Whereas applied ethics speaks to questions like "Is murder wrong?" and "How can we reduce the number of murders?", metaethics speaks to questions like "What does it mean for something to be wrong?" and "How can we generally distinguish right from wrong?"

Mind Projection Fallacy.

Minimum Message Length Principle. A formalization of Occam’s Razor that judges the probability of a hypothesis based on how long it would take to communicate the hypothesis plus the available data. Simpler hypotheses are favored, as are hypotheses that can be used to concisely encode the data.

modesty. Yudkowsky's term for the social impulse to appear deferential or self-effacing, and resultant behaviors. Yudkowsky contrasts this with the epistemic virtue of humility.

monotonicity. Roughly, the property of never reversing direction. A monotonic function is any function between ordered sets that either preserves the order, or completely flips it. A non-monotonic function f, then, is one that reverses the order of at least one pair of inputs (a<b but f(a)>f(b)) and preserves the order of at least one other pair (c<d and f(c)<f(d)).

A monotonic logic is one that will always continue to assert something as true if it ever asserted it as true. If "2+2=4" is proved, then in a monotonic logic no subsequent operation can make it impossible to derive that theorem again in the future. In contrast, non-monotonic logics can "forget" past conclusions and lose the ability to derive them.

Moore’s Law. A 1965 observation and prediction by Intel co-founder Gordon Moore: roughly every two years (originally every one year), engineers are able to double the number of transistors that can be fit on an integrated circuit. This projection held true into the 2010s. Other versions of this "law" consider other progress metrics for computing hardware.

motivated cognition. Reasoning that is driven by some goal or emotion that's at odds with accuracy. Examples include non-evidence-based inclinations to reject a claim (motivated skepticism), to believe a claim (motivated credulity), to continue evaluating an issue (motivated continuation), or to stop evaluating an issue (motivated stopping).

Murphy’s law. The saying “Anything that can go wrong will go wrong.”

mutual information. For two variables, the amount that knowing about one variable tells you about the other's value. If two variables have zero mutual information, then they are independent; knowing the value of one does nothing to reduce uncertainty about the other.

N

nanotechnology. (a) Fine-grained control of matter on the scale of individual atoms, as in Eric Drexler's writing. This is the default meaning in Rationality: From AI to Zombies. (b) Manipulation of matter on a scale of nanometers.

Nash equilibrium. A situation in which no individual would benefit by changing their own strategy, assuming the other players retain their strategies. Agents often converge on Nash equilibria in the real world, even when they would be much better off if multiple agents simultaneously switched strategies. For example, mutual defection is the only Nash equilibrium in the standard one-shot Prisoner’s Dilemma (i.e., it is the only option such that neither player could benefit by changing strategies while the other player’s strategy is held constant), even though it is not Pareto-optimal (i.e., each player would be better off if the group behaved differently).
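
The claim that mutual defection is the only Nash equilibrium in the one-shot Prisoner’s Dilemma can be checked by brute force. The payoff numbers below are hypothetical; any payoffs with the same ordering (lone defector best, mutual cooperation second, mutual defection third, lone cooperator worst) would give the same result.

```python
from itertools import product

MOVES = ("C", "D")  # cooperate, defect

# Hypothetical payoffs as (row player, column player); higher is better.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def is_nash_equilibrium(row_move, col_move):
    """True if neither player gains by changing only their own move."""
    row_payoff, col_payoff = PAYOFFS[(row_move, col_move)]
    row_can_improve = any(PAYOFFS[(alt, col_move)][0] > row_payoff for alt in MOVES)
    col_can_improve = any(PAYOFFS[(row_move, alt)][1] > col_payoff for alt in MOVES)
    return not (row_can_improve or col_can_improve)


equilibria = [pair for pair in product(MOVES, repeat=2) if is_nash_equilibrium(*pair)]
print(equilibria)
```

Mutual cooperation pays both players more than mutual defection, but it is not an equilibrium because each player could do better still by defecting alone.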

negentropy. Negative entropy. A useful concept because it allows one to think of thermodynamic regularity as a limited resource one can possess and make use of, rather than as a mere absence of entropy.

Newcomb’s Problem. A central problem in decision theory. Imagine an agent that understands psychology well enough to predict your decisions in advance, and decides to either fill two boxes with money, or fill one box, based on their prediction. They put $1,000 in a transparent box no matter what, and they then put $1 million in an opaque box if (and only if) they predicted that you’d only take the opaque box. The predictor tells you about this, and then leaves. Which do you pick?

If you take both boxes ("two-boxing"), you get only the $1,000, because the predictor foresaw your choice and didn’t fill the opaque box. On the other hand, if you only take the opaque box, you come away with $1 million. So it seems like you should take only the opaque box.

However, causal decision theorists object to this strategy on the grounds that you can’t causally control what the predictor did in the past; the predictor has already made their decision by the time you make yours, and regardless of whether or not they placed the $1 million in the opaque box, you’ll be throwing away a free $1,000 if you choose not to take it. For the same reason, causal decision theory prescribes defecting in one-shot Prisoner’s Dilemmas, even if you’re playing against a perfect atom-by-atom copy of yourself.

nonmonotonic logic. See “monotonicity.”

normalization. Adjusting values to meet some common standard or constraint, often by adding a constant to each value or multiplying each value by a constant. E.g., adjusting the probabilities of hypotheses to sum to 1 again after eliminating some hypotheses. If the only three possibilities are A, B, and C, each with probability 1/3, then evidence that ruled out C (and didn’t affect the relative probability of A and B) would leave us with A at 1/3 and B at 1/3. These values must be adjusted (normalized) to make the space of hypotheses sum to 1, so A and B change to probability 1/2 each.
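
The A, B, C example can be reproduced in a few lines. This sketch uses exact fractions to avoid rounding; the function name `normalize` is an assumption made for illustration.

```python
from fractions import Fraction


def normalize(weights):
    """Rescale the values so they sum to 1, preserving their ratios."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# C has been ruled out, leaving A and B at their old probabilities of 1/3.
remaining = {"A": Fraction(1, 3), "B": Fraction(1, 3)}
normalized = normalize(remaining)
print(normalized)
```

Dividing by the total is the multiplicative form of normalization: every remaining value is multiplied by the same constant, here 3/2.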

normative. Good, or serving as a standard for desirable behavior.

NP-complete. The hardest class of decision problems within the class NP, where NP consists of the problems that an ideal computer (specifically, a deterministic Turing machine) could efficiently verify correct answers to. The difficulty of NP-complete problems is such that if an algorithm were discovered to efficiently solve even one NP-complete problem, that algorithm would allow one to efficiently solve every NP problem. Many computer scientists hypothesize that this is impossible, a conjecture called “P ≠ NP.”

null-op. A null operation; an action that does nothing in particular.

O

object level. The level of concrete things, as contrasted with the "meta" level. The object level tends to be a base case or starting point, while the meta level is comparatively abstract, recursive, or indirect in relevance.

Occam’s Razor. The principle that, all else being equal, a simpler claim is more probable than a relatively complicated one. Formalizations of Occam’s Razor include Solomonoff induction and the Minimum Message Length Principle.

odds ratio. A way of representing how likely two events are relative to each other. E.g., if I have no information about which day of the week it is, the odds are 1:6 that it’s Sunday. This is the same as saying that "it's Sunday" has a prior probability of 1/7. If x:y is the odds ratio, the probability of x is x / (x + y).

Likewise, to convert a probability p into an odds ratio, I can just write p : (1 - p). For a percent probability p%, this becomes p : (100 - p). E.g., if my probability of winning a race is 40%, my odds are 40:60, which can also be written 2:3.

Odds ratios are useful because they're usually the easiest way to calculate a Bayesian update. If I notice the mall is closing early, and that’s twice as likely to happen on a Sunday as it is on a non-Sunday (a likelihood ratio of 2:1), I can simply multiply the left and right sides of my prior odds that it’s Sunday (1:6) by the corresponding sides of the evidence’s likelihood ratio (2:1) to arrive at correct posterior odds of 2:6, or 1:3, i.e., a posterior probability of 1/4.
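
The mall example can be sketched in Python as follows. The function names are hypothetical, and odds are represented as simple (for, against) pairs.

```python
from fractions import Fraction


def update_odds(prior_odds, likelihood_ratio):
    """Multiply each side of the prior odds by the matching side of the ratio."""
    return (prior_odds[0] * likelihood_ratio[0],
            prior_odds[1] * likelihood_ratio[1])


def odds_to_probability(odds):
    """Convert odds x:y into the probability x / (x + y)."""
    x, y = odds
    return Fraction(x, x + y)


prior_odds = (1, 6)        # it's Sunday : it's not Sunday
likelihood_ratio = (2, 1)  # early closing is twice as likely on a Sunday
posterior_odds = update_odds(prior_odds, likelihood_ratio)
posterior_probability = odds_to_probability(posterior_odds)
print(posterior_odds, posterior_probability)
```

Note that odds of 1:3 correspond to a probability of 1/4, not 1/3, since the probability is the left side divided by the sum of both sides.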

Omega. A hypothetical arbitrarily powerful agent used in various thought experiments.

one-boxing. Taking only the opaque box in Newcomb's Problem.

ontology. An account of the things that exist, especially one that focuses on their most basic and general similarities. Things are "ontologically distinct" if they are of two fundamentally different kinds.

opportunity cost. The value lost from choosing not to acquire something valuable. If I choose not to make an investment that would have earned me $10, I don’t literally lose $10 -- if I had $100 at the outset, I’ll still have $100 at the end, not $90. Still, I pay an opportunity cost of $10 for missing a chance to gain something I want. I lose $10 relative to the $110 I could have had. Opportunity costs can result from making a bad decision, but they also occur when you make a good decision that involves sacrificing the benefits of inferior options for the different benefits of a superior option. Many forms of human irrationality involve assigning too little importance to opportunity costs.

optimization process. Yudkowsky’s term for a process that performs searches through a large search space, and manages to hit very specific targets that would be astronomically unlikely to occur by chance.

E.g., the existence of trees is much easier to understand if we posit a search process, evolution, that iteratively comes up with better and better solutions to cognitively difficult problems. A well-designed dam, similarly, is easier to understand if we posit an optimization process searching for designs or policies that meet some criterion. Evolution, humans, and beavers all share this property, and can therefore be usefully thought of as optimization processes. In contrast, the processes that produce mountains and stars are easiest to describe in other terms.

orthogonality. The independence of two (or more) variables. If two variables are orthogonal, then knowing the value of one doesn't help you learn the value of the other.

P

P ≠ NP. A widely believed conjecture in computational complexity theory. NP is the class of mathematically specifiable questions with input parameters (e.g., “can a number list A be partitioned into two number lists B and C whose numbers sum to the same value?”) such that one could always in principle efficiently confirm that a correct solution to some instance of the problem (e.g., “the list {3,2,7,3,5} splits up into the lists {3,2,5} and {7,3}, and the latter two lists sum to the same number”) is in fact correct. More precisely, NP is the class of decision problems that a deterministic Turing machine could verify answers to in a polynomial amount of computing time. P is the class of decision problems that one could always in principle efficiently solve -- e.g., given {3,2,7,3,5} or any other list, quickly come up with a correct answer (like “{3,2,5} and {7,3}”) should one exist. Since all P problems are also NP problems, for P to not equal NP would mean that some NP problems are not P problems; i.e., some problems cannot be efficiently solved even though solutions to them, if discovered, could be efficiently verified.
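
The gap between verifying and solving can be illustrated with the partition example above. In this sketch, checking a proposed split takes a single pass over the lists, while the naive solver tries subsets one by one and may need exponentially many attempts. Both function names are assumptions, and the brute-force solver is only one possible approach; cleverer algorithms exist for this particular problem, so the code illustrates the distinction rather than demonstrating that no fast solver exists.

```python
from itertools import combinations


def verify_partition(numbers, part_b, part_c):
    """Check that B and C use exactly the numbers in A and have equal sums."""
    return (sorted(part_b + part_c) == sorted(numbers)
            and sum(part_b) == sum(part_c))


def solve_partition(numbers):
    """Brute-force search over all subsets; returns (B, C) or None."""
    indices = range(len(numbers))
    for size in range(len(numbers) + 1):
        for chosen in combinations(indices, size):
            part_b = [numbers[i] for i in chosen]
            part_c = [numbers[i] for i in indices if i not in chosen]
            if sum(part_b) == sum(part_c):
                return part_b, part_c
    return None


number_list = [3, 2, 7, 3, 5]
print(verify_partition(number_list, [3, 2, 5], [7, 3]))
print(solve_partition(number_list))
```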

Pareto optimum. A situation in which no one can be made better off without making at least one person worse off.

phase space. A mathematical representation of physical systems in which each axis of the space is a degree of freedom (a property of the system that must be specified independently) and each point is a possible state.

phlogiston. A substance hypothesized in the 17th century to explain phenomena such as fire and rust. Combustible objects were thought by late alchemists and early chemists to contain phlogiston, which evaporated during combustion.

physicalism. See “materialism.”

Planck units. Natural units, such as the Planck length and the Planck time, representing the smallest physically significant quantized phenomena.

positive bias. Bias toward noticing what a theory predicts you’ll see, instead of noticing what a theory predicts you won’t see.

possible world. A way the world could have been. One can say “there is a possible world in which Hitler won World War II” in place of “Hitler could have won World War II,” making it easier to contrast the features of multiple hypothetical or counterfactual scenarios. Not to be confused with the worlds of the many-worlds interpretation of quantum physics or Max Tegmark's Mathematical Universe Hypothesis, which are claimed (by their proponents) to be actual.

posterior probability. An agent's beliefs after acquiring evidence. Contrasted with its prior beliefs, or priors.

prior probability. An agent’s beliefs prior to acquiring some evidence.

Prisoner’s Dilemma. A game in which each player can choose to either "cooperate" with or "defect" against the other. The best outcome for each player is to defect while the other cooperates; and the worst outcome is to cooperate while the other defects. Each player views mutual cooperation as the second-best option, and mutual defection as the second-worst.

Traditionally, game theorists have argued that defection is always the correct move in one-shot dilemmas; it improves your reward if the other player independently cooperates, and it lessens your loss if the other player independently defects.

Yudkowsky is one of a minority of decision theorists who argue that rational cooperation is possible in the one-shot Prisoner's Dilemma, provided the two players' decision-making is known to be sufficiently similar. "My opponent and I are both following the same decision procedure, so if I cooperate, my opponent will cooperate too; and if I defect, my opponent will defect. The former seems preferable, so this decision procedure hereby outputs 'cooperate.'"
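
The two lines of reasoning can be compared in a small sketch. The payoff numbers below are hypothetical (only their ordering matters), and the function names `best_reply` and `best_mirrored_move` are invented for illustration.

```python
# Hypothetical payoffs for the row player; only their ordering matters.
PAYOFF = {
    ("defect", "cooperate"): 5,     # best: defect while the other cooperates
    ("cooperate", "cooperate"): 3,  # second-best: mutual cooperation
    ("defect", "defect"): 1,        # second-worst: mutual defection
    ("cooperate", "defect"): 0,     # worst: cooperate while the other defects
}
MOVES = ("cooperate", "defect")


def best_reply(opponent_move):
    """Traditional reasoning: treat the opponent's move as fixed and independent."""
    return max(MOVES, key=lambda my_move: PAYOFF[(my_move, opponent_move)])


def best_mirrored_move():
    """Same-procedure reasoning: whatever I choose, my opponent chooses too."""
    return max(MOVES, key=lambda move: PAYOFF[(move, move)])


for opponent_move in MOVES:
    print(f"against {opponent_move}: best reply is {best_reply(opponent_move)}")
print(f"if both players run the same procedure: {best_mirrored_move()}")
```

Under the independence assumption, defecting is the best reply to either move. Under the mirrored assumption, only the two diagonal outcomes are reachable, and mutual cooperation is the better of the two.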

probability amplitude. See “amplitude.”

probability distribution. A function which assigns a probability (i.e., a number representing how likely something is to be true) to every possibility under consideration. Discrete and continuous probability distributions are generally encoded by, respectively, probability mass functions and probability density functions.

Thinking of probability as a "mass" that must be divided up between possibilities can be a useful way to keep in view that reducing the probability of one hypothesis always requires increasing the probability of others, and vice versa. Probability, like (classical) mass, is conserved.
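
As a rough illustration of this conservation, the sketch below updates a three-hypothesis distribution and checks that the total stays at 1. The hypotheses, priors, and likelihoods are hypothetical numbers chosen only to make the arithmetic easy, and `update` is an invented helper name.

```python
from fractions import Fraction


def update(prior, likelihood):
    """Reweight each hypothesis by its likelihood, then renormalize to total 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Hypothetical numbers chosen only for illustration.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
likelihood = {"A": Fraction(1, 10), "B": Fraction(1, 2), "C": Fraction(1, 2)}

posterior = update(prior, likelihood)
for h in prior:
    print(h, prior[h], "->", posterior[h])
print("total mass:", sum(prior.values()), "->", sum(posterior.values()))
```

Hypothesis A loses probability here, and exactly that much is redistributed to B and C.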

probability theory. The branch of mathematics concerned with defining statistical truths and quantifying uncertainty.

problem of induction. In philosophy, the question of how we can justifiably assert that the future will resemble the past without relying on evidence that presupposes that very fact.

Q

quark. An elementary particle of matter.

quine. A program that outputs its own source code.

R

rationalist. (a) Related to rationality. (b) A person who tries to apply rationality concepts to their real-world decisions.

rationality. Making systematically good decisions (instrumental rationality) and achieving systematically accurate beliefs (epistemic rationality).

reductio ad absurdum. Refuting a claim by showing that it entails a claim that is more obviously false.

reduction. An explanation of a phenomenon in terms of its origin or parts, especially one that allows you to redescribe the phenomenon without appeal to your previous conception of it.

reductionism. (a) The practice of scientifically reducing complex phenomena to simpler underpinnings. (b) The belief that such reductions are generally possible.

representativeness heuristic. A cognitive heuristic where one judges the probability of an event based on how well it matches some mental prototype.

Ricardo’s Law of Comparative Advantage. See “comparative advantage.”

S

satori. In Zen Buddhism, a non-verbal, pre-conceptual apprehension of the ultimate nature of reality.

Schrödinger equation. A fairly simple partial differential equation that defines how quantum wavefunctions evolve over time. This equation is deterministic; it is not known why the Born rule, which converts the wavefunction into an experimental prediction, is probabilistic, though there have been many attempts to make headway on that question.

scope insensitivity. A cognitive bias where people tend to disregard the size of certain phenomena.

screening off. Making something evidentially irrelevant. A piece of evidence A screens off a piece of evidence B from a hypothesis C if, once you know about A, learning about B doesn’t affect the probability of C.
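
A small numerical sketch can make this concrete. It assumes a hypothetical causal chain C -> A -> B (a burglary sets off an alarm, and the alarm prompts a neighbor's phone call); the probabilities and the helper name `prob_c_given` are invented for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical chain C -> A -> B: burglary (C) triggers an alarm (A),
# and the alarm prompts a neighbor's phone call (B).
P_C = Fraction(1, 10)
P_A_GIVEN_C = {True: Fraction(9, 10), False: Fraction(1, 20)}
P_B_GIVEN_A = {True: Fraction(4, 5), False: Fraction(1, 100)}


def bernoulli(p_true, value):
    """Probability that a yes/no variable takes the given value."""
    return p_true if value else 1 - p_true


# Joint distribution over (c, a, b), built from the chain structure.
joint = {
    (c, a, b): bernoulli(P_C, c)
    * bernoulli(P_A_GIVEN_C[c], a)
    * bernoulli(P_B_GIVEN_A[a], b)
    for c, a, b in product((True, False), repeat=3)
}


def prob_c_given(**known):
    """P(C = True | known values of 'a' and/or 'b'), by summing the joint."""
    def matches(a, b):
        return known.get("a", a) == a and known.get("b", b) == b

    num = sum(p for (c, a, b), p in joint.items() if c and matches(a, b))
    den = sum(p for (c, a, b), p in joint.items() if matches(a, b))
    return num / den


print("P(C)        =", P_C)
print("P(C | B)    =", prob_c_given(b=True))
print("P(C | A)    =", prob_c_given(a=True))
print("P(C | A, B) =", prob_c_given(a=True, b=True))
```

On its own, the phone call is evidence of a burglary. Once the alarm is known to have rung, the call changes nothing: the last two values are equal.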

search tree. A graph with a root node that branches into child nodes, which can then either terminate or branch once more. The tree data structure is used to locate values; in chess, for example, each node can represent a move, which branches into the other player’s possible responses, and searching the tree is intended to locate winning sequences of moves.
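
Chess is too large for a short example, so the sketch below searches the tree of a much smaller hypothetical game: two players alternately remove 1 or 2 stones from a pile, and whoever takes the last stone wins. The game and the function name `winning_move` are assumptions made for illustration.

```python
def winning_move(stones):
    """Search the game tree; return a move that forces a win, or None.

    Each node is a pile size with one player to move. Its children are the
    positions reachable by removing 1 or 2 stones.
    """
    for take in (1, 2):
        if take > stones:
            continue
        remaining = stones - take
        # Taking the last stone wins outright; otherwise the move is good
        # only if the opponent has no winning reply from the child node.
        if remaining == 0 or winning_move(remaining) is None:
            return take
    return None  # every branch lets the opponent win (or no moves remain)


for pile in range(1, 8):
    print(pile, winning_move(pile))
```

This exhaustive search revisits the same pile sizes many times, which is acceptable for tiny piles. Practical game-tree searches usually cache results or prune branches.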

self-anchoring. Anchoring to oneself. Treating one’s own qualities as the default, and only weakly updating toward viewing others as different when given evidence of differences.

Shannon entropy. See “entropy.”

Shannon mutual information. See “mutual information.”

Simulation Hypothesis. The hypothesis that the world as we know it is a computer program designed by some powerful intelligence.

Singularity. One of several scenarios in which artificial intelligence systems surpass human intelligence in a large and dramatic way.

skyhook. An attempted explanation of a complex phenomenon in terms of a deeply mysterious or miraculous phenomenon -- often one of even greater complexity.

Solomonoff induction. An attempted definition of optimal (albeit uncomputable) inference. Bayesian updating plus a simplicity prior that assigns less probability to percept-generating programs the longer they are.

stack trace. A retrospective step-by-step report on a program's behavior, intended to reveal the source of an error.

statistical bias. A systematic discrepancy between the expected value of some measure, and the true value of the thing you're measuring.
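
A standard textbook case is estimating a population's variance from a small sample. The sketch below uses a hypothetical four-value population and enumerates every possible sample of size 2, so the expected values are exact rather than simulated. The helper names are invented for illustration.

```python
from fractions import Fraction
from itertools import product

POPULATION = [1, 2, 3, 4]  # hypothetical population, each value equally likely
SAMPLE_SIZE = 2


def variance(values, divisor):
    """Sum of squared deviations from the mean of `values`, over `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / divisor


def expected_estimate(divisor):
    """Average the estimator over every equally likely sample (with replacement)."""
    samples = list(product(POPULATION, repeat=SAMPLE_SIZE))
    return sum(variance(s, divisor) for s in samples) / len(samples)


true_variance = variance(POPULATION, len(POPULATION))
naive = expected_estimate(SAMPLE_SIZE)          # divide by n
corrected = expected_estimate(SAMPLE_SIZE - 1)  # divide by n - 1

print("true variance:          ", true_variance)
print("E[divide-by-n estimate]:", naive, "bias:", naive - true_variance)
print("E[divide-by-(n-1)]:     ", corrected, "bias:", corrected - true_variance)
```

Dividing by n underestimates the variance on average, which makes that estimator statistically biased. Dividing by n - 1 removes the discrepancy in this setup.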

superintelligence. Something vastly smarter than present-day humans. This can be a predicted future technology, like smarter-than-human AI; or it can be a purely hypothetical agent, such as Omega or Laplace's Demon.

System 1. The processes behind the brain’s fast, automatic, emotional, and intuitive judgments.

System 2. The processes behind the brain’s slow, deliberative, reflective, and intellectual judgments.

Szilárd engine. See “Maxwell’s Demon.”

T

Taboo. A game by Hasbro where you try to get teammates to guess what word you have in mind while avoiding conventional ways of communicating it. Yudkowsky uses this as an analogy for the rationalist skill of linking words to the concrete evidence you use to decide when to apply them. Ideally, one should know what one is saying well enough to paraphrase the message in several different ways, and to replace abstract generalizations with concrete observations.

Tegmark world. A universe contained in a vast multiverse of mathematical objects. The idea comes from Max Tegmark's Mathematical Universe Hypothesis, which holds that our own universe is a mathematical object contained in an ensemble in which all possible computable structures exist.

terminal value. A goal that is pursued for its own sake, and not just to further some other goal.

Tit for Tat. A strategy in which one cooperates on the first round of an Iterated Prisoner’s Dilemma, then on each subsequent round mirrors what the opponent did in the previous round.
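
The strategy is short enough to write out directly. In this sketch, moves are encoded as "C" (cooperate) and "D" (defect); the encoding, the `always_defect` opponent, and the `play` helper are assumptions made for illustration.

```python
def tit_for_tat(opponent_history):
    """Cooperate first; afterwards copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """A comparison strategy that ignores history and always defects."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Run an Iterated Prisoner's Dilemma and return both move histories."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(moves_b)  # each strategy sees the other's past moves
        move_b = strategy_b(moves_a)
        moves_a.append(move_a)
        moves_b.append(move_b)
    return "".join(moves_a), "".join(moves_b)


print(play(tit_for_tat, tit_for_tat, 5))
print(play(tit_for_tat, always_defect, 5))
```

Two Tit for Tat players cooperate throughout. Against a constant defector, Tit for Tat is exploited only on the first round and defects thereafter.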

Traditional Rationality. Yudkowsky’s term for the scientific norms and conventions espoused by thinkers like Richard Feynman, Carl Sagan, and Charles Peirce. Yudkowsky contrasts this with the ideas of rationality in contemporary mathematics and cognitive science.

transhuman. (a) Entities that are human-like, but much more capable than ordinary biological humans. (b) Related to radical human enhancement. Transhumanism is the view that humans should use technology to radically improve their lives—e.g., curing disease or ending aging.

truth-value. A proposition’s truth or falsity.

Turing-computability. The ability to be executed, at least in principle, by a simple process following a finite set of rules. "In principle" here means that a Turing machine could perform the computation, though we may lack the time or computing power to build a real-world machine that does the same. Turing-computable functions cannot be computed by all Turing machines, but they can be computed by some. In particular, they can be computed by all universal Turing machines.

Turing machine. An abstract machine that follows rules for manipulating symbols on an arbitrarily long tape.

two-boxing. Taking both boxes in Newcomb's Problem.

U

Unfriendly AI. A hypothetical smarter-than-human artificial intelligence that causes a global catastrophe by pursuing a goal without regard for humanity’s well-being. Yudkowsky predicts that superintelligent AI will be “Unfriendly” by default, unless a special effort goes into researching how to give AI stable, known, humane goals. Unfriendliness doesn’t imply malice, anger, or other human characteristics; a completely impersonal optimization process can be “Unfriendly” even if its only goal is to make paperclips. This is because even a goal as innocent as ‘maximize the expected number of paperclips’ could motivate an AI to treat humans as competitors for physical resources, or as threats to the AI’s aspirations.

uniform probability distribution. A distribution in which all events have equal probability; a maximum-entropy probability distribution.
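
The maximum-entropy claim can be checked numerically for a small case. The sketch below compares three distributions over the same four outcomes, with no constraints other than that the probabilities sum to 1. The example distributions and the function name `shannon_entropy` are chosen for illustration.

```python
from math import log2


def shannon_entropy(distribution):
    """Entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(p * log2(1 / p) for p in distribution if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]   # hypothetical alternative over the same outcomes
certain = [1.0, 0.0, 0.0, 0.0]

for name, dist in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(name, round(shannon_entropy(dist), 3))
```

The uniform distribution reaches the largest value available for four outcomes, log2(4) = 2 bits, while a distribution that is certain of one outcome has zero entropy.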

universal Turing machine. A Turing machine that can compute all Turing-computable functions. If something can be done by any Turing machine, then it can be done by every universal Turing machine. A system that can in principle do anything a Turing machine could is called “Turing-complete.”

updating. Revising one’s beliefs. See also "Bayesian updating."

utilitarianism. An ethical theory asserting that one should act in whichever way causes the most benefit to people, minus how much harm results. Standard utilitarianism argues that acts can be justified even if they are morally counter-intuitive and harmful, provided that the benefit outweighs the harm.

utility function. A function that ranks outcomes by "utility," i.e., by how well they satisfy some set of goals or constraints. Humans are limited and imperfect reasoners, and don't consistently optimize any endorsed utility function; but the idea of optimizing a utility function helps us give formal content to "what it means to pursue a goal well," just as Bayesian updating helps formalize "what it means to learn well."

utilon. Yudkowsky’s name for a unit of utility, i.e., something that satisfies a goal. The term is deliberately vague, to permit discussion of desired and desirable things without relying on imperfect proxies such as monetary value and self-reported happiness.

V

W

wavefunction. A complex-valued function used in quantum mechanics to explain and predict the wave-like behavior of physical systems at small scales. Realists about the wavefunction treat it as a good characterization of the way the world really is, more fundamental than earlier (e.g., atomic) models. Anti-realists disagree, although they grant that the wavefunction is a useful tool by virtue of its mathematical relationship to observed properties of particles (the Born rule).

wu wei. “Non-action.” The concept, in Daoism, of effortlessly achieving one’s goals by ceasing to strive and struggle to reach them.

X

XML. Extensible Markup Language, a system for annotating texts with tags that can be read both by a human and by a machine.

Z

ZF. The Zermelo–Fraenkel axioms, an attempt to ground standard mathematics in set theory. ZFC (the Zermelo–Fraenkel axioms supplemented with the Axiom of Choice) is the most popular axiomatic set theory.

zombie. In philosophy, a perfect atom-by-atom replica of a human that lacks a human’s subjective awareness. Zombies behave exactly like humans, but they lack consciousness. Some philosophers argue that the idea of zombies is coherent -- that zombies, although not real, are at least logically possible. They conclude from this that facts about first-person consciousness are logically independent of physical facts, that our world breaks down into both physical and nonphysical components. Most philosophers reject the idea that zombies are logically possible, though the topic continues to be actively debated.

amplitude distribution. See “wavefunction.”

anchoring. The cognitive bias of relying excessively on initial information after receiving relevant new information.

anthropics. Problems related to reasoning well about how many observers like you there are.

artificial general intelligence. Artificial intelligence that is "general-purpose" in the same sense that human reasoning is general-purpose. It's hard to crisply state what this kind of reasoning consists in—if we knew how to fully formalize it, we would already know how to build artificial general intelligence. However, we can gesture at (e.g.) humans' ability to excel in many different scientific fields, even though we did not evolve in an ancestral environment containing particle accelerators.

Aumann's Agreement Theorem.

availability heuristic. The tendency to base judgments on how easily relevant examples come to mind.

average utilitarianism.

B

Backward chaining.

Base rate.

Bayes's Theorem. The equation stating how to update a hypothesis H in light of new evidence E. In its simplest form, Bayes's Theorem says that a hypothesis' probability given the evidence, written P(H|E), equals the likelihood of the evidence given that hypothesis, multiplied by your prior probability P(H) that the hypothesis was true, divided by the prior probability P(E) that you would see that evidence regardless. I.e.:

P(H|E) = P(E|H) P(H) / P(E).

Also known as Bayes's Rule. See "odds ratio" for a simpler way to calculate a Bayesian update.

Bayesian. (a) Optimally reasoned; reasoned in accordance with the laws of probability. (b) An optimal reasoner, or a reasoner that approximates optimal inference unusually well. (c) Someone who treats beliefs as probabilistic and treats probability theory as a relevant ideal for evaluating reasoners. (d) Related to probabilistic belief. (e) Related to Bayesian statistical methods.

Bayesian updating. Revising your beliefs in a way that's fully consistent with the information available to you. Perfect Bayesian updating is wildly intractable in realistic environments, so real-world agents have to rely on imperfect heuristics to get by. As an optimality condition, however, Bayesian updating helps make sense of the idea that some ways of changing one's mind work better than others for learning about the world.

beisutsukai. Japanese for "Bayes user." A fictional order of high-level rationalists, also known as the Bayesian Conspiracy.

Bell's Theorem.

Berkeleian idealism. The belief, espoused by George Berkeley, that things only exist in various minds (including the mind of God).

bias. (a) A cognitive bias. In Rationality: From AI to Zombies, this will be the default meaning. (b) A statistical bias. (c) An inductive bias. (d) Colloquially: prejudice or unfairness.

bit. (a) A binary digit, taking the value 0 or 1. (b) The logarithm (base 1/2) of a probability—the maximum information that can be communicated using a binary digit, averaged over the digit's states. Rationality: From AI to Zombies usually uses "bit" in the latter sense.

black box. Any process whose inner workings are mysterious or poorly understood.

Black Swan.

blind god. One of Yudkowsky's pet names for natural selection.

Blue and Green. Rival sports teams and political factions in ancient Rome.

Born rule.

C

calibration. Assigning probabilities to beliefs in a way that matches how often those beliefs turn out to be right. E.g., if your assignment of "70% confidence" to claims is well-calibrated, then you will get such claims right about 70% of the time.

causal decision theory. The theory that the right way to make decisions is by picking the action with the best causal consequences.

causal graph. A directed acyclic graph in which an arrow going from node A to node B is interpreted as "changes in A can directly cause changes in B."

cognitive bias. A systematic error stemming from the way human reasoning works. This can be contrasted with errors due to ordinary ignorance, misinformation, brain damage, etc.

collapse.

comparative advantage. An ability to produce something at a lower cost than some other actor could. This is not the same as having an absolute advantage over someone: you may be a better cook than someone across-the-board, but that person will still have a comparative advantage over you at cooking some dishes. This is because your cooking skills make your time more valuable; the worse cook may have a comparative advantage at baking bread, for example, since it doesn’t cost them much to spend a lot of time on baking, whereas you could be spending that time creating a large number of high-quality dishes. Baking bread is more costly for the good cook than for the bad cook because the good cook is paying a larger opportunity cost, i.e., is giving up more valuable opportunities to be doing other things.

complex. (a) Colloquially, something with many parts arranged in a relatively specific way. (b) In information theory, something that's relatively hard to formally specify and that thereby gets a larger penalty under Occam's razor; measures of this kind of complexity include Kolmogorov complexity. (c) Complex-valued, i.e., represented by the sum of a real number and an imaginary number.

conditional independence.

conditional probability. The probability that a statement is true on the assumption that some other statement is true. E.g., the conditional probability P(A|B) means "the probability of A given that B."

configuration space.

confirmation bias. The cognitive bias of giving more weight to evidence that agrees with one's current beliefs.

conjunction. A sentence that asserts multiple things. "It's raining and I'm eating a sandwich" is a conjunction; its conjuncts are "It's raining" and "I'm eating a sandwich."

conjunction fallacy. The fallacy of treating a conjunction as though it were more likely than its conjuncts.

consequentialism. (a) The ethical theory that the moral rightness of actions depends only on what outcomes result. Consequentialism is normally contrasted with ideas like deontology, which says that morality is about following certain rules (e.g., "don't lie") regardless of the consequences. (b) Yudkowsky's term for any reasoning process that selects actions based on their consequences.

Copenhagen Interpretation.

correspondence bias. Drawing conclusions about someone's unique disposition from behavior that can be entirely explained by the situation in which it occurs. When we see someone else kick a vending machine, we think they are "an angry person," but when we kick the vending machine, it's because the bus was late, the train was early, and the machine ate our money.

Cox's Theorem.

cryonics. The low-temperature preservation of brains. Cryonics proponents argue that cryonics should see more routine use for people whose respiration and blood circulation have recently stopped (i.e., people who qualify as clinically deceased), on the grounds that future medical technology may be able to revive such people.

D

de novo. Entirely new; produced from scratch.

decibel.

decision theory. (a) The mathematical study of correct decision-making in general, abstracted from an agent's particular beliefs, goals, or capabilities. (b) A well-defined general-purpose procedure for arriving at decisions, e.g., causal decision theory.

decoherence.

deontology. The theory that moral conduct is about choosing actions that satisfy specific rules like "don't lie" or "don't steal."

directed acyclic graph. A graph that is directed (its edges have a direction associated with them) and acyclic (there's no way to follow a sequence of edges in a given direction to loop around from a node back to itself).

dukkha.

Dutch book.

E

edge. See “graph.”

élan vital. "Vital force." A term coined in 1907 by the philosopher Henri Bergson to refer to a mysterious force that was held to be responsible for life's "aliveness" and goal-oriented behavior.

entanglement. (a) Causal correlation between two things. (b) In quantum physics, the mutual dependence of two particles' states upon one another. Entanglement in sense (b) occurs when a quantum amplitude distribution cannot be factorized.

entropy. (a) In thermodynamics, the number of different ways a physical state may be produced (its Boltzmann entropy). E.g., a slightly shuffled deck has lower entropy than a fully shuffled one, because there are many more configurations a fully shuffled deck is likely to end up in. (b) In information theory, the expected value of the information contained in a message (its Shannon entropy). That is, a random variable’s Shannon entropy is how many bits of information one would be missing (on average) if one did not know the variable’s value.

Boltzmann entropy and Shannon entropy have turned out to be equivalent; that is, a system’s thermodynamic disorder corresponds to the number of bits needed to fully characterize it.

epistemic. Concerning knowledge.

epistemology. (a) A world-view or approach to forming beliefs. (b) The study of knowledge.

eudaimonia.

Eurisko.

eutopia. Yudkowsky’s term for a utopia that’s actually nice to live in, as opposed to one that’s unpleasant or unfeasible.

Everett branch. A "world" in the many-worlds interpretation of quantum mechanics.

existential risk. Something that threatens to permanently and drastically reduce the value of the future, such as stable global totalitarianism or human extinction.

expected utility. The expected value of a utility function given some action. Roughly: how much an agent’s goals will tend to be satisfied by some action, given uncertainty about the action's outcome.

A sure $1 will usually lead to more utility than a 10% chance of $1 million. Yet in all cases, the 10% shot at $1 million has more expected utility, assuming you assign more than ten times as much utility to winning $1 million. Expected utility is an idealized mathematical framework for making sense of the idea "good bets don't have to be sure bets."

expected value. The sum of all possible values of a variable, each multiplied by its probability of being the true value.

F

FAI. See “friendly AI.”

falsificationism.

Fermi paradox. The puzzle of reconciling "on priors, we should expect there to be many large interstellar civilizations visible in the night sky" and "we see no clear signs of such civilizations."

Some reasons many people find it puzzling that there are no visible alien civilizations include: "the elements required for life on Earth seem commonplace"; "life had billions of years to develop elsewhere before we evolved"; "high intelligence seems relatively easy to evolve (e.g., many of the same cognitive abilities evolved independently in humans, octopuses, crows)"; and "although some goals favor hiddenness, many different possible goals favor large-scale extraction of resources, and we only require there to exist one old species of the latter type."

fitness. See “inclusive fitness.”

foozality. See "rationality."

frequentism. (a) The view that the Bayesian approach to probability—i.e., treating probabilities as belief states—is unduly subjective. Frequentists instead propose treating probabilities as frequencies of events. (b) Frequentist statistical methods.

Friendly AI. Artificial general intelligence systems that are safe and useful. "Friendly" is a deliberately informal descriptor, intended to signpost that "Friendliness" still has very little technical content and needs to be further developed. Although this remains true in many respects as of this writing (2018), Friendly AI research has become much more formally developed since Yudkowsky coined the term "Friendly AI" in 2001, and the research area is now more often called "AI alignment research."

Fun Theory.

G

graph. In graph theory, a mathematical object consisting of simple atomic objects ("vertices," or "nodes") connected by lines (or "edges"). When edges have an associated direction, they are also called "arrows."

gray goo.

Gricean implication.

group selection. Natural selection at the level of groups, as opposed to individuals. Historically, group selection used to be viewed as a more central and common part of evolution—evolution was thought to frequently favor self-sacrifice "for the good of the species."

H

halo effect. The tendency to assume that something good in one respect must be good in other respects.

halting oracle. An abstract agent that is stipulated to be able to reliably answer questions that no algorithm can reliably answer. Though it is provably impossible for finite rule-following systems (e.g., Turing machines) to answer certain questions (e.g., the halting problem), it can still be mathematically useful to consider the logical implications of scenarios in which we could access answers to those questions.

happy death spiral. See “affective death spiral.”

hedonic. Concerning pleasure.

heuristic. An imperfect method for achieving some goal. A useful approximation. Cognitive heuristics are innate, humanly universal brain heuristics.

hindsight bias. The tendency to exaggerate how well one could have predicted things that one currently believes.

humility. Not being arrogant or overconfident. Yudkowsky defines humility as "taking specific actions in anticipation of your own errors." He contrasts this with "modesty," which he views as a social posture for winning others' approval or esteem, rather than as a form of epistemic humility.

I

inclusive fitness. The degree to which a gene causes more copies of itself to exist in the next generation. Inclusive fitness is the property propagated by natural selection. Unlike individual fitness, which is a specific organism’s tendency to promote more copies of its genes, inclusive fitness is held by the genes themselves. Inclusive fitness can sometimes be increased at the expense of the individual organism’s overall fitness.

inductive bias. The set of assumptions a learner uses to derive predictions from a data set. The learner is "biased" in the sense that it's more likely to update in some directions than in others, but unlike with other conceptions of "bias", the idea of "inductive bias" doesn't imply any sort of error.

instrumental. Concerning usefulness or effectiveness.

instrumental value. A goal that is only pursued in order to further some other goal.

intelligence explosion. A scenario in which AI systems rapidly improve in cognitive ability because they see fast, consistent, sustained returns on investing work into such improvement. This could happen via AI systems using their intelligence to rewrite their own code, improve their hardware, or acquire more hardware, then leveraging their improved capabilities to find more ways to improve.

intentionality. The ability of things to represent, or refer to, other things. Not to be confused with "intent."

isomorphism. A two-way mapping between objects in a category. Informally, two things are often called "isomorphic" if they're identical in every relevant respect.

Iterated Prisoner’s Dilemma. A series of Prisoner’s Dilemmas between the same two players. Because players can punish each other for defecting on previous rounds, they will usually more reason to cooperate than in the one-shot Prisoner’s Dilemma.

J

joint probability distribution. A probability distribution that assigns probabilities to combinations of claims. E.g., if the claims in question are "Is it cold?" and "Is it raining?", a joint probability distribution could assign probabilities to "it's cold and rainy," "it's cold and not rainy," "it's not cold but is rainy," and "it's neither cold nor rainy."

just-world fallacy. The cognitive bias of systematically overestimating how much reward people get for good deeds, and how much punishment they get for bad deeds.

K

koan. In Zen Buddhism, a short story or riddle aimed at helping the hearer break through various preconceptions.

Kolmogorov complexity. A formalization of the idea of complexity. Given a programming language, a computable string's Kolmogorov complexity is the length of the shortest computer program in that language that outputs the string.

L

likelihood. In Bayesian probability theory, how much probability a hypothesis assigns to a piece of evidence. Suppose we observe the evidence E = "Mr. Boddy was knifed," and our hypotheses are H_P = "Professor Plum killed Boddy" and H_W = "Mrs. White killed Boddy." If we think there's a 25% chance that Plum would use a knife in the worlds where he chose to kill Boddy, then we can say H_P assigns a likelihood of 25% to E.

Suppose that there's only a 5% chance Mrs. White would use a knife if she killed Boddy. Then we can say that the likelihood ratio between H_P and H_W is 25/5 = 5. This means that the evidence supports "Plum did it" five times as strongly as it supports "White did it," which tells us how to update upon observing E. (See "odds ratio" for a simple example.)
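The Plum/White numbers can be worked through in a few lines. This sketch assumes, purely for illustration, that Plum and White are the only suspects and start out equally probable; the entry itself does not specify priors.

```python
# Likelihoods P(E | H) taken from the example above.
likelihood = {"H_P": 0.25, "H_W": 0.05}

# Assumption for this sketch: two suspects only, with equal prior probability.
prior = {"H_P": 0.5, "H_W": 0.5}

# How much more strongly E supports Plum than White.
likelihood_ratio = likelihood["H_P"] / likelihood["H_W"]

# Bayes' rule: multiply prior by likelihood, then rescale to sum to 1.
unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: v / total for h, v in unnormalized.items()}
```

With equal priors, the posterior odds equal the likelihood ratio, so Plum ends up at 5/6 and White at 1/6.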

M

magisterium. Stephen Gould’s term for a domain where some community or field has authority. Gould claimed that science and religion were separate and non-overlapping magisteria. On his view, religion has authority to answer questions of "ultimate meaning and moral value" (but not empirical fact) and science has authority to answer questions of empirical fact (but not meaning or value).

many-worlds interpretation. The idea that the basic posits in quantum physics (complex-valued amplitudes) are objectively real and consistently evolve according to the Schrödinger equation. Opposed to anti-realist and collapse interpretations. Many-worlds holds that the classical world we seem to inhabit at any given time is a small component of an ever-branching amplitude.

The "worlds" of the many-worlds interpretation are not discrete or fundamental to the theory. Speaking of "many worlds" is, rather, a way of gesturing at the idea that the ordinary objects of our experience are part of a much larger whole that contains enormously many similar objects.

map and territory. A metaphor for the relationship between beliefs (or other mental states) and the real-world things they purport to refer to.

materialism. The belief that all mental phenomena can in principle be reduced to physical phenomena.

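maximum-entropy probability distribution. The probability distribution that is most uncertain (has the highest entropy) among those consistent with what is known. When nothing is known beyond a finite list of possibilities, it assigns equal probability to every possibility.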

Maxwell’s Demon. A hypothetical agent that knows the location and speed of individual molecules in a gas. James Maxwell used this demon in a thought experiment to show that such knowledge could decrease a physical system’s entropy, “in contradiction to the second law of thermodynamics.” The demon’s ability to identify faster molecules allows it to gather them together and extract useful work from them. Leó Szilárd later pointed out that if the demon itself were considered part of the thermodynamic system, then the entropy of the whole would not decrease. The decrease in entropy of the gas would require an increase in the demon’s entropy. Szilárd used this insight to simplify Maxwell’s scenario into a hypothetical engine that extracts work from a single gas particle. Using one bit of information about the particle (e.g., whether it’s in the top half of a box or the bottom half), a Szilárd engine can extract up to kT ln 2 joules of work, where T is the system’s temperature in kelvins and k is Boltzmann’s constant.
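For a sense of scale, the standard figure for the work a Szilárd engine can extract from one bit is kT ln 2. The sketch below evaluates that expression; the choice of 300 K as "room temperature" is an assumption of the example.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in joules per kelvin

def szilard_work_per_bit(temperature_kelvin):
    """Maximum work, in joules, obtainable from one bit: k * T * ln(2)."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)

# Assumed temperature for illustration: 300 K.
work_at_room_temp = szilard_work_per_bit(300.0)  # roughly 2.87e-21 J
```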

meta level. A domain that is more abstract or derivative than the object level.

metaethics. A theory about what it means for ethical statements to be correct, or the study of such theories. Whereas applied ethics speaks to questions like "Is murder wrong?" and "How can we reduce the number of murders?", metaethics speaks to questions like "What does it mean for something to be wrong?" and "How can we generally distinguish right from wrong?"

Mind Projection Fallacy.

Minimum Message Length Principle. A formalization of Occam’s Razor that judges the probability of a hypothesis based on how long it would take to communicate the hypothesis plus the available data. Simpler hypotheses are favored, as are hypotheses that can be used to concisely encode the data.

modesty. Yudkowsky's term for the social impulse to appear deferential or self-effacing, and resultant behaviors. Yudkowsky contrasts this with the epistemic virtue of humility.

monotonicity. Roughly, the property of never reversing direction. A monotonic function is any function between ordered sets that either preserves the order, or completely flips it. A non-monotonic function, then, is one that at least once preserves the order of a pair of inputs (a<b with f(a)<f(b)) and at least once reverses it (c<d with f(c)>f(d)).

A monotonic logic is one that will always continue to assert something as true if it ever asserted it as true. If "2+2=4" is proved, then in a monotonic logic no subsequent operation can make it impossible to derive that theorem again in the future. In contrast, non-monotonic logics can "forget" past conclusions and lose the ability to derive them.
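For the function sense of monotonicity, a small check over a finite sample of inputs may help make the definition concrete. The helper name is hypothetical, and passing the check on a sample does not prove monotonicity everywhere.

```python
def is_monotonic(f, inputs):
    """True if f never reverses direction across the sorted sample `inputs`."""
    xs = sorted(inputs)
    ys = [f(x) for x in xs]
    pairs = list(zip(ys, ys[1:]))  # outputs for neighbouring inputs
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing

sample = range(-5, 6)
doubling_is_monotonic = is_monotonic(lambda x: 2 * x + 1, sample)  # preserves order
negation_is_monotonic = is_monotonic(lambda x: -x, sample)         # flips order
square_is_monotonic = is_monotonic(lambda x: x * x, sample)        # falls, then rises
```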

Moore’s Law. A 1965 observation and prediction by Intel co-founder Gordon Moore: roughly every two years (originally every one year), engineers are able to double the number of transistors that can be fit on an integrated circuit. This projection held true into the 2010s. Other versions of this "law" consider other progress metrics for computing hardware.

motivated cognition. Reasoning that is driven by some goal or emotion that's at odds with accuracy. Examples include non-evidence-based inclinations to reject a claim (motivated skepticism), to believe a claim (motivated credulity), to continue evaluating an issue (motivated continuation), or to stop evaluating an issue (motivated stopping).

Murphy’s law. The saying “Anything that can go wrong will go wrong.”

mutual information. For two variables, the amount that knowing about one variable tells you about the other's value. If two variables have zero mutual information, then they are independent; knowing the value of one does nothing to reduce uncertainty about the other.
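One common way to quantify this is Shannon's formula, which sums p(x, y) log2[p(x, y) / (p(x) p(y))] over all combinations. The sketch below applies it to two toy distributions made up for the example: two independent fair coins, and a coin paired with an exact copy of itself.

```python
import math

def mutual_information(joint):
    """Mutual information in bits for a dict mapping (x, y) to probability."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal of the first variable
        py[y] = py.get(y, 0.0) + p  # marginal of the second variable
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )

# Two independent fair coins: knowing one says nothing about the other.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# A fair coin and a perfect copy of it: knowing one settles the other.
copies = {(0, 0): 0.5, (1, 1): 0.5}

mi_independent = mutual_information(independent)
mi_copies = mutual_information(copies)
```

The independent pair has zero bits of mutual information; the copied pair has one full bit.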

N

nanotechnology. (a) Fine-grained control of matter on the scale of individual atoms, as in Eric Drexler's writing. This is the default meaning in Rationality: From AI to Zombies. (b) Manipulation of matter on a scale of nanometers.

Nash equilibrium. A situation in which no individual would benefit by changing their own strategy, assuming the other players retain their strategies. Agents often converge on Nash equilibria in the real world, even when they would be much better off if multiple agents simultaneously switched strategies. For example, mutual defection is the only Nash equilibrium in the standard one-shot Prisoner’s Dilemma (i.e., it is the only option such that neither player could benefit by changing strategies while the other player’s strategy is held constant), even though it is not Pareto-optimal (i.e., each player would be better off if the group behaved differently).
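The Prisoner's Dilemma claim can be checked mechanically. The payoff numbers below are hypothetical (any values with the usual ordering would do); the sketch tests each pair of strategies to see whether either player could gain by switching alone.

```python
from itertools import product

C, D = "cooperate", "defect"

# Hypothetical payoffs as (row player, column player); higher is better.
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

def is_nash(a, b):
    """True if neither player gains by unilaterally switching strategy."""
    a_is_best = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in (C, D))
    b_is_best = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in (C, D))
    return a_is_best and b_is_best

equilibria = [pair for pair in product((C, D), repeat=2) if is_nash(*pair)]
```

Only mutual defection survives the check, even though both players score higher under mutual cooperation.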

negentropy. Negative entropy. A useful concept because it allows one to think of thermodynamic regularity as a limited resource one can possess and make use of, rather than as a mere absence of entropy.

Newcomb’s Problem. A central problem in decision theory. Imagine an agent that understands psychology well enough to predict your decisions in advance, and decides to either fill two boxes with money, or fill one box, based on their prediction. They put $1,000 in a transparent box no matter what, and they then put $1 million in an opaque box if (and only if) they predicted that you’d only take the opaque box. The predictor tells you about this, and then leaves. Which do you pick?

If you take both boxes ("two-boxing"), you get only the $1000, because the predictor foresaw your choice and didn’t fill the opaque box. On the other hand, if you only take the opaque box, you come away with $1 million. So it seems like you should take only the opaque box.

However, causal decision theorists object to this strategy on the grounds that you can’t causally control what the predictor did in the past; the predictor has already made their decision by the time you make yours, and regardless of whether or not they placed the $1 million in the opaque box, you’ll be throwing away a free $1000 if you choose not to take it. For the same reason, causal decision theory prescribes defecting in one-shot Prisoner’s Dilemmas, even if you’re playing against a perfect atom-by-atom copy of yourself.
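The one-boxer's reasoning can be put in numbers by treating the predictor's accuracy as a parameter. This is only a sketch of the calculation that conditions on your own choice; causal decision theorists reject exactly that step, so the code illustrates one side of the dispute rather than settling it. The `accuracy` parameter is an assumption of the example.

```python
SMALL, BIG = 1_000, 1_000_000  # dollar amounts from the problem statement

def expected_payoffs(accuracy):
    """Expected dollars for (one-boxing, two-boxing), conditioning on the choice.

    `accuracy` is the assumed probability that the predictor foresaw your choice.
    """
    one_box = accuracy * BIG                # opaque box is full if predicted correctly
    two_box = SMALL + (1 - accuracy) * BIG  # opaque box is full only if mispredicted
    return one_box, two_box

def break_even_accuracy():
    """Accuracy at which both choices have the same expected payoff."""
    return (SMALL + BIG) / (2 * BIG)

perfect = expected_payoffs(1.0)
threshold = break_even_accuracy()
```

On this way of calculating, one-boxing comes out ahead whenever the predictor is right more than about 50.05% of the time.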

nonmonotonic logic. See “monotonic logic.”

normalization. Adjusting values to meet some common standard or constraint, often by adding a constant to every value in a set or by multiplying every value by a constant. E.g., adjusting the probabilities of hypotheses to sum to 1 again after eliminating some hypotheses. If the only three possibilities are A, B, and C, each with probability 1/3, then evidence that ruled out C (and didn’t affect the relative probability of A and B) would leave us with A at 1/3 and B at 1/3. These values must be adjusted (normalized) to make the space of hypotheses sum to 1, so A and B change to probability 1/2 each.
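The A/B/C example amounts to dividing each remaining value by their total. A minimal sketch, using exact fractions so the result is not blurred by floating-point rounding:

```python
from fractions import Fraction

def normalize(weights):
    """Rescale the values so they sum to 1, keeping their ratios unchanged."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}

prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Evidence rules out C and leaves the A:B ratio untouched.
remaining = {name: p for name, p in prior.items() if name != "C"}

posterior = normalize(remaining)
```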

normative. Good, or serving as a standard for desirable behavior.

NP-complete. The hardest class of decision problems within the class NP, where NP consists of the problems that an ideal computer (specifically, a deterministic Turing machine) could efficiently verify correct answers to. The difficulty of NP-complete problems is such that if an algorithm were discovered to efficiently solve even one NP-complete problem, that algorithm would allow one to efficiently solve every NP problem. Many computer scientists hypothesize that this is impossible, a conjecture called “P ≠ NP.”

null-op. A null operation; an action that does nothing in particular.

O

object level. The level of concrete things, as contrasted with the "meta" level. The object level tends to be a base case or starting point, while the meta level is comparatively abstract, recursive, or indirect in relevance.

Occam’s Razor. The principle that, all else being equal, a simpler claim is more probable than a relatively complicated one. Formalizations of Occam’s Razor include Solomonoff induction and the Minimum Message Length Principle.

odds ratio. A way of representing how likely two events are relative to each other. E.g., if I have no information about which day of the week it is, the odds are 1:6 that it’s Sunday. This is the same as saying that "it's Sunday" has a prior probability of 1/7. If x:y is the odds ratio, the probability of the first event is x / (x + y).

Likewise, to convert a probability p into an odds ratio, I can just write p : (1 - p). For a percent probability p%, this becomes p : (100 - p). E.g., if my probability of winning a race is 40%, my odds are 40:60, which can also be written 2:3.

Odds ratios are useful because they're usually the easiest way to calculate a Bayesian update. If I notice the mall is closing early, and that’s twice as likely to happen on a Sunday as it is on a non-Sunday (a likelihood ratio of 2:1), I can simply multiply the left and right sides of my prior odds that it’s Sunday (1:6) by the corresponding sides of the evidence’s likelihood ratio (2:1) to arrive at correct posterior odds of 2:6, or 1:3.
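The Sunday example can be traced step by step. The function names here are made up for the sketch; the arithmetic is just the side-by-side multiplication described above, followed by the x / (x + y) conversion.

```python
from fractions import Fraction

def odds_to_probability(x, y):
    """Convert odds x:y into the probability of the first event."""
    return Fraction(x, x + y)

def update_odds(prior, likelihood_ratio):
    """Multiply matching sides of the prior odds and the likelihood ratio."""
    return (prior[0] * likelihood_ratio[0], prior[1] * likelihood_ratio[1])

prior_odds = (1, 6)                               # Sunday : not Sunday
posterior_odds = update_odds(prior_odds, (2, 1))  # mall closes early
prior_probability = odds_to_probability(*prior_odds)
posterior_probability = odds_to_probability(*posterior_odds)
```

Posterior odds of 2:6 (that is, 1:3) correspond to a probability of 1/4, up from the prior of 1/7.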

Omega. A hypothetical arbitrarily powerful agent used in various thought experiments.

one-boxing. Taking only the opaque box in Newcomb's Problem.

ontology. An account of the things that exist, especially one that focuses on their most basic and general similarities. Things are "ontologically distinct" if they are of two fundamentally different kinds.

opportunity cost. The value lost from choosing not to acquire something valuable. If I choose not to make an investment that would have earned me $10, I don’t literally lose $10 -- if I had $100 at the outset, I’ll still have $100 at the end, not $90. Still, I pay an opportunity cost of $10 for missing a chance to gain something I want. I lose $10 relative to the $110 I could have had. Opportunity costs can result from making a bad decision, but they also occur when you make a good decision that involves sacrificing the benefits of inferior options for the different benefits of a superior option. Many forms of human irrationality involve assigning too little importance to opportunity costs.

optimization process. Yudkowsky’s term for a process that performs searches through a large search space, and manages to hit very specific targets that would be astronomically unlikely to occur by chance.

E.g., the existence of trees is much easier to understand if we posit a search process, evolution, that iteratively comes up with better and better solutions to cognitively difficult problems. A well-designed dam, similarly, is easier to understand if we posit an optimization process searching for designs or policies that meet some criterion. Evolution, humans, and beavers all share this property, and can therefore be usefully thought of as optimization processes. In contrast, the processes that produce mountains and stars are easiest to describe in other terms.

orthogonality. The independence of two (or more) variables. If two variables are orthogonal, then knowing the value of one doesn't help you learn the value of the other.

P

P ≠ NP. A widely believed conjecture in computational complexity theory. NP is the class of mathematically specifiable questions with input parameters (e.g., “can a number list A be partitioned into two number lists B and C whose numbers sum to the same value?”) such that one could always in principle efficiently confirm that a correct solution to some instance of the problem (e.g., “the list {3,2,7,3,5} splits up into the lists {3,2,5} and {7,3}, and the latter two lists sum to the same number”) is in fact correct. More precisely, NP is the class of decision problems that a deterministic Turing machine could verify answers to in a polynomial amount of computing time. P is the class of decision problems that one could always in principle efficiently solve -- e.g., given {3,2,7,3,5} or any other list, quickly come up with a correct answer (like “{3,2,5} and {7,3}”) should one exist. Since all P problems are also NP problems, for P to not equal NP would mean that some NP problems are not P problems; i.e., some problems cannot be efficiently solved even though solutions to them, if discovered, could be efficiently verified.
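The gap between checking and finding can be seen in the list-partition example. In the sketch below, verifying a proposed split takes time roughly proportional to the list's length, while the naive search tries subsets one by one and can take exponentially many steps. The naive search is only an illustration; it does not show that no faster method exists.

```python
from collections import Counter
from itertools import combinations

def verify_partition(numbers, part_b, part_c):
    """Check a proposed split: same items overall, and equal sums."""
    same_items = Counter(part_b) + Counter(part_c) == Counter(numbers)
    return same_items and sum(part_b) == sum(part_c)

def find_partition(numbers):
    """Brute-force search over subsets of positions; exponential in len(numbers)."""
    indices = range(len(numbers))
    for size in range(len(numbers) + 1):
        for chosen in combinations(indices, size):
            part_b = [numbers[i] for i in chosen]
            part_c = [numbers[i] for i in indices if i not in chosen]
            if sum(part_b) == sum(part_c):
                return part_b, part_c
    return None  # no equal-sum split exists

example = [3, 2, 7, 3, 5]
claimed_ok = verify_partition(example, [3, 2, 5], [7, 3])
found = find_partition(example)
```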

Pareto optimum. A situation in which no one can be made better off without making at least one person worse off.

phase space. A mathematical representation of physical systems in which each axis of the space is a degree of freedom (a property of the system that must be specified independently) and each point is a possible state.

phlogiston. A substance hypothesized in the 17th century to explain phenomena such as fire and rust. Combustible objects were thought by late alchemists and early chemists to contain phlogiston, which evaporated during combustion.

physicalism. See “materialism.”

Planck units. Natural units, such as the Planck length and the Planck time, representing the smallest physically significant quantized phenomena.

positive bias. Bias toward noticing what a theory predicts you’ll see, instead of noticing what a theory predicts you won’t see.

possible world. A way the world could have been. One can say “there is a possible world in which Hitler won World War II” in place of “Hitler could have won World War II,” making it easier to contrast the features of multiple hypothetical or counterfactual scenarios. Not to be confused with the worlds of the many-worlds interpretation of quantum physics or Max Tegmark's Mathematical Universe Hypothesis, which are claimed (by their proponents) to be actual.

posterior probability. An agent's beliefs after acquiring evidence. Contrasted with its prior beliefs, or priors.

prior probability. An agent’s beliefs prior to acquiring some evidence.

Prisoner’s Dilemma. A game in which each player can choose to either "cooperate" with or "defect" against the other. The best outcome for each player is to defect while the other cooperates; and the worst outcome is to cooperate while the other defects. Each player views mutual cooperation as the second-best option, and mutual defection as the second-worst.

Traditionally, game theorists have argued that defection is always the correct move in one-shot dilemmas; it improves your reward if the other player independently cooperates, and it lessens your loss if the other player independently defects.

Yudkowsky is one of a minority of decision theorists who argue that rational cooperation is possible in the one-shot Prisoner's Dilemma, provided the two players' decision-making is known to be sufficiently similar. "My opponent and I are both following the same decision procedure, so if I cooperate, my opponent will cooperate too; and if I defect, my opponent will defect. The former seems preferable, so this decision procedure hereby outputs 'cooperate.'"

probability amplitude. See “amplitude.”

probability distribution. A function which assigns a probability (i.e., a number representing how likely something is to be true) to every possibility under consideration. Discrete and continuous probability distributions are generally encoded by, respectively, probability mass functions and probability density functions.

Thinking of probability as a "mass" that must be divided up between possibilities can be a useful way to keep in view that reducing the probability of one hypothesis always requires increasing the probability of others, and vice versa. Probability, like (classical) mass, is conserved.

probability theory. The branch of mathematics concerned with defining statistical truths and quantifying uncertainty.

problem of induction. In philosophy, the question of how we can justifiably assert that the future will resemble the past without relying on evidence that presupposes that very fact.

Q

quark. An elementary particle of matter.

quine. A program that outputs its own source code.
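One well-known way to write a quine in Python is to store a template of the program in a string and print the template filled in with its own representation. The sketch below builds such a two-line program as text, runs it, and captures what it prints; the wrapper code around it is scaffolding for the demonstration and is not itself a quine.

```python
import contextlib
import io

# Template for a two-line program: %r is filled with the template's own repr,
# and %% becomes a single literal percent sign.
TEMPLATE = 's = %r\nprint(s %% s)'
quine_source = TEMPLATE % TEMPLATE

def run_and_capture(source):
    """Execute `source` in a fresh namespace and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()

output = run_and_capture(quine_source)
```

The captured output matches `quine_source` exactly, apart from the trailing newline that `print` adds.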

R

rationalist. (a) Related to rationality. (b) A person who tries to apply rationality concepts to their real-world decisions.

rationality. Making systematically good decisions (instrumental rationality) and achieving systematically accurate beliefs (epistemic rationality).

reductio ad absurdum. Refuting a claim by showing that it entails a claim that is more obviously false.

reduction. An explanation of a phenomenon in terms of its origin or parts, especially one that allows you to redescribe the phenomenon without appeal to your previous conception of it.

reductionism. (a) The practice of scientifically reducing complex phenomena to simpler underpinnings. (b) The belief that such reductions are generally possible.

representativeness heuristic. A cognitive heuristic where one judges the probability of an event based on how well it matches some mental prototype.

Ricardo’s Law of Comparative Advantage. See “comparative advantage.”

S

satori. In Zen Buddhism, a non-verbal, pre-conceptual apprehension of the ultimate nature of reality.

Schrödinger equation. A fairly simple partial differential equation that defines how quantum wavefunctions evolve over time. This equation is deterministic; it is not known why the Born rule, which converts the wavefunction into an experimental prediction, is probabilistic, though there have been many attempts to make headway on that question.

scope insensitivity. A cognitive bias where people tend to disregard the size of certain phenomena.

screening off. Making something evidentially irrelevant. A piece of evidence A screens off a piece of evidence B from a hypothesis C if, once you know about A, learning about B doesn’t affect the probability of C.

search tree. A graph with a root node that branches into child nodes, which can then either terminate or branch once more. The tree data structure is used to locate values; in chess, for example, each node can represent a move, which branches into the other player’s possible responses, and searching the tree is intended to locate winning sequences of moves.
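A chess tree is too large to show, so this sketch searches the tree of a much smaller game, chosen only for illustration: players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins. Each recursive call corresponds to a node, and each possible move to a branch.

```python
def winning_move(stones):
    """Return a move (1 or 2) that forces a win for the player to move, else None."""
    for take in (1, 2):
        if take > stones:
            continue  # not a legal move from this node
        remaining = stones - take
        # Taking the last stone wins outright; otherwise the move is good
        # only if the opponent has no winning reply from the child node.
        if remaining == 0 or winning_move(remaining) is None:
            return take
    return None  # every branch lets the opponent win

moves = {n: winning_move(n) for n in range(1, 7)}
```

The search finds that piles of 3 or 6 stones are lost for the player to move, while from any other small pile there is a move that leaves the opponent on a multiple of 3.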

self-anchoring. Anchoring to oneself. Treating one’s own qualities as the default, and only weakly updating toward viewing others as different when given evidence of differences.

Shannon entropy. See “entropy.”

Shannon mutual information. See “mutual information.”

Simulation Hypothesis. The hypothesis that the world as we know it is a computer program designed by some powerful intelligence.

Singularity. One of several scenarios in which artificial intelligence systems surpass human intelligence in a large and dramatic way.

skyhook. An attempted explanation of a complex phenomenon in terms of a deeply mysterious or miraculous phenomenon -- often one of even greater complexity.

Solomonoff induction. An attempted definition of optimal (albeit computationally unfeasible) inference. Bayesian updating plus a simplicity prior that assigns less probability to percept-generating programs the longer they are.

stack trace. A retrospective step-by-step report on a program's behavior, intended to reveal the source of an error.

statistical bias. A systematic discrepancy between the expected value of some measure, and the true value of the thing you're measuring.

superintelligence. Something vastly smarter than present-day humans. This can be a predicted future technology, like smarter-than-human AI; or it can be a purely hypothetical agent, such as Omega or Laplace's Demon.

System 1. The processes behind the brain’s fast, automatic, emotional, and intuitive judgments.

System 2. The processes behind the brain’s slow, deliberative, reflective, and intellectual judgments.

Szilárd engine. See “Maxwell’s Demon.”

T

Taboo. A game by Hasbro where you try to get teammates to guess what word you have in mind while avoiding conventional ways of communicating it. Yudkowsky uses this as an analogy for the rationalist skill of linking words to the concrete evidence you use to decide when to apply them. Ideally, one should know what one is saying well enough to paraphrase the message in several different ways, and to replace abstract generalizations with concrete observations.

Tegmark world. A universe contained in a vast multiverse of mathematical objects. The idea comes from Max Tegmark's Mathematical Universe Hypothesis, which holds that our own universe is a mathematical object contained in an ensemble in which all possible computable structures exist.

terminal value. A goal that is pursued for its own sake, and not just to further some other goal.

Tit for Tat. A strategy in which one cooperates on the first round of an Iterated Prisoner’s Dilemma, then on each subsequent round mirrors what the opponent did the previous round.
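A short simulation shows the strategy in action. The payoff values and the `always_defect` opponent are assumptions made for this sketch; each strategy is a function that sees only the opponent's past moves.

```python
C, D = "C", "D"

# Hypothetical payoffs as (player A, player B); higher is better.
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return C if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    """A contrasting strategy that ignores history."""
    return D

def play(strategy_a, strategy_b, rounds):
    """Run an Iterated Prisoner's Dilemma and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each side sees only the other's past moves
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        gain_a, gain_b = PAYOFF[(a, b)]
        score_a += gain_a
        score_b += gain_b
    return score_a, score_b

mutual = play(tit_for_tat, tit_for_tat, 10)
exploited = play(tit_for_tat, always_defect, 10)
```

Two Tit for Tat players cooperate throughout. Against a constant defector, Tit for Tat is exploited only on the first round and defects from then on.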

Traditional Rationality. Yudkowsky’s term for the scientific norms and conventions espoused by thinkers like Richard Feynman, Carl Sagan, and Charles Peirce. Yudkowsky contrasts this with the ideas of rationality in contemporary mathematics and cognitive science.

transhuman. (a) Entities that are human-like, but much more capable than ordinary biological humans. (b) Related to radical human enhancement. Transhumanism is the view that humans should use technology to radically improve their lives—e.g., curing disease or ending aging.

truth-value. A proposition’s truth or falsity.

Turing-computability. The ability to be executed, at least in principle, by a simple process following a finite set of rules. "In principle" here means that a Turing machine could perform the computation, though we may lack the time or computing power to build a real-world machine that does the same. Turing-computable functions cannot be computed by all Turing machines, but they can be computed by some. In particular, they can be computed by all universal Turing machines.

Turing machine. An abstract machine that follows rules for manipulating symbols on an arbitrarily long tape.

two-boxing. Taking both boxes in Newcomb's Problem.

U

Unfriendly AI. A hypothetical smarter-than-human artificial intelligence that causes a global catastrophe by pursuing a goal without regard for humanity’s well-being. Yudkowsky predicts that superintelligent AI will be “Unfriendly” by default, unless a special effort goes into researching how to give AI stable, known, humane goals. Unfriendliness doesn’t imply malice, anger, or other human characteristics; a completely impersonal optimization process can be “Unfriendly” even if its only goal is to make paperclips. This is because even a goal as innocent as ‘maximize the expected number of paperclips’ could motivate an AI to treat humans as competitors for physical resources, or as threats to the AI’s aspirations.

uniform probability distribution. A distribution in which all events have equal probability; a maximum-entropy probability distribution.

universal Turing machine. A Turing machine that can compute all Turing-computable functions. If something can be done by any Turing machine, then it can be done by every universal Turing machine. A system that can in principle do anything a Turing machine could is called “Turing-complete.”

updating. Revising one’s beliefs. See also "Bayesian updating."

utilitarianism. An ethical theory asserting that one should act in whichever way causes the most benefit to people, minus how much harm results. Standard utilitarianism argues that acts can be justified even if they are morally counter-intuitive and harmful, provided that the benefit outweighs the harm.

utility function. A function that ranks outcomes by "utility," i.e., by how well they satisfy some set of goals or constraints. Humans are limited and imperfect reasoners, and don't consistently optimize any endorsed utility function; but the idea of optimizing a utility function helps us give formal content to "what it means to pursue a goal well," just as Bayesian updating helps formalize "what it means to learn well."

utilon. Yudkowsky’s name for a unit of utility, i.e., something that satisfies a goal. The term is deliberately vague, to permit discussion of desired and desirable things without relying on imperfect proxies such as monetary value and self-reported happiness.

V

W

wavefunction. A complex-valued function used in quantum mechanics to explain and predict the wave-like behavior of physical systems at small scales. Realists about the wavefunction treat it as a good characterization of the way the world really is, more fundamental than earlier (e.g., atomic) models. Anti-realists disagree, although they grant that the wavefunction is a useful tool by virtue of its mathematical relationship to observed properties of particles (the Born rule).

wu wei. “Non-action.” The concept, in Daoism, of effortlessly achieving one’s goals by ceasing to strive and struggle to reach them.

X

XML. Extensible Markup Language, a system for annotating texts with tags that can be read both by a human and by a machine.

Z

ZF. The Zermelo–Fraenkel axioms, an attempt to ground standard mathematics in set theory. ZFC (the Zermelo–Fraenkel axioms supplemented with the Axiom of Choice) is the most popular axiomatic set theory.

zombie. In philosophy, a perfect atom-by-atom replica of a human that lacks a human’s subjective awareness. Zombies behave exactly like humans, but they lack consciousness. Some philosophers argue that the idea of zombies is coherent -- that zombies, although not real, are at least logically possible. They conclude from this that facts about first-person consciousness are logically independent of physical facts, that our world breaks down into both physical and nonphysical components. Most philosophers reject the idea that zombies are logically possible, though the topic continues to be actively debated.