
The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?

Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structure: large parts of it are not interconnected at all. For example, the association of historical events with a time sequence is not automatic; the writer has had the experience of seeing a child, who knew about ancient Egypt and had studied pictures of the treasures from the tomb of Tutankhamen, nevertheless coming home from school with a puzzled expression and asking: ‘Was Abraham Lincoln the first person?’

It had been explained to him that the Egyptian artifacts were over 3000 years old, and that Abraham Lincoln was alive 120 years ago; but the meaning of those statements had not registered in his mind. This makes us wonder whether

...

Minor spoilers for planecrash (Book 3).

Keltham's Governance Lecture

Keltham was supposed to start by telling them all to use their presumably-Civilization-trained skill of 'perspective-taking-of-ignorance' to envision a hypothetical world where nothing resembling Coordination had started to happen yet.  Since, after all, you wouldn't want your thoughts about the best possible forms of Civilization to 'cognitively-anchor' on what already existed.

You can imagine starting in a world where all the same stuff and technology from present Civilization exists, since the question faced is what form of Governance is best-suited to a world like that one.  Alternatively, imagine an alternative form of the exercise involving people fresh-born into a fresh world where nothing has yet been built, and everybody's just wandering around over a grassy plain.

Either way, you should assume that everybody knows all about decision theory and cooperation-defection dilemmas.  The question being asked is not 'What form of Governance would we invent if we were stupid?'

Civilization could then begin - maybe it wouldn't actually happen exactly that way, but it is nonetheless said as though in stori

...

A decent handle for rationalism is 'apolitical consequentialism.'

'Apolitical' here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. 'Consequentialism' means getting more of what you want, whatever that is.

I think having answers for political questions is compatible with, and required by, rationalism. Instead of 'apolitical' consequentialism I would advise any of the following, which mean approximately the same thing as one another:
• politically subficial consequentialism (as opposed to politically superficial consequentialism; instead of judging things by whether they appear to be in line with a political faction, which is superficial, rationalists aspire to have deeper and better-justified standards for solving political questions)
• politically impartial consequentialism
• politically meritocratic consequentialism
• politically individuated consequentialism
• politically open-minded consequentialism
• politically human consequentialism (politics which aim to be good by the metric of human values, shared as much as possible by everyone, regardless of politics)
• politically omniscient consequentialism (politics which aim to be good by the metric of values that humans would have if they had full, maximally objection-solved information on every topic, especially topics of practical philosophy)
3David Udell1y
I agree that rationalism involves the (advanced rationalist) skills of instrumentally routing through relevant political challenges to accomplish your goals … but I'm not sure any of those proposed labels captures that well. I like "apolitical" because it unequivocally states that you're not trying to slogan-monger for a political tribe, and are naively, completely, loudly, and explicitly opting out of that status competition and not secretly fighting for the semantic high ground in some underhanded way (which is more typical political behavior, and is thus expected). "Meritocratic," "humanist," "humanitarian," and maybe "open-minded" are all shot for that purpose, as they've been abused by political tribes in the ongoing culture war (and in previous culture wars, too; our era probably isn't too special in this regard) and connote allegiance to some political tribes over others. What I really want is an adjective that says "I'm completely tapping out of that game."
The problem is that whenever well-meaning people come up with such an adjective, the people who are, in fact, not "completely tapping out of that game" quickly begin to abuse it until it loses meaning.  Generally speaking, tribalized people have an incentive to be seen as unaffiliated as possible. Being seen as a rational, neutral observer lends your perspective more credibility.
4Rana Dexsin1y
“apolitical” has indeed been turned into a slur around “you're just trying to hide that you hate change” or “you're just trying to hide the evil influences on you” (or something else vaguely like those) in a number of places.

Minor spoilers from mad investor chaos and the woman of asmodeus (planecrash Book 1) and Peter Watts's Echopraxia.

"Suppose everybody in a dath ilani city woke up one day with the knowledge mysteriously inserted into their heads, that their city had a pharaoh who was entitled to order random women off the street into his - cuddling chambers? - whether they liked that or not.  Suppose that they had the false sense that things had always been like this for decades.  It wouldn't even take until whenever the pharaoh first ordered a woman, for her to go "Wait why am I obeying this order when I'd rather not obey it?"  Somebody would be thinking about city politics first thing when they woke up in the morning and they'd go "Wait why do we have a pharaoh in the first place" and within an hour, not only would they not have a pharaoh, they'd have deduced the existence of the memory modification because their previous history would have made no sense, and then the problem would escalate to Exception Handling and half the Keepers on the planet would arrive to figure out what kind of alien invasion was going on.  Is the source of my confusion - at all clear here?"

"You think

...
1David Udell1y
I don't get the relevance of the scenario. Is the idea that there might be many such other rooms with people like me, and that I want to coordinate with them (to what end?) using the Schelling points in the night sky? I might identify Schelling points using what celestial objects seem to jump out to me on first glance, and see which door of the two that suggests -- reasoning that others will reason similarly. I don't get what we'd be coordinating to do here, though.

We've all met people who are acting as if "Acquire Money" is a terminal goal, never noticing that money is almost entirely instrumental in nature. When you ask them "but what would you do if money was no issue and you had a lot of time", all you get is a blank stare.

Even the LessWrong Wiki entry on terminal values describes a college student for which university is instrumental, and getting a job is terminal. This seems like a clear-cut case of a Lost Purpose: a job seems clearly instrumental. And yet, we've all met people who act as if "Have a Job" is a terminal value, and who then seem aimless and undirected after finding employment …

You can argue that Acquire Money and Have a Job aren't "really" terminal goals, to which I counter that many people don't know their ass from their elbow when it comes to their own goals.

--Nate Soares, "Dark Arts of Rationality"

Why does politics strike rationalists as so strangely shaped? Why does rationalism come across as aggressively apolitical to smart non-rationalists?

Part of the answer: Politics is absolutely rife with people mixing their ends with their means and vice versa. It's pants-on-head confused, from a rationalist perspective, to be ul...

I often wonder if this framing (with which I mostly agree) is an example of the typical mind fallacy.  The assumption that many humans are capable of distinguishing terminal from instrumental goals, or of having terminal goals more abstract than "comfort and procreation", is not all that supported by evidence. In other words, politicized debates DO rub you the wrong way, but on two dimensions - first, that you're losing, because you're approaching them from a different motive than your opponents.  And second that it reveals not just a misalignment with fellow humans in terminal goals, but an alien-ness in the type of terminal goals you find reasonable.

Yudkowsky has sometimes used the phrase "genre savvy" to mean "knowing all the tropes of reality."

For example, we live in a world where academia falls victim to publishing incentives/Goodharting, and so academic journals fall short of what people with different incentives would be capable of producing. You'd be failing to be genre savvy if you expected that when a serious problem like AGI alignment rolled around, academia would suddenly get its act together with a relatively small amount of prodding/effort. Genre savvy actors in our world know what academia is like, and predict that academia will continue to do its thing in the future as well.

Genre savviness is the same kind of thing as hard-to-communicate-but-empirically-validated expert intuitions. When domain experts have some feel for what projects might pan out and what projects certainly won't but struggle to explain their reasoning in depth, the most they might be able to do is claim that that project is just incompatible with the tropes of their corner of reality, and point to some other cases.

How is "genre savviness" different from "outside view" or "reference class forecasting"?
1David Udell1y
I think they're all the same thing: recognizing patterns in how a class of phenomena pan out.

“What is the world trying to tell you?”

I've found that this prompt helps me think clearly about the evidence shed by the generator of my observations.

There's a rationality-improving internal ping I use on myself, which goes, "what do I expect to actually happen, for real?"

This ping moves my brain from a mode where it's playing with ideas in a way detached from the inferred genre of reality, over to a mode where I'm actually confident enough to bet about some outcomes. The latter mode leans heavily on my priors about reality, and, unlike the former mode, looks askance at significantly considering long, conjunctive, tenuous possible worlds.
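The reason long, conjunctive stories deserve suspicion is the product rule: a story that needs many steps to all go through is at most as likely as its weakest step, and, if the steps are independent, exactly as likely as the product of all of them. A minimal sketch with made-up numbers (ten hypothetical steps at 90% each; the independence assumption is mine):

```python
import math


def conjunction_probability(step_probs):
    """Probability that every step happens, assuming the steps are independent."""
    return math.prod(step_probs)


# Hypothetical story: ten steps, each individually quite plausible.
story = [0.9] * 10
print(round(conjunction_probability(story), 3))  # 0.349
```

Even with every step at 90%, the whole ten-step story is more likely false than true.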

God dammit people, "cringe" and "based" aren't truth values! "Progressive" is not a truth value! Say true things!

4David Udell1y
I've noticed that people are really innately good at sentiment classification, and, by comparison, crap at natural language inference. In a typical conversation with ordinary educated people, people will do a lot of the former relative to the latter. My theory of this is that, with sentiment classification and generation, we're usually talking in order to credibly signal and countersignal our competence, virtuous features, and/or group membership, and that humanity has been fine-tuned to succeed at this social maneuvering task. At this point, it comes naturally. Success at the object-level-reasoning task was less crucial for individuals in the ancestral environment, and so people, typically, aren't naturally expert at it. What a bad situation to be in, when our species' survival hinges on our competence at object-level reasoning.

Having been there twice, I've decided that the Lightcone offices are my favorite place in the world.  They're certainly the most rationalist-shaped space I've ever been in.

Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.

If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n...

Modest spoilers for planecrash (Book 9 -- null action act II).

Nex and Geb had each INT 30 by the end of their mutual war.  They didn't solve the puzzle of Azlant's IOUN stones... partially because they did not find and prioritize enough diamonds to also gain Wisdom 27.  And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion's spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly me

...

Epistemic status: politics, known mindkiller; not very serious or considered.

People seem to have a God-shaped hole in their psyche: just as people banded together around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in England was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.

In the US, the 1st Amendment legall...


If you take each of the digits of 153, cube them, and then sum those cubes, you get 153:

1 + 125 + 27 = 153.

For many naturals, if you iteratively apply this function, you'll return to the 153 fixed point. Start with, say, 516:

125 + 1 + 216 = 342

27 + 64 + 8 = 99

729 + 729 = 1,458

1 + 64 + 125 + 512 = 702

343 + 0 + 8 = 351

27 + 125 + 1 = 153

1 + 125 + 27 = 153

1 + 125 + 27 = 153...

These nine fixed points or cycles occur with the following frequencies (1 <= n <= 10e9):
33.3% : (153 → )
29.5% : (371 → )
17.8% : (370 → )
 5.0% : (55 → 250 → 133 → )
 4.1% : (160 → 217 → 352 → )
 3.8% : (407 → )
 3.1% : (919 → 1459 → )
 1.8% : (1 → )
 1.5% : (136 → 244 → )

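No other fixed points or cycles are possible (except 0 → 0, which isn't reachable from any nonzero input): a number with d digits has a digit-cube sum of at most 729d, so any number with five or more digits maps to a number with fewer digits. Every orbit therefore eventually stays below 10,000, and checking that finite range turns up only the cycles listed above.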
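A short script can check both the arithmetic of chains like the one above and the list of cycles. This is a sketch, not the original author's code; the names `digit_cube_sum`, `terminal_cycle` and `cycle_counts` are my own, and it only scans 1..10,000 rather than the range in the table, so its percentages need not match the table's:

```python
from collections import Counter


def digit_cube_sum(n: int) -> int:
    """Sum of the cubes of the decimal digits of n."""
    return sum(int(d) ** 3 for d in str(n))


def terminal_cycle(n: int) -> tuple[int, ...]:
    """Iterate digit_cube_sum from n until a value repeats, and return the
    cycle reached, rotated so that it starts at its smallest element."""
    path: list[int] = []
    while n not in path:
        path.append(n)
        n = digit_cube_sum(n)
    cycle = path[path.index(n):]  # n is the first value seen twice
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])


def cycle_counts(limit: int) -> Counter:
    """Count how many starting values in 1..limit end in each cycle."""
    return Counter(terminal_cycle(n) for n in range(1, limit + 1))


if __name__ == "__main__":
    limit = 10_000
    for cycle, count in cycle_counts(limit).most_common():
        print(f"{100 * count / limit:5.1f}% : {cycle}")
```

One regularity in the table falls out of modular arithmetic: since a³ ≡ a (mod 3) for every digit a, and a number is congruent mod 3 to its digit sum, the map preserves n mod 3. 153 is the only listed cycle made of multiples of 3, so exactly the multiples of 3 end at 153, which is why that row is 33.3%; likewise 371 and 407, the only cycles congruent to 2 mod 3, together account for 29.5% + 3.8% = 33.3%.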

A model I picked up from Eric Schwitzgebel.

The humanities used to be highest-status in the intellectual world!

But then, scientists quite visibly exploded fission weapons and put someone on the moon. It's easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.

When the sanity waterline is so low, it's easy to develop a potent sense of misanthropy.

Bryan Caplan's writing about many people hating stupid people really affected me on this point. Don't hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo's comparative advantage theorem. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
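Ricardo's point can be checked with a toy two-person economy. Every number below is made up for illustration (hypothetical hours-per-unit figures and working day); Alice is more productive than Bob at both goods, yet the pair ends up with more total output when Bob specializes in the good where his relative disadvantage is smallest:

```python
from fractions import Fraction

# Hypothetical productivity figures: hours each person needs per unit of each good.
# Alice needs fewer hours for BOTH goods, so she has the absolute advantage in both.
HOURS_PER_UNIT = {
    "alice": {"x": Fraction(1), "y": Fraction(2)},
    "bob": {"x": Fraction(3), "y": Fraction(4)},
}
HOURS_AVAILABLE = Fraction(12)  # hypothetical working day for each person


def produce(person: str, hours_on_y: Fraction) -> tuple[Fraction, Fraction]:
    """Units of (X, Y) made when `person` spends `hours_on_y` on Y and the rest on X."""
    cost = HOURS_PER_UNIT[person]
    return (HOURS_AVAILABLE - hours_on_y) / cost["x"], hours_on_y / cost["y"]


def combined(*bundles: tuple[Fraction, Fraction]) -> tuple[Fraction, ...]:
    """Add up (X, Y) bundles component-wise."""
    return tuple(sum(goods, Fraction(0)) for goods in zip(*bundles))


# No trade: each person splits the day evenly between the two goods.
autarky = combined(produce("alice", Fraction(6)), produce("bob", Fraction(6)))

# Specialization: Bob (one Y costs him 4/3 X, versus 2 X for Alice) makes only Y;
# Alice makes just enough Y to keep total Y unchanged and spends the rest on X.
specialized = combined(produce("alice", Fraction(3)), produce("bob", Fraction(12)))

print("no trade:   X =", autarky[0], " Y =", autarky[1])
print("specialize: X =", specialized[0], " Y =", specialized[1])
```

Total Y is 9/2 either way, but total X rises from 8 to 9. That extra unit is the gain from trade, available to be split between them, even though Bob is worse than Alice at everything.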

"Ignorant people do not exist."

It's really easy to spend a lot of cognitive cycles churning through bad, misleading ideas generated by the hopelessly confused. Don't do that!

The argument that being more knowledgeable leaves you strictly better off than being ignorant relies on your simply ignoring bad ideas when you spend your cognitive cycles searching for improvements on your working plans. Sometimes, you'll need to actually exercise this "simply ignore it" skill. You'll end up needing to do so more and more, to approach bounded instrumental rationality, the more inadequate civilization around you is and the lower its sanity waterline.

2David Udell8mo
I hereby confer on you, reader, the shroud of epistemic shielding from predictably misleading statements. It confers irrevocable, invokable protection from having to think about predictably confused claims ever again. Take those cognitive cycles saved, and spend them well!

You sometimes misspeak... and you sometimes misthink. That is, sometimes your cognitive algorithm flubs a word, and the thought that seemed so unimpeachably obvious in your head... is nevertheless false on a second glance.

Your brain is a messy probabilistic system, so you shouldn't expect its cognitive state to ever perfectly track the state of a distant entity.

I find this funny. I don't know about your brain, but mine sometimes produces something closely resembling noise similar to dreams (admittedly more often in the morning when sleep deprived).
1David Udell9mo
Note that a "distant entity" can be a computation that took place in a different part of your brain! Your thoughts therefore can't perfectly track other thoughts elsewhere in your head -- your whole brain is at least somewhat noisy, and so will sometimes distort the information being passed around inside itself.

Policy experiments I might care about if we weren't all due to die in 7 years:

  1. Prediction markets generally, but especially policy prediction markets at the corporate- and U.S. state- levels. The goal would be to try this route to raising the sanity waterline in the political domain (and elsewhere) by incentivizing everyone's becoming more of a policy wonk and less of a tribalist.
  2. Open borders experiments of various kinds in various U.S. states, precluding roads to citizenship or state benefits for migrant workers, and leaving open the possibility of mass de
...

Become consequentialist enough, and it'll wrap back around to being a bit deontological.

4Daniel Kokotajlo10mo
"The rules say we must be consequentialists, but all the best people are deontologists, and virtue ethics is what actually works." --Yudkowsky, IIRC.
3Daniel Kokotajlo10mo
I think this quote stuck with me because in addition to being funny and wise I think it's actually true, or close enough to true.

A shard is a contextually activated behavior-steering computation. Think of it as a circuit of neurons in your brain that is reinforced by the subcortex, gaining more staying power when positively reinforced and withering away in the face of negative reinforcement. In fact, whatever modulates shard strength in this way is reinforcement/reward. Shards are born when a computation that is currently steering steers into some reinforcement. So shards can only accrete around the concepts currently in a system's world model (presumably, the world model is shared ...

5Thomas Kwa10mo
I'm pretty skeptical that sophisticated game theory happens between shards in the brain, and also that coalitions between shards are how value preservation in an AI will happen (rather than there being a single consequentialist shard, or many shards that merge into a consequentialist, or something I haven't thought of). To the extent that shard theory makes such claims, they seem to be interesting testable predictions.

My favorite books, ranked!

Nonfiction:

1. Rationality, Eliezer Yudkowsky

2. Superintelligence, Nick Bostrom

3. The Age of Em, Robin Hanson

Fiction:

1. Permutation City, Greg Egan

2. Blindsight, Peter Watts

3. A Deepness in the Sky, Vernor Vinge

4. Ra, Sam Hughes/qntm

Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.

Extrapolated Volitionist institutions are all characteristically "meta": they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.

Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!

Since when was politics about just one person?
2David Udell6mo
A multiagent Extrapolated Volitionist institution is something that computes and optimizes for a Coherent Extrapolated Volition, if a CEV exists. Really, though, the above Extrapolated Volitionist institutions do take other people into consideration. They either give everyone the Schelling weight of one vote in a moral parliament, or they take into consideration the epistemic credibility of other bettors as evinced by their staked wealth, or other things like that. Sometimes the relevant interpersonal parameters can be varied, and the institutional designs don't weigh in on that question. The ideological emphasis is squarely on individual considered preferences -- that is the core insight of the outlook. "Have everyone get strictly better outcomes by their lights, probably in ways that surprise them but would be endorsed by them after reflection and/or study."

Bogus nondifferentiable functions

The case most often cited as an example of a nondifferentiable function is derived from a sequence $f_n(x)$, each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length $1/n$. As $n \to \infty$, the triangles shrink to zero size. For any finite $n$, the slope of $f_n(x)$ is $\pm 1$ almost everywhere. Then what happens as $n \to \infty$? The limit $f_\infty(x)$ is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivativ

...
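A quick numerical check of why the example is bogus: each triangle has height equal to half its hypotenuse ℓ, so the slope stays at ±1 wherever it exists while sup |f_n(x)| = ℓ/2 → 0 as the triangles shrink, and the limit function is identically zero, which is differentiable everywhere. The sketch below uses the hypothetical choice ℓ = 1/n, and the function names are my own:

```python
def triangle_string(x: float, hyp: float) -> float:
    """Height at x of a string of isosceles right triangles whose hypotenuses
    (each of length `hyp`) tile the real axis: the distance from x to the
    nearest multiple of `hyp`."""
    t = x % hyp
    return min(t, hyp - t)


def sup_norm(hyp: float, samples: int = 100_001) -> float:
    """Largest height on an even grid over [0, 1]; the true maximum is hyp / 2."""
    return max(triangle_string(i / (samples - 1), hyp) for i in range(samples))


def slope(x: float, hyp: float, h: float = 1e-9) -> float:
    """Forward-difference slope; only meaningful away from the corners."""
    return (triangle_string(x + h, hyp) - triangle_string(x, hyp)) / h


for n in (1, 10, 100, 1000):
    hyp = 1 / n  # hypothetical choice of hypotenuse length
    # Height shrinks like 1/(2n), while the slopes on the rising and falling
    # edges of the first triangle stay at +1 and -1.
    print(n, sup_norm(hyp), slope(0.25 * hyp, hyp), slope(0.75 * hyp, hyp))
```

The derivatives f_n' never settle down, but that is a fact about the limit of the derivatives; the derivative of the limit exists and is 0.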
2David Udell7mo

Back and Forth

Only make choices that you would not make in reverse, if things were the other way around. Drop out of school if and only if you wouldn't enroll in school from out of the workforce. Continue school if and only if you'd switch over from work to that level of schooling.

Flitting back and forth between both possible worlds can make you less cagey about doing what's overdetermined by your world model + utility function already. It's also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.

It's probably a useful mental technique to consider from both directions, but also consider that choices that appear symmetric at first glance may not actually be symmetric. There are often significant transition costs that may differ in each direction, as well as path dependencies that are not immediately obvious. As such, I completely disagree with the first paragraph of the post, but agree with the general principle of considering such decisions from both directions and thank you for posting it.

Ten seconds of optimization is infinitely better than zero seconds of optimization.

Literal zero seconds of optimization is pretty rare tho (among humans). Your freewheeling impulses come pretty pre-optimized.

Science fiction books have to tell interesting stories, and interesting stories are about humans or human-like entities. We can enjoy stories about aliens or robots as long as those aliens and robots are still approximately human-sized, human-shaped, human-intelligence, and doing human-type things. A Star Wars in which all of the X-Wings were combat drones wouldn’t have done anything for us. So when I accuse something of being science-fiction-ish, I mean bending over backwards – and ignoring the evidence – in order to give basically human-shaped beings a c

...

Spoilers for planecrash (Book 2).

"Basic project management principles, an angry rant by Keltham of dath ilan, section one:  How to have anybody having responsibility for anything."

Keltham will now, striding back and forth and rather widely gesturing, hold forth upon the central principle of all dath ilani project management, the ability to identify who is responsible for something.  If there is not one person responsible for something, it means nobody is responsible for it.  This is the proverb of dath ilani management.  Are t

...
Thanks for posting this extract. I find the glowfic format a bit wearing to read, for some reason, and it is these nuggets that I read Planecrash for, when I do. (Although I had no such problem with HPMOR, which I read avidly all the way through.)

What would it mean for a society to have real intellectual integrity?  For one, people would be expected to follow their stated beliefs to wherever they led.  Unprincipled exceptions and an inability or unwillingness to correlate beliefs among different domains would be subject to social sanction.  Valid attempts to persuade would be expected to be based on solid argumentation, meaning that what passes for typical salesmanship nowadays would be considered a grave affront.  Probably something along the lines of punching someone

...
5David Udell1y
Cf. "there are no atheists in a foxhole." Under stress, it's easy to slip sideways into a world model where things are going better, where you don't have to confront quite so many large looming problems. This is a completely natural human response to facing down difficult situations, especially when brooding over those situations over long periods of time. Similar sideways tugs can come from (overlapping categories) social incentives to endorse a sacred belief of some kind, or to not blaspheme, or to affirm the ingroup attire when life leaves you surrounded by a particular ingroup, or to believe what makes you or people like you look good/high status. Epistemic dignity is about seeing "slipping sideways" as beneath you. Living in reality is instrumentally beneficial, period. There's no good reason to ever allow yourself to not live in reality. Once you can see something, even dimly, there's absolutely no sense in hiding from that observation's implications. Those subtle mental motions by which we disappear observations we know that we won't like down the memory hole … epistemic dignity is about coming to always and everywhere violently reject these hidings-from-yourself, as a matter of principle. We don't actually have a choice in the matter -- there's no free parameter of intellectual virtue here, that you can form a subjective opinion on. That slipping sideways is undignified is written in the very mathematics of inference itself.
1David Udell1y
Minor spoilers for mad investor chaos and the woman of asmodeus (planecrash Book 1).

You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.

If you allow the assumption that your mental model of what was said matches what was said, then you don't necessarily need to read all the way through to authoritatively say that the work never mentions something, merely enough that you have confidence in your model. If you don't allow the assumption that your mental model of what was said matches what was said, then reading all the way through is insufficient to authoritatively say that the work never mentions something. (There is a third option here: that your mental model suddenly becomes much better when you finish reading the last word of an argument.)

Past historical experience and brainstorming about human social orders probably barely scratch the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.

  1. ^

    (Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)

One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.

If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can work to get that person to think in terms of Bayesian epistemology and not decision theory if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increas...

An Inconsistent Simulated World

I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).

Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.

What are the flaws you...

When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
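In Bayesian terms, what moves you is the likelihood ratio P(writeup | claim) / P(writeup | not claim). If an equally good writeup was about as likely to exist whichever way the truth went, that ratio is about 1 and the posterior equals the prior. A minimal sketch with made-up numbers:

```python
def posterior(prior: float, p_if_true: float, p_if_false: float) -> float:
    """P(claim | writeup) by Bayes' theorem, for a yes/no claim."""
    joint_true = prior * p_if_true
    joint_false = (1 - prior) * p_if_false
    return joint_true / (joint_true + joint_false)


# Hypothetical prior of 30% on the claim.
# A writeup equally likely to exist either way (likelihood ratio 1):
print(posterior(0.3, 0.8, 0.8))  # ~0.3: no update
# A writeup twice as likely to exist if the claim is true:
print(posterior(0.3, 0.8, 0.4))  # ~0.4615 (= 6/13)
```

Only the asymmetry between the two likelihoods matters; how persuasive the writeup feels on its own does not enter the formula.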

The explicit definition of an ordered pair $(a, b)$ is frequently relegated to pathological set theory...

It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irreleva

...
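The excerpt refers to "the explicit definition of ordered pair given above" without reproducing it. Assuming it is the standard Kuratowski definition (a, b) = {{a}, {a, b}}, a few lines of Python with frozensets show both halves of the complaint: the one property we want from ordered pairs holds, and so do irrelevant ones nobody would list as part of the concept. The helper name `kpair` is my own:

```python
from itertools import product


def kpair(a, b) -> frozenset:
    """Kuratowski ordered pair: (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


# The property an ordered pair exists for:
# (a, b) == (c, d) exactly when a == c and b == d.
domain = range(4)
assert all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(domain, repeat=4)
)

# Irrelevant side effects of this particular encoding:
print(frozenset({1}) in kpair(1, 2))  # True: {a} is literally an element of (a, b)
print(len(kpair(1, 2)), len(kpair(1, 1)))  # 2 1: (a, a) collapses to {{a}}
```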
3Alexander Gietelink Oldenziel7mo
Modern type theory mostly solves this blemish of set theory and is highly economical conceptually to boot. Most of the adherence to set theory is historical inertia, though some aspects of coding & presentations are important. Future foundations will improve our understanding on this latter topic. 

Now, whatever $A$ may assert, the fact that $A$ can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, $A$ could certainly be deduced from them!

This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s

...
The text is slightly in error. It is straightforward to construct a program that is guaranteed to locate an inconsistency if one exists: just have it generate all theorems and stop when it finds an inconsistency. The problem is that it doesn't ever stop if there isn't an inconsistency. This is the difference between decidability and semi-decidability. All the systems covered by Gödel's completeness and incompleteness theorems are semi-decidable, but not all are decidable.
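The search procedure described in that comment can be sketched over a toy "theory" whose theorems are plain strings and whose negation is a leading `~`. Everything here is hypothetical scaffolding: a real theory would need a proof enumerator, which this sketch simply takes as given (any iterable of theorems).

```python
from collections.abc import Iterable
from itertools import count, islice
from typing import Optional


def negate(formula: str) -> str:
    """Toy negation on formula strings: 'p' <-> '~p'."""
    return formula[1:] if formula.startswith("~") else "~" + formula


def find_contradiction(
    theorems: Iterable[str], max_steps: Optional[int] = None
) -> Optional[tuple[str, str]]:
    """Semi-decision procedure for inconsistency.

    Walks an enumeration of theorems and returns (earlier, later) as soon as a
    formula shows up whose negation was already listed. For an inconsistent
    theory this always halts. For a consistent one it runs forever, unless
    max_steps cuts it off, and then None only means "not found yet".
    """
    seen: set[str] = set()
    stream = theorems if max_steps is None else islice(theorems, max_steps)
    for formula in stream:
        if negate(formula) in seen:
            return negate(formula), formula
        seen.add(formula)
    return None


if __name__ == "__main__":
    # Hypothetical consistent theory: infinitely many unrelated atoms p0, p1, ...
    consistent = (f"p{i}" for i in count())
    print(find_contradiction(consistent, max_steps=10_000))  # None: gave up, proved nothing

    # Hypothetical inconsistent theory: "~p42" eventually shows up after p42.
    def inconsistent():
        yield from (f"p{i}" for i in range(500))
        yield "~p42"

    print(find_contradiction(inconsistent()))  # ('p42', '~p42')
```

The optional `max_steps` exists only so the sketch can be demonstrated; with it, `None` means "no contradiction found yet", which is exactly the asymmetry between a semi-decision procedure and a decision procedure.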

What're the odds that we're anywhere close to optimal in any theoretical domain? Where are our current models basically completed, boundedly optimal representations of some part of the universe?

The arguments for theoretical completion are stronger for some domains than others, but in general the odds that we have the best model in any domain are pretty poor, and are outright abysmal in the mindkilling domains.

Is the concept of "duty" the fuzzy shadow cast by the simple mathematical structure of 'corrigibility'?

It's only modestly difficult to train biological general intelligences to defer to even potentially dumber agents. We call these deferential agents "dutybound" -- the sergeants who carry out the lieutenant's direct orders, even when they think they know better; the bureaucrats who never take local opportunities to get rich at the expense of their bureau, even when their higher-ups won't notice; the employees who work hard in the absence of effective overs...

David Udell, 8mo
I note that Eliezer thinks that corrigibility is one currently-impossible-to-instill-in-an-AGI property that humans actually have. The sum total of human psychology... consists of many such impossible-to-instill properties. This is why we should want to accomplish one impossible thing, as our stopgap solution, rather than aiming for all the impossible things at the same time, on our first try at aligning the AGI.
It seems like corrigibility can't be usefully described as acting according to some terminal goal. But AIs are not by default expected utility maximizers in the ontology of the real world, so it could be possible to get them to do the desired thing despite lacking a sensible formal picture of it. I'm guessing some aspects of corrigibility might be about acting according to a whole space of goals (at the same time), which is easier to usefully describe: some quantilizer-like thing, selected to fit more natural desiderata, acting in a particular way in accordance with a collection of goals, with the space of goals not necessarily thought of as uncertainty about an unknown goal. This is not about being dumb; it's about not actually engaging in planning. Failing at this does require some level of non-dumbness, but not conversely, unless there are spontaneous mesa-optimizers all over the place (the cognitive cancer), which probably takes many orders of magnitude more than merely not being dumb. So for a start, train the models, not the agent.
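For reference, here is a minimal Python sketch of a plain sample-based quantilizer with a single utility function. It is not the multi-goal variant speculated about above; `quantilize`, the sample count `n`, and the uniform toy base distribution are assumptions made for illustration.

```python
import math
import random
from typing import Callable, Optional, TypeVar

A = TypeVar("A")


def quantilize(
    base_sample: Callable[[random.Random], A],
    utility: Callable[[A], float],
    q: float,
    n: int = 1000,
    rng: Optional[random.Random] = None,
) -> A:
    """Approximate q-quantilizer: draw n actions from the base distribution,
    keep the top q fraction by utility, and pick uniformly among them,
    instead of taking the single utility-maximizing action."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    rng = rng if rng is not None else random.Random()
    candidates = [base_sample(rng) for _ in range(n)]
    candidates.sort(key=utility, reverse=True)
    top = candidates[: max(1, math.ceil(q * n))]
    return rng.choice(top)


rng = random.Random(0)
# Hypothetical base distribution: uniform over 100 actions; utility: the action's index.
choice = quantilize(lambda r: r.randrange(100), utility=lambda a: a, q=0.1, rng=rng)
print(choice)  # some action from roughly the top 10% of sampled actions
```

Smaller `q` behaves more like a maximizer; `q = 1` just imitates the base distribution.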

Minor spoilers for planecrash (Book 3).

So!  On a few moments' 'first-reflection', it seems to Keltham that estimating the probability of Civilization being run by a Dark Conspiracy boils down to (1) the question of whether Civilization's apparently huge efforts to build anti-Dark-Conspiracy citizens constitute sincere work that makes the Dark Conspiracy's life harder, or fake work designed to only look like that; and (2) the prior probability that the Keepers and Governance would have arrived on the scene already corrupted, during the last major reorg

...

Non-spoiler quote from planecrash (Book 3).

Nonconformity is something trained in dath ilan and we could not be Law-shaped without that.  If you're conforming to what you were taught, to what other people seem to believe, to what other people seem to want you to believe, to what you think everyone believes, you're not conforming to the Law.

--Eliezer, planecrash

A great symbolic moment for the Enlightenment, and for its project of freeing humanity from needless terrors, occurred in 1752 in Philadelphia. During a thunderstorm, Benjamin Franklin flew a kite with a pointed wire at the end and succeeded in drawing electric sparks from a cloud. He thus proved that lightning was an electrical phenomenon and made possible the invention of the lightning-rod, which, mounted on a high building, diverted the lightning and drew it harmlessly to the ground by means of a wire. Humanity no longer needed to fear fire from heaven.

...

"You don't need to follow anybody! You've got to think for yourselves. You're all individuals!"

"Yes, we're all individuals!"

"You've all got to work it out for yourselves!"

"Yes! We've got to work it out for ourselves!"


"Tell us more!"

--Monty Python's Life of Brian

Building your own world model is hard work. It can be good intellectual fun, sometimes, but it's often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your ...

In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.

Agents that explicitly represent their utility function are potentially vulnerable to sign flips.

What sorts of AI designs could not be made to pursue a flipped utility function via perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only pursuing the error-corrected utility function.
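One way to picture that guess, as a sketch under loud assumptions: the utility function is a hypothetical linear weight vector, stored in three copies, and the agent acts only on the component-wise median. A sign flip in any single copy is then outvoted.

```python
from statistics import median
from typing import List, Sequence


def error_corrected(copies: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise median across redundant copies of the utility weights.

    With 2k + 1 copies, if at least k + 1 copies agree on a component, the
    median equals that agreed value, so up to k corrupted copies are outvoted.
    """
    return [median(column) for column in zip(*copies)]


def utility(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear utility: a weighted sum of world features."""
    return sum(w * f for w, f in zip(weights, features))


true_weights = [1.0, -2.0, 0.5]
copies = [list(true_weights) for _ in range(3)]
copies[1] = [-w for w in copies[1]]  # a single-point sign flip in one copy

corrected = error_corrected(copies)
print(corrected)  # [1.0, -2.0, 0.5]
```

Flipping two of the three copies would win the vote, so the number of copies bounds how many independent perturbations this tolerates.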

Just a phrasing/terminology nitpick: I think this applies to agents with externally-imposed utility functions. If an agent has a "natural" or "intrinsic" utility function which it publishes explicitly (and does not accept updates to that explicit form), I don't think the risk of representation bugs arises.

A huge range of utility functions should care about alignment! It's in the interest of just about everyone to survive AGI.

I'm going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We'll hammer out our value disagreements in our CEV, and in our future (should we save it).

There's a very serious chicken-and-egg problem when you talk about what a utility function SHOULD include, as opposed to what it does.  You need a place OUTSIDE of the function to have preferences about what the function is. If you just mean "I wish more humans shared my values on the topic of AGI x-risk", that's perfectly reasonable, but trivial.  That's about YOUR utility function, and the frustration you feel at being an outlier.
David Udell, 1y
Ah, yeah, I didn't mean to say that others' utility functions should, by their own lights, be modified to care about alignment. I meant that instrumentally, their utility functions already value surviving AGI highly. I'd want to show this to them to get them to care about alignment, even if they and I disagree about a lot of other normative things. If someone genuinely, reflectively doesn't care about surviving AGI … then the above just doesn't apply to them, and I won't try to convince them of anything. In their case, we just have fundamental, reflectively robust value-disagreement.
I value not getting trampled by a hippo very highly too, but the likelihood that I find myself near a hippo is low. And my ability to do anything about it is also low.

One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it's given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone's goals would be best served by doing something socially unorthodox (like, e.g., signing up for cryonics or dropping out of a degree), they will usually rationalize that option away in order to stay on script. So for th...

Two moments of growing in mathematical maturity I remember vividly:

  1. Realizing that equations are claims that are therefore either true or false. Everything asserted with symbols... could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word "is"!
  2. Learning about the objects that mathematical claims are about. Going from having to look up "Wait, what's a real number again?" to knowing how , and  interrelat
...

2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.

a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs

12. The principal of a private school is a

...

What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?

Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.

Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.

"Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search."

In the 1920s when λ and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts

...
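To make the "function as rule" idea concrete, here is a small Python sketch (not from the quoted book) encoding the standard combinators S and K as curried functions. `S K K` and `S K S` are different rules that agree on every input tried, which is the gap between a function-as-process and a function-as-set-of-pairs.

```python
def K(x):
    """K x y = x: returns a function that ignores its argument."""
    return lambda y: x


def S(x):
    """S x y z = x z (y z), applied one argument at a time."""
    return lambda y: lambda z: x(z)(y(z))


# Two different rules built from S and K ...
I1 = S(K)(K)  # S K K z = K z (K z) = z
I2 = S(K)(S)  # S K S z = K z (S z) = z

# ... with the same input-output behaviour on the inputs we try.
print(I1(42), I2("lambda"))                                    # 42 lambda
print([I1(n) for n in range(5)] == [I2(n) for n in range(5)])  # True
```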
Alexander Gietelink Oldenziel, 5mo
There is a third important aspect of functions-in-the-original-sense that distinguishes them from extensional functions (i.e., collections of input-output pairs): effects. Describing these 'intensional' features is an active area of research in theoretical CS. One important thread here is game semantics; you might like to take a look.

Complex analysis is the study of functions of a complex variable, i.e., functions f(z) where z and f(z) lie in ℂ. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.

--Pugh, Real Mathematical Analysis (p. 28)

Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.

This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible t...

Stress and time-to-burnout are resources to be juggled, like any other.

Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?

Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.

Reflexively check both sides of the proposed probability of an event:

"What do I think about P(DOOM) = 81%?"


"What do I think about P(~DOOM) = 19%?"

This can often elicit feedback from parts of you that would stay silent if you only considered one way of stating the probability in question.

I've noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: "If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?"

The thing is, we live in a world with looming powerful AI. It's at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we're not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist... so something's up here. I think some pa...

Fancy epistemic tools won't override the basics of good epistemics:

You are embedded in a 3D spatial world, progressing in a time dimension. You want to get better at predicting events in advance, so you want to find the underlying generator for this 3D world's events. This means that you're rooting around in math space, trying to find the mathematical object that your observational trajectory is embedded in.

Some observations of yours are differentially more likely in some math objects than in others, and so it's more likely that your world is the former ma...
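A toy version of this update, assuming the candidate "math objects" are just three hypothetical coin biases with a uniform prior: each observation reweights every candidate by how likely that observation was under it.

```python
from typing import Dict, Sequence


def update(prior: Dict[float, float], observations: Sequence[int]) -> Dict[float, float]:
    """Posterior over hypothetical coin biases after 0/1 observations (1 = heads)."""
    unnormalized = {}
    for bias, p in prior.items():
        likelihood = 1.0
        for obs in observations:
            likelihood *= bias if obs == 1 else (1 - bias)
        unnormalized[bias] = p * likelihood
    total = sum(unnormalized.values())
    return {bias: w / total for bias, w in unnormalized.items()}


prior = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}
post = update(prior, [1, 1, 1, 0, 1])
print(max(post, key=post.get))  # 0.8
```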

Try pinging yourself:

What's overdetermined by what you already know?

Minor spoilers for planecrash (Book 3.1).

"Does the distinction between understanding and improving correspond to the distinction between the Law of Probability and the Law of Utility?  It sounds like it should."

"Sensible question, but no, not exactly.  Probability is something like a separable core that lies at the heart of Probable Utility.  The process of updating our beliefs, once we have the evidence, is something that in principle doesn't depend at all on what we want - the way reality is is something defined independently of anyth

...
David Udell, 10mo
Minor spoilers for planecrash (Book 3.1). KELTHAM EXPLAINS MODEL ERROR
Here, Eliezer seems to be talking about more specified versions of a not-fully specified hypothesis (case 1): Here, Eliezer seems to be talking about hypotheses that aren't subhypotheses of an existing hypothesis (case 2): Eliezer's approach is: For subhypotheses (case 1), we aren't actually considering these further features yet, so this seems true but not in a particularly exciting way. I think it is rare for a hypothesis to truly lie outside of all existing hypotheses, because you can have very underspecified meta-hypotheses that you will implicitly be taking into account even if you don't enumerate them.  (examples of vague meta-hypotheses: supernatural vs natural, realism vs. solipsism, etc). And of course there are varying levels of vagueness from very narrow to very broad. But, OK, within these vague meta-hypotheses the true hypothesis is still often not a subhypothesis of any of your more specified hypotheses (case 2). A number for the probability of this happening might be hard to pin down, and in order to actually obtain instrumental value from this probability assignment, or to make a Bayesian adjustment of it, you need a prior for what happens in the world where all your specific hypotheses are false.  But, you actually do have such priors and relevant information as to the probability! Eliezer mentions: This is relevant data. Note also that the expectation that all of your hypotheses will score lower than promised if they are all false is, in itself, a prior on the predictions of the 'all-other-hypotheses' hypothesis. Likewise, when you do the adjustments mentioned in Eliezer's last paragraph, you will do some specific amount of adjustment, and that specific adjustment amount will depend on an implicit value for the probability of the 'all-other-hypotheses' hypothesis and an implicit prior on its predictions.  In my view, there is no reason in principle that these priors and probabilities cannot be quantified. To be sure, people don't usually
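The claim that the "all-other-hypotheses" hypothesis can be given a probability and a predictive distribution can be sketched like this. The two specific hypotheses, their predictions, the 10% prior on the catch-all, and the choice of a uniform predictive distribution for it are all illustrative assumptions.

```python
from typing import Dict, Sequence

Predictive = Dict[str, float]


def posterior(prior: Dict[str, float], predictions: Dict[str, Predictive], data: Sequence[str]) -> Dict[str, float]:
    """Bayesian update over named hypotheses, each with its own predictive distribution."""
    weights = {}
    for name, p in prior.items():
        w = p
        for outcome in data:
            w *= predictions[name][outcome]
        weights[name] = w
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


predictions = {
    "h1": {"a": 0.8, "b": 0.15, "c": 0.05},
    "h2": {"a": 0.1, "b": 0.85, "c": 0.05},
    # Catch-all: assumed to spread its predictions evenly.
    "other": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
}
prior = {"h1": 0.45, "h2": 0.45, "other": 0.1}

surprising = posterior(prior, predictions, ["c", "c", "c"])  # both specific hypotheses fail
expected = posterior(prior, predictions, ["a", "a", "b"])    # h1 predicts this well
print(round(surprising["other"], 2))  # 0.97
print(round(expected["other"], 2))    # 0.07
```

When every specific hypothesis scores lower than promised, the catch-all absorbs the probability mass; when one of them predicts well, the catch-all shrinks below its prior.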

Minor spoilers for planecrash (Book 1) and the dath-ilani-verse generally.

When people write novels about aliens attacking dath ilan and trying to kill all humans everywhere, the most common rationale for why they'd do that is that they want our resources and don't otherwise care who's using them, but, if you want the aliens to have a sympathetic reason, the most common reason is that they're worried a human might break an oath again at some point, or spawn the kind of society that betrays the alien hypercivilization in the future.

--Eliezer, planecrash

David Udell, 1y
Minor spoilers for planecrash (Book 3).

What is rationalism about?

Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview -- this is not what rationality is about. Rationality is about seeing things happen in the real world and then updating your understanding of the world when those things you see surprise you so that they wouldn't surprise you again.

Why care about predicting things in the world well?

Almost no matter what you ultimately care about, being able to predict ahead of time what's going to happen next will make you better at planning for your goal.

David Udell, 1y
One central rationalist insight is that thoughts are for guiding actions. Think of your thinking as the connecting tissue sandwiched between the sense data that enters your sense organs and the behaviors your body returns. Your brain is a function from a long sequence of observations (all the sensory inputs you've ever received, in the order you received them) to your next motor output. Understood this way, the point of having a brain and having thoughts is to guide your actions. If your thoughts aren't all ultimately helping you better steer the universe (by your own lights) … they're wastes. Thoughts aren't meant to be causally-closed-off eddies that whirl around in the brain without ever decisively leaving it as actions. They're meant to transform observations into behaviors! This is the whole point of thinking! Notice when your thoughts are just stewing, without going anywhere, without developing into thoughts that'll go somewhere … and let go of those useless thoughts. Your thoughts should cut.
David Udell, 1y
If you can imagine a potential worry, then you can generate that worry. Rationalism is, in part, the skill of never being predictably surprised by things you already foresaw. It may be that you need to "wear another hat" in order to pull that worry out of your brain, or to model another person advising you to get your thoughts to flow that way, but whatever your process, anything you can generate for yourself is something you can foresee and consider. This aspect of rationalism is the art of "mining out your future cognition," to exactly the extent that you can foresee it, leaving whatever's left over a mystery to be updated on new observations.
David Udell, 1y
Minor spoilers for mad investor chaos and the woman of asmodeus (planecrash Book 1). The citation link in this post takes you to an NSFW subthread in the story.

Gebron and Eleazar define kabbalah as “hidden unity made manifest through patterns of symbols”, and this certainly fits the bill. There is a hidden unity between the structures of natural history, human history, American history, Biblical history, etc: at an important transition point in each, the symbols MSS make an appearance and lead to the imposition of new laws. Anyone who dismisses this as coincidence will soon find the coincidences adding up to an implausible level.

The kabbalistic perspective is that nothing is a coincidence. We believe that the uni

...
David Udell, 12d
The ML models that now speak English, and are rapidly growing in world-transformative capability, happen to be called transformers. This is not a coincidence because nothing is a coincidence.

An implication of AI risk is that we, right now, stand at the fulcrum of human history.

Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn't enough to nudge me off of the fulcrum-of-history view.

If you buy that, it's our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.

For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.

That seems an odd enough conclusion to make it quite distinct from most other people in human history.

If the conclusion were "go over to those people, hit them on the head with a big rock, and take their women & children as slaves" or "acquire a lot of power", I'd be way more careful.

There exist both merely clever and effectively smarter people.

Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don't seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they've encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut r...

In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent's precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash's shadow favorably influences the way events actually unfold.

A deceptively ali...
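The precommitment logic can be sketched with a hypothetical payoff table (the numbers are assumptions; only their ordering matters): the quicker driver evaluates each possible commitment by the response it will induce in the slower driver, then commits to the best one.

```python
from typing import Dict, Tuple

ACTIONS = ("swerve", "straight")

# Hypothetical payoffs (first driver, second driver); a crash is worst for both.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}


def second_best_response(first_action: str) -> str:
    """The slower driver sees the first driver's credible commitment and best-responds."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(first_action, a)][1])


def best_commitment() -> str:
    """The quicker driver picks the commitment whose induced response pays it most."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, second_best_response(a))][0])


first = best_commitment()
second = second_best_response(first)
print(first, second, PAYOFFS[(first, second)])  # straight swerve (1, -1)
```

The crash outcome never occurs; it only shapes the slower driver's choice through the commitment.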

...unless the other agent has already precommitted to not being rational. (What is the advantage of this over just precommitting not to swerve? Precommitting to not be rational can happen even in advance of the game, as it's mainly a property of the agent itself.) (This is one way that you can rationally arrive at irrational agents.)
David Udell, 1y
I don't yet know too much about this, but I've heard that updateless decision theories are equivalent to conventional, updateful decision theories (e.g., EDT and CDT) once those theories have made every precommitment they'd want to make. The pattern I was getting at above seems a bit like this: it instrumentally makes sense to commit ahead of time to a policy that maps every possible series of observations to an action and then stick to it, instead of just outputting the locally best action in each situation you stumble into.
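A sketch of this policy-selection pattern, using the counterfactual mugging thought experiment from the decision-theory literature (not from this thread) with assumed payoffs: a fair coin, a $100 request on tails, and a $10,000 reward on heads iff your policy would pay on tails. Choosing the whole observation-to-action map ex ante picks "pay"; choosing the locally best action after seeing tails picks "refuse".

```python
from itertools import product
from typing import Dict, List

OBSERVATIONS = ("heads", "tails")
ACTIONS = ("pay", "refuse")
Policy = Dict[str, str]


def payoff(policy: Policy, coin: str) -> float:
    """Hypothetical counterfactual-mugging payoffs for a fair coin."""
    pays_on_tails = policy["tails"] == "pay"
    if coin == "tails":
        return -100.0 if pays_on_tails else 0.0
    # On heads, the predictor rewards you iff your policy would pay on tails;
    # the action chosen on heads itself has no effect.
    return 10_000.0 if pays_on_tails else 0.0


def expected_value(policy: Policy) -> float:
    return 0.5 * payoff(policy, "heads") + 0.5 * payoff(policy, "tails")


def all_policies() -> List[Policy]:
    """Every map from observations to actions."""
    return [dict(zip(OBSERVATIONS, acts)) for acts in product(ACTIONS, repeat=len(OBSERVATIONS))]


def best_policy() -> Policy:
    """Commit ex ante to the policy with the highest expected value."""
    return max(all_policies(), key=expected_value)


def locally_best_action_after_tails() -> str:
    """Updateful choice: having seen tails, only the tails payoff seems to matter."""
    return max(ACTIONS, key=lambda a: payoff({"heads": a, "tails": a}, "tails"))


policy = best_policy()
print(policy["tails"], expected_value(policy))  # pay 4950.0
print(locally_best_action_after_tails())        # refuse
```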

In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom's motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom's motion that the dimen

...

You can think of chain-of-thought interpretability as the combination of process-based methods with adversarial training.

When you supervised-train an ML model on an i.i.d. dataset that doesn't contain any agent modeling problems, you never strongly incentivize the emergence of mesa-optimizers. You do weakly incentivize the emergence of mesa-optimizers, because mesa-optimizers are generally capable algorithms that might outperform brittle bundles of rote heuristics on many simple tasks.

When you train a model in a path-dependent setting, you do strongly incentivize mesa-optimization. This is because algorithms trained in a path-dependent setting have the opportunity to defend ...

Unreasonably effective rationality-improving technique:

Spend an hour and a half refactoring your standing political views, by temporarily rolling those political views back to a childhood state from before your first encounter with highly communicable and adaptive memeplexes. Query your then-values, and reason instrumentally from the values you introspect. Finally, take or leave the new views you generate.

If your current political views are well supported, then they should regenerate under this procedure. But if you've mostly been recycling cached thoughts...

My memories of childhood aren't that precise. I don't really know what my childhood state was? Before certain extremely negative things happened to my psyche, that is. There are only a few scattered pieces I recall, like self-sufficiency and  honesty being important, but these are the parts that already survived into my present political and moral beliefs. The only thing I could actually use is that I was a much more orderly person when I was 4 or 5, but I don't see how it would work to use just that.

The unlovely neologism "agenty" means strategic.

"Agenty" might carry less connotational baggage in exchange for its unsightliness, however. Just like "rational" is understood by a lot of people to  mean, in part, stoical, "strategic" might mean manipulative to a lot of people.

"Thanks for doing your part for humanity!"

"But we're not here to do software engineering -- we're here to save the world."

Because of deception, we don't know how to put a given utility function into a smart agent that has grokked the overall picture of its training environment. Once training finds a smart-enough agent, the model's utility function ceases to be malleable to us. This suggests that powerful greedy search will find agents with essentially random utility functions.

But, evolution managed to push human values in the rough direction of its own values: inclusive genetic fitness. We don't care about maximizing inclusive genetic fitness, but we do care about having sex...

The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it'd be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory's wrong and the result is catastrophe?

Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it i...

A semantic externalist once said,
"Meaning just ain't in the head.
Hence a brain-in-a-vat
Just couldn't think that
'Might it all be illusion instead?'"

I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.

But, milling about the Lightcone offices, fully half of the people I've encountered hold some kind of philosophy degree.  "LessWrong: the best philosophy site on the internet."

Some mantras I recall a lot, to help keep on the rationalist straight-and-narrow and not let anxiety get the better of me:

  1. What's more likely to do you in?
  2. Don't let the perfect be the enemy of the good.
  3. Equanimity in the face of small threats to brain and body health buys you peace of mind, with which to better prepare for serious threats to brain and body health.
  4. How have situations like this played out in the past?

Humans, "teetering bulbs of dream and dread," evolved as a generally intelligent patina around the Earth.  We're all the general intelligence the planet has to throw around.  What fraction of that generally intelligent skin is dedicated to defusing looming existential risks?  What fraction is dedicated towards immanentizing the eschaton?

