The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?
Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structure: large parts of it are not interconnected at all. For example, the association of historical events with a time sequence is not automatic; the writer has had the experience of seeing a child, who knew about ancient Egypt and had studied pictures of the treasures from the tomb of Tutankhamen, nevertheless coming home from school with a puzzled expression and asking: ‘Was Abraham Lincoln the first person?’
It had been explained to him that the Egyptian artifacts were over 3000 years old, and that Abraham Lincoln was alive 120 years ago; but the meaning of those statements had not registered in his mind. This makes us wonder whether
Keltham was supposed to start by telling them all to use their presumably-Civilization-trained skill of 'perspective-taking-of-ignorance' to envision a hypothetical world where nothing resembling Coordination had started to happen yet. Since, after all, you wouldn't want your thoughts about the best possible forms of Civilization to 'cognitively-anchor' on what already existed.
You can imagine starting in a world where all the same stuff and technology from present Civilization exists, since the question faced is what form of Governance is best-suited to a world like that one. Alternatively, imagine an alternative form of the exercise involving people fresh-born into a fresh world where nothing has yet been built, and everybody's just wandering around over a grassy plain.
Either way, you should assume that everybody knows all about decision theory and cooperation-defection dilemmas. The question being asked is not 'What form of Governance would we invent if we were stupid?'
Civilization could then begin - maybe it wouldn't actually happen exactly that way, but it is nonetheless said as though in stori
A decent handle for rationalism is 'apolitical consequentialism.'
'Apolitical' here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. 'Consequentialism' means getting more of what you want, whatever that is.
I think having answers for political questions is compatible with, and required by, rationalism. Instead of 'apolitical' consequentialism I would advise any of the following, which mean approximately the same things as each other:
• politically subficial consequentialism (as opposed to politically superficial
consequentialism; instead of judging things on whether they appear to be in line
with a political faction, which is superficial, rationalists aspire to have
deeper and more justified standards for solving political questions)
• politically impartial consequentialism
• politically meritocratic consequentialism
• politically individuated consequentialism
• politically open-minded consequentialism
• politically human consequentialism (politics which aim to be good by the
metric of human values, shared as much as possible by everyone, regardless of
politics)
• politically omniscient consequentialism (politics which aim to be good by the
metric of values that humans would have if they had full, maximally
objection-solved information on every topic, especially topics of practical
philosophy)
David Udell (1y):
I agree that rationalism involves the (advanced rationalist)
[https://www.lesswrong.com/s/3ELrPerFTSo75WnrH/p/9weLK2AJ9JEt2Tt8f#:~:text=If%20you%20want,on%20an%20issue.]
skills of instrumentally routing through relevant political challenges to
accomplish your goals … but I'm not sure any of those proposed labels captures
that well.
I like "apolitical" because it unequivocally states that you're not trying to
slogan-monger for a political tribe, and are naively, completely, loudly, and
explicitly opting out of that status competition and not secretly fighting for
the semantic high-ground in some underhanded way (which is more typical
political behavior, and is thus expected). "Meritocratic," "humanist,"
"humanitarian," and maybe "open-minded" are all shot for that purpose, as
they've been abused by political tribes in the ongoing culture war (and in
previous culture wars, too; our era probably isn't too special in this regard)
and connote allegiance to some political tribes over others.
What I really want is an adjective that says "I'm completely tapping out of that
game."
lc (1y):
The problem is that whenever well meaning people come up with such an adjective,
the people who are, in fact, not "completely tapping out of that game" quickly
begin to abuse it until it loses meaning.
Generally speaking, tribalized people have an incentive to appear as
unaffiliated as possible. Being seen as a rational, neutral observer lends your
perspective more credibility.
Rana Dexsin (1y):
“apolitical” has indeed been turned into a slur around “you're just trying to
hide that you hate change” or “you're just trying to hide the evil influences on
you” (or something else vaguely like those) in a number of places.
"Suppose everybody in a dath ilani city woke up one day with the knowledge mysteriously inserted into their heads, that their city had a pharaoh who was entitled to order random women off the street into his - cuddling chambers? - whether they liked that or not. Suppose that they had the false sense that things had always been like this for decades. It wouldn't even take until whenever the pharaoh first ordered a woman, for her to go "Wait why am I obeying this order when I'd rather not obey it?" Somebody would be thinking about city politics first thing when they woke up in the morning and they'd go "Wait why we do we have a pharaoh in the first place" and within an hour, not only would they not have a pharaoh, they'd have deduced the existence of the memory modification because their previous history would have made no sense, and then the problem would escalate to Exception Handling and half the Keepers on the planet would arrive to figure out what kind of alien invasion was going on. Is the source of my confusion - at all clear here?"
I don't get the relevance of the scenario.
Is the idea that there might be many such other rooms with people like me, and
that I want to coordinate with them (to what end?) using the Schelling points in
the night sky?
I might identify Schelling points using what celestial objects seem to jump out
to me on first glance, and see which door of the two that suggests -- reasoning
that others will reason similarly. I don't get what we'd be coordinating to do
here, though.
We've all met people who are acting as if "Acquire Money" is a terminal goal, never noticing that money is almost entirely instrumental in nature. When you ask them "but what would you do if money was no issue and you had a lot of time", all you get is a blank stare.
Even the LessWrong Wiki entry on terminal values describes a college student for whom university is instrumental, and getting a job is terminal. This seems like a clear-cut case of a Lost Purpose: a job seems clearly instrumental. And yet, we've all met people who act as if "Have a Job" is a terminal value, and who then seem aimless and undirected after finding employment …
You can argue that Acquire Money and Have a Job aren't "really" terminal goals, to which I counter that many people don't know their ass from their elbow when it comes to their own goals.
Why does politics strike rationalists as so strangely shaped? Why does rationalism come across as aggressively apolitical to smart non-rationalists?
Part of the answer: Politics is absolutely rife with people mixing their ends with their means and vice versa. It's pants-on-head confused, from a rationalist perspective, to be ul... (read more)
I often wonder if this framing (with which I mostly agree) is an example of
typical mind fallacy. The assumption that many humans are capable of
distinguishing terminal from instrumental goals, or in having terminal goals
more abstract than "comfort and procreation", is not all that supported by
evidence.
In other words, politicized debates DO rub you the wrong way, but on two
dimensions - first, that you're losing, because you're approaching them from a
different motive than your opponents. And second that it reveals not just a
misalignment with fellow humans in terminal goals, but an alien-ness in the type
of terminal goals you find reasonable.
Yudkowsky has sometimes used the phrase "genre savvy" to mean "knowing all the tropes of reality."
For example, we live in a world where academia falls victim to publishing incentives/Goodharting, and so academic journals fall short of what people with different incentives would be capable of producing. You'd be failing to be genre savvy if you expected that when a serious problem like AGI alignment rolled around, academia would suddenly get its act together with a relatively small amount of prodding/effort. Genre savvy actors in our world know what academia is like, and predict that academia will continue to do its thing in the future as well.
Genre savviness is the same kind of thing as hard-to-communicate-but-empirically-validated expert intuitions. When domain experts have some feel for what projects might pan out and what projects certainly won't but struggle to explain their reasoning in depth, the most they might be able to do is claim that that project is just incompatible with the tropes of their corner of reality, and point to some other cases.
There's a rationality-improving internal ping I use on myself, which goes, "what do I expect to actually happen, for real?"
This ping moves my brain from a mode where it's playing with ideas in a way detached from the inferred genre of reality, over to a mode where I'm actually confident enough to bet about some outcomes. The latter mode leans heavily on my priors about reality, and, unlike the former mode, looks askance at significantly considering long, conjunctive, tenuous possible worlds.
I've noticed that people are really innately good at sentiment classification,
[https://en.wikipedia.org/wiki/Sentiment_analysis] and, by comparison, crap at
natural language inference. [https://en.wikipedia.org/wiki/Textual_entailment]
In a typical conversation with ordinary educated people, people will do a lot of
the former relative to the latter.
My theory of this is that, with sentiment classification and generation, we're
usually talking in order to credibly signal and countersignal our competence,
virtuous features, and/or group membership, and that humanity has been fine
tuned to succeed at this social maneuvering task. At this point, it comes
naturally. Success at the object-level-reasoning task was less crucial for
individuals in the ancestral environment, and so people, typically, aren't
naturally expert at it. What a bad situation to be in, when our species'
survival hinges on our competence at object-level reasoning.
Having been there twice, I've decided that the Lightcone offices are my favorite place in the world. They're certainly the most rationalist-shaped space I've ever been in.
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n... (read more)
Nex and Geb had each INT 30 by the end of their mutual war. They didn't solve the puzzle of Azlant's IOUN stones... partially because they did not find and prioritize enough diamonds to also gain Wisdom 27. And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion's spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly me
Epistemic status: politics, known mindkiller; not very serious or considered.
People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.
In the US, the 1st Amendment legall... (read more)
When the sanity waterline is so low, it's easy to develop a potent sense of misanthropy.
Bryan Caplan's writing about many people hating stupid people really affected me on this point. Don't hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo's comparative advantage theorem. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
It's really easy to spend a lot of cognitive cycles churning through bad, misleading ideas generated by the hopelessly confused. Don't do that!
The argument that being more knowledgeable leaves you strictly better off than being ignorant relies on you simply ignoring bad ideas when you spend your cognitive cycles searching for improvements on your working plans. Sometimes, you'll need to actually exercise this "simply ignore it" skill. You'll end up needing to do so more and more, to approach bounded instrumental rationality, the more inadequate civilization around you is and the lower its sanity waterline.
I hereby confer on you, reader, the shroud of epistemic shielding from
predictably misleading statements. It confers irrevocable, invokable protection
from having to think about predictably confused claims ever again.
Take those cognitive cycles saved, and spend them well!
You sometimes misspeak... and you sometimes misthink. That is, just as your mouth sometimes slips on a word, your cognitive algorithm sometimes slips on a thought, and the thought that seemed so unimpeachably obvious in your head... is nevertheless false on a second glance.
Your brain is a messy probabilistic system, so you shouldn't expect its cognitive state to ever perfectly track the state of a distant entity.
I find this funny. I don't know about your brain, but mine sometimes produces
something closely resembling noise similar to dreams (admittedly more often in
the morning when sleep deprived).
David Udell (9mo):
Note that a "distant entity" can be a computation that took place in a different
part of your brain! Your thoughts therefore can't perfectly track other thoughts
elsewhere in your head -- your whole brain is, after all, noisy, and so will
sometimes distort the information being passed around inside itself.
Policy experiments I might care about if we weren't all due to die in 7 years:
Prediction markets generally, but especially policy prediction markets at the corporate- and U.S. state- levels. The goal would be to try this route to raising the sanity waterline in the political domain (and elsewhere) by incentivizing everyone's becoming more of a policy wonk and less of a tribalist.
Open borders experiments of various kinds in various U.S. states, precluding roads to citizenship or state benefits for migrant workers, and leaving open the possibility of mass de
A shard is a contextually activated behavior-steering computation. Think of it as a circuit of neurons in your brain that is reinforced by the subcortex, gaining more staying power when positively reinforced and withering away in the face of negative reinforcement. In fact, whatever modulates shard strength in this way is reinforcement/reward. Shards are born when a computation that is currently steering steers into some reinforcement. So shards can only accrete around the concepts currently in a system's world model (presumably, the world model is shared ... (read more)
I'm pretty skeptical that sophisticated game theory happens between shards in
the brain, and also that coalitions between shards are how value preservation in
an AI will happen (rather than there being a single consequentialist shard, or
many shards that merge into a consequentialist, or something I haven't thought
of).
To the extent that shard theory makes such claims, they seem to be interesting
testable predictions.
A multiagent Extrapolated Volitionist institution is something that computes and
optimizes for a Convergent Extrapolated Volition, if a CEV exists.
Really, though, the above Extrapolated Volitionist institutions do take other
people into consideration. They either give everyone the Schelling weight of one
vote in a moral parliament, or they take into consideration the epistemic
credibility of other bettors as evinced by their staked wealth, or other things
like that.
Sometimes the relevant interpersonal parameters can be varied, and the
institutional designs don't weigh in on that question. The ideological emphasis
is squarely on individual considered preferences -- that is the core insight of
the outlook. "Have everyone get strictly better outcomes by their lights,
probably in ways that surprise them but would be endorsed by them after
reflection and/or study."
The case most often cited as an example of a nondifferentiable function is derived from a sequence fn(x), each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length 1/n. As n→∞, the triangles shrink to zero size. For any finite n, the slope of fn(x) is ±1 almost everywhere. Then what happens as n→∞? The limit f∞(x) is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivativ
Only make choices that you would not make in reverse, if things were the other way around. Drop out of school if and only if you wouldn't enroll in school from out of the workforce. Continue school if and only if you'd switch over from work to that level of schooling.
Flitting back and forth between both possible worlds can make you less cagey about doing what's overdetermined by your world model + utility function already. It's also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.
It's probably a useful mental technique to consider from both directions, but
also consider that choices that appear symmetric at first glance may not
actually be symmetric. There are often significant transition costs that may
differ in each direction, as well as path dependencies that are not immediately
obvious.
As such, I completely disagree with the first paragraph of the post, but agree
with the general principle of considering such decisions from both directions
and thank you for posting it.
Science fiction books have to tell interesting stories, and interesting stories are about humans or human-like entities. We can enjoy stories about aliens or robots as long as those aliens and robots are still approximately human-sized, human-shaped, human-intelligence, and doing human-type things. A Star Wars in which all of the X-Wings were combat drones wouldn’t have done anything for us. So when I accuse something of being science-fiction-ish, I mean bending over backwards – and ignoring the evidence – in order to give basically human-shaped beings a c
Keltham will now, striding back and forth and rather widely gesturing, hold forth upon the central principle of all dath ilani project management, the ability to identify who is responsible for something. If there is not one person responsible for something, it means nobody is responsible for it. This is the proverb of dath ilani management. Are t
Thanks for posting this extract. I find the glowfic format a bit wearing to
read, for some reason, and it is these nuggets that I read Planecrash for, when
I do. (Although I had no such problem with HPMOR, which I read avidly all the
way through.)
What would it mean for a society to have real intellectual integrity? For one, people would be expected to follow their stated beliefs to wherever they led. Unprincipled exceptions and an inability or unwillingness to correlate beliefs among different domains would be subject to social sanction. Valid attempts to persuade would be expected to be based on solid argumentation, meaning that what passes for typical salesmanship nowadays would be considered a grave affront. Probably something along the lines of punching someone
Cf. "there are no atheists in a foxhole." Under stress, it's easy to slip
sideways into a world model where things are going better, where you don't have
to confront quite so many large looming problems. This is a completely natural
human response to facing down difficult situations, especially when brooding
over those situations over long periods of time. Similar sideways tugs can come
from (overlapping categories) social incentives to endorse a sacred belief of
some kind, or to not blaspheme, or to affirm the ingroup attire
[https://www.lesswrong.com/posts/nYkMLFpx77Rz3uo9c/belief-as-attire] when life
leaves you surrounded by a particular ingroup, or to believe what makes you or
people like you look good/high status.
[https://www.libertarianism.org/publications/essays/why-do-intellectuals-oppose-capitalism]
Epistemic dignity is about seeing "slipping sideways" as beneath you. Living in
reality is instrumentally beneficial, period. There's no good reason to ever
allow yourself to not live in reality. Once you can see something, even dimly,
there's absolutely no sense in hiding from that observation's implications.
Those subtle mental motions by which we disappear observations we know that we
won't like down the memory hole … epistemic dignity is about coming to always
and everywhere violently reject these hidings-from-yourself, as a matter of
principle. We don't actually have a choice in the matter -- there's no free
parameter of intellectual virtue here, that you can form a subjective opinion
on. That slipping sideways is undignified is written in the
[https://www.lesswrong.com/posts/QrhAeKBkm2WsdRYao/searching-for-bayes-structure]
very mathematics of inference itself.
[http://zackmdavis.net/blog/2016/08/the-fundamental-theorem-of-epistemology/]
David Udell (1y):
Minor spoilers for mad investor chaos and the woman of asmodeus (planecrash Book
1). [https://www.glowfic.com/posts/4582]
You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.
If you allow the assumption that your mental model of what was said matches what
was said, then you don't necessarily need to read all the way through to
authoritatively say that the work never mentions something, merely enough that
you have confidence in your model.
If you don't allow the assumption that your mental model of what was said
matches what was said, then reading all the way through is insufficient to
authoritatively say that the work never mentions something.
(There is a third option here: that your mental model suddenly becomes much
better when you finish reading the last word of an argument.)
Past historical experience and brainstorming about human social orders probably barely scratches the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.
One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.
If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can work to get that person to think in terms of Bayesian epistemology and not decision theory if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increas... (read more)
I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).
Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.
When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
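A quick numerical sketch of that point, with made-up numbers: if an equally strong writeup was about as likely to turn up whether or not the claim is true, the likelihood ratio is ~1 and the posterior doesn't move.

```python
# Toy Bayes update on the observation "a persuasive article for the claim exists."
prior = 0.5
p_article_given_claim = 0.3      # chance such an article gets written if the claim is true
p_article_given_negation = 0.3   # "equal argumentative caliber" available either way

posterior = (p_article_given_claim * prior) / (
    p_article_given_claim * prior + p_article_given_negation * (1 - prior)
)
print(posterior)  # 0.5 -- observing the article shifts nothing
```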
The explicit definition of an ordered pair ((a,b)={{a},{a,b}}) is frequently relegated to pathological set theory...
It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irreleva
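For concreteness, a minimal sketch in Python (frozensets standing in for sets, names mine) checking, over a small range, the one property the encoding is required to have:

```python
from itertools import product

def kpair(a, b):
    # Kuratowski encoding of an ordered pair: (a, b) = {{a}, {a, b}}
    return frozenset({frozenset({a}), frozenset({a, b})})

# The only "relevant property" in the quote's sense: two encoded pairs are equal
# exactly when their first coordinates match and their second coordinates match.
for a, b, c, d in product(range(4), repeat=4):
    assert (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
```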
Modern type theory mostly solves this blemish of set theory and is highly
economical conceptually to boot. Most of the adherence to set theory is historical
inertia - though some aspects of coding & presentation are important. Future
foundations will improve our understanding of this latter topic.
Now, whatever T may assert, the fact that T can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, T could certainly be deduced from them!
This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s
The text is slightly in error. It is straightforward to construct a program that
is guaranteed to locate an inconsistency if one exists: just have it generate
all theorems and stop when it finds an inconsistency. The problem is that it
doesn't ever stop if there isn't an inconsistency.
This is the difference between decidability and semi-decidability. All the
systems covered by Gödel's completeness and incompleteness theorems are
semi-decidable, but not all are decidable.
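A schematic sketch of the procedure the comment describes; derive_theorems and is_contradiction are hypothetical placeholders for a real proof-enumerator and contradiction-checker, not library functions.

```python
def semi_decide_inconsistency(axioms, derive_theorems, is_contradiction):
    """Halt (returning a witness) iff the axioms are inconsistent.

    derive_theorems(axioms) is assumed to lazily yield every theorem of the
    system in some exhaustive order; is_contradiction spots an explicit
    contradiction. If the system is inconsistent, some contradiction appears
    after finitely many theorems and we return it. If it is consistent, the
    loop runs forever -- semi-decidability, not decidability.
    """
    for theorem in derive_theorems(axioms):
        if is_contradiction(theorem):
            return theorem
```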
What're the odds that we're anywhere close to optimal in any theoretical domain? Where are our current models basically completed, boundedly optimal representations of some part of the universe?
The arguments for theoretical completion are stronger for some domains than others, but in general the odds that we have the best model in any domain are pretty poor, and are outright abysmal in the mindkilling domains.
Is the concept of "duty" the fuzzy shadow cast by the simple mathematical structure of 'corrigibility'?
It's only modestly difficult to train biological general intelligences to defer to even potentially dumber agents. We call these deferential agents "dutybound" -- the sergeants who carry out the lieutenant's direct orders, even when they think they know better; the bureaucrats who never take local opportunities to get rich at the expense of their bureau, even when their higher-ups won't notice; the employees who work hard in the absence of effective overs... (read more)
I note that Eliezer thinks that corrigibility is one
currently-impossible-to-instill-in-an-AGI property that humans actually have.
The sum total of human psychology... consists of many such impossible-to-instill
properties.
This is why we should want to accomplish one impossible thing, as our stopgap
solution, rather than aiming for all the impossible things at the same time, on
our first try at aligning the AGI.
Vladimir_Nesov (1y):
It seems like corrigibility can't be usefully described as acting according to
some terminal goal. But AIs are not by default expected utility maximizers in
the ontology of the real world, so it could be possible to get them to do the
desired thing despite lacking a sensible formal picture of it.
I'm guessing some aspects of corrigibility might be about acting according to a
whole space of goals (at the same time), which is easier to usefully describe.
Some quantilizer-like thing selected to more natural desiderata, acting in a
particular way in accordance with a collection of goals. With the space of goals
not necessarily thought of as uncertainty about an unknown goal.
This is not about being dumb, it's about not actually engaging in planning.
Failing in this does require some level of non-dumbness, but not conversely.
Unless spontaneous mesa-optimizers crop up all over the place -- the cognitive
cancer -- which probably takes capability many orders of magnitude above merely
not being dumb. So, for a start, train the models, not the agent.
So! On a few moments' 'first-reflection', it seems to Keltham that estimating the probability of Civilization being run by a Dark Conspiracy boils down to (1) the question of whether Civilization's apparently huge efforts to build anti-Dark-Conspiracy citizens constitute sincere work that makes the Dark Conspiracy's life harder, or fake work designed to only look like that; and (2) the prior probability that the Keepers and Governance would have arrived on the scene already corrupted, during the last major reorg
Nonconformity is something trained in dath ilan and we could not be Law-shaped without that. If you're conforming to what you were taught, to what other people seem to believe, to what other people seem to want you to believe, to what you think everyone believes, you're not conforming to the Law.
A great symbolic moment for the Enlightenment, and for its project of freeing humanity from needless terrors, occurred in 1752 in Philadelphia. During a thunderstorm, Benjamin Franklin flew a kite with a pointed wire at the end and succeeded in drawing electric sparks from a cloud. He thus proved that lightning was an electrical phenomenon and made possible the invention of the lightning-rod, which, mounted on a high building, diverted the lightning and drew it harmlessly to the ground by means of a wire. Humanity no longer needed to fear fire from heaven.
Building your own world model is hard work. It can be good intellectual fun, sometimes, but it's often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your ... (read more)
In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.
What sorts of AI designs could not be made to pursue a flipped utility function via perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only pursuing the error corrected utility function.
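A toy sketch of that redundancy idea (illustrative only; the "utility function" here is just a weight vector, and the error correction is an elementwise median over three stored copies):

```python
import numpy as np

# Keep several copies of the utility weights and act on their elementwise
# median, so a sign flip in any single copy cannot flip the pursued utility.
true_weights = np.array([1.0, -2.0, 0.5])
copies = np.tile(true_weights, (3, 1))

copies[1] *= -1  # one stored copy gets its sign flipped by a perturbation

effective_weights = np.median(copies, axis=0)
assert np.allclose(effective_weights, true_weights)  # the corrupted copy is outvoted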
Just a phrasing/terminology nitpick: I think this applies to agents with
externally-imposed utility functions. If an agent has a "natural" or
"intrinsic" utility function which it publishes explicitly (and does not accept
updates to that explicit form), I think the risk of bugs in representation does
not occur.
A huge range of utility functions should care about alignment! It's in the interest of just about everyone to survive AGI.
I'm going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We'll hammer out our value disagreements in our CEV, and in our future (should we save it).
There's a very serious chicken-and-egg problem when you talk about what a
utility function SHOULD include, as opposed to what it does. You need a place
OUTSIDE of the function to have preferences about what the function is.
If you just mean "I wish more humans shared my values on the topic of AGI
x-risk", that's perfectly reasonable, but trivial. That's about YOUR utility
function, and the frustration you feel at being an outlier.
David Udell (1y):
Ah, yeah, I didn't mean to say that others' utility functions should, by their
own lights, be modified to care about alignment. I meant that instrumentally,
their utility functions already value surviving AGI highly. I'd want to show
this to them to get them to care about alignment, even if they and I disagree
about a lot of other normative things.
If someone genuinely, reflectively doesn't care about surviving AGI … then the
above just doesn't apply to them, and I won't try to convince them of anything.
In their case, we just have fundamental, reflectively robust value-disagreement.
Ericf (1y):
I value not getting trampled by a hippo very highly too, but the likelihood that
I find myself near a hippo is low. And my ability to do anything about it is
also low.
One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it's given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone's goals would be best served by doing something socially unorthodox (like, e.g., signing up for cryonics or dropping out of a degree), they will usually rationalize that option away in order to stay on script. So for th... (read more)
Two moments of growing in mathematical maturity I remember vividly:
Realizing that equations are claims that are therefore either true or false. Everything asserted with symbols... could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word "is"!
Learning about the objects that mathematical claims are about. Going from having to look up "Wait, what's a real number again?" to knowing how Z, Q, and R interrelat
2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.
a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.
Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.
In the 1920s when λ and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts
There is a third important aspect of functions-in-the-original-sense that
distinguishes them from extensional functions (i.e. collection of input-output
pairs): effects.
Describing these 'intensional' features is an active area of research in
theoretical CS. One important thread here is game semantics; you might like to
take a look:
https://link.springer.com/chapter/10.1007/978-3-642-58622-4_1
[https://link.springer.com/chapter/10.1007/978-3-642-58622-4_1]
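A small illustration of the distinction (names are mine): two procedures that realize the same input-output pairs while differing as processes, one of them carrying a cost and an effect the pairs-view can't see.

```python
import time

# Extensionally, a function is just its graph: a set of input-output pairs.
squares_as_pairs = {n: n * n for n in range(5)}

# Intensionally, a function is an operation-process. These two processes have
# the same graph, but one also sleeps and prints -- features invisible to the
# extensional view.
def square(n):
    return n * n

def square_slowly_and_loudly(n):
    time.sleep(0.01)
    print("computing...")
    return n * n

assert all(square(n) == square_slowly_and_loudly(n) == squares_as_pairs[n]
           for n in range(5))
```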
Complex analysis is the study of functions of a complex variable, i.e., functions f(z) where z and f(z) lie in C. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.
Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.
This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible t... (read more)
Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?
Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.
I've noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: "If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?"
The thing is, we live in a world with looming powerful AI. It's at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we're not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist... so something's up here. I think some pa... (read more)
Fancy epistemic tools won't override the basics of good epistemics:
You are embedded in a 3D spatial world, progressing in a time dimension. You want to get better at predicting events in advance, so you want to find the underlying generator for this 3D world's events. This means that you're rooting around in math space, trying to find the mathematical object that your observational trajectory is embedded in.
Some observations of yours are differentially more likely in some math objects than in others, and so it's more likely that your world is the former ma... (read more)
"Does the distinction between understanding and improving correspond to the distinction between the Law of Probability and the Law of Utility? It sounds like it should."
"Sensible question, but no, not exactly. Probability is something like a separable core that lies at the heart of Probable Utility. The process of updating our beliefs, once we have the evidence, is something that in principle doesn't depend at all on what we want - the way reality is is something defined independently of anyth
Minor spoilers for planecrash (Book 3.1). [https://www.glowfic.com/posts/5785]
KELTHAM EXPLAINS MODEL ERROR
[https://www.glowfic.com/replies/1782692#reply-1782692]
simon (10mo):
Here, Eliezer seems to be talking about more specified versions of a not-fully
specified hypothesis (case 1):
Here, Eliezer seems to be talking about hypotheses that aren't subhypotheses of
an existing hypothesis (case 2):
Eliezer's approach is:
For subhypotheses (case 1), we aren't actually considering these further
features yet, so this seems true but not in a particularly exciting way.
I think it is rare for a hypothesis to truly lie outside of all existing
hypotheses, because you can have very underspecified meta-hypotheses that you
will implicitly be taking into account even if you don't enumerate them.
(examples of vague meta-hypotheses: supernatural vs natural, realism vs.
solipsism, etc). And of course there are varying levels of vagueness from very
narrow to very broad.
But, OK, within these vague meta-hypotheses the true hypothesis is still often
not a subhypothesis of any of your more specified hypotheses (case 2). A number
for the probability of this happening might be hard to pin down, and in order to
actually obtain instrumental value from this probability assignment, or to make
a Bayesian adjustment of it, you need a prior for what happens in the world
where all your specific hypotheses are false.
But, you actually do have such priors and relevant information as to the
probability!
Eliezer mentions:
This is relevant data. Note also that the expectation that all of your
hypotheses will score lower than promised if they are all false is, in itself, a
prior on the predictions of the 'all-other-hypotheses' hypothesis.
Likewise, when you do the adjustments mentioned in Eliezer's last paragraph, you
will do some specific amount of adjustment, and that specific adjustment amount
will depend on an implicit value for the probability of the
'all-other-hypotheses' hypothesis and an implicit prior on its predictions.
In my view, there is no reason in principle that these priors and probabilities
cannot be quantified.
To be sure, people don't usually
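One way to make the "this is quantifiable in principle" point concrete, with made-up numbers: give the 'all-other-hypotheses' catch-all an explicit prior and an explicit (rough) prediction, and Bayes-update it alongside the named hypotheses.

```python
# Illustrative numbers only.
priors = {"H1": 0.4, "H2": 0.3, "H3": 0.2, "all_other_hypotheses": 0.1}

# How likely each hypothesis says the observed data is; for the catch-all this
# is a rough prior over its predictions (e.g. "if all my named hypotheses are
# false, they'll tend to score worse than they promised").
likelihoods = {"H1": 0.02, "H2": 0.05, "H3": 0.01, "all_other_hypotheses": 0.20}

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}
print(posteriors)  # the catch-all gains mass when every named hypothesis fits poorly
```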
When people write novels about aliens attacking dath ilan and trying to kill all humans everywhere, the most common rationale for why they'd do that is that they want our resources and don't otherwise care who's using them, but, if you want the aliens to have a sympathetic reason, the most common reason is that they're worried a human might break an oath again at some point, or spawn the kind of society that betrays the alien hypercivilization in the future.
Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview -- this is not what rationality is about. Rationality is about seeing things happen in the real world and then updating your understanding of the world when those things you see surprise you, so that they wouldn't surprise you again.
Why care about predicting things in the world well?
Almost no matter what you ultimately care about, being able to predict ahead of time what's going to happen next will make you better at planning for your goal.
One central rationalist insight is that thoughts are for guiding actions. Think
of your thinking as the connecting tissue sandwiched between the sense data that
enters your sense organs and the behaviors your body returns. Your brain is a
function from a long sequence of observations (all the sensory inputs you've
ever received, in the order you received them) to your next motor output.
Understood this way, the point of having a brain and having thoughts is to guide
your actions. If your thoughts aren't all ultimately helping you better steer
the universe (by your own lights) … they're wastes. Thoughts aren't meant to be
causally-closed-off eddies that whirl around in the brain without ever
decisively leaving it as actions. They're meant to transform observations into
behaviors! This is the whole point of thinking! Notice when your thoughts are
just stewing, without going anywhere, without developing into thoughts that'll
go somewhere … and let go of those useless thoughts. Your thoughts should cut.
[https://www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality#:~:text=As%20Miyamoto%20Musashi%20said%3A,able%20actually%20to%20cut%20him.%22]
David Udell (1y):
If you can imagine a potential worry, then you can generate that worry.
Rationalism is, in part, the skill of never being predictably surprised by
things you already foresaw.
It may be that you need to "wear another hat" in order to pull that worry out of
your brain, or to model another person advising you
[https://www.lesswrong.com/posts/X79Rc5cA5mSWBexnd/shoulder-advisors-101] to get
your thoughts to flow that way, but whatever your process, anything you can
generate for yourself is something you can foresee and consider. This aspect of
rationalism is the art of "mining out your future cognition," to exactly the
extent that you can foresee it,
[https://www.lesswrong.com/posts/BPFqBq7ch7pSRpybM/your-future-self-s-credences-should-be-unpredictable-to-you]
leaving whatever's left over a mystery to be updated on new observations.
David Udell (1y):
Minor spoilers for mad investor chaos and the woman of asmodeus (planecrash Book
1). [https://www.glowfic.com/posts/4582]
The citation link in this post takes you to a NSFW subthread in the story.
Gebron and Eleazar define kabbalah as “hidden unity made manifest through patterns of symbols”, and this certainly fits the bill. There is a hidden unity between the structures of natural history, human history, American history, Biblical history, etc: at an important transition point in each, the symbols MSS make an appearance and lead to the imposition of new laws. Anyone who dismisses this as coincidence will soon find the coincidences adding up to an implausible level.
The kabbalistic perspective is that nothing is a coincidence. We believe that the uni
The ML models that now speak English, and are rapidly growing in
world-transformative capability, happen to be called transformers.
This is not a coincidence because nothing is a coincidence.
[https://unsongbook.com/chapter-1-dark-satanic-mills/#:~:text=None%20of%20this%20was%20a%20coincidence%20because%20nothing%20was%20ever%20a%20coincidence.]
An implication of AI risk is that we, right now, stand at the fulcrum of human history.
Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn't enough to nudge me off of the fulcrum-of-history view.
If you buy that, it's our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.
For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.
That seems an odd enough conclusion to make it quite distinct from most other people in human history.
If the conclusion were "go over to those people, hit them on the head with a big rock, and take their women & children as slaves" or "acquire a lot of power", I'd be way more careful.
There exist both merely clever and effectively smarter people.
Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don't seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they've encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut r... (read more)
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent's precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash's shadow favorably influences the way events actually unfold.
...unless the other agent has already precommitted to not being rational. (What
is the advantage of this over just precommitting not to swerve? Precommitting to
not be rational can happen even in advance of the game, as it's mainly a
property of the agent itself.)
(This is one way that you can rationally arrive at irrational agents.)
David Udell (1y):
I don't yet know too much about this, but I've heard that updateless decision
theories are equivalent to conventional, updateful decision theories (e.g., EDT
and CDT) once those theories have made every precommitment they'd want to make.
The pattern I was getting at above seems a bit like this: it instrumentally
makes sense to commit ahead of time to a policy that maps every possible series
of observations to an action and then stick to it, instead of just outputting
the locally best action in each situation you stumble into.
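Putting rough numbers on the chicken example from the post above (payoffs are illustrative; higher is better):

```python
# Chicken: each entry is (row player's payoff, column player's payoff).
payoffs = {
    ("swerve", "swerve"): (3, 3),
    ("swerve", "straight"): (2, 4),
    ("straight", "swerve"): (4, 2),
    ("straight", "straight"): (0, 0),  # crash
}

def best_response(opponent_move):
    # The row player's best move once the opponent's move is fixed and visible.
    return max(["swerve", "straight"],
               key=lambda my_move: payoffs[(my_move, opponent_move)][0])

# If the other driver has conspicuously thrown out their steering wheel
# (credibly precommitting to "straight"), your best response is to swerve --
# which hands them their best outcome.
print(best_response("straight"))  # -> 'swerve'
```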
In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom's motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom's motion that the dimen
When you supervised-train an ML model on an i.i.d. dataset that doesn't contain any agent modeling problems, you never strongly incentivize the emergence of mesa-optimizers. You do weakly incentivize the emergence of mesa-optimizers, because mesa-optimizers are generally capable algorithms that might outperform brittle bundles of rote heuristics on many simple tasks.
When you train a model in a path-dependent setting, you do strongly incentivize mesa-optimization. This is because algorithms trained in a path-dependent setting have the opportunity to defend ... (read more)
Spend an hour and a half refactoring your standing political views, by temporarily rolling those political views back to a childhood state from before your first encounter with highly communicable and adaptive memeplexes. Query your then-values, and reason instrumentally from the values you introspect. Finally, take or leave the new views you generate.
If your current political views are well supported, then they should regenerate under this procedure. But if you've mostly been recycling cached thoughts... (read more)
My memories of childhood aren't that precise. I don't really know what my
childhood state was? Before certain extremely negative things happened to my
psyche, that is. There are only a few scattered pieces I recall, like
self-sufficiency and honesty being important, but these are the parts that
already survived into my present political and moral beliefs.
The only thing I could actually use is that I was a much more orderly person
when I was 4 or 5, but I don't see how it would work to use just that.
But, evolution managed to push human values in the rough direction of its own values: inclusive genetic fitness. We don't care about maximizing inclusive genetic fitness, but we do care about having sex... (read more)
The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it'd be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory's wrong and the result is catastrophe?
Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it i... (read more)
A semantic externalist once said,
"Meaning just ain't in the head.
Hence a brain-in-a-vat
Just couldn't think that
'Might it all be illusion instead?'"
I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.
But, milling about the Lightcone offices, fully half of the people I've encountered hold some kind of philosophy degree. "LessWrong: the best philosophy site on the internet."
Equanimity in the face of small threats to brain and body health buys you peace of mind, with which to better prepare for serious threats to brain and body health.
Humans, "teetering bulbs of dream and dread," evolved as a generally intelligent patina around the Earth. We're all the general intelligence the planet has to throw around. What fraction of that generally intelligent skin is dedicated to defusing looming existential risks? What fraction is dedicated towards immanentizing the eschaton?
Minor spoilers for planecrash (Book 3).
Keltham's Governance Lecture
... (read more)
Minor spoilers from mad investor chaos and the woman of asmodeus (planecrash Book 1) and Peter Watts's Echopraxia.
... (read more)
“What is the world trying to tell you?”
I've found that this prompt helps me think clearly about the evidence shed by the generator of my observations.
God dammit people, "cringe" and "based" aren't truth values! "Progressive" is not a truth value! Say true things!
Having been there twice, I've decided that the Lightcone offices are my favorite place in the world. They're certainly the most rationalist-shaped space I've ever been in.
Academic philosophers are better than average at evaluating object-level arguments for a claim. They don't seem to be very good at thinking about what rationalized search implies about which arguments turn up in the first place. Compared to academic philosophers, rationalists strike me as especially appreciative of filtered evidence and of its significance for your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n...
Modest spoilers for planecrash (Book 9 -- null action act II).
Epistemic status: politics, known mindkiller; not very serious or considered.
People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.
In the US, the 1st Amendment legall...
153
If you take each of the digits of 153, cube them, and then sum those cubes, you get 153:
1 + 125 + 27 = 153.
For many naturals (in fact, exactly the positive multiples of 3), iterating this function eventually lands you on the 153 fixed point. Start with, say, 414:
64 + 1 + 64 = 129
1 + 8 + 729 = 738
343 + 27 + 512 = 882
512 + 512 + 8 = 1,032
1 + 0 + 27 + 8 = 36
27 + 216 = 243
8 + 64 + 27 = 99
729 + 729 = 1,458
1 + 64 + 125 + 512 = 702
343 + 0 + 8 = 351
27 + 125 + 1 = 153
1 + 125 + 27 = 153
1 + 125 + 27 = 153...
These nine fixed points or cycles occur with the following frequencies (1 <= n <= 10e9):
33.3% : (153 → )
29.5% : (371 → )
17.8% : (370 → )
5.0% : (55 → 250 → 133 → )
4.1% : (160 → 217 → 352 → )
3.8% : (407 → )
3.1% : (919 → 1459 → )
1.8% : (1 → )
1.5% : (136 → 244 → )
No other fixed points or cycles are possible (except 0 → 0, which isn't reachable from any nonzero input): any number with more than four digits has fewer digits in the sum of its cubed digits, so every trajectory eventually falls into the four-digit range and stays there, and an exhaustive check of those finitely many values turns up only the cycles above.
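A minimal sketch of how one might tally these attractors (Python; run here over a much smaller range than the 10e9 quoted above, purely to keep it quick, so the exact percentages will differ a little):

```python
from collections import Counter

def cube_digit_sum(n: int) -> int:
    """Sum of the cubes of n's decimal digits."""
    return sum(int(d) ** 3 for d in str(n))

def attractor(n: int) -> tuple:
    """Iterate the map from n until a value repeats; return that cycle,
    rotated so it starts at its smallest element."""
    seen = []
    while n not in seen:
        seen.append(n)
        n = cube_digit_sum(n)
    cycle = seen[seen.index(n):]
    start = cycle.index(min(cycle))
    return tuple(cycle[start:] + cycle[:start])

if __name__ == "__main__":
    LIMIT = 100_000  # a smaller cutoff than quoted above, just for speed
    counts = Counter(attractor(n) for n in range(1, LIMIT + 1))
    for cycle, count in counts.most_common():
        arrow = " → ".join(str(c) for c in cycle)
        print(f"{100 * count / LIMIT:5.1f}% : ({arrow} → )")
```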
A model I picked up from Eric Schwitzgebel.
The humanities used to be highest-status in the intellectual world!
But then, scientists quite visibly exploded fission weapons and put someone on the moon. It's easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.
When the sanity waterline is so low, it's easy to develop a potent sense of misanthropy.
Bryan Caplan's writing on how many people hate stupid people really affected me on this point. Don't hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo's law of comparative advantage. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
"Ignorant people do not exist."
It's really easy to spend a lot of cognitive cycles churning through bad, misleading ideas generated by the hopelessly confused. Don't do that!
The argument that being more knowledgeable leaves you strictly better off than being ignorant relies on your simply ignoring bad ideas when you spend your cognitive cycles searching for improvements on your working plans. Sometimes, you'll need to actually exercise this "simply ignore it" skill. You'll end up needing to do so more and more, to approach bounded instrumental rationality, the more inadequate the civilization around you is and the lower its sanity waterline.
You sometimes misspeak... and you sometimes misthink. That is, just as your tongue sometimes trips over a word, your cognitive algorithm sometimes trips too, and the thought that seemed so unimpeachably obvious in your head... is nevertheless false on a second glance.
Your brain is a messy probabilistic system, so you shouldn't expect its cognitive state to ever perfectly track the state of a distant entity.
Policy experiments I might care about if we weren't all due to die in 7 years:
- Prediction markets generally, but especially policy prediction markets at the corporate and U.S.-state levels. The goal would be to try this route to raising the sanity waterline in the political domain (and elsewhere) by incentivizing everyone to become more of a policy wonk and less of a tribalist.
- Open borders experiments of various kinds in various U.S. states, precluding roads to citizenship or state benefits for migrant workers, and leaving open the possibility of mass de...
Become consequentialist enough, and it'll wrap back around to being a bit deontological.
A shard is a contextually activated, behavior-steering computation. Think of it as a circuit of neurons in your brain that is reinforced by the subcortex, gaining more staying power when positively reinforced and withering away in the face of negative reinforcement. In fact, whatever modulates shard strength in this way is reinforcement/reward. Shards are born when a computation that happens to be steering behavior steers into some reinforcement. So shards can only accrete around the concepts currently in a system's world model (presumably, the world model is shared ...
My favorite books, ranked!
Non-fiction:
1. Rationality, Eliezer Yudkowsky
2. Superintelligence, Nick Bostrom
3. The Age of Em, Robin Hanson
Fiction:
1. Permutation City, Greg Egan
2. Blindsight, Peter Watts
3. A Deepness in the Sky, Vernor Vinge
4. Ra, Sam Hughes/qntm
Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.
Extrapolated Volitionist institutions are all characteristically "meta": they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.
Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!
Back and Forth
Only make a choice if, starting from the other side, you wouldn't make the reverse choice. Drop out of school if and only if you wouldn't enroll in school from out of the workforce. Continue school if and only if you'd switch over from work to that level of schooling.
Flitting back and forth between both possible worlds can make you less cagey about doing what's overdetermined by your world model + utility function already. It's also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.
Ten seconds of optimization is infinitely better than zero seconds of optimization.
Spoilers for planecrash (Book 2).
"Basic project management principles, an angry rant by Keltham of dath ilan, section one: How to have anybody having responsibility for anything."
You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.
Historical experience and brainstorming about human social orders probably barely scratch the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.
(Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)
One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.
If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can only get that person to think in terms of Bayesian epistemology, and not decision theory, if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increas...
An Inconsistent Simulated World
I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).
Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.
What are the flaws you...
When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
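In odds form (a quick sketch, with H the claim and E the observation that such an article exists): if an article of this caliber was about as likely to get written whether or not H is true, the likelihood ratio is roughly 1, and your posterior odds on H should sit right where your prior odds were.

$$\frac{P(H\mid E)}{P(\neg H\mid E)} \;=\; \frac{P(E\mid H)}{P(E\mid \neg H)}\cdot\frac{P(H)}{P(\neg H)} \;\approx\; 1\cdot\frac{P(H)}{P(\neg H)}$$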
What're the odds that we're anywhere close to optimal in any theoretical domain? Where are our current models basically completed, boundedly optimal representations of some part of the universe?
The arguments for theoretical completion are stronger for some domains than others, but in general the odds that we have the best model in any domain are pretty poor, and are outright abysmal in the mindkilling domains.
Is the concept of "duty" the fuzzy shadow cast by the simple mathematical structure of 'corrigibility'?
It's only modestly difficult to train biological general intelligences to defer to even potentially dumber agents. We call these deferential agents "dutybound" -- the sergeants who carry out the lieutenant's direct orders, even when they think they know better; the bureaucrats who never take local opportunities to get rich at the expense of their bureau, even when their higher-ups won't notice; the employees who work hard in the absence of effective overs...
Minor spoilers for planecrash (Book 3).
Non-spoiler quote from planecrash (Book 3).
Building your own world model is hard work. It can be good intellectual fun, sometimes, but it's often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your ...
In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.
Agents that explicitly represent their utility function are potentially vulnerable to sign flips.
What sorts of AI designs could not be made to pursue a flipped utility function via a perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only ever pursuing the error-corrected utility function.
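A minimal sketch of that guess (hypothetical Python with made-up weights; no real system represents its utility function this explicitly): keep several redundant copies of the representation and pursue the coordinate-wise median, so a sign flip localized to any one copy leaves the pursued utility function unchanged.

```python
import random

# Toy "utility function" as an explicit weight vector, stored redundantly.
TRUE_UTILITY = [0.7, -0.2, 1.3]  # made-up weights over three outcome features
NUM_COPIES = 5

copies = [list(TRUE_UTILITY) for _ in range(NUM_COPIES)]

# A perturbation "in one spot": flip the sign of a single stored copy.
flipped = random.randrange(NUM_COPIES)
copies[flipped] = [-w for w in copies[flipped]]

def error_corrected(copies):
    """Coordinate-wise median across the redundant representations."""
    corrected = []
    for coords in zip(*copies):
        ordered = sorted(coords)
        corrected.append(ordered[len(ordered) // 2])
    return corrected

# The pursued (error-corrected) utility function is unchanged by the flip.
assert error_corrected(copies) == TRUE_UTILITY
```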
A huge range of utility functions should care about alignment! It's in the interest of just about everyone to survive AGI.
I'm going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We'll hammer out our value disagreements in our CEV, and in our future (should we save it).
One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it's given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone's goals would be best served by doing something socially unorthodox (like, e.g., signing up for cryonics or dropping out of a degree), they will usually rationalize that option away in order to stay on script. So for th...
Two moments of growing in mathematical maturity I remember vividly:
- Realizing that equations are claims, and are therefore either true or false. Everything asserted with symbols... could just as well be asserted in English. I could start chunking up arbitrarily long and complicated equations between the equals signs, because those equals signs were just the English word "is"!
- Learning about the objects that mathematical claims are about. Going from having to look up "Wait, what's a real number again?" to knowing how Z, Q, and R interrelat...
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.
Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.
"Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search."
Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.
This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible t...
Stress and time-to-burnout are resources to be juggled, like any other.
Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?
Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.
Reflexively check both sides of the proposed probability of an event: P(event), and P(not-event) = 1 − P(event).
This can often elicit feedback from parts of you that would stay silent if you only considered one way of stating the probability in question.
I've noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: "If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?"
The thing is, we live in a world with looming powerful AI. It's at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we're not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist... so something's up here. I think some pa...
Fancy epistemic tools won't override the basics of good epistemics:
You are embedded in a 3D spatial world, progressing in a time dimension. You want to get better at predicting events in advance, so you want to find the underlying generator for this 3D world's events. This means that you're rooting around in math space, trying to find the mathematical object that your observational trajectory is embedded in.
Some observations of yours are differentially more likely in some math objects than in others, and so it's more likely that your world is the former ma...
Try pinging yourself:
What's overdetermined by what you already know?
Minor spoilers for planecrash (Book 3.1).
Minor spoilers for planecrash (Book 1) and the dath-ilani-verse generally.
What is rationalism about?
Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview -- that is not what rationality is about. Rationality is about watching things happen in the real world and then updating your understanding of the world when what you see surprises you, so that it wouldn't surprise you again.
Why care about predicting things in the world well?
Almost no matter what you ultimately care about, being able to predict ahead of time what's going to happen next will make you better at planning for your goal.
An implication of AI risk is that we, right now, stand at the fulcrum of human history.
Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn't enough to nudge me off of the fulcrum-of-history view.
If you buy that, it's our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.
For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.
That seems an odd enough conclusion to make it quite distinct from most other people in human history.
If the conclusion were "go over to those people, hit them on the head with a big rock, and take their women & children as slaves" or "acquire a lot of power", I'd be way more careful.
There exist both merely clever and effectively smarter people.
Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don't seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they've encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut r...
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent's precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash's shadow favorably influences the way events actually unfold.
A deceptively ali...
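A toy version of the chicken setup above (hypothetical Python with made-up payoffs), showing the mechanism: once "swerve" is credibly deleted from the first agent's option set, the second agent's best response flips to swerving.

```python
# Made-up payoffs for chicken: PAYOFFS[(row_action, col_action)] = (row_utility, col_utility).
PAYOFFS = {
    ("swerve", "swerve"):     (0, 0),      # both swerve: no crash, mild embarrassment
    ("swerve", "straight"):   (-1, 1),     # row chickens out, column wins
    ("straight", "swerve"):   (1, -1),     # row wins, column chickens out
    ("straight", "straight"): (-10, -10),  # head-on crash: worst outcome for both
}

def column_best_response(row_action):
    """Column player's best reply to a credibly observed row action."""
    return max(["swerve", "straight"], key=lambda col: PAYOFFS[(row_action, col)][1])

# The steering wheel is out the window: the row player can only go straight.
assert column_best_response("straight") == "swerve"
print(PAYOFFS[("straight", "swerve")])  # (1, -1): the precommitter gets their best outcome
```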
You can think of chain-of-thought interpretability as the combination of process-based methods with adversarial training.
When you supervised-train an ML model on an i.i.d. dataset that doesn't contain any agent modeling problems, you never strongly incentivize the emergence of mesa-optimizers. You do weakly incentivize the emergence of mesa-optimizers, because mesa-optimizers are generally capable algorithms that might outperform brittle bundles of rote heuristics on many simple tasks.
When you train a model in a path-dependent setting, you do strongly incentivize mesa-optimization. This is because algorithms trained in a path-dependent setting have the opportunity to defend ...
Unreasonably effective rationality-improving technique:
Spend an hour and a half refactoring your standing political views, by temporarily rolling those political views back to a childhood state from before your first encounter with highly communicable and adaptive memeplexes. Query your then-values, and reason instrumentally from the values you introspect. Finally, take or leave the new views you generate.
If your current political views are well supported, then they should regenerate under this procedure. But if you've mostly been recycling cached thoughts...
The unlovely neologism "agenty" means strategic.
"Agenty" might carry less connotational baggage in exchange for its unsightliness, however. Just like "rational" is understood by a lot of people to mean, in part, stoical, "strategic" might mean manipulative to a lot of people.
"Thanks for doing your part for humanity!"
"But we're not here to do software engineering -- we're here to save the world."
Because of deception, we don't know how to put a given utility function into a smart agent that has grokked the overall picture of its training environment. Once training finds a smart-enough agent, the model's utility function ceases to be malleable to us. This suggests that powerful greedy search will find agents with essentially random utility functions.
But, evolution managed to push human values in the rough direction of its own values: inclusive genetic fitness. We don't care about maximizing inclusive genetic fitness, but we do care about having sex...
The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it'd be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory's wrong and the result is catastrophe?
Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it i...
A semantic externalist once said,
"Meaning just ain't in the head.
Hence a brain-in-a-vat
Just couldn't think that
'Might it all be illusion instead?'"
I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.
But, milling about the Lightcone offices, fully half of the people I've encountered hold some kind of philosophy degree. "LessWrong: the best philosophy site on the internet."
Some mantras I recall a lot, to help keep on the rationalist straight-and-narrow and not let anxiety get the better of me:
Humans, "teetering bulbs of dream and dread," evolved as a generally intelligent patina around the Earth. We're all the general intelligence the planet has to throw around. What fraction of that generally intelligent skin is dedicated to defusing looming existential risks? What fraction is dedicated towards immanentizing the eschaton?