This article was written in ignorance of the alignment community’s reaction to Eliezer’s “Death with Dignity” post. The first part of this article responds to how I suspect some people reacted to that post, while the second part is my take on the post itself.
I write against defeatism; I write against decline; I write against that internal slumping that sneaks in on the coat-tails of bad news. I do not dispute the balance of evidence—are we doomed, or not? Let us simply assume that we live in a relatively doomed world: It’s very improbable that we solve AI alignment in time.
We have been taught what comes next, in this kind of story. Since, by assumption, we won’t receive a “happily ever after”, we infer we are in a tragedy. Realizing this, we are disappointed and sad. The fellowship breaks and scatters, its once-proud and vibrant members downtrodden. And then occurs a miracle which we could have turned to our advantage, but for our civilizational incompetence and our own stupidity—smart enough to build AI, yet too dumb to align it. And then the laugh track plays. And then we die! The end!
As AI_WAIFU said: “Fuck. That. Noise.”
We do not live in a story. We can, in fact, just assess the situation, and then do what makes the most sense, what makes us strongest and happiest. The expected future of the universe is—by assumption—sad and horrible, and yet where is the ideal-agency theorem which says I must be downtrodden and glum about it?
In response to how I suspect some people reacted to "Death with Dignity."
Suppose we have ten years left. Some may consider a very natural response—to retreat. Spend the remaining time with your family and your friends, working whatever job you want—just do whatever.
But... Um... This sounds like an awful plan? I’d feel like a lazy bum, like a cornered animal. And for those of us doing direct work, this plan sounds stupid. It sounds like a great way to throw away worlds where we would receive a miracle, and yet couldn’t do anything about it anymore because we all went home.
How would I like to see those ten years spent? I’m looking for plans which involve playing to win, which are fun and edifying, which would make me proud to be me and to be part of this community.
Ladies and gentlemen, let me mention a concept called instrumental convergence. It’s something I like thinking about sometimes. In many situations, for many goals, the best actions look similar: Gather resources, build up strength, develop yourself. Following this pattern, good plans in probably-doomed worlds look a lot like good plans in hopeful worlds.
Even if the probable-end is bearing down—imagine having a vibrant social scene anyways, where people are having fun anyways, developing deep friendships anyways, expecting more of themselves anyways, pushing each other to break their boundaries anyways. Because that’s awesome, because that’s fun, because that makes us stronger, because that positions us to seize miracles, because that’s the kind of thing that makes me proud, dammit.
Our civilization is incompetent. Gee, that sucks. But we don’t have to suck too, do we? If a society of people-like-us would really be so great, why don’t we show it?
- Set up communal institutions which actually have good incentives.
- Build out infrastructure for alignment research and training, so that smart young people want to try their hand at alignment research.
- Probably more people should skill up as managers, so we can have more capacity via high-quality organizations like Redwood Research. I don’t know how to do this, and cannot offer advice.
- Set up infrastructure in areas where our society is decrepit—can we do better
  - on childcare?
  - on healthcare? (amortize the fixed costs of medical tourism over a pool of individuals?)
- Build stronger and better and smarter norms
  - around event culture,
  - around dating
    - I applaud Eliezer for funding a community matchmaker last year; I want more people trying ideas!
- Brainstorm over and over and over what institutions may help with miracles, and then build them. Offer large cash prizes for ideas which top thinkers agree are valuable.
Of course, we also want an AI alignment training pipeline full of brilliant bright-eyed newcomers, with obvious improvements to the pipeline rewarded by cash prizes (or a prediction market-based scheme). We list the possible miracles by probability, and find the prerequisites to seizing them, and then train a distribution of researchers according to these expected prerequisites. We train an army of alignment researchers—even if we don’t see any good research pathways now, because that can change later.
And yet it seems to me like Eliezer himself does not encourage some important preparations. In this comment and in other hearsay I am privy to, he has little advice to offer prospective alignment researchers, which I think is because he is unsure what will help.
Well, what he says about alignment is his business. But it’s also my business, since I live on this planet and share its fate. And even if I agreed completely with Eliezer, I think I would rather give advice like:
I think the situation is very bleak, and assess that you individually have a low probability of saving the world. I think you should know that before starting. That said, if you are still interested in helping, if you have something to protect—you should master deep learning and the science of intelligence, and you should practice the art of rationality with all the intensity and honesty you can muster.
If you want to do alignment theory, try the AI Safety Camp. Or maybe see if Chris Olah could use some help. And perhaps, one day, we will have a worthy alignment scheme which urgently needs the expertise which you can develop today.
As I see things, this advice is both honest and pragmatic—it is useful in many “miracle” worlds. Wouldn’t it be undignified to later have a massive breakthrough in alignment theory, and yet not have enough hands on deck to derive enough corollaries and evaluate alignment schemes derived thereby?
As for the individual—
- If you are having a hard time right now—that is OK; please do not use your anxiety/guilt as a cudgel to beat yourself into yet deeper anxiety/guilt. Behind these posts is a person who also sometimes has a hard time.
- If you feel that you should do more to help—not due to a deep yearning, but because you’d be bad if you didn’t do more—please read Replacing Guilt. I do not want bedraggled people pushing themselves yet harder, working even more hours, in order to scrap out a win. That’s unhealthy, and not an awesome or fun way to treat yourself.
- It’s hard to advise other people. I’m going to talk about what I want from myself. Translate accordingly.
Along those lines, I want to become as strong as I can. Maybe it won’t be enough to win, but maybe it will. And hot damn, do I want to become strong. Because it’s awesome, because it’s fun, and because I want to win.
Anyways, what do I mean by “strong”? Do I mean that I’ve absorbed lots of textbooks, that I know lots of facts, that I have lots of impressive skills? No. Here, “strong” means strong at the basics of thinking:
Everything inside Keltham's mind has a very trained feeling to it, his moment-to-moment thought-motions each feeling like a punch that a monk throws after twelve years of experience in martial arts, when the monk isn't particularly focused on showing off and simply knows what he's doing without thinking about it.
Mad Investor Chaos
I think that many people think that becoming more rational involves being harder on yourself; imagining yet fiercer and nitpickier critics, and coming out unscathed anyways because you were so paranoid. You checked more boxes, defended against more biases, and examined the evidence even harder.
That’s... not how it works. I may write elsewhere about how I have found it to work. But I will say this: More is possible.
I used to think I had absorbed the Sequences, that I had acquired most of the available art of rationality, and the rest I would have to build or experience for myself. I was wrong. Early this year, my brain got pinged in just the right way by Eliezer’s Mad Investor Chaos fiction, and—click!
- Before, I had declarative knowledge of biases and knew how they felt from the inside; I knew the basic math of probability theory; I had experience resolving internal motivational conflicts.
- After, I had an intuitive sense for (extremely basic) probability theory, a bright standard in my mind against which I compare my thoughts. “Oh, I believe that quite strongly. Why? Where did I get the evidence-fuel for the strong previous-update likelihood ratio implied by this present credence?” These concepts feel atomic to me, even though they’re clumsy to write, and e.g. the odds form of Bayes’ rule feels like a primitive mental operation.
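For readers who have not met the odds form of Bayes' rule, here is a minimal sketch of the "where did I get the evidence-fuel?" check. The function names and the 50% / 95% numbers are hypothetical, chosen only for illustration; the point is that posterior odds equal prior odds times the likelihood ratio, so a strong credence implies a correspondingly strong ratio somewhere in your past updates.

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability into odds in favor (p : 1 - p)."""
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor back into a probability."""
    return odds / (1.0 + odds)


def update_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def implied_likelihood_ratio(prior_p: float, current_p: float) -> float:
    """How much total evidence is needed to move from prior_p to current_p."""
    return probability_to_odds(current_p) / probability_to_odds(prior_p)


# Hypothetical numbers: I started at 50% and now find myself at 95%.
needed = implied_likelihood_ratio(prior_p=0.5, current_p=0.95)
print(f"Evidence needed: a likelihood ratio of about {needed:.1f} : 1")
```

If I cannot point to observations that were roughly that much more likely under the belief than under its negation, the credence is stronger than my evidence supports.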
I could see the ways in which my mental footwork is sloppy. The sloppiness made me realize how unoptimized my thought processes are—I had never optimized that!—and how much more I could learn. And so I began, and now I have a small part of me which is increasingly consequentialist and Bayesian, a part which I can call upon for strength and clarity.
And this is what I have done in a few months. Who could I grow into in ten years? I want to become strong, and I want to search for plans which win. Because trying to die in a “dignified” way is not a wise strategy (for people like me, at least).
Against motivation via dignity points
When Earth's prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.
That's why I would suggest reframing the problem - especially on an emotional level - to helping humanity die with dignity, or rather, since even this goal is realistically unattainable at this point, die with slightly more dignity than would otherwise be counterfactually obtained.
So don't get your heart set on that "not die at all" business. Don't invest all your emotion in a reward you probably won't get. Focus on dying with dignity - that is something you can actually obtain, even in this situation. After all, if you help humanity die with even one more dignity point, you yourself die with one hundred dignity points! Even if your species dies an incredibly undignified death, for you to have helped humanity go down with even slightly more of a real fight, is to die an extremely dignified death.
But if enough people can contribute enough bits of dignity like that, wouldn't that mean we didn't die at all? Yes, but again, don't get your hopes up. Don't focus your emotions on a goal you're probably not going to obtain. Realistically, we find a handful of projects that contribute a few more bits of counterfactual dignity; get a bunch more not-specifically-expected bad news that makes the first-order object-level situation look even worse (where to second order, of course, the good Bayesians already knew that was how it would go); and then we all die.
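One way to make the "0% to 0%" arithmetic concrete is to look at the success curve in log-odds. The sketch below assumes a hypothetical starting probability of 0.1% (the quoted passage gives no number) and treats "doubling our chances" as doubling the odds, which is nearly the same thing at probabilities this small.

```python
import math


def log_odds(p: float) -> float:
    """Position on the logistic curve's x-axis for success probability p."""
    return math.log(p / (1.0 - p))


def double_odds(p: float) -> float:
    """Probability after doubling the odds of success."""
    new_odds = 2.0 * p / (1.0 - p)
    return new_odds / (1.0 + new_odds)


p = 0.001  # hypothetical starting probability, not a figure from the post
doubled = double_odds(p)

# Rounded to whole percentages, both values display as 0%.
print(f"{p:.0%} -> {doubled:.0%}")

# In log-odds, doubling is the same fixed step (ln 2) anywhere on the curve.
step = log_odds(doubled) - log_odds(p)
print(f"log-odds step: {step:.3f}")
```

The step is the same size wherever you are on the curve; it only looks like nothing because the curve is nearly flat at the bottom.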
Again, I am not, in this post, disputing Eliezer’s object-level model. I have supposed he is correct about our probable doom. Obviously, if we are in a probably-doomed world, I will keep that in mind. I do actually want to win, and finding winning plans requires entangling my brain with the details of each expected danger. Sharp danger awaits when you lose sight of the fact that reality is allowed to kill you.
However—if you work at all like I do, I think this is not how you should interface with yourself or your motivational system. It is like saying:
This vault door was professionally secured, and you are no professional burglar. Yes, your mother is starving inside, but you are unlikely to open the door before she dies. Therefore, you should search for dignified plans—plans which let you seize miracles in worlds where the door was installed incorrectly. But please don’t expect to actually open the door. Don’t get your hopes up.
A search for dignified plans is different from a search for plans which get my mother out of the damn vault. I can, in fact, conduct the latter search while still remembering how unlikely I am to actually open the vault, and the latter search has a better chance of actually finding dignified plans!
Want to try to make a million dollars? Buy a lottery ticket. Your odds of winning may not be very good, but you did try, and trying was what you wanted. In fact, you tried your best, since you only had one dollar left after buying lunch. Maximizing the odds of goal achievement using available resources: is this not intelligence?
It’s only when you want, above all else, to actually flip the switch—without quotation and without consolation prizes just for trying—that you will actually put in the effort to actually maximize the probability.
But if all you want is to “maximize the probability of success using available resources,” then that’s the easiest thing in the world to convince yourself you’ve done. The very first plan you hit upon will serve quite well as “maximizing”—if necessary, you can generate an inferior alternative to prove its optimality. And any tiny resource that you care to put in will be what is “available.” Remember to congratulate yourself on putting in 100% of it!
Don’t try your best. Win, or fail. There is no best.
I hope you do not let this “dignity” orientation cloud your intent to win. In fact, I think you should lean in the opposite direction, and sharpen your intent. Search only for plans which actually win, given your best understanding of the AI alignment problem. Be honest with yourself; do not flinch away from reality; do not take your eyes off the goal.
I’m not going to waste my time searching for dignified plans (which maximize humanity's probability of survival). Because I do have a mother in that vault, and a father, and a brother. In fact, there’s a whole damn planet in there. It’s my home, and it’s yours, too. And if we do stare down defeat together—let’s make that remaining time valiant and exciting and awesome.
We cannot fight at maximum all the time, and some times are more important than others. (Namely, when the logistic success curve seems relatively more sloped; those times are relatively more important.)
This is a good point. I am not advocating burnout. This is indeed a resource to conserve for situations closer to the 50% success rate, or for a targeted push at a particularly impactful moment. I am advocating growth and development in a way which is fun and awesome, which pushes limits without straining endurance. Perhaps this is not how most people work. But it’s how I can work.
Furthermore, do not conflate feelings with beliefs. You do not have to believe the future is rosy in order to feel good and strong and healthy and to give the problem everything you’ve got (without pushing to burnout). Feelings are not beliefs! I think of certain feelings as (harder-to-control) actions.
The point is not that Eliezer said “do not ‘try your best’” and now he has pivoted away from win-oriented-thinking, aha, gotcha!—The point is that I think the original orientation is wise and healthy and strengthening, and the new orientation is not.
I agree with most of this, but as I mentioned re: playing to your outs, there's a failure mode where, in my zeal to make an extraordinary effort, sharpen my intent, and win, I forget that I see only a small part of the picture.
I trust that you're wise enough not to fail this way - but such wisdom is not universal.
Here are some scenarios I cooked up because I want to see your reaction now that you've got probability sight.
Scenario A: You're transported to another world. Based on some quick calculations/experiments you perform (you're lucky you had a near-magically powerful laser and telescope on you, plus your pockit), it seems to be roughly similar to Earth in geological features. You then come across what looks like a building, human-sized. Outside are a giant (3.5 m tall) and a humanoid (around 1.5 m tall). Shape-wise, they're similar, but they're covered up. What's the probability that they're of the same race, and their species is not significantly polymorphic? How would you update on seeing a bunch more of each size wandering around the forest?
I feel like after a couple of samples, I could narrow down to a few hypotheses and have something like confidence in what my probabilities are and what my updates are like, because I can just use a Gaussian vs. a sum of Gaussians to guess probabilities. I can see how they shift quite well.
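As a rough sketch of what "a Gaussian vs. a sum of Gaussians" could look like here: compare how well one broad height distribution and a two-component mixture explain the same observations. Everything below is an assumption made up for illustration, including the observed heights, the means, and the standard deviations. The parameters are fixed by hand rather than fit, so this shows the shape of the comparison, not a careful model comparison.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_lik_single(heights, mean, sd):
    """Log-likelihood under one Gaussian (one broad, unimodal population)."""
    return sum(math.log(normal_pdf(h, mean, sd)) for h in heights)


def log_lik_mixture(heights, components):
    """Log-likelihood under a weighted sum of Gaussians.

    components is a list of (weight, mean, sd) tuples whose weights sum to 1.
    """
    return sum(
        math.log(sum(w * normal_pdf(h, m, s) for w, m, s in components))
        for h in heights
    )


# Hypothetical heights in metres: three giants, three small humanoids.
heights = [3.5, 1.5, 3.4, 1.6, 3.6, 1.4]

one_population = log_lik_single(heights, mean=2.5, sd=1.0)
two_groups = log_lik_mixture(heights, [(0.5, 1.5, 0.1), (0.5, 3.5, 0.1)])

# Positive values favor the two-cluster model over the single broad one.
log_likelihood_ratio = two_groups - one_population
print(f"log likelihood ratio (mixture vs. single): {log_likelihood_ratio:.2f}")
```

Each new sighting adds one more term to both sums, which is how you can watch the ratio shift as more of each size wander past. Note that a bimodal height distribution does not by itself say whether the two clusters are one dimorphic species or two species.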
Scenario B: You are transported to another world with your technical doodads. Looking around you, you find yourself in a plain stretching as far as the eye can see. You walk around for a couple of days and nothing changes. You're terrified of vast expanses of water. Unfortunately, you are unmoored from the digital sea and can't just check how fast they're rising, the tidal range, or even where they are. What's the probability that you're x meters above sea level, supposing this planet even has oceans?
I don't get how I'd do much better than your description of vague thinking here. I'm not sure how you would either. Like, you could maybe try checking humidity levels, the amount of mist you've got, how cold the wind feels, and so on. But it feels nowhere near as crisp as the prior example feels like it would likely be, since the sparsity of evidence and my limited understanding of what I can see just kind of make me shrug at stuff. Like "oh, there's some clouds there. If I knew meteorology, I bet that could be evidence of where those jumped-up puddles are. But I don't, so I'm now in a state of constant angst about what will appear over the horizon."
One of the things probability-sight tells me is how constraining my models are. One of the benefits of learning more fields is being able to extract sharper likelihood ratios from the same evidence. Here, my likelihood ratios are pretty unsharp. And also, I can feel I'm entering an unrefined conditional distribution of my beliefs, where conditioning on world transport and also earth-similarity produces something pretty strange.
My gut tells me that such large variation in size is rare. And I have cached thoughts about the stress which extreme height places on bones?
However, what are the alternatives? The probability of convergently evolving the same shape in parallel is small. If they could genetically modify themselves or even just customize their morphology using more exotic tech, it seems even less likely that they would stick to two morphologies (which I see wandering around the forest). (I've recently updated my thinking on how evolved advanced minds generally will work, and that model suggests that their preferences would probably be very compatible with variety, even though they probably aren't human preferences.)
Of the hypotheses predicting they are of the same species, extreme sexual dimorphism seems to have the highest posterior probability. Something about this seems wrong to me, or rare, which suggests that there's some gut evidence I haven't yet consciously incorporated. There are also more conjunctive possibilities like "they have culture and also exotic morphology modification tech but there are ~two acceptable morphologies", but this basically feels like a garbage just-so story that I'd need way more evidence to properly elevate to attention.
I think maybe if I had sharper models of evolutionary history, I'd see a sharper (perhaps Gaussian) form like you do. My other hypothesis is that this conditional distribution is really weird and I'd be surprised if you could narrow down to that shape so quickly.
(Written before reading your answer) Ah, another area I don't know much about. Time for more qualitative reasoning. I think that I'm most curious about what is on the plain. Is there life? Presumably since I'm walking for days, there is some level of humidity, which suggests a water cycle, I think? And in that case, there are probably oceans. And if I can eyeball the soil composition, I could estimate when a flood or rainfall last occurred, and that will give me some information about the altitude.
I don't know how altitude affects rain frequency (supposing the atmospheric dynamics are even remotely similar to Earth's), but under the flooding-is-possible hypotheses, observing "water has not touched this soil in a long time" represents a weak-to-moderate likelihood ratio against close-to-sea-level hypotheses. (With the weak-to-moderate from the unknown variance of sea level and of storms in this part of the world.)
Is everyone here properly aware of anthropics? e.g. that correctly ordered neurons for human intelligence might have had a 1-in-a-quadrillion chance of ever evolving naturally. But it would still look like a probable evolutionary outcome to us, because that is the course evolution must have taken in order for us to be born.
All the "failed intelligence" offshoots like mammals and insects would still be generated either way; it's just a question of how improbably difficult the remaining milestones between them and us are to replicate. Notably, lesser-brained lifeforms appear to be much more successful, e.g. insects and then plants, and the recent neural networks were made by plagiarizing the neuron, which is the most visible and easily copied part of the human brain.
It's only a possibility, but I don't see why it isn't doing more to push timelines outward.
See https://www.nickbostrom.com/aievolution.pdf for a discussion about why such arguments probably don't end up pushing timelines forward that much.
One thing to note in general is that AFAICT anthropic hypotheses take huge penalties compared to non-anthropic hypotheses, depending on how much anthropic lifting is required to explain our observations.
This could be the case. However, my instinct is that human intelligence is only incrementally higher than that of other animals. Sure, we crossed a threshold that allowed us to accomplish great things (language, culture, specialization), but I would honestly be shocked if you told me that evolution was incapable of producing another similarly intelligent species if it started from the baseline intelligence of, say, wolves or crows. If there is a "1-in-a-quadrillion chance" somewhere in our history, I expect that filter to be much further back than the recent evolution of hominids.
I don't have research to back this up. Just explaining why I personally wouldn't push timelines back significantly based on the anthropic principle.