This article was written in ignorance of the alignment community’s reaction to Eliezer’s “Death with Dignity” post. The first part of this article responds to how I suspect some people reacted to that post, while the second part is my take on the post itself.
I write against defeatism; I write against decline; I write against that internal slumping that sneaks in on the coat-tails of bad news. I do not dispute the balance of evidence—are we doomed, or not? Let us simply assume that we live in a relatively doomed world: It’s very improbable that we solve AI alignment in time.
We have been taught what comes next, in this kind of story. Since, by assumption, we won’t receive a “happily ever after”, we infer we are in a tragedy. Realizing this, we are disappointed and sad. The fellowship breaks and scatters, its once-proud and vibrant members downtrodden. And then occurs a miracle which we could have turned to our advantage, but for our civilizational incompetence and our own stupidity—smart enough to build AI, yet too dumb to align it. And then the laugh track plays. And then we die! The end!
As AI_WAIFU said: “Fuck. That. Noise.”
We do not live in a story. We can, in fact, just assess the situation, and then do what makes the most sense, what makes us strongest and happiest. The expected future of the universe is—by assumption—sad and horrible, and yet where is the ideal-agency theorem which says I must be downtrodden and glum about it?
In response to how I suspect some people reacted to "Death with Dignity."
Suppose we have ten years left. Some may consider a very natural response—to retreat. Spend the remaining time with your family and your friends, working whatever job you want—just do whatever.
But... Um... This sounds like an awful plan? I’d feel like a lazy bum, like a cornered animal. And for those of us doing direct work, this plan sounds stupid. It sounds like a great way to throw away worlds where we would receive a miracle, and yet couldn’t do anything about it anymore because we all went home.
How would I like to see those ten years spent? I’m looking for plans which involve playing to win, which are fun and edifying, which would make me proud to be me and to be part of this community.
Ladies and gentlemen, let me mention a concept called instrumental convergence. It’s something I like thinking about sometimes. In many situations, for many goals, the best actions look similar: Gather resources, build up strength, develop yourself. Following this pattern, good plans in probably-doomed worlds look a lot like good plans in hopeful worlds.
Even if the probable-end is bearing down—imagine having a vibrant social scene anyways, where people are having fun anyways, developing deep friendships anyways, expecting more of themselves anyways, pushing each other to break their boundaries anyways. Because that’s awesome, because that’s fun, because that makes us stronger, because that positions us to seize miracles, because that’s the kind of thing that makes me proud, dammit.
Our civilization is incompetent. Gee, that sucks. But we don’t have to suck too, do we? If a society of people-like-us would really be so great, why don’t we show it?
- Set up communal institutions which actually have good incentives.
- Build out infrastructure for alignment research and training, so that smart young people want to try their hand at alignment research.
- Probably more people should skill up as managers, so we can have more capacity via high-quality organizations like Redwood Research. I don’t know how to do this, and cannot offer advice.
- Set up infrastructure in areas where our society is decrepit—can we do better
  - on childcare?
  - on healthcare? (amortize the fixed costs of medical tourism over a pool of individuals?)
- Build stronger and better and smarter norms
  - around event culture,
  - around dating
    - I applaud Eliezer for funding a community matchmaker last year; I want more people trying ideas!
- Brainstorm over and over and over what institutions may help with miracles, and then build them. Offer large cash prizes for ideas which top thinkers agree are valuable.
Of course, we also want an AI alignment training pipeline full of brilliant bright-eyed newcomers, with obvious improvements to the pipeline rewarded by cash prizes (or a prediction market-based scheme). We list the possible miracles by probability, and find the prerequisites to seizing them, and then train a distribution of researchers according to these expected prerequisites. We train an army of alignment researchers—even if we don’t see any good research pathways now, because that can change later.
And yet it seems to me like Eliezer himself does not encourage some important preparations. In this comment and in other hearsay I am privy to, he has little advice to offer prospective alignment researchers, which I think is because he is unsure what will help.
Well, what he says about alignment is his business. But it’s also my business, since I live on this planet and share its fate. And even if I agreed completely with Eliezer, I think I would rather give advice like:
I think the situation is very bleak, and assess that you individually have a low probability of saving the world. I think you should know that before starting. That said, if you are still interested in helping, if you have something to protect—you should master deep learning and the science of intelligence, and you should practice the art of rationality with all the intensity and honesty you can muster.
If you want to do alignment theory, try the AI safety camp. Or maybe see if Chris Olah could use some help. And perhaps, one day, we will have a worthy alignment scheme which urgently needs the expertise which you can develop today.
As I see things, this advice is both honest and pragmatic—it is useful in many “miracle” worlds. Wouldn’t it be undignified to later have a massive breakthrough in alignment theory, and yet not have enough hands on deck to derive enough corollaries and evaluate alignment schemes derived thereby?
As for the individual—
- If you are having a hard time right now—that is OK; please do not use your anxiety/guilt as a cudgel to beat yourself into yet deeper anxiety/guilt. Behind these posts is a person who also sometimes has a hard time.
- If you feel that you should do more to help—not due to a deep yearning, but because you’d be bad if you didn’t do more—please read Replacing Guilt. I do not want bedraggled people pushing themselves yet harder, working even more hours, in order to scrap out a win. That’s unhealthy, and not an awesome or fun way to treat yourself.
- It’s hard to advise other people. I’m going to talk about what I want from myself. Translate accordingly.
Along those lines, I want to become as strong as I can. Maybe it won’t be enough to win, but maybe it will. And hot damn, do I want to become strong. Because it’s awesome, because it’s fun, and because I want to win.
Anyways, what do I mean by “strong”? Do I mean that I’ve absorbed lots of textbooks, that I know lots of facts, that I have lots of impressive skills? No. Here, “strong” means strong at the basics of thinking:
Everything inside Keltham's mind has a very trained feeling to it, his moment-to-moment thought-motions each feeling like a punch that a monk throws after twelve years of experience in martial arts, when the monk isn't particularly focused on showing off and simply knows what he's doing without thinking about it.
Mad Investor Chaos
I think many people believe that becoming more rational involves being harder on yourself: imagining yet fiercer and nitpickier critics, and coming out unscathed anyways because you were so paranoid. You checked more boxes, defended against more biases, and examined the evidence even harder.
That’s... not how it works. I may write elsewhere about how I have found it to work. But I will say this: More is possible.
I used to think I had absorbed the Sequences, that I had acquired most of the available art of rationality, and the rest I would have to build or experience for myself. I was wrong. Early this year, my brain got pinged in just the right way by Eliezer’s Mad Investor Chaos fiction, and—click!
- Before, I had declarative knowledge of biases and knew how they felt from the inside; I knew the basic math of probability theory; I had experience resolving internal motivational conflicts.
- After, I had an intuitive sense for (extremely basic) probability theory, a bright standard in my mind against which I compare my thoughts. “Oh, I believe that quite strongly. Why? Where did I get the evidence-fuel for the strong previous-update likelihood ratio implied by this present credence?” These concepts feel atomic to me, even though they’re clumsy to write, and e.g. the odds form of Bayes’ rule feels like a primitive mental operation.
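For readers who have not met the odds form of Bayes’ rule, here is a minimal sketch of the mental operation I mean: posterior odds equal prior odds times the likelihood ratio. The function names and the example numbers are my own illustrative assumptions, not anything from the Sequences or from Mad Investor Chaos.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability in (0, 1) to odds in favor."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds in favor back to a probability."""
    return odds / (1.0 + odds)


def update(prior_prob: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    return odds_to_prob(prob_to_odds(prior_prob) * likelihood_ratio)


def implied_likelihood_ratio(prior_prob: float, credence: float) -> float:
    """How much total evidence would be needed to move from the prior to the credence?"""
    return prob_to_odds(credence) / prob_to_odds(prior_prob)


# Hypothetical numbers: starting at 50%, evidence that is 3x likelier if the
# claim is true than if it is false moves the odds from 1:1 to 3:1.
posterior = update(0.5, 3.0)

# The question from the bullet above: a 90% credence from a 50% prior implies
# roughly a 9:1 likelihood ratio's worth of evidence. Where did it come from?
needed = implied_likelihood_ratio(0.5, 0.9)
```

The second function is the one I find myself reaching for most: it turns “I believe that quite strongly” into a concrete amount of evidence I should be able to point to.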
I could see the ways in which my mental footwork is sloppy. The sloppiness made me realize how unoptimized my thought processes are—I had never optimized that!—and how much more I could learn. And so I began, and now I have a small part of me which is increasingly consequentialist and Bayesian, a part which I can call upon for strength and clarity.
And this is what I have done in a few months. Who could I grow into in ten years? I want to become strong, and I want to search for plans which win. Because trying to die in a “dignified” way is not a wise strategy (for people like me, at least).
Against motivation via dignity points
When Earth's prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.
That's why I would suggest reframing the problem - especially on an emotional level - to helping humanity die with dignity, or rather, since even this goal is realistically unattainable at this point, die with slightly more dignity than would otherwise be counterfactually obtained.
So don't get your heart set on that "not die at all" business. Don't invest all your emotion in a reward you probably won't get. Focus on dying with dignity - that is something you can actually obtain, even in this situation. After all, if you help humanity die with even one more dignity point, you yourself die with one hundred dignity points! Even if your species dies an incredibly undignified death, for you to have helped humanity go down with even slightly more of a real fight, is to die an extremely dignified death.
But if enough people can contribute enough bits of dignity like that, wouldn't that mean we didn't die at all? Yes, but again, don't get your hopes up. Don't focus your emotions on a goal you're probably not going to obtain. Realistically, we find a handful of projects that contribute a few more bits of counterfactual dignity; get a bunch more not-specifically-expected bad news that makes the first-order object-level situation look even worse (where to second order, of course, the good Bayesians already knew that was how it would go); and then we all die.
Again, I am not, in this post, disputing Eliezer’s object-level model. I have supposed he is correct about our probable doom. Obviously, if we are in a probably-doomed world, I will keep that in mind. I do actually want to win, and finding winning plans requires entangling my brain with the details of each expected danger. Sharp danger awaits when you lose sight of the fact that reality is allowed to kill you.
However—if you work at all like I do, I think this is not how you should interface with yourself or your motivational system. It is like saying:
This vault door was professionally secured, and you are no professional burglar. Yes, your mother is starving inside, but you are unlikely to open the door before she dies. Therefore, you should search for dignified plans—plans which let you seize miracles in worlds where the door was installed incorrectly. But please don’t expect to actually open the door. Don’t get your hopes up.
A search for dignified plans is different from a search for plans which get my mother out of the damn vault. I can, in fact, conduct the latter search while still remembering how unlikely I am to actually open the vault, and the latter search has a better chance of actually finding dignified plans!
Want to try to make a million dollars? Buy a lottery ticket. Your odds of winning may not be very good, but you did try, and trying was what you wanted. In fact, you tried your best, since you only had one dollar left after buying lunch. Maximizing the odds of goal achievement using available resources: is this not intelligence?
It’s only when you want, above all else, to actually flip the switch—without quotation and without consolation prizes just for trying—that you will actually put in the effort to actually maximize the probability.
But if all you want is to “maximize the probability of success using available resources,” then that’s the easiest thing in the world to convince yourself you’ve done. The very first plan you hit upon will serve quite well as “maximizing”—if necessary, you can generate an inferior alternative to prove its optimality. And any tiny resource that you care to put in will be what is “available.” Remember to congratulate yourself on putting in 100% of it!
Don’t try your best. Win, or fail. There is no best.
I hope you do not let this “dignity” orientation cloud your intent to win. In fact, I think you should lean in the opposite direction, and sharpen your intent. Search only for plans which actually win, given your best understanding of the AI alignment problem. Be honest with yourself; do not flinch away from reality; do not take your eyes off the goal.
I’m not going to waste my time searching for dignified plans (which maximize humanity's probability of survival). Because I do have a mother in that vault, and a father, and a brother. In fact, there’s a whole damn planet in there. It’s my home, and it’s yours, too. And if we do stare down defeat together—let’s make that remaining time valiant and exciting and awesome.
We cannot fight at maximum all the time, and some times are more important than others. (Namely, when the logistic success curve seems relatively more sloped; those times are relatively more important.)
This is a good point. I am not advocating burnout. This is indeed a resource to conserve for situations closer to the 50% success rate, or for a targeted push at a particularly impactful moment. I am advocating growth and development in a way which is fun and awesome, which pushes limits without straining endurance. Perhaps this is not how most people work. But it’s how I can work.
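To make the logistic success curve from the quoted passages concrete, here is a rough sketch. I am assuming that “doubling our chances” means doubling the odds, which adds ln 2 to the log-odds, and the starting probabilities below are purely illustrative rather than anyone’s actual estimate.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic curve: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of the logistic curve: maps a probability to log-odds."""
    return math.log(p / (1.0 - p))


def double_odds(p: float) -> float:
    """Success probability after doubling the odds (a shift of ln 2 in log-odds)."""
    return sigmoid(logit(p) + math.log(2.0))


def slope(p: float) -> float:
    """Slope of the logistic curve at probability p; largest at p = 0.5."""
    return p * (1.0 - p)


# Illustrative starting points only. The same doubling of odds is invisible
# far down the curve and large near the middle.
for p in (1e-6, 0.01, 0.5):
    print(f"{p:.4%} -> {double_odds(p):.4%} (slope {slope(p):.6f})")
```

Rounded to whole percentages, doubling the odds from one in a million still reads as 0%, while the same doubling from 50% lands near 67%. That is the sense in which effort is worth more where the curve is steep.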
Furthermore, do not conflate feelings with beliefs. You do not have to believe the future is rosy in order to feel good and strong and healthy and to give the problem everything you’ve got (without pushing to burnout). Feelings are not beliefs! I think of certain feelings as (harder-to-control) actions.
The point is not that Eliezer said “do not ‘try your best’” and now he has pivoted away from win-oriented-thinking, aha, gotcha!—The point is that I think the original orientation is wise and healthy and strengthening, and the new orientation is not.