Success without dignity: a nearcasting story of avoiding catastrophe by luck

[-]Hastings3y243

A low quality prior on odds of lucky alignment: we can look at the human intelligence sharp left turn from different perspectives

Worst case scenario S risk: pigs, chickens, cows

X risk: Homo floresiensis, etc.

Disastrously unaligned but then the superintelligence inexplicably started to align itself instead of totally wiping us out: Whales, gorillas

Unaligned but that's randomly fine for us: raccoons, rats

Largely aligned: Housecats

[-]cousin_it3y2-2

X risk would be passenger pigeons, no?

Anyway your comment got me thinking. So far it seems the territory colonized by humans is a subset of the territory previously colonized by life, not stretching beyond it. And the territory covered by life is also not all of Earth, never mind the universe. So we can imagine AI occupying the most "cushy" subset of former human territory, with most humans removed from there, some subsisting as rats, some as housecats, some as wild animals periodically hit by incomprehensible dangers coming from the AI zone (similar to oil spills and habitat destruction), and some in S-risk type situations due to the AI remaining concerned with humans in some way.

Though this "concentric circles" model is maybe a bit too neat to imagine, and too similar to existing human myths about gods and so on. So let's not trust it too much.

[-]DanArmak3y42

So we can imagine AI occupying the most "cushy" subset of former human territory

We can definitely imagine it - this is a salience argument - but why is it at all likely? Also, this argument is subject to reference class tennis: humans have colonized much more and more diverse territory than other apes, or even all other primates.

Once AI can flourish without ongoing human support (building and running machines, generating electricity, reacting to novel environmental challenges), what would plausibly limit AI to human territory, let alone "cushy" human territory? Computers and robots can survive in any environment humans can, and in some where we at present can't.

Also: the main determinant of human territory is inter-human social dynamics. We are far from colonizing everywhere our technology allows, or (relatedly) breeding to the greatest number we can sustain. We don't know what the main determinant of AI expansion will be; we don't even know yet how many different and/or separate AI entities there are likely to be, and how they will cooperate, trade or conflict with each other.

[-]Taleuntum3y21

loved this!

[-]Lukas_Gloor3y73

I think “Luck could be enough” should be the strong default on priors,² so in some sense I don’t think I owe tons of argumentation here (I think the burden is on the other side).

I agree with this being the default and the burden being on the other side. At the same time, I don't think of it as a strong default.

Here's a frame that I have that already gets me to a more pessimistic (updated) prior:

It has almost never happened that people who developed and introduced a revolutionary new technology displayed a lot of foresight about its long-term consequences. For instance, there were comparatively few efforts at major social media companies to address ways in which social media might change society for the worse. The same goes for the food industry and the obesity epidemic or online dating and its effects on single parenthood rates. When people invent cool new technology, it makes the world better on some metrics but creates new problems on its own. The whole thing is accelerating and feels out of control.

It feels out of control because even if we get cool new things from tech progress, we don't seem to be getting any better at fixing the messiness that comes with it (misaligned incentives/Goodharting, other Molochian forces, world-destroying tech becoming ever more accessible). Your post says "a [] story of avoiding catastrophe by luck." This framing makes it sound like things would be fine by default if it weren't for some catastrophe happening. However, humans have never seemed particularly "in control" over technological progress. For things to go well, we need the opposite of a catastrophe – a radical change towards the upside. We have to solve massive coordination problems and hope for a technology that gives us god-like power, finally putting sane and compassionate forces in control over the future. It so happens that we can tell a coherent story about how AI might do this for us. But to say that it might go right just by luck – I don't know, that seems far-fetched!

All of that said, I don't think we can get very far arguing from priors. What carries by far the most weight are arguments about alignment difficulty, takeoff speeds, etc. And I think it's a reasonable view to say that it's very unlikely that any researchers currently know enough to make highly confident statements about these variables. (Edit: So, I'm not sure we disagree too much – I think I'm more pessimistic about the future than you are, but I'm probably not as pessimistic as the position you're arguing against in this post. I mostly wanted to make the point that I think the "right" priors support at least moderate pessimism, which is a perspective I find oddly rare among EAs.)

FWIW, it's not obvious to me that slow takeoff is best. Fast takeoff at least gives you god-like abilities early on, which are useful from a perspective of "we were never particularly in control over history; lots of underlying problems need fixing before we pass a point of no return." By contrast, with slow takeoff, coordination problems seem more difficult because (at least by default) there will be more actors using AIs in some ways or other and it's not obvious that the AIs in a slow-takeoff scenario will be all that helpful at facilitating coordination.

[-]RogerDearnaley2y*63

My view is that we've already made some significant progress on alignment, compared to say where we were O(15) years ago, and have also had some unexpectedly lucky breaks. Personally I'd list:

Value learning, as a potential solution to issues like corrigibility and the shut-down problem.
Once your value learner is a STEM-capable AGI, then doing or assisting with alignment research becomes a convergent instrumental strategy for it.
The closest thing we currently have to an AGI, LLMs, are fortunately not particularly agentic, they're more of a tool AI (until you wrap them in a script to run them in a loop with suitable prompts).
To be more specific, for the duration of generating a specific document (at least before RLHF), an LLM emulates the output of a human or humans generating text, so to the extent that they pick up/emulate agentic behavior from us, it's myopic past the end of document, and emulates some human(s) who have contributed text to their training set. Semi-randomly-chosen humans are a type of agent that humans are unusually good at understanding and predicting. The orthogonality thesis doesn't apply to them: they will have an emulation of some version of human values. Like actual random humans, they're not inherently fully aligned, but on average they're distinctly better than paperclip maximizers. (Also both RLHF and prompts can alter the random distribution.)
While human values are large and fragile, LLMs are capable of capturing fairly good representations of large fragile things, including human values. So things like constitutional RL work. That still leaves concerns about what happens when we apply optimization pressure or distribution shifts to these representations of human values, but it's at least a lot better than expecting us to hand-craft a utility function for the entirety of human values in symbolic form. If we could solve knowing when an LLM representation of human values was out-of-distribution and not reliable, then we might actually have a basis for an AGI-alignment solution that I wouldn't expect to immediately kill everyone. (For example, it might make an acceptable initial setting to preload into an AGI value learner that could then refine it and extend its region of validity.) Even better, knowing when an LLM isn't able to give a reliable answer is a capabilities problem, not just an alignment problem, since it's the same issue as getting an LLM to reply "I don't know" when asked a question to which it would otherwise have hallucinated a false answer. So all of the companies buying and selling access to LLMs are strongly motivated to solve this. (Indeed, leading LLM companies appear to have made significant progress on reducing hallucination rates in the last year.)

This is a personal list and I'm sure will be missing some items.

That we've made some progress and had some lucky breaks doesn't guarantee that this will continue, but it's unsurprising to me that

alignment research in the context of a specific technology that we can actually experiment with is easier than trying to do alignment research in abstract for arbitrary future systems, and that
with more people interested in alignment research we're making progress faster.

[-]baturinsky3y*52

One of the most dangerous things that even one misaligned AI could theoretically pull off is to successfully launch a misaligned Von Neumann probe, because it would then be extremely hard to track down in space and stop before it does its thing.

[-]MilesTS2y30

What about quickly launching a missile following its trajectory using the same technology? The probe eventually needs to slow down to survive impact and the missile doesn't so preventing Von Neumann probes seems fairly straightforward to me. My understanding is that tracking objects in space is very easy unless they've had time to cool to near absolute zero.

On the other hand, this requires that a misaligned AI be able to build such a probe and get it on a rocket it built or commandeered without being detected or stopped. That rules out safety via monitoring (and related approaches) and we would need to rely on it being essentially aligned anyway (such as via the "natural generalizations" Holden mentioned).

[-]Noosphere891y4-9

I'd say the biggest thing that happened to make this scenario plausible is that we learned a few very important things about alignment that make our lives easier:

It's looking like deep learning and AI in general generalize much better with respect to alignment than a lot of LWers thought several years ago.

A lot of the reason for that comes down to people underestimating how easy values data is to learn and underestimating how hard it is to learn a lot of useful capabilities, and more generally underestimating the influence of your data sources on your values.

More generally, I think LWers had a habit of overestimating the need for insights and underestimating the need for engineering work in alignment.

Indeed, a central faultline wrt the entire LessWrong idea is my general view that insights in your head are far less necessary than experimenting/engineering work to solve a lot of problems, including alignment problems.

The link for it is down below:

https://www.beren.io/2024-05-15-Alignment-Likely-Generalizes-Further-Than-Capabilities/

2. I believe this:

Some relatively cheap, easy, “scalable” solution to AI alignment (the sort of thing ARC is currently looking for) is developed and becomes widely used.

Is actually pretty likely, and I'd argue we even have the rough outlines of how to do it, which is essentially make large, curated synthetic datasets about human values, and make the AI internalize what a human values before it can be deceptive/seek power.

Contra @RogerDearnaley, I think the evidence is more like "human values are less fragile and less complex", because remember that the human values data is only a small part of an LLM's training data, and LLMs are genuinely beyond look-up tables, and are actually discovering new regularities (the evidence for it will be shown below), so the main takeaway is that the human value function is both simpler and less fragile than people thought 15-20 years ago:

https://www.lesswrong.com/posts/kixewxJfuZ23DQDfF/how-should-deepmind-s-chinchilla-revise-our-ai-forecasts#4__LLMs_are__intelligent__

One final takeaway is that a lot of AI progress, as well as alignment progress, is essentially the revenge of the blank slate view popularly held in the 20th century, and while the strict form is disproven, we got ourselves quite close to it, and since AI and human brains are quite similar (my thread is below), AI progress also has implications for neuroscience of the human brain, and I have my own tentative takes on what AI progress means for human brains.

https://x.com/SharmakeFarah14/status/1837528997556568523

[-]RogerDearnaley1y*42

I think human values have a very simple and theoretically predictable basis: they're derived from a grab-bag of evolved behavioral, cognitive and sensory heuristics which had a good cost/performance ratio for maximizing our evolutionary fitness (mostly on the savannah). So the basics of some of them are really easy to figure out: e.g. "Don't kill everyone!" can be trivially derived from Darwinian first principles (and would equally apply to any other sapient species). So I think modelling human values to low (but hopefully sufficient for avoiding X-risk) accuracy is pretty simple. E.g. if there were a guide for alien zookeepers (who were already familiar with Terran biochemistry) on how to keep humans, how long would that need to be for the humans to mostly survive in captivity? I'm guessing a single textbook could do a good job of this, maybe even just a long chapter in a textbook.

However, I think there is a lot more complexity in the finer/subtler details, much of which is biological in nature, starting with the specific grab-bag of heuristics that evolution happened to land on and their tuning, then with even more sociological/cultural/historical complexity layered on top. So where I think the complexity ramps up a lot is if you want to do a really good job of modelling human values accurately in all their detail, as we would clearly prefer our ASIs to do. If you look through the Dewey Decimal system, roughly half the content of any general-purpose library is devoted to sub-specialities of "how to make humans happy". However, LLMs are good at learning large amounts of complex, nuanced information. So an LLM knowing how to make humans happy in a lot of detail is not that surprising: in general, modern LLMs display detailed knowledge of this material.

The challenging part is ensuring that an LLM-powered agent cares about making humans happy, more than, say, a typical human autocrat does. Base model LLMs are "distilled" from many humans, so they absorb humans' capability for consideration for others, and also humans' less aligned traits like competitiveness and ambition. The question then is how to ensure which of these dominate, and how reliably, in agents powered by an instruct-trained LLM.

[-]Noosphere891y*20

I think the key crux is that this, in my view, is basically unnecessary:

However, I think there is a lot more complexity in the finer/subtler details, most of which is biological in nature, starting with the specific grab-bag of heuristics that evolution happened to land on and their tuning, with even more sociological/cultural/historical complexity layered on top. So where I think the complexity ramps up a lot is if you want to do a really good job of modelling human values accurately, as we would clearly prefer our ASIs to do.

@Steven Byrnes talks about how the mechanisms used in human brains might be horrifically complicated, but that the function is simple enough that you can code it quite well and robustly for AIs, and my difference from @Steven Byrnes is that I believe that this basically also works for the things that make humans have values, like the social learning parts of our brains.

Thus it's a bit of a conditional claim, in that either the mechanism used in human brains is also simple, or that we can simplify it radically to preserve the core function while discarding the unnecessary (in my view) complexity, and that's the takeaway I have from LLMs learning human values.

Link and quote below:

In other words, the brain's implementation of that thing can be super-complicated, but the input-output relation cannot be that complicated—at least, the useful part of the input-output relation cannot be that complicated.
The crustacean stomatogastric ganglion central pattern generators discussed above are a great example: their mechanisms are horrifically complicated, but their function is simple: they create a rhythmic oscillation. Hey, you need a rhythmic oscillation in your AGI? No problem! I can do that in one line of Python.

https://www.lesswrong.com/posts/PTkd8nazvH9HQpwP8/building-brain-inspired-agi-is-infinitely-easier-than#If_some_circuit_in_the_brain_is_doing_something_useful__then_it_s_humanly_feasible_to_understand_what_that_thing_is_and_why_it_s_useful__and_to_write_our_own_CPU_code_that_does_the_same_useful_thing_
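
As a rough illustration of the "one line of Python" remark in the quote (a sketch of my own, not code from the linked post), the line below samples a sine wave. The names `freq_hz`, `sample_rate_hz` and `n_samples`, and their values, are assumptions picked only for the example.

```python
import math

# Hypothetical parameters, chosen only for illustration.
freq_hz = 2.0          # oscillation frequency
sample_rate_hz = 100   # samples per second
n_samples = 200        # two seconds of signal

# The "one line": a rhythmic oscillation, sampled as a sine wave.
oscillation = [math.sin(2 * math.pi * freq_hz * t / sample_rate_hz) for t in range(n_samples)]
```

With these assumed values the signal repeats every 50 samples. The point is only that the input-output behavior is simple to write down, whatever the biological mechanism looks like.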

Also, a question for this quote is what's the assumed capability/compute level used in this thought experiment?

E.g. if the there was an guide for alien zookeepers (ones already familiar with Terran biochemistry) on how to keep humans, how long would it need to be for the humans to mostly survive?

[-]RogerDearnaley1y20

I basically agree, for three reasons:

The level of understanding of and caring about human values required to not kill everyone and be able to keep many humans alive, is actually pretty low (especially on the knowledge side).
That's also basically sufficient to motivate wanting to learn more about human values, and being able to, so the Value Learning process then kicks in: a competent and caring alien zookeeper would want to learn more about their charges' needs.
We have entire libraries half of whose content is devoted to "how to make humans happy", and we already fed most of them into our LLMs as training material. On a factual basis, knowing how to make humans happy in quite a lot of detail (and for a RAG agent, looking up details they don't already have memorized) is clearly well within their capabilities. The part that concerns me is the caring side, and that's not conceptually complicated: roughly speaking, the question is how to ensure an agent's selfless caring for humans is consistently a significantly stronger motivation than various bad habits like ambition, competitiveness, and powerseeking that it either picked up from us during the "distillation" of the base model, and/or learnt during RL training.

Also, a question for this quote is what's the assumed capability/compute level used in this thought experiment?
E.g. if the there was an guide for alien zookeepers (ones already familiar with Terran biochemistry) on how to keep humans, how long would it need to be for the humans to mostly survive?

ASI, or high AGI: capable enough that we've lost control and alignment is an existential risk.

[-]Noosphere891y40

ASI, or high AGI: capable enough that we've lost control and alignment is an existential risk.

Then the answer is probably kilobytes to megabytes, but at any rate the guide for alien zookeepers can be very short, and the rest can be learned from data.
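
For a sense of scale, here is a back-of-envelope sketch of how "a long chapter" and "a single textbook" map onto kilobytes and megabytes of plain text. Every figure in it (characters per page, page counts) is an assumption for illustration, not a measurement.

```python
# All figures below are assumptions for a back-of-envelope estimate.
CHARS_PER_PAGE = 3_000   # assumed plain-text characters per page, about 1 byte each
CHAPTER_PAGES = 30       # assumed length of a "long chapter"
TEXTBOOK_PAGES = 500     # assumed length of a "single textbook"


def plain_text_bytes(pages: int, chars_per_page: int = CHARS_PER_PAGE) -> int:
    """Approximate size of `pages` pages stored as uncompressed plain text."""
    return pages * chars_per_page


chapter_kb = plain_text_bytes(CHAPTER_PAGES) / 1_000
textbook_mb = plain_text_bytes(TEXTBOOK_PAGES) / 1_000_000
print(f"long chapter: ~{chapter_kb:.0f} kB, textbook: ~{textbook_mb:.1f} MB")
```

Under these assumptions a long chapter is on the order of 100 kB and a textbook on the order of a megabyte, which is at least consistent with the "kilobytes to megabytes" range.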

I like your point that humans aren't aligned, and while I'm more optimistic about human alignment than you are, I agree that the level of human alignment currently is not enough to make a superintelligence safe if it only had human levels of motivation/reliability.

Weirdly enough, I think getting aligned superintelligence is both harder and easier than you do, and I'm defining alignment like you, in which we could have a superintelligence deployed into the world that cares totally, at least for humans, and doesn't need restraints on its power like law enforcement or government of superintelligences.

The thing that makes alignment harder is I believe achieving FOOM for AIs, while unlikely, isn't obviously impossible, and I believe right around the cusp when AIs start to automate research without humans in the loop is when I suspect a whole lot of algorithmic progress will be done, and the only real bottlenecks are power and physical interfaces like robotics, and if these are easy/very easy to solve, I see fast FOOM as being very plausible.

The thing that makes alignment easier is that currently, alignment generalizes more than capabilities, which is good for us, and it's looking like influencing an AI's values through its data is far easier than making it have great capabilities like being an autonomous researcher for deep reasons, which means we could get by on smaller data quantities assuming very high sample efficiency:

> In general, it makes sense that, in some sense, specifying our values and a model to judge latent states is simpler than the ability to optimize the world. Values are relatively computationally simple and are learnt as part of a general unsupervised world model where there is ample data to learn them from (humans love to discuss values!). Values thus fall out mostly’for free’ from general unsupervised learning. As evidenced by the general struggles of AI agents, ability to actually optimize coherently in complex stochastic ‘real-world’ environments over long time horizons is fundamentally more difficult than simply building a detailed linguistic understanding of the world.

Link below:

https://www.beren.io/2024-05-15-Alignment-Likely-Generalizes-Further-Than-Capabilities/

I think that we agree on a lot, and only really disagree on how much data is necessary for a good outcome, if we disagree at all.

[-]RogerDearnaley1y42

I like your point that humans aren't aligned, and while I'm more optimistic about human alignment than you are, I agree that the level of human alignment currently is not enough to make a superintelligence safe if it only had human levels of motivation/reliability.

The most obvious natural experiments about what humans do when they have a lot of power with no checks-and-balances are autocracies. While there are occasional examples (such as Singapore) of autocracies that didn't work out too badly for the governed, they're sadly few and far between. The obvious question then is whether "humans who become autocrats" are a representative random sample of all humans, or if there's a strong selection bias here. It seems entirely plausible that there are at least some selection effects in the process of becoming an autocrat. A couple of percent of all humans are sociopaths, so if there were a sufficiently strong (two orders of magnitude or more) selection bias, then this might, for example, be a natural experiment about the alignment properties of a set of humans consisting mostly of sociopaths, in which case it usually going badly would be unsurprising.
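
To make the selection-bias arithmetic concrete, here is a minimal sketch. It assumes "a couple of percent" means a 2% base rate and treats the selection bias as a simple ratio: how many times likelier a sociopath is to end up an autocrat than anyone else. The function name and both inputs are illustrative assumptions.

```python
def fraction_after_selection(base_rate: float, selection_ratio: float) -> float:
    """Share of the selected group that has the trait, given the trait's base
    rate in the population and how many times likelier trait-holders are to
    be selected than everyone else."""
    selected_with_trait = base_rate * selection_ratio
    selected_without_trait = 1.0 - base_rate  # baseline selection weight of 1
    return selected_with_trait / (selected_with_trait + selected_without_trait)


# Assumed inputs: a 2% base rate, and selection ratios of 10x, 100x and 1000x.
for ratio in (10, 100, 1000):
    print(ratio, round(fraction_after_selection(0.02, ratio), 3))
```

Under these assumptions a 10x bias gives about 17% sociopaths among autocrats, 100x gives about two thirds, and 1000x gives about 95%, so "two orders of magnitude or more" is roughly where the group becomes mostly sociopaths.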

The thing that concerns me is the aphorism "Power corrupts, and absolute power corrupts absolutely". There does seem to be a strong correlation between how long someone has had a lot of power and an increasing likelihood of them using it badly. That's one of the reasons for term limits in positions like president: humans seem to pretty instinctively not trust a leader after they've been in a position of a lot of power with few checks-and-balances for roughly a decade. The histories of autocracies tend to reflect them getting worse over time, on decade time-scales. So I don't think the problem here is just from sociopaths. I think the proportion of humans who wouldn't eventually be corrupted by a lot of power with no checks-and-balances may be fairly low, comparable to the proportion of honest senior politicians, say.

How much of this argument applies to ASI agents powered by LLMs "distilled" from humans is unclear — it's much more obviously applicable to uploads of humans that then get upgraded to super-human capabilities.

[-]Noosphere891y42

IMO, there are fairly strong arguments that there is a pretty bad selection effect for people who aim to get into power generally being more Machiavellian/Sociopathic than other people, and at least part of the problem is that the parts of your brain that care about other people get damaged when you gain power, which is obviously not good.

But still, I agree with you that an ASI that can entirely run society while only being as aligned as humans are to very distant humans likely ends up in a very bad state for us, possibly enough to be an S-risk or X-risk (I currently see S-risk being more probable than X-risk for ASI if we only had human-level alignment to others.)

[-]Bogdan Ionut Cirstea1y30

I think interpretability looks like a particularly promising area for “automated research” - AIs might grind through large numbers of analyses relatively quickly and reach a conclusion about the thought process of some larger, more sophisticated system.

Arguably, this is already starting to happen (very early, with obviously-non-x-risky systems) with interpretability LM agents like in FIND and MAIA.

[-]Bogdan Ionut Cirstea1y30

Short-horizon tasks (e.g., fixing a problem on a Linux machine or making a web server) were those that would take less than 1 hour, whereas long-horizon tasks (e.g., building a web app or improving an agent framework) could take over four (up to 20) hours for a human to complete.
[...]
The Purple and Blue models completed 20-40% of short-horizon tasks but no long-horizon tasks. The Green model completed less than 10% of short-horizon tasks and was not assessed on long-horizon tasks³. We analysed failed attempts to understand the major impediments to success. On short-horizon tasks, models often made small errors (like syntax errors in code). On longer horizon tasks, models devised good initial plans but did not sufficiently test their solutions or failed to correct initial mistakes. Models also sometimes hallucinated constraints or the successful completion of subtasks.
Summary: We found that leading models could solve some short-horizon tasks, such as software engineering problems. However, no current models were able to tackle long-horizon tasks.

Like >10% ↩
Since another way of putting it is: “AI takeover (a pretty specific event) is not certain (conditioned on the ‘minimal-dignity’ conditions above, which don’t seem to constrain the future a ton).” ↩
Phase 1 in this analysis ↩
Phase 2 in this analysis ↩
I think there are alternative ways things could go well, which I’ll cover in the relevant section, so I don’t want to be stuck with a “pivotal acts” frame. ↩
Salient possible additions to today’s models:
- Greater scale (more parameters, more pretraining)
- Multimodality (training the same model on language + images or perhaps video)
- Memory/long contexts: it seems plausible that some relatively minor architectural modification could make today’s language models much better at handling very long contexts than today’s cutting-edge systems, e.g. they could efficiently identify which parts of an even very long context ought to be paid special attention at any given point. This could imaginably be sufficient for them to be “taught” to do tasks, in roughly the way humans are (e.g., I might give an AI a few examples of a successfully done task, ask it to try, critique it, and repeat this loop over the course of hundreds of pages of “teaching” - note that the “teaching” is simply building up a context it can consult for its next step, it is not using gradient descent).
- Scaffolding: a model somewhat like today’s cutting-edge models could be put in a setting where it’s able to delegate tasks to copies of itself. Such tasks might include things like “Think about how to accomplish X, and send me some thoughts” and “That wasn’t good enough, think more please.” In this way, it could be able to vary the amount of “thought” and effort it puts into different aspects of its task. It could also be given access to some basic actuators (shell access might be sufficient). None of this need involve further training, and it could imaginably give an AI enough of the functionality of things like “memory” to be quite capable.
It’s not out of the question to me that we could get to transformative AI with additions like this, and with the vast bulk of the training still just being generative pretraining. ↩
E.g., I think interpretability could be very useful for demonstrating danger, which could lead to a standards-and-monitoring regime, but such a regime would be a lot more “dignified” than the worlds I’m picturing in this post. ↩
I think interpretability is very appealing as something that large numbers of relatively narrow “automated alignment researchers” could work on. ↩
Debate-type setups seem like they would get harder for humans to adjudicate as AI systems advance; more advanced AI seems harder to red-team effectively without its noticing “tells” re: whether it’s in training; internal-state-based training seems more likely to result in “manipulating one’s own internal states” for more advanced AI; ↩
Byrnes’s post seems to assume there are relatively straightforward destruction measures that require draconian, scary “plans” to stop. (Contrast with my discussion here, in which AIs can be integrated throughout the economy in ways that make it harder for misaligned AIs to “get off the ground” with respect to being developed, escaping containment and acquiring resources.)
- I don’t think this is the right default/prior expectation, given that we see little evidence of this sort of dynamic in history to date. (Relatively capable people who want to cause widespread destruction even at cost to themselves are rare, but do periodically crop up and don’t seem to have been able to effect these sorts of dynamics to date. Individuals have done a lot of damage by building followings and particularly via government power, but this seems very different from the type of dynamic discussed in Byrnes’s post.)
- One could respond by pointing to particular vulnerabilities and destruction plans that seem hard to stop, but I haven’t been sold on anything along these lines, especially when considering that a relatively small number of biological humans’ surviving could still be enough to stop misaligned AIs (if we posit that aligned AIs greatly outnumber misaligned AIs). And I think misaligned AIs are less likely to cause any damage if the odds are against ultimately achieving their aims.
- I note that Byrnes’s post also seems to assume that it’s greatly expensive and difficult to align an AI (I conjecture that it may not be, above).
↩
The latter, more dangerous possibility seems more likely to me, but it seems quite hard to say. (There could also of course be a hybrid situation, as the number and capabilities of AI grow.) ↩
I think optimizing for community epistemics has real downsides, both via infohazards/empowering bad actors and via reputational risks/turning off people who could be helpful. I wish this weren’t the case, and in general I heuristically tend to want to value epistemic virtue very highly, but it seems like it’s a live issue - I (reluctantly) don’t think it’s reasonable to treat “X is bad for community epistemics” as an automatic argument-ender about whether X is bad (though I do think it tends to be a very strong argument). ↩
E.g., working for an AI lab and speeding up AI (I plan to write more about this).
More broadly, it seems to me like essentially all attempts to make the most important century go better also risk making it go a lot worse, and for anyone out there who might’ve done a lot of good to date, there are also arguments that they’ve done a lot of harm (e.g., by raising the salience of the issue overall).
Even “Aligned AI would be better than misaligned AI” seems merely like a strong bet to me, not like a >95% certainty, given what I see as the appropriate level of uncertainty about topics like “What would a misaligned AI actually do, incorporating acausal trade considerations and suchlike?”; “What would humans actually do with intent-aligned AI, and what kind of universe would that lead to?”; and “How should I value various outcomes against each other, and in particular how should I think about hopes of very good outcomes vs. risks of very bad ones?”
To reiterate, on balance I come down in favor of aligned AI, but I think the uncertainties here are massive - multiple key questions seem broadly “above our pay grade” as people trying to reason about a very uncertain future. ↩
I’m a person who just doesn’t pretend to be emotionally scope-sensitive or to viscerally feel the possibility of impending doom. I think it would be hard to do these things if I tried, and I don’t try because I don’t think that would be good for anyone.
I like doing worthy-feeling work (I would be at least as happy with work premised on a “doomer” worldview as on my current one) and hanging out with my family. My estimated odds that I get to live a few more years vs. ~50 more years vs. a zillion more years are quite volatile and don’t seem to impact my daily quality of life much. ↩
