Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a special post for short-form writing by steven0461. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.

This is where I'll put content that's too short for a whole post.

40 comments

I'd like to register skepticism of the idea of a "long reflection". I'd guess any intelligence that knew how to stabilize the world with respect to processes that affect humanity's reflection about its values in undesirable ways (e.g. existential disasters), without also stabilizing it with respect to processes that affect it in desirable ways, would already understand the value extrapolation problem well enough to take a lot of shortcuts in calculating the final answer compared to doing the experiment in real life. (You might call such a calculation a "Hard Reflection".)

Suppose you have an AI powered world stabilization regime. Suppose somebody makes a reasonable moral argument about how humanity's reflection should proceed, like "it's unfair for me to have less influence just because I hate posting on Facebook". Does the world stabilization regime now add a Facebook compensation factor to the set of restrictions it enforces? If it does things like this all the time, doesn't the long reflection just amount to a stage performance of CEV with human actors? If it doesn't do things like this all the time, doesn't that create a serious risk of the long term future being stolen by some undesirable dynamic?

Inexorability of AI-enacted events doesn't intrude on decisions and discoveries of people written in those events. These decisions from the distant future may determine how the world preparing to reflect on them runs from the start.

Sorry, I don't think I understand what you mean. There can still be a process that gets the same answer as the long reflection, but with e.g. less suffering or waste of resources, right?

There's now further clarification in this thread.

I'm steelmanning long reflection, as both the source of goals for an AGI, and something that happens to our actual civilization, while resolving the issues that jumped out at you. Sorry if it wasn't clear from the cryptic summary.

If it's possible to make an AGI that coexists with our civilization (probably something that's not fully agentic), it should also be possible to make one that runs our civilization in a simulation while affecting what's going on in the simulation to a similar extent. If the nature of this simulation is more like that of a story (essay?), written without a plan in mind, but by following where the people written in it lead it, it can be dramatically more computationally efficient to run and to make preliminary predictions about.

Just as determinism enables free will, so can sufficiently lawful storytelling, provided it's potentially detailed enough to generate the thoughts of the people in the simulation. So the decisions of a civilization simulated in a story are still determined by the thoughts and actions of the people living there, yet it's easy to make reasonable predictions about them in advance, and running the whole thing (probably an ensemble of stories, not a single story) is not that expensive, even if it takes much longer than getting excellent predictions of where it leads.

As a result, we quickly get a good approximation of what people will eventually decide, and that can be used to influence the story for the better from the start, without intruding on continuity, or to decide which parts to keep summarized, not letting them become real. So this version of long reflection is basically CEV, but with people inside being real (my guess is that having influence over the outer AGI is a significant component of being real), continuing the course of our own civilization. The outer AGI does whatever based on the eventual decisions of the people within the story, made during the long reflection, assisted within the story according to their own decisions from the future.

Edit: More details in this thread, in particular this comment.

There's a meme in EA that climate change is particularly bad because of a nontrivial probability that sensitivity to doubled CO2 is in the extreme upper tail. As far as I can tell, that's mostly not real. This paper seems like a very thorough Bayesian assessment that gives 4.7 K as a 95% upper bound, with values for temperature rise by 2089 quite tightly constrained (Fig 23). I'd guess this is an overestimate based on conservative choices represented by Figs 11, 14, and 18. The 5.7 K 95% upper bound after robustness tests comes from changing the joint prior over feedbacks to create a uniform prior on sensitivity, which as far as I can tell is unjustified. Maybe someone who's better at rhetoric than me should figure out how to frame all this in a way that predictably doesn't make people flip out. I thought I should post it, though.
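The mechanism behind that last point can be sketched with a toy Monte Carlo. In the standard feedback framing (Roe & Baker style), sensitivity relates to the net feedback fraction f roughly as S = S0/(1−f), so a symmetric, well-behaved prior on feedbacks induces a right-skewed prior on S; conversely, demanding a uniform prior on S forces extra prior mass onto feedback values near 1. All the numbers below are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: sensitivity S = S0 / (1 - f), where S0 is the no-feedback
# (Planck) response and f the total feedback fraction.
# Parameter values here are illustrative, not taken from the paper.
S0 = 1.2                                  # K per CO2 doubling, no-feedback response
f = rng.normal(0.6, 0.1, size=1_000_000)  # symmetric prior on feedbacks
f = f[(f > -1.0) & (f < 0.95)]            # discard unphysical / runaway values

S = S0 / (1.0 - f)

# A symmetric prior on feedbacks yields a right-skewed prior on sensitivity:
# the mean exceeds the median and the upper tail is fat.
q50, q95 = np.quantile(S, [0.5, 0.95])
print(f"median {q50:.1f} K, 95th pct {q95:.1f} K, mean {S.mean():.1f} K")
```

The point is the shape, not the specific quantiles: making the prior uniform in S space, as in the robustness test, is equivalent to a rather strange prior over feedbacks.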

For forecasting purposes, I'd recommend this and this as well, relevant to the amount of emissions to expect from nature and humans respectively.

How do they deal with model uncertainty (unknown unknowns)?

It's complicated. Searching the article for "structural uncertainty" gives 10 results about ways they've tried to deal with it. I'm not super confident that they've dealt with it adequately.

Additionally, I think it's not real because if there were such warming through feedback effects, there would be enough time to do heavy geoengineering. Geoengineering has its own risks, but it's doable in "runaway warming" scenarios.

Considering how much people talk about superforecasters, how come there aren't more public sources of superforecasts? There's prediction markets and sites like ElectionBettingOdds that make it easier to read their odds as probabilities, but only for limited questions. There's Metaculus, but it only shows a crowd median (with a histogram of predictions) and in some cases the result of an aggregation algorithm that I don't trust very much. There's PredictionBook, but it's not obvious how to extract a good single probability estimate from it. Both prediction markets and Metaculus are competitive and disincentivize public cooperation. What else is there if I want to know something like what the probability of war with Iran is?

I think the Metaculus crowd median is among the highest-quality predictions out there. Especially when someone goes through all the questions where they're confident the median is off, and makes comments pointing this out. I used to do this, some months back when there were more short term questions on Metaculus and more questions where I differed from the community. When you made a bunch of comments of this type a month back on Metaculus, that covered most of the 'holes', in my opinion, and now there are only a few questions where I differ from the median prediction.

Another source of predictions is the IARPA Geoforecasting Challenge, where if you're competing you have access to hundreds of MTurk human predictions through an API. The quality of the predictions is not as great, and there are some questions where the MTurk crowd is way off. But they do have a question on whether Iran will execute or be targeted in a national military attack.

I agree that it's quite possible to beat the best publicly available forecasts. I've been wanting to work together on a small team to do this (where I imagine the same set of people would debate and make the predictions). If anyone's interested in this, I'm datscilly on Metaculus and can be reached at [my name] at gmail.

Maybe Good Judgement Open? I don't know how they actually get their probabilities though.

I think one could greatly outperform the best publicly available forecasts through collaboration between 1) some people good at arguing and looking for info and 2) someone good at evaluating arguments and aggregating evidence. Maybe just a forum thread where a moderator keeps a percentage estimate updated in the top post.

I would trust the aggregation algorithm on Metaculus more than an average (mostly because its performance is evaluated against an average). So I think that's usually pretty decent.
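For intuition about why an aggregation algorithm can differ from a plain average, here is a generic sketch (not Metaculus's actual algorithm, which isn't public in detail): pooling forecasts in log-odds space, optionally with an extremizing factor, gives a more extreme answer than the arithmetic mean whenever forecasters agree on the direction:

```python
import math

def mean_prob(ps):
    # naive aggregation: arithmetic mean of probabilities
    return sum(ps) / len(ps)

def logodds_pool(ps, extremize=1.0):
    # average in log-odds space; extremize > 1 pushes the pooled
    # forecast further from 0.5 (a common post-hoc correction)
    logits = [math.log(p / (1 - p)) for p in ps]
    pooled = extremize * sum(logits) / len(logits)
    return 1 / (1 + math.exp(-pooled))

forecasts = [0.7, 0.8, 0.9]
print(mean_prob(forecasts))          # 0.8
print(logodds_pool(forecasts))       # ~0.81, slightly more extreme
print(logodds_pool(forecasts, 1.5))  # ~0.90, more extreme still
```

Extremization is usually justified by the observation that individual forecasters each hold only part of the available evidence, so the crowd as a whole should be more confident than its average member.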

I would normally trust it more, but it's recently been doing way worse than the Metaculus crowd median (average log score 0.157 vs 0.117 over the sample of 20 yes/no questions that have resolved for me), and based on the details of the estimates that doesn't look to me like it's just bad luck. It does better on the whole set of questions, but I think still not much better than the median; I can't find the analysis page at the moment.

based on the details of the estimates that doesn't look to me like it's just bad luck

For example:

  • There's a question about whether the S&P 500 will end the year higher than it began. When the question closed, the index had increased from 2500 to 2750. The index has increased most years historically. But the Metaculus estimate was about 50%.
  • On this question, at the time of closing, 538's estimate was 99+% and the Metaculus estimate was 66%. I don't think Metaculus had significantly different information than 538.
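The average log score quoted above can be computed as a mean negative log-likelihood over binary questions, where lower is better (Metaculus's exact scoring rule may differ in normalization, but the scale matches):

```python
import math

def avg_log_score(probs, outcomes):
    """Mean negative log-likelihood over binary questions; lower is better.

    probs: probability assigned to 'yes'; outcomes: 1 if 'yes' resolved.
    """
    scores = [-math.log(p if o else 1 - p) for p, o in zip(probs, outcomes)]
    return sum(scores) / len(scores)

# A forecaster at 85% on questions that all resolve 'yes' scores about 0.16,
# and at 89% about 0.12 -- the scale of the numbers quoted above.
print(avg_log_score([0.85, 0.85], [1, 1]))  # ~0.163
print(avg_log_score([0.5], [1]))            # ~0.693, the score for pure ignorance
```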

Thinking out loud about some arguments about AI takeoff continuity:

If a discontinuous takeoff is more likely to be local to a particular agent or closely related set of agents with particular goals, and a continuous takeoff is more likely to be global, that seems like it incentivizes the first agent capable of creating a takeoff to make sure that that takeoff is discontinuous, so that it can reap the benefits of the takeoff being local to that agent. This seems like an argument for expecting a discontinuous takeoff and an important difference with other allegedly analogous technologies.

I have some trouble understanding the "before there are strongly self-improving AIs there will be moderately self-improving AIs" argument for continuity. Is there any reason to think the moderate self-improvement ability won't be exactly what leads to the strong self-improvement ability? Before there's an avalanche, there's probably a smaller avalanche, but maybe the small avalanche is simply identical to the early part of the large avalanche.

Where have these points been discussed in depth?

Online posts function as hard-to-fake signals of readiness to invest verbal energy into arguing for one side of an issue. This gives readers the feeling they won't lose face if they adopt the post's opinion, which overlaps with the feeling that the post's opinion is true. This function sometimes makes posts longer than would be socially optimal.

Newcomb's Problem sometimes assumes Omega is right 99% of the time. What is that conditional on? If it's just a base rate (Omega is right about 99% of people), what happens when you condition on having particular thoughts and modeling the problem on a particular level? (Maybe there exists a two-boxing lesion and you can become confident you don't have it.) If it's 99% conditional on anything you might think, e.g. because Omega has a full model of you but gets hit by a cosmic ray 1% of the time, isn't it clearer to just assume Omega gets it 100% right? Is this explained somewhere?

Are there online spaces that talk about the same stuff LW talks about (AI futurism, technical rationality, and so on), with reasonably high quality standards, but more conversational-oriented and less soapbox-oriented, and maybe with less respectability signaling? I often find myself wanting to talk about things discussed here but feeling overconstrained by things like knowing that comments are permanent and having to anticipate objections instead of taking them as they come.

What's the name of the proto-fallacy that goes like "you should exchange your oranges for pears because then you'll have more pears", suggesting that the question can be resolved, or has already been resolved, without ever considering the relative value of oranges and pears? I feel like I see it everywhere a lot, including on LW.

Sounds like a failure of charity: not trying to figure out what thinking produced a claim/question/behavior, and misinterpreting it as a result. In your example, there is an implication of difficulty with noticing the obvious, when the correct explanation is most likely having a different objective, which should be clear if the question is given half a thought. In some cases, running with the literal meaning of a claim as stated is actually a misinterpretation, since it differs from the intended meaning.

Another thing I feel like I see a lot on LW is disagreements where a heavy thumb of popularity or reputational costs rests on one side of the scale, but nobody talks about the thumb. That makes it hard to tell whether people are internally trying to correct for the thumb or just substituting it for whatever parts of their reasoning or intuition they're not explicitly talking about, and a lot of what looks like disagreement about the object-level arguments being presented may actually be disagreement about the thumb. For example, in the case of the parent comment, maybe such a thumb is driving judgments of the relative values of oranges and pears.

Together with my interpretation of the preceding example, this suggests an analogy between individual/reference-class charity and filtered evidence. The analogy is interesting as a means of transferring an understanding of errors in ordinary charity to the general setting, where the salient structure in the sources of evidence could have any nature.

So what usually goes wrong with charity is that the hypotheses about possible kinds of thinking behind an action/claim are not deliberatively considered (or consciously noticed), so the implicit assumption is intuitive, and can occasionally be comically wrong (or at least overconfident) in a way that would be immediately recognized if considered deliberatively. This becomes much worse if failure of charity is a habit, because then the training data for intuition can become systematically bad, dragging down the intuition itself to a point where it starts actively preventing deliberative consideration from being able to work correctly, so the error persists even in the face of being pointed out. If this branches out into the anti-epistemology territory, particularly via memes circulating in a group that justify the wrong intuitions about thinking of members of another group, we get a popular error with a reliably trained cognitive infrastructure for resisting correction.

But indeed this could happen for any kind of working with evidence that needs some Bayes and reasonable hypotheses to stay sane! So a habit of not considering obvious possibilities about origin of evidence risks training systematically wrong intuitions that make noticing their wrongness more difficult. In a group setting, this gets amplified with echo chamber/epistemic bubble effects, which draw their power from the very same error of not getting deliberatively considered as significant forces that shape available evidence.

There's been some discussion of tradeoffs between a group's ability to think together and its safety from reputational attacks. Both of these seem pretty essential to me, so I wish we'd move in the direction of a third option: recognizing public discourse on fraught topics as unavoidably farcical as well as often useless, moving away from the social norm of acting as if a consideration exists if and only if there's a legible Post about it, building common knowledge of rationality and strategic caution among small groups, and in general becoming skilled at being esoteric without being dishonest or going crazy in ways that would have been kept in check by larger audiences. I think people underrate this approach because they understandably want to be thought gladiators flying truth as a flag. I'm more confident of the claim that we should frequently acknowledge the limits of public discourse than the other claims here.

I don't think there's a general solution. Eliezer's old quote "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." applies to social movements and discussion groups just as well. It doesn't matter if you're on the right or the wrong side - you have attention and resources that the war can use for something else.

There _may_ be an option to be on neither side, and just stay out. Most often, that's only available to those with no useful resources, or that can plausibly threaten both sides.

How much should I worry about the unilateralist's curse when making arguments that it seems like some people should have already thought of and that they might have avoided making because they anticipated side effects that I don't understand?

In most domains, people don't make arguments either because they think the arguments aren't strong or because making them would lose them social status.

The cases where an argument carries real danger are relatively rare, and in most of them it should be possible to know that you are in a problematic area. In those cases, you should first make the arguments non-publicly, with people you consider good judges of whether they should be made publicly.

Adding to your first point: Or they don't make arguments simply because, even if the arguments are strong and there are no social costs, it doesn't pay.

(I'm thinking of some policy debates where I know tons of academics who could easily provide very strong, rather obvious arguments that essentially go unmade because no one cares to get involved.)

Let me paint a stylized case of the type of situation where the question arises, and where my gut feeling says it may often be better to release the argument than to hide it, for the sake of long-term social cohesion and advancement:

You're part of an intellectual elite with its own values/biases, and you consider hiding a sensible argument (say, on a political topic) because commoners, given their different values/biases, might act on it in a way that runs counter to your agenda. So you might well not release the argument.

In the long run, this can backfire. The only way for society to advance is by reducing the gap between the elite and commoners. Commoners notice when the elite regularly feeds them biased information; and the less seriously the elite engages, the less commoners will trust it and the less they can be drawn into more nuanced ways of thinking.

In short, in this stylized case: intellectual honesty, even at the risk of immediate harm to your values, may well pay in the long term. Raising the level of the discussion by bringing up rational arguments for both 'sides' is important, especially in democratic systems, where you eventually rely on a common understanding of the world anyway.

Maybe this doesn't generalize well and my hunch is wrong: in the long run we're all dead, and the level of public discussion is often so low that adding an argument in one direction is often just ammunition to be exploited, without much impact along other dimensions.

This morning, Trump president 2024 contracts went up from about 0.18 to 0.31 on FTX but not elsewhere. Not sure what's going on there or if people can make money on it.

There's still a big gap between Betfair/Smarkets (22% chance Trump president) and Predictit/FTX (29-30%). I assume it's not the kind of thing that can be just arbitraged away.

Huh. I am curious to hear explanations if anyone has one.

Are We Approaching an Economic Singularity? Information Technology and the Future of Economic Growth (William D. Nordhaus)

Has anyone looked at this? Nordhaus claims current trends suggest the singularity is not near, though I wouldn't expect current trends outside AI to be very informative. He does seem to acknowledge x-risk in section Xf, which I don't think I've seen from other top economists.

Nordhaus does seem to miss the point here. He does statistics purely on historical macroeconomic data, in which there could not be even a hint of the singularity we'd talk about here, and which he also seems to refer to in his abstract (imho). The core singularity effect of self-accelerating, nearly infinitely fast intelligence improvement once a threshold is crossed is almost by definition invisible in present data: only after this singularity do we expect things to get weird and show up in economic data.

It's a bit sad to see the paper as it is. Nordhaus has written seminal contributions in integrated environmental-economic modelling in the resources/climate domain. And for singularity questions too, good economic analysis that explicitly models substitutabilities between different types of productive capital, resources, labor, and information processing could be insightful, I believe; at least I have not yet stumbled upon much in that regard. There is a difficulty in imagining a post-singularity world at all, but interesting scenarios could probably be created by trying to formalize more casual discussions.

A naive argument says the influence of our actions on the far future is ~infinity times as intrinsically important as the influence of our actions on the 21st century because the far future contains ~infinity times as much stuff. One limit to this argument is that if 1/1,000,000 of the far future stuff is isomorphic to the 21st century (e.g. simulations), then having an influence on the far future is "only" a million times as important as having the exact same influence on the 21st century. (Of course, the far future is a very different place so our influence will actually be of a very different nature.) Has anyone tried to get a better abstract understanding of this point or tried to quantify how much it matters in practice?

I haven't thought about it much, but it seems like the fraction of far future stuff isomorphic to the 21st century is probably fairly negligible from a purely utilitarian viewpoint, because the universe is so big that even using 1/1,000,000 of it for simulations would be a lot of simulations, and why would the far future want that many simulations of the 21st century? It doesn't seem like a good use of resources to do that many duplicate historical simulations in terms of either instrumental value or terminal value.

I guess I wasn't necessarily thinking of them as exact duplicates. If there are 10^100 ways the 21st century can go, and for some reason each of the resulting civilizations wants to know how all the other civilizations came out when the dust settled, each civilization ends up having a lot of other civilizations to think about. In this scenario, an effect on the far future still seems to me to be "only" a million times as big as the same effect on the 21st century, only now the stuff isomorphic to the 21st century is spread out across many different far future civilizations instead of one.

Maybe 1/1,000,000 is still a lot, but I'm not sure how to deal with uncertainty here. If I just take the expectation of the fraction of the universe isomorphic to the 21st century, I might end up with some number like 1/10,000,000 (because I'm 10% sure of the 1/1,000,000 claim) and still conclude the relative importance of the far future is huge but hugely below infinity.

Has anyone tried to make complex arguments in hypertext form using a tool like Twine? It seems like a way to avoid the usual mess of footnotes and disclaimers.