From a paper by Milan M. Ćirković, Anders Sandberg, and Nick Bostrom:

We describe a significant practical consequence of taking anthropic biases into account in deriving predictions for rare stochastic catastrophic events. The risks associated with catastrophes such as asteroidal/cometary impacts, supervolcanic episodes, and explosions of supernovae/gamma-ray bursts are based on their observed frequencies. As a result, the frequencies of catastrophes that destroy or are otherwise incompatible with the existence of observers are systematically underestimated. We describe the consequences of this anthropic bias for estimation of catastrophic risks, and suggest some directions for future work.

There cannot have been a large disaster on Earth in the last millennium, or we wouldn't be around to see it. There can't have been a very large disaster on Earth in the last ten thousand years, or we wouldn't be around to see it. There can't have been a huge disaster on Earth in the last million years, or we wouldn't be around to see it. There can't have been a planet-destroying disaster on Earth... ever.

Thus the fact that we exist precludes us seeing certain types of disasters in the historical record; as we get closer and closer to the present day, the magnitude of the disasters we can see goes down. These missing disasters form the "anthropic shadow", somewhat visible in the top right of this diagram:

Hence even though it looks like the risk is going down (the magnitude is diminishing as we approach the present), we can't rely on this being true: it could be a purely anthropic effect.

I'm not an astronomer, but my understanding is that when it comes to impact events, the probability of a planet-killer probably is going down over time: after an impact event, the impactor isn't around to pose a threat anymore, and there are only so many large objects with Earth-crossing orbits. At this point we're far enough out on a long enough tail that the probability density isn't changing much over time, but that's not true over the timescales being graphed; if we're dating lunar rocks accurately, extinction-scale impact events last peaked around 3.9 Gya, and were rare after 3 Gya. I'd be surprised if that trend hasn't to some extent continued.

More generally, if you want a good estimate of the near-term probability of impact events, you probably want to survey one or several of the other bodies in the solar system. They have an impact record relatively untainted by anthropic bias, and also have the advantage of being a lot easier to read, as most of them lack the plate tectonics that wipe out a lot of older geology on Earth. That said, though, there are extinction events that wouldn't be affected by the later evolution of the solar system: nearby supernovae, for example, or gamma-ray bursts.

And this has been done. We have good records of impact levels on the Moon, Mars and Jupiter (although Jupiter is a little weird). It doesn't look like there's heavy anthropic bias there.

Ćirković, Sandberg & Bostrom refer to these briefly in the paper, but seem to think they're not adequate or as relevant:

Citation?

I don't have a citation for this, more a general familiarity with the literature on the subject, and that no one has ever said "hey it looks like we should have seen a lot more impacts on Earth than we've apparently gotten" or anything similar.

Wouldn't this be a (weak, since humans have lots of reasons) piece of evidence that people see the same pattern of collision sizes on earth as on e.g. the moon?

Yes, and that's the point: that suggests that there's little anthropic bias at work here. A heavy anthropic bias would be if we didn't see the same collision patterns.

This paper seems to have some useful data. I'd be happier with a table of crater sizes and ages that I could plug into Octave and fit a regression to, but so far I haven't been able to come up with any decent-sized datasets.

ETA: The Lunar Impact Crater Database could probably do it, if you feel like doing some messy conversion.

Am I the only one who does not see an anthropic shadow there? All I see is a few widely scattered huge events and finer resolution on smaller more recent events...

I assume they're referring to the top right quadrant of the graph being totally empty while the top left quadrant has two events. But those two events are a pretty slender reed to rest their analysis on.

What looks more interesting to me is an apparent downward trend in the biggest crater size from Chicxulub onwards. I tried downloading the Earth Impact Database data referenced in the paper so I could zoom in on these more recent impacts, but the dataset's only available as machine-unreadable HTML tables with various ad hoc notations. (This leads me to wonder how Ćirković, Sandberg & Bostrom turned these clunky tables of numbers into an unambiguous scatterplot, and makes me even more nervous about those two data points on which their analysis hinges.)

Ask and ye shall receive (JSON)

Whee! Thanks. I won't ask how the sausage was made; I'll just plunge right into the graphs.

[N.B.: this is the third set of plots I've made, having overhauled them twice in response to errors pointed out by the child comments. The second round of plots, which erroneously used log-log scales, are at this link and this link.]

Here's my version of the Ćirković, Sandberg & Bostrom plot.

There appears to be even less evidence of anthropic shadow than in their original graph; there's now an impact in the upper right quadrant, and if anything more mid-sized craters are recorded more recently. But the latter could be a reporting bias because there are better records for newer impacts. So let's zoom in on that denser period:

Still no sign of anthropic shadowing. What about zooming in on the post-Chicxulub period I originally wondered about?

I think there's actually mild evidence of anthropic shadowing at this scale, although visually that's mostly suggested by the three biggest craters. Even ignoring those, though, there does seem to be a sort of downward wedge in crater sizes.

And lastly, a link to a bonus plot of the most recent period, the last 3 million years, during which Homo was evolving. I think there's no visible evidence of anthropic shadowing on that plot, which doesn't surprise me much because there're so few opportunities for anthropic shadowing to appear on such a short time scale.

I think these corrected plots bring me back to where I was when I first read ĆS&B's paper: open to the idea of anthropic shadowing, but seeing only a faint sign of it in the impact crater data.

(The R code [for the old plots].)

The log scale creates that downward-sloping pattern as an artifact - it appears even if the craters are purely random (uniformly distributed across time).

Simplified example: suppose that we treat crater diameter as binary, with a 10km or more crater counting as a "big crater" and anything smaller getting ignored. If we get one "big crater" every 2 Myr, on average, then we'd expect the right half of the x-axis to be blank; the rightmost datapoint would be around the 1 Myr mark. Between 1-10 Myr we'd expect to see a few dots (4-5 big impacts). To the left of the 10 Myr mark, the dots would get denser and denser; there would be hundreds of them between 100 & 1000 Myr.

If we instead chose a smaller cutoff for what counts as a "big crater" - say, a once every 0.2 Myr sized crater (which is perhaps a 2km diameter) - then the pattern would look the same, but shifted over to the right (by one tick mark, in that case).

In the two-dimensional log-log graph, that pattern (of increasing density to the left of the graph, petering out at different x-values depending on what size crater you're looking for) translates into the downward slope that we see here.
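To make the artifact concrete, here is a rough Python sketch (mine, not from the paper or from the R code linked in this thread). It assumes impacts arrive at a constant average rate and simply computes the expected number of dots falling in each decade of a log time axis. The function names and the decade boundaries are made up for illustration.

```python
import math

def expected_count(rate_per_myr, t_start, t_end):
    """Expected number of events between t_start and t_end (Myr ago) at a constant rate."""
    return rate_per_myr * (t_end - t_start)

def counts_per_decade(rate_per_myr, decades):
    """Expected counts keyed by each (start, end) decade of the log time axis."""
    return {(lo, hi): expected_count(rate_per_myr, lo, hi) for lo, hi in decades}

# Decades of a log-scaled time axis, in Myr before present (illustrative choice).
DECADES = [(0.01, 0.1), (0.1, 1), (1, 10), (10, 100), (100, 1000)]

# "Big crater": one every 2 Myr on average, i.e. 0.5 per Myr.
big = counts_per_decade(0.5, DECADES)

# Smaller cutoff: one every 0.2 Myr on average, i.e. 5 per Myr.
small = counts_per_decade(5.0, DECADES)

for decade in DECADES:
    print(decade, big[decade], small[decade])
```

With no trend in the underlying rate at all, the expected counts grow tenfold per decade going left, and the smaller-crater row is the big-crater row shifted one decade to the right. Stack several size cutoffs on top of each other and you get the downward wedge.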

Good point. Come to think of it, that's probably why ĆS&B used a linear scale for the time axis in the first place.

I think the 10^7 Myr one is an error, seeing as the earth is less than 10^4 Myr old.

You're absolutely right, can't believe I missed that. That datum's the Dhala crater in India, which has its age listed as "> 1700 < 2100" Ma. Dropping the angle brackets and the spaces gave "17002100". The three other data points over 2400 Ma (the "Jebel Waqf as Suwwan", "Tunnunik (Prince Albert)", and "Amelia Creek" craters) got misdated for similar reasons. Better fix the data file and replot. One sec....

Edit: fixed those four points, but I now notice there's a Santa Fe crater that's mis-sized at 613 km instead of "6-13".

Edit 2: OK, think that's rectified as well. If anyone's wondering how I handled these ranges, I did the lazy thing and took the arithmetic mean of the upper & lower bounds.

The whole point of HTML is to be machine-readable. I find that if I copy a table from Apple's web browser and paste it into Apple's spreadsheet, it works fine. Maybe you need better machines.

Upvoted for the impressive feat of cramming Apple fanboyism and subtly flawed linguistic pedantry into three dozen words of double-barrelled flamebait. Downvoted for posting double-barrelled flamebait comprising Apple fanboyism and subtly flawed linguistic pedantry.

That HTML's meant to be machine-readable doesn't mean it is. (Both webpages and browsers can fail to meet HTML standards.) But that is itself a counter-nitpick. The bigger problem with your nitpick is that you're reading "machine-readable" in a blinkered way. As Wikipedia says, "machine-readable" data can refer to "human-readable data that is marked up so that it can also be read by machines (examples; microformats, RDFa) or data file formats intended principally for machines (RDF, XML, JSON)". You seem to have only the first meaning in mind while I was thinking of the second. I wanted the data in a format that a computer could immediately interpret and turn into a scatterplot, such as a text file of two tab-delimited columns of numbers. Now, you do make the point that it's fairly straightforward to get the data as two columns of numbers...

...but telling me this is unhelpful. For one thing, I already have two spreadsheet programs on my computer that can do the same thing, Gnumeric and OpenOffice.org Calc. For another, why should I have to change my usual workflow (paste data into vim, clean it, save as plain text, load into R, make graphs) when the data could've been made available in a simple format in the first place? Lastly, once you've got the numbers into your spreadsheet, what happens when you try plotting them? Do you still "find that [...] it works fine"? I suspect not, because the age numbers include values like "< 0.001", "0.004 ± 0.001", "0.0054± 0.0015", "~ 0.0066", "> 0.05", ">5, <36", and "3-95". Being able to circumvent the clunkiness of presenting data in HTML tables doesn't eliminate the problem of the ad hoc human-readable-but-machine-unreadable notations.
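For anyone who wants to clean such entries themselves, here is a minimal Python sketch of one way to do it. It is not the R code used for the plots in this thread; `parse_value` is a hypothetical helper, and it assumes the lazy rule mentioned earlier (take the midpoint when two bounds are given), keeps the central value of a "±" entry, and treats a lone bound such as "< 0.001" as the value itself.

```python
import re

# Matches non-negative decimal numbers such as "1700", "0.004" or "13".
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def parse_value(text):
    """Return one float for an ad hoc table entry, or None if it holds no number."""
    s = text.strip()
    # "0.004 ± 0.001": keep the central value and drop the uncertainty.
    if "±" in s:
        s = s.split("±")[0]
    nums = [float(n) for n in NUMBER.findall(s)]
    if not nums:
        return None
    # One number ("< 0.001", "~ 0.0066", "> 0.05") is returned unchanged.
    # Two numbers ("> 1700 < 2100", ">5, <36", "3-95") give the midpoint.
    return (min(nums) + max(nums)) / 2

examples = ["< 0.001", "0.004 ± 0.001", "0.0054± 0.0015", "~ 0.0066",
            "> 0.05", ">5, <36", "3-95", "> 1700 < 2100", "6-13"]
for entry in examples:
    print(repr(entry), "->", parse_value(entry))
```

Because the regex pulls the numbers out separately instead of stripping punctuation, "> 1700 < 2100" cannot collapse into 17002100 and "6-13" cannot become 613. Whether a lone upper or lower bound should really be plotted at the bound is a judgment call that this sketch does not settle.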

By that standard, an excel spreadsheet is "machine-unreadable."

Debatable.

This should be one of the LW Rationality Quotes for next month.

Against the rules.

Interpret it as a wish that it could be, then?

ISTR we once had a rationality quotes thread with the reverse rule, but I can't find it now!

This will teach me to skim next time. Thanks.

I am struggling to follow this anthropic shadow argument. Perhaps someone can help me see what I am getting wrong.

Suppose that every million years on the dot, some catastrophic event happens with probability P (or fails to happen with probability 1-P). Suppose that if the event happens at one of these times, it destroys all life, permanently, with probability 0.1. Suppose that P is unknown, and we initially adopt a prior for it which is uniform between 0 and 1.

Now suppose that by examining the historical record we can discover exactly how many times the event has occurred in Earth's history. Naively, we can then update our prior based on this evidence, and we get a posterior distribution sharply peaked at (# of times event has occurred) / (# of times event could have occurred). I will call this the 'naive' approach.

My understanding of the paper is that they are claiming this 'naive' approach is wrong, and it is wrong because of observer selection effects. In particular they claim it gives an underestimate of P. Their argument for this appears to be the following: if you pick a fixed value of P, and simulate history a large number of times, then in the cases where an observer like us evolves, the observer's calculation of (# of times event has occurred) / (# of times event could have occurred) will on average be significantly below the true value of P. This is because observers are more likely to evolve after periods of unusually low catastrophic activity.
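As a sanity check on that claim within my toy model, here is a small Python sketch. It assumes observers exist only in histories that survived every period, and that periods are independent; the function name is my own. Under those assumptions, the long-run event frequency seen by survivors can be written down directly.

```python
def observed_frequency(p, lethality=0.1):
    """Fraction of periods containing an event, as seen in histories that survived every period."""
    survived_event = p * (1 - lethality)  # the event happened and life survived it
    no_event = 1 - p                      # nothing happened in this period
    # Periods that ended life are excluded, so renormalise over the survivable outcomes.
    return survived_event / (survived_event + no_event)

for p in (0.1, 0.5, 0.9):
    print(p, observed_frequency(p))
```

For a true P of 0.5 this gives 0.45 / 0.95, a little under 0.47, so at fixed P the survivors' naive frequency does sit below the true value, as the paper's argument says.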

What I am currently not happy with is the following: shouldn't you run the simulation a large number of times, not with fixed value of P, but with P chosen from the prior? And if you do that, I find their claim less obvious. Suppose for simplicity that instead of having a uniform prior, P is equally likely to take the value 0.1 or 0.9. Simulate history some large number of times. Half will be 0.1 worlds and half will be 0.9 worlds. Under the naive approach, more 0.9 world observers will think they are in the 0.1 world than in the paper's approach, so they are more wrong, but there are also very few 0.9 world observers anyway (there is approximately a 10% chance of extinction per million years in this world). The vast majority of observers are 0.1 world observers, confident that they are 0.1 world observers (overconfident according to the paper), and they are right. If you just look at fixed values of P you seem to be ignoring the fact that observers are more likely to arise in worlds where P is smaller. When you take this fact into account, maybe it can justify the 'naive' underestimate?
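To put rough numbers on the two-world example, here is a Python sketch under the same toy assumptions (equal prior weight on each world, a 0.1 chance that an event ends life, observers only in histories that survive every period). The function name and the choice of 100 periods are mine.

```python
def observer_share(n_periods, worlds=(0.1, 0.9), lethality=0.1):
    """Share of surviving histories that come from each world, given equal prior weight."""
    # Chance of surviving one period is 1 - lethality * P; periods are independent.
    survival = [(1 - lethality * p) ** n_periods for p in worlds]
    total = sum(survival)
    return [s / total for s in survival]

shares = observer_share(100)
print("share of observers in the 0.1 world:", shares[0])
print("share of observers in the 0.9 world:", shares[1])
```

After 100 periods the 0.9 world contributes well under one observer in a thousand, which is the sense in which almost all observers are 0.1 world observers.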

This is a bit vague, but I'm just trying to explain my feeling that simulating the world many times at fixed P is not obviously the right thing to do (I may also be misunderstanding the argument of the paper and this isn't really what they are doing).

To state my issue another way, although their argument seems plausible from one point of view, I am struggling to understand WHY the 'naive' argument is wrong. All you are doing is applying Bayes theorem, and conditioning on the evidence, which is the historical record of when the event did or did not occur. What could be wrong with that? I can only see it being wrong if there is some additional evidence you should be conditioning on as well which you are missing out, but I can't see what that additional evidence could be in this context. It cannot be your existence, because the probability of your existence loses its dependence on P once the number of past occurrences of the event is given.
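Here is a small Python sketch of that last point, again only for my toy model. It computes the posterior for P on a grid twice: once from the event count alone, and once also conditioning on having survived all of the events. The grid, the example numbers (3 events in 20 periods) and the function name are illustrative assumptions.

```python
from math import comb, isclose

def posterior(k, n, grid, lethality=None):
    """Posterior over P on a grid with a uniform prior, given k events in n periods.

    If lethality is given, also condition on life having survived all k events.
    """
    weights = []
    for p in grid:
        like = comb(n, k) * p**k * (1 - p) ** (n - k)
        if lethality is not None:
            # Each of the k events was survived; this factor does not involve p.
            like *= (1 - lethality) ** k
        weights.append(like)
    total = sum(weights)
    return [w / total for w in weights]

GRID = [i / 100 for i in range(1, 100)]
naive = posterior(3, 20, GRID)
anthropic = posterior(3, 20, GRID, lethality=0.1)

print(all(isclose(a, b, rel_tol=1e-9) for a, b in zip(naive, anthropic)))
print("posterior peaks at P =", GRID[naive.index(max(naive))])
```

The survival factor is the same for every value of P once the number of events is fixed, so it cancels on normalisation and the two posteriors coincide. This only covers the case where the full event count is known and existence depends on P solely through surviving those events; it says nothing about models where observers are likelier to evolve after quiet periods.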

Yes, it seems that the self-indication assumption exactly compensates for the anthropic shadow: the stronger the shadow, the less likely I am to be in such a world.

However, it works only if worlds with low p and no shadow actually exist somewhere in the multiverse (and in sufficiently large numbers). If there is a universal anthropic shadow, it will still work.