Note: Please see this post of mine for more on the project, my sources, and potential sources for bias.

One of the categories of critique that have been leveled against climate science is the critique of insularity. Broadly, it is claimed that the type of work that climate scientists are trying to do draws upon insight and expertise in many other domains, but climate scientists have historically failed to consult experts in those domains or even to follow well-documented best practices.

Some takeaways/conclusions

Note: I wrote a preliminary version of this before drafting the post, but after having done most of the relevant investigation. I reviewed and edited it prior to publication. Note also that I don't justify these takeaways explicitly in my later discussion, because a lot of these come from general intuitions of mine and it's hard to articulate how the information I received explicitly affected my reaching the takeaways. I might discuss the rationales behind these takeaways more in a later post.

  • Many of the criticisms are broadly on the mark: climate scientists should have consulted best practices in other domains, and in general should have either followed them or clearly explained the reasons for divergence.
  • However, this criticism is not unique to climate science: academia in general has suffered from problems of disciplines being relatively insular (UPDATE: Here's Robin Hanson saying something similar). And many similar things may be true, albeit in different ways, outside academia.
  • One interesting possibility is that bad practices here operate via founder effects: for an area that starts off as relatively obscure and unimportant, setting up good practices may not be considered important. But as the area grows in importance, it is quite rare for the area to be cleaned up. People and institutions get used to the old ways of doing things. They have too much at stake to make reforms. This does suggest that it's important to get things right early on.
  • (This is speculative, and not discussed in the post): The extent of insularity of a discipline seems to be an area where a few researchers can have significant effect on the discipline. If a few reasonably influential climate scientists had pushed for more integration with and understanding of ideas from other disciplines, the history of climate science research would have been different.

Relevant domains they may have failed to use or learn from

  1. Forecasting research: Although climate scientists were engaging in an exercise that had a lot to do with forecasting, they neither cited research nor consulted experts in the domain of forecasting.
  2. Statistics: Climate scientists used plenty of statistics in their analysis. They did follow the basic principles of statistics, but in many cases used them incorrectly or combined them with novel approaches that were nonstandard and did not have clear statistical literature justifying the use of such approaches.
  3. Programming and software engineering: Climate scientists used a lot of code both for their climate models and for their analyses of historical climate. But their code failed basic principles of decent programming, let alone good software engineering principles such as documentation, unit testing, consistent variable names, and version control.
  4. Publication of data, metadata, and code: This practice is becoming increasingly common in some other sectors of academia and industry. Climate scientists failed to learn from econometrics and biomedical research, fields that had been struggling with some qualitatively similar problems and that had been moving to publishing data, metadata, and code.

Let's look at each of these critiques in turn.

Critique #1: Failure to consider forecasting research

We'll devote more attention to this critique, because it has been made, and addressed, cogently in considerable detail.

J. Scott Armstrong (faculty page, Wikipedia) is one of the big names in forecasting. In 2007, Armstrong and Kesten C. Green co-authored a global warming audit (PDF of paper, webpage with supporting materials) for the Forecasting Principles website that was critical of the forecasting exercises by climate scientists used in the IPCC reports.

Armstrong and Green began their critique by noting the following:

  • The climate science literature did not reference any of the forecasting literature, and there was no indication that they had consulted forecasting experts, even though what they were doing was to quite an extent a forecasting exercise.
  • There was only one paper, by Stewart and Glantz, dating back to 1985, that could be described as a forecasting audit, and that paper was critical of the methodology of climate forecasting. That paper appears to have been cited very little in the following years.
  • Armstrong and Green tried to contact leading climate scientists. Of the few who responded, none listed specific forecasting principles they followed, or reasons for not following general forecasting principles. They pointed to the IPCC reports as the best source for forecasts. Armstrong and Green estimated that the IPCC report violated 72 of 89 forecasting principles they were able to rate (their list of forecasting principles includes 140 principles, but they judged only 127 as applicable to climate forecasting, and were able to rate only 89 of them). No climate scientists responded to their invitation to provide their own ratings for the forecasting principles.

How significant are these general criticisms? It depends on the answers to the following questions:

  • In general, how much credence do you assign to the research on forecasting principles, and how strong a prior do you have in favor of these principles being applicable to a specific domain? I think the answer is that forecasting principles as identified on the Forecasting Principles website are a reasonable starting point, and therefore, any major forecasting exercise (or exercise that implicitly generates forecasts) should at any rate justify major points of departure from these principles.
  • How representative are the views of Armstrong and Green in the forecasting community? I have no idea about the representativeness of their specific views, but Armstrong in particular is high-status in the forecasting community (that I described a while back) and the Forecasting Principles website is one of the go-to sources, so material on the website is probably not too far from views in the forecasting community. (Note: I asked the question on Quora a while back, but haven't received any answers).

So it seems like there was arguably a failure of proper procedure in the climate science community in terms of consulting and applying practices from relevant domains. Still, how germane was it to the quality of their conclusions? Maybe it didn't matter after all?

In Chapter 12 of The Signal and the Noise, statistician and forecaster Nate Silver offers the following summary of Armstrong and Green's views:

  • First, Armstrong and Green contend that agreement among forecasters is not related to accuracy—and may reflect bias as much as anything else. “You don’t vote,” Armstrong told me. “That’s not the way science progresses.”
  • Next, they say the complexity of the global warming problem makes forecasting a fool’s errand. “There’s been no case in history where we’ve had a complex thing with lots of variables and lots of uncertainty, where people have been able to make econometric models or any complex models work,” Armstrong told me. “The more complex you make the model the worse the forecast gets.”
  • Finally, Armstrong and Green write that the forecasts do not adequately account for the uncertainty intrinsic to the global warming problem. In other words, they are potentially overconfident.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (p. 382). Penguin Group US. Kindle Edition.

Silver addresses each of these in his book (read it to know what he says). Here are my own thoughts on the three points as put forth by Silver:

  • I think consensus among experts (to the extent that it does exist) should be taken as a positive signal, even if the experts aren't good at forecasting. But certainly, the lack of interest or success in forecasting should dampen the magnitude of the positive signal. We should consider it likely that climate scientists have identified important potential phenomena, but should be skeptical of any actual forecasts derived from their work.
  • I disagree somewhat with this point. I think forecasting could still be possible, but as of now, there is little of a successful track record of forecasting (as Green notes in a later draft paper). So forecasting efforts, including simple ones (such as persistence, linear regression, random walk with drift) and ones based on climate models (both the ones in common use right now and others that give more weight to the PDO/AMO), should continue but the jury is still out on the extent to which they work.
  • I agree here that many forecasters are potentially overconfident.
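To make the "simple" benchmark methods mentioned above (persistence, random walk with drift, linear regression) concrete, here is a minimal sketch in Python. The function names and the toy series are my own, purely for illustration; this is not the code or data used by Armstrong, Green, or the IPCC, and a real evaluation would need actual temperature series and a carefully chosen holdout period.

```python
def persistence_forecast(history, horizon):
    """No-change forecast: repeat the last observed value."""
    return [history[-1]] * horizon


def drift_forecast(history, horizon):
    """Random walk with drift: extend the average historical step."""
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + drift * h for h in range(1, horizon + 1)]


def linear_trend_forecast(history, horizon):
    """Least-squares straight line through the history, extrapolated forward."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + horizon)]


def mean_absolute_error(actual, forecast):
    """Average absolute forecast error over the holdout period."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical toy series (not real climate data): fit on the first part,
# then score each benchmark on the held-out remainder.
toy_series = [0.0, 0.1, 0.05, 0.2, 0.25, 0.2, 0.35, 0.4, 0.38, 0.5]
history, holdout = toy_series[:7], toy_series[7:]
for name, method in [
    ("persistence", persistence_forecast),
    ("drift", drift_forecast),
    ("linear trend", linear_trend_forecast),
]:
    error = mean_absolute_error(holdout, method(history, len(holdout)))
    print(name, round(error, 3))
```

The point of benchmarks like these is that any more complex model, climate models included, should be able to beat them on held-out data before its forecasts are trusted.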

Some counterpoints to the Armstrong and Green critique:

  • One can argue that what climate scientists are doing isn't forecasting at all, but scenario analysis. After all, the IPCC generates scenarios, but not forecasts. But as I discussed in an earlier post, scenario planning and forecasting are closely related, and even if scenarios aren't direct explicit unconditional forecasts, they often involve implicit conditional forecasts. To its credit, the IPCC does seem to have used some best practices from the scenario planning literature in generating its emissions scenarios. But that is not part of the climate modeling exercise of the IPCC.
  • Many other domains that involve planning for the future don't reference the forecasting literature. Examples include scenario planning (discussed here) and the related field of futures studies (discussed here). Insularity of disciplines from each other is a common feature (or bug) in much of academia. Can we really expect or demand that climate scientists hold themselves to a higher standard?

UPDATE: I forgot to mention in my original draft of the post that Armstrong challenged Al Gore to a bet pitting Armstrong's No Change model against the IPCC model. Gore did not accept the bet, but Armstrong created the website (here) anyway to record the relative performance of the two models.

UPDATE 2: Read drnickbone's comment and my replies for more information on the debate. drnickbone in particular points to responses from Real Climate and Skeptical Science, which I discuss in my response to his comment.

Critique #2: Inappropriate or misguided use of statistics, and failure to consult statisticians

To some extent, this overlaps with Critique #1, because best practices in forecasting include good use of statistical methods. However, the critique is a little broader. There are many parts of climate science not directly involved with forecasting, but where statistical methods still matter. Historical climate reconstruction is one such example. The purpose of these is to get a better understanding of the sorts of climate that could occur and have occurred, and how different aspects of the climate correlated. Unfortunately, historical climate data is not very reliable. How do we deal with different proxies for the climate variables we are interested in so that we can reconstruct them? A careful use of statistics is important here.

Let's consider an example that's quite far removed from climate forecasting, but has (perhaps undeservedly) played an important role in the public debate on global warming: Michael Mann's famed hockey stick (Wikipedia), discussed in detail in Mann, Bradley and Hughes (henceforth, MBH98) (available online here). The major critiques of the paper arose in a series of papers by McIntyre and McKitrick, the most important of them being their 2005 paper in Geophysical Research Letters (henceforth, MM05) (available online here).

I read about the controversy in the book The Hockey Stick Illusion by Andrew Montford (Amazon, Wikipedia), but the author also has a shorter article titled Caspar and the Jesus paper that covers the story as it unfolds from his perspective. While there's a lot more to the hockey stick controversy than statistics alone, some of the main issues are statistical.

Unfortunately, I wasn't able to resolve the statistical issues myself well enough to have an informed view. But my very crude intuition, as well as the statements made by statisticians as recorded below, supports Montford's broad outline of the story. I'll try to describe the broad critiques leveled from the statistical perspective:

  • Choice of centering and standardization: The data was centered around the 20th century, a method known as short-centering, and bound to create a bias in favor of picking hockey stick-like shapes when doing principal components analysis. Each series was also standardized (divided by the standard deviation for the 20th century), which McIntyre argued was inappropriate.
  • Unusual choice of statistic used for significance: MBH98 used a statistic called the RE statistic (reduction of error statistic). This is a fairly unusual statistic to use. In fact, it doesn't have a Wikipedia page, and practically the only stuff on the web (on Google and Google Scholar) about it was in relation to tree-ring research (the proxies used in MBH98 were tree rings). This should seem suspicious: why is tree-ring research using a statistic that's basically unused outside the field? There are good reasons to avoid using statistical constructs on which there is little statistical literature, because people don't have a feel for how they work. MBH98 could have used the R^2 statistic instead, and in fact, they mentioned it in their paper but then ended up not using it.
  • Incorrect calculation of significance threshold: MM05 (plus subsequent comments by McIntyre) claims that not only is the RE statistic nonstandard, there were problems with the way MBH98 used it. First off, there is no theoretical distribution of the RE statistic, so calculating the cutoff needed to attain a particular significance level is a tricky exercise (this is one of many reasons why using a RE statistic may be ill-advised, according to McIntyre). MBH98 calculated the cutoff value for 99% significance incorrectly to be 0. The correct value according to McIntyre was about 0.54, whereas the actual RE statistic value for the data set in MBH98 was 0.48, i.e., not close enough. A later paper by Ammann and Wahl, cited by many as a vindication of MBH98, computed a similar cutoff of 0.52, so that the actual RE statistic value failed the significance test. So how did it manage to vindicate MBH98 when the value of the RE statistic failed the cutoff? They appear to have employed a novel statistical procedure, coming up with something called a calibration/verification RE ratio. McIntyre was quite critical of this, for reasons he described in detail here.
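For readers unfamiliar with the RE statistic, here is a rough sketch of how it is usually defined in the tree-ring literature, as I understand it: one minus the ratio of the reconstruction's squared error over the verification period to the squared error of simply predicting the calibration-period mean. Because there is no theoretical distribution to look up, a cutoff has to be simulated. The simulation below uses a made-up null model (AR(1) noise "proxies" calibrated by a straight-line fit) with arbitrary parameters. It is only an assumption for illustration, not the procedure used by MBH98, MM05, or Ammann and Wahl, and the choice of null model is exactly the kind of thing the dispute is about.

```python
import random


def reduction_of_error(observed, predicted, calibration_mean):
    """RE = 1 - SSE(prediction) / SSE(always predicting the calibration mean)."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_ref = sum((o - calibration_mean) ** 2 for o in observed)
    return 1.0 - sse / sse_ref


def ar1_noise(n, phi, rng):
    """Red noise: each value is phi times the previous one plus a Gaussian shock."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulated_re_cutoff(obs_cal, obs_ver, n_sims=1000, phi=0.5,
                        quantile=0.99, seed=0):
    """Empirical quantile of RE scores achieved by pure-noise 'proxies'."""
    rng = random.Random(seed)
    cal_mean = sum(obs_cal) / len(obs_cal)
    scores = []
    for _ in range(n_sims):
        proxy = ar1_noise(len(obs_cal) + len(obs_ver), phi, rng)
        # Calibrate the noise proxy against observations, then "verify" it.
        slope, intercept = fit_line(proxy[:len(obs_cal)], obs_cal)
        predicted = [intercept + slope * p for p in proxy[len(obs_cal):]]
        scores.append(reduction_of_error(obs_ver, predicted, cal_mean))
    scores.sort()
    return scores[min(int(quantile * n_sims), n_sims - 1)]
```

An RE of 1 means a perfect reconstruction and an RE of 0 means no better than guessing the calibration mean. The simulated cutoff depends heavily on the null model, which is one reason a statistic without a known distribution is awkward to use for significance testing.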

There has been a lengthy debate on the subject, plus two external inquiries and reports on the debate: the NAS Panel Report headed by Gerry North, and the Wegman Report headed by Edward Wegman. Both of them agreed with the statistical criticisms made by McIntyre, but the NAS report did not make any broader comments on what this says about the discipline or the general hockey stick hypothesis, while the Wegman report made more explicit criticism.

The Wegman Report made the insularity critique in some detail:

In general, we found MBH98 and MBH99 to be somewhat obscure and incomplete and the criticisms of MM03/05a/05b to be valid and compelling. We also comment that they were attempting to draw attention to the discrepancies in MBH98 and MBH99, and not to do paleoclimatic temperature reconstruction. Normally, one would try to select a calibration dataset that is representative of the entire dataset. The 1902-1995 data is not fully appropriate for calibration and leads to a misuse in principal component analysis. However, the reasons for setting 1902-1995 as the calibration point presented in the narrative of MBH98 sounds reasonable, and the error may be easily overlooked by someone not trained in statistical methodology. We note that there is no evidence that Dr. Mann or any of the other authors in paleoclimatology studies have had significant interactions with mainstream statisticians.

In our further exploration of the social network of authorships in temperature reconstruction, we found that at least 43 authors have direct ties to Dr. Mann by virtue of coauthored papers with him. Our findings from this analysis suggest that authors in the area of paleoclimate studies are closely connected and thus ‘independent studies’ may not be as independent as they might appear on the surface. This committee does not believe that web logs are an appropriate forum for the scientific debate on this issue.

It is important to note the isolation of the paleoclimate community; even though they rely heavily on statistical methods they do not seem to be interacting with the statistical community. Additionally, we judge that the sharing of research materials, data and results was haphazardly and grudgingly done. In this case we judge that there was too much reliance on peer review, which was not necessarily independent. Moreover, the work has been sufficiently politicized that this community can hardly reassess their public positions without losing credibility. Overall, our committee believes that Mann’s assessments that the decade of the 1990s was the hottest decade of the millennium and that 1998 was the hottest year of the millennium cannot be supported by his analysis.

McIntyre has a lengthy blog post summarizing what he sees as the main parts of the NAS Panel Report, the Wegman Report, and other statements made by statisticians critical of MBH98.
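To give some intuition for the short-centering critique discussed earlier in this section, here is a toy sketch of the mechanism as I understand it. Principal components analysis favors series with a large sum of squares about whatever mean was subtracted. If you subtract the mean of a short recent window rather than the mean of the whole record, any series whose recent values sit away from its long-run level gets an inflated sum of squares, and hence more weight. The two series below are made up (100 points each, with the last 20 standing in for the calibration period). This is not the MBH98 data or procedure, and it skips the actual PCA and the standardization step.

```python
def mean(values):
    return sum(values) / len(values)


def sum_of_squares_about(series, reference_window):
    """Sum of squared deviations from the mean of series[reference_window]."""
    reference_mean = mean(series[reference_window])
    return sum((x - reference_mean) ** 2 for x in series)


# Hypothetical "proxy" with no trend: alternating +1 / -1.
flat = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
# Same noise, but the last 20 points sit 2 units higher (a hockey-stick shape).
hockey_stick = [x + (2.0 if i >= 80 else 0.0) for i, x in enumerate(flat)]

FULL_PERIOD = slice(None)             # center on the whole record
CALIBRATION_PERIOD = slice(80, None)  # center on the last 20 points only


def relative_weight(centering_window):
    """How much larger the hockey-stick series looks than the flat one."""
    return (sum_of_squares_about(hockey_stick, centering_window)
            / sum_of_squares_about(flat, centering_window))


print("full centering :", relative_weight(FULL_PERIOD))
print("short centering:", relative_weight(CALIBRATION_PERIOD))
```

In this toy setup the hockey-stick series carries about 1.6 times the sum of squares of the flat series under full centering, but about 4.2 times under short-centering. That is the sense in which short-centering tilts the analysis toward hockey stick-like shapes.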

Critique #3: Inadequate use of software engineering, project management, and coding documentation and testing principles

In the aftermath of Climategate, most public attention was drawn to the content of the emails. But apart from the emails, data and code were also leaked, and this gave the world an inside view of the code that's used to simulate the climate. A number of criticisms of the coding practice emerged.

Chicago Boyz had a lengthy post titled Scientists are not Software Engineers that noted the sloppiness in the code and some of its implications. The post was also quick to point out that poor-quality code is not unique to climate science: it is a general problem when small-scale academic research code grows into a large-scale project beyond what the coders originally intended, with no systematic effort made to refactor it. (If you have thoughts on the general prevalence of good software engineering practices in code for academic research, feel free to share them by answering my Quora question here, and if you have insights on climate science code in particular, answer my Quora question here.) Below are some excerpts from the post:

No, the real shocking revelation lies in the computer code and data that were dumped along with the emails. Arguably, these are the most important computer programs in the world. These programs generate the data that is used to create the climate models which purport to show an inevitable catastrophic warming caused by human activity. It is on the basis of these programs that we are supposed to massively reengineer the entire planetary economy and technology base.

The dumped files revealed that those critical programs are complete and utter train wrecks.


The design, production and maintenance of large pieces of software require project management skills greater than those required for large material construction projects. Computer programs are the most complicated pieces of technology ever created. By several orders of magnitude they have more “parts” and more interactions between those parts than any other technology.

Software engineers and software project managers have created procedures for managing that complexity. It begins with seemingly trivial things like style guides that regulate what names programmers can give to attributes of software and the associated datafiles. Then you have version control in which every change to the software is recorded in a database. Programmers have to document absolutely everything they do. Before they write code, there is extensive planning by many people. After the code is written comes the dreaded code review in which other programmers and managers go over the code line by line and look for faults. After the code reaches its semi-complete form, it is handed over to Quality Assurance which is staffed by drooling, befanged, malicious sociopaths who live for nothing more than to take a programmer’s greatest, most elegant code and rip it apart and possibly sexually violate it. (Yes, I’m still bitter.)

Institutions pay for all this oversight and double-checking and programmers tolerate it because it is impossible to create a large, reliable and accurate piece of software without such procedures firmly in place. Software is just too complex to wing it.

Clearly, nothing like these established procedures was used at CRU. Indeed, the code seems to have been written overwhelmingly by just two people (one at a time) over the past 30 years. Neither of these individuals was a formally trained programmer and there appears to have been no project planning or even formal documentation. Indeed, the comments of the second programmer, the hapless “Harry”, as he struggled to understand the work of his predecessor are now being read as a kind of programmer’s Icelandic saga describing a death march through an inexplicable maze of ineptitude and boobytraps.


A lot of the CRU code is clearly composed of hacks. Hacks are informal, off-the-cuff solutions that programmers think up on the spur of the moment to fix some little problem. Sometimes they are so elegant as to be awe inspiring and they enter programming lore. More often, however, they are crude, sloppy and dangerously unreliable. Programmers usually use hacks as a temporary quick solution to a bottleneck problem. The intention is always to come back later and replace the hack with a more well-thought-out and reliable solution, but with no formal project management and time constraints it’s easy to forget to do so. After a time, more code evolves that depends on the existence of the hack, so replacing it becomes a much bigger task than just replacing the initial hack would have been.

(One hack in the CRU software will no doubt become famous. The programmer needed to calculate the distance and overlapping effect between weather monitoring stations. The non-hack way to do so would be to break out the trigonometry and write a planned piece of code to calculate the spatial relationships. Instead, the CRU programmer noticed that that the visualization software that displayed the program’s results already plotted the station’s locations so he sampled individual pixels on the screen and used the color of the pixels between the stations to determine their location and overlap! This is a fragile hack because if the visualization changes the colors it uses, the components that depend on the hack will fail silently.)

For some choice comments excerpted from a code file, see here.
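As an aside on the station-distance hack described in the excerpt above: the "break out the trigonometry" approach is not much code. Below is a minimal sketch using the standard haversine formula for great-circle distance. The overlap rule (each station has a circular radius of influence, and two stations overlap if their circles intersect) is my own hypothetical stand-in, since I don't know what notion of overlap the CRU code actually needed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stations_overlap(station_a, station_b, radius_km):
    """Hypothetical rule: circles of influence of equal radius intersect."""
    distance = great_circle_km(*station_a, *station_b)
    return distance < 2 * radius_km


# Made-up station coordinates as (latitude, longitude) in degrees.
print(stations_overlap((52.6, 1.3), (51.5, -0.1), radius_km=100.0))
```

Unlike the pixel-sampling approach, this does not depend on how anything is drawn on screen, so it cannot fail silently when the visualization changes.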

Critique #4: Practices of publication of data, metadata, and code (that had gained traction in other disciplines)

When McIntyre wanted to replicate MBH98, he emailed Mann asking for his data and code. Mann, though initially cooperative, soon started trying to fend McIntyre off. Part of this was because he thought McIntyre was out to find something wrong with his work (a well-grounded suspicion). But part of it was also that his data and code were a mess. He didn't maintain them in a way that he'd be comfortable sharing them around to anybody other than an already sympathetic academic. And, more importantly, as Mann's colleague Stephen Schneider noted, nobody asked for the code and underlying data during peer review. And most journals at the time did not require authors to submit or archive their code and data at the time of submission or acceptance of their paper. This also closely relates to Critique #3: a requirement or expectation that one's data and code would be published along with one's paper might make people more careful to follow good coding practices and avoid using various "tricks" and "hacks" in their code.

Here's how Andrew Montford puts it in The Hockey Stick Illusion:

The Hockey Stick affair is not the first scandal in which important scientific papers underpinning government policy positions have been found to be non-replicable – McCullough and McKitrick review a litany of sorry cases from several different fields – but it does underline the need for a more solid basis on which political decision-making should be based. That basis is replication. Centuries of scientific endeavour have shown that truth emerges only from repeated experimentation and falsification of theories, a process that only begins after publication and can continue for months or years or decades thereafter. Only through actually reproducing the findings of a scientific paper can other researchers be certain that those findings are correct. In the early history of European science, publication of scientific findings in a journal was usually adequate to allow other researchers to replicate them. However, as science has advanced, the techniques used have become steadily more complicated and consequently more difficult to explain. The advent of computers has allowed scientists to add further layers of complexity to their work and to handle much larger datasets, to the extent that a journal article can now, in most cases, no longer be considered a definitive record of a scientific result. There is simply insufficient space in the pages of a print journal to explain what exactly has been done. This has produced a rather profound change in the purpose of a scientific paper. As geophysicist Jon Claerbout puts it, in a world where powerful computers and vast datasets dominate scientific research, the paper ‘is not the scholarship itself, it is merely advertising of the scholarship’. The actual scholarship is the data and code used to generate the figures presented in the paper and which underpin its claims to uniqueness.
In passing we should note the implications of Claerbout’s observations for the assessment for our conclusions in the last section: by using only peer review to assess the climate science literature, the policymaking community is implicitly expecting that a read-through of a partial account of the research performed will be sufficient to identify any errors or other problems with the paper. This is simply not credible. With a full explanation of methodology now often not possible from the text of a paper, replication can usually only be performed if the data and code are available. This is a major change from a hundred years ago, but in the twenty-first century it should be a trivial problem to address. In some specialisms it is just that. We have seen, however, how almost every attempt to obtain data from climatologists is met by a wall of evasion and obfuscation, with journals and funding bodies either unable or unwilling to assist. This is, of course, unethical and unacceptable, particularly for publicly funded scientists. The public has paid for nearly all of this data to be collated and has a right to see it distributed and reused. As the treatment of the Loehle paper shows, for scientists to open themselves up to criticism by allowing open review and full data access is a profoundly uncomfortable process, but the public is not paying scientists to have comfortable lives; they are paying for rapid advances in science. If data is available, doubts over exactly where the researcher has started from fall away. If computer code is made public too, then the task of replication becomes simpler still and all doubts about the methodology are removed. The debate moves on from foolish and long-winded arguments about what was done (we still have no idea exactly how Mann calculated his confidence intervals) onto the real scientific meat of whether what was done was correct.
As we look back over McIntyre’s work on the Hockey Stick, we see that much of his time was wasted on trying to uncover from the obscure wording of Mann’s papers exactly what procedures had been used. Again, we can only state that this is entirely unacceptable for publicly funded science and is unforgiveable in an area of such enormous policy importance. As well as helping scientists to find errors more quickly, replication has other benefits that are not insignificant. David Goodstein of the California Insitute of Technology has commented that the possibility that someone will try to replicate a piece of work is a powerful disincentive to cheating – in other words, it can help to prevent scientific fraud. Goodstein also notes that, in reality, very few scientific papers are ever subject to an attempt to replicate them. It is clear from Stephen Schneider’s surprise when asked to obtain the data behind one of Mann’s papers that this criticism extends into the field of climatology. In a world where pressure from funding agencies and the demands of university careers mean that academics have to publish or perish, precious few resources are free to replicate the work of others. In years gone by, some of the time of PhD students might have been devoted to replicating the work of rival labs, but few students would accept such a menial task in the modern world: they have their own publication records to worry about. It is unforgiveable, therefore, that in paleoclimate circles, the few attempts that have been made at replication have been blocked by all of the parties in a position to do something about it. Medical science is far ahead of the physical sciences in the area of replication.
Doug Altman, of Cancer Research UK’s Medical Statistics group, has commented that archiving of data should be mandatory and that a failure to retain data should be treated as research misconduct. The introduction of this kind of regime to climatology could have nothing but a salutary effect on its rather tarnished reputation. Other subject areas, however, have found simpler and less confrontational ways to deal with the problem. In areas such as econometrics, which have long suffered from politicisation and fraud, several journals have adopted clear and rigorous policies on archiving of data. At publications such as the American Economic Review, Econometrica and the Journal of Money, Credit and Banking, a manuscript that is submitted for publication will simply not be accepted unless data and fully functional code are available. In other words, if the data and code are not public then the journals will not even consider the article for publication, except in very rare circumstances. This is simple, fair and transparent and works without any dissent. It also avoids any rancorous disagreements between journal and author after the event. Physical science journals are, by and large, far behind the econometricians on this score. While most have adopted one pious policy or another, giving the appearance of transparency on data and code, as we have seen in the unfolding of this story, there has been a near-complete failure to enforce these rules. This failure simply stores up potential problems for the editors: if an author refuses to release his data, the journal is left with an enforcement problem from which it is very difficult to extricate themselves. Their sole potential sanction is to withdraw the paper, but this then merely opens them up to the possibility of expensive lawsuits. It is hardly surprising that in practice such drastic steps are never taken.
The failure of climatology journals to enact strict policies or enforce weaker ones represents a serious failure in the system of assurance that taxpayer-funded science is rigorous and reliable. Funding bodies claim that they rely on journals to ensure data availability. Journals want a quiet life and will not face down the academics who are their lifeblood. Will Nature now go back to Mann and threaten to withdraw his paper if he doesn’t produce the code for his confidence interval calculations? It is unlikely in the extreme. Until politicians and journals enforce the sharing of data, the public can gain little assurance that there is any real need for the financial sacrifices they are being asked to accept. Taking steps to assist the process of replication will do much to improve the conduct of climatology and to ensure that its findings are solidly based, but in the case of papers of pivotal importance politicians must also go further. Where a paper like the Hockey Stick appears to be central to a set of policy demands or to the shaping of public opinion, it is not credible for policymakers to stand back and wait for the scientific community to test the veracity of the findings over the years following publication. Replication and falsification are of little use if they happen after policy decisions have been made. The next lesson of the Hockey Stick affair is that if governments are truly to have assurance that climate science is a sound basis for decision-making, they will have to set up a formal process for replicating key papers, one in which the oversight role is performed by scientists who are genuinely independent and who have no financial interest in the outcome.

Montford, Andrew (2011-06-06). The Hockey Stick Illusion (pp. 379-383). Stacey Arts. Kindle Edition.


Armstrong and Green estimated that the IPCC report violated 72 of 89 forecasting principles they were able to rate (their list of forecasting principles includes 140 principles, but they judged only 127 as applicable to climate forecasting, and were able to rate only 89 of them)

When I read this, I thought, "who ever fulfills 140 principles for anything?" but it turns out the list is largely sensible.

So that's not really why I'm commenting. I'm commenting because sensible list or not, their application of it is hilarious. Also, for further examples of how the authors have no axe to grind, see their veiled thrusts at the climatologists who snubbed them :P

Selected principles of forecasting "clearly" violated by the IPCC (from Armstrong and Green 2007):

Prior to forecasting, agree on actions to take assuming different possible forecasts.
Make sure forecasts are independent of politics.
Consider whether the events or series can be forecasted.
Ensure that the information is valid. (Manfred - Whoops! Forgot all about that one!)
Examine the value of alternative forecasting methods.
Shrink the forecasts of change if there is high uncertainty for predictions of the explanatory variables. (Manfred - Not sure if this is how uncertainty works)
Use trimmed means, medians, or modes.
Compare reasonable methods.
Test the client’s understanding of the methods. (Manfred - Good luck!)
Examine all important criteria.
Assess face validity.
Tests of statistical significance should not be used.
Present forecasts and supporting data in a simple and understandable form. (Manfred - Oh snap!)

I wonder how they evaluated whether the IPCC "assessed face validity." Were they just expecting a paragraph subtitled "Assessing Face Validity," like a bad 10th grade lab report? Or did they sleuth it out, only concluding after extensive background checks, "these guys couldn't assess face validity if it was biting them in the face very validly. F minus minus."

Prior to forecasting, agree on actions to take assuming different possible forecasts.

Wow. If you want to guarantee that nothing happens with climate forecasting, just insist on this principle.

On Critique #1:

Since you are using Real Climate and Skeptical Science as sources, did you read what they had to say about the Armstrong and Green paper and about Nate Silver's chapter?

Gavin Schmidt's post was short, funny but rude; however ChrisC's comment looks much more damning if true. Is it true?

Here is Skeptical Science on Nate Silver. It seems the main cause of error in Hansen's early 1988 forecast was an assumed climate sensitivity greater than that of the more recent models and calculations (4.2 degrees rather than 3 degrees), whereas the IPCC's 1990 forecast had problems predicting the inputs to global warming (amount of emissions, or radiative forcing for given emissions) rather than the outputs (resulting warming). Redoing the forecasts with these factors accounted for removes nearly all the discrepancy.

In light of the portions I quoted from Armstrong and Green's paper, I'll look at Gavin Schmidt's post:

Principle 1: When moving into a new field, don’t assume you know everything about it because you read a review and none of the primary literature.

Score: -2 G+A appear to have only read one chapter of the IPCC report (Chap 8), and an un-peer reviewed hatchet job on the Stern report. Not a very good start…

The paper does cite many sources other than the IPCC and the "hatchet job" on the Stern Report, including sources that evaluate climate models and their quality in general. ChrisC notes that the authors fail to cite the ~788 references for the IPCC Chapter 8. The authors claim to have a bibliography on their website that includes the full list of references given to them by all academics who suggested references. Unfortunately, as I noted in my earlier comment, the link to the bibliography is broken. This doesn't reflect well on the authors (the site on the whole is a mess, with many broken links). Assuming, however, that the authors had put up the bibliography and that it was available as promised in the paper, this critique seems off the mark (though I'd have to see the bibliography to know for sure).

Principle 2: Talk to people who are doing what you are concerned about.

Score: -2 Of the roughly 20 climate modelling groups in the world, and hundreds of associated researchers, G+A appear to have talked to none of them. Strike 2.

This seems patently false given the contents of the paper as I quoted it, and the list of experts that they sought. In fact, it seems like such a major error that I have no idea how Schmidt could have made it if he'd read the paper. (Perhaps he had a more nuanced critique to offer, e.g., that the authors' survey didn't ask enough questions, or they should have tried harder, or contacted more people. But the critique as offered here smacks of incompetence or malice). [Unless Schmidt was reading an older version of the paper that didn't mention the survey at all. But I doubt that even if he was looking at an old version of the paper, it omitted all references to the survey.]

Principle 3: Be humble. If something initially doesn’t make sense, it is more likely that you’ve mis-understood than the entire field is wrong.

Score: -2 For instance, G+A appear to think that climate models are not tested on ‘out of sample’ data (they gave that a ‘-2′). On the contrary, the models are used for many situations that they were not tuned for, paleo-climate changes (mid Holocene, last glacial maximum, 8.2 kyr event) being a good example. Similarly, model projections for the future have been matched with actual data – for instance, forecasting the effects of Pinatubo ahead of time, or Hansen’s early projections. The amount of ‘out of sample’ testing is actually huge, but the confusion stems from G+A not being aware of what the ‘sample’ data actually consists of (mainly present day climatology). Another example is that G+A appear to think that GCMs use the history of temperature changes to make their projections since they suggest leaving some of it out as a validation. But this is just not so, as we discussed more thoroughly in a recent thread.

First off, retrospective "predictions" of things that people already tacitly know, even though those things aren't explicitly used in tuning the models, are not that reliable.

Secondly, it's possible (and likely) that Armstrong and Green missed some out-of-model tests and validations that had been performed in the climate science arena. While part of this can be laid at their feet, part of it also reflects poor documentation by climate scientists of exactly how they were going about their testing. I did read that IPCC AR4 chapter that Armstrong and Green did, and I found it quite unclear on the forecasting side of things (compared to other papers I've read that judge forecast skill, in weather and short-term climate forecasting, macroeconomic forecasting, and business forecasting). This is similar to the sloppy code problem.

Thirdly, the climate scientists whom Armstrong and Green attempted to engage could have been more engaging (not Gavin Schmidt's fault; he wasn't included in the list, and the response rate appears to have been low from mainstream scientists as well as skeptics, so it's not just a problem of the climate science mainstream).

Overall, I'd like to know more details of the survey responses and Armstrong and Green's methodology, and it would be good if they combined their proclaimed commitment to openness with actually having working links on their websites. But Schmidt's critique doesn't reflect too well on him, even if Armstrong and Green were wrong.

Now, to ChrisC's comment:

Call me crazy, but in my field of meteorology, we would never head to popular literature, much less the figgin internet, in order to evaluate the state of the art in science. You head to the scientific literature first and foremost. Since meteorology and climatology are not that different, I would struggle to see why it would be any different.

The authors also seem to put a large weight on “forecasting principles” developed in different fields. While there may be some valuable advice, and cross-field cooperation is to be encouraged, one should not assume that techniques developed in say, econometrics, port directly into climate science.

The authors also make much of a wild goose chase on google for sites matching their specific phrases, such as “global warming” AND “forecast principles”. I’m not sure what a lack of web sites would prove. They also seem to have skipped most of the literature cited in AR4 ch. 8 on model validation and climatology predictions.

Part of the authors' criticism was that the climate science mainstream hadn't paid enough attention to forecasting, or to formal evaluations of forecasting. So it's natural that they didn't find enough mainstream stuff to cite that was directly relevant to the questions at hand for them.

As for the Google search and Google Scholar search, these are standard tools for initiating an inquiry. I know, I've done it, and so has everybody else. It would be damning if the authors had relied only on such searches. But they surveyed climate scientists and worked their way through the IPCC Working Group Report. This may have been far short of full due diligence, but it isn't anywhere near as sloppy as Gavin Schmidt and ChrisC make it sound.

Thanks for a comprehensive summary - that was helpful.

It seems that A&G contacted the working scientists to identify papers which (in the scientists' view) contained the most credible climate forecasts. Not many responded, but 30 referred to the recent (at the time) IPCC WG1 report, which in turn referenced and attempted to summarize over 700 primary papers. There also appear to have been a bunch of other papers cited by the surveyed scientists, but the site has lost them. So we're somewhat at a loss to decide which primary sources climate scientists find most credible/authoritative. (Which is a pity, because those would be worth rating, surely?)

However, A&G did their rating/scoring on the IPCC WG1, Chapter 8. But they didn't contact the climate scientists to help with this rating (or they did, but none of them answered?) They didn't attempt to dig into the 700 or so underlying primary papers, identify which of them contained climate forecasts, and/or had been identified by the scientists as containing the most credible forecasts and then rate those. Or even pick a random sample, and rate those? All that does sound just a tad superficial.

What I find really bizarre is their site's conclusion that because IPCC got a low score by their preferred rating principles, then a "no change" forecast is superior, and more credible! That's really strange, since "no change" has historically done much worse as a predictor than any of the IPCC models.

See the last sentence in my longer quote:

We sent out general calls for experts to use the Forecasting Audit Software to conduct their own audits and we also asked a few individuals to do so. At the time of writing, none have done so.

It's not clear how much effort they put into this step, and whether e.g. they offered the Forecasting Audit Software for free to people they asked (if they were trying to sell the software, which they themselves created, that might have seemed bad).

My guess is that most of the climate scientists they contacted just labeled them mentally along with the numerous "cranks" they usually have to deal with, and didn't bother engaging.

I also am skeptical of some aspects of Armstrong and Green's exercise. But a first outside-view analysis that doesn't receive much useful engagement from insiders can only go so far. What would have been interesting was if, after Armstrong and Green published their analysis and it was somewhat clear that their critique would receive attention, climate scientists had offered a clearer and more direct response to the specific criticisms, and perhaps even read up more about the forecasting principles and the evidence cited for them. I don't think all climate scientists should have done so, I just think at least a few should have been interested enough to do it. Even something similar to Nate Silver's response would have been nice. And maybe that did happen -- if so, I'd like to see links. Schmidt's response, on the other hand, seems downright careless and bad.

My focus here is the critique of insularity, not so much the effect it had on the factual conclusions. Basically, did climate scientists carefully consider forecasting principles (or statistical methods, or software engineering principles) then reject them? Had they never heard of the relevant principles? Did they hear about the principles, but dismiss them as unworthy of investigation? Armstrong and Green's audit may have been sloppy (though perhaps a first pass shouldn't be expected to be better than sloppy) but even if the audit itself wasn't much use, did it raise questions or general directions of inquiry worthy of investigation (or a simple response pointing to past investigation)? Schmidt's reaction seems evidence in favor of the dismissal hypothesis. And in the particular instance, maybe he was right, but it does seem to fit the general idea of insularity.

(Your quote is mangled, you probably have four spaces at the beginning, which makes the rendering engine interpret it as needing to be formatted like code, i.e. no linebreaks)

Thanks, fixed!

Actually, it's somewhat unclear whether the IPCC scenarios did better than a "no change" model -- it is certainly true over the short time period, but perhaps not over a longer time period where temperatures had moved in other directions.

Co-author Green wrote a paper later claiming that the IPCC models did not do better than the no change model when tested over a broader time period:

But it's just a draft paper and I don't know if the author ever plans to clean it up or have it published.

I would really like to see more calibrations and scorings of the models from a pure outside view approach over longer time periods.

Armstrong was (perhaps wrongly) confident enough of his views that he decided to make a public bet claiming that the No Change scenario would beat out the other scenario. The bet is described at:

Overall, I have high confidence in the view that models of climate informed by some knowledge of climate should beat the No Change model, though a lot depends on the details of how the competition is framed (Armstrong's climate bet may have been rigged in favor of No Change). That said, it's not clear how well climate models can do relative to simple time series forecasting approaches or simple (linear trend from radiative forcing + cyclic trend from ocean currents) type approaches. The number of independent out-of-sample validations does not seem to be enough and the predictive power of complex models relative to simple curve-fitting models seems to be low (probably negative). So, I think that arguments that say "our most complex, sophisticated models show X" should be treated with suspicion and should not necessarily be given more credence than arguments that rely on simple models and historical observations.
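
To make the comparison against No Change concrete, here is a minimal sketch of one way such a competition could be scored: take the mean absolute error of the model forecast and divide it by the mean absolute error of a forecast that simply repeats the last observed value. The function names and the choice of mean absolute error are my own assumptions for illustration, not Armstrong and Green's exact procedure. A ratio below 1 means the model beat No Change over the scored period.

```python
def mean_absolute_error(forecasts, observations):
    """Average absolute gap between forecast and observed values."""
    if len(forecasts) != len(observations) or not observations:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(observations)


def no_change_forecast(last_observed, horizon):
    """The naive benchmark: repeat the last observed value for every period."""
    return [last_observed] * horizon


def error_relative_to_no_change(model_forecasts, observations, last_observed):
    """Model error divided by No Change error (hypothetical scoring rule).

    Below 1: the model did better than No Change. Above 1: it did worse.
    """
    naive = no_change_forecast(last_observed, len(observations))
    naive_error = mean_absolute_error(naive, observations)
    if naive_error == 0:
        raise ValueError("No Change was perfect; the ratio is undefined")
    return mean_absolute_error(model_forecasts, observations) / naive_error
```

The ratio is sensitive to the start year and the length of the scoring window, which is one way the framing of the competition can favor one side.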

Actually, it's somewhat unclear whether the IPCC scenarios did better than a "no change" model -- it is certainly true over the short time period, but perhaps not over a longer time period where temperatures had moved in other directions.

There are certainly periods when temperatures moved in a negative direction (1940s-1970s), but then the radiative forcings over those periods (combination of natural and anthropogenic) were also negative. So climate models would also predict declining temperatures, which indeed is what they do "retrodict". A no-change model would be wrong for those periods as well.

Your most substantive point is that the complex models don't seem to be much more accurate than a simple forcing model (e.g. calculate net forcings from solar and various pollutant types, multiply by best estimate of climate sensitivity, and add a bit of lag since the system takes time to reach equilibrium; set sensitivity and lags empirically). I think that's true on the "broadest brush" level, but not for regional and temporal details e.g. warming at different latitudes, different seasons, land versus sea, northern versus southern hemisphere, day versus night, changes in maximum versus minimum temperatures, changes in temperature at different levels of the atmosphere etc. It's hard to get those details right without a good physical model of the climate system and associated general circulation model (which is where the complexity arises). My understanding is that the GCMs do largely get these things right, and make predictions in line with observations; much better than simple trend-fitting.
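
The simple forcing model described above can be sketched in a few lines. This is a one-box toy under stated assumptions: yearly time steps, a single net forcing series, and a sensitivity and lag that would be set empirically. The function and parameter names are hypothetical, and this is not how any particular published model is implemented.

```python
def simple_forcing_model(forcings, sensitivity, lag_years, initial_anomaly=0.0):
    """Lagged response of temperature anomaly to net radiative forcing.

    forcings:    net forcing per year (e.g. in W/m^2), one value per year
    sensitivity: equilibrium warming per unit forcing (assumed, fit empirically)
    lag_years:   rough time constant of the response (assumed, fit empirically)

    Each year the anomaly closes a fraction 1/lag_years of the gap between
    its current value and the equilibrium value sensitivity * forcing.
    """
    if lag_years < 1:
        raise ValueError("lag_years must be at least 1 for a yearly time step")
    anomaly = initial_anomaly
    anomalies = []
    for forcing in forcings:
        equilibrium = sensitivity * forcing
        anomaly += (equilibrium - anomaly) / lag_years
        anomalies.append(anomaly)
    return anomalies
```

With a constant forcing, the anomaly climbs toward sensitivity times forcing and levels off there. A model like this has nothing to say about regional or seasonal detail, which is the gap the GCMs are meant to fill.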

P.S. If I draw one supportive conclusion from this discussion, it is that long-range climate forecasts are very likely to be wrong, simply because the inputs (radiative forcings) are impossible to forecast with any degree of accuracy.

Even if we'd had perfect GCMs in 1900, forecasts for the 20th century would likely have been very wrong: no one could have predicted the relative balance of CO2, other greenhouse gases and sulfates/aerosols (e.g. no-one could have guessed the pattern of sudden sulfates growth after the 1940s, followed by levelling off after the 1970s). And natural factors like solar cycles, volcanoes and El Niño/La Niña wouldn't have been predictable either.

Similarly, changes in the 21st century could be very unexpected. Perhaps some new industrial process creates brand new pollutants with negative radiative forcing in the 2030s; but then the Amazon dies off in the 2040s, followed by a massive methane belch from the Arctic in the 2050s; then emergency geo-engineering goes into fashion in the 2070s (and out again in the 2080s); then in the 2090s there is a resurgence in coal, because the latest generation of solar panels has been discovered to be causing a weird new plague. Temperatures could be up and down like a yo-yo all century.

Here's a full list of the scientists that Armstrong and Green contacted -- the ones who sent a "useful response" are noted parenthetically. Note that of the 51 who responded, 42 were deemed as having given a useful response.

IPCC Working Group 1

Myles Allen, Richard Alley, Ian Allison, Peter Ambenje, Vincenzo Artale, Paulo Artaxo, Alphonsus Baede, Roger Barry, Terje Berntsen, Richard A. Betts, Nathaniel L. Bindoff, Roxana Bojariu, Sandrine Bony, Kansri Boonpragob, Pascale Braconnot, Guy Brasseur, Keith Briffa, Aristita Busuioc, Jorge Carrasco, Anny Cazenave, Anthony Chen (useful response), Amnat Chidthaisong, Jens Hesselbjerg Christensen, Philippe Ciais (useful response), William Collins, Robert Colman (useful response), Peter Cox, Ulrich Cubasch, Pedro Leite Da Silva Dias, Kenneth L. Denman, Robert Dickinson, Yihui Ding, Jean-Claude Duplessy, David Easterling, David W. Fahey, Thierry Fichefet (useful response), Gregory Flato, Piers M. de F. Forster (useful response), Pierre Friedlingstein, Congbin Fu, Yoshiyuki Fuji, John Fyfe, Xuejie Gao, Amadou Thierno Gaye (useful response), Nathan Gillett (useful response), Filippo Giorgi, Jonathan Gregory (useful response), David Griggs, Sergey Gulev, Kimio Hanawa, Didier Hauglustaine, James Haywood, Gabriele Hegerl (useful response), Martin Heimann (useful response), Christoph Heinze, Isaac Held (useful response), Bruce Hewitson, Elisabeth Holland, Brian Hoskins, Daniel Jacob, Bubu Pateh Jallow, Eystein Jansen (useful response), Philip Jones, Richard Jones, Fortunat Joos, Jean Jouzel, Tom Karl, David Karoly (useful response), Georg Kaser, Vladimir Kattsov, Akio Kitoh, Albert Klein Tank, Reto Knutti, Toshio Koike, Rupa Kumar Kolli, Won-Tae Kwon, Laurent Labeyrie, René Laprise, Corrine Le Quéré, Hervé Le Treut, Judith Lean, Peter Lemke, Sydney Levitus, Ulrike Lohmann, David C. 
Lowe, Yong Luo, Victor Magaña Rueda, Elisa Manzini, Jose Antonio Marengo, Maria Martelo, Valérie Masson-Delmotte, Taroh Matsuno, Cecilie Mauritzen, Bryant Mcavaney, Linda Mearns, Gerald Meehl, Claudio Guillermo Menendez, John Mitchell, Abdalah Mokssit, Mario Molina, Philip Mote (useful response), James Murphy, Gunnar Myhre, Teruyuki Nakajima, John Nganga, Neville Nicholls, Akira Noda, Yukihiro Nojiri, Laban Ogallo, Daniel Olago, Bette Otto-Bliesner, Jonathan Overpeck (useful response), Govind Ballabh Pant, David Parker, Wm. Richard Peltier, Joyce Penner (useful response), Thomas Peterson (useful response), Andrew Pitman, Serge Planton, Michael Prather (useful response), Ronald Prinn, Graciela Raga, Fatemeh Rahimzadeh, Stefan Rahmstorf, Jouni Räisänen, Srikanthan (S.) Ramachandran, Veerabhadran Ramanathan, Venkatachalam Ramaswamy, Rengaswamy Ramesh, David Randall (useful response), Sarah Raper, Dominique Raynaud, Jiawen Ren, James A. Renwick, David Rind, Annette Rinke, Matilde M. Rusticucci, Abdoulaye Sarr, Michael Schulz (useful response), Jagadish Shukla, C. K. Shum, Robert H. Socolow (useful response), Brian Soden, Olga Solomina (useful response), Richard Somerville (useful response), Jayaraman Srinivasan, Thomas Stocker, Peter A. Stott (useful response), Ron Stouffer, Akimasa Sumi, Lynne D. Talley, Karl E. Taylor (useful response), Kevin Trenberth (useful response), Alakkat S. Unnikrishnan, Rob Van Dorland, Ricardo Villalba, Ian G. Watterson (useful response), Andrew Weaver (useful response), Penny Whetton, Jurgen Willebrand, Steven C. Wofsy, Richard A. Wood, David Wratt, Panmao Zhai, Tingjun Zhang, De’er Zhang, Xiaoye Zhang, Zong-Ci Zhao, Francis Zwiers (useful response)

Union of Concerned Scientists

Brenda Ekwurzel, Peter Frumhoff, Amy Lynd Luers

Channel 4 “The Great Global Warming Swindle” documentary (2007)

Bert Bolin, Piers Corbyn (useful response), Eigil Friis-Christensen, James Shikwati, Frederick Singer, Carl Wunsch (useful response)

Wikipedia’s list of global warming “skeptics”

Khabibullo Ismailovich Abdusamatov (useful response), Syun-Ichi Akasofu (useful response), Sallie Baliunas, Tim Ball, Robert Balling (useful response), Fred Barnes, Joe Barton, Joe Bastardi, David Bellamy, Tom Bethell, Robert Bidinotto, Roy Blunt, Sonja Boehmer, Andrew Bolt, John Brignell (useful response), Nigel Calder, Ian Castles (useful response), George Chilingarian, John Christy (useful response), Ian Clark, Philip Cooney, Robert Davis, David Deming (useful response), David Douglass, Lester Hogan, Craig Idso, Keith Idso, Sherwood Idso, Zbigniew Jaworowski, Wibjorn Karlen, William Kininmonth, Nigel Lawson, Douglas Leahey, David Legates, Richard Lindzen (useful response), Ross Mckitrick (useful response), Patrick Michaels, Lubos Motl (useful response), Kary Mullis, Tad Murty, Tim Patterson, Benny Peiser (useful response), Ian Plimer, Arthur Robinson, Frederick Seitz, Nir Shaviv, Fred Smith, Willie Soon, Thomas Sowell, Roy Spencer, Philip Stott, Hendrik Tennekes, Jan Veizer, Peter Walsh, Edward Wegman

Other sources

Daniel Abbasi, Augie Auer, Bert Bolin, Jonathan Boston, Daniel Botkin (useful response), Reid Bryson, Robert Carter (useful response), Ralph Chapman, Al Gore, Kirtland C. Griffin (useful response), David Henderson, Christopher Landsea (useful response), Bjorn Lomborg, Tim Osborn, Roger Pielke (useful response), Henrik Saxe, Thomas Schelling (useful response), Matthew Sobel, Nicholas Stern (useful response), Brian Valentine (useful response), Carl Wunsch (useful response), Antonio Zichichi.

This comment was getting a bit long, so I decided to just post relevant stuff from Armstrong and Green first and then offer my own thoughts in a follow-up comment.

We surveyed scientists involved in long-term climate forecasting and policy makers. Our primary concern was to identify the most important forecasts and how those forecasts were made. In particular, we wished to know if the most widely accepted forecasts of global average temperature were based on the opinions of experts or were derived using scientific forecasting methods. Given the findings of our review of reviews of climate forecasting and the conclusion from our Google search that many scientists are unaware of evidence-based findings related to forecasting methods, we expected that the forecasts would be based on the opinions of scientists. We sent a questionnaire to experts who had expressed diverse opinions on global warming. We generated lists of experts by identifying key people and asking them to identify others. (The lists are provided in Appendix A.) Most (70%) of the 240 experts on our lists were IPCC reviewers and authors. Our questionnaire asked the experts to provide references for what they regarded as the most credible source of long-term forecasts of mean global temperatures. We strove for simplicity to minimize resistance to our request. Even busy people should have time to send a few references, especially if they believe that it is important to evaluate the quality of the forecasts that may influence major decisions. We asked: “We want to know which forecasts people regard as the most credible and how those forecasts were derived… In your opinion, which scientific article is the source of the most credible forecasts of global average temperatures over the rest of this century?” We received useful responses from 51 of the 240 experts, 42 of whom provided references to what they regarded as credible sources of long-term forecasts of mean global temperatures. Interestingly, eight respondents provided references in support of their claims that no credible forecasts exist. 
Of the 42 expert respondents who were associated with global warming views, 30 referred us to the IPCC’s report. A list of the papers that were suggested by respondents is provided at in the “Global Warming” section.

Unfortunately, the Forecasting Principles website seems to be a mess. Their Global Warming Audit page:

does link to a bibliography, but the link is broken (as is their global warming audit link, though the file is still on their website).

(This is another example where experts in one field ignore best practices -- of maintaining working links to their writing -- so the insularity critique applies to forecasting experts).


Based on the replies to our survey, it was clear that the IPCC’s Working Group 1 Report contained the forecasts that are viewed as most credible by the bulk of the climate forecasting community. These forecasts are contained in Chapter 10 of the Report and the models that are used to forecast climate are assessed in Chapter 8, “Climate Models and Their Evaluation” (Randall et al. 2007). Chapter 8 provided the most useful information on the forecasting process used by the IPCC to derive forecasts of mean global temperatures, so we audited that chapter.

We also posted calls on email lists and on the site asking for help from those who might have any knowledge about scientific climate forecasts. This yielded few responses, only one of which provided relevant references.

Trenberth (2007) and others have claimed that the IPCC does not provide forecasts but rather presents “scenarios” or “projections.” As best as we can tell, these terms are used by the IPCC authors to indicate that they provide “conditional forecasts.” Presumably the IPCC authors hope that readers, especially policy makers, will find at least one of their conditional forecast series plausible and will act as if it will come true if no action is taken. As it happens, the word “forecast” and its derivatives occurred 37 times, and “predict” and its derivatives occurred 90 times in the body of Chapter 8. Recall also that most of our respondents (29 of whom were IPCC authors or reviewers) nominated the IPCC report as the most credible source of forecasts (not “scenarios” or “projections”) of global average temperature. We conclude that the IPCC does provide forecasts. In order to audit the forecasting processes described in Chapter 8 of the IPCC’s report, we each read it prior to any discussion. The chapter was, in our judgment, poorly written. The writing showed little concern for the target readership. It provided extensive detail on items that are of little interest in judging the merits of the forecasting process, provided references without describing what readers might find, and imposed an incredible burden on readers by providing 788 references. In addition, the Chapter reads in places like a sales brochure. In the three-page executive summary, the terms, “new” and “improved” and related derivatives appeared 17 times. Most significantly, the chapter omitted key details on the assumptions and the forecasting process that were used. If the authors used a formal structured procedure to assess the forecasting processes, this was not evident. [...] Reliability is an issue with rating tasks. For that reason, it is desirable to use two or more raters. 
We sent out general calls for experts to use the Forecasting Audit Software to conduct their own audits and we also asked a few individuals to do so. At the time of writing, none have done so.

Next, they say the complexity of the global warming problem makes forecasting a fool’s errand. “There’s been no case in history where we’ve had a complex thing with lots of variables and lots of uncertainty, where people have been able to make econometric models or any complex models work,” Armstrong told me. “The more complex you make the model the worse the forecast gets.”

Counterexample: integrated circuits. Trying to simulate an Intel microprocessor is damn hard, but they work anyway. In general, engineers sometimes have to deal with the kinds of problems that this implies are impossible, and they frequently get the job done anyway.

Intel's main advantage is that they designed the thing they are trying to simulate. No one designed the economy.

I think the main advantage is being able to perform controlled experiments, instead of simply observational measurements.

This only works because of rapid feedback. Long-range scientific forecasting is much too slow to work this way.

Counterexample: Simulation of interactions within cells. Despite the huge complexity of living cells there were some good simulations created based on known pathways in the cell. They created a finite state machine to model the cell.

The stable states discovered were later found to correspond extremely well with various tissues of the organism, and the model predicted quite well which states would cause apoptosis.

I'll dig out my old notes and give a cite later.

complex thing with lots of variables and lots of uncertainty

The whole point of digital circuitry is that this form of uncertainty is (near)eliminated and does not compound. Arbitrary complexity is manageable given this constraint.
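A toy illustration of that point, as a sketch rather than a circuit model: every stage adds a small error. An analog chain passes the error along, while a digital chain snaps the value back to a clean 0 or 1 at each stage. The per-stage error of 0.1, the 0.5 threshold, and the function names are all assumptions made up for this example.

```python
def analog_chain(signal: float, stages: int, error_per_stage: float) -> float:
    """Pass a value through `stages` stages; each adds an error that is never corrected."""
    for _ in range(stages):
        signal += error_per_stage
    return signal


def digital_chain(bit: int, stages: int, error_per_stage: float,
                  threshold: float = 0.5) -> int:
    """Same chain, but each stage restores the value to 0 or 1 by thresholding."""
    level = float(bit)
    for _ in range(stages):
        noisy = level + error_per_stage               # the stage perturbs the level
        level = 1.0 if noisy > threshold else 0.0     # and the next gate cleans it up
    return int(level)


analog_error = abs(analog_chain(1.0, stages=100, error_per_stage=0.1) - 1.0)
digital_error = abs(digital_chain(1, stages=100, error_per_stage=0.1) - 1)
print(f"analog error after 100 stages:  {analog_error:.2f}")
print(f"digital error after 100 stages: {digital_error}")
```

As long as the per-stage error stays below the threshold margin, the digital error stays at zero however many stages are chained, which is what makes the complexity manageable.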

Now when some of these data and programs are leaked, would it be possible to make an open-source project to rewrite the code in Python, with unit tests etc.?

Could someone build their scientific career in climate science on fixing the software? I guess it would depend on whether the other people would cooperate: share their data and code, and use the fixed code.

Not only would it be possible, but one already exists. It has recreated GISTEMP in Python and found some bugs, but its authors find that their code produces near-identical results to the original GISTEMP. They say:

It is our opinion that the GISTEMP code performs substantially as documented in Hansen, J.E., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13345-13372., the GISTEMP documentation, and other papers describing updates to the procedure.

This ccc-gistemp project seems, however, to be stalled, in that it hasn't released anything since 2010 (though there were a few blog posts in 2011). This doesn't seem to be because they achieved everything they hoped to; the last information on their website indicates that there's plenty more to do.

(This was made possible not because anyone leaked or stole anything, but because NASA released the GISTEMP code.)

Good point. I don't know. The situation might have improved already, in terms of people releasing their code (though probably not in terms of the code being of great quality). At least, from what I've seen, releasing data and code is now somewhat more common. For instance, when critiquing a recent paper by Mann, Steve McIntyre wrote:

Michael Mann has had a paper on the Atlantic Multidecadal Oscillation (AMO) accepted by Geophysical Research Letters: “On forced temperature changes, internal variability, and the AMO”. The abstract and access to Supplementary Information is here. Mann has made a preprint of the paper available, here. More importantly, and very commendably, he has made full data and Matlab code available.

I don't know if code quality has improved since the rather embarrassing 2009 leak.

The code for the Mann 1998 paper mentioned in the post was written in Fortran, and McIntyre found it on an old FTP server of Mann's. He rewrote the code in R. At least people are writing code in languages like R and Matlab and maybe Python now.

Some researchers still use unconventional languages for their mathematical programming, making their results more difficult for colleagues to check even if they do release their code. See for instance:

If I understand it correctly, those kinds of computer models need a large amount of computing power to run. I would also guess that writing the software decently is more than a one-man job.

I don't know many details, but I imagine you could test the program with a subset of the data. And maybe finding a faster algorithm could help, too. If the programs are as horrible as they were described, there is a chance they are not optimal. Maybe it's possible to start with some small parts, and gradually add more functionality.
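To make the "test with a subset of the data" idea concrete, here is a minimal sketch of a regression check between an old routine and its rewrite. Both functions are hypothetical stand-ins (they are not taken from any real climate code), and the readings are made-up numbers used only to exercise the check.

```python
from statistics import mean


def legacy_mean_anomaly(temps, baseline):
    """Hypothetical stand-in for an original routine, written as an explicit loop."""
    total = 0.0
    count = 0
    for t in temps:
        total += t - baseline
        count += 1
    return total / count


def rewritten_mean_anomaly(temps, baseline):
    """Hypothetical stand-in for the cleaned-up rewrite of the same calculation."""
    return mean(temps) - baseline


def agree_on_subset(temps, baseline, subset_size, tolerance=1e-9):
    """Run both versions on the first `subset_size` readings and compare results."""
    subset = temps[:subset_size]
    old = legacy_mean_anomaly(subset, baseline)
    new = rewritten_mean_anomaly(subset, baseline)
    return abs(old - new) <= tolerance


# Made-up readings in °C, for illustration only (not real station data).
readings = [14.1, 14.3, 13.9, 14.6, 14.2, 14.8]
print(agree_on_subset(readings, baseline=14.0, subset_size=4))
```

Starting with a small subset keeps each run cheap, and the same check can be repeated on larger slices as more of the code is ported.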

I think it's a quite challenging task. It needs knowledge about climate science. It needs knowledge about software architecture and organizing a big and complex project. It needs knowledge about statistics and efficient algorithms for them.

It might be interesting to think about funding sources for such a project.

Given the political nature of the topic, is there someone interested in funding a project that might not provide the answer skeptics want but that also criticises the establishment of climate scientists in some sense?

Could you motivate climate skeptics to donate money via kickstarter or indiegogo for such an open source project? What about

Are there companies that would pay good money for better predictions about the weather in 5 years?

As a computational physicist working on a topic with little to do with climate change (namely, ultra-high-energy cosmic rays), let me (er...) assure you that few of those problems are unique to climatology. Cowboy coding FTW! (Sigh.)



J. Scott Armstrong (faculty page, Wikipedia) is one of the big names in forecasting. In 2007, Armstrong and Kesten C. Green co-authored a global warming audit (PDF of paper, webpage with supporting materials) for the Forecasting Principles website that was critical of the forecasting exercises by climate scientists used in the IPCC reports.

Given that a new IPCC report has come out since 2007, it would be interesting to see how much the IPCC has improved in that regard.

Insularity of disciplines from each other is a common feature (or bug) in much of academia. Can we really expect or demand that climate scientists hold themselves to a higher standard?

We can demand that they follow a process that's likely to produce truth. Whether or not other academic disciplines do crap doesn't really matter. It's not about having fair standards, it's about having standards that produce truth.

We can demand that they follow a process that's likely to produce truth.

You can demand anything you want, but that doesn't mean you'll get it.

But we can and should adjust our trust in their estimates according to our estimates of the truth finding power of their process.

You can demand anything you want, but that doesn't mean you'll get it.

At the moment climate scientists aren't getting the kind of legislative action that they demand either.

We can also freely ridicule climate scientists for saying silly things such as that our knowledge of climate change is comparable to our certainty about evolution or the age of the earth.

There are a lot of news outlets very interested in covering demands that climate scientists should get their act together, in case someone actually goes for it.

I hope some high profile people start challenging big talkers with public bets. Put up or shut up, publicly.

See overcomingbias for some blogposts on betting, and some good reasons as to why people don't bet on such things (especially in the comments, and in the second post, since the first post was in favor of bets)

and some good reasons as to why people don't bet on such things (especially in the comments, and in the second post

I read the second post as listing the various social advantages gained by not betting. Those are good reasons for social advantage.

But when you're interested in the truth, they're not good reasons. Converting social advantage talky talk into actions with consequences tied to the truth helps in two ways.

The first way is the aforementioned "Put up, or shut up." Putting some skin in the game focuses the mind, and reveals true preferences.

The second way is more important. Making a bet entails converting all the talky talk into a testable proposition dependent on specific measurements.

Or at least it does if at least one of the bettors is actually concerned with the truth, and competent to formulate the proposition. Two social weasels could easily agree to a bet with ambiguous terms.

I'm not willing to pay excessive costs to prove myself right. Bear in mind that social costs are still costs.

I'm also not willing to impose excessive costs on others as part of proving myself right. Consider that if betting becomes widespread, it may have the effect that poor people are locked out of intellectual circles.

I'm also not willing to take excessive risks to prove myself right. If I have a 95% chance of being right, but the loss for the 5% chance of being mistaken is large enough that I'm risk averse about it, I'm not going to make the bet.
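A rough worked example of that last point, under loud assumptions: I model risk aversion with log utility of wealth, and the stake sizes and wealth levels are invented for illustration. The bet pays `gain` with probability 0.95 and costs `loss` with probability 0.05.

```python
import math


def expected_money(p_win, gain, loss):
    """Expected monetary value of the bet."""
    return p_win * gain - (1 - p_win) * loss


def expected_log_utility_change(wealth, p_win, gain, loss):
    """Expected change in log-wealth from taking the bet (log utility is an assumption)."""
    win = math.log(wealth + gain)
    lose = math.log(wealth - loss)
    return p_win * win + (1 - p_win) * lose - math.log(wealth)


# Hypothetical bet: win 1,000 with 95% probability, lose 15,000 with 5% probability.
p_win, gain, loss = 0.95, 1_000, 15_000
print(f"expected money: {expected_money(p_win, gain, loss):+.0f}")

for wealth in (20_000, 1_000_000):
    change = expected_log_utility_change(wealth, p_win, gain, loss)
    verdict = "take it" if change > 0 else "decline"
    print(f"wealth {wealth:>9,}: expected log-utility change {change:+.5f} -> {verdict}")
```

Under these assumptions the bet has positive expected money for everyone, yet the bettor with less wealth rationally declines it. That is also the mechanism behind the worry that widespread betting could lock poorer people out.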

Some of those reasons are perverse incentives. For instance, if I am going to bet on X, that gives me a financial interest in not convincing people of X, and in weakening my arguments for X. (That's not in the posts, but it is in the comments.)

Furthermore, if I believe X, I don't need to bet on X to convince myself of it--after all, I believe it already! I'd be making the bet to convince other people. Needless to say, many of the problems with betting directly relate to convincing other people.

Have you looked at (mentioned in an UPDATE at the end of Critique #1 in my post)?

Is this what it comes down to, that Gore refused to bet, so they presumed to make a pretend bet for him?

Boo. Lame. Worse than lame. Deceptive. (On their part.)

Tell me it ain't so.

“Now, assume that Armstrong and Gore made a gentleman’s bet (no money) and that the ten years of the bet started on January 1, 2008. Armstrong’s forecast was that there would be no change in global mean temperature over the next ten years. Gore did not specify a method or a forecast. Nor did searches of his book or the Internet reveal any quantitative forecasts or any methodology that he relied on. He did, however, imply that the global mean temperature would increase at a rapid rate – presumably at least as great as the IPCC’s 1992 projection of 0.03°C-per-year. Thus, the IPCC’s 1992 projection is used as Gore’s forecast.”
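For concreteness, here is a sketch of how a bet like this could be scored once the two forecasts are pinned down: a no-change forecast versus a 0.03°C-per-year trend over ten years. Mean absolute error is my choice of score here and is not necessarily what the authors used. The two "observed" series below are synthetic extremes constructed to show the mechanics, not real temperature records.

```python
def no_change_forecast(start_anomaly, years):
    """Forecast that the temperature anomaly stays at its starting value."""
    return [start_anomaly for _ in range(years)]


def linear_trend_forecast(start_anomaly, years, rate_per_year=0.03):
    """Forecast that the anomaly rises by `rate_per_year` (°C) each year."""
    return [start_anomaly + rate_per_year * k for k in range(1, years + 1)]


def mean_absolute_error(forecast, observed):
    """Average absolute gap between forecast and observation, in °C."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def score_bet(observed, start_anomaly=0.0):
    """Score both forecasts against the same observed series; lower is better."""
    years = len(observed)
    return {
        "no_change": mean_absolute_error(
            no_change_forecast(start_anomaly, years), observed),
        "trend_0.03": mean_absolute_error(
            linear_trend_forecast(start_anomaly, years), observed),
    }


# Synthetic series only, NOT real data: a perfectly flat decade and a
# decade that warms at exactly 0.03 °C per year.
flat_world = [0.0] * 10
warming_world = [0.03 * k for k in range(1, 11)]
print("flat world:   ", score_bet(flat_world))
print("warming world:", score_bet(warming_world))
```

Each forecast wins outright in the synthetic world built to match it, so the interesting question is only which world the real measurements turn out to resemble.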

The full correspondence is here:

Maybe it's lame (?) but I don't think they're being deceptive -- they're quite explicit that Gore refused to bet.

The fact that he refused to bet could be interpreted either as evidence that the bet was badly designed and didn't reflect the fundamental point of disagreement between Gore and Armstrong, or as evidence that Gore was unwilling to put his money where his mouth is.

I'm not sure what interpretation to take.

btw, here's a bet that was actually properly entered into by both parties (neither of them a climate scientist):

No, but good stuff. Thanks.