Posted by: Barry Bickmore | September 7, 2011

Roy Spencer Responds With More Excuses

In my last post, I gave some of the details of Andrew Dessler’s latest paper, which criticizes a recent paper by Roy Spencer and Danny Braswell.  One of the criticisms I highlighted was the charge that S&B said they had analyzed output from 14 climate models, but only compared 6 of the models to the data–the 3 with the least, and the 3 with the greatest, climate sensitivity.  They argued that the 3 least sensitive models did a slightly better job (on average) than the 3 most sensitive ones, but none of them were very good at reproducing the data, so maybe that indicates the real climate is less sensitive than ANY of the models.  They also used the temperature series (there are several) that gave the most marked difference from the data.  I provided a number of links to show that Spencer has a history of botching his statistics, and noted that in the past he has simply brushed off criticisms of his statistical abuse, relying on the statistical naïveté of his core audience.

Well, true to form, Spencer has now responded to Dessler’s criticism with more statistical sleight of hand.  But as I predicted, the facade seems to be cracking a bit, because this time the errors are a bit too obvious, and much too egregious.  I won’t go into all of Spencer’s points (since some of them deal with points of Dessler’s that I didn’t cover last time), and will instead focus on the issues of statistical abuse and data hiding.

One of Dessler’s points was that SOME of the models (i.e., the ones that simulate El Niño cycles the best) do pretty well at mimicking the pattern in the data S&B pointed out.  And since models that don’t allow clouds to force the system can mimic the data well, the fact that another model (S&B’s) that does allow clouds to force the system also can mimic the data well is sort of meaningless.  It tells us that the kind of argument S&B were trying to make doesn’t hold water, rather than that Dessler had proven anything about clouds in this way.  But let’s see how Roy responded.

But look at what Dessler has done: he has used models which DO NOT ALLOW cloud changes to affect temperature, in order to support his case that cloud changes do not affect temperature! While I will have to think about this some more, it smacks of circular reasoning.

Unbelievable.  He has completely turned Dessler’s argument on its head.

Another of Dessler’s criticisms was that if Spencer and Braswell want to make an argument that one data set (e.g., the observational data) is “different” than another, they have to do a statistical analysis.  That’s just the nature of the problem.  But S&B failed to calculate error bars for either the model results (which Trenberth and Fasullo did) or the slopes they calculated for the data (which Dessler did).  This is all just standard statistics that every scientist should know.  But what does Spencer think of error bars?

Figure 2 in his paper, we believe, helps make our point for us: there is a substantial difference between the satellite measurements and the climate models. He tries to minimize the discrepancy by putting 2-sigma error bounds on the plots and claiming the satellite data are not necessarily inconsistent with the models.

But this is NOT the same as saying the satellite data SUPPORT the models. After all, the IPCC’s best estimate projections of future warming from a doubling of CO2 (3 deg. C) is almost exactly the average of all of the models sensitivities! So, when the satellite observations do depart substantially from the average behavior of the models, this raises an obvious red flag.

With that, Spencer dispenses with a standard statistical technique to deal with uncertainty!!!  But let’s review Spencer’s further argument that it’s a big deal if the AVERAGE behavior of all the models doesn’t match the data in this case.

What Dessler (and Trenberth and Fasullo) showed was that the deviation of the model output from the observations probably had little, if anything, to do with climate sensitivity.  The models that do well are the ones that have ALREADY been shown to mimic the El Niño cycle well.  That makes sense, because the data correlations S&B calculated were over the span of MONTHS–the kind of timescale El Niño operates on.  Given that the models that do well are NOT among the “3 least sensitive” and “3 most sensitive” models S&B showed, then how can this analysis possibly be getting at climate sensitivity in any meaningful way?

Now, if deviations from the observations S&B highlight are just due to how well the models reproduce El Niño, then that means some models are probably better than others at mimicking short-term behavior in the climate system… something else that was already well known.  What it doesn’t tell us is how good any of the models are at projecting long-term behavior, which is WHAT EVERYONE CARES ABOUT.

[UPDATE:  Down in the comments, HAS points out this paper by Belmadani et al., which he says indicates that the models that did the best at mimicking the observational data pattern in the lag regression statistics aren’t necessarily “the best” at mimicking ENSO.  This is based on a different kind of comparison (less direct), and I don’t know what other model-data comparisons others have made regarding this issue, so it’s hard for me to say at this point whether I think HAS is right.  The fact is, however, that the lag regression analysis certainly gets at SOMETHING about short-term variability, so whether it’s El Niño alone, or a combination of factors that Spencer’s analysis is getting at, it still doesn’t seem to have anything much to do with climate sensitivity.]

Roy Spencer really, really wants to use the data to say something profound about climate sensitivity (as long as it is low).  If that’s his aim, here’s a tip.  Maybe he could come up with some statistical test for how well the models reproduce the data–ALL the data sets, not just the one he chose–e.g., he could simply use the sum of squared error.  Then he could plot that vs. the climate sensitivity of the models.  Maybe a statistician can pipe up here and suggest a more sophisticated way of doing it.  I have no idea what Spencer would come up with (although it’s clear from Dessler’s plots that it wouldn’t be a strong correlation), but it would at least nominally address the question he’s attempting to answer.
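To make the suggested test concrete, here is a minimal sketch of it. Everything in it is a placeholder: the observational series, the model series, and the sensitivity values are all made-up stand-ins, not real climate output. The point is just the shape of the procedure: score each model’s misfit with a sum of squared errors, then correlate the misfit with the models’ sensitivities.

```python
# Hypothetical sketch only -- all arrays below are random stand-ins,
# not actual observations or CMIP model output.
import numpy as np

rng = np.random.default_rng(0)
obs = rng.normal(size=120)                       # stand-in observational series (months)
models = {f"model_{i}": obs + rng.normal(scale=1 + 0.1 * i, size=120)
          for i in range(14)}                    # 14 stand-in model series
sensitivity = np.linspace(2.0, 4.5, 14)          # stand-in ECS values (deg C)

# Misfit score for each model: sum of squared errors against the observations.
sse = np.array([np.sum((series - obs) ** 2) for series in models.values()])

# Does misfit correlate with sensitivity?  (For real data, Dessler's plots
# suggest it would not be a strong correlation.)
r = np.corrcoef(sse, sensitivity)[0, 1]
print(f"correlation of misfit with sensitivity: {r:+.2f}")
```

A statistician could of course substitute something more sophisticated (rank correlation, or a proper regression with uncertainties), but even this crude version would use all 14 models and all the data sets, which is the point.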

As it is, he’s waving a red herring, while simultaneously accusing Dessler of doing so.

Finally, what does he say about the “missing” data–the model output that undercut his case, but somehow didn’t make it into his Results or figures?

How is picking the 3 most sensitive models AND the 3 least sensitive models going to “provide maximum support for (our) hypothesis”? If I had picked ONLY the 3 most sensitive, or ONLY the 3 least sensitive, that might be cherry picking…depending upon what was being demonstrated.

And where is the evidence those 6 models produce the best support for our hypothesis? I would have had to run hundreds of combinations of the 14 models to accomplish that. Is that what Dr. Dessler is accusing us of?

Instead, the point of using the 3 most sensitive and 3 least sensitive models was to emphasize that not only are the most sensitive climate models inconsistent with the observations, so are the least sensitive models.

Remember, the IPCC’s best estimate of 3 deg. C warming is almost exactly the warming produced by averaging the full range of its models’ sensitivities together. The satellite data depart substantially from that. I think inspection of Dessler’s Fig. 2 supports my point.

Again, he’s still on the sensitivity issue.  This is why I didn’t go all the way and accuse Roy of deliberately hiding data.  I’ve demonstrated over and over that he has a tendency to find low climate sensitivity wherever he looks, no matter whether his analysis really addresses the issue, or not.  In this case, he was probably so intent on his goal that it didn’t even cross his mind that he was leaving out data that undercut his case.


  1. Hello Barry,

    ‘And since models that don’t allow clouds to force the system can mimic the data well, the fact that another model (S&B’s) that does allow clouds to force the system also can mimic the data well is sort of meaningless.’

    I don’t really understand what you’re getting at here, can you clarify? How does the success of models [that don’t allow clouds to force the system] in reproducing the data render the fact that S&B’s model [that does allow clouds to force the system] also reproduces the data well meaningless?

    Probably I don’t understand this because I really don’t follow Dr. Spencer’s original point on this either. But if I don’t get it maybe other readers are wondering the same.


    • Hi Mark,

      S&B used their modeling of the data to show that they could explain it if clouds are allowed to force the system. They figured that was good evidence that clouds DO force the system. Dessler showed that some of the regular climate models, which do not allow clouds to force the system, also mimic the data pretty well. Therefore, since models that do, and models that don’t, allow clouds to force the system can mimic the data pattern, we can’t really use any of this as evidence that clouds do or don’t force the system. Is that clearer, I hope?

      • Crystal, thank you!

  2. Why did Spencer go from using monthly data in his paper to 3-monthly data? He is not comparing apples with apples– Dessler very clearly states that he calculated sigma of CdT/dt using monthly MERRA data. Surely Spencer using 3-monthly data would lead to him underestimating the sigma of CdT/dt? Yet that possibility appears to not have dawned on Spencer.

    As for Spencer’s obsession with low climate sensitivity and seeing it lurking wherever he looks…yup, hit the nail on the head there Dr. Bickmore.

    But I think that you are giving Spencer too much credit when you say “In this case, he was probably so intent on his goal that it didn’t even cross his mind that he was leaving out data that undercut his case.”

    Actually, Spencer and Braswell (where is Braswell in all of this, is he a recluse?) state in their Conclusions that “Finally, since much of the temperature variability during 2000-2010 was due to ENSO [8], we conclude that ENSO-related temperature variations are partly radiatively forced”. So they knew/know that the models’ ability to simulate ENSO is what is really being tested here, not climate sensitivity. And given that, why would they ignore those models which simulated ENSO best? His excuse does not hold up to scrutiny.
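MapleLeaf’s point about 3-monthly data is easy to demonstrate: averaging a noisy monthly series over 3-month windows smooths out short-term variability, so the standard deviation of the averaged series is smaller (for uncorrelated noise, smaller by roughly a factor of sqrt(3)). The series here is synthetic white noise, purely for illustration, not the MERRA data itself.

```python
# Illustration with synthetic data: 3-month averaging shrinks the
# standard deviation of a noisy monthly series.
import numpy as np

rng = np.random.default_rng(1)
monthly = rng.normal(size=1200)                      # stand-in monthly anomalies
three_monthly = monthly.reshape(-1, 3).mean(axis=1)  # non-overlapping 3-month means

print(monthly.std(), three_monthly.std())  # the second is markedly smaller
```

So if Spencer computed his sigma from 3-monthly averages while Dessler used monthly data, an underestimate is exactly what one would expect.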

    • Hi Maple,

      I’m still suspicious, too, but my experience with Monckton has convinced me that I shouldn’t be too quick to cry “Fraud!” I now think that he really does believe he’s a member of Parliament, and that he has invented a miracle cure-all. He really does believe that his fake CO2 and temperature projections are right. My current opinion is that he’s a bit (!!!) on the nutty side. The same thing goes for Spencer. Is he smart? Sure. But he seems incapable of dealing with criticism.

      • from long discussions with people unconvinced of AGW (sometimes extremely so), i became convinced that at least the people i talked to really believed in what they said, no matter how very easy it was to show their flaws in logic. i’m not even sure tim ball (for example) is realizing he’s plain lying when he fakes his professional record. i think most people accusing unconvinced’s of conscious fraud underestimate the power of denial. NEVER underestimate the power of denial.


      • Peter, that is a great point. If I used webcite, it’d mark it!

  3. If we want to find the “multi-model ensemble” with the best correlation to the observational time series, there are 16,383 possible combinations using from 1 to 14 models in the ensemble. Only 63 of those combinations would be considered using the six models hand-selected by Spencer and Braswell. Not even 1% of the sample space.

    Dr. Spencer doesn’t realize that “where is the evidence those 6 models produce the best support for our hypothesis” — whether as an ensemble or individually — is exactly the question most of us are asking. It was most definitely not in the paper.
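The counts in the comment above are straightforward subset arithmetic (non-empty subsets of 14 models, versus non-empty subsets of the 6 models shown), and can be checked directly:

```python
# Count multi-model ensembles: every non-empty subset of the 14 models,
# versus every non-empty subset of the 6 models S&B actually showed.
from math import comb

total  = sum(comb(14, k) for k in range(1, 15))  # = 2**14 - 1
subset = sum(comb(6, k)  for k in range(1, 7))   # = 2**6 - 1

print(total, subset)                        # -> 16383 63
print(round(100 * subset / total, 2))       # -> 0.38 (percent of the sample space)
```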

  4. Barry, you state here (emph. mine):
    “They argued that the 3 least sensitive models did a slightly better job (on average) than the 3 most sensitive ones, but none of them were very good at reproducing the data, so maybe that indicates ***the real climate is more sensitive*** than ANY of the models.”

    More, or less? I would think Spencer wouldn’t want to argue for more, certainly not.

    Good article BTW.

  5. FWIW, and I’m being facetious, the three models which did do a good job at simulating ENSO and which were closest to the observations all have an equilibrium climate sensitivity greater than +3 C for doubling CO2 😉

  6. […] Roy Spencer Responds With More Shoddy Statistics and Excuses – Barry Bickmore […]

  7. “Remember, the IPCC’s best estimate of 3 deg. C warming is almost exactly the warming produced by averaging the full range of its models’ sensitivities together. ”

    Spencer emphasizes this point – but the IPCC’s best estimate of 3 deg. C warming I would argue is more supported by paleoclimate evidence and other factors than it is by the model mean – yet it seems like Spencer is trying to imply that the IPCC best estimate was chosen _because_ it is the model mean.


  8. Barry, your title “Roy Spencer responds with more shoddy statistics and excuses” really does a disservice to you and disparages the great institution that you work for and of which I am an alumnus. As your comments become more shrill and crass in order to gain attention, you will find that you will be further marginalized in the global warming debate. If you take out Dana1981, this already reduces the number of responses on your blog considerably. Compare to Judith Curry with hundreds of responses for many of her posts.

    It really speaks poorly of your ability to have a real conversation regarding the facts, such as the double standard and hypocrisy the global warming orthodoxy sets: it’s ok for the IPCC not to publish error bars (that you can drive a mack truck through) on the hockey stick, or for Doran/Zimmerman not to publish margins of error for the study you so lovingly publicize about “97% of climate scientists”, or the ability to rearrange some of Dessler’s data points from 2010 to show a negative feedback instead of positive if you add in lag time, or the wonderful cherry-picking dana1981 did in trying desperately to show that the IPCC predictions follow real-world data. Here’s a refresher on what the IPCC claimed in 2007: the IPCC said that if we kept “emissions at 2000 levels to expect a further 1 degree warming”. Well, right now we’re at about 0.12C for the decade 2000-2011, which is slightly above what the IPCC said would happen if we held everything constant at 2000 levels, and not near the 0.2C claimed, which is why Trenberth is desperately trying to find the “missing heat” and why this “crisis” is averted.

    In the end, you have proven yourself (as well as dana1981) to be utterly unreliable in exposing the truth where it lies. You have taken a stand that clearly undercuts objectivity. Your theory is slowly, but surely, falling into the crevasses of history, never to return. Please don’t bother to respond, sir, as I find your blog posts utterly unconvincing and shrill and will not waste my time any further with it.

    • Scott,

      I’m sorry you don’t like my style, but the fact is that what Roy has done is dishonest. Period. Maybe Doran and Zimmerman should have calculated error bars–I’ve said as much to you before. But if they had done so, they would have a margin of error of a few percentage points. So what? That wouldn’t affect their main conclusions, at all. The Anderegg study, which got about the same answer, didn’t need error bars because they were dealing with the full population they were studying. Roy’s data hiding DID affect his conclusions drastically.

      So the bottom line is that you are trying to excuse hiding data that undercuts stated conclusions by comparing it to a study where they didn’t calculate error bars, and it manifestly wouldn’t have made much difference. Do you really believe yourself, here? Honestly?

      And haven’t I pointed out before that the IPCC models DO predict that there will be periods of several years with less warming–it’s just hard to predict exactly when that would happen? Sorry, but if the models are really very far off, it will take a while longer to be sure about it. Right now we’re well within the error bars of the model predictions.

      Also, if you want to understand what Kevin Trenberth was talking about, why don’t you look up the paper he published, and to which he was referring in the super-secret conspiracy e-mail.

      Finally, if you don’t like my blog, you’re welcome to go somewhere else.

    • Ok, Scott, I’ve finally done it! You’ve finally made me so tired of hearing from you about how Peter Doran and Maggie Kendall-Zimmerman didn’t calculate a margin of error for their poll results, that I’ve actually gone and done it myself!

      Here’s the Wikipedia page on how to calculate a margin of error on poll results. I’m using the assumption of random sampling.

      To calculate one standard deviation (sigma) you use the following formula, where n is the sample size and p is the percentage answering one way or the other.

      sigma = sqrt(p(1-p)/n)

      I plugged in their example numbers to make sure I was doing it the right way, then plugged in p = 0.974 and n = 77. This gave sigma = 0.018.

      To get the 95% confidence interval, you just multiply sigma by 1.96. To get a 99% confidence interval, you multiply by 2.58. Here’s what I got.

      The percentage of actively publishing climate scientists who agree that humans are significantly contributing to climate change is (drumroll)…

      97.4% (+/- 3.6%, 95% confidence)
      97.4% (+/- 4.7%, 99% confidence)

      Obviously, we can’t have more than 100%, so just cut the top end of the confidence intervals off at something like 99.9%. The 95% confidence interval is the standard one to report.
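The whole calculation fits in a few lines, for anyone who wants to reproduce it (1.96 and 2.58 are the standard normal z-scores for 95% and 99% confidence; p and n are the Doran/Zimmerman figures quoted above):

```python
# Margin of error for a poll proportion, assuming simple random sampling.
import math

def poll_margin(p, n, z):
    """z times the standard error of a sample proportion."""
    sigma = math.sqrt(p * (1 - p) / n)
    return z * sigma

p, n = 0.974, 77                    # 97.4% agreement among 77 respondents
moe95 = poll_margin(p, n, 1.96)     # 95% confidence
moe99 = poll_margin(p, n, 2.58)     # 99% confidence

print(round(100 * moe95, 1))        # -> 3.6
print(round(100 * moe99, 1))        # -> 4.7
```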

      After all this time, with you going on, and On, and ON about how dishonest and/or incompetent they were, you were really arguing about how the real percentage could be as low as 93.8%. Personally, I’m not very shocked and dismayed by those numbers… I don’t know about you.

      So with that data in hand, feel free to go on some more about how I’m SO hypocritical for talking about D&Z’s results as if they were legitimate, while simultaneously criticizing Roy Spencer for 1) using made-up statistical methods that can give him any answer he wants, if he allows his parameters to stray into wildly unphysical territory, and 2) leaving out data that he says he’s analyzed, when those data undercut his conclusions in a big way.

      Go on…

  9. Thanks for helping clarify and highlight the problems with the S&B paper, Barry.

    (BTW – re Scott’s comment – I know that your blog won’t ever sink to the level of Curry’s, which is designed to pander to deniers who clutter it with more nonsense than any sane person can stomach. I much prefer your clear straight talk to curry puff and waffle.)

  10. […] papers that Dessler finds to be flawed?  Not really (but attempted and already criticized).  It's a bit early for any substantive criticism, so for now, assumptions, speculation, and […]

  11. Barry,

    Could you provide some context to Dessler’s using a 700 m mixed layer. Spencer is intimating Dessler originally used a 100 m layer and altered it to 700 m to fit his model to Spencer’s criticisms.

    I am also uncertain what is the error which Spencer claims has resulted in Dessler being out by a factor of 10 in his calculations – has Dessler changed the depth of the mixed layer to reduce this error?

    Changing a layer from 100 m to 700 m feels a bit like the criticisms you have made of Spencer for fitting his model using unrealistic physical parameters. Surely this layer is a measurable physical reality and the models should not be fitting this parameter to improve the data fit but rather fixing it to the right depth to show how realistic the model output is?

    Or is my very inexpert understanding out!?



    • Hi Chinahand,

      Arthur Smith has some initial comments on this over on Roy Spencer’s blog. Here’s my understanding. In Spencer and Braswell’s paper, they assumed about a 25 m mixed layer, which is way too shallow. A more reasonable choice for a model like theirs would be about 100 m, which is what Dessler used. Spencer used Levitus’s ocean heat content data to criticize Dessler’s numbers (saying they were off by a factor of 10), but Arthur has pointed out that it looks like Spencer is off by a factor of 10 because he calculated the heat flux for the full 700 m (which is how deep the data goes). So in essence, Spencer was once again assuming a 700 m mixed layer. Is that clearer?

      UPDATE: I’m not sure this is right. See below.

    • Wait! I think maybe both Roy and Arthur didn’t have it exactly right. More later. I’ll have to wait a bit to see how this works out, but it should be interesting.

  12. Barry,
    Chinahand and Arthur make some excellent points. I have some thoughts on this, but right now have to take care of some stuff… more later today.

  13. I think you need to take some care claiming that the models used by D11 or cited by T&F at RealClimate are in fact the best at modelling ENSO – see “ENSO Feedbacks and Associated Time Scales of Variability in a Multimodel Ensemble” Belmadani et al J of Climate 2010

    • Thanks, HAS. There’s no denying these models are better at mimicking the short-term feedbacks (as seen in the lag regression stats), however. And it certainly doesn’t have anything to do with equilibrium climate sensitivity.

      • I gather that what you are now asserting is that the models that best fit with the lag regression stats are those that best fit with the lag regression stats, and that is hard to argue with.

        What you said (and D11 and T&F said in a different form) was “The ones that do well are the ones that have ALREADY been shown to mimic the El Niño cycle well.”

        As it turns out these models turn out not to model ENSO as well as others that fit less well to the lag regression stats. This suggests D11’s claim that “.. since most of the climate variations over this period were due to ENSO, this suggests that the ability to reproduce ENSO is what’s being tested here, not anything directly related to equilibrium climate sensitivity” is unproven (as is your bold reassertion of this in your reply above).

        • Hit the nail on the head. Profound, isn’t it? 😉 In any case, one good thing that might come out of this is that the lag regression statistics surely say SOMETHING about short-term variations. So even if it isn’t ENSO they’re getting at, maybe if you can find a model that does both ENSO and this well, it would be a step forward trying to get decent decade-scale projections.

      • Pleased we’ve cleared up that you were incorrect to say GCMs that model ENSO well also model the lagged regressions well.

        I assume you’ll go through this post and the last and correct the comments about S&B made under this misapprehension?

      • I haven’t had time to read the paper you referenced. I’ll take a look in the next few days, and maybe ask around a bit. If I think you have a good point, I’ll put an update note above. Fair enough?

      • Appreciate the update.

        Also note in passing that D11 wasn’t completely correct to say “.. the models that do a good job simulating the observations (GFDL CM 2.1, MPI ECHAM5, and MRI CGCM 2.3.2A) are among those that have been identified as realistically reproducing ENSO [Lin, 2007].”

        Lin2007 does not include MRI CGCM 2.3.2A as one of the GCMs that realistically reproduce ENSO. It is in the group that “shows an oscillation with a constant period shorter than the observed ENSO period, sometimes also with a constant amplitude” and Lin accordingly puts it aside.

  14. So why is Dessler revising his paper again?

    • Because Roy pointed out a place where he thought Dessler had mischaracterized his claims (not enough nuance), and Dessler thought he had a good point. I don’t know of anything else he’s revising.

  15. Sorry, got busy yesterday, and am busy today as well.

    Barry, I think the paper in question has been sought out to confuse. Even **Spencer agrees that the variations in global temps for the short study period were modulated by ENSO**. But he and Dessler looked at near-global data, so the models need to simulate ENSO well while also needing to do a good job at simulating all the other processes at work too.

    Roy is trying, again, to distract people from his significant methodological and theoretical problems and blatant cherry picking by nit picking at Dessler’s paper. It is quite comical.

    Dessler has been behaving very professionally during this episode…Spencer? Nope, not even close– Spencer seems more interested in feeding fodder to his target audience and scoring petty points against Dessler, someone who clearly intimidates him. That is really petty and juvenile. Sadly, Spencer’s target audience are lapping it up because, for the most part, they do not know any better.

    • I agree with you, Maple. I did an update above based on HAS’s comments, in which I tried to get across that I thought this was a nitpick. Do you agree with what I said there?

      • Hi Barry, yes that seems to be the gist of it. The paper in question was very specific to studying variations in SSTs over the equatorial Pacific (in particular ENSO), whereas SB11 and Dessler were near-global studies, and the cherry-picked paper also covered a much longer study period than that considered by SB11 and Dessler11. So it is really hard to compare the two.

        You and Arthur might find the comments made by Socratic at Spencer’s blog interesting. I cannot fault Socratic’s numbers, but this area is not my forte.

      • Just come back to this by accident and noticed this further interchange.

        Unfortunately I think MapleLeaf is confused – there is a simple issue being addressed here: which GCMs model ENSO best. If you look at Lin (2007), cited by D11 in support of his claims relevant to this issue, you will find that that analysis is limited to 5N-5S (which is hardly surprising given what is being done here).

        The claim being made by Dessler11 and Trenberth et al in their responses to S&B is that the models that best model ENSO also best replicate the effect S&B show. D11 uses GFDL CM 2.1, MPI ECHAM5, and MRI CGCM 2.3.2A based on Lin (2007), and T11 asserts ECHAM5. That these models show the S&B effect seems correct (although I haven’t checked), but the claim that these models best replicate ENSO is controversial (in fact Lin says MRI CGCM 2.3.2A simply doesn’t – its period is too short).

        This issue isn’t central to the wider debate; it simply demonstrates the hypothesis “What is being seen is an ENSO related artifact (because there was nothing much else happening this decade)” is not supported by the comparison with GCMs. The models that model ENSO well (at least in part in Lin (2007)’s view and definitely in Belmadani et al’s view) do not reproduce S&B’s results.

  16. Hi Barry – thanks for the compliments – it’s hard to figure out what Spencer’s up to because the claims in his post aren’t backed up by details of how he got them. So my questions and guesses have really been trying to get to the bottom of that. Too bad he seems to have stopped responding.

    My focus was on this factor-of-ten discrepancy between Dessler’s 20:1 ratio of radiative flux to ocean heat flux, vs Spencer’s claim of 2:1 or less (at least revised from the 0.5:1 Dessler noted he had been claiming). I’m almost certain Spencer has a factor-of-2 error in his denominator (variation in cloud radiative flux), which reduces the discrepancy to a factor of 4 or 5. The real question is the numerator – what is the scale of heat flux into the ocean mixed layer (once you take out the annual signal). Spencer’s done the calculation one way, and if he didn’t make a stupid error like he did on the denominator, I think his method is reasonable, as far as I can figure out. But we still haven’t gotten to the bottom of it as far as I know.

  17. Barry

    Rest assured, your work is not in vain 😉

    I shall be following this (and any future postings on SB11) with great interest.
