Proposition 04: False

Tuesday, January 5th, 2010

Well, the final GSOD readings for 2009 have now been released. Time to finish this off…

rmw42@pandora:~/NOAA$ ./average 2009
rmw42@pandora:~/NOAA$ cat Average-* | sort > Averages.txt
rmw42@pandora:~/NOAA$ cat Averages.txt | ./warm5-2009
1484 stations had 2009 as one of their 5 warmest years
4187 stations did not have 2009 as of their 5 warmest years
11746 stations rejected for having insufficient data

Only 26% of the stations with sufficient data have 2009 anywhere in their top 5 – I guess that’s a reason for the Met Office not to count their climate chickens too early, as December was brutally cold and pushed the year out of the record books.

GHCN data

Monday, December 14th, 2009

I’ve been looking at the GHCN data as a possible replacement for, or addition to, the GSOD series. It’s mostly a set of monthly averages, though there is a 1.7 GB lump of “daily” data (which, glancing at extracts, appears to have a huge number of missing entries).

The GHCN does appear to have better coverage of both Africa and the year 1934, so it will be interesting to see how that affects the results.

Proposition 06: 1934 vs 1998

Sunday, December 13th, 2009

Someone called iCowboy posted a comment on Iain Dale’s Diary saying:

1934 vs. 1998 – the difference is 0.01 Celsius between those two years, but the statistics definitely show that 1998 sits well within a warm period, 1934 was generally cooler.

in response to claims that 1934 was actually warmer than 1998.


1934 is (at most) 0.01 degree Celsius warmer than 1998


To test it: scan through the station-year records, find any stations that cover both 1934 and 1998 with reasonably complete (>90%) data, and check the temperature difference between the two years.
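The paired-year check could be sketched roughly as below. This is only an illustration under assumptions: the function name `year_diff` is made up, the input is taken to be the comma-separated averages file (station ID, year, mean temperature, number of samples, standard deviation), and ">90%" is read as at least 329 daily samples in the year.

```shell
# Hypothetical sketch, not the script actually used.
# Usage: year_diff 1934 1998 [min_samples] < Averages.txt
# Input lines: station,year,mean,samples,stddev
year_diff() {
    awk -F',' -v a="$1" -v b="$2" -v min="${3:-329}" '
        # keep only reasonably complete station-years for the two years of interest
        $4 >= min && $2 == a { ta[$1] = $3 }
        $4 >= min && $2 == b { tb[$1] = $3 }
        END {
            # pair up stations that have both years
            for (s in ta) if (s in tb) {
                d = ta[s] - tb[s]
                n++; sum += d
                if (d > 0) warmer++
            }
            printf "%d stations cover both years\n", n
            if (n > 0) {
                printf "%d had %s warmer than %s\n", warmer, a, b
                printf "mean difference: %.2f\n", sum / n
            }
        }'
}
```

The difference comes out in whatever unit the averages are in. If the means are in Fahrenheit, multiply by 5/9 before comparing against the 0.01 Celsius figure (0.01 °C is 0.018 °F).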

The Met Office speaks

Wednesday, December 9th, 2009

Here is a possible source for some of Jo Steele’s comments in the Metro.

This gives us another, clearer formulation for Proposition 03:

The decade 2000–2009 has been warmer, on average, than any other decade in the previous 150 years;

as well as some other statements which might bear investigation.

Proposition 04: Possibly True

Wednesday, December 9th, 2009

Not going to confirm this one yet, as there are ~4 weeks’ more readings still to come – in the coldest part of the year in the Northern Hemisphere – which I expect will drag the annual mean temperature down somewhat.

However, using the data to the first week of December, we get:

rmw42@pandora:~/NOAA$ cat Averages.txt Average-2009.txt | sort | ./warm5-2009
3665 stations had 2009 as one of their 5 warmest years
1906 stations did not have 2009 as of their 5 warmest years
11771 stations rejected for having insufficient data

which shows the proposition to be true in 66% (i.e. near enough two thirds) of the stations with sufficient data for now, though if the December average is something like 10°F colder than the summer this could change substantially.

It also shows a potential flaw in my “>=90% of days” check – a year can miss out almost an entire month and still be “OK”. That month could just as easily be July as January, and so could move temperatures up or down. I believe missing days will be somewhat randomly distributed, but this should be checked.

Proposition 05: False or Undefined

Wednesday, December 9th, 2009

Well, that one was remarkably easy. Take the sorted annual mean recorded temperature and see if the top result is 1998:

rmw42@pandora:~/NOAA$ cat Averages.txt | ./topyear
1163 stations had their warmest year in 1998
4985 stations did not have their warmest year in 1998
9970 stations rejected for having insufficient data

That would be a “no”, then. 19% of stations report 1998 as their warmest year, which means 81% do not. Clearly the proposition is not true.

19% is quite a lot, though, and 1998 was clearly a very warm year. Is it likely that another year scores a much higher proportion? If not, the proposition is ill-defined: there is no such thing as a warmest year for the entire planet, as different parts are warm at different times. That’s different from the proposition being false – instead the proposition is meaningless.

Possible follow-up: what does the histogram of “warmest years” look like? Would need to be careful to take account of incomplete coverage… Similarly, the Mean Reciprocal Rank of a year would be interesting to see.
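A rough sketch of the histogram idea follows. The function name `warmest_year_histogram` and the 329-sample threshold (about 90% of a year) are assumptions, and it makes no attempt to correct for incomplete coverage, so years with more active stations will naturally score higher.

```shell
# Hypothetical sketch: for each station, find its warmest reasonably complete
# year, then count how many stations picked each year.
# Usage: warmest_year_histogram [min_samples] < Averages.txt
# Input lines: station,year,mean,samples,stddev
warmest_year_histogram() {
    awk -F',' -v min="${1:-329}" '
        $4 < min { next }                      # skip sparse station-years
        !($1 in best) || $3 + 0 > best[$1] {   # first or warmer year for this station
            best[$1] = $3 + 0
            year[$1] = $2
        }
        END {
            for (s in year) count[year[s]]++
            for (y in count) print y, count[y]
        }' | sort -n
}
```

Each output line is a year followed by the number of stations whose warmest year it was. Ties go to whichever year appears first in the input.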

Propositions 03, 04, 05: Hot decade

Wednesday, December 9th, 2009

Today’s Metro article “China lambasts US and EU emission pledges” by Jo Steele contains a barrage of claims:

Meanwhile, the Noughties have been the warmest decade on record and this year has been one of the five hottest, weather experts warned.

While 1998 remains the hottest single year since records began in 1850, the last ten years have been the warmest, the Met Office revealed.

Wow. It ends with what looks like a repeat of (false) Proposition 02, but the others can stand a bit of scrutiny.

Proposition 03:

The Noughties have been the warmest decade on record

Proposition 04:

This year [2009] has been one of the five hottest [years on record]

Proposition 05:

1998 remains the hottest single year since records began


Well, propositions 04 and 05 seem fairly straightforward – top 1 and top 5s to go with the top 10 analysis already done. A bit of Perl hacking should get results for those today, though since we’re still in 2009 the final result will have to wait until January. Proposition 03 will take a bit more work, but not much – do we consider rolling decades or bucket the temperature record by (year/10)? Rolling decades is probably the more general version, but both are interesting tests to run.
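The bucketed version of the decade test might look something like the sketch below. It assumes the comma-separated averages file (station ID, year, mean temperature, number of samples, standard deviation); the name `decade_means` and the 329-sample cut-off are made up for illustration. Rolling decades would need a sliding window over each station's years instead.

```shell
# Hypothetical sketch: mean of the annual means per station per calendar decade.
# Usage: decade_means [min_samples] < Averages.txt
# Output lines: station,decade_start,mean,years_used
decade_means() {
    awk -F',' -v min="${1:-329}" '
        $4 >= min {
            key = $1 "," int($2 / 10) * 10   # e.g. 2005 falls in the 2000 bucket
            sum[key] += $3
            n[key]++
        }
        END {
            for (k in sum) printf "%s,%.2f,%d\n", k, sum[k] / n[k], n[k]
        }' | sort -t',' -k1,1 -k2,2n
}
```

The last column shows how many years went into each bucket, which matters because a "decade" built from two or three years is hardly comparable with a full one.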

So, time to get cracking!

Proposition 02: False

Sunday, December 6th, 2009

Some more horrific Perl, but it did the job…

rmw42@pandora:~/NOAA$ cat Average.txt | ./warmest 10
32 stations had 10 of their 10 warmest years post 1997
1360 stations did not have 10 of their 10 warmest years post 1997
11347 stations rejected for having insufficient data

So, I make that 98% of weather stations finding that the warmest ten years in their history were not post-1997. That’s quite shocking, really. I think it’s safe to say that the statement is completely and utterly false – if 98% of weather stations active for the last 24 years don’t show the last 12 as containing the ten warmest, by what measure can we claim they were the warmest years?

I want to test this to see if there’s any pattern to the stations used, whether requiring good data integrity skews the results, and whether rejecting so many stations (~90% of the total) was necessary – but I think the reasons I gave on Friday are sound. It doesn’t matter a damn to me if a station had good data during WW2 – if it hasn’t been active for 60 years, it can’t tell me how warm 2007 was! And surely a station giving only one temperature reading per year – yes, there are some like that – is hopeless?
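For reference, the kind of check `./warmest` performs can be approximated as below. This is not the Perl script itself: the function name `top_n_after`, the parameter order, and the rejection rule (fewer than `min_years` reasonably complete years, default 24) are assumptions chosen to resemble the description above.

```shell
# Hypothetical approximation of the warmest-N check.
# Usage: top_n_after N CUTOFF_YEAR [min_years] [min_samples] < Average.txt
# Input lines: station,year,mean,samples,stddev
top_n_after() {
    awk -F',' -v min="${4:-329}" '$4 >= min' |   # drop sparse station-years
    sort -t',' -k1,1 -k3,3nr |                   # per station, warmest year first
    awk -F',' -v n="$1" -v cutoff="$2" -v min_years="${3:-24}" '
        function tally() {
            if (station == "") return
            if (years < min_years) rejected++
            else if (hits == n) all++
            else notall++
        }
        ($1 "") != (station "") { tally(); station = $1; years = 0; hits = 0 }
        { years++; if (years <= n && $2 > cutoff) hits++ }
        END {
            tally()
            printf "%d stations had %d of their %d warmest years post %d\n", all, n, n, cutoff
            printf "%d stations did not\n", notall
            printf "%d stations rejected for having insufficient data\n", rejected
        }'
}
```

Varying `min_years` and `min_samples` is one way to see whether the strictness of the data-integrity requirement skews the result.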

As Adam and Jamie might say: Myth Busted!

Updated 2009/12/7 09:23:

rmw42@pandora:~/NOAA$ cat Average.txt | ./warmest 10
44 stations had 10 of their 10 warmest years post 1997
2460 stations did not have 10 of their 10 warmest years post 1997
13614 stations rejected for having insufficient data

Data ahoy!

Sunday, December 6th, 2009

I’ve finally uploaded the early years’ weather data (about 350 MB) to my shell account, which took about three hours this morning. I’ve processed it and so now I’ve a complete set of averages, about 280,000 station-years.

I can use this with the country station list/distances to determine the set of stations in each year – something like

join -t',' -o 1.3 AfricaStations.txt Average-1969.txt | sort -n

will get me the distances (3rd field in the first file, so 1.3) for all the stations active in 1969 in order of distance, provided both files are already sorted by station ID (which join requires) and the distances are plain numbers (hence sort -n). Then I just need to get the order statistics and graph the result to get the answers for Proposition 01.

For Proposition 02, I can use the same station-year average temperature list, sorted by station, to extract and check the warmest years. I think that will need some Perl…

Temperature records

Saturday, December 5th, 2009

The NOAA data is awkwardly arranged. It’s in (literally) thousands of GZip files, one per year per station, stored in yearly TAR files. About 3 GB compressed, so probably double that uncompressed. My PC chokes on them as the virus checker sees an archive file and decides to look inside, so I’ve had to download them using a shell account on a Linux box. I’ve now got the years 1929-1959 and 1970-1973 on my PC, and 1960-1969 and 1974-2009 on the shell account.

I’ve written a C program (“annual”) to parse the temperature records and calculate the mean (and standard deviation), and a shell script to de-TAR a year’s data to a temp directory and to do

gzip -dc "$f" | tail -n +2 | ./annual "$1" >> "Average-$1.txt"

for each station record ($f). The ‘tail’ call strips off the first line (column headers), and $1 passes through the year given to the shell script. All this results in a comma-separated text file containing station ID, year, mean temperature, number of samples, and the standard deviation.
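The C source for “annual” isn’t reproduced here, but the calculation can be sketched in awk. Everything specific in it is an assumption: the function name `annual_stats`, the station ID being the first whitespace-separated column, the daily mean temperature being the fourth, 9999.9 marking a missing reading, and the use of the population (rather than sample) standard deviation.

```shell
# Hypothetical stand-in for the C program "annual".
# Reads one station-year of daily records (header already stripped) on stdin.
# Usage: annual_stats YEAR [temperature_column]
# Prints: station,year,mean,samples,stddev
annual_stats() {
    awk -v year="$1" -v col="${2:-4}" '
        NF >= col && $col != 9999.9 {   # 9999.9 is the assumed missing-value marker
            station = $1
            n++
            sum += $col
            sumsq += $col * $col
        }
        END {
            if (n == 0) exit            # nothing usable for this station-year
            mean = sum / n
            var = sumsq / n - mean * mean   # population variance
            if (var < 0) var = 0            # guard against rounding error
            printf "%s,%s,%.2f,%d,%.2f\n", station, year, mean, n, sqrt(var)
        }'
}
```

It would slot into the pipeline in place of `./annual`, taking the year as its first argument in the same way.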

Now just to create another shell script to loop through this lot, and deal with the files on my home PC as well as the ones on the Linux box… If I ‘nice’ everything hopefully nobody will notice it running all night!