Archive for December, 2009

Proposition 02: False

Sunday, December 6th, 2009

Some more horrific Perl, but it did the job…

rmw42@pandora:~/NOAA$ cat Average.txt | ./warmest 10
32 stations had 10 of their 10 warmest years post 1997
1360 stations did not have 10 of their 10 warmest years post 1997
11347 stations rejected for having insufficient data

So, I make that 98% of the weather stations with sufficient data (1360 of 1392) finding that their ten warmest years were not all post-1997. That’s quite shocking, really. I think it’s safe to say that the statement is completely and utterly false – if 98% of weather stations active for the last 24 years don’t show the last 12 as containing the ten warmest, by what measure can we claim they were the warmest years?

I want to test this to see if there’s any pattern to the stations used, whether requiring good data integrity skews the results, and whether rejecting so many stations (~90% of the total) was necessary – but I think the reasons I gave on Friday are sound. It doesn’t matter a damn to me if a station had good data during WW2 – if it hasn’t been active for 60 years, it can’t tell me how warm 2007 was! And surely a station giving only one temperature reading per year – yes, there are some like that – is hopeless?
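The percentages quoted above can be checked with a few lines of arithmetic. This is only a sketch using the three counts printed by the first run; the variable names are mine, not anything from the Perl script.

```python
# Counts copied from the first ./warmest run above.
all_post_1997 = 32
not_all_post_1997 = 1360
rejected = 11347

usable = all_post_1997 + not_all_post_1997   # stations that passed the data checks
total = usable + rejected                    # every station seen

share_false = not_all_post_1997 / usable
share_rejected = rejected / total

print(f"{share_false:.1%} of usable stations fail the test")
print(f"{share_rejected:.1%} of all stations were rejected")
```

That comes out at roughly 97.7% and 89.1%, which is where the rounded "98%" and "~90%" figures come from.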

As Adam and Jamie might say: Myth Busted!

Updated 2009/12/7 09:23:

rmw42@pandora:~/NOAA$ cat Average.txt | ./warmest 10
44 stations had 10 of their 10 warmest years post 1997
2460 stations did not have 10 of their 10 warmest years post 1997
13614 stations rejected for having insufficient data

Station locations against time

Sunday, December 6th, 2009

I don’t know why I chose Perl, it’s a hateful language. Turns out it wasn’t quite so simple – ‘sort’ works lexicographically (i.e. as text) so “117” is less than “20” as 1<2. Converting lines to integers and sorting on those, using delightful syntax like

@values = sort {$a <=> $b} @values;

and then split cases for odd/even list lengths for the median and again for upper/lower quartiles. Still, in the end it gives me the results I’m looking for – a list of comma-separated values for year, number of stations, mean distance, median distance, lower quartile distance and upper quartile distance. Or “year,0,,,,” if there are no stations. That should be just what I need to paste into Excel and produce graphs…
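For comparison, here is a minimal Python sketch of the same per-year summary line. It is not the Perl I actually ran: the function name `summarise` is made up, and the quartile rule (median of the lower and upper halves, leaving out the middle value when the count is odd) is an assumption, since there are several conventions.

```python
from statistics import mean, median

def summarise(year, distances):
    """Return one CSV line: year,count,mean,median,lower quartile,upper quartile."""
    if not distances:
        return f"{year},0,,,,"
    # Convert to numbers first so the ordering is numeric, not textual.
    values = sorted(float(d) for d in distances)
    n = len(values)
    half = n // 2
    # Assumed convention: quartiles are the medians of the two halves.
    # With a single value both halves are empty, so fall back to the list.
    lower = values[:half] or values
    upper = values[n - half:] or values
    fields = [mean(values), median(values), median(lower), median(upper)]
    return f"{year},{n}," + ",".join(f"{x:g}" for x in fields)

print(summarise(1969, ["117", "20", "3", "8"]))
print(summarise(1930, []))
```

`statistics.median` already deals with the odd/even split, which is most of what the Perl had to spell out by hand.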

Data ahoy!

Sunday, December 6th, 2009

I’ve finally uploaded the early years’ weather data (about 350 MB) to my shell account, which took about three hours this morning. I’ve processed it and so now I’ve a complete set of averages, about 280,000 station-years.

I can use this with the country station list/distances to determine the set of stations in each year – something like

join -t',' -o1.3 AfricaStations.txt Average-1969.txt | sort -n

will get me the distances (3rd field in the first file, so 1.3) for all the stations active in 1969 in order of distance. Then I just need to get the order statistics and graph the result to get the answers for Proposition 01.
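The same lookup can be sketched in Python, in case the shell tools get fiddly. The assumptions are loud ones: both files are comma-separated with the station ID in the first field, the distance is the third field of the station file, and the averages file has one line per station. The function name and the sample rows are invented for illustration.

```python
import csv
import io

def distances_for_year(stations_file, averages_file):
    """Return the distances of stations that appear in the year's averages, ascending."""
    active = {row[0] for row in csv.reader(averages_file) if row}
    found = [float(row[2]) for row in csv.reader(stations_file)
             if row and row[0] in active]
    return sorted(found)   # numeric sort, since the values are floats

# Invented sample rows, just to show the shape of the data.
stations = io.StringIO("600100,Foo,117.0\n600200,Bar,20.0\n600300,Baz,3.5\n")
averages = io.StringIO("600100,1969,21.3,365,4.1\n600200,1969,19.8,360,5.0\n")
print(distances_for_year(stations, averages))
```

Unlike join, this does not need either file to be sorted on the station ID.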

For Proposition 02, I can use the same station-year average temperature list, sorted by station, to extract and check the warmest years. I think that will need some Perl…

Station locations

Sunday, December 6th, 2009

Excellent news – the six-digit part of the station codes is geographic and the list is sorted. Africa is in the range 600000-689999, so I can just cut out the lines from the middle of the file – same for Europe, etc.
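As a rough sketch of that cut, assuming each line of the station list starts with the six-digit code (the function name and the sample lines are hypothetical):

```python
def stations_in_range(lines, low, high):
    """Keep lines whose leading six-digit station code lies in [low, high]."""
    kept = []
    for line in lines:
        code = line[:6]
        # Skip headers or blank lines that do not start with a numeric code.
        if code.isdigit() and low <= int(code) <= high:
            kept.append(line)
    return kept

# Invented sample lines; the real list has other columns after the code.
sample = ["599999 somewhere", "600000 first", "689999 last", "690000 elsewhere"]
print(stations_in_range(sample, 600000, 689999))
```

Because the list is sorted, the matching lines come out as one contiguous run, which is why cutting them from the middle of the file by hand works too.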

Ocean distance from station

Saturday, December 5th, 2009

Only needs a small change to the map-drawing Java to take the NOAA list of stations and calculate their distances to the nearest ocean. There are 30800 of those so it’ll take a while, but I should then have a map of station ID to distance-to-ocean. Yay!
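The per-station calculation could look something like the sketch below. It is not the Java: it uses the haversine formula on a spherical Earth of radius 6371 km as one way to get a Great Circle distance, and it assumes the ocean is given as a list of (latitude, longitude) cell centres. All the names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical Earth

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    # Clamp against rounding error before taking the arcsine.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))

def distance_to_ocean(station, ocean_cells):
    """Smallest distance from a (lat, lon) station to any ocean cell centre."""
    lat, lon = station
    return min(great_circle_km(lat, lon, olat, olon) for olat, olon in ocean_cells)

print(distance_to_ocean((0.0, 0.0), [(0.0, 1.0), (0.0, 5.0)]))
```

Checking every station against every ocean cell is brute force, which is why 30800 stations take a while.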

Distance to ocean map

Saturday, December 5th, 2009

Well, the DTED collection took rather longer than it should have – having to dig around for all the flyspeck islands in the Pacific. I’m still not sure I’ve got them all, but it’ll do for now.

I’ve updated the map-drawing code to calculate Great Circle distances between points, and therefore find the distance to the nearest ocean from each point of land:

(The image contains the actual values obtained, (red*16) + (green & 15) = distance in km. The red channel alone is probably sufficient given the accuracies.)

This is only rough, as it is working at the level of 1-degree x 1-degree cells, so distances are only accurate to around +/-50km. This can be improved by using the data in the DTED cells, rather than just their existence, and should get to well under 1km errors in the temperate latitudes, but for a quick test it’ll suffice.
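Reading the distances back out of the image follows directly from the formula given for the colour channels. In this sketch `decode_km` is that formula; `encode_km` is my assumed inverse (whole kilometres, low four bits in green) and may not match exactly what the drawing code writes into the rest of the green channel.

```python
def decode_km(red, green):
    """Distance in km from a pixel: (red*16) + (green & 15)."""
    return red * 16 + (green & 15)

def encode_km(distance_km):
    """Assumed inverse: split a whole-km distance into (red, low nibble of green)."""
    if not 0 <= distance_km <= 255 * 16 + 15:
        raise ValueError("distance does not fit in the two channels")
    return distance_km // 16, distance_km % 16

red, green = encode_km(1234)
print(red, green, decode_km(red, green))
```

Using red alone drops at most 15 km, well inside the +/-50km of the 1-degree cells.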

Temperature records

Saturday, December 5th, 2009

The NOAA data is awkwardly arranged. It’s in (literally) thousands of GZip files, one per year per station, stored in yearly TAR files. About 3 GB compressed, so probably double that uncompressed. My PC chokes on them as the virus checker sees an archive file and decides to look inside, so I’ve had to download them using a shell account on a Linux box. I’ve now got the years 1929-1959 and 1970-1973 on my PC, and 1960-1969 and 1974-2009 on the shell account.

I’ve written a C program (“annual”) to parse the temperature records and calculate the mean (and standard deviation), and a shell script to de-TAR a year’s data to a temp directory and to do

gzip -dc "$f" | tail -n +2 | ./annual "$1" >> "Average-$1.txt"

for each station record ($f). The ‘tail’ call strips off the first line (column headers), and $1 passes through the year given to the shell script. All this results in a comma-separated text file containing station ID, year, mean temperature, number of samples, and the standard deviation.
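For illustration, here is what one output line amounts to, sketched in Python rather than C. It assumes the temperatures for one station-year have already been parsed into a list, uses the population standard deviation (the C program may well use the sample version), and the two-decimal formatting is my choice.

```python
from statistics import mean, pstdev

def annual_line(station_id, year, temps):
    """Build 'station ID,year,mean,number of samples,standard deviation'."""
    # Note: statistics.mean raises StatisticsError on an empty list.
    return (f"{station_id},{year},{mean(temps):.2f},"
            f"{len(temps)},{pstdev(temps):.2f}")

print(annual_line("723150-03812", 1990, [10.0, 12.0, 14.0]))
```

Keeping the sample count in each line matters later, as it is what lets sparse station-years be thrown out.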

Now just to create another shell script to loop through this lot, and deal with the files on my home PC as well as the ones on the Linux box… If I ‘nice’ everything hopefully nobody will notice it running all night!

Data collection

Saturday, December 5th, 2009

Yesterday and this morning I’ve collected a huge number of DTED files from here.

The terrain information is made available for free from the USGS, but is generally provided on physical media as it’s so large – multiple DVDs for level 1 DTED. Of course, there’s no such thing as a free lunch, so the files are grouped into 10-degree blocks and then arranged by country, so somewhat awkward to find – or a fun geography quiz, if you prefer!

I’ve also cobbled together a bit of Java to produce a map showing which terrain cells I’ve got. Sadly, this shows I’ve got some more work to do:

Proof of concept

Friday, December 4th, 2009

I’ve downloaded the data for station 723150-03812, Asheville Municipal Airport in North Carolina. This is the one the NCDC/NOAA use for their sample data and is, I guess, their local airport.

Throwing the full 1948-2009 data – about 26000 records – into an Excel spreadsheet, using the SUMIF and COUNTIF functions to pull out the relevant days, and sorting the results shows that the ten warmest years in Asheville are (from lowest to highest): 1974, 1980, 2001, 1999, 2002, 1991, 2007, 2009, 1998, 1990

2009 is an aberration as there are a bunch of cold days to come before the end of the year, but it’s clear that there have been equally warm years in recent decades.

This surprised me. I expected at least eight or nine of the ten warmest years to fall after 1997 if the effect were clear-cut – particularly with all the fuss people have made about airport locations for weather stations, the El Nino in ’98, and the satellite data showing warming throughout the 1990s.

It remains to be seen how typical (or not) Asheville is…
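The spreadsheet step in this post (sum and count the days per year, divide, then sort) can be sketched outside Excel as well. This assumes the daily records have been reduced to (year, temperature) pairs; the function name and the tiny sample are invented.

```python
from collections import defaultdict

def warmest_years(daily_records, count=10):
    """Return the `count` warmest years, coolest first, from (year, temp) pairs."""
    total = defaultdict(float)  # plays the role of SUMIF
    days = defaultdict(int)     # plays the role of COUNTIF
    for year, temp in daily_records:
        total[year] += temp
        days[year] += 1
    means = {year: total[year] / days[year] for year in total}
    ranked = sorted(means, key=means.get)  # coolest to warmest
    return ranked[-count:]                 # count must be at least 1

sample = [(2000, 10.0), (2000, 12.0), (2001, 15.0), (2002, 5.0)]
print(warmest_years(sample, count=2))
```

A part-year like 2009 gets averaged over only the days recorded so far, which is exactly the aberration mentioned above.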

Proposition 02: The warmest ten years

Friday, December 4th, 2009

I was reading the Evening Standard on my way home tonight, and saw the Rt Hon Ed Miliband MP’s op-ed – “Climate change sceptics are today’s flat-earth brigade”. In it he mentions a line I’ve heard numerous times before.


The 10 warmest years on record have all occurred since 1997


It seems to me that this statement is either true or false, and can be checked using temperature records from weather stations – the sort of data I’m collecting already. For any particular station, the statement is either true or false. For the planet as a whole, a clear majority of stations should ‘vote’ for the statement being true.

  • Collect daily station records from the NOAA here
  • Exclude stations which don’t include the years 1985 – 2008 (12 years either side of 1997 – it’s no use using the station if the test is necessarily true because it’s only post-1997 or necessarily false because there aren’t ten years after 1997 to be the warmest)
  • Exclude stations without records for at least 90% of days
  • Calculate the annual mean temperature for each station in each year
  • Sort station data by annual mean temperature to find the ten warmest years, see if they’re all after 1997 or not
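
The steps above could be sketched for a single station like this. It is only an outline under stated assumptions: the input is a dict of year to (annual mean, days recorded), the 90% rule is applied per year against a 365-day year, and "after 1997" is read as 1997 onwards so that the 1985 – 2008 window splits 12/12. The function and parameter names are hypothetical.

```python
def classify_station(annual, first=1985, last=2008, cutoff=1997,
                     top=10, coverage=0.9):
    """Return 'true', 'false' or 'rejected' for one station.

    annual maps year -> (mean temperature, number of days recorded).
    """
    min_days = coverage * 365  # assumption: ignore leap years
    usable = {year: mean for year, (mean, days) in annual.items()
              if days >= min_days}
    # The station must have good data for every year in the window.
    if any(year not in usable for year in range(first, last + 1)):
        return "rejected"
    warmest = sorted(usable, key=usable.get, reverse=True)[:top]
    return "true" if all(year >= cutoff for year in warmest) else "false"

# Invented station that warms steadily by 0.1 degrees a year.
steady = {y: (10 + 0.1 * (y - 1985), 365) for y in range(1985, 2009)}
print(classify_station(steady))
```

Counting the three outcomes over every station then gives the ‘vote’.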

I expect there’ll be some stations for which it’s true and some for which it’s false – there will always be random weather effects – but if recent years are the warmest this should be clear from the measurements. It would be strange to imagine a warm year which left most places colder!