Posts Tagged ‘proposition 01’

GHCN data

Monday, December 14th, 2009

I’ve been looking at the GHCN data as a possible replacement for, or addition to, the GSOD series. It’s mostly a set of monthly averages, though there is a 1.7 GB lump of “daily” data (which, judging by a glance at some extracts, has a huge number of missing entries).

The GHCN does appear to have better coverage of Africa and 1934, so it will be interesting to see how that affects the results.

First draft of distance-to-ocean

Monday, December 7th, 2009

Have reprocessed the data from yesterday, and graphed it.

Africa has some problems with station counts – there are none producing daily data in the 1950s or the early 1970s. I believe this might be different in the monthly GHCN dataset, but I’ve been using the GSOD figures instead – another strand to check. The station counts are shown below:

As a result, I’ve had to trim the data somewhat. 1956 and 1968 show major differences from the adjacent years, so are too volatile to include, and most of the other removed years have at most a dozen stations. There are some differences in station placement from the 1957-1967 series to the post-1973 one, but the latter seems quite stable:

All in all, I’d say this doesn’t show any particular trend in the placement of African stations relative to the ocean.
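The trimming above used two criteria: years that were too volatile (1956 and 1968) and years with very few stations. As a hypothetical sketch of how that could be automated, the Java below keeps years with at least `MIN_STATIONS` stations and drops any hand-picked exclusions. The threshold of 13 is an assumption read off "at most a dozen", and the volatility judgement itself is not captured; it is simply passed in as a set of excluded years.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class YearFilter {
    // Hypothetical cut-off: keep years with at least 13 stations ("at most a dozen" were dropped).
    static final int MIN_STATIONS = 13;

    /** Returns the years to keep, given parallel arrays of years and station counts. */
    static List<Integer> keptYears(int[] years, int[] stationCounts, Set<Integer> excludedByHand) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < years.length; i++) {
            boolean enoughStations = stationCounts[i] >= MIN_STATIONS;
            boolean excluded = excludedByHand.contains(years[i]);
            if (enoughStations && !excluded) {
                kept.add(years[i]);
            }
        }
        return kept;
    }
}
```

For example, `keptYears(years, counts, Set.of(1956, 1968))` returns the remaining years in their original order.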

In Europe, we have:

which is a rather different story. The station counts are good for the entire period, and – aside from some difficulties immediately after WW2 – stations seem to have moved significantly closer to the water: something like 100 km (about 20%) closer to the ocean between the early 1960s and the mid-to-late 1970s, and stable ever since.

The spreadsheets are here: Africa and Europe.

Mea culpa

Monday, December 7th, 2009

This morning, taking a nice long bath, I realised there could be a problem with my parsing of the temperature records. This turns out to be the case.

Following the format specification, I’d jumped to column 25, skipped any spaces or tabs, read the digits, read a dot (silently aborting if not present), and then optionally read another digit. That produced records for 285057 station-years.

What I didn’t do, and what my checks had failed to spot, was handle negative Fahrenheit temperatures. I guess I’m not used to winters being colder than salted ice. A minus sign is neither whitespace nor a digit, so the whitespace and digit steps both read nothing, and the dot check then found the minus sign instead and silently aborted! Net result: no records from any station where the temperature dips below 0.0F. Oops. I’ve fixed that (check for a minus sign, and negate the value read if there is one) and also made that abort print a message on stderr.
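A minimal Java sketch of the corrected parse, for illustration only (it is not the actual script): skip spaces and tabs from column 25, accept an optional minus sign, read digits, require a dot, then take one optional decimal digit, reporting failures on stderr. Treating column 25 as 1-based (string index 24) and requiring at least one digit are assumptions made here.

```java
import java.util.OptionalDouble;

class TempParser {
    // Assumption: "column 25" is 1-based, so the field starts at string index 24.
    static final int TEMP_START = 24;

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Parses a temperature such as "-12.5" from column 25; empty (plus a note on stderr) if malformed. */
    static OptionalDouble parseTemp(String line) {
        int i = TEMP_START;
        int n = line.length();
        while (i < n && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++; // skip spaces and tabs
        }
        boolean negative = false;
        if (i < n && line.charAt(i) == '-') {
            negative = true; // the case the first version missed
            i++;
        }
        int firstDigit = i;
        double value = 0;
        while (i < n && isAsciiDigit(line.charAt(i))) {
            value = value * 10 + (line.charAt(i) - '0');
            i++;
        }
        // Added check: at least one digit, followed by the mandatory dot.
        if (i == firstDigit || i >= n || line.charAt(i) != '.') {
            System.err.println("Could not parse temperature: " + line); // no longer silent
            return OptionalDouble.empty();
        }
        i++; // consume the dot
        if (i < n && isAsciiDigit(line.charAt(i))) {
            value += (line.charAt(i) - '0') / 10.0; // optional single decimal digit
        }
        return OptionalDouble.of(negative ? -value : value);
    }
}
```

Returning an empty `OptionalDouble` rather than throwing lets a caller skip the record while still leaving a trace on stderr.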

I’ve run through all the years with no messages on stderr, and there are now 394264 station-years of data. Code will follow when I figure out a good way to post it.

I’ve updated the Proposition 02 results; the fix has actually increased the margin, from 97.7% to 98.2% of stations not showing their warmest ten years as post-1997.

Station locations against time

Sunday, December 6th, 2009

I don’t know why I chose Perl; it’s a hateful language. It turns out it wasn’t quite so simple: ‘sort’ works lexicographically (i.e. as text), so “117” is less than “20” because 1<2. The fix was to convert the lines to integers and sort on those, using delightful syntax like

@values = sort {$a <=> $b} @values;

and then to handle odd and even list lengths separately for the median, and again for the upper and lower quartiles. Still, in the end it gives me the results I’m looking for – a list of comma-separated values for year, number of stations, mean distance, median distance, lower quartile distance and upper quartile distance, or “year,0,,,,” if there are no stations. That should be just what I need to paste into Excel and produce graphs…
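The same steps, sketched in Java rather than Perl as an illustration: a numeric sort, the median with odd/even handling, and quartiles taken as medians of the lower and upper halves. That quartile convention is one of several and is an assumption here; the post doesn't say which one the Perl script uses. The output is one CSV line per year in the column order listed above.

```java
import java.util.Arrays;
import java.util.Locale;

class DistanceStats {
    /** Median of the already-sorted slice sorted[from, to); the slice must be non-empty. */
    static double median(int[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return (n % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** "year,count,mean,median,lowerQuartile,upperQuartile", or "year,0,,,," with no stations. */
    static String csvLine(int year, int[] distancesKm) {
        int n = distancesKm.length;
        if (n == 0) {
            return year + ",0,,,,";
        }
        int[] d = distancesKm.clone();
        Arrays.sort(d); // numeric sort on ints, so 20 comes before 117
        double mean = Arrays.stream(d).average().getAsDouble();
        double med = median(d, 0, n);
        // Quartiles as medians of the lower and upper halves, leaving out the middle value when n is odd.
        double lower = (n > 1) ? median(d, 0, n / 2) : med;
        double upper = (n > 1) ? median(d, (n + 1) / 2, n) : med;
        return String.format(Locale.ROOT, "%d,%d,%.1f,%.1f,%.1f,%.1f", year, n, mean, med, lower, upper);
    }
}
```

For instance, `csvLine(1969, new int[]{117, 20, 50, 313})` gives `1969,4,125.0,83.5,35.0,215.0`. `Locale.ROOT` keeps the decimal point a dot, so the line pastes cleanly as CSV.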

Data ahoy!

Sunday, December 6th, 2009

I’ve finally uploaded the early years’ weather data (about 350 MB) to my shell account, which took about three hours this morning. I’ve processed it, so I now have a complete set of averages: about 280000 station-years.

I can use this with the country station list/distances to determine the set of stations in each year – something like

join AfricaStations.txt Average-1969.txt -t',' -o1.3 | sort -n

will get me the distances (3rd field in the first file, so 1.3) for all the stations active in 1969 in order of distance. Then I just need to get the order statistics and graph the result to get the answers for Proposition 01.
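The same join can be sketched in Java, assuming comma-separated files with the station ID in the first field of both and an integer distance in km as the third field of the station list. The explicit numeric sort matters: a plain text sort would put "117" before "50".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StationJoin {
    /** Distances (field 3 of the station list) for stations present in the year's averages, ascending. */
    static List<Integer> activeDistances(List<String> stationLines, List<String> averageLines) {
        // Station IDs active this year (first field of each average line).
        Set<String> active = new HashSet<>();
        for (String line : averageLines) {
            active.add(line.split(",", -1)[0]);
        }
        List<Integer> distances = new ArrayList<>();
        for (String line : stationLines) {
            String[] fields = line.split(",", -1);
            if (fields.length >= 3 && active.contains(fields[0])) {
                distances.add(Integer.parseInt(fields[2].trim()));
            }
        }
        Collections.sort(distances); // numeric order, unlike a plain text sort
        return distances;
    }
}
```

Both lists can be read with `java.nio.file.Files.readAllLines`. Unlike `join`, this approach does not need either file to be pre-sorted on the station ID.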

For Proposition 02, I can use the same station-year average temperature list, sorted by station, to extract and check the warmest years. I think that will need some Perl…
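A hedged sketch of that check in Java. It assumes the station-year averages have already been grouped by station into `{year, meanTemp}` pairs, and reads "warmest ten years as post-1997" as "all ten warmest years fall in 1998 or later". Skipping stations with fewer than ten years is likewise a hypothetical choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class WarmestYears {
    static final int TOP_N = 10;
    static final int CUTOFF_YEAR = 1997; // "post-1997" read as 1998 onwards

    /**
     * byStation maps a station ID to its {year, meanTemp} pairs.
     * Returns the IDs whose TOP_N warmest years all come after CUTOFF_YEAR.
     */
    static List<String> stationsWithRecentWarmest(Map<String, List<double[]>> byStation) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<double[]>> entry : byStation.entrySet()) {
            List<double[]> rows = new ArrayList<>(entry.getValue());
            if (rows.size() < TOP_N) {
                continue; // hypothetical rule: too short a record to have ten warmest years
            }
            // Warmest first.
            rows.sort(Comparator.comparingDouble((double[] r) -> r[1]).reversed());
            boolean allRecent = true;
            for (int i = 0; i < TOP_N; i++) {
                if (rows.get(i)[0] <= CUTOFF_YEAR) {
                    allRecent = false;
                    break;
                }
            }
            if (allRecent) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Because `List.sort` is stable, equal temperatures keep their input order, so ties at the tenth place are broken by whatever order the averages were read in.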

Station locations

Sunday, December 6th, 2009

Excellent news – the six-digit part of the station codes is geographic and the list is sorted. Africa is in the range 600000-689999, so I can just cut out the lines from the middle of the file – same for Europe, etc.
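Because the codes are geographic, picking out a continent comes down to a range test on the leading six digits. A minimal sketch, assuming each line of the station list starts with the six-digit code:

```java
class RegionFilter {
    // Africa's block of six-digit station codes, as given above.
    static final int AFRICA_FIRST = 600000;
    static final int AFRICA_LAST = 689999;

    /** True if the line's leading six-digit code lies in [first, last]. Assumes the code starts the line. */
    static boolean inRange(String line, int first, int last) {
        if (line.length() < 6) {
            return false;
        }
        for (int i = 0; i < 6; i++) {
            char c = line.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        int code = Integer.parseInt(line.substring(0, 6));
        return code >= first && code <= last;
    }
}
```

Since the list is sorted, the matching lines form one contiguous block, which is why cutting them out of the middle of the file works.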

Ocean distance from station

Saturday, December 5th, 2009

Only needs a small change to the map-drawing Java to take the NOAA list of stations and calculate their distances to the nearest ocean. There are 30800 of those so it’ll take a while, but I should then have a map of station ID to distance-to-ocean. Yay!
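One possible shortcut, not necessarily what the modified map code does: if the per-cell distance map is kept as a 180 × 360 grid, each station's distance can be looked up from the 1-degree cell containing it. The grid layout (row 0 = 89N to 90N, column 0 = 180W to 179W) is an assumption, and the result inherits the map's cell-level accuracy.

```java
class StationDistance {
    /** Row of the 1-degree cell holding this latitude: row 0 is 89N to 90N, row 179 is 90S to 89S. */
    static int row(double lat) {
        return Math.min(179, Math.max(0, (int) Math.floor(90.0 - lat)));
    }

    /** Column of the 1-degree cell holding this longitude: column 0 is 180W to 179W. */
    static int col(double lon) {
        return Math.min(359, Math.max(0, (int) Math.floor(lon + 180.0)));
    }

    /** Distance to ocean (km) for a station, read from a precomputed 180 x 360 per-cell grid. */
    static int distanceKm(int[][] cellDistanceKm, double lat, double lon) {
        return cellDistanceKm[row(lat)][col(lon)];
    }
}
```

A lookup like this is constant-time per station, so 30800 stations cost almost nothing once the grid exists.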

Distance to ocean map

Saturday, December 5th, 2009

Well, the DTED collection took rather longer than it should have, since I had to dig around for all the flyspeck islands in the Pacific. I’m still not sure I’ve got them all, but it’ll do for now.

I’ve updated the map-drawing code to calculate Great Circle distances between points, and therefore calculate the nearest ocean to each point of land:

(The image contains the actual values obtained, (red*16) + (green & 15) = distance in km. The red channel alone is probably sufficient given the accuracies.)
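A sketch of that encoding in Java, assuming the distance is a whole number of km: red holds km / 16 and the low four bits of green hold km % 16, so the largest encodable value is 255 × 16 + 15 = 4095 km. Leaving blue at zero and the upper green bits clear are choices made here, not necessarily what the map code does.

```java
class DistanceColour {
    static final int MAX_KM = 255 * 16 + 15; // 4095

    /** Packs a distance into an RGB int: red = km / 16, low nibble of green = km % 16, blue = 0. */
    static int encode(int km) {
        if (km < 0 || km > MAX_KM) {
            throw new IllegalArgumentException("distance out of range: " + km);
        }
        int red = km / 16;
        int green = km % 16;
        return (red << 16) | (green << 8);
    }

    /** Recovers the distance as (red * 16) + (green & 15); any alpha byte is masked off. */
    static int decode(int rgb) {
        int red = (rgb >> 16) & 0xFF;
        int green = (rgb >> 8) & 0xFF;
        return red * 16 + (green & 15);
    }
}
```

Decoding masks off the alpha byte, so it works on the ARGB ints returned by `java.awt.image.BufferedImage.getRGB`. Red alone gives the distance to within 16 km, well inside the ±50km accuracy discussed below.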

This is only rough, as it is working at the level of 1-degree x 1-degree cells, so distances are only accurate to around +/-50km. This can be improved by using the data in the DTED cells, rather than just their existence, and should get to well under 1km errors in the temperate latitudes, but for a quick test it’ll suffice.
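A minimal sketch of the cell-based calculation, using the haversine formula on a sphere of mean radius 6371 km. The post doesn't say which great-circle formula the map code uses, and the grid layout (`land[row][col]` true where a DTED cell exists, row 0 = 89N to 90N, column 0 = 180W to 179W) is an assumption. Distances run between cell centres, which is where the coarse accuracy comes from: one degree of latitude is about 111 km, so half a cell is roughly 55 km.

```java
class NearestOcean {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance in km between two lat/lon points (degrees), via the haversine formula. */
    static double greatCircleKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Distance from the centre of cell (row, col) to the centre of the nearest sea cell
     * (any cell with no DTED file). Returns +Infinity if there is no sea at all.
     */
    static double nearestSeaKm(boolean[][] land, int row, int col) {
        double lat = 89.5 - row;
        double lon = -179.5 + col;
        double best = Double.POSITIVE_INFINITY;
        for (int r = 0; r < land.length; r++) {
            for (int c = 0; c < land[r].length; c++) {
                if (!land[r][c]) {
                    best = Math.min(best, greatCircleKm(lat, lon, 89.5 - r, -179.5 + c));
                }
            }
        }
        return best;
    }
}
```

The brute-force search scans every cell for every land cell, which is fine for a one-off run but slow for the whole globe; using the elevation values inside each DTED cell, as suggested above, would need a finer grid still.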

Data collection

Saturday, December 5th, 2009

Yesterday and this morning I’ve collected a huge number of DTED files from here.

The terrain information is made available for free by the USGS, but is generally provided on physical media as it’s so large – multiple DVDs for level 1 DTED. Of course, there’s no such thing as a free lunch: the files are grouped into 10-degree blocks and then arranged by country, which makes them somewhat awkward to find – or a fun geography quiz, if you prefer!

I’ve also cobbled together a bit of Java to produce a map showing which terrain cells I’ve got. Sadly, this shows I’ve got some more work to do:
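A minimal sketch of such a coverage map, assuming each downloaded DTED file has already been reduced to the integer latitude and longitude of its cell's south-west corner. Cells without a file are treated as sea, which is the land/sea test the plan below relies on; gaps in the `#` pattern over land are the cells still to fetch.

```java
import java.util.List;

class CoverageMap {
    /** Marks each 1-degree cell whose south-west corner {lat, lon} appears in the list. */
    static boolean[][] coverage(List<int[]> southWestCorners) {
        boolean[][] have = new boolean[180][360];
        for (int[] corner : southWestCorners) {
            int lat = corner[0]; // assumed in -90..89
            int lon = corner[1]; // assumed in -180..179
            have[89 - lat][lon + 180] = true; // row 0 is 89N to 90N, column 0 is 180W to 179W
        }
        return have;
    }

    /** Text map, north at the top: '#' for cells we have, '.' otherwise; samples every step-th row and column. */
    static String render(boolean[][] have, int step) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < have.length; r += step) {
            for (int c = 0; c < have[r].length; c += step) {
                sb.append(have[r][c] ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

`render(grid, 1)` prints the full 180 × 360 map; a larger step gives a quicker overview but can skip isolated islands.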

Proposition 01: Thermometer drift

Thursday, December 3rd, 2009

E.M. “Chiefio” Smith raised an interesting point in an article here and, as it seems to be fairly straightforward, I’d like to follow it up.


It would be very interesting to do a map of “distance to ocean over time” for Africa (and for Europe, too…) since I suspect that is the key driver for thermometer changes in both those continents. Europe moving toward the water to gain warmth from the Gulf Stream and inland seas, while Africa moves away from the oceans to avoid moderating winds.


  • Use DTED level 0 terrain data, some of which I already have, to determine which parts of the world are land and which sea.
  • Obtain a list of weather stations on a continent, their lat/long coordinates, and years active.
  • Calculate the distance from each station to the ocean.
  • Graph the mean, median and upper/lower quartiles for the stations over time.

Sounds like fun!