Temperature records

The NOAA data is awkwardly arranged. It’s in (literally) thousands of gzip files, one per year per station, bundled into yearly tar files. That’s about 3 GB compressed, so probably double that once unpacked. My PC chokes on them because the virus checker sees an archive file and decides to look inside, so I’ve had to download them using a shell account on a Linux box. I’ve now got the years 1929-1959 and 1970-1973 on my PC, and 1960-1969 and 1974-2009 on the shell account.

I’ve written a C program (“annual”) to parse the temperature records and calculate the mean (and standard deviation), and a shell script to de-TAR a year’s data to a temp directory and to do

cat $f | gzip -d | tail -n+2 | ./annual $1 >> Average-$1.txt

for each station record ($f). The ‘tail’ call strips off the first line (column headers), and $1 passes through the year given to the shell script. All this results in a comma-separated text file containing station ID, year, mean temperature, number of samples, and the standard deviation.
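A minimal sketch of what that per-year script could look like, wrapped in a function so it is easy to reuse. The function name process_year, the ANNUAL variable, and passing the tar file as a second argument are assumptions made for illustration; only the gzip/tail/annual pipeline and the Average-YEAR.txt output name come from the description above.

```shell
#!/bin/sh
# Path to the averaging program; "./annual" matches the pipeline above,
# the ANNUAL override is an assumption added for flexibility.
ANNUAL=${ANNUAL:-./annual}

# process_year YEAR TARFILE   (hypothetical helper)
# Unpacks TARFILE into a scratch directory, runs every .gz station file
# through the averaging program, and appends results to Average-YEAR.txt.
process_year() {
    year=$1
    tarfile=$2
    tmpdir=$(mktemp -d) || return 1

    if ! tar -xf "$tarfile" -C "$tmpdir"; then
        rm -rf "$tmpdir"
        return 1
    fi

    # One gzip file per station; sort so the output order is repeatable.
    find "$tmpdir" -type f -name '*.gz' | sort | while IFS= read -r f; do
        # tail -n +2 drops the column-header line before parsing.
        gzip -dc "$f" | tail -n +2 | "$ANNUAL" "$year" >> "Average-$year.txt"
    done

    # Remove the unpacked files so a year never takes up disk space twice.
    rm -rf "$tmpdir"
}
```

It would be called as something like `process_year 1960 year-1960.tar`, where the tar file name is only a placeholder. Reading the file list from find rather than a shell glob avoids trouble if a year has more station files than fit on one command line.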

Now just to create another shell script to loop through this lot, and deal with the files on my home PC as well as the ones on the Linux box… If I ‘nice’ everything, hopefully nobody will notice it running all night!
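The looping script could be sketched roughly as follows. The names run_years, YEAR_SCRIPT and do_year.sh are hypothetical stand-ins for the per-year script, which is assumed to be a plain sh script taking the year as its only argument. Each year is run under nice at the lowest priority, and a failed year is logged rather than stopping the whole overnight run.

```shell
#!/bin/sh
# Hypothetical name for the per-year script; override as needed.
YEAR_SCRIPT=${YEAR_SCRIPT:-./do_year.sh}

# run_years YEAR...   (hypothetical helper)
# Runs the per-year script once per year at the lowest CPU priority.
# Returns non-zero if any year failed, after trying all of them.
run_years() {
    failed=0
    for y in "$@"; do
        if ! nice -n 19 sh "$YEAR_SCRIPT" "$y"; then
            echo "year $y failed" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

On the shell account this might be invoked as `run_years $(seq 1960 1969) $(seq 1974 2009)`, assuming seq is available there, with the remaining year ranges run the same way on the home PC.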
