Get daily average weather station data (Global)
A set of Python tools that make it easier to extract weather station data (e.g., temperature, precipitation) from the Global Historical Climatology Network - Daily (GHCN-D).
More information on the data can be found here
The code can be downloaded from the get_station_data github repository
from get_station_data import ghcnd
from get_station_data.util import nearest_stn
%matplotlib inline
stn_md = ghcnd.get_stn_metadata()
Choose a location (lon/lat) and number of nearest neighbours
london_lon_lat = -0.1278, 51.5074
my_stns = nearest_stn(stn_md,
                      london_lon_lat[0], london_lon_lat[1],
                      n_neighbours=5)
my_stns
| | station | lat | lon | elev | name |
|---|---|---|---|---|---|
| 52113 | UKE00105915 | 51.5608 | 0.1789 | 137.0 | HAMPSTEAD |
| 52165 | UKM00003772 | 51.4780 | -0.4610 | 25.3 | HEATHROW |
| 52098 | UKE00105900 | 51.8067 | 0.3581 | 128.0 | ROTHAMSTED |
| 52191 | UKW00035054 | 51.2833 | 0.4000 | 91.1 | WEST MALLING |
| 52131 | UKE00107650 | 51.4789 | 0.4489 | 25.0 | HEATHROW |
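How `nearest_stn` ranks stations internally is not shown here. One common way to do a search like this is to sort stations by great-circle (haversine) distance from the target point. The following is a minimal sketch under that assumption: `haversine_km` and `nearest_stations` are hypothetical helper names, not part of get_station_data, and the coordinates are copied as printed in the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_stations(stations, lon, lat, n_neighbours=5):
    """Return the n_neighbours stations closest to (lon, lat), nearest first."""
    by_distance = sorted(
        stations,
        key=lambda s: haversine_km(lon, lat, s["lon"], s["lat"]),
    )
    return by_distance[:n_neighbours]


# Station coordinates as printed in the table above
stations = [
    {"station": "UKE00105915", "name": "HAMPSTEAD", "lon": 0.1789, "lat": 51.5608},
    {"station": "UKM00003772", "name": "HEATHROW", "lon": -0.4610, "lat": 51.4780},
    {"station": "UKE00105900", "name": "ROTHAMSTED", "lon": 0.3581, "lat": 51.8067},
    {"station": "UKW00035054", "name": "WEST MALLING", "lon": 0.4000, "lat": 51.2833},
    {"station": "UKE00107650", "name": "HEATHROW", "lon": 0.4489, "lat": 51.4789},
]

london_lon, london_lat = -0.1278, 51.5074
closest_two = nearest_stations(stations, london_lon, london_lat, n_neighbours=2)
for stn in closest_two:
    dist = haversine_km(london_lon, london_lat, stn["lon"], stn["lat"])
    print(stn["station"], stn["name"], round(dist, 1), "km")
```

A plain sort is fine for a handful of stations. For the full global station list, a spatial index would likely be a better fit.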
Download and extract data into a pandas DataFrame
df = ghcnd.get_data(my_stns)
df.head()
| | station | year | month | day | element | value | mflag | qflag | sflag | date | lon | lat | elev | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | UKE00105915 | 1959 | 12 | 1 | TMAX | NaN | | | | 1959-12-01 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 1 | UKE00105915 | 1959 | 12 | 2 | TMAX | NaN | | | | 1959-12-02 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 2 | UKE00105915 | 1959 | 12 | 3 | TMAX | NaN | | | | 1959-12-03 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 3 | UKE00105915 | 1959 | 12 | 4 | TMAX | NaN | | | | 1959-12-04 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 4 | UKE00105915 | 1959 | 12 | 5 | TMAX | NaN | | | | 1959-12-05 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
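The columns above mirror the layout of the raw GHCN-D `.dly` files. According to the GHCN-D documentation, each record is a fixed-width line holding a station ID, year, month and element code, followed by 31 groups of value plus three one-character flags (mflag, qflag, sflag), with -9999 marking a missing value. The sketch below parses one such line with the standard library only. It is an illustration of the file format, not necessarily how `ghcnd.get_data` is implemented. `parse_dly_line` is a hypothetical name and the sample record contains made-up values.

```python
MISSING = -9999  # missing-value marker used in .dly files


def parse_dly_line(line):
    """Split one fixed-width .dly record into 31 daily rows (one dict per day)."""
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    rows = []
    for day in range(1, 32):
        start = 21 + (day - 1) * 8  # each day occupies 8 characters
        raw = int(line[start:start + 5])
        rows.append({
            "station": station,
            "year": year,
            "month": month,
            "day": day,
            "element": element,
            # Raw values are stored as integers (e.g. PRCP in tenths of mm)
            "value": None if raw == MISSING else raw,
            "mflag": line[start + 5],
            "qflag": line[start + 6],
            "sflag": line[start + 7],
        })
    return rows


# Made-up sample record: five observed days, the rest missing
daily_values = [25, 15, 10, 8, 0] + [MISSING] * 26
sample_line = "UKE00105915" + "1960" + "01" + "PRCP" + "".join(
    f"{v:5d}  E" for v in daily_values
)

rows = parse_dly_line(sample_line)
print(rows[0])
```

Months shorter than 31 days still carry 31 slots in the file, so a real parser would also drop the days that do not exist in the calendar.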
Filter data for, e.g., a single variable
var = 'PRCP' # precipitation
df = df[ df['element'] == var ]
# Tidy up columns
df = df.rename(index=str, columns={"value": var})
df = df.drop(['element'], axis=1)
df.head()
| | station | year | month | day | PRCP | mflag | qflag | sflag | date | lon | lat | elev | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93 | UKE00105915 | 1960 | 1 | 1 | 2.5 | | | E | 1960-01-01 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 94 | UKE00105915 | 1960 | 1 | 2 | 1.5 | | | E | 1960-01-02 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 95 | UKE00105915 | 1960 | 1 | 3 | 1.0 | | | E | 1960-01-03 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 96 | UKE00105915 | 1960 | 1 | 4 | 0.8 | | | E | 1960-01-04 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
| 97 | UKE00105915 | 1960 | 1 | 5 | 0.0 | | | E | 1960-01-05 | 0.1789 | 51.5608 | 137.0 | HAMPSTEAD |
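The `qflag` column carries the GHCN-D quality flag. In the GHCN-D documentation, a blank flag means the value did not fail any quality assurance check. Before analysing the data, it can be useful to keep only those rows. The sketch below shows one way to do that on a small made-up DataFrame. It assumes flags arrive as strings that may be blank, empty or missing, which may differ from what `ghcnd.get_data` returns.

```python
import pandas as pd

# Made-up observations for illustration; the third row has a non-blank qflag
obs = pd.DataFrame({
    "station": ["UKE00105915"] * 4,
    "date": pd.to_datetime(
        ["1960-01-01", "1960-01-02", "1960-01-03", "1960-01-04"]
    ),
    "PRCP": [2.5, 1.5, 250.0, 0.8],
    "qflag": [" ", "", "X", None],
})

# Treat blank, empty and missing flags alike as "passed quality checks"
passed_qc = obs["qflag"].fillna("").str.strip() == ""
clean = obs[passed_qc]
print(clean)
```

Whether to drop flagged values or only mark them depends on the analysis, so this filter is a starting point and not a fixed rule.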
df.drop(columns=['mflag','qflag','sflag']).tail(n=10)
| | station | year | month | day | PRCP | date | lon | lat | elev | name |
|---|---|---|---|---|---|---|---|---|---|---|
| 83938 | UKE00107650 | 2016 | 12 | 22 | 0.0 | 2016-12-22 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83939 | UKE00107650 | 2016 | 12 | 23 | 1.4 | 2016-12-23 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83940 | UKE00107650 | 2016 | 12 | 24 | 0.0 | 2016-12-24 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83941 | UKE00107650 | 2016 | 12 | 25 | 1.0 | 2016-12-25 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83942 | UKE00107650 | 2016 | 12 | 26 | 0.0 | 2016-12-26 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83943 | UKE00107650 | 2016 | 12 | 27 | 0.0 | 2016-12-27 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83944 | UKE00107650 | 2016 | 12 | 28 | 0.2 | 2016-12-28 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83945 | UKE00107650 | 2016 | 12 | 29 | 0.4 | 2016-12-29 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83946 | UKE00107650 | 2016 | 12 | 30 | 0.0 | 2016-12-30 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
| 83947 | UKE00107650 | 2016 | 12 | 31 | 0.4 | 2016-12-31 | 0.4489 | 51.4789 | 25.0 | HEATHROW |
Save to file
df.to_csv('London_5stns_GHCN-D.csv', index=False)
Plot histogram of all data
df['PRCP'].plot.hist(bins=40)

Plot time series for one station
# Two of the selected stations are named HEATHROW, so select one by station ID
heathrow = df[ df['station'] == 'UKM00003772' ]
heathrow.plot(x='date', y='PRCP')
