Daymet¶
Retrieve data from Daily Surface Weather and Climatological Summaries (Daymet) using daymetr and harmonize it for use in Springtime.
Point versus area¶
daymetr supports different ways of downloading data: either daily data for one grid point at a time, retrieved as csv files, or a raster of daily, monthly, or annual data, downloaded as netcdf files. Which option is more efficient, both in terms of download speed and data size, depends on the number of points and their spatial distribution.
Springtime will choose which option to use based on your settings for points and area:
| points | area | daymetr method | raw_load | load |
|---|---|---|---|---|
| yes | no | points (csv) | geopandas | geopandas |
| yes | yes | raster (nc) | xarray | geopandas (not implemented yet) |
| no | yes | raster (nc) | xarray | geopandas |
| no | no | invalid | - | - |
The raw_load method stays very close to the raw data format on disk. Conversely, the load method standardizes the data format. This means that different pre-processing steps are needed depending on the combination of points and area.
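The dispatch in the table above can be sketched as follows. Note that `daymetr_method` is a hypothetical helper written for illustration, not part of springtime's actual API:

```python
# Hypothetical helper mirroring the dispatch table above; springtime's
# real implementation may differ.
def daymetr_method(points: bool, area: bool) -> str:
    """Return which daymetr download method the table prescribes."""
    if points and not area:
        return "points (csv)"
    if area:
        # With an area, data is always fetched as a raster (netcdf),
        # regardless of whether points are also given.
        return "raster (nc)"
    raise ValueError("need at least one of points or area")

print(daymetr_method(points=True, area=False))  # points (csv)
```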
Here, we'll walk through these different steps, starting with points data.
Loading point data¶
We start with data for a few points and look at the 'raw' data.
from springtime.datasets import Daymet
daymet_points = Daymet(
variables=["tmin", "tmax"],
points=[
[-84.2625, 36.0133],
[-86, 39.6],
[-85, 40],
],
years=[2000, 2002],
)
gdf = daymet_points.raw_load()
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-84.2625_36.0133_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-86.0_39.6_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-85.0_40.0_2000_2002.csv
| | year | yday | dayl (s) | prcp (mm/day) | srad (W/m^2) | swe (kg/m^2) | tmax (deg c) | tmin (deg c) | vp (Pa) | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | 1 | 34571.11 | 0.00 | 316.51 | 1.05 | 17.50 | 2.30 | 720.50 | POINT (-84.26250 36.01330) |
| 1 | 2000 | 2 | 34606.19 | 0.00 | 258.33 | 0.00 | 19.34 | 6.69 | 980.37 | POINT (-84.26250 36.01330) |
| 2 | 2000 | 3 | 34644.13 | 16.00 | 170.64 | 0.00 | 22.19 | 11.40 | 1347.53 | POINT (-84.26250 36.01330) |
| 3 | 2000 | 4 | 34684.93 | 30.74 | 169.79 | 0.00 | 16.64 | 6.60 | 974.48 | POINT (-84.26250 36.01330) |
| 4 | 2000 | 5 | 34728.55 | 0.00 | 168.51 | 0.00 | 4.19 | -2.22 | 518.74 | POINT (-84.26250 36.01330) |
Although the raw_load method attempts to stay true to the raw data, some processing is already happening under the hood.
First of all, the csv files contain metadata and citation information at the top of the file. This is not read into the pandas dataframe. Secondly, we are adding a geometry column, and setting it as index. This makes it possible to concatenate data from multiple points in one go.
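The skipping of the metadata preamble can be illustrated with plain pandas. The csv snippet below is a hypothetical miniature of a Daymet single-pixel file (the real files have a longer preamble), not actual downloaded data:

```python
import io

import pandas as pd

# Hypothetical miniature of a Daymet single-pixel csv: a couple of lines
# of metadata/citation info, then the header and data.
raw = """Latitude: 36.0133 Longitude: -84.2625
Please cite: https://daymet.ornl.gov/
year,yday,tmax (deg c),tmin (deg c)
2000,1,17.50,2.30
2000,2,19.34,6.69
"""

# Skip the metadata block before parsing (2 lines in this toy example).
df = pd.read_csv(io.StringIO(raw), skiprows=2)

# Attach the point's location so frames from multiple points can be
# concatenated; springtime uses a proper geopandas geometry column instead
# of a plain string like this.
df["geometry"] = "POINT (-84.2625 36.0133)"
print(df.head())
```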
In the load method, springtime does a bit more work:
- Extract the units from the column names
- Retain only the variables requested in the dataset definition
gdf = daymet_points.raw_load()
# Store the original column names (with units) as an attribute,
# then strip the units from the column names
gdf.attrs["units"] = gdf.columns.values
gdf.columns = [col.split(" (")[0] for col in gdf.columns]
# Filter columns of interest
columns = ["year", "yday", "geometry"] + list(daymet_points.variables)
gdf = gdf[columns]
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-84.2625_36.0133_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-86.0_39.6_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-85.0_40.0_2000_2002.csv
| | year | yday | geometry | tmin | tmax |
|---|---|---|---|---|---|
| 0 | 2000 | 1 | POINT (-84.26250 36.01330) | 2.30 | 17.50 |
| 1 | 2000 | 2 | POINT (-84.26250 36.01330) | 6.69 | 19.34 |
| 2 | 2000 | 3 | POINT (-84.26250 36.01330) | 11.40 | 22.19 |
| 3 | 2000 | 4 | POINT (-84.26250 36.01330) | 6.60 | 16.64 |
| 4 | 2000 | 5 | POINT (-84.26250 36.01330) | -2.22 | 4.19 |
Resampling¶
The data are already split into year and yday. That's nice, although it makes resampling slightly more laborious. Springtime uses a dedicated resampling function that reconstructs the datetime column under the hood.
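The idea of reconstructing a datetime from year/yday and then resampling can be sketched with plain pandas. The toy frame below mimics the year/yday layout; it is illustrative only, not springtime's internal implementation:

```python
import pandas as pd

# Toy frame in Daymet's year/yday layout (made-up values).
df = pd.DataFrame({
    "year": [2000] * 60,
    "yday": range(1, 61),
    "tmin": [float(i % 10) for i in range(60)],
})

# Reconstruct a datetime from year + day-of-year ("%j" is day-of-year).
df["date"] = pd.to_datetime(df["year"] * 1000 + df["yday"], format="%Y%j")

# Resample daily values to monthly means.
monthly = (
    df.set_index("date")[["tmin"]]
    .resample("MS")
    .mean()
    .reset_index()
)
monthly["month"] = monthly["date"].dt.month
print(monthly[["month", "tmin"]])
```

Days 1 through 60 of 2000 span January and February, so this yields two monthly rows.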
daymet_points._resample_yday(gdf, frequency="month", operator="mean")
| | geometry | year | month | tmin | tmax |
|---|---|---|---|---|---|
| 0 | POINT (-84.26250 36.01330) | 2000 | 1 | -2.488065 | 8.078065 |
| 1 | POINT (-84.26250 36.01330) | 2000 | 2 | 0.278621 | 14.349655 |
| 2 | POINT (-84.26250 36.01330) | 2000 | 3 | 3.736774 | 19.008065 |
| 3 | POINT (-84.26250 36.01330) | 2000 | 4 | 6.470667 | 19.506333 |
| 4 | POINT (-84.26250 36.01330) | 2000 | 5 | 13.895484 | 26.851613 |
| ... | ... | ... | ... | ... | ... |
| 103 | POINT (-85.00000 40.00000) | 2002 | 8 | 17.059677 | 29.115484 |
| 104 | POINT (-85.00000 40.00000) | 2002 | 9 | 13.145667 | 26.978667 |
| 105 | POINT (-85.00000 40.00000) | 2002 | 10 | 5.100645 | 15.505806 |
| 106 | POINT (-85.00000 40.00000) | 2002 | 11 | -0.701000 | 7.670333 |
| 107 | POINT (-85.00000 40.00000) | 2002 | 12 | -5.357097 | 2.525161 |
108 rows × 5 columns
Loading gridded data¶
Now that we've seen how to work with point data, let's continue with gridded data. We want to arrive at a similar data structure as soon as possible, but upon raw_load we already see some notable differences.
from springtime.datasets import Daymet
daymet_area = Daymet(
variables=["tmin", "tmax"],
area={"name": "indianapolis", "bbox": [-86.5, 39.5, -86, 40.1]},
years=[2000, 2002],
frequency="monthly",
)
ds = daymet_area.raw_load()
ds
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2002_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2002_ncss.nc
<xarray.Dataset>
Dimensions:                  (time: 36, y: 70, x: 52)
Coordinates:
  * y                        (y) float32 -159.0 -160.0 -161.0 ... -227.0 -228.0
  * x                        (x) float32 1.095e+03 1.096e+03 ... 1.146e+03
  * time                     (time) datetime64[ns] 2000-01-16T12:00:00 ... 20...
Data variables:
    lat                      (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
    lambert_conformal_conic  (time) int16 -32767 -32767 -32767 ... -32767 -32767
    lon                      (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
    tmax                     (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
    tmin                     (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
Attributes: (12/13)
    start_year:          2000
    source:              Daymet Software Version 4.0
    Version_software:    Daymet Software Version 4.0
    Version_data:        Daymet Data Version 4.0
    Conventions:         CF-1.6
    citation:            Please see http://daymet.ornl.gov/ for current Dayme...
    ...                  ...
    NCO:                 netCDF Operators version 4.9.3 (Homepage = http://nc...
    History:             Translated to CF-1.0 Conventions by Netcdf-Java CDM ...
    geospatial_lat_min:  39.43457395273136
    geospatial_lat_max:  40.166069168093486
    geospatial_lon_min:  -86.62665724948563
    geospatial_lon_max:  -85.85920225420271
For consistency with the points data we need to:

1. Derive geometry from the lat/lon variables and convert to a (geo)dataframe.
2. Find the grid points corresponding to the requested points (if points is given). Notice that latitude and longitude are present, but not as coordinates. Extracting subsets from multi-dimensional coordinates is relatively convenient with geopandas' sjoin_nearest, so we first convert to a geodataframe and then do the subselection.
3. Split the time coordinate into year and yday or month (depending on frequency).

For each of these steps, the Daymet class has builtin methods, such that the load method roughly does the following.
points = [
[-84.2625, 36.0133],
[-86, 39.6],
[-85, 40],
]
# Note: methods with leading underscores are so-called private methods, which
# means they may change without notice. Normally you shouldn't use these methods
# directly. Here we use them for illustration purpose only.
ds = daymet_area.raw_load()
gdf = daymet_area._to_dataframe(ds)
gdf = daymet_area._extract_points(gdf, points)
gdf = daymet_area._split_time(gdf)
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2002_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2002_ncss.nc
| | geometry | tmax | tmin | year | month |
|---|---|---|---|---|---|
| 0 | POINT (-84.26250 36.01330) | 20.942259 | 7.581935 | 2000 | 10 |
| 0 | POINT (-84.26250 36.01330) | -1.892333 | -10.891000 | 2000 | 12 |
| 0 | POINT (-84.26250 36.01330) | 9.474667 | 0.508000 | 2000 | 11 |
| 0 | POINT (-84.26250 36.01330) | 28.108709 | 17.149355 | 2000 | 7 |
| 0 | POINT (-84.26250 36.01330) | 29.532581 | 18.114515 | 2001 | 8 |
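The nearest-grid-point extraction that _extract_points performs can be illustrated in miniature with plain pandas. Springtime itself uses geopandas' sjoin_nearest on geometry columns; the helper below is a hypothetical, non-vectorized stand-in using squared degree distances:

```python
import pandas as pd

# Toy grid flattened to a dataframe: one row per grid cell (made-up values).
grid = pd.DataFrame({
    "lon": [-86.4, -86.2, -86.0],
    "lat": [39.6, 39.8, 40.0],
    "tmin": [1.0, 2.0, 3.0],
})

def nearest_cell(grid: pd.DataFrame, lon: float, lat: float) -> pd.Series:
    """Return the grid row closest to (lon, lat) by squared degree distance.

    geopandas.sjoin_nearest does this properly (and vectorized) on
    geometry columns; this is just the idea in miniature.
    """
    d2 = (grid["lon"] - lon) ** 2 + (grid["lat"] - lat) ** 2
    return grid.loc[d2.idxmin()]

cell = nearest_cell(grid, -86.05, 39.95)
print(cell["tmin"])  # 3.0: the (-86.0, 40.0) cell is closest
```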
At this point we've arrived at a similar data format for point-based and gridded data. But there are a few more steps before we have a completely harmonized dataset.
Stacking columns¶
Daymet has several records for each year/location, but our typical springtime use case requires only one prediction per year/location. Thus, before returning anything, the load method stacks the "yday" or "month" column.
gdf = daymet_area.load()
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2002_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2002_ncss.nc
| | year | geometry | tmax\|1 | tmax\|2 | tmax\|3 | tmax\|4 | tmax\|5 | tmax\|6 | tmax\|7 | tmax\|8 | ... | tmin\|3 | tmin\|4 | tmin\|5 | tmin\|6 | tmin\|7 | tmin\|8 | tmin\|9 | tmin\|10 | tmin\|11 | tmin\|12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | POINT (-86.54761 39.50947) | 2.238387 | 8.702414 | 14.037742 | 17.127333 | 24.050968 | 27.145334 | 27.806129 | 28.029678 | ... | 0.310645 | 3.423000 | 11.999355 | 15.703667 | 16.417097 | 16.311613 | 11.375667 | 6.980968 | -0.096667 | -12.031333 |
| 1 | 2000 | POINT (-86.54565 39.51877) | 2.297742 | 8.734138 | 14.094838 | 17.162001 | 24.059355 | 27.165333 | 27.891291 | 28.060322 | ... | 0.352258 | 3.492000 | 12.059677 | 15.792000 | 16.509033 | 16.387419 | 11.458667 | 7.044838 | -0.051333 | -11.953667 |
| 2 | 2000 | POINT (-86.53560 39.50795) | 2.320323 | 8.759311 | 14.108065 | 17.169666 | 24.069677 | 27.184000 | 27.901291 | 28.064194 | ... | 0.376452 | 3.506000 | 12.078065 | 15.803333 | 16.523226 | 16.396130 | 11.473001 | 7.053871 | -0.032667 | -11.917000 |
| 3 | 2000 | POINT (-86.53364 39.51726) | 2.348710 | 8.768276 | 14.137742 | 17.183666 | 24.069355 | 27.188999 | 27.951612 | 28.078386 | ... | 0.399032 | 3.548667 | 12.110968 | 15.856000 | 16.575483 | 16.439354 | 11.522000 | 7.091290 | -0.007000 | -11.877000 |
| 4 | 2000 | POINT (-86.53168 39.52656) | 2.320645 | 8.739310 | 14.116451 | 17.164667 | 24.054193 | 27.168001 | 27.933870 | 28.061935 | ... | 0.387419 | 3.541333 | 12.100323 | 15.842334 | 16.558065 | 16.428709 | 11.507334 | 7.080323 | -0.019333 | -11.902667 |
5 rows × 26 columns
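The stacking step can be sketched with pandas' pivot_table. The toy frame below is illustrative; springtime's internal implementation may differ, but the resulting wide layout with "variable|month" column names is the same idea:

```python
import pandas as pd

# Long-format toy data: one row per (year, geometry, month), made-up values.
df = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001, 2001],
    "geometry": ["POINT A"] * 6,
    "month": [1, 2, 3, 1, 2, 3],
    "tmax": [5.0, 8.0, 14.0, 4.0, 9.0, 13.0],
})

# Pivot months into columns so each year/location becomes a single row,
# with names like "tmax|1" as in springtime's output.
wide = df.pivot_table(index=["year", "geometry"], columns="month", values="tmax")
wide.columns = [f"tmax|{m}" for m in wide.columns]
wide = wide.reset_index()
print(wide)
```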
PointsFromOther¶
In extracting points above, we've silently assumed that we are interested in an exhaustive list of all combinations of years and points. However, when taking points from other datasets (e.g. NPN), this may not be the case. When joining dataframes, therefore, they are joined on the combinations of year/geometry instead.
# from springtime.datasets import RPPO, Daymet
# from springtime.utils import PointsFromOther, join_dataframes
# import logging
# logging.basicConfig(level=logging.DEBUG)
# # TODO Find example where PPO data is present in the bbox and check that the result is OK.
# indianapolis = {"name": "indianapolis", "bbox": [-86.5, 39.5, -86, 40.1]} # no results
# # https://github.com/ropensci/rppo/pull/22
# ppo = RPPO(years=[2000, 2002], area=indianapolis)
# daymet = Daymet(
# variables=["tmin", "tmax"],
# years=[2000, 2002],
# points=PointsFromOther(source="ppo"),
# area=indianapolis,
# frequency="monthly",
# )
# df_ppo = ppo.load()
# daymet.points.get_points(df_ppo)
# df_daymet = daymet.load()
# print(len(df_ppo))
# print(len(df_daymet))
# join_dataframes([df_ppo, df_daymet])
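The year/geometry join can be sketched with plain pandas. The toy frames below stand in for observations (e.g. NPN) and Daymet predictors; springtime's join_dataframes utility may differ in detail:

```python
import pandas as pd

# Toy observation and predictor frames sharing year/geometry keys
# (made-up values; geometries as strings for simplicity).
obs = pd.DataFrame({
    "year": [2000, 2001],
    "geometry": ["POINT A", "POINT B"],
    "doy": [95, 101],
})
predictors = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "geometry": ["POINT A", "POINT B", "POINT A", "POINT B"],
    "tmin|1": [-2.5, -1.0, -3.0, -0.5],
})

# An inner join on the year/geometry combinations keeps only the
# year/location pairs that actually occur in the observations,
# rather than the full cross-product of years and points.
joined = obs.merge(predictors, on=["year", "geometry"], how="inner")
print(joined)
```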
To recipe¶
Finally, we can also export our dataset to a recipe for easy sharing and reproducibility.
print(daymet_area.to_recipe())
dataset: daymet
years:
- 2000
- 2002
area:
  name: indianapolis
  bbox:
  - -86.5
  - 39.5
  - -86.0
  - 40.1
variables:
- tmin
- tmax
mosaic: na
frequency: monthly