Daymet¶
Retrieve data from Daily Surface Weather and Climatological Summaries (Daymet) using daymetr and harmonize it for use in Springtime.
Point versus area¶
daymetr supports different ways of downloading data: either daily data for one grid point at a time, retrieved as csv files, or a raster of daily, monthly, or annual data, downloaded as netcdf files. Which option is more efficient, both in terms of download speed and data size, depends on the number of points and their spatial distribution.
Springtime will choose which option to use based on your settings for points and area:
| points | area | daymetr method | raw_load | load |
|---|---|---|---|---|
| yes | no | points (csv) | geopandas | geopandas |
| yes | yes | raster (nc) | xarray | geopandas (not implemented yet) |
| no | yes | raster (nc) | xarray | geopandas |
| no | no | invalid | - | - |
The raw_load method stays very close to the raw data format on disk. Conversely, the load method standardizes the data format. This means that different pre-processing steps are needed depending on the combination of points and area.
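The dispatch in the table above can be sketched as follows. Note that `daymetr_method` is a hypothetical helper written for illustration, not part of springtime's actual API:

```python
# Hypothetical helper mirroring the dispatch table above; springtime's
# real implementation may differ.
def daymetr_method(points: bool, area: bool) -> str:
    """Return which daymetr download method the table prescribes."""
    if points and not area:
        return "points (csv)"
    if area:
        # With an area, data is always fetched as a raster (netcdf),
        # regardless of whether points are also given.
        return "raster (nc)"
    raise ValueError("need at least one of points or area")

print(daymetr_method(points=True, area=False))  # points (csv)
```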
Here, we'll walk through these different steps, starting with points data.
Loading point data¶
We start with data for a few points and look at the 'raw' data.
from springtime.datasets import Daymet
daymet_points = Daymet(
variables=["tmin", "tmax"],
points=[
[-84.2625, 36.0133],
[-86, 39.6],
[-85, 40],
],
years=[2000, 2002],
)
gdf = daymet_points.raw_load()
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-84.2625_36.0133_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-86.0_39.6_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-85.0_40.0_2000_2002.csv
| | year | yday | dayl (s) | prcp (mm/day) | srad (W/m^2) | swe (kg/m^2) | tmax (deg c) | tmin (deg c) | vp (Pa) | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | 1 | 34571.11 | 0.00 | 316.51 | 1.05 | 17.50 | 2.30 | 720.50 | POINT (-84.26250 36.01330) |
| 1 | 2000 | 2 | 34606.19 | 0.00 | 258.33 | 0.00 | 19.34 | 6.69 | 980.37 | POINT (-84.26250 36.01330) |
| 2 | 2000 | 3 | 34644.13 | 16.00 | 170.64 | 0.00 | 22.19 | 11.40 | 1347.53 | POINT (-84.26250 36.01330) |
| 3 | 2000 | 4 | 34684.93 | 30.74 | 169.79 | 0.00 | 16.64 | 6.60 | 974.48 | POINT (-84.26250 36.01330) |
| 4 | 2000 | 5 | 34728.55 | 0.00 | 168.51 | 0.00 | 4.19 | -2.22 | 518.74 | POINT (-84.26250 36.01330) |
Although the raw_load method attempts to stay true to the raw data, some processing is already happening under the hood.
First of all, the csv files contain metadata and citation information at the top of the file. This is not read into the pandas dataframe. Secondly, we are adding a geometry column, and setting it as index. This makes it possible to concatenate data from multiple points in one go.
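The skipping of the metadata preamble can be illustrated with plain pandas. The csv snippet below is a hypothetical miniature of a Daymet single-pixel file (the real files have a longer preamble), not actual downloaded data:

```python
import io

import pandas as pd

# Hypothetical miniature of a Daymet single-pixel csv: a couple of lines
# of metadata/citation info, then the header and data.
raw = """Latitude: 36.0133 Longitude: -84.2625
Please cite: https://daymet.ornl.gov/
year,yday,tmax (deg c),tmin (deg c)
2000,1,17.50,2.30
2000,2,19.34,6.69
"""

# Skip the metadata block before parsing (2 lines in this toy example).
df = pd.read_csv(io.StringIO(raw), skiprows=2)

# Attach the point's location so frames from multiple points can be
# concatenated; springtime uses a proper geopandas geometry column instead
# of a plain string like this.
df["geometry"] = "POINT (-84.2625 36.0133)"
print(df.head())
```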
In the load method, springtime does a bit more work:
- Extract the units from the column names
- Retain only the variables requested in the dataset definition
gdf = daymet_points.raw_load()
# Store the original column names (with units) as an attribute,
# then strip the units from the column names
gdf.attrs["units"] = gdf.columns.values
gdf.columns = [col.split(" (")[0] for col in gdf.columns]
# Filter columns of interest
columns = ["year", "yday", "geometry"] + list(daymet_points.variables)
gdf = gdf[columns]
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-84.2625_36.0133_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-86.0_39.6_2000_2002.csv
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/daymet_-85.0_40.0_2000_2002.csv
| | year | yday | geometry | tmin | tmax |
|---|---|---|---|---|---|
| 0 | 2000 | 1 | POINT (-84.26250 36.01330) | 2.30 | 17.50 |
| 1 | 2000 | 2 | POINT (-84.26250 36.01330) | 6.69 | 19.34 |
| 2 | 2000 | 3 | POINT (-84.26250 36.01330) | 11.40 | 22.19 |
| 3 | 2000 | 4 | POINT (-84.26250 36.01330) | 6.60 | 16.64 |
| 4 | 2000 | 5 | POINT (-84.26250 36.01330) | -2.22 | 4.19 |
Resampling¶
The data are already split into year and yday. That's nice, although it makes resampling slightly more laborious. Springtime uses a dedicated resampling function that reconstructs the datetime column under the hood.
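The idea of reconstructing a datetime from year/yday and then resampling can be sketched with plain pandas. The toy frame below mimics the year/yday layout; it is illustrative only, not springtime's internal implementation:

```python
import pandas as pd

# Toy frame in Daymet's year/yday layout (made-up values).
df = pd.DataFrame({
    "year": [2000] * 60,
    "yday": range(1, 61),
    "tmin": [float(i % 10) for i in range(60)],
})

# Reconstruct a datetime from year + day-of-year ("%j" is day-of-year).
df["date"] = pd.to_datetime(df["year"] * 1000 + df["yday"], format="%Y%j")

# Resample daily values to monthly means.
monthly = (
    df.set_index("date")[["tmin"]]
    .resample("MS")
    .mean()
    .reset_index()
)
monthly["month"] = monthly["date"].dt.month
print(monthly[["month", "tmin"]])
```

Days 1 through 60 of 2000 span January and February, so this yields two monthly rows.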
daymet_points._resample_yday(gdf, frequency="month", operator="mean")
| | geometry | year | month | tmin | tmax |
|---|---|---|---|---|---|
| 0 | POINT (-84.26250 36.01330) | 2000 | 1 | -2.488065 | 8.078065 |
| 1 | POINT (-84.26250 36.01330) | 2000 | 2 | 0.278621 | 14.349655 |
| 2 | POINT (-84.26250 36.01330) | 2000 | 3 | 3.736774 | 19.008065 |
| 3 | POINT (-84.26250 36.01330) | 2000 | 4 | 6.470667 | 19.506333 |
| 4 | POINT (-84.26250 36.01330) | 2000 | 5 | 13.895484 | 26.851613 |
| ... | ... | ... | ... | ... | ... |
| 103 | POINT (-85.00000 40.00000) | 2002 | 8 | 17.059677 | 29.115484 |
| 104 | POINT (-85.00000 40.00000) | 2002 | 9 | 13.145667 | 26.978667 |
| 105 | POINT (-85.00000 40.00000) | 2002 | 10 | 5.100645 | 15.505806 |
| 106 | POINT (-85.00000 40.00000) | 2002 | 11 | -0.701000 | 7.670333 |
| 107 | POINT (-85.00000 40.00000) | 2002 | 12 | -5.357097 | 2.525161 |
108 rows × 5 columns
Loading gridded data¶
Now that we've seen how to work with point data, let's continue with gridded data. We want to arrive at a similar data structure as soon as possible, but upon raw_load we already see some notable differences.
from springtime.datasets import Daymet
daymet_area = Daymet(
variables=["tmin", "tmax"],
area={"name": "indianapolis", "bbox": [-86.5, 39.5, -86, 40.1]},
years=[2000, 2002],
frequency="monthly",
)
ds = daymet_area.raw_load()
ds
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2002_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2002_ncss.nc
<xarray.Dataset>
Dimensions:                  (time: 36, y: 70, x: 52)
Coordinates:
  * y                        (y) float32 -159.0 -160.0 -161.0 ... -227.0 -228.0
  * x                        (x) float32 1.095e+03 1.096e+03 ... 1.146e+03
  * time                     (time) datetime64[ns] 2000-01-16T12:00:00 ... 20...
Data variables:
    lat                      (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
    lambert_conformal_conic  (time) int16 -32767 -32767 -32767 ... -32767 -32767
    lon                      (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
    tmax                     (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
    tmin                     (time, y, x) float32 dask.array<chunksize=(12, 70, 52), meta=np.ndarray>
Attributes: (12/13)
    start_year:          2000
    source:              Daymet Software Version 4.0
    Version_software:    Daymet Software Version 4.0
    Version_data:        Daymet Data Version 4.0
    Conventions:         CF-1.6
    citation:            Please see http://daymet.ornl.gov/ for current Dayme...
    ...                  ...
    NCO:                 netCDF Operators version 4.9.3 (Homepage = http://nc...
    History:             Translated to CF-1.0 Conventions by Netcdf-Java CDM ...
    geospatial_lat_min:  39.43457395273136
    geospatial_lat_max:  40.166069168093486
    geospatial_lon_min:  -86.62665724948563
    geospatial_lon_max:  -85.85920225420271
For consistency with the points data we need to:

1. Derive geometry from the lat/lon variables and convert to a (geo)dataframe.
2. Find the grid points corresponding to the requested points (if points is given). Notice that latitude and longitude are present, but not as coordinates. Extracting subsets from multi-dimensional coordinates is relatively convenient with geopandas' sjoin_nearest, so we first convert to a geodataframe and then do the subselection.
3. Split the time coordinate into year and yday or month (depending on frequency).

For each of these steps, the Daymet class has builtin methods, such that the load method roughly does the following.
points = [
[-84.2625, 36.0133],
[-86, 39.6],
[-85, 40],
]
# Note: methods with leading underscores are so-called private methods, which
# means they may change without notice. Normally you shouldn't use these methods
# directly. Here we use them for illustration purpose only.
ds = daymet_area.raw_load()
gdf = daymet_area._to_dataframe(ds)
gdf = daymet_area._extract_points(gdf, points)
gdf = daymet_area._split_time(gdf)
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2002_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2002_ncss.nc
| | geometry | tmax | tmin | year | month |
|---|---|---|---|---|---|
| 0 | POINT (-84.26250 36.01330) | 20.942259 | 7.581935 | 2000 | 10 |
| 0 | POINT (-84.26250 36.01330) | -1.892333 | -10.891000 | 2000 | 12 |
| 0 | POINT (-84.26250 36.01330) | 9.474667 | 0.508000 | 2000 | 11 |
| 0 | POINT (-84.26250 36.01330) | 28.108709 | 17.149355 | 2000 | 7 |
| 0 | POINT (-84.26250 36.01330) | 29.532581 | 18.114515 | 2001 | 8 |
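The nearest-grid-point extraction that _extract_points performs can be illustrated in miniature with plain pandas. Springtime itself uses geopandas' sjoin_nearest on geometry columns; the helper below is a hypothetical, non-vectorized stand-in using squared degree distances:

```python
import pandas as pd

# Toy grid flattened to a dataframe: one row per grid cell (made-up values).
grid = pd.DataFrame({
    "lon": [-86.4, -86.2, -86.0],
    "lat": [39.6, 39.8, 40.0],
    "tmin": [1.0, 2.0, 3.0],
})

def nearest_cell(grid: pd.DataFrame, lon: float, lat: float) -> pd.Series:
    """Return the grid row closest to (lon, lat) by squared degree distance.

    geopandas.sjoin_nearest does this properly (and vectorized) on
    geometry columns; this is just the idea in miniature.
    """
    d2 = (grid["lon"] - lon) ** 2 + (grid["lat"] - lat) ** 2
    return grid.loc[d2.idxmin()]

cell = nearest_cell(grid, -86.05, 39.95)
print(cell["tmin"])  # 3.0: the (-86.0, 40.0) cell is closest
```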
At this point we've arrived at a similar data format for point-based and gridded data. But there are a few more steps before we have a completely harmonized dataset.
Stacking columns¶
Daymet has several records for each year/location, but our typical springtime use case requires only one prediction per year/location. Thus, before returning anything, the load method stacks the "yday" or "month" column.
gdf = daymet_area.load()
gdf.head()
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmin_monavg_2002_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2000_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2001_ncss.nc
INFO:springtime.datasets.daymet:Found /home/peter/.cache/springtime/daymet/bbox_indianapolis_monthly/tmax_monavg_2002_ncss.nc
| | year | geometry | tmax\|1 | tmax\|2 | tmax\|3 | tmax\|4 | tmax\|5 | tmax\|6 | tmax\|7 | tmax\|8 | ... | tmin\|3 | tmin\|4 | tmin\|5 | tmin\|6 | tmin\|7 | tmin\|8 | tmin\|9 | tmin\|10 | tmin\|11 | tmin\|12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | POINT (-86.54761 39.50947) | 2.238387 | 8.702414 | 14.037742 | 17.127333 | 24.050968 | 27.145334 | 27.806129 | 28.029678 | ... | 0.310645 | 3.423000 | 11.999355 | 15.703667 | 16.417097 | 16.311613 | 11.375667 | 6.980968 | -0.096667 | -12.031333 |
| 1 | 2000 | POINT (-86.54565 39.51877) | 2.297742 | 8.734138 | 14.094838 | 17.162001 | 24.059355 | 27.165333 | 27.891291 | 28.060322 | ... | 0.352258 | 3.492000 | 12.059677 | 15.792000 | 16.509033 | 16.387419 | 11.458667 | 7.044838 | -0.051333 | -11.953667 |
| 2 | 2000 | POINT (-86.53560 39.50795) | 2.320323 | 8.759311 | 14.108065 | 17.169666 | 24.069677 | 27.184000 | 27.901291 | 28.064194 | ... | 0.376452 | 3.506000 | 12.078065 | 15.803333 | 16.523226 | 16.396130 | 11.473001 | 7.053871 | -0.032667 | -11.917000 |
| 3 | 2000 | POINT (-86.53364 39.51726) | 2.348710 | 8.768276 | 14.137742 | 17.183666 | 24.069355 | 27.188999 | 27.951612 | 28.078386 | ... | 0.399032 | 3.548667 | 12.110968 | 15.856000 | 16.575483 | 16.439354 | 11.522000 | 7.091290 | -0.007000 | -11.877000 |
| 4 | 2000 | POINT (-86.53168 39.52656) | 2.320645 | 8.739310 | 14.116451 | 17.164667 | 24.054193 | 27.168001 | 27.933870 | 28.061935 | ... | 0.387419 | 3.541333 | 12.100323 | 15.842334 | 16.558065 | 16.428709 | 11.507334 | 7.080323 | -0.019333 | -11.902667 |
5 rows × 26 columns
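The stacking step can be sketched with pandas' pivot_table. The toy frame below is illustrative; springtime's internal implementation may differ, but the resulting wide layout with "variable|month" column names is the same idea:

```python
import pandas as pd

# Long-format toy data: one row per (year, geometry, month), made-up values.
df = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001, 2001],
    "geometry": ["POINT A"] * 6,
    "month": [1, 2, 3, 1, 2, 3],
    "tmax": [5.0, 8.0, 14.0, 4.0, 9.0, 13.0],
})

# Pivot months into columns so each year/location becomes a single row,
# with names like "tmax|1" as in springtime's output.
wide = df.pivot_table(index=["year", "geometry"], columns="month", values="tmax")
wide.columns = [f"tmax|{m}" for m in wide.columns]
wide = wide.reset_index()
print(wide)
```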
PointsFromOther¶
In extracting points above, we've silently assumed that we are interested in an exhaustive list of all combinations of years and points. However, when taking points from other datasets (e.g. NPN), this may not be the case. When joining dataframes, therefore, they are joined on the combinations of year/geometry instead.
# from springtime.datasets import RPPO, Daymet
# from springtime.utils import PointsFromOther, join_dataframes
# import logging
# logging.basicConfig(level=logging.DEBUG)
# # TODO Find example where PPO data is present in the bbox and check that the result is OK.
# indianapolis = {"name": "indianapolis", "bbox": [-86.5, 39.5, -86, 40.1]} # no results
# # https://github.com/ropensci/rppo/pull/22
# ppo = RPPO(years=[2000, 2002], area=indianapolis)
# daymet = Daymet(
# variables=["tmin", "tmax"],
# years=[2000, 2002],
# points=PointsFromOther(source="ppo"),
# area=indianapolis,
# frequency="monthly",
# )
# df_ppo = ppo.load()
# daymet.points.get_points(df_ppo)
# df_daymet = daymet.load()
# print(len(df_ppo))
# print(len(df_daymet))
# join_dataframes([df_ppo, df_daymet])
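The year/geometry join can be sketched with plain pandas. The toy frames below stand in for observations (e.g. NPN) and Daymet predictors; springtime's join_dataframes utility may differ in detail:

```python
import pandas as pd

# Toy observation and predictor frames sharing year/geometry keys
# (made-up values; geometries as strings for simplicity).
obs = pd.DataFrame({
    "year": [2000, 2001],
    "geometry": ["POINT A", "POINT B"],
    "doy": [95, 101],
})
predictors = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "geometry": ["POINT A", "POINT B", "POINT A", "POINT B"],
    "tmin|1": [-2.5, -1.0, -3.0, -0.5],
})

# An inner join on the year/geometry combinations keeps only the
# year/location pairs that actually occur in the observations,
# rather than the full cross-product of years and points.
joined = obs.merge(predictors, on=["year", "geometry"], how="inner")
print(joined)
```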
To recipe¶
Finally, we can also export our dataset to a recipe for easy sharing and reproducibility.
print(daymet_area.to_recipe())
dataset: daymet
years:
- 2000
- 2002
area:
  name: indianapolis
  bbox:
  - -86.5
  - 39.5
  - -86.0
  - 40.1
variables:
- tmin
- tmax
mosaic: na
frequency: monthly