utils
Points = Sequence[Point] | PointsFromOther
module-attribute
Points can be a list of (lon, lat) tuples or a PointsFromOther object.
Point
PointsFromOther
NamedArea
NamedIdentifiers
YearRange
Bases: NamedTuple
Date range in years.
Example:
>>> YearRange(2000, 2005)
YearRange(start=2000, end=2005)
>>> YearRange(start=2000, end=2005).range
range(2000, 2006)
>>> YearRange(2000, 2000)
YearRange(start=2000, end=2000)
end: PositiveInt
instance-attribute
The end year is inclusive.
range: range
property
Return the range of years.
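The inclusive end year can be illustrated with a minimal stand-in for the class. This is a sketch, not the library's implementation: it assumes plain int fields where the documented class uses PositiveInt, and the name of the start field is taken from the doctest output above.

```python
from typing import NamedTuple


class YearRange(NamedTuple):
    """Stand-in for the documented class; plain int replaces PositiveInt."""

    start: int
    end: int  # inclusive

    @property
    def range(self) -> range:
        # range() excludes its stop value, so add 1 to include the end year.
        return range(self.start, self.end + 1)


years = YearRange(2000, 2005)
print(list(years.range))  # [2000, 2001, 2002, 2003, 2004, 2005]
```

A single-year range such as YearRange(2000, 2000) therefore still yields one year.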
TimeoutError
Bases: Exception
Timeout Exception.
Raised when a function decorated with :func:retry takes too long to run.
ResampleConfig
Bases: BaseModel
frequency: str = 'month'
class-attribute
instance-attribute
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html for allowed values.
operator: str = 'mean'
class-attribute
instance-attribute
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html and https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#built-in-aggregation-methods for allowed values.
retry(timeout=10, max_tries=3, delay=1, backoff=2)
Decorator to retry a function with a timeout.
The decorator will call the function up to max_tries times if it raises an exception.
Note
Decorating a function that calls rpy2 does not work. Use :func:run_r_script instead.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | int | Maximum number of seconds the function may take. | 10 |
max_tries | int | Maximum number of times to execute the function. | 3 |
delay | int | Sleep this many seconds * backoff * try number after a failure. | 1 |
backoff | int | Multiply delay by this factor after each failure. | 2 |
Raises:
Type | Description |
---|---|
TimeoutError | When max_tries is reached and the last try timed out. |
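The retry-and-wait behaviour described by the parameters can be sketched as follows. This is a hypothetical simplification, not the library's code: retry_sketch and its sleep argument are invented for illustration, timeout enforcement is left out entirely, and the wait is assumed to be delay * backoff * try number, as the delay description states.

```python
import time
from functools import wraps


def retry_sketch(max_tries=3, delay=1, backoff=2, sleep=time.sleep):
    """Hypothetical simplification of retry: no timeout enforcement."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_tries:
                        raise  # out of tries: propagate the last error
                    # Assumed wait: delay * backoff * try number.
                    sleep(delay * backoff * attempt)

        return wrapper

    return decorator


waits = []  # record the waits instead of sleeping, to keep the demo fast
calls = {"count": 0}


@retry_sketch(max_tries=3, delay=1, backoff=2, sleep=waits.append)
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("transient failure")
    return "ok"


result = flaky()
print(result, waits)  # ok [2, 4]
```

Passing the sleep function as an argument is only a convenience for the demo; it lets the waits be inspected without actually pausing.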
run_r_script(script, timeout=30, max_tries=3)
transponse_df(df, index=('year', 'geometry'), columns=('doy'))
Ensure features are in columns, not in rows.
ML records are characterized by a unique combination of location and year. Predictor variables like (daily/monthly) temperature may have multiple values for the same location and year.
This function reorganizes the data such that multiple predictor values for the same records occur in separate columns.
For example:
year geometry doy temperature
0 2000 POINT (1.00000 1.00000) 1 14
1 2000 POINT (1.00000 1.00000) 2 15
becomes
year geometry temperature_1 temperature_2
0 2000 POINT (1.00000 1.00000) 14 15
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | | The raw data in "long form". | required |
index | | Columns to use as unique record identifiers. | ('year', 'geometry') |
columns | | Columns that contain the index for the repeated predictors. | ('doy') |
Returns:
Type | Description |
---|---|
| | "Wide form" data frame with year and geometry column and columns named |
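The long-to-wide reshaping shown in the example can be approximated with a plain pandas pivot. This is a sketch under assumptions, not the function's actual implementation: geometries are replaced by plain strings to avoid a geopandas dependency, and the temperature_1 naming is copied from the example output above.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "year": [2000, 2000],
        # Plain strings stand in for real geometry objects.
        "geometry": ["POINT (1 1)", "POINT (1 1)"],
        "doy": [1, 2],
        "temperature": [14, 15],
    }
)

# One row per (year, geometry); one column per (variable, doy) pair.
wide = long_df.pivot(index=["year", "geometry"], columns="doy")
# Flatten the two-level column index into names like "temperature_1".
wide.columns = [f"{variable}_{doy}" for variable, doy in wide.columns]
wide = wide.reset_index()
print(wide)
```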
rolling_mean(df, over, groupby=('year', 'geometry'), window_sizes=(3, 7, 15, 30, 90, 365))
Group by the groupby columns and calculate the rolling mean of the over columns for each of the given window sizes.
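A grouped rolling mean of this kind can be sketched in plain pandas. The snippet is illustrative only: the output column names (temperature_2, temperature_3) are an assumption, small window sizes are used so the result is easy to check by hand, and strings stand in for geometries.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "year": [2000] * 4,
        "geometry": ["POINT (1 1)"] * 4,
        "temperature": [10.0, 12.0, 14.0, 16.0],
    }
)

groupby = ["year", "geometry"]
over = ["temperature"]
window_sizes = (2, 3)

result = df.copy()
for column in over:
    for window in window_sizes:
        # transform keeps the original row order and index; the first
        # (window - 1) rows of each group have no full window and become NaN.
        result[f"{column}_{window}"] = df.groupby(groupby)[column].transform(
            lambda series, w=window: series.rolling(w).mean()
        )
print(result)
```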
resample(df, freq='month', operator='mean', column='datetime')
Resample data on year, geometry, and given frequency.
Options for freq (properties of df.time.dt):
- 'month'
- 'week'
- 'day'
- 'dayofyear'
- ...
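Resampling on year, geometry, and a named frequency can be approximated with a pandas groupby. This is a hedged sketch rather than the function's code: it assumes the frequency name is looked up as a property of the .dt accessor, as the option list suggests, and it uses strings in place of geometries.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2000-01-01", "2000-01-15", "2000-02-01", "2000-02-20"]
        ),
        "geometry": ["POINT (1 1)"] * 4,
        "temperature": [10.0, 12.0, 20.0, 24.0],
    }
)

freq = "month"
operator = "mean"

# Group keys: the year, the location, and the requested .dt property.
keys = [
    df["datetime"].dt.year.rename("year"),
    df["geometry"],
    getattr(df["datetime"].dt, freq).rename(freq),
]
resampled = df.groupby(keys)["temperature"].agg(operator).reset_index()
print(resampled)
```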
points_from_cube(ds, points, xdim='lon', ydim='lat')
From a cube, extract the values at the given points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ds | Dataset | Xarray dataset with latitude and longitude dimensions. | required |
points | Points | List of points as (lon, lat) tuples. | required |
Returns:
Type | Description |
---|---|
GeoDataFrame | Dataframe with columns for each point and each variable in the dataset. The points are in the geometry column. |
get_points_from_raster(points, ds)
Extract points from area.
extract_points(ds, points, method='nearest')
Extract list of points from gridded dataset.
extract_records(ds, records)
Extract list of year/geometry records from gridded dataset.
join_dataframes(dfs, index_cols=['year', 'geometry'])
Join dataframes by index cols.
Assumes each incoming data frame is a geopandas data frame with geometry as a regular column, not as the index.
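Joining on shared key columns can be sketched with plain pandas. The snippet below is an assumption-laden illustration, not the function's implementation: it uses ordinary data frames with string geometries, and the outer join is a guess at the intended behaviour.

```python
import pandas as pd

index_cols = ["year", "geometry"]

temperature = pd.DataFrame(
    {
        "year": [2000, 2001],
        "geometry": ["POINT (1 1)", "POINT (1 1)"],
        "temperature": [14, 15],
    }
)
rain = pd.DataFrame(
    {
        "year": [2000, 2001],
        "geometry": ["POINT (1 1)", "POINT (1 1)"],
        "rain": [2, 3],
    }
)

# Move the key columns into the index, align the frames on them,
# then restore the keys as regular columns.
joined = pd.concat(
    [df.set_index(index_cols) for df in (temperature, rain)],
    axis=1,
    join="outer",  # assumption: keep records present in any input
).reset_index()
print(joined)
```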
split_time(ds, freq='daily')
Split datetime coordinate into year and dayofyear or month.