utils

`Points = Sequence[Point] | PointsFromOther` `module-attribute`

Points can be a list of (lon, lat) tuples or a PointsFromOther object.

`Point`

Bases: NamedTuple

Single point with x and y coordinates.

`PointsFromOther`

Bases: BaseModel

Points from another dataset.

Attributes:

Name	Type	Description
`source`	`str`	Name of dataset to get points from.

`NamedArea`

Bases: BaseModel

Named area with bounding box.

`NamedIdentifiers`

Bases: BaseModel

List of identifiers with a name.

`YearRange`

Bases: NamedTuple

Date range in years.

Example:

>>> YearRange(2000, 2005)
YearRange(start=2000, end=2005)
>>> YearRange(start=2000, end=2005).range
range(2000, 2006)
>>> YearRange(2000, 2000)
YearRange(start=2000, end=2000)

`end: PositiveInt` `instance-attribute`

The end year is inclusive.

`range: range` `property`

Return the range of years.
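The inclusive end year can be illustrated with a minimal stand-in. This is only a sketch: `YearRangeSketch` is a hypothetical name, and plain `int` is used in place of `PositiveInt`, so no validation takes place.

```python
from typing import NamedTuple


class YearRangeSketch(NamedTuple):
    """Hypothetical stand-in for YearRange; plain int replaces PositiveInt."""

    start: int
    end: int  # inclusive

    @property
    def range(self):
        # Python's range() excludes its stop value, so add 1 to include `end`.
        return range(self.start, self.end + 1)


years = YearRangeSketch(2000, 2005)
print(list(years.range))  # [2000, 2001, 2002, 2003, 2004, 2005]
```

A range whose start equals its end, such as `YearRangeSketch(2000, 2000)`, therefore still covers one year.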

`TimeoutError`

Bases: Exception

Timeout Exception.

Raised when a function decorated with `retry` takes too long to run.

`ResampleConfig`

Bases: BaseModel

`frequency: str = 'month'` `class-attribute` `instance-attribute`

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html for allowed values.

`operator: str = 'mean'` `class-attribute` `instance-attribute`

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html and https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#built-in-aggregation-methods for allowed values.

`retry(timeout=10, max_tries=3, delay=1, backoff=2)`

Decorator to retry function with timeout.

The decorator will call the function up to max_tries times if it raises an exception.

Note

Decorating a function that calls rpy2 does not work. Use `run_r_script` instead.

Parameters:

Name	Type	Description	Default
`timeout`	`int`	Maximum number of seconds the function may take.	`10`
`max_tries`	`int`	Maximum number of times to execute the function.	`3`
`delay`	`int`	Sleep this many seconds * backoff * try number after failure.	`1`
`backoff`	`int`	Multiply delay by this factor after each failure.	`2`

Raises:

Type	Description
`TimeoutError`	When max tries is reached and the last try timed out.
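The retry-with-delay behaviour described above might look roughly like the following. This is a sketch under assumptions, not the library's implementation: `retry_sketch` is a hypothetical name, the sleep formula is one reading of the `delay` description in the table, and the timeout is deliberately left out because the source does not say how it is enforced.

```python
import functools
import time


def retry_sketch(max_tries=3, delay=1, backoff=2):
    """Hypothetical decorator: retry on any exception, without a timeout."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_tries:
                        raise  # out of tries: surface the last error
                    # Assumed reading of the `delay` parameter description.
                    time.sleep(delay * backoff * attempt)

        return wrapper

    return decorator


calls = []


@retry_sketch(max_tries=3, delay=0)  # delay=0 keeps the demo instant
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("not yet")
    return "ok"


result = flaky()  # fails twice, succeeds on the third try
```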

`run_r_script(script, timeout=30, max_tries=3)`

Run R script with retries and timeout logic.

Parameters:

Name	Type	Description	Default
`script`	`str`	The R script to run.	required
`timeout`	`int`	Maximum number of seconds the function may take.	`30`
`max_tries`	`int`	Maximum number of times to execute the function.	`3`

`transponse_df(df, index=('year', 'geometry'), columns=('doy'))`

Ensure features are in columns, not in rows.

ML records are characterized by a unique combination of location and year. Predictor variables like (daily/monthly) temperature may have multiple values for the same location and year.

This function reorganizes the data such that multiple predictor values for the same records occur in separate columns.

For example:

       year                 geometry  doy   temperature
    0  2000  POINT (1.00000 1.00000)    1             14
    1  2000  POINT (1.00000 1.00000)    2             15

becomes

       year                 geometry  temperature_1  temperature_2
    0  2000  POINT (1.00000 1.00000)             14             15

Parameters:

Name	Description	Default
`df`	The raw data in "long form"	required
`index`	Columns to use as unique record identifiers.	`('year', 'geometry')`
`columns`	Columns that contain the index for the repeated predictors.	`('doy')`

Returns:

Type	Description
	"Wide form" data frame with year and geometry columns and columns named `<original column name>_<doy>`.
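The long-to-wide reshaping in the example above can be approximated with plain pandas. This is a sketch, not the function's actual code: it assumes `DataFrame.pivot` is an acceptable stand-in, and it uses plain strings in place of real geometry objects.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "year": [2000, 2000],
        # Plain strings stand in for geometry objects in this sketch.
        "geometry": ["POINT (1 1)", "POINT (1 1)"],
        "doy": [1, 2],
        "temperature": [14, 15],
    }
)

# One row per (year, geometry); one column per (variable, doy) pair.
wide = long_df.pivot(index=["year", "geometry"], columns="doy")
# Flatten the two-level column index into names like temperature_1.
wide.columns = [f"{name}_{doy}" for name, doy in wide.columns]
wide = wide.reset_index()

print(list(wide.columns))
```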

`rolling_mean(df, over, groupby=('year', 'geometry'), window_sizes=(3, 7, 15, 30, 90, 365))`

Group by groupby columns and calculate rolling mean for over columns with different window sizes.
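As an illustration of a grouped rolling mean with several window sizes, consider the sketch below. The output column names (`temperature_mean_<window>`) and the small windows are assumptions made for the example; the source does not state how the function names its result columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "year": [2000] * 4,
        "geometry": ["A"] * 4,  # plain strings stand in for geometries
        "temperature": [10.0, 12.0, 14.0, 16.0],
    }
)

groupby = ["year", "geometry"]
for window in (2, 3):
    # transform() keeps the original row order and index, so the result
    # can be assigned straight back as a new column.
    df[f"temperature_mean_{window}"] = df.groupby(groupby)["temperature"].transform(
        lambda s: s.rolling(window).mean()
    )
```

The first `window - 1` rows of each group are NaN because the window is not yet full.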

`resample(df, freq='month', operator='mean', column='datetime')`

Resample data on year, geometry, and given frequency.

Options for freq (properties of df.time.dt):

- 'month'
- 'week'
- 'day'
- 'dayofyear'
- ...
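Resampling on year, geometry, and a datetime property can be sketched with a pandas group-by. The code below is an assumption about the general approach, not the function's implementation; the sample data and the `temperature` column are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "year": [2000, 2000, 2000],
        "geometry": ["A", "A", "A"],  # plain strings stand in for geometries
        "datetime": pd.to_datetime(["2000-01-01", "2000-01-15", "2000-02-01"]),
        "temperature": [10.0, 20.0, 30.0],
    }
)

freq = "month"
operator = "mean"

# Look up the requested property on the .dt accessor, e.g. df["datetime"].dt.month.
period = getattr(df["datetime"].dt, freq).rename(freq)
resampled = (
    df.groupby(["year", "geometry", period])["temperature"]
    .agg(operator)
    .reset_index()
)
```

Here the two January values are averaged into one row and the February value forms its own row.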

`points_from_cube(ds, points, xdim='lon', ydim='lat')`

From a cube, extract the values at the given points.

Parameters:

Name	Type	Description	Default
`ds`	`Dataset`	Xarray dataset with latitude and longitude dimensions	required
`points`	`Points`	List of points as (lon, lat) tuples	required

Returns:

Type	Description
`GeoDataFrame`	Dataframe with columns for each point and each variable in the dataset. The points are in the geometry column.

`get_points_from_raster(points, ds)`

Extract points from area.

`extract_points(ds, points, method='nearest')`

Extract list of points from gridded dataset.

`extract_records(ds, records)`

Extract list of year/geometry records from gridded dataset.

`join_dataframes(dfs, index_cols=['year', 'geometry'])`

Join dataframes by index cols.

Assumes each incoming data frame is a geopandas dataframe with geometry as a regular column, not as the index.
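Joining several data frames on shared key columns could be sketched as a chain of merges. This is illustrative only: the outer join is an assumption (the source does not say which join type is used), plain pandas data frames with string geometries stand in for geopandas ones, and the sample data is made up.

```python
from functools import reduce

import pandas as pd

temperature = pd.DataFrame(
    {"year": [2000, 2001], "geometry": ["A", "A"], "temperature": [14, 15]}
)
precipitation = pd.DataFrame(
    {"year": [2000, 2001], "geometry": ["A", "A"], "precipitation": [3, 5]}
)

index_cols = ["year", "geometry"]
# Merge the frames pairwise, left to right, on the shared key columns.
joined = reduce(
    lambda left, right: left.merge(right, on=index_cols, how="outer"),
    [temperature, precipitation],
)
```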

`split_time(ds, freq='daily')`

Split datetime coordinate into year and dayofyear or month.
