utils

Points = Sequence[Point] | PointsFromOther module-attribute

Points can be a list of (lon, lat) tuples or a PointsFromOther object.

Point

Bases: NamedTuple

Single point with x and y coordinates.

PointsFromOther

Bases: BaseModel

Points from another dataset.

Attributes:

Name Type Description
source str

Name of dataset to get points from.

NamedArea

Bases: BaseModel

Named area with bounding box.

NamedIdentifiers

Bases: BaseModel

List of identifiers with a name.

YearRange

Bases: NamedTuple

Date range in years.

Example:

>>> YearRange(2000, 2005)
YearRange(start=2000, end=2005)
>>> YearRange(start=2000, end=2005).range
range(2000, 2006)
>>> YearRange(2000, 2000)
YearRange(start=2000, end=2000)

end: PositiveInt instance-attribute

The end year is inclusive.

range: range property

Return the range of years.
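The inclusive end year can be illustrated with a minimal sketch. `YearRangeSketch` is a hypothetical stand-in written for this example, not the library's class; it uses plain `int` fields instead of `PositiveInt` and only assumes the behaviour shown in the doctest above.

```python
from typing import NamedTuple


class YearRangeSketch(NamedTuple):
    """Hypothetical stand-in for YearRange, for illustration only."""

    start: int
    end: int  # inclusive, as documented above

    @property
    def range(self):
        # Add 1 because Python's range() excludes its stop value.
        return range(self.start, self.end + 1)


years = list(YearRangeSketch(2000, 2005).range)
single = list(YearRangeSketch(2000, 2000).range)
```

Here `years` runs from 2000 through 2005, and a range with equal start and end yields a single year.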

TimeoutError

Bases: Exception

Timeout Exception.

Raised when a function decorated with the retry decorator takes too long to run.

ResampleConfig

Bases: BaseModel

frequency: str = 'month' class-attribute instance-attribute

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html for allowed values.

operator: str = 'mean' class-attribute instance-attribute

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html and https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#built-in-aggregation-methods for allowed values.

retry(timeout=10, max_tries=3, delay=1, backoff=2)

Decorator to retry function with timeout.

The decorator will call the function up to max_tries times if it raises an exception.

Note

Decorating a function that calls rpy2 does not work. Please use the run_r_script function instead.

Parameters:

Name Type Description Default
timeout int

Maximum number of seconds the function may take.

10
max_tries int

Maximum number of times to execute the function.

3
delay int

Sleep this many seconds * backoff * try number after a failure.

1
backoff int

Multiply delay by this factor after each failure.

2

Raises:

Type Description
TimeoutError

When max_tries is reached and the last try timed out.
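The retry-with-backoff idea can be sketched as follows. This is not the library's implementation: `retry_sketch` is a hypothetical name, the per-call timeout is left out entirely, and the sleep schedule (multiply the wait by `backoff` after each failure) is one assumed reading of the parameter descriptions above.

```python
import time
from functools import wraps


def retry_sketch(max_tries=3, delay=1, backoff=2):
    """Hypothetical retry decorator without the timeout part."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Give up and re-raise once the last attempt has failed.
                    if attempt == max_tries:
                        raise
                    time.sleep(wait)
                    wait *= backoff  # assumed schedule: grow the wait each time

        return wrapper

    return decorator


calls = []


@retry_sketch(max_tries=3, delay=0, backoff=2)
def flaky():
    # Fails twice, then succeeds on the third call.
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = flaky()
```

The example function fails twice before succeeding, so the wrapper calls it three times in total.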

run_r_script(script, timeout=30, max_tries=3)

Run R script with retries and timeout logic.

Parameters:

Name Type Description Default
script str

The R script to run

required
timeout int

Maximum number of seconds the function may take.

30
max_tries int

Maximum number of times to execute the function.

3

transponse_df(df, index=('year', 'geometry'), columns=('doy'))

Ensure features are in columns, not in rows.

ML records are characterized by a unique combination of location and year. Predictor variables like (daily/monthly) temperature may have multiple values for the same location and year.

This function reorganizes the data such that multiple predictor values for the same records occur in separate columns.

For example:

       year                 geometry  doy   temperature
    0  2000  POINT (1.00000 1.00000)    1             14
    1  2000  POINT (1.00000 1.00000)    2             15

becomes

       year                 geometry  temperature_1  temperature_2
    0  2000  POINT (1.00000 1.00000)             14             15

Parameters:

Name Type Description Default
df

The raw data in "long form"

required
index

Columns to use as unique record identifiers.

('year', 'geometry')
columns

Columns that contain the index for the repeated predictors.

('doy')

Returns:

Type Description

"Wide form" data frame with year and geometry columns and columns named <original column name>_<doy>.
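The long-to-wide reshaping described above can be approximated with a pandas pivot. The sketch below is an assumption about how such a reshape might be done, not the library's code: `to_wide` is a hypothetical helper, and the geometry is kept as a plain string so the example does not need geopandas.

```python
import pandas as pd


def to_wide(df, index=("year", "geometry"), columns=("doy",)):
    """Hypothetical helper: pivot repeated predictors into separate columns."""
    wide = df.pivot(index=list(index), columns=list(columns))
    # The pivot yields MultiIndex columns such as ("temperature", 1);
    # flatten them to "temperature_1".
    wide.columns = ["_".join(str(part) for part in col) for col in wide.columns]
    return wide.reset_index()


long_df = pd.DataFrame(
    {
        "year": [2000, 2000],
        "geometry": ["POINT (1 1)", "POINT (1 1)"],
        "doy": [1, 2],
        "temperature": [14, 15],
    }
)
wide_df = to_wide(long_df)
```

For the two-row input, `wide_df` has a single row with the columns year, geometry, temperature_1 and temperature_2.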

rolling_mean(df, over, groupby=('year', 'geometry'), window_sizes=(3, 7, 15, 30, 90, 365))

Group by groupby columns and calculate rolling mean for over columns with different window sizes.
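A grouped rolling mean of this kind can be sketched with pandas. `add_rolling_means` and the output column naming (`<column>_mean_<window>`) are hypothetical choices for this illustration; the library's actual output layout is not stated above.

```python
import pandas as pd


def add_rolling_means(df, over, groupby=("year", "geometry"), window_sizes=(3, 7)):
    """Hypothetical helper: add one rolling-mean column per column and window."""
    out = df.copy()
    grouped = out.groupby(list(groupby))
    for window in window_sizes:
        for col in over:
            # transform keeps the original row order and index.
            out[f"{col}_mean_{window}"] = grouped[col].transform(
                lambda s, w=window: s.rolling(w).mean()
            )
    return out


daily = pd.DataFrame(
    {
        "year": [2000] * 4,
        "geometry": ["A"] * 4,
        "temperature": [10.0, 12.0, 14.0, 16.0],
    }
)
smoothed = add_rolling_means(daily, over=["temperature"], window_sizes=(2,))
```

With a window of 2, the first value of each group is NaN because the window is not yet full; the remaining values are means of consecutive pairs.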

resample(df, freq='month', operator='mean', column='datetime')

Resample data on year, geometry, and given frequency.

Options for freq (properties of df.time.dt):

- 'month'
- 'week'
- 'day'
- 'dayofyear'
- ...
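Resampling on year, geometry, and a datetime property can be sketched as a plain pandas group-by. `resample_sketch` is a hypothetical helper and only an assumption about the approach; it derives the year and the chosen property from the `.dt` accessor and aggregates the remaining columns.

```python
import pandas as pd


def resample_sketch(df, freq="month", operator="mean", column="datetime"):
    """Hypothetical helper: aggregate per year, geometry and datetime property."""
    out = df.copy()
    out["year"] = out[column].dt.year
    out[freq] = getattr(out[column].dt, freq)  # e.g. .dt.month or .dt.dayofyear
    out = out.drop(columns=[column])
    return out.groupby(["year", "geometry", freq], as_index=False).agg(operator)


daily = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-01"]),
        "geometry": ["A", "A", "A"],
        "temperature": [10.0, 20.0, 30.0],
    }
)
monthly = resample_sketch(daily, freq="month", operator="mean")
```

The two January rows collapse into one record, so `monthly` holds one row per month.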

points_from_cube(ds, points, xdim='lon', ydim='lat')

From a cube, extract the values at the given points.

Parameters:

Name Type Description Default
ds Dataset

Xarray dataset with latitude and longitude dimensions

required
points Points

List of points as (lon, lat) tuples

required

Returns:

Type Description
GeoDataFrame

Dataframe with columns for each point and each variable in the dataset. The points are in the geometry column.

get_points_from_raster(points, ds)

Extract points from area.

extract_points(ds, points, method='nearest')

Extract list of points from gridded dataset.
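Nearest-neighbour extraction of points from a grid can be sketched with xarray's pointwise indexing. This shows the general technique only, not this function's implementation; the dataset, variable name, and the `points` dimension name are made up for the example.

```python
import numpy as np
import xarray as xr

# Small made-up grid: 2 latitudes x 3 longitudes.
ds = xr.Dataset(
    {"temperature": (("lat", "lon"), np.arange(6.0).reshape(2, 3))},
    coords={"lat": [50.0, 51.0], "lon": [4.0, 5.0, 6.0]},
)

points = [(4.1, 50.2), (5.9, 50.9)]  # (lon, lat) tuples

# Indexers that share a new "points" dimension select one grid cell per
# point instead of the outer product of all lons and lats.
lons = xr.DataArray([p[0] for p in points], dims="points")
lats = xr.DataArray([p[1] for p in points], dims="points")
nearest = ds.sel(lon=lons, lat=lats, method="nearest")

values = nearest["temperature"].values.tolist()
```

Each point picks up the value of the closest grid cell, so `values` has one entry per input point.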

extract_records(ds, records)

Extract list of year/geometry records from gridded dataset.

join_dataframes(dfs, index_cols=['year', 'geometry'])

Join dataframes by index cols.

Assumes each incoming dataframe is a geopandas dataframe with geometry as a regular column, not as the index.
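Joining several frames on shared key columns can be sketched with repeated pandas merges. `join_on_cols` is a hypothetical helper; the outer join is an assumption, since the join type is not stated above, and plain strings stand in for geometries so the example runs without geopandas.

```python
from functools import reduce

import pandas as pd


def join_on_cols(dfs, index_cols=("year", "geometry")):
    """Hypothetical helper: merge all frames on the shared key columns."""
    keys = list(index_cols)
    # Assumption: outer join, so records missing from one frame are kept.
    return reduce(lambda left, right: left.merge(right, on=keys, how="outer"), dfs)


temperature = pd.DataFrame(
    {"year": [2000, 2001], "geometry": ["A", "A"], "temperature": [14.0, 15.0]}
)
precipitation = pd.DataFrame(
    {"year": [2000, 2001], "geometry": ["A", "A"], "precipitation": [1.0, 2.0]}
)
joined = join_on_cols([temperature, precipitation])
```

Both inputs share the same year/geometry records, so `joined` keeps two rows and gains one column per predictor.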

split_time(ds, freq='daily')

Split datetime coordinate into year and dayofyear or month.