PPO
Uses rppo to get observations from the Plant Phenology Ontology (PPO) data portal.
Example use case
The paper introducing the PPO portal suggests the following:
... we examined leafing out dates for the genera Acer (maples) and Quercus (oaks) and flowering dates for the genera Acer and Syringa (lilacs). [...] To estimate leafing out dates, we used all observations of plants with the PPO trait 'true leaves present' that did not also have the trait 'senescing true leaves present', and to estimate flowering dates, we used all observations of plants with the PPO trait 'flowers present' that did not also have the trait 'senesced flowers present'. All geographic locations (i.e., latitude and longitude) were rounded to a 0.1-degree grid, and the data were filtered to only keep the earliest relevant observation for each unique combination of grid cell and year.
Here, we will walk through the steps to do this from scratch with springtime, and finally see how we can do the same thing in one go.
Getting the term IDs
The springtime interface is a thin wrapper around rppo, so the options you can provide are similar to those you can provide directly to the R package.
First, we need to figure out the termIDs for "flowering" and "leafing out". We can use the `ppo_get_terms` function for that.
from springtime.datasets.ppo import ppo_get_terms
terms = ppo_get_terms()
INFO:springtime.datasets.ppo:Downloading terms
terms.query("label.str.contains('true leaves present')")
 | termID | label | definition |
---|---|---|---|
10 | obo:PPO_0002322 | expanding true leaves present | An 'expanding true leaf presence' (PPO:0002024... |
12 | obo:PPO_0002320 | expanding unfolded true leaves present | An 'expanding unfolded true leaf presence' (PP... |
22 | obo:PPO_0002318 | immature unfolded true leaves present | An 'immature unfolded true leaf presence' (PPO... |
25 | obo:PPO_0002319 | mature true leaves present | An 'mature true leaf presence' (PPO:0002021) t... |
41 | obo:PPO_0002316 | non-senescing unfolded true leaves present | A 'non-senescing unfolded true leaf presence' ... |
70 | obo:PPO_0002317 | senescing true leaves present | A 'senescing true leaf presence' (PPO:0002019)... |
71 | obo:PPO_0002313 | true leaves present | A 'true leaf presence' (PPO:0002015) trait tha... |
73 | obo:PPO_0002315 | unfolded true leaves present | An 'unfolded true leaf presence' (PPO:0002017)... |
75 | obo:PPO_0002314 | unfolding true leaves present | An 'unfolding true leaf presence' (PPO:0002016... |
terms.query("label.str.contains('flowers present')")
 | termID | label | definition |
---|---|---|---|
16 | obo:PPO_0002330 | flowers present | A 'flower presence' (PPO:0002032) trait that i... |
39 | obo:PPO_0002331 | non-senesced flowers present | A 'non-senesced flower presence' (PPO:0002033)... |
46 | obo:PPO_0002333 | open flowers present | An 'open flower presence' (PPO:0002035) trait ... |
52 | obo:PPO_0002334 | pollen-releasing flowers present | A 'pollen-releasing flower presence' (PPO:0002... |
68 | obo:PPO_0002335 | senesced flowers present | A 'senesced flower presence' (PPO:0002037) tra... |
80 | obo:PPO_0002332 | unopened flowers present | An 'unopened flower presence' (PPO:0002034) tr... |
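Because a substring search returns several related labels, it can help to look up a single termID by its exact label once you know which one you want. The following is a minimal sketch using a hand-typed excerpt of the tables above rather than the real output of `ppo_get_terms`; the variable names are illustrative only.

```python
import pandas as pd

# Hand-typed excerpt of the terms tables shown above (not downloaded from the portal).
terms = pd.DataFrame(
    {
        "termID": [
            "obo:PPO_0002330",
            "obo:PPO_0002331",
            "obo:PPO_0002335",
            "obo:PPO_0002313",
            "obo:PPO_0002317",
        ],
        "label": [
            "flowers present",
            "non-senesced flowers present",
            "senesced flowers present",
            "true leaves present",
            "senescing true leaves present",
        ],
    }
)

# A substring search matches every label that contains the phrase.
partial = terms[terms["label"].str.contains("flowers present", regex=False)]

# An exact lookup by label returns a single termID.
label_to_id = terms.set_index("label")["termID"]
flowers_present = label_to_id["flowers present"]
senesced_flowers = label_to_id["senesced flowers present"]

print(partial)
print(flowers_present, senesced_flowers)
```

Indexing by label raises a `KeyError` when the label is misspelled, which can be easier to notice than an empty query result.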
Getting the data
Now that we know the relevant termIDs, let's start with a simple dataset definition.
from springtime.datasets import RPPO
leafing_maples = RPPO(genus="Acer", termID="obo:PPO_0002313", years=[1990, 2020])
leafing_oaks = RPPO(genus="Quercus", termID="obo:PPO_0002313", years=[1990, 2020])
flowering_maples = RPPO(genus="Acer", termID="obo:PPO_0002032", years=[1990, 2020])
flowering_lilacs = RPPO(genus="Syringa", termID="obo:PPO_0002032", years=[1990, 2020])
Let's continue by exploring the flowering lilacs.
print(flowering_lilacs)
RPPO(
    dataset='rppo',
    years=YearRange(start=1990, end=2020),
    genus='Syringa',
    termID='obo:PPO_0002032',
    area=None,
    limit=None,
    timeLimit=60,
    exclude_terms=[],
    infer_event=None
)
raw_df = flowering_lilacs.raw_load()
INFO:springtime.datasets.ppo:Locating data...
Found /home/peter/.cache/springtime/PPO/Syringa.obo:PPO_0002032.1990-2020.csv
raw_df
 | dayOfYear | year | genus | specificEpithet | eventRemarks | latitude | longitude | termID | source | eventId |
---|---|---|---|---|---|---|---|---|---|---|
0 | 142 | 2016 | Syringa | vulgaris | End of flowering (lilac/honeysuckle) | 44.930183 | -93.209820 | obo:BFO_0000002,obo:BFO_0000001,obo:PPO_000200... | USA-NPN | urn:phenologicalObservingProcess/7956537 |
1 | 149 | 2016 | Syringa | vulgaris | End of flowering (lilac/honeysuckle) | 44.930183 | -93.209820 | obo:BFO_0000020,obo:PPO_0002037,obo:PPO_000232... | USA-NPN | urn:phenologicalObservingProcess/8021769 |
2 | 149 | 2016 | Syringa | vulgaris | End of flowering (lilac/honeysuckle) | 44.930183 | -93.209820 | obo:BFO_0000020,obo:PPO_0002324,obo:PPO_000232... | USA-NPN | urn:phenologicalObservingProcess/8021774 |
3 | 152 | 2020 | Syringa | vulgaris | End of flowering (lilac/honeysuckle) | 44.930183 | -93.209820 | obo:PATO_0000001,obo:BFO_0000002,obo:BFO_00000... | USA-NPN | urn:phenologicalObservingProcess/22739137 |
4 | 161 | 2020 | Syringa | vulgaris | End of flowering (lilac/honeysuckle) | 44.930183 | -93.209820 | obo:PATO_0000001,obo:PPO_0002323,obo:PPO_00023... | USA-NPN | urn:phenologicalObservingProcess/22808877 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
29374 | 99 | 2010 | Syringa | vulgaris | Open flowers (lilac) | 42.168755 | -88.371340 | obo:PATO_0000001,obo:PPO_0002041,obo:BFO_00000... | USA-NPN | urn:phenologicalObservingProcess/193882 |
29375 | 99 | 2010 | Syringa | vulgaris | Full flowering (lilac) | 42.168755 | -88.371340 | obo:PATO_0000001,obo:PPO_0002041,obo:BFO_00000... | USA-NPN | urn:phenologicalObservingProcess/193883 |
29376 | 115 | 2010 | Syringa | vulgaris | Open flowers (lilac) | 42.162610 | -88.398506 | obo:PPO_0002001,obo:PPO_0002331,obo:PPO_000200... | USA-NPN | urn:phenologicalObservingProcess/204229 |
29377 | 124 | 2010 | Syringa | vulgaris | Open flowers (lilac) | 42.162610 | -88.398506 | obo:PPO_0002025,obo:PPO_0002026,obo:PPO_000203... | USA-NPN | urn:phenologicalObservingProcess/232974 |
29378 | 105 | 2010 | Syringa | vulgaris | Open flowers (lilac) | 42.162610 | -88.398506 | obo:PPO_0002330,obo:PPO_0002331,obo:PPO_000233... | USA-NPN | urn:phenologicalObservingProcess/194357 |
29379 rows × 10 columns
The raw data takes some getting used to. Conveniently, it already has columns for year, dayOfYear, latitude, and longitude; the other columns are less self-explanatory.
Filtering senesced flowers
The most relevant column is `termID`. PPO is a state-based dataset, and the `termID` column contains every state that applies to a given observation, stored as a comma-separated string of term IDs. In our query, we looked for records with termID PPO:0002032 (flower presence), and we can verify that this term is indeed present in all rows.
flowers = raw_df.termID.map(lambda x: "obo:PPO_0002032" in x)
print(f"{flowers.sum()} / {len(raw_df)}")
29379 / 29379
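The check above relies on substring matching against the whole `termID` string. As a more defensive variant, the string can be split into individual IDs so that only exact matches count. This is a sketch on toy rows; `has_term` is a hypothetical helper, not part of springtime or rppo, and the toy term lists are shortened for illustration.

```python
import pandas as pd

# Toy rows mimicking the comma-separated termID column (shortened, illustrative values).
raw = pd.DataFrame(
    {
        "dayOfYear": [99, 105, 149],
        "termID": [
            "obo:PPO_0002032,obo:PPO_0002330",
            "obo:PPO_0002032,obo:PPO_0002330,obo:PPO_0002331",
            "obo:PPO_0002032,obo:PPO_0002335",
        ],
    }
)


def has_term(term_string: str, term: str) -> bool:
    """Return True if `term` is one of the comma-separated IDs (exact match)."""
    return term in {t.strip() for t in term_string.split(",")}


# Boolean masks: which rows list the flower presence term, and which list senesced flowers.
flower_mask = raw["termID"].map(lambda s: has_term(s, "obo:PPO_0002032"))
senesced_mask = raw["termID"].map(lambda s: has_term(s, "obo:PPO_0002335"))

print(f"{flower_mask.sum()} / {len(raw)} rows list flower presence")
print(f"{senesced_mask.sum()} / {len(raw)} rows list senesced flowers")
```

Splitting costs a little more time per row than a plain `in` check, but it cannot accidentally match a longer ID that merely starts with the same characters.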
According to the paper, we need to disregard any observations that also have the trait "senesced flowers present" (obo:PPO_0002335), so we filter those rows out of our data.
# fresh_flowers = raw_df.query("~termID.str.contains('obo:PPO_0002335')")
# equivalent but faster
fresh_flowers = raw_df[~raw_df.termID.map(lambda x: "obo:PPO_0002335" in x)]
print(f"{len(fresh_flowers)} / {len(raw_df)}")
20695 / 29379
Conversion to event-based data
Notice that sometimes the same state may have been observed multiple times in the same year.
fresh_flowers.groupby(["year", "latitude", "longitude"])["dayOfYear"].agg(
["min", "max", "count"]
)
year | latitude | longitude | min | max | count |
---|---|---|---|---|---|
1990 | 30.930000 | -100.120000 | 68 | 77 | 2 |
1990 | 32.650000 | -103.380000 | 88 | 101 | 2 |
1990 | 32.670000 | -116.300000 | 97 | 106 | 2 |
1990 | 32.850000 | -116.620000 | 92 | 102 | 2 |
1990 | 32.930000 | -107.570000 | 96 | 103 | 2 |
... | ... | ... | ... | ... | ... |
2020 | 48.024920 | -122.469190 | 140 | 140 | 2 |
2020 | 48.033490 | -122.598724 | 111 | 134 | 5 |
2020 | 48.919930 | -122.640564 | 130 | 130 | 2 |
2020 | 49.356550 | -124.414300 | 149 | 163 | 3 |
2020 | 52.095947 | -106.574160 | 157 | 157 | 2 |
3213 rows × 3 columns
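Following the procedure outlined in the reference paper, we can convert the data to an event-based dataset by retaining only the first day of year (DOY) on which the state was observed for each year and location. Note that, unlike the paper, we group on the exact coordinates here rather than on a 0.1-degree grid.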
groups = ["year", "latitude", "longitude"]
onset_of_flowers = fresh_flowers.groupby(groups)["dayOfYear"].agg("min")
onset_of_flowers
year  latitude   longitude
1990  30.930000  -100.120000     68
      32.650000  -103.380000     88
      32.670000  -116.300000     97
      32.850000  -116.620000     92
      32.930000  -107.570000     96
                               ...
2020  48.024920  -122.469190    140
      48.033490  -122.598724    111
      48.919930  -122.640564    130
      49.356550  -124.414300    149
      52.095947  -106.574160    157
Name: dayOfYear, Length: 3213, dtype: int64
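The paper quoted at the top additionally rounds all locations to a 0.1-degree grid before keeping the earliest observation per grid cell and year. A minimal sketch of that extra step, assuming plain rounding to one decimal place is an acceptable reading of "0.1-degree grid", could look as follows. The toy coordinates and day-of-year values are made up for illustration.

```python
import pandas as pd

# Toy observations; two of the 2016 records fall into the same 0.1-degree cell.
obs = pd.DataFrame(
    {
        "year": [2016, 2016, 2016, 2017],
        "latitude": [44.930183, 44.9412, 42.168755, 44.930183],
        "longitude": [-93.209820, -93.2301, -88.371340, -93.209820],
        "dayOfYear": [142, 135, 99, 128],
    }
)

# Round coordinates to one decimal place, i.e. snap them to a 0.1-degree grid.
gridded = obs.assign(
    latitude=obs["latitude"].round(1),
    longitude=obs["longitude"].round(1),
)

# Keep the earliest observation for each combination of grid cell and year.
groups = ["year", "latitude", "longitude"]
onset = gridded.groupby(groups)["dayOfYear"].min().reset_index()

print(onset)
```

Grouping on rounded coordinates merges nearby sites, so the number of resulting events will generally be lower than when grouping on exact coordinates.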
Final tweaks
The last step to make the PPO data fully springtime-compatible is to convert it to a geopandas GeoDataFrame.
import geopandas as gpd
df = onset_of_flowers.reset_index()
lon = df.pop("longitude")
lat = df.pop("latitude")
geometry = gpd.points_from_xy(lon, lat)
gdf = gpd.GeoDataFrame(df, geometry=geometry)
gdf
 | year | dayOfYear | geometry |
---|---|---|---|
0 | 1990 | 68 | POINT (-100.12000 30.93000) |
1 | 1990 | 88 | POINT (-103.38000 32.65000) |
2 | 1990 | 97 | POINT (-116.30000 32.67000) |
3 | 1990 | 92 | POINT (-116.62000 32.85000) |
4 | 1990 | 96 | POINT (-107.57000 32.93000) |
... | ... | ... | ... |
3208 | 2020 | 140 | POINT (-122.46919 48.02492) |
3209 | 2020 | 111 | POINT (-122.59872 48.03349) |
3210 | 2020 | 130 | POINT (-122.64056 48.91993) |
3211 | 2020 | 149 | POINT (-124.41430 49.35655) |
3212 | 2020 | 157 | POINT (-106.57416 52.09595) |
3213 rows × 3 columns
Bringing it all together
To do everything in one go, springtime adds the following keywords to the RPPO dataset: `exclude_terms` and `infer_event`. As such, we can completely automate the steps above.
from springtime.datasets import RPPO
flowering_lilacs = RPPO(
genus="Syringa",
termID="obo:PPO_0002032",
years=[1990, 2020],
exclude_terms=["obo:PPO_0002335"],
infer_event="first_yes_day",
)
df = flowering_lilacs.load()
df
INFO:springtime.datasets.ppo:Locating data...
Found /home/peter/.cache/springtime/PPO/Syringa.obo:PPO_0002032.1990-2020.csv
 | year | dayOfYear | geometry |
---|---|---|---|
0 | 1990 | 68 | POINT (-100.12000 30.93000) |
1 | 1990 | 88 | POINT (-103.38000 32.65000) |
2 | 1990 | 97 | POINT (-116.30000 32.67000) |
3 | 1990 | 92 | POINT (-116.62000 32.85000) |
4 | 1990 | 96 | POINT (-107.57000 32.93000) |
... | ... | ... | ... |
3208 | 2020 | 140 | POINT (-122.46919 48.02492) |
3209 | 2020 | 111 | POINT (-122.59872 48.03349) |
3210 | 2020 | 130 | POINT (-122.64056 48.91993) |
3211 | 2020 | 149 | POINT (-124.41430 49.35655) |
3212 | 2020 | 157 | POINT (-106.57416 52.09595) |
3213 rows × 3 columns
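For intuition, the manual filtering and grouping steps from the earlier sections can be folded into one small pandas function. This is only a sketch that mirrors those steps; it is not springtime's actual implementation of `exclude_terms` or `infer_event`, and `first_yes_day_sketch` is a hypothetical name. The toy data uses the leafing-out terms from the table above ("true leaves present", excluding "senescing true leaves present").

```python
import pandas as pd


def first_yes_day_sketch(raw: pd.DataFrame, exclude_terms: list) -> pd.DataFrame:
    """Drop rows listing any excluded term, then keep the earliest DOY per year and location."""

    def is_excluded(term_string: str) -> bool:
        present = set(term_string.split(","))
        return any(term in present for term in exclude_terms)

    kept = raw[~raw["termID"].map(is_excluded)]
    groups = ["year", "latitude", "longitude"]
    return kept.groupby(groups)["dayOfYear"].min().reset_index()


# Toy leafing-out observations at a single site (illustrative values).
raw = pd.DataFrame(
    {
        "year": [2019, 2019, 2019, 2020],
        "latitude": [42.1, 42.1, 42.1, 42.1],
        "longitude": [-88.4, -88.4, -88.4, -88.4],
        "dayOfYear": [110, 120, 95, 290],
        "termID": [
            "obo:PPO_0002313",
            "obo:PPO_0002313",
            "obo:PPO_0002313,obo:PPO_0002317",
            "obo:PPO_0002313,obo:PPO_0002317",
        ],
    }
)

leafing = first_yes_day_sketch(raw, exclude_terms=["obo:PPO_0002317"])
print(leafing)
```

In this toy case the 2020 site drops out entirely, because its only observation also lists the excluded senescing term.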
Export as recipe
Finally, we can export the dataset to a recipe for sharing and reproducibility.
print(flowering_lilacs.to_recipe())
dataset: rppo
years:
- 1990
- 2020
genus: Syringa
termID: obo:PPO_0002032
timeLimit: 60
exclude_terms:
- obo:PPO_0002335
infer_event: first_yes_day