National Phenology Network (NPN)
Retrieve data from the National Phenology Network (NPN) using the rnpn R package.
To install rnpn, run the following in an R session:
install.packages("rnpn")
Listing species
Before we can download any data, we need to know which species and phenophases are available. To this end, we can use the npn_species and npn_phenophases functions.
from springtime.datasets.rnpn import npn_species
species = npn_species()
species
| | species_id | common_name | genus | genus_id | genus_common_name | species | kingdom | itis_taxonomic_sn | functional_type | class_id | class_common_name | class_name | order_id | order_common_name | order_name | family_id | family_name | family_common_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 120 | 'ohi'a lehua | Metrosideros | 798 | Lehuas (Metrosideros) | polymorpha | Plantae | 27259.0 | Evergreen broadleaf | 15 | Flowering Plants | Magnoliopsida | 89 | Myrtle and Evening-primrose Families | Myrtales | 301 | Myrtaceae | Myrtle Family |
1 | 1436 | absinthium | Artemisia | 437 | Sagebrushes (Artemisia) | absinthium | Plantae | 35445.0 | Forb | 15 | Flowering Plants | Magnoliopsida | 69 | Aster, Bellflower and Buckbean Families | Asterales | 242 | Asteraceae | Aster Family |
2 | 1227 | Acadian flycatcher | Empidonax | 612 | Empidonax Flycatchers (Empidonax) | virescens | Animalia | 178339.0 | Bird | 5 | Birds | Aves | 31 | Perching Birds | Passeriformes | 154 | Tyrannidae | Tyrant Flycatchers |
3 | 1229 | acorn woodpecker | Melanerpes | 790 | Melanerpine Woodpeckers (Melanerpes) | formicivorus | Animalia | 178189.0 | Bird | 5 | Birds | Aves | 33 | Woodpeckers | Piciformes | 158 | Picidae | Woodpeckers |
4 | 2110 | Adam and Eve | Aplectrum | 1285 | Adam and Eves (Aplectrum) | hyemale | Plantae | 43489.0 | Forb | 15 | Flowering Plants | Magnoliopsida | 68 | Asparagas, Iris, Orchid and Aloe Families | Asparagales | 307 | Orchidaceae | Orchid Family |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1750 | 1671 | yerba mansa | Anemopsis | 413 | Yerba Mansa (Anemopsis) | californica | Plantae | 18223.0 | Forb | 15 | Flowering Plants | Magnoliopsida | 92 | Birthwort and Lizard's-tail Families | Piperales | 331 | Saururaceae | Lizard's-tail Family |
1751 | 228 | Yoshino cherry | Prunus | 933 | Cherries (Prunus) | yedoensis | Plantae | 836663.0 | Deciduous broadleaf | 15 | Flowering Plants | Magnoliopsida | 96 | Oleaster, Buckthorn, Rose and Elm Families | Rosales | 325 | Rosaceae | Rose Family |
1752 | 1043 | youth on age | Tolmiea | 1058 | Youth on Ages (Tolmiea) | menziesii | Plantae | 24533.0 | Forb | 15 | Flowering Plants | Magnoliopsida | 98 | Currant, Witch-hazel and Saxifrage Families | Saxifragales | 332 | Saxifragaceae | Saxifrage Family |
1753 | 1395 | zebra-tailed lizard | Callisaurus | 479 | Zebra-tailed Lizards (Callisaurus) | draconoides | Animalia | 173906.0 | Reptile | 10 | Reptiles | Reptilia | 54 | Snakes and Lizards | Squamata | 209 | Phrynosomatidae | Zebra-tailed and Horned Lizards |
1754 | 2188 | zigzag spiderwort | Tradescantia | 1061 | Spiderworts (Tradescantia) | subaspera | Plantae | 39176.0 | Forb | 15 | Flowering Plants | Magnoliopsida | 74 | Spiderwort and Water-hyacinth Families | Commelinales | 260 | Commelinaceae | Spiderwort Family |
1755 rows × 18 columns
Let's say we're interested in the common lilac. We can find the matching species, and the genera they belong to, by querying this dataframe.
species.query('common_name.str.contains("lilac")')
| | species_id | common_name | genus | genus_id | genus_common_name | species | kingdom | itis_taxonomic_sn | functional_type | class_id | class_common_name | class_name | order_id | order_common_name | order_name | family_id | family_name | family_common_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
421 | 36 | common lilac | Syringa | 1035 | Lilacs (Syringa) | vulgaris | Plantae | 32996.0 | Deciduous broadleaf | 15 | Flowering Plants | Magnoliopsida | 83 | Mint, Olive and Plantain Families | Lamiales | 305 | Oleaceae | Olive Family |
883 | 1243 | lilac borer | Podosesia | 912 | Ash/Lilac Borer Moths (Podosesia) | syringae | Animalia | NaN | Insect | 8 | Insects | Insecta | 43 | Butterflies and Moths | Lepidoptera | 363 | Sesiidae | Clearwing Moths |
884 | 1169 | lilac chastetree | Vitex | 1096 | Chastetrees (Vitex) | agnus-castus | Plantae | 32221.0 | Deciduous broadleaf | 15 | Flowering Plants | Magnoliopsida | 83 | Mint, Olive and Plantain Families | Lamiales | 286 | Lamiaceae | Mint Family |
924 | 1214 | Manchurian lilac | Syringa | 1035 | Lilacs (Syringa) | pubescens | Plantae | 832925.0 | Deciduous broadleaf | 15 | Flowering Plants | Magnoliopsida | 83 | Mint, Olive and Plantain Families | Lamiales | 305 | Oleaceae | Olive Family |
1211 | 35 | Red Rothomagensis lilac | Syringa | 1035 | Lilacs (Syringa) | chinensis | Plantae | 832915.0 | Deciduous broadleaf | 15 | Flowering Plants | Magnoliopsida | 83 | Mint, Olive and Plantain Families | Lamiales | 305 | Oleaceae | Olive Family |
Sometimes we might want more than one species. To this end, springtime can group multiple species_ids under a common name, using a class called `NamedIdentifiers`. Additionally, springtime comes with a couple of helper functions to quickly filter the species, e.g. by functional type.
For example, to get all species of cactus:
from springtime.datasets.rnpn import npn_species_ids_by_functional_type
cactus = npn_species_ids_by_functional_type("Cactus")
print(cactus)
NamedIdentifiers( name='Cactus', items=[946, 945, 2011, 2133, 867, 1941, 294, 855, 1746, 1958, 1773, 2012, 215, 210, 948, 866, 947, 1942] )
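To make the idea of a named group of ids concrete, here is a minimal sketch using only the standard library. `NamedIds` and `ids_by_functional_type` are hypothetical stand-ins written for illustration; they are not springtime's `NamedIdentifiers` class or its helper, whose internals the text above does not show. The rows are copied from the species table above.

```python
from dataclasses import dataclass, field


@dataclass
class NamedIds:
    """Hypothetical stand-in for a named group of identifiers."""

    name: str
    items: list = field(default_factory=list)


# (species_id, functional_type) pairs copied from the species table above.
rows = [
    (120, "Evergreen broadleaf"),
    (1436, "Forb"),
    (1227, "Bird"),
    (1229, "Bird"),
    (2110, "Forb"),
]


def ids_by_functional_type(rows, functional_type):
    """Collect the ids of all rows whose functional type matches exactly."""
    ids = [species_id for species_id, ftype in rows if ftype == functional_type]
    return NamedIds(name=functional_type, items=ids)


forbs = ids_by_functional_type(rows, "Forb")
print(forbs)
```

Grouping ids under a name keeps the request readable: the name can be reused in labels and file names, while the list of ids drives the actual selection.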
Tip: to quickly see all functional types and the number of species in each category, you could do:
species.functional_type.value_counts()
Forb 561
Deciduous broadleaf 338
Bird 165
Evergreen broadleaf 137
Graminoid 123
Insect 94
Drought deciduous broadleaf 70
Amphibian 40
Mammal 39
Semi-evergreen broadleaf 31
Evergreen conifer 26
Reptile 25
Semi-evergreen forb 25
Fish 22
Pine 20
Cactus 18
Evergreen forb 15
Deciduous conifer 5
Algae 1
Name: functional_type, dtype: int64
Phenophase IDs
Similarly, we can list all phenophases and then query the result to build combined lists of the phenophases of interest.
from springtime.datasets.rnpn import npn_phenophases
phenophases = npn_phenophases()
phenophases
| | phenophase_id | phenophase_name | phenophase_category | color | pheno_class_id |
---|---|---|---|---|---|
0 | 56 | First leaf | Leaves | NaN | 1 |
1 | 57 | 75% leaf elongation | Leaves | NaN | 2 |
2 | 58 | First flower | Flowers | NaN | 7 |
3 | 59 | Last flower | Flowers | NaN | 9 |
4 | 60 | First fruit ripe | Fruits | NaN | 12 |
... | ... | ... | ... | ... | ... |
194 | 545 | Post-dormant nymphs | Development | Brown3 | 113 |
195 | 546 | Crawlers | Development | Brown3 | 112 |
196 | 547 | Egg laying | Reproduction | Brown2 | 136 |
197 | 548 | Egg laying | Reproduction | Brown2 | 136 |
198 | 549 | Mating | Reproduction | Brown2 | 82 |
199 rows × 5 columns
phenophases.query('phenophase_name.str.contains("flower")')
| | phenophase_id | phenophase_name | phenophase_category | color | pheno_class_id |
---|---|---|---|---|---|
2 | 58 | First flower | Flowers | NaN | 7 |
3 | 59 | Last flower | Flowers | NaN | 9 |
8 | 72 | First flower | Flowers | NaN | 7 |
20 | 121 | First flower bud | Flowers | NaN | 6 |
30 | 186 | Full flowering | Flowers | Green2 | 7 |
33 | 201 | Open flowers | Flowers | Green2 | 7 |
35 | 205 | Open flowers | Flowers | Green2 | 7 |
36 | 206 | Full flowering | Flowers | Green2 | 7 |
37 | 207 | End of flowering | Flowers | Green2 | 9 |
38 | 210 | Open flowers | Flowers | Green2 | 7 |
39 | 211 | Full flowering | Flowers | Green2 | 7 |
149 | 494 | Open flowers | Flowers | Green2 | 7 |
155 | 500 | Flowers or flower buds | Flowers | Green2 | 6 |
156 | 501 | Open flowers | Flowers | Green2 | 7 |
from springtime.datasets.rnpn import npn_phenophase_ids_by_name
print(npn_phenophase_ids_by_name("flower"))
NamedIdentifiers( name='flower', items=[58, 59, 72, 121, 186, 201, 205, 206, 207, 210, 211, 256, 312, 493, 494, 500, 501] )
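Note that the helper returns ids (256, 312, 493) that do not appear in the dataframe query shown earlier. The source does not explain the difference. One possible cause, stated here purely as an assumption, is that the helper matches names without regard to case while `str.contains` is case-sensitive by default. The sketch below illustrates that effect in plain Python; `ids_matching` is a hypothetical function, and the last entry in the list is invented for the demonstration.

```python
# (phenophase_id, phenophase_name) pairs. The first three are copied from the
# table above; the last one is hypothetical and only shows the effect of case.
phenophases = [
    (58, "First flower"),
    (201, "Open flowers"),
    (500, "Flowers or flower buds"),
    (-1, "Flowers"),  # hypothetical entry, not a real NPN phenophase id
]


def ids_matching(pairs, pattern, ignore_case=False):
    """Return the ids whose name contains `pattern` as a substring."""
    if ignore_case:
        pattern = pattern.lower()
    matches = []
    for phenophase_id, name in pairs:
        haystack = name.lower() if ignore_case else name
        if pattern in haystack:
            matches.append(phenophase_id)
    return matches


print(ids_matching(phenophases, "flower"))
print(ids_matching(phenophases, "flower", ignore_case=True))
```

Whatever the actual cause, it is worth comparing the helper's ids against the phenophase table before using them in a request.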
Retrieving data
Once we have a grip on the available species and phenophases, we can build a data request for NPN.
from springtime.datasets import RNPN
# Create a data instance
dataset = RNPN(
species_ids={"name": "Syringa", "items": [36]},
phenophase_ids={"name": "leaves", "items": [483]},
years=[2010, 2011],
)
print(dataset)
RNPN( dataset='RNPN', years=YearRange(start=2010, end=2011), species_ids=NamedIdentifiers(name='Syringa', items=[36]), phenophase_ids=NamedIdentifiers(name='leaves', items=[483]), area=None, use_first=True, aggregation_operator='median' )
df = dataset.raw_load()
df.head()
INFO:springtime.datasets.rnpn:Locating data
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2010_Syringa_leaves.csv
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2011_Syringa_leaves.csv
| | site_id | latitude | longitude | elevation_in_meters | state | species_id | genus | species | common_name | kingdom | ... | first_yes_day | first_yes_doy | first_yes_julian_date | numdays_since_prior_no | last_yes_year | last_yes_month | last_yes_day | last_yes_doy | last_yes_julian_date | numdays_until_next_no |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17967 | 38.388618 | -91.376022 | 909 | MO | 36 | Syringa | vulgaris | common lilac | Plantae | ... | 1 | 60 | 2455257 | -9999 | 2010 | 3 | 1 | 60 | 2455257 | -9999 |
1 | 17994 | 39.538925 | -79.971695 | 1203 | WV | 36 | Syringa | vulgaris | common lilac | Plantae | ... | 5 | 125 | 2455322 | -9999 | 2010 | 5 | 5 | 125 | 2455322 | -9999 |
2 | 17999 | 39.791470 | -85.609932 | 947 | IN | 36 | Syringa | vulgaris | common lilac | Plantae | ... | 12 | 102 | 2455299 | -9999 | 2010 | 4 | 12 | 102 | 2455299 | -9999 |
3 | 18032 | 40.947803 | -76.628807 | 613 | PA | 36 | Syringa | vulgaris | common lilac | Plantae | ... | 5 | 95 | 2455292 | -9999 | 2010 | 4 | 5 | 95 | 2455292 | -9999 |
4 | 18051 | 41.292011 | -91.693176 | 734 | IA | 36 | Syringa | vulgaris | common lilac | Plantae | ... | 12 | 102 | 2455299 | -9999 | 2010 | 4 | 12 | 102 | 2455299 | -9999 |
5 rows × 25 columns
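The raw table stores each date redundantly: as month and day, as day of year (`*_doy`), and as a Julian date. The following sketch, using only the standard library, shows how these relate and reproduces the values of the first two rows above. The function names are illustrative and not part of springtime or rnpn.

```python
from datetime import date, timedelta

# Offset between Python's proleptic Gregorian ordinal and the Julian day number.
JDN_OFFSET = 1721425


def doy_to_date(year, doy):
    """Convert a 1-based day of year to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


def julian_day_number(d):
    """Julian day number of a calendar date (the integer day, counted at noon)."""
    return d.toordinal() + JDN_OFFSET


first_row = doy_to_date(2010, 60)
second_row = doy_to_date(2010, 125)
print(first_row, julian_day_number(first_row))
print(second_row, julian_day_number(second_row))
```

Day of year is usually the most convenient of the three for comparing phenology across years, since it removes the year offset.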
NPN data is available in different forms. Springtime uses the "individual phenometrics" type. From the documentation:
This data type includes estimates of the dates of phenophase onsets and ends for individual plants and for animal species at a site during a user-defined time period. Each row represents a series of consecutive "yes" phenophase status records, beginning with the date of the first "yes" and ending with the date of the last "yes", submitted for a given phenophase on a given organism. Note that more than one consecutive series for an organism may be present within a single growing season or year.
Please also refer to the documentation on data quality and cleaning and make sure you adequately assess the data quality in your research.
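As one illustration of a quality check, the sketch below filters records by how long before the first "yes" the last "no" was reported. It rests on assumptions that are not confirmed in the text above: that -9999 in `numdays_since_prior_no` marks a missing prior "no" record, and that a gap of at most 7 days is an acceptable precision for the onset date. The threshold and the last two records are hypothetical; consult the NPN documentation before choosing your own criteria.

```python
MISSING = -9999  # sentinel value seen in the raw table above (meaning assumed)

records = [
    # First two records copied from the raw table above.
    {"site_id": 17967, "first_yes_doy": 60, "numdays_since_prior_no": -9999},
    {"site_id": 17994, "first_yes_doy": 125, "numdays_since_prior_no": -9999},
    # Hypothetical records, added only to show both outcomes of the filter.
    {"site_id": 1, "first_yes_doy": 100, "numdays_since_prior_no": 3},
    {"site_id": 2, "first_yes_doy": 110, "numdays_since_prior_no": 30},
]


def has_precise_onset(record, max_gap_days=7):
    """Keep records whose onset is bracketed by a recent prior 'no' observation."""
    gap = record["numdays_since_prior_no"]
    return gap != MISSING and 0 <= gap <= max_gap_days


kept = [r["site_id"] for r in records if has_precise_onset(r)]
print(kept)
```

A strict filter like this can discard most of the data, as it would for the rows shown above, so the threshold is a trade-off between sample size and onset precision.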
If we're happy with the data quality, we can extract the geometry and the first (or last) yes day from the data. The springtime load method does this automatically.
df = dataset.load()
df
INFO:springtime.datasets.rnpn:Locating data
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2010_Syringa_leaves.csv
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2011_Syringa_leaves.csv
| | year | geometry | leaves_doy |
---|---|---|---|
0 | 2010 | POINT (-91.37602 38.38862) | 60.0 |
1 | 2010 | POINT (-79.97169 39.53892) | 125.0 |
2 | 2010 | POINT (-85.60993 39.79147) | 102.0 |
3 | 2010 | POINT (-76.62881 40.94780) | 95.0 |
4 | 2010 | POINT (-91.69318 41.29201) | 102.0 |
5 | 2010 | POINT (-91.48378 41.88856) | 94.0 |
6 | 2010 | POINT (-74.29987 42.10105) | 99.0 |
7 | 2010 | POINT (-77.43737 42.89832) | 98.0 |
8 | 2011 | POINT (-83.05326 35.59232) | 103.0 |
9 | 2011 | POINT (-80.53153 35.59463) | 73.0 |
10 | 2011 | POINT (-84.17097 35.65687) | 69.0 |
11 | 2011 | POINT (-86.96690 36.39927) | 56.0 |
12 | 2011 | POINT (-91.78845 36.65863) | 80.0 |
13 | 2011 | POINT (-90.15485 38.71774) | 78.0 |
14 | 2011 | POINT (-83.82374 38.82653) | 82.0 |
15 | 2011 | POINT (-84.52015 38.88288) | 80.0 |
16 | 2011 | POINT (-85.24599 38.99372) | 101.0 |
17 | 2011 | POINT (-81.54738 39.24728) | 77.0 |
18 | 2011 | POINT (-77.48402 39.72047) | 101.0 |
19 | 2011 | POINT (-79.89363 40.66472) | 94.0 |
20 | 2011 | POINT (-73.11993 40.85247) | 81.0 |
21 | 2011 | POINT (-77.44374 40.98488) | 104.0 |
22 | 2011 | POINT (-80.29738 41.04120) | 103.0 |
23 | 2011 | POINT (-74.59502 41.12333) | 120.0 |
24 | 2011 | POINT (-80.81221 41.13425) | 121.0 |
25 | 2011 | POINT (-85.12750 41.21440) | 105.0 |
26 | 2011 | POINT (-72.63346 41.91122) | 117.0 |
27 | 2011 | POINT (-80.33588 42.01947) | 119.0 |
28 | 2011 | POINT (-79.26913 42.09705) | 113.0 |
29 | 2011 | POINT (-80.01417 42.13030) | 110.0 |
30 | 2011 | POINT (-73.65140 42.33499) | 108.0 |
31 | 2011 | POINT (-83.21512 42.49105) | 113.0 |
32 | 2011 | POINT (-88.16408 42.98386) | 132.0 |
Save as recipe
Finally, we can export the dataset definition as a YAML recipe for later reference.
print(dataset.to_recipe())
dataset: RNPN
years:
- 2010
- 2011
species_ids:
  name: Syringa
  items:
  - 36
phenophase_ids:
  name: leaves
  items:
  - 483
use_first: true
aggregation_operator: median
TODO
- Add tests
- Add to docs