National Phenology Network (NPN)

Retrieve data from the National Phenology Network (NPN) using rnpn.

To install rnpn (in an R session):

install.packages("rnpn")

Listing species

Before we can download any data, we need to know which species and phenophases are available. To this end, we can use the npn_species and npn_phenophases functions.

In [2]:

from springtime.datasets.rnpn import npn_species

species = npn_species()
species

Out[2]:

	species_id	common_name	genus	genus_id	genus_common_name	species	kingdom	itis_taxonomic_sn	functional_type	class_id	class_common_name	class_name	order_id	order_common_name	order_name	family_id	family_name	family_common_name
0	120	'ohi'a lehua	Metrosideros	798	Lehuas (Metrosideros)	polymorpha	Plantae	27259.0	Evergreen broadleaf	15	Flowering Plants	Magnoliopsida	89	Myrtle and Evening-primrose Families	Myrtales	301	Myrtaceae	Myrtle Family
1	1436	absinthium	Artemisia	437	Sagebrushes (Artemisia)	absinthium	Plantae	35445.0	Forb	15	Flowering Plants	Magnoliopsida	69	Aster, Bellflower and Buckbean Families	Asterales	242	Asteraceae	Aster Family
2	1227	Acadian flycatcher	Empidonax	612	Empidonax Flycatchers (Empidonax)	virescens	Animalia	178339.0	Bird	5	Birds	Aves	31	Perching Birds	Passeriformes	154	Tyrannidae	Tyrant Flycatchers
3	1229	acorn woodpecker	Melanerpes	790	Melanerpine Woodpeckers (Melanerpes)	formicivorus	Animalia	178189.0	Bird	5	Birds	Aves	33	Woodpeckers	Piciformes	158	Picidae	Woodpeckers
4	2110	Adam and Eve	Aplectrum	1285	Adam and Eves (Aplectrum)	hyemale	Plantae	43489.0	Forb	15	Flowering Plants	Magnoliopsida	68	Asparagas, Iris, Orchid and Aloe Families	Asparagales	307	Orchidaceae	Orchid Family
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1750	1671	yerba mansa	Anemopsis	413	Yerba Mansa (Anemopsis)	californica	Plantae	18223.0	Forb	15	Flowering Plants	Magnoliopsida	92	Birthwort and Lizard's-tail Families	Piperales	331	Saururaceae	Lizard's-tail Family
1751	228	Yoshino cherry	Prunus	933	Cherries (Prunus)	yedoensis	Plantae	836663.0	Deciduous broadleaf	15	Flowering Plants	Magnoliopsida	96	Oleaster, Buckthorn, Rose and Elm Families	Rosales	325	Rosaceae	Rose Family
1752	1043	youth on age	Tolmiea	1058	Youth on Ages (Tolmiea)	menziesii	Plantae	24533.0	Forb	15	Flowering Plants	Magnoliopsida	98	Currant, Witch-hazel and Saxifrage Families	Saxifragales	332	Saxifragaceae	Saxifrage Family
1753	1395	zebra-tailed lizard	Callisaurus	479	Zebra-tailed Lizards (Callisaurus)	draconoides	Animalia	173906.0	Reptile	10	Reptiles	Reptilia	54	Snakes and Lizards	Squamata	209	Phrynosomatidae	Zebra-tailed and Horned Lizards
1754	2188	zigzag spiderwort	Tradescantia	1061	Spiderworts (Tradescantia)	subaspera	Plantae	39176.0	Forb	15	Flowering Plants	Magnoliopsida	74	Spiderwort and Water-hyacinth Families	Commelinales	260	Commelinaceae	Spiderwort Family

1755 rows × 18 columns

Say we're interested in the common lilac. We can find the corresponding species, along with their IDs and genera, by querying this dataframe.

In [3]:

species.query('common_name.str.contains("lilac")')

Out[3]:

	species_id	common_name	genus	genus_id	genus_common_name	species	kingdom	itis_taxonomic_sn	functional_type	class_id	class_common_name	class_name	order_id	order_common_name	order_name	family_id	family_name	family_common_name
421	36	common lilac	Syringa	1035	Lilacs (Syringa)	vulgaris	Plantae	32996.0	Deciduous broadleaf	15	Flowering Plants	Magnoliopsida	83	Mint, Olive and Plantain Families	Lamiales	305	Oleaceae	Olive Family
883	1243	lilac borer	Podosesia	912	Ash/Lilac Borer Moths (Podosesia)	syringae	Animalia	NaN	Insect	8	Insects	Insecta	43	Butterflies and Moths	Lepidoptera	363	Sesiidae	Clearwing Moths
884	1169	lilac chastetree	Vitex	1096	Chastetrees (Vitex)	agnus-castus	Plantae	32221.0	Deciduous broadleaf	15	Flowering Plants	Magnoliopsida	83	Mint, Olive and Plantain Families	Lamiales	286	Lamiaceae	Mint Family
924	1214	Manchurian lilac	Syringa	1035	Lilacs (Syringa)	pubescens	Plantae	832925.0	Deciduous broadleaf	15	Flowering Plants	Magnoliopsida	83	Mint, Olive and Plantain Families	Lamiales	305	Oleaceae	Olive Family
1211	35	Red Rothomagensis lilac	Syringa	1035	Lilacs (Syringa)	chinensis	Plantae	832915.0	Deciduous broadleaf	15	Flowering Plants	Magnoliopsida	83	Mint, Olive and Plantain Families	Lamiales	305	Oleaceae	Olive Family
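
The query above is case-sensitive and also matches unrelated taxa such as the lilac borer. Below is a minimal sketch of narrowing the match to the genus Syringa and collecting the species IDs. It works on a hand-made subset of the table (rows copied from the output above), and the variable names are illustrative only.

```python
import pandas as pd

# Hand-copied subset of the species table shown above (illustration only).
species_subset = pd.DataFrame(
    {
        "species_id": [36, 1243, 1169, 1214, 35],
        "common_name": [
            "common lilac",
            "lilac borer",
            "lilac chastetree",
            "Manchurian lilac",
            "Red Rothomagensis lilac",
        ],
        "genus": ["Syringa", "Podosesia", "Vitex", "Syringa", "Syringa"],
        "kingdom": ["Plantae", "Animalia", "Plantae", "Plantae", "Plantae"],
    }
)

# Filter on the genus column instead of the common name to drop look-alikes.
lilacs = species_subset[species_subset["genus"] == "Syringa"]

# Convert to plain Python ints, sorted for a stable order.
lilac_ids = sorted(int(i) for i in lilacs["species_id"])
print(lilac_ids)
```

Filtering on a taxonomic column is usually more robust than matching on common names, which can be shared by unrelated organisms.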

Sometimes we might want more than one species. To this end, springtime can group multiple species IDs under a common name, using a class called NamedIdentifiers. Additionally, springtime comes with a couple of helper functions to quickly filter the species, e.g. by functional type.

For example, to get all species of cactus:

In [4]:

from springtime.datasets.rnpn import npn_species_ids_by_functional_type

cactus = npn_species_ids_by_functional_type("Cactus")
print(cactus)

NamedIdentifiers(
    name='Cactus',
    items=[946, 945, 2011, 2133, 867, 1941, 294, 855, 1746, 1958, 1773, 2012, 215, 210, 948, 866, 947, 1942]
)
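
The printed object pairs a label with a list of IDs. The class below is a hypothetical stand-in written for illustration, not springtime's actual implementation; it only mirrors the two fields visible in the output above. The `merge` helper is likewise an assumption, showing one way several groups could be combined without duplicate IDs.

```python
from dataclasses import dataclass, field


@dataclass
class NamedIds:
    """Hypothetical stand-in for the name/items pair printed above."""

    name: str
    items: list[int] = field(default_factory=list)

    def as_dict(self) -> dict:
        # A plain dict with the same two keys: "name" and "items".
        return {"name": self.name, "items": list(self.items)}


def merge(name: str, *groups: NamedIds) -> NamedIds:
    """Combine several groups under a new name, dropping duplicate IDs."""
    seen: dict[int, None] = {}
    for group in groups:
        for item in group.items:
            seen.setdefault(item, None)  # dicts preserve insertion order
    return NamedIds(name=name, items=list(seen))


# IDs taken from the lilac query output earlier in this section.
lilacs = NamedIds("lilacs", [36, 1214, 35])
lilac_named_plants = NamedIds("lilac-named plants", [36, 1169, 1214, 35])

combined = merge("lilacs and chastetree", lilacs, lilac_named_plants)
print(combined.as_dict())
```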

Tip: to quickly see all functional types and the number of species in each category, you could do:

In [5]:

species.functional_type.value_counts()

Out[5]:

Forb                           561
Deciduous broadleaf            338
Bird                           165
Evergreen broadleaf            137
Graminoid                      123
Insect                          94
Drought deciduous broadleaf     70
Amphibian                       40
Mammal                          39
Semi-evergreen broadleaf        31
Evergreen conifer               26
Reptile                         25
Semi-evergreen forb             25
Fish                            22
Pine                            20
Cactus                          18
Evergreen forb                  15
Deciduous conifer                5
Algae                            1
Name: functional_type, dtype: int64

Phenophase IDs

Similarly, we can list all phenophases and then query the resulting dataframe to build combined lists of the phenophases we are interested in.

In [6]:

from springtime.datasets.rnpn import npn_phenophases

phenophases = npn_phenophases()
phenophases

Out[6]:

	phenophase_id	phenophase_name	phenophase_category	color	pheno_class_id
0	56	First leaf	Leaves	NaN	1
1	57	75% leaf elongation	Leaves	NaN	2
2	58	First flower	Flowers	NaN	7
3	59	Last flower	Flowers	NaN	9
4	60	First fruit ripe	Fruits	NaN	12
...	...	...	...	...	...
194	545	Post-dormant nymphs	Development	Brown3	113
195	546	Crawlers	Development	Brown3	112
196	547	Egg laying	Reproduction	Brown2	136
197	548	Egg laying	Reproduction	Brown2	136
198	549	Mating	Reproduction	Brown2	82

199 rows × 5 columns

In [7]:

phenophases.query('phenophase_name.str.contains("flower")')

Out[7]:

	phenophase_id	phenophase_name	phenophase_category	color	pheno_class_id
2	58	First flower	Flowers	NaN	7
3	59	Last flower	Flowers	NaN	9
8	72	First flower	Flowers	NaN	7
20	121	First flower bud	Flowers	NaN	6
30	186	Full flowering	Flowers	Green2	7
33	201	Open flowers	Flowers	Green2	7
35	205	Open flowers	Flowers	Green2	7
36	206	Full flowering	Flowers	Green2	7
37	207	End of flowering	Flowers	Green2	9
38	210	Open flowers	Flowers	Green2	7
39	211	Full flowering	Flowers	Green2	7
149	494	Open flowers	Flowers	Green2	7
155	500	Flowers or flower buds	Flowers	Green2	6
156	501	Open flowers	Flowers	Green2	7

In [8]:

from springtime.datasets.rnpn import npn_phenophase_ids_by_name

print(npn_phenophase_ids_by_name("flower"))

NamedIdentifiers(
    name='flower',
    items=[58, 59, 72, 121, 186, 201, 205, 206, 207, 210, 211, 256, 312, 493, 494, 500, 501]
)
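
Note that the helper returns 17 IDs, whereas the dataframe query above matched 14 rows. One possible explanation is that the helper matches names case-insensitively; this is an assumption and has not been verified against the springtime source. The sketch below shows how the `case` argument of pandas' `str.contains` changes the result. The row with ID 999 and the name "Flower heads" is invented purely for illustration.

```python
import pandas as pd

# Three rows copied from the phenophase table, plus one invented row (999).
phenophase_subset = pd.DataFrame(
    {
        "phenophase_id": [56, 58, 201, 999],
        "phenophase_name": [
            "First leaf",
            "First flower",
            "Open flowers",
            "Flower heads",  # hypothetical name starting with a capital F
        ],
    }
)

names = phenophase_subset["phenophase_name"]

# Default behaviour: the match is case-sensitive.
case_sensitive = phenophase_subset.loc[
    names.str.contains("flower"), "phenophase_id"
].tolist()

# With case=False, capitalised names match as well.
case_insensitive = phenophase_subset.loc[
    names.str.contains("flower", case=False), "phenophase_id"
].tolist()

print(case_sensitive)
print(case_insensitive)
```

If the two approaches disagree for your search term, it is worth inspecting the extra IDs in the phenophases dataframe before using them in a request.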

Retrieving data

Once we have a grip on the available species and phenophases, we can build a data request for NPN.

In [9]:

from springtime.datasets import RNPN

# Create a data instance
dataset = RNPN(
    species_ids={"name": "Syringa", "items": [36]},
    phenophase_ids={"name": "leaves", "items": [483]},
    years=[2010, 2011],
)
print(dataset)

RNPN(
    dataset='RNPN',
    years=YearRange(start=2010, end=2011),
    species_ids=NamedIdentifiers(name='Syringa', items=[36]),
    phenophase_ids=NamedIdentifiers(name='leaves', items=[483]),
    area=None,
    use_first=True,
    aggregation_operator='median'
)

In [10]:

df = dataset.raw_load()
df.head()

INFO:springtime.datasets.rnpn:Locating data
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2010_Syringa_leaves.csv
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2011_Syringa_leaves.csv

Out[10]:

	site_id	latitude	longitude	elevation_in_meters	state	species_id	genus	species	common_name	kingdom	...	first_yes_day	first_yes_doy	first_yes_julian_date	numdays_since_prior_no	last_yes_year	last_yes_month	last_yes_day	last_yes_doy	last_yes_julian_date	numdays_until_next_no
0	17967	38.388618	-91.376022	909	MO	36	Syringa	vulgaris	common lilac	Plantae	...	1	60	2455257	-9999	2010	3	1	60	2455257	-9999
1	17994	39.538925	-79.971695	1203	WV	36	Syringa	vulgaris	common lilac	Plantae	...	5	125	2455322	-9999	2010	5	5	125	2455322	-9999
2	17999	39.791470	-85.609932	947	IN	36	Syringa	vulgaris	common lilac	Plantae	...	12	102	2455299	-9999	2010	4	12	102	2455299	-9999
3	18032	40.947803	-76.628807	613	PA	36	Syringa	vulgaris	common lilac	Plantae	...	5	95	2455292	-9999	2010	4	5	95	2455292	-9999
4	18051	41.292011	-91.693176	734	IA	36	Syringa	vulgaris	common lilac	Plantae	...	12	102	2455299	-9999	2010	4	12	102	2455299	-9999

5 rows × 25 columns

NPN data is available in different forms. Springtime uses the "individual phenometrics" type. From the documentation:

This data type includes estimates of the dates of phenophase onsets and ends for individual plants and for animal species at a site during a user-defined time period. Each row represents a series of consecutive "yes" phenophase status records, beginning with the date of the first "yes" and ending with the date of the last "yes", submitted for a given phenophase on a given organism. Note that more than one consecutive series for an organism may be present within a single growing season or year.

Please also refer to the documentation on data quality and cleaning and make sure you adequately assess the data quality in your research.
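
The raw table above contains -9999 in columns such as numdays_since_prior_no. We assume here that -9999 marks a missing value; check the NPN documentation before relying on this. Below is a minimal sketch of one possible cleaning step. The table is hand-made with invented site IDs, and the 7-day threshold is an arbitrary illustrative choice, not a recommendation.

```python
import pandas as pd

# Invented rows that mimic a few columns of the raw NPN table.
raw = pd.DataFrame(
    {
        "site_id": [1, 2, 3, 4],
        "first_yes_doy": [60, 125, 102, 95],
        "numdays_since_prior_no": [-9999, 3, 21, 7],
    }
)

# Assumption: -9999 is a sentinel for "no value"; turn it into NaN.
cleaned = raw.replace(-9999, float("nan"))

# Keep only onsets preceded by a "no" record at most MAX_GAP_DAYS earlier.
# Rows with a missing gap compare as False and are dropped as well.
MAX_GAP_DAYS = 7
well_constrained = cleaned[cleaned["numdays_since_prior_no"] <= MAX_GAP_DAYS]

print(well_constrained["site_id"].tolist())
```

A small gap between the last "no" and the first "yes" means the onset date is tightly bracketed by observations, which is one common way to judge how precise an individual record is.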

If we're happy with the data quality, we can extract the geometry and the first (or last) yes day from the data. The springtime load method does this automatically.

In [11]:

df = dataset.load()
df

INFO:springtime.datasets.rnpn:Locating data
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2010_Syringa_leaves.csv
INFO:springtime.datasets.rnpn:Found /home/peter/.cache/springtime/rnpn/rnpn_npn_data_y_2011_Syringa_leaves.csv

Out[11]:

	year	geometry	leaves_doy
0	2010	POINT (-91.37602 38.38862)	60.0
1	2010	POINT (-79.97169 39.53892)	125.0
2	2010	POINT (-85.60993 39.79147)	102.0
3	2010	POINT (-76.62881 40.94780)	95.0
4	2010	POINT (-91.69318 41.29201)	102.0
5	2010	POINT (-91.48378 41.88856)	94.0
6	2010	POINT (-74.29987 42.10105)	99.0
7	2010	POINT (-77.43737 42.89832)	98.0
8	2011	POINT (-83.05326 35.59232)	103.0
9	2011	POINT (-80.53153 35.59463)	73.0
10	2011	POINT (-84.17097 35.65687)	69.0
11	2011	POINT (-86.96690 36.39927)	56.0
12	2011	POINT (-91.78845 36.65863)	80.0
13	2011	POINT (-90.15485 38.71774)	78.0
14	2011	POINT (-83.82374 38.82653)	82.0
15	2011	POINT (-84.52015 38.88288)	80.0
16	2011	POINT (-85.24599 38.99372)	101.0
17	2011	POINT (-81.54738 39.24728)	77.0
18	2011	POINT (-77.48402 39.72047)	101.0
19	2011	POINT (-79.89363 40.66472)	94.0
20	2011	POINT (-73.11993 40.85247)	81.0
21	2011	POINT (-77.44374 40.98488)	104.0
22	2011	POINT (-80.29738 41.04120)	103.0
23	2011	POINT (-74.59502 41.12333)	120.0
24	2011	POINT (-80.81221 41.13425)	121.0
25	2011	POINT (-85.12750 41.21440)	105.0
26	2011	POINT (-72.63346 41.91122)	117.0
27	2011	POINT (-80.33588 42.01947)	119.0
28	2011	POINT (-79.26913 42.09705)	113.0
29	2011	POINT (-80.01417 42.13030)	110.0
30	2011	POINT (-73.65140 42.33499)	108.0
31	2011	POINT (-83.21512 42.49105)	113.0
32	2011	POINT (-88.16408 42.98386)	132.0
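
The dataset definition printed earlier has use_first=True and aggregation_operator='median'. How load applies these settings is not shown here. The sketch below is our reading of the parameter names, applied to invented rows: pick the first-yes day of year, then reduce repeated observations for the same site and year with the median. The helper `doy_to_date` is also an addition for illustration; it converts a day of year back to a calendar date.

```python
from datetime import date, timedelta

import pandas as pd

# Invented observations; site 1 has two records in 2010.
obs = pd.DataFrame(
    {
        "year": [2010, 2010, 2010, 2011],
        "site_id": [1, 1, 2, 1],
        "first_yes_doy": [60, 64, 102, 80],
        "last_yes_doy": [90, 95, 130, 110],
    }
)

# Assumed meaning of use_first: choose between first and last "yes" day.
use_first = True
column = "first_yes_doy" if use_first else "last_yes_doy"

# Assumed meaning of aggregation_operator: reduce duplicates per site and year.
aggregated = (
    obs.groupby(["year", "site_id"], as_index=False)[column]
    .median()
    .rename(columns={column: "leaves_doy"})
)
print(aggregated)


def doy_to_date(year: int, doy: int) -> date:
    """Convert a 1-based day of year to a calendar date."""
    return date(year, 1, 1) + timedelta(days=doy - 1)


print(doy_to_date(2010, 60))
```

Day 60 of 2010 is 1 March, which agrees with the month and day columns of the first row in the raw table above.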

Save as recipe

Finally, we can export the dataset definition as a YAML recipe for later reference.

In [12]:

print(dataset.to_recipe())

dataset: RNPN
years:
- 2010
- 2011
species_ids:
  name: Syringa
  items:
  - 36
phenophase_ids:
  name: leaves
  items:
  - 483
use_first: true
aggregation_operator: median

TODO

Add tests
Add to docs