Using BioShiftR

library(BioShiftR)
library(dplyr)

BioShiftR is a helper package to facilitate easy use and manipulation of the BioShifts database, which includes over 31,000 species’ range shift documentations from published scientific literature, as well as methodological, taxonomic, and climate variables within study regions and, when available, within study regions clipped to species’ ranges.

Use this package’s helper functions to easily merge, subset, and manipulate the dataset for data organization and hypothesis testing. This vignette covers the workflow for censusing shifts and adding relevant parameters.

Get BioShifts Shifts

All BioShiftR workflows should begin with the get_shifts() function, which uploads all, or a subset of, the 31,759 range shift observations within the BioShifts database. This function returns a minimal dataset showing only the range shift rates across latitude or elevation (calc_rate), as calculated in BioShifts in either m/year for elevational shifts, or km/year for latitudinal shifts (calc_unit), and necessary identifiers which connect to all other datasets.

get_shifts() %>% glimpse()
#> Rows: 31,759
#> Columns: 13
#> $ id                  <chr> "A001_P1_ELE_O_M1", "A001_P1_ELE_O_M1", "A001_P1_E…
#> $ article_id          <chr> "A001", "A001", "A001", "A001", "A001", "A001", "A…
#> $ poly_id             <chr> "P1", "P1", "P1", "P1", "P1", "P1", "P1", "P1", "P…
#> $ method_id           <chr> "M01", "M01", "M01", "M01", "M01", "M01", "M01", "…
#> $ eco                 <chr> "Ter", "Ter", "Ter", "Ter", "Ter", "Ter", "Ter", "…
#> $ type                <chr> "ELE", "ELE", "ELE", "ELE", "ELE", "ELE", "ELE", "…
#> $ param               <chr> "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", …
#> $ sp_name_publication <chr> "Aegithalos_caudatus", "Certhia_familiaris", "Dend…
#> $ sp_name_checked     <chr> "Aegithalos_caudatus", "Certhia_familiaris", "Dend…
#> $ subsp_or_pop        <chr> "Giffre Valley", "Giffre Valley", "Giffre Valley",…
#> $ calc_rate           <dbl> -2.2128, -0.5106, -7.8723, -3.2340, 4.8511, -1.319…
#> $ calc_unit           <chr> "m/year", "m/year", "m/year", "m/year", "m/year", …
#> $ direction           <chr> "Lower Elevation", "Lower Elevation", "Lower Eleva…

get_shifts() has some built-in defaults to subset shifts by type, broad taxonomic groups, or continents. See the function help page for more options.

For example, to select only latitudinal range shifts of birds in North America, we could use the following arguments.

get_shifts(group = "Birds", 
           type = "LAT", 
           continent = "North America") %>% 
  glimpse()
#> Rows: 2,382
#> Columns: 13
#> $ id                  <chr> "A009_P1_LAT_LE_M1", "A009_P1_LAT_LE_M1", "A009_P1…
#> $ article_id          <chr> "A009", "A009", "A009", "A009", "A009", "A009", "A…
#> $ poly_id             <chr> "P1", "P1", "P1", "P1", "P1", "P1", "P1", "P1", "P…
#> $ method_id           <chr> "M01", "M01", "M01", "M01", "M01", "M01", "M01", "…
#> $ eco                 <chr> "Ter", "Ter", "Ter", "Ter", "Ter", "Ter", "Ter", "…
#> $ type                <chr> "LAT", "LAT", "LAT", "LAT", "LAT", "LAT", "LAT", "…
#> $ param               <chr> "LE", "LE", "LE", "LE", "LE", "LE", "LE", "LE", "L…
#> $ sp_name_publication <chr> "Accipiter_cooperii", "Accipiter_striatus", "Actit…
#> $ sp_name_checked     <chr> "Accipiter_cooperii", "Accipiter_striatus", "Actit…
#> $ subsp_or_pop        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ calc_rate           <dbl> 13.78, 0.18, 1.92, 0.51, 0.35, 6.48, 14.71, 0.61, …
#> $ calc_unit           <chr> "km/year", "km/year", "km/year", "km/year", "km/ye…
#> $ direction           <chr> "Expansion", "Expansion", "Expansion", "Expansion"…

Or we could do the same and quickly assess the number of latitudinal bird shifts in North America that have positive rates (moving towards the poles):

# count how many have positive rates (moving towards the poles)
get_shifts(group = "Birds", 
           type = "LAT", 
           continent = "North America") %>% 
  count(calc_rate > 0)
#> # A tibble: 2 × 2
#>   `calc_rate > 0`     n
#>   <lgl>           <int>
#> 1 FALSE             948
#> 2 TRUE             1434

Understanding IDs

Because the minimal shifts dataframe connects to several other dataframes (see below), each containing data at different levels (e.g., some additional information is at the “source article” level, while some is at the species or shift level), the minimal database contains five separate ID columns (article_id, poly_id, type, param, and method_id), which collectively make up the group_id column. These identifiers are nested such that each consecutive ID can– but doesn’t always– contain multiple of the next value. In other words, some articles contain multiple polygons, some polygons contain range shifts of multiple types, some range shift of the same type are studied at multiple range parameters, and so on. Different combinations of these identifiers merge the minimal shifts dataset to all other BioShifts data, and they are designed such that within a single group ID, a species/subspecies will never be represented more than once.

ID Column	Format	Description
`id`	`AXXX_PX_type_param_MXX`	summary ID value, arranged as `article_poly_type_param_method`.
`article_id`	`A001`, `A002`, … `A244`	Identifier for the source publication in which the range shifts were originally documented.
`poly_id`	`P1`, `P2`, … `P9`	Identifier for the polygon within the source publication in which range shifts were documented. Most publications have one polygon (P1), but can have multiple when a publication detects range shifts in multiple locations (e.g., two separate mountains, three ocean basins, etc.)
`type`	`LAT`, `ELE`	Identifier for the type of range shift – latitudinal or elevational – detected within a polygon. Usually, studies detect only one type of range shift, but in some cases census both.
`param`	`LE`, `O`, `TE`	Identifier for the parameter of the species’ range where the shift was detected: Leading Edge (LE), Trailing edge (TE), or range center (O). Here defined as the poleward or upslope edge, the equatorward or downslope edge, and various definitions of the range center (midpoint, center of gravity, etc.), respectively.
`method_id`	`M01`, `M02`, … `M24`	Identifier for the method, within previous groups, of the range shifts detected. Usually, studies use only one method for all range shifts (M1), but in some cases, studies census range shifts across multiple timeframes (for example, 1950-1975, 1975-2000, 1950-2000), or by two different statistical methods (mean occurrence and median occurrence), resulting in detections with identical values of all preceding columns. Here, these cases will be separated into `_M1`, `_M2`, and so on.

Add BioShifts data

BioShiftR includes various helper functions – to be used individually or together – that supplement the dataset from the get_shifts() function with other relevant data. View the table below to see all funcitons and the respective information that they add to the shifts dataframe. Each function adds directly to the shifts dataframe by combinations of ID keys. Some require additional arguments to specify data requests. See function help pages for details.

Function	Description	New Column Names
`add_articles(data)`	Adds information identifying the source article for each specific shift	`article`, `doi`, `id_bioshifts_v1`, `id_core`
`add_author_reported(data)`	BioShifts recalculates range shifts as rates across latitude or elevation, and displays calculated rates in the `get_shifts()` function. This function adds columns identifying the original shift as reported by study authors.	`author_reported`, `author_reported_unit`, `author_reported_sig`, `author_reported_magnitude`, `author_reported_angle`, `author_source`
`add_methods(data)`	Shifts are measured using various methods which may affect the variability of detected shift rates. This function adds methodological parameters by which each individual shift was detected	`start_firstperiod`, `midpoint_firstperiod`, `end_firstperiod`, `start_secondperiod`, `midpoint_secondperiod`, `end_secondperiod`, `n_periods`, `duration`, `grain_size`, `sampling`, `category`, `obs_type`, `uncertainty_distribution`, `position_definition`, `position_definition_category`
`add_baselines(data, type, stat, exp, res)`	Adds the average(???) climate variable values (mean temperature or precipitation) within the study area (or the species-specific study area, see ___), over the study duration.	Variable combinations of statistic and exposure variable formatted as `baseline_stat_exposure` (e.g., `baseline_mean_temp()`). Also `baseline_res` for the spatial resolution over which variables were calculated. See “Adding Climate Variables” for details.
`add_trends(data, type, stat, exp, res)`	Adds the average change per year of climate variables (temperature or precipitation) in the study area (or species-specific study area) over the study duration. See “Adding climate Variables” vignette for more	Variable. Combinations of `trend_stat_exposure` (e.g., `trend_mean_temp`) for all requested trends, `trend_temp_var`, and `trend_res`.
`add_cv()`	Adds the velocity of climate variables (temperature and precipitation) over space in the latitudinal or elevational directions within study area or species-specific study area over the study duration. See “Adding Climate Variables” vignette for more.	Variable combinations of `VelAlong_stat_exposure` (e.g., `VelAlong_mean_temp`) for all requested climate velocities, `cv_temp_var`, and `cv_res`. See “Adding Climate Variables” vignette for more.
`add_poly_info(data, type)`	Supplements shifts dataframe with summary information of the study area or species-specific study area polygon in which each shift was detected. See “Working with Polygons” vignette for more.	`lat_min`, `lat_max`, `lat_cent_deg`, `lon_cent_deg`, `lat_extent_km`, `ele_mean_m`, `ele_min_m`, `ele_max_m`, `ele_extent_m`, `area_km2`, `study_area`
`add_polygons(data, type)`	Supplements shifts dataframe with spatial polygons of study areas or species-specific study areas over which shifts were calculated. Note that this requires polygons to be downloaded locally with the `download_polygons()` function first.	`geom`

Each add_ function adds on to the minimal shifts dataframe from get_shifts(), using combinations of identifiers (id, article_id, poly_id, method_id, eco, param), and, when relevant, species names (sp_name_publication, subsp_or_pop). Different sub-dataframes connect to the shifts dataframe using different id keys (e.g., add_articles() connects by article_id but add_poly_info() connects by article_id, poly_id, and if requested, sp_name_publication and subsp_or_pop). These functions automate the merging of subdataframes to correctly match the shifts dataframe with any requested additional information.