nature portfolio | reporting summary March 2021
covariance (similarity) in abundance trends for any pair of 1-degree cells across North America. We then derived the average rate of change in abundance across all hierarchical and correlative random effects, and used population-level trend uncertainty to select the confidence interval thresholds described above.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy
All of the data used in the study are publicly available and accessible from the following links: RivFishTIME (https://doi.org/10.1111/geb.13210), North American Breeding Birds (https://doi.org/10.5066/P97WAZE5), BioTIME (https://doi.org/10.1111/geb.12729), Living Planet (https://www.livingplanetindex.org/data_portal), CaPTrends (https://doi.org/10.1111/geb.13587), ReSurvey Germany (https://doi.org/10.25829/idiv.3514-0qsq70), UK Fish Counts (https://environment.data.gov.uk/dataset/ce2618db-d507-4671-bafe-840b930d2297), FishGlob (https://doi.org/10.31219/osf.io/2bcjw), TimeFISH (https://doi.org/10.1002/ecy.3966), Pilotto (https://zenodo.org/records/10638241). See below for a summary of the data sources:
- RivFishTIME
RivFishTIME is a global database of freshwater fish time series for studying global change ecology in riverine systems, fully described in [Comte et al. 2020](https://doi.org/10.1111/geb.13210), with the data hosted by iDiv. Please see the full description and metadata of this database for more information.
The full zipped database can be downloaded directly and unzipped, after which the zipped version can be removed. Two files are required: the main survey dataset '1873_2_RivFishTIME_SurveyTable.csv' (this contains the record-level data, i.e. abundance of a given species at a given site in a given year), and the dataset describing each individual time series '1873_2_RivFishTIME_TimeseriesTable.csv' (this contains information on the specific location of each time series).
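The join between the two RivFishTIME tables can be sketched with the Python standard library. This is an illustrative sketch, not the study's own code, and the column names (TimeSeriesID, Year, Species, Abundance, Latitude, Longitude) are assumptions that should be checked against the database metadata.

```python
import csv


def join_rivfishtime(survey_lines, timeseries_lines):
    """Attach time-series locations to each survey record.

    Both arguments are iterables of CSV lines (e.g. open files).
    Column names are assumed for illustration; verify them against
    the RivFishTIME metadata.
    """
    # Index the time-series table by ID so each lookup is a dict access.
    locations = {
        row["TimeSeriesID"]: row for row in csv.DictReader(timeseries_lines)
    }
    joined = []
    for record in csv.DictReader(survey_lines):
        loc = locations.get(record["TimeSeriesID"])
        if loc is None:
            continue  # skip records with no matching time series
        joined.append({
            "site": record["TimeSeriesID"],
            "year": int(record["Year"]),
            "species": record["Species"],
            "abundance": float(record["Abundance"]),
            "latitude": float(loc["Latitude"]),
            "longitude": float(loc["Longitude"]),
        })
    return joined
```

Indexing the smaller time-series table first keeps the join linear in the number of survey records.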
- North American Breeding Bird Survey
The 2022 release of the North American Breeding Bird Survey dataset (1966-2021) is available from [Ziolkowski Jr. et al. (2022)](https://doi.org/10.5066/P97WAZE5). From the dataset description there: This dataset contains avian point count data for more than 700 North American bird taxa (species, races, and unidentified species groupings), collected annually during the breeding season along thousands of randomly established roadside survey routes in the United States and Canada. Routes are roughly 24.5 miles (39.2 km) long with counting locations placed at approximately half-mile (800-m) intervals, for a total of 50 stops. At each stop, a citizen scientist highly skilled in avian identification conducts a 3-minute point count, recording all birds seen within a quarter-mile (400-m) radius and all birds heard. Surveys begin 30 minutes before local sunrise and take approximately 5 hours to complete. Routes are sampled once per year, with the total number of routes sampled per year growing over time; just over 500 routes were sampled in 1966, while in recent decades approximately 3000 routes have been sampled annually. No data are provided for 2020. BBS field activities were cancelled in 2020 because of the coronavirus disease (COVID-19) global pandemic and observers were directed to not sample routes. Route location information includes country, state, and BCR, as well as geographic coordinates of route start point, and an indicator of run data quality. We require the 'States' and 'Routes' zipped datasets, and the species list (provided as a text file). The states data are provided as one zipped file per state, but each contains a single CSV file and so can be read directly with read_csv. We read them all into one large data frame, specifying which columns to read and their required data types. Processing involves adding location details and species names to this large dataset, creating a unique 'site' ID (from country, state, and route information), and mutating, renaming, or adding the other required variables.
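The text describes this step with R's read_csv; the same idea can be sketched in Python with the standard library: read the single CSV inside each state's zip archive, keep typed columns, then add route coordinates, species names and a site ID. The column names (CountryNum, StateNum, Route, Year, AOU, SpeciesTotal), the UTF-8 encoding and the underscore separator in the site ID are assumptions for illustration.

```python
import csv
import io
import zipfile

# Columns to keep from each state file and the type to parse each into.
# These names are assumptions; check them against the BBS documentation.
STATE_COLUMNS = {
    "CountryNum": int,
    "StateNum": int,
    "Route": int,
    "Year": int,
    "AOU": int,
    "SpeciesTotal": int,
}


def read_state_zip(path_or_file):
    """Read the single CSV inside one state's zip archive into typed rows."""
    with zipfile.ZipFile(path_or_file) as archive:
        (name,) = archive.namelist()  # assumes exactly one file per archive
        with io.TextIOWrapper(archive.open(name), encoding="utf-8", newline="") as text:
            return [
                {col: cast(row[col]) for col, cast in STATE_COLUMNS.items()}
                for row in csv.DictReader(text)
            ]


def build_bbs(state_zips, routes, species_names):
    """Combine state files and add coordinates, species names and site IDs.

    routes maps (country, state, route) -> (latitude, longitude);
    species_names maps AOU code -> species name.
    """
    records = []
    for source in state_zips:
        for row in read_state_zip(source):
            key = (row["CountryNum"], row["StateNum"], row["Route"])
            lat, lon = routes.get(key, (None, None))
            records.append({
                # Unique site ID from country, state and route numbers.
                "site": "_".join(str(k) for k in key),
                "year": row["Year"],
                "species": species_names.get(row["AOU"], str(row["AOU"])),
                "abundance": row["SpeciesTotal"],
                "latitude": lat,
                "longitude": lon,
            })
    return records
```

Columns not listed in STATE_COLUMNS are simply ignored, which mirrors specifying which columns to read.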
- BioTIME
BioTIME (Dornelas et al. 2018) is a comprehensive collection of assemblage time series in which the abundances of the species that comprise ecological communities have been monitored over a number of years. BioTIME data span the globe and encompass land and seas; they also include freshwater systems. The current version of BioTIME contains over 12 million records, features almost 50 thousand species, covers over 600 thousand distinct geographic locations and is representative of over 20 biomes, occurring over 6 different climatic zones. This dataset requires registration prior to download. We downloaded the June 2021 version (the latest available) of the raw data in CSV format, together with the metadata and citations files, from the link provided following registration at https://biotime.st-andrews.ac.uk/download.php. Processing the BioTIME data requires selecting the relevant columns and joining them to the metadata in order to filter out studies that recorded only presence/absence, or only biomass (no abundance data). A 'site' variable is created by pasting the study ID, latitude, and longitude. No country information is provided.
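A minimal sketch of the BioTIME filtering and site construction, operating on rows already parsed into dicts (for example with csv.DictReader). The column names and the coding of the metadata (an empty ABUNDANCE_TYPE meaning biomass only, "Presence/Absence" meaning no abundances) are assumptions for illustration, not confirmed BioTIME conventions.

```python
# Abundance types treated as presence/absence only (assumed coding).
EXCLUDED_ABUNDANCE_TYPES = {"Presence/Absence"}


def usable_studies(metadata_rows):
    """Return the IDs of studies whose metadata report abundance data."""
    keep = set()
    for row in metadata_rows:
        abundance_type = (row.get("ABUNDANCE_TYPE") or "").strip()
        # An empty abundance type is assumed to mean a biomass-only study.
        if abundance_type and abundance_type not in EXCLUDED_ABUNDANCE_TYPES:
            keep.add(row["STUDY_ID"])
    return keep


def process_biotime(raw_rows, metadata_rows):
    """Keep abundance studies and paste study ID, latitude and longitude."""
    keep = usable_studies(metadata_rows)
    out = []
    for row in raw_rows:
        if row["STUDY_ID"] not in keep:
            continue
        out.append({
            "site": f'{row["STUDY_ID"]}_{row["LATITUDE"]}_{row["LONGITUDE"]}',
            "year": int(row["YEAR"]),
            "species": row["GENUS_SPECIES"],
            "abundance": float(row["ABUNDANCE"]),
            "country": None,  # BioTIME provides no country information
        })
    return out
```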
- Living Planet Index
The Living Planet Index (LPI; LPI 2022) is a measure of the state of the world's biological diversity based on population trends of vertebrate species from terrestrial, freshwater and marine habitats. The LPI is based on trends of thousands of population time series collected from monitored sites around the world. Accessing the dataset requires registration prior to download from https://www.livingplanetindex.org/data_portal. We downloaded the latest available (2022) zipped version of the database into our raw data folder.
The subfolder 'LivingPlanetIndex_2022_PublicData' includes the LPI data agreement (data_agreement_2022.pdf) and metadata (LPD_metadata.pdf) as well as the public data as a CSV file (LPD2022_public.csv). We read in this CSV; note that 'NULL' is used to indicate missing abundance values. Country names are supplied; to convert
these to three-character ISO codes, we use the countrycode package (Arel-Bundock et al. 2018). Two country names in the LPI data do not match because of character-encoding issues, and neither does 'International Waters', so we set up custom matches for these.
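The handling of 'NULL' can be illustrated with a short Python sketch that reshapes rows from the wide public file into one record per population and year, dropping missing values. It assumes, for illustration only, that abundances sit in one column per year from 1950 to 2020 and that the identifier columns are named ID, Binomial and Country; the country-code conversion is not shown.

```python
def lpi_to_long(rows, first_year=1950, last_year=2020):
    """Reshape wide LPI rows (one column per year) into long records.

    'NULL' marks a missing abundance value, so those cells are dropped.
    Column names and the year range are assumptions; check them
    against LPD_metadata.pdf.
    """
    records = []
    for row in rows:
        for year in range(first_year, last_year + 1):
            value = row.get(str(year), "NULL")
            if value in ("NULL", ""):
                continue  # missing abundance for this year
            records.append({
                "id": row["ID"],
                "species": row["Binomial"],
                "country": row["Country"],
                "year": year,
                "abundance": float(value),
            })
    return records
```

Treating 'NULL' explicitly avoids parsing it as text inside an otherwise numeric column.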
- CaPTrends
CaPTrends (Johnson et al. 2022) is a database of 1,122 population trends from around the world, describing changes in abundance over time in large mammal species (n = 50) from four families (Canidae, Felidae, Hyaenidae and Ursidae) in the order Carnivora. Trends represent 621 unique locations across the globe (latitude: −51.0 to 80.0; longitude: −166.0 to 166.0), from 1726 to 2017. The dataset itself is hosted on Zenodo at https://zenodo.org/record/6949487. First, download the zipped dataset and unzip it. Based on the metadata and descriptions, the two files we require are 'captrends.csv' (details of each individual study) and 'abundance.csv' (the actual abundance values). Some studies cover multiple countries; we assign the code of the first country mentioned. A 'site' variable is created by pasting data_table_id, citation_key, and locality_name. Latitude and longitude values are not available, so these are set to NA. The abundance unit is set to the value in population_metric if available, otherwise to the field_method value.
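The CaPTrends rules above (first country, pasted site, unit fallback, missing coordinates) can be sketched in Python as follows. The join key, the 'country', 'year' and 'abundance' column names, the comma separating multiple countries and the underscore used when pasting are assumptions for illustration; data_table_id, citation_key, locality_name, population_metric and field_method come from the text.

```python
def _present(value):
    """True unless the value is missing (None, empty, or the string 'NA')."""
    return value not in (None, "", "NA")


def process_captrends(studies, abundances):
    """Join study details (captrends.csv) to abundance values (abundance.csv)."""
    by_id = {s["data_table_id"]: s for s in studies}
    out = []
    for a in abundances:
        s = by_id.get(a["data_table_id"])
        if s is None:
            continue  # abundance row with no matching study
        # Multi-country studies: keep only the first country listed.
        first_country = s["country"].split(",")[0].strip()
        # Prefer population_metric as the unit; fall back to field_method.
        if _present(s.get("population_metric")):
            unit = s["population_metric"]
        else:
            unit = s.get("field_method")
        out.append({
            "site": "_".join([s["data_table_id"], s["citation_key"], s["locality_name"]]),
            "country": first_country,
            "year": int(a["year"]),
            "abundance": float(a["abundance"]),
            "unit": unit,
            "latitude": None,  # coordinates are not available
            "longitude": None,
        })
    return out
```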
- ReSurvey Germany: vegetation-plot resurvey data from Germany
ReSurvey Germany: vegetation-plot resurvey data from Germany (Jandt et al. 2022) is a compilation of harmonised vegetation-plot resurvey data from Germany covering almost 100 years. The data allow calculating temporal biodiversity change at the community scale. They also enable tracking changes in the incidence and distribution of individual species across Germany. Cover records are available for 1,794 vascular plant species in 7,738 (semi-)permanent vegetation plots from Germany, resurveyed from 2 to 54 times, in total resulting in 23,641 vegetation records and 458,311 species cover records, comprising the years from 1927 to 2020 and 97 EUNIS habitat types. The data are available to download from: https://doi.org/10.25829/idiv.3514-0qsq70. The main files required are the 'header' data (project- and site-level data) in 'Header_ReSurveyGermany.csv', and the main dataset of species-level abundances in 'ReSurveyGermany.csv'.
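Combining the two ReSurvey Germany files amounts to attaching each plot's header fields to its species cover records. A minimal Python sketch, assuming (hypothetically) that both files share a plot-observation key named RELEVE_NR; the actual key and column names should be taken from the dataset documentation.

```python
def join_resurvey(header_rows, species_rows, key="RELEVE_NR"):
    """Attach plot-level header fields to each species cover record.

    The default key name is hypothetical; pass the real shared column.
    """
    headers = {h[key]: h for h in header_rows}
    joined = []
    for rec in species_rows:
        h = headers.get(rec[key])
        if h is None:
            continue  # cover record without a matching header row
        joined.append({**h, **rec})  # species fields win on name clashes
    return joined
```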
- Environment Agency NFPD Fish Counts
The Environment Agency undertakes fisheries monitoring work on rivers, lakes and transitional and coastal waters (TraC) throughout England. The freshwater fish survey dataset (or National Fish Populations Database, NFPD, [Environment Agency 2020](https://environment.data.gov.uk/dataset/ce2618db-d507-4671-bafe-840b930d2297)) contains site and survey information, as well as the numbers and species of fish caught, for all the freshwater fish surveys carried out across England (with a small number from Wales and Scotland) from 1975 onwards. We accessed the Freshwater Fish Count dataset from https://environment.data.gov.uk/ecology/explorer/downloads/. Metadata are available in a PDF document [here](https://environment.data.gov.uk/portalstg/sharing/rest/content/items/1150f6994d294d78b422b97848c3a286/data). The geographic coordinates in this database are provided as eastings and northings in the British National Grid projection (EPSG 27700). In addition, some location errors place surveys offshore or leave them with incomplete coordinates (e.g. easting and northing both = 1). To address these, we use the sf library (Pebesma 2018) to convert the database into spatial format and reproject it to WGS84 latitude and longitude coordinates, and we combine it with a UK coastline map to exclude surveys that occur at sea. First, we load sf and make the NFPD dataset spatial. We obtained the digital vector boundaries for Countries in the United Kingdom as at December 2022 at full resolution, clipped to the coastline (Mean High Water mark), from the Office for National Statistics, available at https://geoportal.statistics.gov.uk/datasets/ons::countries-december-2022-uk-bfc/. We downloaded the shapefile, 'Countries_December_2022_UK_BFC_3731595901038458592.zip', directly and unzipped it.
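The location cleaning can be illustrated without sf. The Python sketch below swaps in a plain ray-casting point-in-polygon test for sf's spatial operations: it drops surveys with missing or placeholder coordinates, then keeps only points that fall inside at least one land polygon. It assumes, for illustration, that surveys and coastline polygons are already in the same coordinate system (it does not reproject) and that field names are 'easting' and 'northing'.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge
    may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clean_locations(surveys, land_polygons):
    """Drop surveys with incomplete or placeholder coordinates, or at sea."""
    kept = []
    for s in surveys:
        e, n = s["easting"], s["northing"]
        if e is None or n is None or (e, n) == (1, 1):
            continue  # incomplete or placeholder coordinates
        if any(point_in_polygon(e, n, poly) for poly in land_polygons):
            kept.append(s)
    return kept
```

Real coastline boundaries are multipolygons that can contain holes; this sketch tests outer rings only, which is one reason a spatial library such as sf is preferable in practice.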
Now we can process this dataset. For the 'site' variable, there are two levels of site identification: the local survey site (identified by site_id, typically on the order of 100 m of river length) and a larger-scale 'parent' site (site_parent_id), which may comprise several surveys within the same local area (typically all within a few km of each other). We retain both of these in a composite 'site' variable, so the data are at the level of the local site but can still be aggregated to the parent site if required. Abundance is measured as the total number of fish sampled, over one or more survey runs at each site in each time period. For around half the surveys, estimates of total population density are available using the Carle & Strub equation ([Carle & Strub 1978](https://doi.org/10.2307/2530381)) over a three-run catch depletion survey. However, this method can only be used on multiple-run surveys, and restricting the data to these would mean discarding over half of the available surveys. We therefore take as our abundance measure the count from the first run of multiple-run surveys, or the only run of single-run surveys (as is done for Water Framework Directive classification; Philip Rudd, Fisheries Technical Specialist, Environment Agency, pers. comm. April 2023). Counts are then divided by the survey area (length of river fished multiplied by the average width) to give abundance as individuals per 100 m². For some surveys, exact counts are not given; these are excluded. Zero catches are recorded in a separate variable; these are converted to 0s in our abundance variable.
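The abundance calculation reduces to count × 100 / (length × width). A worked Python sketch: a first-run count of 12 fish over 50 m of river with an average width of 4 m covers 200 m², giving 6 individuals per 100 m². Apart from site_id and site_parent_id, which come from the text, the field names (first_run_count, no_fish_caught, length_m, mean_width_m) are hypothetical, as is the underscore used to build the composite site.

```python
def nfpd_density(survey):
    """Return first-run catch density in fish per 100 m^2, or None to exclude.

    Field names other than site_id and site_parent_id are hypothetical.
    """
    if survey.get("no_fish_caught"):
        count = 0  # zero catches are flagged in a separate variable
    elif survey.get("first_run_count") is None:
        return None  # no exact count given: exclude this survey
    else:
        count = survey["first_run_count"]
    area_m2 = survey["length_m"] * survey["mean_width_m"]
    return count * 100 / area_m2


def composite_site(survey):
    """Combine parent and local site IDs so data can later be aggregated."""
    return f'{survey["site_parent_id"]}_{survey["site_id"]}'
```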
- FishGlob Global Bottom Trawl Survey Database
FishGlob_data is a global database of bottom trawl survey data for marine fish, described in Maureaud et al. (2023). The database contains a cleaned collation of 26 publicly available bottom-trawl surveys conducted in national waters of 18 countries that are standardised and pre-processed, covering a total of 2,162 sampled fish taxa and 232,800 hauls collected from 1963 to 2020. The database is available from Zenodo at https://zenodo.org/record/7527447#.ZDhrFuzMIqt. The full clean standardised dataset is found in the 'outputs' subfolder as an .RData binary file, 'FishGlob_std_public_clean.RData'. This contains site-level estimates of abundance per km² from multiple bottom trawl surveys across multiple species. The file includes two objects: the main data, as data, and the metadata, as readme. Country is included in the dataset, but as a country name rather than a code. To convert to country codes, we create a data frame of distinct countries and use the countrycode package (Arel-Bundock et al. 2018) to obtain the relevant ISO3 codes (all surveys listed as 'multi-countries' are in Europe, so we use EUROPE as the code for these).
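The country conversion can be sketched in Python with a small hand-written lookup standing in for the countrycode package. The lookup below holds only a few ISO 3166-1 alpha-3 codes for illustration and is not the full table used in the study; the 'multi-countries' to EUROPE rule comes from the text.

```python
# Illustrative stand-in for the countrycode package: a few ISO3 codes only.
ISO3 = {
    "United States": "USA",
    "Canada": "CAN",
    "Norway": "NOR",
    "France": "FRA",
}


def country_codes(rows):
    """Map each distinct country name to a code.

    Surveys listed as 'multi-countries' are all in Europe, so they get
    'EUROPE'. Names missing from the lookup map to None for manual review.
    """
    codes = {}
    for name in sorted({r["country"] for r in rows}):
        if name == "multi-countries":
            codes[name] = "EUROPE"
        else:
            codes[name] = ISO3.get(name)
    return codes
```

Working on distinct names first means each name is matched once, and unmatched names surface as None rather than silently propagating.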
Now process the data. As these abundances are derived through trawling, there is no discrete spatial repetition in sampling, i.e. the exact same site is not sampled every year. Instead, the trawlers collecting data can deviate slightly from the exact site, but often stay within the same general region. To handle these discrepancies in sampling location, we upscale the sampling locations (latitude and longitude) to a 1-degree resolution and take the average abundance estimate from the same sampling scheme for each species in each year within each 1-degree grid cell. Simply put, we coarsen the spatial resolution to allow temporal comparisons in trawling data. Note: a 1-degree resolution was chosen because it allowed us to make these temporal comparisons while maintaining spatial structure.
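The 1-degree aggregation can be sketched in Python: each haul is assigned to the cell given by the floor of its latitude and longitude (the cell's south-west corner), and abundances are averaged per survey, species, year and cell. The field names are assumptions for illustration, and only hauls present in the input contribute to each mean.

```python
import math
from collections import defaultdict


def aggregate_to_grid(hauls):
    """Average abundance per survey, species, year and 1-degree cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h in hauls:
        # floor() labels each cell by its south-west corner, so negative
        # coordinates are handled correctly (floor(-3.7) == -4).
        cell = (math.floor(h["latitude"]), math.floor(h["longitude"]))
        key = (h["survey"], h["species"], h["year"], cell)
        sums[key] += h["abundance"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Keeping the survey in the grouping key ensures that estimates from different sampling schemes are never averaged together.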
- TimeFISH
The TimeFISH database (Quimbayo et al. 2022) provides the first public time-series dataset on reef fish assemblages in the southwestern Atlantic (SWA), comprising 15 years of data (2007–2022) based on standardized Underwater Visual Censuses (UVCs) in nine locations along the southern Brazilian coast (25–29°S). All fish individuals in the water column (up to 2 m above the substratum) and at the bottom were targeted. In total, 202,965 individuals belonging to 163 reef fish species and 53 families were recorded across 1857 UVCs. Data are available to use with no restrictions, and can be downloaded from Zenodo: https://zenodo.org/record/7317084#.ZFJRK-zMIqs