s41586-024-07236-z.pdf - Page 6

nature portfolio | reporting summary March 2021

Process the data. First, aggregate census data to species per transect (as we do not need individual size class abundances). Then format species names and join to the location data. Note that two transect IDs appear twice in the location data; as we are not interested in dates beyond years, we can disregard this when joining by setting multiple = "first". Then add dataset_id, create a new site variable, add the country code and unit, rename variables as needed, and select the final set of variables.
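The processing steps above can be sketched as follows. This is a minimal Python illustration using only the standard library, not the authors' R code. The record layout, the field names (transect_id, survey_date, site) and the placeholder species names are hypothetical. The sketch sums counts over size classes to give one abundance per species, transect and year, tidies species names, and keeps only the first matching location row per transect, mirroring the effect of setting multiple = "first" in the join.

```python
from collections import defaultdict

# Hypothetical census records: (transect_id, year, raw species name, size class, count).
census = [
    ("T1", 2010, "genus  alpha", "0-10 cm", 3),
    ("T1", 2010, "Genus alpha", "10-20 cm", 2),
    ("T2", 2011, "genus beta", "0-10 cm", 4),
]

# Hypothetical location table: T1 appears twice, differing only in survey date.
locations = [
    {"transect_id": "T1", "survey_date": "2010-03-01", "site": "Site A"},
    {"transect_id": "T1", "survey_date": "2010-09-15", "site": "Site A"},
    {"transect_id": "T2", "survey_date": "2011-04-02", "site": "Site B"},
]


def format_species(name):
    """Collapse repeated whitespace and use 'Genus species' capitalisation."""
    parts = name.split()
    return " ".join([parts[0].capitalize()] + [part.lower() for part in parts[1:]])


def aggregate_by_species(records):
    """Sum counts over size classes: one abundance per (transect, year, species)."""
    totals = defaultdict(int)
    for transect, year, species, _size_class, count in records:
        totals[(transect, year, format_species(species))] += count
    return totals


def first_match(rows, key):
    """Index rows by key, keeping only the first row per key (like multiple = "first")."""
    index = {}
    for row in rows:
        index.setdefault(row[key], row)
    return index


def process(records, location_rows):
    """Aggregate census records and attach the site from the first matching location row."""
    sites = first_match(location_rows, "transect_id")
    return [
        {"transect_id": transect, "year": year, "species": species,
         "abundance": abundance, "site": sites[transect]["site"]}
        for (transect, year, species), abundance in sorted(aggregate_by_species(records).items())
    ]


for row in process(census, locations):
    print(row)
```

Keeping the first match is only safe here because the duplicated location rows differ in survey date alone, and dates are collapsed to years anyway.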

Pilotto

In 2020, Pilotto et al. published a study on biodiversity trends in Europe, relying largely on Europe's network of 'long-term ecological research sites'. Unlike the other datasets compiled in this Rmarkdown, Pilotto et al. do not provide the full dataset, sharing only links to the raw data. To make use of the Pilotto data, we compiled this raw data into a series of .csv files (https://zenodo.org/records/10638241). Specifically, we worked through the links available in Pilotto's data and downloaded the raw data from each link into the directory 'data/raw_data/Pilotto/unique_id'. From each downloaded dataset, we then extracted information into three .csv files:
compile_ts.csv - the abundance time series, reported taxa, year, and site name;
compile_sp.csv - the coordinates of the sites contained in compile_ts.csv;
compile_tx.csv - species names. This file is necessary because reported taxa in compile_ts.csv are sometimes given as codes instead of actual names (e.g. 'species_xyz123'); compile_tx.csv can be used to convert these codes into species names.
In some cases, we decided the information used by Pilotto et al. was not suitable for our study (e.g. some data contained presence/absence rather than abundance values). We have carefully annotated the data we extracted from Pilotto within the file 'Pilotto/master.csv'.

Human research participants
Policy information about studies involving human research participants and Sex and Gender in Research.
Reporting on sex and gender: NA
Population characteristics: NA
Recruitment: NA
Ethics oversight: NA
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences
Behavioural & social sciences
Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.

Study description
We assess how temporal trends in abundance change after accounting for spatial, temporal and phylogenetic dependencies. We test this across 10 high-profile datasets, reporting the coefficient of the abundance ~ year relationship and its credible intervals. We also explore predictive accuracy after incorporating spatial, temporal and phylogenetic dependencies.

Research sample
We use 10 high-profile abundance datasets, representing more than 30,000 populations, ~2,900 species and ~5,850 unique locations.

For each dataset, we extracted the population abundance estimates, the accompanying time stamps, the species' scientific names, the name of the site (location) where the population was sampled, and any site coordinates. To be included, datasets had to be open access and contain multiple abundance time series for a selection of species and locations. Whilst these datasets are vital within biodiversity science, many are prone to biases (e.g. lacking tropical representation and containing few plant and invertebrate species). The datasets were compiled using a variety of methods, realms and systems, covering a vast array of spatial, taxonomic and temporal scales. Further, there is likely some overlap between datasets, i.e. the same population time series may occur in more than one dataset. We take no action to correct or acknowledge these biases and features, as our analysis is designed to show how model choice can substantially influence inference across a variety of datasets, rather than to derive new trend estimates for each dataset or a consensus trend across datasets.

Sampling strategy
We selected datasets with open-use policies to make the work reproducible. We conducted no power analyses to determine sample sizes, as our work included all available data.

Data collection
See comprehensive instructions on data collection above.

Timing and spatial scale
The datasets vary in temporal, spatial and taxonomic scale. Most records come from later decades, the Global North, and vertebrate populations. Please see the summary of each dataset below.

Population abundance time series from the BioTIME dataset, representing all core taxa and realms. The dataset covers 12,065 abundance time series, derived from 243,993 abundance observations. These time series represent 438 unique sites and 1,233 species. Temporal extent: 1933 to 2018. Latitude extent: -77.6 to 67.8. Longitude extent: -179.8 to 179.2.

Global population abundance time series from the Living Planet database. The dataset covers 5,255 abundance time series, derived from 166,827 abundance observations. These time series represent 1,159 unique sites and 1,264 species. Temporal extent: 1950 to 2020. Latitude extent: -77.8 to 78.9. Longitude extent: -180 to 180.

Population abundance time series from the North American Breeding Bird Survey. The dataset covers 8,718 abundance time series, derived from 164,317 abundance observations. These time series represent 584 unique sites and 361 species. Temporal extent: 1966 to 2019. Latitude extent: 25.9 to 67.0. Longitude extent: -165.3 to -55.4.

Population abundance time series from the FishGlob database, describing abundances of marine fishes from bottom-trawl surveys. The dataset covers 2,286 abundance time series, derived from 67,908 abundance observations. These time series represent 229 unique sites and 152 species. Temporal extent: 1977 to 2020. Latitude extent: 26 to 62. Longitude extent: -178 to 21.

Population abundance time series from the RivFishTIME database. The dataset covers 2,386 abundance time series, derived from 40,834 abundance observations. These time series represent 197 unique sites and 191 species. Temporal extent: 1975 to 2019. Latitude extent: -28.3 to 67.9. Longitude extent: -122.4 to 153.4.

Population abundance time series from the UK Environment Agency fish population database, describing fish populations in rivers, lakes and transitional/coastal waters. The dataset covers 361 abundance time series, derived from 3,016 abundance observations. These time series represent 181 unique sites and 16 species. Temporal extent: 1984 to 2019. Latitude extent: 50.4 to 55.4. Longitude extent: -3.9 to 0.5.

Population abundance time series from the TimeFISH database, describing abundances of reef assemblages in the South-western Atlantic. The dataset covers 86 abundance time series, derived from 262 abundance observations. These time series represent 12 unique sites and 52 species. Temporal extent: 2008 to 2022. Latitude extent: -27.7 to -27.1. Longitude extent: -48.5 to -48.3.

Population abundance time series from the ReSurveyGermany database, describing relative cover in vegetation plots. The dataset covers 356 abundance time series, derived from 4,954 abundance observations. These time series represent 7 unique sites and 93 species. Temporal extent: 1965 to 2018. Latitude extent: 48.3 to 53.6. Longitude extent: 7.4 to 13.9.

Population abundance time series from the dataset of Pilotto et al. (2020), 'Meta-analysis of multidecadal biodiversity trends in Europe', representing diverse taxa across the terrestrial, freshwater and marine realms. The dataset covers 2,386 abundance time series, derived from 40,834 abundance observations. These time series represent 197 unique sites and 191 species. Note: unlike all other datasets, the compiled form of this dataset was not openly available; instead, the dataset only provided references to the primary sources. We extracted relative abundance/density estimates from the 51 primary sources referenced within the database. We excluded a further 31 datasets contained within this database because the data lacked clear metadata, were not resolved to the species level, or represented species presence/absence rather than abundance. Temporal extent: 1975 to 2019. Latitude extent: 40.1 to 67.8. Longitude extent: -8.9 to 29.6.

Population abundance time series from the CaPTrends database of large carnivore population trends and time series. The dataset covers 279 abundance time series, derived from 2,670 abundance observations. These time series represent 165 unique sites and 26 species. Temporal extent: 1880 to 2019. Latitude extent: -40.0 to 71.6. Longitude extent: -158.0 to 99.2.

Data exclusions
For each dataset, we extract synthetic trees from the Open Tree of Life 45,46 and estimate missing branch lengths using Grafen's approach 47, as implemented in the compute.brlen function of the R package ape 48. The Open Tree of Life identified a phylogeny for 80% of species (N = 23,871); all other species were removed from the analysis. For studies whose overall aim is to assess biodiversity change, removing species could be problematic, as the collective trend would not be representative of all species. However, in our case, where the aim is to assess how inference about the collective trend changes under a variety of modelling approaches, trimming the data to species with an accompanying phylogeny has no impact on our conclusions.
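Grafen's method can be illustrated with a short sketch. Following the description of method = "Grafen" in ape's compute.brlen, each node is given a height equal to its number of descendant tips minus one (zero for tips); heights are scaled so that the root has height 1 and are raised to a power (compute.brlen's power argument, default 1), and each branch length is the difference between the heights of its parent and child nodes. The Python below is an illustrative re-implementation under that description, not ape itself; the nested-tuple tree format and the function names are hypothetical.

```python
def tips(node):
    """Tip labels below a node; tips are strings, internal nodes are tuples of children."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def grafen_branch_lengths(tree, power=1.0):
    """Return {tip set below each edge: branch length} using Grafen-style node heights."""
    n_total = len(tips(tree))
    if n_total < 2:
        raise ValueError("Grafen heights need a tree with at least two tips")

    def height(node):
        # Descendant tips minus one (0 for a tip), scaled so the root is 1, then raised to `power`.
        return ((len(tips(node)) - 1) / (n_total - 1)) ** power

    lengths = {}

    def walk(node):
        parent_height = height(node)
        for child in node:
            # Each edge gets the drop in height from its parent node to its child node.
            lengths[tips(child)] = parent_height - height(child)
            if not isinstance(child, str):
                walk(child)

    walk(tree)
    return lengths


# A hypothetical three-tip topology without branch lengths: ((a, b), c)
for below, length in grafen_branch_lengths((("a", "b"), "c")).items():
    print(sorted(below), length)
```

Because tips have height 0 and the root has height 1, every root-to-tip path sums to 1, so the resulting tree is ultrametric. Nodes with more than two children (polytomies, which synthetic trees can contain) need no special handling.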

After removing species not present in the Open Tree of Life topology, we further trimmed the data to include only higher-quality time series, removing time series that contained zeros (which we considered extreme cases of extinction or recolonisation) and time series with a missing abundance value for any year within the sampling duration (i.e., we required consecutive abundance estimates). In all datasets except the two smallest (G: Atlantic reef fishes; J: large carnivores), we further trimmed the data to keep only time series with at least the median number of abundance observations, i.e., the longest 50% of time series in each dataset. In some cases, this cut-off was not sufficient, as the median number of observations per time series equalled two. With only two abundance observations, trends are highly exposed to error driven purely by random fluctuations in abundance 10. To partially address this issue, we imposed a further cut-off on these datasets, requiring each time series to have at least 5 observations. These datasets are characterised in Table S1. Finally, for each time series in the trimmed data, we derived a mean abundance for each year (where there was more than one observation per year).
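The trimming rules above can be expressed as a small filter. This Python sketch is one reading of the text, not the authors' code. Several details are assumptions: a time series is a hypothetical list of (year, abundance) records; observations are counted as records; the median is taken over the series that survive the zero and gap filters; the fallback minimum of five observations applies when that median is two or less; and apply_length_cutoff=False stands in for the two smallest datasets, which were exempt from the median cut-off.

```python
from statistics import mean, median


def has_zero(obs):
    """True if any record in the series has an abundance of zero."""
    return any(abundance == 0 for _, abundance in obs)


def has_year_gap(obs):
    """True if the distinct sampled years are not consecutive."""
    years = sorted({year for year, _ in obs})
    return any(later - earlier != 1 for earlier, later in zip(years, years[1:]))


def trim_dataset(series, apply_length_cutoff=True, fallback_min_obs=5):
    """Filter {series_id: [(year, abundance), ...]} and average repeated years."""
    # 1. Drop series containing zeros or missing years.
    kept = {sid: obs for sid, obs in series.items()
            if not has_zero(obs) and not has_year_gap(obs)}

    # 2. Keep series with at least the median number of observations
    #    (skipped for the exempt small datasets via apply_length_cutoff=False).
    if apply_length_cutoff and kept:
        med = median([len(obs) for obs in kept.values()])
        # Assumption: a median of two or fewer triggers the stricter minimum.
        threshold = max(med, fallback_min_obs) if med <= 2 else med
        kept = {sid: obs for sid, obs in kept.items() if len(obs) >= threshold}

    # 3. Replace multiple observations in the same year by their mean.
    trimmed = {}
    for sid, obs in kept.items():
        by_year = {}
        for year, abundance in obs:
            by_year.setdefault(year, []).append(abundance)
        trimmed[sid] = {year: mean(values) for year, values in sorted(by_year.items())}
    return trimmed


# Hypothetical toy dataset.
example = {
    "s1": [(2000, 3), (2001, 4), (2002, 5)],
    "s2": [(2000, 2), (2001, 0), (2002, 1)],                  # contains a zero
    "s3": [(2000, 2), (2002, 3)],                             # 2001 missing
    "s4": [(2000, 1), (2000, 3), (2001, 2), (2002, 2), (2003, 2)],
    "s5": [(2005, 7), (2006, 8)],                             # shorter than the median
}
print(trim_dataset(example))
```

Applying the median cut-off before averaging within years follows the order described in the text; counting observations after averaging would be an equally plausible reading and could change which series pass.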
Reproducibility
All data and code are openly available and reproducible.

Randomization
Data were not randomized. Our study showcases the need to conduct statistical controls for non-independence, which have been neglected in previous work.

Blinding
NA

Did the study involve field work? Yes No

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Materials & experimental systems (n/a / Involved in the study)
Antibodies
Eukaryotic cell lines
Palaeontology and archaeology
Animals and other organisms
Clinical data
Dual use research of concern

Methods (n/a / Involved in the study)
ChIP-seq
Flow cytometry
MRI-based neuroimaging
