s41586-024-07236-z.pdf - Page 4

Article Extended Data Fig. 1 | Impact of phylogenetic and spatial signal on inference. Fold change in the standard deviation of the abundance-time coefficient between the correlated effect and random slopes model, plotted against the mean signal. Mean signal is calculated by finding the mean of the phylogenetic signal (variance captured by the phylogeny divided by the sum of the phylogenetic and non-phylogenetic variance) and spatial signal (as in the phylogeny). n = 10 with each point representing a dataset. Shading represents 95% confidence intervals from a linear model.

Extended Data Fig. 2 | Contribution of spatial, temporal and phylogenetic components to collective trend uncertainty. Relative fold change in the standard deviation of the collective trend as additional components (space, time or phylogeny) are added to the random slope model. Models are compared to the standard deviation of the collective trend in the random slope model.
In each model comparison (y-axis), n = 10 with each point representing a dataset. The larger point and error bar represents the mean change and associated standard deviation around this mean. 1 nature portfolio | reporting summary March 2021 Corresponding author(s): Thomas Frederick Johnson Last updated by author(s): Feb 7, 2024 Reporting Summary Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist. Statistics For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. n/a Confirmed The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one- or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section. A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable. For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated Our web collection on statistics for biologists contains articles on many of the points above. Software and code Policy information about availability of computer code Data collection We have a fully reproducible and annotated data collection pipeline in the form of an RMarkdown document, named 'data_compile.Rmd' within our code repository (https://zenodo.org/records/10638241). Data was compiled from 10 sources (seed data availability below) and processed within R V4.0.5 using the following packages: tidyverse 2.0.0, here 1.0.1, janitor 2.2.0, countrycode 1.5.0 and arrow 12.0.1.

Arel-Bundock et al. (2018) countrycode: An R package to convert country names and country codes. Journal of Open Source Software https://doi.org/10.21105/joss.00848. Version 1.5.0

Firke S. (2023) janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 2.2.0, https://CRAN.R-project.org/package=janitor

Müller K. (2020) here: A Simpler Way to Find Your Files. R package version 1.0.1, https://CRAN.Rproject. org/package=here

R Core Team (2022) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Version 4.0.5

Richardson, N. et al. (2023) arrow: Integration to ‘Apache’ ‘Arrow’. R package version 11.0.0.3, https: //CRAN.R-project.org/package=arrow

Wickham H. et al. (2019) Welcome to the tidyverse. Journal of Open Source Software https://doi.org/10. 21105/joss.01686. Version 2.0.0

2 nature portfolio | reporting summary March 2021

Data analysis All of our code is openly available and reproducible from https://zenodo.org/records/10638241. Here, we provide a summary of our analysis, more information is available in the main text, methods, supplementary material and code.

The objective of our study was to consider how inference made in commonly applied biodiversity models could be impacted by failing to account for spatial, temporal and phylogenetic dependencies. Prior to designing our models, we first explored what models have been used in the literature to infer abundance change. In this work, we focussed on studies trying to characterise the average change in abundances over time, rather than studies attempting to assess how many species are declining or increasing, as this avoids discretizing a numeric value i.e. by assessing the average change, we avoid having to define what change is necessary to be classified as a ‘decline’. To evaluate the diversity of approaches used to model abundance change over time in multi-species and/or multi-location datasets, we conducted a literature search within a selection of high profile ecology journals over the last 13 years. Our search identified 282 research papers, 28 of which described approaches to model abundance change across multi-species/location datasets. A further 16 methods were not detected within the literature search but were known priori to the authorship team, resulting in 44 different studies/methods. Models of abundance change varied in complexity, each containing their own assumptions, with no clear ‘standard’ approach for deriving the rate of change in abundance. However, across the 44 studies/methods we compiled (Table 1), five general approaches were present (Table 1):

Abundance average - The simplest models derive an average or total abundance across all species or sites in a given year, and then regress average abundance against time. This approach fails to recognise any of the hierarchical structure in the data.

Trend average - A slightly more complex model, which estimates abundance change per population by fitting a series of log-linear modes of abundance against year; averaging over the extracted slope coefficients. This approach fails to propagate uncertainty within average rates of change of each population, and ignores the implicit spatial and taxonomic structure within the data, inducing pseudoreplication.

Random intercept - Some studies partially address the aforementioned pseudoreplication (e.g. certain sites or species having multiple estimates) with mixed models, regressing log-linear abundance against year across all populations, whilst specifying that populations belong to a site and/or species. However, often this mixed model structure only extends to random intercepts , which only acknowledge that mean abundance can differ between sites, species and location, but assumes that the abundance trends will all remain the same. This is a particularly common approach amongst the indicators from population monitoring schemes which shape policy.

Random slope - In the scientific literature, it is common to use more complex models, with a similar structure to the Random intercept model, but now capturing the differences in abundance trends across populations, sites and species with random slope terms.

Decomposition - This is the rarest of the approaches and deviates from the linear mixed model approaches. Instead, the decomposition approach involves fitting generalised additive models (GAMs) through each time series to smooth abundance estimates and reduce noise. The smoothed time series is then decomposed into a timeseries of rates of change (or lambdas), which are then averaged across species and biomes to derive estimates of the average change in abundance for each year across all the time-series.

The most common approaches were the random intercept and random slope models, used 19 and 23 times, respectively. The abundance average, trend average and decomposition approaches were rare, used just 5, 2, and 3 times, respectively. Not all studies adopted just one approach, sometimes splitting their model into two steps e.g. using a random intercept model to estimate a given species trend across locations, which could then be aggregated across broader taxonomies with a random slope model. Further, all approaches regularly failed to recognise that abundance patterns are shaped by implicit temporal, spatial and phylogenetic signals (i.e. closely related species are likely to have more similar trends than distant species), with only 14 of the 44 approaches accounting for temporal autocorrelation. Phylogenetic and spatial covariance were comparably rarer - included in just 6 and 3 studies respectively. Four studies attempted to account for two sources of correlative non-independence within their models, by first deriving population trends whilst accounting for temporal autocorrelation of abundances within time series, and then using phylogenetic least squares to aggregate these trends. However, no study captured more than one of these covariances simultaneously (e.g. spatio-temporal models for instance). Further, no study attempted to account for all three sources of correlative non-independence at the same time.

Given the apparent rarity of the abundance average, trend average and decomposition approaches within the literature, we focus on understanding how the dominant approaches (i.e. the random intercept and random slope models) compare to our newly developed correlated effect model. Full model equations are available in ‘Supplementary material - Models’.

Model 1. Random intercept

In model 1, we fit a linear mixed effect model between the natural logarithm of abundance and year, with five random intercepts: population (the unique time series), site (unique locations), region (broader spatial category to nest sites; measured as 10-degree grid cell the site occurs in), species (unique species), and genus (broader taxonomic category to nest species; measured as the parent node to the species tip). Within the model, we do not specify any nesting of the population within the site and species random intercepts as the hierarchical structure of the data is poorly defined e.g., whilst populations always occur within a species and site, some species are nested in sites, and some sites are nested in species, creating a crossed random effect design. Model 1 assumes all populations, sites, regions, species, and genera have the same trend in abundance.

Model 2. Random slope

In model 2, we develop a linear mixed effect model, where we regress the natural logarithm of abundance against year, including population, site, region, species, and genus all as random slopes. This builds on the random intercept model by allowing abundance-year slope coefficients to vary for each category in each random slope term (e.g., each species can have a different slope) - not simply differing intercepts as in model 1. Unlike model 1, we centre the year and abundances of each population time series at zero e.g. subtracting each year by the mean year in each population, and subtracting the log of each abundance by the mean log abundance value in each population. This centering fixes the y and x intercepts at zero for each slope, and is a convenient solution to acknowledge variance captured by the random intercepts without increasing the number of parameters. In all intents and purposes, the random slope model is equivalent to a model with random intercepts and slopes.

3 nature portfolio | reporting summary March 2021

Model 3. Correlated effect

Model 3 is structurally similar to model 2, but accounts for correlative non-independence structures. For temporal non-independence, we model the population level time series with a discrete autoregressive-1 (ar-1) temporal process, which assumes neighbouring abundance observations within a time series will be more similar. To capture the spatial and phylogenetic correlative non-independence, we focus on non-independence across time series trends (instead of abundance observations), assuming trends in population abundances through time will be more similar in neighbouring sites and more closely related species. In model 1 and 2, we try to capture this non-independence with grouping categories (genus and region). However, in the correlated effect model, we more explicitly describe shared correlations between every species and site by specifying the covariance structure of our site and species random slopes. The site covariance matrix was derived by taking each site's coordinates and estimating the pairwise Haversine (spherical) distance between the sites e.g. how far is every site from every other site. This was then converted into a matrix, normalised between 0 and 1, with values close to 1 indicating neighbouring sites, whilst values approaching 0 indicate distant sites. The species covariance matrix was derived by converting the phylogeny into a variance- covariance matrix, where phylogenetic branch lengths describe the evolutionary distance between species.

All models were run in INLA using R V4.0.5. We describe our model priors in ‘Supplementary material - Priors’ and validate our model assumptions in ‘Supplementary material - Assumptions’. We also conduct additional sensitivity analyses exploring how phylogeny quality and how the addition of each correlative component (space, time or phylogeny) impacts inference - see ‘Supplementary material - Phylogeny’ and ‘Supplementary material - Component contribution’.

Outputs

Measuring non-independence In our Correlated effect model we measure the presence of total non-independence as the proportion of variance captured by the combination of independent and correlative terms (i.e. random effects) for each component (e.g., temporal components), divided by the sum of the variance for all terms. Next, we assess if correlative terms are the larger contributor to this total non-independence, by dividing the proportion of variance captured by the correlated slopes, by the combination of the variance captured by the correlated and independent slopes. This was done separately for the spatial and phylogenetic terms. As the spatial and phylogenetic components each contain three terms (an independence species/location slope, an independent genera/region slope, and a correlated species/location slope), a proportional variance captured of 0.33 would indicate that the correlative slope captures an equal proportion of variance compared to the two independent slopes. A value greater than 0.33 indicates that correlative slopes account for more variation than independent random slopes. We measure temporal non-independence as the degree of correlation between neighbouring abundances (rho).

Differing inference between the models Using the mean and 50% credible intervals of the global trend (overall abundance-time coefficients), we display abundance projections for each model in each dataset. These projections are based on an arbitrary baseline abundance of 100, set at the first year of available data in each dataset, and this abundance would change according to the overall coefficients and credible intervals. For instance, with a 1% annual rate of change, an abundance in year zero of 100, would become 101 in year 1, and 164 in year 50. The purpose of these projections is to showcase varying abundance trajectories under different model complexities.

Next, we note the number of the datasets where inference reverses (e.g., the global trend reverses direction from positive to negative, or remains consistent), and where uncertainty increases (the variance around the global trend is greater or smaller), comparing the random intercept and random slope models to the correlated effect model. To support these comparisons, we also report the fold change in the collective trend standard deviation of the correlated effect model, relative to the random intercept and random slope models.

Predictive performance We assess the predictive performance of the different models by determining their ability to predict final observations in time series’, and their ability to predict population trends of a given species in a given location. To test the predictive accuracy for the final observation in the time series, we removed the final observation from half of the time-series in each dataset and predicted the missing values using each of the three models on the log scale. We report the percentage error (PE), a metric describing the median of the absolute percentage difference between predicted and observed values e.g. with a 5% error, an abundance on the log-scale of 1 would become 1.05. To test the accuracy of the population trend prediction, we conducted leave-one-out cross validation, removing one population time series (or trend) from each dataset, and estimating the missing trend using the random slope and correlated effect models. We repeated this process 50 times per dataset and compare the predicted trends to trends from a simplified correlated effect model, which contains a population level slope and accounts for temporal autocorrelation, but does not include the spatial and phylogenetic correlation terms or any of the hierarchical terms, which have no influence on the required population level inference. We measured trend predictive performance using the same approaches as above (PE). We estimate this error statistic across the full sample of 50 populations per dataset, and across the sub-sample of the rarest 20% of populations i.e. populations in the least well-studied species and location. In the random slope model, the population trend coefficients were derived by adding the species, location, genus, region, and overall coefficients together, meaning missing population values can still be informed by other hierarchical information. For the correlated effect model, the population trend is informed by the phylogenetic and spatial variance-covariance matrices, as well as all hierarchical information in the random slope model.

Phylogenetic and spatial distribution of abundance change To plot abundance change across a phylogeny, we derived species level rates of change in abundance from the taxonomic (species and genera) and phylogenetic random effects. We incorporate uncertainty in species-level trend prediction by estimating the confidence interval threshold by which a species would be considered to have increased or decreased. For instance, a negative trend at an 80% confidence interval threshold would be considered stronger evidence of decline than a negative trend at at 20% interval threshold. We derive four asymptotic confidence interval thresholds (20%, 40%, 60%, 80%) using the uncertainty (standard deviation) from the phylogenetic random effect and a series of z-scores (0.25, 0.52, 0.84, 1.28).

To plot abundance change across space, we focus solely on one abundant and iconic species, the American Robin Turdus migratorius, as site- level trend variability is high at the community level i.e. community trends across space are rarely significant. To produce abundance change predictions for the American Robin across space, we expanded the BioTIME spatial Haversine distance matrix (describing distances between each time series) by supplementing it with a gridded extent covering North America. This new grid had a latitudinal range of 20 to 60 and 1 degree spacing (e.g. 15, 16, etc.), and longitudinal range of -130 to -60 with 1 degree spacing. This new matrix allows us to estimate expected

Page 4 of 6