1 Background

Citizen science, herein defined as the involvement of volunteers who are typically not professional scientists in the production of scientific knowledge, can generate biodiversity data at spatial, temporal, and taxonomic scales that are difficult to achieve by any other means. As a result, citizen science is becoming increasingly important in research, management, and policy. Over 50% of the data in the Global Biodiversity Information Facility (GBIF), the international, government-funded repository for the world’s species occurrence data in space and time, are citizen science observations, and 6 of the top 10 datasets on the GBIF network are citizen science datasets (GBIF 2020). Citizen science data form the basis of hundreds of peer-reviewed research publications every year, and increasingly of conservation efforts and strategies at local, regional, and global scales.

The Dynamic Observatory of Biodiversity (DOB) for the California coast integrates and models information from crowd-sourced citizen science data together with long-term monitoring surveys and oceanographic model outputs to produce synthetic indicators of biodiversity and biodiversity change on the California coast. At its core, the DOB takes advantage of the vast amounts of information on the occurrence of many organisms across the whole coast and throughout the year that can be extracted from crowd-sourced observations gathered through the large-scale citizen science initiatives Snapshot Cal Coast and iNaturalist.org.

Observation biases in crowd-sourced citizen science data are addressed through a series of steps, including data filtering, joint statistical inference across space, time, and taxonomy, and integration with more systematic yet scarcer long-term monitoring data. Regularly updated as new information becomes available, the DOB provides insights into the pulse of biodiversity on the California coast.


2 Five steps to extract biodiversity indicators from crowd-sourced citizen science data

Although crowd-sourced citizen science data are noisier than standardized ecological data sources, many of their systematic biases also occur in data collected systematically by professional scientists (Kosmala et al. 2016): spatially and/or temporally non-random observations, uneven sampling effort over space, time, or taxa, and uneven detectability between rare and common species. Because these biases have long been known to affect data collected by professional scientists, many methods have been developed to control for and model them, provided that the relevant metadata are recorded (Bird et al. 2014).

This section provides an overview of the five most useful steps (a recipe of sorts) for dealing with bias in crowd-sourced citizen science data and thus helping to realize the potential of these data for research and management. We note that the steps presented below are not necessarily sequential and are certainly not mutually exclusive. Instead, depending on the desired output, appropriately accounting for bias in crowd-sourced citizen science data may well involve combining several or all of these steps and iterating through them until the noise in the data is minimized.

2.1 Filtering data

Filtering is the process of selecting a subset of a dataset and using that subset in subsequent analyses. Filtering ecological datasets is a common way to minimize bias and reveal a signal of biological change (Hickling et al. 2006, Roy et al. 2012). In the context of crowd-sourced citizen science data, filtering serves two main purposes: reducing measurement error and equalizing observation effort. First, filtering data can be an effective means of reducing spatial, temporal, and/or taxonomic uncertainty in the data. iNaturalist labels its observations according to specific data quality criteria, with Research Grade observations meeting the highest data quality standards. Research Grade observations meet all the basic metadata standards for useful species occurrence data (a taxon identification, a geospatial reference, and a timestamp) and are less likely than non-Research Grade observations to contain errors in any of those metadata fields. In particular, Research Grade observations have previously been found to have rates of taxonomic identification error comparable to those of other ecological data sources: Research Grade identifications are correct approximately 85% of the time when checked against expert identifications (Loarie 2017, Ueda 2019). iNaturalist Research Grade observations are integrated into the Global Biodiversity Information Facility and make up a significant proportion of the data used in hundreds of papers every year. We therefore calculated biodiversity indicators on the California coast based exclusively on iNaturalist Research Grade observations and did not consider any other type of iNaturalist observation. Moreover, we only used Research Grade observations identified to species rank. These basic filters apply to all analyses and outputs presented herein. Second, filtering data can be an effective means of equalizing observation effort over space, time, and/or taxa.
Basic filters can ensure that only areas, time periods, and/or taxa that contain a minimum amount of information are included in analyses. For instance, we derived place-based biodiversity indicators (species richness, species rarity, biodiversity uniqueness, and biodiversity irreplaceability) exclusively for places in which at least 10 species had been observed (see section 3.3 for more details). Similarly, we provide species-based indicators exclusively for species with at least 100 observations on iNaturalist. A more in-depth form of filtering, rarefaction, can help to even out observation effort between two areas, time periods, or taxa. Rarefaction is the process of rarefying, or thinning, a reference sample by repeatedly drawing random subsets of observations in order to standardize comparisons of biodiversity on the basis of a shared number of observations between samples (Gotelli and Chao 2013). Subsampling and rarefaction have previously been used to estimate changes between two broad time periods from large-scale volunteer-based datasets such as the published Atlases of British birds, butterflies and plants (Warren et al. 2001, Thomas et al. 2004). During the course of this project, we occasionally used rarefaction to tease out ecological signals from crowd-sourced citizen science data on the California coast. For instance, yearly ecological community change values for different places are based on rarefied samples between years (see section
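The basic filters, the minimum-information threshold, and rarefaction described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DOB pipeline: the record layout and its keys ("quality_grade", "taxon_rank", "species", "place") and the values "research" and "species" are hypothetical stand-ins for fields in an iNaturalist export, and rarefaction is implemented by averaging over repeated random subsamples, following the resampling definition given above.

```python
import random
from collections import defaultdict

# Hypothetical record layout: each observation is a dict with the keys
# "quality_grade", "taxon_rank", "species" and "place". These names are
# illustrative stand-ins, not the exact fields of an iNaturalist export.

def basic_filter(observations):
    """Keep Research Grade observations identified to species rank."""
    return [
        o for o in observations
        if o["quality_grade"] == "research" and o["taxon_rank"] == "species"
    ]


def places_with_min_species(observations, min_species=10):
    """Return the places in which at least `min_species` distinct species were observed."""
    species_by_place = defaultdict(set)
    for o in observations:
        species_by_place[o["place"]].add(o["species"])
    return {p for p, spp in species_by_place.items() if len(spp) >= min_species}


def rarefied_richness(species_list, n, draws=1000, seed=0):
    """Mean number of distinct species over `draws` random subsamples of size n.

    species_list holds one species name per observation. Subsamples are drawn
    without replacement, so n cannot exceed len(species_list).
    """
    if not 0 < n <= len(species_list):
        raise ValueError("n must be between 1 and the number of observations")
    rng = random.Random(seed)
    total = sum(len(set(rng.sample(species_list, n))) for _ in range(draws))
    return total / draws


# Synthetic samples with unequal effort, rarefied to the smaller sample size.
site_a = ["A"] * 50 + ["B"] * 30 + ["C"] * 20  # 100 observations, 3 species
site_b = ["A"] * 10 + ["D"] * 5 + ["E"] * 5    # 20 observations, 3 species
n = min(len(site_a), len(site_b))
richness_a = rarefied_richness(site_a, n)
richness_b = rarefied_richness(site_b, n)  # whole sample every draw, so exactly 3.0
```

Rarefying both samples to the smaller sample size keeps the comparison from simply rewarding the site that happened to receive more observations.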

2.2 Aggregating data across space and time

Aggregating data is the process of gathering and combining data and expressing them in summary form for subsequent statistical analyses. The primary purpose of data aggregation is to derive joint inferences from grouped data points while reducing the influence of any single data point on the overall inference. Owing to the law of large numbers, the larger the aggregated dataset, the more information-rich and the less noisy the overall inference (Bird et al. 2014). As a result, collecting and aggregating a sufficiently large amount of data can help mitigate bias in crowd-sourced citizen science data. For example, species occurrence data from opportunistic sources, including citizen science, have historically been collated over broad time periods and large spatial extents, as in published Atlases (e.g. of British and North American breeding birds). This compensates to some degree for variation in observer effort and activity, enabling the assessment of changes in species’ distributions between broad time periods and areas (Shaffer et al. 1998, Thomas et al. 2004, Tingley and Beissinger 2009, Botts et al. 2012). Fortunately, crowd-sourced citizen science data from Snapshot Cal Coast and iNaturalist accumulate at a rate of tens of thousands of observations every year along the California coast. Aggregating over appropriate spatial and temporal scales can therefore be an effective means of reducing the noise in the data. Our analyses focused on providing place-based indicators that summarize biodiversity over the last decade at the level of a watershed, county, or Marine Protected Area. We found a trade-off between the spatial, temporal, and taxonomic scales of inference: increasing the resolution in any one of these dimensions requires decreasing the resolution in another; otherwise, the data aggregates do not hold enough information.
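A toy calculation makes this resolution trade-off concrete. The sketch below uses entirely synthetic records with hypothetical "county", "year" and "month" fields; it spreads one fixed set of records over aggregate units at two resolutions and reports the median number of records per unit. Refining the temporal resolution from a decade to single months leaves each unit with far less information to summarize.

```python
from collections import Counter
from itertools import product
from statistics import median


def median_records_per_unit(records, key):
    """Median number of records per non-empty aggregate unit, with units defined by key(record)."""
    counts = Counter(key(r) for r in records)
    return median(counts.values())


# Synthetic, illustrative data: two records for every (county, year, month)
# combination over a decade. These are not real observation counts.
counties = ["Marin", "Monterey", "San Diego"]
records = [
    {"county": c, "year": y, "month": m}
    for c, y, m, _ in product(counties, range(2011, 2021), range(1, 13), range(2))
]

# County x decade: each unit pools 10 years x 12 months x 2 records.
coarse = median_records_per_unit(records, lambda r: r["county"])
# County x year x month: the same records spread over 120 times as many units.
fine = median_records_per_unit(records, lambda r: (r["county"], r["year"], r["month"]))
print(coarse, fine)  # 240 2.0
```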
Because of this trade-off, we estimated species-based yearly trends only at the scale of a coastal region or statewide, as there was not enough information to resolve temporal trends at finer spatial scales. When analyzing crowd-sourced citizen science data, a particularly useful instance of spatiotemporal aggregation is the definition of a discrete observation event. This step is key to determining the basic units of analysis (the units that are compared with each other across the whole crowd-sourced dataset) and to identifying important attributes of each unit, such as observation effort or observer behavior. Analyses of opportunistic species occurrence data commonly involve defining observation events, even if this is not explicitly highlighted as such. For instance, the analysis of historic surveys within the Grinnell Resurvey Project involved deciding which museum specimens from the 1930s should be aggregated as part of the same trapping event (Moritz et al. 2008, Tingley et al. 2009). Selecting the degree of spatiotemporal aggregation that defines an observation event is a complicated attempt to reverse-engineer a survey structure where none exists, and should likely be based on a combination of data availability and species biology. A particularly in-depth example of spatiotemporal aggregation using crowd-sourced citizen science data is eBird’s Adaptive Spatio-Temporal Exploratory Model (AdaSTEM; Fink et al. 2013). AdaSTEM mines the full set of observations to identify discrete spatiotemporal blocks that balance a bias-variance trade-off: each block must contain enough observations to fit a good model (limiting the variance of estimates) yet be small enough to assume stationarity (controlling for bias) (Fink et al. 2013, Fink et al. 2020).
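The bias-variance logic behind adaptive blocks can be illustrated with a simplified quadtree rule. This is not eBird’s AdaSTEM implementation, which is considerably more elaborate; the sketch below works in two spatial dimensions only, and its splitting rule and thresholds (max_pts, min_pts) are illustrative assumptions. A block is subdivided while it is data-rich, favouring small blocks in which stationarity is more plausible, but never into children too sparse to model.

```python
def partition(points, bbox, max_pts=50, min_pts=10):
    """Recursively split a 2-D bounding box into quadrants (simplified, hypothetical rule).

    points: list of (x, y) tuples inside bbox = (xmin, ymin, xmax, ymax).
    A block is split while it holds more than max_pts points, but only if every
    quadrant would keep at least min_pts points, so each block retains enough
    data to fit a model. Returns a list of (bbox, points) leaf blocks.
    """
    if min_pts < 1:
        raise ValueError("min_pts must be at least 1 so that recursion terminates")
    xmin, ymin, xmax, ymax = bbox
    if len(points) <= max_pts:
        return [(bbox, points)]
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quads = [
        (xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),  # lower-left, lower-right
        (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax),  # upper-left, upper-right
    ]
    children = [[] for _ in quads]
    for x, y in points:
        # Points on a midline go to the upper/right quadrant, so each lands in exactly one.
        idx = 2 * (y >= ymid) + (x >= xmid)
        children[idx].append((x, y))
    if any(len(c) < min_pts for c in children):
        return [(bbox, points)]  # splitting would leave a data-poor block
    leaves = []
    for quad, child in zip(quads, children):
        leaves.extend(partition(child, quad, max_pts, min_pts))
    return leaves


# Usage on a synthetic, regular 20 x 20 grid of points in a 100 x 100 box.
grid = [(2.5 + 5 * i, 2.5 + 5 * j) for i in range(20) for j in range(20)]
leaves = partition(grid, (0, 0, 100, 100))
```

On this evenly spread synthetic grid the rule produces 16 blocks of 25 points each; with real, clustered observations it would yield small blocks where observations are dense and large blocks where they are sparse.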
Across all analyses and outputs presented herein, we defined an observation event as any set of iNaturalist observations made by a single user on a single day within the same 10 km² area (see section
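Grouping observations into events by user, day, and area might look like the sketch below. It assumes hypothetical record keys and projected coordinates in kilometres, and it approximates "the same 10 km² area" with a fixed grid of square cells of that area; the spatial rule actually used for the DOB may differ.

```python
import math
from collections import defaultdict
from datetime import date


def observation_events(observations, cell_area_km2=10.0):
    """Group observations into events keyed by (user, day, grid cell).

    observations: iterable of dicts with hypothetical keys "user", "date"
    (a datetime.date), "x_km" and "y_km" (projected coordinates in kilometres)
    and "species". Space is divided into a fixed grid of square cells of area
    cell_area_km2. Returns {event_key: set of species recorded in that event}.
    """
    side_km = math.sqrt(cell_area_km2)
    events = defaultdict(set)
    for o in observations:
        cell = (math.floor(o["x_km"] / side_km), math.floor(o["y_km"] / side_km))
        events[(o["user"], o["date"], cell)].add(o["species"])
    return dict(events)


d1, d2 = date(2020, 5, 1), date(2020, 5, 2)
obs = [
    {"user": "u1", "date": d1, "x_km": 0.5, "y_km": 0.5, "species": "A"},
    {"user": "u1", "date": d1, "x_km": 1.0, "y_km": 2.0, "species": "B"},    # same event as above
    {"user": "u1", "date": d2, "x_km": 0.5, "y_km": 0.5, "species": "A"},    # next day: new event
    {"user": "u2", "date": d1, "x_km": 0.5, "y_km": 0.5, "species": "C"},    # other user: new event
    {"user": "u1", "date": d1, "x_km": 10.0, "y_km": 10.0, "species": "D"},  # other cell: new event
]
events = observation_events(obs)
```

A fixed grid is cheap and deterministic, but two observations a few metres apart can fall on either side of a cell boundary; defining areas around each user's own observations would avoid this at the cost of more computation.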

2.3 Borrowing strength across taxa

Aggregating data across multiple species and/or higher taxa is a key step in dealing with bias in crowd-sourced citizen science data. Even where a single species is the primary focus of analyses, taking advantage of existing information on additional species associated with the focal species spatially, temporally, and/or taxonomically (or phylogenetically) can significantly improve inferences on the focal species itself. The overarching assumption is that observations of a second species can indicate the likely presence or absence of a focal species by informing on either the observation process or the ecological process that produced the set of observations of the focal species. For instance, if many commonly observed species were not recorded during an observation event, the absence of a focal species from that event may reflect low observation effort at that location and time rather than true absence. Therefore, by assuming that the primary observation biases are shared across all species, we can borrow strength across species to efficiently estimate bias and improve inferences for all species, no matter how common or rare. Furthermore, evidence of the presence of several prey species may be used to infer the presence of a focal species that preys on them, even if the focal species was not itself observed. Estimating the likelihood of false absence for a focal species from observations of additional species assumes that multiple species are observed together using a predefined “search” list (or checklist) of potentially observable species. Unfortunately, this information is seldom recorded as metadata in crowd-sourced citizen science databases (eBird is a notable exception; eBird 2020). Deciding which additional species are relevant to inferences on a focal species is essentially an attempt to reverse-engineer the observation process and simulate a survey structure.
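The false-absence reasoning in this paragraph can be sketched as a simple heuristic. Given the species recorded in each observation event, it treats a non-detection of the focal species as informative only when most of a reference set of commonly observed species were recorded in the same event. The reference set, the 0.5 threshold, and the function name are hypothetical choices for illustration; a full analysis would estimate detection jointly across species in a statistical model rather than apply a fixed cut-off.

```python
def classify_nondetections(events, focal, reference_species, min_fraction=0.5):
    """Split events lacking the focal species into informative and uninformative non-detections.

    events: dict mapping event key -> set of species recorded in that event.
    reference_species: hypothetical set of commonly observed species assumed to
    share the focal species' observation biases.
    An event without the focal species counts as an informative non-detection
    (a plausible true absence) when at least min_fraction of the reference
    species were recorded in it; otherwise low observation effort is the
    simpler explanation and the non-detection is treated as uninformative.
    """
    ref = set(reference_species)
    if not ref:
        raise ValueError("reference_species must not be empty")
    informative, uninformative = [], []
    for key, recorded in events.items():
        if focal in recorded:
            continue  # a detection, not a non-detection
        fraction = len(recorded & ref) / len(ref)
        if fraction >= min_fraction:
            informative.append(key)
        else:
            uninformative.append(key)
    return informative, uninformative


events = {
    "e1": {"A", "B", "C"},  # 3 of 4 reference species recorded
    "e2": {"A"},            # 1 of 4: probably a short, low-effort visit
    "e3": {"A", "F"},       # focal species detected
}
informative, uninformative = classify_nondetections(events, "F", {"A", "B", "C", "D"})
```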
The concept of borrowing strength across taxa has been central to many of the approaches developed to derive inferences from volunteer-based species occurrence data. Telfer et al. (2002) used the change in the number of observations