Similar articles
Found 20 similar articles; search took 15 ms.
1.
2.
Site occupancy‐detection models (SODMs) are statistical models widely used for biodiversity surveys where imperfect detection of species occurs. For instance, SODMs are increasingly used to analyse environmental DNA (eDNA) data, taking into account the occurrence of both false‐positive and false‐negative errors. However, species occurrence data are often characterized by spatial and temporal autocorrelation, which might challenge the use of standard SODMs. Here we reviewed the literature of eDNA biodiversity surveys and found that most studies do not take into account spatial or temporal autocorrelation. We then demonstrated how the analysis of data with spatial or temporal autocorrelation can be improved by using a conditionally autoregressive SODM, and showed its application to environmental DNA data. We tested the autoregressive model on both simulated and real data sets, including chronosequences with different degrees of autocorrelation, and a spatial data set on a virtual landscape. Analyses of simulated data showed that autoregressive SODMs perform better than traditional SODMs in the estimation of key parameters such as true‐/false‐positive rates and show a better discrimination capacity (e.g., higher true skill statistics). The usefulness of autoregressive SODMs was particularly high in data sets with strong autocorrelation. When applied to real eDNA data sets (eDNA from lake sediment cores and freshwater), autoregressive SODMs provided more precise estimation of true‐/false‐positive rates, resulting in more reasonable inference of occupancy states. Our results suggest that analyses of occurrence data, such as many applications of eDNA, can be largely improved by applying conditionally autoregressive specifications to SODMs.

3.
A common aim in ChIP-seq experiments is to identify changes in protein binding patterns between conditions, i.e. differential binding. A number of peak- and window-based strategies have been developed to detect differential binding when the regions of interest are not known in advance. However, careful consideration of error control is needed when applying these methods. Peak-based approaches use the same data set to define peaks and to detect differential binding. Done improperly, this can result in loss of type I error control. For window-based methods, controlling the false discovery rate over all detected windows does not guarantee control across all detected regions. Misinterpreting the former as the latter can result in unexpected liberalness. Here, several solutions are presented to maintain error control for these de novo counting strategies. For peak-based methods, peak calling should be performed on pooled libraries prior to the statistical analysis. For window-based methods, a hybrid approach using Simes’ method is proposed to maintain control of the false discovery rate across regions. More generally, the relative advantages of peak- and window-based strategies are explored using a range of simulated and real data sets. Implementations of both strategies also compare favourably to existing programs for differential binding analyses.
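
The region-level error control described in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the hybrid procedure, so the code below assumes one plausible reading: the p-values of all windows in a region are combined with Simes' method, and the Benjamini-Hochberg adjustment is then applied across regions. The function names, region names and p-values are hypothetical.

```python
from typing import List, Sequence


def simes_combined_p(window_pvalues: Sequence[float]) -> float:
    """Combine the p-values of all windows in one region with Simes' method."""
    if not window_pvalues:
        raise ValueError("a region needs at least one window p-value")
    ordered = sorted(window_pvalues)
    n = len(ordered)
    # Simes: minimum over ranks i of n * p_(i) / i, capped at 1.
    return min(1.0, min(n * p / rank for rank, p in enumerate(ordered, start=1)))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-window p-values, grouped by region (assumed input).
regions = {"region_A": [0.001, 0.2, 0.6], "region_B": [0.04, 0.05]}
region_p = {name: simes_combined_p(p) for name, p in regions.items()}
region_q = dict(zip(region_p, benjamini_hochberg(list(region_p.values()))))
```

Because the adjustment is applied to one p-value per region, the false discovery rate is controlled over regions rather than over individual windows, which is the distinction the abstract draws.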

4.
Conservation of biological communities requires accurate estimates of abundance for multiple species. Recent advances in estimating abundance of multiple species, such as Bayesian multispecies N‐mixture models, account for multiple sources of variation, including detection error. However, false‐positive errors (misidentification or double counts), which are prevalent in multispecies data sets, remain largely unaddressed. The dependent‐double observer (DDO) method is an emerging method that both accounts for detection error and is suggested to reduce the occurrence of false positives because it relies on two observers working collaboratively to identify individuals. To date, the DDO method has not been combined with advantages of multispecies N‐mixture models. Here, we derive an extension of a multispecies N‐mixture model using the DDO survey method to create a multispecies dependent double‐observer abundance model (MDAM). The MDAM uses a hierarchical framework to account for biological and observational processes in a statistically consistent framework while using the accurate observation data from the DDO survey method. We demonstrate that the MDAM accurately estimates abundance of multiple species with simulated and real multispecies data sets. Simulations showed that the model provides both precise and accurate abundance estimates, with average credible interval coverage across 100 repeated simulations of 94.5% for abundance estimates and 92.5% for detection estimates. In addition, 92.2% of abundance estimates had a mean absolute percent error between 0% and 20%, with a mean of 7.7%. We present the MDAM as an important step forward in expanding the applicability of the DDO method to a multispecies setting. Previous implementation of the DDO method suggests the MDAM can be applied to a broad array of biological communities. 
We suggest that researchers interested in assessing biological communities consider the MDAM as a tool for deriving accurate, multispecies abundance estimates.

5.
Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites‐by‐species matrix, which stands in stark contrast to univariate statistics, where data models, assuming specific distributions, are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation‐based performance evaluations of these methods are important as guides for practitioners but still lacking. Here, we compare two model‐based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance‐based methods, distance‐based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the performance of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables. The former had the lowest false‐positive rate (0.008), while the latter had the lowest false‐negative rate (0.027). CQO and CCA had the highest false‐negative rate (0.291) and false‐positive rate (0.256), respectively, where these error rates were typically high for data sets with linear responses. Our study shows that both model‐ and distance‐based methods have their place in the ecologist's statistical toolbox. MvGLM and dbRDA are reliable for analyzing species–environment relations, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients.

6.
7.
8.
9.
MOTIVATION: Multiple hypothesis testing is a common problem in genome research, particularly in microarray experiments and genomewide association studies. Failure to account for the effects of multiple comparisons would result in an abundance of false positive results. The Bonferroni correction and Holm's step-down procedure are overly conservative, whereas the permutation test is time-consuming and is restricted to simple problems. RESULTS: We developed an efficient Monte Carlo approach to approximating the joint distribution of the test statistics along the genome. We then used the Monte Carlo distribution to evaluate the commonly used criteria for error control, such as familywise error rates and positive false discovery rates. This approach is applicable to any data structures and test statistics. Applications to simulated and real data demonstrate that the proposed approach provides accurate error control, and can be substantially more powerful than the Bonferroni and Holm methods, especially when the test statistics are highly correlated.
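
For readers unfamiliar with the two baseline procedures named in this abstract, the following is a minimal sketch of the Bonferroni correction and Holm's step-down procedure. It does not implement the Monte Carlo approach the authors propose; the function names and the example p-values are illustrative assumptions.

```python
from typing import List, Sequence


def bonferroni_reject(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm_reject(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure: compare the k-th smallest p-value
    (k = 0, 1, ...) with alpha / (m - k) and stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from four tests.
pvals = [0.01, 0.04, 0.02, 0.005]
bonferroni_result = bonferroni_reject(pvals)
holm_result = holm_reject(pvals)
```

With these made-up values Bonferroni rejects only the two smallest p-values, while Holm rejects all four. Both control the familywise error rate without using the correlation between test statistics, which is why they can be conservative when tests are highly correlated.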

10.
Accumulations of dead skeletal material are a valuable archive of past ecological conditions. However, such assemblages are not equivalent to living communities because they mix the remains of multiple generations and are altered by post-mortem processes. The abundance of a species in a death assemblage can be quantitatively modelled by successively integrating the product of an influx time series and a post-mortem loss function (a decay function with a constant half-life). In such a model, temporal mixing increases expected absolute dead abundance relative to average influx as a linear function of half-life and increases variation in absolute dead abundance values as a square-root function of half-life. Because typical abundance distributions of ecological communities are logarithmically distributed, species' differences in preservational half-life would have to be very large to substantially alter species' abundance ranks (i.e. make rare species common or vice-versa). In addition, expected dead abundances increase at a faster rate than their range of variation with increased time averaging, predicting greater consistency in the relative abundance structure of death assemblages than their parent living community.
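
The scaling claims in this abstract can be checked with a small sketch. It assumes a discrete-time version of the model in which each time step adds an independent influx with a fixed mean and standard deviation, and every cohort then decays exponentially with a constant half-life. These assumptions, and all names below, are illustrative and are not taken from the paper.

```python
import math
from typing import Iterable, Tuple


def survival_per_step(half_life: float) -> float:
    """Fraction of dead material surviving one time step."""
    return 0.5 ** (1.0 / half_life)


def dead_abundance(influx: Iterable[float], half_life: float) -> float:
    """Accumulate an influx series (oldest first), decaying older cohorts."""
    s = survival_per_step(half_life)
    total = 0.0
    for x in influx:
        total = total * s + x
    return total


def stationary_mean_and_sd(mean_influx: float, sd_influx: float,
                           half_life: float) -> Tuple[float, float]:
    """Long-run mean and standard deviation of dead abundance, assuming
    independent influx values at each step (an assumption of this sketch)."""
    s = survival_per_step(half_life)
    mean = mean_influx / (1.0 - s)               # geometric series of s**k
    sd = sd_influx / math.sqrt(1.0 - s * s)      # geometric series of s**(2k)
    return mean, sd


mean_100, sd_100 = stationary_mean_and_sd(10.0, 3.0, half_life=100)
mean_200, _ = stationary_mean_and_sd(10.0, 3.0, half_life=200)
_, sd_400 = stationary_mean_and_sd(10.0, 3.0, half_life=400)
```

For half-lives much longer than one time step, 1 - s is close to ln(2) / half-life, so the mean grows approximately linearly with half-life and the standard deviation approximately with its square root. Doubling the half-life roughly doubles the mean, and quadrupling it roughly doubles the standard deviation.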

11.
12.
To characterize microbiomes and other ecological assemblages, ecologists routinely sequence and compare loci that differ among focal taxa. Counts of these sequences convey information regarding the occurrence and relative abundances of taxa, but provide no direct measure of their absolute abundances, due to the technical limitations of the sequencing process. The relative abundances in compositional data are inherently constrained and difficult to interpret. The incorporation of internal standards (ISDs; colloquially referred to as ‘spike‐ins’) into DNA pools can ameliorate the problems posed by relative abundance data and allow absolute abundances to be approximated. Unfortunately, many laboratory and sampling biases cause ISDs to underperform or fail. Here, we discuss how careful deployment of ISDs can avoid these complications and be an integral component of well‐designed studies seeking to characterize ecological assemblages via sequencing of DNA.

13.
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the fold change criteria of the Significance Analysis of Microarrays method are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to the fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control between the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.

14.
15.
Aim To move towards modelling spatial abundance patterns and to evaluate the relative impacts of climatic change upon species abundances as opposed to range extents. Location Southern Africa, including Lesotho, Namibia, South Africa, Swaziland and Zimbabwe. Methods Quantitative response surface models were fitted for 78 bird species, mostly endemic (68) or near‐endemic to the region, to model relationships between species reporting rates (i.e. the proportion of checklists reporting a species for a particular grid cell), as recorded by the Southern African Bird Atlas Project, and four bioclimatic variables derived from climatic data for the period 1961–90. With caution, reporting rates can be used as a proxy for abundance. Models were used to project potential impacts of a series of projected climatic change scenarios upon species abundance patterns and range extents. Results Most models obtained were robust with good predictive power. Projections of potential future abundance patterns indicate that the magnitude of impacts upon a proxy for abundance is greater than that upon range extent for the majority of species (82% by 2071–2100). For most species (74%) both abundance and range extent are projected to decrease by 2100. Impacts are especially severe if species are unable to realize projected range changes; when only the area of a species' simulated present range is considered, overall abundance decreases of more than 80% are projected for 19 (24%) of the species examined. Main conclusions Our results indicate that projected climatic changes are likely to elicit greater relative changes in species abundances than range extents. For most species examined changes were decreases, suggesting the impacts upon biodiversity are likely generally to be negative. These results also suggest that previous estimates of the proportion of species at increased risk of extinction as a result of climatic change may, in some cases, be under‐estimates.

16.
Aim An important component of human‐induced global change is the decrease or increase in community distinctiveness (taxonomic homogenization or differentiation, respectively) that follows the loss of native species and gain of non‐native species. We use simulation approaches to assess the extent to which conclusions about the outcome of the homogenization process depend on whether or not abundance data are incorporated. Location Data were produced through computer simulation. Methods The frequency with which occurrence‐based similarity indices and abundance‐based similarity indices give different views of changes in community similarity, and the conditions under which such differences occurred were assessed using both deterministic and stochastic modelling approaches to simulate species assemblage states. Results Occurrence‐based and abundance‐based indices were positively correlated across the set of simulations for both the deterministic and stochastic models. However, in both situations approximately one quarter (25%) of models resulted in contrasting outcomes for the two approaches of calculating changes in compositional similarity; that is, one data type showed a positive value (homogenization), whereas the other showed a negative value (differentiation). Main conclusions In the majority of cases, species abundances will not change drastically enough after perturbation to produce large differences between homogenization scores measured using occurrence versus abundance information. However, in cases where these changes are large, it is important to recognize that the choice of metric to analyse homogenization trends will influence the qualitative and quantitative conclusions drawn. Studies of real assemblages are therefore necessary to evaluate the role of species abundance in defining the magnitude and direction of changes in community composition across space, and the implications of these changes for native biodiversity.
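
The contrasting outcomes described in this abstract can be reproduced with a toy example. The abstract does not name the indices used, so this sketch assumes the Jaccard index for occurrence data and the Bray-Curtis similarity for abundance data. The two sites and their species counts are invented for illustration.

```python
from typing import Sequence


def jaccard_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Occurrence-based similarity: shared species / species present in either site."""
    shared = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    either = sum(1 for x, y in zip(a, b) if x > 0 or y > 0)
    return shared / either if either else 0.0


def bray_curtis_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Abundance-based similarity: 2 * sum of minima / total abundance."""
    total = sum(a) + sum(b)
    return 2.0 * sum(min(x, y) for x, y in zip(a, b)) / total if total else 0.0


# Abundances of species A, B, C, D at two sites (hypothetical values).
site1_before, site2_before = [10, 10, 0, 0], [10, 0, 10, 0]
# After perturbation: non-native species D arrives at both sites at low
# abundance, while the shared native species A declines at both.
site1_after, site2_after = [2, 18, 0, 1], [2, 0, 18, 1]

delta_occurrence = (jaccard_similarity(site1_after, site2_after)
                    - jaccard_similarity(site1_before, site2_before))
delta_abundance = (bray_curtis_similarity(site1_after, site2_after)
                   - bray_curtis_similarity(site1_before, site2_before))
```

Here the occurrence-based change is positive (the sites share more species, suggesting homogenization), while the abundance-based change is negative (the shared species hold a smaller share of individuals, suggesting differentiation).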

17.
The accurate extraction of species-abundance information from DNA-based data (metabarcoding, metagenomics) could contribute usefully to diet analysis and food-web reconstruction, the inference of species interactions, the modelling of population dynamics and species distributions, the biomonitoring of environmental state and change, and the inference of false positives and negatives. However, multiple sources of bias and noise in sampling and processing combine to inject error into DNA-based data sets. To understand how to extract abundance information, it is useful to distinguish two concepts. (i) Within-sample across-species quantification describes relative species abundances in one sample. (ii) Across-sample within-species quantification describes how the abundance of each individual species varies from sample to sample, such as over a time series, an environmental gradient or different experimental treatments. First, we review the literature on methods to recover across-species abundance information (by removing what we call “species pipeline biases”) and within-species abundance information (by removing what we call “pipeline noise”). We argue that many ecological questions can be answered with just within-species quantification, and we therefore demonstrate how to use a “DNA spike-in” to correct for pipeline noise and recover within-species abundance information. We also introduce a model-based estimator that can be used on data sets without a physical spike-in to approximate and correct for pipeline noise.
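
The idea of a spike-in correction can be sketched as follows. This is a simplified illustration, not the authors' method: it assumes the same known amount of spike-in DNA was added to every sample, so that dividing each species' read count by the spike-in read count removes sample-level pipeline noise. The function name, sample contents and the "spike" label are hypothetical.

```python
from typing import Dict


def spike_in_scaled(read_counts: Dict[str, int], spike_name: str,
                    spike_amount: float = 1.0) -> Dict[str, float]:
    """Rescale read counts by the spike-in reads recovered in the same sample.

    spike_amount is the (assumed identical) quantity of spike-in added to
    every sample; the result is in the same units as spike_amount.
    """
    spike_reads = read_counts[spike_name]
    if spike_reads <= 0:
        raise ValueError("no spike-in reads recovered; sample cannot be scaled")
    return {taxon: spike_amount * reads / spike_reads
            for taxon, reads in read_counts.items() if taxon != spike_name}


# Two hypothetical samples with identical raw counts for species_X.
sample_1 = {"species_X": 500, "species_Y": 50, "spike": 100}
sample_2 = {"species_X": 500, "species_Y": 200, "spike": 50}

scaled_1 = spike_in_scaled(sample_1, "spike")
scaled_2 = spike_in_scaled(sample_2, "spike")
```

Although species_X has the same raw count in both samples, its scaled value is twice as high in the second sample, because half as many spike-in reads were recovered there. The scaled values support comparisons of one species across samples; they do not remove species-specific biases, so comparisons between species within a sample remain unreliable.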

18.
The human microbiome, which includes the collective microbes residing in or on the human body, has a profound influence on human health. DNA sequencing technology has made large-scale human microbiome studies possible by using shotgun metagenomic sequencing. One important aspect of the analysis of such metagenomic data is to quantify the bacterial abundances based on the metagenomic sequencing data. Existing methods almost always quantify such abundances one sample at a time, which ignores certain systematic differences in read coverage along the genomes due to GC content, copy number variation and the bacterial origin of replication. In order to account for such differences in read counts, we propose a multi-sample Poisson model to quantify microbial abundances based on read counts that are assigned to species-specific taxonomic markers. Our model takes into account the marker-specific effects when normalizing the sequencing count data in order to obtain more accurate quantification of the species abundances. Compared to currently available methods on simulated data and real data sets, our method has demonstrated an improved accuracy in bacterial abundance quantification, which leads to more biologically interesting results from downstream data analysis.

19.
Indirect tests have detected recombination in mitochondrial DNA (mtDNA) from many animal lineages, including mammals. However, it is possible that features of the molecular evolutionary process without recombination could be incorrectly inferred by indirect tests as being due to recombination. We have identified one such example, which we call "patchy-tachy" (PT), where different partitions of sequences evolve at different rates, that leads to an excess of false positives for recombination inferred by indirect tests. To explore this phenomenon, we characterized the false positive rates of six widely used indirect tests for recombination using simulations of general models for mtDNA evolution with PT but without recombination. All tests produced 30-99% false positives for recombination, although the conditions that produced the maximal level of false positives differed between the tests. To evaluate the degree to which conditions that exacerbate false positives are found in published sequence data, we turned to 20 animal mtDNA data sets in which recombination is suggested by indirect tests. Using a model where different regions of the sequences were free to evolve at different rates in different lineages, we demonstrated that PT is prevalent in many data sets in which recombination was previously inferred using indirect tests. Taken together, our results argue that PT without recombination is a viable alternative explanation for detection of widespread recombination in animal mtDNA using indirect tests.

20.
