Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Quantifying the slightly deleterious mutation model of molecular evolution   Total citations: 14 (self-citations: 0; citations by others: 14)
We have attempted to quantify the frequency and effects of slightly deleterious mutations (SDMs), those that have selective effects close to the reciprocal of the effective population size of a species, by comparing the level of selective constraint in protein-coding genes of related species that have different present-day effective population sizes. In our two comparisons, the species with the smaller effective population size showed lower constraint, implying that SDMs had become fixed. The fixation of SDMs was supported by the observation of a higher fraction of radical to conservative amino acid substitutions in species with smaller effective population sizes. The fraction of strongly deleterious mutations (which rarely become fixed) is >70% in most species. Only approximately 10% or fewer of mutations seem to behave as SDMs, but SDMs could comprise a substantial fraction of mutations in protein-coding genes that have a chance of becoming fixed between species.

2.
Real-world uncertainties and data limitations make it difficult to predict how, when and where non-indigenous species (NIS) will spread. Typically only a small fraction of sites are sampled during only a few time intervals, such that we know neither the full spatial extent nor the true temporal progress of invasion. Yet, these unsampled locations might affect the invasion dynamics. We extend propagule pressure models to incorporate both human-mediated and natural fluvial dispersal vectors, and develop techniques to incorporate missing spatial and temporal data on invasions. We apply our model to Bythotrephes longimanus, a high-risk aquatic NIS, using a regional-scale 311-lake survey in a popular watershed in Ontario and extending our analysis to 1,300 unsampled lakes. Of 100 model runs with different random subsets of 50 sampled lakes reserved for validation, we were able to obtain an average area under the curve value of 0.89. Human-mediated dispersal accounted for 99.75% of the contribution of propagules to probability of establishment. Although the discovery rate is accelerating, our results suggest the annual rate of lake invasions is decelerating over time. Management efforts controlling recreational boating traffic out of the largest lakes in the system will be the most effective way of slowing the spread of B. longimanus in lakes within this system.

3.
Summary. In surveys of natural populations of animals, a sampling protocol is often spatially replicated to collect a representative sample of the population. In these surveys, differences in abundance of animals among sample locations may induce spatial heterogeneity in the counts associated with a particular sampling protocol. For some species, the sources of heterogeneity in abundance may be unknown or unmeasurable, leading one to specify the variation in abundance among sample locations stochastically. However, choosing a parametric model for the distribution of unmeasured heterogeneity is potentially subject to error and can have profound effects on predictions of abundance at unsampled locations. In this article, we develop an alternative approach wherein a Dirichlet process prior is assumed for the distribution of latent abundances. This approach allows for uncertainty in model specification and for natural clustering in the distribution of abundances in a data-adaptive way. We apply this approach in an analysis of counts based on removal samples of an endangered fish species, the Okaloosa darter. Results of our data analysis and simulation studies suggest that our implementation of the Dirichlet process prior has several attractive features not shared by conventional, fully parametric alternatives.

4.
The extent of microbial diversity is an intrinsically fascinating subject of profound practical importance. The term 'diversity' may allude to the number of taxa or species richness as well as their relative abundance. There is uncertainty about both, primarily because sample sizes are too small. Non-parametric diversity estimators make gross underestimates if used with small sample sizes on unevenly distributed communities. One can make richness estimates over many scales using small samples by assuming a species/taxa-abundance distribution. However, no one knows what the underlying taxa-abundance distributions are for bacterial communities. Latterly, diversity has been estimated by fitting data from gene clone libraries and extrapolating from this to taxa-abundance curves to estimate richness. However, since sample sizes are small, we cannot be sure that such samples are representative of the community from which they were drawn. It is however possible to formulate, and calibrate, models that predict the diversity of local communities and of samples drawn from that local community. The calibration of such models suggests that migration rates are small and decrease as the community gets larger. The preliminary predictions of the model are qualitatively consistent with the patterns seen in clone libraries in 'real life'. The validation of this model is also confounded by small sample sizes. However, if such models were properly validated, they could form invaluable tools for the prediction of microbial diversity and a basis for the systematic exploration of microbial diversity on the planet.

5.
Effects of sample size on the performance of species distribution models   Total citations: 8 (self-citations: 0; citations by others: 8)
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence–absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30), and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
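The AUC used in this abstract to score predictions can be read as the probability that a randomly chosen presence site receives a higher model score than a randomly chosen absence site. The Python sketch below computes it directly from that definition. The function name `auc` and the example scores are illustrative assumptions; they are not taken from the study, which does not describe its implementation here.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve as a pairwise ranking probability.

    Counts the fraction of (presence, absence) pairs in which the
    presence site has the higher predicted score; ties count as half.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical model scores at three presence and three absence sites.
presence = [0.9, 0.8, 0.4]
absence = [0.5, 0.3, 0.2]
print(auc(presence, absence))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. With only a handful of evaluation sites the estimate moves in coarse steps, which is one reason small samples give variable accuracy scores.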

6.
Anderson EC, Thompson EA. Genetics, 2002, 160(3): 1217-1229
We present a statistical method for identifying species hybrids using data on multiple, unlinked markers. The method does not require that allele frequencies be known in the parental species nor that separate, pure samples of the parental species be available. The method is suitable for both markers with fixed allelic differences between the species and markers without fixed differences. The probability model used is one in which parentals and various classes of hybrids (F1s, F2s, and various backcrosses) form a mixture from which the sample is drawn. Using the framework of Bayesian model-based clustering allows us to compute, by Markov chain Monte Carlo, the posterior probability that each individual belongs to each of the distinct hybrid classes. We demonstrate the method on allozyme data from two species of hybridizing trout, as well as on two simulated data sets.

7.
Cultivation-independent surveys of ribosomal RNA genes have revealed the existence of novel microbial lineages, many with no known cultivated representatives. Ribosomal RNA-based analyses, however, often do not provide significant information beyond phylogenetic affiliation. Analysis of large genome fragments recovered directly from microbial communities represents one promising approach for better characterizing uncultivated microbial species. To assess further the utility of this approach, we constructed large-insert bacterial artificial chromosome (BAC) libraries from the genomic DNA of planktonic marine microbial assemblages. The BAC libraries we prepared had average insert sizes of 80 kb, with maximal insert sizes > 150 kb. A rapid screening method assessing the phylogenetic diversity and representation in the library was developed and applied. In general, representation in the libraries agreed well with previous culture-independent surveys based on polymerase chain reaction (PCR)-amplified rRNA fragments. A significant fraction of the genome fragments in the BAC libraries originated from as yet uncultivated microbial species, thought to be abundant and widely distributed in the marine environment. One entire BAC insert, derived from an uncultivated, surface-dwelling euryarchaeote, was sequenced completely. The planktonic euryarchaeal genome fragment contained some typical archaeal genes, as well as unique open reading frames (ORFs) suggesting novel function. In total, our results verify the utility of BAC libraries for providing access to the genomes of as yet uncultivated microbial species. Further analysis of these BAC libraries has the potential to provide significant insight into the genomic potential and ecological roles of many indigenous microbial species, cultivated or not.

8.
Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao's estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics ('Hill diversities'), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao's estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
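The quantities named in this abstract have standard textbook forms that are easy to compute from a vector of per-species counts. The Python sketch below gives naive plug-in versions: the Hill diversity of order q (q = 0 is richness, q = 1 is the exponential of Shannon entropy, q = 2 is the inverse Simpson index) and the classic Chao1 richness estimate built from singleton and doubleton counts. It is only an illustration under those standard definitions and does not reproduce the lower and upper bounds constructed by the authors; the function names and the example counts are assumptions.

```python
import math


def hill_diversity(counts, q):
    """Plug-in Hill diversity of order q from raw abundance counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


def chao1(counts):
    """Classic Chao1 richness estimate from singletons and doubletons."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # species seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # species seen exactly twice
    if f2 > 0:
        return observed + f1 * f1 / (2.0 * f2)
    # Common fallback when no doubletons are present.
    return observed + f1 * (f1 - 1) / 2.0


# Hypothetical sample: reads (or individuals) observed per species.
sample = [50, 30, 10, 5, 2, 2, 1, 1, 1, 1]
for order in (0, 1, 2):
    print(order, hill_diversity(sample, order))
print("Chao1:", chao1(sample))
```

Chao1 depends entirely on the rarest species in the sample, whereas the q = 1 and q = 2 diversities are dominated by common species. This matches the abstract's point that richness is hard to pin down while Shannon and Simpson diversity are more stable.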

9.
Aim: Techniques that predict species potential distributions by combining observed occurrence records with environmental variables show much potential for application across a range of biogeographical analyses. Some of the most promising applications relate to species for which occurrence records are scarce, due to cryptic habits, locally restricted distributions or low sampling effort. However, the minimum sample sizes required to yield useful predictions remain difficult to determine. Here we developed and tested a novel jackknife validation approach to assess the ability to predict species occurrence when fewer than 25 occurrence records are available. Location: Madagascar. Methods: Models were developed and evaluated for 13 species of secretive leaf-tailed geckos (Uroplatus spp.) that are endemic to Madagascar, for which available sample sizes range from 4 to 23 occurrence localities (at 1 km² grid resolution). Predictions were based on 20 environmental data layers and were generated using two modelling approaches: a method based on the principle of maximum entropy (Maxent) and a genetic algorithm (GARP). Results: We found high success rates and statistical significance in jackknife tests with sample sizes as low as five when the Maxent model was applied. Results for GARP at very low sample sizes (less than c. 10) were less good. When sample sizes were experimentally reduced for those species with the most records, variability among predictions using different combinations of localities demonstrated that models were greatly influenced by exactly which observations were included. Main conclusions: We emphasize that models developed using this approach with small sample sizes should be interpreted as identifying regions that have similar environmental conditions to where the species is known to occur, and not as predicting actual limits to the range of a species. The jackknife validation approach proposed here enables assessment of the predictive ability of models built using very small sample sizes, although use of this test with larger sample sizes may lead to overoptimistic estimates of predictive power. Our analyses demonstrate that geographical predictions developed from small numbers of occurrence records may be of great value, for example in targeting field surveys to accelerate the discovery of unknown populations and species.

10.
Species distribution models (SDMs) are widely used to predict the occurrence of species. Because SDMs generally use presence‐only data, validation of the predicted distribution and assessing model accuracy is challenging. Model performance depends on both sample size and species’ prevalence, being the fraction of the study area occupied by the species. Here, we present a novel method using simulated species to identify the minimum number of records required to generate accurate SDMs for taxa of different pre‐defined prevalence classes. We quantified model performance as a function of sample size and prevalence and found model performance to increase with increasing sample size under constant prevalence, and to decrease with increasing prevalence under constant sample size. The area under the curve (AUC) is commonly used as a measure of model performance. However, when applied to presence‐only data it is prevalence‐dependent and hence not an accurate performance index. Testing the AUC of an SDM for significant deviation from random performance provides a good alternative. We assessed the minimum number of records required to obtain good model performance for species of different prevalence classes in a virtual study area and in a real African study area. The lower limit depends on the species’ prevalence with absolute minimum sample sizes as low as 3 for narrow‐ranged and 13 for widespread species for our virtual study area which represents an ideal, balanced, orthogonal world. The lower limit of 3, however, is flawed by statistical artefacts related to modelling species with a prevalence below 0.1. In our African study area lower limits are higher, ranging from 14 for narrow‐ranged to 25 for widespread species. We advocate identifying the minimum sample size for any species distribution modelling by applying the novel method presented here, which is applicable to any taxonomic clade or group, study area or climate scenario.

11.
Accessing the soil metagenome for studies of microbial diversity   Total citations: 1 (self-citations: 0; citations by others: 1)
Soil microbial communities contain the highest level of prokaryotic diversity of any environment, and metagenomic approaches involving the extraction of DNA from soil can improve our access to these communities. Most analyses of soil biodiversity and function assume that the DNA extracted represents the microbial community in the soil, but subsequent interpretations are limited by the DNA recovered from the soil. Unfortunately, extraction methods do not provide a uniform and unbiased subsample of metagenomic DNA, and as a consequence, accurate species distributions cannot be determined. Moreover, any bias will propagate errors in estimations of overall microbial diversity and may exclude some microbial classes from study and exploitation. To improve metagenomic approaches, investigate DNA extraction biases, and provide tools for assessing the relative abundances of different groups, we explored the biodiversity of the accessible community DNA by fractioning the metagenomic DNA as a function of (i) vertical soil sampling, (ii) density gradients (cell separation), (iii) cell lysis stringency, and (iv) DNA fragment size distribution. Each fraction had a unique genetic diversity, with different predominant and rare species (based on ribosomal intergenic spacer analysis [RISA] fingerprinting and phylochips). All fractions contributed to the number of bacterial groups uncovered in the metagenome, thus increasing the DNA pool for further applications. Indeed, we were able to access a more genetically diverse proportion of the metagenome (a gain of more than 80% compared to the best single extraction method), limit the predominance of a few genomes, and increase the species richness per sequencing effort. This work stresses the difference between extracted DNA pools and the currently inaccessible complete soil metagenome.

12.
The Species Abundance Distribution (SAD) is a common metric for characterizing macroscopic ecological communities. Recently, this metric has been applied to analysis of microbial communities as well. However, as compared to macroscopic communities, sampling of microscopic communities is different. In particular, most microbial communities are studied using sequencing techniques. These techniques have known biases that result in certain taxa being detected more often than others, even if the taxa are present in the sample at equivalent abundances. There are, for example, amplification biases that result in some sequences being amplified more than others. Likewise, differences in genome size across organisms can result in different numbers of reads from different taxa, again resulting in biased detection. A number of bioinformatics methods have been devised to account for biases in sequencing data, allowing for more accurate estimates of relative taxon abundances. However, because the sampling process itself is affected by biased detection, and because sampling (and under-sampling in particular) can influence the shape of the SAD, it is possible that, even when corrected for through re-scaling, detection biases can affect SAD predictions from sequencing data. To test this hypothesis, we construct a simulation model of the sampling process, focusing on biased detection in shotgun sequencing that arises from genome size differences across microbial taxa. Interestingly, we find that, although genome size itself does not impact SAD predictions, predictions can vary depending on the range of genome sizes that are represented in a community, as well as how genome size is distributed (i.e., whether the majority of species have small versus large genomes). Our results suggest that care should be taken when comparing SADs across environments, particularly when those environments might have taxa with different genome size distributions. Furthermore, our results indicate that relatively deep sequencing might be required to avoid drawing spurious inferences about ecological differences across microbial communities.

13.
The recent discovery of a diverse phylogenetic assemblage of picoeukaryotes from environments such as oceans, salt marshes and acidic habitats has expanded the debates about the extent and origin of microbial eukaryotes. However, the diversity of these eukaryotic microorganisms, which overlap bacteria in size, and their environmental and biogeographical ubiquity remain poorly understood. Here we survey picoeukaryotes (microbial eukaryotes of 0.2-5 µm in size) from an oligotrophic (nutrient deficient) freshwater habitat using ribosomal RNA gene sequences. Three taxonomic groups, the Heterokonta, Cryptomonads and the Alveolata, dominated the detected diversity. Most sequences represented previously unsampled species, with several being unassignable to known taxonomic groups and plausibly representing new or unsampled phyla. Many freshwater phylogenetic groups identified in this study appeared unrelated to picoeukaryotic sequences identified in marine ecosystems, suggesting that aspects of eukaryote microbial diversity are specific to certain aquatic environments. Conversely, at least five phylogenetic clusters comprised sequences from freshwater and globally dispersed and often contrasting environments, supporting the concept that a number of picoeukaryotic lineages are widely distributed.

14.
We present models describing the acquisition and deletion of novel sequences in populations of microorganisms. We infer that most novel sequences are neutral. Thus, sequence duplications and gene transfer between organisms sharing the same environment are rarely expected to generate adaptive functions. Two classes of models are considered: (1) a homogeneous population with constant size, and (2) an island model in which the population is subdivided into patches that are in contact through slow migration. Distributions of gene frequencies are derived in a Moran model with overlapping generations. We find that novel, neutral or near-neutral coding sequences in microorganisms will not be fixed globally because they offer large target sizes for mutations and because the populations are so large. At most, such genes may have a transient presence in only a small fraction of the population. Consequently, a microbial population is expected to have a very large diversity of transient neutral gene content. Only sequences that are under strong selection, globally or in individual patches, can be expected to persist. We suggest that genome size is maintained in microorganisms by a quasi-steady state mechanism in which random fluctuations in the effective acquisition and deletion rates result in genome sizes that vary from patch to patch. We assign the genomic identity of a global population to those genes that are required for the participation of patches in the genetic sweeps that maintain the genomic coherence of the population. In contrast, we stress the influence of sequence loss on the isolation and the divergence (speciation) of novel patches from a global population.

15.
Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, how far prediction locations are from the observed data, and similarity of predictive covariates in locations where data are gathered to locations where predictions are desired. In this paper, we propose extending Cook’s notion of an independent variable hull (IVH), developed originally for application with linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions occurring inside the generalized independent variable hull (gIVH) can be regarded as interpolations, while predictions occurring outside the gIVH can be regarded as extrapolations worthy of additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference when conducting model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models).

16.
Although multicenter data are common, many prediction model studies ignore this during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models using multicenter data, and provide guidelines for practice. We compared the predictive performance of standard logistic regression, generalized estimating equations, random intercept logistic regression, and fixed effects logistic regression. First, we presented a case study on the diagnosis of ovarian cancer. Subsequently, a simulation study investigated the performance of the different models as a function of the amount of clustering, development sample size, distribution of center-specific intercepts, the presence of a center-predictor interaction, and the presence of a dependency between center effects and predictors. The results showed that when sample sizes were sufficiently large, conditional models yielded calibrated predictions, whereas marginal models yielded miscalibrated predictions. Small sample sizes led to overfitting and unreliable predictions. This miscalibration was worse with more heavily clustered data. Calibration of random intercept logistic regression was better than that of standard logistic regression even when center-specific intercepts were not normally distributed, a center-predictor interaction was present, center effects and predictors were dependent, or when the model was applied in a new center. Therefore, to make reliable predictions in a specific center, we recommend random intercept logistic regression.

17.
Microorganisms play a central role in the regulation of ecosystem processes, and they comprise the vast majority of species on Earth. With recent developments in molecular methods, it has become tractable to quantify the extent of microbial diversity in natural environments. Here we examine this revolution in our understanding of microbial diversity, and we explore the factors that contribute to the seemingly astounding numbers of microbial taxa found within individual environmental samples. We conducted a meta-analysis of bacterial richness estimates from a variety of ecosystems. Nearly all environments contained hundreds to thousands of bacterial taxa, and richness levels increased with the number of individuals in a sample, a pattern consistent with those reported for nonmicrobial taxa. A cursory comparison might suggest that bacterial richness far exceeds the richness levels typically observed for plant and animal taxa. However, the apparent diversity of bacterial communities is influenced by phylogenetic breadth and allometric scaling issues. When these features are taken into consideration, the levels of microbial diversity may appear less astounding. Although the fields of ecology and biogeography have traditionally ignored microorganisms, there are no longer valid excuses for neglecting microorganisms in surveys of biodiversity. Many of the concepts developed to explain plant and animal diversity patterns can also be applied to microorganisms once we reconcile the scale of our analyses to the scale of the organisms being observed. Furthermore, knowledge from microbial systems may provide insight into the mechanisms that generate and maintain species richness in nonmicrobial systems.

18.
Aim: Species distribution models (SDMs) use the locations of collection records to map the distributions of species, making them a powerful tool in conservation biology, ecology and biogeography. However, the accuracy of range predictions may be reduced by temporally autocorrelated biases in the data. We assess the accuracy of SDMs in predicting the ranges of tropical plant species on the basis of different sample sizes while incorporating real-world collection patterns and biases. Location: Tropical South American moist forests. Methods: We use dated herbarium records to model the distributions of 65 Amazonian and Andean plant species. For each species, we use the first 25, 50, 100, 125 and 150 records collected and available for each species to analyse changes in spatial aggregation and climatic representativeness through time. We compare the accuracy of SDM range estimates produced using the time-ordered data subsets to the accuracy of range estimates generated using the same number of collections but randomly subsampled from all available records. Results: We find that collections become increasingly aggregated through time but that additional collecting sites are added resulting in progressively better representations of the species' full climatic niches. The range predictions produced using time-ordered data subsets are less accurate than predictions from random subsets of equal sample sizes. Range predictions produced using time-ordered data subsets consistently underestimate the extent of ranges while no such tendency exists for range predictions produced using random data subsets. Main conclusions: These results suggest that larger sample sizes are required to accurately map species ranges. Additional attention should be given to increasing the number of records available per species through continued collecting, better distributed collecting, and/or increasing access to existing collections. The fact that SDMs generally under-predict the extent of species ranges means that extinction risks of species because of future habitat loss may be lower than previously estimated.

19.
In plants and animals, new biological species clearly have arisen as a byproduct of genetic divergence in allopatry. However, our understanding of the processes that generate new microbial species remains limited [1] despite the large contribution of microbes to the world's biodiversity. A recent hypothesis claims that microbes lack biogeographical divergence because their population sizes are large and their migration rates are presumably high [2, 3]. In recapitulating the classic microbial-ecology dictum that "everything is everywhere, and the environment selects" [4, 5], this hypothesis casts doubt on whether geographic divergence promotes speciation in microbes. To date, its predictions have been tested primarily with data from eubacteria and archaebacteria [6-8]. However, this hypothesis's most important implication is in sexual eukaryotic microbes, where migration and genetic admixture are specifically predicted to inhibit allopatric divergence and speciation [9]. Here, we use nuclear-sequence data from globally distributed natural populations of the yeast Saccharomyces paradoxus to investigate the role of geography in generating diversity in sexual eukaryotic microbes. We show that these populations have undergone allopatric divergence and then secondary contact without genetic admixture. Our data thus support the occurrence of evolutionary processes necessary for allopatric speciation in sexual microbes.

20.
Several intervals have been proposed to quantify the agreement of two methods intended to measure the same quantity in the situation where only one measurement per method and subject is available. The limits of agreement are probably the most well‐known among these intervals, which are all based on the differences between the two measurement methods. The different meanings of the intervals are not always properly recognized in applications. However, at least for small‐to‐moderate sample sizes, the differences will be substantial. This is illustrated both using the width of the intervals and on probabilistic scales related to the definitions of the intervals. In particular, for small‐to‐moderate sample sizes, it is shown that limits of agreement and prediction intervals should not be used to make statements about the distribution of the differences between the two measurement methods or about a plausible range for all future differences. Care should therefore be taken to ensure the correct choice of the interval for the intended interpretation.
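Two of the intervals mentioned in this abstract can be compared numerically. The Python sketch below uses their usual textbook definitions, which are assumed here rather than taken from the paper: limits of agreement as the mean difference plus or minus 1.96 standard deviations, and a 95% prediction interval for one future difference as the mean plus or minus t × s × sqrt(1 + 1/n). The function names and the paired differences are hypothetical, and the t quantile is passed in as a parameter (2.262 for 9 degrees of freedom, from standard tables) so that the example needs only the standard library.

```python
import math
import statistics


def limits_of_agreement(diffs, z=1.96):
    """Mean difference +/- z standard deviations of the differences."""
    m = statistics.mean(diffs)
    s = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return (m - z * s, m + z * s)


def prediction_interval(diffs, t_crit):
    """Interval for a single future difference, given a t quantile.

    t_crit is the two-sided critical value of Student's t with
    len(diffs) - 1 degrees of freedom, supplied by the caller.
    """
    n = len(diffs)
    m = statistics.mean(diffs)
    s = statistics.stdev(diffs)
    half_width = t_crit * s * math.sqrt(1.0 + 1.0 / n)
    return (m - half_width, m + half_width)


# Hypothetical paired differences (method A minus method B), n = 10.
diffs = [1, -1, 2, -2, 0, 1, -1, 2, -2, 0]
print(limits_of_agreement(diffs))
print(prediction_interval(diffs, t_crit=2.262))  # t(0.975, df = 9)
```

With n = 10 the prediction interval is noticeably wider than the limits of agreement. The gap shrinks as n grows, consistent with the abstract's point that the choice of interval matters most for small-to-moderate samples.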
