Similar Articles
20 similar articles found
1.
Longitudinal samples of DNA sequences are sequences sampled from the same population at different time points. For fast-evolving organisms, e.g. RNA viruses, such samples have increasingly been used to study the evolutionary process in action. Longitudinal samples provide some interesting new summary statistics of genetic variation, such as the frequency of mutations of size i in one sample and size j in another, the average number of mutations accumulated since the common ancestor of two sequences drawn from different samples, and the numbers of private, shared and fixed mutations within samples. To make the results more widely applicable, we used in this study a general two-sample model, which assumes that two longitudinal samples were taken from the same measurably evolving population. Inspired by studies of HIV, we also studied a two-sample-two-stage model, a special case of the two-sample model that assumes a treatment after the first sampling instantaneously changes the population size. We derived formulas for calculating statistical properties (e.g. expectations, variances and covariances) of these new summary statistics under the two models. Potential applications of these results are discussed.
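As a rough illustration of two of the statistics mentioned above, the Python sketch below tabulates the joint frequency spectrum (entry [i][j] counts mutations carried by i sequences of the first sample and j of the second) and the mean number of differences between sequences drawn from different samples. It assumes, for illustration only, that sites are coded "0" (ancestral) and "1" (derived) and that the infinite-sites model holds, so each differing site counts as one mutation since the two sequences' common ancestor. The function names and toy data are hypothetical; this is not the authors' code.

```python
from itertools import product


def joint_sfs(sample1, sample2):
    """Joint site frequency spectrum of two samples.

    Each sample is a list of equal-length strings of "0" (ancestral) and
    "1" (derived), aligned to the same sites. Entry [i][j] counts sites whose
    derived allele is carried by i sequences of sample 1 and j of sample 2.
    """
    n1, n2 = len(sample1), len(sample2)
    table = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for site in range(len(sample1[0])):
        i = sum(seq[site] == "1" for seq in sample1)
        j = sum(seq[site] == "1" for seq in sample2)
        if i or j:  # skip sites where no sampled sequence carries a mutation
            table[i][j] += 1
    return table


def mean_between_sample_differences(sample1, sample2):
    """Mean number of differing sites over all pairs with one sequence from each sample."""
    total = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in product(sample1, sample2)
    )
    return total / (len(sample1) * len(sample2))


s1 = ["0011", "0010"]          # hypothetical first time point
s2 = ["1010", "1000", "1011"]  # hypothetical second time point
print(joint_sfs(s1, s2))  # [[0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]
print(mean_between_sample_differences(s1, s2))
```

Under these assumptions, row 0 of the table holds mutations seen only in the second sample and column 0 those seen only in the first.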

2.
Species are considered to be the basic unit of ecological and evolutionary studies. As multilocus genomic data become increasingly available, there has been considerable interest in the use of DNA sequence data to delimit species. In this study, we show that machine learning can be used for species delimitation. Our method treats the species delimitation problem as a classification problem for identifying the category of a new observation on the basis of training data. Extensive simulation is first conducted over a broad range of evolutionary parameters for training purposes. Each pair of known populations is combined to form training samples with a label of “same species” or “different species”. We use a support vector machine (SVM) to train a classifier, using a set of summary statistics computed from training samples as features. The trained classifier assigns a test sample to one of two outcomes: “same species” or “different species”. Given multilocus genomic data of multiple related organisms or populations, our method (called CLADES) performs species delimitation by first classifying pairs of populations. CLADES then delimits species by maximizing the likelihood of species assignment for multiple populations. CLADES is evaluated through extensive simulation and also tested on real genetic data. We show that CLADES is both accurate and efficient for species delimitation when compared with existing methods. CLADES can be especially useful when existing methods have difficulty delimiting species, for example with short species divergence times and gene flow.

3.
Human immunodeficiency virus (HIV) infects different organs and tissues. During these infection events, subpopulations of HIV type 1 (HIV-1) develop and, if viral trafficking is restricted between subpopulations, the viruses can follow independent evolutionary histories, i.e., become compartmentalized. This phenomenon is usually detected via comparative sequence analysis and has been reported for viruses isolated from the central nervous system (CNS) and the genital tract. Several approaches have been proposed to study the compartmentalization of HIV sequences, but to date, no rigorous comparison of the most commonly employed methods has been made. In this study, we systematically compared inferences made by six different methods for detecting compartmentalization based on three data sets: (i) a sample of 45 patients with sequences gathered from the CNS, (ii) sequences from the female genital tract of 18 patients, and (iii) a set of simulated sequences. We found that different methods often reached contradictory conclusions. Methods based on the topology of a phylogenetic tree derived from clonal sequences were generally more sensitive in detecting compartmentalization than those that relied solely upon pairwise genetic distances between sequences. However, as the branching structure in a phylogenetic tree is often uncertain, especially for short, low-diversity, or recombinant sequences, tree-based approaches may need to be modified to take phylogenetic uncertainty into account. Given the frequently discordant predictions of different methods and the strengths and weaknesses of each particular methodology, we recommend that a suite of several approaches be used for reliable inference of compartmentalized population structure.
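For a concrete picture of a distance-based approach, the sketch below computes Hudson's nearest-neighbour statistic Snn from pairwise Hamming distances and assesses it with a label-permutation test. It is offered only as an example of the distance-based family; the abstract does not say whether Snn is among the six methods compared, and the sequences, compartment labels and number of permutations are hypothetical.

```python
import random


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def snn(seqs, labels):
    """Hudson's nearest-neighbour statistic Snn.

    For each sequence, find its nearest neighbour(s) by Hamming distance and
    record the fraction that share its compartment label; Snn is the mean of
    these fractions. Values near 1 suggest compartmentalised sequences.
    """
    n = len(seqs)
    total = 0.0
    for i in range(n):
        dists = [(hamming(seqs[i], seqs[j]), j) for j in range(n) if j != i]
        dmin = min(d for d, _ in dists)
        nearest = [j for d, j in dists if d == dmin]
        total += sum(labels[j] == labels[i] for j in nearest) / len(nearest)
    return total / n


def snn_permutation_test(seqs, labels, n_perm=999, seed=1):
    """Return (observed Snn, permutation p-value) by shuffling compartment labels."""
    rng = random.Random(seed)
    observed = snn(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if snn(seqs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


seqs = ["AAAA", "AAAT", "AATA", "GGGG", "GGGC", "GGCG"]  # toy clonal sequences
labels = ["CNS"] * 3 + ["blood"] * 3
print(snn_permutation_test(seqs, labels))
```

With only six sequences the p-value cannot fall much below 0.1, because 2 of the 20 possible 3/3 labellings are perfectly separated; real analyses need larger samples.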

4.
Tests of applicability of several substitution models for DNA sequence data (cited 8 times: 3 self-citations, 5 by others)
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
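To make the notion of "evolutionary distance under a model" concrete, the sketch below implements the standard distance formulas for the two simplest models listed, Jukes-Cantor (1969) and Kimura's (1980) two-parameter model. It does not implement the linear-invariant tests described in the abstract. It assumes aligned uppercase A/C/G/T sequences with no gaps; both formulas are undefined (logarithm of a non-positive number) when the sequences are too divergent.

```python
import math

PURINES = set("AG")


def jc69_distance(a, b):
    """Jukes-Cantor (1969) distance: d = -3/4 * ln(1 - 4p/3), p = proportion of differing sites."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(a, b):
    """Kimura (1980) two-parameter distance.

    P = proportion of transitions (A<->G, C<->T), Q = proportion of transversions:
    d = -1/2 * ln(1 - 2P - Q) - 1/4 * ln(1 - 2Q)
    """
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    P, Q = transitions / len(a), transversions / len(a)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


a, b = "ACGTACGTAC", "GCGTACGTAA"  # one transition, one transversion
print(round(jc69_distance(a, b), 4), round(k80_distance(a, b), 4))  # 0.2326 0.2341
```

The two estimates are close here; they diverge as the transition/transversion ratio departs from what the Jukes-Cantor model assumes.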

5.
We have compared two statistical methods of estimating the time to the most recent common ancestor (TMRCA) from a sample of DNA sequences, proposed by Templeton (1993) and Bandelt et al. (1995). Monte Carlo simulations were used to generate DNA sequence data. Different evolutionary scenarios were simulated and the estimation procedures were evaluated. We found that for both methods (i) the estimates are insensitive to demographic parameters and (ii) the standard deviations of the estimates are too high for these methods to be used reliably in practice.

6.
Premise of the study: DNA barcoding has been proposed as a useful technique within many disciplines (e.g., conservation biology and forensics) for determining the taxonomic identity of a sample based on nucleotide similarity to samples of known taxonomy. Application of DNA barcoding to plants has primarily focused on evaluating the success of candidate barcodes across a broad spectrum of evolutionary divergence. Less attention has been paid to performance when distinguishing congeners, or to the differential success of analytical techniques, even though the practical application and utility of barcoding hinge on the ability to distinguish closely related species. Methods: We tested the ability to distinguish among 92 samples representing 29 putative species in the genus Agalinis (Orobanchaceae) using 13 candidate barcodes and three analytical methods (i.e., threshold genetic distances, hierarchical tree-based, and diagnostic character differences). Because of questions regarding the evolutionary distinctiveness of some taxa, we evaluated success under two taxonomic hypotheses. Key results: The psbA-trnH and trnT-trnL barcodes, in conjunction with the "best close match" distance-based method, best met the objectives of DNA barcoding. Success was also a function of the taxonomy used. Conclusions: In addition to accurately identifying query sequences, our results showed that DNA barcoding is useful for detecting taxonomic uncertainty; determining whether erroneous taxonomy or incomplete lineage sorting is the cause requires additional information provided by traditional taxonomic approaches. The magnitude of differentiation within and among the Agalinis species sampled suggests that our results inform how DNA barcoding will perform among closely related species in other genera.
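The "best close match" idea can be sketched as follows: find the reference barcodes at the smallest distance from the query; if none lies within a distance threshold, make no identification; otherwise report the species if all best matches agree, or "ambiguous" if they span several species. This is a simplified sketch under stated assumptions: sequences are pre-aligned, distance is the uncorrected p-distance, and the threshold is a hypothetical parameter (in practice it is often derived from the distribution of intraspecific distances). The reference data are made up.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_close_match(query, references, threshold):
    """Simplified 'best close match' identification of a query sequence.

    references: list of (species, aligned_sequence) pairs of known taxonomy.
    Returns a species name if all best matches within `threshold` belong to
    one species, "ambiguous" if they span several species, or None if no
    reference lies within the threshold.
    """
    scored = [(p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in scored)
    if best > threshold:
        return None
    species = {sp for d, sp in scored if d == best}
    return species.pop() if len(species) == 1 else "ambiguous"


refs = [  # hypothetical reference barcodes
    ("sp1", "ACGTACGTAC"),
    ("sp2", "ACGTTCGTAC"),
    ("sp2", "ACGTTCGTAA"),
]
print(best_close_match("ACGTACGTAC", refs, threshold=0.05))  # sp1
print(best_close_match("ACGTGCGTAC", refs, threshold=0.15))  # ambiguous
```

An "ambiguous" outcome between putative congeners is the kind of signal of taxonomic uncertainty the abstract describes.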

7.
Several procedures were compared for reliable PCR detection of Ralstonia solanacearum in common substrates (plant, seed, water and soil). In order to prevent the inhibition of PCR by substances contained in crude extracts, numerous DNA extraction procedures as well as additives to buffers or PCR mixtures were checked. Our results showed that the efficiency of these methods or compounds depended greatly upon the nature of the sample. Consequently, preparation of samples prior to PCR depended upon sample origin. Simple methods such as a combined PVPP/BSA treatment or the association of filtration and centrifugation for detecting the bacterium in plant or water samples were very powerful. DNA capture also efficiently overcame PCR inhibition problems and ensured the detection of R. solanacearum in environmental samples. However, the commercial QIAamp DNA extraction kit appeared to be the most effective tool to guarantee accurate PCR detection of the pathogen whatever the origin of the sample; this was particularly true for soil samples, where the commonly used methods for the detection of R. solanacearum were inefficient. This study demonstrates that, with an appropriate procedure, PCR is a useful and powerful tool for detecting low levels of R. solanacearum populations in their natural habitats.

8.
Zhang NR, Siegmund DO, Ji H, Li JZ. Biometrika, 2010, 97(3): 631-645
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
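The core of a sum-of-chi-squares scan can be sketched in a few lines. The sketch assumes each sequence has already been standardised (zero baseline, unit noise variance), scans every interval up to a hypothetical maximum width, and reports the interval with the largest combined statistic. It does not implement the analytic significance approximations or the segmentation step described in the abstract, and the example data are hypothetical.

```python
import math


def prefix_sums(y):
    """Cumulative sums with a leading zero, so sum(y[s:t]) == c[t] - c[s]."""
    c = [0.0]
    for v in y:
        c.append(c[-1] + v)
    return c


def multisample_scan(samples, max_width):
    """Find the interval [s, t) maximising the sum of per-sample chi-squared statistics.

    samples: equal-length numeric sequences, each assumed standardised
    (zero baseline mean, unit noise variance). For interval [s, t) the
    per-sample statistic is Z_k = sum(y_k[s:t]) / sqrt(t - s); the combined
    statistic is sum_k Z_k**2. Returns (statistic, s, t).
    """
    n = len(samples[0])
    cums = [prefix_sums(y) for y in samples]
    best = (-math.inf, None, None)
    for s in range(n):
        for t in range(s + 1, min(n, s + max_width) + 1):
            stat = sum((c[t] - c[s]) ** 2 / (t - s) for c in cums)
            if stat > best[0]:
                best = (stat, s, t)
    return best


samples = [
    [0.1, -0.2, 3.0, 2.8, 0.0, -0.1],  # hypothetical standardised log-ratios
    [0.0, 0.3, 2.1, 1.9, -0.2, 0.1],
    [-0.1, 0.2, 0.0, 0.1, 0.1, -0.3],  # a sample without the variant
]
print(multisample_scan(samples, max_width=4))
```

Because squared sums are added across samples, a shared interval gains evidence even when a weak signal is present in only some of the sequences.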

9.
Studies of DNA from ancient samples provide a valuable opportunity to gain insight into past evolutionary and demographic processes. Bayesian phylogenetic methods can estimate evolutionary rates and timescales from ancient DNA sequences, with the ages of the samples acting as calibrations for the molecular clock. Sample ages are often estimated using radiocarbon dating, but the associated measurement error is rarely taken into account. In addition, the total uncertainty quantified by converting radiocarbon dates to calendar dates is typically ignored. Here, we present a tool for incorporating both of these sources of uncertainty into Bayesian phylogenetic analyses of ancient DNA. This empirical calibrated radiocarbon sampler (ECRS) integrates the age uncertainty for each ancient sequence over the calibrated probability density function estimated for its radiocarbon date and associated error. We use the ECRS to analyse three ancient DNA data sets. Accounting for radiocarbon‐dating and calibration error appeared to have little impact on estimates of evolutionary rates and related parameters for these data sets. However, analyses of other data sets, particularly those with few or only very old radiocarbon dates, might be more sensitive to using artificially precise sample ages and should benefit from use of the ECRS.
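The idea of integrating over age uncertainty, rather than fixing each sample at a point estimate, can be illustrated by inverse-CDF sampling from a discretised calibrated density. In the sketch below the age grid and density values are hypothetical stand-ins for the output of a radiocarbon calibration program; the code is not the ECRS itself, which handles this uncertainty within a Bayesian phylogenetic analysis.

```python
import bisect
import random
from itertools import accumulate


def make_age_sampler(ages, densities, seed=None):
    """Return a function drawing calendar ages from a discretised calibrated density.

    ages: grid of calendar ages (e.g. years BP); densities: non-negative,
    unnormalised probability mass at each grid age.
    """
    total = sum(densities)
    cdf = [c / total for c in accumulate(densities)]
    rng = random.Random(seed)

    def draw():
        u = rng.random()  # uniform on [0, 1)
        # first grid point whose cumulative probability exceeds u (inverse-CDF sampling)
        idx = bisect.bisect_right(cdf, u)
        return ages[min(idx, len(ages) - 1)]  # guard against rounding in cdf[-1]

    return draw


# Hypothetical bimodal calibrated density on a 10-year grid.
ages = [5000, 5010, 5020, 5030, 5040]
density = [0.1, 0.4, 0.05, 0.35, 0.1]
draw_age = make_age_sampler(ages, density, seed=42)
print([draw_age() for _ in range(5)])
```

Calibrated radiocarbon densities are frequently multimodal, as here, which is one reason a single point age can be artificially precise.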

10.
Hughes JP, Totten P. Biometrics, 2003, 59(3): 505-511
Polymerase chain reaction (PCR)-based tests for various microorganisms or target DNA sequences are generally acknowledged to be highly "sensitive," yet the concept of sensitivity is ill-defined in the literature on these tests. We propose that sensitivity should be expressed as a function of the number of target DNA molecules in the sample (or specificity, when the target number is 0). However, estimating this "sensitivity curve" is problematic, since it is difficult to construct samples with a fixed number of targets. Nonetheless, using serially diluted replicate aliquots of a known concentration of the target DNA sequence, we show that it is possible to disentangle random variation in the number of target DNA molecules from the underlying test sensitivity. We develop parametric, nonparametric, and semiparametric (spline-based) models for the sensitivity curve. The methods are compared on a new test for Mycoplasma genitalium.
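The separation of random target counts from test sensitivity can be sketched as a mixture: if an aliquot's target count is Poisson with mean lam, the probability of a positive result is the sensitivity curve averaged over Poisson counts. The "single-hit" curve below (each molecule detected independently with probability p, plus a false-positive rate) is an assumption chosen for illustration, not one of the paper's models, and the parameter values are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of exactly k target molecules when the mean is lam."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def prob_positive(lam, sensitivity, kmax=200):
    """P(test positive) for an aliquot whose target count is Poisson(lam).

    sensitivity(k) is the probability of a positive result given exactly k
    target molecules, with sensitivity(0) = 1 - specificity.
    """
    return sum(poisson_pmf(k, lam) * sensitivity(k) for k in range(kmax + 1))


def single_hit_sensitivity(p, false_pos):
    """Illustrative model: each molecule is detected independently with probability p."""
    return lambda k: 1 - (1 - false_pos) * (1 - p) ** k


curve = single_hit_sensitivity(p=0.6, false_pos=0.02)  # hypothetical parameters
for lam in [0.0, 0.5, 1.0, 2.0, 5.0]:  # e.g. expected targets across a dilution series
    print(lam, round(prob_positive(lam, curve), 3))
```

For this particular curve the sum has the closed form 1 - (1 - false_pos) * exp(-lam * p), a convenient check; fitting such a model to the positive fractions observed at each dilution is what lets sensitivity be estimated without knowing exact target counts.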

11.
Summary: We present compositional statistics, a new method of phylogenetic inference, which is an extension of evolutionary parsimony. Compositional statistics takes account of the base composition of the compared sequences by using nucleotide positions that evolutionary parsimony ignores. It shares with evolutionary parsimony the features of rate invariance and the fundamental distinction between transitions and transversions. Of the presently available methods of phylogenetic inference, compositional statistics is based on the fewest and mildest assumptions about the mode of DNA sequence evolution. It is therefore applicable to phylogenetic studies of the most distantly related organisms or molecules. This was illustrated by analyzing conservative positions in the DNA sequences of the large subunit of RNA polymerase from three archaebacterial groups, a eubacterium, a chloroplast, and the three eukaryotic polymerases. Internally consistent results, which are in accord with our knowledge of organelle origin and archaebacterial physiology, were achieved.

12.
MOTIVATION: Most phylogenetic methods assume that the sequences of nucleotides or amino acids have evolved under stationary, reversible and homogeneous conditions. When these assumptions are violated by the data, there is an increased probability of errors in the phylogenetic estimates. Methods to examine aligned sequences for these violations are available, but they are rarely used, possibly because they are not widely known or because they are poorly understood. RESULTS: We describe and compare the available tests for symmetry of k-dimensional contingency tables from homologous sequences, and develop two new tests to evaluate different aspects of the evolutionary processes. For any pair of sequences, we consider a partition of the test for symmetry into a test for marginal symmetry and a test for internal symmetry. The proposed tests can be used to identify appropriate models for estimation of evolutionary relationships under a Markovian model. Simulations under evolutionary conditions of varying complexity were run to demonstrate the performance of the tests. Finally, the tests were applied to an alignment of small-subunit ribosomal RNA sequences of five species of bacteria to outline the evolutionary processes under which they evolved. AVAILABILITY: Programs written in R to do the tests on nucleotides are available from http://www.maths.usyd.edu.au/u/johnr/testsym/
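For a pair of sequences, the simplest of these tests is Bowker's classical test of symmetry applied to the 4 x 4 table of nucleotide pairs. The sketch below builds that table and computes the statistic and its degrees of freedom; it does not implement the marginal- and internal-symmetry tests proposed in the abstract. Gaps and ambiguity codes are simply skipped, an assumption made for brevity.

```python
BASES = "ACGT"


def divergence_matrix(a, b):
    """4x4 table: entry [i][j] counts sites with base BASES[i] in a and BASES[j] in b."""
    index = {base: k for k, base in enumerate(BASES)}
    m = [[0] * 4 for _ in range(4)]
    for x, y in zip(a, b):
        if x in index and y in index:  # skip gaps and ambiguity codes
            m[index[x]][index[y]] += 1
    return m


def bowker_symmetry_test(m):
    """Bowker's test of symmetry: returns (statistic, degrees of freedom).

    statistic = sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji), counting
    only pairs with n_ij + n_ji > 0, each of which adds one degree of freedom.
    """
    stat, df = 0.0, 0
    for i in range(4):
        for j in range(i + 1, 4):
            total = m[i][j] + m[j][i]
            if total > 0:
                stat += (m[i][j] - m[j][i]) ** 2 / total
                df += 1
    return stat, df


print(bowker_symmetry_test(divergence_matrix("AAAACCGT", "GGGACCGT")))  # (3.0, 1)
```

Under the null hypothesis of symmetry the statistic is compared with a chi-squared distribution with df degrees of freedom; for df = 1 the p-value is math.erfc(math.sqrt(stat / 2)).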

13.
Feather mites are among the most common and diverse ectosymbionts of birds, yet basic questions such as the nature of their relationship remain largely unanswered. One reason feather mites are understudied is that their morphological identification is often virtually impossible for female or young individuals. Even for adult male specimens this task is tedious and requires advanced taxonomic expertise, thus hampering large-scale studies. In addition, molecular-based methods are challenging because the low DNA amounts usually obtained from these tiny mites do not reach the levels required for high-throughput sequencing. This work aims to overcome these issues by using a DNA metabarcoding approach to accurately identify and quantify the feather mite species present in a sample. DNA metabarcoding is a widely used molecular technique that takes advantage of high-throughput sequencing methodologies to assign taxonomic identities to all the organisms present in a complex sample (i.e., a sample made up of multiple specimens that are hard or impossible to individualise). We present a high-throughput method for feather mite identification using a fragment of the COI gene as a marker and Illumina MiSeq technology. We tested this method by performing two experiments plus a field test on a total of 11,861 individual mites (5360 of which were also morphologically identified). In the first experiment, we tested the probability of detecting a single feather mite in a heterogeneous pool of non-conspecific individuals. In the second experiment, we made 2 × 2 combinations of species and studied the relationship between the proportion of individuals of a given species in a sample and the proportion of sequences retrieved, to test whether DNA metabarcoding can reliably quantify the relative abundance of mites in a sample.
Here we also tested the efficacy of degenerate primers (i.e., a mixture of similar primers, differing at one or several bases, designed to increase the chance of annealing) and investigated the relationship between the number of mismatches and PCR success. Finally, we applied our DNA metabarcoding pipeline to a total of 6501 unidentified and unsorted feather mite individuals sampled from 380 European passerine birds belonging to 10 bird species (field test). Our results show that the proposed pipeline is suitable for correct identification and quantitative estimation of the relative abundance of feather mite species in complex samples, especially when dealing with a moderate number (> 30) of individuals per sample.
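Counting primer-template mismatches for a degenerate primer requires treating each IUPAC ambiguity code as the set of bases it matches. The sketch below does this and also counts how many distinct oligonucleotides a degenerate primer mix contains. The primer and target sequences are hypothetical; the binding site is assumed to be written in the same 5'->3' orientation as the primer, and mismatch position (e.g. near the 3' end, which typically matters more for PCR success) is not weighted.

```python
# IUPAC nucleotide ambiguity codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, site):
    """Count positions where the target base is not matched by the (possibly degenerate) primer.

    `site` is the target sequence at the primer-binding position, written in
    the same 5'->3' orientation as the primer.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    return sum(base not in IUPAC[code] for code, base in zip(primer.upper(), site.upper()))


def n_variants(primer):
    """Number of distinct oligonucleotides in the degenerate primer mix."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


primer = "ACYTGRAT"  # hypothetical degenerate primer
print(n_variants(primer))                     # 4
print(primer_mismatches(primer, "ACTTGGAT"))  # 0
print(primer_mismatches(primer, "ACATGCAT"))  # 2
```

Each degenerate position multiplies the number of oligos in the mix, which broadens taxonomic coverage but lowers the concentration of any single variant.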

14.
We investigated the effectiveness of culture-independent molecular methods for determining host-associated microbial diversity in bighorn sheep (Ovis canadensis). Results from bacterial culture attempts have been the primary source of information on host-associated bacteria, but studies have shown that culture-based results significantly underestimate bacterial diversity in biological samples. To test the effectiveness of culture-independent methods, we extracted DNA from nasal and oropharyngeal swab samples collected from bighorn sheep in four different populations. From these samples, we amplified, cloned, and sequenced small subunit (16S) ribosomal DNA (rDNA) to identify the scope of microbial diversity in bighorn respiratory tracts. Phylogenetic analysis of these rDNA gene sequences revealed organismal diversity an order of magnitude higher than was determined by culture methods. Pasteurellaceae bacteria were the most diverse phylogenetic group in live bighorn sheep, and members of bacterial genera often associated with respiratory disease were found in all the samples. Culture-independent methods were also able to directly detect leukotoxin (lktA) gene sequences in swab and lung tissue samples. Overall, our results show the power of culture-independent molecular methods for identifying microbial diversity in bighorn sheep and the potential for these methods to detect the presence of virulence genes in biological samples.

15.
The most stringent test for methods that predict protein secondary structure is whether they can correctly discriminate identical short sequences known to adopt different conformations in different proteins solved at atomic resolution. In this study, we show that prediction of segments of this type in unrelated proteins reaches an average per-residue accuracy of about 72 to 75% (depending on the alignment method used to generate the input sequence profile) only when third-generation methods are used. A comparison of methods based on segment statistics (second-generation methods) and/or also including evolutionary information (third-generation methods) indicates that discrimination of the different conformations of identical segments depends on the method used for prediction. Accuracy is similar when methods with similar overall secondary structure prediction performance are tested. When evolutionary information is taken into account, rather than single-sequence input, the number of correctly discriminated pairs doubles. The results also highlight the predictive capability of neural networks for identical segments whose conformation differs between proteins.
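Per-residue accuracy of this kind is conventionally reported as the three-state Q3 score: the percentage of residues whose predicted state (helix H, strand E, coil C) matches the observed one. A minimal sketch, assuming both assignments have already been reduced to these three states; it does not reproduce the paper's analysis of discriminated segment pairs.

```python
def q3(predicted, observed):
    """Three-state per-residue accuracy (%): share of residues whose predicted
    state (H helix, E strand, C coil) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(round(q3("HHHEECC", "HHHEEEC"), 1))  # 85.7
```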

16.
Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance.

17.
We used genome fragment enrichment and bioinformatics to identify several microbial DNA sequences with high potential for use as markers in PCR assays for detection of human fecal contamination in water. Following competitive solution-phase hybridization of total DNA from human and pig fecal samples, 351 plasmid clones were sequenced and were determined to define 289 different genomic DNA regions. These putative human-specific fecal bacterial DNA sequences were then analyzed by dot blot hybridization, which confirmed that 98% were present in the source human fecal microbial community and absent from the original pig fecal DNA extract. Comparative sequence analyses predicted that a large fraction (43.5%) of these sequences encode bacterial secreted or surface-associated proteins. Deoxyoligonucleotide primers capable of annealing to a subset of 26 of the candidate sequences predicted to encode factors involved in interactions with host cells were then used in PCR and did not amplify markers in DNA from any additional pig fecal specimens. These 26 PCR assays exhibited a range of specificity in tests with 11 other animal sources, with more than half amplifying markers only in specimens from dogs or cats. Four assays were more specific, detecting markers only in specimens from humans, including those from 18 different human populations examined. We then demonstrated the potential utility of these assays by using them to detect human fecal contamination in several impacted watersheds.

18.
Anisimova M, Nielsen R, Yang Z. Genetics, 2003, 164(3): 1229-1236
Maximum-likelihood methods based on models of codon substitution accounting for heterogeneous selective pressures across sites have proved to be powerful in detecting positive selection in protein-coding DNA sequences. Those methods are phylogeny based and do not account for the effects of recombination. When recombination occurs, such as in population data, no unique tree topology can describe the evolutionary history of the whole sequence. This violation of assumptions raises serious concerns about the likelihood method for detecting positive selection. Here we use computer simulation to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination. We examine three tests based on different models of variable selective pressures among sites. Sequences are simulated using a coalescent model with recombination and analyzed using codon-based likelihood models ignoring recombination. We find that the LRT is robust to low levels of recombination (with fewer than three recombination events in the history of a sample of 10 sequences). However, at higher levels of recombination, the type I error rate can be as high as 90%, especially when the null model in the LRT is unrealistic, and the test often mistakes recombination for evidence of positive selection. The test comparing the more realistic models M7 (beta) and M8 (beta and ω) is more robust to recombination. In this test, the null model M7 allows the selective pressure ω to vary between 0 and 1 (and so does not account for positive selection), whereas the alternative model M8 adds a discrete class with ω = dN/dS that can be estimated to be > 1 (and thus accounts for positive selection). Identification of sites under positive selection by the empirical Bayes method appears to be less affected than the LRT by recombination.
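The likelihood-ratio test discussed above compares twice the log-likelihood difference between nested models with a chi-squared distribution. For M7 against M8 this is conventionally done with 2 degrees of freedom, whose survival function has the closed form exp(-x/2), so no statistics library is needed. The log-likelihood values below are hypothetical (in practice they would come from a program such as PAML's codeml), and, as the abstract warns, a significant result may reflect recombination rather than positive selection.

```python
import math


def lrt_m7_m8(lnl_m7, lnl_m8):
    """Likelihood-ratio test of M7 (null) against M8 (alternative).

    Returns (statistic, p_value), where statistic = 2 * (lnL_M8 - lnL_M7) and the
    p-value uses a chi-squared distribution with 2 degrees of freedom, whose
    survival function is exp(-x / 2).
    """
    # M8 contains M7 as a special case, so a negative difference can only
    # come from incomplete optimisation; clamp it to zero.
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    return stat, math.exp(-stat / 2.0)


stat, p = lrt_m7_m8(lnl_m7=-2404.75, lnl_m8=-2399.10)  # hypothetical log-likelihoods
print(round(stat, 2), round(p, 4))
```

At the 5% level the critical value is -2 ln(0.05), about 5.99.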

19.
20.
Detection of invasive species is critical for management but is often limited by challenges associated with capture, processing and identification of early life stages. DNA metabarcoding facilitates large-scale monitoring projects to detect establishment early. Here, we test the use of DNA metabarcoding to monitor invasive species by sequencing over 5000 fishes in bulk ichthyoplankton samples (larvae and eggs) from four rivers of ecological and cultural importance in southern Canada. We were successful in detecting species known from each river and three invasive species in two of the four rivers. This includes the first detection of early life-stage rudd in the Credit River. We evaluated whether sampling gear affected the detection of invasive species and estimates of species richness, and found that light traps outperform bongo nets in both cases. We also found that the primers used for the amplification of target sequences and the number of sequencing reads generated per sample affect the consistency of species detections. However, these factors have less impact on detections and species richness estimates than the number of samples collected and analysed. Our analyses also show that incomplete reference databases can result in incorrectly attributing DNA sequences to invasive species. Overall, we conclude that DNA metabarcoding is an efficient tool for monitoring the early establishment of invasive species by detecting evidence of reproduction but requires careful consideration of sampling design and the primers used to amplify, sequence and classify the diversity of native and potentially invasive species.
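One way to see why read depth affects detection consistency is a simple binomial model: if a rare taxon contributes a fraction f of the sequencing library, the chance of observing at least m of its reads among N reads grows with N. This model is an assumption for illustration only; it ignores primer bias, PCR stochasticity and reference-database gaps, all of which the abstract shows also matter. The read fraction, depths and detection threshold below are hypothetical.

```python
from math import comb


def detection_probability(read_fraction, n_reads, min_reads=1):
    """P(at least `min_reads` reads from a taxon) when reads are sampled binomially.

    read_fraction: expected share of the taxon's reads in the library.
    n_reads: sequencing depth for the sample.
    min_reads: minimum read count required to call a detection.
    """
    miss = sum(
        comb(n_reads, k) * read_fraction ** k * (1 - read_fraction) ** (n_reads - k)
        for k in range(min_reads)
    )
    return 1.0 - miss


for depth in [1000, 10000, 50000]:  # hypothetical read depths
    print(depth, round(detection_probability(0.0002, depth, min_reads=2), 3))
```

Raising `min_reads` guards against spurious low-count assignments but lowers detection probability at a given depth.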


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417