Similar literature
1.
Inferring positive selection at single amino acid sites is of biological and medical importance. Parsimony-based and likelihood-based methods have been developed for this purpose, but the reliabilities of these methods are not well understood. Because the evolutionary models assumed in these methods are only rough approximations to reality, it is desirable that the methods are not very sensitive to violation of the assumptions made. In this study we show by computer simulation that the likelihood-based method is sensitive to violation of the assumptions and produces many false-positive results under certain conditions, whereas the parsimony-based method tends to be conservative. These observations, together with those from previous studies, suggest that the positively selected sites inferred by the parsimony-based method are more reliable than those inferred by the likelihood-based method.

2.
Previous studies have shown that recombination between allelic sequences can cause likelihood-based methods for detecting positive selection to produce many false-positive results. In this article, we use simulations to study the impact of nonallelic gene conversion on the specificity of PAML to detect positive selection among gene duplicates. Our results show that, as expected, gene conversion leads to higher rates of false-positive results, although only moderately. These rates increase with the genetic distance between sequences, the length of converted tracts, and when no outgroup sequences are included in the analysis. We also find that branch-site models will incorrectly identify unconverted sequences as the targets of positive selection when their close paralogs are converted. Bayesian prediction of sites undergoing adaptive evolution implemented in PAML is affected by conversion, albeit in a less straightforward way. Our work suggests that particular attention should be devoted to the evolutionary analysis of recent duplicates that may have experienced gene conversion because they may provide false signals of positive selection. Fortunately, these results also imply that those cases most susceptible to false-positive results (i.e., high divergence between paralogs, long conversion tracts) are also the cases where detecting gene conversion is the easiest. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.

3.
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.

4.
Maximum-likelihood-based and parsimony-based methods were used to test for potential effects of positive selection on the sexually induced gene 1 (Sig1) in Thalassiosira weissflogii. The Sig proteins are thought to play a role in mediating sperm-egg recognition during the sexual reproduction phase. The results obtained from parsimony-based analyses showed that none of the amino acid sites were influenced by positive selection. Maximum-likelihood analyses indicated that positive selection was affecting a maximum of seven and a minimum of four amino acid sites in the polypeptide derived from Sig1. It was concluded that the results obtained from the maximum-likelihood-based method are more reliable than those obtained from the parsimony-based approach. This is apparently the first study that has shown that reproductive proteins in unicellular eukaryotes are influenced by positive selection.

5.
Sexually induced gene 1 (Sig1) in the centric diatom Thalassiosira weissflogii is considered to encode a gamete recognition protein. Sorhannus (2003) analyzed nucleotide sequences of Sig1 using parsimony analysis and the maximum-likelihood (ML)-based Bayesian method for inferring positive selection at single amino acid sites and reported that positively selected sites were detected by the latter method but not by the former. He then concluded that for this type of study, the ML-based method is more reliable than parsimony analysis. Here we show that his results apparently represent false-positive cases of the ML-based method and that there is no solid evidence that this gene contains positively selected sites. We further demonstrate that in the tax gene of human T-cell lymphotropic virus type I (HTLV-I), all codon sites, including invariable sites, can be inferred as positively selected sites by the ML-based method. These observations indicate that the ML-based method may produce many false-positive sites. One of the main reasons for the occurrence of false positives is that in the ML-based method, codon sites are grouped into several categories, with different nonsynonymous/synonymous rate ratios (omegas), on a purely statistical basis, and positive selection is inferred indirectly by examining whether the average omega for each category is greater than 1. In parsimony analysis, however, the evolutionary change of nucleotides at each codon site is examined. For this reason, parsimony-based methods rarely produce false positives and are safer than ML-based methods for detecting positive selection at individual codon sites, although a large number of sequences are necessary.

6.
We consider three approaches for estimating the rates of nonsynonymous and synonymous changes at each site in a sequence alignment in order to identify sites under positive or negative selection: (1) a suite of fast likelihood-based "counting methods" that employ either a single most likely ancestral reconstruction, weighting across all possible ancestral reconstructions, or sampling from ancestral reconstructions; (2) a random effects likelihood (REL) approach, which models variation in nonsynonymous and synonymous rates across sites according to a predefined distribution, with the selection pressure at an individual site inferred using an empirical Bayes approach; and (3) a fixed effects likelihood (FEL) method that directly estimates nonsynonymous and synonymous substitution rates at each site. All three methods incorporate flexible models of nucleotide substitution bias and variation in both nonsynonymous and synonymous substitution rates across sites, facilitating the comparison between the methods. We demonstrate that the results obtained using these approaches show broad agreement in levels of Type I and Type II error and in estimates of substitution rates. Counting methods are well suited for large alignments, for which there is high power to detect positive and negative selection, but appear to underestimate the substitution rate. A REL approach, which is more computationally intensive than counting methods, has higher power than counting methods to detect selection in data sets of intermediate size but may suffer from higher rates of false positives for small data sets. A FEL approach appears to capture the pattern of rate variation better than counting methods or random effects models, does not suffer from as many false positives as random effects models for data sets comprising few sequences, and can be efficiently parallelized. 
Our results suggest that previously reported differences between results obtained by counting methods and random effects models arise due to a combination of the conservative nature of counting-based methods, the failure of current random effects models to allow for variation in synonymous substitution rates, and the naive application of random effects models to extremely sparse data sets. We demonstrate our methods on sequence data from the human immunodeficiency virus type 1 env and pol genes and simulated alignments.
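The empirical Bayes step mentioned for the random effects likelihood (REL) approach can be illustrated with a small sketch. It assumes the class proportions and the per-class likelihoods of the data at one site have already been estimated elsewhere; the function name and its inputs are hypothetical and are not taken from the authors' software.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Posterior probability that one site belongs to each rate class.

    proportions      -- estimated prior weight of each class (sums to 1)
    site_likelihoods -- likelihood of the site's data under each class
    Both inputs are assumed to come from a previously fitted model.
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("one likelihood is needed per class")
    # Bayes' rule: posterior is proportional to prior weight times likelihood.
    weighted = [p * l for p, l in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    if total <= 0.0:
        raise ValueError("site has zero likelihood under every class")
    return [w / total for w in weighted]
```

A site would typically be flagged when the posterior mass on classes with omega greater than 1 exceeds a chosen cutoff; the cutoff itself is an analysis decision and is not fixed by this sketch.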

7.
Positive Darwinian selection promotes fixations of advantageous mutations during gene evolution and is probably responsible for most adaptations. Detecting positive selection at the DNA sequence level is of substantial interest because such information provides significant insights into possible functional alterations during gene evolution as well as important nucleotide substitutions involved in adaptation. Efficient detection of positive selection, however, has been difficult because selection often operates on only a few sites in a short period of evolutionary time. A likelihood-based method with branch-site models was recently introduced to overcome such difficulties. Here I examine the accuracy of the method using computer simulation. I find that the method detects positive selection in 20%-70% of cases when the DNA sequences are generated by computer simulation under no positive selection. Although the frequency of such false detection varies depending on, among other things, the tree topology, branch length, and selection scheme, the branch-site likelihood method generally gives misleading results. Thus, detection of positive selection by this method alone is unreliable. This unreliability may have resulted from its over-sensitivity to violations of assumptions made in the method, such as certain distributions of selective strength among sites and equal transition/transversion ratios for synonymous and nonsynonymous substitutions.

8.
Among-site rate variation, as quantified by the gamma-distribution shape parameter (α), and the ratio of transition rate to transversion rate (Ts/Tv) influence phylogenetic inference. We examine the effect of topology on estimates of these two parameters in 12S rRNA sequences from nine species of mice belonging to the genera Onychomys and Peromyscus by generating 100 random topologies and estimating these parameters using parsimony and maximum-likelihood methods for each of the random topologies. The parsimony-based estimate of Ts/Tv from the well-corroborated topology falls within the distribution of estimates based on random topologies, whereas the maximum-likelihood estimate of Ts/Tv based on the well-corroborated topology lies well outside the distribution of estimates derived from random topologies. The Ts/Tv ratio derived via maximum-likelihood estimation is three times the parsimony-based estimate, suggesting that parsimony-based estimates are severe underestimates even when the correct topology is used. Both parsimony- and likelihood-based estimates of the gamma-distribution shape parameter (α) are sensitive to topology because the best estimates based on the well-corroborated topology are well outside the distributions of estimates derived from random topologies for both methods. We show that the reason for topology dependence is the presence of long internal branches in the underlying topology.

9.
Models of codon evolution are useful for investigating the strength and direction of natural selection via a parameter for the nonsynonymous/synonymous rate ratio (omega = d(N)/d(S)). Different codon models are available to account for diversity of the evolutionary patterns among sites. Codon models that specify data partitions as fixed effects allow the most evolutionary diversity among sites but require that site partitions are a priori identifiable. Models that use a parametric distribution to express the variability in the omega ratio across site do not require a priori partitioning of sites, but they permit less among-site diversity in the evolutionary process. Simulation studies presented in this paper indicate that differences among sites in estimates of omega under an overly simplistic analytical model can reflect more than just natural selection pressure. We also find that the classic likelihood ratio tests for positive selection have a high false-positive rate in some situations. In this paper, we developed a new method for assigning codon sites into groups where each group has a different model, and the likelihood over all sites is maximized. The method, called likelihood-based clustering (LiBaC), can be viewed as a generalization of the family of model-based clustering approaches to models of codon evolution. We report the performance of several LiBaC-based methods, and selected alternative methods, over a wide variety of scenarios. We find that LiBaC, under an appropriate model, can provide reliable parameter estimates when the process of evolution is very heterogeneous among groups of sites. Certain types of proteins, such as transmembrane proteins, are expected to exhibit such heterogeneity. A survey of genes encoding transmembrane proteins suggests that overly simplistic models could be leading to false signal for positive selection among such genes. 
In these cases, LiBaC-based methods offer an important addition to a "toolbox" of methods thereby helping to uncover robust evidence for the action of positive selection.

10.
In this study, we investigate the possibility of selection acting on the proline-rich antigen (PRA) gene in natural populations of the two human pathogens, Coccidioides immitis and Coccidioides posadasii, and three of their close relatives, Chrysosporium lucknowense, Chrysosporium queenslandicum, and Uncinocarpus reesii. We addressed the following questions: Is diversifying selection acting on PRA in the pathogenic species as a result of avoidance of the host's immune system, and has adaptation to a pathogenic lifestyle led to positive directional selection and increased rate of evolution in PRA between the species? For these purposes, we amplified and sequenced from 40 individuals belonging to the five species, the entire coding region of the PRA gene, as well as partial sequences from the coding region of each of the three housekeeping genes glyceraldehyde-3-phosphate dehydrogenase, glutamine synthetase A, and hexokinase A. We used likelihood-based methods to compare models of different types of selective pressure among codons to analyze the mode of evolution of the genes and found that the PRA gene evolves under positive selection, but the investigated parts of the housekeeping genes evolve primarily under purifying selection. We found a very low level of intraspecific variability and no evidence of diversifying selection, suggesting that the increased rate of evolution in the PRA gene is not a result of avoidance of the host's immune system. Neither did likelihood-based analyses suggest that selection was stronger on the branch separating pathogenic and nonpathogenic species. Instead, we suggest that positive selection acts on PRA as a consequence of spore cell-wall morphogenesis unique to each species.

11.
Gu  X; Zhang  J 《Molecular biology and evolution》1997,14(11):1106-1113
When the rate variation among sites is described by a gamma distribution, an important problem is how to estimate the shape parameter alpha, which is an index of the degree of among-site rate variation. The parsimony-based methods for estimating alpha are simple but biased, i.e., alpha tends to be overestimated. On the other hand, the likelihood-based methods are asymptotically unbiased but take a huge amount of computational time. In this paper, we have developed a new method to solve this problem: we first estimate the expected number of substitutions at each site, which is corrected for multiple hits, and then estimate the parameter alpha. Our method is computationally as fast as the parsimony method, and the estimation accuracy is much higher than that of parsimony and similar to that of the likelihood method.
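To make the second step concrete, here is a minimal sketch of a generic method-of-moments estimate of alpha from per-site substitution counts. It relies on the standard result that Poisson counts with gamma-distributed rates have mean m and variance m + m²/alpha. It is not the authors' procedure: the multiple-hit correction described above is not reproduced, and the function name is hypothetical.

```python
from statistics import mean, pvariance


def gamma_shape_moments(counts):
    """Method-of-moments estimate of the gamma shape parameter alpha.

    counts -- estimated number of substitutions at each site.
    Uses var = m + m**2 / alpha, i.e. alpha = m**2 / (var - m).
    """
    m = mean(counts)
    v = pvariance(counts)
    # No overdispersion relative to Poisson: rates look uniform across sites.
    if v <= m:
        return float("inf")
    return m * m / (v - m)
```

Small alpha values indicate strong rate heterogeneity. Feeding in raw parsimony counts, which miss multiple hits, would be expected to reproduce the upward bias the abstract describes.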

12.
Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes. [Reviewing Editor: Dr. Rasmus Nielsen]
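The likelihood ratio test referred to above can be sketched in a few lines. The sketch assumes two nested models whose maximized log-likelihoods are already available and that the test statistic is compared with a chi-square distribution with 2 degrees of freedom, for which the tail probability has the closed form exp(-x/2). The choice of 2 degrees of freedom and the function name are assumptions for illustration; the abstract does not state which models were compared.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    lnl_null -- maximized log-likelihood of the simpler (null) model
    lnl_alt  -- maximized log-likelihood of the richer (alternative) model
    Returns (test statistic, p-value) under a chi-square with df = 2.
    """
    # Optimizer noise can make the statistic slightly negative; clamp at 0.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Chi-square survival function for df = 2 is exactly exp(-x / 2).
    return stat, math.exp(-stat / 2.0)
```

For other degrees of freedom a general chi-square survival function would be needed; the closed form here holds only for df = 2.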

13.
Wong WS  Yang Z  Goldman N  Nielsen R 《Genetics》2004,168(2):1041-1051
The parsimony method of Suzuki and Gojobori (1999) and the maximum likelihood method developed from the work of Nielsen and Yang (1998) are two widely used methods for detecting positive selection in homologous protein coding sequences. Both methods consider an excess of nonsynonymous (replacement) substitutions as evidence for positive selection. Previously published simulation studies comparing the performance of the two methods show contradictory results. Here we conduct a more thorough simulation study to cover and extend the parameter space used in previous studies. We also reanalyzed an HLA data set that was previously proposed to cause problems when analyzed using the maximum likelihood method. Our new simulations and a reanalysis of the HLA data demonstrate that the maximum likelihood method has good power and accuracy in detecting positive selection over a wide range of parameter values. Previous studies reporting poor performance of the method appear to be due to numerical problems in the optimization algorithms and did not reflect the true performance of the method. The parsimony method has a very low rate of false positives but very little power for detecting positive selection or identifying positively selected sites.

14.
Lee MS  Worthy TH 《Biology letters》2012,8(2):299-303
The widespread view that Archaeopteryx was a primitive (basal) bird has been recently challenged by a comprehensive phylogenetic analysis that placed Archaeopteryx with deinonychosaurian theropods. The new phylogeny suggested that typical bird flight (powered by the front limbs only) either evolved at least twice, or was lost/modified in some deinonychosaurs. However, this parsimony-based result was acknowledged to be weakly supported. Maximum-likelihood and related Bayesian methods applied to the same dataset yield a different and more orthodox result: Archaeopteryx is restored as a basal bird with bootstrap frequency of 73 per cent and posterior probability of 1. These results are consistent with a single origin of typical (forelimb-powered) bird flight. The Archaeopteryx-deinonychosaur clade retrieved by parsimony is supported by more characters (which are on average more homoplasious), whereas the Archaeopteryx-bird clade retrieved by likelihood-based methods is supported by fewer characters (but on average less homoplasious). Both positions for Archaeopteryx remain plausible, highlighting the hazy boundary between birds and advanced theropods. These results also suggest that likelihood-based methods (in addition to parsimony) can be useful in morphological phylogenetics.

15.
We propose two approximate methods (one based on parsimony and one on pairwise sequence comparison) for estimating the pattern of nucleotide substitution and a parsimony-based method for estimating the gamma parameter for variable substitution rates among sites. The matrix of substitution rates that represents the substitution pattern can be recovered through its relationship with the observable matrix of site pattern frequencies in pairwise sequence comparisons. In the parsimony approach, the ancestral sequences reconstructed by the parsimony algorithm were used, and the two sequences compared are those at the ends of a branch in the phylogenetic tree. The method for estimating the gamma parameter was based on a reinterpretation of the numbers of changes at sites inferred by parsimony. Three data sets were analyzed to examine the utility of the approximate methods compared with the more reliable likelihood methods. The new methods for estimating the substitution pattern were found to produce estimates quite similar to those obtained from the likelihood analyses. The new method for estimating the gamma parameter was effective in reducing the bias in conventional parsimony estimates, although it also overestimated the parameter. The approximate methods are computationally very fast and appear useful for analyzing large data sets, for which use of the likelihood method requires excessive computation.

16.
Island systems have long been useful models for understanding lineage diversification in a geographic context, especially pertaining to the importance of dispersal in the origin of new clades. Here we use a well-resolved phylogeny of the flowering plant genus Cyrtandra (Gesneriaceae) from the Pacific Islands to compare four methods of inferring ancestral geographic ranges in islands: two developed for character-state reconstruction that allow only single-island ranges and do not explicitly associate speciation with range evolution (Fitch parsimony [FP; parsimony-based] and stochastic mapping [SM; likelihood-based]) and two methods developed specifically for ancestral range reconstruction, in which widespread ranges (spanning islands) are integral to inferences about speciation scenarios (dispersal-vicariance analysis [DIVA; parsimony-based] and dispersal-extinction-cladogenesis [DEC; likelihood-based]). The methods yield conflicting results, which we interpret in light of their respective assumptions. FP exhibits the least power to unequivocally reconstruct ranges, likely due to a combination of having flat (uninformative) transition costs and not using branch length information. SM reconstructions generally agree with a prior hypothesis about dispersal-driven speciation across the Pacific, despite the conceptual mismatch between its character-based model and this mode of range evolution. In contrast with narrow extant ranges for species of Cyrtandra, DIVA reconstructs broad ancestral ranges at many nodes. DIVA results also conflict with geological information on island ages; we attribute these conflicts to the parsimony criterion not considering branch lengths or time, as well as vicariance being the sole means of divergence for widespread ancestors. 
DEC analyses incorporated geological information on island ages and allowed prior hypotheses about range size and dispersal rates to be evaluated in a likelihood framework and gave more nuanced inferences about range evolution and the geography of speciation than other methods tested. However, ancestral ranges at several nodes could not be conclusively resolved, due possibly to uncertainty in the phylogeny or the relative complexity of the underlying model. Of the methods tested, SM and DEC both converge on plausible hypotheses for area range histories in Cyrtandra, due in part to the consideration of branch lengths and/or timing of events. We suggest that DEC model-based methods for ancestral range inference could be improved by adopting a Bayesian SM approach, in which stochastic sampling of complete geographic histories could be integrated over alternative phylogenetic topologies. Likelihood-based estimates of ancestral ranges for Cyrtandra suggest a major dispersal route into the Pacific through the islands of Fiji and Samoa, motivating future biogeographic investigation of this poorly known region.

17.
Phylogenetic methods that use matrices of pairwise distances between sequences (e.g., neighbor joining) will only give accurate results when the initial estimates of the pairwise distances are accurate. For many different models of sequence evolution, analytical formulae are known that give estimates of the distance between two sequences as a function of the observed numbers of substitutions of various classes. These are often of a form that we call "log transform formulae". Errors in these distance estimates become larger as the time t since divergence of the two sequences increases. For long times, the log transform formulae can sometimes give divergent distance estimates when applied to finite sequences. We show that these errors become significant when $t \approx \tfrac{1}{2}\,|\lambda_{\max}|^{-1} \log N$, where $\lambda_{\max}$ is the eigenvalue of the substitution rate matrix with the largest absolute value and $N$ is the sequence length. Various likelihood-based methods have been proposed to estimate the values of parameters in rate matrices. If rate matrix parameters are known with reasonable accuracy, it is possible to use the maximum likelihood method to estimate evolutionary distances while keeping the rate parameters fixed. We show that errors in distances estimated in this way only become significant when $t \approx \tfrac{1}{2}\,|\lambda_{1}|^{-1} \log N$, where $\lambda_{1}$ is the eigenvalue of the substitution rate matrix with the smallest nonzero absolute value. The accuracy of likelihood-based distance estimates is therefore much higher than those based on log transform formulae, particularly in cases where there is a large range of timescales involved in the rate matrix (e.g., when the ratio of transition to transversion rates is large). We discuss several practical ways of estimating the rate matrix parameters before distance calculation and hence of increasing the accuracy of distance estimates.
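As an illustration of a log transform formula and of how it can diverge on finite sequences, the sketch below uses the Jukes-Cantor (JC69) distance, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. JC69 is chosen here only because it is the simplest such formula; the abstract does not say which models were examined, and the function name is hypothetical.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned, equal-length sequences.

    Returns math.inf when the observed difference proportion p makes the
    logarithm's argument non-positive (p >= 3/4), i.e. when the log
    transform formula has no finite value.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = diffs / len(seq_a)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf
    return -0.75 * math.log(arg)
```

For short alignments of highly diverged sequences, sampling noise alone can push p past 3/4, which is one way the divergent estimates described above arise.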

18.
GARD: a genetic algorithm for recombination detection   Total citations: 6, self-citations: 0, citations by others: 6
MOTIVATION: Phylogenetic and evolutionary inference can be severely misled if recombination is not accounted for, hence screening for it should be an essential component of nearly every comparative study. The evolution of recombinant sequences cannot be properly explained by a single phylogenetic tree, but several phylogenies may be used to correctly model the evolution of non-recombinant fragments. RESULTS: We developed a likelihood-based model selection procedure that uses a genetic algorithm to search multiple sequence alignments for evidence of recombination breakpoints and identify putative recombinant sequences. GARD is an extensible and intuitive method that can be run efficiently in parallel. Extensive simulation studies show that the method nearly always outperforms other available tools, both in terms of power and accuracy, and that the use of GARD to screen sequences for recombination ensures good statistical properties for methods aimed at detecting positive selection. AVAILABILITY: Freely available at http://www.datamonkey.org/GARD/

19.
Current sitewise methods for detecting positive selection on gene sequences (the de facto standard being the CODEML method (Yang et al., 2000)) assume no recombination. This paper presents simulation results indicating that violation of this assumption can lead to false positive detection of sites undergoing positive selection. Through the use of population-scaled mutation and recombination rates, simulations can be performed that permit the generation of appropriate null distributions corresponding to neutral expectations in the presence of recombination, thereby allowing for a more accurate estimation of positive selection.

20.
Borchers DL  Efford MG 《Biometrics》2008,64(2):377-385
Live-trapping capture-recapture studies of animal populations with fixed trap locations inevitably have a spatial component: animals close to traps are more likely to be caught than those far away. This is not addressed in conventional closed-population estimates of abundance and without the spatial component, rigorous estimates of density cannot be obtained. We propose new, flexible capture-recapture models that use the capture locations to estimate animal locations and spatially referenced capture probability. The models are likelihood-based and hence allow use of Akaike's information criterion or other likelihood-based methods of model selection. Density is an explicit parameter, and the evaluation of its dependence on spatial or temporal covariates is therefore straightforward. Additional (nonspatial) variation in capture probability may be modeled as in conventional capture-recapture. The method is tested by simulation, using a model in which capture probability depends only on location relative to traps. Point estimators are found to be unbiased and standard error estimators almost unbiased. The method is used to estimate the density of Red-eyed Vireos (Vireo olivaceus) from mist-netting data from the Patuxent Research Refuge, Maryland, U.S.A. Estimates agree well with those from an existing spatially explicit method based on inverse prediction. A variety of additional spatially explicit models are fitted; these include models with temporal stratification, behavioral response, and heterogeneous animal home ranges.
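Model selection by Akaike's information criterion, as mentioned above, can be sketched as follows. The sketch assumes each candidate model has already been fitted and supplies its maximized log-likelihood and number of free parameters; the model names and the helper functions are hypothetical and are not taken from the authors' software.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: AIC = 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_by_aic(models):
    """Return the name of the model with the lowest AIC.

    models -- dict mapping a model name to (log_likelihood, n_params),
              both assumed to come from a previous model fit.
    """
    return min(models, key=lambda name: aic(*models[name]))
```

A richer model is preferred only when its gain in log-likelihood exceeds the number of extra parameters, so a model with a density covariate would need to improve the fit by more than one log-likelihood unit per added parameter.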
