Similar literature
1.
MOTIVATION: Accurate detection of positive Darwinian selection can provide important insights to researchers investigating the evolution of pathogens. However, many pathogens (particularly viruses) undergo frequent recombination and the phylogenetic methods commonly applied to detect positive selection have been shown to give misleading results when applied to recombining sequences. We propose a method that makes maximum likelihood inference of positive selection robust to the presence of recombination. This is achieved by allowing tree topologies and branch lengths to change across detected recombination breakpoints. Further improvements are obtained by allowing synonymous substitution rates to vary across sites. RESULTS: Using simulation we show that, even for extreme cases where recombination causes standard methods to reach false positive rates >90%, the proposed method decreases the false positive rate to acceptable levels while retaining high power. We applied the method to two HIV-1 datasets for which we have previously found that inference of positive selection is invalid owing to high rates of recombination. In one of these (env gene) we still detected positive selection using the proposed method, while in the other (gag gene) we found no significant evidence of positive selection. AVAILABILITY: A HyPhy batch language implementation of the proposed methods and the HIV-1 datasets analysed are available at http://www.cbio.uct.ac.za/pub_support/bioinf06. The HyPhy package is available at http://www.hyphy.org, and it is planned that the proposed methods will be included in the next distribution. RDP2 is available at http://darwin.uvigo.es/rdp/rdp.html

2.
The prion diseases, such as Creutzfeldt-Jakob disease of humans and bovine spongiform encephalopathy, involve the aberrant metabolism and accumulation of prion protein PrP. There are three contradictory hypotheses about the evolution of the prion protein gene PRNP. Population genetic studies have proposed that PRNP could be under balancing selection, strong purifying selection, or mainly positive selection. We used maximum likelihood tests for detecting positive selection at the amino acid level, together with the currently available PRNP coding sequences, to address these disagreements. Positive selection could occur at amino acids residing in active sites, and at amino acids involved in protein-protein interactions. Thus we tested the hypothesis that positive selection at the amino acid level in PrP might have taken place in human and related species from the superordinal group Euarchonta, as well as in bovine and related species from the superordinal clade Laurasiatheria. Our study and the present experimental evidence indicate that positive selection at the amino acid level might have taken place in the PrP signal sequences and conformationally plastic PrP regions, as well as at the protein X binding sites. Prof. Vera Gamulin passed away.

3.
Previous studies have shown that recombination between allelic sequences can cause likelihood-based methods for detecting positive selection to produce many false-positive results. In this article, we use simulations to study the impact of nonallelic gene conversion on the specificity of PAML to detect positive selection among gene duplicates. Our results show that, as expected, gene conversion leads to higher rates of false-positive results, although only moderately. These rates increase with the genetic distance between sequences, the length of converted tracts, and when no outgroup sequences are included in the analysis. We also find that branch-site models will incorrectly identify unconverted sequences as the targets of positive selection when their close paralogs are converted. Bayesian prediction of sites undergoing adaptive evolution implemented in PAML is affected by conversion, albeit in a less straightforward way. Our work suggests that particular attention should be devoted to the evolutionary analysis of recent duplicates that may have experienced gene conversion because they may provide false signals of positive selection. Fortunately, these results also imply that those cases most susceptible to false-positive results (i.e., high divergence between paralogs, long conversion tracts) are also the cases where detecting gene conversion is the easiest.

4.
The nonsynonymous (amino acid-altering) to synonymous (silent) substitution rate ratio (omega = d(N)/d(S)) provides a measure of natural selection at the protein level, with omega = 1, <1, and >1 indicating neutral evolution, purifying selection, and positive selection, respectively. Previous studies that used this measure to detect positive selection have often taken an approach of pairwise comparison, estimating substitution rates by averaging over all sites in the protein. As most amino acids in a functional protein are under structural and functional constraints and adaptive evolution probably affects only a few sites at a few time points, this approach of averaging rates over sites and over time has little power. Previously, we developed codon-based substitution models that allow the omega ratio to vary either among lineages or among sites. In this paper we extend previous models to allow the omega ratio to vary both among sites and among lineages and implement the new models in the likelihood framework. These models may be useful for identifying positive selection along prespecified lineages that affects only a few sites in the protein. We apply those branch-site models as well as previous branch- and site-specific models to three data sets: the lysozyme genes from primates, the tumor suppressor BRCA1 genes from primates, and the phytochrome (PHY) gene family in angiosperms. Positive selection is detected in the lysozyme and BRCA genes by both the new and the old models. However, only the new models detected positive selection acting on lineages after gene duplication in the PHY gene family. Additional tests on several data sets suggest that the new models may be useful in detecting positive selection after gene duplication in gene family evolution.

5.
Bayes prediction quantifies uncertainty by assigning posterior probabilities. It was used to identify amino acids in a protein under recurrent diversifying selection indicated by higher nonsynonymous (d(N)) than synonymous (d(S)) substitution rates or by omega = d(N)/d(S) > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different omega ratios. The Bayes theorem was used to calculate the posterior probabilities of each site falling into these site classes. Here, we evaluate the performance of Bayes prediction of amino acids under positive selection by computer simulation. We measured the accuracy by the proportion of predicted sites that were truly under selection and the power by the proportion of true positively selected sites that were predicted by the method. The accuracy was slightly better for longer sequences, whereas the power was largely unaffected by the increase in sequence length. Both accuracy and power were higher for medium or highly diverged sequences than for similar sequences. We found that accuracy and power were unacceptably low when data contained only a few highly similar sequences. However, sampling a large number of lineages improved the performance substantially. Even for very similar sequences, accuracy and power can be high if over 100 taxa are used in the analysis. We make the following recommendations: (1) prediction of positive selection sites is not feasible for a few closely related sequences; (2) using a large number of lineages is the best way to improve the accuracy and power of the prediction; and (3) multiple models of heterogeneous selective pressures among sites should be applied in real data analysis.
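
The Bayes step and the two performance measures described in this abstract can be illustrated with a minimal Python sketch. It assumes the class proportions and the per-class likelihoods of a site's data are already available from a fitted codon model; the function names and all numbers are hypothetical and are not taken from the paper or from any software package.

```python
from typing import Iterable, List, Sequence, Tuple


def site_class_posteriors(
    proportions: Sequence[float], site_likelihoods: Sequence[float]
) -> List[float]:
    """Posterior probability of each site class for one site (Bayes' theorem).

    proportions[k]      -- prior proportion of sites in class k (assumed given)
    site_likelihoods[k] -- likelihood of this site's data under class k (assumed given)
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("one likelihood is needed per site class")
    joint = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("the site has zero likelihood under every class")
    return [j / total for j in joint]


def accuracy_and_power(
    predicted: Iterable[int], truly_selected: Iterable[int]
) -> Tuple[float, float]:
    """Accuracy and power as defined in the abstract.

    accuracy = proportion of predicted sites that are truly under selection
    power    = proportion of truly selected sites that were predicted
    """
    predicted_set, truth_set = set(predicted), set(truly_selected)
    hits = len(predicted_set & truth_set)
    accuracy = hits / len(predicted_set) if predicted_set else float("nan")
    power = hits / len(truth_set) if truth_set else float("nan")
    return accuracy, power


if __name__ == "__main__":
    # Hypothetical two-class example: equal priors, second class fits the site better.
    print(site_class_posteriors([0.5, 0.5], [0.1, 0.3]))
    # Hypothetical simulation outcome: sites 1-4 predicted, sites 2, 3, 5 truly selected.
    print(accuracy_and_power([1, 2, 3, 4], [2, 3, 5]))
```

In a simulation study of this kind, a site would typically be called "predicted" when the posterior probability of the class with omega > 1 exceeds some cutoff; the cutoff is a choice left to the analyst in this sketch.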

6.
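Peng Yang, Su Yingjuan, Wang Ting. 《植物学报》 (Chinese Bulletin of Botany), 2020, 55(3): 287-298
The rpoC1 gene encodes the β′ subunit of RNA polymerase, which binds the DNA template during transcription; together with the β subunit it forms the β-β′ complex that constitutes the catalytic center of RNA synthesis. Taking rpoC1 as the study object, and requiring a Bayes factor greater than 20, the site models in HyPhy detected 3 positively selected sites and 541 negatively selected sites; the site models in PAML detected 10 positively selected sites, 3 of which had posterior probabilities above 99%. In addition, a phylogenetic tree of 64 fern species was constructed by maximum likelihood, and HyPhy was used to analyze the transition rate, transversion rate, transition/transversion ratio, synonymous substitution rate, nonsynonymous substitution rate, and synonymous/nonsynonymous substitution rate ratio of rpoC1, in order to explore the relationship between intron loss in rpoC1 and the rate of molecular evolution. The results indicate that intron loss in rpoC1 has some effect on the transition rate, transversion rate, and nonsynonymous substitution rate.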

7.
Massingham T, Goldman N. Genetics, 2005, 169(3): 1753-1762
An excess of nonsynonymous over synonymous substitution at individual amino acid sites is an important indicator that positive selection has affected the evolution of a protein between the extant sequences under study and their most recent common ancestor. Several methods exist to detect the presence, and sometimes location, of positively selected sites in alignments of protein-coding sequences. This article describes the "sitewise likelihood-ratio" (SLR) method for detecting nonneutral evolution, a statistical test that can identify sites that are unusually conserved as well as those that are unusually variable. We show that the SLR method can be more powerful than currently published methods for detecting the location of positive selection, especially in difficult cases where the strength of selection is low. The increase in power is achieved while relaxing assumptions about how the strength of selection varies over sites and without elevated rates of false-positive results that have been reported with some other methods. We also show that the SLR method performs well even under circumstances where the results from some previous methods can be misleading.

8.
The selective pressure at the protein level is usually measured by the nonsynonymous/synonymous rate ratio (omega = dN/dS), with omega < 1, omega = 1, and omega > 1 indicating purifying (or negative) selection, neutral evolution, and diversifying (or positive) selection, respectively. The omega ratio is commonly calculated as an average over sites. As every functional protein has some amino acid sites under selective constraints, averaging rates across sites leads to low power to detect positive selection. Recently developed models of codon substitution allow the omega ratio to vary among sites and appear to be powerful in detecting positive selection in empirical data analysis. In this study, we used computer simulation to investigate the accuracy and power of the likelihood ratio test (LRT) in detecting positive selection at amino acid sites. The test compares two nested models: one that allows for sites under positive selection (with omega > 1), and another that does not, with the χ² distribution used for significance testing. We found that use of the χ² distribution makes the test conservative, especially when the data contain very short and highly similar sequences. Nevertheless, the LRT is powerful. Although the power can be low with only 5 or 6 sequences in the data, it was nearly 100% in data sets of 17 sequences. Sequence length, sequence divergence, and the strength of positive selection also were found to affect the power of the LRT. The exact distribution assumed for the omega ratio over sites was found not to affect the effectiveness of the LRT.
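
The likelihood ratio test described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the log-likelihood values are hypothetical, the degrees of freedom depend on which pair of nested models is compared (the abstract does not state them), and the chi-square tail probability is written in closed form for 1 or 2 degrees of freedom so that only the standard library is needed.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution for df = 1 or 2."""
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    if df == 1:
        # P(Z^2 > x) for a standard normal Z
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # chi-square with 2 df is exponential with mean 2
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> Tuple[float, float]:
    """Return (2 * delta lnL, p-value) for two nested models.

    lnl_null -- maximized log-likelihood of the model without omega > 1 sites
    lnl_alt  -- maximized log-likelihood of the model allowing omega > 1 sites
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, assumed for illustration only.
    stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
    print(f"2*delta lnL = {stat:.2f}, p = {p:.4f}")
```

Because the abstract reports that the chi-square approximation makes the test conservative, a p-value computed this way would tend to err on the side of not rejecting the null model.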

9.
Spidermonkey is a new component of the Datamonkey suite of phylogenetic tools that provides methods for detecting coevolving sites from a multiple alignment of homologous nucleotide or amino acid sequences. It reconstructs the substitution history of the alignment by maximum likelihood-based phylogenetic methods, and then analyzes the joint distribution of substitution events using Bayesian graphical models to identify significant associations among sites. AVAILABILITY: Spidermonkey is publicly available both as a web application at http://www.data-monkey.org and as a stand-alone component of the phylogenetic software package HyPhy, which is freely distributed on the web (http://www.hyphy.org) as precompiled binaries and open source.

10.
We present the statistical analysis of diversifying selective pressures on the hepatitis D antigen gene (HDAg). Thirty-three distinct HDAg sequences from subtypes I, II, and III were tested for positive selection using maximum likelihood methods based on models of codon substitution that allow variable selective pressures across sites. Such methods have been shown to be sufficiently accurate and successful in detecting positive selection in a variety of viral and nonviral protein-coding genes. About 11% of codon sites in HDAg were estimated to be under diversifying selection. Remarkably, most of the residues predicted to evolve under positive selection were located in the immunogenic domain and the N-terminus region with reported antigenic activity. These sites are potential targets of the host's immune response. Identification of residues mutating to escape immune recognition may help to distinguish the most virulent strains and aid vaccine design. Possible interplay between positive selection and recombination on the gene is discussed but no significant evidence for recombination was found. This article contains online supplementary material. Reviewing Editor: Dr. Nicolas Galtier

11.
Comparison of orthologous gene sequences is emerging as a powerful approach to elucidating functionally important positions in human disease genes. Using a diverse array of 132 mammalian BRCA1 (exon 11) sequences, we evaluated the functional significance of specific sites in the context of selection information (purifying, neutral, or diversifying) as well as the ability to extract such information from alignments that index varying degrees of mammalian diversity. Small data sets of either closely related taxa (Primates) or divergent placental taxa were unable to distinguish sites conserved due to purifying selection from sites conserved due to chance (false-positive rate = 65%-99%). Increasing the number of placental taxa to 57 greatly reduced the potential false-positive rate (0%-1.5%). Using the larger data set, we ranked the oncogenic risk of human missense mutations using a novel method that incorporates site-specific selection level and severity of the amino acid change evaluated against the amino acids present in other mammalian taxa. In addition to sites undergoing positive selection in Marsupialia, Laurasiatheria, Euarchontoglires, and Primates, we identified sites most likely to be undergoing divergent selection pressure in different lineages and six pairs of potentially interacting sites. Our results demonstrate the necessity of including large numbers of sequences to elucidate functionally important sites of a protein when using a comparative evolutionary approach.

12.
As a consequence of immune pressure, some amino acids of influenza virus hemagglutinin are under positive selection. Several authors have reported the existence of influenza A hemagglutinin codons under positive selective pressure (PSP). In this framework, the objectives of the present work were to demonstrate the presence of PSP and evaluate its effects on Victoria- and Yamagata-like influenza B viruses. The methodology consisted of estimating the acceptance rate of nonsynonymous substitutions (ω = dN/dS), which describes the strength of selective pressure, and identifying codons that may be positively selected by applying a set of continuous-time Markov chain codon-substitution models. Two groups of HA1 sequences (140 from the Yamagata lineage and 60 from the Victoria lineage) were used. All maximum-likelihood estimates of the models were obtained using the codeml program (PAML 3.15). The hypothesis that no sites are under PSP was rejected for both lineages (p < 0.001) using likelihood ratio tests. These results demonstrate the presence of positive selection acting on hemagglutinin of both Yamagata- and Victoria-like influenza B viruses. Several different sites were identified as being under PSP in Yamagata and Victoria hemagglutinins. Sites found with a posterior probability > 0.95 were codons 197 and 199 in both lineages, codon 75 in the Yamagata lineage, and codon 129 in the Victoria lineage. The detected amino acids are located at or near antigenic sites in influenza A virus H3 hemagglutinin.

13.
14.
Detecting positive Darwinian selection at the DNA sequence level has been a subject of considerable interest. However, positive selection is difficult to detect because it often operates episodically on a few amino acid sites, and the signal may be masked by negative selection. Several methods have been developed to test positive selection that acts on given branches (branch methods) or on a subset of sites (site methods). Recently, Yang, Z., and R. Nielsen (2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917) developed likelihood ratio tests (LRTs) based on branch-site models to detect positive selection that affects a small number of sites along prespecified lineages. However, computer simulations suggested that the tests were sensitive to the model assumptions and were unable to distinguish between relaxation of selective constraint and positive selection (Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332-1339). Here, we describe a modified branch-site model and use it to construct two LRTs, called branch-site tests 1 and 2. We applied the new tests to reanalyze several real data sets and used computer simulation to examine the performance of the two tests by examining their false-positive rate, power, and robustness. We found that test 1 was unable to distinguish relaxed constraint from positive selection affecting the lineages of interest, while test 2 had acceptable false-positive rates and appeared robust against violations of model assumptions. As test 2 is a direct test of positive selection on the lineages of interest, it is referred to as the branch-site test of positive selection and is recommended for use in real data analysis. The test appeared conservative overall, but exhibited better power in detecting positive selection than the branch-based test. Bayes empirical Bayes identification of amino acid sites under positive selection along the foreground branches was found to be reliable, but lacked power.

15.
Using the ratio of nonsynonymous to synonymous nucleotide substitution rates (Ka/Ks) is a common approach for detecting positive selection. However, calculation of this ratio over a whole gene combines amino acid sites that may be under positive selection with those that are highly conserved. We introduce a new covarion‐based method to sample only the sites potentially under selective pressure. Using ancestral sequence reconstruction over a phylogenetic tree coupled with calculation of Ka/Ks ratios, positive selection is better detected by this simple covarion‐based approach than it is using a whole gene analysis or a windowing analysis. This is demonstrated on a synthetic dataset and is tested on primate leptin, which indicates a previously undetected round of positive selection in the branch leading to Gorilla gorilla.

16.
Statistical methods for detecting molecular adaptation (total citations: 2; self-citations: 0; citations by others: 2)
The past few years have seen the development of powerful statistical methods for detecting adaptive molecular evolution. These methods compare synonymous and nonsynonymous substitution rates in protein-coding genes, and regard a nonsynonymous rate elevated above the synonymous rate as evidence for Darwinian selection. Numerous cases of molecular adaptation are being identified in various systems from viruses to humans. Although previous analyses averaging rates over sites and time have little power, recent methods designed to detect positive selection at individual sites and lineages have been successful. Here, we summarize recent statistical methods for detecting molecular adaptation, and discuss their limitations and possible improvements.

17.
For most proteins, multiple sequence alignments are a viable method to identify functionally and structurally important amino acids, but for most organisms, there is a subset of proteins that are unique or found in only a few closely related organisms. For these proteins, it is not possible to produce sequence alignments that are useful in identifying functionally or structurally important amino acids. We have investigated the relationship between amino acid conservation and five factors (the amino acid’s identity, N-terminal neighbor, C-terminal neighbor, the local hydropathy of surrounding amino acids, and the local expected net charge of the surrounding amino acids based on the primary sequence) in Escherichia coli proteins. For four of the factors examined (all but the amino acid’s identity), there is a significant relationship with conservation for some of the standard 20 amino acids. Using the combination of all five factors, we show that it is possible to calculate a score based on the primary sequences of a subset of E. coli proteins that has statistically significant value for predicting conserved amino acids in other E. coli proteins and Saccharomyces cerevisiae proteins. As these five variables show significant relationships with conservation, we have termed them conservation factors.

18.
Rigers Bakiu. Biologia, 2014, 69(3): 270-280
Calreticulin (CRT) is a low molecular weight protein present in vertebrates, invertebrates and higher plants. Its multiple functions have been demonstrated. It plays an important role as a chaperone and Ca2+ buffer inside the sarcoplasmic/endoplasmic reticulum (SR/ER), and outside the ER in many physiological and pathological processes. Recently it has been observed that CRT over-expression or absence is linked to various pathological conditions, such as malignant evolution and progression, which has increased interest in its study. Here, CRT was further characterized using an evolutionary approach. Several Bayesian phylogenetic analyses were performed using coding and amino acid sequences. CRT molecular evolution was investigated for the presence of negative and/or positive selection using the HyPhy package. The results indicated that purifying selection might have operated over the whole CRT primary structure, although episodic diversifying selection was also found in the analyzed CRT sequences.

19.
Errors in the inferred multiple sequence alignment may lead to false prediction of positive selection. Recently, methods for detecting unreliable alignment regions were developed and were shown to accurately identify incorrectly aligned regions. While removing unreliable alignment regions is expected to increase the accuracy of positive selection inference, such filtering may also significantly decrease the power of the test, as positively selected regions are fast evolving, and those same regions are often those that are difficult to align. Here, we used realistic simulations that mimic sequence evolution of HIV-1 genes to test the hypothesis that the performance of positive selection inference using codon models can be improved by removing unreliable alignment regions. Our study shows that the benefit of removing unreliable regions exceeds the loss of power due to the removal of some of the true positively selected sites.

20.
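Taking the rps12 genes of 68 fern species and 2 lycophyte species as the study objects, we used HyPhy and PAML, in a phylogenetic context and with maximum likelihood methods, to study the evolutionary rate and adaptive evolution of this gene. The results show that exons 2-3, located in the IR region, have a markedly reduced substitution rate, which in turn lowers the substitution rate of the rps12 coding sequence, and that the GC content at the third codon position of rps12 is markedly elevated; during fern evolution, 3′-rps12 has tended to be located in the IR region, maintaining a low substitution rate; among the 123 amino acid sites encoded by rps12, 4 positively selected sites and 116 negatively selected sites were detected. These results indicate that gene sequences show reduced substitution rates after entering the IR region; the strong negative selection pressure indicates that the RPS12 protein is highly conserved and that the function and structure of the rps12 gene have become stable.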
