Similar literature
20 similar articles found
1.
A method for detecting positive selection at single amino acid sites   Total citations: 23, self-citations: 0, citations by others: 23
A method was developed for detecting the selective force at single amino acid sites given a multiple alignment of protein-coding sequences. The phylogenetic tree was reconstructed using the number of synonymous substitutions. Then, the neutrality was tested for each codon site using the numbers of synonymous and nonsynonymous changes throughout the phylogenetic tree. Computer simulation showed that this method accurately estimated the numbers of synonymous and nonsynonymous substitutions per site, as long as the substitution number on each branch was relatively small. The false-positive rate for detecting the selective force was generally low. On the other hand, the true-positive rate for detecting the selective force depended on the parameter values. Within the range of parameter values used in the simulation, the true-positive rate increased as the strength of the selective force and the total branch length (namely the total number of synonymous substitutions per site) in the phylogenetic tree increased. In particular, with the relative rate of nonsynonymous substitutions to synonymous substitutions being 5.0, most of the positively selected codon sites were correctly detected when the total branch length in the phylogenetic tree was ≥ 2.5. When this method was applied to the human leukocyte antigen (HLA) gene, which included antigen recognition sites (ARSs), positive selection was detected mainly on ARSs. This finding confirmed the effectiveness of the present method with actual data. Moreover, two amino acid sites were newly identified as positively selected in non-ARSs. The three-dimensional structure of the HLA molecule indicated that these sites might be involved in antigen recognition. Positively selected amino acid sites were also identified in the envelope protein of human immunodeficiency virus and the influenza virus hemagglutinin protein. This method may be helpful for predicting functions of amino acid sites in proteins, especially in the present situation, in which sequence data are accumulating at an enormous speed.
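
The per-site neutrality test described in this abstract can be illustrated with a small sketch. The abstract does not state which statistic is used, so the one-sided binomial test below, the function names `binomial_tail` and `classify_site`, and the 0.05 threshold are all assumptions made for illustration, not the authors' implementation. The inputs are the synonymous and nonsynonymous change counts at one codon site summed over the tree, plus the expected numbers of synonymous and nonsynonymous sites for that codon.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def classify_site(syn_changes, nonsyn_changes, syn_sites, nonsyn_sites, alpha=0.05):
    """Classify one codon site as 'positive', 'negative', or 'neutral'.

    Assumption: under neutrality each change is nonsynonymous with
    probability nonsyn_sites / (syn_sites + nonsyn_sites).
    """
    total = syn_changes + nonsyn_changes
    if total == 0:
        return "neutral"  # no changes observed, nothing to test
    p_nonsyn = nonsyn_sites / (syn_sites + nonsyn_sites)
    # Excess of nonsynonymous changes suggests positive selection.
    if binomial_tail(nonsyn_changes, total, p_nonsyn) < alpha:
        return "positive"
    # Excess of synonymous changes suggests negative selection.
    if binomial_tail(syn_changes, total, 1 - p_nonsyn) < alpha:
        return "negative"
    return "neutral"
```

With few changes at a site the test has little power, which is consistent with the abstract's remark that the true-positive rate grows with the total branch length of the tree.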

2.
To elucidate the evolutionary mechanisms of the human immunodeficiency virus type 1 gp120 envelope glycoprotein at the single-site level, the degree of amino acid variation and the numbers of synonymous and nonsynonymous substitutions were examined in 186 nucleotide sequences for gp120 (subtype B). Analyses of amino acid variabilities showed that the level of variability was very different from site to site in both conserved (C1 to C5) and variable (V1 to V5) regions previously assigned. To examine the relative importance of positive and negative selection for each amino acid position, the numbers of synonymous and nonsynonymous substitutions that occurred at each codon position were estimated by taking phylogenetic relationships into account. Among the 414 codon positions examined, we identified 33 positions where nonsynonymous substitutions were significantly predominant. These positions where positive selection may be operating, which we call putative positive selection (PS) sites, were found not only in the variable loops but also in the conserved regions (C1 to C4). In particular, we found seven PS sites at the surface positions of the alpha-helix (positions 335 to 347 in the C3 region) in the opposite face for CD4 binding. Furthermore, two PS sites in the C2 region and four PS sites in the C4 region were detected in the same face of the protein. The PS sites found in the C2, C3, and C4 regions were separated in the amino acid sequence but close together in the three-dimensional structure. This observation suggests the existence of discontinuous epitopes in the protein's surface including this alpha-helix, although the antigenicity of this area has not been reported yet.

3.
Hepatitis C virus (HCV) populations persist in vivo as a mixture of heterogeneous viruses called quasispecies. The relationship between the genetic heterogeneity of these variants and their responses to antiviral treatment remains to be elucidated. We have studied 26 virus strains to determine the influence of hypervariable region 1 (HVR-1) of the HCV genome on the effectiveness of alpha interferon (IFN-alpha) therapy. Following PCR amplification, we cloned and sequenced HVR-1. Pretreatment serum samples from 13 individuals with chronic hepatitis C whose virus was subsequently eradicated by treatment were compared with samples from 13 nonresponders matched according to the major factors known to influence the response, i.e., sex, genotype, and pretreatment serum HCV RNA concentration. The degree of virus variation was assessed by analyzing 20 clones per sample and by calculating nucleotide sequence entropy (complexity) and genetic distances (diversity). Types of mutational changes were also determined by calculating nonsynonymous substitutions per nonsynonymous site (K(a)) and synonymous substitutions per synonymous site (K(s)). The paired-comparison analysis of the nucleotide sequence entropy and genetic distance showed no statistical differences between responders and nonresponders. By contrast, nonsynonymous substitutions were more frequent than synonymous substitutions (P
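
The "nucleotide sequence entropy (complexity)" computed from the 20 clones per sample can be sketched as follows. The abstract does not give the formula, so this assumes the commonly used normalized Shannon entropy over distinct clone sequences, divided by ln N so that the value lies between 0 and 1; the function name `normalized_entropy` is hypothetical.

```python
from collections import Counter
from math import log


def normalized_entropy(clones):
    """Normalized Shannon entropy of a list of clone sequences.

    Assumption: entropy is taken over the frequencies of distinct
    sequences and divided by ln(N), where N is the number of clones.
    Returns 0.0 when all clones are identical and 1.0 when all differ.
    """
    n = len(clones)
    if n < 2:
        return 0.0  # entropy is undefined or trivial for fewer than two clones
    counts = Counter(clones)
    # Sum of p * ln(1/p) over distinct sequences.
    h = sum((c / n) * log(n / c) for c in counts.values())
    return h / log(n)
```

For example, a sample in which all 20 clones share one sequence scores 0.0, and a sample in which every clone is distinct scores 1.0.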

4.
Ramaiah Arunachalam 《Genetica》2013,141(4-6):143-155
In the twenty-first century, the first pandemic outbreak of the novel human influenza A/H1N1 virus (NIV) was reported in Mexico and the USA in March and early April 2009, respectively. The outbreak spread among human populations because they had little or no immune response against the newly emerged virus. The success of vaccines and drugs depends on their low susceptibility to the formation of escape mutants in the virus. An excess of non-synonymous over synonymous substitutions is a main indicator of positive Darwinian selection in protein-coding genes of NIVs. Positive Darwinian selection operating on each site of the proteins was inferred by computing ω, the ratio of non-synonymous to synonymous substitutions (dN/dS or Ka/Ks), which was calculated by three different methods (codon-based maximum likelihood, branch-site, and empirical Bayesian methods) under various models. In total, nine sites from PB2, PB1, HA, M2 and NS1 were inferred to be positively selected. The functions of the amino acid sites under positive selection in NIV proteins were inferred by comparing them with amino acid sites of experimentally determined function. In all, 4 positively selected sites of PB1, HA and M2 were found to be involved in B-cell epitopes (BCEs). Interestingly, most of these sites are also involved in T-cell epitopes (TCEs). However, more sites under positive selection are involved in TCEs than in BCEs. Amino acid sites engaged in both BCEs and TCEs should be regarded as highly suitable targets, because these sites could induce strong humoral and cellular immune responses against the targets.

5.
Influenza viruses are the etiological agents of influenza. Although vaccines and drugs are available for the prophylaxis and treatment of influenza virus infections, the generation of escape mutants has been reported. To develop vaccines and drugs that are less susceptible to the generation of escape mutants, it is important to understand the evolutionary mechanisms of the viruses. Here natural selection operating on all the proteins encoded by the H3N2 human influenza A virus genome was inferred by comparing the numbers of synonymous (d(S) [D(S)]) and nonsynonymous (d(N) [D(N)]) substitutions per site. Natural selection was also inferred for the groups of functional amino acid sites involved in B-cell epitopes (BCEs), T-cell epitopes (TCEs), drug resistance, and growth in eggs. The entire region of PB1-F2 was positively selected, and positive selection also appeared to operate on BCEs, TCEs, and growth in eggs. The frequency of escape mutant generation appeared to be positively correlated with the d(N)/d(S) (D(N)/D(S)) values for the targets of vaccines and drugs, suggesting that the amino acid sites under strong functional constraint are suitable targets. In particular, TCEs may represent candidate targets because the d(N)/d(S) (D(N)/D(S)) values were small and negative selection was inferred for many of them.

6.
In order to understand the impact of overlapping reading frames on natural selection by host CD8+ T lymphocytes (CD8(+)-TL), we analyzed the pattern of nucleotide substitution in simian immunodeficiency virus (SIV) genomes sampled from populations at time of death in 35 rhesus monkeys. Both the mean number of nonsynonymous nucleotide substitutions per nonsynonymous site (d(N)) and the mean number of synonymous nucleotide substitutions per synonymous site (d(S)) were elevated in overlap regions in comparison to non-overlap regions. Mean d(N) exceeded mean d(S) in CD8(+)-TL epitopes restricted by the host's class I major histocompatibility complex molecules. This pattern, which is indicative of positive Darwinian selection favoring amino acid changes in these epitopes, was seen in both overlap and non-overlap regions; but mean d(N) was particularly elevated in restricted CD8(+)-TL epitopes encoded in overlap regions. Amino acid changes from the inoculum were defined as parallel if the same amino acid change occurred at the same site independently in two or more monkeys, and a surprisingly high proportion (71.9%) of observed amino acid changes throughout the SIV genome occurred in parallel in different monkeys. The proportion of parallel changes in restricted epitopes encoded by overlapping reading frames was still higher (80%), supporting the hypothesis that the interaction of positive selection and overlapping reading frames enhances the probability of convergent or parallel amino acid change.

7.
8.
We surveyed the molecular evolutionary characteristics of 11 nuclear genes from 10 conifer trees belonging to the Taxodioideae, the Cupressoideae, and the Sequoioideae. Comparisons of substitution rates among the lineages indicated that the synonymous substitution rates of the Cupressoideae lineage were higher than those of the Taxodioideae. This result parallels the pattern previously found in plastid genes. Likelihood-ratio tests showed that the nonsynonymous-synonymous rate ratio did not change significantly among lineages. In addition, after adjustments for lineage effects, the dispersion indices of synonymous and nonsynonymous substitutions were considerably reduced, and the latter was close to 1. These results indicated that the acceleration of evolutionary rates in the Cupressoideae lineage occurred in both the nuclear and plastid genomes, and that generally, this lineage effect affected synonymous and nonsynonymous substitutions similarly. We also investigated the relationship of synonymous substitution rates with the nonsynonymous substitution rate, base composition, and codon bias in each lineage. Synonymous substitution rates were positively correlated with nonsynonymous substitution rates and GC content at third codon positions, but synonymous substitution rates were not correlated with codon bias. Finally, we tested the possibility of positive selection at the protein level, using maximum likelihood models, assuming heterogeneous nonsynonymous-synonymous rate ratios among codon (amino acid) sites. Although we did not detect strong evidence of positively selected codon sites, the analysis suggested that significant variation in nonsynonymous-synonymous rate ratio exists among the sites. The most likely sites for action of positive selection were found in the ferredoxin gene, which is an important component of the apparatus for photosynthesis.

9.
10.
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10⁻⁹ (histone H4) to 2.80 × 10⁻⁹ (interferon gamma), with a mean of 0.88 × 10⁻⁹ substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10⁻⁹ substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10⁻⁹), intermediate at twofold degenerate sites (2.26 × 10⁻⁹), and highest at fourfold degenerate sites (4.2 × 10⁻⁹). The implication of our results for the mechanisms of DNA evolution and that of the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction are discussed.
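
The site classification in this abstract (nondegenerate, twofold degenerate, fourfold degenerate) can be sketched for the standard genetic code as below. This is only an illustration: the function name `site_degeneracy` is hypothetical, and treating the third position of the isoleucine codons (where two of the three possible changes are synonymous) as twofold degenerate is an assumption of this sketch, since the abstract does not say how that case is handled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def site_degeneracy(codon):
    """Return the degeneracy class (0, 2, or 4) of each codon position.

    0 = nondegenerate: every possible change alters the amino acid.
    4 = fourfold degenerate: no possible change alters the amino acid.
    2 = twofold degenerate: some, but not all, changes are synonymous
        (assumption: this also covers the two-of-three case).
    """
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    classes = []
    for pos in range(3):
        synonymous = sum(
            1
            for base in BASES
            if base != codon[pos]
            and CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == amino_acid
        )
        classes.append(0 if synonymous == 0 else 4 if synonymous == 3 else 2)
    return classes
```

For instance, `site_degeneracy("GGT")` gives `[0, 0, 4]`: the first two positions of this glycine codon are nondegenerate and the third is fourfold degenerate.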

11.
Friedman R  Drake JW  Hughes AL 《Genetics》2004,167(3):1507-1512
To test the hypothesis that the proteins of thermophilic prokaryotes are subject to unusually stringent functional constraints, we estimated the numbers of synonymous and nonsynonymous nucleotide substitutions per site between 17,957 pairs of orthologous genes from 22 pairs of closely related species of Archaea and Bacteria. The average ratio of nonsynonymous to synonymous substitutions was significantly lower in thermophiles than in nonthermophiles, and this effect was observed in both Archaea and Bacteria. There was no evidence that this difference could be explained by factors such as nucleotide content bias. Rather, the results support the hypothesis that proteins of thermophiles are subject to unusually strong purifying selection, leading to a reduced overall level of amino acid evolution per mutational event. The results show that genome-wide patterns of sequence evolution can be influenced by natural selection exerted by a species' environment and shed light on a previous observation that relatively few of the mutations arising in a thermophilic archaeon were nucleotide substitutions in contrast to indels.

12.
The envelope glycoprotein of human immunodeficiency virus type 1 (HIV-1) interacts with receptors on the target cell and mediates virus entry by fusing the viral and cell membranes. To maintain the viral infectivity, amino acids that interact with receptors are expected to be more conserved than the other sites on the protein surface. In contrast to the functional constraint of amino acids for the receptor binding, some amino acid changes in this protein may produce antigenic variations that enable the virus to escape from recognition of the host immune system. Therefore, both positive selection (higher fitness) and negative selection (lower fitness) against amino acid changes are taking place during evolution of surface proteins of parasites. To elucidate the evolutionary mechanisms of the whole HIV-1 gp120 envelope glycoprotein at the single site level, we collected and analyzed all available sequence data for the protein. By analyzing 186 sequences of the HIV-1 gp120 (subtype B), we reevaluated amino acid variability at the single site level, and estimated the numbers of synonymous and nonsynonymous substitutions at each codon position to detect positive and negative selection. We identified 33 amino acid positions which may be under positive selection. Some of these positions may form discontinuous epitopes. We also analyzed amino acid sequences to find amino acid positions responsible for usage of the second receptor. We found that, in addition to the V3 loop, amino acid variation at residue 440 in the C4 region is clearly linked with the usage of CXCR4.

13.
To understand the process and mechanism of protein evolution, it is important to know what types of amino acid substitutions are more likely to be under selection and what types are mostly neutral. An amino acid substitution can be classified as either conservative or radical, depending on whether it involves a change in a certain physicochemical property of the amino acid. Assuming Kimura's two-parameter model of nucleotide substitution, I present a method for computing the numbers of conservative and radical nonsynonymous (amino acid altering) nucleotide substitutions per site and estimate these rates for 47 nuclear genes from mammals. The results are as follows. (1) The average radical/conservative rate ratio is 0.81 for charge changes, 0.85 for polarity changes, and 0.49 when both polarity and volume changes are considered. (2) The radical/conservative rate ratio is positively correlated with the nonsynonymous/synonymous rate ratio for charge changes or when both polarity and volume changes are considered. (3) Both the conservative/synonymous rate ratio and the radical/synonymous rate ratio are lower in the rodent lineage than in the primate or artiodactyl lineage, suggesting more intense purifying selection in the rodent lineage, for both conservative and radical nonsynonymous substitutions. (4) Neglecting transition/transversion bias would cause an underestimation of both radical and conservative rates and the ratio thereof. (5) Transversions induce more dramatic genetic alterations than transitions, in that transversions produce more amino acid altering changes and, among these, more radical changes. Received: 6 April 1999 / Accepted: 16 August 1999
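
The conservative/radical distinction described in this abstract can be illustrated for the charge property. The abstract does not list the amino acid groups, so the grouping below (positive: R, H, K; negative: D, E; neutral: all others) is an assumption, as are the names `charge_class` and `classify_substitution`. A substitution is called radical when the two residues fall in different charge groups and conservative otherwise.

```python
# Assumed charge groups; the abstract does not spell these out.
POSITIVE = set("RHK")
NEGATIVE = set("DE")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def charge_class(residue):
    """Return 'positive', 'negative', or 'neutral' for a one-letter residue."""
    residue = residue.upper()
    if residue not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {residue!r}")
    if residue in POSITIVE:
        return "positive"
    if residue in NEGATIVE:
        return "negative"
    return "neutral"


def classify_substitution(before, after):
    """Label an amino acid replacement as 'radical' or 'conservative' by charge."""
    if charge_class(before) != charge_class(after):
        return "radical"
    return "conservative"
```

Under this grouping a lysine-to-arginine change is conservative, while lysine-to-glutamate is radical. The same pattern extends to polarity or volume by swapping in a different grouping.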

14.
R Nielsen  Z Yang 《Genetics》1998,148(3):929-936
Several codon-based models for the evolution of protein-coding DNA sequences are developed that account for varying selection intensity among amino acid sites. The "neutral model" assumes two categories of sites at which amino acid replacements are either neutral or deleterious. The "positive-selection model" assumes an additional category of positively selected sites at which nonsynonymous substitutions occur at a higher rate than synonymous ones. This model is also used to identify target sites for positive selection. The models are applied to a data set of the V3 region of the HIV-1 envelope gene, sequenced at different years after the infection of one patient. The results provide strong support for variable selection intensity among amino acid sites. The neutral model is rejected in favor of the positive-selection model, indicating the operation of positive selection in the region. Positively selected sites are found in both the V3 region and the flanking regions.

15.
16.
Natural selection operating at the amino acid sequence level can be detected by comparing the rates of synonymous (r(S)) and nonsynonymous (r(N)) nucleotide substitutions, where r(N)/r(S) (omega) > 1 and omega < 1 suggest positive and negative selection, respectively. The branch-site test has been developed for detecting positive selection operating at a group of amino acid sites for a pre-specified (foreground) branch of a phylogenetic tree by taking into account the heterogeneity of omega among sites and branches. Here the performance of the branch-site test was examined by computer simulation, with special reference to the false-positive rate when the divergence of the sequences analyzed was small. The false-positive rate was found to inflate when the assumptions made on the omega values for the foreground and other (background) branches in the branch-site test were violated. In addition, under a similar condition, false-positive results were often obtained even when Bonferroni correction was conducted and the false-discovery rate was controlled in a large-scale analysis. False-positive results were also obtained even when the number of nonsynonymous substitutions for the foreground branch was smaller than the minimum value required for detecting positive selection. The existence of a codon site with a possibility of occurrence of multiple nonsynonymous substitutions for the foreground branch often caused the branch-site test to falsely identify positive selection. In the re-analysis of orthologous trios of protein-coding genes from humans, chimpanzees, and macaques, most of the genes previously identified to be positively selected for the human or chimpanzee branch by the branch-site test contained such a codon site, suggesting a possibility that a significant fraction of these genes are false-positives.

17.
The strength and direction of selection on the identity of an amino acid residue in a protein is typically measured by the ratio of the rate of non-synonymous substitutions to the rate of synonymous substitutions. In attempting to predict positively selected sites from amino acid alignments, we made the unexpected observation that the site likelihood of an alignment column for a given tree tends to be negatively correlated with the posterior probability that the site is in the positive selection class under widely-used codon models. This is likely because positively selected sites tend to be more variable and display more “radical” amino acid changes; both of these features are expected to result in low site log-likelihoods. We explored the efficacy of using the site log-likelihood (SLL) score as a predictor for positive selection. Through simulation we show that an SLL-based test has a low false positive rate and power comparable to that of the codon models. In one case where the simulated data violated the assumption that synonymous substitution rates were constant across the sites, the codon models were not able to detect positive selection in the data while the SLL test did. We applied the new method to ten empirical datasets and found that it made predictions similar to those of the codon models in eight of them. For the tax gene dataset the SLL test seemed to produce more reasonable results. The SLL methods are a valuable complement to codon models, especially for some cases where the assumptions of codon models are likely violated.

18.
The mouse cadherin-related neuronal receptor/protocadherin (CNR/Pcdh) gene clusters are located on chromosome 18. We sequenced single-nucleotide polymorphisms (SNPs) of the CNR/Pcdh(alpha)-coding region among 12 wild-derived and four laboratory strains; these included the four major subspecies groups of Mus musculus: domesticus, musculus, castaneus, and bactrianus. We detected 883 coding SNPs (cSNPs) in the CNR/Pcdh(alpha) variable exons and three in the constant exons. Among all the cSNPs, 586 synonymous (silent) and 297 nonsynonymous (amino acid exchanged) substitutions were found; therefore, the K(a)/K(s) ratio (nonsynonymous substitutions per synonymous substitution) was 0.51. The synonymous cSNPs were relatively concentrated in the first and fifth extracellular cadherin domain-encoding regions (ECs) of CNR/Pcdh(alpha). These regions have high nucleotide homology among the CNR/Pcdh(alpha) paralogs, suggesting that gene conversion events in synonymous and homologous regions of the CNR/Pcdh(alpha) cluster are related to the generation of cSNPs. A phylogenetic analysis revealed gene conversion events in the EC1 and EC5 regions. Assuming that the common sequences between rat and mouse are ancestral, the GC content of the third codon position has increased in the EC1 and EC5 regions, although biased substitutions from GC to AT were detected in all the codon positions. In addition, nonsynonymous substitutions were extremely high (11 of 13, K(a)/K(s) ratio 5.5) in the laboratory mouse strains. The artificial environment of laboratory mice may allow positive selection for nonsynonymous amino acid variations in CNR/Pcdh(alpha) during inbreeding. In this study, we analyzed the direction of cSNP generation, and concluded that subspecies-specific nucleotide substitutions and region-restricted gene conversion events may have contributed to the generation of genetic variations in the CNR/Pcdh genes within and between species.
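
The two ratios quoted in this abstract follow directly from the reported counts, as the short check below shows. Note that the abstract defines its K(a)/K(s) ratio as nonsynonymous substitutions per synonymous substitution, that is, a ratio of raw counts rather than of per-site rates. The helper name `count_ratio` is hypothetical, and the figure of 2 synonymous substitutions in the laboratory strains is inferred from "11 of 13".

```python
def count_ratio(nonsynonymous, synonymous):
    """Ratio of nonsynonymous to synonymous substitution counts."""
    if synonymous == 0:
        raise ValueError("ratio is undefined without synonymous substitutions")
    return nonsynonymous / synonymous


# All strains: 297 nonsynonymous and 586 synonymous cSNPs (883 in total).
all_strains = count_ratio(297, 586)

# Laboratory strains: 11 of 13 substitutions were nonsynonymous,
# which leaves 13 - 11 = 2 synonymous substitutions (inferred).
laboratory_strains = count_ratio(11, 13 - 11)

print(round(all_strains, 2), laboratory_strains)
```

A count ratio of this kind is not normalized by the numbers of synonymous and nonsynonymous sites, so it is not directly comparable to the per-site ratios used in the other abstracts on this page.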

19.
The RNA genome of the hepatitis C virus (HCV) diversifies rapidly during the acute phase of infection, but the selective forces that drive this process remain poorly defined. Here we examined whether Darwinian selection pressure imposed by CD8(+) T cells is a dominant force driving early amino acid replacement in HCV viral populations. This question was addressed in two chimpanzees followed for 8 to 10 years after infection with a well-defined inoculum composed of a clonal genotype 1a (isolate H77C) HCV genome. Detailed characterization of CD8(+) T cell responses combined with sequencing of recovered virus at frequent intervals revealed that most acute-phase nonsynonymous mutations were clustered in class I epitopes and appeared much earlier than those in the remainder of the HCV genome. Moreover, the ratio of nonsynonymous to synonymous mutations, a measure of positive selection pressure, was increased 50-fold in class I epitopes compared with the rest of the HCV genome. Finally, some mutation of the clonal H77C genome toward a genotype 1a consensus sequence considered most fit for replication was observed during the acute phase of infection, but the majority of these amino acid substitutions occurred slowly over several years of chronic infection. Together these observations indicate that during acute hepatitis C, virus evolution was driven primarily by positive selection pressure exerted by CD8(+) T cells. This influence of immune pressure on viral evolution appears to subside as chronic infection is established and genetic drift becomes the dominant evolutionary force.

20.
Codon- and amino acid-substitution models are widely used for the evolutionary analysis of protein-coding DNA sequences. Using codon models, the amounts of both nonsynonymous and synonymous DNA substitutions can be estimated. The ratio of these amounts represents the strength of selective pressure. Using amino acid models, the amount of nonsynonymous substitutions is estimated, but that of synonymous substitutions is ignored. Although amino acid models lose any information regarding synonymous substitutions, they explicitly incorporate the information for amino acid replacement, which is empirically derived from databases. It is often presumed that when the protein-coding sequences are highly divergent, synonymous substitutions might be saturated and the evolutionary analysis may be hampered by synonymous noise. However, there exists no quantitative procedure to verify whether synonymous substitutions can be ignored; therefore, amino acid models have been arbitrarily selected. In this study, we investigate the issue of a statistical comparison between codon- and amino acid-substitution models. For this purpose, we propose a new procedure to transform a 20-dimensional amino acid model to a 61-dimensional codon model. This transformation reveals that amino acid models belong to a subset of the codon models and enables us to test whether synonymous substitutions can be ignored by using the likelihood ratio. Our theoretical results and analyses of real data indicate that synonymous substitutions are very informative and substantially improve evolutionary inference, even when the sequences are highly divergent. Therefore, we note that amino acid models should be adopted only after carefully investigating and discarding the possibility that synonymous substitutions can reveal important evolutionary information.
