Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
Maximum-likelihood models of codon substitution were used to analyze sperm lysin genes of 25 abalone (Haliotis) species to identify lineages and amino acid sites under diversifying selection. The models used the nonsynonymous/synonymous rate ratio (omega = d(N)/d(S)) as an indicator of selective pressure and allowed the ratio to vary among lineages or sites. Likelihood ratio tests suggested significant variation in selective pressure among lineages. The variable selective pressure provided an explanation for the previous observation that the omega ratio is >1 in comparisons of closely related species and <1 in comparisons of distantly related species. Computer simulations demonstrated that saturation of nonsynonymous substitutions and constraint on lysin structure were unlikely to account for the observed pattern. Lineages linking closely related sympatric species appeared to be under diversifying selection, while lineages separating distantly related species from different geographic locations were associated with low evolutionary rates. The selective pressure indicated by the omega ratio was found to vary greatly among amino acid sites in lysin. Sites under potential diversifying selection were identified. Ancestral lysins were inferred to trace the route of evolution at individual sites and to provide lysin sequences for future laboratory studies.

2.
The nonsynonymous to synonymous substitution rate ratio (omega = d(N)/d(S)) provides a sensitive measure of selective pressure at the protein level, with omega values <1, =1, and >1 indicating purifying selection, neutral evolution, and diversifying selection, respectively. Maximum likelihood models of codon substitution developed recently account for variable selective pressures among amino acid sites by employing a statistical distribution for the omega ratio among sites. Those models, called random-sites models, are suitable when we do not know a priori which sites are under what kind of selective pressure. Sometimes prior information (such as the tertiary structure of the protein) might be available to partition sites in the protein into different classes, which are expected to be under different selective pressures. It is then sensible to use such information in the model. In this paper, we implement maximum likelihood models for prepartitioned data sets, which account for the heterogeneity among site partitions by using different omega parameters for the partitions. The models, referred to as fixed-sites models, are also useful for combined analysis of multiple genes from the same set of species. We apply the models to data sets of the major histocompatibility complex (MHC) class I alleles from human populations and of the abalone sperm lysin genes. Structural information is used to partition sites in MHC into two classes: those in the antigen recognition site (ARS) and those outside. Positive selection is detected in the ARS by the fixed-sites models. Similarly, sites in lysin are classified into the buried and solvent-exposed classes according to the tertiary structure, and positive selection was detected at the solvent-exposed sites. The random-sites models identified a number of sites under positive selection in each data set, confirming and elaborating the results of the fixed-sites models. 
The analysis demonstrates the utility of the fixed-sites models, as well as the power of previous random-sites models, which do not use the prior information to partition sites.
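The omega thresholds and the fixed-sites idea described in this abstract can be illustrated with a minimal Python sketch. It is not the maximum likelihood procedure of the paper: it simply divides a d(N) estimate by a d(S) estimate for each predefined site partition and labels the result. The function names, the tolerance, and the numeric rates are assumptions made for illustration only.

```python
def classify_omega(omega: float, tol: float = 1e-9) -> str:
    """Map an omega = dN/dS value to the selection regime it suggests."""
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if abs(omega - 1.0) <= tol:
        return "neutral evolution"
    return "purifying selection" if omega < 1.0 else "diversifying selection"


def omega_by_partition(rates):
    """rates maps a partition name to (dN, dS); returns name -> (omega, regime).

    Each partition gets its own omega, mirroring the fixed-sites idea of
    using a separate omega parameter per predefined class of sites.
    """
    result = {}
    for name, (dn, ds) in rates.items():
        if ds <= 0:
            raise ValueError(f"dS must be positive for partition {name!r}")
        omega = dn / ds
        result[name] = (omega, classify_omega(omega))
    return result


# Hypothetical rates for two partitions (values invented for illustration).
example = omega_by_partition({"ARS": (0.30, 0.10), "non-ARS": (0.05, 0.10)})
```

A crude ratio like this ignores sampling error, which is why the abstract relies on likelihood-based estimation instead.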

3.
The selective pressure at the protein level is usually measured by the nonsynonymous/synonymous rate ratio (omega = dN/dS), with omega < 1, omega = 1, and omega > 1 indicating purifying (or negative) selection, neutral evolution, and diversifying (or positive) selection, respectively. The omega ratio is commonly calculated as an average over sites. As every functional protein has some amino acid sites under selective constraints, averaging rates across sites leads to low power to detect positive selection. Recently developed models of codon substitution allow the omega ratio to vary among sites and appear to be powerful in detecting positive selection in empirical data analysis. In this study, we used computer simulation to investigate the accuracy and power of the likelihood ratio test (LRT) in detecting positive selection at amino acid sites. The test compares two nested models: one that allows for sites under positive selection (with omega > 1), and another that does not, with the chi-square distribution used for significance testing. We found that use of the chi-square distribution makes the test conservative, especially when the data contain very short and highly similar sequences. Nevertheless, the LRT is powerful. Although the power can be low with only 5 or 6 sequences in the data, it was nearly 100% in data sets of 17 sequences. Sequence length, sequence divergence, and the strength of positive selection also were found to affect the power of the LRT. The exact distribution assumed for the omega ratio over sites was found not to affect the effectiveness of the LRT.
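The likelihood ratio test described above can be sketched as follows. The test statistic is twice the log-likelihood difference between the nested models, compared against a chi-square distribution. This is a minimal sketch that uses only the standard library and therefore supports just 1 or 2 degrees of freedom through closed-form tail probabilities; the function names and the log-likelihood values are hypothetical, and the appropriate degrees of freedom depend on the model pair being compared.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable for df = 1 or 2."""
    if stat < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for two nested models."""
    # Clamp at zero in case optimization noise makes the alternative fit worse.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a model without and with an omega > 1 class.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
```

Because the abstract reports that the chi-square approximation makes the test conservative, a p-value from this sketch would tend to overstate the true type I error level rather than understate it.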

4.
Genes that have undergone positive or diversifying selection are likely to be associated with adaptive divergence between species. One indicator of adaptive selection at the molecular level is an excess of amino acid replacement fixed differences per replacement site relative to the number of synonymous fixed differences per synonymous site (omega = K(a)/K(s)). We used an evolutionary expressed sequence tag (EST) approach to estimate the distribution of omega among 304 orthologous loci between Arabidopsis thaliana and A. lyrata to identify genes potentially involved in the adaptive divergence between these two Brassicaceae species. We find that 14 of 304 genes (approximately 5%) have an estimated omega > 1 and are candidates for genes with increased selection intensities. Molecular population genetic analyses of 6 of these rapidly evolving protein loci indicate that, despite their high levels of between-species nonsynonymous divergence, these genes do not have elevated levels of intraspecific replacement polymorphisms compared to previously studied genes. A hierarchical Bayesian analysis of protein-coding region evolution within and between species also indicates that the selection intensities of these genes are elevated compared to previously studied A. thaliana nuclear loci.

5.
The nonsynonymous (amino acid-altering) to synonymous (silent) substitution rate ratio (omega = d(N)/d(S)) provides a measure of natural selection at the protein level, with omega = 1, <1, and >1 indicating neutral evolution, purifying selection, and positive selection, respectively. Previous studies that used this measure to detect positive selection have often taken an approach of pairwise comparison, estimating substitution rates by averaging over all sites in the protein. As most amino acids in a functional protein are under structural and functional constraints and adaptive evolution probably affects only a few sites at a few time points, this approach of averaging rates over sites and over time has little power. Previously, we developed codon-based substitution models that allow the omega ratio to vary either among lineages or among sites. In this paper we extend previous models to allow the omega ratio to vary both among sites and among lineages and implement the new models in the likelihood framework. These models may be useful for identifying positive selection along prespecified lineages that affects only a few sites in the protein. We apply those branch-site models as well as previous branch- and site-specific models to three data sets: the lysozyme genes from primates, the tumor suppressor BRCA1 genes from primates, and the phytochrome (PHY) gene family in angiosperms. Positive selection is detected in the lysozyme and BRCA1 genes by both the new and the old models. However, only the new models detected positive selection acting on lineages after gene duplication in the PHY gene family. Additional tests on several data sets suggest that the new models may be useful in detecting positive selection after gene duplication in gene family evolution.

6.
Natural selection operating at the amino acid sequence level can be detected by comparing the rates of synonymous (r(S)) and nonsynonymous (r(N)) nucleotide substitutions, where r(N)/r(S) (omega) > 1 and omega < 1 suggest positive and negative selection, respectively. The branch-site test has been developed for detecting positive selection operating at a group of amino acid sites for a pre-specified (foreground) branch of a phylogenetic tree by taking into account the heterogeneity of omega among sites and branches. Here the performance of the branch-site test was examined by computer simulation, with special reference to the false-positive rate when the divergence of the sequences analyzed was small. The false-positive rate was found to inflate when the assumptions made on the omega values for the foreground and other (background) branches in the branch-site test were violated. In addition, under a similar condition, false-positive results were often obtained even when Bonferroni correction was conducted and the false-discovery rate was controlled in a large-scale analysis. False-positive results were also obtained even when the number of nonsynonymous substitutions for the foreground branch was smaller than the minimum value required for detecting positive selection. The existence of a codon site with a possibility of occurrence of multiple nonsynonymous substitutions for the foreground branch often caused the branch-site test to falsely identify positive selection. In the re-analysis of orthologous trios of protein-coding genes from humans, chimpanzees, and macaques, most of the genes previously identified to be positively selected for the human or chimpanzee branch by the branch-site test contained such a codon site, suggesting a possibility that a significant fraction of these genes are false-positives.
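The two multiple-testing safeguards mentioned in this abstract, Bonferroni correction and false-discovery-rate control, can be sketched in Python as below. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up rule is assumed here as a common choice. The function names and p-values are hypothetical. Neither correction repairs a test whose model assumptions are violated, which is the abstract's point.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected


# Hypothetical per-gene p-values from a large-scale scan.
p_vals = [0.001, 0.012, 0.03, 0.2]
bonf = bonferroni(p_vals)
bh = benjamini_hochberg(p_vals)
```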

7.
Codon-based substitution models are routinely used to measure selective pressures acting on protein-coding genes. To this effect, the nonsynonymous to synonymous rate ratio (dN/dS = omega) is estimated. The proportion of amino-acid sites potentially under positive selection, as indicated by omega > 1, is inferred by fitting a probability distribution where some sites are permitted to have omega > 1. These sites are then inferred by means of an empirical Bayes or by a Bayes empirical Bayes approach that, respectively, ignores or accounts for sampling errors in maximum-likelihood estimates of the distribution used to infer the proportion of sites with omega > 1. Here, we extend a previous full-Bayes approach to include models with high power and low false-positive rates when inferring sites under positive selection. We propose some heuristics to alleviate the computational burden, and show that (i) full Bayes can be superior to empirical Bayes when analyzing a small data set or small simulated data, (ii) full Bayes has only a small advantage over Bayes empirical Bayes with our small test data, and (iii) Bayesian methods appear relatively insensitive to mild misspecifications of the random process generating adaptive evolution in our simulations, but in practice can prove extremely sensitive to model specification. We suggest that the codon model used to detect amino acids under selection should be carefully selected, for instance using Akaike information criterion (AIC).
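The AIC-based model choice suggested at the end of this abstract can be sketched as follows, using the standard definition AIC = 2k - 2 lnL, where k is the number of free parameters and lnL the maximized log-likelihood. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not values from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_by_aic(models):
    """models maps name -> (lnL, k); returns (best name, all AIC scores)."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits of two candidate codon models to the same alignment.
best, scores = best_by_aic({
    "no_selection_class": (-1000.0, 2),
    "with_selection_class": (-995.0, 4),
})
```

Here the richer model wins because its likelihood gain outweighs the penalty for two extra parameters; with a smaller gain the simpler model would be preferred.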

8.
The nature of selection on capsid genes of foot-and-mouth disease virus (FMDV) was characterized by examining the ratio of nonsynonymous to synonymous substitutions in 11 data sets of sequences obtained from six different serotypes of FMDV. Using a method of analysis that assigns each codon position to one of a number of estimated values of nonsynonymous to synonymous ratio, significant evidence of positive selection was identified in 5 data sets, operating at 1-7% of codon positions. Evidence of positive selection was identified in complete capsid sequences of serotypes A and C and in VP1 sequences of serotypes SAT 1 and 2. Sequences of serotype SAT-2 recovered from a persistently infected African buffalo also revealed evidence for positive selection. Locations of codons under positive selection coincide closely with those of antigenic sites previously identified with the use of monoclonal antibody escape mutants. The vast majority of codons are under mild to strong purifying selection. However, these results suggest that arising antigenic variants benefit from a selective advantage in their interaction with the immune system, either during the course of an infection or in transmission to individuals with previous exposure to antigen. Analysis of amino acid usage at sites under positive selection indicates that this selective advantage can be conferred by amino acid substitutions that share physicochemically similar properties.

9.
Human papillomavirus type 16 (HPV16) is the primary etiological agent of cervical cancer, the second most common cancer in women worldwide. Complete genomes of 12 isolates representing the major lineages of HPV16 were cloned and sequenced from cervicovaginal cells. The sequence variations within the open reading frames (ORFs) and noncoding regions were identified and compared with the HPV16R reference sequence. This whole-genome approach gives us unprecedented precision in detailing sequence-level changes that are under selection on a whole-viral-genome scale. Of 7,908 base pair nucleotide positions, 313 (4.0%) were variable. Within the 2,452 amino acids (aa) comprising 8 ORFs, 243 (9.9%) amino acid positions were variable. In order to investigate the molecular evolution of HPV16 variants, maximum likelihood models of codon substitution were used to identify lineages and amino acid sites under selective pressure. Five codon sites in the E5 (aa 48, 65) and E6 (aa 10, 14, 83) ORFs were demonstrated to be under diversifying selective pressure. The E5 ORF had the overall highest nonsynonymous/synonymous substitution rate (omega) ratio (M3 = 0.7965). The E2 gene had the next-highest omega ratio (M3 = 0.5611); however, no specific codons were under positive selection. These data indicate that the E6 and E5 ORFs are evolving under positive Darwinian selection and have done so in a relatively short time period. Whether response to selective pressure upon the E5 and E6 ORFs contributes to the biological success of HPV16, its specific biological niche, and/or its oncogenic potential remains to be established.

11.
12.
R Nielsen  Z Yang 《Genetics》1998,148(3):929-936
Several codon-based models for the evolution of protein-coding DNA sequences are developed that account for varying selection intensity among amino acid sites. The "neutral model" assumes two categories of sites at which amino acid replacements are either neutral or deleterious. The "positive-selection model" assumes an additional category of positively selected sites at which nonsynonymous substitutions occur at a higher rate than synonymous ones. This model is also used to identify target sites for positive selection. The models are applied to a data set of the V3 region of the HIV-1 envelope gene, sequenced at different years after the infection of one patient. The results provide strong support for variable selection intensity among amino acid sites. The neutral model is rejected in favor of the positive-selection model, indicating the operation of positive selection in the region. Positively selected sites are found in both the V3 region and the flanking regions.

13.
Zhao L  Zhang XT  Tao XK  Wang WW  Li M 《动物学研究》2012,33(E3-4):E47-E56
Since the birth of molecular evolutionary analysis, primates have been a central focus of study, and mitochondrial DNA is well suited to these endeavors because of its unique features. Surprisingly, to date no comprehensive evaluation of the nucleotide substitution patterns has been conducted on the mitochondrial genome of primates. Here, we analyzed the evolutionary patterns and evaluated selection and recombination in the mitochondrial genomes of 44 primate species downloaded from GenBank. The results revealed that a strong rate heterogeneity occurred among sites and genes in all comparisons. Likewise, an obvious decline in primate nucleotide diversity was noted in the subunit rRNAs and tRNAs as compared to the protein-coding genes. Within 13 protein-coding genes, the pattern of nonsynonymous divergence was similar to that of overall nucleotide divergence, while synonymous changes differed only for individual genes, indicating that the rate heterogeneity may result from the rate of change at nonsynonymous sites. Codon usage analysis revealed that there was intermediate codon usage bias in primate protein-coding genes, and supported the idea that GC mutation pressure might determine codon usage and that positive selection is not the driving force for the codon usage bias. Neutrality tests using site-specific positive selection from a Bayesian framework indicated no sites were under positive selection for any gene, consistent with near neutrality. Recombination tests based on the pairwise homoplasy test statistic supported complete linkage even for much older divergent primate species. Thus, with the exception of rate heterogeneity among mitochondrial genes, evaluating the validity of the assumed complete linkage and selective neutrality in primates prior to phylogenetic or phylogeographic analysis seems unnecessary.

14.
Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (d(N)/d(S), denoted omega) is used as a measure of selective pressure at the protein level, with omega > 1 indicating positive selection. Statistical distributions are used to model the variation in omega among sites, allowing a subset of sites to have omega > 1 while the rest of the sequence may be under purifying selection with omega < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with omega > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and omega ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.
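The naive empirical Bayes (NEB) calculation described above amounts to a single application of Bayes' theorem with the estimated class proportions plugged in as if they were known. The sketch below shows that step for one site. The function name, the proportions, and the per-class site likelihoods are hypothetical; in practice the likelihoods would come from a phylogenetic likelihood calculation, which is not shown. The BEB approach, which integrates over uncertainty in the parameters, is also not implemented here.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Posterior P(class k | site data) proportional to p_k * f(data | omega_k).

    proportions: estimated proportion of sites in each class (should sum to 1).
    site_likelihoods: likelihood of the site's data under each class's omega.
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("one likelihood is required per site class")
    joint = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site likelihoods must not all be zero")
    return [j / total for j in joint]


# Hypothetical three-class model; the last class is the one with omega > 1.
posteriors = site_class_posteriors(
    proportions=[0.7, 0.2, 0.1],
    site_likelihoods=[1e-6, 2e-6, 8e-6],
)
prob_positive = posteriors[-1]
```

Because the proportions enter as fixed numbers, any error in estimating them propagates directly into the posteriors, which is the weakness of NEB in small data sets that the abstract describes.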

15.
Codon- and amino acid-substitution models are widely used for the evolutionary analysis of protein-coding DNA sequences. Using codon models, the amounts of both nonsynonymous and synonymous DNA substitutions can be estimated. The ratio of these amounts represents the strength of selective pressure. Using amino acid models, the amount of nonsynonymous substitutions is estimated, but that of synonymous substitutions is ignored. Although amino acid models lose any information regarding synonymous substitutions, they explicitly incorporate the information for amino acid replacement, which is empirically derived from databases. It is often presumed that when the protein-coding sequences are highly divergent, synonymous substitutions might be saturated and the evolutionary analysis may be hampered by synonymous noise. However, there exists no quantitative procedure to verify whether synonymous substitutions can be ignored; therefore, amino acid models have been arbitrarily selected. In this study, we investigate the issue of a statistical comparison between codon- and amino acid-substitution models. For this purpose, we propose a new procedure to transform a 20-dimensional amino acid model to a 61-dimensional codon model. This transformation reveals that amino acid models belong to a subset of the codon models and enables us to test whether synonymous substitutions can be ignored by using the likelihood ratio. Our theoretical results and analyses of real data indicate that synonymous substitutions are very informative and substantially improve evolutionary inference, even when the sequences are highly divergent. Therefore, we note that amino acid models should be adopted only after carefully investigating and discarding the possibility that synonymous substitutions can reveal important evolutionary information.

16.
Positive and negative selection in the DAZ gene family   Cited by: 4 (self-citations: 0; citations by others: 4)
Because a microdeletion containing the DAZ gene is the most frequently observed deletion in infertile men, the DAZ gene was considered a strong candidate for the azoospermia factor. A recent evolutionary analysis, however, suggested that DAZ was free from functional constraints and consequently played little or no role in human spermatogenesis. The major evidence for this surprising conclusion is that the nonsynonymous substitution rate is similar to the synonymous rate and to the rate in introns. In this study, we reexamined the evolution of the DAZ gene family by using maximum-likelihood methods, which accommodate variable selective pressures among sites or among branches. The results suggest that DAZ is not free from functional constraints. Most amino acids in DAZ are under strong selective constraint, while a few sites are under diversifying selection with nonsynonymous/synonymous rate ratios (d(N)/d(S)) well above 1. As a result, the average d(N)/d(S) ratio over sites is not a sensible measure of selective pressure on the protein. Lineage-specific analysis indicated that human members of this gene family were evolving by positive Darwinian selection, although the evidence was not strong.

17.
We sequenced the nearly complete mtDNA of 3 species of parasitic wasps, Nasonia vitripennis (2 strains), Nasonia giraulti, and Nasonia longicornis, including all 13 protein-coding genes and the 2 rRNAs, and found unusual patterns of mitochondrial evolution. The Nasonia mtDNA has a unique gene order compared with other insect mtDNAs due to multiple rearrangements. The mtDNAs of these wasps also show nucleotide substitution rates over 30 times faster than nuclear protein-coding genes, indicating among the highest substitution rates found in animal mitochondria (normally <10 times faster). A McDonald and Kreitman test shows that the between-species frequency of fixed replacement sites relative to silent sites is significantly higher compared with within-species polymorphisms in 2 mitochondrial genes of Nasonia, atp6 and atp8, indicating directional selection. Consistent with this interpretation, the Ka/Ks (nonsynonymous/synonymous substitution rates) ratios are higher between species than within species. In contrast, cox1 shows a signature of purifying selection for amino acid sequence conservation, although rates of amino acid substitutions are still higher than for comparable insects. The mitochondrial-encoded polypeptides atp6 and atp8 both occur in F0F1 ATP synthase of the electron transport chain. Because malfunction in this fundamental protein severely affects fitness, we suggest that the accelerated accumulation of replacements is due to beneficial mutations necessary to compensate mildly deleterious mutations fixed by random genetic drift or Wolbachia sweeps in the fast evolving mitochondria of Nasonia. 
We further propose that relatively high rates of amino acid substitution in some mitochondrial genes can be driven by a "Compensation-Draft Feedback"; increased fixation of mildly deleterious mutations results in selection for compensatory mutations, which lead to fixation of additional deleterious mutations in nonrecombining mitochondrial genomes, thus accelerating the process of amino acid substitutions.

18.
We surveyed the molecular evolutionary characteristics of 11 nuclear genes from 10 conifer trees belonging to the Taxodioideae, the Cupressoideae, and the Sequoioideae. Comparisons of substitution rates among the lineages indicated that the synonymous substitution rates of the Cupressoideae lineage were higher than those of the Taxodioideae. This result parallels the pattern previously found in plastid genes. Likelihood-ratio tests showed that the nonsynonymous-synonymous rate ratio did not change significantly among lineages. In addition, after adjustments for lineage effects, the dispersion indices of synonymous and nonsynonymous substitutions were considerably reduced, and the latter was close to 1. These results indicated that the acceleration of evolutionary rates in the Cupressoideae lineage occurred in both the nuclear and plastid genomes, and that generally, this lineage effect affected synonymous and nonsynonymous substitutions similarly. We also investigated the relationship of synonymous substitution rates with the nonsynonymous substitution rate, base composition, and codon bias in each lineage. Synonymous substitution rates were positively correlated with nonsynonymous substitution rates and GC content at third codon positions, but synonymous substitution rates were not correlated with codon bias. Finally, we tested the possibility of positive selection at the protein level, using maximum likelihood models, assuming heterogeneous nonsynonymous-synonymous rate ratios among codon (amino acid) sites. Although we did not detect strong evidence of positively selected codon sites, the analysis suggested that significant variation in nonsynonymous-synonymous rate ratio exists among the sites. The most likely sites for action of positive selection were found in the ferredoxin gene, which is an important component of the apparatus for photosynthesis.

19.
The tissue-specific expression and differential function of the crustacean hyperglycemic hormone (CHH) in Carcinus maenas indicate an interesting evolutionary history. Previous studies have shown that CHH from the sinus gland X-organ (XO-type) has hyperglycemic activity, whereas the CHH from the pericardial organ (PO-type) neither shows hyperglycemic activity nor inhibits Y-organ ecdysteroid synthesis. Here we examined the types of selective pressures operating on the variants of CHH in Carcinus maenas. Maximum likelihood-based codon substitution analyses revealed that the variants of this neuropeptide in C. maenas have been subjected to positive Darwinian selection, indicating adaptive evolution and functional divergence among the CHH variants leading to two unique groups (PO- and XO-type). Although the average ratio of nonsynonymous to synonymous substitution (omega) for the entire coding region is 0.5096, a few codon sites showed significantly higher omega (10.95). Comparison of models that incorporate positive selection (omega > 1) with models not incorporating positive selection (omega < 1) at certain codon sites failed to reject (p = 0) evidence of positive Darwinian selection.

20.
Bayes prediction quantifies uncertainty by assigning posterior probabilities. It was used to identify amino acids in a protein under recurrent diversifying selection indicated by higher nonsynonymous (d(N)) than synonymous (d(S)) substitution rates or by omega = d(N)/d(S) > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different omega ratios. The Bayes theorem was used to calculate the posterior probabilities of each site falling into these site classes. Here, we evaluate the performance of Bayes prediction of amino acids under positive selection by computer simulation. We measured the accuracy by the proportion of predicted sites that were truly under selection and the power by the proportion of true positively selected sites that were predicted by the method. The accuracy was slightly better for longer sequences, whereas the power was largely unaffected by the increase in sequence length. Both accuracy and power were higher for medium or highly diverged sequences than for similar sequences. We found that accuracy and power were unacceptably low when data contained only a few highly similar sequences. However, sampling a large number of lineages improved the performance substantially. Even for very similar sequences, accuracy and power can be high if over 100 taxa are used in the analysis. We make the following recommendations: (1) prediction of positive selection sites is not feasible for a few closely related sequences; (2) using a large number of lineages is the best way to improve the accuracy and power of the prediction; and (3) multiple models of heterogeneous selective pressures among sites should be applied in real data analysis.
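The two performance measures defined in this abstract translate directly into set arithmetic: accuracy is the fraction of predicted sites that are truly selected, and power is the fraction of truly selected sites that were predicted. The Python sketch below follows those definitions. The function name and the site indices are hypothetical, and returning NaN for an empty denominator is a design choice of this sketch rather than something stated in the source.

```python
def accuracy_and_power(predicted_sites, truly_selected_sites):
    """Return (accuracy, power) as defined in the abstract.

    accuracy = |predicted and true| / |predicted|
    power    = |predicted and true| / |true|
    """
    predicted = set(predicted_sites)
    truth = set(truly_selected_sites)
    hits = predicted & truth
    accuracy = len(hits) / len(predicted) if predicted else float("nan")
    power = len(hits) / len(truth) if truth else float("nan")
    return accuracy, power


# Hypothetical simulation replicate: codon positions flagged vs. simulated truth.
acc, pw = accuracy_and_power(
    predicted_sites=[3, 10, 42, 57],
    truly_selected_sites=[10, 42, 60, 77, 91],
)
```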
