Similar literature
 20 similar documents found (search time: 33 ms)
1.
The evolutionary expansion of CAG repeats in human triplet expansion disease genes is intriguing because of their deleterious phenotype. In the past, this expansion has been suggested to reflect a broad genomewide expansion of repeats, which would imply that mutational and evolutionary processes acting on repeats differ between species. Here, we tested this hypothesis by analyzing repeat- and flanking-sequence evolution in 28 repeat-containing genes that had been sequenced in humans and mice and by considering overall lengths and distributions of CAG repeats in the two species. We found no evidence that these repeats were longer in humans than in mice. We also found no evidence for preferential accumulation of CAG repeats in the human genome relative to mice from an analysis of the lengths of repeats identified in sequence databases. We then investigated whether sequence properties, such as base and amino acid composition and base substitution rates, showed any relationship to repeat evolution. We found that repeat-containing genes were enriched in certain amino acids, presumably as the result of selection, but that this did not reflect underlying biases in base composition. We also found that regions near repeats showed higher nonsynonymous substitution rates than the remainder of the gene and lower nonsynonymous rates in genes that contained a repeat in both the human and the mouse. Higher rates of nonsynonymous mutation in the neighborhood of repeats presumably reflect weaker purifying selection acting in these regions of the proteins, while the very low rate of nonsynonymous mutation in proteins containing a CAG repeat in both species presumably reflects a high level of purifying selection. Based on these observations, we propose that the mutational processes giving rise to polyglutamine repeats in human and murine proteins do not differ. 
Instead, we propose that the evolution of polyglutamine repeats in proteins results from an interplay between mutational processes and selection.

2.
A strong negative correlation between the rate of amino-acid substitution and codon usage bias in Drosophila has been attributed to interference between positive selection at nonsynonymous sites and weak selection on codon usage. To further explore this possibility we have investigated polymorphism and divergence at three kinds of sites (synonymous, nonsynonymous and intronic) in relation to codon bias in D. melanogaster and D. simulans. We confirmed that protein evolution is one of the main explanatory parameters for interlocus codon bias variation (r² ≈ 40%). However, intron or synonymous diversities, which could have been expected to be good indicators of local interference [here defined as the additional increase of drift due to selection on tightly linked sites, also called 'genetic draft' by Gillespie (2000)], did not covary significantly with codon bias or with protein evolution. Concurrently, levels of polymorphism were reduced in regions of low recombination rates whereas codon bias was not. Finally, while nonsynonymous diversities were very well correlated between species, neither synonymous nor intron diversities observed in D. melanogaster were correlated with those observed in D. simulans. Altogether, our results suggest that the selective constraint on the protein is a stable component of gene evolution while local interference is not. The pattern of variation in genetic draft along the genome therefore seems to be unstable over evolutionary time and should therefore be considered a minor determinant of codon bias variance. We argue that selective constraints for optimal codon usage are likely to be correlated with selective constraints on the protein, both between codons within a gene, as previously suggested, and also between genes within a genome.

3.
Models of amino acid substitution were developed and compared using maximum likelihood. Two kinds of models are considered. "Empirical" models do not explicitly consider factors that shape protein evolution, but attempt to summarize the substitution pattern from large quantities of real data. "Mechanistic" models are formulated at the codon level and separate mutational biases at the nucleotide level from selective constraints at the amino acid level. They account for features of sequence evolution, such as transition-transversion bias and base or codon frequency biases, and make use of physicochemical distances between amino acids to specify nonsynonymous substitution rates. A general approach is presented that transforms a Markov model of codon substitution into a model of amino acid replacement. Protein sequences from the entire mitochondrial genomes of 20 mammalian species were analyzed using different models. The mechanistic models were found to fit the data better than empirical models derived from large databases. Both the mutational distance between amino acids (determined by the genetic code and mutational biases such as the transition-transversion bias) and the physicochemical distance are found to have strong effects on amino acid substitution rates. A significant proportion of amino acid substitutions appeared to have involved more than one codon position, indicating that nucleotide substitutions at neighboring sites may be correlated. Rates of amino acid substitution were found to be highly variable among sites.

4.
In free-living microorganisms, such as Escherichia coli and Saccharomyces cerevisiae, both synonymous and nonsynonymous substitution frequencies correlate with expression levels. Here, we have tested the hypothesis that the correlation between amino acid substitution rates and expression is a by-product of selection for codon bias and translational efficiency in highly expressed genes. To this end, we have examined the correlation between protein evolutionary rates and expression in the human gastric pathogen Helicobacter pylori, where the absence of selection on synonymous sites enables the two types of substitutions to be uncoupled. The results revealed a statistically significant negative correlation between expression levels and nonsynonymous substitutions in both H. pylori and E. coli. We also found that neighboring genes located on the same, but not on opposite strands, evolve at significantly more similar rates than random gene pairs, as expected by co-expression of genes located in the same operon. However, the two species differ in that synonymous substitutions show a strand-specific pattern in E. coli, whereas the weak similarity in synonymous substitutions for neighbors in H. pylori is independent of gene orientation. These results suggest a direct influence of expression levels on nonsynonymous substitution frequencies independent of codon bias and selective constraints on synonymous sites. [Reviewing Editor: Dr. Nicolas Galtier]

5.
6.
Miyazawa S. PLoS ONE, 2011, 6(3): e17244

Background

Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codon, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices.

Results

Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not so large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated not to depend very much on protein families but amino acid pairs, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.

Conclusions/Significance

The present codon-based model with the ML estimates of selective constraints and with adjustable mutation rates of nucleotide would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences.

7.
Evolution of duplicate genes in a tetraploid animal, Xenopus laevis   Total citations: 6 (self-citations: 1, citations by others: 5)
To understand the evolution of duplicate genes, we compared rates of nucleotide substitution between 17 pairs of nonallelic duplicated genes in the tetraploid frog Xenopus laevis with rates between the orthologous loci of human and rodent. For all duplicated X. laevis genes, the number of synonymous substitutions per site (dS) was greater than the number of nonsynonymous substitutions per site (dN), indicating that these genes are subject to purifying selection. There was also a significant positive correlation (r = 0.915) between dN for the X. laevis genes and dN for the mammalian genes, suggesting that, at the amino acid level, the X. laevis genes and the mammalian genes are under similar constraints. Results of relative-rate tests showed nearly equal rates of nonsynonymous substitution in each copy of the X. laevis genes; apparently there are similar constraints on both copies. No correlation was found between dS for the X. laevis genes and dS for the mammalian genes. There was a significant positive correlation both between members of pairs of duplicated X. laevis genes (r = 0.951) and between human and rodent orthologues (r = 0.854) with respect to third-position G+C content but no such relationship between the X. laevis genes and either of their mammalian orthologues. The results indicate that both copies of a duplicate gene can be subject to purifying selection and thus support the hypothesis of selection against all genotypes containing a null allele at either of two duplicate loci.

8.
A method for detecting positive selection at single amino acid sites   Total citations: 23 (self-citations: 0, citations by others: 23)
A method was developed for detecting the selective force at single amino acid sites given a multiple alignment of protein-coding sequences. The phylogenetic tree was reconstructed using the number of synonymous substitutions. Then, the neutrality was tested for each codon site using the numbers of synonymous and nonsynonymous changes throughout the phylogenetic tree. Computer simulation showed that this method accurately estimated the numbers of synonymous and nonsynonymous substitutions per site, as long as the substitution number on each branch was relatively small. The false-positive rate for detecting the selective force was generally low. On the other hand, the true-positive rate for detecting the selective force depended on the parameter values. Within the range of parameter values used in the simulation, the true-positive rate increased as the strength of the selective force and the total branch length (namely the total number of synonymous substitutions per site) in the phylogenetic tree increased. In particular, with the relative rate of nonsynonymous substitutions to synonymous substitutions being 5.0, most of the positively selected codon sites were correctly detected when the total branch length in the phylogenetic tree was ≥ 2.5. When this method was applied to the human leukocyte antigen (HLA) gene, which included antigen recognition sites (ARSs), positive selection was detected mainly on ARSs. This finding confirmed the effectiveness of the present method with actual data. Moreover, two amino acid sites were newly identified as positively selected in non-ARSs. The three-dimensional structure of the HLA molecule indicated that these sites might be involved in antigen recognition. Positively selected amino acid sites were also identified in the envelope protein of human immunodeficiency virus and the influenza virus hemagglutinin protein. 
This method may be helpful for predicting functions of amino acid sites in proteins, especially in the present situation, in which sequence data are accumulating at an enormous speed.

9.
All established methods for detecting positive selection at the molecular level rely on comparisons between nucleotide sequences. An exceptional method that purports to detect selection on the basis of a single genomic sequence has recently been proposed. This method uses a measure called "codon volatility," defined for each codon as the ratio between the number of nonsynonymous codons that differ from the codon under study at a single nucleotide position and the number of sense codons that differ from the codon under study at a single nucleotide position. Here, we examine various properties of codon volatility and its derivatives and use simulation of evolutionary processes to determine whether they can be used to detect selective pressures. Codons for only four amino acids (glycine, leucine, arginine, and serine) show any variation in codon volatility. Thus, codon volatility is mainly a proxy for amino acid usage, rather than for codon usage, with 65% of all synonymous changes and 27% of all nonsynonymous changes being undetectable by this measure. Genes identified by the volatility method as being subject to positive selection tend to have idiosyncratic amino acid compositions (e.g., they are glycine rich or arginine poor). An additional property of codon volatility is the near zero variance of its mean expectation, which translates into overestimated statistical significance estimates, especially in the absence of corrections for multiple comparisons. A comparison with measures of selection inferred through comparative methodology reveals no relationship between the results of the two methods. Finally, we show that codon volatility can increase in the absence of positive Darwinian selection; that is, increased codon volatility is not indicative of positive selection.
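The definition of codon volatility quoted in this abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function name `codon_volatility` and the decision to raise an error for stop codons are illustrative choices, not taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_volatility(codon):
    """Ratio of nonsynonymous one-step neighbours to sense one-step neighbours.

    Hypothetical helper: stop codons are rejected because the quoted
    definition only covers codons under study that encode an amino acid.
    """
    codon = codon.upper()
    own_aa = CODON_TABLE[codon]
    if own_aa == "*":
        raise ValueError("volatility is not defined for stop codons")

    sense = 0          # neighbours that are not stop codons
    nonsynonymous = 0  # sense neighbours encoding a different amino acid
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbour = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[neighbour]
            if aa == "*":
                continue
            sense += 1
            if aa != own_aa:
                nonsynonymous += 1
    return nonsynonymous / sense


if __name__ == "__main__":
    for codon in ("CGA", "AGA", "ATG"):
        print(codon, codon_volatility(codon))
```

Under this sketch the two arginine codons CGA and AGA receive different values (0.5 and 0.75), which is the kind of within-amino-acid variation the abstract refers to.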

10.
Differential selection of genes of cucumber mosaic virus subgroups   Total citations: 1 (self-citations: 0, citations by others: 1)
Cucumber mosaic virus (CMV) has an extremely broad plant-host range, a large number of vector species, and a wide geographical distribution. CMV is, therefore, a model by which to understand plant virus adaptation. The selective constraints exerted on the five proteins expressed from the CMV genome were evaluated by application of newly developed maximum-likelihood algorithms to analyze sequences available in data banks. The ratio between nonsynonymous and synonymous substitution rates (omega) was used to detect positive selection on particular codon sites. Amino acid sequences were conserved with omega ranging from 0.07 to 0.60 in different proteins. However, a small proportion of amino acids in proteins 1a, 2a, and 3b, the coat protein (CP), were positively selected (omega > 1). Moreover, the evolution of the CP in the three subgroups of CMV strains revealed different selection profiles along the sequence and significantly different speed of evolution at many positions. Constraints exerted by aphid transmission, rather than plant adaptation, seemed to be responsible for these patterns of evolution in the CP.

11.
Short protein repeats, frequently with a length between 20 and 40 residues, represent a significant fraction of known proteins. Many repeats appear to possess high amino acid substitution rates and thus recognition of repeat homologues is highly problematic. Even if the presence of a certain repeat family is known, the exact locations and the number of repetitive units often cannot be determined using current methods. We have devised an iterative algorithm based on optimal and sub-optimal score distributions from profile analysis that estimates the significance of all repeats that are detected in a single sequence. This procedure allows the identification of homologues at alignment scores lower than the highest optimal alignment score for non-homologous sequences. The method has been used to investigate the occurrence of eleven families of repeats in Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens accounting for 1055, 2205 and 2320 repeats, respectively. For these examples, the method is both more sensitive and more selective than conventional homology search procedures. The method allowed the detection in the SwissProt database of more than 2000 previously unrecognised repeats belonging to the 11 families. In addition, the method was used to merge several repeat families that previously were supposed to be distinct, indicating common phylogenetic origins for these families.

12.
Behura SK, Severson DW. Gene, 2012, 504(2): 226-232
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p < 0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeat coding for single amino acid/amino acid pair runs. Our data further suggest that genes containing simple sequence coding repeats may be under negative selection as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.
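The pairing of CAG with its synonymous codon CAA to encode polyglutamine runs can be illustrated with a short in-frame scan. This is only a sketch: the function name `polyq_runs`, the assumption that the sequence starts in frame at its first base, and the default minimum run length of four codons are all illustrative and do not come from the study.

```python
def polyq_runs(cds, min_codons=4):
    """Return (start_codon_index, length_in_codons) for each in-frame run of
    glutamine codons (CAG or CAA) that is at least `min_codons` long.

    Assumes `cds` is a coding sequence whose first base is codon position 1;
    any trailing incomplete codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    runs = []
    start = None
    # The empty sentinel codon closes a run that reaches the end of the sequence.
    for index, codon in enumerate(codons + [""]):
        if codon in ("CAG", "CAA"):
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_codons:
                runs.append((start, index - start))
            start = None
    return runs


if __name__ == "__main__":
    example = "ATG" + "CAG" * 3 + "CAA" + "CAG" + "GCT" + "CAG" * 2 + "TAA"
    print(polyq_runs(example))
```

In the example sequence, the mixed CAG/CAA run of five codons starting at codon index 1 is reported, while the later run of two CAG codons falls below the threshold and is skipped.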

13.
The selective forces acting on a protein-coding gene are commonly inferred using evolutionary codon models by contrasting the rate of nonsynonymous substitutions to the rate of synonymous substitutions. These models usually assume that the synonymous substitution rate, Ks, is homogeneous across all sites, which is justified if synonymous sites are free from selection. However, a growing body of evidence indicates that the DNA and RNA levels of protein-coding genes are subject to varying degrees of selective constraints due to various biological functions encoded at these levels. In this paper, we develop evolutionary models that account for these layers of selection by allowing for both among-site variability of substitution rates at the DNA/RNA level (which leads to Ks variability among protein-coding sites) and among-site variability of substitution rates at the protein level (Ka variability). These models are constructed so that positive selection is either allowed or not. This enables statistical testing of positive selection when variability at the DNA/RNA substitution rate is accounted for. Using this methodology, we show that variability of the baseline DNA/RNA substitution rate is a widespread phenomenon in coding sequence data of mammalian genomes, most likely reflecting varying degrees of selection at the DNA and RNA levels. Additionally, we use simulations to examine the impact that accounting for the variability of the baseline DNA/RNA substitution rate has on the inference of positive selection. Our results show that ignoring this variability results in a high rate of erroneous positive-selection inference. Our newly developed model, which accounts for this variability, does not suffer from this problem and hence provides a likelihood framework for the inference of positive selection on a background of variability in the baseline DNA/RNA substitution rate.

14.
We investigated the relationships between the nucleotide substitution rates and the predicted secondary structures in the three-state representation (alpha-helix, beta-sheet, and coil). The analysis was carried out on 34 alignments, each of which comprised sequences belonging to at least four different mammalian orders. The rates of synonymous substitution were found to be significantly different in regions predicted to be alpha-helix, beta-sheet, or coil. Likewise, the nonsynonymous rates also differ, although expectedly to a lesser extent, in the three types of secondary structure, suggesting that different selective constraints associated with the different structures are affecting in a similar way the synonymous and nonsynonymous rates. Moreover, the base composition of the third codon positions is different in coding sequence regions corresponding to different secondary structures of proteins.

15.
We estimated the intensity of selection on preferred codons in Drosophila pseudoobscura and D. miranda at X-linked and autosomal loci, using a published data set on sequence variability at 67 loci, by means of an improved method that takes account of demographic effects. We found evidence for stronger selection at X-linked loci, consistent with their higher levels of codon usage bias. The estimates of the strength of selection and mutational bias in favor of unpreferred codons were similar to those found in other species, after taking into account the fact that D. pseudoobscura showed evidence for a recent expansion in population size. We examined correlates of synonymous and nonsynonymous diversity in these species and found no evidence for effects of recurrent selective sweeps on nonsynonymous mutations, which is probably because this set of genes has much higher than average levels of selective constraints. There was evidence for correlated effects of levels of selective constraints on protein sequences and on codon usage, as expected under models of selection for translational accuracy. Our analysis of a published data set on D. melanogaster provided evidence for the effects of selective sweeps of nonsynonymous mutations on linked synonymous diversity, but only in the subset of loci that experienced the highest rates of nonsynonymous substitutions (about one-quarter of the total) and not at more slowly evolving loci. Our correlational analysis of this data set suggested that both selective constraints on protein sequences and recurrent selective sweeps affect the overall level of codon usage.

16.
There has been a controversy on whether alternatively spliced exons (ASEs) evolve faster than constitutively spliced exons (CSEs). Although it has been noted that ASEs are subject to weaker selective constraints than CSEs, so they evolve faster, there have also been studies that indicated slower evolution in ASEs than in CSEs. In this study, we retrieve more than 5,000 human-mouse orthologous exons and calculate the synonymous (KS) and nonsynonymous (KA) substitution rates in these exons. Our results show that ASEs have higher KA values and higher KA/KS ratios than CSEs, indicating faster amino acid-level evolution in ASEs. The faster evolution may be in part due to weaker selective constraints. It is also possible that the faster rate is in part due to faster functional evolution in ASEs. On the other hand, the majority of ASEs have lower KS values than CSEs. With reference to the substitution rate in introns, we show that the KS values in ASEs are close to the neutral substitution rate, whereas the synonymous substitution rate in CSEs has likely been accelerated. The elevated synonymous rate in CSEs is not related to CpG dinucleotides or low-complexity regions of protein but may be weakly related to codon usage bias. The overall trends of higher KA and lower KS in ASEs than in CSEs are also observed in human-rat and mouse-rat comparisons. Therefore, our observations hold for mammals of different molecular clocks.

17.
The biologically active state of many proteins requires their prior homo-oligomerisation. Such complexes are typically symmetrical, a feature that has been proposed to increase their stability and facilitate the evolution of allosteric regulation. We wished to examine the possibility that similar structures and properties could arise from genetic amplifications leading to internal symmetrical repeats. For this, we identified internal structural repeats in a nonredundant Protein Data Bank subset. While testing if repeats in proteins tend to be symmetrical, we found that about half of the large internal repeats are symmetrical, most frequently around a rotation axis of 180°. These repeats were most likely created by genetic amplification processes because they show significant sequence similarity. Symmetrical repeats tend to have a fixed number of copies corresponding to their rotational symmetry order, that is, two for 180° rotation axis, whereas asymmetrical repeats are in longer proteins and show copy number variability. When possible, we confirmed that proteins with symmetrical repeats folding as an n-mer have homologues lacking the repeat with a higher oligomerisation number corresponding to the rotation symmetry order of the repeat. Phylogenetic analyses of these protein families suggest that typically, but not always, symmetrical repeats arise in one single event from proteins that are homo-oligomers. These results suggest that oligomerisation and amplification of internal sequences can interplay in evolutionary terms because they result in functional analogues when the latter exhibit rotational symmetry.

18.
What are the major forces governing protein evolution? A common view is that proteins with strong structural and functional requirements evolve more slowly than proteins with weak constraints, because a stringent negative selection pressure limits the number of substitutions. In contrast, Graur claimed that the substitution rate of a protein is mainly determined by its amino acid composition and the changeabilities of amino acids. In this paper, however, we found that the relative changeabilities of amino acids in mammalian proteins are different for transmembranal and nontransmembranal segments, which have very distinct structural requirements. This indicates that the changeability of a given residue is influenced by the structural and functional context. We also reexamined the relationship between substitution rate and amino acid composition. Indeed, the two kinds of segments exhibit contrasting amino acid compositions: transmembranal regions are made up mainly of hydrophobic residues (a total frequency of approximately 60%) and are very poor in polar amino acids (<5%), whereas nontransmembranal segments have frequencies of 30% and 22%, respectively. Interestingly, we found that within a given integral membrane protein, nontransmembranal segments accumulate, on average, twice as many substitutions as transmembranal regions. However, regression analyses showed that the variability in amino acid frequencies among proteins cannot explain more than 30% of the variability in substitution rate for the transmembranal and nontransmembranal data sets. Furthermore, transmembranal and nontransmembranal segments evolving at the same rate in different proteins have different compositions, and the compositions of slowly evolving and rapidly evolving segments of the same type are similar. 
From these observations, we conclude that the rate of protein evolution is only weakly affected by amino acid composition but is mostly determined by the strength of functional requirements or selective constraints.

19.
In this work, we have investigated the relationships between synonymous and nonsynonymous rates and base composition in coding sequences from Gramineae to analyze the factors underlying the variation in substitutional rates. We have shown that in these genes the rates of nucleotide divergence, both synonymous and nonsynonymous, are, to some extent, dependent on each other and on the base composition. In the first place, the variation in nonsynonymous rate is related to the GC level at the second codon position (the higher the GC2 level, the higher the amino acid replacement rate). The correlation is especially strong with T2, the coefficients being significant in the three data sets analyzed. This correlation between nonsynonymous rate and base composition at the second codon position is also detectable at the intragenic level, which implies that the factors that tend to increase the intergenic variance in nonsynonymous rates also affect the intragenic variance. On the other hand, we have shown that the synonymous rate is strongly correlated with the GC3 level. This correlation is observed both across genes and at the intragenic level. Similarly, the nonsynonymous rate is also affected at the intragenic level by GC3 level, like the silent rate. In fact, synonymous and nonsynonymous rates exhibit a parallel behavior in relation to GC3 level, indicating that the intragenic patterns of both silent and amino acid divergence rates are influenced in a similar way by the intragenic variation of GC3. This result, taken together with the fact that the number of genes displaying intragenic correlation coefficients between synonymous and nonsynonymous rates is not very high, but higher than random expectation (in the three data sets analyzed), strongly suggests that the processes of silent and amino acid replacement divergence are, at least in part, driven by common evolutionary forces in genes from Gramineae. Received: 2 July 1998 / Accepted: 18 April 1999
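The GC2 and GC3 levels discussed in this abstract are the fractions of G or C bases at the second and third codon positions. A minimal sketch of that calculation follows; the function name `gc_at_codon_position` and the assumption that the input is an in-frame coding sequence are illustrative and not drawn from the paper.

```python
def gc_at_codon_position(cds, position):
    """Fraction of G or C bases at codon position 1, 2 or 3.

    Assumes `cds` starts at the first base of a codon; a trailing
    incomplete codon is ignored so every counted base belongs to a full codon.
    """
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    bases = seq[:usable][position - 1::3]
    return sum(base in "GC" for base in bases) / len(bases)


if __name__ == "__main__":
    example = "ATGGCCAAATTC"  # codons: ATG GCC AAA TTC
    print("GC2:", gc_at_codon_position(example, 2))
    print("GC3:", gc_at_codon_position(example, 3))
```

For the four-codon example, the third positions are G, C, A, C, so GC3 is 0.75, and the second positions are T, C, A, T, so GC2 is 0.25.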

20.
A survey of the patterns of synonymous codon preference in the HIV env gene reveals a correlation between the codon bias and the mutability requirements of different regions of the protein. At hypervariable regions in gp120 one finds a greater proportion of codons that tend to mutate nonsynonymously, but to a target that is similar in hydrophobicity and volume. We argue that this strategy results from a compromise between the selective pressure placed on the virus by the induced immune response, which favors amino acid substitutions in the complementarity determining regions, and the negative selection against missense mutations that violate structural constraints of the env protein. Received: 9 June 1997 / Accepted: 25 May 1998
