Similar Articles
20 similar articles found (search time: 15 ms)
1.
Amino acid composition and the evolutionary rates of protein-coding genes   Total citations: 14, self-citations: 0, other citations: 14
Summary Based on the rates of amino acid substitution for 60 mammalian genes of 50 codons or more, it is shown that the rate of amino acid substitution of a protein is correlated with its amino acid composition. In particular, the content of glycine residues is negatively correlated with the rate of amino acid substitution, and this content alone explains about 38% of the total variation in amino acid substitution rates among different protein families. The propensity of a polypeptide to evolve fast or slowly may be predicted from an index or indices of protein mutability directly derivable from the amino acid composition. The propensity of an amino acid to remain conserved during evolutionary time depends not so much on whether it features prominently in active sites as on its stability index, defined as the mean chemical distance [R. Grantham (1974) Science 185:862–864] between the amino acid and its mutational derivatives produced by single-nucleotide substitutions. Functional constraints related to the active and binding sites of proteins play only a minor role in determining the overall rate of amino acid substitution. The importance of amino acid composition in determining rates of substitution is illustrated with examples involving cytochrome c, cytochrome b5, ras-related genes, the calmodulin protein family, and fibrinopeptides.
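The stability index above is defined through an amino acid's single-nucleotide mutational derivatives. The following is a minimal Python sketch of that definition, assuming the standard genetic code, ignoring changes to stop codons, and weighting every possible single-base change equally (the abstract does not say how derivatives are weighted). Grantham's distance table is not reproduced: `distance` is a caller-supplied function, and all names here are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_step_derivatives(aa):
    """Amino acids reachable from any codon of `aa` by one base change.

    One entry per nonsynonymous, non-stop change, so a derivative reachable
    by several routes appears several times (equal weighting is an assumption).
    """
    derivatives = []
    for codon, encoded in CODE.items():
        if encoded != aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = CODE[codon[:pos] + base + codon[pos + 1:]]
                if mutant not in ("*", aa):
                    derivatives.append(mutant)
    return derivatives


def stability_index(aa, distance):
    """Mean chemical distance between `aa` and its single-step derivatives.

    `distance(a, b)` is supplied by the caller, e.g. a lookup into
    Grantham's (1974) distance table (not included here).
    """
    derivatives = single_step_derivatives(aa)
    return sum(distance(aa, d) for d in derivatives) / len(derivatives)
```

For example, tryptophan (TGG only) has seven non-stop single-step derivatives: R twice, G, L, S, and C twice.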

2.
Examining rates and patterns of nucleotide substitution in plants   Total citations: 19, self-citations: 0, other citations: 19
Driven by rapid improvements in affordable computing power and by the even faster accumulation of genomic data, the statistical analysis of molecular sequence data has become an active area of interdisciplinary research. Maximum likelihood methods have become mainstream because of their desirable properties and, more importantly, their potential for providing statistically sound solutions in complex data analysis settings. In this chapter, a review of recent literature on rates and patterns of nucleotide substitution in the nuclear, chloroplast, and mitochondrial genomes of plants demonstrates the power and flexibility of these new methods. The emerging picture of the nucleotide substitution process in plants is a complex one. Evolutionary rates are quite variable, both among genes and among plant lineages. However, there are hints, particularly in the chloroplast, that individual factors can have important effects on many genes simultaneously.

3.
Summary A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate that is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix suggest that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random, and that the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely random.
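The random-substitution baseline mentioned above is the Jukes and Cantor correction. The sketch below shows only its classic nucleotide form, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites; it does not implement the Dayhoff-matrix method described in the abstract, and the function names are illustrative.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from a p-distance.

    The correction is undefined for p >= 0.75 (the saturation limit
    for four equally frequent bases).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, one difference in ten sites (p = 0.1) gives d of about 0.107, slightly larger than p because multiple hits at the same site are taken into account.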

4.
Patterns of nucleotide substitution in pseudogenes and functional genes   Total citations: 26, self-citations: 0, other citations: 26
Summary The pattern of point mutations is inferred from nucleotide substitutions in pseudogenes. The pattern obtained suggests that transition mutations occur somewhat more frequently than transversion mutations and that mutations result more often in A or T than in G or C. Our results are discussed with respect to the predictions of Topal and Fresco's model for the molecular basis of point (substitution) mutations (Nature 263:285–289, 1976). The pattern of nucleotide substitution at the first and second positions of codons in functional genes is quite similar to that in pseudogenes, but the relative frequency of the transition C→T in the sense strand is drastically reduced, and those of the transversions C→G and G→C are doubled. The differences between the two patterns can be explained by the observation that, in protein evolution, amino acid substitutions occur mainly between amino acids with similar biochemical properties (Grantham, Science 185:862–864, 1974). Our results for the patterns of nucleotide substitution in pseudogenes and in functional genes lead to the prediction that both the coding and non-coding regions of protein-coding genes should have high frequencies of A and T. Available data show that the non-coding regions are indeed high in A and T, but the coding regions are low in T, though high in A.
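Inferring a substitution pattern like the one above comes down to tallying directional changes (ancestral base to derived base) and classifying each as a transition (A↔G, C↔T) or a transversion. A minimal sketch, assuming the ancestral base at each site is already known (the study's own procedure for inferring direction is not reproduced) and using hypothetical sequences:

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True for A<->G or C<->T changes (within the same chemical class)."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def tally_substitutions(ancestral, derived):
    """Count directional changes (ancestral -> derived) site by site.

    Sites with gaps or ambiguous bases are skipped.
    """
    counts = Counter()
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a != d and a in "ACGT" and d in "ACGT":
            counts[(a, d)] += 1
    return counts


def transition_fraction(counts):
    """Share of tallied changes that are transitions (NaN if none)."""
    total = sum(counts.values())
    ts = sum(n for (a, d), n in counts.items() if is_transition(a, d))
    return ts / total if total else float("nan")
```

Keeping the 12 directional counts separate, rather than pooling them, is what allows asymmetries such as a reduced C→T frequency to be seen.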

5.
Using sequence data from the last introns of the ZFX and ZFY genes, we previously estimated the male-to-female ratio (α) of mutation rate to be close to 6 in higher primates and 1.8 in rodents. As the mutation rate may vary among different regions of the mammalian genome, it is of interest to see whether sequence data from other regions give similar estimates. In this study, we determined the partial genomic sequences of the ubiquitin-activating enzyme E1 genes (Ube1x and Ube1y for the X-linked and Y-linked homologues, respectively) of mice and rats, and of two mouse Ube1y pseudogenes. From the intron sequences of the Ube1 genes, we calculated the mouse–rat divergence of the Y-linked genes (Y = 0.161) and of the X-linked genes (X = 0.107), giving a Y/X ratio of 1.50. This ratio led to an estimate of α = 2.0, with a 95% confidence interval of (1.0, 3.9). Similar estimates of α were obtained when the mouse Ube1y pseudogenes were used instead of the mouse Ube1y functional gene. These estimates are consistent with our previous estimate for rodents and suggest that the sex ratio of mutation rate in rodents is only about one-third of that in higher primates. Our estimate of the divergence time between Ube1x and Ube1y supports the view that the two genes separated before the eutherian radiation. Correspondence to: W.-H. Li
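The abstract does not state how α is derived from the Y/X ratio. A common choice is the relation of Miyata et al. (1987), Y/X = 3α/(2 + α), which assumes the Y chromosome is always carried by males and the X chromosome one third of the time; it reproduces the reported α = 2.0 from Y/X = 1.50. A sketch, assuming that relation (function names are illustrative):

```python
def alpha_from_ratio(y_over_x):
    """Male-to-female mutation-rate ratio from the Y/X divergence ratio.

    Assumes Y/X = 3 * alpha / (2 + alpha), i.e. Y always in males and
    X in males one third of the time (Miyata et al. 1987).
    """
    if not 0.0 <= y_over_x < 3.0:
        raise ValueError("Y/X must lie in [0, 3)")
    return 2.0 * y_over_x / (3.0 - y_over_x)


def ratio_from_alpha(alpha):
    """Inverse relation: expected Y/X divergence ratio for a given alpha."""
    return 3.0 * alpha / (2.0 + alpha)
```

Plugging in the unrounded divergences (0.161 / 0.107, about 1.505) gives α of about 2.01. Because Y/X approaches 3 as α grows, small errors in Y/X translate into wide intervals on α, which is consistent with the broad (1.0, 3.9) interval reported.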

6.
Molecular evolution, including nucleotide substitution, plays an important role in understanding the dynamics and mechanisms of species evolution. Here, we sequenced the whole plastid genomes (plastomes) of Quercus fabri, Quercus semecarpifolia, Quercus engleriana, and Quercus phellos and compared them with 14 other Quercus plastomes to explore their evolutionary relationships using 67 shared protein-coding sequences. While many previously identified evolutionary relationships were recovered, our findings do not support previous research that recovered Quercus subg. Cerris sect. Ilex as a monophyletic group: sect. Ilex was found to be polyphyletic, composed of three strongly supported lineages inserted between sections Cerris and Cyclobalanopsis. Compared with gymnosperms, Quercus plastomes showed higher evolutionary rates (dN/dS = 0.3793). Most protein-coding genes experienced relaxed purifying selection, and the high dN value (0.1927) indicated that gene functions adjusted effectively to environmental changes. Our findings suggest that intergenic regions play an important role in Quercus evolution. We detected greater variation in the intergenic regions (trnH-psbA, trnK_UUU-rps16, trnfM_CAU-rps14, trnS_GCU-trnG_GCC, and atpF-atpH), intron losses (petB and petD), and pseudogene loss and degradation (ycf15). Additionally, the loss of some genes suggests gene exchange between the plastid and nuclear genomes, which affects the evolutionary rate of the former. However, the mechanism connecting these two genomes is still unclear.

7.
8.
Summary The nucleotide substitution rate in structural portions of the embryonic β-globin genes of placental mammals is lower than that for the adult β-globin genes. This difference occurs entirely within the class of substitutions that result in nonsynonymous (replacement) differences between these genes, and therefore represents a constraint on the structure of the mammalian embryonic β-globin proteins relative to the adult proteins (Shapiro et al. 1983; Hardison 1984). A similar effect has also been observed in marsupial mammals (Koop and Goodman 1988). In an effort to determine whether the observed rates are evidence of a uniform degree of selective constraint on the embryonic β-globin genes, analyses were performed that compared replacement substitution rates. The analyses reveal that embryonic β-globin genes appear to have been fixing replacement substitutions at nearly the same average rate not only in placental and marsupial mammals but in avian and amphibian species as well. In contrast, the adult β-globin genes from these organisms appear to have a more variable rate of replacement substitution with an especially low rate for birds. In the chicken (Gallus gallus), the adult β-globin gene replacement substitution rate appears to be lower than the embryonic replacement substitution rate.

9.
Summary We subjected 35 rbcL nucleotide sequences from monocotyledonous taxa to maximum likelihood relative rate tests and estimated relative differences in rates of nucleotide substitution between groups of sequences without relying on knowledge of divergence times between taxa. Rate tests revealed a hierarchy of substitution rates at the rbcL locus within the monocots. Among the taxa analyzed, the grasses have the most rapid substitution rate, followed by the Orchidales, the Liliales, the Bromeliales, and the Arecales. The overall substitution rate for the rbcL locus of grasses is over 5 times that of the palms, and the substitution rate at third codon positions in grass rbcL is over 8 times the third-position rate in the palms. The pattern of rate variation is consistent with the generation-time-effect hypothesis. Heterogeneous rates of substitution have important implications for phylogenetic reconstruction. Offprint requests to: M.T. Clegg
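The study used maximum likelihood relative rate tests. A simpler, non-parametric test with the same goal (comparing two lineages against an outgroup without needing divergence times) is Tajima's (1993) 1D test, sketched here as a swapped-in illustration rather than the study's method. It counts sites where only one ingroup sequence has changed and asks whether those counts differ more than chance allows; the sequences in the test are hypothetical.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's 1D relative rate test for three aligned sequences.

    m1: sites where only seq1 differs from the outgroup (seq2 matches it).
    m2: sites where only seq2 differs (seq1 matches the outgroup).
    Under equal rates E[m1] = E[m2], and (m1 - m2)^2 / (m1 + m2) is
    approximately chi-square with 1 degree of freedom.
    """
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if b == o and a != o:
            m1 += 1
        elif a == o and b != o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value
```

A likelihood-based test, as in the study, can additionally correct for multiple hits and unequal base composition, which this counting test does not.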

10.
Summary A method for estimating the evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences is presented. This method is applied to genes of the φX174 and G4 genomes, histone genes, and β-globin genes, for which homologous nucleotide sequences are available for comparison. It is shown that the rates of synonymous substitution are quite uniform among the non-overlapping genes of φX174 and G4 and among the histone genes H4, H2B, H3 and H2A. A comparison between φX174 and G4 reveals that, in the overlapping segments of the A gene, the rate of synonymous substitution is reduced more significantly than the rate of amino acid substitution, relative to the corresponding rates in the non-overlapping segment. It is also suggested that rigid secondary structures exist in the coding regions surrounding the splicing points of the intervening sequences of β-globin genes. Only in these regions do the β-globin genes show a slowdown of the evolutionary rates of both synonymous and amino acid substitutions in the primate line.
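Methods for estimating synonymous and amino acid (nonsynonymous) substitution rates begin by counting how many sites in a sequence are synonymous. The sketch below uses the Nei-Gojobori (1986) style of site counting, not the procedure of the abstract above, and assumes the standard genetic code with changes to stop codons counted as nonsynonymous (implementations differ on this convention). Names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order, '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites in one sense codon (upper-case DNA).

    Each of the 9 single-base neighbours contributes 1/3 of a site if it
    encodes the same amino acid; changes to stops count as nonsynonymous.
    """
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no defined site counts")
    syn = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos] and CODE[codon[:pos] + base + codon[pos + 1:]] == aa:
                syn += 1
    return syn / 3.0


def site_counts(cds):
    """Total synonymous (S) and nonsynonymous (N) sites in an in-frame sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    s = sum(synonymous_sites(c) for c in codons)
    return s, 3 * len(codons) - s
```

For instance, TTT (Phe) has 1/3 of a synonymous site (only TTC keeps Phe), while CTG (Leu) has 4/3 (TTG plus all three third-position changes keep Leu).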

11.
Three frequently used methods for estimating the synonymous and nonsynonymous substitution rates (Ks and Ka), denoted LWL85, LPB93, and GY94, were evaluated and compared for their accuracy. For this purpose, we used a codon-evolution model to obtain the expected Ka and Ks values for each of the three methods and compared these expected values with the estimates the methods actually produced. We also proposed some modifications of LWL85 and LPB93 to increase their accuracy. Our computer simulations under the codon-evolution model showed that for sequences of ≤300 codons, the performance of GY94 may not be reliable. For longer sequences, GY94 estimates the Ka/Ks ratio more accurately than the modified LPB93 and LWL85 in the majority of the cases studied. This is particularly so when k, the transition/transversion (mutation) rate ratio, is ≥3. However, when k is approximately 2 and the sequence divergence is relatively large, the modified LWL85 performed better than GY94 and the modified LPB93. The inferiority of LPB93 to LWL85 is surprising because LPB93 was intended to improve on LWL85. Also, it has been thought that the codon-based method of GY94 is better than the heuristic method of LWL85, but our simulation results showed that in many cases the opposite was true, even though our simulation was based on the codon-evolution model.
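LWL85-type methods, as commonly described, apply Kimura's (1980) two-parameter correction, which treats transitional differences (proportion P) and transversional differences (proportion Q) separately; this is why the transition/transversion ratio k matters in the comparison above. The sketch implements only that distance correction, not LWL85's classification of sites by degeneracy or the modified methods proposed in the study.

```python
import math


def kimura_2p(P, Q):
    """Kimura two-parameter distance (substitutions per site).

    P: observed proportion of transitional differences.
    Q: observed proportion of transversional differences.
    """
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("divergence too large for the two-parameter correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

When P = Q/2 (no transition bias), this reduces to the Jukes-Cantor distance for p = P + Q; a strong transition bias makes the two corrections diverge.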

12.
It has been demonstrated that recombination in the human p-arm pseudoautosomal region (p-PAR) is at least twenty times more frequent than the genomic average of approximately 1 cM/Mb, which may affect substitution patterns and rates in this region. Here I report the analysis of substitution patterns and rates in 10 human, chimpanzee, gorilla, and orangutan genes across the p-PAR. Between-species silent divergence in the p-PAR forms a gradient, increasing toward the telomere. The correlation of silent divergence with distance from the p-PAR boundary is highly significant (rho = 0.911, P < 0.001). After exclusion of CpG dinucleotides, this correlation is still significant (rho = 0.89, P < 0.01); thus, the substitution rate gradient cannot be explained solely by differences in the extent of methylation across the p-PAR. Frequent recombination in the PAR may result in a relatively strong effect of biased gene conversion (BGC), which may affect substitution rates because it increases the probability of fixation of G or C nucleotides at (A or T)/(G or C) segregating sites. BGC, however, does not seem to be the factor creating the substitution rate gradient in the p-PAR, because the gradient is still detectable when only A↔T and G↔C substitutions are taken into account (rho = 0.82, P < 0.01). I hypothesize that the substitution rate gradient in the p-PAR is due to the mutagenic effect of recombination, which is very frequent in the distal human p-PAR and might be lower near the p-PAR boundary.
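The rho values above are Spearman rank correlations between silent divergence and distance from the p-PAR boundary. A minimal pure-Python sketch of Spearman's rho, computed as the Pearson correlation of ranks with ties given averaged ranks; inputs would be per-gene distances and divergences, the numbers in the test are hypothetical, and no P value is computed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for a constant sample")
    return cov / (vx * vy) ** 0.5
```

Using ranks makes the statistic sensitive to any monotone gradient, not only a linear one, which suits a claim that divergence simply increases toward the telomere.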

13.
A comparative study of the last exon of the zinc finger genes Zfx, Zfy, and Zfa from species of mice in the genus Mus was conducted to assess the extent of gene-specific and chromosome-specific effects on the evolutionary patterns among related X-, Y-, and autosome-linked genes. Phylogenetic analyses of 29 sequences of Zfx, Zfa, and Zfy from 10 taxa were performed to infer relatedness among the zinc finger loci, and codon-based maximum likelihood analyses were conducted to assess evolutionary patterns among genes. Five models of nucleotide sequence evolution were applied and compared using likelihood ratio tests. Estimates of the ratio of nonsynonymous to synonymous substitution rates (dN/dS) for these genes suggest that amino acid substitutions are occurring at a more rapid rate in the autosomal- and Y-specific lineages than in the X-specific lineage, with the Y-specific lineage showing the highest rate under certain models. The data suggest the action of gene-specific effects on evolutionary pattern. In particular, the Zfa and Zfy genes, both with presumed restricted expression, appear less functionally constrained than the ubiquitously expressed Zfx. Slightly elevated dN/dS for Zfy genes in comparison with Zfa also suggests Y-specific effects.
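Nested models like those compared here are commonly tested with a likelihood ratio test: twice the log-likelihood difference is compared with a chi-square distribution whose degrees of freedom equal the difference in free parameters. A self-contained sketch with an exact chi-square upper tail for integer degrees of freedom; the log-likelihoods in the test are hypothetical, and which model pairs the study compared is not stated in the abstract.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(x) * math.exp(-half) * math.sqrt(2.0 / math.pi)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * (lnL_alt - lnL_null), P value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)
```

For example, an alternative model with two extra parameters that improves the log-likelihood by 3 units gives a statistic of 6.0 and P = exp(-3), about 0.0498.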

14.
Genes can be classified as essential or nonessential based on their indispensability for a living organism. Previous research has suggested that essential genes evolve more slowly than nonessential genes, but that the impact of gene dispensability on a gene's evolutionary rate is not as strong as expected. However, findings have not been consistent, and the evidence regarding the relationship between gene indispensability and the rate of gene evolution is controversial. Understanding how different classes of genes evolve is essential for a full understanding of evolutionary biology, and may have medical relevance in the design of new antibacterial agents. We therefore investigated the properties of essential and nonessential genes. Analysis of evolutionary conservation, protein length distribution, and amino acid usage of essential and nonessential genes in Escherichia coli K12 demonstrated that essential genes are relatively well conserved throughout the bacterial kingdom compared with nonessential genes. Furthermore, the results show that essential genes, compared with nonessential genes, have a significantly higher proportion of large (>534 amino acids) and small (<139 amino acids) proteins relative to medium-sized proteins. The pattern of amino acid usage shows a similar trend for essential and nonessential genes, although some notable exceptions are observed. These findings help to clarify the evolutionary mechanisms of essential and nonessential genes, are relevant to the study of mutagenesis, and may allow prediction of gene properties in other poorly understood organisms.

15.
We have estimated phylogenetic patterns and rates of nucleotide substitution in the hominoid primates using two different probabilistic models of molecular evolution applied to three different data sets of nucleic acid sequences. The orang-utan was found to be the out-group of the other hominoids examined. Within the clade of African apes and human, the sister-group relationship of chimpanzee and human was found to be statistically the best, although the magnitude of the error estimates (a reflection of random statistical fluctuations) makes this conclusion tentative. The ψν-globin data sets were found to be statistically the most consistent and gave estimates of the times of divergence of chimpanzee and human from gorilla, and of chimpanzee from human, of 7.7 ± 1.5 Ma (million years ago) and 7.4 ± 1.5 Ma, respectively, although the speculative nature of these estimates is emphasized. In all cases the calibration point was the assumed divergence of the orang-utan from the remaining hominoids at 14.5 Ma. There was no statistically significant evidence of a slowdown in nucleotide substitution rate for the human lineage, or among the hominoids as a whole relative to the Old and New World monkeys. We advocate the continued use and development of stochastic models of molecular evolution as a basis for phylogenetic estimation. On this basis, one can choose between competing hypotheses of relationship in a statistical manner and provide estimates of the errors involved in such estimations. The assumptions of all stochastic models are open to testing and future refinement.
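Calibrated dates like those above convert genetic distance to time using one split of assumed age, here the orang-utan split at 14.5 Ma. Under a strict molecular clock this reduces to a proportion, sketched below. The study itself used probabilistic models with error estimates, which this sketch does not reproduce, and the distances in the test are hypothetical.

```python
def calibrated_time(d_target, d_calibration, t_calibration=14.5):
    """Divergence time (Ma) under a strict molecular clock.

    d_target and d_calibration are pairwise distances (substitutions per
    site) for the split of interest and for the calibration split, whose
    age t_calibration (Ma) is assumed known. A constant rate makes
    distance proportional to time since divergence.
    """
    if d_calibration <= 0:
        raise ValueError("calibration distance must be positive")
    return t_calibration * d_target / d_calibration
```

For example, a pairwise distance half as large as the calibration distance maps to 7.25 Ma. Any error in the assumed calibration age scales every estimate by the same factor, one reason the abstract stresses that its dates are speculative.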

16.
Rates of molecular evolution are known to vary considerably among lineages, partly due to differences in life-history traits such as generation time. The generation-time effect has been well documented in some eukaryotes, but its prevalence in prokaryotes is unknown. Because many species of Firmicute bacteria spend long periods of time as metabolically dormant spores, which could result in fewer DNA substitutions per unit time, they present an excellent system for testing predictions of the molecular clock hypothesis. To test whether spore-forming bacteria evolve more slowly than their non-spore-forming relatives, I used phylogenetic methods to determine whether rates of amino acid substitution differ between spore-forming and non-spore-forming lineages of Firmicute bacteria. Although rates of evolution do vary among lineages, I find no evidence for an effect of spore formation on evolutionary rate; furthermore, evolutionary rates are similar to those calculated for enteric bacteria. These results support the notion that variation in generation time does not affect evolutionary rates in bacterial lineages.

17.
Summary The G+C content of DNA varies widely in different organisms, especially microorganisms. This variation is accompanied by changes in the nucleotide composition of silent positions in codons. (Silent positions are defined and explained in the text.) These changes are mostly neutral or nearly neutral, and appear to result from mutation pressure in the direction of increasing either A+T (AT pressure) or G+C (GC pressure) content. Variations in G+C content are also accompanied by substitutions at replacement positions in codons. These substitutions produce changes in the amino acid content of homologous proteins. The examples studied were genes for 13 mitochondrial proteins in five species, and the A and B genes for bacterial tryptophan synthase in four species. In microorganisms, varying AT and GC mutational pressures, presumably resulting from shifts in the DNA polymerase system, exert strong effects on molecular evolution by changing the G+C content of DNA. These effects may be greater than those of random drift. The effects of GC pressure on silent substitutions in the systems examined are several times as great as the effects on replacement substitutions. GC pressure is exerted on noncoding as well as coding regions in mitochondrial DNA. This is shown by the close correlation (correlation coefficient, 0.99) of the G+C content of the noncoding D loop of mitochondria with the G+C content of silent positions in the corresponding mitochondrial genes.
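The D-loop comparison above correlates two G+C measurements across species. A minimal sketch follows, using G+C at third codon positions as a rough proxy for silent positions (not every third-position change is silent, so this is an approximation, not the paper's definition) together with a Pearson correlation; the sequences in the test are hypothetical.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous bases (NaN if there are none)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else float("nan")


def gc3(cds):
    """G+C content at third codon positions of an in-frame coding sequence."""
    return gc_content(cds[2::3])


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

In use, one would compute `gc_content` of each species' D loop and `gc3` of its protein genes, then pass the two per-species lists to `pearson_r`.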

18.
Using mammalian gene sequences, the variances in the numbers of synonymous and nonsynonymous substitutions among genes were estimated, together with the correlation coefficient between the two. The expected correlation coefficient can be obtained under the neutral theory from these estimated variances. The expected coefficient is often found to be one-half to two-thirds of the observed value. Possible causes for the disagreement are discussed, such as correlated selective constraints on the two types of substitutions and excess doublet mutations. The variance of mutation rate and that of selective constraint were also estimated. The results show that the coefficient of variation of the former is 0.2–0.3, whereas that of the latter is 0.7–0.9. Correspondence to: T. Ohta

19.

Background

Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more and more novel variants are discovered. Bioinformatics analysis is essential for identifying AAS that may cause or contribute to human disease for further analysis, because each genome carries thousands of rare or private AAS, of which only a very small number are related to an underlying disease. Existing algorithms in this field still have a high false prediction rate, and new development is needed to take full advantage of the vast amount of genomic data.

Results

Here we report a novel algorithm that features two innovations: 1. making better use of sequence conservation information by grouping the homologous protein sequences into six blocks according to their evolutionary distance to human and evaluating sequence conservation in each block independently, and 2. including as many such homologous sequences as possible in the analyses. Random forests are used to evaluate sequence conservation in each block and to predict the potential impact of an AAS on protein function. Testing of this algorithm on a comprehensive dataset showed a significant improvement in prediction accuracy over currently widely used programs. The algorithm and a web-based application tool implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), are freely available to the public (http://paed.hku.hk/efin/).

Conclusions

Grouping homologous sequences into different blocks according to the evolutionary distance of the species to human and evaluating sequence conservation in each group independently significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human disease and health.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-455) contains supplementary material, which is available to authorized users.
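The block-wise conservation idea described in the Results of this abstract can be sketched as follows. Everything here is hypothetical: the distance cut-offs, the single-position score (fraction of homologs in a block that carry the human residue), and the function name. EFIN's actual features and its random-forest stage are not reproduced; the per-block scores for one alignment position are simply the kind of feature vector a classifier could consume.

```python
from collections import Counter


def conservation_by_block(human_residue, homologs, edges):
    """Per-block fraction of homologs carrying the human residue.

    homologs: list of (distance_to_human, residue_at_aligned_position).
    edges: ascending upper distance bounds, one per block (hypothetical
    cut-offs, e.g. six values for six blocks). Homologs beyond the last
    edge are ignored. Returns one value per block, None if a block is empty.
    """
    totals, matches = Counter(), Counter()
    for dist, residue in homologs:
        for block, upper in enumerate(edges):
            if dist <= upper:
                totals[block] += 1
                matches[block] += (residue == human_residue)
                break
    return [matches[b] / totals[b] if totals[b] else None
            for b in range(len(edges))]
```

Scoring each block separately keeps a residue that is conserved only among close relatives distinguishable from one conserved across distant species, information a single pooled conservation score would blur.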

20.
Summary The observed gene overlays in the viruses φX174 and SV40 show a surprising economy of information storage: two different amino acid sequences are read in different frames from the same stretch of DNA. This phenomenon appears contradictory, in that the information in the two overlaid amino acid sequences is strongly interdependent, yet each of the two proteins has evolved to its own well-defined function. The contradiction can be resolved by assuming a sufficiently large degeneracy of the information content of amino acid sequences with respect to function. Such degeneracy is familiar from homologous proteins, where a given biological function is implemented by many different amino acid sequences. It is shown that the very existence of viral overlays allows one to derive a lower limit for the magnitude of this degeneracy: the degeneracy is at least fourfold; that is, on average, at each position of the chain a choice of 1 out of 5 or fewer amino acids, rather than 1 out of 20, is necessary for constructing a protein with a specified function. In addition, the strong dependence of overlay probabilities on chain length allows the definition of a maximal overlay length; in bacterial viruses, overlay regions should be shorter than about 150 residues.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417