Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Unbiased estimation of evolutionary distance between nucleotide sequences   (total citations: 7; self-citations: 2; citations by others: 5)
A new algorithm for estimating the number of nucleotide substitutions per site (i.e., the evolutionary distance) between two nucleotide sequences is presented. This algorithm can be applied to many estimation methods, such as Jukes and Cantor's method, Kimura's transition/transversion method, and Tajima and Nei's method. Unlike ordinary methods, this algorithm is always applicable. Numerical computations and computer simulations indicate that this algorithm gives an almost unbiased estimate of the evolutionary distance, unless the evolutionary distance is very large. This algorithm should be especially useful for analyzing short nucleotide sequences. It can also be applied to amino acid sequences, for estimating the number of amino acid replacements.
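For reference, the "ordinary" Jukes and Cantor estimate mentioned above can be sketched as follows. This is not the new algorithm of the abstract; it is the classical formula d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites, and it illustrates why an ordinary method becomes inapplicable once p reaches 3/4. The function names are illustrative assumptions.

```python
import math

NUCLEOTIDES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among aligned, unambiguous positions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
                if a in NUCLEOTIDES and b in NUCLEOTIDES]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in compared) / len(compared)


def jukes_cantor_distance(p):
    """Classical Jukes-Cantor estimate; None when the formula is inapplicable."""
    if p >= 0.75:
        return None  # the argument of the logarithm would be <= 0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With short sequences, sampling error alone can push p to 3/4 or beyond, which is the situation in which the sketch returns `None`.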

2.
Summary: A formal mathematical analysis of Kimura's (1981) six-parameter model of nucleotide substitution for the case of unequal substitution rates among different pairs of nucleotides is conducted, and new formulae for estimating the number of nucleotide substitutions and its standard error are obtained. By using computer simulation, the validity and utility of Jukes and Cantor's (1969) one-parameter formula, Takahata and Kimura's (1981) four-parameter formula, and our six-parameter formula for estimating the number of nucleotide substitutions are examined under three different schemes of nucleotide substitution. It is shown that the one-parameter and four-parameter formulae often give underestimates when the number of nucleotide substitutions is large, whereas the six-parameter formula generally gives a good estimate for all three substitution schemes examined. However, when the number of nucleotide substitutions is large, the six-parameter and four-parameter formulae are often inapplicable unless the number of nucleotides compared is extremely large. It is also shown that as long as the mean number of nucleotide substitutions is smaller than one per nucleotide site, the three formulae give more or less the same estimate regardless of the substitution scheme used. Author note: On leave of absence from the Department of Biology, Faculty of Science, Kyushu University 33, Fukuoka 812, Japan.

3.
Summary: Conducting computer simulations, Nei and Tateno (1978) have shown that Jukes and Holmquist's (1972) method of estimating the number of nucleotide substitutions tends to give an overestimate and the estimate obtained has a large variance. Holmquist and Conroy (1980) repeated some parts of our simulation and claim that the overestimation of nucleotide substitutions in our paper occurred mainly because we used selected data. Examination of Holmquist and Conroy's simulation indicates that their results are essentially the same as ours when the Jukes-Holmquist method is used, but since they used a different method of computation their estimates of nucleotide substitutions differed substantially from ours. Another problem in Holmquist and Conroy's Letter is that they confused the expected number of nucleotide substitutions with the number in a sample. This confusion has resulted in a number of unnecessary arguments. They also criticized our X² measure, but this criticism is apparently due to a misunderstanding of the assumptions of our method and a failure to use our method in the way we described. We believe that our earlier conclusions remain unchanged.

4.
There are three different methods of estimating the number of nucleotide substitutions between a pair of species from amino acid sequence data, i.e., the Poisson correction method, the random evolutionary hit method, and counting the actual but minimum number of nucleotide substitutions. In this paper the relationships among the estimates obtained by these methods are studied empirically. The results obtained indicate that there is a high correlation among these estimates and in practice any of the three methods may be used for constructing evolutionary trees or relating nucleotide substitutions to evolutionary time. The effects of varying rates of nucleotide substitution among different sites on the Poisson correction and random evolutionary hit methods are also studied mathematically. It is shown that these two methods are quite insensitive to the variation of the rate of nucleotide substitution.
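The first of the three methods, the Poisson correction, is commonly written as d = -ln(1 - p), where p is the proportion of amino acid sites that differ between the two sequences. The sketch below assumes that standard form; the function name is an illustrative assumption and the other two methods are not shown.

```python
import math


def poisson_correction(p):
    """Poisson-corrected number of replacements per site from proportion p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    # Multiple hits at the same site make the corrected value exceed p.
    return -math.log(1.0 - p)
```

For small p the corrected value is close to p itself; the correction grows as the sequences diverge.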

5.
Codon Substitution in Evolution and the "Saturation" of Synonymous Changes   (total citations: 4; self-citations: 1; citations by others: 3)
Takashi Gojobori. Genetics, 1983, 105(4): 1011-1027
A mathematical model for codon substitution is presented, taking into account unequal mutation rates among different nucleotides and purifying selection. This model is constructed by using a 61 × 61 transition probability matrix for the 61 nonterminating codons. Under this model, a computer simulation is conducted to study the numbers of silent (synonymous) and amino acid-altering (nonsynonymous) nucleotide substitutions when the underlying mutation rates among the four kinds of nucleotides are not equal. It is assumed that the substitution rates are constant over evolutionary time, the codon frequencies being in equilibrium, and, thus, the numbers of synonymous and nonsynonymous substitutions both increase linearly with evolutionary time. It is shown that, when the mutation rates are not equal, the estimate of synonymous substitutions obtained by F. Perler, A. Efstratiadis, P. Lomedico, W. Gilbert, R. Kolodner and J. Dodgson's "Percent Corrected Divergence" method increases nonlinearly, although the true number of synonymous substitutions increases linearly. It is, therefore, possible that the "saturation" of synonymous substitutions observed by Perler et al. is due to the failure of their method to detect all synonymous substitutions.

6.
Estimation of evolutionary distance between nucleotide sequences   (total citations: 34; self-citations: 9; citations by others: 25)
A mathematical formula for estimating the average number of nucleotide substitutions per site (δ) between two homologous DNA sequences is developed by taking into account unequal rates of substitution among different nucleotide pairs. Although this formula is obtained for the equal-input model of nucleotide substitution, computer simulations have shown that it gives a reasonably good estimate for a wide range of nucleotide substitution patterns as long as δ is equal to or smaller than 1. Furthermore, the frequency of cases to which the formula is inapplicable is much lower than that for other similar methods recently proposed. This point is illustrated using insulin genes. A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed. Application of this method to globin gene data indicates that the number of nucleotide changes per site increases with evolutionary time but the pattern of the increase is quite irregular.
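To make the equal-input idea concrete, here is a minimal sketch of the simplest equal-input distance, d = -b ln(1 - p/b) with b = 1 - Σ qᵢ², where qᵢ are the pooled nucleotide frequencies and p is the proportion of differing sites. This is an assumption-laden illustration: the formula developed in the abstract may include further terms that are not reproduced here, and the function name is hypothetical.

```python
import math
from collections import Counter


def equal_input_distance(seq_a, seq_b):
    """Simplest equal-input distance; None when the formula is inapplicable."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    p = sum(a != b for a, b in pairs) / n
    if p == 0.0:
        return 0.0
    # Pooled base frequencies over both sequences (2n bases in total).
    counts = Counter(base for pair in pairs for base in pair)
    b = 1.0 - sum((counts[base] / (2 * n)) ** 2 for base in "ACGT")
    if p >= b:
        return None  # the argument of the logarithm would be <= 0
    return -b * math.log(1.0 - p / b)
```

When all four bases are equally frequent, b equals 3/4 and the expression reduces to the Jukes and Cantor formula.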

7.
On the constancy of the evolutionary rate of cistrons   (total citations: 32; self-citations: 0; citations by others: 32)
Summary: The variations of evolutionary rates in hemoglobins and cytochrome c among various lines of vertebrates are analysed by estimating the variance. The observed variances appear to be larger than expected purely by chance. If the amino acid substitutions in evolution are the result of random fixation of selectively neutral or nearly neutral mutations, the evolutionary rate of cistrons can be represented by the integral of the product of mutation rate and fixation probability in terms of selective values around the neutral point. This integral is called the effective neutral mutation rate. The influence of effective population number and generation time on the effective neutral mutation rate is discussed. It is concluded that the uniformity of the rate of amino acid substitutions over diverse lines is compatible with random fixation of neutral or very slightly deleterious mutations which have some chance of being selected against during the course of substitution. On the other hand, definitely advantageous mutations will introduce significant variation in the substitution rate among lines. Approximately 10% of the amino acid substitutions of average cistrons might be adaptive and create slight but significant variations in evolutionary rate among vertebrate lines, although the uniformity of evolutionary rate is still valid as a first approximation. Contribution No. 813 from the National Institute of Genetics, Mishima, Shizuoka-ken 411, Japan. Aided in part by a grant-in-aid from the Ministry of Education, Japan.

8.
Miyazawa S. PLoS ONE, 2011, 6(12): e28892
BACKGROUND: A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. RESULTS: The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals.
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.
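The model comparison by information criteria mentioned above can be sketched as follows, using the textbook definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L (k free parameters, n observations, ln L the maximized log-likelihood). The function names and the `fits` structure are illustrative assumptions, not part of the cited work.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_observations):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_observations) - 2.0 * log_likelihood


def best_model(fits, n_observations):
    """fits maps a model name to (log_likelihood, n_params).

    Returns the names preferred by AIC and by BIC, in that order.
    """
    by_aic = min(fits, key=lambda m: aic(*fits[m]))
    by_bic = min(fits, key=lambda m: bic(*fits[m], n_observations))
    return by_aic, by_bic
```

Both criteria penalize extra parameters, so a richer model is preferred only when its gain in log-likelihood outweighs the penalty; BIC penalizes more heavily as the number of observations grows.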

9.
Mitochondrial DNA (mtDNA) sequences are widely used for inferring the phylogenetic relationships among species. Clearly, the assumed model of nucleotide or amino acid substitution used should be as realistic as possible. Dependence among neighboring nucleotides in a codon complicates modeling of nucleotide substitutions in protein-encoding genes. It seems preferable to model amino acid substitution rather than nucleotide substitution. Therefore, we present a transition probability matrix of the general reversible Markov model of amino acid substitution for mtDNA-encoded proteins. The matrix is estimated by the maximum likelihood (ML) method from the complete sequence data of mtDNA from 20 vertebrate species. This matrix represents the substitution pattern of the mtDNA-encoded proteins and shows some differences from the matrix estimated from the nuclear-encoded proteins. The use of this matrix would be recommended in inferring trees from mtDNA-encoded protein sequences by the ML method. Received: 3 May 1995 / Accepted: 31 October 1995

10.
New methods for estimating the numbers of synonymous and nonsynonymous substitutions per site were developed. The methods are unweighted pathway methods based on Kimura's two-parameter model. Computer simulations were conducted to evaluate the accuracies of the new methods, Nei and Gojobori's (NG) method, Miyata and Yasunaga's (MY) method, Li, Wu, and Luo's (LWL) method, and Pamilo, Bianchi, and Li's (PBL) method. The following results were obtained: (1) The NG, MY, and LWL methods give overestimates of the number of synonymous substitutions and underestimates of the number of nonsynonymous substitutions. The major cause for the biased estimation is that these three methods underestimate the number of synonymous sites and overestimate the number of nonsynonymous sites. (2) The PBL method gives better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. (3) The new methods also give better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. In addition, estimates of the numbers of synonymous and nonsynonymous sites obtained by the new methods are reasonably accurate. (4) In some cases, the new methods and the PBL method give biased estimates of substitution numbers. However, from the number of nucleotide substitutions at the third position of codons, we can examine whether estimates obtained by the new methods are good or not, whereas we cannot make such an examination of estimates obtained by the PBL method. (5) When there are strong transition/transversion and nucleotide-frequency biases, as in mitochondrial genes, all of the above methods give biased estimates of substitution numbers. In such cases, Kondo et al.'s method is recommended for estimating the number of synonymous substitutions, although their method cannot estimate the number of nonsynonymous substitutions and is time-consuming.
These results, particularly result (1), call for reexaminations of some genes. This is because evolutionary pictures of genes have often been discussed on the basis of results obtained by the NG, MY, and LWL methods, which are favorable for the neutral theory of molecular evolution.

11.
Tests of applicability of several substitution models for DNA sequence data   (total citations: 8; self-citations: 3; citations by others: 5)
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
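One of the simple models named above, Kimura's (1980) two-parameter model, yields the well-known distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. The sketch below implements only this distance, not the invariant-based test statistics of the abstract; the function names are illustrative assumptions.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def transition_transversion_proportions(seq_a, seq_b):
    """Return (P, Q): proportions of transitional and transversional sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    transitions = transversions = 0
    for a, b in pairs:
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions / len(pairs), transversions / len(pairs)


def kimura_2p_distance(P, Q):
    """Kimura two-parameter distance; None when a logarithm is undefined."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        return None
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

When Q = 2P, as expected under equal rates for all substitution types, the expression coincides with the Jukes and Cantor distance for p = P + Q, which is one way to see that the one-parameter model is a special case.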

12.
Summary: In response to criticism of REH theory (Fitch 1980), Holmquist and Jukes (1981) have mostly avoided the criticism or misunderstood it. Since they themselves state in their response that "amino acid sequence data alone cannot be used to estimate total nucleotide substitutions," they agree with the criticism. Most of their paper treats the newer theory (here designated as the REHN theory) which attempts to use the nucleotide sequences encoding proteins to better estimate total nucleotide substitutions (Holmquist and Pearl 1980). Since I made no criticism of REHN theory, their comments are frequently beside the point of my original criticism of REH theory. Nevertheless, it is shown here that REHN theory is also unsatisfactory in that: One, the varions are now more clearly defined but in such a way as to preclude the same codon from suffering a nucleotide substitution in more than one evolutionary interval. Two, the set of codons that accepts silent substitutions is identical to the set that accepts amino acid changing nucleotide substitutions. Three, the uncertainty in the REH estimate is considerable in that alternative excellent fits to the same observational data may give alternative REH values that differ significantly even before stochastic variation and selective bias are considered. Four, the fit of their model to data is an irrelevancy where there are zero degrees of freedom.

13.
Summary: In the maximum likelihood (ML) method for estimating a molecular phylogenetic tree, the pattern of nucleotide substitutions for computing likelihood values is assumed to be simpler than that of the actual evolutionary process, simply because the process, considered to be quite devious, is unknown. The problem, however, is that there has been no guarantee that the simplification is justified. To study this problem, we first evaluated the robustness of the ML method in the estimation of molecular trees against different nucleotide substitution patterns, including Jukes and Cantor's, the simplest ever proposed. Namely, we conducted computer simulations in which we could set up various evolutionary models of a hypothetical gene, and define a true tree to which a tree estimated by the ML method was to be compared. The results show that topology estimation by the ML method is considerably robust against different ratios of transitions to transversions and different GC contents, but branch length estimation is not. The ML tree estimation based on Jukes and Cantor's model is also revealed to be resistant to GC content, but rather sensitive to the ratio of transitions to transversions. We then applied the ML method with different substitution patterns to nucleotide sequence data on the tax gene from T-cell leukemia viruses whose evolutionary process must have been more complicated than that of the hypothetical gene. The results are in accordance with those from the simulation study, showing that Jukes and Cantor's model is as useful as a more complicated one for making inferences about molecular phylogeny of the viruses.

14.
Summary: Statistical properties of Goodman et al.'s (1974) method of compensating for undetected nucleotide substitutions in evolution are investigated by using computer simulation. It is found that the method tends to overcompensate when the stochastic error of the number of nucleotide substitutions is large. Furthermore, the estimate of the number of nucleotide substitutions obtained by this method has a large variance. However, in order to see whether this method gives overcompensation when applied together with the maximum parsimony method, a much larger scale of simulation seems to be necessary.

15.
Summary: A model of molecular evolution in which the parameter (intrinsic rate of amino acid substitution) fluctuates from time to time was investigated by simulating the process. It was found that the usual method of estimation such as Poisson fitting underestimates this variation of the parameter when remote comparisons are made. At the same time, four distance measures (minimum base difference, Poisson fitting, random nucleotide substitutions and negative binomial fitting) were tested for their accuracy. When the substitution rate is not uniform among the amino acid sites, the negative binomial fitting gives the most satisfactory results; however, one needs to know the parameter beforehand in order to use this method. It was pointed out that the fluctuation of the evolutionary rate is expected if the nearly neutral but very slightly deleterious mutations play an important role in molecular evolution. Contribution No. 1087 from the National Institute of Genetics, Mishima, Shizuoka-ken, 411 Japan.
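The contrast between Poisson fitting and a correction that allows rates to vary among sites can be sketched with the standard gamma-corrected distance, d = a[(1 - p)^(-1/a) - 1], in which the shape parameter a must be supplied in advance. Treating this as a stand-in for the negative binomial fitting of the abstract is an assumption (gamma-distributed rates give rise to negative binomial counts), and the function names are illustrative.

```python
import math


def poisson_distance(p):
    """Poisson fitting: assumes the same substitution rate at every site."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


def gamma_distance(p, shape):
    """Distance under gamma-distributed rates; `shape` must be known beforehand."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if shape <= 0.0:
        raise ValueError("shape must be positive")
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)
```

A small shape parameter (strong rate variation) gives a distance well above the Poisson value, while a very large shape parameter recovers the Poisson value, which is consistent with Poisson fitting underestimating remote comparisons when rates vary.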

16.
Directed protein evolution is the most versatile method for studying protein structure-function relationships, and for tailoring a protein's properties to the needs of industrial applications. In this review, we performed a statistical analysis on the genetic code to study the extent and consequence of the organization of the genetic code on amino acid substitution patterns generated in directed evolution experiments. In detail, we analyzed amino acid substitution patterns caused by (a) a single nucleotide (nt) exchange at each position of all 64 codons, and (b) two subsequent nt exchanges (first and second nt, first and third nt, second and third nt). Additionally, transition and transversion mutations were compared at the level of amino acid substitution patterns. The latter analysis showed that single nucleotide substitution in a codon generates only 39.5% of the natural diversity on the protein level with 5.2-7 amino acid substitutions per codon. Transversions generate more complex amino acid substitution patterns (increased number and chemically more diverse amino acid substitutions) than transitions. Simultaneous nt exchanges at both the first and second nt of a codon generate very diverse amino acid substitution patterns, achieving 83.2% of the natural diversity. The statistical analysis described in this review sets the objectives for novel random mutagenesis methods that address the consequences of the organization of the genetic code. Random mutagenesis methods that favor transversions or introduce consecutive nt exchanges can contribute in this regard.
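Analysis (a) above, a single nucleotide exchange at each position of every codon, can be sketched by enumerating the nine one-step neighbors of a codon in the standard genetic code. This is a minimal illustration under the assumption that the standard code applies; it does not reproduce the review's percentages, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_nt_neighbors(codon):
    """All nine codons that differ from `codon` at exactly one position."""
    return [codon[:i] + base + codon[i + 1:]
            for i in range(3) for base in BASES if base != codon[i]]


def reachable_amino_acids(codon):
    """Distinct new amino acids reachable by one nucleotide exchange."""
    original = CODON_TABLE[codon]
    reached = {CODON_TABLE[n] for n in single_nt_neighbors(codon)}
    return reached - {original, "*"}


def mean_reachable_per_position():
    """Mean number of new amino acids per sense codon, for each codon position."""
    sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    means = []
    for i in range(3):
        total = 0
        for codon in sense:
            reached = {CODON_TABLE[codon[:i] + b + codon[i + 1:]]
                       for b in BASES if b != codon[i]}
            total += len(reached - {CODON_TABLE[codon], "*"})
        means.append(total / len(sense))
    return means
```

For example, `reachable_amino_acids("TGG")` lists the amino acids that a tryptophan codon can become after one exchange, which is far fewer than the 19 alternatives available in principle.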

17.
J. H. Gillespie. Genetics, 1993, 134(3): 971-981
A computer simulation of the process of nucleotide substitutions in a finite haploid population subject to selection in a randomly fluctuating environment provides a number of unexpected results. For rapidly fluctuating environments, substitutions are more regular than random. A small mutation-rate approximation is used to explain the regularity. The explanation does not depend heavily on the particulars of the haploid model, leading to the conjecture that many symmetrical models of molecular evolution with rapidly changing parameters may exhibit substitutions that are more regular than random. When fitnesses change very slowly, the simulation shows that substitutions are more clumped than random. Here a small-mutation approximation shows that the clustering is due to the increase in fitness that accompanies each successive substitution with a consequent lowering of the effective mutation rate. The two observations taken together suggest that the common observation that amino acid substitutions are clustered in time is due to the presence of parameters that change very slowly.
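One common way to express "more regular" or "more clumped than random" is the variance-to-mean ratio of substitution counts across lineages or time intervals, which equals 1 on average for a Poisson process. The abstract does not state which statistic was used, so the sketch below is an assumption; it also applies no significance test, and the function names are illustrative.

```python
from statistics import mean, variance


def index_of_dispersion(counts):
    """Sample variance divided by the mean of the substitution counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return variance(counts) / m


def describe(counts):
    """Rough verbal classification relative to a Poisson process."""
    r = index_of_dispersion(counts)
    if r < 1:
        return "more regular than random"
    if r > 1:
        return "more clumped than random"
    return "consistent with a Poisson process"
```

Counts that barely vary give a ratio near 0, whereas counts that alternate between bursts and quiet periods give a ratio above 1.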

18.
We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability to place a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
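The two-step process described above can be sketched as a small simulation: draw a Poisson number of substitutions, then place each on a branch with probability proportional to its relative length. The parameter `expected_total`, the branch names, and the example `ONE_STEP` matrix (each base changes to one of the other three with equal probability) are assumptions made for illustration, not details taken from the abstract.

```python
import math
import random


def sample_poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def allocate_substitutions(branch_lengths, expected_total, seed=None):
    """Place a Poisson number of substitutions on branches by relative length."""
    rng = random.Random(seed)
    names = list(branch_lengths)
    total_length = sum(branch_lengths.values())
    weights = [branch_lengths[name] / total_length for name in names]
    n_substitutions = sample_poisson(expected_total, rng)
    counts = dict.fromkeys(names, 0)
    for name in rng.choices(names, weights=weights, k=n_substitutions):
        counts[name] += 1
    return counts


def is_doubly_stochastic(matrix, tol=1e-12):
    """True if entries are non-negative and all rows and columns sum to 1."""
    n = len(matrix)
    non_negative = all(x >= 0 for row in matrix for x in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in matrix)
    cols_ok = all(abs(sum(row[j] for row in matrix) - 1.0) <= tol
                  for j in range(n))
    return non_negative and rows_ok and cols_ok


# Hypothetical one-step mutation matrix over (A, C, G, T).
ONE_STEP = [[0.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
```

A call such as `allocate_substitutions({"a": 1.0, "b": 3.0}, 5.0, seed=1)` returns a count per branch; on average, branch `b` receives three times as many substitutions as branch `a`.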

19.
Directed protein evolution is the most versatile method for studying protein structure–function relationships, and for tailoring a protein's properties to the needs of industrial applications. In this review, we performed a statistical analysis on the genetic code to study the extent and consequence of the organization of the genetic code on amino acid substitution patterns generated in directed evolution experiments. In detail, we analyzed amino acid substitution patterns caused by (a) a single nucleotide (nt) exchange at each position of all 64 codons, and (b) two subsequent nt exchanges (first and second nt, first and third nt, second and third nt). Additionally, transition and transversion mutations were compared at the level of amino acid substitution patterns. The latter analysis showed that single nucleotide substitution in a codon generates only 39.5% of the natural diversity on the protein level with 5.2–7 amino acid substitutions per codon. Transversions generate more complex amino acid substitution patterns (increased number and chemically more diverse amino acid substitutions) than transitions. Simultaneous nt exchanges at both the first and second nt of a codon generate very diverse amino acid substitution patterns, achieving 83.2% of the natural diversity. The statistical analysis described in this review sets the objectives for novel random mutagenesis methods that address the consequences of the organization of the genetic code. Random mutagenesis methods that favor transversions or introduce consecutive nt exchanges can contribute in this regard.
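The comparison of transitions and transversions at the amino acid level can be sketched by splitting the single-nucleotide exchanges of a codon into the two classes and collecting the amino acids each class reaches. The sketch assumes the standard genetic code and uses hypothetical function names; it illustrates the kind of tabulation described, not the review's actual analysis.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Transitions exchange purine with purine or pyrimidine with pyrimidine.
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def substitutions_by_type(codon):
    """New amino acids reached by one transition or by one transversion."""
    original = CODON_TABLE[codon]
    reached = {"transition": set(), "transversion": set()}
    for i in range(3):
        for base in BASES:
            if base == codon[i]:
                continue
            is_transition = TRANSITION_PARTNER[codon[i]] == base
            kind = "transition" if is_transition else "transversion"
            new_aa = CODON_TABLE[codon[:i] + base + codon[i + 1:]]
            # Synonymous changes and stop codons add no new amino acid.
            if new_aa not in (original, "*"):
                reached[kind].add(new_aa)
    return reached
```

Each codon has three possible transitions and six possible transversions, so transversions start with more opportunities to reach different amino acids.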

20.
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10⁻⁹ (histone H4) to 2.80 × 10⁻⁹ (interferon gamma), with a mean of 0.88 × 10⁻⁹ substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10⁻⁹ substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10⁻⁹), intermediate at twofold degenerate sites (2.26 × 10⁻⁹), and highest at fourfold degenerate sites (4.2 × 10⁻⁹). The implications of our results for the mechanisms of DNA evolution and for the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction are discussed.
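The site classification described above can be sketched by counting, for each codon position, how many of the three possible nucleotide changes leave the amino acid unchanged. The sketch assumes the standard genetic code; the numeric labels 0, 2 and 4 and the choice to treat a site with two synonymous changes as twofold degenerate are assumptions of this illustration, as are the function names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def site_degeneracy(codon):
    """Return [d1, d2, d3], each 0 (nondegenerate), 2 (twofold) or 4 (fourfold)."""
    original = CODON_TABLE[codon]
    if original == "*":
        raise ValueError("stop codons are not classified")
    classes = []
    for i in range(3):
        synonymous = sum(
            CODON_TABLE[codon[:i] + base + codon[i + 1:]] == original
            for base in BASES if base != codon[i])
        if synonymous == 0:
            classes.append(0)  # every change alters the amino acid
        elif synonymous == 3:
            classes.append(4)  # no change alters the amino acid
        else:
            classes.append(2)  # assumption: 1 or 2 synonymous changes
    return classes


def count_sites(sequence):
    """Tally sites by degeneracy class over an in-frame DNA coding sequence."""
    tally = {0: 0, 2: 0, 4: 0}
    usable = len(sequence) - len(sequence) % 3
    for start in range(0, usable, 3):
        for d in site_degeneracy(sequence[start:start + 3].upper()):
            tally[d] += 1
    return tally
```

Substitution rates such as those quoted in the abstract would then be obtained by dividing the estimated number of substitutions in each class by the number of sites in that class and by the divergence time, a step not shown here.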
