Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Summary: A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and that the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely random.

2.
Summary: A mathematical formula for the relationship between the average number of nucleotide substitutions per site and the proportion of shared restriction sites between two homologous nucleons is developed by taking into account the unequal rates of substitution among different pairs of nucleotides. Using this formula, the possible amount of bias of the estimate of the number of nucleotide substitutions obtained by the Upholt-Nei-Li formula for restriction site data is investigated. The results obtained indicate that the bias depends upon the nucleotides in the recognition sequence of the restriction enzyme used, the unequal rates of substitution among different nucleotides, and the unequal nucleotide frequencies, but the primary factor is the unequal rates of nucleotide substitution. The amount of bias is generally larger for four-base enzymes than for six-base enzymes. However, when many restriction enzymes are used for the study of DNA divergence, the bias is unlikely to be very large unless the rate of substitution varies greatly from nucleotide to nucleotide.

3.
When the number of nucleotides examined is relatively small, the estimators of nucleotide substitutions between DNA sequences often introduce systematic error even if the data used fit the mathematical model underlying the estimation formula. The systematic error of this kind is especially large for models that allow variation in substitution rate among different sites. In the present paper we present a number of formulas that produce virtually bias-free estimates of evolutionary distances for these models. Correspondence to: M. Nei
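To make "models that allow variation in substitution rate among different sites" concrete, here is a minimal sketch of the standard gamma-corrected Jukes-Cantor distance, d = (3/4) a [(1 - 4p/3)^(-1/a) - 1]. This is the ordinary estimator that such models start from, not the bias-free formulas of the paper, which the abstract does not give. The function name and the parameter names `p` (observed proportion of differing sites) and `a` (gamma shape parameter) are assumptions made for illustration.

```python
def jc_gamma_distance(p: float, a: float) -> float:
    """Jukes-Cantor distance with gamma-distributed rates among sites.

    p: observed proportion of differing sites (must be below 3/4).
    a: gamma shape parameter (smaller a means stronger rate variation).
    Illustrative sketch only; not the bias-corrected estimator.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if a <= 0.0:
        raise ValueError("a must be positive")
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)
```

As `a` grows large the value approaches the plain Jukes-Cantor distance; for small `a` the same observed `p` maps to a larger estimated distance.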

4.
Estimation of evolutionary distance between nucleotide sequences   (Cited by: 34 total; self-citations: 9; citations by others: 25)
A mathematical formula for estimating the average number of nucleotide substitutions per site (delta) between two homologous DNA sequences is developed by taking into account unequal rates of substitution among different nucleotide pairs. Although this formula is obtained for the equal-input model of nucleotide substitution, computer simulations have shown that it gives a reasonably good estimate for a wide range of nucleotide substitution patterns as long as delta is equal to or smaller than 1. Furthermore, the frequency of cases to which the formula is inapplicable is much lower than that for other similar methods recently proposed. This point is illustrated using insulin genes. A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed. Application of this method to globin gene data indicates that the number of nucleotide changes per site increases with evolutionary time but the pattern of the increase is quite irregular.

5.
Unbiased estimation of evolutionary distance between nucleotide sequences   (Cited by: 7 total; self-citations: 2; citations by others: 5)
A new algorithm for estimating the number of nucleotide substitutions per site (i.e., the evolutionary distance) between two nucleotide sequences is presented. This algorithm can be applied to many estimation methods, such as Jukes and Cantor's method, Kimura's transition/transversion method, and Tajima and Nei's method. Unlike ordinary methods, this algorithm is always applicable. Numerical computations and computer simulations indicate that this algorithm gives an almost unbiased estimate of the evolutionary distance, unless the evolutionary distance is very large. This algorithm should be especially useful for analyzing short nucleotide sequences. It can also be applied to amino acid sequences, for estimating the number of amino acid replacements.
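For context on what "always applicable" is contrasted with, here is a minimal sketch of the ordinary Jukes-Cantor estimator, d = -(3/4) ln(1 - 4p/3), which is undefined once the observed proportion of differences p reaches 3/4. This is the conventional formula, not the new algorithm described in the abstract. The function name and the choice to return `None` for the inapplicable case are assumptions, and the sequences are assumed to be aligned, gap-free, and of equal length.

```python
import math
from typing import Optional

def jc_distance(seq1: str, seq2: str) -> Optional[float]:
    """Ordinary Jukes-Cantor distance between two aligned sequences.

    Returns None when the formula is inapplicable (p >= 3/4),
    because the logarithm's argument is then zero or negative.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    p = differences / len(seq1)
    if p >= 0.75:
        return None  # inapplicable case for the ordinary estimator
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

With short sequences, sampling error alone can push p to 3/4 or beyond, which is why inapplicable cases matter most for short alignments.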

6.
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10^-9 (histone H4) to 2.80 × 10^-9 (interferon gamma), with a mean of 0.88 × 10^-9 substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10^-9 substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10^-9), intermediate at twofold degenerate sites (2.26 × 10^-9), and highest at fourfold degenerate sites (4.2 × 10^-9). The implication of our results for the mechanisms of DNA evolution and that of the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction are discussed.

7.
A simple mathematical method is developed to estimate the number of nucleotide substitutions per site between two DNA sequences, by extending Kimura's (1980) two-parameter method to the case where a G+C-content bias exists. This method will be useful when there are strong transition-transversion and G+C-content biases, as in the case of Drosophila mitochondrial DNA.
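As a reference point, here is a minimal sketch of Kimura's two-parameter distance that the method above extends: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. The G+C-content extension itself is not given in the abstract and is not attempted here. The function name is an assumption, and the input is assumed to be two aligned, equal-length sequences containing only uppercase A, C, G, and T.

```python
import math
from typing import Optional

PURINES = {"A", "G"}

def k2p_distance(seq1: str, seq2: str) -> Optional[float]:
    """Kimura two-parameter distance for two aligned DNA sequences.

    Assumes only the characters A, C, G, T (no gaps or ambiguity codes).
    Returns None when a logarithm argument is not positive.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    p_ts = transitions / n
    q_tv = transversions / n
    w1 = 1.0 - 2.0 * p_ts - q_tv
    w2 = 1.0 - 2.0 * q_tv
    if w1 <= 0.0 or w2 <= 0.0:
        return None
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```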

8.
New methods for estimating the numbers of synonymous and nonsynonymous substitutions per site were developed. The methods are unweighted pathway methods based on Kimura's two-parameter model. Computer simulations were conducted to evaluate the accuracies of the new methods, Nei and Gojobori's (NG) method, Miyata and Yasunaga's (MY) method, Li, Wu, and Luo's (LWL) method, and Pamilo, Bianchi, and Li's (PBL) method. The following results were obtained: (1) The NG, MY, and LWL methods give overestimates of the number of synonymous substitutions and underestimates of the number of nonsynonymous substitutions. The major cause of the biased estimation is that these three methods underestimate the number of synonymous sites and overestimate the number of nonsynonymous sites. (2) The PBL method gives better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. (3) The new methods also give better estimates of the numbers of synonymous and nonsynonymous substitutions than those obtained by the NG, MY, and LWL methods. In addition, estimates of the numbers of synonymous and nonsynonymous sites obtained by the new methods are reasonably accurate. (4) In some cases, the new methods and the PBL method give biased estimates of substitution numbers. However, from the number of nucleotide substitutions at the third position of codons, we can examine whether estimates obtained by the new methods are good or not, whereas we cannot make such an examination of estimates obtained by the PBL method. (5) When there are strong transition/transversion and nucleotide-frequency biases, as in mitochondrial genes, all of the above methods give biased estimates of substitution numbers. In such cases, Kondo et al.'s method is recommended for estimating the number of synonymous substitutions, although their method cannot estimate the number of nonsynonymous substitutions and is time-consuming.
These results, particularly result (1), call for reexaminations of some genes. This is because evolutionary pictures of genes have often been discussed on the basis of results obtained by the NG, MY, and LWL methods, which are favorable for the neutral theory of molecular evolution.

9.
Summary: Conducting computer simulations, Nei and Tateno (1978) have shown that Jukes and Holmquist's (1972) method of estimating the number of nucleotide substitutions tends to give an overestimate and the estimate obtained has a large variance. Holmquist and Conroy (1980) repeated some parts of our simulation and claimed that the overestimation of nucleotide substitutions in our paper occurred mainly because we used selected data. Examination of Holmquist and Conroy's simulation indicates that their results are essentially the same as ours when the Jukes-Holmquist method is used, but since they used a different method of computation their estimates of nucleotide substitutions differed substantially from ours. Another problem in Holmquist and Conroy's Letter is that they confused the expected number of nucleotide substitutions with the number in a sample. This confusion has resulted in a number of unnecessary arguments. They also criticized our X2 measure, but this criticism is apparently due to a misunderstanding of the assumptions of our method and a failure to use our method in the way we described. We believe that our earlier conclusions remain unchanged.

10.
MOTIVATION: Neighbor-dependent substitution processes have generated specific patterns of dinucleotide frequencies in the genomes of most organisms. The CpG-methylation-deamination process is, e.g., a prominent process in vertebrates (CpG effect). Such processes, often with unknown mechanistic origins, need to be incorporated into realistic models of nucleotide substitutions. RESULTS: Based on a general framework of nucleotide substitutions we developed a method that is able to identify the most relevant neighbor-dependent substitution processes, estimate their relative frequencies and judge whether they are important enough to be included in the modeling. Starting from a model for neighbor-independent nucleotide substitution we successively added neighbor-dependent substitution processes in the order of their ability to increase the likelihood of the model describing given data. The analysis of neighbor-dependent nucleotide substitutions based on repetitive elements found in the genomes of human, zebrafish and fruit fly is presented. AVAILABILITY: A web server to perform the presented analysis is freely available at: http://evogen.molgen.mpg.de/server/substitution-analysis

11.
In the nucleotide substitution model for molecular evolution, a major task in the exploration of an evolutionary process is to estimate the substitution number per site of a protein or DNA sequence. The usual estimators are based on the observed proportion of differences between the two nucleotide sequences. However, a more objective approach is to report a confidence interval with precision rather than only providing point estimators. The conventional confidence intervals used in the literature for the substitution number are constructed by the normal approximation. The performance and construction of confidence intervals for evolutionary models have not been much investigated in the literature. In this article, the performance of these conventional confidence intervals for one-parameter and two-parameter models is explored. Results show that the coverage probabilities of these intervals are unsatisfactory when the true substitution number is small. Since the substitution number may be small in many situations for an evolutionary process, the conventional confidence interval cannot provide accurate information for these cases. Improved confidence intervals for the one-parameter model with desirable coverage probability are proposed in this article. A numerical calculation shows the substantial improvement of the new confidence intervals over the conventional confidence intervals.
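The conventional interval mentioned above can be sketched for the one-parameter (Jukes-Cantor) model as follows. The sketch uses the usual large-sample variance, Var(d) ≈ p(1 - p) / [n (1 - 4p/3)^2], and reports d ± z·SE. It illustrates only the conventional normal-approximation interval, not the improved intervals proposed in the article. The function name and the parameters `p` (observed proportion of differences), `n` (number of sites), and `z` (normal quantile, 1.96 by default) are assumptions.

```python
import math
from typing import Tuple

def jc_normal_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the Jukes-Cantor distance.

    p: observed proportion of differing sites (0 <= p < 3/4).
    n: number of sites compared.
    z: standard normal quantile (1.96 gives a nominal 95% interval).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    if n <= 0:
        raise ValueError("n must be positive")
    w = 1.0 - 4.0 * p / 3.0
    d = -0.75 * math.log(w)
    standard_error = math.sqrt(p * (1.0 - p) / n) / w
    return d - z * standard_error, d + z * standard_error
```

For example, with p = 0.01 and n = 100 the lower bound computed this way is negative, even though a substitution number cannot be negative, which is one visible symptom of the approximation struggling when the substitution number is small.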

12.
Summary: The mRNA sequences of beta hemoglobin for human, mouse and rabbit were examined. Observations included the following: (1) there is a significant bias against the use of codons only one nucleotide different from terminating codons; (2) less than 4% of the codons end in adenine; (3) guanine is the most common third position nucleotide but it never follows a second position cytosine; (4) nearest neighbor (doublet) nucleotides are non-random with the greatest contributor to non-randomness being the third position, suggesting that codon choice for a given amino acid rather than a choice among amino acids is the more important contributor; (5) the CG dinucleotide is even rarer in positions other than the first and second of the codon than it is in those two, suggesting that the need for arginine has in fact elevated the CG frequency in those positions; (6) 77 per cent of the nucleotides are unsubstituted among these three taxa, which could be a sampling effect, but there is strong evidence that about one-third of them are in fact unsubstitutable because of selective constraints; (7) the two longest stretches of unsubstituted nucleotides (32 and 35 consecutive nucleotides) surround the points of the two non-coding insertion sequences; (8) over half the substitutions occur in the third nucleotide position of the codons; (9) silent (non-amino acid changing) substitutions occur at about four times the rate of non-silent substitutions on the basis of their relative opportunity to occur; (10) silent substitutions occur slightly but significantly more often in codons that also have non-silent substitutions than independence of the two events would predict; (11) substitutions occur in adjacent nucleotides significantly more often than chance would predict; (12) among four-fold degenerate codons, third position transitions (principally cytosine-uracil interchanges) outnumber transversions by two to one although the reverse ratio would be expected.
The analysis of these messengers provided an opportunity to evaluate the random evolutionary hit (REH) theory. I observed that: (1) the REH theory is premised upon five assumptions, all false; (2) the theory leads to contradictory estimates of the number of varions; (3) the REH values are underestimates; (4) the REH values frequently violate the triangle inequality; (5) the REH values, contrary to claim, are not concordant either with accepted point mutations (PAMs) or augmented distances; (6) the REH values are more likely than values uncorrected for multiple substitutions to give incorrect phylogenies; and (7) the REH values have statistical problems probably associated with a large variance in its fundamental parameter, re. From this I conclude that REH theory is not suitable for its intended purpose of estimating, from protein sequences, the number of nucleotide substitutions since the common ancestor of two gene products.

13.
Two simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions are presented. Although they give no weights to different types of codon substitutions, these methods give essentially the same results as those obtained by Miyata and Yasunaga's and by Li et al.'s methods. Computer simulation indicates that estimates of synonymous substitutions obtained by the two methods are quite accurate unless the number of nucleotide substitutions per site is very large. It is shown that all available methods tend to give an underestimate of the number of nonsynonymous substitutions when the number is large.

14.
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining (NJ), distance-Wagner (DW), Tateno et al.'s modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each different set of parameters. The results obtained indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides are used, the probability of obtaining the correct topology (P1) is generally lower in the MP method than in the distance-matrix methods. The P1 value for the MP method increases with increasing number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant or whether or not a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method for these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rate of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small.

15.
Summary: Using nine sets of viral and cellular oncogenes, the rates of nucleotide substitutions were computed by using Gojobori and Yokoyama's (1985) method. The results obtained confirmed our previous conclusion that the rates of nucleotide substitution for the viral oncogenes are about a million times higher than those for their cellular counterparts. For cellular oncogenes and most viral oncogenes, however, the rate of synonymous substitution is higher than that of nonsynonymous substitution. Moreover, the pattern of nucleotide substitutions for viral oncogenes is more similar to that for functional genes (such as cellular oncogenes) than for pseudogenes. This implies that nucleotide substitutions in viral oncogenes may be functionally constrained. Thus, our observation supports the view that nucleotide substitutions for the oncogenes in those DNA and RNA genomes are consistent with Kimura's neutral theory of molecular evolution (Kimura 1968, 1983).

16.
Summary: Statistical properties of Goodman et al.'s (1974) method of compensating for undetected nucleotide substitutions in evolution are investigated by using computer simulation. It is found that the method tends to overcompensate when the stochastic error of the number of nucleotide substitutions is large. Furthermore, the estimate of the number of nucleotide substitutions obtained by this method has a large variance. However, in order to see whether this method gives overcompensation when applied together with the maximum parsimony method, a much larger scale of simulation seems to be necessary.

17.
Summary: In response to criticism of REH theory (Fitch 1980), Holmquist and Jukes (1981) have mostly avoided the criticism or misunderstood it. Since they themselves state in their response that "Amino acid sequence data alone cannot be used to estimate total nucleotide substitutions", they agree with the criticism. Most of their paper treats the newer theory (here designated as the REHN theory) which attempts to use the nucleotide sequences encoding proteins to better estimate total nucleotide substitutions (Holmquist and Pearl 1980). Since I made no criticism of REHN theory, their comments are frequently beside the point of my original criticism of REH theory. Nevertheless, it is shown here that REHN theory is also unsatisfactory in that: One, the varions are now more clearly defined but in such a way as to preclude the same codon from suffering a nucleotide substitution in more than one evolutionary interval. Two, the set of codons that accepts silent substitutions is identical to the set that accepts amino acid changing nucleotide substitutions. Three, the uncertainty in the REH estimate is considerable in that alternative excellent fits to the same observational data may give alternative REH values that differ significantly even before stochastic variation and selective bias are considered. Four, the fit of their model to data is an irrelevancy where there are zero degrees of freedom.

18.
There are three different methods of estimating the number of nucleotide substitutions between a pair of species from amino acid sequence data, i.e., the Poisson correction method, the random evolutionary hit method, and counting the actual but minimum number of nucleotide substitutions. In this paper the relationships among the estimates obtained by these methods are studied empirically. The results obtained indicate that there is a high correlation among these estimates and in practice any of the three methods may be used for constructing evolutionary trees or relating nucleotide substitutions to evolutionary time. The effects of varying rates of nucleotide substitution among different sites on the Poisson correction and random evolutionary hit methods are also studied mathematically. It is shown that these two methods are quite insensitive to the variation of the rate of nucleotide substitution.
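Of the three methods named above, the Poisson correction is the simplest to write down. In its usual form, if p is the proportion of amino acid sites that differ between two sequences, the estimated number of replacements per site is d = -ln(1 - p). The sketch below shows only this standard form; how the paper converts or compares it with nucleotide substitution counts is not stated in the abstract. The function name is an assumption.

```python
import math

def poisson_correction(p: float) -> float:
    """Poisson-corrected number of amino acid replacements per site.

    p: observed proportion of differing amino acid sites (0 <= p < 1).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)
```

The correction is small when p is small and grows quickly as p approaches 1, reflecting multiple replacements at the same site.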

19.
We present data on the frequencies of nucleotides and nucleotide substitutions in conservative DNA regions involved in the regulation of gene expression. Data on prokaryotes and eukaryotes are considered separately. In both cases DNA strands complementary to those which serve as templates for RNA polymerase have low frequencies of cytosine. The most conservative positions also have an increased frequency of adenine. Various substitutions in the series of homologous regulatory DNA sequences, as compared to their consensuses, have different frequencies. In prokaryotes guanine in a consensus sequence is substituted for at the lowest and adenine at the highest frequency, whereas in eukaryotes cytosine is substituted for at the lowest and guanine at the highest frequency. In both cases the nucleotides substituted for are most frequently replaced with cytosine. Deviations from consensus sequences tend to cluster in adjacent positions. The more pronounced the consequences of a nucleotide substitution are, the higher is the frequency of substitutions in adjacent positions. Possible explanations for these phenomena are discussed.

20.
It has been known that in noncoding regions of the chloroplast genome, the pattern of nucleotide substitution is influenced by the two nucleotides flanking the substitution site. In a GC-rich environment, a bias toward transition was observed, whereas in an AT-rich environment, a bias toward transversion was observed. In this study, the influence of the two adjacent neighbors on the substitution pattern was observed in the first intron of the mitochondrial nad4 gene, although the AT content of this intron is only 48%. The proportion of transversions increases from 0.32 to 0.75 as the A + T content (number of A's + T's) of the two nearest neighbors increases from 0 to 2. This trend was also observed in another mitochondrial group I intron with an AT content of 64%. In addition, a similar, though weaker, effect was observed in vertebrate pseudogenes. So this effect is present in all three types of genomes. Furthermore, in contrast to the situation in the noncoding regions of chloroplast DNA, where most nucleotide substitutions occurred in the categories with an A + T content of either 1 or 2, nucleotide substitutions in the mitochondrial first nad4 intron occurred more evenly in three categories of different A + T contents. This might be due largely to the difference in the AT content (0.48 vs. 0.72) between the mitochondrial first nad4 intron and the chloroplast DNA regions studied.
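The tabulation described above (proportion of transversions as a function of the A + T content of the two nearest neighbors) can be sketched for a single pair of aligned sequences. This is an illustration only: the abstract does not describe how substitutions were inferred, so the pairwise comparison, the rule of skipping sites whose flanking bases differ between the two sequences, and the function name are all assumptions. The input is assumed to contain only uppercase A, C, G, and T.

```python
from collections import defaultdict
from typing import Dict

PURINES = set("AG")

def transversion_fraction_by_context(seq1: str, seq2: str) -> Dict[int, float]:
    """Proportion of transversions among differences, grouped by the
    number of A/T bases (0, 1, or 2) at the two flanking positions.

    Only interior sites whose two neighbors are identical in both
    sequences are counted (an assumption made for this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    # counts[at_content] = [number of transitions, number of transversions]
    counts = defaultdict(lambda: [0, 0])
    for i in range(1, len(seq1) - 1):
        a, b = seq1[i], seq2[i]
        if a == b:
            continue
        if seq1[i - 1] != seq2[i - 1] or seq1[i + 1] != seq2[i + 1]:
            continue  # ambiguous context, skipped
        at_content = sum(base in "AT" for base in (seq1[i - 1], seq1[i + 1]))
        is_transversion = (a in PURINES) != (b in PURINES)
        counts[at_content][int(is_transversion)] += 1
    return {at: tv / (ts + tv) for at, (ts, tv) in counts.items()}
```

A category with no observed differences is simply absent from the returned dictionary, so callers should not assume that all of 0, 1, and 2 are present.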
