Similar literature
20 similar articles found.
1.
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10^-9 (histone H4) to 2.80 × 10^-9 (interferon gamma), with a mean of 0.88 × 10^-9 substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10^-9 substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10^-9), intermediate at twofold degenerate sites (2.26 × 10^-9), and highest at fourfold degenerate sites (4.2 × 10^-9). The implications of our results for the mechanisms of DNA evolution and for the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction are discussed.
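
The site classification described in this abstract can be illustrated with a short sketch. It assumes the standard genetic code and counts, for one codon position, how many of the three possible nucleotide changes leave the amino acid unchanged. The function name `site_degeneracy` is hypothetical, and lumping the threefold sites of isoleucine with the twofold class is an assumption of this sketch, not something the abstract states.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_degeneracy(codon, pos):
    """Classify the site at index pos (0, 1 or 2) of a codon as 0-, 2- or 4-fold."""
    amino_acid = CODON_TABLE[codon]
    synonymous = 0
    for base in BASES:
        if base == codon[pos]:
            continue
        mutant = codon[:pos] + base + codon[pos + 1:]
        if CODON_TABLE[mutant] == amino_acid:
            synonymous += 1
    if synonymous == 0:
        return 0  # nondegenerate: every change replaces the amino acid
    if synonymous == 3:
        return 4  # fourfold degenerate: every change is synonymous
    return 2      # one or two synonymous changes (assumption: Ile sites lumped here)


print(site_degeneracy("GGT", 2), site_degeneracy("TTT", 2), site_degeneracy("ATG", 2))
```

Counting sites of each class over a whole alignment would be the next step; how changes to stop codons are handled is left open here.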

2.
Summary: A formal mathematical analysis of Kimura's (1981) six-parameter model of nucleotide substitution for the case of unequal substitution rates among different pairs of nucleotides is conducted, and new formulae for estimating the number of nucleotide substitutions and its standard error are obtained. By using computer simulation, the validities and utilities of Jukes and Cantor's (1969) one-parameter formula, Takahata and Kimura's (1981) four-parameter formula, and our six-parameter formula for estimating the number of nucleotide substitutions are examined under three different schemes of nucleotide substitution. It is shown that the one-parameter and four-parameter formulae often give underestimates when the number of nucleotide substitutions is large, whereas the six-parameter formula generally gives a good estimate for all the three substitution schemes examined. However, when the number of nucleotide substitutions is large, the six-parameter and four-parameter formulae are often inapplicable unless the number of nucleotides compared is extremely large. It is also shown that as long as the mean number of nucleotide substitutions is smaller than one per nucleotide site the three formulae give more or less the same estimate regardless of the substitution scheme used. (Author note: on leave of absence from the Department of Biology, Faculty of Science, Kyushu University 33, Fukuoka 812, Japan.)
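
For orientation, the one-parameter (Jukes and Cantor 1969) formula mentioned above is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The sketch below is a minimal illustration of that formula only; the function name is hypothetical, and the six-parameter and four-parameter formulae of the abstract are not reproduced. Formulae of this kind become inapplicable when the logarithm's argument is not positive, which in this case means p >= 0.75.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        # The logarithm is undefined for p >= 0.75, so the formula cannot be applied.
        raise ValueError("Jukes-Cantor formula requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jukes_cantor_distance(0.1), 5))
```

The corrected distance always exceeds p for p > 0, because some sites have changed more than once.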

3.
Summary: A mathematical formula for the relationship between the average number of nucleotide substitutions per site and the proportion of shared restriction sites between two homologous nucleons is developed by taking into account the unequal rates of substitution among different pairs of nucleotides. Using this formula, the possible amount of bias of the estimate of the number of nucleotide substitutions obtained by the Upholt-Nei-Li formula for restriction site data is investigated. The results obtained indicate that the bias depends upon the nucleotides in the recognition sequence of the restriction enzyme used, the unequal rates of substitution among different nucleotides, and the unequal nucleotide frequencies, but the primary factor is the unequal rates of nucleotide substitution. The amount of bias is generally larger for four-base enzymes than for six-base enzymes. However, when many restriction enzymes are used for the study of DNA divergence, the bias is unlikely to be very large unless the rate of substitution greatly varies from nucleotide to nucleotide.

4.
A method for estimating nucleotide diversity from AFLP data (total citations: 8; self-citations: 0; citations by others: 8)
Innan H, Terauchi R, Kahl G, Tajima F. Genetics 1999, 151(3): 1157-1164
A method for estimating the nucleotide diversity from AFLP data is developed by using the relationship between the number of nucleotide changes and the proportion of shared bands. The estimation equation is based on the assumption that GC-content is 0.5. Computer simulations, however, show that this method gives a reasonably accurate estimate even when GC-content deviates from 0.5, as long as the number of nucleotide changes per site (nucleotide diversity) is small. As an example, the nucleotide diversity of the wild yam, Dioscorea tokoro, was estimated. The estimated nucleotide diversity is 0.0055, which is larger than estimations from nucleotide sequence data for Adh and Pgi.

5.
Examining the pattern of nucleotide substitution for the control region of mitochondrial DNA (mtDNA) in humans and chimpanzees, we developed a new mathematical method for estimating the number of transitional and transversional substitutions per site, as well as the total number of nucleotide substitutions. In this method, excess transitions, unequal nucleotide frequencies, and variation of substitution rate among different sites are all taken into account. Application of this method to human and chimpanzee data suggested that the transition/transversion ratio for the entire control region was approximately 15 and nearly the same for the two species. The 95% confidence interval of the age of the common ancestral mtDNA was estimated to be 80,000-480,000 years in humans and 0.57-2.72 Myr in common chimpanzees.
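
To make the transition/transversion distinction concrete, the sketch below counts both kinds of difference between two aligned sequences and applies Kimura's (1980) two-parameter formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). This is a simpler, swapped-in model: it corrects for excess transitions only and does not account for unequal nucleotide frequencies or rate variation among sites, which the method in this abstract does. The function names are hypothetical, and the input is assumed to contain only A, C, G and T with no gaps.

```python
import math

PURINES = {"A", "G"}


def ts_tv_proportions(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional differences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def kimura_2p_distance(P, Q):
    """Kimura (1980) two-parameter estimate of substitutions per site."""
    a, b = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("formula not applicable: logarithm argument is not positive")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


P, Q = ts_tv_proportions("ACGT", "GCTT")
print(P, Q, kimura_2p_distance(P, Q))
```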

6.
Summary: A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely at random.

7.
A mathematical theory for the evolutionary change of restriction endonuclease cleavage sites is developed, and the probabilities of various types of restriction-site changes are evaluated. A computer simulation is also conducted to study properties of the evolutionary change of restriction sites. These studies indicate that parsimony methods of constructing phylogenetic trees often make erroneous inferences about evolutionary changes of restriction sites unless the number of nucleotide substitutions per site is less than 0.01 for all branches of the tree. This introduces a systematic error in estimating the number of mutational changes for each branch and, consequently, in constructing phylogenetic trees. Therefore, parsimony methods should be used only in cases where nucleotide sequences are closely related. Reexamination of Ferris et al.'s data on restriction-site differences of mitochondrial DNAs does not support Templeton's conclusions regarding the phylogenetic tree for man and apes and the molecular clock hypothesis. Templeton's claim that Nei and Li's method of estimating the number of nucleotide substitutions per site is seriously affected by parallel losses and loss-gains of restriction sites is also unsupported.

8.
Genetic distance and electrophoretic identity of proteins between taxa (total citations: 11; self-citations: 0; citations by others: 11)
Summary: The relationship between amino acid substitution and charge change of proteins in the evolutionary process is studied by using a stochastic model. A mathematical formula is developed for the electrophoretic identity of proteins between two different taxa for a given number of average codon differences per protein locus. Using this formula, a reference figure is constructed for estimating the average number of codon differences per locus between taxa.

9.
DNA Polymorphism Detectable by Restriction Endonucleases (total citations: 67; self-citations: 15; citations by others: 67)
Data on DNA polymorphisms detected by restriction endonucleases are rapidly accumulating. With the aim of analyzing these data, several different measures of nucleon (DNA segment) diversity within and between populations are proposed, and statistical methods for estimating these quantities are developed. These statistical methods are applicable to both nuclear and nonnuclear DNAs. When evolutionary change of nucleons occurs mainly by mutation and genetic drift, all the measures can be expressed in terms of the product of mutation rate per nucleon and effective population size. A method for estimating nucleotide diversity from nucleon diversity is also presented under certain assumptions. It is shown that DNA divergence between two populations can be studied either by the average number of restriction site differences or by the average number of nucleotide differences. In either case, a large number of different restriction enzymes should be used for studying phylogenetic relationships among related organisms, since the effect of stochastic factors on these quantities is very large. The statistical methods developed have been applied to data of Shah and Langley on mitochondrial (mt)DNA from Drosophila melanogaster, simulans and virilis. This application has suggested that the evolutionary change of mtDNA in higher animals occurs mainly by nucleotide substitution rather than by deletion and insertion. The evolutionary distances among the three species have also been estimated.
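
The "average number of nucleotide differences" mentioned above has a simple form when aligned sequences are available: nucleotide diversity is the mean number of differences per site over all pairs of sequences in the sample. The sketch below computes that sequence-based quantity. It is not the restriction-site estimator developed in the abstract, and the function name and the equal weighting of all sampled sequences are assumptions.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diffs = 0
    n_pairs = 0
    for s, t in combinations(seqs, 2):
        total_diffs += sum(a != b for a, b in zip(s, t))
        n_pairs += 1
    return total_diffs / (n_pairs * length)


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```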

10.
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each different set of parameters. The results obtained indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides are used, the probability of obtaining the correct topology (P1) is generally lower in the MP method than in the distance-matrix methods. The P1 value for the MP method increases with increasing number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant or whether or not a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method for these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rate of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small.  

11.
Unbiased estimation of evolutionary distance between nucleotide sequences (total citations: 7; self-citations: 2; citations by others: 5)
A new algorithm for estimating the number of nucleotide substitutions per site (i.e., the evolutionary distance) between two nucleotide sequences is presented. This algorithm can be applied to many estimation methods, such as Jukes and Cantor's method, Kimura's transition/transversion method, and Tajima and Nei's method. Unlike ordinary methods, this algorithm is always applicable. Numerical computations and computer simulations indicate that this algorithm gives an almost unbiased estimate of the evolutionary distance, unless the evolutionary distance is very large. This algorithm should be useful especially when we analyze short nucleotide sequences. It can also be applied to amino acid sequences, for estimating the number of amino acid replacements.

12.
Summary: A phylogenetic tree for the human lymphadenopathy-associated virus (LAV), the human T-cell lymphotropic virus type III (HTLV-III), and the acquired immune deficiency syndrome (AIDS)-associated retrovirus (ARV) has been constructed from comparisons of the amino acid sequences of their gag proteins. A method is proposed for estimating the divergence times among these AIDS viruses and the rates of nucleotide substitution for their RNA genomes. The analysis indicates that the LAV and HTLV-III strains diverged from one another after 1977 and that their common ancestor diverged from the ARV virus no more than 10 years earlier. Hence, the evolutionary diversity among strains of the AIDS viruses apparently has been generated within the last 20 years. It is estimated that the genome of the AIDS virus has a nucleotide substitution rate on the order of 10^-3 per site per year, with the rate in the second half of the genome being double that in the first half.

13.
Most of the sophisticated methods to estimate evolutionary divergence between DNA sequences assume that the two sequences have evolved with the same pattern of nucleotide substitution after their divergence from their most recent common ancestor (homogeneity assumption). If this assumption is violated, the evolutionary distance estimated will be biased, which may result in biased estimates of divergence times and substitution rates, and may lead to erroneous branching patterns in the inferred phylogenies. Here we present a simple modification for existing distance estimation methods to relax the assumption of the substitution pattern homogeneity among lineages when analyzing DNA and protein sequences. Results from computer simulations and empirical data analyses for human and mouse genes are presented to demonstrate that the proposed modification reduces the estimation bias considerably and that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site. We also discuss the relationship of the substitution and mutation rate estimates when the substitution pattern is not the same in the lineages leading to the two sequences compared.

14.
More than an order of magnitude difference in substitution rate exists among sites within hypervariable region 1 of the control region of human mitochondrial DNA. A two-rate Poisson mixture and a negative binomial distribution are used to describe the distribution of the inferred number of changes per nucleotide site in this region. When three data sets are pooled, however, the two-rate model cannot explain the data. The negative binomial distribution always fits, suggesting that substitution rates are approximately gamma distributed among sites. Simulations presented here provide support for the use of a biased, yet commonly employed, method of examining rate variation. The use of parsimony in the method to infer the number of changes at each site introduces systematic errors into the analysis. These errors preclude an unbiased quantification of variation in substitution rate but make the method conservative overall. The method can be used to distinguish sites with highly elevated rates, and 29 such sites are identified in hypervariable region 1. Variation does not appear to be clustered within this region. Simulations show that biases in rates of substitution among nucleotides and non-uniform base composition can mimic the effects of variation in rate among sites. However, these factors contribute little to the levels of rate variation observed in hypervariable region 1.

15.
Summary: The effects of temporal (among different branches of a phylogeny) and spatial (among different nucleotide sites within a gene) nonuniformities of nucleotide substitution rates on the construction of phylogenetic trees from nucleotide sequences are addressed. Spatial nonuniformity may be estimated by using Shannon's (1948) entropy formula to measure the Relative Nucleotide Variability (RNV) at each nucleotide site in an aligned set of sequences; this is demonstrated by a comparative analysis of 5S rRNAs. New methods of constructing phylogenetic trees are proposed that augment the Unweighted Pair-Group Using Arithmetic Averages (UPGMA) algorithm by estimating and compensating for both spatial and temporal nonuniformity in substitution rates. These methods are evaluated by computer simulations of 5S rRNA evolution that include both kinds of nonuniformities. It was found that the proposed Reference Ratio Method improved both the ability to reconstruct the correct topology of a tree and also the estimation of branch lengths as compared to UPGMA. A previous method (Farris et al. 1970; Klotz et al. 1979; Li 1981) was found to be less successful in reconstructing topologies when there is high probability of multiple mutations at some sites. Phylogenetic analyses of 5S rRNA sequences support the endosymbiotic origins of both chloroplasts and mitochondria, even though the latter exhibit an accelerated rate of nucleotide substitution. Phylogenetic trees also reveal an adaptive radiation within the eubacteria and another within the eukaryotes for the origins of most major phyla within each group during the Precambrian era.
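
As a rough illustration of the entropy-based variability measure mentioned above, the sketch below computes Shannon's entropy, H = -sum(p_i log2 p_i), for each column of an alignment. The abstract does not say how RNV is normalized or how gaps are treated, so this returns the raw entropy in bits and treats every character as an ordinary symbol; both choices, and the function name, are assumptions.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (bits) of the character frequencies in each alignment column."""
    if not alignment or any(len(s) != len(alignment[0]) for s in alignment):
        raise ValueError("alignment must be non-empty with equal-length sequences")
    entropies = []
    for column in zip(*alignment):
        n = len(column)
        counts = Counter(column)
        # Each term is -(p * log2 p); an invariant column gives exactly zero.
        h = sum(-(c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies


print(site_entropies(["AAC", "AGC", "ATC", "ACC"]))
```

Invariant columns score 0 bits and a column with all four nucleotides at equal frequency scores 2 bits, so higher values flag sites that vary more across the aligned sequences.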

16.
Tests of applicability of several substitution models for DNA sequence data (total citations: 8; self-citations: 3; citations by others: 5)
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.

17.
Accuracy of phylogenetic trees estimated from DNA sequence data (total citations: 4; self-citations: 1; citations by others: 3)
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (Abstract truncated at 250 words.)

18.
The mitochondrial DNA (mtDNA) control region was sequenced in 37 sperm whales from a large part of the global range of the species. Nucleotide diversity was several-fold lower than that reported for control regions of abundant and outbred mammals, but similar to that for populations known to have experienced bottlenecks. Relative rate tests did not suggest that the low diversity is due to a lower substitution rate in sperm whale mtDNA. Rather, it is more likely that demographic factors have reduced diversity. The pattern of nucleotide substitutions was examined by cladistic methods, facilitated by the apparent monophyly of lineages from the Southern Hemisphere, as defined by a single base pair deletion. Substitutions were nonrandom in nature, confined to a few "hot spots," and parallel substitutions constituted a majority of the inferred changes. The substitution pattern fitted a negative binomial distribution better than a Poisson distribution, and the bias in number of substitutions among sites was considerably higher than previously reported for the mtDNA control region of any species. A novel method of estimating time since common ancestry was developed, which utilizes the transition/transversion ratio R and the number of substitutions inferred from a parsimony analysis. Using this method, we estimated the age of sperm whale mtDNA diversity to be about 6,000-25,000 years, and when the uncertainty of R was accounted for, a range of about 1,000-100,000 years was obtained.

19.
Summary: A simulation study has been conducted to check the accuracy of Nei and Li's (1979) formulas for the mean and variance of the proportion (S) of identical restriction sites between two DNA sequences and for estimating the mean and variance of the number () of base substitutions per nucleotide site between two DNA sequences. The results show that these formulas are quite accurate as long as the probability of S becoming zero is negligibly small. In addition to the simulation, approximate formulas have also been obtained for the probability for S to become zero at time t and for the contribution to S due to parallel mutation.
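
The role of S becoming zero is easy to see in the simplest form of the Nei and Li (1979) estimator, in which the proportion of shared sites is S = 2 n_xy / (n_x + n_y) and the number of substitutions per site is estimated as -(ln S) / r, with r the length of the enzyme's recognition sequence. The sketch below is a minimal illustration of that simple form; the function and parameter names are hypothetical, and the variance formulas checked in the abstract are not reproduced.

```python
import math


def shared_site_proportion(n_x, n_y, n_xy):
    """S: proportion of restriction sites shared by sequences X and Y."""
    if n_x + n_y == 0:
        raise ValueError("at least one sequence must have a restriction site")
    return 2.0 * n_xy / (n_x + n_y)


def nei_li_distance(s, r):
    """Substitutions per site from S for an enzyme with an r-base recognition site."""
    if not 0.0 < s <= 1.0:
        # With S = 0 the logarithm is undefined and no estimate can be made.
        raise ValueError("S must satisfy 0 < S <= 1")
    return -math.log(s) / r


s = shared_site_proportion(10, 10, 8)
print(s, nei_li_distance(s, 6))
```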

20.
M. Nei, J. C. Miller. Genetics 1990, 125(4): 873-879
A simple method is proposed for estimating the average number of nucleotide substitutions per site within and between populations for the case where a large number of individuals are examined for many restriction enzymes. This method gives essentially the same results as those obtained by Nei and Li's method but saves a large amount of computer time. The variances of the quantities estimated can be obtained by the jackknife method, and these variances are very similar to those obtained by Nei and Jin's more sophisticated method. A similar method can also be applied to DNA sequence data.
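
The jackknife step mentioned above can be sketched generically: recompute the statistic with each observation left out in turn, then scale the spread of those leave-one-out values by (n - 1) / n. The abstract does not say which units are left out (enzymes, individuals or sites), so the sketch takes a plain list of observations and an arbitrary statistic; the function name and the example values are hypothetical.

```python
def jackknife_variance(values, statistic):
    """Delete-one jackknife estimate of the variance of statistic(values)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    values = list(values)
    # Recompute the statistic with each observation removed in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical per-unit estimates, for illustration only.
print(jackknife_variance([1.0, 2.0, 3.0, 4.0, 5.0], mean))
```

For the sample mean, this reduces to the usual sample variance divided by n, which gives a quick sanity check.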
