Similar articles
20 similar articles found.
1.
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from $0.004 \times 10^{-9}$ (histone H4) to $2.80 \times 10^{-9}$ (interferon gamma), with a mean of $0.88 \times 10^{-9}$ substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of $4.7 \times 10^{-9}$ substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being $0.94 \times 10^{-9}$), intermediate at twofold degenerate sites ($2.26 \times 10^{-9}$), and highest at fourfold degenerate sites ($4.2 \times 10^{-9}$). The implications of our results for the mechanisms of DNA evolution and for the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction are discussed.
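
The site classification described in this abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and labels each codon position by how many of the three possible single-nucleotide changes leave the amino acid unchanged. The names `CODON_TABLE` and `site_degeneracy` are hypothetical, and lumping the threefold-degenerate third position of isoleucine codons with the twofold class is a simplifying assumption of this sketch, not a detail stated in the abstract.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_degeneracy(codon):
    """Label each of the three codon positions as 0 (nondegenerate),
    2 (twofold degenerate) or 4 (fourfold degenerate)."""
    codon = codon.upper()
    original = CODON_TABLE[codon]
    labels = []
    for pos in range(3):
        # Count single-nucleotide changes at this position that are synonymous.
        synonymous = sum(
            CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == original
            for base in BASES
            if base != codon[pos]
        )
        if synonymous == 0:
            labels.append(0)
        elif synonymous == 3:
            labels.append(4)
        else:
            # One or two synonymous changes; threefold sites are treated
            # as twofold here (an assumption of this sketch).
            labels.append(2)
    return labels


print(site_degeneracy("GGT"))  # glycine codon
print(site_degeneracy("TTT"))  # phenylalanine codon
```

Counting the labels over all codons of an alignment would give the numbers of nondegenerate, twofold, and fourfold sites that such a method works with.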

2.
A method for estimating nucleotide diversity from AFLP data (total citations: 8; self-citations: 0; citations by others: 8)
Innan H, Terauchi R, Kahl G, Tajima F. Genetics, 1999, 151(3): 1157-1164
A method for estimating the nucleotide diversity from AFLP data is developed by using the relationship between the number of nucleotide changes and the proportion of shared bands. The estimation equation is based on the assumption that GC-content is 0.5. Computer simulations, however, show that this method gives a reasonably accurate estimate even when GC-content deviates from 0.5, as long as the number of nucleotide changes per site (nucleotide diversity) is small. As an example, the nucleotide diversity of the wild yam, Dioscorea tokoro, was estimated. The estimated nucleotide diversity is 0.0055, which is larger than estimations from nucleotide sequence data for Adh and Pgi.

3.
We propose two approximate methods (one based on parsimony and one on pairwise sequence comparison) for estimating the pattern of nucleotide substitution and a parsimony-based method for estimating the gamma parameter for variable substitution rates among sites. The matrix of substitution rates that represents the substitution pattern can be recovered through its relationship with the observable matrix of site pattern frequencies in pairwise sequence comparisons. In the parsimony approach, the ancestral sequences reconstructed by the parsimony algorithm were used, and the two sequences compared are those at the ends of a branch in the phylogenetic tree. The method for estimating the gamma parameter was based on a reinterpretation of the numbers of changes at sites inferred by parsimony. Three data sets were analyzed to examine the utility of the approximate methods compared with the more reliable likelihood methods. The new methods for estimating the substitution pattern were found to produce estimates quite similar to those obtained from the likelihood analyses. The new method for estimating the gamma parameter was effective in reducing the bias in conventional parsimony estimates, although it also overestimated the parameter. The approximate methods are computationally very fast and appear useful for analyzing large data sets, for which use of the likelihood method requires excessive computation.

4.
We introduce a new model for relaxing the assumption of a strict molecular clock for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes lineages of a phylogenetic tree are distributed into distinct rate classes. Under the Dirichlet process, the number of rate classes, assignment of branches to rate classes, and the rate value associated with each class are treated as random variables. The performance of this model was evaluated by conducting analyses on data sets simulated under a range of different models. We compared the Dirichlet process model with two alternative models for rate variation: the strict molecular clock and the independent rates model. Our results show that divergence time estimation under the DPP provides robust estimates of node ages and branch rates without significantly reducing power. Further analyses were conducted on a biological data set, and we provide examples of ways to summarize Markov chain Monte Carlo samples under this model.
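
To make the idea of branches falling into a random number of rate classes concrete, the following Python sketch draws one partition from a Dirichlet process using the Chinese restaurant process representation. It is only an illustration of the prior, not the authors' implementation: the function names, the concentration parameter `alpha`, and the gamma base distribution for class rates are all assumptions introduced here.

```python
import random


def sample_rate_classes(n_branches, alpha, rng):
    """Assign branches to rate classes via the Chinese restaurant process.

    Each branch joins an existing class with probability proportional to the
    class size, or opens a new class with probability proportional to alpha.
    """
    assignments = []  # class index for each branch
    counts = []       # number of branches currently in each class
    for _ in range(n_branches):
        weights = counts + [alpha]  # last entry stands for a new class
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


def sample_class_rates(n_classes, rng, shape=2.0, scale=0.5):
    """Draw one rate per class from a gamma base distribution (assumed)."""
    return [rng.gammavariate(shape, scale) for _ in range(n_classes)]


rng = random.Random(1)
classes = sample_rate_classes(n_branches=20, alpha=1.0, rng=rng)
rates = sample_class_rates(max(classes) + 1, rng)
branch_rates = [rates[k] for k in classes]
print(classes)
```

Small values of `alpha` favor few classes (approaching a strict clock), while large values push toward a separate rate for nearly every branch.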

5.
Summary: The method proposed by Kaplan and Langley for estimating the extent of sequence divergence between related DNAs using restriction endonuclease maps is modified so that the estimates are easier to compute. In the two-species case, these modifications lead via a maximum likelihood approach to an estimate which is closely related to one recently suggested by Nei and Li (1979) and Gotoh et al. (1979). Simulation studies show that the modified estimates are comparable to those of Kaplan and Langley, provided that there is sufficient homology in the DNA segments of the related species. The M-species case, M ≥ 3, is also discussed.

6.
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was applied to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the number of simulations being limited. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is prerequisite.

7.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
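
The growth in the number of candidate trees can be checked with a short Python sketch based on the standard double-factorial counts for strictly bifurcating labeled trees: $(2n-5)!!$ unrooted and $(2n-3)!!$ rooted trees for $n$ taxa. The function names are hypothetical. Under these formulas, 14 taxa give roughly $3.2 \times 10^{11}$ unrooted and $7.9 \times 10^{12}$ rooted trees, so the seven-trillion figure quoted above appears to match the rooted count for 14 taxa (equivalently, the unrooted count for 15 taxa).

```python
def odd_double_factorial(m):
    """Return m!! = 1 * 3 * 5 * ... * m for an odd integer m >= 1."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating labeled trees, valid for n_taxa >= 3."""
    return odd_double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa):
    """Number of rooted bifurcating labeled trees, valid for n_taxa >= 2."""
    return odd_double_factorial(2 * n_taxa - 3)


for n in (4, 10, 14, 15):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Either way, exhaustive evaluation is out of reach well before the 55 taxa of the rbcL example, which is the motivation for heuristic searches.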

8.
In this article, a new approach is presented for estimating the efficiencies of the nucleotide substitution models in a four-taxon case and then this approach is used to estimate the relative efficiencies of six substitution models under a wide variety of conditions. In this approach, efficiencies of the models are estimated by using a simple probability distribution theory. To assess the accuracy of the new approach, efficiencies of the models are also estimated by using the direct estimation method. Simulation results from the direct estimation method confirmed that the new approach is highly accurate. The success of the new approach opens a unique opportunity to develop analytical methods for estimating the relative efficiencies of the substitution models in a straightforward way.

9.
Tests of applicability of several substitution models for DNA sequence data (total citations: 5; self-citations: 3; citations by others: 5)
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.

10.
Summary: Some simple formulae were obtained which enable us to estimate evolutionary distances in terms of the number of nucleotide substitutions (and also the evolutionary rates when the divergence times are known). In comparing a pair of nucleotide sequences, we distinguish two types of differences: if homologous sites are occupied by different nucleotide bases but both are purines or both pyrimidines, the difference is called type I (or transition type), while if one of the two is a purine and the other is a pyrimidine, the difference is called type II (or transversion type). Letting P and Q be respectively the fractions of nucleotide sites showing type I and type II differences between two sequences compared, the evolutionary distance per site is $K = -\tfrac{1}{2}\ln\{(1 - 2P - Q)\sqrt{1 - 2Q}\}$. The evolutionary rate per year is then given by $k = K/(2T)$, where T is the time since the divergence of the two sequences. If only the third codon positions are compared, the synonymous component of the evolutionary base substitutions per site is estimated by $K'_S = -\tfrac{1}{2}\ln(1 - 2P - Q)$. Also, formulae for standard errors were obtained. Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution. (Contribution No. 1330 from the National Institute of Genetics, Mishima, 411 Japan.)
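
As a worked illustration of the distance formula in this abstract, the Python sketch below counts type I (transition) and type II (transversion) differences between two aligned sequences and evaluates the standard two-parameter form $K = -\tfrac{1}{2}\ln\{(1 - 2P - Q)\sqrt{1 - 2Q}\}$. The function name and the choice to skip gapped or ambiguous sites are assumptions of this sketch.

```python
import math

PURINES = {"A", "G"}


def kimura_distance(seq1, seq2):
    """Return (P, Q, K) for two aligned DNA sequences of equal length.

    P and Q are the fractions of compared sites showing transition and
    transversion differences. math.log raises ValueError when the
    sequences are too divergent for the formula to apply.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (assumption)
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    k = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return p, q, k


p, q, k = kimura_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(p, q, k)
```

Dividing the resulting K by twice the divergence time T gives the per-year rate $k = K/(2T)$ described in the abstract.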

11.
Summary: A method for molecular phylogeny construction is newly developed. The method, called the stepwise ancestral sequence method, estimates molecular phylogenetic trees and ancestral sequences simultaneously on the basis of parsimony and sequence homology. For simplicity the emphasis is placed more on parsimony than on sequence homology in the present study, though both are certainly important. Because parsimony alone will sometimes generate plural candidate trees, the method retains not one but five candidates from which one can then single out the final tree taking other criteria into account. The properties and performance of the method are then examined by simulating an evolving gene along a model phylogenetic tree. The estimated trees are found to lie in a narrow range of the parsimony criteria used in the present study. Thus, other criteria such as biological evidence and likelihood are necessary to single out the correct tree among them, with biological evidence taking precedence over any other criterion. The computer simulation also reveals that the method satisfactorily estimates both tree topology and ancestral sequences, at least for the evolutionary model used in the present study.

12.
13.
A codon-based model of nucleotide substitution for protein-coding DNA sequences (total citations: 11; self-citations: 23; citations by others: 11)
A codon-based model for the evolution of protein-coding DNA sequences is presented for use in phylogenetic estimation. A Markov process is used to describe substitutions between codons. Transition/transversion rate bias and codon usage bias are allowed in the model, and selective restraints at the protein level are accommodated using physicochemical distances between the amino acids coded for by the codons. Analyses of two data sets suggest that the new codon-based model can provide a better fit to data than can nucleotide-based models and can produce more reliable estimates of certain biologically important measures such as the transition/transversion rate ratio and the synonymous/nonsynonymous substitution rate ratio.

14.
A new method for calculating evolutionary substitution rates (total citations: 39; self-citations: 0; citations by others: 39)
Summary: In this paper we present a new method for analysing molecular evolution in homologous genes based on a general stationary Markov process. The elaborate statistical analysis necessary to apply the method effectively has been performed using Monte Carlo techniques. We have applied our method to the silent third position of the codon of the five mitochondrial genes coding for identified proteins of four mammalian species (rat, mouse, cow and man). We found that the method applies satisfactorily to the three former species, while the last appears to be outside the scope of the present approach. The method allows one to calculate the evolutionarily effective silent substitution rate ($v_s$) for mitochondrial genes, which in the species mentioned above is $1.4 \times 10^{-8}$ nucleotide substitutions per site per year. We have also determined the divergence time ratios between the couples mouse-cow/rat-mouse and rat-cow/rat-mouse. In both cases this value is approximately 1.4.

15.
16.
We report a method for studying global DNA methylation based on using bisulfite treatment of DNA and simultaneous PCR of multiple DNA repetitive elements, such as Alu elements and long interspersed nucleotide elements (LINE). The PCR product, which represents a pool of approximately 15000 genomic loci, could be used for direct sequencing, selective restriction digestion or pyrosequencing, in order to quantitate DNA methylation. By restriction digestion or pyrosequencing, the assay was reproducible with a standard deviation of only 2% between assays. Using this method we found that almost two-thirds of the CpG methylation sites in Alu elements are mutated, but of the remaining methylation target sites, 87% were methylated. Due to the heavy methylation of repetitive elements, this assay was especially useful in detecting decreases in DNA methylation, and this assay was validated by examining cell lines treated with the methylation inhibitor 5-aza-2′-deoxycytidine (DAC), where we found a 1–16% decrease in Alu element and 18–60% LINE methylation within 3 days of treatment. This method can be used as a surrogate marker of genome-wide methylation changes. In addition, it is less labor intensive and requires less DNA than previous methods of assessing global DNA methylation.

17.
Restriction mapping is used to estimate nucleotide sequence polymorphism when the regions to be studied are too long or too numerous to be sequenced. Restriction mapping is less costly than DNA sequencing, but it does not allow direct measurement of underlying nucleotide polymorphism. It is therefore useful to be able to estimate underlying nucleotide polymorphism from observations of polymorphism in restriction maps, as this offers some of the resolution afforded by DNA sequencing at a reduced cost. Previous estimators of underlying nucleotide polymorphism have assumed that each restriction-enzyme-binding site contains, at most, a single polymorphic nucleotide position (the low-polymorphism-frequency assumption), and this assumption has placed an upper limit on the level of polymorphism that can be resolved by these estimators. The present study documents an estimator which allows relaxation of this assumption. The new estimator more accurately estimates underlying nucleotide polymorphism when the polymorphism level is high enough to falsify the low-polymorphism-frequency assumption. The new estimator therefore yields good results for data sets that are too divergent for analysis by present methods.

18.
A new approach to the identification of point mutations by allele-specific PCR was proposed. The mutation R408W of the human phenylalanine hydroxylase gene was used as a model. A high specificity of the approach was achieved by the use of primers partially complementary to the genomic DNA. Polyethylene glycol covalently attached to one of the allele-specific primers provides for the differential identification of the PCR products due to a change in electrophoretic mobility.

19.
Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different estimates of evolutionary information. Little attention has been devoted to the comparison of methods for obtaining reliable estimates, since the amount of sequence variation within targeted datasets is always unpredictable. To our knowledge, there is little information available in the literature about the evaluation of these different methods. In this study, we compared six widely used methods and provide evaluation results using simulated sequences. The results indicate that incorporating sequence features (such as transition/transversion bias and nucleotide/codon frequency bias) into methods could yield better performance. We recommend that conclusions related to or derived from Ka and Ks analyses should not be readily drawn from the results of only one method.

20.