Found 20 similar articles (search time: 0 ms)
1.
A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes (total citations: 71, self-citations: 7, citations by others: 71)
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10⁻⁹ (histone H4) to 2.80 × 10⁻⁹ (interferon gamma), with a mean of 0.88 × 10⁻⁹ substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10⁻⁹ substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10⁻⁹), intermediate at twofold degenerate sites (2.26 × 10⁻⁹), and highest at fourfold degenerate sites (4.2 × 10⁻⁹). The implications of our results for the mechanisms of DNA evolution, and of the relative likelihood of codon interchanges for parsimonious phylogenetic reconstruction, are discussed.
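The site classification described in this abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes the standard genetic code, counts how many of the three possible single-nucleotide changes at a codon position leave the amino acid unchanged, and (as a labeled simplification) treats the threefold-degenerate third position of isoleucine codons as twofold. The function name `site_degeneracy` is hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumption:
# the standard nuclear code applies to the genes being analyzed).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def site_degeneracy(codon, pos):
    """Classify position `pos` (0, 1, or 2) of `codon` as 0, 2, or 4.

    0 = nondegenerate (every change replaces the amino acid),
    4 = fourfold degenerate (no change replaces the amino acid),
    2 = twofold degenerate (anything in between; this lumps the
        threefold Ile third position with twofold sites, a simplification).
    """
    amino_acid = CODE[codon]
    synonymous = sum(
        1
        for base in BASES
        if base != codon[pos]
        and CODE[codon[:pos] + base + codon[pos + 1:]] == amino_acid
    )
    if synonymous == 0:
        return 0
    if synonymous == 3:
        return 4
    return 2


if __name__ == "__main__":
    for codon in ("GGT", "TTT", "ATG"):
        print(codon, [site_degeneracy(codon, p) for p in range(3)])
```

Counting sites in each class over a pair of aligned genes would be the next step toward the substitution-rate estimates quoted in the abstract; that weighting scheme is not reproduced here.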
2.
A method for estimating nucleotide diversity from AFLP data (total citations: 8, self-citations: 0, citations by others: 8)
A method for estimating the nucleotide diversity from AFLP data is developed by using the relationship between the number of nucleotide changes and the proportion of shared bands. The estimation equation is based on the assumption that GC-content is 0.5. Computer simulations, however, show that this method gives a reasonably accurate estimate even when GC-content deviates from 0.5, as long as the number of nucleotide changes per site (nucleotide diversity) is small. As an example, the nucleotide diversity of the wild yam, Dioscorea tokoro, was estimated. The estimated nucleotide diversity is 0.0055, which is larger than estimates from nucleotide sequence data for Adh and Pgi.
3.
Approximate methods for estimating the pattern of nucleotide substitution and the variation of substitution rates among sites (total citations: 13, self-citations: 6, citations by others: 7)
We propose two approximate methods (one based on parsimony and one on
pairwise sequence comparison) for estimating the pattern of nucleotide
substitution and a parsimony-based method for estimating the gamma
parameter for variable substitution rates among sites. The matrix of
substitution rates that represents the substitution pattern can be
recovered through its relationship with the observable matrix of site
pattern frequencies in pairwise sequence comparisons. In the parsimony
approach, the ancestral sequences reconstructed by the parsimony algorithm
were used, and the two sequences compared are those at the ends of a branch
in the phylogenetic tree. The method for estimating the gamma parameter was
based on a reinterpretation of the numbers of changes at sites inferred by
parsimony. Three data sets were analyzed to examine the utility of the
approximate methods compared with the more reliable likelihood methods. The
new methods for estimating the substitution pattern were found to produce
estimates quite similar to those obtained from the likelihood analyses. The
new method for estimating the gamma parameter was effective in reducing the
bias in conventional parsimony estimates, although it also overestimated
the parameter. The approximate methods are computationally very fast and
appear useful for analyzing large data sets, for which use of the
likelihood method requires excessive computation.
4.
We introduce a new model for relaxing the assumption of a strict molecular clock for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes lineages of a phylogenetic tree are distributed into distinct rate classes. Under the Dirichlet process, the number of rate classes, assignment of branches to rate classes, and the rate value associated with each class are treated as random variables. The performance of this model was evaluated by conducting analyses on data sets simulated under a range of different models. We compared the Dirichlet process model with two alternative models for rate variation: the strict molecular clock and the independent rates model. Our results show that divergence time estimation under the DPP provides robust estimates of node ages and branch rates without significantly reducing power. Further analyses were conducted on a biological data set, and we provide examples of ways to summarize Markov chain Monte Carlo samples under this model.
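To make the idea of branches falling into a random number of rate classes concrete, here is a minimal Python sketch of one draw from a Dirichlet process prior, using the Chinese restaurant process representation. The abstract does not describe the authors' construction or settings, so the concentration parameter `alpha`, the gamma base distribution for class rates, and all function names are illustrative assumptions.

```python
import random


def sample_rate_classes(n_branches, alpha, rng):
    """Assign branches to rate classes via the Chinese restaurant process.

    Branch i (0-based) joins existing class k with probability
    counts[k] / (i + alpha) and opens a new class with probability
    alpha / (i + alpha). `alpha` is a hypothetical concentration parameter.
    """
    assignments = []
    counts = []
    for i in range(n_branches):
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, count in enumerate(counts):
            cumulative += count
            if u < cumulative:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            # No existing class was chosen, so open a new one.
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments


def sample_branch_rates(n_branches, alpha, shape, scale, rng):
    """Draw one rate per class from a gamma base distribution (an assumption)
    and return the rate of each branch."""
    assignments = sample_rate_classes(n_branches, alpha, rng)
    n_classes = max(assignments) + 1
    class_rates = [rng.gammavariate(shape, scale) for _ in range(n_classes)]
    return [class_rates[k] for k in assignments]


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_rate_classes(10, alpha=1.0, rng=rng))
    print(sample_branch_rates(10, alpha=1.0, shape=2.0, scale=0.5, rng=rng))
```

A single class for all branches corresponds to a strict clock, and one class per branch to fully independent rates, so a prior of this kind sits between the two alternative models mentioned in the abstract.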
5.
An improved method for estimating sequence divergence of DNA using restriction endonuclease mappings
Summary: The method proposed by Kaplan and Langley for estimating the extent of sequence divergence between related DNAs using restriction endonuclease maps is modified so that the estimates are easier to compute. In the two-species case, these modifications lead via a maximum likelihood approach to an estimate which is closely related to one recently suggested by Nei and Li (1979) and Gotoh et al. (1979). Simulation studies show that the modified estimates are comparable to those of Kaplan and Langley, provided that there is sufficient homology in the DNA segments of the related species. The M-species case, M ≥ 3, is also discussed.
6.
Ziheng Yang 《Journal of molecular evolution》1995,40(6):689-697
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was applied to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the number of simulations being limited. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is prerequisite.
7.
A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data (total citations: 4, self-citations: 2, citations by others: 4)
Phylogeny reconstruction is a difficult computational problem, because the
number of possible solutions increases with the number of included taxa.
For example, for only 14 taxa, there are more than seven trillion possible
unrooted phylogenetic trees. For this reason, phylogenetic inference
methods commonly use clustering algorithms (e.g., the neighbor-joining
method) or heuristic search strategies to minimize the amount of time spent
evaluating nonoptimal trees. Even heuristic searches can be painfully slow,
especially when computationally intensive optimality criteria such as
maximum likelihood are used. I describe here a different approach to
heuristic searching (using a genetic algorithm) that can tremendously
reduce the time required for maximum-likelihood phylogenetic inference,
especially for data sets involving large numbers of taxa. Genetic
algorithms are simulations of natural selection in which individuals are
encoded solutions to the problem of interest. Here, labeled phylogenetic
trees are the individuals, and differential reproduction is effected by
allowing the number of offspring produced by each individual to be
proportional to that individual's rank likelihood score. Natural selection
increases the average likelihood in the evolving population of phylogenetic
trees, and the genetic algorithm is allowed to proceed until the likelihood
of the best individual ceases to improve over time. An example is presented
involving rbcL sequence data for 55 taxa of green plants. The genetic
algorithm described here required only 6% of the computational effort
required by a conventional heuristic search using tree
bisection/reconnection (TBR) branch swapping to obtain the same
maximum-likelihood topology.
8.
Anup Som 《Theorie in den Biowissenschaften》2007,125(2):133-145
In this article, a new approach is presented for estimating the efficiencies of the nucleotide substitution models in a four-taxon
case and then this approach is used to estimate the relative efficiencies of six substitution models under a wide variety
of conditions. In this approach, efficiencies of the models are estimated by using a simple probability distribution theory.
To assess the accuracy of the new approach, efficiencies of the models are also estimated by using the direct estimation method.
Simulation results from the direct estimation method confirmed that the new approach is highly accurate. The success of the
new approach opens a unique opportunity to develop analytical methods for estimating the relative efficiencies of the substitution
models in a straightforward way.
9.
Using linear invariants for various models of nucleotide substitution, we
developed test statistics for examining the applicability of a specific
model to a given dataset in phylogenetic inference. The models examined are
those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei
(1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a
new model called the eight-parameter model. The first six models are
special cases of the last model. The test statistics developed are
independent of evolutionary time and phylogeny, although the variances of
the statistics contain phylogenetic information. Therefore, these
statistics can be used before a phylogenetic tree is estimated. Our
objective is to find the simplest model that is applicable to a given
dataset, keeping in mind that a simple model usually gives an estimate of
evolutionary distance (number of nucleotide substitutions per site) with a
smaller variance than a complicated model when the simple model is correct.
We have also developed a statistical test of the homogeneity of nucleotide
frequencies of a sample of several sequences that takes into account
possible phylogenetic correlations. This test is used to examine the
stationarity in time of the base frequencies in the sample. For Hasegawa et
al.'s and the eight-parameter models, analytical formulas for estimating
evolutionary distances are presented. Application of the above tests to
several sets of real data has shown that the assumption of stationarity of
base composition is usually acceptable when the sequences studied are
closely related but otherwise it is rejected. Similarly, the simple models
of nucleotide substitution are almost always rejected when actual genes are
distantly related and/or the total number of nucleotides examined is large.
10.
A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences (total citations: 235, self-citations: 0, citations by others: 235)
Motoo Kimura 《Journal of molecular evolution》1980,16(2):111-120
Summary: Some simple formulae were obtained which enable us to estimate evolutionary distances in terms of the number of nucleotide substitutions (and also the evolutionary rates when the divergence times are known). In comparing a pair of nucleotide sequences, we distinguish two types of differences: if homologous sites are occupied by different nucleotide bases but both are purines or both pyrimidines, the difference is called type I (or transition type), while if one of the two is a purine and the other is a pyrimidine, the difference is called type II (or transversion type). Letting P and Q be respectively the fractions of nucleotide sites showing type I and type II differences between two sequences compared, the evolutionary distance per site is $K = -\frac{1}{2}\ln\{(1 - 2P - Q)\sqrt{1 - 2Q}\}$. The evolutionary rate per year is then given by $k = K/(2T)$, where T is the time since the divergence of the two sequences. If only the third codon positions are compared, the synonymous component of the evolutionary base substitutions per site is estimated by $K'_S = -\frac{1}{2}\ln(1 - 2P - Q)$. Also, formulae for standard errors were obtained. Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution. Contribution No. 1330 from the National Institute of Genetics, Mishima, 411 Japan.
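As a worked illustration of Kimura's two-parameter distance, the following Python sketch counts transition-type (P) and transversion-type (Q) differences between two aligned sequences and evaluates K = -(1/2) ln{(1 - 2P - Q) sqrt(1 - 2Q)} and k = K/(2T). The function names are hypothetical, and skipping alignment columns that contain gaps or ambiguous bases is an assumption of this sketch, not something stated in the abstract.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Return Kimura's two-parameter distance K for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: columns with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # type I difference
        else:
            transversions += 1    # type II difference
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the formula")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def rate_per_year(distance, divergence_time_years):
    """Evolutionary rate k = K / (2T)."""
    return distance / (2 * divergence_time_years)


if __name__ == "__main__":
    K = k2p_distance("ACGTACGTAC", "ACGTGCGTAC")  # one transition in ten sites
    print(K, rate_per_year(K, 8.0e7))
```

For one transition in ten sites, P = 0.1 and Q = 0, so K reduces to -(1/2) ln(0.8); the divergence time of 8.0e7 years in the usage example is an arbitrary illustrative value.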
11.
Yoshio Tateno 《Journal of molecular evolution》1990,30(1):85-93
Summary: A method for molecular phylogeny construction is newly developed. The method, called the stepwise ancestral sequence method, estimates molecular phylogenetic trees and ancestral sequences simultaneously on the basis of parsimony and sequence homology. For simplicity the emphasis is placed more on parsimony than on sequence homology in the present study, though both are certainly important. Because parsimony alone will sometimes generate plural candidate trees, the method retains not one but five candidates from which one can then single out the final tree taking other criteria into account. The properties and performance of the method are then examined by simulating an evolving gene along a model phylogenetic tree. The estimated trees are found to lie in a narrow range of the parsimony criteria used in the present study. Thus, other criteria such as biological evidence and likelihood are necessary to single out the correct tree among them, with biological evidence taking precedence over any other criterion. The computer simulation also reveals that the method satisfactorily estimates both tree topology and ancestral sequences, at least for the evolutionary model used in the present study.
12.
13.
A codon-based model for the evolution of protein-coding DNA sequences is
presented for use in phylogenetic estimation. A Markov process is used to
describe substitutions between codons. Transition/transversion rate bias
and codon usage bias are allowed in the model, and selective restraints at
the protein level are accommodated using physicochemical distances between
the amino acids coded for by the codons. Analyses of two data sets suggest
that the new codon-based model can provide a better fit to data than can
nucleotide-based models and can produce more reliable estimates of certain
biologically important measures such as the transition/transversion rate
ratio and the synonymous/nonsynonymous substitution rate ratio.
14.
A new method for calculating evolutionary substitution rates (total citations: 39, self-citations: 0, citations by others: 39)
Cecilia Lanave, Giuliano Preparata, Cecilia Saccone, Gabriella Serio 《Journal of molecular evolution》1984,20(1):86-93
Summary: In this paper we present a new method for analysing molecular evolution in homologous genes based on a general stationary Markov process. The elaborate statistical analysis necessary to apply the method effectively has been performed using Monte Carlo techniques. We have applied our method to the silent third position of the codon of the five mitochondrial genes coding for identified proteins of four mammalian species (rat, mouse, cow and man). We found that the method applies satisfactorily to the three former species, while the last appears to be outside the scope of the present approach. The method allows one to calculate the evolutionarily effective silent substitution rate (v_s) for mitochondrial genes, which in the species mentioned above is 1.4 × 10⁻⁸ nucleotide substitutions per site per year. We have also determined the divergence time ratios between the pairs mouse-cow/rat-mouse and rat-cow/rat-mouse. In both cases this value is approximately 1.4.
15.
16.
A simple method for estimating global DNA methylation using bisulfite PCR of repetitive DNA elements (total citations: 15, self-citations: 2, citations by others: 15)
We report a method for studying global DNA methylation based on bisulfite treatment of DNA and simultaneous PCR of multiple DNA repetitive elements, such as Alu elements and long interspersed nucleotide elements (LINE). The PCR product, which represents a pool of approximately 15,000 genomic loci, could be used for direct sequencing, selective restriction digestion or pyrosequencing, in order to quantitate DNA methylation. By restriction digestion or pyrosequencing, the assay was reproducible with a standard deviation of only 2% between assays. Using this method we found that almost two-thirds of the CpG methylation sites in Alu elements are mutated, but of the remaining methylation target sites, 87% were methylated. Due to the heavy methylation of repetitive elements, this assay was especially useful in detecting decreases in DNA methylation, and it was validated by examining cell lines treated with the methylation inhibitor 5-aza-2′-deoxycytidine (DAC), where we found decreases of 1–16% in Alu element methylation and 18–60% in LINE methylation within 3 days of treatment. This method can be used as a surrogate marker of genome-wide methylation changes. In addition, it is less labor intensive and requires less DNA than previous methods of assessing global DNA methylation.
17.
Restriction mapping is used to estimate nucleotide sequence polymorphism
when the regions to be studied are too long or too numerous to be
sequenced. Restriction mapping is less costly than DNA sequencing, but it
does not allow direct measurement of underlying nucleotide polymorphism. It
is therefore useful to be able to estimate underlying nucleotide
polymorphism from observations of polymorphism in restriction maps, as this
offers some of the resolution afforded by DNA sequencing at a reduced cost.
Previous estimators of underlying nucleotide polymorphism have assumed that
each restriction-enzyme-binding site contains, at most, a single
polymorphic nucleotide position (the low-polymorphism-frequency
assumption), and this assumption has placed an upper limit on the level of
polymorphism that can be resolved by these estimators. The present study
documents an estimator which allows relaxation of this assumption. The new
estimator more accurately estimates underlying nucleotide polymorphism when
the polymorphism level is high enough to falsify the
low-polymorphism-frequency assumption. The new estimator therefore yields good results for
data sets that are too divergent for analysis by present methods.
18.
A new approach to the identification of point mutations by allele-specific PCR was proposed. The mutation R408W of the human phenylalanine hydroxylase gene was used as a model. A high specificity of the approach was achieved by the use of primers partially complementary to the genomic DNA. Polyethylene glycol covalently attached to one of the allele-specific primers provides for the differential identification of the PCR products due to a change in electrophoretic mobility.
19.
Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different estimates of evolutionary information. Little attention has been devoted to the comparison of methods for obtaining reliable estimates, since the amount of sequence variation within targeted datasets is always unpredictable. To our knowledge, there is little information available in the literature about the evaluation of these different methods. In this study, we compared six widely used methods and provide evaluation results using simulated sequences. The results indicate that incorporating sequence features (such as transition/transversion bias and nucleotide/codon frequency bias) into methods could yield better performance. We recommend that conclusions related to or derived from Ka and Ks analyses should not be drawn from the results of a single method alone.