Found 20 similar documents (search time: 15 ms)
1.
Normalized nucleotide and amino acid contents of complete genome sequences can be visualized
as radar charts. The shapes of these charts depict the characteristics of an organism’s genome. The normalized
values calculated from the genome sequence theoretically exclude experimental errors. Further, because
normalization is independent of both target size and kind, this procedure is applicable not only to single
genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss
the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the
investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the
results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed
only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery
that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or
predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research.
Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.
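The abstract above does not spell out its normalization. As a minimal sketch, assuming that "normalized nucleotide content" means each nucleotide count divided by the total number of counted nucleotides (so the four values sum to 1), the calculation could look like this. The function name `normalized_content` is hypothetical and not taken from the review.

```python
from collections import Counter


def normalized_content(sequence, alphabet="ACGT"):
    """Return the fraction of each symbol of `alphabet` in `sequence`.

    Assumption: normalization = count / total counted symbols, so the
    result is independent of sequence length. Symbols outside `alphabet`
    (for example N) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return {symbol: counts[symbol] / total for symbol in alphabet}


if __name__ == "__main__":
    # The four fractions can be used directly as the axes of a radar chart.
    print(normalized_content("AACG"))
```

Because the values are fractions of the total, a single gene and a whole genome can be compared on the same scale, which is the property the abstract emphasizes.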
2.
There are three different methods of estimating the number of nucleotide substitutions between a pair of species from amino acid sequence data, i.e. the Poisson correction method, random evolutionary hit method, and counting the actual but minimum number of nucleotide substitutions. In this paper the relationships among the estimates obtained by these methods are studied empirically. The results obtained indicate that there is a high correlation among these estimates and in practice any of the three methods may be used for constructing evolutionary trees or relating nucleotide substitutions to evolutionary time. The effects of varying rates of nucleotide substitution among different sites on the Poisson correction and random evolutionary hit methods are also studied mathematically. It is shown that these two methods are quite insensitive to the variation of the rate of nucleotide substitution.
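For orientation, the Poisson correction in its usual textbook form converts the observed proportion of differing amino acid sites, p, into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below assumes that standard form; the abstract itself does not give the formula, and the function name `poisson_correction` is hypothetical.

```python
import math


def poisson_correction(p):
    """Estimate substitutions per site from the proportion p of differing sites.

    Assumes the standard Poisson correction d = -ln(1 - p), which treats
    substitutions at a site as Poisson events so that the probability of
    no change at a site is exp(-d).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.5):
        print(p, round(poisson_correction(p), 4))
```

For small p the estimate is close to p itself; the correction grows as p increases because multiple substitutions at the same site become more likely.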
3.
Estimating the rate of evolution of the rate of molecular evolution (total citations: 35; self-citations: 13; citations by others: 22)
A simple model for the evolution of the rate of molecular evolution is
presented. With a Bayesian approach, this model can serve as the basis for
estimating dates of important evolutionary events even in the absence of
the assumption of constant rates among evolutionary lineages. The method
can be used in conjunction with any of the widely used models for
nucleotide substitution or amino acid replacement. It is illustrated by
analyzing a data set of rbcL protein sequences.
4.
Most of the sophisticated methods to estimate evolutionary divergence between DNA sequences assume that the two sequences have evolved with the same pattern of nucleotide substitution after their divergence from their most recent common ancestor (homogeneity assumption). If this assumption is violated, the evolutionary distance estimated will be biased, which may result in biased estimates of divergence times and substitution rates, and may lead to erroneous branching patterns in the inferred phylogenies. Here we present a simple modification for existing distance estimation methods to relax the assumption of the substitution pattern homogeneity among lineages when analyzing DNA and protein sequences. Results from computer simulations and empirical data analyses for human and mouse genes are presented to demonstrate that the proposed modification reduces the estimation bias considerably and that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site. We also discuss the relationship of the substitution and mutation rate estimates when the substitution pattern is not the same in the lineages leading to the two sequences compared.
5.
The covarion (or site specific rate variation, SSRV) process of biological sequence evolution is a process by which the evolutionary rate of a nucleotide/amino acid/codon position can change in time. In this paper, we introduce time-continuous, space-discrete, Markov-modulated Markov chains as a model for representing SSRV processes, generalizing existing theory to any model of rate change. We propose a fast algorithm for diagonalizing the generator matrix of relevant Markov-modulated Markov processes. This algorithm makes phylogeny likelihood calculation tractable even for a large number of rate classes and a large number of states, so that SSRV models become applicable to amino acid or codon sequence datasets. Using this algorithm, we investigate the accuracy of the discrete approximation to the Gamma distribution of evolutionary rates, widely used in molecular phylogeny. We show that a relatively large number of classes is required to achieve accurate approximation of the exact likelihood when the number of analyzed sequences exceeds 20, both under the SSRV and among site rate variation (ASRV) models.
6.
7.
Estimation of evolutionary distance between nucleotide sequences (total citations: 34; self-citations: 9; citations by others: 25)
A mathematical formula for estimating the average number of nucleotide
substitutions per site (delta) between two homologous DNA sequences is
developed by taking into account unequal rates of substitution among
different nucleotide pairs. Although this formula is obtained for the
equal-input model of nucleotide substitution, computer simulations have
shown that it gives a reasonably good estimate for a wide range of
nucleotide substitution patterns as long as delta is equal to or smaller
than 1. Furthermore, the frequency of cases to which the formula is
inapplicable is much lower than that for other similar methods recently
proposed. This point is illustrated using insulin genes. A statistical
method for estimating the number of nucleotide changes due to deletion and
insertion is also developed. Application of this method to globin gene data
indicates that the number of nucleotide changes per site increases with
evolutionary time but the pattern of the increase is quite irregular.
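The abstract above describes the formula without stating it. The sketch below assumes the form in which this estimator (commonly called the Tajima-Nei distance) is usually presented: d = -b ln(1 - p/b), with b = (1/2)[1 - Σ q_i² + p²/h] and h = Σ_{i<j} x_ij² / (2 q_i q_j). Here p is the proportion of differing sites, q_i is the average frequency of nucleotide i in the two sequences, and x_ij is the relative frequency of sites carrying the unordered pair (i, j). The function name and the handling of gaps are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def tajima_nei_distance(seq1, seq2):
    """Estimate substitutions per site between two aligned DNA sequences.

    Assumed formula: d = -b * ln(1 - p / b), where
    b = 0.5 * (1 - sum(q_i^2) + p^2 / h) and
    h = sum over pairs i<j of x_ij^2 / (2 * q_i * q_j).
    Sites where either sequence has a non-ACGT symbol are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(sites)
    if n == 0:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / n
    if p == 0:
        return 0.0
    # Average nucleotide frequencies over both sequences.
    counts = Counter(a for a, _ in sites) + Counter(b for _, b in sites)
    q = {nt: counts[nt] / (2 * n) for nt in "ACGT"}
    # Frequencies of unordered mismatched nucleotide pairs.
    pair_counts = Counter(frozenset(s) for s in sites if s[0] != s[1])
    h = 0.0
    for i, j in combinations("ACGT", 2):
        x_ij = pair_counts[frozenset((i, j))] / n
        if x_ij > 0:
            h += x_ij ** 2 / (2 * q[i] * q[j])
    b = 0.5 * (1 - sum(v * v for v in q.values()) + p * p / h)
    if p >= b:
        # The logarithm is undefined: the formula is inapplicable here.
        raise ValueError("formula inapplicable: p >= b")
    return -b * math.log(1 - p / b)


if __name__ == "__main__":
    print(tajima_nei_distance("AAACCGAAAACCCCGGGGTTTT",
                              "CGTGTTAAAACCCCGGGGTTTT"))
```

When all four nucleotides are equally frequent and all six mismatch types are equally common, b equals 3/4 and the estimate coincides with the Jukes-Cantor distance, which is a convenient sanity check.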
8.
We propose a simple algorithm for estimating the number of nucleotide differences between a pair of RNA or DNA sequences through comparison of their RNAse A mismatch cleavage patterns. In the RNAse A mismatch cleavage technique two or more sample sequences are hybridized to the same RNA probe, the hybrids are partially digested with RNAse A, and the digestion products are compared on an electrophoretic gel. Here we provide an algorithm for converting the numbers of unique and matching electrophoretic bands into an estimate of the number of nucleotide differences between the sequences. Computer simulation indicates that the proposed method yields a robust estimate of the genetic distance despite stochastic errors and occasional violation of certain assumptions. Our study suggests that the method performs best when the distance between the sequences is <15 differences. When the sequences under analysis are likely to have larger distances, we advise replacing one long riboprobe with a set of shorter nonoverlapping probes. The new algorithm is applied to infer the proximity of several strains of pseudorabies virus.
9.
Amino acid similarity often needs to be considered in DNA sequence comparison to elucidate gene functions. We propose a Smith-Waterman-like algorithm which considers amino acid similarity and insertions/deletions in sequences at the DNA level and at the protein level in a hybrid manner. The algorithm is applied to cDNA sequences of Oryza sativa and those of Arabidopsis thaliana. The results are compared with the results of application of NCBI's tblastx program (which compares the sequences in the BLAST manner after translation). It is shown that the present algorithm is very helpful in discovering nucleotide insertions/deletions originating from experimental errors as well as amino acid insertions/deletions due to evolutionary reasons.
10.
Löytynoja A, Goldman N 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2008,363(1512):3913-3919
We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is aimed at, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.
11.
Evolutionary change of restriction cleavage sites and phylogenetic inference for man and apes (total citations: 2; self-citations: 0; citations by others: 2)
A mathematical theory for the evolutionary change of restriction
endonuclease cleavage sites is developed, and the probabilities of various
types of restriction-site changes are evaluated. A computer simulation is
also conducted to study properties of the evolutionary change of
restriction sites. These studies indicate that parsimony methods of
constructing phylogenetic trees often make erroneous inferences about
evolutionary changes of restriction sites unless the number of nucleotide
substitutions per site is less than 0.01 for all branches of the tree. This
introduces a systematic error in estimating the number of mutational
changes for each branch and, consequently, in constructing phylogenetic
trees. Therefore, parsimony methods should be used only in cases where
nucleotide sequences are closely related. Reexamination of Ferris et al.'s
data on restriction-site differences of mitochondrial DNAs does not support
Templeton's conclusions regarding the phylogenetic tree for man and apes
and the molecular clock hypothesis. Templeton's claim that Nei and Li's
method of estimating the number of nucleotide substitutions per site is
seriously affected by parallel losses and loss-gains of restriction sites
is also unsupported.
12.
Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny (total citations: 24; self-citations: 6; citations by others: 18)
The relative efficiencies of different protein-coding genes of the
mitochondrial genome and different tree-building methods in recovering a
known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum,
chicken, frog, and three bony fish species) were evaluated. The
tree-building methods examined were the neighbor joining (NJ), minimum
evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and
both nucleotide sequences and deduced amino acid sequences were analyzed.
Generally speaking, amino acid sequences were better than nucleotide
sequences in obtaining the true tree (topology) or trees close to the true
tree. However, when only first and second codon positions data were used,
nucleotide sequences produced reasonably good trees. Among the 13 genes
examined, Nd5 produced the true tree in all tree-building methods or
algorithms for both amino acid and nucleotide sequence data. Genes Cytb and
Nd4 also produced the correct tree in most tree-building algorithms when
amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l showed
a poor performance. In general, large genes produced better results, and
when the entire set of genes was used, all tree-building methods generated
the true tree. In each tree-building method, several distance measures or
algorithms were used, but all these distance measures or algorithms
produced essentially the same results. The ME method, in which many
different topologies are examined, was no better than the NJ method, which
generates a single final tree. Similarly, an ML method, in which many
topologies are examined, was no better than the ML star decomposition
algorithm that generates a single final tree. In ML the best substitution
model chosen by using the Akaike information criterion produced no better
results than simpler substitution models. These results question the
utility of the currently used optimization principles in phylogenetic
construction. Relatively simple methods such as the NJ and ML star
decomposition algorithms seem to produce as good results as those obtained
by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML
methods in obtaining the correct tree were nearly the same when amino acid
sequence data were used. The most important factor in constructing reliable
phylogenetic trees seems to be the number of amino acids or nucleotides
used.
13.
Data on DNA polymorphisms detected by restriction endonucleases are rapidly accumulating. With the aim of analyzing these data, several different measures of nucleon (DNA segment) diversity within and between populations are proposed, and statistical methods for estimating these quantities are developed. These statistical methods are applicable to both nuclear and nonnuclear DNAs. When evolutionary change of nucleons occurs mainly by mutation and genetic drift, all the measures can be expressed in terms of the product of mutation rate per nucleon and effective population size. A method for estimating nucleotide diversity from nucleon diversity is also presented under certain assumptions. It is shown that DNA divergence between two populations can be studied either by the average number of restriction site differences or by the average number of nucleotide differences. In either case, a large number of different restriction enzymes should be used for studying phylogenetic relationships among related organisms, since the effect of stochastic factors on these quantities is very large. The statistical methods developed have been applied to data of Shah and Langley on mitochondrial (mt)DNA from Drosophila melanogaster, simulans and virilis. This application has suggested that the evolutionary change of mtDNA in higher animals occurs mainly by nucleotide substitution rather than by deletion and insertion. The evolutionary distances among the three species have also been estimated.
14.
15.
All established methods for detecting positive selection at the molecular level rely on comparisons between nucleotide sequences. An exceptional method that purports to detect selection on the basis of a single genomic sequence has recently been proposed. This method uses a measure called "codon volatility," defined for each codon as the ratio between the number of nonsynonymous codons that differ from the codon under study at a single nucleotide position and the number of sense codons that differ from the codon under study at a single nucleotide position. Here, we examine various properties of codon volatility and its derivatives and use simulation of evolutionary processes to determine whether they can be used to detect selective pressures. Codons for only four amino acids (glycine, leucine, arginine, and serine) show any variation in codon volatility. Thus, codon volatility is mainly a proxy for amino acid usage, rather than for codon usage, with 65% of all synonymous changes and 27% of all nonsynonymous changes being undetectable by this measure. Genes identified by the volatility method as being subject to positive selection tend to have idiosyncratic amino acid compositions (e.g., they are glycine rich or arginine poor). An additional property of codon volatility is the near zero variance of its mean expectation, which translates into overestimated statistical significance estimates, especially in the absence of corrections for multiple comparisons. A comparison with measures of selection inferred through comparative methodology reveals no relationship between the results of the two methods. Finally, we show that codon volatility can increase in the absence of positive Darwinian selection; that is, increased codon volatility is not indicative of positive selection.
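The definition of codon volatility quoted above is concrete enough to sketch directly. The example below assumes the standard genetic code (the abstract does not say which code table was used) and uses hypothetical function names. It computes the volatility of each sense codon as an exact fraction, then lists the amino acids whose synonymous codons do not all share the same volatility.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons (assumed table).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(codon): aa
                for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_nucleotide_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def codon_volatility(codon):
    """Nonsynonymous sense neighbors divided by all sense neighbors."""
    amino_acid = GENETIC_CODE[codon]
    if amino_acid == "*":
        raise ValueError("volatility is defined for sense codons only")
    sense = [n for n in single_nucleotide_neighbors(codon)
             if GENETIC_CODE[n] != "*"]
    nonsynonymous = [n for n in sense if GENETIC_CODE[n] != amino_acid]
    return Fraction(len(nonsynonymous), len(sense))


def amino_acids_with_variable_volatility():
    """Return amino acids whose synonymous codons differ in volatility."""
    values_by_aa = {}
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            values_by_aa.setdefault(aa, set()).add(codon_volatility(codon))
    return {aa for aa, values in values_by_aa.items() if len(values) > 1}


if __name__ == "__main__":
    print(codon_volatility("GGA"), codon_volatility("GGG"))
    print(sorted(amino_acids_with_variable_volatility()))
```

Under the standard code this reproduces the abstract's observation: only glycine (G), leucine (L), arginine (R), and serine (S) have synonymous codons with differing volatility, which is why the measure mostly tracks amino acid usage.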
16.
Summary: A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely at random.
17.
Summary: A method for estimating the evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences is presented. This method is applied to genes of øX174 and G4 genomes, histone genes and β-globin genes, for which homologous nucleotide sequences are available for comparison to be made. It is shown that the rates of synonymous substitutions are quite uniform among the non-overlapping genes of øX174 and G4 and among histone genes H4, H2B, H3 and H2A. A comparison between øX174 and G4 reveals that, in the overlapping segments of the A-gene, the rate of synonymous substitution is reduced more significantly than the rate of amino acid substitution relative to the corresponding rate in the nonoverlapping segment. It is also suggested that, in the coding regions surrounding the splicing points of intervening sequences of β-globin genes, there exist rigid secondary structures. It is in only these regions that the β-globin genes show the slowing down of evolutionary rates of both synonymous and amino acid substitutions in the primate line.
18.
W M Fitch 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》1986,312(1154):317-324
A nomographic method is presented that estimates the number of nucleotide substitutions since the common ancestor of two nucleotide sequences with no assumption about the proportion of transition and transversion substitutions except that it is constant over time. Of two previous methods of estimating this number, that of M. Kimura (Proc. natn. Acad. Sci. U.S.A. 78, 454-458 (1981)) obtains the same result, and is thus confirmed by this work, while that of W. M. Brown, E. M. Prager, A. Wang & A. C. Wilson (J. molec. Evol. 18, 225-239 (1982)) does not get the same result. The method presented here also obtains the fraction of all substitutions that are transitions. If one has three or more homologous sequences to compare, one can test the validity of the model by examining the constancy of the estimated proportion of substitutions that are transitions across the various pairs of sequences in a simple visual way. The method is general for any pair of mutually exclusive nucleotide substitutional categories, not just transitions and transversions. Mitochondrial data provide evidence that, for this and probably other current models correcting for superimposed substitutions, one or more of the underlying assumptions is incorrect. This is because there is some unknown systematic bias affecting this evolutionary process. It is suggested that at least part of the bias arises from incorrectly assuming that all sites are variable. In the absence of evidence that this bias is not present in other data, all estimates of the number of substitutions based upon pairs of sequences and current methods of estimating superimposed substitutions at a single site should be viewed as uncertain.
19.
Adaptive evolution at the molecular level can be studied by detecting
convergent and parallel evolution at the amino acid sequence level. For a
set of homologous protein sequences, the ancestral amino acids at all
interior nodes of the phylogenetic tree of the proteins can be
statistically inferred. The amino acid sites that have experienced
convergent or parallel changes on independent evolutionary lineages can
then be identified by comparing the amino acids at the beginning and end of
each lineage. At present, the efficiency of the methods of ancestral
sequence inference in identifying convergent and parallel changes is
unknown. More seriously, when we identify convergent or parallel changes,
it is unclear whether these changes are attributable to random chance. For
these reasons, claims of convergent and parallel evolution at the amino
acid sequence level have been disputed. We have conducted computer
simulations to assess the efficiencies of the parsimony and Bayesian
methods of ancestral sequence inference in identifying convergent and
parallel-change sites. Our results showed that the Bayesian method performs
better than the parsimony method in identifying parallel changes, and both
methods are inefficient in identifying convergent changes. However, the
Bayesian method is recommended for estimating the number of
convergent-change sites because it gives a conservative estimate. We have
developed statistical tests for examining whether the observed numbers of
convergent and parallel changes are due to random chance. As an example, we
reanalyzed the stomach lysozyme sequences of foregut fermenters and found
that parallel evolution is statistically significant, whereas convergent
evolution is not well supported.
20.
The estimation of the amount of evolutionary divergence that has taken
place between two DNA coding sequences depends strongly on the degree of
constraint on amino acid replacements. If amino acid replacements are
relatively unconstrained, the individual nucleotide is the appropriate unit
of analysis and the method of Tajima and Nei can be used. If amino acid
replacements are constrained, however, this method is shown to be
inapplicable. For sequences with strong amino acid constraints, a method is
outlined analogous to the Tajima and Nei method using codons as the unit of
analysis. Only synonymous substitutions are used. Codon usage data can be
employed to estimate the necessary parameters of the calculation, or a
priori models of substitution may be employed. Sequences with significant
but intermediate constraints on amino acid replacements are, in principle,
unanalyzable.