首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism’s genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.  相似文献   

2.
There are three different methods of estimating the number of nucleotide substitutions between a pair of species from amino acid sequence data, i.e. the Poisson correction method, random evolutionary hit method, and counting the actual but minimum number of nucleotide substitutions. In this paper the relationships among the estimates obtained by these methods are studied empirically. The results obtained indicate that there is a high correlation among these estimates and in practice any of the three methods may be used for constructing evolutionary trees or relating nucleotide substitutions to evolutionary time. The effects of varying rates of nucleotide substition among different sites on the Poisson correction and random evolutionary hit methods are also studied mathematically. It is shown that these two methods are quite insensitive to the variation of the rate of nucleotide substitution.  相似文献   

3.
Estimating the rate of evolution of the rate of molecular evolution   总被引:35,自引:13,他引:22  
A simple model for the evolution of the rate of molecular evolution is presented. With a Bayesian approach, this model can serve as the basis for estimating dates of important evolutionary events even in the absence of the assumption of constant rates among evolutionary lineages. The method can be used in conjunction with any of the widely used models for nucleotide substitution or amino acid replacement. It is illustrated by analyzing a data set of rbcL protein sequences.   相似文献   

4.
Most of the sophisticated methods to estimate evolutionary divergence between DNA sequences assume that the two sequences have evolved with the same pattern of nucleotide substitution after their divergence from their most recent common ancestor (homogeneity assumption). If this assumption is violated, the evolutionary distance estimated will be biased, which may result in biased estimates of divergence times and substitution rates, and may lead to erroneous branching patterns in the inferred phylogenies. Here we present a simple modification for existing distance estimation methods to relax the assumption of the substitution pattern homogeneity among lineages when analyzing DNA and protein sequences. Results from computer simulations and empirical data analyses for human and mouse genes are presented to demonstrate that the proposed modification reduces the estimation bias considerably and that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site. We also discuss the relationship of the substitution and mutation rate estimates when the substitution pattern is not the same in the lineages leading to the two sequences compared.  相似文献   

5.
The covarion (or site specific rate variation, SSRV) process of biological sequence evolution is a process by which the evolutionary rate of a nucleotide/amino acid/codon position can change in time. In this paper, we introduce time-continuous, space-discrete, Markov-modulated Markov chains as a model for representing SSRV processes, generalizing existing theory to any model of rate change. We propose a fast algorithm for diagonalizing the generator matrix of relevant Markov-modulated Markov processes. This algorithm makes phylogeny likelihood calculation tractable even for a large number of rate classes and a large number of states, so that SSRV models become applicable to amino acid or codon sequence datasets. Using this algorithm, we investigate the accuracy of the discrete approximation to the Gamma distribution of evolutionary rates, widely used in molecular phylogeny. We show that a relatively large number of classes is required to achieve accurate approximation of the exact likelihood when the number of analyzed sequences exceeds 20, both under the SSRV and among site rate variation (ASRV) models.  相似文献   

6.
7.
Estimation of evolutionary distance between nucleotide sequences   总被引:34,自引:9,他引:25  
A mathematical formula for estimating the average number of nucleotide substitutions per site (delta) between two homologous DNA sequences is developed by taking into account unequal rates of substitution among different nucleotide pairs. Although this formula is obtained for the equal-input model of nucleotide substitution, computer simulations have shown that it gives a reasonably good estimate for a wide range of nucleotide substitution patterns as long as delta is equal to or smaller than 1. Furthermore, the frequency of cases to which the formula is inapplicable is much lower than that for other similar methods recently proposed. This point is illustrated using insulin genes. A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed. Application of this method to globin gene data indicates that the number of nucleotide changes per site increases with evolutionary time but the pattern of the increase is quite irregular.   相似文献   

8.
We propose a simple algorithm for estimating the number of nucleotide differences between a pair of RNA or DNA sequences through comparison of their RNAse A mismatch cleavage patterns. In the RNAse A mismatch cleavage technique two or more sample sequences are hybridized to the same RNA probe, the hybrids are partially digested with RNAse A, and the digestion products are compared on an electrophoretic gel. Here we provide an algorithm for converting the numbers of unique and matching electrophoretic bands into an estimate of the number of nucleotide differences between the sequences. Computer simulation indicates that the proposed method yields a robust estimate of the genetic distance despite stochastic errors and occasional violation of certain assumptions. Our study suggests that the method performs best when the distance between the sequences is <15 differences. When the sequences under analysis are likely to have larger distances, we advise to substitute one long riboprobe with a set of shorter nonoverlapping probes. The new algorithm is applied to infer the proximity of several strains of pseudorabies virus.  相似文献   

9.
Amino acid similarity often needs to be considered in DNA sequence comparison to elucidate gene functions. We propose a Smith-Waterman-like algorithm which considers amino acid similarity and insertions/deletions in sequences at the DNA level and at the protein level in a hybrid manner. The algorithm is applied to cDNA sequences of Oryza sativa and those of Arabidopsis thaliana. The results are compared with the results of application of NCBI's tblastx program (which compares the sequences in the BLAST manner after translation). It is shown that the present algorithm is very helpful in discovering nucleotide insertions/deletions originating from experimental errors as well as amino acid insertions/deletions due to evolutionary reasons.  相似文献   

10.
We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is aimed at, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.  相似文献   

11.
A mathematical theory for the evolutionary change of restriction endonuclease cleavage sites is developed, and the probabilities of various types of restriction-site changes are evaluated. A computer simulation is also conducted to study properties of the evolutionary change of restriction sites. These studies indicate that parsimony methods of constructing phylogenetic trees often make erroneous inferences about evolutionary changes of restriction sites unless the number of nucleotide substitutions per site is less than 0.01 for all branches of the tree. This introduces a systematic error in estimating the number of mutational changes for each branch and, consequently, in constructing phylogenetic trees. Therefore, parsimony methods should be used only in cases where nucleotide sequences are closely related. Reexamination of Ferris et al.'s data on restriction-site differences of mitochondrial DNAs does not support Templeton's conclusions regarding the phylogenetic tree for man and apes and the molecular clock hypothesis. Templeton's claim that Nei and Li's method of estimating the number of nucleotide substitutions per site is seriously affected by parallel losses and loss-gains of restriction sites is also unsupported.   相似文献   

12.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) was evaluated. The tree-building methods examined were the neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon positions data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd41 showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.   相似文献   

13.
DNA Polymorphism Detectable by Restriction Endonucleases   总被引:82,自引:15,他引:67       下载免费PDF全文
Data on DNA polymorphisms detected by restriction endonucleases are rapidly accumulating. With the aim of analyzing these data, several different measures of nucleon (DNA segment) diversity within and between populations are proposed, and statistical methods for estimating these quantities are developed. These statistical methods are applicable to both nuclear and nonnuclear DNAs. When evolutionary change of nucleons occurs mainly by mutation and genetic drift, all the measures can be expressed in terms of the product of mutation rate per nucleon and effective population size. A method for estimating nucleotide diversity from nucleon diversity is also presented under certain assumptions. It is shown that DNA divergence between two populations can be studied either by the average number of restriction site differences or by the average number of nucleotide differences. In either case, a large number of different restriction enzymes should be used for studying phylogenetic relationships among related organisms, since the effect of stochastic factors on these quantities is very large. The statistical methods developed have been applied to data of Shah and Langley on mitochondrial (mt)DNA from Drosophila melanogaster, simulans and virilis. This application has suggested that the evolutionary change of mtDNA in higher animals occurs mainly by nucleotide substitution rather than by deletion and insertion. The evolutionary distances among the three species have also been estimated.  相似文献   

14.
15.
All established methods for detecting positive selection at the molecular level rely on comparisons between nucleotide sequences. An exceptional method that purports to detect selection on the basis of a single genomic sequence has recently been proposed. This method uses a measure called "codon volatility," defined for each codon as the ratio between the number of nonsynonymous codons that differ from the codon under study at a single nucleotide position and the number of sense codons that differ from the codon under study at a single nucleotide position. Here, we examine various properties of codon volatility and its derivatives and use simulation of evolutionary processes to determine whether they can be used to detect selective pressures. Codons for only four amino acids (glycine, leucine, arginine, and serine) show any variation in codon volatility. Thus, codon volatility is mainly a proxy for amino acid usage, rather than for codon usage, with 65% of all synonymous changes and 27% of all nonsynonymous changes being undetectable by this measure. Genes identified by the volatility method as being subject to positive selection tend to have idiosyncratic amino acid compositions (e.g., they are glycine rich or arginine poor). An additional property of codon volatility is the near zero variance of its mean expectation, which translates into overestimated statistical significance estimates, especially in the absence of corrections for multiple comparisons. A comparison with measures of selection inferred through comparative methodology reveals no relationship between the results of the two methods. Finally, we show that codon volatility can increase in the absence of positive Darwinian selection; that is, increased codon volatility is not indicative of positive selection.  相似文献   

16.
Summary A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely at random.  相似文献   

17.
Summary A method for estimating the evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences is presented. This method is applied to genes of øX174 and G4 genomes, histone genes and-globin genes, for which homologous nucleotide sequences are available for comparison to be made. It is shown that the rates of synonymous substitutions are quite uniform among the non-overlapping genes of øX174 and G4 and among histone genes H4, H2B, H3 and H2A. A comparison between øX174 and G4 reveals that, in the overlapping segments of the A-gene, the rate of synonymous substitution is reduced more significantly than the rate of amino acid substitution relative to the corresponding rate in the nonoverlapping segment. It is also suggested that, in the coding regions surrounding the splicing points of intervening sequences of-globin genes, there exist rigid secondary structures. It is in only these regions that the-globin genes show the slowing down of evolutionary rates of both synonymous and amino acid substitutions in the primate line.  相似文献   

18.
A nomographic method is presented that estimates the number of nucleotide substitutions since the common ancestor of two nucleotide sequences with no assumption about the proportion of transition and transversion substitutions except that it is constant over time. Of two previous methods of estimating this number, that of M. Kimura (Proc. natn. Acad. Sci. U.S.A. 78, 454-458 (1981) obtains the same result, and is thus confirmed by this work, while that of W. M. Brown, E. M. Prager, A. Wang & A. C. Wilson (J. molec. Evol. 18, 225-239 (1982] does not get the same result. The method presented here also obtains the fraction of all substitutions that are transitions. If one has three or more homologous sequences to compare, one can test the validity of the model by examining the constancy of the estimated proportion of substitutions that are transitions across the various pairs of sequences in a simple visual way. The method is general for any pair of mutually exclusive nucleotide substitutional categories, not just transitions and transversions. Mitochondrial data provide evidence that, for this and probably other current models correcting for superimposed substitutions, one or more of the underlying assumptions is incorrect. This is because there is some unknown systematic bias affecting this evolutionary process. It is suggested that at least part of the bias arises from incorrectly assuming that all sites are variable. In the absence of evidence that this bias is not present in other data, all estimates of the number of substitutions based upon pairs of sequences and current methods of estimating superimposed substitutions at a single site should be viewed as uncertain.  相似文献   

19.
Adaptive evolution at the molecular level can be studied by detecting convergent and parallel evolution at the amino acid sequence level. For a set of homologous protein sequences, the ancestral amino acids at all interior nodes of the phylogenetic tree of the proteins can be statistically inferred. The amino acid sites that have experienced convergent or parallel changes on independent evolutionary lineages can then be identified by comparing the amino acids at the beginning and end of each lineage. At present, the efficiency of the methods of ancestral sequence inference in identifying convergent and parallel changes is unknown. More seriously, when we identify convergent or parallel changes, it is unclear whether these changes are attributable to random chance. For these reasons, claims of convergent and parallel evolution at the amino acid sequence level have been disputed. We have conducted computer simulations to assess the efficiencies, of the parsimony and Bayesian methods of ancestral sequence inference in identifying convergent and parallel-change sites. Our results showed that the Bayesian method performs better than the parsimony method in identifying parallel changes, and both methods are inefficient in identifying convergent changes. However, the Bayesian method is recommended for estimating the number of convergent-change sites because it gives a conservative estimate. We have developed statistical tests for examining whether the observed numbers of convergent and parallel changes are due to random chance. As an example, we reanalyzed the stomach lysozyme sequences of foregut fermenters and found that parallel evolution is statistically significant, whereas convergent evolution is not well supported.   相似文献   

20.
The estimation of the amount of evolutionary divergence that has taken place between two DNA coding sequences depends strongly on the degree of constraint on amino acid replacements. If amino acid replacements are relatively unconstrained, the individual nucleotide is the appropriate unit of analysis and the method of Tajima and Nei can be used. If amino acid replacements are constrained, however, this method is shown to be inapplicable. For sequences with strong amino acid constraints, a method is outlined analogous to the Tajima and Nei method using codons as the unit of analysis. Only synonymous substitutions are used. Codon usage data can be employed to estimate the necessary parameters of the calculation, or a priori models of substitution may be employed. Sequences with significant but intermediate constraints on amino acid replacements are, in principle, unanalyzable.   相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号