Similar articles
20 similar articles found.
1.
Accuracy of phylogenetic trees estimated from DNA sequence data   (Total citations: 4; self-citations: 1; citations by others: 3)
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
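The distinction between uncorrected (p) and corrected (d) distances can be illustrated with a minimal sketch. It assumes the Jukes-Cantor correction, which this abstract does not name, and the function names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal length, and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor_distance(p: float) -> float:
    """Corrected distance under the Jukes-Cantor model (an assumption here):
    d = -(3/4) * ln(1 - 4p/3), which accounts for multiple hits at a site."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGT", "ACGTACGA")   # one difference in eight sites
d = jukes_cantor_distance(p)             # slightly larger than p
```

The corrected value always exceeds p for p > 0, and the gap widens as sequences diverge, which is why the choice between p and d matters more for distant sequences.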

2.
The relative efficiencies of the maximum-parsimony (MP), UPGMA, and neighbor-joining (NJ) methods in obtaining the correct tree (topology) for restriction-site and restriction-fragment data were studied by computer simulation. In this simulation, six DNA sequences of 16,000 nucleotides were assumed to evolve following a given model tree. The recognition sequences of 20 different six-base restriction enzymes were used to identify the restriction sites of the DNA sequences generated. The restriction-site data and restriction-fragment data thus obtained were used to reconstruct a phylogenetic tree, and the tree obtained was compared with the model tree. This process was repeated 300 times. The results obtained indicate that when the rate of nucleotide substitution is constant the probability of obtaining the correct tree (Pc) is generally higher in the NJ method than in the MP method. However, if we use the average topological deviation from the model tree (dT) as the criterion of comparison, the NJ and MP methods are nearly equally efficient. When the rate of nucleotide substitution varies with evolutionary lineage, the NJ method is better than the MP method, whether Pc or dT is used as the criterion of comparison. With 500 nucleotides and when the number of nucleotide substitutions per site was very small, restriction-site data were, contrary to our expectation, more useful than sequence data. Restriction-fragment data were less useful than restriction-site data, except when the sequence divergence was very small. UPGMA seems to be useful only when the rate of nucleotide substitution is constant and sequence divergence is high.

3.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) were evaluated. The tree-building methods examined were the neighbor-joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML) methods, and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon position data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction.
Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce as good results as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.

4.
Phylogenetic analysis using parsimony and likelihood methods   (Total citations: 1; self-citations: 0; citations by others: 1)
The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981,J. Mol. Evol. 17: 368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. 
When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorated. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.

5.
Accuracy of estimated phylogenetic trees from molecular data   (Total citations: 2; self-citations: 0; citations by others: 2)
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.
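For readers unfamiliar with UPGMA, the clustering step it relies on can be sketched as below. This is a minimal illustration of the textbook algorithm, not the simulation code used in the study; the function name, input layout, and toy distances are assumptions, and branch lengths are omitted.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a rooted topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping a pair (a, b) of labels to their distance.
    """
    sizes = {name: 1 for name in names}            # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(sizes, 2), key=lambda ab: d[frozenset(ab)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted mean of the
        # distances to its two parts, i.e. the mean over all leaf pairs.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


toy = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma(["A", "B", "C", "D"], toy)
```

Because every join averages over all leaf pairs, the method implicitly assumes a constant rate across lineages, which is consistent with the abstract's finding that UPGMA does best when branch-length variation is small.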

6.
We examined the effect of increasing the number of sampled amplified fragment length polymorphism (AFLP) bands to reconstruct an accurate and well-supported AFLP-based phylogeny. In silico AFLP was performed using simulated DNA sequences evolving along balanced and unbalanced model trees with recent, uniform and ancient radiations and average branch lengths (from the most internal node to the tip) ranging from 0.02 to 0.05 substitutions per site. Trees were estimated by minimum evolution (ME) and maximum parsimony (MP) methods from both DNA sequences and virtual AFLP fingerprints. The comparison of the true tree with the estimated AFLP trees suggests that moderate numbers of AFLP bands are necessary to recover the correct topology with high bootstrap support values (i.e. >70%). Fewer bands are necessary for shorter tree lengths and for balanced than for unbalanced tree topologies. However, branch length estimation was rather unreliable and did not improve substantially after a certain number of bands were sampled. These results hold for different levels of genome coverage and number of taxa analysed. In silico AFLP using bacterial genomic DNA sequences recovered a well-supported tree topology that mirrored an empirical phylogeny based on a set of 31 orthologous gene sequences when as few as 263 AFLP bands were scored. These results suggest that AFLPs may be an efficient alternative to traditional DNA sequencing for accurate topology reconstruction of shallow trees when not very short ancestral branches exist.

7.
The relative efficiencies of the maximum-likelihood (ML), neighbor-joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation. For the NJ method, several different distance measures (Jukes-Cantor, Kimura two-parameter, and gamma distances) were used, whereas for the ML method three different transition/transversion ratios (R) were used. For the MP method, both the standard unweighted parsimony and the dynamically weighted parsimony methods were used. The results obtained are as follows: (1) When the R value is high, dynamically weighted parsimony is more efficient than unweighted parsimony in obtaining the correct topology. (2) However, both weighted and unweighted parsimony methods are generally less efficient than the NJ and ML methods even in the case where the MP method gives a consistent tree. (3) When all the assumptions of the ML method are satisfied, this method is slightly more efficient than the NJ method. However, when the assumptions are not satisfied, the NJ method with gamma distances is slightly better in obtaining the correct topology than is the ML method. In general, the two methods show more or less the same performance. The NJ method may give a correct topology even when the distance measures used are not unbiased estimators of nucleotide substitutions. (4) Branch length estimates of a tree with the correct topology are affected more easily than topology by violation of the assumptions of the mathematical model used, for both the ML and the NJ methods. Under certain conditions, branch lengths are seriously overestimated or underestimated. The MP method often gives serious underestimates for certain branches.
(5) Distance measures that generate the correct topology, with high probability, do not necessarily give good estimates of branch lengths. (6) The likelihood-ratio test and the confidence-limit test, in Felsenstein's DNAML, for examining the statistical significance of branch length estimates are quite sensitive to violation of the assumptions and are generally too liberal to be used for actual data. Rzhetsky and Nei's branch length test is less sensitive to violation of the assumptions than is Felsenstein's test. (7) When the extent of sequence divergence is ≤ 5% and when ≥ 1,000 nucleotides are used, all three methods show essentially the same efficiency in obtaining the correct topology and in estimating branch lengths. (ABSTRACT TRUNCATED AT 400 WORDS)

8.
Lake's evolutionary parsimony (EP) method of constructing a phylogenetic tree is primarily applied to four DNA sequences. In this method, three quantities--X, Y, and Z--that correspond to three possible unrooted trees are computed, and an invariance property of these quantities is used for choosing the best tree. However, Lake's method depends on a number of unrealistic assumptions. We therefore examined the theoretical basis of his method and reached the following conclusions: (1) When the rates of two transversional changes from a nucleotide are unequal, his invariance property breaks down. (2) Even if the rates of two transversional changes are equal, the invariance property requires some additional conditions. (3) When Kimura's two-parameter model of nucleotide substitution applies and the rate of nucleotide substitution varies greatly with branch, the EP method is generally better than the standard maximum-parsimony (MP) method in recovering the correct tree but is inferior to the neighbor-joining (NJ) and a few other distance matrix methods. (4) When the rate of nucleotide substitution is the same or nearly the same for all branches, the EP method is inferior to the MP method even if the proportion of transitional changes is high. (5) When Lake's assumptions fail, his chi-square test may identify an erroneous tree as the correct tree. This happens because the test is not for comparing different trees. (6) As long as a proper distance measure is used, the NJ method is better than the EP and MP methods, regardless of whether there is a transition/transversion bias or variation in substitution rate among different nucleotide sites.

9.
Nucleotide sequences of the genome RNA encoding capsid protein VP1 (918 nucleotides) of 18 enterovirus 70 (EV70) isolates collected from various parts of the world in 1971 to 1981 were determined, and nucleotide substitutions among them were studied. The genetic distances between isolates were calculated by the pairwise comparison of nucleotide difference. Regression analysis of the genetic distances against time of isolation of the strains showed that the synonymous substitution rate was very high at 21.53 × 10^-3 substitutions per nucleotide per year, while the nonsynonymous rate was extremely low at 0.32 × 10^-3 substitutions per nucleotide per year. The rate estimated by the average value of synonymous and nonsynonymous substitutions (W.-H. Li, C.-C. Wu, and C.-C. Luo, Mol. Biol. Evol. 2:150-174, 1985) was 5.00 × 10^-3 substitutions per nucleotide per year. Taking the average value of synonymous and nonsynonymous substitutions as genetic distances between isolates, the phylogenetic tree was inferred by the unweighted pairwise grouping method of arithmetic average and by the neighbor-joining method. The tree indicated that the virus had evolved from one focal place, and the time of emergence was estimated to be August 1967 +/- 15 months, 2 years before first recognition of the pandemic of acute hemorrhagic conjunctivitis. By superimposing every nucleotide substitution on the branches of the phylogenetic tree, we analyzed nucleotide substitution patterns of EV70 genome RNA. In synonymous substitutions, the proportion of transitions, i.e., C<==>U and G<==>A, was found to be extremely high in comparison with that reported on other viruses or pseudogenes. In addition, parallel substitutions (independent substitutions at the same nucleotide position on different branches, i.e., different isolates, of the tree) were frequently found in both synonymous and nonsynonymous substitutions.
These frequent parallel substitutions and the low nonsynonymous substitution rate despite the very high synonymous substitution rate described above imply a strong restriction on nonsynonymous substitution sites of VP1, probably due to the requirement for maintaining the rigid icosahedral conformation of the virus.

10.
The most commonly used measure of evolutionary distance in molecular phylogenetics is the number of nucleotide substitutions per site. However, this number is not necessarily the most efficient for reconstructing a phylogenetic tree. In order to evaluate the accuracy of evolutionary distance, D(t), for obtaining the correct tree topology, an accuracy index, A(t), was proposed. This index is defined as D'(t) divided by the square root of V[D(t)], where D'(t) is the first derivative of D(t) with respect to evolutionary time and V[D(t)] is the sampling variance of evolutionary distance. Using A(t), namely, finding the condition under which A(t) gives the maximum value, we can obtain an evolutionary distance which is efficient for obtaining the correct topology. Under the assumption that the transversional changes do not occur as frequently as the transitional changes, we obtained the evolutionary distances which are expected to give the correct topology more often than are the other distances.
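Written out, the accuracy index described in this abstract takes the form below. The symbols D(t), D'(t), and V[D(t)] are the abstract's own; only the typeset layout is added here.

```latex
A(t) = \frac{D'(t)}{\sqrt{V[D(t)]}}, \qquad D'(t) = \frac{\mathrm{d}D(t)}{\mathrm{d}t}
```

Read this way, A(t) is large when the distance still changes quickly with time (large numerator) while its sampling error stays small (small denominator), which is the trade-off the abstract's maximization targets.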

11.
In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches of MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, the simple MP or ML tree search algorithms such as the stepwise addition (SA) plus nearest neighbor interchange (NNI) search and the SA plus subtree pruning regrafting (SPR) search are as efficient as the extensive search algorithms such as the SA plus tree bisection-reconnection (TBR) search in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model of phylogenetic inference generally shows a better performance than the HKY model even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that at least in the present case, selecting a substitution model by using the likelihood ratio test or the AIC index is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often shows a better performance than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.

12.
This paper describes the inferential method, an approach for reconstructing protein and nucleotide sequences of ancestral species, starting from known, homologous, contemporary sequences. The method requires knowledge of the topology of the phylogenetic tree, whose nodes are the species to whom the reconstructed sequences belong. The method has been tested by computer simulation of speciation and nucleotide substitutions, starting from a single ancestral sequence, and by subsequent reconstruction of nodal sequences. Results have shown that reconstructions obtained by the inferential method are affected by limited error frequencies, which (1) are proportional to the squares of nucleotide substitution rates and of internodal distances, and (2) are little influenced by non-uniformity of transformation rates of nucleotides. Furthermore, good agreement of the results has been obtained by comparing protein-sequence reconstructions carried out with the inferential method with those obtained using the maximum parsimony method in two different cases: a reconstruction of simulated sequences and a reconstruction of mammalian ribonuclease sequences.
Abbreviations used: MP, maximum parsimony method; ML, maximum likelihood method; IM, inferential method; MY, millions of years; N-tree, natural-like phylogenetic tree; E-tree, equibranched phylogenetic tree; EA, percentage number of erroneous amino acids in a reconstructed sequence; EC, percentage number of erroneous codons in a reconstructed sequence; t n, time interval between a P- and its F-sequence. Nucleotides and amino acids are indicated by their I.U.B. codes (N.C.-I.U.B., 1985).
Correspondence to: A. Di Donato

13.
Accuracy of estimated phylogenetic trees from molecular data   (Total citations: 27; self-citations: 0; citations by others: 27)
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f theta, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. 
For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.

14.
Summary: The effects of temporal (among different branches of a phylogeny) and spatial (among different nucleotide sites within a gene) nonuniformities of nucleotide substitution rates on the construction of phylogenetic trees from nucleotide sequences are addressed. Spatial nonuniformity may be estimated by using Shannon's (1948) entropy formula to measure the Relative Nucleotide Variability (RNV) at each nucleotide site in an aligned set of sequences; this is demonstrated by a comparative analysis of 5S rRNAs. New methods of constructing phylogenetic trees are proposed that augment the Unweighted Pair-Group Using Arithmetic Averages (UPGMA) algorithm by estimating and compensating for both spatial and temporal nonuniformity in substitution rates. These methods are evaluated by computer simulations of 5S rRNA evolution that include both kinds of nonuniformities. It was found that the proposed Reference Ratio Method improved both the ability to reconstruct the correct topology of a tree and also the estimation of branch lengths as compared to UPGMA. A previous method (Farris et al. 1970; Klotz et al. 1979; Li 1981) was found to be less successful in reconstructing topologies when there is high probability of multiple mutations at some sites. Phylogenetic analyses of 5S rRNA sequences support the endosymbiotic origins of both chloroplasts and mitochondria, even though the latter exhibit an accelerated rate of nucleotide substitution. Phylogenetic trees also reveal an adaptive radiation within the eubacteria and another within the eukaryotes for the origins of most major phyla within each group during the Precambrian era.
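A per-site entropy measure of the kind mentioned in this abstract can be sketched as follows. The abstract does not give the exact RNV definition, so the normalization used here (Shannon entropy in bits divided by 2, the maximum for four nucleotides) is an assumption, as are the function names and the toy alignment.

```python
import math
from collections import Counter


def site_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the symbols observed at one aligned site."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def relative_variability(alignment):
    """Entropy per site, scaled to [0, 1] by the 2-bit maximum for A/C/G/T.

    The scaling is an illustrative assumption, not the paper's stated formula.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    return [
        site_entropy("".join(seq[i] for seq in alignment)) / 2.0
        for i in range(length)
    ]


# Site 1 is invariant, site 2 shows all four bases, site 3 shows two.
profile = relative_variability(["AAA", "ACA", "AGC", "ATC"])
```

An invariant site scores zero and a site with all four bases equally represented scores one, so the profile gives a simple picture of where substitution rates differ along the sequence.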

15.
Summary: A mathematical formula for the relationship between the average number of nucleotide substitutions per site and the proportion of shared restriction sites between two homologous nucleons is developed by taking into account the unequal rates of substitution among different pairs of nucleotides. Using this formula, the possible amount of bias of the estimate of the number of nucleotide substitutions obtained by the Upholt-Nei-Li formula for restriction site data is investigated. The results obtained indicate that the bias depends upon the nucleotides in the recognition sequence of the restriction enzyme used, the unequal rates of substitution among different nucleotides, and the unequal nucleotide frequencies, but the primary factor is the unequal rates of nucleotide substitution. The amount of bias is generally larger for four-base enzymes than for six-base enzymes. However, when many restriction enzymes are used for the study of DNA divergence, the bias is unlikely to be very large unless the rate of substitution greatly varies from nucleotide to nucleotide.

16.
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.

17.
Summary: A formal mathematical analysis of Kimura's (1981) six-parameter model of nucleotide substitution for the case of unequal substitution rates among different pairs of nucleotides is conducted, and new formulae for estimating the number of nucleotide substitutions and its standard error are obtained. By using computer simulation, the validities and utilities of Jukes and Cantor's (1969) one-parameter formula, Takahata and Kimura's (1981) four-parameter formula, and our six-parameter formula for estimating the number of nucleotide substitutions are examined under three different schemes of nucleotide substitution. It is shown that the one-parameter and four-parameter formulae often give underestimates when the number of nucleotide substitutions is large, whereas the six-parameter formula generally gives a good estimate for all the three substitution schemes examined. However, when the number of nucleotide substitutions is large, the six-parameter and four-parameter formulae are often inapplicable unless the number of nucleotides compared is extremely large. It is also shown that as long as the mean number of nucleotide substitutions is smaller than one per nucleotide site the three formulae give more or less the same estimate regardless of the substitution scheme used.

18.
Estimation of evolutionary distance between nucleotide sequences   (Total citations: 34; self-citations: 9; citations by others: 25)
A mathematical formula for estimating the average number of nucleotide substitutions per site (delta) between two homologous DNA sequences is developed by taking into account unequal rates of substitution among different nucleotide pairs. Although this formula is obtained for the equal-input model of nucleotide substitution, computer simulations have shown that it gives a reasonably good estimate for a wide range of nucleotide substitution patterns as long as delta is equal to or smaller than 1. Furthermore, the frequency of cases to which the formula is inapplicable is much lower than that for other similar methods recently proposed. This point is illustrated using insulin genes. A statistical method for estimating the number of nucleotide changes due to deletion and insertion is also developed. Application of this method to globin gene data indicates that the number of nucleotide changes per site increases with evolutionary time but the pattern of the increase is quite irregular.

19.
A method for detecting positive selection at single amino acid sites   (Total citations: 23; self-citations: 0; citations by others: 23)
A method was developed for detecting the selective force at single amino acid sites given a multiple alignment of protein-coding sequences. The phylogenetic tree was reconstructed using the number of synonymous substitutions. Then, the neutrality was tested for each codon site using the numbers of synonymous and nonsynonymous changes throughout the phylogenetic tree. Computer simulation showed that this method accurately estimated the numbers of synonymous and nonsynonymous substitutions per site, as long as the substitution number on each branch was relatively small. The false-positive rate for detecting the selective force was generally low. On the other hand, the true-positive rate for detecting the selective force depended on the parameter values. Within the range of parameter values used in the simulation, the true-positive rate increased as the strength of the selective force and the total branch length (namely the total number of synonymous substitutions per site) in the phylogenetic tree increased. In particular, with the relative rate of nonsynonymous substitutions to synonymous substitutions being 5.0, most of the positively selected codon sites were correctly detected when the total branch length in the phylogenetic tree was ≥ 2.5. When this method was applied to the human leukocyte antigen (HLA) gene, which included antigen recognition sites (ARSs), positive selection was detected mainly on ARSs. This finding confirmed the effectiveness of the present method with actual data. Moreover, two amino acid sites were newly identified as positively selected in non-ARSs. The three-dimensional structure of the HLA molecule indicated that these sites might be involved in antigen recognition. Positively selected amino acid sites were also identified in the envelope protein of human immunodeficiency virus and the influenza virus hemagglutinin protein.
This method may be helpful for predicting functions of amino acid sites in proteins, especially in the present situation, in which sequence data are accumulating at an enormous speed.

20.
The advantages of nucleotide sequence data for studying phylogeny have been shown to include the number of potential characters available for comparison, rate independence between molecular and morphological evolution, and the utility of molecular data for modeling patterns of nucleotide substitution. Potential pitfalls have also been revealed and include difficulties of inferring positional homology, incongruence between organismal and gene genealogies, and low likelihood of recovering the correct phylogeny given certain patterns in the timing of speciation events. Statistical methods for comparing phylogenetic hypotheses have been used to assess the reliability of alternative trees for ascaridoid nematodes. Based on partial ribosomal RNA sequences, tree topologies inconsistent with monophyly of the Ascaridinae were significantly worse by maximum likelihood inference. The topology of the maximum parsimony tree based on full-length sequences of 18S rRNA and 300 nucleotides of Cytochrome oxidase II for 13 ascaridoid species was generally consistent with traditional taxonomic expectations at lower ranks, but inconsistent with most proposed arrangements at higher taxonomic levels.
