Similar literature
 20 similar documents found (search time: 31 ms)
1.
Accuracy of estimated phylogenetic trees from molecular data   Total citations: 2, self-citations: 0, citations by others: 2
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is highly likely to make errors in obtaining the correct topology unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.

2.
Accuracy of estimated phylogenetic trees from molecular data   Total citations: 27, self-citations: 0, citations by others: 27
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f_θ, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance, which obeys the triangle inequality, is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.
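The abstract above credits Nei's standard distance with a roughly linear relationship to the number of gene substitutions. As a minimal sketch, assuming the usual textbook definition D = -ln(I) with I = J_XY / sqrt(J_X · J_Y), the distance can be computed from allele frequencies as follows. The function name and the input layout (one dict of allele frequencies per locus) are illustrative choices, not taken from the paper.

```python
import math

def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists with one dict per locus mapping allele -> frequency
    (hypothetical input layout chosen for this sketch).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-zero number of loci")
    n_loci = len(pop_x)
    # Average homozygosity-like terms over loci (monomorphic loci included).
    j_x = sum(sum(f * f for f in locus.values()) for locus in pop_x) / n_loci
    j_y = sum(sum(f * f for f in locus.values()) for locus in pop_y) / n_loci
    j_xy = sum(
        sum(fx * locus_y.get(allele, 0.0) for allele, fx in locus_x.items())
        for locus_x, locus_y in zip(pop_x, pop_y)
    ) / n_loci
    identity = j_xy / math.sqrt(j_x * j_y)
    # No shared alleles at any locus: the distance is unbounded.
    return math.inf if identity == 0 else -math.log(identity)
```

Two populations with identical allele frequencies give a distance of zero, and the value grows as shared alleles become rarer.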

3.
Comparisons are made of the accuracy of the restricted maximum-likelihood, Wagner parsimony, and UPGMA (unweighted pair-group method using arithmetic averages) clustering methods to estimate phylogenetic trees. Data matrices were generated by constructing simulated stochastic evolution in a multidimensional gene-frequency space using a simple genetic-drift model (Brownian-motion, random-walk) with constant rates of divergence in all lineages. Ten different phylogenetic tree topologies of 20 operational taxonomic units (OTUs), representing a range of tree shapes, were used. Felsenstein's restricted maximum-likelihood method, Wagner parsimony, and UPGMA clustering were used to construct trees from the resulting data matrices. The computations for the restricted maximum-likelihood method were performed on a Cray-1 supercomputer since the required calculations (especially when optimized for the vector hardware) are performed substantially faster than on more conventional computing systems. The overall level of accuracy of tree reconstruction depends on the topology of the true phylogenetic tree. The UPGMA clustering method, especially when genetic-distance coefficients are used, gives the most accurate estimates of the true phylogeny (for our model with constant evolutionary rates). For large numbers of loci, all methods give similar results, but trends in the results imply that the restricted maximum-likelihood method would produce the most accurate trees if sample sizes were large enough.
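UPGMA recurs throughout these abstracts, so a minimal sketch of the clustering step may help. It assumes a symmetric distance matrix and repeatedly merges the closest pair of clusters, averaging distances weighted by cluster size. The function name and the returned merge list are illustrative and not drawn from any of the papers listed.

```python
def upgma(names, dist):
    """Cluster OTUs by UPGMA; return a list of (left, right, node_height) merges."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}   # id -> (label, size)
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            d_ac = d.pop(tuple(sorted((a, c))))
            d_bc = d.pop(tuple(sorted((b, c))))
            # Size-weighted mean equals the arithmetic average over all OTU pairs.
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        merges.append((label_a, label_b, d_ab / 2))   # node height = half the distance
        next_id += 1
    return merges
```

Because every node is placed at half the distance between the clusters it joins, the resulting tree is ultrametric, which is why UPGMA implicitly assumes a constant rate of evolution.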

4.
The neighbor-joining method: a new method for reconstructing phylogenetic trees   Total citations: 702, self-citations: 29, citations by others: 673
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
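The pair-selection step described above can be sketched in a few lines. This sketch assumes the widely used Q-criterion form, Q(i, j) = (n - 2)·d(i, j) - r_i - r_j, where r_i is the sum of row i of the distance matrix; it is generally regarded as equivalent to minimizing total branch length, but it is not the notation of the abstract itself. The function name is illustrative.

```python
def nj_pick_pair(dist):
    """Return ((i, j), q): the pair of OTUs neighbor joining would join first."""
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor joining needs at least three OTUs")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            # A low Q means joining i and j shortens the tree the most.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q
```

A full implementation would then replace the chosen pair with a new internal node, recompute distances to it, and repeat until three nodes remain.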

5.
6.
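Numerical phylogenetic analysis of thirty-eight known genera of Culicidae (蚊科三十八个已知属的系统发育数值分析)   Total citations: 6, self-citations: 0, citations by others: 6
Qu Fengyi (瞿逢伊), Qian Guozheng (钱国正). Acta Entomologica Sinica (《昆虫学报》), 1993, 36(1): 103-109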

7.
The electrophoretic patterns of seven isozyme systems (ADH, AMY, AAT, GDH, LAP, MDH, and SOD) obtained from dormant seeds from 44 accessions belonging to 12 Petrocoptis taxa were compared in order to clarify taxonomic relationships within the genus. Overall, electrophoretic zymograms showed the presence of up to 28 electromorphs, of which 26 were polymorphic among accessions. Mantel tests revealed a moderate level of correlation between the geographic distance matrix and several dissimilarity matrices based on the isozyme data (r = 0.3052-0.3376). The electrophoretic profiles of seed isozymes did not closely match the analytical taxonomic framework drawn from morphology. Many electromorphs are widely distributed among Petrocoptis species, and since isozyme polymorphism is present within taxa, few species-specific markers have been found. However, a relationship between the geographic origin of the accessions and several electromorphs has been noticed. Isozyme data gave moderate support to the splitting of the genus into two groups previously defined on the basis of morphology and geographic distribution (western and eastern taxa). However, some samples belonging to P. hispanica and P. pseudoviscosa were somewhat intermediate between both groups as revealed by multivariate ordination techniques. Seed isozymes did not reveal any clear taxonomic grouping among western Petrocoptis species. In fact, no single segregate of this group is supported by the electrophoretic data.

8.
Model selection is an essential issue in longitudinal data analysis since many different models have been proposed to fit the covariance structure. The likelihood criterion is commonly used and allows the fit of alternative models to be compared. Its value does not reflect, however, the potential improvement that can still be reached in fitting the data unless a reference model with the actual covariance structure is available. The score test approach does not require the knowledge of a reference model, and the score statistic has a meaningful interpretation in itself as a goodness-of-fit measure. The aim of this paper was to show how the score statistic may be separated into the genetic and environmental parts, which is difficult with the likelihood criterion, and how it can be used to check parametric assumptions made on variance and correlation parameters. Selection of models for genetic analysis was applied to a dairy cattle example for milk production.

9.
Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods.
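For readers unfamiliar with the baseline that DomClust is compared against, the bidirectional best-hit (BBH) criterion can be sketched as below. This is not DomClust itself; it only illustrates the conventional criterion, and the nested-dict score layout and function name are assumptions made for the example.

```python
def bidirectional_best_hits(scores_ab, scores_ba):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    scores_ab: dict gene_a -> {gene_b: similarity score} for genome A against B.
    scores_ba: the same structure for genome B against A.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    # Keep a pair only when the best hit is reciprocated.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A gene whose best hit prefers a different partner is left unpaired, which is one reason BBH can miss orthologs after gene duplication or domain fusion.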

10.
Accuracy of phylogenetic trees estimated from DNA sequence data   Total citations: 4, self-citations: 1, citations by others: 3
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
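The difference between uncorrected (p) and corrected (d) distances can be illustrated with a short sketch. The abstract does not say which correction was applied; the Jukes-Cantor formula d = -(3/4)·ln(1 - 4p/3) is used here purely as an assumed, commonly cited example, and both function names are illustrative.

```python
import math

def p_distance(seq_a, seq_b):
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)

def jukes_cantor_distance(p):
    """Corrected distance under the Jukes-Cantor model (assumed for this sketch)."""
    if p >= 0.75:
        # Saturation: the correction is undefined at or beyond p = 3/4.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)
```

The corrected value is always at least as large as p, because it accounts for multiple substitutions at the same site that the raw proportion cannot see.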

11.
We examined the efficiencies of ordination methods in the treatment of gene frequency data at intraspecific level, using metric and nonmetric distance measures (Nei's and Rogers' genetic distances, χ² distance). We assessed initial processes responsible for the geographical distribution of the Mediterranean land snail Helix aspersa. Seventeen enzyme loci from 30 North African snail populations were considered in the present analysis. Five combinations of distance/multivariate analysis were compared: correspondence analysis (CA), nonmetric multidimensional scaling (NMDS) on Nei's, Rogers', and χ² distances, and principal coordinates analysis on Rogers' distances. Configuration of the objects resulting from ordination was projected onto three-dimensional graphics with the minimum spanning tree or the relative neighborhood graph superimposed. Pre- and postordination or clustering distance matrices were compared by means of correlation methods. As expected, all combinations led to a clear west versus east pattern of variation. However, the intraregional relationships and degree of connectivity between pairs of operational taxonomic units were not necessarily constant from one method to another. Ordination methods applied with Nei's and Rogers' distances provided the best fit with the original distances (r = 0.98), compared with UPGMA clustering (r ≈ 0.75). The Nei/NMDS combination seems to be a good compromise (distortion index dt = 10%) between Rogers/NMDS, which produces a more confusing pattern of differentiation (dt = 24%), and χ²/CA, which tends to distort large distances (dt = 31%). NMDS obviously provides a powerful method to summarize relationships between populations when neither hierarchical structure nor phylogenetic inference is required. These findings lead to a discussion of the good performance of NMDS, the appropriate distances to be used, and the potential application of this method to other types of allelic data (such as microsatellite loci) or data on nucleotide sequences of genes.

12.
EZ-FIT, an interactive microcomputer software package, has been developed for the analysis of enzyme kinetic and equilibrium binding data. EZ-FIT was designed as a user-friendly menu-driven package that has the facility for data entry, editing, and filing. Data input permits the conversion of cpm, dpm, or optical density to molar per minute per milligram protein. Data can be fit to any of 14 model equations including Michaelis-Menten, Hill, isoenzyme, inhibition, dual substrate, agonist, antagonist, and modified integrated Michaelis-Menten. The program uses the Nelder-Mead simplex and Marquardt nonlinear regression algorithms sequentially. A report of the results includes the parameter estimates with standard errors, a Student t test to determine the accuracy of the parameter values, a Runs statistic test of the residuals, identification of outlying data, an Akaike information criterion test for goodness-of-fit, and, when the experimental variance is included, a χ² statistic test for goodness-of-fit. Several different graphs can be displayed: an X-Y, a Scatchard, an Eadie-Hofstee, a Lineweaver-Burk, a semilogarithmic, and a residual plot. A data analysis report and graphs are designed to evaluate the goodness-of-fit of the data to a particular model.

13.
The adequacy of various phenetic and phylogenetic estimation methods was evaluated using simulated data sets. Two parsimony programs were used to construct maximum parsimony trees (WAGNER 78 and HENNIG 86). The CAFCA program was used to perform group-compatibility analysis. Four UPGMA clustering strategies were employed. The simulation model GENESIS was used to generate data sets under different evolutionary conditions. The effects of input parameters and tree properties on the accuracy of the estimated trees were evaluated. UPGMA based on product moment correlations of unstandardized characters appeared to perform best, under all evolutionary conditions tested. The effect of input parameters on the accuracy was not very significant. Among the tree statistics the stemminess of the true tree appeared to be the most important estimator of accuracy.

14.
Hydropathic profiles can be considered as an approach to the three-dimensional structure of a protein and so their use for comparison of homologous proteins is proposed, as they provide information on relative structural conservativeness. A simple approach was developed for comparison of hydropathic profiles and applied to 19 lysozymes c of known primary structure. Trees were constructed in order to discover which method yielded the best estimation of the phenotypic differences between the proteins considered, by means of the goodness-of-fit criterion. Iterative methods, such as the Fitch-and-Margoliash and the unweighted-pair-group methods, gave a better fit than did a non-iterative method. When the hydropathic approach is used for comparison of lysozymes c, the enzyme obtained from chachalaca egg-white is placed closer to those from pheasant-like birds than to those of ducks; this result agrees with the morphological resemblance of the chachalaca to pheasant-like birds. Pigeon egg-white and equine milk lysozymes differ greatly in sequence from other lysozymes c and their hydropathic analysis shows important differences with respect to the other homologous enzymes.

15.
A phylogenetic method is a consistent estimator of phylogeny if and only if it is guaranteed to give the correct tree, given that sufficient (possibly infinite) independent data are examined. The following methods are examined for consistency: UPGMA (unweighted pair-group method, averages), NJ (neighbor joining), MF (modified Farris), and P (parsimony). A two-parameter model of nucleotide sequence substitution is used, and the expected distribution of character states is calculated. Without perfect correction for superimposed substitutions, all four methods may be inconsistent if there is but one branch evolving at a faster rate than the other branches. Partial correction of observed distances improves the robustness of the NJ method to rate variation, and perfect correction makes the NJ method a consistent estimator for all combinations of rates that were examined. The sensitivity of all the methods to unequal rates varies over a wide range, so relative-rate tests are unlikely to be a reliable guide for accepting or rejecting phylogenies based on parsimony analysis.

16.
This paper presents a pipeline, implemented in an open‐source program called GB→TNT (GenBank‐to‐TNT), for creating large molecular matrices, starting from GenBank files and finishing with TNT matrices which incorporate taxonomic information in the terminal names. GB→TNT is designed to retrieve a defined genomic region from a bulk of sequences included in a GenBank file. The user defines the genomic region to be retrieved and several filters (genome, length of the sequence, taxonomic group, etc.); each genomic region represents a different data block in the final TNT matrix. GB→TNT first generates Fasta files from the input GenBank files, then creates an alignment for each of those (by calling an alignment program), and finally merges all the aligned files into a single TNT matrix. The new version of TNT can make use of the taxonomic information contained in the terminal names, allowing easy diagnosis of results, evaluation of fit between the trees and the taxonomy, and automatic labelling or colouring of tree branches with the taxonomic groups they represent. © The Willi Hennig Society 2012.

17.
Fluorescence correlation spectroscopy (FCS) is a sensitive and widely used technique for measuring diffusion. FCS data are conventionally modeled with a finite number of diffusing components and fit with a least-square fitting algorithm. This approach is inadequate for analyzing data obtained from highly heterogeneous systems. We introduce a Maximum Entropy Method based fitting routine (MEMFCS) that analyzes FCS data in terms of a quasicontinuous distribution of diffusing components, and also guarantees a maximally wide distribution that is consistent with the data. We verify that for a homogeneous specimen (green fluorescent protein in dilute aqueous solution), both MEMFCS and conventional fitting yield similar results. Further, we incorporate an appropriate goodness of fit criterion in MEMFCS. We show that for errors estimated from a large number of repeated measurements, the reduced χ² value in MEMFCS analysis does approach unity. We find that the theoretical prediction for errors in FCS experiments overestimates the actual error, but can be empirically modified to serve as a guide for estimating the goodness of the fit where reliable error estimates are unavailable. Finally, we compare the performance of MEMFCS with that of a conventional fitting routine for analyzing simulated data describing a highly heterogeneous distribution containing 41 diffusing species. Both methods fit the data well. However, the conventional fit fails to reproduce the essential features of the input distribution, whereas MEMFCS yields a distribution close to the actual input.

18.
Statistical methods for computing the standard errors of the branching points of an evolutionary tree are developed. These methods are for the unweighted pair-group method-determined (UPGMA) trees reconstructed from molecular data such as amino acid sequences, nucleotide sequences, restriction-sites data, and electrophoretic distances. They were applied to data for the human, chimpanzee, gorilla, orangutan, and gibbon species. Among the four different sets of data used, DNA sequences for an 895-nucleotide segment of mitochondrial DNA (Brown et al. 1982) gave the most reliable tree, whereas electrophoretic data (Bruce and Ayala 1979) gave the least reliable one. The DNA sequence data suggested that the chimpanzee is the closest and that the gorilla is the next closest to the human species. The orangutan and gibbon are more distantly related to man than is the gorilla. This topology of the tree is in agreement with that for the tree obtained from chromosomal studies and DNA-hybridization experiments. However, the difference between the branching point for the human and the chimpanzee species and that for the gorilla species and the human-chimpanzee group is not statistically significant. In addition to this analysis, various factors that affect the accuracy of an estimated tree are discussed.

19.
The new procedure presented here for constructing a Wagner network differs from Farris's (1970) method in that the amount of computation required is reduced. The usefulness of this procedure was examined by applying it to the 20 characters considered in a recent monograph of the seven OTUs of the genus Pentachaeta. A single network was derived from some 945 or more networks possible for this group. A comparison of the network constructed by this simplified method to that constructed by Farris's procedure revealed no differences. An attempt to reconstruct the cladistic history of this group by generating a Wagner tree based on the network resulted in four equally possible trees, suggesting that further data are needed before cladogenesis in this group is resolved.
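The figure of 945 networks for seven OTUs matches the standard count of unrooted, fully bifurcating tree topologies, (2n - 5)!! for n labelled tips. The sketch below assumes that this is the count the abstract refers to; the "or more" in the abstract presumably allows for networks that are not fully bifurcating. The function name is illustrative.

```python
def count_unrooted_trees(n_otus):
    """Number of unrooted bifurcating topologies for n labelled OTUs: (2n - 5)!!"""
    if n_otus < 3:
        raise ValueError("an unrooted bifurcating tree needs at least three OTUs")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_otus - 4, 2):
        count *= k
    return count
```

The count grows very quickly with the number of OTUs, which is why shortcuts that reduce the computation per network matter even for small groups.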

20.
In this paper, we use the EcoRI centromeric satellite DNA family conserved in Sparidae as a taxonomic and a phylogenetic marker. The analyses of 56 monomeric units (187 bp in size) obtained by means of cloning and PCR from 10 sparid species indicate that this repetitive DNA evolves by concerted evolution. Different phylogenetic inference methods, such as neighbor-joining and UPGMA, group the 56 repeats by taxonomic affinity and support the existence of at least two monophyletic groups within the Sparidae family. These results reinforce the recent taxonomic revision of the genera Sparus and Pagrus and contradict previous classifications of the Sparidae family.
