Similar Articles
20 similar articles found.
1.
Sampling properties of DNA sequence data in phylogenetic analysis   Total citations: 20; self-citations: 6; citations by others: 20
We inferred phylogenetic trees from individual genes and random samples of nucleotides from the mitochondrial genomes of 10 vertebrates and compared the results to those obtained by analyzing the whole genomes. Individual genes are poor samples in that they infrequently lead to the whole-genome tree. A large number of nucleotide sites is needed to exactly determine the whole-genome tree. A relatively small number of sites, however, often results in a tree close to the whole-genome tree. We found that blocks of contiguous sites were less likely to lead to the whole-genome tree than samples composed of sites drawn individually from throughout the genome. Samples of contiguous sites are not representative of the entire genome, a condition that violates a basic assumption of the bootstrap method as it is applied in phylogenetic studies.
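The two sampling schemes compared above can be sketched in a few lines. This is a minimal illustration, assuming an alignment stored as a list of equal-length strings (one per taxon); the function names and the toy alignment are hypothetical, and the sketch only builds the sampled data matrices, not the tree inference the study performed.

```python
import random

def sample_sites(alignment, n_sites, rng):
    """Draw n_sites columns independently, with replacement, from anywhere in the
    alignment (the scheme the nonparametric bootstrap assumes)."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def sample_block(alignment, n_sites, rng):
    """Take a single run of n_sites contiguous columns starting at a random position,
    analogous to analyzing one gene instead of scattered sites."""
    length = len(alignment[0])
    start = rng.randrange(length - n_sites + 1)
    return [seq[start:start + n_sites] for seq in alignment]

rng = random.Random(1)
toy = ["ACGTACGTAA", "ACGTTCGTAA", "ACCTACGAAA"]  # hypothetical 3-taxon alignment
print(sample_sites(toy, 5, rng))
print(sample_block(toy, 5, rng))
```

Sites drawn individually can come from every region of the genome, whereas a block inherits whatever rate and composition biases its region has, which is the non-representativeness the abstract describes.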

2.
3.
4.
5.
6.
7.
Multiple sequence alignment is discussed in light of homology assessments in phylogenetic research. Pairwise and multiple alignment methods are reviewed as exact and heuristic procedures. Since the object of alignment is to create the most efficient statement of initial homology, methods that minimize nonhomology are to be favored. Therefore, among all possible alignments, the one that best satisfies the phylogenetic optimality criterion should be considered the best alignment. Since all homology statements are subject to testing and explanation in this way, consistency of optimality criteria is desirable. This consistency is based on the treatment of alignment gaps as character information and the consistent use of a cost function (e.g., insertion-deletion, transversion, and transition costs) throughout the analysis, from alignment to phylogeny reconstruction. Cost functions cannot be tested by inspection; hence the assumptions they make should be examined by varying the assumed values in a sensitivity analysis to test the robustness of results. Agreement among data may be used to choose an optimal solution set from all of those examined through parameter variation. This idea of consistency between assumption and analysis through alignment and cladogram reconstruction is not limited to parsimony analysis and could and should be applied to other forms of analysis such as maximum likelihood.
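A cost function with separate insertion-deletion, transition, and transversion costs, and a sensitivity loop over its values, can be sketched as follows. This is a minimal pairwise global alignment (Needleman-Wunsch-style dynamic programming with linear gap costs), assuming unambiguous A/C/G/T sequences; the cost settings are hypothetical and this is not the multiple-alignment procedure reviewed in the paper.

```python
PURINES = {"A", "G"}

def substitution_cost(a, b, transition, transversion):
    """0 for a match, `transition` for A<->G or C<->T, `transversion` otherwise."""
    if a == b:
        return 0
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion

def align_cost(x, y, indel, transition, transversion):
    """Minimum total cost of a global alignment of x and y under the given cost function."""
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        d[0][j] = j * indel  # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + substitution_cost(x[i - 1], y[j - 1], transition, transversion),
                d[i - 1][j] + indel,  # gap in y
                d[i][j - 1] + indel,  # gap in x
            )
    return d[n][m]

# Sensitivity analysis: re-score the same pair under several (indel, ts, tv) settings.
for indel, ts, tv in [(1, 1, 1), (2, 1, 2), (4, 1, 2)]:  # hypothetical cost ratios
    print(indel, ts, tv, align_cost("ACGTTA", "ACGA", indel, ts, tv))
```

In a full sensitivity analysis each setting would be carried through to tree reconstruction, and congruence among the resulting trees used to pick among settings; the sketch stops at the alignment cost.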

8.
9.
Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partitions and estimation of one evolutionary history, maximizing the explanatory power of the data. Consensus/independence approaches endorse a two-step procedure where partitions are analyzed independently and then a consensus is determined from the multiple results. Popular ways to integrate the two approaches mix, across the model space, a strict combined-data approach with a priori independent parameters. We propose an alternative middle ground by constructing a Bayesian hierarchical phylogenetic model. Our hierarchical framework enables researchers to pool information across data partitions to improve estimate precision in individual partitions while permitting estimation and testing of tendencies in across-partition quantities. Such across-partition quantities include the distribution from which individual topologies relating the sequences within a partition are drawn. We propose standard hierarchical priors on continuous evolutionary parameters across partitions, while the structure on topologies varies depending on the research problem. We illustrate our model with three examples. We first explore the evolutionary history of the guinea pig (Cavia porcellus) using alignments of 13 mitochondrial genes. The hierarchical model returns substantially more precise continuous parameter estimates than an independent parameter approach without losing the salient features of the data. Second, we analyze the frequency of horizontal gene transfer using 50 prokaryotic genes. We assume an unknown species-level topology and allow individual gene topologies to differ from this with a small estimable probability. Simultaneously inferring the species and individual gene topologies returns a transfer frequency of 17%. We also examine HIV sequences longitudinally sampled from HIV+ patients. 
We ask whether posttreatment development of CCR5 coreceptor virus represents concerted evolution from middisease CXCR4 virus or reemergence of initial infecting CCR5 virus. The hierarchical model pools partitions from multiple unrelated patients by assuming that the topology for each patient is drawn from a multinomial distribution with unknown probabilities. Preliminary results suggest evolution and not reemergence.
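The idea of pooling information across partitions to sharpen per-partition estimates can be illustrated with the simplest hierarchical model. This sketch assumes a normal model, y_i ~ N(theta_i, s_i^2) and theta_i ~ N(mu, tau^2), with tau treated as known and mu set to its precision-weighted estimate; the function name and numbers are hypothetical. It shows only the shrinkage effect on one continuous parameter, not the authors' MCMC over topologies and substitution parameters.

```python
def shrink(estimates, std_errors, tau):
    """Conditional posterior means of per-partition parameters under a normal hierarchy.

    estimates:  per-partition point estimates y_i
    std_errors: their standard errors s_i
    tau:        across-partition standard deviation (assumed known here)
    Returns (mu, pooled), where mu is the precision-weighted across-partition mean.
    """
    # Marginally y_i ~ N(mu, s_i^2 + tau^2), so weight each estimate by 1 / (s_i^2 + tau^2).
    w = [1.0 / (s * s + tau * tau) for s in std_errors]
    mu = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    pooled = []
    for y, s in zip(estimates, std_errors):
        prec_data, prec_prior = 1.0 / (s * s), 1.0 / (tau * tau)
        # Posterior mean is a precision-weighted compromise between y_i and mu.
        pooled.append((prec_data * y + prec_prior * mu) / (prec_data + prec_prior))
    return mu, pooled

# Hypothetical per-gene estimates of one continuous parameter with their standard errors.
mu, pooled = shrink([2.1, 3.4, 2.8, 5.0], [0.3, 0.9, 0.4, 1.5], tau=0.5)
print(mu, pooled)
```

Poorly estimated partitions (large s_i) move furthest toward mu. Letting tau grow recovers the independent-parameter analysis, and letting it shrink toward zero approaches a single shared parameter, which is the middle ground the abstract describes.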

10.
DNA sequence data from plastid matK and trnL-F regions were used in phylogenetic analyses of Diurideae, which indicate that Diurideae are not monophyletic as currently delimited. However, if Chloraeinae and Pterostylidinae are excluded from Diurideae, the remaining subtribes form a well-supported, monophyletic group that is sister to a "spiranthid" clade. Chloraea, Gavilea, and Megastylis pro parte (Chloraeinae) are all placed among the spiranthid orchids and form a grade with Pterostylis leading to a monophyletic Cranichideae. Codonorchis, previously included among Chloraeinae, is sister to Orchideae. Within the more narrowly delimited Diurideae two major lineages are apparent. One includes Diuridinae, Cryptostylidinae, Thelymitrinae, and an expanded Drakaeinae; the other includes Caladeniinae s.s., Prasophyllinae, and Acianthinae. The achlorophyllous subtribe Rhizanthellinae is a member of Diurideae, but its placement is otherwise uncertain. The sequence-based trees indicate that some morphological characters used in previous classifications, such as subterranean storage organs, anther position, growth habit, fungal symbionts, and pollination syndromes have more complex evolutionary histories than previously hypothesized. Treatments based upon these characters have produced conflicting classifications, and molecular data offer a tool for reevaluating these phylogenetic hypotheses.

11.
12.
Accuracy of phylogenetic trees estimated from DNA sequence data   Total citations: 4; self-citations: 1; citations by others: 3
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
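The uncorrected distance p, a corrected distance d, and UPGMA clustering can be sketched together. This example assumes the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) as the correction (the abstract does not say which correction was used), returns only the UPGMA topology as nested tuples (no branch lengths), and uses hypothetical helper names and toy sequences.

```python
import math
from itertools import combinations

def p_distance(s1, s2):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc_distance(s1, s2):
    """Jukes-Cantor corrected substitutions per site (assumed correction)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # correction undefined at or beyond saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs, metric):
    """Map frozenset({name_a, name_b}) -> distance for every pair of named sequences."""
    return {frozenset((a, b)): metric(seqs[a], seqs[b]) for a, b in combinations(sorted(seqs), 2)}

def upgma(names, dist):
    """UPGMA topology as nested tuples; ties broken by sorted label order."""
    clusters = {name: (name, 1) for name in names}  # label -> (subtree, number of leaves)
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda pair: d[frozenset(pair)])
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        for c in clusters:
            # Arithmetic mean of leaf-to-leaf distances, weighted by cluster sizes.
            d[frozenset((new, c))] = (na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]) / (na + nb)
        clusters[new] = ((ta, tb), na + nb)
    (tree, _), = clusters.values()
    return tree

seqs = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAC", "C": "AAAAAACCCC", "D": "CCCAAACCCC"}  # hypothetical
print(upgma(sorted(seqs), distance_matrix(seqs, p_distance)))
print(upgma(sorted(seqs), distance_matrix(seqs, jc_distance)))
```

UPGMA assumes a constant rate across lineages, which is consistent with the abstract's finding that it falls behind the FM, DW, and MF methods when rates vary by lineage.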

13.
A method based on quenched references and global analysis was used to deconvolute time-resolved single photon counting data. The results from both computer-simulated data and real experiments showed that highly accurate and reliable deconvolutions were possible. Fluorescence lifetimes and Stern-Volmer quenching constants for quenching with NaI were determined for the reference substances para-terphenyl, PPO (2,5-diphenyloxazol), POPOP (1,4-bis-(5-phenyl-2-oxazolyl)-benzene), and dimethyl-POPOP, all in ethanol. The fluorescence from a mixture of POPOP, anthracene, and diphenylanthracene in ethanol at different wavelengths was successfully resolved into the known relative contributions from the species at each wavelength. Fluorescence intensity decays of tryptophan in solution were studied at different wavelengths and globally analyzed with the method. Also, fluorescence anisotropy described by isotropic and anisotropic rotations in homogeneous and heterogeneous emitting systems was simulated and successfully deconvoluted. The method was applied to real fluorescence anisotropy data of diphenylanthracene and POPOP in paraffin oil, as well as to data from experiments on the blue copper-containing protein stellacyanin and its apo-form. In these cases, the method both corrected for errors due to, for example, the wavelength-dependent transit times in the photomultiplier, and performed global deconvolutions of the total, parallel, and perpendicular components of the fluorescence. General algorithms for arbitrary fluorescence impulse responses are given. A preliminary account of this work was presented at the NATO ASI in Acireale, Italy (Löfroth 1985a, in press).
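The core idea of deconvolving against a reference decay instead of the measured excitation pulse can be sketched for the single-exponential case. In continuous time, if the reference has lifetime tau_r and the sample has lifetime tau, the sample decay equals the reference decay convolved with g(t) = delta(t) + (1/tau_r - 1/tau) exp(-t/tau); the code uses the exact discrete analogue of that kernel, and a Stern-Volmer relation tau = tau0 / (1 + K_SV [Q]) to set a shortened (quenched) reference lifetime. This is a standard reference-convolution identity under these assumptions, not necessarily the authors' global-analysis algorithm; all numbers and function names are hypothetical.

```python
import math

def exp_decay(tau, dt, n):
    """Discretized impulse response exp(-t/tau) on n channels of width dt."""
    return [math.exp(-k * dt / tau) for k in range(n)]

def convolve(x, y):
    """Causal discrete convolution, truncated to len(x) channels (y must be at least as long)."""
    return [sum(x[j] * y[k - j] for j in range(k + 1)) for k in range(len(x))]

def stern_volmer_lifetime(tau0, k_sv, quencher_conc):
    """Lifetime of a dynamically quenched reference: tau0 / tau = 1 + K_SV [Q]."""
    return tau0 / (1.0 + k_sv * quencher_conc)

def reference_kernel(tau_ref, tau_sample, dt, n):
    """Discrete kernel g with exp_decay(tau_ref) (*) g == exp_decay(tau_sample);
    g[0] = 1 and g[k] = (b - a) * b**(k - 1), with a, b the per-channel decay factors."""
    a = math.exp(-dt / tau_ref)
    b = math.exp(-dt / tau_sample)
    return [1.0] + [(b - a) * b ** (k - 1) for k in range(1, n)]

dt, n = 0.05, 200  # channel width (ns) and channel count, hypothetical
lamp = [math.exp(-((k * dt - 1.0) / 0.1) ** 2) for k in range(n)]  # hypothetical excitation pulse
tau_ref = stern_volmer_lifetime(1.2, 10.0, 0.02)  # hypothetical tau0 (ns), K_SV (1/M), [NaI] (M)
tau_sample = 3.0
reference = convolve(lamp, exp_decay(tau_ref, dt, n))  # what the reference measurement records
sample = convolve(lamp, exp_decay(tau_sample, dt, n))  # what the sample measurement records
predicted = convolve(reference, reference_kernel(tau_ref, tau_sample, dt, n))
print(max(abs(p - s) for p, s in zip(predicted, sample)))  # only floating-point round-off remains
```

Because the lamp profile cancels out of the identity, fitting tau_sample against the measured reference curve avoids needing the excitation pulse itself, which is why wavelength-dependent photomultiplier timing errors can be corrected by measuring the reference at the sample's emission wavelength.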

14.
Systematists have access to multiple sources of character information in phylogenetic analysis. For example, it is not unusual to have nucleotide sequences from several different genes, or to have molecular and morphological data. How should diverse data be analyzed in phylogenetic analysis? Several methods have been proposed for the treatment of partitioned data: the total evidence, separate analysis, and conditional combination approaches. Here, we review some of the advantages and disadvantages of the different approaches, with special concentration on which methods help us to discern the evolutionary process and provide the most accurate estimates of phylogeny.

15.
Phylogenetic relationships among the five key angiosperm lineages, Ceratophyllum, Chloranthaceae, eudicots, magnoliids, and monocots, have resisted resolution despite several large-scale analyses that sampled taxa and characters extensively and used various analytical methods. Meanwhile, compatibility methods, which were explored together with parsimony and likelihood methods during the early development of phylogenetics, have been greatly under-appreciated and have not been used to analyze the massive amount of sequence data available for reconstructing the basal angiosperm phylogeny. In this study, we used a compatibility method on a data set of eight genes (mitochondrial atp1, matR, and nad5; plastid atpB, matK, rbcL, and rpoC2; and nuclear 18S rDNA) gathered in an earlier study. We selected two sets of characters that are compatible with more of the other characters than a random character would be, at probabilities of pM < 0.1 and pM < 0.5, respectively. The resulting data matrices were subjected to parsimony and likelihood bootstrap analyses. Our unrooted parsimony analyses showed that Ceratophyllum was immediately related to eudicots, that this larger lineage was immediately related to magnoliids, and that monocots were closely related to Chloranthaceae. All these relationships received 76%-96% bootstrap support. A likelihood analysis of the eight-gene pM < 0.5 compatible-site matrix recovered the same topology but with low support. Likelihood analyses of other compatible-site matrices produced different topologies, all weakly supported. The topology reconstructed in the parsimony analyses agrees with the one recovered in the previous study using both parsimony and likelihood methods when no character was eliminated. Parts of this topology have also been recovered in several earlier studies. Hence, this topology plausibly reflects the true relationships among the five key angiosperm lineages.
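Pairwise character compatibility, the quantity underlying the character selection above, can be sketched as follows. For two unordered characters, a standard test is that they are compatible exactly when the graph joining each observed pair of states (one node per state of each character) has no cycle; for binary characters this reduces to the familiar four-gamete test. The sketch assumes an alignment given as a list of equal-length strings, treats "?" and "-" as missing, and uses hypothetical function names. It only counts compatibilities per site; the randomization that turns these counts into the study's pM values is not reproduced.

```python
from itertools import combinations

def compatible(col1, col2, missing="?-"):
    """True iff the state-intersection graph of two characters is acyclic."""
    parent = {}

    def find(node):
        # Union-find root lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # One edge per distinct pair of states co-occurring in some taxon.
    edges = {(a, b) for a, b in zip(col1, col2) if a not in missing and b not in missing}
    for a, b in edges:
        ra, rb = find(("1", a)), find(("2", b))  # tag states by character to keep them distinct
        if ra == rb:
            return False  # this edge would close a cycle
        parent[ra] = rb
    return True

def compatibility_counts(alignment):
    """For each column, the number of other columns it is pairwise compatible with."""
    cols = list(zip(*alignment))
    counts = [0] * len(cols)
    for i, j in combinations(range(len(cols)), 2):
        if compatible(cols[i], cols[j]):
            counts[i] += 1
            counts[j] += 1
    return counts

print(compatibility_counts(["ACGT", "ACGA", "GTGA", "GTCA"]))  # hypothetical 4-taxon alignment
```

Sites with high counts are the candidates retained in a compatibility-based matrix; sites compatible with few others are the ones such methods treat as likely homoplastic.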

16.
The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein coding). Examined models range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5% of the characters in the data set but nevertheless influenced the combined-data tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as among-site rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more parameter-rich models, but the best model overall is also the most complex and Bayes factors do not support exclusion of apparently weak parameters from this model. 
Thus, Bayes factors appear to be useful for selecting among complex models, but it is still unclear whether their use strikes a reasonable balance between model complexity and error in parameter estimates.
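Comparing partitioned models with Bayes factors requires an estimate of each model's marginal likelihood from its MCMC output. The abstract does not name the estimator; this sketch assumes the harmonic-mean estimator, which was common for MCMC phylogenetics at the time but is known to be unstable, and reports 2 ln BF on the Kass and Raftery scale. The function names and log-likelihood values are hypothetical.

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Harmonic-mean estimate of ln p(D | M) from post-burn-in MCMC log-likelihood samples:
    ln m = ln N - logsumexp(-l_i), computed stably by factoring out the largest term."""
    neg = [-l for l in log_likelihoods]
    top = max(neg)
    log_sum = top + math.log(sum(math.exp(x - top) for x in neg))
    return math.log(len(log_likelihoods)) - log_sum

def two_ln_bayes_factor(log_ml_1, log_ml_0):
    """2 ln BF for model M1 against model M0."""
    return 2.0 * (log_ml_1 - log_ml_0)

def kass_raftery(two_ln_bf):
    """Verbal strength of evidence for M1 (for negative values, swap the models)."""
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"

# Hypothetical traces: a partitioned model (M1) versus a single shared model (M0).
partitioned = [-10412.3, -10409.8, -10415.1, -10410.6]
shared = [-10431.7, -10428.9, -10433.2, -10430.4]
bf = two_ln_bayes_factor(log_harmonic_mean(partitioned), log_harmonic_mean(shared))
print(bf, kass_raftery(bf))
```

Note that Bayes factors from the harmonic-mean estimator tend to favor parameter-rich models, which bears on the abstract's open question about balancing complexity against estimation error.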

17.
We introduce a new method for identifying optimal incomplete data sets from large sequence databases based on the graph theoretic concept of alpha-quasi-bicliques. The quasi-biclique method searches large sequence databases to identify useful phylogenetic data sets with a specified amount of missing data while maintaining the necessary amount of overlap among genes and taxa. The utility of the quasi-biclique method is demonstrated on large simulated sequence databases and on a data set of green plant sequences from GenBank. The quasi-biclique method greatly increases the taxon and gene sampling in the data sets while adding only a limited amount of missing data. Furthermore, under the conditions of the simulation, data sets with a limited amount of missing data often produce topologies nearly as accurate as those built from complete data sets. The quasi-biclique method will be an effective tool for exploiting sequence databases for phylogenetic information and also may help identify critical sequences needed to build large phylogenetic data sets.
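The quasi-biclique condition can be sketched on a taxon-by-gene presence table. This example assumes the common definition in which every retained taxon has sequences for at least a (1 - alpha) fraction of the retained genes and every retained gene is sampled for at least a (1 - alpha) fraction of the retained taxa. The greedy pruning routine is a hypothetical heuristic added for illustration, not the authors' search algorithm, and the data are invented.

```python
def is_quasi_biclique(data, taxa, genes, alpha):
    """data: set of (taxon, gene) pairs with an available sequence.
    True if the taxa x genes submatrix misses at most an alpha fraction in every row and column."""
    if not taxa or not genes:
        return True
    for t in taxa:
        if sum((t, g) in data for g in genes) < (1 - alpha) * len(genes):
            return False
    for g in genes:
        if sum((t, g) in data for t in taxa) < (1 - alpha) * len(taxa):
            return False
    return True

def greedy_prune(data, taxa, genes, alpha):
    """Hypothetical heuristic: repeatedly drop the worst-covered taxon or gene
    (ties broken by sorted name) until the submatrix is an alpha-quasi-biclique."""
    taxa, genes = set(taxa), set(genes)
    while not is_quasi_biclique(data, taxa, genes, alpha):
        row = {t: sum((t, g) in data for g in genes) / len(genes) for t in taxa}
        col = {g: sum((t, g) in data for t in taxa) / len(taxa) for g in genes}
        worst_t = min(sorted(taxa), key=row.get)
        worst_g = min(sorted(genes), key=col.get)
        if row[worst_t] <= col[worst_g]:
            taxa.discard(worst_t)
        else:
            genes.discard(worst_g)
    return taxa, genes

# Hypothetical database: taxa A-C have all three genes, taxon D only g1.
data = {(t, g) for t in "ABC" for g in ("g1", "g2", "g3")} | {("D", "g1")}
print(greedy_prune(data, "ABCD", ["g1", "g2", "g3"], alpha=0.34))
```

Raising alpha tolerates more missing cells and so lets more taxa and genes survive, which is the trade-off between sampling and missing data that the abstract describes.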

18.
A Bayesian phylogenetic analysis of 36 Ipomoea species using sequence data from the internal transcribed spacer region was compared with classification schemes based on traditional methods and a previously published cpDNA restriction fragment length polymorphism (RFLP) study. These molecular studies support a diversity of groups that were circumscribed on the basis of phenetic principles and agree generally with the results from cpDNA RFLP analyses. The congruence between the phylogenetic hypotheses based on new molecular data and the understanding of relationships developed in earlier studies indicates that these classifications may reflect evolutionary history. Two large clades of species, with one including sections Tricolores, Calonyction, and Pharbitis and the other including sections Mina and Leptocallis, were identified. Furthermore, morphologically distinct groups of Ipomoea species received support from the DNA sequence data. Indices of convergence for the Markov chain Monte Carlo (MCMC) in the Bayesian phylogenetic analysis were evaluated. A limited range of posterior probabilities for each node in the trees from a set of five MCMC samples provides a useful index of convergence. Bayesian node support values were generally higher than bootstrap values from a maximum parsimony analysis. This is consistent with the notion that these measures of support estimate different qualities of the data.
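The convergence index described above, the range of each node's posterior probability across independent MCMC samples, can be sketched as follows. This assumes each sampled tree is represented as a collection of clades (frozensets of taxon names); the function names and toy runs are hypothetical, and the abstract does not state what range counts as acceptable.

```python
from collections import Counter

def clade_frequencies(trees):
    """Posterior probability of each clade: the fraction of sampled trees containing it."""
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {clade: c / len(trees) for clade, c in counts.items()}

def posterior_ranges(runs):
    """For each clade seen in any run, max minus min posterior probability across runs."""
    freqs = [clade_frequencies(run) for run in runs]
    clades = set().union(*freqs)  # union of the clade keys from every run
    return {
        clade: max(f.get(clade, 0.0) for f in freqs) - min(f.get(clade, 0.0) for f in freqs)
        for clade in clades
    }

# Hypothetical post-burn-in samples from two independent runs.
ab, cd = frozenset({"A", "B"}), frozenset({"C", "D"})
runs = [[[ab], [ab], [ab], [cd]], [[ab], [ab], [cd], [cd]]]
print(posterior_ranges(runs))
```

A small range for every node suggests the runs are sampling the same posterior; a node whose probability differs widely between runs flags a part of the tree where the chains have not yet converged.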

19.
A phylogenetic analysis of the Arecoid Line (sensu Moore) of palms was conducted using 7 kb of coding and noncoding plastid DNA sequence data. Recovered maximum-parsimony and maximum-likelihood phylogenies support monophyly for the Arecoid Line relative to the rest of the family but paraphyly for subfamily Arecoideae and polyphyly for subfamily Ceroxyloideae (sensu Dransfield and Uhl). Tribes Cocoeae, Geonomeae, Hyophorbeae, and Iriarteae and subfamily Phytelephantoideae were identified as monophyletic as were subfamily Phytelephantoideae + Ravenea (tribe Ceroxyleae of Ceroxyloideae), Podococcus (tribe Podococceae of Arecoideae) + Pseudophoenix (tribe Cyclospatheae of Ceroxyloideae), Reinhardtia (tribe Malortieinae) + tribe Cocoeae (both of Arecoideae), and a clade containing all IndoPacific pseudomonomerous genera of tribe Areceae (Arecoideae). A few taxa show spurious resolution with noncoding plastid DNA data but noncoding data are generally congruent with protein-coding data. Biogeographic interpretation suggests a Gondwanan origin for the Arecoid Line with several lineages found on more than one fragment of the former supercontinent and primary diversification in these groups possibly due to continental breakup vicariance. Three groups involving Cocos, Orania, and the IndoPacific clade demonstrate independent dispersals into the IndoPacific region from a Gondwanan origin.

20.

Background  

Advances in automated DNA sequencing technology have accelerated the generation of metagenomic DNA sequences, especially environmental ribosomal RNA gene (rDNA) sequences. As the scale of rDNA-based studies of microbial ecology has expanded, need has arisen for software that is capable of managing, annotating, and analyzing the plethora of diverse data accumulated in these projects.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417