Similar Literature
1.
We examine the impact of likelihood surface characteristics on phylogenetic inference. We analyze amino acid data sets simulated from topologies whose branch-length features were chosen to represent varying degrees of difficulty for likelihood maximization. We present situations in which the tree found to achieve the global maximum likelihood is often not the true tree. Using the program covSEARCH, we demonstrate that adaptively sized pools of candidate trees, updated by means of confidence tests, yield solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, so covSEARCH is best suited to small and medium-sized alignments, or to large alignments with some constrained nodes. The majority-rule consensus tree computed from the confidence sets also proves to differ from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained from nodes that receive high support within the confidence set. Two real data sets are analyzed: mammalian mitochondrial proteins and a small tubulin alignment. We conclude that confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confidently resolving specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods. [Reviewing Editor: Dr. Nicolas Galtier]
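The idea of reading support for individual nodes off a confidence set of trees can be sketched as follows. This is a minimal illustration, not covSEARCH's implementation: it assumes each tree is supplied as a list of clusters (one side of each internal edge), and the helper names `normalize_split`, `split_support`, and `majority_rule_splits` are hypothetical.

```python
from collections import Counter


def normalize_split(side, taxa):
    """Encode a bipartition by the side that does NOT contain a fixed
    reference taxon, so {A,B}|{C,D,E} and {C,D,E}|{A,B} compare equal."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa - side) if reference in side else side


def split_support(trees, taxa):
    """Fraction of trees in the set that contain each bipartition.

    Each tree is an iterable of clusters (one side of each internal edge);
    trivial splits should be left out by the caller."""
    taxa = frozenset(taxa)
    counts = Counter()
    for tree in trees:
        # A set comprehension so a split listed twice in one tree counts once.
        counts.update({normalize_split(cluster, taxa) for cluster in tree})
    return {split: n / len(trees) for split, n in counts.items()}


def majority_rule_splits(trees, taxa, threshold=0.5):
    """Bipartitions found in more than `threshold` of the trees."""
    return {s for s, f in split_support(trees, taxa).items() if f > threshold}


# Three hypothetical trees on taxa A-E, each listed by its two internal splits.
trees = [
    [{"A", "B"}, {"D", "E"}],
    [{"A", "B"}, {"C", "D"}],
    [{"A", "C"}, {"D", "E"}],
]
print(majority_rule_splits(trees, "ABCDE"))
```

In this toy set, the splits {A,B}|{C,D,E} and {D,E}|{A,B,C} each occur in two of the three trees, so they pass the majority threshold, while the other two splits occur only once. Nodes with high support can be reported even when the set as a whole does not agree on one tree.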

2.
Fitzhugh  Kirk 《Acta biotheoretica》2021,69(4):799-819
Three competing ‘methods’ have been endorsed for inferring phylogenetic hypotheses: parsimony, likelihood, and Bayesianism. The latter two have been claimed...

3.
Minimum evolution is the guiding principle of an important class of distance-based phylogeny reconstruction methods, including neighbor-joining (NJ), which is the most cited tree inference algorithm to date. The minimum evolution principle involves searching for the tree with minimum length, where the length is estimated using various least-squares criteria. Since evolutionary distances cannot be known precisely but only estimated, it is important to investigate the robustness of phylogenetic reconstruction to imprecise estimates of these distances. The safety radius is a measure of this robustness: it is the maximum relative deviation that the input distances can have from the correct distances without compromising the reconstruction of the correct tree structure. Answering some open questions, we derive here the safety radius of two popular minimum evolution criteria: balanced minimum evolution (BME) and minimum evolution based on ordinary least squares (OLS + ME). Whereas BME has a radius of $\frac{1}{2}$, which is the best achievable, OLS + ME has a radius that tends to 0 as the number of taxa increases. This difference may explain the gap in reconstruction accuracy observed in practice between OLS + ME and BME (which forms the basis of popular programs such as NJ and FastME).
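To make "maximum relative deviation" concrete, the sketch below measures how far an estimated distance matrix is from the true tree distances. It assumes the convention commonly used for safety radii: the largest absolute error over all pairs, divided by the length of the shortest branch of the true tree. The function names are hypothetical, and the check only tests the hypothesis of the guarantee; it does not run BME itself.

```python
def relative_deviation(d_est, d_true, min_branch):
    """Largest absolute error over all pairwise distances, divided by the
    length of the shortest branch in the true tree."""
    n = len(d_true)
    worst = max(
        abs(d_est[i][j] - d_true[i][j]) for i in range(n) for j in range(i + 1, n)
    )
    return worst / min_branch


def within_safety_radius(d_est, d_true, min_branch, radius=0.5):
    """True if the perturbation lies strictly inside the given radius
    (0.5 is the radius reported above for BME)."""
    return relative_deviation(d_est, d_true, min_branch) < radius


# Quartet ((A,B),(C,D)) with every branch of length 1: tree distances are 2 or 3.
d_true = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
d_est = [row[:] for row in d_true]
d_est[0][2] = d_est[2][0] = 3.4  # perturb d(A, C) by 0.4

print(relative_deviation(d_est, d_true, 1.0))    # about 0.4
print(within_safety_radius(d_est, d_true, 1.0))  # True
```

Under the result quoted above, a perturbation of this size is guaranteed to leave the BME topology intact, whereas for OLS + ME the tolerated deviation shrinks toward 0 as taxa are added.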

4.
SYNOPSIS. Conventional cladistic analyses of phylogeny can be interpreted as operating at the level of phylogenetic trees. They assume that all "evolutionary steps" (transitions from one character state to the next, along a morphocline) are independent and equal, and, on that basis, select the cladogram which is consistent with the most parsimonious trees. Evaluation of the assumptions of independence and equality requires consideration of hypotheses at the level of scenarios. In some cases, arguments based on functional analysis can suggest revised interpretations of either homology or polarity. If properly formulated, these arguments can alter the evaluation of parsimony for trees to the extent that even the choice of cladogram is affected. The structure of scenario-level arguments is identical to that of arguments operating at tree level. Examples of phylogenetic inference in the context of xiphosurans (horseshoe crabs), using both comparative morphological and functional analysis, illustrate this approach. In different cases, orthodox interpretations of relationship are either challenged or corroborated. Although the introduction of functional analysis into the process of phylogenetic inference may appear to compromise the usefulness of the reconstructed phylogeny for testing hypotheses concerning the role of natural selection in evolution, it actually increases the strength of such tests.
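The "independent and equal steps" assumption can be made concrete for a single ordered (morphocline) character. The sketch below uses the standard interval down-pass for ordered (Wagner) parsimony on a rooted binary tree. The nested-tuple tree format and the name `wagner_length` are illustrative assumptions, and this is a generic textbook technique rather than the analysis used in the paper.

```python
def wagner_length(tree):
    """Return (interval, steps) for a rooted binary tree written as nested
    2-tuples whose leaves are integer states along an ordered morphocline.

    Every step between adjacent states costs 1 (independent, equal steps)."""
    if isinstance(tree, int):
        return (tree, tree), 0
    left, right = tree
    (a_lo, a_hi), s_left = wagner_length(left)
    (b_lo, b_hi), s_right = wagner_length(right)
    steps = s_left + s_right
    lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
    if lo <= hi:
        # Intervals overlap: any state in the overlap adds no steps here.
        return (lo, hi), steps
    # Disjoint intervals: the gap between them is optimal and costs its width.
    return (hi, lo), steps + (lo - hi)


# States 0 and 3 on one cherry, 1 and 2 on the other.
print(wagner_length(((0, 3), (1, 2))))
```

A scenario-level argument of the kind described above would challenge exactly the unit cost per step built into this count, for example by arguing that some transitions are functionally linked rather than independent.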

5.
6.
7.
Wen-Hsiung Li 《Genetics》1986,113(1):187-213
Mathematical formulas are developed for the evolutionary change of restriction cleavage sites in a DNA sequence, allowing unequal rates between transitional and transversional types of nucleotide substitution. Formulas are also developed for the probability of having a particular pattern of site changes among evolutionary lineages, such as parallel gains or losses of sites, and for inferring the presence or absence of a restriction site in an ancestral sequence from data on the present-day sequences. The unordered compatibility method is proposed for inferring the phylogenetic relationships among relatively closely related organisms, treating restriction sites as cladistic characters. Formulas are derived for the probability (P+) of obtaining the correct network for a given number (N) of informative sites for the cases of four and five species. These formulas are applied to evaluate the performance of the method and to estimate the N value required for P+ to be 95% or larger. The method performs well when the branches between ancestral nodes and the branches leading to the two most recent species are more or less equal in length, but performs poorly when the latter two branches are considerably longer than the former.
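As a rough illustration of why unequal transition and transversion rates matter for restriction sites, the sketch below uses Kimura's two-parameter model (a swapped-in, standard model; Li's own formulas are not reproduced here). It assumes `alpha` is the transition rate and `beta` the rate of each of the two possible transversions, and that the r positions of a recognition site evolve independently. The function names are hypothetical.

```python
import math


def k2p_same_prob(alpha, beta, t):
    """Kimura two-parameter probability that a position shows its original
    nucleotide after time t (alpha: transition rate, beta: rate of each
    of the two transversions)."""
    return 0.25 + 0.25 * math.exp(-4 * beta * t) + 0.5 * math.exp(-2 * (alpha + beta) * t)


def site_retention_prob(r, alpha, beta, t):
    """Probability that all r positions of a recognition site still show
    their original nucleotides, assuming positions evolve independently."""
    return k2p_same_prob(alpha, beta, t) ** r


# A 6-bp site under a transition-biased process versus equal rates.
print(site_retention_prob(6, alpha=0.3, beta=0.05, t=1.0))
print(site_retention_prob(6, alpha=0.1, beta=0.1, t=1.0))
```

When `alpha == beta` the first function reduces to the Jukes-Cantor result 1/4 + 3/4 e^(-4 alpha t), which is a convenient sanity check; longer recognition sites are lost faster because every position must survive.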

8.
Reverse engineering approaches to constructing gene regulatory networks (GRNs) based on genome-wide mRNA expression data have led to significant biological findings, such as the discovery of novel drug targets. However, the reliability of the reconstructed GRNs needs to be improved. Here, we propose an ensemble-based network aggregation approach to improving the accuracy of network topologies constructed from mRNA expression data. To evaluate the performance of different approaches, we created dozens of simulated networks from combinations of gene-set sizes and sample sizes and also tested our methods on three Escherichia coli datasets. We demonstrate that the ensemble-based network aggregation approach can be used to effectively integrate GRNs constructed from different studies, producing more accurate networks. We also apply this approach to building a network from epithelial mesenchymal transition (EMT) signature microarray data and identify hub genes that might be potential drug targets. The R code used to perform all of the analyses is available in an R package entitled “ENA”, accessible on CRAN (http://cran.r-project.org/web/packages/ENA/).
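One common way to aggregate networks whose edge scores come from different methods is to convert each network's scores to ranks and average them. The sketch below shows that generic rank-averaging idea only; it is not the ENA package's algorithm, and it assumes every network scores the same set of edges (gene-pair tuples). The function names are hypothetical.

```python
def rank_edges(scores):
    """Map each edge to its rank within one network (1 = strongest)."""
    ordered = sorted(scores, key=lambda e: scores[e], reverse=True)
    return {edge: i + 1 for i, edge in enumerate(ordered)}


def aggregate_networks(networks):
    """Order edges by their mean rank across networks (best consensus first).

    Ranks make scores from different inference methods comparable."""
    edges = set(networks[0])
    if any(set(net) != edges for net in networks):
        raise ValueError("all networks must score the same set of edges")
    ranks = [rank_edges(net) for net in networks]
    mean_rank = {e: sum(r[e] for r in ranks) / len(ranks) for e in edges}
    # Tie-break on the edge itself so the output order is deterministic.
    return sorted(edges, key=lambda e: (mean_rank[e], e))


# Three hypothetical reconstructions of the same three-gene network.
net1 = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.1}
net2 = {("a", "b"): 0.7, ("a", "c"): 0.8, ("b", "c"): 0.2}
net3 = {("a", "b"): 0.6, ("a", "c"): 0.3, ("b", "c"): 0.4}
print(aggregate_networks([net1, net2, net3]))
```

Here edge a-b is ranked first, second, and first, so it heads the aggregated list even though net2 ranked a-c higher.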

9.
10.
11.
Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically assessed only in terms of function conservation, despite Fitch's original, phylogeny-based definition. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required mapping millions of sequences, handling hundreds of millions of predicted ortholog pairs, and computing tens of thousands of trees. For phylogenetic analysis, or for functional analysis requiring high specificity, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage, OrthoMCL outperforms Ensembl Compara and, to a lesser extent, Inparanoid. Lastly, the large coverage of the recent EggNOG can be useful for building broad functional groupings, but the method is not specific enough for phylogenetic or detailed functional analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. The present study thus makes three contributions. First, it provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. Third, it sets performance standards for current and future approaches.
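Bidirectional best-hit, one of the two standard methods in the comparison, is simple enough to sketch. The version below assumes precomputed similarity scores (for example, alignment bit scores) in both search directions, stored as `{(query, subject): score}` dictionaries; the names are hypothetical and ties are resolved by keeping the first hit encountered.

```python
def best_hits(scores):
    """For each query, the subject with the highest score.

    scores: {(query, subject): score}; on ties the first hit seen is kept."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best
    hit in genome A; these pairs are predicted orthologs."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# Hypothetical scores for two genes in each of two genomes.
a_vs_b = {("a1", "b1"): 50, ("a1", "b2"): 40, ("a2", "b2"): 30, ("a2", "b1"): 35}
b_vs_a = {("b1", "a1"): 52, ("b1", "a2"): 36, ("b2", "a2"): 31, ("b2", "a1"): 41}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # {('a1', 'b1')}
```

Gene a2's best hit is b1, but b1's best hit is a1, so the reciprocity requirement rejects the pair (a2, b1).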

12.
The metagenomic approach provides direct access to diverse unexplored genomes, especially those of uncultivated bacteria in a given environment. This diversity may conceal many new biosynthetic pathways. Type I polyketide synthases (PKSI) are modular enzymes involved in the biosynthesis of many natural products of industrial interest. Among the PKSI domains, the ketosynthase (KS) domain was used to screen a large soil metagenomic library containing more than 100,000 clones for clones carrying PKS genes. Over 60,000 clones were screened, and 139 clones containing KS domains were detected. A 700-bp fragment of the KS domain was sequenced for 40 clones chosen at random from these 139. None of the 40 protein sequences was identical to those found in public databases, and the nucleotide sequences were not redundant. Phylogenetic analyses were performed on the protein sequences of three metagenomic clones to select those predicted to produce new compounds. Two PKS-positive clones do not belong to any of the 23 published PKSI included in the analysis, encouraging further analysis of these two clones identified by the selection process.
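The screening figures above imply the following rates. Because "over 60,000" clones were screened, the first ratio is an upper bound on the fraction of screened clones carrying a KS domain; the second is the fraction of KS-positive clones that were sequenced.

```latex
\frac{139}{60\,000} \approx 0.0023 \;(\approx 0.23\%), \qquad
\frac{40}{139} \approx 0.288 \;(\approx 28.8\%)
```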

13.
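The phylogenetic profile method is currently one of the more extensively studied non-homology-based approaches to the functional annotation of biological macromolecules. To address shortcomings of existing algorithms, we improved the method in two respects: first, by constructing weight-based phylogenetic profiles; second, by analyzing profile similarity with an improved clustering algorithm. One hundred Escherichia coli K12 proteins downloaded from NCBI served as experimental data, and similar profiles were analyzed with the improved algorithm as well as with the classical hierarchical clustering and K-means clustering algorithms. The results show that the proposed algorithm clusters similar profiles with markedly higher accuracy than either of the other two algorithms.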
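A phylogenetic profile records the presence (1) or absence (0) of a protein's homologs across a set of reference genomes, and proteins with similar profiles are candidates for related function. The abstract does not specify its weighting scheme or similarity measure, so the sketch below assumes one plausible choice: a weighted Jaccard similarity with one hypothetical non-negative weight per genome. It is not the authors' algorithm.

```python
def weighted_jaccard(x, y, weights):
    """Weighted Jaccard similarity of two presence/absence profiles.

    x, y: sequences of 0/1 over the same ordered list of reference genomes.
    weights: one non-negative weight per genome (a hypothetical scheme)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("profiles and weights must have the same length")
    shared = sum(w for a, b, w in zip(x, y, weights) if a and b)
    either = sum(w for a, b, w in zip(x, y, weights) if a or b)
    return shared / either if either else 0.0


# Hypothetical profiles of two proteins across four genomes.
p1 = [1, 1, 0, 1]
p2 = [1, 0, 0, 1]
print(weighted_jaccard(p1, p2, [1, 1, 1, 1]))  # unweighted: 2/3
print(weighted_jaccard(p1, p2, [1, 2, 1, 1]))  # genome 2 up-weighted: 0.5
```

Up-weighting a genome makes disagreement in that genome count for more, which is one way a weighted profile can change which proteins cluster together.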

14.
Species of the genus Streptomyces, which constitute the vast majority of taxa within the family Streptomycetaceae, are a predominant component of the microbial population in soils throughout the world and have been the subject of extensive isolation and screening efforts over the years because they are a major source of commercially and medically important secondary metabolites. Taxonomic characterization of Streptomyces strains has been a challenge due to the large number of described species, greater than any other microbial genus, resulting from academic and industrial activities. The methods used for characterization have evolved through several phases over the years from those based largely on morphological observations, to subsequent classifications based on numerical taxonomic analyses of standardized sets of phenotypic characters and, most recently, to the use of molecular phylogenetic analyses of gene sequences. The present phylogenetic study examines almost all described species (615 taxa) within the family Streptomycetaceae based on 16S rRNA gene sequences and illustrates the species diversity within this family, which is observed to contain 130 statistically supported clades, as well as many unsupported and single member clusters. Many of the observed clades are consistent with earlier morphological and numerical taxonomic studies, but it is apparent that insufficient variation is present in the 16S rRNA gene sequence within the species of this family to permit bootstrap-supported resolution of relationships between many of the individual clusters.
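Bootstrap support, the criterion used above for "statistically supported" clades, comes from resampling alignment columns with replacement. The sketch below generates replicate alignments under that standard nonparametric bootstrap; it assumes the alignment is a dict of equal-length aligned strings, and the name `bootstrap_replicate` is hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one pseudo-alignment built by sampling columns with replacement.

    `alignment` maps sequence names to equal-length aligned strings."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("sequences must be aligned to the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in columns) for n in names}


# Hypothetical toy alignment of three sequences.
alignment = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "AGGTTA"}
rng = random.Random(42)
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
```

A tree would then be inferred from each replicate, and a clade's bootstrap support is the fraction of replicate trees that contain it. When few columns vary between closely related species, as the 16S rRNA results above suggest, replicates disagree and support for deeper groupings stays low.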

15.
The distribution of a phenotype on a phylogenetic tree is often a quantity of interest. Many phenotypes have imperfect heritability, so that a measurement of the phenotype for an individual can be thought of as a single realization from the phenotype distribution of that individual. If all individuals in a phylogeny had the same phenotype distribution, measured phenotypes would be randomly distributed on the tree leaves. This is, however, often not the case, implying that the phenotype distribution evolves over time. Here we propose a new model based on this principle of evolving phenotype distribution on the branches of a phylogeny, which is different from ancestral state reconstruction where the phenotype itself is assumed to evolve. We develop an efficient Bayesian inference method to estimate the parameters of our model and to test the evidence for changes in the phenotype distribution. We use multiple simulated data sets to show that our algorithm has good sensitivity and specificity properties. Since our method identifies branches on the tree on which the phenotype distribution has changed, it is able to break down a tree into components for which this distribution is unique and constant. We present two applications of our method, one investigating the association between HIV genetic variation and human leukocyte antigen and the other studying host range distribution in a lineage of Salmonella enterica, and we discuss many other potential applications.
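The decomposition step described above (splitting a tree into components at branches where the phenotype distribution changes) can be sketched as follows, assuming the change branches are already known. The Bayesian inference that identifies those branches is not shown. The tree is given as a hypothetical child-to-parent map with `None` for the root, and a branch is identified by the node at its lower end.

```python
from collections import Counter, defaultdict


def component_root(node, parent, change_nodes):
    """Climb toward the root until reaching a node whose incoming branch
    carries a change (or the root itself); that node labels the component."""
    while node not in change_nodes and parent.get(node) is not None:
        node = parent[node]
    return node


def component_distributions(parent, change_nodes, leaf_phenotypes):
    """Empirical phenotype frequencies among the leaves of each component."""
    groups = defaultdict(Counter)
    for leaf, phenotype in leaf_phenotypes.items():
        groups[component_root(leaf, parent, change_nodes)][phenotype] += 1
    return {
        comp: {ph: n / sum(counts.values()) for ph, n in counts.items()}
        for comp, counts in groups.items()
    }


# Hypothetical tree R -> (X -> a, b), (Y -> c, d), with a change on branch R-Y.
parent = {"R": None, "X": "R", "Y": "R", "a": "X", "b": "X", "c": "Y", "d": "Y"}
phenotypes = {"a": "host1", "b": "host2", "c": "host1", "d": "host1"}
print(component_distributions(parent, {"Y"}, phenotypes))
# {'R': {'host1': 0.5, 'host2': 0.5}, 'Y': {'host1': 1.0}}
```

Each leaf phenotype is then treated as one draw from its component's distribution, which is the imperfect-heritability view the abstract describes.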

16.
17.
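As "central proteins" that participate in many interactions, hub proteins play a key role in carrying out protein functions and life processes. Domains, as the basic functional regions of proteins, determine protein function and the interactions a protein can form. Both hub proteins and domains in interaction networks are decisive for the realization of protein function, and analysis of the relationship between protein interactions and domains shows that the two are closely linked. We perform an association analysis between hub proteins and domains in the human protein interaction network, examine the relationship between hub proteins (and their interaction partners) and the number of domains they contain, and further validate the relationships among the corresponding domains through the interactions between hub proteins.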
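A first step in relating hubs to domain content is to compute each protein's degree, pick hubs by a degree threshold, and correlate degree with domain count. The sketch below assumes the network is a list of unique undirected protein pairs; the hub threshold, the Pearson correlation, and all names are hypothetical choices for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def degrees(interactions):
    """Interaction count per protein, from a list of unique undirected pairs."""
    deg = Counter()
    for a, b in interactions:
        deg[a] += 1
        deg[b] += 1
    return deg


def hubs(interactions, min_degree):
    """Proteins with at least `min_degree` interactions (threshold is hypothetical)."""
    return {p for p, d in degrees(interactions).items() if d >= min_degree}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy network and domain counts.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
domain_count = {"P1": 4, "P2": 2, "P3": 2, "P4": 1}

deg = degrees(interactions)
proteins = sorted(deg)
r = pearson([deg[p] for p in proteins], [domain_count[p] for p in proteins])
print(hubs(interactions, 3), round(r, 3))  # {'P1'} 0.973
```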

18.
19.
The averaged genomic similarities based on multilocus randomly amplified polymorphic DNA (RAPD) were calculated for eight species representing three sections of the genus Vicia: faba, bithynica and narbonensis. The frequency of occurrence of the sequences corresponding to 25 decamers selected at random was checked in the genomes of different Fabaceae species, and its high correlation with the frequency observed for Vicia allowed us to assume that these decamers carry similar weight in typing Vicia species. When compared with similarity coefficients derived from whole-genome hybridization with barley rDNA and from restriction fragment length polymorphism (RFLP), the RAPD-based similarity coefficients revealed similar interspecies relationships. The averaged RAPD-based similarity coefficient (Pearson’s) was 0.68 for all the species and was section-specific: 0.43 (bithynica), 0.50 (faba) and 0.73 (narbonensis). The averaged similarity coefficient for V. serratifolia (0.63) placed it apart from the rest (0.75) of its section. The results correspond to the interspecies relationships inferred from non-genetic data. The averaged similarity coefficient for a particular RAPD primer was related to the presence and type of tandemly repeated motif in the primer: 0.7–0.8 for heterodimers (GC, AG, CA, GT, CT), 0.5–0.6 for homodimers (CC, GG) and 0.6 for no repeat, indicating that the detected range of diversity is sensitive to the type of target sequence.
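An averaged Pearson similarity of the kind reported above can be computed from binary band profiles (1 = band present, 0 = absent) by correlating every pair of species and taking the mean. The sketch assumes each species is scored over the same ordered list of bands; the profiles and function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length band profiles."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("profile with all bands present or all absent")
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_similarity(profiles):
    """Average Pearson coefficient over all unordered pairs of species."""
    pairs = list(combinations(sorted(profiles), 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


# Hypothetical band profiles for three species scored on four bands.
profiles = {
    "sp1": [1, 1, 0, 0],
    "sp2": [1, 1, 0, 0],
    "sp3": [0, 0, 1, 1],
}
print(mean_pairwise_similarity(profiles))
```

Identical profiles give 1 and fully complementary ones give -1, so in this toy case the mean is -1/3; a section-specific average is the same computation restricted to the species of one section.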

20.