期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Phylogenetic networks based on the molecular clock hypothesis

Bereg S Zhang Y 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):661-667

A classical result in phylogenetic trees is that a binary phylogenetic tree adhering to the molecular clock hypothesis exists if and only if the matrix of distances between taxa is ultrametric. The ultrametric condition is very restrictive. In this paper we study phylogenetic networks that can be constructed assuming the molecular clock hypothesis. We characterize distance matrices that admit such networks for 3 and 4 taxa. We also design two algorithms for constructing networks optimizing the least-squares fit. 相似文献

2.

Molecular clock fork phylogenies: closed form analytic maximum likelihood solutions

Chor B Snir S 《Systematic biology》2004,53(6):963-967

Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used in order to find optimal parameters for a given tree. So far, analytic solutions were derived only for the simplest model-three-taxa, two-state characters, under a molecular clock. Quoting Ziheng Yang, who initiated the analytic approach,"this seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogenetic estimation."In this work, we give general analytic solutions for a family of trees with four-taxa, two-state characters, under a molecular clock. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system, and requires novel techniques and approaches. We start by presenting the general maximum likelihood problem on phylogenetic trees as a constrained optimization problem, and the resulting system of polynomial equations. In full generality, it is infeasible to solve this system, therefore specialized tools for the molecular clock case are developed. Four-taxa rooted trees have two topologies-the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). We combine the ultrametric properties of molecular clock fork trees with the Hadamard conjugation to derive a number of topology dependent identities. Employing these identities, we substantially simplify the system of polynomial equations for the fork. We finally employ symbolic algebra software to obtain closed formanalytic solutions (expressed parametrically in the input data). In general, four-taxa trees can have multiple ML points. In contrast, we can now prove that each fork topology has a unique(local and global) ML point. 相似文献

3.

Maximum likelihood molecular clock comb: analytic solutions.

Benny Chor Amit Khetan Sagi Snir 《Journal of computational biology》2006,13(3):819-837

Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM), are used in order to find optimal parameters for a given tree. So far, analytic solutions were derived only for the simplest model--three taxa, two state characters, under a molecular clock. Four taxa rooted trees have two topologies--the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). In a previous work, we devised a closed form analytic solution for the ML molecular clock fork. In this work, we extend the state of the art in the area of analytic solutions ML trees to the family of all four taxa trees under the molecular clock assumption. The change from the fork topology to the comb incurs a major increase in the complexity of the underlying algebraic system and requires novel techniques and approaches. We combine the ultrametric properties of molecular clock trees with the Hadamard conjugation to derive a number of topology dependent identities. Employing these identities, we substantially simplify the system of polynomial equations. We finally use tools from algebraic geometry (e.g., Gr?bner bases, ideal saturation, resultants) and employ symbolic algebra software to obtain analytic solutions for the comb. We show that in contrast to the fork, the comb has no closed form solutions (expressed by radicals in the input data). In general, four taxa trees can have multiple ML points. In contrast, we can now prove that under the molecular clock assumption, the comb has a unique (local and global) ML point. (Such uniqueness was previously shown for the fork.). 相似文献

4.

On the quality of tree-based protein classification

Lazareva-Ulitsky B Diemer K Thomas PD 《Bioinformatics (Oxford, England)》2005,21(9):1876-1890

相似文献

5.

Evolutionarily distinctive species often capture more phylogenetic diversity than expected

Redding DW Hartmann K Mimoto A Bokal D Devos M Mooers AØ 《Journal of theoretical biology》2008,251(4):606-615

Evolutionary distinctiveness measures of how evolutionarily isolated a species is relative to other members of its clade. Recently, distinctiveness metrics that explicitly incorporate time have been proposed for conservation prioritization. However, we found that such measures differ qualitatively in how well they capture the total amount of evolution (termed phylogenetic diversity, or PD) represented by a set of species. We used simulation and simple graph theory to explore this relationship with reference to phylogenetic tree shape. Overall, the distinctiveness measures capture more PD on more unbalanced trees and on trees with many splits near the present. The rank order of performance was robust across tree shapes, with apportioning measures performing best and node-based measures performing worst. A sample of 50 ultrametric trees from the literature showed the same patterns. Taken together, this suggests that distinctiveness metrics may be a useful addition to other measures of value for conservation prioritization of species. The simplest measure, the age of a species, performed surprisingly well, suggesting that new measures that focus on tree shape near the tips may provide a transparent alternative to more complicated full-tree approaches. 相似文献

6.

Analyzing the Fitch Method for Reconstructing Ancestral States on Ultrametric Phylogenetic Trees

Louxin Zhang Jian Shen Jialiang Yang Guoliang Li 《Bulletin of mathematical biology》2010,72(7):1760-1782

The accuracy of the Fitch method for reconstructing ancestral states on ultrametric phylogenetic trees is studied. Two recurrence relations for computing the accuracy are given here. Using these relations, we analyze the convergence of the accuracy of the Fitch method for reconstructing the root state on a complete binary tree of 2ⁿ leaves as n goes to infinity, present a closed-form formula for the accuracy on ultrametric comb trees, and provide a lower bound on the accuracy on arbitrary ultrametric phylogenetic trees. 相似文献

7.

Multi-Scale Clustering by Building a Robust and Self Correcting Ultrametric Topology on Data Points

Hsieh Fushing Hui Wang Kimberly VanderWaal Brenda McCowan Patrice Koehl 《PloS one》2013,8(2)

The advent of high-throughput technologies and the concurrent advances in information sciences have led to an explosion in size and complexity of the data sets collected in biological sciences. The biggest challenge today is to assimilate this wealth of information into a conceptual framework that will help us decipher biological functions. A large and complex collection of data, usually called a data cloud, naturally embeds multi-scale characteristics and features, generically termed geometry. Understanding this geometry is the foundation for extracting knowledge from data. We have developed a new methodology, called data cloud geometry-tree (DCG-tree), to resolve this challenge. This new procedure has two main features that are keys to its success. Firstly, it derives from the empirical similarity measurements a hierarchy of clustering configurations that captures the geometric structure of the data. This hierarchy is then transformed into an ultrametric space, which is then represented via an ultrametric tree or a Parisi matrix. Secondly, it has a built-in mechanism for self-correcting clustering membership across different tree levels. We have compared the trees generated with this new algorithm to equivalent trees derived with the standard Hierarchical Clustering method on simulated as well as real data clouds from fMRI brain connectivity studies, cancer genomics, giraffe social networks, and Lewis Carroll''s Doublets network. In each of these cases, we have shown that the DCG trees are more robust and less sensitive to measurement errors, and that they provide a better quantification of the multi-scale geometric structures of the data. As such, DCG-tree is an effective tool for analyzing complex biological data sets. 相似文献

8.

A simple method for bracketing absolute divergence times on molecular phylogenies using multiple fossil calibration points

Marshall CR 《The American naturalist》2008,171(6):726-742

A central challenge facing the temporal calibration of molecular phylogenies is finding a quantitative method for estimating maximum age constraints on lineage divergence times. Here, I provide such a method. This method requires an ultrametric tree generated without reference to the fossil record. Exploiting the fact that the relative branch lengths on the ultrametric tree are proportional to time, this method identifies the lineage with the greatest proportion of its true temporal range covered by the fossil record. The oldest fossil of this calibration lineage is used as the minimum age constraint. The maximum age constraint is obtained by adding a confidence interval onto the end point of the calibration lineage, thus making it possible to bracket the true divergence times of all lineages on the tree. The approach can also identify fossils that have been grossly misdated or misassigned to the phylogeny. The method assumes that the relative branch lengths on the ultrametric tree are accurate and that fossilization is random. The effect of violations of these assumptions is assessed. This method is simple to use and is illustrated with a reanalysis of Near et al.'s turtle data. 相似文献

9.

Optimal algorithms for local vertex quartet cleaning

Della Vedova G Wareham HT 《Bioinformatics (Oxford, England)》2002,18(10):1297-1304

MOTIVATION: Reconstructing evolutionary trees is an important problem in biology. A response to the computational intractability of most of the traditional criteria for inferring evolutionary trees has been a focus on new criteria, particularly quartet-based methods that seek to merge trees derived on subsets of four species from a given species-set into a tree for that entire set. Unfortunately, most of these methods are very sensitive to errors in the reconstruction of the trees for individual quartets of species. A recently developed technique called quartet cleaning can alleviate this difficulty in certain cases by using redundant information in the complete set of quartet topologies for a given species-set to correct such errors. RESULTS: In this paper, we describe two new local vertex quartet cleaning algorithms which have optimal time complexity and error-correction bound, respectively. These are the first known local vertex quartet cleaning algorithms that are optimal with respect to either of these attributes. 相似文献

10.

The impact of gene-tree/species-tree discordance on diversification-rate estimation

Burbrink FT Pyron RA 《Evolution; international journal of organic evolution》2011,65(7):1851-1861

Molecular phylogenies are often used to test hypotheses about the tempo and mode of speciation and extinction. One commonly used statistic is Pybus and Harvey's γ, which measures the density of ordered internode distances on an ultrametric tree to infer earlier (negative γ) or later (positive γ) bursts of diversification. However, coalescent theory predicts that γ might be biased toward negative values (inferring early bursts of diversification) when using gene trees rather than species trees. Gene divergences predate species divergences, increasingly so at higher effective population sizes (N(e)), and proportionally more so toward the tips of the tree. Thus, gene trees will have a higher density of older nodes in many cases (particularly at higher N(e)), due to the disproportionate lengthening of terminal branches. This will yield an artifactual signature of early bursts of diversification when estimating γ from gene trees. We simulate gene trees within species trees under both Yule (pure-birth) and birth-death processes, and demonstrate support for these predictions. However, for most realistic estimates of θ in natural populations, gene trees provide relatively good estimates of γ, despite the disproportionate overestimation of younger node ages. This is corroborated with an empirical dataset of North American fence lizards (Sceloporus). 相似文献

11.

Species identification in the taxonomically neglected, highly diverse, neotropical parasitoid wasp genus Notiospathius (Braconidae: Doryctinae) based on an integrative molecular and morphological approach

Ceccarelli FS Sharkey MJ Zaldívar-Riverón A 《Molecular phylogenetics and evolution》2012,62(1):485-495

Various DNA sequence-based methods for species delineation have recently been developed to assess the species-richness of highly diverse, neglected invertebrate taxa. These methods, however, need to be tested under a variety of conditions, including the use of different markers and parameters. Here, we explored the species diversity of a species-rich group of braconid parasitoid wasps, the Neotropical genus Notiospathius, including 233 specimens from 10 different countries. We examined sequences of two mitochondrial (mt) (COI, cyt b) and one nuclear (wg) gene fragments. We analysed them separately as well as concatenating the mt data with the general mixed Yule-coalescent (GMYC) model for species delineation using different tree-building methods and parameters for reconstructing ultrametric trees. We evaluated the performance of GMYC analyses by comparing their species delineations with our morphospecies identifications. Reconstructing ultrametric trees with a relaxed lognormal clock rate using the program BEAST gave the most congruent results with morphology for the two mt markers. A tree obtained with wg using the programs MrBayes+Pathd8 had the fewest cases of incongruence with morphology, though the performance of this nuclear marker was considerably lower than that of COI and cyt b. Species delimitation using the coalescent prior to obtain ultrametric trees was morphologically more congruent with COI, whereas the Yule prior was more congruent with cyt b. The analyses concatenating the mt datasets failed to recover some species supported both by morphology and the separate analyses of the mt markers. The highest morphological congruence was obtained with the GMYC analysis on an ultrametric tree reconstructed with cyt b using the relaxed lognormal clock rate and the Yule prior, thus supporting the importance of using alternative markers when the information of the barcoding locus (COI) is not concordant with morphological evidence. Seventy-one species were delimited based on the congruence found among COI, cyt b and morphology. Both mt markers also revealed the existence of seven potential cryptic species. This high species richness from a scattered geographical sampling indicates that there is a remarkable number of Notiospathius species that remains undiscovered. 相似文献

12.

Maximizing phylogenetic diversity in biodiversity conservation: Greedy solutions to the Noah's Ark problem

Hartmann K Steel M 《Systematic biology》2006,55(4):644-651

The Noah's Ark Problem (NAP) is a comprehensive cost-effectiveness methodology for biodiversity conservation that was introduced by Weitzman (1998) and utilizes the phylogenetic tree containing the taxa of interest to assess biodiversity. Given a set of taxa, each of which has a particular survival probability that can be increased at some cost, the NAP seeks to allocate limited funds to conserving these taxa so that the future expected biodiversity is maximized. Finding optimal solutions using this framework is a computationally difficult problem to which a simple and efficient "greedy" algorithm has been proposed in the literature and applied to conservation problems. We show that, although algorithms of this type cannot produce optimal solutions for the general NAP, there are two restricted scenarios of the NAP for which a greedy algorithm is guaranteed to produce optimal solutions. The first scenario requires the taxa to have equal conservation cost; the second scenario requires an ultrametric tree. The NAP assumes a linear relationship between the funding allocated to conservation of a taxon and the increased survival probability of that taxon. This relationship is briefly investigated and one variation is suggested that can also be solved using a greedy algorithm. 相似文献

13.

Self-Weighted Optimization: Tree Searches and Character State Reconstructions under Implied Transformation Costs

Pablo A. Goloboff 《Cladistics : the international journal of the Willi Hennig Society》1997,13(3):225-245

A method to assess the cost of character state transformations based on their congruence is proposed. Measuring the distortion of different transformations with a convex increasing function of the number of transformations, and choosing those reconstructions which minimize the distortion for all transformations, may provide a better optimality criterion than the linear functions implemented in currently used methods for optimization. If trees are optimized using such a measure, transformation costs are dynamically determined during reconstructions; this leads to selecting trees implying that the possible state transformations are as reliable as possible. The present method is not iterative (thus avoiding the concern of different final results for different starting points), and it has an explicit optimality criterion. It has a high computational cost; algorithms to lessen the computations required for optimizations and searches are described. 相似文献

14.

A Low Dimensional Approach on Network Characterization

Benjamin Y. S. Li Choujun Zhan Lam F. Yeung King T. Ko Genke Yang 《PloS one》2014,9(10)

In many applications, one may need to characterize a given network among a large set of base networks, and these networks are large in size and diverse in structure over the search space. In addition, the characterization algorithms are required to have low volatility and with a small circle of uncertainty. For large datasets, these algorithms are computationally intensive and inefficient. However, under the context of network mining, a major concern of some applications is speed. Hence, we are motivated to develop a fast characterization algorithm, which can be used to quickly construct a graph space for analysis purpose. Our approach is to transform a network characterization measure, commonly formulated based on similarity matrices, into simple vector form signatures. We shall show that the similarity matrix can be represented by a dyadic product of two N-dimensional signature vectors; thus the network alignment process, which is usually solved as an assignment problem, can be reduced into a simple alignment problem based on separate signature vectors. 相似文献

15.

TrExML: a maximum-likelihood approach for extensive tree-space exploration

Wolf MJ Easteal S Kahn M McKay BD Jermiin LS 《Bioinformatics (Oxford, England)》2000,16(4):383-394

MOTIVATION: Maximum-likelihood analysis of nucleotide and amino acid sequences is a powerful approach for inferring phylogenetic relationships and for comparing evolutionary hypotheses. Because it is a computationally demanding and time-consuming process, most algorithms explore only a minute portion of tree-space, with the emphasis on finding the most likely tree while ignoring the less likely, but not significantly worse, trees. However, when such trees exist, it is equally important to identify them to give due consideration to the phylogenetic uncertainty. Consequently, it is necessary to change the focus of these algorithms such that near optimal trees are also identified. RESULTS: This paper presents the Advanced Stepwise Addition Algorithm for exploring tree-space and two algorithms for generating all binary trees on a set of sequences. The Advanced Stepwise Addition Algorithm has been implemented in TrExML, a phylogenetic program for maximum-likelihood analysis of nucleotide sequences. TrExML is shown to be more effective at finding near optimal trees than a similar program, fastDNAml, implying that TrExML offers a better approach to account for phylogenetic uncertainty than has previously been possible. A program, TreeGen, is also described; it generates binary trees on a set of sequences allowing for extensive exploration of tree-space using other programs. AVAILABILITY: TreeGen, TrExML, and the sequence data used to test the programs are available from the following two WWW sites: http://whitetail.bemidji.msus. edu/trexml/and http://jcsmr.anu.edu.au/dmm/humgen.+ ++html. 相似文献

16.

D-ORB: A Web Server to Extract Structural Features of Related But Unaligned RNA Sequences

《Journal of molecular biology》2023,435(15):168181

Identifying the common structural elements of functionally related RNA sequences (family) is usually based on an alignment of the sequences, which is often subject to human bias and may not be accurate. The resulting covariance model (CM) provides probabilities for each base to covary with another, which allows to support evolutionarily the formation of double helical regions and possibly pseudoknots. The coexistence of alternative folds in RNA, resulting from its dynamic nature, may lead to the potential omission of motifs by CM. To overcome this limitation, we present D-ORB, a system of algorithms that identifies overrepresented motifs in the secondary conformational landscapes of a family when compared to those of unrelated sequences. The algorithms are bundled into an easy-to-use website allowing users to submit a family, and optionally provide unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling it. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/. 相似文献

17.

Evolutionary models of phylogenetic trees

Pinelis I 《Proceedings. Biological sciences / The Royal Society》2003,270(1522):1425-1431

The most widely used evolutionary model for phylogenetic trees is the equal-rates Markov (ERM) model. A problem is that the ERM model predicts less imbalance than observed for trees inferred from real data; in fact, the observed imbalance tends to fall between the values predicted by the ERM model and those predicted by the proportional-to-distinguishable-arrangements (PDA) model. Here, a continuous multi-rate (MR) family of evolutionary models is presented which contains entire subfamilies corresponding to both the PDA and ERM models. Furthermore, this MR family covers an entire range from 'completely balanced' to 'completely unbalanced' models. In particular, the MR family contains other known evolutionary models. The MR family is very versatile and virtually free of assumptions on the character of evolution; yet it is highly susceptible to rigorous analyses. In particular, such analyses help to uncover adaptability, quasi-stabilization and prolonged stasis as major possible causes of the imbalance. However, the MR model is functionally simple and requires only three parameters to reproduce the observed imbalance. 相似文献

18.

Benchmarking protein classification algorithms via supervised cross-validation

Kertész-Farkas A Dhir S Sonego P Pacurar M Netoteia S Nijveen H Kuzniar A Leunissen JA Kocsor A Pongor S 《Journal of biochemical and biophysical methods》2008,70(6):1215-1223

Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.

The datasets are available at http://hydra.icgeb.trieste.it/benchmark. 相似文献

19.

Thermodynamic Characterization of a Triheme Cytochrome Family from Geobacter sulfurreducens Reveals Mechanistic and Functional Diversity

Leonor Morgado Miguel Pessanha Carlos A. Salgueiro 《Biophysical journal》2010,99(1):293-301

A family of five periplasmic triheme cytochromes (PpcA-E) was identified in Geobacter sulfurreducens, where they play a crucial role by driving electron transfer from the cytoplasm to the cell exterior and assisting the reduction of extracellular acceptors. The thermodynamic characterization of PpcA using NMR and visible spectroscopies was previously achieved under experimental conditions identical to those used for the triheme cytochrome c₇ from Desulfuromonas acetoxidans. Under such conditions, attempts to obtain NMR data were complicated by the relatively fast intermolecular electron exchange. This work reports the detailed thermodynamic characterization of PpcB, PpcD, and PpcE under optimal experimental conditions. The thermodynamic characterization of PpcA was redone under these new conditions to allow a proper comparison of the redox properties with those of other members of this family. The heme reduction potentials of the four proteins are negative, differ from each other, and cover different functional ranges. These reduction potentials are strongly modulated by heme-heme interactions and by interactions with protonated groups (the redox-Bohr effect) establishing different cooperative networks for each protein, which indicates that they are designed to perform different functions in the cell. PpcA and PpcD appear to be optimized to interact with specific redox partners involving e⁻/H⁺ transfer via different mechanisms. Although no evidence of preferential electron transfer pathway or e⁻/H⁺ coupling was found for PpcB and PpcE, the difference in their working potential ranges suggests that they may also have different physiological redox partners. This is the first study, to our knowledge, to characterize homologous cytochromes from the same microorganism and provide evidence of their different mechanistic and functional properties. These findings provide an explanation for the coexistence of five periplasmic triheme cytochromes in G. sulfurreducens. 相似文献

20.

Algorithms for efficient near-perfect phylogenetic tree reconstruction in theory and practice

Sridhar S Dhamdhere K Blelloch G Halperin E Ravi R Schwartz R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):561-571

We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. We develop two algorithms for constructing optimal near-perfect phylogenies and provide empirical evidence of their performance. The first simple algorithm is fixed parameter tractable when the number of additional mutations and the number of characters that share four gametes with some other character are constants. The second, more involved algorithm for the problem is fixed parameter tractable when only the number of additional mutations is fixed. We have implemented both algorithms and shown them to be extremely efficient in practice on biologically significant data sets. This work proves the BNPP problem fixed parameter tractable and provides the first practical phylogenetic tree reconstruction algorithms that find guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity. 相似文献