Similar Articles
20 similar articles found
1.
The Noah's Ark Problem (NAP) is a comprehensive cost-effectiveness methodology for biodiversity conservation, introduced by Weitzman (1998), that uses the phylogenetic tree containing the taxa of interest to assess biodiversity. Given a set of taxa, each with a survival probability that can be increased at some cost, the NAP seeks to allocate limited funds to conserving these taxa so that future expected biodiversity is maximized. Finding optimal solutions within this framework is computationally difficult, and a simple and efficient "greedy" algorithm has been proposed in the literature and applied to conservation problems. We show that, although algorithms of this type cannot produce optimal solutions for the general NAP, there are two restricted scenarios of the NAP for which a greedy algorithm is guaranteed to produce optimal solutions: the first requires the taxa to have equal conservation costs; the second requires an ultrametric tree. The NAP assumes a linear relationship between the funding allocated to the conservation of a taxon and the increase in that taxon's survival probability. This relationship is briefly investigated, and one variation is suggested that can also be solved using a greedy algorithm.
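To make the expected-diversity objective and the greedy step concrete, here is a minimal Python sketch. The child -> (parent, edge-weight) tree encoding, the toy tree, and all names are our own illustration, not Weitzman's notation; it uses a rooted-tree simplification in which an edge contributes its weight whenever at least one leaf below it survives, and a fixed per-grant survival boost standing in for the linear funding model.

```python
# Toy encoding (ours): each node maps to (parent, weight of the edge above it).
TREE = {
    "A": ("u", 1.0), "B": ("u", 2.0),
    "C": ("v", 3.0), "D": ("v", 1.5),
    "u": ("r", 0.5), "v": ("r", 0.5),
}
LEAVES = ["A", "B", "C", "D"]

def below(tree, node):
    """Leaves in the clade under `node`."""
    kids = [c for c, (p, _) in tree.items() if p == node]
    if not kids:
        return {node}
    return set().union(*(below(tree, k) for k in kids))

def expected_pd(tree, survival):
    """Expected future diversity: each edge contributes its weight times
    the probability that at least one leaf below it survives."""
    total = 0.0
    for child, (_, w) in tree.items():
        miss = 1.0
        for leaf in below(tree, child):
            miss *= 1.0 - survival[leaf]
        total += w * (1.0 - miss)
    return total

def greedy_nap_equal_cost(tree, leaves, survival, boost, k):
    """Equal-cost scenario: fund k taxa, each grant raising that taxon's
    survival probability by `boost`; greedily take the best marginal gain."""
    survival, funded = dict(survival), set()
    for _ in range(k):
        def gain(t):
            trial = dict(survival)
            trial[t] = min(1.0, trial[t] + boost)
            return expected_pd(tree, trial)
        best = max((t for t in leaves if t not in funded), key=gain)
        funded.add(best)
        survival[best] = min(1.0, survival[best] + boost)
    return funded
```

For example, greedy_nap_equal_cost(TREE, LEAVES, {"A": 0.2, "B": 0.5, "C": 0.9, "D": 0.4}, 0.3, 2) funds the two taxa whose boosted survival adds the most expected diversity.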

2.
Many of the steps in phylogenetic reconstruction can be confounded by “rogue” taxa—taxa that cannot be placed with assurance anywhere within the tree, indeed, whose location within the tree varies with almost any choice of algorithm or parameters. Phylogenetic consensus methods, in particular, are known to suffer from this problem. In this paper, we provide a novel framework to define and identify rogue taxa. In this framework, we formulate a bicriterion optimization problem, the relative information criterion, that models the net increase in useful information present in the consensus tree when certain taxa are removed from the input data. We also provide an effective greedy heuristic to identify a subset of rogue taxa and use this heuristic in a series of experiments, with both pathological examples from the literature and a collection of large biological data sets. As the presence of rogue taxa in a set of bootstrap replicates can lead to deceivingly poor support values, we propose a procedure to recompute support values in light of the rogue taxa identified by our algorithm; applying this procedure to our biological data sets caused a large number of edges to move from “unsupported” to “supported” status, indicating that many existing phylogenies should be recomputed and reevaluated to reduce any inaccuracies introduced by rogue taxa. We also discuss the implementation issues encountered while integrating our algorithm into RAxML v7.2.7, particularly those dealing with scaling up the analyses. This integration enables practitioners to benefit from our algorithm in the analysis of very large data sets (up to 2,500 taxa and 10,000 trees, although we present the results of even larger analyses).

3.
Phylogenetic diversity (PD) is a useful metric for selecting taxa in a range of biological applications, for example, bioconservation and genomics, where the selection is usually constrained by the limited availability of resources. We formalize taxon selection as a conceptually simple optimization problem, aiming to maximize PD subject to resource constraints. This allows us to take into account the different amounts of resources required by the different taxa. Although this is a computationally difficult problem, we present a dynamic programming algorithm that solves it in pseudo-polynomial time. Our algorithm can also solve many instances of the Noah's Ark Problem, a more realistic formulation of taxon selection for biodiversity conservation that allows for taxon-specific extinction risks. These instances extend the set of problems for which solutions are available beyond previously known greedy-tractable cases. Finally, we discuss the relevance of our results to real-life scenarios.
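A sketch of what such a pseudo-polynomial dynamic program can look like (our own simplified rooted-PD variant with integer costs, not the authors' exact recurrence): each subtree keeps a table indexed by budget, and sibling tables are merged knapsack-style, with a separate "at least one leaf chosen" table so an edge's weight is added only when a saved leaf lies below it.

```python
NEG = float("-inf")

def max_pd_under_budget(children, weight, cost, root, budget):
    """children: node -> list of children; weight: node -> weight of the
    edge above it; cost: leaf -> integer conservation cost.
    Returns the maximum PD obtainable with total cost <= budget."""
    def rec(v):
        kids = children.get(v, [])
        if not kids:  # leaf: feasible only once the budget covers its cost
            return [weight.get(v, 0.0) if b >= cost[v] else NEG
                    for b in range(budget + 1)]
        any_tab = [0.0] * (budget + 1)   # choosing nothing below v is allowed
        some = [NEG] * (budget + 1)      # at least one leaf chosen below v
        for child in kids:
            t = rec(child)                       # child with >=1 leaf chosen
            t_any = [max(0.0, x) for x in t]     # child may also choose none
            new_any = [max(any_tab[b1] + t_any[b - b1] for b1 in range(b + 1))
                       for b in range(budget + 1)]
            new_some = [max(max(some[b1] + t_any[b - b1] for b1 in range(b + 1)),
                            max(any_tab[b1] + t[b - b1] for b1 in range(b + 1)))
                        for b in range(budget + 1)]
            any_tab, some = new_any, new_some
        w = weight.get(v, 0.0)           # edge above v; use 0 at the root
        return [x + w for x in some]
    return max(0.0, max(rec(root)))
```

The running time is O(n * budget^2) for the merges, polynomial in the numeric value of the budget rather than its bit length, which is what "pseudo-polynomial" refers to.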

4.
Phylogenetic diversity and the greedy algorithm
Steel M. Systematic Biology, 2005, 54(4): 527-529.
Given a phylogenetic tree with leaves labeled by a collection of species, and with weighted edges, the "phylogenetic diversity" of any subset of the species is the sum of the edge weights of the minimal subtree connecting the species. This measure is relevant in biodiversity conservation where one may wish to compare different subsets of species according to how much evolutionary variation they encompass. In this note we show that phylogenetic diversity has an attractive mathematical property that ensures that we can solve the following problem easily by the greedy algorithm: find a subset of the species of any given size k of maximal phylogenetic diversity. We also describe an extension of this result that also allows weights to be assigned to species.
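A compact illustration of the greedy procedure, reusing the child -> (parent, edge-weight) dictionary encoding from the sketch under item 1 (the encoding and names are ours): PD of a subset is computed from root paths, and the greedy rule seeds with the most distant pair, then always adds the leaf giving the largest PD increase.

```python
from itertools import combinations

def pd(tree, taxa):
    """PD of `taxa`: total weight of the minimal connecting subtree.
    Edges shared by *all* root paths lie above the MRCA and are excluded."""
    def root_path(leaf):
        edges, node = set(), leaf
        while node in tree:
            edges.add(node)          # identify each edge by its lower node
            node = tree[node][0]
        return edges
    paths = [root_path(t) for t in taxa]
    used = set.union(*paths) - set.intersection(*paths)
    return sum(tree[e][1] for e in used)

def greedy_max_pd(tree, leaves, k):
    """Greedy subset selection for k >= 2; per the paper's result, this
    attains a size-k subset of maximal phylogenetic diversity."""
    chosen = set(max(combinations(leaves, 2), key=lambda p: pd(tree, p)))
    while len(chosen) < k:
        chosen.add(max((l for l in leaves if l not in chosen),
                       key=lambda l: pd(tree, chosen | {l})))
    return chosen
```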

5.
In this paper, we propose new solution methods for designing tag sets for use in universal DNA arrays. First, we give integer linear programming formulations for two previous formalizations of the tag set design problem. We show that these formulations can be solved to optimality for problem instances of moderate size by using general purpose optimization packages and also give more scalable algorithms based on an approximation scheme for packing linear programs. Second, we note the benefits of periodic tags and establish an interesting connection between the tag design problem and the problem of packing the maximum number of vertex-disjoint directed cycles in a given graph. We show that combining a simple greedy cycle packing algorithm with a previously proposed alphabetic tree search strategy yields an increase of over 40% in the number of tags compared to previous methods.
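The greedy cycle-packing step can be sketched as follows (a generic DFS-based heuristic under our own adjacency-dict encoding; the alphabetic tree search the paper pairs it with is not shown):

```python
def find_cycle(graph):
    """Return the vertex list of some directed cycle in `graph`
    (node -> list of successors), or None if the graph is acyclic."""
    visited, pos, path = set(), {}, []
    def dfs(u):
        visited.add(u)
        pos[u] = len(path)
        path.append(u)
        for v in graph.get(u, ()):
            if v in pos:                    # back edge: cycle closes at v
                return path[pos[v]:]
            if v not in visited and v in graph:
                cyc = dfs(v)
                if cyc:
                    return cyc
        path.pop()
        del pos[u]
        return None
    for s in list(graph):
        if s not in visited:
            cyc = dfs(s)
            if cyc:
                return cyc
    return None

def greedy_cycle_packing(graph):
    """Pack vertex-disjoint directed cycles greedily: take any cycle found
    by DFS, delete its vertices, and repeat until the graph is acyclic."""
    graph = {u: list(vs) for u, vs in graph.items()}
    packed = []
    while (cyc := find_cycle(graph)) is not None:
        packed.append(cyc)
        for u in cyc:
            del graph[u]
        for u in graph:                     # drop edges into deleted vertices
            graph[u] = [v for v in graph[u] if v in graph]
    return packed
```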

6.
7.
Supertree methods are used to construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Since accurate reconstruction methods are currently limited to a maximum of a few dozen taxa, using a supertree method to construct the tree of life is inevitable. Supertree methods are broadly divided according to their input trees: when the input trees are unrooted, the basic reconstruction unit is a quartet tree, and the basic decision problem of whether there exists a tree that agrees with all quartets is NP-complete. When the input trees are rooted, the basic reconstruction unit is a rooted triplet, and the corresponding decision problem has a polynomial-time algorithm. However, when no tree agrees with all triplets, it is desirable to find the tree that agrees with the maximum number of triplets; this optimization problem is NP-hard. Current heuristic approaches perform a min cut on a graph representing the triplet inconsistencies and return a tree that is guaranteed to satisfy certain required properties. In this work, we present a different heuristic approach that guarantees the properties provided by current methods, and we give experimental evidence that it significantly outperforms them. The method is based on a divide-and-conquer approach in which the min cut in the divide step is replaced by a max cut in a variant of the same graph. The latter is achieved by a lightweight semidefinite-programming-like heuristic that leads to very fast running times.
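As a toy stand-in for the semidefinite-programming-like heuristic, here is a simple local-search max cut on a weighted graph (our own simplification: in the paper's setting the vertices would be taxa and the edge weights would encode triplet support and conflict):

```python
import random

def local_maxcut(nodes, weight, seed=0):
    """1-flip local search for max cut.  `weight` maps a sorted node pair
    (min, max) to an edge weight; missing pairs count as 0."""
    rng = random.Random(seed)
    side = {v: rng.random() < 0.5 for v in nodes}
    def gain(v):
        # Flipping v turns its same-side edges into cut edges and vice versa.
        g = 0.0
        for u in nodes:
            if u != v:
                w = weight.get((min(u, v), max(u, v)), 0.0)
                g += w if side[u] == side[v] else -w
        return g
    improved = True
    while improved:
        improved = False
        for v in nodes:
            if gain(v) > 0:
                side[v] = not side[v]
                improved = True
    return ([v for v in nodes if side[v]], [v for v in nodes if not side[v]])
```

The returned bipartition is only a local optimum, which is the usual trade-off when replacing an SDP relaxation with a lightweight heuristic.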

8.
The likely future extinction of various species will result in a decline of two quantities: species richness and phylogenetic diversity (PD, or ‘evolutionary history’). Under a simple stochastic model of extinction, we can estimate the expected loss of these quantities under two conservation strategies: an ‘egalitarian’ approach, which reduces the extinction risk of all species, and a ‘targeted’ approach that concentrates conservation effort on the most endangered taxa. For two such strategies that are constrained to experience the same expected loss of species richness, we ask which strategy results in a greater expected loss of PD. Using mathematical analysis and simulation, we describe how the strategy (egalitarian versus targeted) that minimizes the expected loss of PD depends on the distribution of endangered status across the tips of the tree, and the interaction of this status with the branch lengths. For a particular data set consisting of a phylogenetic tree of 62 lemur species, with extinction risks estimated from the IUCN ‘Red List’, we show that both strategies are virtually equivalent, though randomizing these extinction risks across the tip taxa can cause either strategy to outperform the other. In the second part of the paper, we describe an algorithm to determine how extreme the loss of PD for a given decline in species richness can be. We illustrate the use of this algorithm on the lemur tree.

9.
Accurate phylogenetic reconstruction methods are currently limited to a maximum of a few dozen taxa. Supertree methods construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Hence, to construct the tree of life over a million and a half different species, applying a supertree method to the output of accurate methods is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods, and both theoretical and experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. We show empirically that conflicts in the inputs are handled satisfactorily and that it significantly outperforms, and runs faster than, the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm is based on a divide-and-conquer strategy whose divide step uses a semidefinite programming (SDP) formulation of MaxCut. This builds on our previous work on piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated: even with a completely consistent set of quartet trees the problem is NP-hard, as opposed to the problem for triplets, for which there is a linear-time algorithm. This complexity leads to several issues and some solutions of possible independent interest.

10.
A rapid heuristic algorithm for finding minimum evolution trees
The minimum sum of branch lengths (S), or the minimum evolution (ME) principle, has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Therefore, simplified, heuristic methods, such as the neighbor-joining (NJ) method, are usually employed instead. The NJ method analyzes only a small number of trees (compared with the size of the entire search space), so the tree obtained may not be the ME tree (for which the S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) utilizes what is known in computer science as the "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts its search exhaustiveness (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performances and computational efficiencies of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) methods and slightly better than the NJ method. For searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, thus making it relatively practical for obtaining all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa is involved.
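For reference, the "greedy search" that NJ performs is the classic Q-criterion join, sketched below from the standard textbook formulas (a toy implementation of plain NJ, not the paper's new reliability-adjusted algorithm):

```python
from itertools import combinations, permutations

def neighbor_joining(taxa, d):
    """Textbook NJ: repeatedly join the pair minimizing the Q-criterion
    Q(f,g) = (n-2)*d(f,g) - r(f) - r(g).  `d` maps ordered name pairs to
    distances; returns a list of (node, node, branch_length) edges."""
    nodes = list(taxa)
    D = {(a, b): d[(a, b)] for a, b in permutations(nodes, 2)}
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[(a, b)] for b in nodes if b != a) for a in nodes}
        f, g = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p] - r[p[0]] - r[p[1]])
        new = f + "+" + g
        lf = 0.5 * D[(f, g)] + (r[f] - r[g]) / (2 * (n - 2))
        edges += [(f, new, lf), (g, new, D[(f, g)] - lf)]
        for x in nodes:
            if x not in (f, g):            # reduce distances to the new node
                D[(new, x)] = D[(x, new)] = 0.5 * (D[(f, x)] + D[(g, x)]
                                                   - D[(f, g)])
        nodes = [x for x in nodes if x not in (f, g)] + [new]
    edges.append((nodes[0], nodes[1], D[(nodes[0], nodes[1])]))
    return edges
```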

11.
In the last 15 years, phylogenetic diversity (PD) has gained interest among conservation biologists as a surrogate measure for assessing biodiversity. We recently proposed two approaches to selecting taxa for maximizing PD, namely PD with budget constraints and PD on split systems. In this paper, we unify these two strategies and present a dynamic programming algorithm to solve the unified framework of selecting taxa with maximal PD under budget constraints on circular split systems. We also give an improved algorithm for the case where the underlying split system is a tree.

12.
Liu X, Liu J. Cluster Computing, 2022, 25(2): 1095-1109.

We address the problem of online virtual machine (VM) provisioning and allocation with multiple types of resources. Formulating this problem in an auction-based setting, we propose an accurate mathematical model incorporating the ability to preempt and resume a given task for the sake of the best overall use of resources. Our objective is to efficiently provision and allocate multiple VMs to maximize social welfare and encourage users to declare truthful requests. We first design an offline optimal mechanism based on the VCG mechanism; this mechanism has full knowledge of all users and offers ideal solutions. We also design an online greedy mechanism that considers only current knowledge while offering near-optimal solutions instead. Our proposed greedy mechanism consists of winner determination and payment algorithms. Furthermore, we show that the winner determination algorithm is monotonic and that the payment algorithm implements the critical payment. Both allocation methods give users an incentive to declare true values, since truthful reporting yields their best utility. We performed extensive experiments to investigate the performance of our proposed greedy mechanism compared to the optimal mechanism. Experimental results demonstrate that our proposed greedy mechanism obtains near-optimal solutions in a reasonable time.
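A sketch of what such a greedy winner-determination rule can look like (our own toy version; the paper's mechanism also includes a payment algorithm, which is only hinted at in the comment below):

```python
def winner_determination(bids, capacity):
    """Allocate VM bundles greedily by declared value per unit of demand.
    bids: list of (user, value, demand_vector); capacity: resource vector.
    Ranking by a density that is increasing in `value` keeps the rule
    monotone, which is what makes a critical payment (the smallest
    declaration that would still win) well defined for each winner."""
    order = sorted(bids, key=lambda b: b[1] / sum(b[2]), reverse=True)
    winners, used = [], [0] * len(capacity)
    for user, value, demand in order:
        if all(u + q <= c for u, q, c in zip(used, demand, capacity)):
            winners.append(user)
            used = [u + q for u, q in zip(used, demand)]
    return winners
```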


13.
Epitope-based vaccines (EVs) have a wide range of applications: from therapeutic to prophylactic approaches, from infectious diseases to cancer. The development of an EV is based on the knowledge of target-specific antigens from which immunogenic peptides, so-called epitopes, are derived. Such epitopes form the key components of the EV. Due to regulatory, economic, and practical concerns the number of epitopes that can be included in an EV is limited. Furthermore, as the major histocompatibility complex (MHC) binding these epitopes is highly polymorphic, every patient possesses a set of MHC class I and class II molecules of differing specificities. A peptide combination effective for one person can thus be completely ineffective for another. This renders the optimal selection of these epitopes an important and interesting optimization problem. In this work we present a mathematical framework based on integer linear programming (ILP) that allows the formulation of various flavors of the vaccine design problem and the efficient identification of optimal sets of epitopes. Out of a user-defined set of predicted or experimentally determined epitopes, the framework selects the set with the maximum likelihood of eliciting a broad and potent immune response. Our ILP approach allows an elegant and flexible formulation of numerous variants of the EV design problem. In order to demonstrate this, we show how common immunological requirements for a good EV (e.g., coverage of epitopes from each antigen, coverage of all MHC alleles in a set, or avoidance of epitopes with high mutation rates) can be translated into constraints or modifications of the objective function within the ILP framework. An implementation of the algorithm outperforms a simple greedy strategy as well as a previously suggested evolutionary algorithm and has runtimes on the order of seconds for typical problem sizes.
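To illustrate the flavor of such an ILP, here is a minimal sketch using the PuLP modeling library. The data layout, variable names, and the particular constraints (a size cap plus antigen and allele coverage) are our own simplified rendering of the requirements listed above, not the paper's exact model, and the instance is assumed feasible.

```python
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

def select_epitopes(epitopes, k, antigens, alleles):
    """epitopes: dict name -> (immunogenicity, antigen, set_of_bound_alleles).
    Pick at most k epitopes maximizing total immunogenicity while covering
    every antigen and every MHC allele at least once."""
    prob = LpProblem("epitope_selection", LpMaximize)
    x = {e: LpVariable(e, cat=LpBinary) for e in epitopes}
    prob += lpSum(epitopes[e][0] * x[e] for e in epitopes)   # objective
    prob += lpSum(x.values()) <= k                           # size cap
    for a in antigens:   # at least one epitope from each antigen
        prob += lpSum(x[e] for e in epitopes if epitopes[e][1] == a) >= 1
    for m in alleles:    # at least one epitope binding each allele
        prob += lpSum(x[e] for e in epitopes if m in epitopes[e][2]) >= 1
    prob.solve()         # PuLP's bundled CBC solver by default
    return [e for e in epitopes if x[e].value() == 1]
```

Further requirements, such as penalizing epitopes with high mutation rates, would enter as additional terms in the objective or as extra constraints in the same pattern.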

14.
In conservation biology it is a central problem to measure, predict, and preserve biodiversity as species face extinction. In 1992 Faith proposed measuring the diversity of a collection of species in terms of their relationships on a phylogenetic tree, and to use this information to identify collections of species with high diversity. Here we are interested in some variants of the resulting optimization problem that arise when considering species whose evolution is better represented by a network rather than a tree. More specifically, we consider the problem of computing phylogenetic diversity relative to a split system on a collection of species of size n. We show that for general split systems this problem is NP-hard. In addition we provide some efficient algorithms for some special classes of split systems, in particular presenting an optimal O(n) time algorithm for phylogenetic trees and an O(n log n + nk) time algorithm for choosing an optimal subset of size k relative to a circular split system.

15.
Zaritsky A, Sipper M. BioSystems, 2004, 76(1-3): 209-216.
The shortest common superstring (SCS) problem, known to be NP-complete, seeks the shortest string that contains all strings from a given set. In this paper we compare four approaches to the SCS problem: a standard genetic algorithm, a novel cooperative-coevolutionary algorithm, a benchmark greedy algorithm, and a parallel coevolutionary-greedy approach. We show that the coevolutionary approach produces the best results and discuss directions for future research.
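The benchmark greedy for SCS is simple enough to sketch in a few lines (a generic textbook version of ours, assuming no string is a proper substring of another after the initial filter):

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_scs(strings):
    """Repeatedly merge the pair with the largest overlap until one
    superstring remains."""
    s = [x for x in strings if not any(x != y and x in y for y in strings)]
    while len(s) > 1:
        a, b = max(((a, b) for a in s for b in s if a is not b),
                   key=lambda p: overlap(*p))
        s.remove(a)
        s.remove(b)
        s.append(a + b[overlap(a, b):])    # fuse b onto a's overlapping tail
    return s[0]
```

For instance, greedy_scs(["CATG", "ATGC", "GCAT"]) merges down to a single string containing all three four-mers.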

16.
A greedy algorithm for aligning DNA sequences
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

17.
Background: Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Because of the limitations of approaches that analyze single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants, both individually and in conjunction with one another. Constructing models that account for such subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms; however, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. Results: We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection at the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational shortcuts based on matrix calculus. Since memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space-efficient variation of greedy RLS that trades running time for memory. These approaches are compared to traditional wrapper-based feature selection implementations based on support vector machines (SVMs) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS. Conclusions: Greedy RLS is the first known implementation of a machine-learning-based method capable of conducting wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.
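Generic wrapper-based forward selection looks like the sketch below (our own schematic; `score` is a hypothetical user-supplied cross-validation scorer, and the matrix-calculus shortcuts that let greedy RLS scale to 400,000 variants are deliberately not shown):

```python
def greedy_forward_selection(X, y, n_select, score):
    """X: samples-by-features array (e.g. a NumPy matrix), y: labels.
    Each round adds the feature whose inclusion maximizes `score`, e.g.
    cross-validated accuracy of a regularized least-squares model."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Evaluating `score` from scratch for every candidate is what makes naive wrappers infeasible at GWAS scale; the abstract's point is that greedy RLS obtains the same selections while updating the fitted model incrementally.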

18.
The evolutionary history of certain species, such as polyploids, is modeled by a generalization of phylogenetic trees called multi-labeled phylogenetic trees, or MUL trees for short. One problem related to inferring a MUL tree is constructing the smallest possible MUL tree consistent with a given set of rooted triplets, the SMRT problem for short. This problem is NP-hard; an exact algorithm exists, but it does not scale to large numbers of taxa. In this paper, we show that SMRT does not seem to be an appropriate formulation from the biological point of view. We present a heuristic algorithm named MTRT for this problem and execute it on some real and simulated datasets. The results of MTRT show that triplets alone cannot provide enough information to infer the true MUL tree. It is therefore inappropriate to infer a MUL tree from triplet information alone while minimizing the number of duplications. Finally, we introduce some new problems that are more suitable from the biological point of view.

19.
Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
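The discrete clustering step can be sketched as a greedy clique partition on the compatibility graph (our own minimal version, not the paper's algorithm: fingerprints are tuples over {0, 1, None}, with None marking an unresolved value that matches either signal):

```python
def compatible(f, g):
    """Two binary fingerprints match if they agree wherever neither
    has a missing value (None)."""
    return all(a == b for a, b in zip(f, g)
               if a is not None and b is not None)

def greedy_clique_partition(fingerprints):
    """Repeatedly seed a cluster and absorb every fingerprint compatible
    with all of its current members, i.e. greedily carve out cliques of
    the compatibility graph."""
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        cluster = [unassigned.pop(0)]
        for i in unassigned[:]:
            if all(compatible(fingerprints[i], fingerprints[j])
                   for j in cluster):
                cluster.append(i)
                unassigned.remove(i)
        clusters.append(cluster)
    return clusters
```

Placing a fingerprint in a cluster implicitly resolves its missing values to the consensus of the clique, which is the simultaneous clustering-and-imputation idea the abstract describes.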

20.
The use of ant colony optimization for solving stochastic optimization problems has received a significant amount of attention in recent years. In this paper, we present a study of enhanced ant colony optimization algorithms for tackling a stochastic optimization problem, the probabilistic traveling salesman problem. In particular, we propose an empirical estimation approach to evaluate the cost of the solutions constructed by the ants. Moreover, we use a recent estimation-based iterative improvement algorithm as a local search. Experimental results on a large number of problem instances show that the proposed ant colony optimization algorithms outperform the current best algorithm tailored to solve the given problem, which also happened to be an ant colony optimization algorithm. As a consequence, we have obtained a new state-of-the-art ant colony optimization algorithm for the probabilistic traveling salesman problem.
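The empirical estimation of a solution's cost is easy to sketch (our own minimal version; names and the sample count are illustrative): sample which customers actually appear, then measure the a-priori tour with absent customers skipped.

```python
import random

def estimate_ptsp_cost(tour, prob, dist, samples=1000, seed=0):
    """Estimate the expected length of an a-priori PTSP tour.  Each
    customer c appears independently with probability prob[c]; `dist`
    must hold both ordered pairs of every customer pair."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        present = [c for c in tour if rng.random() < prob[c]]
        if len(present) > 1:   # traverse the cyclic tour, skipping absentees
            total += sum(dist[(present[i], present[(i + 1) % len(present)])]
                         for i in range(len(present)))
    return total / samples
```

Estimating the cost from samples avoids the closed-form expectation, which is what makes the evaluation cheap enough to run inside each ant's construction step.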
