首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Accurate phylogenetic reconstruction methods are currently limited to a maximum of few dozens of taxa. Supertree methods construct a large tree over a large set of taxa, from a set of small trees over overlapping subsets of the complete taxa set. Hence, in order to construct the tree of life over a million and a half different species, the use of a supertree method over the product of accurate methods, is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods and theoretical as well as experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. We show empirically that conflicts in the inputs are handled satisfactorily and that it significantly outperforms and outraces the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm is based on a divide and conquer algorithm where our divide step uses a semidefinite programming (SDP) formulation of MaxCut. We remark that this builds on previous work of ours for piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated in that even with completely consistent set of quartet trees the problem is NP-hard, as opposed to the problem for triples where there is a linear time algorithm. This complexity leads to several issues and some solutions of possible independent interest.  相似文献   

2.
MOTIVATION: Reconstructing evolutionary trees is an important problem in biology. A response to the computational intractability of most of the traditional criteria for inferring evolutionary trees has been a focus on new criteria, particularly quartet-based methods that seek to merge trees derived on subsets of four species from a given species-set into a tree for that entire set. Unfortunately, most of these methods are very sensitive to errors in the reconstruction of the trees for individual quartets of species. A recently developed technique called quartet cleaning can alleviate this difficulty in certain cases by using redundant information in the complete set of quartet topologies for a given species-set to correct such errors. RESULTS: In this paper, we describe two new local vertex quartet cleaning algorithms which have optimal time complexity and error-correction bound, respectively. These are the first known local vertex quartet cleaning algorithms that are optimal with respect to either of these attributes.  相似文献   

3.
In this paper, we propose new solution methods for designing tag sets for use in universal DNA arrays. First, we give integer linear programming formulations for two previous formalizations of the tag set design problem. We show that these formulations can be solved to optimality for problem instances of moderate size by using general purpose optimization packages and also give more scalable algorithms based on an approximation scheme for packing linear programs. Second, we note the benefits of periodic tags and establish an interesting connection between the tag design problem and the problem of packing the maximum number of vertex-disjoint directed cycles in a given graph. We show that combining a simple greedy cycle packing algorithm with a previously proposed alphabetic tree search strategy yields an increase of over 40% in the number of tags compared to previous methods.  相似文献   

4.
Conformational changes in DNA G-quadruplex (GQ)-forming regions affect genome function and, thus, compose an interesting research topic. Computer modelling may yield insight into quadruplex folding and rearrangement, particularly molecular dynamics simulations. Here, we show that specific parameters, which are distinct from those commonly used in DNA conformational analyses, must be introduced for adequate interpretation and, most importantly, convenient visual representation of the quadruplex modelling results. We report a set of parameters that comprehensively and systematically describe GQ geometry in dynamics. The parameters include those related to quartet planarity, quadruplex twist, and quartet stacking; they are used to quantitatively characterise various types of quadruplexes and rearrangements, such as quartet distortion/disruption or deviation/bulging of a single nucleotide from the quartet plane. Our approach to describing conformational changes in quadruplexes using the new parameters is exemplified by telomeric quadruplex rearrangement, and the benefits of applying this approach to analyse other structures are discussed.  相似文献   

5.
Supertree methods are used to construct a large tree over a large set of taxa from a set of small trees over overlapping subsets of the complete taxa set. Since accurate reconstruction methods are currently limited to a maximum of a few dozen taxa, the use of a supertree method in order to construct the tree of life is inevitable. Supertree methods are broadly divided according to the input trees: When the input trees are unrooted, the basic reconstruction unit is a quartet tree. In this case, the basic decision problem of whether there exists a tree that agrees with all quartets is NP-complete. On the other hand, when the input trees are rooted, the basic reconstruction unit is a rooted triplet and the above decision problem has a polynomial time algorithm. However, when there is no tree which agrees with all triplets, it would be desirable to find the tree that agrees with the maximum number of triplets. However, this optimization problem was shown to be NP-hard. Current heuristic approaches perform min cut on a graph representing the triplets inconsistency and return a tree that is guaranteed to satisfy some required properties. In this work, we present a different heuristic approach that guarantees the properties provided by the current methods and give experimental evidence that it significantly outperforms currently used methods. This method is based on a divide and conquer approach, where the min cut in the divide step is replaced by a max cut in a variant of the same graph. The latter is achieved by a lightweight semidefinite programming-like heuristic that leads to very fast running times  相似文献   

6.
This paper presents a new clique partitioning (CP) model for the Group Technology (GT) problem. The new model, based on a novel 0/1 quadratic programming formulation, addresses multiple objectives in GT problems by drawing on production relationships to assign differing weights to machine/part pairs. The use of this model, which is readily solved by a basic tabu search heuristic, is illustrated by solving 36 standard test problems from the literature. The efficiency of our new CP model is further illustrated by solving three large scale problems whose linear programming relaxations are much too large to be solved by CPLEX. An analysis of the quality of the solutions produced along with comparisons made with other models and methods highlight both the attractiveness and robustness of the proposed method.  相似文献   

7.
Accurate phylogenetic reconstruction methods are inherently computationally heavy and therefore are limited to relatively small numbers of taxa. Supertree construction is the task of amalgamating small trees over partial sets into a big tree over the complete taxa set. The need for fast and accurate supertree methods has become crucial due to the enormous number of new genomic sequences generated by modern technology and the desire to use them for classification purposes. In particular, the Assembling the Tree of Life (ATOL) program aims at constructing the evolutionary history of all living organisms on Earth. When dealing with unrooted trees, a quartet - an unrooted tree over four taxa - is the most basic piece of phylogenetic information. Therefore, quartet amalgamation stands at the heart of any supertree problem as it concerns combining many minimal pieces of information into a single, coherent, and more comprehensive piece of information.We have devised an extremely fast algorithm for quartet amalgamation and implemented it in a very efficient code. The new code can handle over a hundred millions of quartet trees over several hundreds of taxa with very high accuracy.  相似文献   

8.
A compact representation of usual DNA/RNA four-nucleotide sets based on molecular affinity classes is proposed. In a geometrical correspondence to this formulation, it follows that intrinsic tetrahedral symmetry correlates nucleotide properties. This representation also leads to a proper decomposition frame for any sequence-dependent physical expectation. Thermodynamic and other physical properties of nucleotide sequences are most often stated within the scope of nearest-neighbor models and decomposed in terms of dimer properties. The inverse problem of obtaining dimer set properties is, however, well known to be ill-posed due to sequence composition closure relations. Analysis of the dimer set composition and structure within the novel tetrahedral formulation provides important self-consistency relations, solving the ill posed nature of the original formulation. As an applied example, we analyze DNA oligomer duplex free energy data available on the literature. It is shown that imposition of stringent self-consistency relations does not decrease fit quality to the experimental data set. On the other hand, an improved dimer set with physically consistent free energies is obtained. Meaningful corrections to previous determinations are found when the self-consistent set is applied to calculate free energies for sequences with composition order bias.  相似文献   

9.
Here we report that prioritizing sites in order of rarity-weighted richness (RWR) is a simple, reliable way to identify sites that represent all species in the fewest number of sites (minimum set problem) or to identify sites that represent the largest number of species within a given number of sites (maximum coverage problem). We compared the number of species represented in sites prioritized by RWR to numbers of species represented in sites prioritized by the Zonation software package for 11 datasets in which the size of individual planning units (sites) ranged from <1 ha to 2,500 km2. On average, RWR solutions were more efficient than Zonation solutions. Integer programming remains the only guaranteed way find an optimal solution, and heuristic algorithms remain superior for conservation prioritizations that consider compactness and multiple near-optimal solutions in addition to species representation. But because RWR can be implemented easily and quickly in R or a spreadsheet, it is an attractive alternative to integer programming or heuristic algorithms in some conservation prioritization contexts.  相似文献   

10.
Voit and Almeida have proposed the decoupling approach as a method for inferring the S-system models of genetic networks. The decoupling approach defines the inference of a genetic network as a problem requiring the solutions of sets of algebraic equations. The computation can be accomplished in a very short time, as the approach estimates S-system parameters without solving any of the differential equations. Yet the defined algebraic equations are non-linear, which sometimes prevents us from finding reasonable S-system parameters. In this study, we propose a new technique to overcome this drawback of the decoupling approach. This technique transforms the problem of solving each set of algebraic equations into a one-dimensional function optimization problem. The computation can still be accomplished in a relatively short time, as the problem is transformed by solving a linear programming problem. We confirm the effectiveness of the proposed approach through numerical experiments.  相似文献   

11.
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method that yet requires substantial computing power.  相似文献   

12.
The purpose of this article is to show how the isotropy subgroup of leaf permutations on binary trees can be used to systematically identify tree-informative invariants relevant to models of phylogenetic evolution. In the quartet case, we give an explicit construction of the full set of representations and describe their properties. We apply these results directly to Markov invariants, thereby extending previous theoretical results by systematically identifying linear combinations that vanish for a given quartet. We also note that the theory is fully generalizable to arbitrary trees and is equally applicable to the related case of phylogenetic invariants. All results follow from elementary consideration of the representation theory of finite groups.  相似文献   

13.
S Vajda  C Delisi 《Biopolymers》1990,29(14):1755-1772
A combinatorial optimization approach is used for solving the multiple-minima problem when determining the low-energy conformations of short polypeptides. Each residue is represented by a finite number of discrete states corresponding to single residue local minima of the energy function. These precomputed values constitute a search table and define the conformational space for discrete minimization by a generalized dynamic programming algorithm that significantly limits the number of intermediate conformations to be generated during the search. Since dynamic programming involves stagewise decisions, it results in buildup-type procedures implemented in two different forms. The first procedure predicts a number of conformations by a completely discrete search and these are subsequently refined by local minimization. The second involves limited continuous local minimization within the combinatorial algorithm, generally restricted to two dihedral angles in a buildup step. Both procedures are tested on 17 short peptides previously studied by other global minimization methods but involving the same potential energy function. The discrete method is extremely fast, but proves to be successful only in 14 of the 17 test problems. The version with limited local minimization finds, however, conformations in all the 17 examples that are close to the ones previously presented in the literature or have lower energies. In addition, results are almost independent of the cutoff energy, the most important parameter governing the search. Although the limited local minimization increases the number of energy evaluations, the method still offers substantial advantages in speed.  相似文献   

14.
Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to be able to continue to make effective use of the rapidly growing stores of variation data now being gathered. In this paper, we present two integer linear programming (ILP) formulations to find the most parsimonious phylogenetic tree from a set of binary variation data. One method uses a flow-based formulation that can produce exponential numbers of variables and constraints in the worst case. The method has, however, proven extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods, solving several large mtDNA and Y-chromosome instances within a few seconds and giving provably optimal results in times competitive with fast heuristics than cannot guarantee optimality. An alternative formulation establishes that the problem can be solved with a polynomial-sized ILP. We further present a web server developed based on the exponential-sized ILP that performs fast maximum parsimony inferences and serves as a front end to a database of precomputed phylogenies spanning the human genome.  相似文献   

15.
Investigators have had much success solving the "hemodynamic forward problem," i.e., predicting pressure and flow at the entrance of an arterial system given knowledge of specific arterial properties and arterial system topology. Recently, the focus has turned to solving the "hemodynamic inverse problem," i.e., inferring mechanical properties of an arterial system from measured input pressure and flow. Conventional methods to solve the inverse problem rely on fitting to data simple models with parameters that represent specific mechanical properties. Controversies have arisen, because different models ascribe pressure and flow to different properties. However, an inherent assumption common to all model-based methods is the existence of a unique set of mechanical properties that yield a particular pressure and flow. The present work illustrates that there are, in fact, an infinite number of solutions to the hemodynamic inverse problem. Thus a measured pressure-flow pair can result from an infinite number of different arterial systems. Except for a few critical properties, conventional approaches to solve the inverse problem for specific arterial properties are futile.  相似文献   

16.
DNA binding sites: representation and discovery   总被引:60,自引:0,他引:60  
The purpose of this article is to provide a brief history of the development and application of computer algorithms for the analysis and prediction of DNA binding sites. This problem can be conveniently divided into two subproblems. The first is, given a collection of known binding sites, develop a representation of those sites that can be used to search new sequences and reliably predict where additional binding sites occur. The second is, given a set of sequences known to contain binding sites for a common factor, but not knowing where the sites are, discover the location of the sites in each sequence and a representation for the specificity of the protein.  相似文献   

17.
For a given set L of species and a set T of triplets on L, we seek to construct a phylogenetic network which is consistent with T i.e. which represents all triplets of T. The level of a network is defined as the maximum number of hybrid vertices in its biconnected components. When T is dense, there exist polynomial time algorithms to construct level-0,1 and 2 networks (Aho et al., 1981; Jansson, Nguyen and Sung, 2006; Jansson and Sung, 2006; Iersel et al., 2009). For higher levels, partial answers were obtained in the paper by Iersel and Kelk (2008), with a polynomial time algorithm for simple networks. In this paper, we detail the first complete answer for the general case, solving a problem proposed in Jansson and Sung (2006) and Iersel et al. (2009). For any k fixed, it is possible to construct a level-k network having the minimum number of hybrid vertices and consistent with T, if there is any, in time O(T(k+1)n([4k/3]+1)).  相似文献   

18.
We study the reliability of phylogeny based on four taxa, when the internal, ancestral, branch is short. Such a quartet approach has been broadly used for inferring phylogenetic patterns. The question of branching pattern between the suborders Ruminantia and Suiformes (order Artiodactyla) and the order Cetacea is chosen as an example. All the combinations of four taxa were generated by taking on and only one species per group under study (three ingroups and one outgroup). Using real sequences, the analysis of these combinations demonstrates that the quartet approach is seriously misleading. Using both maximum parsimony and distance methods, it is possible to find a quartet of species which provided a high bootstrap proportion for each of the three possible unrooted trees. With the same set of sequences, we used all the available species simultaneously to construct a molecular phylogeny. This approach proved much more reliable than the quartet approach. When the number of informative sites is rather low, the branching patterns are not supported through bootstrap analysis, preventing us from false inference due to the lack of information. The reliable resolution of the phylogenetic relationships among Ruminantia, Suiformes, and Cetacea will therefore require a large number of nucleotides, such as the complete mitochondrial genomes of at least 30 species.  相似文献   

19.
The sugarcane transport system is very complex and uses a daily schedule, consisting of a set of locomotives runs, to satisfy the requirements of the mill and harvesters. The total cost of sugarcane transport operations is very high; over 35% of the total cost of sugarcane production in Australia is incurred in cane transport. Producing efficient schedules for sugarcane transport can reduce the cost and limit the negative effects that this system can have on the raw sugar production system. In this paper, the sugarcane rail operations are formulated as a blocking job shop scheduling problem. A mixed integer programming approach is used to formulate the shop job scheduling problem. Mixed integer programming and constraint programming search techniques are integrated for solving the problem. A case study is solved to test the approach.  相似文献   

20.
In a previous paper we defined the associative search problem and presented a system capable of solving it under certain conditions. In this paper we interpret a spatial learning problem as an associative search task and describe the behavior of an adaptive network capable of solving it. This example shows how naturally the associative search problem can arise and permits the search, association, and generalization properties of the adaptive network to bee clearly illustrated.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号