首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
ON MISSING ENTRIES IN CLADISTIC ANALYSIS   总被引:5,自引:0,他引:5  
Abstract The exact algorithms of two commonly used parsimony programs, Hennig86 by J. S. Farris and PAUP by D. Swofford, sometimes produce different solutions, and sometimes produce resolutions that are not supported by the data being analysed. The discrepancies apparently involve the treatment of missing entries, which can currently represent unknown data, inapplicable character and/or polymorphic taxa. Each of those potential sources of ambiguity is logically (if not computationally) different; with regard to binary characters, unknown data could be either 0 or 1, inapplicable characters are neither 0 nor 1 and polymorphisms are both 0 and 1. Resolutions that cannot be supported by any possible combination of known state attributions should either be flagged as such or suppressed entirely.  相似文献   

2.
Abstract— The effectiveness and efficiency of J. S. Farris' new microcomputer parsimony program (Hennig86, version 1.5) are evaluated with reference to 60 data sets, including those used to benchmark earlier mainframe and microcomputer packages. By overcoming the arbitrary resolution and consequent redundancy problems that have plagued previously available microcomputer programs, as well as their limitations on data set size, cladogram storage space, and execution speed, Hennig86 advances enormously the accuracy and ease with which cladistic analyses can be conducted. Hennig86 has such an impressive edge in both effectiveness and efficiency that earlier parsimony programs (including those by Farris) have essentially been rendered obsolete. For exact analyses, both exhaustive and minimal options arc provided; of the options available for approximate analyses, the branch breaker (bb) used in conjunction with the mhennig* and tread commands performed best.  相似文献   

3.
We analyse optimal and heuristic place prioritization algorithms for biodiversity conservation area network design which can use probabilistic data on the distribution of surrogates for biodiversity. We show how an Expected Surrogate Set Covering Problem (ESSCP) and a Maximal Expected Surrogate Covering Problem (MESCP) can be linearized for computationally efficient solution. For the ESSCP, we study the performance of two optimization software packages (XPRESS and CPLEX) and five heuristic algorithms based on traditional measures of complementarity and rarity as well as the Shannon and Simpson indices of α‐diversity which are being used in this context for the first time. On small artificial data sets the optimal place prioritization algorithms often produced more economical solutions than the heuristic algorithms, though not always ones guaranteed to be optimal. However, with large data sets, the optimal algorithms often required long computation times and produced no better results than heuristic ones. Thus there is generally little reason to prefer optimal to heuristic algorithms with probabilistic data sets.  相似文献   

4.
In this article, we address the problem of designing a string with optimal complementarity properties with respect to another given string according to a given criterion. The motivation comes from a drug design application, in which the complementarity between two sequences (proteins) is measured according to the values of the hydropathic coefficients associated with the sequence elements (amino acids). We present heuristic and exact optimization algorithms, and we report on some computational experiments on amino peptides taken from Semaphorin and human Interleukin-1β, which have already been investigated in the literature using heuristic algorithms. With our techniques, we proved the optimality of a known solution for Semaphorin-3A, and we discovered several other optimal and near-optimal solutions in a short computing time; we also found in fractions of a second an optimal solution for human interleukin-1β, whose complementary value is one order of magnitude better than previously known ones. The source code of a prototype C++ implementation of our algorithms is freely available for noncommercial use on the web. As a main result, we showed that in this context mathematical programming methods are more successful than heuristics, such as simulated annealing. Our algorithm unfolds its potential, especially when different measures could be used for scoring peptides, and is able to provide not only a single optimal solution, but a ranking of provable good ones; this ranking can then be used by biologists as a starting basis for further refinements, simulations, or in vitro experiments.  相似文献   

5.
MOTIVATION: Multiple sequence alignment is an important tool in computational biology. In order to solve the task of computing multiple alignments in affordable time, the most commonly used multiple alignment methods have to use heuristics. Nevertheless, the computation of optimal multiple alignments is important in its own right, and it provides a means of evaluating heuristic approaches or serves as a subprocedure of heuristic alignment methods. RESULTS: We present an algorithm that uses the divide-and-conquer alignment approach together with recent results on search space reduction to speed up the computation of multiple sequence alignments. The method is adaptive in that depending on the time one wants to spend on the alignment, a better, up to optimal alignment can be obtained. To speed up the computation in the optimal alignment step, we apply the alpha(*) algorithm which leads to a procedure provably more efficient than previous exact algorithms. We also describe our implementation of the algorithm and present results showing the effectiveness and limitations of the procedure.  相似文献   

6.
7.
The MUST package is a phylogenetically oriented set of programs for data management and display, allowing one to handle both raw data (sequences) and results (trees, number of steps, bootstrap proportions). It is complementary to the main available software for phylogenetic analysis (PHYLIP, PAUP, HENNING86, CLUSTAL) with which it is fully compatible. The first part of MUST consists of the acquisition of new sequences, their storage, modification, and checking of sequence integrity in files of aligned sequences. In order to improve alignment, an editor function for aligned sequences offers numerous options, such as selection of subsets of sequences, display of consensus sequences, and search for similarities over small sequence fragments. For phylogenetic reconstruction, the choice of species and portions of sequences to be analyzed is easy and very rapid, permitting fast testing of numerous combinations of sequences and taxa. The resulting files can be formatted for most programs of tree construction. An interactive tree-display program recovers the output of all these programs. Finally, various modules allow an in-depth analysis of results, such as comparison of distance matrices, variation of bootstrap proportions with respect to various parameters or comparison of the number of steps per position. All presently available complete sequences of 28S rRNA are furnished aligned in the package. MUST therefore allows the management of all the operations required for phylogenetic reconstructions.  相似文献   

8.
The berth allocation problem is an optimization problem concerning seaside operations at container terminals. This study investigates the dynamic and continuous berth allocation problem (BAP), whose objective is to minimize the total weighted service time and the deviation cost from vessels’ preferred position. The problem is formulated as a mixed integer programming model. Due to that the BAP is NP-hard, two efficient and effective simulated annealing (SA) algorithms are proposed to locate vessels along the quay. The first SA assigns vessels to available positions along the quay from the left to the right, while the second assigns vessels from both sides. Both small and large-scale instances in the literature are tested to evaluate the effectiveness of the proposed SA algorithms using the optimization software Gurobi and heuristic algorithms from the literature. The results indicate that the proposed SAs can provide optimal solutions in small-scale instances and updates the best solutions in large-scale instances. The improvement over other comparing heuristics is statistically significant.  相似文献   

9.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.   相似文献   

10.
The shortest common supersequence problem is a classical problem with many applications in different fields such as planning, Artificial Intelligence and especially in Bioinformatics. Due to its NP-hardness, we can not expect to efficiently solve this problem using conventional exact techniques. This paper presents a heuristic to tackle this problem based on the use at different levels of a probabilistic variant of a classical heuristic known as Beam Search. The proposed algorithm is empirically analysed and compared to current approaches in the literature. Experiments show that it provides better quality solutions in a reasonable time for medium and large instances of the problem. For very large instances, our heuristic also provides better solutions, but required execution times may increase considerably.  相似文献   

11.
We have developed a phylogenetic tree reconstruction method that detects and reports multiple topologically distant low-cost solutions. Our method is a generalization of the neighbor-joining method of Saitou and Nei and affords a more thorough sampling of the solution space by keeping track of multiple partial solutions during its execution. The scope of the solution space sampling is controlled by a pair of user-specified parameters--the total number of alternate solutions and the number of alternate solutions that are randomly selected--effecting a smooth trade-off between run time and solution quality and diversity. This method can discover topologically distinct low-cost solutions. In tests on biological and synthetic data sets using either the least-squares distance or minimum-evolution criterion, the method consistently performed as well as, or better than, both the neighbor-joining heuristic and the PHYLIP implementation of the Fitch-Margoliash distance measure. In addition, the method identified alternative tree topologies with costs within 1% or 2% of the best, but with topological distances of 9 or more partitions from the best solution (16 taxa); with 32 taxa, topologies were obtained 17 (least-squares) and 22 (minimum-evolution) partitions from the best topology when 200 partial solutions were retained. Thus, the method can find lower-cost tree topologies and near-best tree topologies that are significantly different from the best topology.  相似文献   

12.

Background

Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses.

Methodology

Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI to enable large-scale phylogenetic study of protein sequences, using a statistically robust number of bootstrapped datasets, to be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets.

Conclusions

Calculations that currently take a few days on a state of the art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to more moderately sized protein datasets.  相似文献   

13.
Supertree methods are used to assemble separate phylogenetic trees with shared taxa into larger trees (supertrees) in an effort to construct more comprehensive phylogenetic hypotheses. In spite of much recent interest in supertrees, there are still few methods for supertree construction. The flip supertree problem is an error correction approach that seeks to find a minimum number of changes (flips) to the matrix representation of the set of input trees to resolve their incompatibilities. A previous flip supertree algorithm was limited to finding exact solutions and was only feasible for small input trees. We developed a heuristic algorithm for the flip supertree problem suitable for much larger input trees. We used a series of 48- and 96-taxon simulations to compare supertrees constructed with the flip supertree heuristic algorithm with supertrees constructed using other approaches, including MinCut (MC), modified MC (MMC), and matrix representation with parsimony (MRP). Flip supertrees are generally far more accurate than supertrees constructed using MC or MMC algorithms and are at least as accurate as supertrees built with MRP. The flip supertree method is therefore a viable alternative to other supertree methods when the number of taxa is large.  相似文献   

14.
With breakpoint distance, the genome rearrangement field delivered one of the currently most popular measures in phylogenetic studies for related species. Here, BREAKPOINT MEDIAN, which is NP-complete already for three given species (whose genomes are represented as signed orderings), is the core basic problem. For the important special case of three species, approximation (ratio 7/6) and exact heuristic algorithms were developed. Here, we provide an exact, fixed-parameter algorithm with provable performance bounds. For instance, a breakpoint median for three signed orderings over nelements that causes at most d breakpoints can be computed in time O((2.15)(d).n). We show the algorithm's practical usefulness through experimental studies. In particular, we demonstrate that a simple implementation of our algorithm combined with a new tree construction heuristic allows for a new approach to breakpoint phylogeny, yielding evolutionary trees that are competitive in comparison with known results developed in a recent series of papers that use clever algorithm engineering methods.  相似文献   

15.
Path matching and graph matching in biological networks.   总被引:2,自引:0,他引:2  
We develop algorithms for the following path matching and graph matching problems: (i) given a query path p and a graph G, find a path p' that is most similar to p in G; (ii) given a query graph G (0) and a graph G, find a graph G (0)' that is most similar to G (0) in G. In these problems, p and G (0) represent a given substructure of interest to a biologist, and G represents a large network in which the biologist desires to find a related substructure. These algorithms allow the study of common substructures in biological networks in order to understand how these networks evolve both within and between organisms. We reduce the path matching problem to finding a longest weighted path in a directed acyclic graph and show that the problem of finding top k suboptimal paths can be solved in polynomial time. This is in contrast with most previous approaches that used exponential time algorithms to find simple paths which are practical only when the paths are short. We reduce the graph matching problem to finding highest scoring subgraphs in a graph and give an exact algorithm to solve the problem when the query graph G (0) is of moderate size. This eliminates the need for less accurate heuristic or randomized algorithms.We show that our algorithms are able to extract biologically meaningful pathways from protein interaction networks in the DIP database and metabolic networks in the KEGG database. Software programs implementing these techniques (PathMatch and GraphMatch) are available at http://faculty.cs.tamu.edu/shsze/pathmatch and http://faculty.cs.tamu.edu/shsze/graphmatch.  相似文献   

16.
Taxonomic schemes for the Heteroderinae Filip'ev & Schuurmans Stekhoven, 1941, sensu Luc et al., (1988) have been unstable due to the large number of genera and the paucity of known reliable characters. Reliable characters are essential when using phylogenetic inference in developing a natural classification. Morphological and developmental studies using light, scanning and transmission electron microscopy have revealed the new characters of host response, en face patterns, phasmid structure and female cuticular layers. These techniques also gave us insight into the homoplasy and polarity of many characters, revealed previously undetected character states and clarified misinterpreted character states. A matrix with the 19 most reliable characters is proposed for 20 operational taxonomic units (OTUs) and we employ this matrix for comparing computer generated phylogenetic analyses of the PHYLIP and PAUP packages. PAUP was deemed the more reliable parsimony algorithm for phylogenetic analysis of the Heteroderinae (Fink, 1986; Platnick, 1987). Monophyly of Atalodera + Sherodera + Thecavermiculatus (tribe Ataloderini), and Cactodera + Heterodera + Afenestrata, as well as Punctodera + Globodera + Dolichodera is supported by both programs. Most importantly, analyses strongly support monophyly of all cyst-forming genera (tribe Heteroderini) contrary to previous hypotheses of repeated evolution of the cyst (Wouts, 1985). In addition, monophyly of the Heteroderini with the Ataloderini is demonstrated. PAUP indicates monophyly of Sarisodera + Rhizonema + Bellodera + Hylonema and Ekphymatodera (tribe Sarisoderini new rank). Monophyly of the Sarisoderini was at first only weakly supported, but, subsequently, the reduced width of the submedial lips of second stage juveniles and males was recognized as a synapomorphy which strengthened subsequent PAUP trees and monophyly of the tribe. The present study rejects as paraphyletic or polyphyletic several previously proposed combinations, including Thecavermiculatus sequoiae (versus Rhizonema sequoiae), Sarisodera africana (versus Afenestrata africana), Dolichodera andinus (versus Thecavermiculatus andinus). The question whether T. andinus is a distinct genus, was not resolved due to insufficient data. PAUP supports our previous observations that Cactodera betulae is intermediate in a transformation series between other Cactodera and Heterodera: it also indicates these species as bring monophyletic with Heterodera + Afenestrata, but not with other Cactodera. Although these phylogenetic analyses strongly support some relationships, they indicate unresolved alternative hypotheses for others. Meloidodera (tribe Meloidoderini) and Cryphodera (tribe Cryphoderini) must be investigated for consideration of a possible synapomorphy not included in the present data matrix. Future studies are proposed to more clearly define the monophyly of the Heteroderini, as well as the Sarisoderini. Tests are also proposed to clarify questions of the monophyly of Verutus (tribe Verutini new rank) with the Heteroderinae versus other Tylenchida.  相似文献   

17.
Phylogeny of Chinese Allium (Liliaceae) using PCR-RFLP analysis   总被引:5,自引:0,他引:5  
Eighteen representative species were selected from all the nine sections of Chinese Allium on the basis of the classification of morphology and cytotaxonomy. The trnK and rpL16 gene fragments of chloroplast DNA were amplified from 18 species by PCR method. The two cpDNA fragments were digested by 26 restriction enzymes, and 303 polymorphic restriction sites were found, of which 163 were informative. The restriction site data were analyzed with PAUP (version 3.1.1) and MEGA (version 1.01) as well as PHYLIP. As a result, the genus Allium could be classified into six subgenera. The recognition of Sect. Anguinum in the Flora of China is reasonable, Sect. Rhizirideum, Sect. Haplostemon and Sect. Cepa are not monophyletic. The infrageneric system of this genus was also discussed.  相似文献   

18.
Single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNP's present in a chromosome (called the em haplotype) is of interest in a wide area of applications in molecular biology and biomedicine, including diagnostic and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in literature and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. When gaps are considered no exact method of polynomial complexity is known. The problem is also hard to approximate with guarantees. Therefore fast heuristics have been proposed. In this paper we describe SpeedHap, a new heuristic method that is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have high rate of reading errors (up to 20%) and low coverage (as low as 3). We test SpeedHap on real data from the HapMap Project.  相似文献   

19.
A Boolean network (BN) is a mathematical model of genetic networks. We propose several algorithms for control of singleton attractors in BN. We theoretically estimate the average-case time complexities of the proposed algorithms, and confirm them by computer experiments. The results suggest the importance of gene ordering. Especially, setting internal nodes ahead yields shorter computational time than setting external nodes ahead in various types of algorithms. We also present a heuristic algorithm which does not look for the optimal solution but for the solution whose computational time is shorter than that of the exact algorithms.  相似文献   

20.
Phylogenetic analysis of the SSU rRNA from members of the Chrysophyceae   总被引:1,自引:0,他引:1  
The nucleotide sequence for the nuclear-encoded small subunit ribosomal RNA gene (SSU rRNA) was determined for 24 species of the Chrysophyceae sensu stricto. These sequences were aligned, using primary and secondary structure, with nine previously published sequences for the Chrysophyceae, 14 for the Synurophyceae, and five for the Eustigmatophyceae (outgroup). Data analyses were the substitution rate calibration distance method using neighbor-joining (TREECON), Kimura 2-parameter neighbor-joining method (PAUP) and the maximum parsimony method (PAUP, PHYLIP). Trees from the analyses were largely congruent, but bootstrap support was weak at many nodes. The analyses recovered clades of uniflagellate and biflagellate organisms associated with current higher level taxonomy (e.g., subclass, order). The genus Ochromonas was polyphyletic, and O. tuberculata in particular was distantly related to the other Ochromonas species in the analysis. The family Paraphysomonadaceae occupied a basal position in three of four analyses. The class Synurophyceae appeared to be embedded within the Chrysophyceae, but bootstrap support was weak (< 50%) in all analyses except the PHYLIP parsimony analysis (= 81%). It was considered premature to place the Synurophyceae back into the Chrysophyceae based upon the analysis of one gene, especially given the ultrastructural and pigment differences between the two groups, but the relationship of these two groups deserves further study.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号