首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
A stepwise algorithm for finding minimum evolution trees   总被引:1,自引:6,他引:1  
A stepwise algorithm for reconstructing minimum evolution (ME) trees from evolutionary distance data is proposed. In each step, a taxon that potentially has a neighbor (another taxon connected to it with a single interior node) is first chosen and then its true neighbor searched iteratively. For m taxa, at most (m-1)!/2 trees are examined and the tree with the minimum sum of branch lengths (S) is chosen as the final tree. This algorithm provides simple strategies for restricting the tree space searched and allows us to implement efficient ways of dynamically computing the ordinary least squares estimates of S for the topologies examined. Using computer simulation, we found that the efficiency of the ME method in recovering the correct tree is similar to that of the neighbor-joining method (Saitou and Nei 1987). A more exhaustive search is unlikely to improve the efficiency of the ME method in finding the correct tree because the correct tree is almost always included in the tree space searched with this stepwise algorithm. The new algorithm finds trees for which S values may not be significantly different from that of the ME tree if the correct tree contains very small interior branches or if the pairwise distance estimates have large sampling errors. These topologies form a set of plausible alternatives to the ME tree and can be compared with each other using statistical tests based on the minimum evolution principle. The new algorithm makes it possible to use the ME method for large data sets.   相似文献   

2.
S Wang  X Li  J Fang 《BMC bioinformatics》2012,13(1):178-26
ABSTRACT: BACKGROUND: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. RESULTS: This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. CONCLUSIONS: It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network.  相似文献   

3.
4.
MOTIVATION: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. RESULTS: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.  相似文献   

5.
Cluster Computing - Minimum latency problem (MLP) is an NP-Hard combinatorial optimization problem. Metaheuristics use perturbation and randomization to arrive at a satisfactory solution under time...  相似文献   

6.

Background  

Distance matrix methods constitute a major family of phylogenetic estimation methods, and the minimum evolution (ME) principle (aiming at recovering the phylogeny with shortest length) is one of the most commonly used optimality criteria for estimating phylogenetic trees. The major difficulty for its application is that the number of possible phylogenies grows exponentially with the number of taxa analyzed and the minimum evolution principle is known to belong to the -hard class of problems.  相似文献   

7.
The paper considers the loading problem in flexible manufacturing systems (FMSs). This problem involves the assignment to the machine tools of all operations and associated cutting tools required for part types that have been selected to be produced simultaneously. The loading problem is first formulated as a linear mixed 0–1 program with the objective to minimize the greatest workload assigned to each machine. A heuristic procedure is presented in which an assignment of operations to machine tools is obtained by solving a parameterized generalized assignment problem with an objective function that approximates the use of tool slots required by the operations assigned to the machines. The algorithm is coded in FORTRAN and tested on an IBM-compatible personal computer. Computational results are presented for different test problems to demonstrate the efficiency and effectiveness of the suggested procedure.  相似文献   

8.
This paper addresses the problem of finding attractors in synchronous Boolean networks. The existing Boolean decision diagram-based algorithms have limited capacity due to the excessive memory requirements of decision diagrams. The simulation-based algorithms can be applied to larger networks, however, they are incomplete. We present an algorithm, which uses a SAT-based bounded model checking to find all attractors in a Boolean network. The efficiency of the presented algorithm is evaluated by analyzing seven networks models of real biological processes, as well as 150,000 randomly generated Boolean networks of sizes between 100 and 7,000. The results show that our approach has a potential to handle an order of magnitude larger models than currently possible.  相似文献   

9.
This paper describes a generic algorithm for finding restrictionsites within DNA sequences. The ‘genericity’ ofthe algorithm is made possible through the use of set theory.Basic elements of DNA sequences, i.e. nucleotides (bases), arerepresented in sets, and DNA sequences, whether specific, ambiguousor even protein-coding, are represented as sequences of thosesets. The set intersection operation demonstrates its abilityto perform pattern-matching correctly on various DNA sequences.The performance analysis showed that the degree of complexityof the pattern matching is reduced from exponential to linear.An example is given to show the actual and potential restrictionsites, derived by the generic algorithm, in the DNA sequencetemplate coding for a synthetic calmodulin. Received on October 2, 1990; accepted on December 18, 1990  相似文献   

10.
在生物信息学研究中,生物序列比对问题占有重要的地位。多序列比对问题是一个NPC问题,由于时间和空间的限制不能够求出精确解。文中简要介绍了Feng和Doolittle提出的多序列比对算法的基本思想,并改进了该算法使之具有更好的比对精度。实验结果表明,新算法对解决一般的progressive多序列比对方法中遇到的局部最优问题有较好的效果。  相似文献   

11.
12.
Accurate phylogenetic reconstruction methods are inherently computationally heavy and therefore are limited to relatively small numbers of taxa. Supertree construction is the task of amalgamating small trees over partial sets into a big tree over the complete taxa set. The need for fast and accurate supertree methods has become crucial due to the enormous number of new genomic sequences generated by modern technology and the desire to use them for classification purposes. In particular, the Assembling the Tree of Life (ATOL) program aims at constructing the evolutionary history of all living organisms on Earth. When dealing with unrooted trees, a quartet - an unrooted tree over four taxa - is the most basic piece of phylogenetic information. Therefore, quartet amalgamation stands at the heart of any supertree problem as it concerns combining many minimal pieces of information into a single, coherent, and more comprehensive piece of information.We have devised an extremely fast algorithm for quartet amalgamation and implemented it in a very efficient code. The new code can handle over a hundred millions of quartet trees over several hundreds of taxa with very high accuracy.  相似文献   

13.
一种有效的重复序列识别算法   总被引:1,自引:0,他引:1  
李冬冬  王正志  倪青山 《生物信息学》2005,3(4):163-166,174
重复序列的分析是基因组研究中的一个重要课题,进行这一研究的基础则是从基因组序列中快速有效地找出其中的重复序列。一种投影拼接算法,即利用随机投影获得候选片断集合,利用片断拼接对候选片断进行拼接,以发现基因组中的重复序列。分析了算法的计算复杂度,构造了半仿真测试数据,对算法的测试结果表明了其有效性。  相似文献   

14.
Dynamic programming algorithms that predict RNA secondary structure by minimizing the free energy have had one important limitation. They were able to predict only one optimal structure. Given the uncertainties of the thermodynamic data and the effects of proteins and other environmental factors on structure, the optimal structure predicted by these methods may not have biological significance. We present a dynamic programming algorithm that can determine optimal and suboptimal secondary structures for an RNA. The power and utility of the method is demonstrated in the folding of the intervening sequence of the rRNA of Tetrahymena. By first identifying the major secondary structures corresponding to the lowest free energy minima, a secondary structure of possible biological significance is derived.  相似文献   

15.
A computer program is described which finds sets of diagnostic characters for the recognition of species. Unlike previous algorithms, it finds all the possible sets requested and will also run with reasonable demands on computer time and storage. The program will search for sets with a specified size range and with a given minimum number of diagnostic characters to distinguish a taxon from all the others.  相似文献   

16.
SUMMARY: We describe an algorithm and software tool for comparing alternative phylogenetic trees. The main application of the software is to compare phylogenies obtained using different phylogenetic methods for some fixed set of species or obtained using different gene sequences from those species. The algorithm pairs up each branch in one phylogeny with a matching branch in the second phylogeny and finds the optimum 1-to-1 map between branches in the two trees in terms of a topological score. The software enables the user to explore the corresponding mapping between the phylogenies interactively, and clearly highlights those parts of the trees that differ, both in terms of topology and branch length. AVAILABILITY: The software is implemented as a Java applet at http://www.mrc-bsu.cam.ac.uk/personal/thomas/phylo_comparison/comparison_page.html. It is also available on request from the authors.  相似文献   

17.
18.
19.
We study the problem of approximate non-tandem repeat extraction. Given a long subject string S of length N over a finite alphabet Sigma and a threshold D, we would like to find all short substrings of S of length P that repeat with at most D differences, i.e., insertions, deletions, and mismatches. We give a careful theoretical characterization of the set of seeds (i.e., some maximal exact repeats) required by the algorithm, and prove a sublinear bound on their expected numbers. Using this result, we present a sub-quadratic algorithm for finding all short (i.e., of length O(log N)) approximate repeats. The running time of our algorithm is O(DN(3pow(epsilon)-1)log N), where epsilon = D/P and pow(epsilon) is an increasing, concave function that is 0 when epsilon = 0 and about 0.9 for DNA and protein sequences.  相似文献   

20.

Background

For treating a complex disease such as cancer, we need effective means to control the biological network that underlies the disease. However, biological networks are typically robust to external perturbations, making it difficult to beneficially alter the network dynamics by controlling a single target. In fact, multi-target therapeutics is often more effective compared to monotherapies, and combinatory drugs are commonly used these days for treating various diseases. A practical challenge in combination therapy is that the number of possible drug combinations increases exponentially, which makes the prediction of the optimal drug combination a difficult combinatorial optimization problem. Recently, a stochastic optimization algorithm called the Gur Game algorithm was proposed for drug optimization, which was shown to be very efficient in finding potent drug combinations.

Results

In this paper, we propose a novel stochastic optimization algorithm that can be used for effective optimization of combinatory drugs. The proposed algorithm analyzes how the concentration change of a specific drug affects the overall drug response, thereby making an informed guess on how the concentration should be updated to improve the drug response. We evaluated the performance of the proposed algorithm based on various drug response functions, and compared it with the Gur Game algorithm.

Conclusions

Numerical experiments clearly show that the proposed algorithm significantly outperforms the original Gur Game algorithm, in terms of reliability and efficiency. This enhanced optimization algorithm can provide an effective framework for identifying potent drug combinations that lead to optimal drug response.
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号