1.

Solving Hard Computational Problems Efficiently: Asymptotic Parametric Complexity 3-Coloring Algorithm

José Antonio Martín H. 《PloS one》2013,8(1)

Many practical problems in almost all scientific and technological disciplines have been classified as computationally hard (NP-hard or even NP-complete). In life sciences, combinatorial optimization problems frequently arise in molecular biology, e.g., genome sequencing; global alignment of multiple genomes; identifying siblings or discovery of dysregulated pathways. In almost all of these problems, there is the need for proving a hypothesis about a certain property of an object that can be present if and only if it adopts some particular admissible structure (an NP-certificate) or be absent (no admissible structure). However, none of the standard approaches can discard the hypothesis when no solution can be found, since none can provide a proof that there is no admissible structure. This article presents an algorithm that introduces a novel type of solution method to “efficiently” solve the graph 3-coloring problem, an NP-complete problem. The proposed method provides certificates (proofs) in both cases, present or absent, so it is possible to accept or reject the hypothesis on the basis of a rigorous proof. It provides exact solutions and is polynomial-time (i.e., efficient), although parametric. The only requirement is sufficient computational power, which is controlled by the parameter . Nevertheless, here it is proved that the probability of requiring a value of to obtain a solution for a random graph decreases exponentially: , making tractable almost all problem instances. Thorough experimental analyses were performed. The algorithm was tested on random graphs, planar graphs and 4-regular planar graphs. The obtained experimental results are in accordance with the theoretical expected results.
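To make the notion of a certificate concrete, here is a minimal Python sketch of the graph 3-coloring problem itself. It is not the parametric algorithm of the paper, whose internals the abstract does not describe. It is a plain exhaustive backtracking search (exponential time in the worst case), and the names `three_color`, `is_proper`, `triangle` and `k4` are illustrative assumptions. A returned coloring is a checkable certificate of presence; here the only evidence of absence is that the search was exhausted.

```python
from typing import Dict, List, Optional


def three_color(adj: Dict[int, List[int]]) -> Optional[Dict[int, int]]:
    """Return a proper 3-coloring of the graph, or None if none exists.

    Plain backtracking: an illustration of the problem, not the paper's method.
    """
    # Color high-degree vertices first so conflicts show up early.
    order = sorted(adj, key=lambda v: -len(adj[v]))
    color: Dict[int, int] = {}

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        v = order[i]
        for c in range(3):
            # A color is admissible if no already-colored neighbor uses it.
            if all(color.get(u) != c for u in adj[v]):
                color[v] = c
                if extend(i + 1):
                    return True
                del color[v]
        return False

    return dict(color) if extend(0) else None


def is_proper(adj: Dict[int, List[int]], color: Dict[int, int]) -> bool:
    """Verify a certificate: every edge must join two different colors."""
    return all(color[u] != color[v] for u in adj for v in adj[u])


# Hypothetical test graphs: a triangle is 3-colorable, K4 is not.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
k4 = {v: [u for u in range(4) if u != v] for v in range(4)}

print(three_color(triangle))
print(three_color(k4))
```

Checking a coloring with `is_proper` takes time linear in the number of edges, which is what makes a coloring an NP-certificate.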

2.

A new fast algorithm for solving the minimum spanning tree problem based on DNA molecules computation (total citations: 1; self-citations: 0; citations by others: 1)

Zhaocai Wang Dongmei Huang Huajun Meng Chengpei Tang 《Bio Systems》2013

The minimum spanning tree (MST) problem is to find minimum edge connected subsets containing all the vertices of a given undirected graph. It is a vitally important NP-complete problem in graph theory and applied mathematics, having numerous real life applications. Moreover, in previous studies, DNA molecular operations were usually used to solve NP-complete head-to-tail path search problems, and rarely for NP-hard problems whose solutions are multi-lateral paths, such as the minimum spanning tree problem. In this paper, we present a new fast DNA algorithm for solving the MST problem using DNA molecular operations. For an undirected graph with n vertices and m edges, we reasonably design flexible length DNA strands representing the vertices and edges, take appropriate steps and get the solutions of the MST problem in proper length range and O(3m + n) time complexity. We extend the application of DNA molecular operations and simultaneously simplify the complexity of the computation. Results of computer simulative experiments show that the proposed method updates some of the best known values with very short time and that the proposed method provides a better performance with solution accuracy over existing algorithms.
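For readers who want the problem statement in executable form, the sketch below computes a minimum spanning tree on a conventional computer with Kruskal's classical algorithm. This is a swapped-in textbook technique, not the DNA-strand procedure of the paper, and the function name `kruskal` and the sample graph are assumptions made for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (endpoint u, endpoint v, weight)


def kruskal(n: int, edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning forest of an n-vertex graph."""
    parent = list(range(n))  # union-find forest over the vertices

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    # Scan edges from lightest to heaviest; keep those joining two components.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical 4-vertex example graph.
edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]
mst = kruskal(4, edges)
print(mst, sum(w for _, _, w in mst))
```

On a connected graph the result has n - 1 edges; sorting dominates the running time.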

3.

On the complexity of positional sequencing by hybridization. (total citations: 2; self-citations: 0; citations by others: 2)

A Ben-Dor I Pe'er R Shamir R Sharan 《Journal of computational biology》2001,8(4):361-371

In sequencing by hybridization (SBH), one has to reconstruct a sequence from its l-long substrings. SBH was proposed as an alternative to gel-based DNA sequencing approaches, but in its original form the method is not competitive. Positional SBH (PSBH) is a recently proposed enhancement of SBH in which one has additional information about the possible positions of each substring along the target sequence. We give a linear time algorithm for solving PSBH when each substring has at most two possible positions. On the other hand, we prove that the problem is NP-complete if each substring has at most three possible positions. We also show that PSBH is NP-complete if the set of allowed positions for each substring is an interval of length k and provide a fast algorithm for the latter problem when k is bounded.
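The input to SBH and PSBH can be illustrated with a short Python sketch. It only builds the set of l-long substrings (the spectrum) and checks whether a candidate sequence respects a table of allowed positions; it does not implement the linear-time algorithm mentioned in the abstract. The exact formalization (each substring must occur, and only at an allowed start position) is an assumption, as are the names `spectrum`, `consistent` and `allowed`.

```python
from typing import Dict, Set


def spectrum(seq: str, l: int) -> Set[str]:
    """All l-long substrings of seq: the data an SBH experiment reports."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def consistent(seq: str, l: int, allowed: Dict[str, Set[int]]) -> bool:
    """Check a candidate against positional constraints (assumed PSBH model).

    Every l-long substring of seq must be listed in `allowed` and start at
    one of its allowed positions, and every listed substring must occur.
    """
    seen: Set[str] = set()
    for i in range(len(seq) - l + 1):
        s = seq[i:i + l]
        if s not in allowed or i not in allowed[s]:
            return False
        seen.add(s)
    return seen == set(allowed)


# Hypothetical instance: each substring has at most two possible positions.
allowed = {"ACG": {0, 2}, "CGT": {1, 3}, "GTA": {2}, "TAC": {0, 3}}

print(sorted(spectrum("ACGTAC", 3)))
print(consistent("ACGTAC", 3, allowed))
print(consistent("GTACGT", 3, allowed))
```

Here `ACGTAC` and `GTACGT` have the same spectrum, so plain SBH cannot tell them apart, while the positional table accepts only the first.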

4.

An efficient rank based approach for closest string and closest substring (total citations: 1; self-citations: 0; citations by others: 1)

Dinu LP Ionescu R 《PloS one》2012,7(6):e37576

This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.
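As a point of reference for the closest string problem, the following Python sketch solves tiny instances exactly by brute force under Hamming distance, one of the comparison measures named in the abstract. It is neither the genetic algorithm nor the rank-distance fitness function of the paper (the abstract does not define rank distance), and the names `hamming` and `closest_string` are assumptions.

```python
from itertools import product
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: List[str], alphabet: str = "ACGT") -> Tuple[str, int]:
    """Exhaustively find a center string minimizing the maximum distance.

    Tries all len(alphabet) ** L candidates, so it is usable only for very
    short strings; the exponential cost is why heuristics are attractive.
    """
    length = len(strings[0])

    def radius(candidate: str) -> int:
        return max(hamming(candidate, s) for s in strings)

    candidates = ("".join(c) for c in product(alphabet, repeat=length))
    best = min(candidates, key=radius)
    return best, radius(best)


# Hypothetical input strings.
print(closest_string(["ACGT", "ACGA", "TCGT"]))
```

A heuristic such as a genetic algorithm would replace the exhaustive loop with a guided search over candidates, using a function like `radius` (or a rank-distance analogue) as its fitness.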

5.

A molecular algorithm for the matching problem based on plasmid DNA (total citations: 7; self-citations: 0; citations by others: 7)

高琳 马润年 许进 《生物化学与生物物理进展》2002,29(5):820-823

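Given an undirected graph, the minimum maximal matching problem is to find the smallest among the maximal sets of pairwise non-adjacent edges; it is a well-known NP-complete problem. In 1994, Dr. Adleman first proposed using DNA computing to solve NP-complete problems: encoded DNA sequences serve as the operands, and hard mathematical problems are solved through molecular-biology operations, which makes solving NP-complete problems feasible. This paper proposes a plasmid-DNA-based molecular biological algorithm for the maximum matching problem of undirected graphs, in which solutions are generated by restriction-enzyme digestion and the final solutions are separated by gel electrophoresis. Judged by the experimental methods of molecular biology, the algorithm is effective and feasible.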

6.

DNA computing based on molecular beacons (total citations: 12; self-citations: 5; citations by others: 12)

殷志祥 张风月 许进 《生物数学学报》2003,18(4):497-501

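DNA computing is a new method for solving a class of computationally intractable problems; the computation can grow exponentially as the problem size increases. To date, many research results have successfully improved its performance and increased its feasibility. In this paper, a molecular-beacon encoding strategy is adopted in surface-based DNA computing, the forces acting on a molecular beacon when it hybridizes with its complementary strand to form a duplex are analyzed, and another solution to the 3-SAT problem is given. This method is more effective and more promising than existing methods because of its simple encoding, low material consumption, short operation time, and advanced technology. The paper attempts to combine molecular biology, optics, and mechanics. This work provides stronger evidence that DNA computing can solve NP-complete problems.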

7.

Multiple sequence assembly from reads alignable to a common reference genome

Peng Q Smith AD 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(5):1283-1295

We describe a set of computational problems motivated by certain analysis tasks in genome resequencing. These are assembly problems for which multiple distinct sequences must be assembled, but where the relative positions of reads to be assembled are already known. This information is obtained from a common reference genome and is characteristic of resequencing experiments. The simplest variant of the problem aims at determining a minimum set of superstrings such that each sequenced read matches at least one superstring. We give an algorithm with time complexity O(N), where N is the sum of the lengths of reads, substantially improving on previous algorithms for solving the same problem. We also examine the problem of finding the smallest number of reads to remove such that the remaining reads are consistent with k superstrings. By exploiting a surprising relationship with the minimum cost flow problem, we show that this problem can be solved in polynomial time when nested reads are excluded. If nested reads are permitted, this problem of removing the minimum number of reads becomes NP-hard. We show that permitting mismatches between reads and their nearest superstrings generally renders these problems NP-hard.

8.

Inference of haplotypes from samples of diploid populations: complexity and algorithms. (total citations: 15; self-citations: 0; citations by others: 15)

D Gusfield 《Journal of computational biology》2001,8(3):305-323

The next phase of human genomics will involve large-scale screens of populations for significant DNA polymorphisms, notably single nucleotide polymorphisms (SNPs). Dense human SNP maps are currently under construction. However, the utility of those maps and screens will be limited by the fact that humans are diploid and it is presently difficult to get separate data on the two "copies." Hence, genotype (blended) SNP data will be collected, and the desired haplotype (partitioned) data must then be (partially) inferred. A particular nondeterministic inference algorithm was proposed and studied by Clark (1990) and extensively used by Clark et al. (1998). In this paper, we more closely examine that inference method and the question of whether we can obtain an efficient, deterministic variant to optimize the obtained inferences. We show that the problem is NP-hard and, in fact, Max-SNP complete; that the reduction creates problem instances conforming to a severe restriction believed to hold in real data (Clark, 1990); and that even if we first use a natural exponential-time operation, the remaining optimization problem is NP-hard. However, we also develop, implement, and test an approach based on that operation and (integer) linear programming. The approach works quickly and correctly on simulated data.

9.

Hairpin formation in DNA computation presents limits for large NP-complete problems

Li D Huang H Li X Li X 《Bio Systems》2003,72(3):203-207

Recently, several DNA computing paradigms for NP-complete problems were presented, especially for the 3-SAT problem. Can the present paradigms solve more than just trivial instances of NP-complete problems? In this paper we show that, with high probability, potentially deleterious features such as severe hairpin loops are likely to arise. If a DNA strand x of length n and the 'complement' of the reverse of x have l matching bases, then x forms a hairpin loop and is called an (n,l)-hairpin format. Let $\gamma = 2l/n$. Then $\gamma$ can be considered a measure of the stability of hairpin loops. Let $p_{n,l}$ be the probability that an n-mer DNA strand is an (n,l)-hairpin format, and $q_{n,l}^{(m)}$ be the probability that, when m strands are chosen at random from the $4^n$ n-mer oligonucleotides, at least one of the m strands is an (n,l)-hairpin format. Then $q_{n,l}^{(m)} = 1 - (1 - p_{n,l})^m \approx m\,p_{n,l}$. If we require $q_{n,l}^{(m)}$
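The step from the exact expression to the product of m and the single-strand probability is a first-order approximation. A short derivation, assuming the m strands are drawn independently (which the form of the exact expression already implies) and writing p for the single-strand hairpin probability and q for the probability that at least one of the m strands is a hairpin:

```latex
\begin{align}
q_{n,l}^{(m)}
  &= 1 - \left(1 - p_{n,l}\right)^{m} \\
  &= m\,p_{n,l} - \binom{m}{2}\,p_{n,l}^{2} + \cdots \\
  &\approx m\,p_{n,l}
  \qquad \text{when } m\,p_{n,l} \ll 1 .
\end{align}
```

The second line is the binomial expansion of the power; the approximation drops every term of order two and higher in p, so it is accurate only while the product of m and p is small.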

10.

A new solution for maximal clique problem based sticker model (total citations: 1; self-citations: 0; citations by others: 1)

Darehmiraki M 《Bio Systems》2009,95(2):145-149

In this paper, we use stickers to construct a solution space of DNA for the maximal clique problem (MCP). Simultaneously, we also apply the DNA operation in the sticker-based model to develop a DNA algorithm. The results of the proposed algorithm show that the MCP is resolved with biological operations in the sticker-based model for the solution space of the sticker. Moreover, this work presents clear evidence of the ability of DNA computing to solve the NP-complete problem. The potential of DNA computing for the MCP is promising given the operational time complexity of O(n × k).

11.

Iterative pass optimization of sequence data (total citations: 3; self-citations: 1; citations by others: 2)

Wheeler WC 《Cladistics : the international journal of the Willi Hennig Society》2003,19(3):254-260

The problem of determining the minimum-cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete. This "tree alignment" problem has motivated the considerable effort placed in multiple sequence alignment procedures. Wheeler in 1996 proposed a heuristic method, direct optimization, to calculate cladogram costs without the intervention of multiple sequence alignment. This method, though more efficient in time and more effective in cladogram length than many alignment-based procedures, greedily optimizes nodes based on descendent information only. In their proposal of an exact multiple alignment solution, Sankoff et al. in 1976 described a heuristic procedure--the iterative improvement method--to create alignments at internal nodes by solving a series of median problems. The combination of a three-sequence direct optimization with iterative improvement and a branch-length-based cladogram cost procedure, provides an algorithm that frequently results in superior (i.e., lower) cladogram costs. This iterative pass optimization is both computation and memory intensive, but economies can be made to reduce this burden. An example in arthropod systematics is discussed.

12.

The surface-based approach for DNA computation is unreliable for SAT

Li D Li X Huang H Li X 《Bio Systems》2005,82(1):20-25

Previous research presented DNA computing on surfaces, which applied three operations to each clause ("mark", "destroy", and "unmark"), and demonstrated how to solve a four-variable, four-clause instance of 3-SAT. It was claimed that only the strands satisfying the problem remained on the surface at the end of the computation and that the surface-based approach was capable of scaling up to larger 3-SAT problems. Accordingly, the identities of the strands were determined only in the "readout" step for the correct solutions to the problem, without checking whether the strands really satisfied the problem. Thus, based on the claim above, the surface-based approach became a polynomial-time algorithm. In this paper, we show that for some instances of SAT, at the end of the computation all the remaining strands falsify the instance. However, by the previous claim, all the strands falsifying the problem would be regarded as correct solutions to it. Therefore, DNA computing on surfaces is unreliable. For this reason, it is necessary to add a "verify" step after the "readout" step to check whether the strands remaining on the surface at the end of the computation really satisfy the problem.
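In software terms, the "verify" step the abstract calls for amounts to evaluating the formula on each assignment that was read out. A minimal Python sketch follows, assuming clauses are written as lists of signed integers (a DIMACS-style convention chosen here for illustration, not taken from the paper); the four-variable, four-clause formula is a made-up example, not the instance used in the original experiment.

```python
from typing import Dict, List

Clause = List[int]  # literal k means variable k is true, -k means it is false


def satisfies(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Return True if the assignment makes every clause true.

    A clause holds when at least one of its literals agrees with the
    assignment; the whole formula holds when all clauses do.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# Hypothetical four-variable, four-clause 3-SAT formula.
formula = [[1, -2, 3], [-1, 2, 4], [2, -3, -4], [-1, -2, -3]]

good = {1: True, 2: True, 3: False, 4: False}
bad = {1: True, 2: True, 3: True, 4: True}

print(satisfies(formula, good))
print(satisfies(formula, bad))
```

The check is linear in the size of the formula, so adding it after readout does not change the polynomial cost of the overall procedure.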

13.

Deconvolving sequence variation in mixed DNA populations.

Andy Wildenberg Steven Skiena Pavel Sumazin 《Journal of computational biology》2003,10(3-4):635-652

We present an original approach to identifying sequence variants in a mixed DNA population from sequence trace data. The heart of the method is based on parsimony: given a wildtype DNA sequence, a set of observed variations at each position collected from sequencing data, and a complete catalog of all possible mutations, determine the smallest set of mutations from the catalog that could fully explain the observed variations. The algorithmic complexity of the problem is analyzed for several classes of mutations, including block substitutions, single-range deletions, and single-range insertions. The reconstruction problem is shown to be NP-complete for single-range insertions and deletions, while for block substitutions, single character insertion, and single character deletion mutations, polynomial time algorithms are provided. Once a minimum set of mutations compatible with the observed sequence is found, the relative frequency of those mutations is recovered by solving a system of linear equations. Simulation results show the algorithm successfully deconvolving mutations in p53 known to cause cancer. An extension of the algorithm is proposed as a new method of high throughput screening for single nucleotide polymorphisms by multiplexing DNA.

14.

Complexity and approximability of double digest

Cieliebak M Eidenbenz S Woeginger GJ 《Journal of bioinformatics and computational biology》2005,3(2):207-223

We revisit the DOUBLE DIGEST problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes. We first show that DOUBLE DIGEST is strongly NP-complete, improving upon previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In the second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of DOUBLE DIGEST that model partial cleavage errors. We then show that most of these variations are hard to approximate. In the third part, we investigate variations with the additional restriction that coincident cut sites are disallowed, and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.
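The structure of a DOUBLE DIGEST instance can be sketched in Python by checking a proposed solution. The formulation below is the standard one as far as the abstract allows: two orderings of single-enzyme fragment lengths are given, and the cut sites of both together must reproduce the multiset of fragment lengths from the combined digest. The function names and the small instance are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Set


def cut_sites(fragments: List[int]) -> Set[int]:
    """Interior cut positions implied by an ordered list of fragment lengths."""
    return set(accumulate(fragments[:-1]))


def double_digest_ok(a_order: List[int], b_order: List[int],
                     c_lengths: List[int]) -> bool:
    """Check whether two fragment orderings explain the double-digest data."""
    total = sum(a_order)
    if total != sum(b_order):
        return False  # both enzymes must cut the same molecule
    # Merge the cut sites of both enzymes; coincident sites collapse in the set.
    bounds = [0] + sorted(cut_sites(a_order) | cut_sites(b_order)) + [total]
    fragments = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    return sorted(fragments) == sorted(c_lengths)


# Hypothetical instance: enzyme A yields {5, 2, 3}, enzyme B yields {3, 7},
# and digesting with both yields {2, 2, 3, 3}.
print(double_digest_ok([5, 2, 3], [3, 7], [2, 3, 3, 2]))
print(double_digest_ok([2, 5, 3], [3, 7], [2, 3, 3, 2]))
```

Verifying an ordering is easy; the hardness results in the abstract concern finding one among the many possible permutations of the fragments.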

15.

Solving the set cover problem and the problem of exact cover by 3-sets in the Adleman-Lipton model (total citations: 3; self-citations: 0; citations by others: 3)

Chang WL Guo M 《Bio Systems》2003,72(3):263-275

Adleman wrote the first paper showing that deoxyribonucleic acid (DNA) strands could be employed to calculate solutions to an instance of the NP-complete Hamiltonian path problem (HPP). Lipton also demonstrated that Adleman's techniques could be used to solve the NP-complete satisfiability (SAT) problem (the first NP-complete problem). In this paper, it is shown how the DNA operations presented by Adleman and Lipton can be used to develop DNA algorithms for solving the set cover problem and the problem of exact cover by 3-sets.

16.

The k partition-distance problem

Chen YH 《Journal of computational biology》2012,19(4):404-417

Many applications of data partitioning (clustering) have been well studied in bioinformatics. Consider, for instance, a set N of organisms (elements) based on DNA marker data. A partition divides all elements in N into two or more disjoint clusters that cover all elements, where a cluster contains a non-empty subset of N. Different partitioning algorithms may produce different partitions. Computing the distance and finding the consensus partition (also called consensus clustering) between two or more partitions are important and interesting problems that arise frequently in bioinformatics and data mining, in which different distance functions may be considered in different partition algorithms. In this article, we discuss the k partition-distance problem. Given a set of elements N with k partitions of N, the k partition-distance problem is to delete the minimum number of elements from each partition such that all remaining partitions become identical. This problem is NP-complete for general k > 2 partitions, and no algorithms are known at present. We design the first known heuristic and approximation algorithms with performance ratio 2 to solve the k partition-distance problem in O(k · ρ · |N|) time, where ρ is the maximum number of clusters of these k partitions and |N| is the number of elements in N. We also present the first known exact algorithm in O(ℓ · 2^ℓ · k^2 · |N|^2) time, where ℓ is the partition-distance of the optimal solution for this problem. Performances of our exact and approximation algorithms in testing the random data with actual sets of organisms based on DNA markers are compared and discussed. Experimental results reveal that our algorithms can improve the computational speed of the exact algorithm for the two partition-distance problem in practice if the maximum number of elements per cluster is less than ρ. From both theoretical and computational points of view, our solutions are at most twice the partition-distance of the optimal solution.
A website offering the interactive service of solving the k partition-distance problem using our and previous algorithms is available (see http://mail.tmue.edu.tw/~yhchen/KPDP.html).
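The definition of the k partition-distance can be made concrete with a brute-force Python sketch for very small inputs. It is not one of the paper's heuristic, approximation, or exact algorithms: it simply tries deletion sets of increasing size, so its cost is exponential in the number of elements. It assumes that the same set of elements is deleted from every partition, and the names `restrict` and `partition_distance` are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Partition = List[Set[str]]  # a list of disjoint, non-empty clusters


def restrict(partition: Partition, keep: Set[str]) -> FrozenSet[FrozenSet[str]]:
    """Drop deleted elements from each cluster and discard emptied clusters."""
    return frozenset(frozenset(block & keep) for block in partition if block & keep)


def partition_distance(partitions: List[Partition]) -> int:
    """Smallest number of deleted elements that makes all partitions identical."""
    elements = set().union(*partitions[0])
    for d in range(len(elements) + 1):
        for removed in combinations(sorted(elements), d):
            keep = elements - set(removed)
            # All partitions agree if their restrictions collapse to one value.
            if len({restrict(p, keep) for p in partitions}) == 1:
                return d
    return len(elements)  # not reached: deleting everything always works


# Hypothetical partitions of the four elements a, b, c, d.
p1 = [{"a", "b"}, {"c", "d"}]
p2 = [{"a", "b", "c"}, {"d"}]
p3 = [{"a"}, {"b", "c", "d"}]

print(partition_distance([p1, p2]))
print(partition_distance([p1, p2, p3]))
```

For the pair `p1`, `p2`, deleting `c` is enough; with `p3` added, two deletions (`b` and `c`) are needed.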

17.

qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences

H Dinh S Rajasekaran J Davila 《PloS one》2012,7(7):e41425


18.

Protein structure alignment by deterministic annealing (total citations: 2; self-citations: 0; citations by others: 2)

Chen L Zhou T Tang Y 《Bioinformatics (Oxford, England)》2005,21(1):51-62

MOTIVATION: Protein structure alignment is one of the most important computational problems in molecular biology and plays a key role in protein structure prediction, fold family classification, motif finding, phylogenetic tree reconstruction and so on. From the viewpoint of computational complexity, a pairwise structure alignment is also an NP-hard problem, in contrast to the polynomial time algorithm for a pairwise sequence alignment. RESULTS: We propose a method for solving the structure alignment problem in an accurate manner at the amino acid level, based on a mean field annealing technique. We define the structure alignment as a mixed integer-programming (MIP) problem. By avoiding complicated combinatorial computation and exploiting the special structure of the continuous partial problem, we transform the MIP into a reduced non-linear continuous optimization problem (NCOP) with a much simpler form. To optimize the reduced NCOP, a mean field annealing procedure is adopted with a modified Potts model, whose solution is generally identical to that of the MIP. There is no 'soft constraint' in our mean field model and all constraints are automatically satisfied throughout the annealing process, thereby not only making the optimization more efficient but also eliminating many unnecessary parameters that depend on problems and usually require careful tuning. A number of benchmark examples are tested by the proposed method with comparisons to several existing approaches.

19.

Computational complexity of a problem in molecular structure prediction.

J T Ngo J Marks 《Protein engineering》1992,5(4):313-321

The computational task of protein structure prediction is believed to require exponential time, but previous arguments as to its intractability have taken into account only the size of a protein's conformational space. Such arguments do not rule out the possible existence of an algorithm, more selective than exhaustive search, that is efficient and exact. (An efficient algorithm is one that is guaranteed, for all possible inputs, to run in time bounded by a function polynomial in the problem size. An intractable problem is one for which no efficient algorithm exists.) Questions regarding the possible intractability of problems are often best answered using the theory of NP-completeness. In this treatment we show the NP-hardness of two typical mathematical statements of empirical potential energy function minimization of macromolecules. Unless all NP-complete problems can be solved efficiently, these results imply that a function minimization algorithm can be efficient for protein structure prediction only if it exploits protein-specific properties that prohibit the simple geometric constructions that we use in our proofs. Analysis of further mathematical statements of molecular structure prediction could constitute a systematic methodology for identifying sources of complexity in protein folding, and for guiding development of predictive algorithms.

20.

DNA computing based on triple-stranded nucleic acids (total citations: 2; self-citations: 0; citations by others: 2)

FANG Gang 张社民 朱岩 许进 《生物信息学》2009,7(3):181-185

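This paper proposes a new model for DNA computing, the triple-stranded DNA computing model. The model is built on recent research results on triple-stranded nucleic acids and is applied to solving the satisfiability problem (SAT), a hard NP-complete problem. Unlike previous DNA computing methods, the molecular algorithm based on triple-stranded nucleic acids performs its basic operations by pairing oligodeoxynucleotides (ODNs), mediated by RecA protein, with homologous double-stranded DNA to form triple-stranded DNA, which can effectively reduce errors in the computation. Judged by the experimental methods of molecular biology, the algorithm is feasible and effective.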