Similar articles
20 similar articles found.
1.
Since metabolites cannot be predicted from the genome sequence, high-throughput de novo identification of small molecules is highly sought. Mass spectrometry (MS) in combination with a fragmentation technique is commonly used for this task. Unfortunately, automated analysis of such data is in its infancy. Recently, fragmentation trees have been proposed as an analysis tool for such data. Additional fragmentation steps (MS^n) reveal more information about the molecule. We propose to use MS^n data for the computation of fragmentation trees, and present the Colorful Subtree Closure problem to formalize this task: we search for a colorful subtree inside a vertex-colored graph such that the weight of the transitive closure of the subtree is maximal. We give several negative results regarding the tractability and approximability of this and related problems. We then present an exact dynamic programming algorithm, which is parameterized by the number of colors in the graph and is swift in practice. Evaluation of our method on a dataset of 45 reference compounds showed that the quality of constructed fragmentation trees is improved by using MS^n instead of MS^2 measurements.

2.
Scalability of the surface-based DNA algorithm for 3-SAT   Total citations: 3, self-citations: 0, citations by others: 3
Li D, Li X, Huang H, Li X. Bio Systems, 2006, 85(2): 95-98
Since Adleman first proposed DNA computing for the Hamiltonian path problem, several authors have reported DNA computing for 3-SAT. Previous research presented DNA computing on surfaces, demonstrated how to solve a four-variable, four-clause instance of 3-SAT, and claimed that the surface-based approach was designed to scale up to larger problems. In this paper we establish an error model for the incomplete "mark" and imperfect "destroy" operations. Using the error model, we argue that no matter how large the "mark" and "destroy" rates are, we can always give satisfiable instances of 3-SAT such that no DNA strands remain on the surface at the end of the computation. The surface-based approach would therefore misjudge these satisfiable instances to be unsatisfiable. Thus, the error leads to an incorrect result of the SAT computation. Furthermore, given the "mark" rate p and the "not-destroy" rate rho, we find that the approach can solve instances of 3-SAT with at most N variables, where N=[(2+beta(2)+2+2 square root beta (2))/beta(2)], in which beta = 1 - 1/(p + rho q), q = 1 - p, and [a] is the greatest integer less than or equal to a.

3.
The directed Hamiltonian path (DHP) problem is one of the hard computational problems for which no practical algorithm is available on a conventional computer. Many problems, including the traveling salesperson problem and the longest path problem, can be translated into the DHP problem, which implies that an algorithm for DHP can also solve all the translated problems. To study the robustness of the laboratory protocol of the pioneering DNA computing experiment for the DHP problem performed by Leonard Adleman (1994), we investigated how the graph size, the multiplicity of the Hamiltonian paths, and the size of the oligonucleotides that encode the vertices affect the laboratory procedures. We applied Adleman's protocol with 18-mer oligonucleotides per node to a graph with 8 vertices and 14 edges containing two Hamiltonian paths (Adleman used 20-mer oligonucleotides for a graph with 7 nodes, 14 edges and one Hamiltonian path). We found that, depending on the graph characteristics such as the number of short cycles, the oligonucleotide size, and the hybridization conditions that are used to encode the graph, the protocol should be executed with parameters different from Adleman's.

4.
Xiao D, Li W, Zhang Z, He L. Bio Systems, 2005, 82(3): 203-207
In this paper, we consider a procedure for solving maximum cut problems in the Adleman-Lipton model. The procedure works in O(n^2) steps for maximum cut problems of an undirected graph with n vertices.

5.
The minimum spanning tree (MST) problem is to find a minimum-weight connected subset of edges containing all the vertices of a given undirected graph. It is a vitally important NP-complete problem in graph theory and applied mathematics, with numerous real-life applications. Moreover, in previous studies, DNA molecular operations were usually used to solve NP-complete head-to-tail path search problems, and rarely for NP-hard problems with multi-lateral path solutions, such as the minimum spanning tree problem. In this paper, we present a new fast DNA algorithm for solving the MST problem using DNA molecular operations. For an undirected graph with n vertices and m edges, we design flexible-length DNA strands representing the vertices and edges, take appropriate steps, and obtain the solutions of the MST problem in the proper length range with O(3m + n) time complexity. We extend the application of DNA molecular operations and simultaneously simplify the complexity of the computation. Results of computer simulation experiments show that the proposed method updates some of the best known values in very short time and provides better solution accuracy than existing algorithms.

6.
Li D, Li X, Huang H, Li X. Bio Systems, 2005, 82(1): 20-25
Previous research presented DNA computing on surfaces, which applied three operations to each clause, "mark", "destroy", and "unmark", and demonstrated how to solve a four-variable, four-clause instance of 3-SAT. It was claimed that only the strands satisfying the problem remained on the surface at the end of the computation and that the surface-based approach was capable of scaling up to larger 3-SAT problems. Accordingly, the identities of the strands were only determined in the "readout" step as the correct solutions to the problem, without checking whether the strands really satisfied the problem. Thus, based on the claim above, the surface-based approach became a polynomial-time algorithm. In this paper, we show that for some instances of SAT, all the strands remaining at the end of the computation falsify the instance. However, by the previous claim, all the strands falsifying the problem would be regarded as correct solutions to the problem. Therefore, DNA computing on surfaces is unreliable. For this reason, it is necessary to add a "verify" step after the "readout" step to check whether the strands remaining on the surface at the end of the computation really satisfy the problem.

7.
An analogy between the evolution of organisms and some complex computational problems (cryptosystem cracking, determination of the shortest path in a graph) is considered. It is shown that in the absence of a priori information about possible species of organisms such a problem is complex (is rated in the class NP) and cannot be solved in a polynomial number of steps. This conclusion suggests the need for re-examination of evolution mechanisms. Ideas of a deterministic approach to evolution are discussed.

8.
Fu B, Beigel R. Bio Systems, 1999, 52(1-3): 155-163
The length of DNA strands is an important resource in DNA computing. We show how to decrease strand lengths in known molecular algorithms for some NP-complete problems, such as 3-SAT and Independent Set, without substantially increasing their running time or volume.

9.
Path matching and graph matching in biological networks.   Total citations: 2, self-citations: 0, citations by others: 2
We develop algorithms for the following path matching and graph matching problems: (i) given a query path p and a graph G, find a path p' that is most similar to p in G; (ii) given a query graph G_0 and a graph G, find a graph G_0' that is most similar to G_0 in G. In these problems, p and G_0 represent a given substructure of interest to a biologist, and G represents a large network in which the biologist desires to find a related substructure. These algorithms allow the study of common substructures in biological networks in order to understand how these networks evolve both within and between organisms. We reduce the path matching problem to finding a longest weighted path in a directed acyclic graph and show that the problem of finding top k suboptimal paths can be solved in polynomial time. This is in contrast with most previous approaches that used exponential time algorithms to find simple paths which are practical only when the paths are short. We reduce the graph matching problem to finding highest scoring subgraphs in a graph and give an exact algorithm to solve the problem when the query graph G_0 is of moderate size. This eliminates the need for less accurate heuristic or randomized algorithms. We show that our algorithms are able to extract biologically meaningful pathways from protein interaction networks in the DIP database and metabolic networks in the KEGG database. Software programs implementing these techniques (PathMatch and GraphMatch) are available at http://faculty.cs.tamu.edu/shsze/pathmatch and http://faculty.cs.tamu.edu/shsze/graphmatch.
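
The reduction mentioned in this abstract, finding a longest weighted path in a directed acyclic graph, can be illustrated with a standard dynamic program over a topological order. The sketch below is a generic textbook version, not the PathMatch implementation; the function name `longest_weighted_path`, the integer node labels, and the `(u, v, weight)` edge format are assumptions made for illustration.

```python
from collections import defaultdict, deque


def longest_weighted_path(num_nodes, edges):
    """Return (weight, path) of a heaviest path in a DAG.

    Nodes are 0..num_nodes-1 and edges is a list of (u, v, weight).
    A path may start and end at any node, so a single node counts
    as a path of weight 0.
    """
    successors = defaultdict(list)
    indegree = [0] * num_nodes
    for u, v, w in edges:
        successors[u].append((v, w))
        indegree[v] += 1

    best = [0.0] * num_nodes      # best[v]: heaviest path ending at v
    pred = [None] * num_nodes     # predecessor of v on that path
    queue = deque(v for v in range(num_nodes) if indegree[v] == 0)
    visited = 0

    # Kahn's algorithm: a node is dequeued only after all of its
    # predecessors, so best[u] is final when u is processed.
    while queue:
        u = queue.popleft()
        visited += 1
        for v, w in successors[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                pred[v] = u
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if visited != num_nodes:
        raise ValueError("graph contains a cycle")

    # Trace back from the node with the heaviest incoming path.
    node = max(range(num_nodes), key=best.__getitem__)
    weight = best[node]
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    return weight, path[::-1]
```

Because every edge is relaxed exactly once, the running time is linear in the number of nodes and edges, which is the reason a reduction to a DAG makes the path problem tractable.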

10.
The normalization of data obtained from hybridization experiments with DNA chips to determine mRNA expression and concentration (gene expression profiling) is an unsolved problem. Furthermore, slight changes in mRNA expression or small numbers of mRNA molecules which may be relevant to disease cannot be detected so far. We have designed a method to calculate the number of molecules of a single mRNA species in a complex mRNA preparation. The basic concept is the transformation of a quantitative problem into a qualitative problem. Individual molecules pertaining to the same molecular species (IMPSMS) are transformed to a mixture of new different molecular species (DMS) and amplified. We propose two implementations of the method. The first procedure is based on a method for cloning tagged nucleic acid molecules onto the surface of micro-beads. It should be possible to transform and determine up to 10^6 IMPSMS into new DMS. The second strategy uses multimeric linkers, a method frequently used in DNA computing to assemble random DNA. The second strategy should be easier to implement but is limited to a few hundred IMPSMS.

11.
In 1945, Fox developed the strategy for sequencing long proteins by using overlapping fragments. We show how the formal mathematical technique for the construction of interval graphs (Gilmore and Hoffman, 1964) is both useful pedagogically for understanding the underlying logic of sequencing linear molecules and more amenable to automation because of its algorithmic nature. We also present a computer program that employs the interval graph algorithm and can be used to sequence proteins when given digest data. An example is given to illustrate all the steps involved in the algorithmic processing of the data. The need for such developments with respect to molecular evolution is discussed.

12.
FlexProt is a novel technique for the alignment of flexible proteins. Unlike all previous algorithms designed to solve the problem of structural comparisons allowing hinge-bending motions, FlexProt does not require a priori knowledge of the location of the hinge(s). FlexProt carries out the flexible alignment, superimposing the matching rigid subpart pairs, and detects the flexible hinge regions simultaneously. A large number of methods are available to handle rigid structural alignment. However, proteins are flexible molecules, which may appear in different conformations. Hence, protein structural analysis requires algorithms that can deal with molecular flexibility. Here, we present a method specifically addressing the flexible protein alignment task. First, the method efficiently detects maximal congruent rigid fragments in both molecules. Transforming the task into a graph theoretic problem, our method proceeds to calculate the optimal arrangement of previously detected maximal congruent rigid fragments. The fragment arrangement does not violate the protein sequence order. A clustering procedure is performed on fragment-pairs which have the same 3-D rigid transformation regardless of insertions and deletions (such as loops and turns) which separate them. Although the theoretical worst case complexity of the algorithm is O(n^6), in practice FlexProt is highly efficient. It performs a structural comparison of a pair of proteins 300 amino acids long in about seven seconds on a standard desktop PC (400 MHz Pentium II processor with 256 MB internal memory). We have performed extensive experiments with the algorithm. An assortment of these results is presented here. FlexProt can be accessed via WWW at bioinfo3d.cs.tau.ac.il/FlexProt/.

13.
Water molecules immobilized on a protein or DNA surface are known to play an important role in intramolecular and intermolecular interactions. Comparative analysis of related three-dimensional (3D) structures makes it possible to predict the locations of such water molecules on the protein surface. We have developed and implemented the algorithm WLAKE for detecting "conserved" water molecules, i.e. those located in almost the same positions in a set of superimposed structures of related proteins or macromolecular complexes. The problem is reduced to finding maximal cliques in a certain graph. Despite the exponential complexity of the algorithm, the program works appropriately fast for dozens of superimposed structures. WLAKE was used to predict functionally significant water molecules in enzyme active sites (transketolases) as well as in intermolecular (ETS-DNA complexes) and intramolecular (thiol-disulfide interchange protein) interactions. The program is available online at http://monkey.belozersky.msu.ru/~evgeniy/wLake/wLake.html.
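
The abstract states only that the problem is reduced to finding maximal cliques "in a certain graph"; it does not say how that graph is built or which clique algorithm WLAKE uses. As a hedged illustration of the clique-enumeration step alone, the sketch below uses the classic Bron-Kerbosch algorithm with pivoting. The function name `maximal_cliques` and the adjacency format (a dict mapping each node to the set of its neighbours) are assumptions for this example.

```python
def maximal_cliques(adjacency):
    """Enumerate all maximal cliques of an undirected graph.

    adjacency maps each node to the set of its neighbours.
    Returns a sorted list of cliques, each a sorted list of nodes.
    Worst-case running time is exponential in the number of nodes.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # current: clique built so far
        # candidates: nodes that can still extend the clique
        # excluded: nodes already explored (prevents duplicates)
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        # Pivot on the node with the most neighbours among the
        # candidates; only non-neighbours of the pivot need branching.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            expand(current | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)
```

In a setting like the one described, each node might stand for one water molecule in one superimposed structure, with an edge joining molecules that lie close together; that modelling choice is a guess and is not taken from the abstract.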

14.

Background  

To understand the dynamic behavior of cellular systems, mathematical modeling is often necessary and comprises three steps: (1) experimental measurement of participating molecules, (2) assignment of rate laws to each reaction, and (3) parameter calibration with respect to the measurements. In each of these steps the modeler is confronted with a plethora of alternative approaches, e.g., the selection of approximate rate laws in step two, as specific equations are often unknown, or the choice of an estimation procedure with its specific settings in step three. This overall process, with its numerous choices and the mutual influence between them, makes it hard to single out the best modeling approach for a given problem.

15.
A central problem in genome rearrangement is finding a most parsimonious rearrangement scenario using certain rearrangement operations. An important problem of this type is sorting a signed genome by reversals and translocations (SBRT). Hannenhalli and Pevzner presented a duality theorem for SBRT which leads to a polynomial time algorithm for sorting a multi-chromosomal genome using a minimum number of reversals and translocations. However, there is one case for which their theorem and algorithm fail. We describe that case and suggest a correction to the theorem and the polynomial algorithm. The solution of SBRT uses a reduction to the problem of sorting a signed permutation by reversals (SBR). The best extant algorithms for SBR require quadratic time. The common approach to solve SBR is by finding a safe reversal using the overlap graph or the interleaving graph of a permutation. We describe a family of signed permutations which proves a quadratic lower bound on the number of affected vertices in the overlap/interleaving graph during any optimal sorting scenario. This implies, in particular, an Omega(n^3) lower bound for Bergeron's algorithm.

16.
The synthesis of oligodeoxynucleotides is marred by several problems that contribute to the formation of defective molecules. This in turn seriously limits the usefulness of such reagents in DNA diagnostics, molecular cloning, DNA structural analysis and in antisense therapy. In particular, depurination reactions during the cyclical steps of synthesis lead to strand scission during cleavage of the completed molecules from the support. Here we present a remedy to this problem: a novel disiloxyl linkage that connects oligonucleotides to the support withstands reaction conditions that allow the removal of the 5' parts of any depurinated molecules. This ensures that all molecules that preserve the 5' protecting group when cleaved from the support will have both correct 3'- and 5'-ends. We demonstrate the application of the support for synthesis of padlock probe molecules.

17.
Yue Cao, Yang Shen. Proteins, 2020, 88(8): 1091-1099
Structural information about protein-protein interactions, often missing at the interactome scale, is important for mechanistic understanding of cells and rational discovery of therapeutics. Protein docking provides a computational alternative for such information. However, ranking near-native docked models high among a large number of candidates, often known as the scoring problem, remains a critical challenge. Moreover, estimating model quality, also known as the quality assessment problem, is rarely addressed in protein docking. In this study, the two challenging problems in protein docking are regarded as relative and absolute scoring, respectively, and addressed in one physics-inspired deep learning framework. We represent protein and complex structures as intra- and inter-molecular residue contact graphs with atom-resolution node and edge features. We propose a novel graph convolutional kernel that aggregates interacting nodes’ features through edges so that generalized interaction energies can be learned directly from 3D data. The resulting energy-based graph convolutional networks (EGCN) with multihead attention are trained to predict intra- and inter-molecular energies, binding affinities, and quality measures (interface RMSD) for encounter complexes. Compared to a state-of-the-art scoring function for model ranking, EGCN significantly improves ranking for a critical assessment of predicted interactions (CAPRI) test set involving homology docking, and is comparable or slightly better for Score_set, a CAPRI benchmark set generated by diverse community-wide docking protocols not known to training data. For Score_set quality assessment, EGCN shows about 27% improvement over our previous efforts. Directly learning from 3D structure data in graph representation, EGCN represents the first successful development of graph convolutional networks for protein docking.

18.
High-throughput data from various omics and sequencing techniques have rendered automated metabolic network reconstruction a highly relevant problem. Our approach reflects the inherent probabilistic nature of the steps involved in metabolic network reconstruction. Here, the goal is to arrive at networks which combine probabilistic information with the possibility to obtain a small number of disconnected network constituents by reduction of a given preliminary probabilistic metabolic network. We define automated metabolic network reconstruction as an optimization problem on a four-partite graph (nodes representing genes, enzymes, reactions, and metabolites) which integrates: (1) probabilistic information obtained from the existing process for metabolic reconstruction from a given genome, (2) connectedness of the raw metabolic network, and (3) clustering of components in the reconstructed metabolic network. The practical implications of our theoretical analysis refer to the quality of reconstructed metabolic networks and shed light on the problem of finding more efficient and effective methods for automated reconstruction. Our main contributions include a completeness result for the defined problem, a polynomial-time approximation algorithm, and an optimal polynomial-time algorithm for trees. Moreover, we exemplify our approach by the reconstruction of the sucrose biosynthesis pathway in Chlamydomonas reinhardtii.

19.
We consider the following problem: Given a set of binary sequences, determine lower bounds on the minimum number of recombinations required to explain the history of the sample, under the infinite-sites model of mutation. The problem has implications for finding recombination hotspots and for the Ancestral Recombination Graph reconstruction problem. Hudson and Kaplan gave a lower bound based on the four-gamete test. In practice, their bound R_m often greatly underestimates the minimum number of recombinations. The problem was recently revisited by Myers and Griffiths, who introduced two new lower bounds R_h and R_s which are provably better, and also yield good bounds in practice. However, the worst-case complexities of their procedures for computing R_h and R_s are exponential and super-exponential, respectively. In this paper, we show that the number of nontrivial connected components, R_c, in the conflict graph for a given set of sequences, computable in time O(nm^2), is also a lower bound on the minimum number of recombination events. We show that in many cases, R_c is a better bound than R_h. The conflict graph was used by Gusfield et al. to obtain a polynomial time algorithm for the galled tree problem, which is a special case of the Ancestral Recombination Graph (ARG) reconstruction problem. Our results also offer some insight into the structural properties of this graph and are of interest for the general Ancestral Recombination Graph reconstruction problem.
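
The connected-components bound described in this abstract can be sketched in a few lines. The version below assumes that two sites conflict when all four gametes (00, 01, 10, 11) occur at that pair of columns, i.e. the four-gamete test; the abstract does not spell out the exact conflict definition, so that choice, the function name `count_conflict_components`, and the input format (equal-length strings of '0' and '1') are assumptions for illustration only.

```python
from itertools import combinations


def count_conflict_components(sequences):
    """Count nontrivial connected components of the conflict graph.

    sequences: list of equal-length strings over '0' and '1'.
    Nodes are sites (columns); two sites are joined when all four
    gametes appear at them (assumed conflict rule). A component is
    nontrivial when it contains at least one edge.
    """
    num_sites = len(sequences[0])
    parent = list(range(num_sites))   # union-find forest over sites
    in_conflict = [False] * num_sites

    def find(site):
        # Find the component representative, halving the path.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    # Checking one pair of sites scans all n sequences, so the m^2
    # pairs cost on the order of n * m^2 steps in total.
    for i, j in combinations(range(num_sites), 2):
        gametes = {(seq[i], seq[j]) for seq in sequences}
        if len(gametes) == 4:
            in_conflict[i] = in_conflict[j] = True
            parent[find(i)] = find(j)

    roots = {find(s) for s in range(num_sites) if in_conflict[s]}
    return len(roots)
```

For example, the seven sequences `0000, 0100, 1000, 1100, 0001, 0010, 0011` have conflicts only between sites 0 and 1 and between sites 2 and 3, so the sketch reports two nontrivial components.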

20.
In this article we consider the problem of determining the minimum cost configuration (number of machines and pallets) for a flexible manufacturing system with the constraint of meeting a prespecified throughput, while simultaneously allocating the total workload among the machines (or groups of machines). Our procedure allows consideration of upper and lower bounds on the workload at each machine group. These bounds arise as a consequence of precedence constraints among the various operations and/or limitations on the number or combinations of operations that can be assigned to a machine because of constraints on tool slots or the space required to store assembly components. Earlier work on problems of this nature assumes that the workload allocation is given. For the single-machine-type problem we develop an efficient implicit enumeration procedure that uses fathoming rules to eliminate dominated configurations, and we present computational results. We discuss how this procedure can be used as a building block in solving the problem with multiple machine types.
