Similar articles
 Found 20 similar articles; search took 187 ms
1.
In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. The algorithm uses both score-based bounding and a novel bounding technique based on the "consistency" of the alignment. A sequence-order-independent search tree is used in conjunction with a technique for avoiding redundant calculations inherent in the structure of the tree. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a short fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences and a dataset of 85 lipocalin protein sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 21 sequences of length 100, more than twice the number of sequences which can be aligned by the best previous exact algorithm. The algorithm can relax the constraint of requiring each sequence to be aligned, and align 105 of the 300 promoter sequences with a motif width of 6. For the lipocalin dataset, we introduce a technique for reducing the effective alphabet size with a minimal loss of useful information. With this technique, we show that the program can find meaningful motifs in a reasonable amount of time by optimizing the score over three motif positions.

2.
Multiple surface clustering is a challenging task, particularly when the surfaces intersect. Available methods such as Isomap fail to capture the true shape of the surface near the intersection and result in incorrect clustering. The Isomap algorithm uses the shortest path between points. The main drawback of the shortest path algorithm is the lack of a curvature constraint, which allows a path to connect points on different surfaces. In this paper we tackle this problem by imposing a curvature constraint on the shortest path algorithm used in Isomap. The algorithm chooses several landmark nodes at random and then checks whether there is a curvature-constrained path between each landmark node and every other node in the neighborhood graph. We build a binary feature vector for each point, where each entry represents the connectivity of that point to a particular landmark. The binary feature vectors can then be used as input to a conventional clustering algorithm such as hierarchical clustering. We apply our method to simulated and some real datasets and show that it performs comparably to the best methods, such as K-manifold and spectral multi-manifold clustering.

3.
Buzas JS, Wager CG, Lansky DM. Biometrics 2011, 67(4):1189-1196
This article explores effective implementation of split-plot designs in serial dilution bioassay using robots. We show that the shortest path for a robot to fill plate wells for a split-plot design is equivalent to the shortest common supersequence problem in combinatorics. We develop an algorithm for finding the shortest common supersequence, provide an R implementation, and explore the distribution of the number of steps required to implement split-plot designs for bioassay through simulation. We also show how to construct collections of split plots that can be filled in a minimal number of steps, thereby demonstrating that split-plot designs can be implemented with nearly the same effort as strip-plot designs. Finally, we provide guidelines for modeling data that result from these designs.
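
Note: to make the shortest common supersequence (SCS) problem concrete, the following is a minimal Python sketch for the two-sequence case only, using the textbook dynamic program. It is not the authors' algorithm or their R implementation; the function name `shortest_common_supersequence` is hypothetical, and the general multi-sequence problem that the abstract refers to is considerably harder.

```python
def shortest_common_supersequence(a: str, b: str) -> str:
    """Return one shortest string that contains both a and b as subsequences.

    Illustrative two-sequence dynamic program (an assumption for this note,
    not the method of the cited article).
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of the SCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                dp[i][j] = m - j          # only b's suffix remains
            elif j == m:
                dp[i][j] = n - i          # only a's suffix remains
            elif a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]   # share one symbol
            else:
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to rebuild one optimal supersequence.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] <= dp[i][j + 1]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return "".join(out)
```

The length of the result equals len(a) + len(b) minus the length of the longest common subsequence, which is a convenient sanity check on small inputs.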

4.
MOTIVATION: Multiple sequence alignment is an important tool in computational biology. In order to compute multiple alignments in affordable time, the most commonly used multiple alignment methods have to use heuristics. Nevertheless, the computation of optimal multiple alignments is important in its own right, and it provides a means of evaluating heuristic approaches or serves as a subprocedure of heuristic alignment methods. RESULTS: We present an algorithm that uses the divide-and-conquer alignment approach together with recent results on search space reduction to speed up the computation of multiple sequence alignments. The method is adaptive in that, depending on the time one wants to spend on the alignment, a better, up to optimal, alignment can be obtained. To speed up the computation in the optimal alignment step, we apply the A* algorithm, which leads to a procedure provably more efficient than previous exact algorithms. We also describe our implementation of the algorithm and present results showing the effectiveness and limitations of the procedure.

5.
Yang J, Chen Y. PLoS ONE 2011, 6(7):e22557
Betweenness centrality is an essential index for analysis of complex networks. However, the calculation of betweenness centrality is quite time-consuming, and the fastest known algorithm uses O(N(M + N log N)) time and O(N + M) space for weighted networks, where N and M are the number of nodes and edges in the network, respectively. By inserting virtual nodes into the weighted edges and transforming the shortest path problem into a breadth-first search (BFS) problem, we propose an algorithm that can compute the betweenness centrality in O(wDN^2) time for integer-weighted networks, where w is the average weight of edges and D is the average degree in the network. Considerable time can be saved with the proposed algorithm when w < log N/D + 1, indicating that it is suitable for lightly weighted large sparse networks. A similar concept of virtual node transformation can be used to calculate other shortest-path-based indices such as closeness centrality, graph centrality, stress centrality, and so on. Numerical simulations on various randomly generated networks reveal that it is feasible to use the proposed algorithm in large network analysis.
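
Note: the virtual-node idea can be illustrated with a short Python sketch. It covers only the transformation step (an integer-weighted edge of weight w becomes a chain of w unit edges) and the resulting BFS distances; it does not compute betweenness centrality and is not the authors' code. The names `expand_weighted_graph` and `bfs_distances` are hypothetical, the graph is assumed undirected with positive integer weights, and at most one edge per node pair is assumed.

```python
from collections import deque


def expand_weighted_graph(edges):
    """Replace each edge (u, v, w) by a chain of w unit edges.

    Virtual nodes are labelled ("virt", u, v, k); real node labels are
    assumed not to collide with these tuples.
    """
    adj = {}

    def link(x, y):
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)

    for u, v, w in edges:
        prev = u
        for k in range(1, w):             # w - 1 virtual nodes per edge
            virt = ("virt", u, v, k)
            link(prev, virt)
            prev = virt
        link(prev, v)
    return adj


def bfs_distances(adj, source):
    """Hop counts from source; on the expanded graph these equal the
    weighted shortest-path distances of the original graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


edges = [("a", "b", 3), ("b", "c", 1), ("a", "c", 5)]
dist = bfs_distances(expand_weighted_graph(edges), "a")
# dist["c"] is 4: the route a-b-c (3 + 1) beats the direct edge of weight 5.
```

The expansion adds w - 1 nodes per edge, which is why the approach pays off only when edge weights are small, as the abstract's condition on w indicates.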

6.
In the most basic application of Ant Colony Optimization (ACO), a set of artificial ants find the shortest path between a source and a destination. Ants deposit pheromone on paths they take, preferring paths that have more pheromone on them. Since shorter paths are traversed faster, more pheromone accumulates on them in a given time, attracting more ants and leading to reinforcement of the pheromone trail on shorter paths. This is a positive feedback process that can also cause trails to persist on longer paths, even when a shorter path becomes available. To counteract this persistence on a longer path, ACO algorithms employ remedial measures, such as using negative feedback in the form of uniform evaporation on all paths. Obtaining high performance in ACO algorithms typically requires fine tuning several parameters that govern pheromone deposition and removal. This paper proposes a new ACO algorithm, called EigenAnt, for finding the shortest path between a source and a destination, based on selective pheromone removal that occurs only on the path that is actually chosen for each trip. We prove that the shortest path is the only stable equilibrium for EigenAnt, which means that it is maintained for arbitrary initial pheromone concentrations on paths, and even when path lengths change with time. The EigenAnt algorithm uses only two parameters and does not require them to be finely tuned. Simulations that illustrate these properties are provided.

7.
In this study, we address a job sequencing and tool switching problem arising in flexible manufacturing systems. We consider the single machine problem of minimizing total flow time. We prove that the problem is NP-hard in the strong sense and show that the tool switching problem is polynomially solvable for a given sequence. We propose a branch-and-bound algorithm whose efficiency is improved by precedence relations and several lower and upper bounding techniques. Our computational results reveal that the branch-and-bound approach produces optimal solutions in reasonable times for moderate-sized problems. Our upper bounds produce very satisfactory solutions; therefore they can be an attractive alternative for solving larger problems.

8.
Several localized position-based routing algorithms for wireless networks were described recently. In the greedy routing algorithm (whose performance is close to that of the shortest path algorithm, if successful), the sender or node S currently holding the message m forwards m to the neighbor that is closest to the destination. The algorithm fails if S does not have any neighbor that is closer to the destination than S. The FACE algorithm guarantees the delivery of m if the network, modeled by a unit graph, is connected. The GFG algorithm combines the greedy and FACE algorithms. The greedy algorithm is applied as long as possible, until delivery or a failure. In case of failure, the algorithm switches to the FACE algorithm until a node closer to the destination than the last failure node is found, at which point the greedy algorithm is applied again. Past traffic does not need to be memorized at nodes. In this paper we further improve the performance of the GFG algorithm by reducing its average hop count. First we improve the FACE algorithm by adding a sooner-back procedure for earlier escape from FACE mode. Then we perform a shortcut procedure at each forwarding node S. Node S uses the local information available to calculate as many hops as possible and forwards the packet to the last known hop directly instead of forwarding it to the next hop. The second improvement is based on the concept of dominating sets. Each node in the network is classified as internal or not, based on the geographic position of its neighboring nodes. The network of internal nodes defines a connected dominating set, i.e., each node must be either internal or directly connected to an internal node. In addition, internal nodes are connected. We apply several existing definitions of internal nodes, namely the concepts of intermediate, inter-gateway and gateway nodes. We propose to run GFG routing, enhanced by the shortcut procedure, on the dominating set, except possibly the first and last hops.
The performance of the proposed algorithms is measured by comparing their average hop count with the hop count of the basic GFG algorithm and the benchmark shortest path algorithm, and very significant improvements were obtained for low-degree graphs. More precisely, we obtained a localized routing algorithm that guarantees delivery and has very low excess in terms of hop count compared to the shortest path algorithm. The experimental data show that the length of the additional path (in excess of the shortest path length) can be reduced to about half of that of the existing GFG algorithm.
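
Note: the greedy step described in this abstract (forward to the neighbor closest to the destination, fail when no neighbor is closer than the current node) can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the function name `greedy_route`, the dictionary-based inputs, and the use of Euclidean distance in the plane are all assumptions, and the FACE recovery mode, the sooner-back procedure, and the shortcut procedure are deliberately left out.

```python
import math


def greedy_route(positions, neighbors, source, dest):
    """Greedy position-based forwarding.

    positions: node -> (x, y) coordinates (assumed known to each node)
    neighbors: node -> list of directly reachable nodes
    Returns (path, delivered). delivered is False when the current node
    has no neighbor strictly closer to dest than itself, which is the
    point where GFG would switch to FACE mode.
    """
    path = [source]
    current = source
    while current != dest:
        d_current = math.dist(positions[current], positions[dest])
        best = min(
            neighbors[current],
            key=lambda n: math.dist(positions[n], positions[dest]),
            default=None,
        )
        if best is None or math.dist(positions[best], positions[dest]) >= d_current:
            return path, False            # local minimum: greedy fails here
        path.append(best)
        current = best
    return path, True
```

Because the distance to the destination strictly decreases at every hop, the loop cannot revisit a node, so no record of past traffic is needed, which matches the memoryless property mentioned in the abstract.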

9.
Metabolic pathway analysis web service (Pathway Hunter Tool at CUBIC)   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Pathway Hunter Tool (PHT) is a fast, robust and user-friendly tool to analyse the shortest paths in metabolic pathways. The user can perform shortest path analysis for one or more organisms or can build virtual organisms (networks) using enzymes. Using PHT, the user can also calculate the average shortest path (Jungnickel, 2002 Graphs, Network and Algorithm. Springer-Verlag, Berlin), the average alternate path and the top 10 hubs in the metabolic network. PHT also makes it possible to compare metabolic connectivity and to observe the cross talk between metabolic pathways among various sequenced genomes. RESULTS: A new algorithm for finding the biochemically valid connectivity between metabolites in a metabolic network was developed and implemented. A predefined manual assignment of side metabolites (such as ATP, ADP, water, CO2, etc.) and main metabolites is not necessary, as the new concept uses chemical structure information (global and local similarity) between metabolites for identification of the shortest path.

10.
We have parallelized the FASTA algorithm for biological sequence comparison using Linda, a machine-independent parallel programming language. The resulting parallel program runs on a variety of different parallel machines. A straightforward parallelization strategy works well if the amount of computation to be done is relatively large. When the amount of computation is reduced, however, disk I/O becomes a bottleneck which may prevent additional speed-up as the number of processors is increased. The paper describes the parallelization of FASTA, and uses FASTA to illustrate the I/O bottleneck problem that may arise when performing parallel database search with a fast sequence comparison algorithm. The paper also describes several program design strategies that can help with this problem. The paper discusses how this bottleneck is an example of a general problem that may occur when parallelizing, or otherwise speeding up, a time-consuming computation. Received on July 25, 1990; accepted on October 15, 1990

11.
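A new nucleic acid sequence alignment algorithm and its application to global sequence alignment   Cited by: 1 (self-citations: 0, citations by others: 1)
The dynamic programming algorithm now widely used for sequence alignment yields optimal alignments, but its high computational complexity, O(N^2), greatly reduces computational efficiency. A multi-stage dynamic programming decision algorithm was applied to pairwise sequence alignment and implemented in Visual BASIC. The new algorithm reduces the computational complexity to O(N) while still achieving reasonably good accuracy, and it is expected to play an important role in global sequence alignment.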
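
Note: for reference, the quadratic-time dynamic programming baseline that this abstract contrasts itself with can be sketched in Python as below. This is the classical global alignment recurrence (Needleman-Wunsch style), not the multi-stage O(N) algorithm proposed in the article, whose details the abstract does not give. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty.

    Fills a (len(a)+1) x (len(b)+1) table row by row, keeping only the
    previous row, so time is O(len(a) * len(b)) and memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]   # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap]                           # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap                    # gap in b
            left = cur[j - 1] + gap               # gap in a
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]
```

Every cell of the table is visited once, which is the source of the O(N^2) cost for two sequences of length N that motivates faster approximate schemes.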

12.
BACKGROUND: Several deterministic and stochastic combinatorial optimization algorithms have been applied to computational protein design and homology modeling. As structural targets increase in size, however, it has become necessary to find more powerful methods to address the increased combinatorial complexity. RESULTS: We present a new deterministic combinatorial search algorithm called 'Branch-and-Terminate' (B&T), which is derived from the Branch-and-Bound search method. The B&T approach is based on the construction of an efficient but very restrictive bounding expression, which is used for the search of a combinatorial tree representing the protein system. The bounding expression is used both to determine the optimal organization of the tree and to perform a highly effective pruning procedure named 'termination'. For some calculations, the B&T method rivals the current deterministic standard, dead-end elimination (DEE), sometimes finding the solution up to 21 times faster. A more significant feature of the B&T algorithm is that it can provide an efficient way to complete the optimization of problems that have been partially reduced by a DEE algorithm. CONCLUSIONS: The B&T algorithm is an effective optimization algorithm when used alone. Moreover, it can increase the problem size limit of amino acid sidechain placement calculations, such as protein design, by completing DEE optimizations that reach a point at which the DEE criteria become inefficient. Together the two algorithms make it possible to find solutions to problems that are intractable by either algorithm alone.

13.
Photographs of mistletoe (Viscum album L.) berries taken by a permanently fixed camera during their development in autumn were subjected to an outline shape analysis by fitting path curves using a mathematical algorithm from projective geometry. During growth and maturation processes the shape of mistletoe berries can be described by a set of such path curves, making it possible to extract changes of shape using one parameter called Lambda. Lambda describes the outline shape of a path curve. Here we present methods and software to capture and measure these changes of form over time. The present paper describes the software used to automate a number of tasks, including contour recognition, optimization of the contour fit via hill-climbing, derivation of the path curves, computation of Lambda, and blinding the pictures for the operator. The validity of the program is demonstrated by results from three independent measurements showing circadian rhythm in mistletoe berries. The program is available as open source and will be applied in a project to analyze the chronobiology of shape in mistletoe berries and the buds of their host trees.

14.
MOTIVATION: Algorithm development for finding typical patterns in sequences, especially multiple pseudo-repeats (pseudo-periodic regions), is at the core of many problems arising in biological sequence and structure analysis. In fact, one of the most significant features of biological sequences is their high quasi-repetitiveness. Variation in the quasi-repetitiveness of genomic and proteomic texts demonstrates the presence and density of different biologically important information. It is very important to develop sensitive automatic computational methods for the identification of pseudo-periodic regions of sequences through which we can infer, describe and understand biological properties, and seek precise molecular details of biological structures, dynamics, interactions and evolution. RESULTS: We develop a novel, powerful computational tool for partitioning a sequence into pseudo-periodic regions. The pseudo-periodic partition is defined as a partition which intuitively has the minimal bias to some perfect-periodic partition of the sequence based on the evolutionary distance. We devise a quadratic time and space algorithm for detecting a pseudo-periodic partition for a given sequence, which actually corresponds to the shortest path in the main diagonal of the directed (acyclic) weighted graph constructed by the Smith-Waterman self-alignment of the sequence. We use several typical examples to demonstrate the utilization of our algorithm and software system in detecting functional or structural domains and regions of proteins. A big advantage of our software program is that there is a parameter, the granularity factor, associated with it, and we can freely choose a biological sequence family as a training set to determine the best parameter. In general, we choose all repeats (including many pseudo-repeats) in the SWISS-PROT amino acid sequence database as a typical training set.
We show that the granularity factor is 0.52 and the average agreement accuracy of pseudo-periodic partitions, detected by our software for all pseudo-repeats in the SWISS-PROT database, is as high as 97.6%.

15.
The proliferation of cloud data center applications and network function virtualization (NFV) drives dynamic, QoS-dependent traffic into data center networks. Currently, many network routing protocols are requirement-agnostic, while other QoS-aware protocols are computationally complex and inefficient for small flows. In this paper, a computationally efficient congestion avoidance scheme, called CECT, for software-defined cloud data centers is proposed. The proposed algorithm, CECT, not only minimizes network congestion but also reallocates the resources based on the flow requirements. To this end, we use a routing architecture to reconfigure the network resources triggered by two events: (1) the elapsing of a predefined time interval, or (2) the occurrence of congestion. Moreover, a forwarding table entry compression technique is used to reduce the computational complexity of CECT. We mathematically formulate an optimization problem and define a genetic algorithm to solve it. We test the proposed algorithm on real-world network traffic. Our results show that CECT is computationally fast and the solution is feasible in all cases. In order to evaluate our algorithm in terms of throughput, CECT is compared with ECMP (where the shortest path algorithm is used as the cost function). Simulation results confirm that the throughput obtained by running CECT is improved by up to 3× compared to ECMP, while packet loss is decreased by up to 2×.

16.
17.
An artificial neural network with a two-layer feedback topology and generalized recurrent neurons, for solving nonlinear discrete dynamic optimization problems, is developed. A direct method to assign the weights of neural networks is presented. The method is based on Bellman's Optimality Principle and on the interchange of information which occurs during the synaptic chemical processing among neurons. The neural-network-based algorithm is an advantageous approach for dynamic programming due to the inherent parallelism of neural networks; further, it reduces the severity of the computational problems that can occur in conventional methods. Some illustrative application examples, including the shortest path and fuzzy decision making problems, are presented to show how this approach works.
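
Note: the optimality principle this abstract builds on can be made concrete for the shortest path example with a short Python sketch. It is the conventional sequential dynamic program (Bellman-Ford style relaxation of the cost-to-go), given here only as a point of comparison; it is not the neural network method of the article. The function name `cost_to_go` and the edge-list input format are assumptions, and the graph is assumed to have no negative-weight cycles.

```python
def cost_to_go(edges, target):
    """Minimal cost from every node to target.

    edges: list of directed (u, v, w) triples, meaning u -> v with cost w.
    Repeatedly applies the optimality condition
        J(u) = min over edges (u, v, w) of  w + J(v),   J(target) = 0,
    until no value changes (at most len(nodes) - 1 sweeps are needed).
    """
    nodes = {target}
    for u, v, _ in edges:
        nodes.update((u, v))

    cost = {n: float("inf") for n in nodes}
    cost[target] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if cost[v] + w < cost[u]:
                cost[u] = cost[v] + w     # a cheaper continuation was found
                changed = True
        if not changed:
            break
    return cost


costs = cost_to_go([("s", "a", 1), ("a", "t", 2), ("s", "t", 5)], "t")
# costs["s"] is 3.0: going through "a" (1 + 2) is cheaper than the direct edge.
```

Each sweep updates every node independently of the others, which hints at why a parallel realization such as a neural network is a natural fit for this recurrence.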

18.
We propose new algorithms for computing pairwise rearrangement scenarios that conserve the combinatorial structure of genomes. More precisely, we investigate the problem of sorting signed permutations by reversals without breaking common intervals. We describe a combinatorial framework for this problem that allows us to characterize classes of signed permutations for which one can compute, in polynomial time, a shortest reversal scenario that conserves all common intervals. In particular, we define a class of permutations for which this computation can be done in linear time with a very simple algorithm that does not rely on the classical Hannenhalli-Pevzner theory for sorting by reversals. We apply these methods to the computation of rearrangement scenarios between permutations obtained from 16 synteny blocks of the X chromosomes of the human, mouse, and rat.
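
Note: as a reminder of the basic operation involved, a reversal on a signed permutation reverses the order of a contiguous segment and flips the sign of every element in it. The Python sketch below shows only this single operation; the function name `apply_reversal` and the 0-based inclusive indices are assumptions, and the sketch does not attempt the paper's scenario computation or the common-interval constraint.

```python
def apply_reversal(perm, i, j):
    """Return a new signed permutation with perm[i..j] reversed and negated.

    Indices are 0-based and inclusive; the input list is left unchanged.
    """
    if not (0 <= i <= j < len(perm)):
        raise ValueError("need 0 <= i <= j < len(perm)")
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


# One reversal sorts this permutation into the identity:
# the segment (-3, -2) becomes (2, 3).
sorted_perm = apply_reversal([1, -3, -2, 4], 1, 2)
```

A reversal scenario is a sequence of such operations that transforms one signed permutation into another; the abstract concerns shortest scenarios in which no reversal breaks a common interval.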

19.
Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significant, can be compared in polynomial time. Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important. For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be a difficult computational problem that belongs to the set of NP-Complete problems. Comparing two trees can be viewed as a search problem in graphs. A* is a search algorithm that explores the search space in an efficient order. Using a good lower bound estimation of the degree of difference between the two trees, A* can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suitable for calculating tree edit distance. We show here that A* is able to perform an edit distance measurement in reasonable time for trees with dozens of nodes.

20.
The determination of the secondary structure topology is a critical step in deriving the atomic structure from the protein density map obtained by electron cryo-microscopy. This step often relies on the matching of two sources of information. One source comes from the secondary structures detected from the protein density map at medium resolution, such as 5-10 Å. The other source comes from the secondary structures predicted from the amino acid sequence. Due to the inaccuracy in either source of information, a pool of possible secondary structure positions needs to be sampled. This paper studies how to reduce the computation of the mapping when the inaccuracy of the secondary structure predictions is considered. We present a method that combines the concept of a dynamic graph with our previous work on using a constrained shortest path to identify the topology of the secondary structures. We show a 34.55% reduction in run-time compared with the naïve way of handling the inaccuracies. We also show improved accuracy when the potential secondary structure errors are explicitly sampled versus the use of one consensus prediction. Our framework demonstrates the potential of developing computationally effective exact algorithms to identify the optimal topology of the secondary structures when the inaccuracy of the predicted data is considered.
