Similar literature
20 similar articles found.
1.
A birth-death process is a continuous-time Markov chain that counts the number of particles in a system over time. In the general process with n current particles, a new particle is born with instantaneous rate λ(n) and a particle dies with instantaneous rate μ(n). Currently no robust and efficient method exists to evaluate the finite-time transition probabilities in a general birth-death process with arbitrary birth and death rates. In this paper, we first revisit the theory of continued fractions to obtain expressions for the Laplace transforms of these transition probabilities and make explicit an important derivation connecting transition probabilities and continued fractions. We then develop an efficient algorithm for computing these probabilities that analyzes the error associated with approximations in the method. We demonstrate that this error-controlled method agrees with known solutions and outperforms previous approaches to computing these probabilities. Finally, we apply our novel method to several important problems in ecology, evolution, and genetics.
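To make the object being computed concrete, here is a minimal Python sketch of finite-time transition probabilities for a birth-death process. It does not use the continued-fraction method of the abstract. It swaps in plain uniformization on a truncated state space, which is only reasonable when the rates and the time horizon are small. The names `transition_probs`, `birth`, `death`, and `n_max` are hypothetical.

```python
import math

def transition_probs(birth, death, start, t, n_max, tol=1e-12):
    """Approximate P(X_t = j | X_0 = start) for j = 0..n_max.

    Assumption: the state space is truncated at n_max (no births there),
    and uniformization replaces the continued-fraction approach.
    """
    lam = [birth(n) if n < n_max else 0.0 for n in range(n_max + 1)]
    mu = [death(n) if n > 0 else 0.0 for n in range(n_max + 1)]
    rate = max(l + m for l, m in zip(lam, mu)) or 1.0  # uniformization rate

    def step(v):
        # One step of the embedded discrete chain P = I + Q / rate.
        out = [0.0] * (n_max + 1)
        for n, p in enumerate(v):
            up, down = lam[n] / rate, mu[n] / rate
            out[n] += p * (1.0 - up - down)
            if n < n_max:
                out[n + 1] += p * up
            if n > 0:
                out[n - 1] += p * down
        return out

    v = [0.0] * (n_max + 1)
    v[start] = 1.0
    weight = math.exp(-rate * t)          # Poisson weight for k = 0
    result = [weight * p for p in v]
    total, k = weight, 0
    # Add Poisson-weighted powers of P until the neglected mass is below tol.
    while 1.0 - total > tol and k < 10_000:
        k += 1
        v = step(v)
        weight *= rate * t / k
        result = [r + weight * p for r, p in zip(result, v)]
        total += weight
    return result

# Pure death process with mu(n) = 0.5 * n, started from one particle.
probs = transition_probs(lambda n: 0.0, lambda n: 0.5 * n, start=1, t=2.0, n_max=5)
```

For this pure death example the survival probability of the single particle is exp(-μt) = exp(-1), which gives a simple check on the sketch.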

2.
Intermodal networks offer much flexibility in transport planning, and have the potential to efficiently consolidate goods, even if these goods have distinct pickup locations and destinations. Typically, there is an abundance of feasible routes and consolidation opportunities, which makes it challenging to quickly identify good solutions. We propose a planning algorithm for dynamic pickup-and-delivery problems in intermodal networks, where freight is consolidated by means of reloads to reduce both costs and emissions. Based on an enumerative arc-expansion procedure, a large number of intermodal routes is generated for each order, of which we store the k best. We subsequently evaluate consolidation opportunities for the k best routes by applying a decision tree structure, taking into account reload operations, timetables, and synchronization of departure windows. Compared to direct road transport, numerical experiments on various virtual problem instances show an average cost saving of 34% and an average reduction in \(\mathrm{CO}_2\) emissions of 30%. Furthermore, we test our algorithm on a real-life case of a leading logistics service provider based in the Netherlands, which yields significant benefits as well, both in terms of costs and environmental impact.

3.
In comparative genomics, algorithms that sort permutations by reversals are often used to propose evolutionary scenarios of rearrangements between species. One of the main problems of such methods is that they give one solution while the number of optimal solutions is huge, with no criteria to discriminate among them. Bergeron et al. started to give some structure to the set of optimal solutions, in order to be able to deliver more presentable results than only one solution or a complete list of all solutions. However, no algorithm exists so far to compute this structure except through the enumeration of all solutions, which takes too much time even for small permutations. Bergeron et al. state as an open problem the design of such an algorithm. We propose in this paper an answer to this problem, that is, an algorithm which gives all the classes of solutions and counts the number of solutions in each class, with a better theoretical and practical complexity than the complete enumeration method. We give an example of how to reduce the number of classes obtained, using further constraints. Finally, we apply our algorithm to analyse the possible scenarios of rearrangement between mammalian sex chromosomes.

4.
K-ary clustering with optimal leaf ordering for gene expression data (total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships in scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, usually lacks a method to actually identify distinct clusters, and produces a large number of possible leaf orderings of the hierarchical clustering tree. In this paper we propose a new hierarchical clustering algorithm which reduces susceptibility to noise, permits up to k siblings to be directly related, and provides a single optimal order for the resulting tree. RESULTS: We present an algorithm that efficiently constructs a k-ary tree, where each node can have up to k children, and then optimally orders the leaves of that tree. By combining k clusters at each step our algorithm becomes more robust against noise and missing values. By optimally ordering the leaves of the resulting tree we maintain the pairwise relationships that appear in the original method, without sacrificing the robustness. Our k-ary construction algorithm runs in \(O(n^3)\) regardless of k and our ordering algorithm runs in \(O(4^k n^3)\). We present several examples that show that our k-ary clustering algorithm achieves results that are superior to the binary tree results in both global presentation and cluster identification. AVAILABILITY: We have implemented the above algorithms in C++ on the Linux operating system.

5.
Multiple loci analysis has become popular with the advanced developments in biological experiments. Many studies have focused on the biological and statistical properties of such multiple loci analysis. In this paper, we study one of the important computational problems: solving the probabilities of haplotype classes from a large linear system Ax = b derived from the recombination events in multiple loci analysis. Since the size of the recombination matrix A increases exponentially with respect to the number of loci, fast solvers are required to deal with a large number of loci in the analysis. By exploiting the structure of the matrix A, we develop an efficient recursive algorithm for solving such structured linear systems. In particular, the complexity of the proposed algorithm for the n loci problem is \(O(n 2^n)\) operations and the memory requirement is \(O(2^n)\) locations for the \(2^n\)-by-\(2^n\) matrix A. Numerical examples are given to demonstrate the effectiveness of our efficient solver. Finally, we apply our proposed method to analyze the haplotype classes for a set of single nucleotide polymorphisms (SNPs) from HapMap data.

6.
Important desired properties of an algorithm to construct a supertree (species tree) by reconciling input trees are its low complexity and applicability to large biological data. In its common formulation the problem is NP-hard, so known exact algorithms take exponential time in practice. We propose a reformulation of the supertree building problem that allows a computationally effective solution. We introduce a biologically natural requirement: the supertree sought must not contain clades incompatible with those existing in the input trees. The algorithm was tested with simulated and biological trees and was shown to have nearly quadratic complexity even if horizontal gene transfers (HGTs) are allowed. If HGTs are not assumed, the algorithm is mathematically correct and has a worst-case running time of \(n^3 \times |V_0|^3\), where n is the number of input trees and \(|V_0|\) is the total number of species. The authors are unaware of analogous solutions in the published literature. The corresponding inference program, its usage examples and manual are freely available at http://lab6.iitp.ru/en/super3gl. The available program does not implement HGTs. The generalized case is described in the publication "A tree nearest in average to a set of trees" (Information Transmission Problems, 2011).

7.
Carlborg O, Andersson L, Kinghorn B. Genetics, 2000, 155(4): 2003-2010
Here we describe a general method for improving computational efficiency in simultaneous mapping of multiple interacting quantitative trait loci (QTL). The method uses a genetic algorithm to search for QTL in the genome instead of an exhaustive enumerative ("step-by-step") search. It can be used together with any method of QTL mapping based on a genomic search, since it only provides a more efficient way to search the genome for QTL. The computational demand decreases by a factor of approximately 130 when using genetic algorithm-based mapping instead of an exhaustive enumerative search for two QTL in a genome size of 2000 cM using a resolution of 1 cM. The advantage of using a genetic algorithm increases further for larger genomes, higher resolutions, and searches for more QTL. We show that a genetic algorithm-based search has efficiency higher than or equal to a search method conditioned on previously identified QTL for all epistatic models tested and that this efficiency is comparable to that of an exhaustive search for multiple QTL. The genetic algorithm is thus a powerful and computationally tractable alternative to the exhaustive enumerative search for simultaneous mapping of multiple interacting QTL. The use of genetic algorithms for simultaneous mapping of more than two QTL and for determining empirical significance thresholds using permutation tests is also discussed.

8.
Micro flow bio-molecular computation (total citations: 1; self-citations: 0; citations by others: 1)
Gehani A, Reif J. Bio Systems, 1999, 52(1-3): 197-216
In this paper we provide a model for micro-flow based bio-molecular computation (MF-BMC). It provides an abstraction for the design of algorithms which account for the constraints of the model. Our MF-BMC model uses abstractions of both the recombinant DNA (RDNA) technology and the micro-flow technology, and takes into account the limitations of both. For example, when considering the efficiency of the recombinant DNA operation of annealing, we take into account the limitation imposed by the concentration of the reactants. The fabrication technology used to construct MEMS is limited to constructing relatively thin 3D structures. We abstract this by limiting the model to a small constant number of layers (as is done with VLSI models). Besides our contribution of the MF-BMC model, the paper contains two other classes of results. The main result is the volume and time efficient algorithm for message routing in the MF-BMC model, specifically useful for PA-Match. We will show that routing of strands between chambers will occur in time \(O(N \times D / (m \times n))\), where N is the number of strands in the MF-BMC, n is the number of chambers where RDNA operations are occurring, D is the diameter of the topology of the layout of the chambers, and m is proportional to the channel width. Operations that need annealing, such as PA-Match, are shown feasible in \(O(N^2 \log N/n/n)\) volume instead of the previous use of \(\Omega(N^2)\) volume, with reasonable time constraints. Applications of the volume efficient algorithm include the use of the Join operation for databases, logarithmic depth solutions to SAT (Boolean formula satisfiability) problems and parallel algorithms that execute on a PRAM. Existing algorithms can be mapped to ones that work efficiently in the MF-BMC model, whereas previous methods for applications such as PRAM simulation in BMC were not both time and volume efficient.
Our other class of results consists of theoretical lower bounds on the quantities of DNA and the time needed to solve a problem in the MF-BMC model, analogous to lower bounds in VLSI. We bound the product \(BT\) from below, and further show that \(BT^2\) has a stronger lower bound of \(I^2\). Here B is the maximum amount of information encoded in the MF-BMC system at a time, T is the time for an algorithm to complete, and I is the information content of a problem.

9.
MOTIVATION: Existing algorithms for automated protein structure alignment generate contradictory results and are difficult to interpret. An algorithm which can provide a context for interpreting the alignment and uses a simple method to characterize protein structure similarity is needed. RESULTS: We describe a heuristic for limiting the search space for structure alignment comparisons between two proteins, and an algorithm for finding minimal root-mean-squared-distance (RMSD) alignments as a function of the number of matching residue pairs within this limited search space. Our alignment algorithm uses coordinates of alpha-carbon atoms to represent each amino acid residue and requires a total computation time of \(O(m^3 n^2)\), where m and n denote the lengths of the protein sequences. This makes our method fast enough for comparisons of moderate-size proteins (fewer than approximately 800 residues) on current workstation-class computers and therefore addresses the need for a systematic analysis of multiple plausible shape similarities between two proteins using a widely accepted comparison metric.

10.
A simple spectrophotometric determination of solid supported amino groups (total citations: 1; self-citations: 0; citations by others: 1)
A simple spectrophotometric method for the quantitative determination of solid phase supported amino groups is described. The method involves reacting the solid support with an excess of an activated acylating agent, N-succinimidyl-3-(2-pyridyldithio)propionate (SPDP), and an efficient acylation catalyst, 4-dimethylaminopyridine. After the unreacted SPDP is thoroughly removed, the solid support is reacted with an excess of dithiothreitol to quantitatively release pyridine-2-thione from the solid support into the solution. After an appropriate dilution, the released pyridine-2-thione, which has a strong absorbance at 343 nm, is quantified by reading its absorbance in a spectrophotometer at 343 nm.

11.
We present two parameterized algorithms for the closest string problem. The first runs in \(O(nL + nd \cdot 17.97^d)\) time for DNA strings and in \(O(nL + nd \cdot 61.86^d)\) time for protein strings, where n is the number of input strings, L is the length of each input string, and d is the given upper bound on the number of mismatches between the center string and each input string. The second runs in \(O(nL + nd \cdot 13.92^d)\) time for DNA strings and in \(O(nL + nd \cdot 47.21^d)\) time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in \(O((n-1)m^2(L + d \cdot 17.97^d \cdot m^{[\log_2(d+1)]}))\) time for DNA strings and in \(O((n-1)m^2(L + d \cdot 61.86^d \cdot m^{[\log_2(d+1)]}))\) time for protein strings, where n is the number of input strings, L is the length of the center substring, L - 1 + m is the maximum length of a single input string, and d is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All the algorithms significantly improve on the previous best results. To verify the theoretical improvements in time complexity experimentally, we implement our algorithm in C and apply the resulting program to the planted (L, d)-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, namely PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases and also for several challenging cases. Our algorithm uses less memory too.
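The closest string problem itself can be stated in a few lines of code. The Python sketch below is a brute-force baseline, not either of the parameterized algorithms from the abstract: it tries every candidate center over the alphabet, so it runs in time exponential in L and is only usable on toy inputs. The function name `closest_string` and the default DNA alphabet are assumptions for illustration.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def closest_string(strings, d, alphabet="ACGT"):
    """Return a center string within Hamming distance d of every input, or None.

    Brute force over all |alphabet|**L candidates (hypothetical baseline).
    """
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        if all(hamming(candidate, s) <= d for s in strings):
            return "".join(candidate)
    return None

center = closest_string(["ACGT", "ACGA", "TCGT"], d=1)
no_center = closest_string(["ACGT", "ACGA", "TCGT"], d=0)
```

In this toy instance `ACGT` is the only string within one mismatch of all three inputs, and no center exists for d = 0.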

12.
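This paper presents a computer algorithm that uses known energy data to construct the minimum-free-energy secondary structure of a single-stranded RNA molecule, and gives a proof of the algorithm's feasibility together with application examples.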

13.
We present the first practical algorithm for the optimal linear leaf ordering of trees that are generated by hierarchical clustering. Hierarchical clustering has been extensively used to analyze gene expression data, and we show how optimal leaf ordering can reveal biological structure that is not observed with an existing heuristic ordering method. For a tree with n leaves, there are \(2^{n-1}\) linear orderings consistent with the structure of the tree. Our optimal leaf ordering algorithm runs in time \(O(n^4)\), and we present further improvements that make the running time of our algorithm practical.
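The count of \(2^{n-1}\) orderings comes from the n - 1 internal nodes of a binary tree, each of which can have its two subtrees flipped independently. The Python sketch below enumerates those orderings and picks the one with the smallest sum of distances between adjacent leaves. This exhaustive search is exponential and is not the \(O(n^4)\) algorithm of the abstract. The nested-tuple tree encoding, the adjacent-distance objective as written here, and all names are assumptions for illustration.

```python
def orderings(tree):
    """Yield every leaf order reachable by flipping internal nodes.

    A tree is either a leaf label or a pair (left, right) of subtrees.
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for a in orderings(left):
        for b in orderings(right):
            yield a + b   # children in the given order
            yield b + a   # children flipped

def ordering_cost(order, value):
    """Sum of distances between neighbouring leaves in a linear order."""
    return sum(abs(value[x] - value[y]) for x, y in zip(order, order[1:]))

tree = ((0, 1), (2, 3))
value = {0: 0.0, 1: 3.0, 2: 10.0, 3: 4.0}   # toy one-dimensional expression values

all_orders = list(orderings(tree))
best = min(all_orders, key=lambda order: ordering_cost(order, value))
```

With four leaves there are 2^3 = 8 orderings. Flipping the right-hand subtree places leaf 3 next to leaf 1, which lowers the cost from 16 for the original order to 10.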

14.
Wang X, Bao Z, Hu J, Wang S, Zhan A. Bio Systems, 2008, 91(1): 117-125
A new DNA computing algorithm based on a ligase chain reaction is demonstrated to solve an SAT problem. The proposed DNA algorithm can solve an n-variable m-clause SAT problem in m steps and the computation time required is \(O(3m + n)\). Instead of generating the full-solution DNA library, we start with an empty test tube and then generate solutions that partially satisfy the SAT formula. These partial solutions are then extended step by step by the ligation of new variables using Taq DNA ligase. Correct strands are amplified and false strands are pruned by a ligase chain reaction (LCR) as soon as they fail to satisfy the conditions. If we score and sort the clauses, we can use this algorithm to markedly reduce the number of DNA strands required throughout the computing process. In a computer simulation, the maximum number of DNA strands required was \(2^{0.48n}\) when n = 50, and the exponent ratio varied inversely with the number of variables n and the clause/variable ratio m/n. This algorithm is highly space-efficient and error-tolerant compared to conventional brute-force searching, and thus can be scaled up to solve large and hard SAT problems.
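The extend-and-prune idea can be imitated in software. The Python sketch below is only a loose in-silico analogue of the laboratory procedure: it starts from a single empty partial assignment, processes one clause per step, extends every surviving partial assignment with the variables that clause introduces, and discards any extension that falsifies the clause. The clause encoding (signed integers, as in DIMACS-style literals) and the name `solve_by_extension` are assumptions for illustration.

```python
from itertools import product

def solve_by_extension(clauses):
    """Return all assignments (dicts var -> bool) satisfying every clause.

    Clauses are lists of nonzero ints: k means variable k, -k its negation.
    Only variables that occur in some clause are assigned.
    """
    partials = [{}]      # the "empty test tube": one empty partial solution
    seen = []            # variables introduced so far
    for clause in clauses:
        fresh = sorted({abs(lit) for lit in clause} - set(seen))
        seen.extend(fresh)
        extended = []
        for partial in partials:
            for values in product([False, True], repeat=len(fresh)):
                cand = {**partial, **dict(zip(fresh, values))}
                # Keep the candidate only if it satisfies the current clause.
                if any(cand[abs(lit)] == (lit > 0) for lit in clause):
                    extended.append(cand)
        partials = extended
    return partials

solutions = solve_by_extension([[1, 2], [-1, 3], [-2, -3]])
unsat = solve_by_extension([[1], [-1]])
```

The order in which clauses are processed changes how many partial assignments are alive at each step, which is the software counterpart of the clause scoring and sorting mentioned in the abstract.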

15.
We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may greatly vary. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length l can be calculated in \(O(l)\) time. We implemented the algorithm using suffix arrays. Our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth." To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. Comparing their outcome and running time to ours, using "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. The simplicity and speed of our method allow for a whole genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.

16.
An algorithm for approximate tandem repeats (total citations: 4; self-citations: 0; citations by others: 4)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g., abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g., abcdaacd. In this paper we consider two criteria of similarity: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k our algorithm reports all locally optimal approximate repeats, \(r = \bar{u}\hat{u}\), for which the Hamming distance of \(\bar{u}\) and \(\hat{u}\) is at most k, in \(O(nk \log(n/k))\) time, or all those for which the edit distance of \(\bar{u}\) and \(\hat{u}\) is at most k, in \(O(nk \log k \log(n/k))\) time. This paper concentrates on a more general type of repeat called multiple tandem repeats. A multiple tandem repeat in a sequence S is a (periodic) substring r of S of the form \(r = u^a u'\), where u is a prefix of r and u' is a prefix of u. An approximate multiple tandem repeat is a multiple repeat with errors; the repeated subsequences are similar but not identical. We precisely define approximate multiple repeats, and present an algorithm that finds all repeats that concur with our definition. The time complexity of the algorithm, when searching for repeats with up to k errors in a string S of length n, is \(O(nka \log(n/k))\), where a is the maximum number of periods in any reported repeat. We present some experimental results concerning the performance and sensitivity of our algorithm. The problem of finding repeats within a string is a computational problem with important applications in the field of molecular biology. Both exact and inexact repeats occur frequently in the genome, and certain repeats occurring in the genome are known to be related to diseases in humans.
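The definition of an approximate single tandem repeat under Hamming distance can be checked directly. The Python sketch below is a naive cubic-time scan, not the algorithm of the abstract, and it reports every qualifying substring instead of only the locally optimal ones. The name `approx_tandem_repeats` and the output tuple layout are assumptions for illustration.

```python
def approx_tandem_repeats(s, k):
    """List (start, half_length, mismatches) for each substring of s whose
    two equal-length halves differ in at most k positions (Hamming distance).
    """
    hits = []
    n = len(s)
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            first = s[start:start + half]
            second = s[start + half:start + 2 * half]
            mismatches = sum(a != b for a, b in zip(first, second))
            if mismatches <= k:
                hits.append((start, half, mismatches))
    return hits

perfect = approx_tandem_repeats("abcabc", 0)
approximate = approx_tandem_repeats("abcdaacd", 1)
```

On the two example strings from the abstract, `abcabc` yields the single perfect repeat with halves `abc` and `abc`, and `abcdaacd` contains the repeat with halves `abcd` and `aacd`, which differ in one position.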

17.
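The most critical compound in the enzyme-linked immunosorbent assay (ELISA) is the enzyme-antibody conjugate, and a cross-linking agent is needed to couple the enzyme to the antibody. We used N-succinimidyl 3-(2-pyridyldithio)propionate (SPDP) to couple horseradish peroxidase (HRP) to rabbit anti-mouse IgG (rabbit IgG). We tested different SPDP/HRP, SPDP/IgG, and HRP/IgG ratios in order to obtain a highly active enzyme-antibody conjugate. We also studied methods for removing free HRP and free IgG from the conjugate. SDS-PAGE and electrophoretic transfer to nitrocellulose membrane showed that the conjugate prepared by this method contains no self-polymers of HRP or IgG. When the conjugate preparations were assayed by ELISA, the working dilution generally exceeded 1:10,000 and in some cases reached 1:20,000 (at a conjugate concentration of \(A_{280\,\mathrm{nm}} = 1.0\) and a substrate color development of \(A_{492\,\mathrm{nm}} = 1.0\)).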

18.
Genome rearrangement is an important area in computational biology and bioinformatics. The translocation operation is one of the popular operations for genome rearrangement. It was proved that computing the unsigned translocation distance is NP-hard. In this paper, we present a \((1.5 + \epsilon)\)-approximation algorithm for computing unsigned translocation distance which improves upon the best known 1.75-ratio. The running time of our algorithm is \(O(n^2 + (4/\epsilon)^{1.5} \sqrt{\log(4/\epsilon)}\, 2^{4/\epsilon})\), where n is the total number of genes in the genome.

19.
We make a novel contribution to the theory of biopolymer folding, by developing an efficient algorithm to compute the number of locally optimal secondary structures of an RNA molecule, with respect to the Nussinov-Jacobson energy model. Additionally, we apply our algorithm to analyze the folding landscape of selenocysteine insertion sequence (SECIS) elements from A. Bock (personal communication), hammerhead ribozymes from Rfam (Griffiths-Jones et al., 2003), and tRNAs from Sprinzl's database (Sprinzl et al., 1998). It had previously been reported that tRNA has lower minimum free energy than random RNA of the same compositional frequency (Clote et al., 2003; Rivas and Eddy, 2000), although the situation is less clear for mRNA (Seffens and Digby, 1999; Workman and Krogh, 1999; Cohen and Skienna, 2002), which plays no structural role. Applications of our algorithm extend knowledge of the energy landscape differences between naturally occurring and random RNA. Given an RNA molecule \(a_1, \ldots, a_n\) and an integer \(k \geq 0\), a k-locally optimal secondary structure S is a secondary structure on \(a_1, \ldots, a_n\) which has k fewer base pairs than the maximum possible number, yet for which no base pairs can be added without violation of the definition of secondary structure (e.g., introducing a pseudoknot). Despite the fact that the number numStr(k) of k-locally optimal structures for a given RNA molecule in general is exponential in n, we present an algorithm running in time \(O(n^4)\) and space \(O(n^3)\), which computes numStr(k) for each k. Structurally important RNA, such as SECIS elements, hammerhead ribozymes, and tRNA, all have a markedly smaller number of k-locally optimal structures than that of random RNA of the same dinucleotide frequency, for small and moderate values of k. This suggests a potential future role of our algorithm as a tool to detect noncoding RNA genes.
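The "maximum possible number" of base pairs in the definition above is the quantity computed by the classic Nussinov-style dynamic program. The Python sketch below computes only that maximum and does not count k-locally optimal structures, so it is background for the abstract and not its \(O(n^4)\) algorithm. The allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin loop of three unpaired bases, and the name `max_base_pairs` are assumptions for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    Assumptions: A-U, G-C and G-U may pair, and paired positions i < j
    must satisfy j - i > min_loop.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]   # best[i][j]: optimum on seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                      # position j unpaired
            for k in range(i, j - min_loop):            # position j paired with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

hairpin_pairs = max_base_pairs("GGGAAAUCC")
```

For the toy hairpin `GGGAAAUCC`, the three stacked pairs G-C, G-C, G-U enclose the AAA loop, so the maximum is three pairs. A structure with k fewer pairs than this maximum that admits no further pair is what the abstract calls k-locally optimal.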

20.
Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, but also structural features of the inspected molecule. This makes the computation more challenging, and usually restricts the comparison to small RNAs. In this paper we introduce two local metrics for comparing RNAs that extend the Smith-Waterman metric and its normalized version used for string comparison. We also present a global RNA alignment algorithm which handles affine gap penalties. Our global algorithm runs in \(O(m^2 n(1 + \lg(n/m)))\) time, while our local algorithms run in \(O(m^2 n(1 + \lg(n/m)))\) and \(O(n^2 m)\) time, respectively, where m ...
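For orientation, the string version of the Smith-Waterman local score that the abstract extends can be sketched in a few lines of Python. This is the plain sequence-only recurrence with a linear gap penalty. It ignores RNA structure and affine gaps, so it is not one of the algorithms described above. The scoring values and the name `smith_waterman` are assumptions for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between strings a and b (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best

score = smith_waterman("GGACGU", "ACGUCC")
```

The shared segment `ACGU` aligns with four matches, giving a local score of 8 under these toy parameters, regardless of the unrelated flanking characters.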
