Similar literature
1.

Background

RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. Over more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a dynamic programming approach similar to that presented by Nussinov and Jacobson in the late 1970s, which typically yields algorithms with cubic worst-case running time. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data.
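As a rough illustration of the cubic-time dynamic programming approach mentioned above, the following is a minimal sketch of Nussinov-style base-pair maximization. It is a textbook simplification, not the algorithm of the listed paper: the function name, the `min_loop` parameter, and the set of allowed pairs (Watson-Crick plus G-U wobble) are assumptions made for the example.

```python
def nussinov_max_pairs(seq, min_loop=0):
    """Return the maximum number of nested base pairs in an RNA sequence.

    Illustrative sketch only: the pairing rules and min_loop are assumptions.
    """
    # Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving more than min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]
```

The three nested loops over `span`, `i`, and `k` are what give the cubic worst-case running time that sub-cubic techniques aim to improve.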

Results

We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars.

Conclusions

The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.

2.
Similarity problems intensively investigated in computational molecular biology have the following two stringology models: find the longest string included in any string of a given finite language, and find the shortest string including every string of a given finite language. These two problems are exemplified by the two well-known pairs of problems, the longest common subsequence (or substring) problem and the shortest common supersequence (or superstring) problem.

In this paper we consider opposite problems connected with string non-inclusion relations: find the shortest string included in no string of a given finite language and find the longest string including no string of a given finite language. The predicate “string α is not included in string β” is interpreted either as “α is not a subsequence of β” or as “α is not a substring of β”. The main purpose is to determine the complexity status of the non-similarity problems. Using graph approaches, we present NP-hardness proofs for the first interpretation and polynomial-time algorithms for the second one. Special cases of the problems and related issues are discussed.


3.
The properties (or labels) of nodes in networks can often be predicted based on their proximity and their connections to other labeled nodes. So-called “label propagation algorithms” predict the labels of unlabeled nodes by propagating information about local label density iteratively through the network. These algorithms are fast, simple and scale to large networks but nonetheless regularly perform better than slower and much more complex algorithms on benchmark problems. We show here, however, that these algorithms have an intrinsic limitation that prevents them from adapting to some common patterns of network node labeling; we introduce a new algorithm, 3Prop, that retains all their advantages but is much more adaptive. As we show, 3Prop performs very well on node labeling problems ill-suited to label propagation, including predicting gene function in protein and genetic interaction networks and gender in friendship networks, and also performs slightly better on problems already well-suited to label propagation such as labeling blogs and patents based on their citation networks. 3Prop gains its adaptability by assigning separate weights to label information from different steps of the propagation. Surprisingly, we found that for many networks, the third iteration of label propagation receives a negative weight.
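The idea of weighting label information from different propagation steps can be sketched as follows. This is only an illustration under stated assumptions, not the authors' 3Prop implementation: the degree-normalized averaging rule, the function names `propagate_steps` and `combine_steps`, and the hand-picked weights are all hypothetical, whereas the abstract indicates that the weights are assigned per step by the method itself.

```python
def propagate_steps(adj, labels, steps=3):
    """Propagate label scores over a graph and keep the result of every step.

    adj is a square adjacency matrix (list of lists); labels holds initial
    scores, e.g. +1 / -1 for known labels and 0 for unlabeled nodes.
    Assumption: each step replaces a node's score by its neighbors' average.
    """
    n = len(adj)
    current = [float(x) for x in labels]
    history = []
    for _ in range(steps):
        nxt = []
        for i in range(n):
            degree = sum(adj[i])
            total = sum(adj[i][j] * current[j] for j in range(n))
            nxt.append(total / degree if degree else 0.0)
        history.append(nxt)
        current = nxt
    return history


def combine_steps(history, weights):
    """Combine per-step scores with one (possibly negative) weight per step."""
    n = len(history[0])
    return [sum(w * step[i] for w, step in zip(weights, history))
            for i in range(n)]
```

For example, on a three-node path graph with only the first node labeled, `combine_steps(propagate_steps(adj, [1, 0, 0]), [1.0, 1.0, -1.0])` gives the third step a negative weight, mirroring the observation reported in the abstract.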

Availability

The code is available from the authors by request.

4.
Molecular biologists strive to infer evolutionary relationships from quantitative macromolecular comparisons obtained by immunological, DNA hybridization, electrophoretic or amino acid sequencing techniques. The problem is to find unrooted phylogenies that best approximate a given dissimilarity matrix according to a goodness-of-fit measure, for example the least-squares-fit criterion or Farris's f statistic. Computational costs of known algorithms guaranteeing optimal solutions to these problems increase exponentially with problem size; practical computational considerations limit the algorithms to analyzing small problems. It is established here that problems of phylogenetic inference based on the least-squares-fit criterion and the f statistic are NP-complete and thus are so difficult computationally that efficient optimal algorithms are unlikely to exist for them. The Natural Sciences and Engineering Research Council of Canada partially supported this research through an individual operating grant (A4142) to W.H.E. Day.

5.
The potential effectiveness of statistical haplotype inference has made it an area of active exploration over the last decade. There are several complications of statistical inference, including: the same algorithm can produce different solutions for the same data set, which reflects the internal algorithm variability; different algorithms can give different solutions for the same data set, reflecting the discordance among algorithms; and the algorithms per se are unable to evaluate the reliability of the solutions even if they are unique, this being a general limitation of all inference methods. With the aim of increasing the confidence of statistical inference results, a consensus strategy appears to be an effective means to deal with these problems. Several authors have explored this with different emphases. Here we discuss two recent studies examining the internal algorithm variability and among-algorithm discordance, respectively, and evaluate the different outcomes of these analyses in light of Orzack's (2009) comment. Until other, better methods are developed, a combination of these two approaches should provide a practical way to increase the confidence of statistical haplotyping results.

6.
《Genomics》2020,112(5):3207-3217
Cancer subtype stratification, which may help in making better decisions when treating cancer patients, is one of the most crucial and challenging problems in cancer studies. To this end, various computational methods have been proposed, such as feature selection, which enhances classification accuracy and is an NP-hard problem. However, the performance of the applied methods is still low and can be increased by state-of-the-art, efficient methods. We used 11 efficient and popular meta-heuristic algorithms including WCC, LCA, GA, PSO, ACO, ICA, LA, HTS, FOA, DSOS and CUK along with an SVM classifier to stratify human breast cancer molecular subtypes using mRNA and micro-RNA expression data. The applied algorithms select 186 mRNAs and 116 miRNAs out of 9692 mRNAs and 489 miRNAs, respectively. Although some of the selected mRNAs and miRNAs are common to the results of different algorithms, six miRNAs, including miR-190b, miR-18a, miR-301a, miR-34c-5p, miR-18b, and miR-129-5p, were selected by three or more different algorithms. Further, six mRNAs, including HAUS6, LAMA2, TSPAN33, PLEKHM3, GFRA3, and DCBLD2, were chosen by two different algorithms. We have reported these miRNAs and mRNAs as important diagnostic biomarkers for the stratification of breast cancer subtypes. By investigating the literature, it is also observed that most of our reported mRNAs and miRNAs have been proposed and introduced as biomarkers in cancer subtype stratification.

7.
Pedigree data structures have a number of applications in genetics, including the estimation of allelic or haplotype probabilities in humans and agricultural species, and the estimation of breeding values in agricultural species. Sequential algorithms for general purpose CPU-based computers are commonly used, but are inadequate for some tasks on large data sets. We show that pedigree data can be directly represented on Field Programmable Gate Arrays (FPGA), allowing highly efficient massively parallel simulation of the flow of genes. Operating on the whole pedigree in parallel, the transmission of genes can occur for all individuals in a single clock cycle. By using FPGA, the algorithms to estimate inbreeding coefficients and allelic probabilities are shown to operate hundreds to thousands of times faster than the corresponding sequentially based algorithms. Where problems can be largely represented in an integer form, FPGA provide an efficient platform for computations on pedigree data.

8.

Background  

Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data.

9.
Many swarm optimization algorithms have been introduced, from Evolutionary Programming in the early 1960s to the more recent Grey Wolf Optimization. All of these algorithms have demonstrated their potential to solve many optimization problems. This paper provides an in-depth survey of well-known optimization algorithms. Selected algorithms are briefly explained and compared with each other comprehensively through experiments conducted using thirty well-known benchmark functions. Their advantages and disadvantages are also discussed. A number of statistical tests are then carried out to determine which performance differences are significant. The results indicate an overall advantage for Differential Evolution (DE), closely followed by Particle Swarm Optimization (PSO), compared with the other approaches considered.

10.
This article is a detailed case study of a particular FMS that will be operational in 1989. It describes the daily planning and operating problems that will need to be addressed. The algorithms that will operate this system are presented. Given the daily changing production requirements, the algorithms begin with an aggregate planning feasibility check. Then planning, scheduling, inventory management, and breakdowns are addressed. The key problems in operating this system are tool management problems. Detailed tooling data and their analysis are presented in an appendix to address these problems.

11.
The organization of order picking operations is one of the most critical issues in warehouse management. In this paper, novel tabu search (TS) algorithms integrated with a novel clustering algorithm are proposed to solve the order batching and picker routing problems jointly for multiple-cross-aisle warehouse systems. A clustering algorithm that generates an initial solution for the TS algorithms is developed to provide fast and effective solutions to the order-batching problem. Unlike most common picker routing heuristics, we model the routing problem of pickers as a classical TSP and propose efficient Nearest Neighbor+Or-opt and Savings+2-Opt heuristics to meet the specific features of the problem. Various problem instances, differing in the number of orders, weight of items, and picking coordinates, are generated randomly, and detailed numerical experiments are carried out to evaluate the performance of the proposed methods. In conclusion, the TS algorithms prove to be the most efficient methods in terms of solution quality and computational efficiency.
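To give a sense of how a construction heuristic and an improvement heuristic combine when picker routing is modeled as a TSP, here is a minimal sketch that builds a tour with nearest neighbor and then improves it with 2-opt. This particular pairing, the Euclidean distance, and all function names are assumptions chosen for brevity; the abstract itself names Nearest Neighbor+Or-opt and Savings+2-Opt, and a real warehouse would use aisle-constrained distances.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given index order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_neighbor_tour(points, start=0):
    """Construction step: always move to the closest unvisited location."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda c: (math.dist(last, points[c]), c))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(points, tour):
    """Improvement step: reverse segments while doing so shortens the tour."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best
```

A typical call would be `two_opt(points, nearest_neighbor_tour(points))`, where `points` is a list of `(x, y)` picking coordinates.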

12.
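Global minimization methods and their applications in structural biology have made remarkable progress in recent years. The suitably simplified molecular docking problem is a good target for global minimization methods and is currently a quite active research area. Docking can be divided into two classes: detailed docking, used mainly for de novo ligand design, and coarse docking, used for screening databases of known compounds to discover drugs; the two place different demands on global minimization algorithms. Newly emerging stochastic and deterministic global minimization algorithms suited to the docking problem are briefly reviewed; among them, potential smoothing algorithms appear very promising and deserve close attention.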

13.

Background  

Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered.

14.
15.

Background  

Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems.

16.
A new type of supervised learning algorithm for estimating multidimensional functions is considered. These methods, based on Support Vector Machines, are widely used due to their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data. Support vector machines and related kernel methods are extremely good at solving prediction problems in computational biology. Background on statistical learning theory and kernel feature spaces is given, including practical and algorithmic considerations.

17.
An efficient rank based approach for closest string and closest substring   Total citations: 1, self-citations: 0, citations by others: 1
Dinu LP  Ionescu R 《PloS one》2012,7(6):e37576
This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.

18.
Computational scientists have developed algorithms inspired by natural evolution for at least 50 years. These algorithms solve optimization and design problems by building solutions that are 'more fit' relative to desired properties. However, the basic assumptions of this approach are outdated. We propose a research programme to develop a new field: computational evolution. This approach will produce algorithms that are based on current understanding of molecular and evolutionary biology and could solve previously unimaginable or intractable computational and biological problems.

19.
Finding motifs using random projections.   Total citations: 19, self-citations: 0, citations by others: 19

20.
Bioinspired algorithms, such as evolutionary algorithms and ant colony optimization, are widely used for different combinatorial optimization problems. These algorithms rely heavily on the use of randomness and are hard to understand from a theoretical point of view. This paper contributes to the theoretical analysis of ant colony optimization and studies this type of algorithm on one of the most prominent combinatorial optimization problems, namely the traveling salesperson problem (TSP). We present a new construction graph and show that it has a stronger local property than one commonly used for constructing solutions of the TSP. The rigorous runtime analysis for two ant colony optimization algorithms, based on these two construction procedures, shows that they lead to good approximation in expected polynomial time on random instances. Furthermore, we point out in which situations our algorithms get trapped in local optima and show where the use of the right amount of heuristic information is provably beneficial.
