Similar Literature
20 similar documents found.
1.
The artificial bee colony (ABC) algorithm is a recent class of swarm intelligence algorithms loosely inspired by the foraging behavior of honeybee swarms. It was introduced in 2005 with continuous optimization problems as an example application. As has happened with other swarm intelligence techniques, several researchers have since studied variants of the original algorithm. Unfortunately, these variants have often been tested under different experimental conditions and with different fine-tuning efforts for the algorithm parameters. In this article, we review variants of the original ABC algorithm and experimentally study nine ABC algorithms under two settings: using the original parameter settings proposed by their authors, or using an automatic algorithm configuration tool with the same tuning effort for each algorithm. We also study the effect of adding local search to the ABC algorithms. Our experimental results show that local search can considerably improve the performance of several ABC variants and strongly reduces the performance differences between them. We also show that the best ABC variants are competitive with recent state-of-the-art algorithms on the benchmark set we used, establishing ABC algorithms as serious competitors in continuous optimization.
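As background for the variants discussed above, the following minimal Python sketch shows the canonical ABC candidate move from the original algorithm, in which one dimension of a food source is perturbed toward or away from a random neighbour; the function and parameter names are illustrative, not taken from any specific variant studied in the paper.

```python
import random

def abc_candidate(food_sources, i, lower, upper):
    """Canonical ABC move: perturb one dimension of solution i relative
    to a randomly chosen neighbour k (v_id = x_id + phi * (x_id - x_kd))."""
    x = food_sources[i][:]
    k = random.choice([j for j in range(len(food_sources)) if j != i])
    d = random.randrange(len(x))                      # dimension to modify
    phi = random.uniform(-1.0, 1.0)                   # scaling factor in [-1, 1]
    x[d] = x[d] + phi * (x[d] - food_sources[k][d])
    x[d] = min(max(x[d], lower), upper)               # clamp to the search box
    return x

# Tiny usage example on a 2-D population of 10 food sources
pop = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
print(abc_candidate(pop, 0, -5.0, 5.0))
```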

2.

Background

The use of novel algorithmic techniques is pivotal to many important problems in the life sciences. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the rapid pace of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools in widespread use.

Results

To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is both efficient and versatile.

Conclusion

We anticipate that SeqAn will greatly simplify the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components fundamental to the field of sequence analysis. This not only eases the implementation of new algorithms, but also enables sound analysis and comparison of existing ones.

3.
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and for objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads, to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites, and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that parameters that are optimal on one dataset typically do not generalize to other datasets. Nevertheless, default parameters showed the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.
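Qeseq's actual recalibration procedure is not reproduced here, but the following minimal Python sketch illustrates the general neighbor-joining idea of merging nearby read-enriched windows into candidate regions; the window representation, threshold, and gap parameter are illustrative assumptions only.

```python
def merge_enriched_windows(window_counts, threshold, max_gap=1):
    """Join neighbouring windows whose read count exceeds `threshold`
    into candidate enriched regions (indices are window positions)."""
    regions, start, prev = [], None, None
    for i, c in enumerate(window_counts):
        if c >= threshold:
            if start is None:
                start = i
            elif i - prev > max_gap:       # gap too large: close the region
                regions.append((start, prev))
                start = i
            prev = i
    if start is not None:
        regions.append((start, prev))
    return regions

print(merge_enriched_windows([1, 9, 8, 0, 0, 7, 7, 1], threshold=5))
# -> [(1, 2), (5, 6)]
```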

4.

Background

RNA secondary structure prediction is a mainstream bioinformatics domain and is key to the computational analysis of functional RNA. For more than 30 years, much research has been devoted to defining variants of RNA structure prediction problems and to developing techniques for improving prediction quality. Nevertheless, most algorithms in this field follow a dynamic programming approach similar to the one presented by Nussinov and Jacobson in the late 1970s, which typically yields algorithms with cubic worst-case running time. Recently, several algorithmic approaches have been applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data.

Results

We study Valiant's classical algorithm for context-free grammar recognition in sub-cubic time and extract features common to problems to which Valiant's approach can be applied. Based on this, we describe several problem templates and formulate generic algorithms that use Valiant's technique and can be applied to all problems that fit these templates, including many problems concerning RNA secondary structures and context-free grammars.

Conclusions

The algorithms presented in this paper improve the theoretical asymptotic worst-case running time bounds for a large family of important problems. It is also possible that the suggested techniques yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones currently known for reducing the asymptotic running time bounds of the standard algorithms.
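To make the cubic baseline mentioned in the Background concrete, here is a minimal Python sketch of the Nussinov-style recurrence that the above techniques aim to accelerate; the pairing rule is deliberately simplified (Watson-Crick pairs only, no minimum loop length), so it is an illustration rather than a production predictor.

```python
def nussinov_max_pairs(seq, pairs={("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}):
    """O(n^3) Nussinov-style DP: maximum number of nested base pairs."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # interval length minus one
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]              # case: position i left unpaired
            for k in range(i + 1, j + 1):    # case: i paired with some k
                if (seq[i], seq[k]) in pairs:
                    left = dp[i + 1][k - 1] if k > i + 1 else 0
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + left + right)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov_max_pairs("GGGAAAUCC"))  # -> 3 for this toy sequence
```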

5.
Microarray gene expression data generally suffer from missing values due to a variety of experimental reasons. Since missing data points can adversely affect downstream analysis, many algorithms have been proposed to impute them. In this survey, we provide a comprehensive review of existing missing-value imputation algorithms, focusing on their underlying algorithmic techniques, on how they utilize local or global information from within the data, and on their use of domain knowledge during imputation. In addition, we describe how imputation results can be validated, discuss the different ways of assessing the performance of imputation algorithms, and outline some possible future research directions. We hope this review gives readers a good understanding of the current developments in this field and inspires the next generation of imputation algorithms.
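As a concrete instance of an imputation method that uses local information from within the data, the sketch below implements a simple k-nearest-neighbour scheme in Python; it is a generic illustration, not any specific algorithm from the survey.

```python
import numpy as np

def knn_impute(X, k=3):
    """Impute NaNs in each row from the k most similar fully observed rows
    (Euclidean distance over the co-observed columns)."""
    X = X.astype(float).copy()
    for i in np.argwhere(np.isnan(X).any(axis=1)).ravel():
        obs = ~np.isnan(X[i])
        donors = np.flatnonzero(~np.isnan(X).any(axis=1))   # complete rows
        d = np.linalg.norm(X[donors][:, obs] - X[i, obs], axis=1)
        nearest = donors[np.argsort(d)[:k]]
        X[i, ~obs] = X[nearest][:, ~obs].mean(axis=0)       # average the donors
    return X

M = np.array([[1.0, 2.0, 3.0],
              [1.1, np.nan, 3.2],
              [0.9, 2.1, 2.9],
              [5.0, 6.0, 7.0]])
print(knn_impute(M, k=2))   # the NaN is filled with mean(2.0, 2.1) = 2.05
```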

6.

Background  

Partitioning a protein into structural components, known as domains, is an important initial step in protein classification and in functional and evolutionary studies. While systematic domain assignments by human experts exist (CATH and SCOP), the introduction of high-throughput technologies for structure determination threatens to overwhelm expert approaches. A variety of algorithmic methods have been developed to expedite this process, allowing almost instant structural decomposition into domains. Algorithmic methods can approach 85% agreement with the expert consensus on the number of domains. However, each algorithm takes a somewhat different conceptual approach, with unique strengths and weaknesses. Currently there is no simple way to automatically compare assignments from different structure-based domain assignment methods, which would provide a comprehensive view of possible structure partitionings as well as insight into the tendencies of particular algorithms. Most importantly, a consensus assignment drawn from multiple assignment methods can provide a single and presumably more accurate view.

7.
8.
Neuroprosthetic devices such as a computer cursor can be controlled by the activity of cortical neurons when an appropriate algorithm is used to decode motor intention. Algorithms proposed for this purpose range from the simple population vector algorithm (PVA) and optimal linear estimator (OLE) to various Bayesian decoders. Although Bayesian decoders typically provide the most accurate off-line reconstructions, it is not known which of their model assumptions are critical for improving decoding performance. Furthermore, improvements (or deficits) in off-line reconstruction do not necessarily translate into improvements (or deficits) in on-line control, as the subject may compensate for the specifics of the decoder in use. Here, by comparing the performance of nine decoders, we show that assumptions about uniformly distributed preferred directions and about how cursor trajectories are smoothed have the greatest impact on decoder performance in off-line reconstruction, while assumptions about tuning-curve linearity and spike-count variance play relatively minor roles. In on-line control, subjects compensate for the directional biases caused by non-uniformly distributed preferred directions, leaving differences in cursor smoothing as the largest single algorithmic factor driving decoder performance.
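For reference, the simplest of the decoders mentioned above, the population vector algorithm, can be sketched in a few lines of Python; the baselines and preferred directions here are toy values for illustration.

```python
import numpy as np

def population_vector(rates, baselines, preferred_dirs):
    """PVA: weight each unit's preferred direction by its baseline-subtracted
    firing rate and sum, yielding a decoded movement direction."""
    w = rates - baselines                          # modulation of each neuron
    v = (w[:, None] * preferred_dirs).sum(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)         # unit-length direction

# Toy example: 4 neurons with preferred directions spread on the unit circle
angles = np.linspace(0, 2 * np.pi, 4, endpoint=False)
pds = np.stack([np.cos(angles), np.sin(angles)], axis=1)
rates = np.array([10.0, 6.0, 2.0, 6.0])            # strongest at angle 0
print(population_vector(rates, np.full(4, 6.0), pds))   # ~[1, 0]
```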

9.
Current computer simulation techniques provide robust tools for studying the detailed structure and functional dynamics of proteins, as well as their interactions with each other and with other biomolecules. In this minireview, we illustrate recent progress and future challenges in computer modeling by discussing computational studies of ATP-binding cassette (ABC) transporters. ABC transporters have multiple components that work in a well-coordinated fashion to enable active transport across membranes. The mechanism by which members of this superfamily execute transport remains largely unknown. Molecular dynamics simulations initiated from high-resolution crystal structures of several ABC transporters have proven useful in investigating the nature of the conformational coupling events that may drive transport. In addition, fruitful efforts have been made to predict unknown structures of medically relevant ABC transporters, such as P-glycoprotein, using homology-based computational methods. The techniques described here are also applicable to gaining an atomically detailed understanding of the functional mechanisms of proteins in general.

10.
In many technical fields, single-objective optimization procedures in continuous domains involve expensive numerical simulations. In this context, an improvement of the Artificial Bee Colony (ABC) algorithm, called the Artificial super-Bee enhanced Colony (AsBeC), is presented. AsBeC is designed to provide fast convergence speed, high solution accuracy and robust performance over a wide range of problems. It implements enhancements of the ABC structure and hybridizations with interpolation strategies. The latter are inspired by the quadratic trust region approach for local investigation and by an efficient global optimizer for separable problems. Each modification and their combined effects are studied with appropriate metrics on a numerical benchmark, which is also used for comparing AsBeC with some effective ABC variants and other derivative-free algorithms. In addition, the presented algorithm is validated on two recent benchmarks adopted for competitions in international conferences. Results show remarkable competitiveness and robustness for AsBeC.

11.

The Artificial Bee Colony (ABC) algorithm is a nature-inspired algorithm that has shown its efficiency for optimization. However, the ABC algorithm suffers from an imbalance between exploration and exploitation. To improve exploitation and enhance convergence speed, this paper proposes a multi-population ABC algorithm based on global and local optima (MPGABC). First, in MPGABC, the initial population is generated using both chaotic systems and opposition-based learning. The colony is divided into several sub-populations to increase diversity. Moreover, the solution search mechanism is modified by introducing global and local optima into the search equations of both employed and onlooker bees. The scout bees in the proposed algorithm are generated in the same way as the initial population. Finally, the proposed algorithm is compared with several state-of-the-art ABC variants on a set of 13 classical benchmark functions. The experimental results show that MPGABC is competitive with, and outperforms, the other ABC variants.
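The exact MPGABC equations are not reproduced here, but the following Python sketch conveys the two ingredients the abstract describes: opposition-based initialisation and a search move guided by the global best. All coefficient ranges and function names are illustrative assumptions, not the paper's definitions.

```python
import random

def opposition_init(n, dim, lo, hi, fitness):
    """Opposition-based initialisation: generate random points plus their
    opposites (lo + hi - x) and keep the fitter half (lower fitness is better)."""
    pts = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    opp = [[lo + hi - v for v in p] for p in pts]
    return sorted(pts + opp, key=fitness)[:n]

def guided_candidate(x, neighbour, gbest, lo, hi):
    """Search move pulled toward the global best, in the spirit of a
    gbest-guided ABC search equation (illustrative coefficients)."""
    d = random.randrange(len(x))
    phi = random.uniform(-1.0, 1.0)
    psi = random.uniform(0.0, 1.5)
    v = x[:]
    v[d] = x[d] + phi * (x[d] - neighbour[d]) + psi * (gbest[d] - x[d])
    v[d] = min(max(v[d], lo), hi)
    return v

sphere = lambda p: sum(v * v for v in p)           # toy objective
pop = opposition_init(10, 2, -5.0, 5.0, sphere)    # pop[0] is the current best
print(guided_candidate(pop[1], pop[2], pop[0], -5.0, 5.0))
```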


12.
Position weight matrices are an important method for modeling signals or motifs in biological sequences, in both DNA and protein contexts. In this paper, we present fast algorithms for finding significant matches of such matrices. Our algorithms are of the online type, and they generalize the classical multipattern matching, filtering, and superalphabet techniques of combinatorial string matching to the weight matrix matching problem. Several variants of the algorithms are developed, including multiple-matrix extensions that search for several matrices in one scan through the sequence database. An experimental performance evaluation compares the new techniques against each other as well as against other online and index-based algorithms proposed in the literature. Compared to the brute-force O(mn) approach, our solutions can be faster by a factor proportional to the matrix length m. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches (p = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.
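The brute-force O(mn) baseline that these algorithms improve on is easy to state; below is a minimal Python sketch, with an invented toy matrix of log-odds scores standing in for a real PWM.

```python
def pwm_scan(seq, pwm, threshold):
    """Brute-force O(mn) scan: report positions where the score of the
    length-m window meets `threshold` (pwm is a list of per-column dicts)."""
    m = len(pwm)
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits

# Toy 3-column matrix of log-odds scores (values are illustrative)
pwm = [{"A": 1.0, "C": -1.0, "G": -1.0, "T": 0.2},
       {"A": -1.0, "C": 1.2, "G": -0.5, "T": -1.0},
       {"A": 0.5, "C": -1.0, "G": 1.0, "T": -1.0}]
print(pwm_scan("AACGTACG", pwm, threshold=2.0))   # hits at positions 1 and 5
```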

13.
ABC1K atypical kinases in plants: filling the organellar kinase void
Surprisingly few protein kinases have been demonstrated in chloroplasts or mitochondria. Here, we discuss the activity of bc(1) complex kinase (ABC1K) protein family, whose members we suggest localize to mitochondria and plastids, thus filling the kinase void. The ABC1Ks are atypical protein kinases whose ancestral function is the regulation of quinone synthesis. ABC1Ks have proliferated from one or two members in non-photosynthetic organisms to more than 16 members in algae and higher plants. In this review, we reconstruct the evolutionary history of the ABC1K family, provide a functional domain analysis for angiosperms, and propose a nomenclature for ABC1Ks in Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa) and maize (Zea mays). Finally, we hypothesize that the targets of ABC1Ks include enzymes of prenyl-lipid metabolism as well as components of the organellar gene expression machineries.

14.
We present an original approach to identifying sequence variants in a mixed DNA population from sequence trace data. The heart of the method is parsimony: given a wild-type DNA sequence, a set of observed variations at each position collected from sequencing data, and a complete catalog of all possible mutations, determine the smallest set of mutations from the catalog that fully explains the observed variations. The algorithmic complexity of the problem is analyzed for several classes of mutations, including block substitutions, single-range deletions, and single-range insertions. The reconstruction problem is shown to be NP-complete for single-range insertions and deletions, while for block substitutions, single-character insertions, and single-character deletions, polynomial-time algorithms are provided. Once a minimum set of mutations compatible with the observed sequence is found, the relative frequencies of those mutations are recovered by solving a system of linear equations. Simulation results show the algorithm successfully deconvolving mutations in p53 known to cause cancer. An extension of the algorithm is proposed as a new method for high-throughput screening of single nucleotide polymorphisms by multiplexing DNA.
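Once a minimum mutation set is fixed, recovering relative frequencies reduces to a linear system, as the following toy Python example illustrates; the incidence matrix and observed variant fractions are invented for illustration and do not come from the paper.

```python
import numpy as np

# A[i, j] = 1 if mutation j changes observed position i; b[i] is the
# observed variant fraction at position i (toy numbers).
A = np.array([[1.0, 0.0],     # position 1 affected by mutation 1 only
              [1.0, 1.0],     # position 2 affected by both mutations
              [0.0, 1.0]])    # position 3 affected by mutation 2 only
b = np.array([0.30, 0.50, 0.20])

# Least-squares solve recovers the relative mutation frequencies
freqs, *_ = np.linalg.lstsq(A, b, rcond=None)
print(freqs)                  # ~[0.3, 0.2]
```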

15.
Background

Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Given the limitations of approaches that analyze single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves, both individually and in conjunction with one another. Constructing models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing such models has so far been computationally infeasible.

Results

We have implemented an algorithm, known as greedy RLS, which we use to perform the first known wrapper-based feature selection at the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space-efficient variant of greedy RLS that trades running time for memory. These approaches are compared with traditional wrapper-based feature selection implementations based on support vector machines (SVMs) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension - UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS achieves better classification performance on independent test data than a classifier trained using features selected by a statistical p-value-based filter, which is currently the most popular approach for constructing predictive models in GWAS.

Conclusions

Greedy RLS is the first known implementation of a machine-learning-based method capable of conducting wrapper-based feature selection on an entire GWAS containing several thousand examples and over 400,000 variants. In our experiments, greedy RLS selected a highly predictive subset of genetic variants in a fraction of the time spent by wrapper-based selection methods used together with SVM classifiers. The proposed algorithms are freely available as part of the RLScore software library at http://users.utu.fi/aatapa/RLScore/.
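Greedy RLS itself relies on matrix-calculus short-cuts that are not shown here; the following naive Python sketch only illustrates the wrapper-style greedy search with a regularised least-squares scorer, at far higher cost than the actual algorithm. Names and the toy data are illustrative.

```python
import numpy as np

def greedy_ridge_selection(X, y, n_select, lam=1.0):
    """Naive greedy wrapper: at each step, add the feature whose inclusion
    minimises the regularised least-squares training error."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        def sse(cols):
            Xs = X[:, cols]
            w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(cols)), Xs.T @ y)
            return float(((Xs @ w - y) ** 2).sum())
        best = min(remaining, key=lambda j: sse(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
y = X[:, 3] - 2 * X[:, 7] + 0.1 * rng.normal(size=50)   # signal in cols 3, 7
print(greedy_ridge_selection(X, y, n_select=2))          # typically [7, 3]
```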

16.
Obtaining satisfactory results with neural networks depends on the availability of large data samples, and small training sets generally reduce performance. Most classical Quantitative Structure-Activity Relationship (QSAR) studies for a specific enzyme system have been performed on small data sets. We focus on the neuro-fuzzy prediction of the biological activities of HIV-1 protease inhibitory compounds when inferring from small training sets, and we propose two computational intelligence prediction techniques suitable for this setting, at the expense of some computational overhead. Both techniques are based on the FAMR model, a Fuzzy ARTMAP (FAM) incremental learning system used for classification and probability estimation. During the learning phase, each sample pair is assigned a relevance factor proportional to the importance of that pair. The two algorithms proposed in this paper are: 1) GA-FAMR, a new algorithm consisting of two stages: a) first, a genetic algorithm (GA) optimizes the relevances assigned to the training data, which improves the generalization capability of the FAMR; b) second, the optimized relevances are used to train the FAMR. 2) Ordered FAMR, derived from a known algorithm; instead of optimizing relevances, it optimizes the order of data presentation using the algorithm of Dagher et al. In our experiments, we compare these two algorithms with an algorithm not based on the FAM, the FS-GA-FNN introduced in [4], [5]. We conclude that when inferring from small training sets, both techniques are efficient in terms of generalization capability and execution time, and the computational overhead they introduce is compensated by better accuracy. Finally, the proposed techniques are used to predict the biological activities of newly designed potential HIV-1 protease inhibitors.

17.
The focus of research in swarm intelligence has largely been on the algorithmic side, with relatively little attention paid to the study of problems and the behaviour of algorithms in relation to problems. When a new algorithm or a variation on an existing algorithm is proposed in the literature, there is seldom any discussion or analysis of its weaknesses or of the kinds of problems on which it is expected to fail. Fitness landscape analysis is an approach that can be used to analyse optimisation problems. By characterising problems in terms of fitness landscape features, the link between problem types and algorithm performance can be studied. This article investigates a number of measures for analysing the ability of a search process to improve fitness on a particular problem (called evolvability in the literature, but referred to as searchability in this study to broaden the scope to non-evolutionary search techniques). A number of existing fitness landscape analysis techniques originally proposed for discrete problems are adapted to work in continuous search spaces. For a range of benchmark problems, the proposed searchability measures are viewed alongside performance measures for a traditional global-best particle swarm optimisation (PSO) algorithm. Empirical results show that no single measure can be used as a predictor of PSO performance, but that multiple measures of different fitness landscape features can be used together to predict PSO failure.
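For context, the global-best PSO update whose performance the searchability measures aim to predict is sketched below in Python; the inertia and acceleration coefficients are standard textbook choices, used here purely for illustration.

```python
import random

def pso_step(pos, vel, pbest, gbest, w=0.72, c1=1.49, c2=1.49):
    """One velocity/position update of global-best PSO for a single particle:
    inertia plus stochastic pulls toward the personal and global bests."""
    new_vel = [w * v
               + c1 * random.random() * (pb - x)
               + c2 * random.random() * (gb - x)
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel

# Toy 2-D particle pulled toward its personal best and the swarm best
print(pso_step([0.5, -1.0], [0.0, 0.0], [0.4, -0.8], [0.1, 0.2]))
```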

18.
In this article, we address the problem of designing a string with optimal complementarity properties with respect to another given string according to a given criterion. The motivation comes from a drug design application, in which the complementarity between two sequences (proteins) is measured according to the values of the hydropathic coefficients associated with the sequence elements (amino acids). We present heuristic and exact optimization algorithms, and we report on computational experiments on amino peptides taken from Semaphorin and human interleukin-1β, which have already been investigated in the literature using heuristic algorithms. With our techniques, we proved the optimality of a known solution for Semaphorin-3A and discovered several other optimal and near-optimal solutions in a short computing time; we also found, in fractions of a second, an optimal solution for human interleukin-1β whose complementarity value is an order of magnitude better than previously known ones. The source code of a prototype C++ implementation of our algorithms is freely available for noncommercial use on the web. As a main result, we showed that in this context mathematical programming methods are more successful than heuristics such as simulated annealing. Our algorithm unfolds its potential especially when different measures are used for scoring peptides, and it provides not only a single optimal solution but a ranking of provably good ones; this ranking can then be used by biologists as a starting basis for further refinements, simulations, or in vitro experiments.
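The paper's exact objective function is not reproduced here; the toy Python sketch below only illustrates the style of hydropathy-based complementarity score such methods optimise, using Kyte-Doolittle values and an invented position-wise penalty that rewards opposite hydropathy.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def complementarity(p1, p2):
    """Illustrative position-wise score: opposite hydropathy at aligned
    positions yields a low (good) value; the paper's criterion may differ."""
    return sum((KD[a] + KD[b]) ** 2 for a, b in zip(p1, p2))

print(complementarity("ILK", "KDA"))   # I/K, L/D, K/A roughly cancel -> 4.86
```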

19.
A structure-based approach for prediction of MHC-binding peptides
Identification of immunodominant peptides is the first step in the rational design of peptide vaccines aimed at T-cell immunity. Advances in sequencing techniques and the accumulation of many protein sequences without the purified protein challenge the development of computer algorithms that identify dominant T-cell epitopes from sequence data alone. Here, we focus on antigenic peptides recognized by cytotoxic T cells. The selection of T-cell epitopes along a protein sequence is influenced by the specificity of each of the processing stages that precede antigen presentation. The most selective of these stages is the binding of the peptides to the major histocompatibility complex (MHC) molecules, and therefore many predictive algorithms focus on this stage. Most of these algorithms are based on known binding peptides whose sequences have been used to characterize binding motifs or profiles. Here, we describe a structure-based algorithm that does not rely on previous binding data. It is based on the observation from crystal structures that many bound peptides adopt similar conformations and placements within the MHC groove. The algorithm uses a structural template of the peptide in the MHC groove, onto which candidate peptides are threaded; their fit to the groove is evaluated by statistical pairwise potentials. It can rank all possible peptides along a protein sequence or within a suspected group of peptides, directing experimental efforts towards the most promising candidates. This approach is especially useful when no previous peptide binding data are available.
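The statistical pairwise potentials used by the algorithm are derived from structural data; the Python sketch below only illustrates the threading-and-scoring idea, with an invented contact map and toy potential values (lower energy means a better fit).

```python
def thread_score(peptide, pocket_contacts, potential):
    """Score a peptide threaded onto a fixed template: sum a pairwise
    potential over (peptide residue, contacting groove residue) pairs."""
    return sum(potential.get((p, m), 0.0)
               for p, contacts in zip(peptide, pocket_contacts)
               for m in contacts)

# Toy example: two peptide positions, each contacting groove residues
pocket = [["Y", "F"], ["E"]]                 # contacts per peptide position
pot = {("L", "Y"): -1.2, ("L", "F"): -0.9, ("K", "E"): -2.0}
candidates = ["LK", "LA", "GK"]
print(sorted(candidates, key=lambda pep: thread_score(pep, pocket, pot)))
# best-fitting candidate ("LK") ranks first
```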

20.
Comparative analyses of cellular interaction networks enable understanding of the cell's modular organization through identification of functional modules and complexes. These techniques often rely on topological features such as connectedness and density, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work has focused on efficient algorithms for identifying such functional modules and their conservation. In spite of algorithmic advances, the development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools. One critical, and as yet unresolved, aspect of this infrastructure is a measure of the statistical significance of a match or of a dense subcomponent. In the absence of analytical measures, conventional methods rely on computationally expensive simulations based on ad hoc models for quantifying significance. In this paper, we present techniques for analytically quantifying the statistical significance of dense components under reference model graphs. We consider two reference models: a G(n, p) model, in which each pair of nodes in a graph has an identical likelihood p of sharing an edge, and a two-level G(n, p) model, which accounts for the high-degree hub nodes generally observed in interaction networks. Experiments performed on a rich collection of protein-protein interaction (PPI) networks show that the proposed model provides a reliable means of evaluating the statistical significance of dense patterns in these networks. We also adapt existing state-of-the-art network clustering algorithms by using our statistical significance measure as an optimization criterion. Comparison of the resulting module identification algorithm, SIDES, with existing methods shows that SIDES outperforms them in terms of the sensitivity and specificity of the identified clusters with respect to available GO annotations.
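Under the simpler of the two reference models, the significance of a dense component has a closed binomial form, as this small Python sketch illustrates. Note that it computes the per-subset tail only, without the corrections for subgraph selection or hub effects that the paper's full analysis addresses.

```python
from math import comb

def dense_subgraph_pvalue(k, edges, p):
    """Tail probability that a fixed set of k nodes in a G(n, p) random
    graph shares at least `edges` edges: the count is Binomial(C(k,2), p)."""
    m = comb(k, 2)                                   # possible edges
    return sum(comb(m, e) * p**e * (1 - p)**(m - e)
               for e in range(edges, m + 1))

# A 6-node component with 12 of its 15 possible edges, background p = 0.1
print(dense_subgraph_pvalue(6, 12, 0.1))   # vanishingly small => significant
```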
