This paper proposes and evaluates a multi-objective evolutionary algorithm for survival analysis. One aim of survival analysis is the extraction of models from data that approximate lifetime/failure time distributions. These models can be used to estimate the time that an event takes to happen to an object. The use of multi-objective evolutionary algorithms for survival analysis has several advantages. They can cope with feature interactions and noisy data, and they are capable of optimising several objectives. This is important, as model extraction is a multi-objective problem. It has at least two objectives, which are the extraction of accurate and simple models. Accurate models are required to achieve good predictions. Simple models are important to prevent overfitting, improve the transparency of the models, and save computational resources. Although there is a plethora of evolutionary approaches to extract models for classification and regression, the presented approach is one of the first applied to survival analysis. The approach is evaluated on several artificial datasets and one medical dataset. It is shown that the approach is capable of producing accurate models, even for problems that violate some of the assumptions made by classical approaches.



Different classifiers tend to work well for different types of data, and it is usually not known a priori which algorithm will be optimal in any given classification application. In addition, for most classification problems, selecting the best performing classification algorithm amongst a number of competing algorithms is a difficult task for various reasons. For example, the order of performance may depend on the performance measure employed for such a comparison. In this work, we present a novel adaptive ensemble classifier constructed by combining bagging and rank aggregation that is capable of adaptively changing its performance depending on the type of data that is being classified. The attractive feature of the proposed classifier is its multi-objective nature, where the classification results can be simultaneously optimized with respect to several performance measures, for example, accuracy, sensitivity and specificity. We also show that our somewhat complex strategy has better predictive performance, as judged on test samples, than a more naive approach that attempts to directly identify the optimal classifier based on the training data performances of the individual classifiers.
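The idea of ranking classifiers under several performance measures at once can be illustrated with a small sketch. The abstract does not say which rank aggregation scheme the authors use, so the Borda count below is a stand-in chosen for simplicity; the classifier names and scores are made-up illustrative values, and `borda_aggregate` is a hypothetical helper.

```python
from typing import Dict, List


def borda_aggregate(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank classifiers by summed Borda points across performance measures.

    scores[measure][classifier] is a score where higher is better.
    Borda count is an assumption here, not the paper's stated method.
    """
    points: Dict[str, int] = {}
    for per_classifier in scores.values():
        # Sort from worst to best, so the list index is the Borda score.
        ordered = sorted(per_classifier, key=lambda name: per_classifier[name])
        for position, name in enumerate(ordered):
            points[name] = points.get(name, 0) + position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(points, key=lambda name: (-points[name], name))


# Illustrative (invented) scores for three classifiers.
example_scores = {
    "accuracy":    {"svm": 0.90, "rf": 0.88, "knn": 0.80},
    "sensitivity": {"svm": 0.70, "rf": 0.85, "knn": 0.75},
    "specificity": {"svm": 0.95, "rf": 0.96, "knn": 0.82},
}
ranking = borda_aggregate(example_scores)
print(ranking)
```

In this toy example "svm" wins on accuracy alone, but "rf" comes out first once sensitivity and specificity are also counted, which is the kind of measure-dependent ordering the abstract describes.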

A method frequently used in classification systems for improving classification accuracy is to combine the outputs of several classifiers. Among various types of classifiers, fuzzy ones are attractive because they use intelligible fuzzy if-then rules. In this paper we build an AdaBoost ensemble of relational neuro-fuzzy classifiers. Relational fuzzy systems bond input and output fuzzy linguistic values by a binary relation; thus, compared with traditional fuzzy systems, the fuzzy rules carry additional weights, namely the elements of a fuzzy relation matrix. Thanks to this, the system adapts better to the data during learning. The problem with an ensemble of relational fuzzy systems is that it contains separate rule bases which cannot be directly merged. As the systems are separate, we cannot treat fuzzy rules coming from different systems as rules from the same (single) system. In the paper, the problem is addressed by a novel design of the fuzzy systems constituting the ensemble, resulting in normalization of the individual rule bases during learning. The method described in the paper is tested on several known benchmarks and compared with other machine learning solutions from the literature.
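For readers unfamiliar with AdaBoost, one boosting round can be sketched as follows. This is the textbook discrete AdaBoost update for labels in {-1, +1}, assuming sample weights that sum to 1; it says nothing about the relational neuro-fuzzy weak learners or the rule-base normalization described in the abstract, and `adaboost_round` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(
    weights: Sequence[float],
    labels: Sequence[int],
    predictions: Sequence[int],
) -> Tuple[float, List[float]]:
    """One round of discrete AdaBoost for labels in {-1, +1}.

    Returns the weak classifier's vote weight alpha and the
    re-normalised sample weights for the next round.
    """
    # Weighted error of the current weak classifier.
    error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < error < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples (y * h == -1) gain weight, correct ones lose it.
    updated = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, labels, predictions)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


labels = [1, 1, -1, -1]
predictions = [1, -1, -1, -1]  # the second sample is misclassified
alpha, new_weights = adaboost_round([0.25] * 4, labels, predictions)
print(alpha, new_weights)
```

After the update the misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.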

One of the hallmarks of biological organisms is their ability to integrate disparate information sources to optimize their behavior in complex environments. How this capability can be quantified and related to the functional complexity of an organism remains a challenging problem, in particular since organismal functional complexity is not well-defined. We present here several candidate measures that quantify information and integration, and study their dependence on fitness as an artificial agent ("animat") evolves over thousands of generations to solve a navigation task in a simple, simulated environment. We compare the ability of these measures to predict high fitness with more conventional information-theoretic processing measures. As the animat adapts by increasing its "fit" to the world, information integration and processing increase commensurately along the evolutionary line of descent. We suggest that the correlation of fitness with information integration and with processing measures implies that high fitness requires both information processing and integration, but that information integration may be a better measure when the task requires memory. A correlation of measures of information integration (but also information processing) and fitness strongly suggests that these measures reflect the functional complexity of the animat, and that such measures can be used to quantify functional complexity even in the absence of fitness data.

Phylogenetic networks aim to represent the evolutionary history of taxa. Within these, reticulate networks are explicitly able to accommodate evolutionary events like recombination, hybridization, or lateral gene transfer. Although several metrics exist to compare phylogenetic networks, they make several assumptions regarding the nature of the networks that are not likely to be fulfilled by the evolutionary process. In order to characterize the potential disagreement between the algorithms and the biology, we have used the coalescent with recombination to build the type of networks produced by reticulate evolution and classified them as regular, tree sibling, tree child, or galled trees. We show that, as expected, the complexity of these reticulate networks is a function of the population recombination rate. At small recombination rates, most of the networks produced are already more complex than regular or tree sibling networks, whereas with moderate and large recombination rates, no network fits into any of the standard classes. We conclude that new metrics still need to be devised in order to properly compare two phylogenetic networks that have arisen from a reticulate evolutionary process.

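The development of DNA microarray technology has provided more effective tools for gene expression research. Clustering methods are the main approach to analysing such large-scale gene data. Recently, biclustering techniques have been proposed to discover submatrices that reveal various biological patterns. Multi-objective optimization algorithms can optimize several conflicting objectives simultaneously and are therefore well suited to biclustering gene expression matrices. Based on the clonal selection principle, this paper proposes a novel multi-objective immune optimization biclustering algorithm to mine biclusters from microarray data. Experimental results on two real datasets show that the method outperforms other multi-objective evolutionary biclustering algorithms.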

This paper studies the application of evolutionary algorithms to the bi-objective travelling salesman problem. Two evolutionary algorithms, the estimation of distribution algorithm (EDA) and the genetic algorithm (GA), are considered. The solution to this problem is a set of trade-off alternatives. The problem is solved by optimizing the order of the cities so as to simultaneously minimize the two objectives of travelling distance and travelling cost incurred by the travelling salesman. In this paper, binary-representation-based evolutionary algorithms are replaced with an integer representation. Three existing EDAs are altered to use this integer representation, namely restricted Boltzmann machine (RBM), univariate marginal distribution algorithm (UMDA), and population-based incremental learning (PBIL). Each city is associated with a representative integer, and the probability of each representative integer being located at any position of the chromosome is constructed through the modeling approach of the EDAs. New sequences of cities are obtained by sampling from the probabilistic model. A refinement operator and a local search operator are proposed in this work. The EDAs are subsequently hybridized with the GA in order to compensate for the limitations of both algorithms. The effect that each of these operators has on the quality of the solutions is investigated. Empirical results show that the hybrid algorithms are capable of finding a set of good trade-off solutions.
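The integer-representation idea can be sketched in a UMDA-like style: estimate, for every chromosome position, how often each city appears there among the selected tours, then sample new tours from those frequencies. The abstract does not say how repeated cities are avoided, so masking out already-used cities during sampling is an assumption of this sketch (the paper's refinement operator may handle it differently), as are the Laplace smoothing and both function names.

```python
import random
from typing import List, Sequence


def estimate_position_model(
    tours: Sequence[Sequence[int]], n_cities: int
) -> List[List[float]]:
    """Return model[pos][city]: smoothed frequency of each city at each position."""
    # Start every count at 1.0 (Laplace smoothing) so no probability is zero.
    counts = [[1.0] * n_cities for _ in range(n_cities)]
    for tour in tours:
        for pos, city in enumerate(tour):
            counts[pos][city] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def sample_tour(model: List[List[float]], rng: random.Random) -> List[int]:
    """Sample a valid tour: at each position draw only among unused cities."""
    remaining = list(range(len(model)))
    tour: List[int] = []
    for row in model:
        weights = [row[city] for city in remaining]
        city = rng.choices(remaining, weights=weights, k=1)[0]
        tour.append(city)
        remaining.remove(city)
    return tour


# Three selected (good) tours over four cities, invented for illustration.
selected = [[0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 3, 2]]
model = estimate_position_model(selected, n_cities=4)
new_tour = sample_tour(model, random.Random(0))
print(new_tour)
```

Because each position is modelled independently, this univariate model ignores which cities tend to be adjacent; that limitation is one reason such EDAs are combined with local search or GA operators.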

The application of multi-objective optimisation to evolutionary robotics is receiving increasing attention. A survey of the literature reveals the different possibilities it offers to improve the automatic design of efficient and adaptive robotic systems, and points to the successful demonstrations available for both task-specific and task-agnostic approaches (i.e., with or without reference to the specific design problem to be tackled). However, the advantages of multi-objective approaches over single-objective ones have not been clearly spelled out and experimentally demonstrated. This paper fills this gap for task-specific approaches: starting from well-known results in multi-objective optimisation, we discuss how to tackle commonly recognised problems in evolutionary robotics. In particular, we show that multi-objective optimisation (i) allows evolving a more varied set of behaviours by exploring multiple trade-offs of the objectives to optimise, (ii) supports the evolution of the desired behaviour through the introduction of objectives as proxies, (iii) avoids the premature convergence to local optima possibly introduced by multi-component fitness functions, and (iv) solves the bootstrap problem by exploiting ancillary objectives to guide evolution in the early phases. We present an experimental demonstration of these benefits in three different case studies: maze navigation in a single robot domain, flocking in a swarm robotics context, and a strictly collaborative task in collective robotics.

Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single-objective optimization methods have intrinsic drawbacks in identifying multiple significant community structures, some methods formulate community detection as a multi-objective problem and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining a multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in the dominant population are set as initial individuals for the local search procedure. Then, a new direction vector, named the pseudonormal vector, is proposed to integrate the two objective functions into a single fitness function. Finally, a network-specific local search strategy based on the label propagation rule is extended to search for locally optimal solutions efficiently. Extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on the influence of the local search procedure demonstrate that it can speed up the convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate that the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks, which are beneficial for analyzing networks at multiple resolution levels.

The choice of a probabilistic model to describe sequence evolution can and should be justified. Underfitting the data through the use of overly simplistic models may miss out on interesting phenomena and lead to incorrect inferences. Overfitting the data with models that are too complex may ascribe biological meaning to statistical artifacts and result in falsely significant findings. We describe a likelihood-based approach for evolutionary model selection. The procedure employs a genetic algorithm (GA) to quickly explore a combinatorially large set of all possible time-reversible Markov models with a fixed number of substitution rates. When applied to stem RNA data subject to well-understood evolutionary forces, the models found by the GA 1) capture the expected overall rate patterns a priori; 2) fit the data better than the best available models based on a priori assumptions, suggesting subtle substitution patterns not previously recognized; 3) cannot be rejected in favor of the general reversible model, implying that the evolution of stem RNA sequences can be explained well with only a few substitution rate parameters; and 4) perform well on simulated data, both in terms of goodness of fit and the ability to estimate evolutionary rates. We also investigate the utility of several distance measures for comparing and contrasting inferred evolutionary models. Using widely available small computer clusters, our approach makes it possible, for the first time, to evaluate the performance of existing RNA evolutionary models by comparing them with a large pool of candidate models and to validate common modeling assumptions. In addition, the new method provides the foundation for rigorous selection and comparison of substitution models for other types of sequence data.

In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.
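The two ingredients named in the abstract, a weighted combination of kernels and an AUC-based criterion, can be sketched in a few lines. This is only an illustration under assumptions: kernels are plain nested lists, the weights are required to be non-negative so the weighted sum remains a valid kernel, and the evolutionary search over the weights and the SVM training are not shown. Both function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of kernel matrices (one matrix per data source)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep a valid kernel")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive example scores higher; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


k_expression = [[1.0, 0.0], [0.0, 1.0]]   # toy kernel for one data source
k_interaction = [[1.0, 1.0], [1.0, 1.0]]  # toy kernel for another
combined = combine_kernels([k_expression, k_interaction], [0.75, 0.25])
quality = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(combined, quality)
```

An evolutionary algorithm of the kind described would treat the weight vector as the genome and use a cross-validated AUC of the resulting classifier as the fitness.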

During business collaboration, partners may benefit through sharing data. People may use data mining tools to discover useful relationships from shared data. However, some relationships are sensitive to the data owners, who hope to conceal them before sharing. In this paper, we address this problem in the form of association rule hiding. A hiding method based on evolutionary multi-objective optimization (EMO) is proposed, which performs the hiding task by selectively inserting items into the database to decrease the confidence of sensitive rules below specified thresholds. The side effects generated during the hiding process are taken as optimization goals to be minimized. HypE, a recently proposed EMO algorithm, is utilized to identify promising transactions for modification to minimize side effects. Results on real datasets demonstrate that the proposed method can effectively perform sanitization with less damage to the non-sensitive knowledge in most cases.
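Why inserting items lowers a rule's confidence follows directly from the definition conf(A → B) = support(A ∪ B) / support(A): adding the antecedent A to a transaction that lacks B increases the denominator only. The toy database and the `confidence` helper below are illustrative assumptions; choosing which transactions to modify is the part the paper delegates to HypE and is not shown.

```python
from typing import List, Set


def confidence(
    transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]
) -> float:
    """conf(A -> B) = support(A and B together) / support(A)."""
    has_a = [t for t in transactions if antecedent <= t]
    if not has_a:
        return 0.0
    has_both = sum(1 for t in has_a if consequent <= t)
    return has_both / len(has_a)


db = [{"bread", "milk"}, {"bread", "milk"}, {"bread"}, {"milk"}, {"eggs"}]
before = confidence(db, {"bread"}, {"milk"})  # 2 of 3 bread baskets have milk

# Sanitize a copy: insert "bread" into the basket that has no "milk".
sanitized = [set(t) for t in db]
sanitized[4].add("bread")
after = confidence(sanitized, {"bread"}, {"milk"})  # now 2 of 4

print(before, after)
```

The inserted item may also change the support or confidence of non-sensitive rules (here, any rule involving "eggs"), which is the kind of side effect the method treats as objectives to minimize.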

There have been several proposals on how to apply the ant colony optimization (ACO) metaheuristic to multi-objective combinatorial optimization problems (MOCOPs). This paper proposes a new formulation of these multi-objective ant colony optimization (MOACO) algorithms. This formulation is based on adding specific algorithm components for tackling multiple objectives to the basic ACO metaheuristic. Examples of these components are how to represent multiple objectives using pheromone and heuristic information, how to select the best solutions for updating the pheromone information, and how to define and use weights to aggregate the different objectives. This formulation reveals more similarities than previously thought in the design choices made in existing MOACO algorithms. The main contribution of this paper is an experimental analysis of how particular design choices affect the quality and the shape of the Pareto front approximations generated by each MOACO algorithm. This study provides general guidelines to understand how MOACO algorithms work, and how to improve their design.

Martin O, Schomburg D. Proteins, 2008, 70(4): 1367-1378
Biological systems and processes rely on a complex network of molecular interactions. While the association of biological macromolecules is a fundamental biochemical phenomenon crucial for the understanding of complex living systems, protein-protein docking methods aim for the computational prediction of protein complexes from individual subunits. Docking algorithms generally produce large numbers of putative protein complexes, with only a few of these conformations resembling the native complex structure within an acceptable degree of structural similarity. A major challenge in the field of docking is to extract near-native structure(s) out of the large pool of solutions, the so-called scoring or ranking problem. A series of structural, chemical, biological and physical properties are used in this work to classify docked protein-protein complexes. These properties include specialized energy functions, evolutionary relationship, class-specific residue interface propensities, gap volume, buried surface area, empirical pair potentials at the residue and atom level, as well as measures for the tightness of fit. Efficient comprehensive scoring functions have been developed using probabilistic Support Vector Machines in combination with this array of properties on the largest currently available protein-protein docking benchmark. The established classifiers are shown to be specific for certain types of protein-protein complexes and are able to detect near-native complex conformations from large sets of decoys with high sensitivity. Using classification probabilities, the ranking of near-native structures was drastically improved, leading to a significant enrichment of near-native complex conformations within the top ranks. It could be shown that the developed schemes outperform five other previously published scoring functions.

Ho SY, Hsieh CH, Chen HM, Huang HL. Bio Systems, 2006, 85(3): 165-176
An accurate classifier with linguistic interpretability using a small number of relevant genes is beneficial to microarray data analysis and the development of inexpensive diagnostic tests. Several frequently used techniques for designing classifiers of microarray data, such as support vector machines, neural networks, k-nearest neighbor, and logistic regression models, suffer from low interpretability. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be simultaneously optimized: maximal classification accuracy, minimal number of rules, and minimal number of used genes. An "intelligent" genetic algorithm (IGA) is used to efficiently solve the design problem with a large number of tuning parameters. The performance of iGEC is evaluated using eight commonly used data sets. It is shown that iGEC has an accurate, concise, and interpretable rule base (1.1 rules per class) on average in terms of test classification accuracy (87.9%), rule number (3.9), and used gene number (5.0). Moreover, iGEC not only has better performance than the existing fuzzy rule-based classifier in terms of the above-mentioned objectives, but is also more accurate than some existing non-rule-based classifiers.

Evolutionary algorithms are widespread heuristic methods inspired by natural evolution to solve difficult problems for which analytical approaches are not suitable. In many domains experimenters are not only interested in discovering optimal solutions, but also in finding the largest number of different solutions satisfying minimal requirements. However, the formulation of an effective performance measure describing these requirements, also known as a fitness function, represents a major challenge. The difficulty of combining and weighting multiple problem objectives and constraints of possibly varying nature and scale into a single fitness function often leads to unsatisfactory solutions. Furthermore, selective reproduction of the fittest solutions, which is inspired by competition-based selection in nature, leads to loss of diversity within the evolving population and premature convergence of the algorithm, hindering the discovery of many different solutions. Here we present an alternative abstraction of artificial evolution, which does not require the formulation of a composite fitness function. Inspired by viability theory in dynamical systems, natural evolution and ethology, the proposed method puts emphasis on the elimination of individuals that do not meet a set of changing criteria, which are defined on the problem objectives and constraints. Experimental results show that the proposed method maintains higher diversity in the evolving population and generates more unique solutions when compared to classical competition-based evolutionary algorithms. Our findings suggest that incorporating viability principles into evolutionary algorithms can significantly improve the applicability and effectiveness of evolutionary methods to numerous complex problems of science and engineering, ranging from protein structure prediction to aircraft wing design.

Nowadays, scientists and companies are confronted with multiple competing goals, such as makespan in high-performance computing and economic cost in Clouds, that have to be simultaneously optimised. Multi-objective scheduling of scientific applications in these systems is therefore receiving increasing research attention. Most existing approaches aggregate all objectives in a single function, defined a priori without any knowledge about the problem being solved, which negatively impacts the quality of the solutions. In contrast, Pareto-based approaches, whose outcome is a set of (nearly) optimal solutions that represent a tradeoff among the different objectives, have been scarcely studied. In this paper, we analyse MOHEFT, a Pareto-based list scheduling heuristic that provides the user with a set of tradeoff optimal solutions from which the one that better suits the user requirements can be manually selected. We demonstrate the potential of our method for multi-objective workflow scheduling on the commercial Amazon EC2 Cloud. We compare the quality of the MOHEFT tradeoff solutions with two state-of-the-art approaches using different synthetic and real-world workflows: the classical HEFT algorithm for single-objective scheduling and the SPEA2* genetic algorithm used in multi-objective optimisation problems. The results demonstrate that our approach is able to compute solutions of higher quality than SPEA2*. In addition, we show that MOHEFT is more suitable than SPEA2* for workflow scheduling in the context of commercial Clouds, since the genetic-based approach is unable to deal with some of the constraints imposed by these systems.
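The "set of tradeoff solutions" that a Pareto-based scheduler returns can be made concrete with a generic dominance filter. The sketch below is not MOHEFT itself; it only shows how non-dominated (makespan, cost) pairs are picked out of a list of candidate schedules, assuming both objectives are minimized. The candidate values are invented for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (makespan, cost), both to be minimized


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate schedules: (makespan in hours, cost in dollars).
schedules = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 8.0)]
front = pareto_front(schedules)
print(front)
```

Here (9.0, 8.0) is dropped because (8.0, 7.0) is both faster and cheaper; the remaining three schedules are the tradeoff set from which a user would choose.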

1. Interspecific trade-offs are thought to facilitate coexistence between species at small spatial scales. The discovery-dominance trade-off, analogous to a competition-colonisation trade-off, is considered an important structuring mechanism in ant ecology. A trade-off between species' ability to discover food resources and to dominate them may explain how so many species apparently dependent on similar resources can coexist. 2. The discovery-dominance trade-off is thought to be broken by invasive species in enemy-free space or territorial species whose activity is fuelled by domination of carbohydrate resources. It may also be mediated by factors such as temperature and habitat structure. 3. We investigate the generality and form of the discovery-dominance relationship in an experiment using habitats of contrasting complexity across three continents. In addition, to assess how widespread the discovery-dominance trade-off is, we conducted a systematic review combining all empirical studies (published and from our experiment). 4. From our own fieldwork and meta-analyses of available studies, we find surprisingly little empirical support for the trade-off, with results indicating that mean effect sizes were either not significantly different from 0 or significantly positive. The trade-off was only detected in studies with parasitoids present. Additionally, experimental data from simple and complex habitats within each continent suggest that simple habitats may facilitate both food resource discovery and dominance. 5. We conclude that the discovery-dominance trade-off is the exception, rather than the rule. Instead, these abilities were commonly correlated. Real food resources provide many axes along which partitioning may occur, and discovery-dominance trade-offs are not a prerequisite for coexistence.

The term complexity means several things to biologists. When qualifying morphological phenotype, on the one hand, it is used to signify the sheer complicatedness of living systems, especially as a result of the multicomponent aspect of biological form. On the other hand, it has been used to represent the intricate nature of the connections between constituents that make up form: a more process-based explanation. In the context of evolutionary arguments, complexity has been defined, in a quantifiable fashion, as the amount of information that an informatic template, such as a sequence of nucleotides or amino acids, stores about its environment. In this perspective, we begin with a brief review of the history of complexity theory. We then introduce a developmental and an evolutionary understanding of what it means for biological systems to be complex. We propose that the complexity of living systems can be understood through two interdependent structural properties: multiscalarity of interconstituent mechanisms and excitability of the biological materials. The answer to whether a system becomes more or less complex over time depends on the potential for its constituents to interact in novel ways and combinations to give rise to new structures and functions, as well as on the evolution of excitable properties that would facilitate the exploration of interconstituent organization in the context of their microenvironments and macroenvironments.
