Similar literature (20 articles)
1.
The factors determining a drug's success are manifold, making de novo drug design an inherently multi-objective optimisation (MOO) problem. With the advent of machine learning and optimisation methods, the field of multi-objective compound design has seen a rapid increase in developments and applications. Population-based metaheuristics and deep reinforcement learning are the most commonly used artificial intelligence methods in the field, but recently conditional learning methods have been gaining popularity. The former approaches are coupled with a MOO strategy, most commonly an aggregation function, although Pareto-based strategies are widespread too. Besides these and conditional learning, various innovative approaches to tackling MOO in drug design have been proposed. Here we provide a brief overview of the field and the latest innovations.
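The two MOO strategies named here differ in a way a short sketch makes concrete. Below, hypothetical scores for five candidate compounds on two objectives (both to be maximised) are ranked once by a weighted aggregation function and once by Pareto dominance; the scores, weights, and objective names are invented for illustration:

```python
import numpy as np

def dominates(a, b):
    """True if solution a Pareto-dominates b (no worse everywhere, better somewhere)."""
    return np.all(a >= b) and np.any(a > b)

def pareto_front(scores):
    """Indices of the non-dominated rows of an (n_solutions, n_objectives) array."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Hypothetical (potency, synthesisability) scores for five candidate compounds.
scores = np.array([[0.9, 0.2], [0.7, 0.7], [0.3, 0.9], [0.5, 0.5], [0.8, 0.1]])

# Aggregation strategy: collapse the objectives with fixed weights, keep one winner.
weights = np.array([0.6, 0.4])
print(int(np.argmax(scores @ weights)))   # -> 1

# Pareto strategy: keep every non-dominated trade-off instead of a single winner.
print(pareto_front(scores))               # -> [0, 1, 2]
```

The aggregation route needs the weights up front; the Pareto route defers that preference decision until after the search.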

2.
There have been several proposals on how to apply the ant colony optimization (ACO) metaheuristic to multi-objective combinatorial optimization problems (MOCOPs). This paper proposes a new formulation of these multi-objective ant colony optimization (MOACO) algorithms. This formulation is based on adding specific algorithm components for tackling multiple objectives to the basic ACO metaheuristic. Examples of these components are how to represent multiple objectives using pheromone and heuristic information, how to select the best solutions for updating the pheromone information, and how to define and use weights to aggregate the different objectives. This formulation reveals more similarities than previously thought in the design choices made in existing MOACO algorithms. The main contribution of this paper is an experimental analysis of how particular design choices affect the quality and the shape of the Pareto front approximations generated by each MOACO algorithm. This study provides general guidelines to understand how MOACO algorithms work, and how to improve their design.
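One design component the paper analyses (representing two objectives with separate pheromone and heuristic matrices and aggregating them with a weight during solution construction) can be sketched as follows. This is a generic bi-objective construction step with made-up matrices, not a reimplementation of any specific MOACO variant from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
tau1, tau2 = rng.random((n, n)), rng.random((n, n))   # one pheromone matrix per objective
eta1, eta2 = rng.random((n, n)), rng.random((n, n))   # one heuristic matrix per objective
alpha, beta = 1.0, 2.0

def next_city(current, visited, lam):
    """Pick the next city; lam in [0, 1] weights objective 1 against objective 2."""
    tau = lam * tau1[current] + (1 - lam) * tau2[current]
    eta = lam * eta1[current] + (1 - lam) * eta2[current]
    score = (tau ** alpha) * (eta ** beta)
    score[list(visited)] = 0.0                        # never revisit a city
    return rng.choice(n, p=score / score.sum())

tour, visited = [0], {0}
for _ in range(n - 1):
    lam = rng.random()   # sampled per step here; real MOACOs fix it per ant or per colony
    c = int(next_city(tour[-1], visited, lam))
    tour.append(c)
    visited.add(c)
print(tour)
```

Whether each ant gets its own weight, whether weights are fixed or sampled, and which solutions deposit pheromone are precisely the design choices whose effects the paper measures.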

3.
The assumption that the total abundance of RNAs in a cell is roughly the same across cells underlies most studies based on gene expression analyses. However, experiments have shown that changes in the expression of master regulators such as c-MYC can cause a global shift in the expression of almost all genes in some cell types, such as cancers. Such a shift violates this assumption and can lead to wrong or biased conclusions from standard data analysis practices, such as the detection of differentially expressed (DE) genes and the molecular classification of tumors based on gene expression. Most existing gene expression data were generated without considering this possibility and are therefore at risk of having produced unreliable results if such a global shift effect exists in the data. To evaluate this risk, we conducted a systematic study of the possible influence of the global gene expression shift effect on differential expression analysis and on molecular classification analysis. We collected data with a known global shift effect, generated data simulating different manifestations of the effect based on a wide collection of real gene expression data, and conducted comparative studies of representative existing methods. We observed that some DE analysis methods are tolerant to the global shift while others are very sensitive to it. Classification accuracy is not sensitive to the shift and can actually benefit from it, but the genes selected for classification can be greatly affected.
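The hazard itself is easy to reproduce in simulation. A minimal sketch with synthetic Poisson counts and plain total-count normalisation (a deliberately simple stand-in for the normalisation step inside standard DE pipelines):

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.gamma(2.0, 50.0, 5000)        # shared baseline expression for 5000 genes

control = rng.poisson(base)              # condition A
shifted = rng.poisson(base * 1.8)        # condition B: global 1.8x up-shift of ALL genes

# Total-count normalisation assumes equal total RNA abundance per sample ...
norm_b = shifted / shifted.sum() * control.sum()

# ... so the real, genome-wide 1.8-fold change is silently erased.
print(np.median(shifted / np.maximum(control, 1)))  # ~1.8: the true global change
print(np.median(norm_b / np.maximum(control, 1)))   # ~1.0: gone after normalisation
```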

4.
Durability and kinematics are two critical factors which must be considered during total knee replacement (TKR) implant design. It is hypothesized, however, that there exists a competing relationship between these two performance measures, such that improvement of one requires sacrifice with respect to the other. No previous studies have used rigorous and systematic methods to quantify this relationship. In this study, multiobjective design optimization (MOO) using the adaptive weighted sum (AWS) method is used to determine a set of Pareto-optimal implant designs considering durability and kinematics simultaneously. Previously validated numerical simulations and a parametric modeller are used in conjunction with the AWS method to generate a durability-versus-kinematics Pareto curve. In terms of kinematics, a design optimized for kinematics alone outperformed a design optimized for durability by 61.8%. In terms of durability, the design optimized for durability outperformed the kinematics-optimized design by 70.6%. Considering the entire Pareto curve, a balanced (1:1) trade-off was obtained when equal weighting was placed on both performance measures; however, improvement of one performance measure required ever greater sacrifices in the other as the weighting became more extreme. For the first time, the competing relationship between durability and kinematics was confirmed and quantified using optimization methods. This information can aid future developments in TKR design and can be extended to other total joint replacement designs.
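The weighted-sum backbone of the approach can be sketched with analytic stand-ins for the expensive durability and kinematics simulations; the surrogate functions and the single design variable are invented for illustration. The actual AWS method additionally adapts the weights and adds constraints to fill gaps that a uniform sweep leaves on non-convex fronts:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Stand-in surrogates for the (expensive) durability and kinematics simulations;
# x is a single design parameter in [0, 1], and both measures are to be maximised.
durability = lambda x: 1.0 - x ** 2
kinematics = lambda x: 1.0 - (1.0 - x) ** 2

pareto = []
for w in np.linspace(0.0, 1.0, 11):                 # sweep the objective weighting
    obj = lambda x: -(w * durability(x) + (1 - w) * kinematics(x))
    res = minimize_scalar(obj, bounds=(0.0, 1.0), method="bounded")
    pareto.append((durability(res.x), kinematics(res.x)))

for d, k in pareto:                                  # traces out the trade-off curve
    print(f"durability={d:.2f}  kinematics={k:.2f}")
```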

5.
This paper discusses two-sample nonparametric comparison of survival functions when only interval-censored failure time data are available. The problem considered often occurs in, for example, biological and medical studies such as medical follow-up studies and clinical trials. For this problem, we present and study several nonparametric test procedures, including methods based on absolute and squared survival differences as well as simple survival differences. The presented tests provide alternatives to existing methods, most of which are rank-based tests and not sensitive to nonproportional or nonmonotone alternatives. Simulation studies are performed to evaluate and compare the proposed methods with existing methods; they suggest that the proposed tests work well for nonmonotone as well as monotone alternatives. An illustrative example is presented.
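The three kinds of statistics can be sketched once survival estimates on a common time grid are available (for interval-censored data these would come from, e.g., a Turnbull-type estimator, not reproduced here). The crossing curves below are synthetic, chosen to show why a signed difference is weak against nonmonotone alternatives:

```python
import numpy as np

def survival_difference_stats(s1, s2, grid):
    """Integrated simple, absolute and squared differences between two
    estimated survival curves evaluated on a common time grid."""
    dt = np.gradient(grid)
    d = s1 - s2
    return (np.sum(d * dt),          # simple (signed) difference
            np.sum(np.abs(d) * dt),  # absolute difference
            np.sum(d ** 2 * dt))     # squared difference

grid = np.linspace(0.0, 10.0, 101)
s1 = np.exp(-grid / 5.0)             # steady hazard
s2 = np.exp(-(grid / 5.0) ** 3)      # low early hazard, steep later: the curves cross

simple, absolute, squared = survival_difference_stats(s1, s2, grid)
print(f"simple={simple:.3f}  absolute={absolute:.3f}  squared={squared:.3f}")
# The signed statistic largely cancels across the crossing; the other two do not.
```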

6.
Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure general dependence among gene random variables. We propose a novel clustering strategy based on minimizing mutual information among gene clusters. Simulated annealing is employed to solve the optimization problem. Bootstrap techniques are employed to obtain more accurate estimates of mutual information when the data sample size is small. Moreover, we propose to combine the mutual information criterion with traditional distance criteria such as the Euclidean distance and the fuzzy membership metric when designing the clustering algorithm. The performance of the new clustering methods is compared with that of several existing methods, using both synthesized data and experimental data. The clustering algorithm based on a combined metric of mutual information and fuzzy membership achieves the best performance. The supplemental material is available at www.gspsnap.tamu.edu/gspweb/zxb/glioma_zxb.
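The two main ingredients (a mutual information estimate and a simulated annealing move) can be sketched on synthetic data. The histogram MI estimator, the cooling schedule, and the use of summed pairwise between-cluster MI as the objective are simplifications chosen for brevity; they stand in for, rather than reproduce, the paper's bootstrap-corrected estimator and combined metrics:

```python
import numpy as np

rng = np.random.default_rng(2)

def mutual_info(x, y, bins=8):
    """Histogram estimate of mutual information between two expression profiles."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

X = rng.standard_normal((20, 50))                 # 20 genes x 50 samples (synthetic)
M = np.array([[mutual_info(X[i], X[j]) for j in range(20)] for i in range(20)])
labels = rng.integers(0, 3, 20)                   # random initial 3-cluster assignment

def between_cluster_mi(labels):
    """Sum of pairwise MI over gene pairs placed in different clusters."""
    return M[labels[:, None] != labels[None, :]].sum()

temp = 1.0
for _ in range(2000):                             # simulated annealing reassignments
    g, new = rng.integers(20), rng.integers(3)
    old, before = labels[g], between_cluster_mi(labels)
    labels[g] = new
    delta = between_cluster_mi(labels) - before
    if delta > 0 and rng.random() >= np.exp(-delta / temp):
        labels[g] = old                           # reject the uphill move
    temp *= 0.998                                 # geometric cooling
print(labels)
```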

7.
Lysine succinylation is a new type of post-translational modification that plays important roles in protein regulation and the control of cellular function, so it is necessary to accurately identify succinylation sites in proteins. Traditional experimental identification is costly in materials and funds; computational prediction has recently been proposed as an efficient alternative. In this study, we developed a new prediction method, iSucc-PseAAC, by combining multiple classification algorithms with different feature extraction methods. We found that, with features extracted by pseudo amino acid composition (PseAAC), a support vector machine classifier performed best, and we combined it with ensemble learning to address the data imbalance problem. Compared with existing methods, iSucc-PseAAC is more meaningful and practical for identifying lysine succinylation sites.
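A minimal sketch of the overall recipe (sequence-derived features plus a classifier that compensates for class imbalance) is given below using scikit-learn. The composition function is a simplified stand-in for full PseAAC, which adds sequence-order correlation terms; the sequences are random 21-mers; and class weighting stands in for the paper's ensemble-learning treatment of imbalance:

```python
import numpy as np
from sklearn.svm import SVC

AA = "ACDEFGHIKLMNPQRSTVWY"

def simple_composition(seq):
    """Amino-acid composition: a simplified stand-in for full PseAAC features."""
    return np.array([seq.count(a) / len(seq) for a in AA])

rng = np.random.default_rng(3)
seqs = ["".join(rng.choice(list(AA), 21)) for _ in range(300)]  # 21-mer site windows
y = np.r_[np.ones(30), np.zeros(270)]                           # 1:9 imbalance, as for real sites

X = np.array([simple_composition(s) for s in seqs])
clf = SVC(kernel="rbf", class_weight="balanced")   # one simple answer to imbalance;
clf.fit(X, y)                                      # the paper uses ensemble learning instead
print(clf.score(X, y))                             # training accuracy on synthetic data only
```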

8.
Deb K  Raji Reddy A 《Bio Systems》2003,72(1-2):111-129
In the area of bioinformatics, the identification of gene subsets responsible for classifying available disease samples into two or more of their variants is an important task. Such problems have been solved in the past by means of unsupervised learning methods (hierarchical clustering, self-organizing maps, k-means clustering, etc.) and supervised learning methods (weighted voting approach, k-nearest neighbor method, support vector machine method, etc.). Such problems can also be posed as optimization problems of minimizing gene subset size to achieve reliable and accurate classification. The main difficulties in solving the resulting optimization problem are the availability of only a few samples compared to the number of genes in the samples and the exorbitantly large search space of solutions. Although there exist a few applications of evolutionary algorithms (EAs) for this task, here we treat the problem as a multiobjective optimization problem of minimizing the gene subset size and minimizing the number of misclassified samples. Moreover, for a more reliable classification, we consider multiple training sets in evaluating a classifier. Contrary to past studies, the use of a multiobjective EA (NSGA-II) has enabled us to discover a smaller gene subset size (such as four or five) to correctly classify 100% or near 100% of samples for three cancer datasets (Leukemia, Lymphoma, and Colon). We have also extended the NSGA-II to obtain multiple non-dominated solutions, discovering as many as 352 different three-gene combinations providing 100% correct classification on the Leukemia data. In order to have further confidence in the identification task, we have also introduced a prediction strength threshold for determining a sample's belonging to one class or the other. All simulation results show consistent gene subset identifications on the three disease datasets and exhibit the flexibility and efficacy of using a multiobjective EA for the gene subset identification task.
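The bi-objective formulation is compact enough to state directly: minimise the pair (subset size, misclassified samples). The sketch below evaluates random gene subsets on synthetic data with a k-NN classifier and extracts the non-dominated ones; it illustrates the objectives NSGA-II searches over, not NSGA-II itself (no selection, crossover, or crowding):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_samples, n_genes = 60, 200
X = rng.standard_normal((n_samples, n_genes))
y = rng.integers(0, 2, n_samples)
X[y == 1, :3] += 2.0                      # only genes 0-2 carry class signal

def objectives(subset):
    """(gene subset size, number of misclassified samples), both to be minimised."""
    acc = cross_val_score(KNeighborsClassifier(3), X[:, subset], y, cv=5).mean()
    return len(subset), int(round(n_samples * (1 - acc)))

candidates = [list(rng.choice(n_genes, k, replace=False))
              for k in (2, 3, 5, 10) for _ in range(5)]
candidates.append([0, 1, 2])              # plant the informative subset

scored = [(objectives(s), s) for s in candidates]
front = [(f, s) for f, s in scored        # keep only non-dominated (size, errors) pairs
         if not any(g[0] <= f[0] and g[1] <= f[1] and g != f for g, _ in scored)]
print(front)
```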

9.
Ho SY  Hsieh CH  Chen HM  Huang HL 《Bio Systems》2006,85(3):165-176
An accurate classifier with linguistic interpretability using a small number of relevant genes is beneficial to microarray data analysis and the development of inexpensive diagnostic tests. Several techniques frequently used for designing classifiers of microarray data, such as support vector machines, neural networks, k-nearest neighbor, and logistic regression models, suffer from low interpretability. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be simultaneously optimized: maximal classification accuracy, minimal number of rules, and minimal number of used genes. An "intelligent" genetic algorithm (IGA) is used to efficiently solve the design problem, which has a large number of tuning parameters. The performance of iGEC is evaluated on eight commonly used data sets. iGEC is shown to produce an accurate, concise, and interpretable rule base, averaging 87.9% test classification accuracy, 3.9 rules (1.1 rules per class), and 5.0 used genes. Moreover, iGEC not only performs better than an existing fuzzy rule-based classifier in terms of the above objectives, but is also more accurate than some existing non-rule-based classifiers.

10.
Nowadays, scientists and companies are confronted with multiple competing goals, such as makespan in high-performance computing and economic cost in Clouds, that have to be optimised simultaneously. Multi-objective scheduling of scientific applications in these systems is therefore receiving increasing research attention. Most existing approaches aggregate all objectives into a single function defined a priori, without any knowledge about the problem being solved, which negatively impacts the quality of the solutions. In contrast, Pareto-based approaches, whose outcome is a set of (nearly) optimal solutions representing tradeoffs among the different objectives, have scarcely been studied. In this paper, we analyse MOHEFT, a Pareto-based list scheduling heuristic that provides the user with a set of tradeoff optimal solutions from which the one that best suits the user's requirements can be manually selected. We demonstrate the potential of our method for multi-objective workflow scheduling on the commercial Amazon EC2 Cloud. We compare the quality of the MOHEFT tradeoff solutions with two state-of-the-art approaches using different synthetic and real-world workflows: the classical HEFT algorithm for single-objective scheduling and the SPEA2* genetic algorithm used in multi-objective optimisation problems. The results demonstrate that our approach is able to compute solutions of higher quality than SPEA2*. In addition, we show that MOHEFT is more suitable than SPEA2* for workflow scheduling in the context of commercial Clouds, since the genetic-based approach is unable to deal with some of the constraints imposed by these systems.
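The mechanism separating MOHEFT from HEFT fits in a few lines: instead of carrying one partial schedule forward, carry K tradeoff schedules and extend each on every resource. The sketch below uses made-up task sizes, speeds, and prices, charges tasks serially (a real implementation tracks per-resource ready times and workflow dependencies), and uses a sorted uniform subsample in place of MOHEFT's crowding-distance selection:

```python
import numpy as np

K = 4
speeds = np.array([1.0, 2.0, 4.0])        # resource speeds
prices = np.array([1.0, 3.0, 10.0])       # cost per time unit on each resource
tasks = [5.0, 3.0, 8.0, 2.0]              # task work, already in rank order

partials = [(0.0, 0.0)]                   # (makespan, cost) of the empty schedule
for work in tasks:
    # Extend every carried schedule onto every resource.
    ext = [(m + work / s, c + (work / s) * p)
           for (m, c) in partials for s, p in zip(speeds, prices)]
    # Keep the non-dominated (makespan, cost) pairs, both minimised.
    nondom = [a for a in ext
              if not any(b[0] <= a[0] and b[1] <= a[1] and b != a for b in ext)]
    nondom.sort()                         # uniform subsample stands in for crowding
    idx = np.linspace(0, len(nondom) - 1, min(K, len(nondom))).astype(int)
    partials = [nondom[i] for i in idx]

for m, c in partials:
    print(f"makespan={m:.2f}  cost={c:.2f}")
```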

11.
《IRBM》2023,44(3):100749
Objective: Breast cancer is the most widespread and invasive cancer type among women; globally it causes more mortality among women than any cancer except lung cancer. This has led researchers to focus on developing effective Computer-Aided Detection (CAD) methodologies for the classification of such deadly cancer types. To improve survival rates through earlier diagnosis, a rigorous research methodology is required for breast cancer classification. Consequently, an improved methodology that integrates deep learning with metaheuristic and classification algorithms is proposed for the severity classification of breast cancer, redressing this healthcare problem. Material and Methods: The work aims to classify the severities present in digital mammogram images. For evaluation, the publicly available MIAS, INbreast, and WDBC databases are utilized. The proposed work employs transfer learning for extracting features. The novelty of the work lies in improving the classification performance of the weighted k-nearest neighbor (wKNN) algorithm using particle swarm optimization (PSO), the dragonfly optimization algorithm (DFOA), and the crow search optimization algorithm (CSOA) as transformation techniques, i.e., transforming non-linear input features into minimal linearly separable feature vectors. Results: The results are compared with Gaussian Naïve Bayes and linear Support Vector Machine algorithms; the highest classification accuracy is attained by the proposed work (CSOA-wKNN), with 84.35% for MIAS, 83.19% for INbreast, and 97.36% for WDBC, respectively. Conclusion: The results reveal that the proposed Computer-Aided Diagnosis (CAD) tool is robust for the severity classification of breast cancer.
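The optimiser-wraps-classifier pattern can be sketched end to end, since the WDBC data ship with scikit-learn. Below, a bare-bones PSO searches per-feature weights that improve k-NN cross-validation accuracy; the paper's wKNN variant, its DFOA/CSOA counterparts, and its exact feature transformation are not reproduced, and the PSO hyperparameters are conventional textbook values:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)        # the WDBC dataset
X = (X - X.mean(0)) / X.std(0)                    # standardise features
rng = np.random.default_rng(5)
d, n_particles = X.shape[1], 10

def fitness(w):
    """CV accuracy of k-NN after scaling each feature by its weight."""
    return cross_val_score(KNeighborsClassifier(5), X * w, y, cv=3).mean()

pos = rng.random((n_particles, d))
vel = np.zeros((n_particles, d))
pbest, pbest_f = pos.copy(), np.array([fitness(w) for w in pos])
gbest = pbest[pbest_f.argmax()].copy()

for _ in range(15):                               # a few PSO iterations (kept tiny)
    r1, r2 = rng.random((2, n_particles, d))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)
    f = np.array([fitness(w) for w in pos])
    improved = f > pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmax()].copy()

print(f"weighted kNN CV accuracy: {pbest_f.max():.3f}")
```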

12.
MOTIVATION: Gene expression data offer a large number of potentially useful predictors for the classification of tissue samples into classes, such as diseased and non-diseased. The predictive error rate of classifiers can be estimated using methods such as cross-validation. We have investigated issues of interpretation and potential bias in the reporting of error rate estimates. The issues considered here are optimization and selection biases, sampling effects, measures of misclassification rate, baseline error rates, two-level external cross-validation and a novel proposal for detection of bias using the permutation mean. RESULTS: Reporting an optimal estimated error rate incurs an optimization bias. Downward bias of 3-5% was found in an existing study of classification based on gene expression data and may be endemic in similar studies. Using a simulated non-informative dataset and two example datasets from existing studies, we show how bias can be detected through the use of label permutations and avoided using two-level external cross-validation. Some studies avoid optimization bias by using single-level cross-validation and a test set, but error rates can be more accurately estimated via two-level cross-validation. In addition to estimating the simple overall error rate, we recommend reporting class error rates plus, where possible, the conditional risk incorporating prior class probabilities and a misclassification cost matrix. We also describe baseline error rates derived from three trivial classifiers which ignore the predictors. AVAILABILITY: R code which implements two-level external cross-validation with the PAMR package, experiment code, dataset details and additional figures are freely available for non-commercial use from http://www.maths.qut.edu.au/profiles/wood/permr.jsp
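Two of the recommendations (two-level external cross-validation and permutation-based bias detection) map directly onto standard tooling. Below is a sketch with scikit-learn on synthetic data; the paper's own implementation uses R and the PAMR package, so this illustrates the protocol rather than that code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=500, n_informative=10,
                           random_state=0)

pipe = Pipeline([("select", SelectKBest()), ("clf", LogisticRegression(max_iter=1000))])
inner = GridSearchCV(pipe, {"select__k": [5, 20, 50]}, cv=3)   # level 1: tuning
outer = cross_val_score(inner, X, y, cv=5)                     # level 2: honest error

rng = np.random.default_rng(0)
perm = cross_val_score(inner, X, rng.permutation(y), cv=5)     # bias probe

print(f"nested-CV accuracy:      {outer.mean():.2f}")
print(f"permuted-label accuracy: {perm.mean():.2f}  (~ chance level if unbiased)")
```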

13.
We provide a novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean. As a key contribution, we extend this framework to multiple classes, providing a new joint Scatter SVM algorithm that stays at the level of its binary counterpart in the number of optimization variables. This enables us to implement computationally efficient solvers based on sequential minimal and chunking optimization. As a further contribution, the primal problem formulation is developed in terms of regularized risk minimization and the hinge loss, revealing the score function to be used in the actual classification of test patterns. We investigate Scatter SVM properties related to generalization ability, computational efficiency, sparsity and sensitivity maps, and report promising results.

14.
Li  Chunlin  Cai  Qianqian  Luo  Youlong 《Cluster computing》2022,25(2):1421-1439

Improper data replacement and inappropriate job scheduling policies are important reasons for the degradation of Spark system operation speed, which directly degrades the performance of Spark parallel computing. In this paper, we analyze the existing caching mechanism of Spark and find that there is still considerable room for optimizing the existing caching policy. Through task structure analysis, the key information of Spark tasks is extracted to obtain the data and memory usage during the task runtime; based on this, an RDD weight calculation method is proposed, which integrates the various factors affecting RDD usage into an RDD weight model. Based on this model, a minimum-weight replacement algorithm driven by RDD structure analysis is proposed. The algorithm ensures that relatively more valuable data remain cached in memory during data replacement. In addition, the default job scheduling algorithm of the Spark framework considers only a single factor, so it cannot schedule jobs effectively and wastes cluster resources. This paper proposes an adaptive job scheduling policy based on job classification to solve this problem. The policy classifies job types and schedules resources more effectively for different types of jobs. The experimental results show that the proposed dynamic data replacement algorithm effectively improves Spark's memory utilization, and the proposed job-classification-based adaptive job scheduling algorithm effectively improves system resource utilization and shortens job completion time.

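A minimal sketch of the minimum-weight replacement idea, written in Python for brevity (Spark's block manager itself is JVM code). The weight formula (pending references times recomputation cost per megabyte) is an invented stand-in for the paper's RDD weight model, which is derived from task-structure analysis:

```python
from dataclasses import dataclass

@dataclass
class CachedRDD:
    name: str
    size_mb: float         # memory footprint
    compute_cost_s: float  # time to recompute from lineage
    ref_count: int         # how many pending stages still read it

def weight(r: CachedRDD) -> float:
    """Illustrative weight: value of keeping r cached, per megabyte held."""
    return r.ref_count * r.compute_cost_s / r.size_mb

def evict_until_fits(cache, needed_mb, capacity_mb):
    """Evict lowest-weight RDDs until needed_mb fits; keep valuable data cached."""
    cache.sort(key=weight)                 # cheapest-to-lose first
    used = sum(r.size_mb for r in cache)
    while cache and used + needed_mb > capacity_mb:
        used -= cache.pop(0).size_mb
    return cache

cache = [CachedRDD("lineitem", 800, 120, 3),
         CachedRDD("tmp_join", 600, 10, 1),
         CachedRDD("features", 400, 300, 5)]
print([r.name for r in evict_until_fits(cache, needed_mb=500, capacity_mb=1600)])
```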

15.
Confounding factors exist widely in various biological data owing to technical variations, population structures and experimental conditions. Such factors may mask the true signals and lead to spurious associations in the respective biological data, making it necessary to adjust for confounding factors accordingly. However, existing confounder correction methods were mainly developed based on the original data or the pairwise Euclidean distance, either of which is inadequate for analyzing certain types of data, such as sequencing data. In this work, we propose a method called Adjustment for Confounding factors using Principal Coordinate Analysis, or AC-PCoA, which reduces data dimension and extracts information from different distance measures using principal coordinate analysis, and adjusts for confounding factors across multiple datasets by minimizing the associations between lower-dimensional representations and confounding variables. Application of the proposed method is further extended to classification and prediction. We demonstrate the efficacy of AC-PCoA on three simulated datasets and five real datasets. Compared to existing methods, AC-PCoA shows better results in visualization, statistical testing, clustering, and classification.
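The PCoA backbone of the method is classical multidimensional scaling: double-centre the squared distance matrix and eigendecompose it. The sketch below builds coordinates from a Bray-Curtis distance matrix on synthetic count data with a planted batch confounder, then shows the association signal that AC-PCoA would go on to minimise; the adjustment step itself is the paper's contribution and is not reproduced:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(6)
X = rng.poisson(5.0, (40, 100)).astype(float)   # 40 samples x 100 features
X[20:] *= 2.0                                   # planted confounding batch effect
batch = np.repeat([0.0, 1.0], 20)

D = squareform(pdist(X, metric="braycurtis"))   # any distance measure plugs in here
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J                     # double-centred Gram matrix
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
coords = vecs[:, order[:2]] * np.sqrt(np.maximum(vals[order[:2]], 0.0))

# The leading coordinate is dominated by the confounder, exactly the
# association AC-PCoA penalises when building its adjusted representation.
print(np.corrcoef(coords[:, 0], batch)[0, 1])
```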

16.
Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
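The discrete formulation is easy to state: fingerprints take values in {0, 1, N} (N = unresolved), and two clones may share a cluster only if they agree at every position where both are resolved, with N values resolved from clustermates. The greedy single pass below, on short hypothetical fingerprints, is a toy stand-in for the paper's clique-partition algorithm and its maximal-clique machinery:

```python
N = "N"
fps = ["10N1", "1011", "0N00", "0100", "10N0"]   # hypothetical binarized fingerprints

def compatible(a, b):
    """Clones may co-cluster iff they agree wherever both are resolved."""
    return all(x == y or N in (x, y) for x, y in zip(a, b))

def merge(a, b):
    """Resolve missing values from the partner where possible (cluster consensus)."""
    return "".join(y if x == N else x for x, y in zip(a, b))

clusters = []                        # list of (consensus fingerprint, member indices)
for i, f in enumerate(fps):
    for k, (cons, members) in enumerate(clusters):
        if compatible(cons, f):
            clusters[k] = (merge(cons, f), members + [i])
            break
    else:
        clusters.append((f, [i]))

for cons, members in clusters:
    print(cons, members)             # -> 1011 [0, 1] / 0100 [2, 3] / 10N0 [4]
```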

17.
In recent years, event-based approaches have been gaining ground in coevolutionary and biogeographical inference. Unlike pattern-based methods, event-based protocols deal directly with evolutionary events, such as dispersals and host switches. Three protocols have been proposed to date: (1) a coevolutionary method based on optimization of a standard two-dimensional cost matrix; (2) dispersal–vicariance analysis, based on optimization of a three-dimensional cost matrix; and (3) the maximum cospeciation method, thus far not considered a cost matrix method. I describe here general three-dimensional cost matrix optimization algorithms and how they can be applied to the maximum cospeciation problem. The new algorithms demonstrate that all existing event-based protocols, as well as possible future methods based on more complicated process models, can be incorporated into the three-dimensional cost matrix optimization framework.

18.
In this paper, heuristic solution techniques for the multi-objective orienteering problem are developed. The motivation stems from the problem of planning individual tourist routes in a city. Each point of interest in a city provides different benefits for different categories (e.g., culture, shopping). Each tourist has different preferences for the different categories when selecting and visiting the points of interest (e.g., museums, churches). Hence, a multi-objective decision situation arises. To determine all the Pareto optimal solutions, two metaheuristic search techniques are developed and applied. We use the Pareto ant colony optimization algorithm and extend the design of the variable neighborhood search method to the multi-objective case. Both methods are hybridized with path relinking procedures. The performance of the two algorithms is tested on several benchmark instances as well as on real-world instances from different Austrian regions and the cities of Vienna and Padua. The computational results show that both implemented methods are well-performing algorithms for solving the multi-objective orienteering problem.

19.
Huang HL  Lee CC  Ho SY 《Bio Systems》2007,90(1):78-86
It is essential to select a minimal number of relevant genes from microarray data while maximizing classification accuracy for the development of inexpensive diagnostic tests. However, simultaneously optimizing gene selection and classification accuracy is an intractable, large parameter optimization problem. We propose an efficient evolutionary approach to gene selection from microarray data which can be combined with the optimal design of various multiclass classifiers. The proposed method (named GeneSelect) consists of three fully cooperating parts: an efficient encoding scheme for candidate solutions, a generalized fitness function, and an intelligent genetic algorithm (IGA). An existing hybrid approach based on a genetic algorithm and maximum likelihood classification (GA/MLHD) was previously proposed to select a small number of relevant genes for accurate classification of samples. To evaluate the performance of GeneSelect, its gene selection is combined with the same maximum likelihood classification (named IGA/MLHD) for convenient comparison. IGA/MLHD is applied to 11 cancer-related human gene expression datasets. The simulation results show that IGA/MLHD is superior to GA/MLHD in terms of the number of selected genes, classification accuracy, and robustness of the selected genes and accuracy.

20.
A new program called GAMMA (genetic algorithm for multiple molecule alignment) has been developed for the superimposition of several three-dimensional chemical structures. Superimposition of molecules and evaluation of structural similarity are important tasks in drug design and pharmaceutical research. The program determines similarities of compounds based on either their structural or their physicochemical properties, using different matching criteria. These matching criteria are atomic properties such as atomic number or partial atomic charge. The program is based on a combination of a genetic algorithm with a numerical optimization process. A major goal of this hybrid procedure is to address the conformational flexibility of ligand molecules adequately. Thus, only one conformation per structure is necessary, and the program works even when only one conformation of a compound is stored in a database. The genetic algorithm optimizes, in a nondeterministic process, the size and the geometric fit of the overlay. The geometric fit of the conformations is further improved by changing torsional angles, combining the genetic algorithm with the directed tweak method. The determination of the fitness of a superimposition is based on Pareto optimization. As an application, the superimposition of a set of cytochrome P450c17 enzyme inhibitors has been performed. Electronic Supplementary Material is available.
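The geometric primitive underneath any superimposition score, the optimal rigid alignment of two matched atom sets, is the standard Kabsch algorithm, sketched below on synthetic coordinates. GAMMA's genetic search, torsional tweaks, and Pareto fitness operate on top of primitives of this kind; the sketch is not GAMMA itself and handles neither atom matching nor conformational flexibility:

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Optimal rigid-body superimposition of matched coordinate sets P onto Q
    (Kabsch algorithm); returns the minimal RMSD after rotation and translation."""
    P = P - P.mean(axis=0)                        # remove translation
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T       # optimal rotation
    return float(np.sqrt(np.mean(np.sum((P @ R.T - Q) ** 2, axis=1))))

rng = np.random.default_rng(7)
P = rng.standard_normal((10, 3))                  # 10 matched "atoms"
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])         # rotated + translated copy of P
print(kabsch_rmsd(P, Q))                          # ~0: perfect superimposition
```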
