Similar documents
20 similar documents found (search time: 15 ms)
1.
In order to overcome the poor interpretability of pattern recognition-based transient stability assessment (PRTSA) methods, a new rule extraction method based on the extreme learning machine (ELM) and an improved Ant-miner (IAM) algorithm is presented in this paper. First, the basic principles of the ELM and Ant-miner algorithms are introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. Finally, a set of classification rules is obtained by the IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted, using the IAM algorithm, from an example sample set generated by the trained ELM-based assessment model. The effectiveness of the proposed method is shown by the application results on the New England 39-bus power system and a practical power system, the southern power system of Hebei province.
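
A minimal sketch of the rule-extraction pipeline described above, under simplifying assumptions: the stability features and labels are synthetic, a bare single-hidden-layer ELM is written out by hand, and a scikit-learn decision tree stands in for the IAM rule miner, which is not available as a standard library component.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)

    # Synthetic "transient stability" sample set: features X, labels y (1 = stable, 0 = unstable).
    X = rng.normal(size=(500, 8))
    y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.3).astype(int)

    # Minimal ELM: random hidden layer, output weights from a single least-squares solve.
    n_hidden = 100
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ y

    def elm_predict(Xnew):
        return (np.tanh(Xnew @ W + b) @ beta > 0.5).astype(int)

    # Rule extraction: relabel an example sample set with the trained ELM, then mine rules from it.
    X_example = rng.normal(size=(2000, 8))
    y_example = elm_predict(X_example)          # labels come from the ELM, not from the ground truth
    rules = DecisionTreeClassifier(max_depth=3).fit(X_example, y_example)
    print(export_text(rules))                   # human-readable IF-THEN rules that replace the ELM

The key step is that the rule learner is fit on labels produced by the trained ELM rather than on the original labels, so the extracted rules approximate the ELM's decision surface.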

2.
MOTIVATION: Evaluating the accuracy of predicted models is critical for assessing structure prediction methods. Because this problem is not trivial, a large number of different assessment measures have been proposed by various authors, and it has already become an active subfield of research. The CASP (Moult et al., 1997, 1999) and CAFASP (Fischer et al., 1999) prediction experiments have demonstrated that it has been difficult to choose one single 'best' method to be used in the evaluation. Consequently, the CASP3 evaluation was carried out using an extensive set of specially developed numerical measures, coupled with human-expert intervention. As part of our efforts towards a higher level of automation in the structure prediction field, here we investigate the suitability of a fully automated, simple, objective, quantitative and reproducible method that can be used in the automatic assessment of models in the upcoming CAFASP2 experiment. Such a method should (a) produce one single number that measures the quality of a predicted model and (b) perform similarly to human-expert evaluations. RESULTS: MaxSub is a new and independently developed method that builds upon and extends some of the evaluation methods introduced at CASP3. MaxSub aims at identifying the largest subset of C(alpha) atoms of a model that superimpose 'well' over the experimental structure, and produces a single normalized score that represents the quality of the model. Because there exists no evaluation method for assessment measures of predicted models, it is not easy to evaluate how good our new measure is. Even though an exact comparison of MaxSub and the CASP3 assessment is not straightforward, here we use a test-bed extracted from the CASP3 fold-recognition models. A rough qualitative comparison of the performance of MaxSub vis-a-vis the human-expert assessment carried out at CASP3 shows that there is good agreement for the more accurate models and for the better predicting groups. As expected, some differences were observed among the medium to poor models and groups. Overall, the top six predicting groups ranked using the fully automated MaxSub are also the top six groups ranked at CASP3. We conclude that MaxSub is a suitable method for the automatic evaluation of models.
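
The following sketch illustrates only the MaxSub-style scoring once per-residue C(alpha) distances after a superposition are available; the actual MaxSub algorithm additionally searches seed segments and iteratively re-superimposes to find the largest well-fitting subset, which is not reproduced here. The 3.5 Å threshold is a typical choice, not necessarily the one used in the paper.

    import numpy as np

    def maxsub_like_score(dist, d=3.5):
        # Residues within the threshold d form the 'well-superimposed' subset; each
        # contributes 1/(1 + (d_i/d)^2), and the sum is normalized by the full model
        # length, giving a single score between 0 and 1.
        dist = np.asarray(dist, dtype=float)
        subset = dist <= d                       # subset approximated here by simple thresholding
        contrib = 1.0 / (1.0 + (dist[subset] / d) ** 2)
        return contrib.sum() / dist.size

    # Toy example: a 10-residue model, distances in angstroms after superposition.
    print(maxsub_like_score([0.4, 0.8, 1.2, 2.0, 3.0, 3.6, 5.0, 7.5, 1.1, 0.9]))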

3.
A new machine learning method, referred to as F-score_ELM, was proposed to classify lying and truth-telling using electroencephalogram (EEG) signals from 28 guilty and innocent subjects. Thirty-one features were extracted from the probe responses of these subjects. Then, a recently developed classifier called the extreme learning machine (ELM) was combined with F-score, a simple but effective feature selection method, to jointly optimize the number of hidden nodes of the ELM and the feature subset via a grid-searching training procedure. The method was compared to two classification models that combine principal component analysis with back-propagation network and support vector machine classifiers, respectively. We thoroughly assessed the performance of these classification models, including the training and testing time, the sensitivity and specificity on the training and testing sets, and the network size. The experimental results showed that the number of hidden nodes can be effectively optimized by the proposed method. Moreover, F-score_ELM obtained the best classification accuracy and required the shortest training and testing time.
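
A small sketch of the F-score criterion used for feature ranking, assuming binary labels (guilty vs. innocent); the joint grid search over the ELM hidden-node count and the size of the selected feature subset is only indicated in the trailing comment, and the variable names are illustrative.

    import numpy as np

    def f_score(X, y):
        # Fisher-style F-score per feature for binary labels y in {0, 1}:
        # between-class scatter of the feature means divided by the sum of
        # within-class variances; higher values mean more discriminative features.
        X = np.asarray(X, float)
        y = np.asarray(y)
        pos, neg = X[y == 1], X[y == 0]
        num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
        den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
        return num / den

    # Rank the 31 features, then grid-search the ELM hidden-node count jointly with the
    # number of top-ranked features kept, e.g.:
    # scores = f_score(features, labels)
    # order = np.argsort(scores)[::-1]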

4.
In this paper, the recently developed Extreme Learning Machine (ELM) is used for direct multicategory classification problems in the cancer diagnosis area. ELM avoids problems such as local minima, improper learning rates and overfitting commonly faced by iterative learning methods, and completes training very fast. We have evaluated the multicategory classification performance of ELM on three benchmark microarray datasets for cancer diagnosis, namely the GCM, Lung and Lymphoma datasets. The results indicate that ELM produces comparable or better classification accuracies with reduced training time and implementation complexity compared to artificial neural network methods such as conventional back-propagation ANN and Linder's SANN, and to Support Vector Machine methods such as SVM-OVO and Ramaswamy's SVM-OVA. ELM also achieves better accuracies for the classification of individual categories.
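
A compact sketch of the ELM training step that underlies the reported speed advantage: hidden-layer weights are random and fixed, and the output weights are obtained in a single pseudoinverse solve against one-hot multicategory targets. The hidden-node count, activation and seed are arbitrary choices for illustration.

    import numpy as np

    def train_elm(X, y, n_hidden=200, seed=0):
        # Random input weights and biases, sigmoid hidden layer; the output weights are
        # solved in one step with the Moore-Penrose pseudoinverse against one-hot targets,
        # so there is no learning rate and no iterative error surface to get trapped in.
        rng = np.random.default_rng(seed)
        W = rng.uniform(-1, 1, (X.shape[1], n_hidden))
        b = rng.uniform(-1, 1, n_hidden)
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        T = np.eye(int(y.max()) + 1)[y]          # one-hot targets for multicategory classification
        beta = np.linalg.pinv(H) @ T
        return W, b, beta

    def predict_elm(X, W, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        return np.argmax(H @ beta, axis=1)       # predicted class label per sample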

5.
Phosphorylation is catalyzed by protein kinases and plays an irreplaceable role in regulating biological processes. Identification of phosphorylation sites together with their corresponding kinases contributes to the understanding of molecular mechanisms. Mass spectrometry analysis of phosphoproteomes generates a large number of phosphorylated sites. However, experimental methods are costly and time-consuming, and most phosphorylation sites determined by experimental methods lack kinase information. Therefore, computational methods are urgently needed to address the kinase identification problem. To this end, we propose a new kernel-based machine learning method called Supervised Laplacian Regularized Least Squares (SLapRLS), which adopts a new way of constructing kernels based on the similarity matrix and minimizes both the structural risk and the overall inconsistency between labels and similarities. The results predicted using both Phospho.ELM and an additional independent test dataset indicate that SLapRLS can identify kinases more effectively than other existing algorithms.
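
A hedged sketch of a generic supervised Laplacian regularized least-squares solver; the paper's particular kernel construction from the similarity matrix and its exact penalty terms are not reproduced, and the regularization weights gamma_a and gamma_i are placeholder names.

    import numpy as np

    def slaprls_fit(K, L, y, gamma_a=1e-2, gamma_i=1e-2):
        # Minimize ||y - K a||^2 + gamma_a * a'K a + gamma_i * (K a)' L (K a),
        # where K is a kernel (similarity-derived) matrix over training samples and
        # L is a graph Laplacian encoding label/similarity consistency.
        # Setting the gradient to zero gives (K + gamma_a I + gamma_i L K) a = y.
        n = K.shape[0]
        return np.linalg.solve(K + gamma_a * np.eye(n) + gamma_i * L @ K, y)

    def slaprls_predict(K_test_train, alpha):
        # K_test_train holds kernel values between test and training samples.
        return K_test_train @ alpha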

6.
Ensembles are a well-established machine learning paradigm that leads to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield improved predictive performance compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to a set of problems of automated predictive modeling in three lake ecosystems, using a library of process-based knowledge for modeling population dynamics, and evaluate its performance. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics than individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.

7.
On the basis of Bayesian probabilistic inference, the Gaussian process (GP) is a powerful machine learning method for nonlinear classification and regression, but so far it has had only very limited application in the new areas of computational vaccinology and immunoinformatics. In the current work, we present a paradigmatic study of using the GP regression technique to quantitatively model and predict the binding affinities of over 7000 immunodominant peptide epitopes to six types of human major histocompatibility complex (MHC) proteins. In this procedure, the sequence patterns of diverse peptides are characterized quantitatively and the resulting variables are then correlated with the experimentally measured affinities between different MHC proteins and their peptide ligands, using a linearity- and nonlinearity-hybrid GP approach. We also make systematic comparisons between GP and two sophisticated modeling methods, partial least squares (PLS) regression and the support vector machine (SVM), with respect to their fitting ability, predictive power and generalization capability. The results suggest that GP could be a new and effective tool for the modeling and prediction of MHC-peptide interactions and is promising in the field of computer-aided vaccine design (CAVD).
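
A brief sketch of how such a linearity- and nonlinearity-hybrid GP regression can be set up with scikit-learn: a DotProduct (linear) kernel plus an RBF (nonlinear) kernel, with a WhiteKernel noise term. The peptide descriptors and affinities below are random stand-ins, not the MHC data used in the study.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import DotProduct, RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))                                    # stand-in peptide descriptors
    y = X[:, 0] - 0.5 * np.sin(X[:, 1]) + 0.1 * rng.normal(size=200)  # stand-in binding affinities

    # Linear (DotProduct) plus RBF kernel mirrors a linearity/nonlinearity hybrid;
    # the WhiteKernel term absorbs measurement noise in the affinities.
    kernel = DotProduct() + RBF(length_scale=1.0) + WhiteKernel()
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[:150], y[:150])
    pred, std = gp.predict(X[150:], return_std=True)                  # predictive mean and uncertainty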

8.
To achieve high assessment accuracy for credit risk, a novel multistage deep belief network (DBN) based extreme learning machine (ELM) ensemble learning methodology is proposed. The proposed methodology involves three main stages: training subset generation, individual classifier training and final ensemble output. In the first stage, a bagging sampling algorithm is applied to generate different training subsets to guarantee enough training data. Second, the ELM, an effective AI forecasting tool with the merits of fast training and high accuracy, is utilized as the individual classifier, and diverse ensemble members can accordingly be formed with different subsets and different initial conditions. In the final stage, the individual results are fused into the final classification output via a DBN model with sufficient hidden layers, which can effectively capture the valuable information hidden in the ensemble members. For illustration and verification, an experimental study on one publicly available credit risk dataset is conducted, and the results show the superiority of the proposed multistage DBN-based ELM ensemble learning paradigm in terms of high classification accuracy.
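
A rough sketch of the three-stage pipeline, with simplifications: the credit data are synthetic, the ELM base learners are written out by hand, and a small scikit-learn multilayer network stands in for the DBN fusion stage; in practice the fusion model would be trained on held-out member outputs rather than on the same training data.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def elm_fit(X, y, n_hidden, rng):
        # One ELM base learner: random hidden layer, least-squares output weights.
        W = rng.uniform(-1, 1, (X.shape[1], n_hidden))
        b = rng.uniform(-1, 1, n_hidden)
        beta = np.linalg.pinv(np.tanh(X @ W + b)) @ y
        return lambda Z: np.tanh(Z @ W + b) @ beta      # returns a scoring function

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 15))
    y = (X[:, :3].sum(axis=1) > 0).astype(int)          # stand-in credit-risk data

    # Stages 1-2: bootstrap subsets (bagging), one ELM per subset with its own size and initialization.
    members = []
    for k in range(10):
        idx = rng.integers(0, len(X), len(X))
        members.append(elm_fit(X[idx], y[idx], n_hidden=50 + 10 * k, rng=rng))

    # Stage 3: fuse the member outputs; a small multilayer network stands in for the DBN here.
    meta_X = np.column_stack([m(X) for m in members])
    fusion = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0).fit(meta_X, y)
    print("ensemble training accuracy:", fusion.score(meta_X, y))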

9.
Zhou H, Skolnick J. Proteins. 2008;71(3):1211-1218.
In this work, we develop a fully automated method for predicting the quality of protein structural models generated by structure prediction approaches such as fold recognition servers or ab initio methods. The approach is based on fragment comparisons and a consensus C(alpha) contact potential derived from the set of models to be assessed, and was tested on CASP7 server models. The average Pearson linear correlation coefficient between predicted quality and model GDT-score per target is 0.83 for the 98 targets, which is better than that of the other quality assessment methods that participated in CASP7. Our method also outperforms the other methods by about 3% as assessed by the total GDT-score of the selected top models.

10.
Protein phosphorylation is one of the essential posttranslational modifications, playing a vital role in the regulation of many fundamental cellular processes. We propose a LightGBM-based computational approach that uses evolutionary, geometric, sequence-environment, and amino acid-specific features to decipher phosphate binding sites from a protein sequence. Our method, when compared with other existing methods on 2429 protein sequences taken from the standard Phospho.ELM (P.ELM) benchmark data set featuring 11 organisms, reports a higher F1 score = 0.504 (harmonic mean of the precision and recall) and ROC AUC = 0.836 (area under the receiver operating characteristic curve). The computation time of our proposed approach is much less than that of the recently developed deep learning-based framework. Structural analysis of selected protein sequences indicates that our predictions are a superset of the phosphorylation sites listed in the P.ELM data set. The foundation of our scheme is manual feature engineering and decision tree-based classification. Hence, it is intuitive, and one can interpret the final tree as a set of rules, resulting in a deeper understanding of the relationships between biophysical features and phosphorylation sites. Our innovative problem transformation method permits more control over precision and recall, as is demonstrated by the fact that if we incorporate the output probability of the existing deep learning framework as an additional feature, our prediction improves (F1 score = 0.546; ROC AUC = 0.849). The implementation of our method can be accessed at http://cse.iitkgp.ac.in/~pralay/resources/PPSBoost/ and is mirrored at https://cosmos.iitkgp.ac.in/PPSBoost.
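
A minimal sketch of a LightGBM classifier evaluated with the two reported metrics; the features below are random placeholders for the evolutionary, geometric and sequence-environment descriptors, and the hyperparameters are illustrative.

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 40))        # stand-ins for evolutionary/geometric/sequence features
    y = (rng.random(5000) < 1 / (1 + np.exp(-X[:, 0] + X[:, 1]))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    clf = LGBMClassifier(n_estimators=300, learning_rate=0.05).fit(X_tr, y_tr)

    proba = clf.predict_proba(X_te)[:, 1]
    print("F1 :", f1_score(y_te, proba > 0.5))   # the decision threshold controls precision vs. recall
    print("AUC:", roc_auc_score(y_te, proba))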

11.

Background

Effective and accurate diagnosis of attention-deficit/hyperactivity disorder (ADHD) is currently of significant interest. ADHD has been associated with multiple cortical features from structural MRI data. However, most existing learning algorithms for ADHD identification have obvious shortcomings, such as time-consuming training and manual parameter selection. The aims of this study were as follows: (1) to propose an ADHD classification model using the extreme learning machine (ELM) algorithm for automatic, efficient and objective clinical ADHD diagnosis; and (2) to assess the computational efficiency and the effect of sample size on both ELM and support vector machine (SVM) methods, and to analyze which brain segments are involved in ADHD.

Methods

High-resolution three-dimensional MR images were acquired from 55 ADHD subjects and 55 healthy controls. Multiple brain measures (cortical thickness, etc.) were calculated using a fully automated procedure in the FreeSurfer software package. In total, 340 cortical features were automatically extracted, covering 68 brain segments with 5 basic cortical measures each. F-score and sequential forward selection (SFS) methods were adopted to select the optimal features for ADHD classification. Both ELM and SVM were evaluated for classification accuracy using leave-one-out cross-validation, as sketched below.
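
A sketch of the feature-selection plus leave-one-out evaluation protocol just described, using scikit-learn components on placeholder data; sequential forward selection is shown wrapping an SVM (an ELM would be scored with the same cross-validation loop), and the feature values are random stand-ins for the FreeSurfer measures.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(110, 340))          # 110 subjects x 340 cortical features (stand-in values)
    y = np.repeat([0, 1], 55)                # 55 healthy controls, 55 ADHD subjects

    # Sequential forward selection down to a small combined feature set.
    sfs = SequentialFeatureSelector(SVC(kernel="linear"), n_features_to_select=11,
                                    direction="forward", cv=5)
    X_sel = sfs.fit_transform(X, y)

    # Leave-one-out accuracy on the selected features.
    acc = cross_val_score(SVC(kernel="rbf"), X_sel, y, cv=LeaveOneOut()).mean()
    print(f"LOOCV accuracy: {acc:.3f}")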

Results

We achieved ADHD prediction accuracies of 90.18% for ELM using eleven combined features, 84.73% for SVM-Linear and 86.55% for SVM-RBF. Our results show that, for ADHD classification, ELM has better computational efficiency and is more robust to changes in sample size than SVM. The most pronounced differences between ADHD and healthy subjects were observed in the frontal lobe, temporal lobe, occipital lobe and insula.

Conclusion

Our ELM-based algorithm for ADHD diagnosis performs considerably better than the traditional SVM algorithm. This result suggests that ELM may be used for the clinical diagnosis of ADHD and for the investigation of other brain diseases.

12.
In order to make renewable fuels and chemicals from microbes, new methods are required to engineer microbes more intelligently. Computational approaches to engineering strains for enhanced chemical production typically rely on detailed mechanistic models (e.g., kinetic/stoichiometric models of metabolism), which require many experimental datasets for their parameterization, while experimental methods may require screening large mutant libraries to explore the design space for the few mutants with desired behaviors. To address these limitations, we developed an active learning and machine learning approach (ActiveOpt) to intelligently guide experiments and arrive at an optimal phenotype with a minimal number of measured datasets. ActiveOpt was applied to two separate case studies to evaluate its potential to increase valine yields and neurosporene productivity in Escherichia coli. In both cases, ActiveOpt identified the best-performing strain in fewer experiments than the original case studies used. This work demonstrates that machine learning and active learning approaches have the potential to greatly facilitate metabolic engineering efforts and to rapidly achieve their objectives.
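
A generic active-learning loop in the spirit of ActiveOpt, not its actual implementation: a Bayesian linear model is refit after each "experiment" and the next design is chosen by predicted yield plus uncertainty. The response surface, candidate designs and acquisition rule are all illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import BayesianRidge

    rng = np.random.default_rng(0)

    def measure_yield(design):
        # Stand-in for a wet-lab experiment (hypothetical response surface with noise).
        return float(-np.sum((design - 0.6) ** 2) + 0.05 * rng.normal())

    candidates = rng.random((200, 5))                # candidate strain designs (e.g., expression levels)
    tried_idx = list(rng.choice(200, size=5, replace=False))
    yields = [measure_yield(candidates[i]) for i in tried_idx]

    for _ in range(10):                              # active-learning loop: model -> pick -> measure
        model = BayesianRidge().fit(candidates[tried_idx], yields)
        mean, std = model.predict(candidates, return_std=True)
        mean[tried_idx] = -np.inf                    # do not repeat experiments already run
        nxt = int(np.argmax(mean + std))             # favor high predicted yield plus uncertainty
        tried_idx.append(nxt)
        yields.append(measure_yield(candidates[nxt]))

    print("best design so far:", candidates[tried_idx[int(np.argmax(yields))]])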

13.
One of the principal problems facing palaeodemography is age estimation in adult skeletons, together with the centrist tendency that affects many age estimation methods by artificially increasing the proportion of individuals in the 30–45-year age category. Several recent publications have indicated that cementum annulations are significantly correlated with known age at extraction or death. This study addresses the question of how demographic dynamics are altered for an archaeological sample when cementum-based age estimates are used as opposed to those obtained via conventional macroscopic methods. Age pyramids were constructed and demographic profiles were compared for the early Holocene skeletal population from Damdama (India). The results demonstrate that the use of cementum annulations for age estimation in only a subset of the skeletal sample has a significant impact on the demographic profile with regard to specific parameters such as mean age at death and life expectancy at birth. This confirms the importance of using cementum annulations to refine age estimates in archaeological samples, which, when combined with a fertility-centred approach to demography, can provide new insights into past population dynamics.

14.
DNA-binding proteins are crucial for various cellular processes and have hence become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Because all biological species have evolved from a very limited number of ancestral species, it is important to take evolutionary information into account in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating evolutionary information into the general form of the pseudo amino acid composition via the top-n-gram approach. Comparing the new predictor with existing methods via both the jackknife test and an independent data-set test showed that it outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that this approach of encoding evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.

15.
The extreme learning machine (ELM) is a novel and fast learning method for training single-layer feed-forward networks. However, due to the demand for a large number of hidden neurons, the prediction speed of ELM is not fast enough. An evolutionary ELM based on differential evolution (DE) has been proposed to reduce the prediction time of the original ELM, but it may still get stuck at local optima. In this paper, a novel algorithm hybridizing DE with the metaheuristic coral reef optimization (CRO), called differential evolution coral reef optimization (DECRO), is proposed to balance explorative and exploitative power and thereby reach better performance. The rationale and implementation of the DECRO algorithm are discussed in detail in this article. DE, CRO and DECRO are each applied to ELM training. Experimental results show that DECRO-ELM can reduce the prediction time of the original ELM and obtains better performance for training ELM than both DE and CRO.
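
A plain DE/rand/1/bin loop to make the evolutionary component concrete; in a DE- or DECRO-trained ELM each candidate vector would encode the hidden-layer parameters and the fitness would be a validation error, and the coral reef operators of DECRO are not reproduced here.

    import numpy as np

    def differential_evolution(fitness, dim, pop_size=20, F=0.5, CR=0.9, iters=100, seed=0):
        # Classic DE/rand/1/bin: mutate three distinct vectors, binomial crossover,
        # greedy replacement when the trial vector has lower fitness.
        rng = np.random.default_rng(seed)
        pop = rng.uniform(-1, 1, (pop_size, dim))
        fit = np.array([fitness(p) for p in pop])
        for _ in range(iters):
            for i in range(pop_size):
                a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
                mutant = a + F * (b - c)
                cross = rng.random(dim) < CR
                cross[rng.integers(dim)] = True      # guarantee at least one mutated gene
                trial = np.where(cross, mutant, pop[i])
                f_trial = fitness(trial)
                if f_trial < fit[i]:
                    pop[i], fit[i] = trial, f_trial
        return pop[np.argmin(fit)], fit.min()

    # Usage with a toy fitness (sphere function); an ELM fitness would decode the vector
    # into hidden weights, solve the output weights in closed form and return validation error.
    best, err = differential_evolution(lambda v: float(np.sum(v ** 2)), dim=10)
    print(err)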

16.
17.
MOTIVATION: Small non-coding RNA (ncRNA) genes play important regulatory roles in a variety of cellular processes. However, detection of ncRNA genes is a great challenge for both experimental and computational approaches. In this study, we describe a new approach called positive sample only learning (PSoL) to predict ncRNA genes in the Escherichia coli genome. Although PSoL is a machine learning method for classification, it requires no negative training data, which, in general, is hard to define properly and dramatically affects the performance of machine learning. In addition, using the support vector machine (SVM) as the core learning algorithm, PSoL can integrate many different kinds of information to improve the accuracy of prediction. Beyond the prediction of ncRNAs, PSoL is applicable to many other bioinformatics problems as well. RESULTS: The PSoL method is assessed by 5-fold cross-validation experiments, which show that PSoL can achieve about 80% accuracy in the recovery of known ncRNAs. We compared PSoL predictions with five previously published results. The PSoL method has the highest percentage of predictions overlapping with those from other methods.
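
A simplified positive-only learning loop in the spirit of PSoL, on the assumption that an initial "reliable negative" set can be seeded from the unlabeled points farthest from the positive centroid; the paper's exact negative-set expansion strategy may differ, and the input names in the usage comment are hypothetical.

    import numpy as np
    from sklearn.svm import SVC

    def psol_like(X_pos, X_unlab, rounds=5):
        # Seed negatives with the unlabeled points farthest from the positive centroid,
        # then iteratively retrain an SVM on positives vs. current negatives and move the
        # most confidently negative unlabeled points into the negative set.
        centroid = X_pos.mean(axis=0)
        dist = np.linalg.norm(X_unlab - centroid, axis=1)
        neg_idx = set(np.argsort(dist)[-len(X_pos):])
        for _ in range(rounds):
            X_neg = X_unlab[sorted(neg_idx)]
            X = np.vstack([X_pos, X_neg])
            y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg))]
            clf = SVC(probability=True).fit(X, y)
            p_pos = clf.predict_proba(X_unlab)[:, 1]
            newly_neg = np.argsort(p_pos)[: len(X_pos)]   # lowest positive probability = most negative
            neg_idx.update(int(i) for i in newly_neg)
        return clf

    # usage (hypothetical inputs): clf = psol_like(known_ncrna_features, unlabeled_genome_features)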

18.
Papermaking wastewater accounts for a large proportion of industrial wastewater, and it is essential to obtain accurate and reliable effluent indices in real time. Considering the complexity, nonlinearity, and time variability of wastewater treatment processes, a dynamic kernel extreme learning machine (DKELM) method is proposed to predict the key effluent quality index, chemical oxygen demand (COD). A time-lag coefficient is introduced and a kernel function is embedded into the extreme learning machine (ELM) to extract dynamic information and obtain better prediction accuracy. A case study on modeling a wastewater treatment process is presented to evaluate the performance of the proposed DKELM. The results illustrate that both the training and prediction accuracy of the DKELM model are superior to those of other models. For the prediction of effluent COD, the determination coefficient of the DKELM model is increased by 27.52%, 21.36%, 10.42%, and 10.81% compared with partial least squares, ELM, dynamic ELM, and kernel ELM, respectively.
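
A compact sketch of a kernel ELM with a simple time-lag augmentation of the inputs, which is one way to read the "dynamic" part of DKELM; the RBF kernel, lag order and regularization constant C are illustrative assumptions rather than the paper's settings.

    import numpy as np

    def rbf_kernel(A, B, gamma=0.1):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def add_lags(X, lag=2):
        # Augment each sample with the previous `lag` samples to capture process dynamics.
        rows = [np.hstack([X[t - k] for k in range(lag + 1)]) for t in range(lag, len(X))]
        return np.array(rows)

    def kelm_fit(X, y, C=10.0, gamma=0.1):
        # Kernel ELM: output weights in closed form, beta = (Omega + I/C)^-1 y,
        # where Omega is the kernel matrix over the training inputs.
        Omega = rbf_kernel(X, X, gamma)
        beta = np.linalg.solve(Omega + np.eye(len(X)) / C, y)
        return lambda Z: rbf_kernel(Z, X, gamma) @ beta

    # usage (hypothetical arrays): Xd = add_lags(process_inputs); yd = cod_values[2:]
    # model = kelm_fit(Xd, yd); predictions = model(add_lags(new_inputs))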

19.
Abstract: Obtaining reliable results from life-cycle assessment studies is often quite difficult because life-cycle inventory (LCI) data are usually erroneous, incomplete, and even physically meaningless. The real data must satisfy the laws of thermodynamics, so the quality of LCI data may be enhanced by adjusting them to satisfy these laws. This is not a new idea, but a formal, thermodynamically sound and statistically rigorous approach for accomplishing this task is not yet available. This article proposes such an approach based on methods for data rectification developed in process systems engineering. The approach exploits redundancy in the available data and models and solves a constrained optimization problem to remove random errors and estimate some missing values. The quality of the results and the presence of gross errors are determined by statistical tests on the constraints and measurements. The accuracy of the rectified data depends strongly on the accuracy and completeness of the available models, which should capture information such as the life-cycle network, stream compositions, and reactions. Such models are often not provided in LCI databases, so the proposed approach tackles many new challenges that are not encountered in process data rectification. An iterative approach is developed that relies on increasingly detailed information about the life-cycle processes from the user. A comprehensive application of the method to the chlor-alkali inventory being compiled by the National Renewable Energy Laboratory demonstrates the benefits and challenges of this approach.
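
A toy example of the kind of data rectification described: a variance-weighted least-squares adjustment that forces measured flows to satisfy a linear balance constraint exactly. The three-stream flowsheet and measurement uncertainties are invented for illustration; the article's thermodynamic constraints and gross-error tests go well beyond this.

    import numpy as np

    def rectify(x_meas, A, sigma):
        # Classical data reconciliation: find the smallest (variance-weighted) adjustment
        # to the measured flows x_meas so that the linear constraints A x = 0 hold exactly.
        # Closed form: x_hat = x_meas - S A' (A S A')^-1 A x_meas, with S = diag(sigma^2).
        S = np.diag(np.asarray(sigma, float) ** 2)
        correction = S @ A.T @ np.linalg.solve(A @ S @ A.T, A @ x_meas)
        return x_meas - correction

    # Toy flowsheet: stream 1 splits into streams 2 and 3, so x1 - x2 - x3 = 0.
    A = np.array([[1.0, -1.0, -1.0]])
    x_meas = np.array([100.0, 61.0, 42.0])   # raw LCI-style measurements (inconsistent: 61 + 42 != 100)
    sigma = np.array([2.0, 1.0, 1.0])        # measurement standard deviations
    print(rectify(x_meas, A, sigma))         # rectified flows satisfy the balance exactly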

20.
Accurate retention time (RT) prediction is important for spectral library-based analysis in data-independent acquisition mass spectrometry-based proteomics. Deep learning approaches have demonstrated superior performance over traditional machine learning methods for this purpose. The transformer architecture is a recent development in deep learning that delivers state-of-the-art performance in many fields, such as natural language processing, computer vision, and biology. We assess the performance of the transformer architecture for RT prediction using datasets from five deep learning models: Prosit, DeepDIA, AutoRT, DeepPhospho, and AlphaPeptDeep. The experimental results on holdout and independent datasets show state-of-the-art performance of the transformer architecture. The software and evaluation datasets are publicly available for future development in the field.
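
A minimal PyTorch sketch of a transformer encoder for peptide retention-time regression, to make the architecture concrete; the tokenization, model sizes and pooling choice are illustrative and do not correspond to any of the five cited models.

    import torch
    from torch import nn

    class RTTransformer(nn.Module):
        # Amino-acid tokens -> embeddings -> self-attention layers -> pooled vector -> scalar RT.
        def __init__(self, vocab=25, d_model=64, nhead=4, layers=2, max_len=50):
            super().__init__()
            self.tok = nn.Embedding(vocab, d_model, padding_idx=0)
            self.pos = nn.Embedding(max_len, d_model)
            enc = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc, layers)
            self.head = nn.Linear(d_model, 1)

        def forward(self, tokens):
            # tokens: (batch, seq_len) integer-encoded residues, 0 used as padding.
            pos = torch.arange(tokens.size(1), device=tokens.device)
            h = self.tok(tokens) + self.pos(pos)[None]
            h = self.encoder(h, src_key_padding_mask=(tokens == 0))
            return self.head(h.mean(dim=1)).squeeze(-1)   # simple mean pooling, then predicted RT

    # usage sketch: model = RTTransformer(); rt = model(torch.randint(1, 21, (8, 30)))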
