Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
An ensemble performs well when the component classifiers are diverse yet accurate, so that the failure of one is compensated for by others. A number of methods have been investigated for constructing ensembles, some of which train classifiers on generated patterns. This study investigates a new technique for generating training patterns. The method alters the input feature values of some patterns using the values of other patterns, producing different patterns for different classifiers. The effectiveness of neural network ensembles based on the proposed technique was evaluated using a suite of 25 benchmark classification problems, and was found to be better than or competitive with related conventional methods. Experimental investigation of different input-value alteration techniques finds that alteration with pattern values from the same class is better for generalization, although other alteration techniques may offer more diversity.
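The same-class alteration idea in entry 1 can be illustrated with a minimal sketch. The abstract does not give the actual procedure, so the function name `alter_patterns`, the per-feature alteration probability `rate`, and the donor-selection rule below are all assumptions made for illustration only.

```python
import random

def alter_patterns(X, y, rate=0.2, seed=0):
    """Return a copy of X in which some feature values are replaced by the
    value of the same feature taken from another pattern of the same class.
    `rate` (hypothetical parameter) is the per-feature alteration probability."""
    rng = random.Random(seed)
    # Group pattern indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    altered = [list(row) for row in X]  # copy, so X is left untouched
    for i, label in enumerate(y):
        donors = [j for j in by_class[label] if j != i]
        if not donors:
            continue  # a class with a single pattern has no donor
        for f in range(len(X[i])):
            if rng.random() < rate:
                altered[i][f] = X[rng.choice(donors)][f]
    return altered
```

Calling the function with a different `seed` for each ensemble member would give each classifier its own altered training set, which is one plausible way to obtain the diversity the abstract describes.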

2.
The safety of human–machine systems can be indirectly evaluated based on the operator's cognitive load level at each temporal instant. However, relevant features of cognitive states are hidden in multiple sources of cortical neural responses. In this study, we developed a novel neural network ensemble, SE-SDAE, based on stacked denoising autoencoders (SDAEs), which identifies different levels of cognitive load from electroencephalography (EEG) signals. To improve the generalization capability of the ensemble framework, a stacking-based approach is adopted to fuse the abstracted EEG features from activations of deep-structured hidden layers. In particular, we also combine multiple K-nearest neighbor and naive Bayesian classifiers with SDAEs to generate a heterogeneous classification committee that enhances the ensemble's diversity. Finally, we validate the proposed SE-SDAE by comparing its performance with mainstream pattern classifiers for cognitive load evaluation to show its effectiveness.

3.
The objective of this study was to evaluate the performance of stacked species distribution models in predicting the alpha and gamma species diversity patterns of two important plant clades along elevation in the Andes. We modelled the distribution of the species in the Anthurium genus (53 species) and the Bromeliaceae family (89 species) using six modelling techniques. We combined all of the predictions for the same species in ensemble models based on two different criteria: the average of the rescaled predictions by all techniques and the average of the best techniques. The rescaled predictions were then reclassified into binary predictions (presence/absence). By stacking either the original predictions or binary predictions for both ensemble procedures, we obtained four different species richness models per taxon. The gamma and alpha diversity per elevation band (500 m) were also computed. To evaluate the prediction abilities for the four predictions of species richness and gamma diversity, the models were compared with the real data along an elevation gradient that was independently compiled by specialists. Finally, we also tested whether our richness models performed better than a null model of altitudinal changes of diversity based on the literature. Stacking of the ensemble prediction of the individual species models generated richness models that proved to be well correlated with the observed alpha diversity richness patterns along elevation and with the gamma diversity derived from the literature. Overall, these models tend to overpredict species richness. The use of the ensemble predictions from the species models built with different techniques seems very promising for modelling species assemblages. Stacking of the binary models reduced the over-prediction, although more research is needed. The randomisation test proved to be a promising method for testing the performance of the stacked models, but other implementations may still be developed.
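The pipeline in entry 3 (rescale each technique's predictions, average them per species, optionally binarize, then stack across species to estimate richness) can be sketched as follows. The abstract does not say how predictions were rescaled or thresholded, so min-max rescaling and the `threshold` argument are assumptions, and all function names are hypothetical.

```python
def rescale(pred):
    """Min-max rescale one technique's predictions to [0, 1] (assumed method)."""
    lo, hi = min(pred), max(pred)
    return [0.0 if hi == lo else (p - lo) / (hi - lo) for p in pred]

def ensemble_mean(preds_by_technique):
    """Average the rescaled predictions of several techniques, site by site."""
    scaled = [rescale(p) for p in preds_by_technique]
    return [sum(col) / len(scaled) for col in zip(*scaled)]

def stack_richness(species_preds, threshold=None):
    """Sum per-species predictions at each site to estimate species richness.
    If `threshold` is given, predictions are first converted to
    presence/absence (1/0) before stacking."""
    if threshold is not None:
        species_preds = [[1 if p >= threshold else 0 for p in sp]
                         for sp in species_preds]
    return [sum(col) for col in zip(*species_preds)]
```

Stacking continuous suitabilities and stacking binary maps generally give different richness estimates, which is the kind of difference between stacking variants that the abstract evaluates.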

4.
The ability to detect and discriminate attributes of sounds improves with practice. Determining how such auditory learning generalizes to stimuli and tasks that are not encountered during training can guide the development of training regimens used to improve hearing abilities in particular populations as well as provide insight into the neural mechanisms mediating auditory performance. Here we review the newly emerging literature on the generalization of auditory learning, focusing on behavioural investigations of generalization on basic auditory tasks in human listeners. The review reveals a variety of generalization patterns across different trained tasks that cannot be summarized with a simple rule, and a diversity of views about the definition, evaluation and interpretation of generalization.

5.
One outstanding and unsolved challenge in ecology and conservation biology is to understand how pollinator diversity affects plant performance. Here, we provide evidence of the functional role of pollination diversity in a plant species, Erysimum mediohispanicum (Brassicaceae). Pollinator abundance, richness and diversity as well as plant reproduction and recruitment were determined in eight plant populations. We found that E. mediohispanicum was generalized both at the regional and local (population) scale, since its flowers were visited by more than 100 species of insects with very different morphology, size and behaviour. However, populations differed in the degree of generalization. Generalization correlated with pollinator abundance and plant population size, but not with habitat, ungulate damage intensity, altitude or spatial location. More importantly, the degree of generalization had significant consequences for plant reproduction and recruitment. Plants from populations with intermediate generalization produced more seeds than plants from populations with low or high degrees of generalization. These differences were not the result of differences in the number of flowers produced per plant. In addition, seedling emergence in a common garden was highest in plants from populations with an intermediate degree of generalization. This outcome suggests the existence of an optimal level of generalization even for generalized plant species.

6.
Sahli HF  Conner JK 《Oecologia》2006,148(3):365-372
Despite the development of diversity indices in community ecology that incorporate both richness and evenness, pollination biologists commonly use only pollinator richness to estimate generalization. Similarly, while pollination biologists have stressed the utility of pollinator importance, incorporating both pollinator abundance and effectiveness, importance values have not been included in estimates of generalization in pollination systems. In this study, we estimated pollinator generalization for 17 plant species using Simpson's diversity index, which includes richness and evenness. We compared these estimates with estimates based on only pollinator richness, and compared diversity estimates calculated using importance data with those using only visitation data. We found that pollinator richness explains only 57–65% of the variation in diversity, and that, for most plant species, pollinator importance was determined primarily by differences in visitation rather than by differences in effectiveness. While simple richness may suffice for broad comparisons of pollinator generalization, measures that incorporate evenness will provide a much more accurate understanding of generalization. Although incorporating labor-intensive measurements of pollinator effectiveness is less necessary for broad surveys, effectiveness estimates will be important for detailed studies of some plant species. Unfortunately, at this point it is impossible to predict a priori which species these are.
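To make the contrast between richness and a richness-plus-evenness measure in entry 6 concrete, here is a small sketch of Simpson's index computed from visitation counts. The abstract does not state which form of the index the authors used, so the function offers both common forms (1 − D and 1/D, with D = Σ p_i²); the function name and the `reciprocal` flag are hypothetical.

```python
def simpson_diversity(counts, reciprocal=False):
    """Simpson's diversity from per-pollinator counts.
    D = sum of squared proportions; returns 1 - D by default,
    or 1 / D when reciprocal=True."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    d = sum((c / total) ** 2 for c in counts)
    return 1.0 / d if reciprocal else 1.0 - d
```

Two plants visited by four pollinator species each have the same richness, but counts of `[10, 10, 10, 10]` yield a higher index than `[97, 1, 1, 1]`, which is why richness alone can misstate generalization.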

7.
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation-based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high-performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation-based joint learning, high-quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high-quality knowledge for the ensemble classifier and improve classification performance.

8.
Pan XY  Tian Y  Huang Y  Shen HB 《Genomics》2011,97(5):257-264
Epistatic miniarray profiling (E-MAP) is a powerful tool for analyzing gene functions and their biological relevance. However, E-MAP data suffer from a large proportion of missing values, which often leads to misleading and biased analysis results. Effective missing value estimation methods for E-MAP are urgently needed. Although several independent algorithms can be applied to achieve this goal, their performance varies significantly across datasets, indicating that different algorithms have their own advantages and disadvantages. In this paper, we propose a novel ensemble approach, EMDI, based on high-level diversity, to impute missing values; it consists of two global and four local base estimators. Experimental results on five E-MAP datasets show that EMDI outperforms all single base algorithms, demonstrating that an appropriate combination provides complementarity among different methods. Comparison results between several fusion strategies also demonstrate that the proposed high-level diversity scheme is superior to others. EMDI is freely available at www.csbio.sjtu.edu.cn/bioinf/EMDI/.

9.
10.
MOTIVATION: In many fields of pattern recognition, combination has proved effective in increasing the generalization performance of individual prediction methods. Numerous systems have been developed for protein secondary structure prediction, based on different principles. Finding better ensemble methods for this task may thus become crucial. Furthermore, efforts need to be made to help the biologist in the post-processing of the outputs. RESULTS: An ensemble method has been designed to post-process the outputs of discriminant models, in order to obtain an improvement in prediction accuracy while generating class posterior probability estimates. Experimental results establish that it can increase the recognition rate of protein secondary structure prediction methods that provide inhomogeneous scores, even though their individual prediction successes are largely different. This combination thus constitutes a help for the biologist, who can use it confidently on top of any set of prediction methods. Moreover, the resulting estimates can be used in various ways, for instance to determine which areas in the sequence are predicted with a given level of reliability. AVAILABILITY: The prediction is freely available over the Internet on the Network Protein Sequence Analysis (NPS@) WWW server at http://pbil.ibcp.fr/NPSA/npsa_server.html. The source code of the combiner can be obtained on request for academic use.

11.
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation effort. Topic domains were automatically discovered from content shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using content from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
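Entry 11 mentions building training sets for ensemble members by bootstrap over-sampling. The abstract gives no procedure, so the following is only one plausible reading: every class is resampled with replacement up to the size of the largest class, once per ensemble member. The function name and the parameters `n_sets` and `seed` are hypothetical.

```python
import random

def bootstrap_balanced_sets(samples, labels, n_sets=3, seed=0):
    """Build `n_sets` class-balanced training sets by sampling each class
    with replacement up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    sets = []
    for _ in range(n_sets):
        data = []
        for label, items in by_class.items():
            # Bootstrap draw: the same sample may be picked more than once.
            data.extend((rng.choice(items), label) for _ in range(target))
        sets.append(data)
    return sets
```

Each returned set could then be used to train one member of the ensemble; because the draws differ between sets, the members see different data, which is the data diversity the abstract refers to.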

12.
13.

Background

microRNAs (miRNAs) are short regulatory RNAs that are involved in several diseases, including cancers. Identifying miRNA functions is very important in understanding disease mechanisms and determining the efficacy of drugs. An increasing number of computational methods have been developed to explore miRNA functions by inferring the miRNA-mRNA regulatory relationships from data. Each of the methods is developed based on some assumptions and constraints, for instance, assuming linear relationships between variables. For such reasons, computational methods are often subject to the problem of inconsistent performance across different datasets. On the other hand, ensemble methods integrate the results from individual methods and have been proved to outperform each of their individual component methods in theory.

Results

In this paper, we investigate the performance of some ensemble methods over the commonly used miRNA target prediction methods. We apply eight different popular miRNA target prediction methods to three cancer datasets, and compare their performance with the ensemble methods which integrate the results from each combination of the individual methods. The validation results using experimentally confirmed databases show that the results of the ensemble methods complement those obtained by the individual methods and that the ensemble methods perform better than the individual methods across different datasets. The ensemble method Pearson+IDA+Lasso, which combines methods from different approaches (a correlation method, a causal inference method, and a regression method), is the best-performing ensemble method in this study. Further analysis of the results of this ensemble method shows that it can obtain more targets which could not be found by any of the single methods, and that the discovered targets are more statistically significant and functionally enriched. The source code, datasets, miRNA target predictions by all methods, and the ground truth for validation are available in the Supplementary materials.

14.
A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and their classification performance was assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performance of the ensemble classifier with naive partitioning and that of a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and that the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.

15.
The current distribution of the endangered Mexican beech [Fagus grandifolia var. mexicana (Martinez) Little] is restricted to relict isolated populations in small remnants of montane cloud forest in northeastern Mexico, and little is known about its associated biota. We sampled bolete diversity in two of these monospecific forests in the state of Hidalgo, Mexico. We compared alpha diversity, including species richness and ensemble structure, and analyzed beta diversity (dissimilarity in species composition) between forests. We found 26 bolete species, five of which are probably new. Species diversity and evenness were similar between forests. Beta diversity was low, and the similarities of bolete samples from within and between forests were not significantly different. These results support the idea that the two forests share a single bolete ensemble with a common history. In contrast, cumulative species richness differed between the forests, implying that factors other than the mere presence of the host species have contributed to shaping the biodiversity of ectomycorrhizal fungi in relict Mexican beech forests.

16.
Nanni L  Lumini A 《Amino acids》2009,36(2):167-175
It is well known in the literature that an ensemble of classifiers obtains better performance than a stand-alone method. Hence, it is very important to develop ensemble methods well suited for bioinformatics data. In this work, we propose to combine the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a Genetic Algorithm. The proposed method is applied to predicting DNA-binding proteins. As classifiers, the linear support vector machine and the radial basis function support vector machine are tested. As performance indicators, the accuracy and Matthews's correlation coefficient are reported. Matthews's correlation coefficient obtained by our ensemble method is approximately 0.97 when the jackknife cross-validation is used. This result outperforms that obtained in the literature using the same dataset where the features are extracted directly from the amino-acid sequence.
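Entry 16 reports Matthews's correlation coefficient (MCC) as a performance indicator. For readers unfamiliar with it, the standard binary-classification definition computed from confusion-matrix counts is sketched below; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption, and the function name is illustrative.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts:
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), so a value near 0.97 indicates close agreement between predicted and true classes.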

17.

Background

The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning-based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks for learning hyper-representations by using cascade ensemble decision trees. It has been shown that the deep forest model has competitive or even better performance than deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample, high-dimensional biology data.

Results

In this paper, we propose a deep learning model called BCDForest to address cancer subtype classification on small-scale biology datasets; it can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a method named multi-class-grained scanning is proposed to train multiple binary classifiers to encourage ensemble diversity. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, so as to propagate the benefits of discriminative features among cascade layers and improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in cancer subtype classification.

Conclusions

The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.

18.
Habitat suitability models, usually referred to as species distribution models (SDMs), are widely applied in ecology for many purposes, including species conservation, habitat discovery, and gaining evolutionary insights by estimating the distribution of species. Machine learning algorithms as well as statistical models have recently been used to predict the distribution of species. However, they appear to have some limitations due to the data and the models used. Therefore, this study proposes a novel approach for assessing habitat suitability based on ensemble learning techniques. Three heterogeneous ensembles were built using the stacked generalization method to model the distribution of four wheatear species (Oenanthe deserti, Oenanthe leucopyga, Oenanthe leucura, and Oenanthe oenanthe) located in Morocco. Initially, a set of base learners was constructed by training six machine learning algorithms on each species' dataset (Multi-Layer Perceptron (MLP), Support Vector Classifier (SVC), K-nearest neighbors (KNN), Decision Trees (DT), Gradient Boosting Classifier (GB), and Random Forest (RF)). Then, the predictions of these base learners were fed as training data to train three meta-learners (Logistic Regression (LR), SVC, and MLP). To evaluate and assess the performance of the proposed approaches, we used: (1) six performance criteria (accuracy, recall, precision, F1-score, AUC, and TSS), (2) the Borda Count (BC) ranking method based on multiple criteria to rank the best-performing models, and (3) the Scott Knott (SK) test to statistically compare the performance of the presented models. The results based on the six evaluation metrics showed that stacked ensembles outperformed their single-model counterparts in all species datasets, and the stacked model with SVC as a meta-learner outperformed the other two ensembles. The results showed the potential of using ensemble learning techniques to model species distribution and support recommending the stacked generalization technique as a combination strategy, since it gave better results than single models in the four wheatear species datasets. Moreover, to assess the impact of future climate changes on the distribution of the four wheatear species, the best-performing distribution model was selected and projected onto the current and future climatic conditions. The distributions of the Moroccan wheatear birds were found to be slightly affected by future climate changes.
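Two of the evaluation tools named in entry 18, the true skill statistic (TSS) and Borda Count ranking over several criteria, are simple enough to sketch. TSS follows its usual definition (sensitivity + specificity − 1). The Borda scoring rule below (worst model gets 0 points per criterion, ties not handled specially) is an assumption, since the abstract does not describe the exact variant; function names are illustrative.

```python
def tss(tp, tn, fp, fn):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def borda_rank(scores):
    """Rank models by Borda count.
    `scores` maps model name -> {criterion: value}, higher is better.
    On each criterion the worst model earns 0 points, the next 1, and so on;
    models are returned from most to fewest total points."""
    models = list(scores)
    criteria = list(next(iter(scores.values())))
    points = {m: 0 for m in models}
    for criterion in criteria:
        ordered = sorted(models, key=lambda m: scores[m][criterion])
        for pts, model in enumerate(ordered):
            points[model] += pts
    return sorted(models, key=lambda m: points[m], reverse=True)
```

For example, with made-up scores in which one model leads on every criterion, `borda_rank` places that model first; the value of the method shows when different criteria favour different models and a single aggregate ordering is needed.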

19.
Ensemble habitat selection modeling is becoming a popular approach among ecologists to answer different questions. Since we are still in the early stages of development and application of ensemble modeling, there remain many questions regarding performance and parameterization. One important gap, which this paper addresses, is how the number of background points used to train models influences the performance of the ensemble model. We used an empirical presence-only dataset and three different selections of background points to train scale-optimized habitat selection models using six modeling algorithms (GLM, GAM, MARS, ANN, Random Forest, and MaxEnt). We tested four ensemble models using different combinations of the component models: (a) equal numbers of background points and presences, (b) background points equaled ten times the number of presences, (c) 10,000 background points, and (d) optimized background points for each component model. Among regression-based approaches, MARS performed best when built with 10,000 background points. Among machine learning models, RF performed best when built with equal numbers of presences and background points. Among the four ensemble models, AUC indicated that the best-performing model was the ensemble in which each component model used its optimized number of background points, while TSS increased as the number of background points increased. We found that an ensemble of models, each trained with an optimal number of background points, outperformed ensembles of models trained with the same number of background points, although differences in performance were slight. When using a single modeling method, RF with equal numbers of presences and background points can perform better than an ensemble model, but its performance fluctuates when the number of background points is not properly selected. On the other hand, ensemble modeling provides consistently high accuracy regardless of the background point sampling approach. Further, optimizing the number of background points for each component model within an ensemble model can provide the best model improvement. We suggest evaluating more models across multiple species to investigate how background point selection might affect ensemble models in different scenarios.

20.
Artificial neural network (ANN) models have been widely used in environmental modeling with considerable success. To improve the reliability of ANN models, ensemble simulations were applied in this study to develop four ANN ensemble models for chlorophyll a simulation in the largest freshwater lake (Lake Poyang) in China. Reliability (evaluated by model fit and stability) of these ANN ensemble models was compared with that of single ANN models from ensemble members. The model fit of these single ANN models varied significantly over repeated runs, indicating the unstable performance of the single ANN models. Compared with the single ANN models, the ANN ensemble models showed better model fit and stability, implying the potential of ensemble simulation in achieving a more reliable model. An ensemble size of 30 was adequate for the ANN ensemble models to achieve a good model fit, while an ensemble size of 50 was adequate to achieve good stability. This case study highlighted both the necessity and potential of the ensemble simulation approach to achieve a reliable ANN model with good model fit and stability.
