Similar articles
1.
The assessment of the risk of default on credit is important for financial institutions. Different Artificial Neural Networks (ANN) have been suggested to tackle the credit scoring problem; however, the obtained error rates are often high. In the search for the best ANN algorithm for credit scoring, this paper contributes the application of an ANN training algorithm inspired by the neurons' biological property of metaplasticity. This algorithm is especially efficient when few patterns of a class are available, or when information inherent to low-probability events is crucial for a successful application, as weight updating is emphasized more for the less frequent activations than for the more frequent ones. Two well-known and readily available data sets, the Australian and German data sets, have been used to test the algorithm. The results obtained by AMMLP have been superior to those of state-of-the-art classification algorithms in credit scoring.

2.
In the medical domain, it is very important to develop rule-based classification models, because they produce a comprehensible model that accounts for its predictions. Moreover, it is desirable to know not only the classification decisions but also what leads to these decisions. In this paper, we propose a novel dynamic quantitative rule-based classification model, namely DQB, which integrates quantitative association rule mining and the Artificial Bee Colony (ABC) algorithm to give users better understandability and interpretability via an accurate class quantitative association rule-based classifier model. As far as we know, this is the first attempt to apply the ABC algorithm in mining for quantitative rule-based classifier models. In addition, this is the first attempt to use quantitative rule-based classification models for classifying microarray gene expression profiles. Also, in this research we developed a new dynamic local search strategy named DLS, which improves the local search of the ABC algorithm. The performance of the proposed model has been compared with well-known quantitative-based classification methods and bio-inspired meta-heuristic classification algorithms, using six gene expression profiles for binary and multi-class cancer datasets. From the results, it can be concluded that a considerable increase in classification accuracy is obtained for the DQB when compared to other available algorithms in the literature, and that it is able to provide an interpretable model for biologists. This confirms the significance of the proposed algorithm in constructing a rule-based classifier model, and accordingly shows that these rules capture high-quality and meaningful knowledge extracted from the training set, where all subsets of quantitative rules report close to 100% classification accuracy with a minimum number of genes. It is remarkable that, to the best of our knowledge, several new genes were discovered that have not been seen in any past studies. Regarding applicability, based on the results acquired from microarray gene expression analysis, we can conclude that DQB can be adopted in different real-world applications with some modifications.

3.
In the 'indirect' method of detecting genetic associations between a trait and a DNA variant, we type several markers in a gene or chromosome region of linkage disequilibrium. If there is association between markers and the trait, we presume the existence of one or more causal polymorphisms in the region. In order to obtain a sufficiently dense set of markers it will almost always be necessary to use single nucleotide polymorphisms (SNPs). Although there is an emerging literature on methods for choosing an optimal set of 'haplotype tag SNPs' (htSNPs) to detect association between a genetic region and a trait, less attention has been given to the problem of how such studies should be analysed when completed, and how the initial data that were used to select the htSNPs should be incorporated into the analysis. This paper discusses this problem for both population- and family-based association studies. The role of the R² measure of association between a causal locus and various methods of scoring of marker haplotypes is highlighted. In most cases, the simplest method of scoring (locus coding), which does not require phase resolution, is shown generally to be more powerful than scoring methods that include haplotype information. A new 'multi-locus TDT' is also proposed.

4.
A key challenge for ecologists is to quantify, explain and predict the ecology and behaviour of animals from knowledge of their basic physiology. Compared to our knowledge of many other types of distribution and behaviour, and how these are linked to individual function, we have a poor level of understanding of the causal basis for orientation behaviours. Most explanations for patterns of animal orientation assume that animals will modify their exposure to environmental factors by altering their orientation. We used a keystone grazer on rocky shores, the limpet Cellana tramoserica, to test this idea. Manipulative experiments were done to evaluate whether orientation during emersion affected limpet desiccation or body temperature. Body temperature was determined from infrared thermography, a technique that minimises disturbance to the test organism. No causal relationships were found between orientation and either (i) level of desiccation or (ii) body temperature. These results add to the growing knowledge that responses to desiccation and thermal stress may be less important in modifying the behaviour of intertidal organisms than previously supposed and that thermoregulation does not always reflect patterns of animal orientation. Much of what we understand about orientation comes from studies of animals able to modify orientation over very short time scales. Our data suggest that for animals whose location is less flexible, orientation decisions may have less to do with responses to environmental factors and more to do with structural habitat properties or intrinsic individual attributes. Therefore we suggest that future studies into processes affecting orientation must include organisms with differing levels of behavioural plasticity.

5.
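Tree species diversity is an important topic in ecological research, and information on tree species and their spatial distribution can effectively support sustainable forest management. However, high-accuracy classification results are difficult to obtain under complex stand conditions. Unmanned aerial vehicle (UAV) remote sensing can acquire ultra-fine local data, which makes it possible to improve tree species classification accuracy. Based on multi-source UAV remote sensing data, including visible-light, hyperspectral, and LiDAR data, this study explores their potential for tree species classification under subtropical stand conditions. The findings are: (1) the random forest classifier achieved the highest overall accuracy and per-species F1 scores and is suitable for classifying and mapping multiple subtropical tree species; in distinguishing 13 classes (8 tree, 4 herbaceous), its overall accuracy was 95.63% and its Kappa coefficient was 0.948; (2) using multi-source data significantly improved classification accuracy: the all-feature model was the most accurate, hyperspectral and LiDAR data significantly affected the accuracy of the all-feature model, and visible-light texture data contributed little; (3) ranked from most to least important, the classification features were structural information, vegetation indices, texture information, and minimum noise fraction (MNF) components.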
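The overall accuracy and Kappa coefficient reported above are both derived from a confusion matrix. The following is a minimal sketch of that arithmetic, assuming the standard Cohen's kappa definition; the function name and the 3-class matrix are hypothetical illustrations, not data from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples whose reference class is i and whose
    predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical 3-class confusion matrix (not taken from the paper).
example = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 2, 47],
]
accuracy, kappa = overall_accuracy_and_kappa(example)
```

Kappa discounts the agreement that the class marginals alone would produce, which is why it is usually somewhat lower than the overall accuracy.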

6.
Microarray technology is becoming a powerful tool for clinical diagnosis, as it has the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this possibility has received much attention in the context of cancer research, especially in tumor classification. However, most published articles have concentrated on the development of binary classification methods while neglecting ubiquitous multiclass problems. Unfortunately, the few multiclass classification approaches available have had poor predictive accuracy. In an effort to improve classification accuracy, we developed a novel multiclass microarray data classification method. First, we applied a "one-versus-rest support vector machine" to classify the samples. Then the classification confidence of each testing sample was evaluated according to its distribution in feature space, and samples with poor confidence were extracted. Next, a novel strategy, which we named the "class priority estimation method based on centroid distance", was used to make decisions about categories for those poor-confidence samples. This approach was tested on seven benchmark multiclass microarray datasets, with encouraging results, demonstrating effectiveness and feasibility.
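The two-stage idea described above (accept confident one-versus-rest decisions, re-decide the rest using class centroids) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define its confidence measure or its class priority estimation, so the top-two score margin, the `min_margin` threshold, and the plain nearest-centroid rule are hypothetical stand-ins.

```python
import math


def class_centroids(samples, labels):
    """Mean feature vector of the training samples in each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_fallback(x, ovr_scores, centroids, min_margin=0.5):
    """Pick a class for sample x.

    ovr_scores maps class label -> one-versus-rest decision score
    (assumed to come from any already-trained classifier).
    """
    ranked = sorted(ovr_scores, key=ovr_scores.get, reverse=True)
    # Assumed confidence measure: gap between the two best scores.
    margin = ovr_scores[ranked[0]] - ovr_scores[ranked[1]]
    if margin >= min_margin:
        return ranked[0]
    # Low confidence: fall back to the nearest class centroid.
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


# Hypothetical toy data: two classes in a 2-D feature space.
train_x = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
train_y = ["a", "a", "b", "b"]
centroids = class_centroids(train_x, train_y)
confident = predict_with_fallback([0.5, 1.0], {"a": 1.0, "b": -1.0}, centroids)
uncertain = predict_with_fallback([3.5, 1.0], {"a": 0.2, "b": 0.1}, centroids)
```

In the second call the classifier scores narrowly favour class `a`, but the sample lies much closer to the centroid of class `b`, so the fallback overrides the low-confidence decision.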

7.
MOTIVATION: We focus on the prediction of disulfide bridges in proteins starting from their amino acid sequence and from the knowledge of the disulfide bonding state of each cysteine. The location of disulfide bridges is a structural feature that conveys important information about the protein main chain conformation and can therefore help towards the solution of the folding problem. Existing approaches based on weighted graph matching algorithms do not take advantage of evolutionary information. Recursive neural networks (RNN), on the other hand, can handle in a natural way complex data structures such as graphs whose vertices are labeled by real vectors, allowing us to incorporate multiple alignment profiles in the graphical representation of disulfide connectivity patterns. RESULTS: The core of the method is the use of machine learning tools to rank alternative disulfide connectivity patterns. We develop an ad-hoc RNN architecture for scoring labeled undirected graphs that represent connectivity patterns. In order to compare our algorithm with previous methods, we report experimental results on the SWISS-PROT 39 dataset. We find that using multiple alignment profiles allows us to obtain significant prediction accuracy improvements, clearly demonstrating the important role played by evolutionary information. AVAILABILITY: The Web interface of the predictor is available at http://neural.dsi.unifi.it/cysteines

8.
MOTIVATION: As more genomes are sequenced, the demand for fast gene classification techniques is increasing. To analyze a newly sequenced genome, first the genes are identified and translated into amino acid sequences, which are then classified into structural or functional classes. The best-performing protein classification methods are based on protein homology detection using sequence alignment methods. Alignment methods have recently been enhanced by discriminative methods like support vector machines (SVMs) as well as by position-specific scoring matrices (PSSM) as obtained from PSI-BLAST. However, alignment methods are time consuming if a new sequence must be compared to many known sequences; the same holds for SVMs. Constructing a PSSM for the new sequence is even more time consuming. The best-performing methods would take about 25 days on present-day computers to classify the sequences of a new genome (20,000 genes) as belonging to just one specific class; however, there are hundreds of classes. Another shortcoming of alignment algorithms is that they do not build a model of the positive class but measure the mutual distance between sequences or profiles. Only multiple alignments and hidden Markov models are popular classification methods which build a model of the positive class, but they show low classification performance. The advantage of a model is that it can be analyzed for chemical properties common to the class members to obtain new insights into protein function and structure. We propose a fast model-based recurrent neural network for protein homology detection, the 'Long Short-Term Memory' (LSTM). LSTM automatically extracts indicative patterns for the positive class, but in contrast to profile methods it also extracts negative patterns and uses correlations between all detected patterns for classification. LSTM is capable of automatically extracting useful local and global sequence statistics like hydrophobicity, polarity, volume, and polarizability and combining them with a pattern. These properties make LSTM complementary to alignment-based approaches, as it does not use predefined similarity measures like BLOSUM or PAM matrices. RESULTS: We have applied LSTM to a well-known benchmark for remote protein homology detection, where a protein must be classified as belonging to a SCOP superfamily. LSTM reaches state-of-the-art classification performance but is considerably faster for classification than other approaches with comparable classification performance. LSTM is five orders of magnitude faster than methods which perform slightly better in classification and two orders of magnitude faster than the fastest SVM-based approaches (which, however, have lower classification performance than LSTM). Only PSI-BLAST and HMM-based methods show time complexity comparable to that of LSTM, but they cannot compete with LSTM in classification performance. To test the modeling capabilities of LSTM, we applied LSTM to PROSITE classes and interpreted the extracted patterns. In 8 out of 15 classes, LSTM automatically extracted the PROSITE motif. In the remaining 7 cases alternative motifs are generated which give better classification results on average than the PROSITE motifs. AVAILABILITY: The LSTM algorithm is available from http://www.bioinf.jku.at/software/LSTM_protein/.

9.

Background

Detection and quantification of cyclic alternating pattern (CAP) components has the potential to serve as a disease biomarker. Few methods exist to discriminate all the different CAP components, and those that do lack appropriate sensitivities and are often evaluated based on accuracy (AC), which is not an appropriate measure for imbalanced datasets.

Methods

We describe a knowledge discovery in data (KDD) methodology aimed at developing automatic CAP scoring approaches. Automatic CAP scoring was approached from two perspectives: the binary distinction between A-phases and B-phases, and the multi-class classification of the different CAP components. The most important KDD stages are: extraction of 55 features, feature ranking/transformation, and classification. Classification is performed by (i) support vector machine (SVM), (ii) k-nearest neighbors (k-NN), and (iii) discriminant analysis. We report the weighted accuracy (WAC), which accounts for class imbalance.
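To illustrate why plain accuracy misleads on imbalanced data, here is a minimal sketch comparing it with a class-balanced alternative. The abstract does not give the exact WAC formula, so the mean of per-class recalls used below (often called balanced accuracy) is an assumption, and the labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls (an assumed stand-in for the WAC)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 90 B-phase epochs, 10 A-phase epochs.
y_true = ["B"] * 90 + ["A"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["B"] * 100

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
```

The always-majority classifier scores high plain accuracy while detecting no A-phase at all, and the balanced measure exposes this.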

Results

The study includes 30 subjects from the CAP Sleep Database of Physionet. The best alternative for the discrimination of the different A-phase subtypes involved feature ranking by the minimum redundancy maximum relevance algorithm (mRMR) and classification by SVM, with a WAC of 51%. Concerning the binary discrimination between A-phases and B-phases, k-NN with mRMR ranking achieved the best WAC of 80%.

Conclusions

We describe a KDD methodology that, to the best of our knowledge, is applied to CAP scoring for the first time. In particular, the full discrimination of the three different A-phase subtypes is a new perspective, since past works tried multi-class approaches but based them on groupings of different subtypes. We also considered the weighted accuracy, in addition to simple accuracy, resulting in a more trustworthy performance assessment. Overall, better subtype sensitivities than those of other published approaches were achieved.

10.
Wang Y, Wang YT, Jung TP. PLoS ONE, 2012, 7(5): e37665
Electroencephalogram (EEG)-based brain-computer interfaces (BCIs) often use spatial filters to improve signal-to-noise ratio of task-related EEG activities. To obtain robust spatial filters, large amounts of labeled data, which are often expensive and labor-intensive to obtain, need to be collected in a training procedure before online BCI control. Several studies have recently developed zero-training methods using a session-to-session scenario in order to alleviate this problem. To our knowledge, a state-to-state translation, which applies spatial filters derived from one state to another, has never been reported. This study proposes a state-to-state, zero-training method to construct spatial filters for extracting EEG changes induced by motor imagery. Independent component analysis (ICA) was separately applied to the multi-channel EEG in the resting and the motor imagery states to obtain motor-related spatial filters. The resultant spatial filters were then applied to single-trial EEG to differentiate left- and right-hand imagery movements. On a motor imagery dataset collected from nine subjects, comparable classification accuracies were obtained by using ICA-based spatial filters derived from the two states (motor imagery: 87.0%, resting: 85.9%), which were both significantly higher than the accuracy achieved by using monopolar scalp EEG data (80.4%). The proposed method considerably increases the practicality of BCI systems in real-world environments because it is less sensitive to electrode misalignment across different sessions or days and does not require annotated pilot data to derive spatial filters.

11.
To achieve high assessment accuracy for credit risk, a novel multistage ensemble learning methodology is proposed that combines extreme learning machines (ELM) with a deep belief network (DBN). The proposed methodology involves three main stages: training subset generation, individual classifier training, and final ensemble output. In the first stage, the bagging sampling algorithm is applied to generate different training subsets and guarantee enough training data. In the second stage, the ELM, an effective AI forecasting tool that is both fast and accurate, is used as the individual classifier, and diverse ensemble members are formulated from different subsets and different initial conditions. In the final stage, the individual results are fused into the final classification output via a DBN model with sufficient hidden layers, which can effectively capture the valuable information hidden in the ensemble members. For illustration and verification, an experimental study is conducted on one publicly available credit risk dataset, and the results show the superiority of the proposed multistage DBN-based ELM ensemble learning paradigm in terms of classification accuracy.

12.

Background  

Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have also been widely used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and only recently have several researchers compared these techniques on several public datasets. It has been verified that differently selected features reflect different aspects of the dataset and that some selected features can obtain better solutions on certain problems. At the same time, faced with a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to improve the accuracy and robustness of sample classification.

13.
The way that we interpret and interact with the world entails making decisions on the basis of available sensory evidence. Recent primate neurophysiology [1-6], human neuroimaging [7-13], and modeling experiments [14-19] have demonstrated that perceptual decisions are based on an integrative process in which sensory evidence accumulates over time until an internal decision bound is reached. Here we used repetitive transcranial magnetic stimulation (rTMS) to provide causal support for the role of the dorsolateral prefrontal cortex (DLPFC) in this integrative process. Specifically, we used a speeded perceptual categorization task designed to induce a time-dependent accumulation of sensory evidence through rapidly updating dynamic stimuli and found that disruption of the left DLPFC with low-frequency rTMS reduced accuracy and increased response times relative to a sham condition. Importantly, using the drift-diffusion model, we show that these behavioral effects correspond to a decrease in drift rate, a parameter describing the rate and thereby the efficiency of the sensory evidence integration in the decision process. These results provide causal evidence linking the DLPFC to the mechanism of evidence accumulation during perceptual decision making.
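The link between drift rate, accuracy, and response time can be illustrated with a small simulation. This is a minimal sketch of a generic two-bound drift-diffusion process, not the authors' fitted model; the bound, noise level, time step, and drift values are all hypothetical choices.

```python
import random


def simulate_ddm(drift, bound=1.0, noise=1.0, dt=0.005, max_time=10.0, rng=None):
    """Simulate one trial and return (correct, decision_time).

    Evidence starts at 0 and accumulates until it reaches +bound (counted
    as correct) or -bound (error). Trials that hit max_time without
    reaching the upper bound are counted as errors.
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    step_sd = noise * dt ** 0.5  # standard deviation of the noise per step
    while abs(evidence) < bound and t < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return evidence >= bound, t


def summarize(drift, n_trials=300, seed=0):
    """Return (accuracy, mean decision time) over n_trials simulated trials."""
    rng = random.Random(seed)
    trials = [simulate_ddm(drift, rng=rng) for _ in range(n_trials)]
    acc = sum(correct for correct, _ in trials) / n_trials
    mean_rt = sum(t for _, t in trials) / n_trials
    return acc, mean_rt


# Hypothetical drift rates standing in for an intact and a disrupted condition.
high_acc, high_rt = summarize(drift=1.5)
low_acc, low_rt = summarize(drift=0.3)
```

With everything else held fixed, the lower drift rate yields both lower accuracy and longer decision times, which is the qualitative pattern the abstract attributes to DLPFC disruption.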

14.
MOTIVATION: Likelihood ratio approximants (LRA) have been widely used for model comparison in statistics. The present study was undertaken in order to explore their utility as a scoring (ranking) function in the classification of protein sequences. RESULTS: We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith-Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.
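One plausible reading of the scoring described above is a ratio between the best score of the top-ranking class and the best score of the runner-up class. The sketch below follows that reading; the exact formula is not given in the abstract, so the ratio form, the function name, and the scores are assumptions, and similarity scores are assumed to be positive.

```python
def lra_from_similarities(scores_by_class):
    """Return (top class, ratio of top to runner-up best similarity).

    scores_by_class maps a class name to the similarity scores of the
    query against that class's members (positive, higher = more similar).
    """
    # Maximal similarity of the query to each class.
    best = {c: max(scores) for c, scores in scores_by_class.items()}
    ranked = sorted(best, key=best.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return top, best[top] / best[runner_up]


# Hypothetical alignment scores of one query against three classes.
query_scores = {
    "kinase": [120.0, 80.0],
    "globin": [40.0, 30.0],
    "protease": [60.0],
}
predicted_class, lra = lra_from_similarities(query_scores)
```

A ratio near 1 means the two best classes are hard to tell apart; a larger ratio indicates a clearer assignment, which is what makes it usable as a ranking score. For distance-based methods the analogous quantity would compare the two smallest class distances.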

15.

Background

Brain state classification has been accomplished using features such as voxel intensities, derived from functional magnetic resonance imaging (fMRI) data, as inputs to efficient classifiers such as support vector machines (SVM) and is based on the spatial localization model of brain function. With the advent of the connectionist model of brain function, features from brain networks may provide increased discriminatory power for brain state classification.

Methodology/Principal Findings

In this study, we introduce a novel framework in which both functional connectivity (FC), based on instantaneous temporal correlation, and effective connectivity (EC), based on causal influence in brain networks, are used as features in an SVM classifier. To derive those features, we adopt an approach we recently introduced, correlation-purged Granger causality (CPGC), which obtains both FC and EC from fMRI data simultaneously without the instantaneous correlation contaminating Granger causality. In addition, statistical learning is accelerated and performance accuracy is enhanced by combining the recursive cluster elimination (RCE) algorithm with the SVM classifier. We demonstrate the efficacy of the CPGC-based RCE-SVM approach using a specific instance of brain state classification exemplified by disease state prediction. Accordingly, we show that this approach is capable of predicting with 90.3% accuracy whether any given human subject was prenatally exposed to cocaine or not, even when no significant behavioral differences were found between exposed and healthy subjects.

Conclusions/Significance

The framework adopted in this work is quite general in nature, with prenatal cocaine exposure being only an illustrative example of the power of this approach. In any brain state classification approach using neuroimaging data, including directional connectivity information may prove to be a performance enhancer. When brain state classification is used for disease state prediction, our approach may aid clinicians in performing more accurate diagnosis of diseases in situations in which non-neuroimaging biomarkers may be unable to perform differential diagnosis with certainty.

16.
MOTIVATION: While processing of MHC class II antigens for presentation to helper T-cells is essential for normal immune response, it is also implicated in the pathogenesis of autoimmune disorders and hypersensitivity reactions. Sequence-based computational techniques for predicting HLA-DQ binding peptides have encountered limited success, with few prediction techniques developed using three-dimensional models. METHODS: We describe a structure-based prediction model for modeling peptide-DQ3.2beta complexes. We have developed a rapid and accurate protocol for docking candidate peptides into the DQ3.2beta receptor and a scoring function to discriminate binders from the background. The scoring function was rigorously trained, tested and validated using experimentally verified DQ3.2beta binding and non-binding peptides obtained from biochemical and functional studies. RESULTS: Our model predicts DQ3.2beta binding peptides with high accuracy [area under the receiver operating characteristic (ROC) curve A(ROC) > 0.90], compared with experimental data. We investigated the binding patterns of DQ3.2beta peptides and illustrate that several registers exist within a candidate binding peptide. Further analysis reveals that peptides with multiple registers occur predominantly for high-affinity binders.
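The area under the ROC curve quoted above has a simple interpretation: it is the probability that a randomly chosen binder is scored above a randomly chosen non-binder. A minimal sketch of that pairwise computation follows, assuming higher scores mean stronger predicted binding; the scores are hypothetical and unrelated to the paper's scoring function.

```python
def roc_auc(binder_scores, nonbinder_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (binder, non-binder) pairs in which the binder
    has the higher score; ties count as half.
    """
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))


# Hypothetical predicted scores for verified binders and non-binders.
binders = [0.9, 0.8, 0.4]
nonbinders = [0.7, 0.3, 0.2]
auc = roc_auc(binders, nonbinders)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.90 means binders outrank non-binders in more than nine out of ten pairs.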

17.
Effects of the accuracy of remote sensing thematic maps on landscape ecology research
Shao Guofan (邵国凡). Acta Ecologica Sinica (《生态学报》), 2004, 24(9): 1857-1862
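Using a variety of case studies, this paper systematically explains how classification errors in remote sensing data inevitably lead to errors in landscape indices. On the one hand, remote sensing data provide the land type data that landscape ecology research requires at various temporal and spatial scales; on the other hand, the flexibility and complexity of remote sensing technology can produce land type data of widely varying quality. Yet users in landscape ecology generally use land type data without discrimination, sometimes without even knowing whether the data are good or bad, so the findings and conclusions of landscape ecology carry an unavoidable arbitrariness. The paper summarizes the ranges over which landscape indices vary under different conditions and points out that the relatively low classification accuracy of remote sensing data seen in practice leads to even lower accuracy of landscape indices; when landscape change is analyzed, this error amplification becomes even more pronounced. At present, there is still no way to deal with the errors of landscape indices other than area, and improving the classification accuracy of remote sensing data as far as possible is the only feasible remedy.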

18.
Front-of-package nutrition symbols (FOPs) are presumably readily noticeable and require minimal prior nutrition knowledge to use. Although there is evidence to support this notion, few studies have focused on Facts Up Front type symbols which are used in the US. Participants with varying levels of prior knowledge were asked to view two products and decide which was more healthful. FOPs on packages were manipulated so that one product was more healthful, allowing us to assess accuracy. Attention to nutrition information was assessed via eye tracking to determine what if any FOP information was used to make their decisions. Results showed that accuracy was below chance on half of the comparisons despite consulting FOPs. Negative correlations between attention to calories, fat, and sodium and accuracy indicated that consumers over-relied on these nutrients. Although relatively little attention was allocated to fiber and sugar, associations between attention and accuracy were positive. Attention to vitamin D showed no association to accuracy, indicating confusion surrounding what constitutes a meaningful change across products. Greater nutrition knowledge was associated with greater accuracy, even when less attention was paid. Individuals, particularly those with less knowledge, are misled by calorie, sodium, and fat information on FOPs.

19.
While feedforward neural networks have been widely accepted as effective tools for solving classification problems, the issue of finding the best network architecture remains unresolved, particularly so in real-world problem settings. We address this issue in the context of credit card screening, where it is important to not only find a neural network with good predictive performance but also one that facilitates a clear explanation of how it produces its predictions. We show that minimal neural networks with as few as one hidden unit provide good predictive accuracy, while having the added advantage of making it easier to generate concise and comprehensible classification rules for the user. To further reduce model size, a novel approach is suggested in which network connections from the input units to this hidden unit are removed by a very straightforward pruning procedure. In terms of predictive accuracy, both the minimized neural networks and the rule sets generated from them are shown to compare favorably with other neural network based classifiers. The rules generated from the minimized neural networks are concise and thus easier to validate in a real-life setting.

20.
Researchers have recently paid attention to social contact patterns among individuals due to their useful applications in such areas as epidemic evaluation and control, public health decisions, chronic disease research and social network research. Although some studies have estimated social contact patterns from social networks and surveys, few have considered how to infer the hierarchical structure of social contacts directly from census data. In this paper, we focus on inferring an individual's social contact patterns from detailed census data, and generate various types of social contact patterns such as hierarchical-district-structure-based, cross-district and age-district-based patterns. We evaluate newly generated contact patterns derived from detailed 2011 Hong Kong census data by incorporating them into a model and simulation of the 2009 Hong Kong H1N1 epidemic. We then compare the newly generated social contact patterns with the mixing patterns that are often used in the literature, and draw the following conclusions. First, the generation of social contact patterns based on a hierarchical district structure allows for simulations at different district levels. Second, the newly generated social contact patterns reflect individuals' social contacts. Third, the newly generated social contact patterns improve the accuracy of the SEIR-based epidemic model.
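To show how a contact pattern enters an SEIR-type model, here is a minimal sketch of a group-structured SEIR simulation with simple Euler time steps. It is not the authors' model: the two groups, the contact matrix, and the rate parameters are hypothetical values chosen only to make the example run.

```python
def seir_step(state, contact, beta, sigma, gamma, dt):
    """Advance every group by one Euler step of length dt.

    state: list of (S, E, I, R) tuples, one per group (e.g. district or age).
    contact[i][j]: average daily contacts a person in group i has with group j.
    beta: transmission probability per contact; sigma: rate of leaving E;
    gamma: rate of leaving I.
    """
    totals = [sum(group) for group in state]
    new_state = []
    for i, (s, e, inf, r) in enumerate(state):
        # Force of infection on group i from infectious people in all groups.
        force = beta * sum(contact[i][j] * state[j][2] / totals[j]
                           for j in range(len(state)))
        new_exposed = force * s * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * inf * dt
        new_state.append((s - new_exposed,
                          e + new_exposed - new_infectious,
                          inf + new_infectious - new_recovered,
                          r + new_recovered))
    return new_state


def simulate(state, contact, beta, sigma, gamma, dt=0.1, days=100):
    """Run the model for the given number of days and return the final state."""
    for _ in range(round(days / dt)):
        state = seir_step(state, contact, beta, sigma, gamma, dt)
    return state


# Hypothetical two-group setup: infection is seeded in group 0 only.
initial = [(9990.0, 0.0, 10.0, 0.0), (5000.0, 0.0, 0.0, 0.0)]
contact_matrix = [[8.0, 2.0], [2.0, 4.0]]
final = simulate(initial, contact_matrix, beta=0.05, sigma=0.5, gamma=0.25)
```

Swapping `contact_matrix` for a different pattern (for example a cross-district one) changes only the force-of-infection term, which is how alternative contact patterns can be compared within the same epidemic model.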
