Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
We provide a decidable hierarchical classification of first-order recurrent neural networks made up of McCulloch and Pitts cells. This classification is achieved by proving an equivalence result between such neural networks and deterministic Büchi automata, and then translating the Wadge classification theory from the abstract machine to the neural network context. The obtained hierarchy of neural networks is proved to have width 2 and height omega + 1, and a decidability procedure of this hierarchy is provided. Notably, this classification is shown to be intimately related to the attractive properties of the considered networks.

2.
3.
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.

The datasets are available at http://hydra.icgeb.trieste.it/benchmark.


4.
SUMMARY 1. The prediction of species distributions is of primary importance in ecology and conservation biology. Statistical models play an important role in this regard; however, researchers have little guidance when choosing between competing methodologies because few comparative studies have been conducted. 2. We provide a comprehensive comparison of traditional and alternative techniques for predicting species distributions using logistic regression analysis, linear discriminant analysis, classification trees and artificial neural networks to model: (1) the presence/absence of 27 fish species as a function of habitat conditions in 286 temperate lakes located in south-central Ontario, Canada and (2) simulated data sets exhibiting deterministic, linear and non-linear species response curves. 3. Detailed evaluation of model predictive power showed that approaches produced species models that differed in overall correct classification, specificity (i.e. ability to correctly predict species absence) and sensitivity (i.e. ability to correctly predict species presence) and in terms of which of the study lakes they correctly classified. On average, neural networks outperformed the other modelling approaches, although all approaches predicted species presence/absence with moderate to excellent success. 4. Based on simulated non-linear data, classification trees and neural networks greatly outperformed traditional approaches, whereas all approaches exhibited similar correct classification rates when modelling simulated linear data. 5. Detailed evaluation of model explanatory insight showed that the relative importance of the habitat variables in the species models varied among the approaches, where habitat variable importance was similar among approaches for some species and very different for others. 6. In general, differences in predictive power (both correct classification rate and identity of the lakes correctly classified) among the approaches corresponded with differences in habitat variable importance, suggesting that non-linear modelling approaches (i.e. classification trees and neural networks) are better able to capture and model complex, non-linear patterns found in ecological data. The results from the comparisons using simulated data further support this notion. 7. By employing parallel modelling approaches with the same set of data and focusing on comparing multiple metrics of predictive performance, researchers can begin to choose predictive models that not only provide the greatest predictive power, but also best fit the proposed application.

5.
The aim of this study was to present a new training algorithm for artificial neural networks, called multi-objective least absolute shrinkage and selection operator (MOBJ-LASSO), applied to the classification of dynamic gait patterns. The movement pattern is identified by 20 characteristics from the three components of the ground reaction force, which are used as input information for the neural networks in gender-specific gait classification. The classification performance of MOBJ-LASSO (97.4%) and of the multi-objective algorithm (MOBJ) (97.1%) is similar, but the MOBJ-LASSO algorithm achieved better results than MOBJ because it is able to eliminate inputs and automatically select the parameters of the neural network. Thus, it is an effective tool for data mining using neural networks. From the 20 inputs used for training, MOBJ-LASSO selected the first and second peaks of the vertical force and the force peak in the antero-posterior direction as the variables that classify the gait patterns of the different genders.

6.
Gas chromatographic fatty acid methyl ester (FAME) analysis of bacteria is an easy, cheap, fast and automated identification tool routinely used in microbiological research. This paper reports on the application of artificial neural networks for genus-wide FAME-based identification of Bacillus species. Using 1,071 FAME profiles covering a genus-wide spectrum of 477 strains and 82 species, different balanced and imbalanced data sets have been created according to different validation methods and model parameters. Following training and validation, each classifier was evaluated on its ability to identify the profiles of a test set. Comparison of the classifiers showed a good identification rate favoring the imbalanced data sets. The presence of the Bacillus cereus and Bacillus subtilis groups made clear that it is of great importance to take into account the limitations of FAME analysis resolution for the construction of identification models. Indeed, as members of such a group cannot easily be distinguished from one another based upon FAME data alone, identification models built upon these data cannot be successful at keeping them apart either. Comparison of the different experimental setups ultimately led to a few general recommendations. Compared with the routinely used commercial Sherlock Microbial Identification System (MIS, Microbial ID, Inc. (MIDI), Newark, Delaware, USA), the artificial neural network test results showed a significant improvement in Bacillus species identification. These results indicate that machine learning techniques such as artificial neural networks are highly promising tools for FAME-based classification and identification of bacterial species.

7.
Protein classification artificial neural system.   Cited by: 2 (self-citations: 0; citations by others: 2)
A neural network classification method is developed as an alternative approach to the large database search/organization problem. The system, termed Protein Classification Artificial Neural System (ProCANS), has been implemented on a Cray supercomputer for rapid superfamily classification of unknown proteins based on the information content of the neural interconnections. The system employs an n-gram hashing function that is similar to the k-tuple method for sequence encoding. A collection of modular back-propagation networks is used to store the large amount of sequence patterns. The system has been trained and tested with the first 2,148 of the 8,309 entries of the annotated Protein Identification Resource protein sequence database (release 29). The entries included the electron transfer proteins and the six enzyme groups (oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases), with a total of 620 superfamilies. After a total training time of seven Cray central processing unit (CPU) hours, the system has reached a predictive accuracy of 90%. The classification is fast (i.e., 0.1 Cray CPU second per sequence), as it only involves a forward-feeding through the networks. The classification time on a full-scale system embedded with all known superfamilies is estimated to be within 1 CPU second. Although the training time will grow linearly with the number of entries, the classification time is expected to remain low even if there is a 10-100-fold increase of sequence entries. The neural database, which consists of a set of weight matrices of the networks, together with the ProCANS software, can be ported to other computers and made available to the genome community. The rapid and accurate superfamily classification would be valuable to the organization of protein sequence databases and to the gene recognition in large sequencing projects.
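
The n-gram hashing step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual ProCANS hashing function, so the alphabet, the base-20 position code, and the name `ngram_hash_encode` below are all assumptions made for illustration: each n-gram of residues is mapped to a slot of a fixed-length vector, and the normalized counts could serve as network inputs.

```python
# Hypothetical n-gram hashing encoder; NOT the actual ProCANS function.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def ngram_hash_encode(sequence, n=2, size=None):
    """Map a protein sequence to a fixed-length vector of n-gram frequencies."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    base = len(AMINO_ACIDS)
    if size is None:
        size = base ** n  # one slot per possible n-gram (no collisions)
    vector = [0] * size
    for start in range(len(sequence) - n + 1):
        gram = sequence[start:start + n]
        if any(ch not in index for ch in gram):
            continue  # skip n-grams containing unknown residues
        position = 0
        for ch in gram:
            position = position * base + index[ch]  # base-20 code of the n-gram
        vector[position % size] += 1  # modulo folds codes into a smaller table
    total = sum(vector)
    return [v / total for v in vector] if total else vector
```

The vector length depends only on `n` and `size`, not on the sequence length, which is what lets sequences of any length feed a fixed-size input layer.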

8.
Neural networks have been applied to a number of protein structure problems. In some applications their success has not been substantiated by a comparison with the performance of a suitable alternative statistical method on the same data. In this paper, a two-layer feed-forward neural network has been trained to recognize ATP/GTP-binding local sequence motifs. The neural network correctly classified 78% of the 349 sequences used. This was much better than a simple motif-searching program. A more sophisticated statistical method was developed, however, which performed marginally better (80% correct classification) than the neural network. The neural network and the statistical method performed similarly on sequences of varying degrees of homology. These results do not imply that neural networks, especially those with hidden layers, are not useful tools, but they do suggest that two-layer networks in particular should be carefully tested against other statistical methods.

9.
While feedforward neural networks have been widely accepted as effective tools for solving classification problems, the issue of finding the best network architecture remains unresolved, particularly in real-world problem settings. We address this issue in the context of credit card screening, where it is important to find a neural network that not only has good predictive performance but also facilitates a clear explanation of how it produces its predictions. We show that minimal neural networks with as few as one hidden unit provide good predictive accuracy, while having the added advantage of making it easier to generate concise and comprehensible classification rules for the user. To further reduce model size, a novel approach is suggested in which network connections from the input units to this hidden unit are removed by a very straightforward pruning procedure. In terms of predictive accuracy, both the minimized neural networks and the rule sets generated from them are shown to compare favorably with other neural network based classifiers. The rules generated from the minimized neural networks are concise and thus easier to validate in a real-life setting.

10.
Artificial astrocytes improve neural network performance   Cited by: 1 (self-citations: 0; citations by others: 1)
Compelling evidence indicates the existence of bidirectional communication between astrocytes and neurons. Astrocytes, a type of glial cell classically considered to be passive supportive cells, have recently been demonstrated to be actively involved in the processing and regulation of synaptic information, suggesting that brain function arises from the activity of neuron-glia networks. However, the actual impact of astrocytes on neural network function is largely unknown and its application in artificial intelligence remains untested. We have investigated the consequences of including artificial astrocytes, which present the biologically defined properties involved in astrocyte-neuron communication, on artificial neural network performance. Using connectionist systems and evolutionary algorithms, we have compared the performance of artificial neural networks (NN) and artificial neuron-glia networks (NGN) in solving classification problems. We show that the degree of success of NGN is superior to that of NN. Analysis of the performance of NN with different numbers of neurons or different architectures indicates that the effects of NGN cannot be accounted for by an increased number of network elements, but rather are specifically due to astrocytes. Furthermore, the relative efficacy of NGN vs. NN increases as the complexity of the network increases. These results indicate that artificial astrocytes improve neural network performance, and establish the concept of Artificial Neuron-Glia Networks, which represents a novel concept in Artificial Intelligence with implications for computational science as well as for the understanding of brain function.

11.
Gene expression arrays typically have 50 to 100 samples and 1000 to 20,000 variables (genes). There have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we expose a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks. For all of these models, we show that dramatic computational savings are possible over naive implementations, using standard transformations in numerical linear algebra.
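
The abstract does not say which transformations are meant. As an illustration only, one well-known identity of this kind for ridge regression is sketched below, where the symbols are assumptions introduced here: $X$ is the $n \times p$ data matrix with $p \gg n$, $y$ the response vector, and $\lambda > 0$ the quadratic penalty.

```latex
\hat{\beta}_{\lambda}
  = \left(X^{\top}X + \lambda I_{p}\right)^{-1} X^{\top} y
  = X^{\top}\left(X X^{\top} + \lambda I_{n}\right)^{-1} y
```

The left-hand form solves a $p \times p$ system (on the order of $p^{3}$ operations), while the right-hand form needs only the $n \times n$ matrix $XX^{\top}$ (on the order of $n^{2}p + n^{3}$ operations), which is far cheaper when $n$ is 50 to 100 and $p$ is in the thousands.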

12.
Neural network schemes for detecting rare events in human genomic DNA   Cited by: 4 (self-citations: 0; citations by others: 4)
MOTIVATION: Many problems in molecular biology as well as other areas involve detection of rare events in unbalanced data. We develop two sample stratification schemes in conjunction with neural networks for rare event detection in such databases. Sample stratification is a technique for making each class in a sample have equal influence on decision making. The first scheme proposed stratifies a sample by adding up the weighted sum of the derivatives during the backward pass of training. The second scheme proposed uses a technique of modified bootstrap aggregating. After training neural networks with multiple sets of bootstrapped examples of the rare event classes and subsampled examples of common event classes, multiple voting for classification is performed. RESULTS: These two schemes give rare event classes a better chance of being included in the sample used for training neural networks and thus improve the classification accuracy for rare event detection. The experimental performance of the two schemes using two sets of human DNA sequences as well as another set of Gaussian data indicates that the proposed schemes have the potential to significantly improve the accuracy of neural networks in recognizing rare events.
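
The second scheme (bootstrapped rare examples, subsampled common examples, then voting) can be sketched roughly as follows. This is not the authors' implementation: the bag sizes, the function names, and especially the nearest-centroid base learner, which stands in for the neural networks used in the paper, are assumptions chosen to keep the example short.

```python
import random


def balanced_bags(rare, common, n_bags, seed=0):
    """Build balanced training sets: bootstrap the rare class, subsample the common one."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        rare_part = [rng.choice(rare) for _ in rare]  # sampling with replacement
        common_part = rng.sample(common, min(len(rare), len(common)))  # without replacement
        bags.append((rare_part, common_part))
    return bags


def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def train_centroid_classifier(rare_part, common_part):
    """Stand-in base learner (assumption): predicts 1 (rare) if closer to the rare centroid."""
    c_rare, c_common = centroid(rare_part), centroid(common_part)

    def predict(x):
        d_rare = sum((a - b) ** 2 for a, b in zip(x, c_rare))
        d_common = sum((a - b) ** 2 for a, b in zip(x, c_common))
        return 1 if d_rare < d_common else 0

    return predict


def vote(classifiers, x):
    """Majority vote over the ensemble; 1 means 'rare event'."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0
```

Because every bag contains as many rare as common examples, each base learner sees the two classes with equal weight, which is the stratification idea described in the abstract.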

13.
14.
A neural network architecture for data classification   Cited by: 1 (self-citations: 0; citations by others: 1)
This article presents a neural network architecture designed for the classification of data distributed among a large number of classes. A significant gain in the global classification rate can be obtained by using our architecture. The architecture is based on a set of several small neural networks, each one discriminating between only two classes. The specialization of each neural network simplifies its structure and improves the classification. Moreover, the learning step automatically determines the number of hidden neurons. The discussion is illustrated by tests on databases from the UCI machine learning database repository. The experimental results show that this architecture can achieve faster learning, simpler neural networks and improved classification performance.

15.
In this paper, we propose to use probabilistic neural networks (PNNs) for classification of bacterial growth/no-growth data and modeling the probability of growth. The PNN approach combines both Bayes' theorem of conditional probability and Parzen's method for estimating the probability density functions of the random variables. Unlike other neural network training paradigms, PNNs are characterized by high training speed and their ability to produce confidence levels for their classification decision. As a practical application of the proposed approach, PNNs were investigated for their ability to classify the growth/no-growth state of a pathogenic Escherichia coli R31 in response to temperature and water activity. A comparison with the most frequently used traditional statistical method based on logistic regression and with a multilayer feedforward artificial neural network (MFANN) trained by error backpropagation was also carried out. The PNN-based models were found to outperform linear and nonlinear logistic regression and MFANN in both classification accuracy and the ease with which PNN-based models are developed.
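
The combination of Parzen density estimation and Bayes' theorem described here can be sketched in a few lines. The Gaussian kernel, the smoothing width `sigma`, the use of class frequencies as priors, and the toy standardized inputs are assumptions made for illustration; the abstract does not specify these choices.

```python
import math


def pnn_posteriors(x, training, sigma=0.5):
    """Posterior class probabilities from Parzen-window densities and Bayes' theorem.

    training: dict mapping a class label to a list of feature tuples.
    Features are assumed to be on comparable (e.g. standardized) scales.
    """
    total = sum(len(points) for points in training.values())
    scores = {}
    for label, points in training.items():
        # Parzen estimate of p(x | class), up to a constant shared by all classes
        kernel_sum = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2))
            for p in points
        )
        density = kernel_sum / len(points)
        prior = len(points) / total  # class frequency used as the prior
        scores[label] = prior * density
    norm = sum(scores.values())
    if norm == 0.0:
        # x is far from every training point: fall back to the priors
        return {label: len(points) / total for label, points in training.items()}
    return {label: score / norm for label, score in scores.items()}
```

There is no iterative training step: the "network" simply stores the training points, which is consistent with the high training speed noted in the abstract, and the normalized scores play the role of confidence levels.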

16.
Neural networks are increasingly being used in science to infer hidden dynamics of natural systems from noisy observations, a task typically handled by hierarchical models in ecology. This article describes a class of hierarchical models parameterised by neural networks: neural hierarchical models. The derivation of such models analogises the relationship between regression and neural networks. A case study is developed for a neural dynamic occupancy model of North American bird populations, trained on millions of detection/non-detection time series for hundreds of species, providing insights into colonisation and extinction at a continental scale. Flexible models are increasingly needed that scale to large data and represent ecological processes. Neural hierarchical models satisfy this need, providing a bridge between deep learning and ecological modelling that combines the function representation power of neural networks with the inferential capacity of hierarchical models.

17.
18.
To improve recognition results, decisions of multiple neural networks can be aggregated into a committee decision. In contrast to the ordinary approach of utilizing all neural networks available to make a committee decision, we propose creating adaptive committees, which are specific for each input data point. A prediction network is used to identify classification neural networks to be fused for making a committee decision about a given input data point. The jth output value of the prediction network expresses the expectation level that the jth classification neural network will make a correct decision about the class label of a given input data point. The proposed technique is tested in three aggregation schemes, namely majority vote, averaging, and aggregation by the median rule and compared with the ordinary neural networks fusion approach. The effectiveness of the approach is demonstrated on two artificial and three real data sets.
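
The three aggregation schemes named in this abstract, and the idea of a per-input committee, can be sketched as below. The abstract does not state how expectation levels are turned into a committee, so the fixed `threshold` and the fallback to the single most trusted network are assumptions, as are all function names.

```python
from statistics import median


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def majority_vote(outputs):
    """Each network votes for its top class; the class with most votes wins."""
    counts = [0] * len(outputs[0])
    for out in outputs:
        counts[argmax(out)] += 1
    return argmax(counts)


def average_rule(outputs):
    """Average the per-class outputs over networks, then pick the largest."""
    n_classes = len(outputs[0])
    return argmax([sum(o[c] for o in outputs) / len(outputs) for c in range(n_classes)])


def median_rule(outputs):
    """Take the per-class median over networks, then pick the largest."""
    n_classes = len(outputs[0])
    return argmax([median([o[c] for o in outputs]) for c in range(n_classes)])


def adaptive_committee(outputs, expectations, rule, threshold=0.5):
    """Fuse only networks whose expectation level meets the (assumed) threshold."""
    members = [j for j, e in enumerate(expectations) if e >= threshold]
    if not members:
        members = [argmax(expectations)]  # fall back to the most trusted network
    return rule([outputs[j] for j in members])
```

Here `outputs[j]` is the class-score vector of the jth classification network for one input, and `expectations[j]` stands for the jth output of the prediction network for that same input.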

19.
20.
Gaussian processes compare favourably with backpropagation neural networks as a tool for regression, and Bayesian neural networks have Gaussian process behaviour when the number of hidden neurons tends to infinity. We describe a simple recurrent neural network with connection weights trained by one-shot Hebbian learning. This network amounts to a dynamical system which relaxes to a stable state in which it generates predictions identical to those of Gaussian process regression. In effect an infinite number of hidden units in a feed-forward architecture can be replaced by a merely finite number, together with recurrent connections.
