20 similar articles found (search time: 0 ms)
1.
2.
MOTIVATION: We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). The code is flexible, has a length independent of the sequence size, does not require prior alignment, and is convenient for processing by neural networks or statistical techniques. RESULTS: To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. AVAILABILITY: Software for representing sequences by SEQREP code, and for training Kohonen SOMs, is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002
3.
4.
5.
Pascual-Montano A Taylor KA Winkler H Pascual-Marqui RD Carazo JM 《Journal of structural biology》2002,138(1-2):114-122
Tomography is emerging as a powerful methodology for determining the complex architectures of biological specimens that are better regarded, from the structural point of view, as singular entities. However, once the structures of a sufficiently large number of singular specimens have been solved, structural patterns may well start to emerge. This latter situation is addressed here, where the clustering of a set of 3D reconstructions using a novel quantitative approach is presented. In general terms, we propose a new variant of a self-organizing neural network for the unsupervised classification of 3D reconstructions. The novelty of the algorithm lies in its rigorous mathematical formulation that, starting from a large set of noisy input data, finds a set of "representative" items, organized onto an ordered output map, such that the probability density of this set of representative items resembles the probability density of the input data as closely as possible. In this study, we evaluate the feasibility of applying the proposed neural approach to the problem of identifying similar 3D motifs within tomograms of insect flight muscle. Our experimental results show that this technique is suitable for this type of problem, providing the electron microscopy community with a new tool for exploring large sets of tomogram data to find complex patterns.
6.
7.
Jae Kwang Kim Myoung Rae Cho Hyung Jin Baek Tae Hun Ryu Chang Yeon Yu Myong Jo Kim Eiichiro Fukusaki Akio Kobayashi 《Journal of Plant Biology》2007,50(4):517-521
Novel tools are needed for efficient analysis and visualization of the massive data sets associated with metabolomics. Here, we describe a batch-learning self-organizing map (BL-SOM) for metabolome informatics that makes the learning process and resulting map independent of the order of data input. This approach was successfully used in analyzing and organizing the metabolome data for Arabidopsis thaliana cells cultured under salt stress. Our 6 × 4 matrix presented patterns of metabolite levels at different time periods. A negative correlation was found between the levels of amino acids and metabolites related to glycolysis in response to this stress. Therefore, BL-SOM could be an excellent tool for clustering and visualizing high-dimensional, complex metabolome data in a single map.
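The order independence attributed to batch learning above can be illustrated with a minimal sketch. It assumes a one-dimensional map, a Gaussian neighborhood, and a fixed deterministic initialization; the names `batch_som_epoch` and `train`, and every parameter value, are illustrative assumptions and are not taken from the BL-SOM paper.

```python
import math

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))

def batch_som_epoch(weights, data, sigma):
    """One batch update: each unit becomes a neighborhood-weighted mean of all inputs.

    Winners are computed against the weights from before the epoch, so the
    result depends only on the set of inputs, not on their order.
    """
    n_units, dim = len(weights), len(weights[0])
    num = [[0.0] * dim for _ in range(n_units)]
    den = [0.0] * n_units
    for x in data:
        c = best_matching_unit(weights, x)
        for j in range(n_units):
            h = math.exp(-((j - c) ** 2) / (2.0 * sigma ** 2))
            den[j] += h
            for d in range(dim):
                num[j][d] += h * x[d]
    return [[num[j][d] / den[j] for d in range(dim)] for j in range(n_units)]

def train(data, n_units=4, epochs=10):
    dim = len(data[0])
    # Deterministic initialization along the diagonal (an assumption of this sketch).
    weights = [[j / (n_units - 1)] * dim for j in range(n_units)]
    for e in range(epochs):
        sigma = max(0.5, 2.0 * (1 - e / epochs))   # shrinking neighborhood width
        weights = batch_som_epoch(weights, data, sigma)
    return weights
```

Training on a data set and on the same data in reversed order should give the same map up to floating-point rounding, which is the property an online (sample-by-sample) update does not have.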
8.
The use of self-organizing maps to analyze data often depends on finding effective methods to visualize the SOM's structure. In this paper we propose a new way to perform that visualization using a variant of Andrews' Curves. We also show that the interaction between these two methods allows us to find sub-clusters within identified clusters. Perhaps more importantly, using the SOM to pre-process data by identifying gross features enables us to use Andrews' Curves on data sets that would previously have been too large for the methodology. Finally, we show how a three-way interaction between the human user and these two methods can be a valuable exploratory data analysis tool.
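For readers unfamiliar with Andrews' Curves, the sketch below evaluates the classical Andrews function, which maps a data vector to a finite Fourier series in t. The abstract refers to a variant whose details are not given here, so this is the textbook form only; the function names are illustrative.

```python
import math

def andrews_curve(x, t):
    """Classical Andrews function f_x(t) for one data vector x at angle t.

    f_x(t) = x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t) + x[3]*sin(2t) + x[4]*cos(2t) + ...
    """
    total = x[0] / math.sqrt(2.0)
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2          # harmonic number: 1, 1, 2, 2, 3, ...
        total += xi * (math.sin(k * t) if i % 2 == 1 else math.cos(k * t))
    return total

def sample_curve(x, n_points=50):
    """Sample the curve on an evenly spaced grid over [-pi, pi]."""
    step = 2.0 * math.pi / (n_points - 1)
    return [andrews_curve(x, -math.pi + k * step) for k in range(n_points)]
```

Applied to the weight vectors of a trained SOM rather than to every raw observation, only one curve per map unit needs to be drawn, which is one plausible reading of how the SOM makes the method usable on large data sets.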
9.
Analysis of gene expression data using self-organizing maps. Total citations: 29 (self-citations: 0, citations by others: 29)
DNA microarray technologies, together with rapidly increasing genomic sequence information, are leading to an explosion in available gene expression data. Currently there is a great need for efficient methods to analyze and visualize these massive data sets. A self-organizing map (SOM) is an unsupervised neural network learning algorithm which has been successfully used for the analysis and organization of large data files. Here we have applied the SOM algorithm to analyze published yeast gene expression data and show that the SOM is an excellent tool for the analysis and visualization of gene expression profiles.
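As a rough illustration of the SOM algorithm mentioned above, here is a minimal sketch of the classical online update rule on a small rectangular grid. The grid size, decay schedules, and function names are assumptions chosen for readability, not settings from the paper.

```python
import math
import random

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM with the classical online update rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialized from randomly chosen inputs.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3     # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull the node toward the input
            step += 1
    return weights

def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))
```

With expression profiles as input vectors, genes with similar profiles should end up at the same or neighboring grid nodes, which is what makes the map useful for visualization.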
10.
In this paper, we propose a method of reducing topological defects in self-organizing maps (SOMs) using multiple scale neighborhood functions. The multiple scale neighborhood functions are inspired by multiple scale channels in the human visual system. To evaluate the proposed method, we applied it to the traveling salesman problem (TSP), and examined two indexes: the tour length of the solution and the number of kinks in the solution. Consequently, the two indexes are lower for the proposed method. These results indicate that our proposed method has the ability to reduce topological defects.
11.
12.
The self-organizing map (SOM) is an unsupervised learning method based on neural computation, which has found wide application. However, the learning process sometimes exhibits multi-stable states, in which the map is trapped in an undesirable disordered state that includes topological defects. These topological defects critically degrade the performance of the SOM. In order to overcome this problem, we propose to introduce an asymmetric neighborhood function for the SOM algorithm. Compared with the conventional symmetric one, the asymmetric neighborhood function accelerates the ordering process even in the presence of a defect. However, this asymmetry tends to generate a distorted map. This can be suppressed by an improved method of applying the asymmetric neighborhood function. For a one-dimensional SOM, the number of steps required for perfect ordering is shown numerically to be reduced from O(N^3) to O(N^2). We also discuss the ordering process of a twisted state in a two-dimensional SOM, which cannot be rectified by the ordinary symmetric neighborhood function.
13.
The MMSOM identification method, previously presented by the authors, is improved to multiple modeling by the irregular self-organizing map (MMISOM) using the irregular SOM (ISOM). Inputs to the neural network are parameters of the instantaneous model computed adaptively at every instant. The neural network learns these models. The reference vectors of its output nodes are estimates of the parameters of the local models. At every instant, the model whose output is closest to the plant output is selected as the model of the plant. The ISOM used in this paper is a graph of all the nodes and some of the weighted links between them, forming a minimum spanning tree. It is shown in this paper that it is possible to add new models if the number of models is initially less than the appropriate one. The MMISOM shows more flexibility in covering the linear model space of the plant when the space is concave.
14.
15.
Leflaive J Céréghino R Danger M Lacroix G Ten-Hage L 《Journal of microbiological methods》2005,62(1):89-102
Community-level physiological profiles obtained with Biolog microplates are widely used to assess the functional diversity of bacterial communities. Biolog produces a large amount of data, whose analysis has been the subject of many studies. In most cases, after some transformations, these data have been investigated with classical multivariate analyses. Here we provide an alternative to this method: the use of an artificial intelligence technique, the Self-Organizing Map (SOM, an unsupervised neural network). We used data from a microcosm study of algae-associated bacterial communities placed in various nutritive conditions. Analyses were carried out on the net absorbances at two incubation times for each substrate and on the chemical guild categorization of the total bacterial activity. Compared to Principal Components Analysis and cluster analysis, SOM appeared to be a valuable tool for community classification and for establishing clear relationships between clusters of bacterial communities and sole-carbon-source utilization. Specifically, SOMs offered a clear two-dimensional projection of a relatively large volume of data and were easier to interpret than plots commonly obtained with multivariate analyses. They would be recommended for patterning the temporal evolution of communities' functional diversity.
16.
A new method based on neural networks to cluster proteins into families is described. The network is trained with the Kohonen unsupervised learning algorithm, using matrix pattern representations of the protein sequences as inputs. The components (x, y) of these 20×20 matrix patterns are the normalized frequencies of all pairs xy of amino acids in each sequence. We investigate the influence of different learning parameters on the final topological maps obtained with a learning set of ten proteins belonging to three established families. In all cases, except those where the synaptic vectors remain nearly unchanged during learning, the ten proteins are correctly classified into the expected families. The classification by the trained network of mutated or incomplete sequences of the learned proteins is also analysed. The neural network gives a correct classification for a sequence mutated in 21.5%±7% of its amino acids and for fragments representing 7.5%±3% of the original sequence. Similar results were obtained with a learning set of 32 proteins belonging to 15 families. These results show that a neural network can be trained following the Kohonen algorithm to obtain topological maps of protein sequences, where related proteins are finally associated with the same winner neuron or with neighboring ones, and that the trained network can be applied to rapidly classify new sequences. This approach opens new possibilities for finding rapid and efficient algorithms to organize and search for homologies in the whole protein database.
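The 20×20 input pattern described above can be sketched as follows. The abstract does not say whether "pairs xy" means adjacent residues or how the counts are normalized; this sketch assumes adjacent pairs and division by the total number of counted pairs. The names `pair_frequency_matrix` and `flatten` are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def pair_frequency_matrix(sequence):
    """Return a 20x20 matrix whose entry (x, y) is the normalized frequency
    of residue x immediately followed by residue y (assumed pair definition)."""
    counts = [[0] * 20 for _ in range(20)]
    total = 0
    for a, b in zip(sequence, sequence[1:]):
        if a in INDEX and b in INDEX:          # skip non-standard symbols such as X
            counts[INDEX[a]][INDEX[b]] += 1
            total += 1
    if total == 0:
        return [[0.0] * 20 for _ in range(20)]
    return [[c / total for c in row] for row in counts]

def flatten(matrix):
    """Flatten the matrix into a 400-component input vector for a Kohonen map."""
    return [v for row in matrix for v in row]
```

Because the vector length is fixed at 400 regardless of sequence length, proteins of different sizes can be fed to the same map without alignment.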
17.
Kamimura R 《Biological cybernetics》2011,104(4-5):305-338
In this article, we propose a new learning method called "self-enhancement learning." In this method, targets for learning are not given from the outside, but they can be spontaneously created within a neural network. To realize the method, we consider a neural network with two different states, namely, an enhanced and a relaxed state. The enhanced state is one in which the network responds very selectively to input patterns, while in the relaxed state, the network responds almost equally to input patterns. The gap between the two states can be reduced by minimizing the Kullback-Leibler divergence between the two states with free energy. To demonstrate the effectiveness of this method, we applied self-enhancement learning to the self-organizing maps, or SOM, in which lateral interactions were added to an enhanced state. We applied the method to the well-known Iris, wine, housing and cancer machine learning database problems. In addition, we applied the method to real-life data, a student survey. Experimental results showed that the U-matrices obtained were similar to those produced by the conventional SOM. Class boundaries were made clearer in the housing and cancer data. For all the data, except for the cancer data, better performance could be obtained in terms of quantitative and topological errors. In addition, we could see that the trustworthiness and continuity, referring to the quality of neighborhood preservation, could be improved by the self-enhancement learning. Finally, we used modern dimensionality reduction methods and compared their results with those obtained by the self-enhancement learning. The results obtained by the self-enhancement were not superior to but comparable with those obtained by the modern dimensionality reduction methods.
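The gap between a selective and a nearly uniform response can be made concrete with a small sketch of the Kullback-Leibler divergence. Modeling the two states as softmax responses with a large and a small inverse temperature `beta` is an assumption made purely for illustration; the paper's free-energy formulation is not reproduced here, and the distances are made-up numbers.

```python
import math

def response(distances, beta):
    """Normalized unit responses; larger beta makes the response more selective."""
    scores = [math.exp(-beta * d) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

distances = [0.1, 0.5, 0.9]                  # hypothetical distances from one input to three units
enhanced = response(distances, beta=10.0)    # selective: mass concentrates on the closest unit
relaxed = response(distances, beta=0.1)      # nearly uniform response
gap = kl_divergence(enhanced, relaxed)       # positive whenever the two states differ
```

Under this reading, learning would adjust the network so that `gap` shrinks, that is, so that the relaxed state comes to resemble the enhanced one.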
18.
This paper presents an approach to the well-known Travelling Salesman Problem (TSP) using Self-Organizing Maps (SOM). The SOM algorithm carries useful topological information about the configuration of its neurons in Cartesian space, which can be used to solve optimization problems. Aspects of initialization, parameter adaptation, and complexity analysis of the proposed SOM-based algorithm are discussed. The results show an average deviation of 3.7% from the optimal tour length for a set of 12 TSP instances.
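The general idea of solving the TSP with a SOM can be sketched with a ring of neurons that is gradually pulled onto the cities, after which the tour is read off the ring order. This is a generic textbook-style sketch, not the algorithm of the paper: the neuron-to-city ratio, the decay schedules, and the function names are all assumptions.

```python
import math
import random

def nearest(ring, x, y):
    """Index of the ring neuron closest to the point (x, y)."""
    return min(range(len(ring)), key=lambda k: (ring[k][0] - x) ** 2 + (ring[k][1] - y) ** 2)

def som_tsp(cities, n_iter=2000, seed=0):
    """Return a visiting order for `cities` (list of (x, y)) using a ring-shaped SOM."""
    rng = random.Random(seed)
    n_nodes = 3 * len(cities)                       # more neurons than cities (assumed ratio)
    cx = sum(x for x, _ in cities) / len(cities)
    cy = sum(y for _, y in cities) / len(cities)
    # Start with the neurons on a small circle around the centroid.
    ring = [[cx + 0.1 * math.cos(2 * math.pi * k / n_nodes),
             cy + 0.1 * math.sin(2 * math.pi * k / n_nodes)] for k in range(n_nodes)]
    for it in range(n_iter):
        frac = it / n_iter
        lr = 0.8 * (1.0 - frac) + 0.01              # decaying learning rate
        sigma = max(n_nodes / 8.0 * (1.0 - frac), 0.5)
        x, y = rng.choice(cities)
        win = nearest(ring, x, y)
        for k, node in enumerate(ring):
            d = min(abs(k - win), n_nodes - abs(k - win))   # distance along the ring
            h = math.exp(-d * d / (2.0 * sigma * sigma))
            node[0] += lr * h * (x - node[0])
            node[1] += lr * h * (y - node[1])
    # Read the tour off the ring: sort cities by the index of their closest neuron.
    return sorted(range(len(cities)), key=lambda i: nearest(ring, *cities[i]))

def tour_length(cities, order):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(math.hypot(cities[order[i]][0] - cities[order[(i + 1) % len(order)]][0],
                          cities[order[i]][1] - cities[order[(i + 1) % len(order)]][1])
               for i in range(len(order)))
```

Calling `tour_length(cities, som_tsp(cities))` gives the length of the resulting tour, which can then be compared with a known optimum in the way the reported average deviation was presumably obtained.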
19.
Clustering of ant communities and indicator species analysis using self-organizing maps. Total citations: 1 (self-citations: 0, citations by others: 1)
Sang-Hyun Park Shingo Hosoishi Kazuo Ogata Yuzuru Kuboki 《Comptes rendus biologies》2014,337(9):545-552
To understand the complex relationships that exist between ant assemblages and their habitats, we performed a self-organizing map (SOM) analysis to clarify the interactions among ant diversity, spatial distribution, and land use types in Fukuoka City, Japan. A total of 52 species from 12 study sites with nine land use types were collected from 1998 to 2012. A SOM was used to classify the collected data into three clusters based on the similarities between the ant communities. Consequently, each cluster reflected both the species composition and habitat characteristics in the study area. A detrended correspondence analysis (DCA) corroborated these findings, but removal of unique and duplicate species from the dataset in order to avoid sampling errors had a marked effect on the results; specifically, the clusters produced by DCA before and after the exclusion of specific data points were very different, while the clusters produced by the SOM were consistent. In addition, while the indicator value associated with SOMs clearly illustrated the importance of individual species in each cluster, the DCA scatterplot generated for species was not clear. The results suggested that SOM analysis was better suited for understanding the relationships between ant communities and species and habitat characteristics.
20.
One particularly time-consuming step in protein crystallography is interpreting the electron density map; that is, fitting a complete molecular model of the protein into a 3D image of the protein produced by the crystallographic process. In poor-quality electron density maps, the interpretation may require a significant amount of a crystallographer's time. Our work investigates automating the time-consuming initial backbone trace in poor-quality density maps. We describe ACMI (Automatic Crystallographic Map Interpreter), which uses a probabilistic model known as a Markov field to represent the protein. Residues of the protein are modeled as nodes in a graph, while edges model pairwise structural interactions. Modeling the protein in this manner allows the model to be flexible, considering an almost infinite number of possible conformations, while rejecting any that are physically impossible. Using an efficient algorithm for approximate inference (belief propagation) allows the most probable trace of the protein's backbone through the density map to be determined. We test ACMI on a set of ten protein density maps (at 2.5 to 4.0 Å resolution), and compare our results to alternative approaches. At these resolutions, ACMI offers a more accurate backbone trace than current approaches.