Similar articles
 20 similar articles found.
1.
2.
Self-organizing maps: ordering, convergence properties and energy functions   (Total citations: 6; self-citations: 0; citations by others: 6)
We investigate the convergence properties of the self-organizing feature map algorithm for a simple, but very instructive case: the formation of a topographic representation of the unit interval [0,1] by a linear chain of neurons. We extend the proofs of convergence of Kohonen and of Cottrell and Fort to hold in any case where the neighborhood function, which is used to scale the change in the weight values at each neuron, is a monotonically decreasing function of distance from the winner neuron. We prove that the learning dynamics cannot be described by a gradient descent on a single energy function, but may be described using a set of potential functions, one for each neuron, which are independently minimized following a stochastic gradient descent. We derive the correct potential functions for the one- and multi-dimensional case, and show that the energy functions given by Tolat (1990) are an approximation which is no longer valid in the case of highly disordered maps or steep neighborhood functions.
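The setting in this abstract, a linear chain of neurons learning the unit interval, can be illustrated with a minimal sketch. The Gaussian neighborhood, the learning rate `eps`, the width `sigma`, the step count and all function names are assumptions chosen for illustration; the abstract itself only requires the neighborhood function to decrease monotonically with distance from the winner.

```python
import math
import random


def gaussian_neighborhood(distance, sigma):
    """Assumed neighborhood function: decreases monotonically with chain distance."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def train_chain_som(n_neurons=10, steps=20000, eps=0.1, sigma=2.0, seed=0):
    """Train a 1-D Kohonen chain on inputs drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_neurons)]
    for _ in range(steps):
        x = rng.random()
        # Winner: the neuron whose weight is closest to the input.
        winner = min(range(n_neurons), key=lambda i: abs(weights[i] - x))
        # Every neuron moves toward x, scaled by the neighborhood function.
        for i in range(n_neurons):
            h = gaussian_neighborhood(abs(i - winner), sigma)
            weights[i] += eps * h * (x - weights[i])
    return weights


def is_ordered(weights):
    """True if the weights are monotone along the chain (a topographic map)."""
    pairs = list(zip(weights, weights[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)
```

Calling `is_ordered(train_chain_som())` checks whether a given run has reached an ordered state; whether and how fast that happens depends on the assumed parameters.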

3.
The function of a protein is intimately tied to its subcellular localization. Although localizations have been measured for many yeast proteins through systematic GFP fusions, similar studies in other branches of life are still forthcoming. In the interim, various machine-learning methods have been proposed to predict localization using physical characteristics of a protein, such as amino acid content, hydrophobicity, side-chain mass and domain composition. However, there has been comparatively little work on predicting localization using protein networks. Here, we predict protein localizations by integrating an extensive set of protein physical characteristics over a protein's extended protein-protein interaction neighborhood, using a classification framework called 'Divide and Conquer k-Nearest Neighbors' (DC-kNN). These predictions achieve significantly higher accuracy than two well-known methods for predicting protein localization in yeast. Using new GFP imaging experiments, we show that the network-based approach can extend and revise previous annotations made from high-throughput studies. Finally, we show that our approach remains highly predictive in higher eukaryotes such as fly and human, in which most localizations are unknown and the protein network coverage is less substantial.

4.
Self-organizing maps: stationary states, metastability and convergence rate   (Total citations: 1; self-citations: 0; citations by others: 1)
We investigate the effect of various types of neighborhood function on the convergence rates and the presence or absence of metastable stationary states of Kohonen's self-organizing feature map algorithm in one dimension. We demonstrate that the time necessary to form a topographic representation of the unit interval [0, 1] may vary over several orders of magnitude depending on the range and also the shape of the neighborhood function, by which the weight changes of the neurons in the neighborhood of the winning neuron are scaled. We will prove that for neighborhood functions which are convex on an interval given by the length of the Kohonen chain there exist no metastable states. For all other neighborhood functions, metastable states are present and may trap the algorithm during the learning process. For the widely-used Gaussian function there exists a threshold for the width above which metastable states cannot exist. Due to the presence or absence of metastable states, convergence time is very sensitive to slight changes in the shape of the neighborhood function. Fastest convergence is achieved using neighborhood functions which are "convex" over a large range around the winner neuron and yet have large differences in value at neighboring neurons.

5.
In an unpredictable environment, the distributions of alleles from which polymorphism can be maintained forever belong to a certain set, the C-viability kernel. Such a set is calculated in the two-locus haploid model, as well as the corresponding fitnesses at any time which make this maintenance possible. The dependence of the C-viability kernel on the set U of admissible fitnesses and on the recombination rate r is studied. Notably, the C-viability kernel varies rapidly in the neighborhood of equal fitness of AB and ab; it becomes empty when ab has a fitness below a certain function, which is delineated, of the recombination rate. The properties of the two-locus model under constraints, out of equilibrium and with unpredictable selection are thus presented. Received: 20 May 1999

6.
In protein-protein interaction (PPI) networks, functional similarity is often inferred based on the function of directly interacting proteins, or more generally, some notion of interaction network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as the shortest-path distance in the network, but this has only a limited ability to capture fine-grained neighborhood distinctions, because most proteins are close to each other, and there are many ties in proximity. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for transfer of functional annotation in PPI networks. We present a tool that, given a PPI network as input, outputs the DSD between every pair of proteins. We show that replacing the shortest-path metric by DSD improves the performance of classical function prediction methods across the board.
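The abstract does not state how DSD is computed, so the following is only a sketch of one diffusion-based distance in that spirit, under an assumed definition: for each node, count the expected number of visits to every other node by a random walk of `k` steps, then take the L1 distance between two such visit vectors. The function names, the walk length `k`, and the adjacency-dictionary input (undirected, connected, no isolated nodes) are all illustrative assumptions rather than the authors' tool.

```python
def visit_counts(adj, start, k):
    """Expected visits to each node by a k-step random walk from `start`.

    `adj` maps each node to a list of neighbors; every node is assumed
    to have at least one neighbor. The starting position counts as a visit.
    """
    dist = {n: 0.0 for n in adj}
    dist[start] = 1.0
    visits = dict(dist)
    for _ in range(k):
        nxt = {n: 0.0 for n in adj}
        for u, p in dist.items():
            if p == 0.0:
                continue
            share = p / len(adj[u])  # uniform step to a neighbor
            for v in adj[u]:
                nxt[v] += share
        dist = nxt
        for n in adj:
            visits[n] += dist[n]
    return visits


def diffusion_distance(adj, u, v, k=5):
    """L1 distance between the visit-count vectors of u and v (assumed definition)."""
    hu = visit_counts(adj, u, k)
    hv = visit_counts(adj, v, k)
    return sum(abs(hu[n] - hv[n]) for n in adj)
```

Unlike shortest-path distance, which takes only small integer values and therefore ties often, this quantity is real-valued and depends on the whole local neighborhood of both nodes.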

7.
Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in these networks, they face important data quality challenges of noise and incompleteness that adversely affect the results obtained from their analysis. Here, we apply a robust measure of local network structure called common neighborhood similarity (CNS) to address these challenges. Although several CNS measures have been proposed in the literature, an understanding of their relative efficacies for the analysis of interaction networks has been lacking. We follow the framework of graph transformation to convert the given interaction network into a transformed network corresponding to a variety of CNS measures evaluated. The effectiveness of each measure is then estimated by comparing the quality of protein function predictions obtained from its corresponding transformed network with those from the original network. Using a large set of human and fly protein interactions, and a set of over GO terms for both, we find that several of the transformed networks produce more accurate predictions than those obtained from the original network. In particular, the measure and other continuous CNS measures perform well on this task, especially for large networks. Further investigation reveals that the two major factors contributing to this improvement are the abilities of CNS measures to prune out noisy edges and enhance functional coherence in the transformed networks.

8.
Revealing organizational principles of biological networks is an important goal of systems biology. In this study, we sought to analyze the dynamic organizational principles within the protein interaction network by studying the characteristics of individual neighborhoods of proteins within the network based on their gene expression as well as protein-protein interaction patterns. By clustering proteins into distinct groups based on their neighborhood gene expression characteristics, we identify several significant trends in the dynamic organization of the protein interaction network. We show that proteins with distinct neighborhood gene expression characteristics are positioned in specific localities in the protein interaction network thereby playing specific roles in the dynamic network connectivity. Remarkably, our analysis reveals a neighborhood characteristic that corresponds to the most centrally located group of proteins within the network. Further, we show that the connectivity pattern displayed by this group is consistent with the notion of “rich club connectivity” in complex networks. Importantly, our findings are largely reproducible in networks constructed using independent and different datasets.

9.
MOTIVATION: Protein-protein interactions have proved to be a valuable starting point for understanding the inner workings of the cell. Computational methodologies have been built which both predict interactions and use interaction datasets in order to predict other protein features. Such methods require gold standard positive (GSP) and negative (GSN) interaction sets. Here we examine and demonstrate the usefulness of homologous interactions in predicting good quality positive and negative interaction datasets. RESULTS: We generate GSP interaction sets as subsets from experimental data using only interaction and sequence information. We can therefore produce sets for several species (many of which at present have no identified GSPs). Comprehensive error rate testing demonstrates the power of the method. We also show how the use of our datasets significantly improves the predictive power of algorithms for interaction prediction and function prediction. Furthermore, we generate GSN interaction sets for yeast and examine the use of homology along with other protein properties such as localization, expression and function. Using a novel method to assess the accuracy of a negative interaction set, we find that the best single selector for negative interactions is a lack of co-function. However, an integrated method using all the characteristics shows significant improvement over any current method for identifying GSN interactions. The nature of homologous interactions is also examined and we demonstrate that interologs are found more commonly within species than across species. CONCLUSION: GSP sets built using our homologous verification method are demonstrably better than standard sets in terms of predictive ability. We can build such GSP sets for several species. When generating GSNs we show a combination of protein features and lack of homologous interactions gives the highest quality interaction sets. 
AVAILABILITY: GSP and GSN datasets for all the studied species can be downloaded from http://www.stats.ox.ac.uk/~deane/HPIV.

10.
Protein function is a complex notion, which is now receiving renewed attention from a bioinformatics and genomics perspective. After a general discussion of the principles of experimental methods employed to decipher gene/protein function, the contributions made by new, high-throughput methods in terms of function discovery are discussed. Recent work on functional ontologies and the necessity to describe function within the context of hierarchical levels of complexity are presented. The concepts of molecular interactions and genetic networks are then discussed, leading to a useful new framework with which to describe protein function using new tools such as 2D interaction maps. Finally, it is proposed that interaction data could be used to develop new methods for the functional classification of proteins. An example of functional comparisons on a real data set of yeast chromosomal proteins is presented.

11.
Population structure and the spread of disease   (Total citations: 1; self-citations: 0; citations by others: 1)
A common assumption of many mathematical models for the spread of disease is that there is random mixing among all individuals in the host population. This paper analyzes and develops a model for the spread of disease in a population consisting of several interacting subpopulations. The model considers 2 different types of interactions between individuals: 1) within a subpopulation because of geographic proximity, and 2) of the same or different subpopulations because of attendance at common social functions. A stability analysis performed on the equilibria of the model shows 2 stable states: 1) a population composed solely of susceptible individuals with no disease present, and 2) an interior point where there are susceptible, infective, and recovered individuals present at all times. The analysis shows that the threshold for disease maintenance is more easily exceeded in centers that are members of a small local cluster than in randomly mixing centers, but that the spread of the disease throughout the population occurs more rapidly when the initial case attends a randomly mixing center. The conditions under which a disease will become established are dependent upon the transmission rate for the disease, the birth and death rate in each neighborhood, the recovery rate from the disease in each neighborhood, and the movement patterns of the individuals in the population. The study of the spread of disease in a population by means of mathematical models provides a valuable addition to the statistical data analyzed by epidemiologists. This model is relevant any time there is a division of the population into several interacting groups in which the probability of disease spread is a function both of neighborhood contact because of geographic proximity and of social interactions between groups.

12.
13.
Biological interpretation of large scale omics data, such as protein-protein interaction data and microarray gene expression data, requires that the function of many genes in a data set is annotated or predicted. Here the predicted function for a gene does not necessarily have to be a detailed biochemical function; a broad class of function, or low-resolution function, may be sufficient to understand why a set of genes shows the observed expression pattern or interaction pattern. In this Highlight, we focus on two recent approaches for function prediction which aim to provide large coverage in function prediction, namely omics data driven approaches and a thorough data mining approach on homology search results.

14.
The self-organizing map (SOM) is an unsupervised learning method based on neural computation that has found wide application. However, the learning process sometimes passes through multi-stable states, in which the map is trapped in an undesirable disordered state that includes topological defects. These topological defects severely degrade the performance of the SOM. To overcome this problem, we propose introducing an asymmetric neighborhood function into the SOM algorithm. Compared with the conventional symmetric one, the asymmetric neighborhood function accelerates the ordering process even in the presence of a defect. However, this asymmetry tends to generate a distorted map, which can be suppressed by an improved version of the asymmetric neighborhood function. For the one-dimensional SOM, the number of steps required for perfect ordering is shown numerically to be reduced from O(N^3) to O(N^2). We also discuss the ordering process of a twisted state in the two-dimensional SOM, which cannot be rectified by the ordinary symmetric neighborhood function.

15.
To understand the function of the encoded proteins, we need to know their subcellular location. The most common method for determining subcellular location is fluorescence microscopy, which allows subcellular localizations to be imaged in high throughput. Image feature calculation has proven invaluable in the automated analysis of cellular images. This article proposes a novel feature-extraction method named LDPs that is invariant to translation and rotation of the given images. Its essence is to count the local difference features of an image, where the difference features are obtained by calculating the difference (D-value) between the gray value of the central pixel c and the gray values of the eight pixels in its neighborhood. The method is tested on two image sets: in the first, fluorescently tagged protein was endogenously expressed in 10 subcellular locations; in the second, protein was transfected in 11 locations. An SVM was trained and tested for each image set, and classification accuracies of 96.7% and 92.3% were obtained on the endogenous and transfected sets, respectively.

16.
MOTIVATION: The goal of neighborhood analysis is to find a set of genes (the neighborhood) that is similar to an initial 'seed' set of genes. Neighborhood analysis methods for network data are important in systems biology. If individual network connections are susceptible to noise, it can be advantageous to define neighborhoods on the basis of a robust interconnectedness measure, e.g. the topological overlap measure. Since the use of multiple nodes in the seed set may lead to more informative neighborhoods, it can be advantageous to define multi-node similarity measures. RESULTS: The pairwise topological overlap measure is generalized to multiple network nodes and subsequently used in a recursive neighborhood construction method. A local permutation scheme is used to determine the neighborhood size. Using four network applications and a simulated example, we provide empirical evidence that the resulting neighborhoods are biologically meaningful, e.g. we use neighborhood analysis to identify brain cancer related genes. AVAILABILITY: An executable Windows program and tutorial for multi-node topological overlap measure (MTOM) based analysis can be downloaded from the webpage (http://www.genetics.ucla.edu/labs/horvath/MTOM/).
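For orientation, the pairwise topological overlap measure that this abstract generalizes can be sketched for an unweighted network. The formula used here, (shared neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij), is the commonly cited pairwise form and is not quoted from the abstract; the function name and the dictionary-of-sets input are assumptions, and the multi-node generalization (MTOM) is not reproduced.

```python
def topological_overlap(adj, i, j):
    """Pairwise topological overlap for an unweighted, undirected network.

    `adj` maps each node to the set of its neighbors (no self-loops).
    """
    if i == j:
        return 1.0
    a_ij = 1.0 if j in adj[i] else 0.0
    # Neighbors shared by i and j, excluding the pair itself.
    shared = len((adj[i] & adj[j]) - {i, j})
    k_min = min(len(adj[i]), len(adj[j]))
    return (shared + a_ij) / (k_min + 1.0 - a_ij)
```

Two nodes score highly when they are connected and also share most of their neighbors, which makes the measure less sensitive to a single noisy edge than the adjacency itself.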

17.
Izrailev S, Farnum MA. Proteins, 2004, 57(4): 711-724
The problem of assigning a biochemical function to newly discovered proteins has been traditionally approached by expert enzymological analysis, sequence analysis, and structural modeling. In recent years, the appearance of databases containing protein-ligand interaction data for large numbers of protein classes and chemical compounds has provided new ways of investigating proteins for which the biochemical function is not completely understood. In this work, we introduce a method that utilizes ligand-binding data for functional classification of enzymes. The method makes use of the existing Enzyme Commission (EC) classification scheme and the data on interactions of small molecules with enzymes from the BRENDA database. A set of ligands that binds to an enzyme with unknown biochemical function serves as a query to search a protein-ligand interaction database for enzyme classes that are known to interact with a similar set of ligands. These classes provide hypotheses of the query enzyme's function and complement other computational annotations that take advantage of sequence and structural information. Similarity between sets of ligands is computed using point set similarity measures based upon similarity between individual compounds. We present the statistics of classification of the enzymes in the database by a cross-validation procedure and illustrate the application of the method on several examples.

18.
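Understanding the dominant mechanisms behind community structure and dynamics is one of the fundamental goals of ecological research. The survival of tree species within a community is significantly affected by neighboring trees. To explore how the survival of different tree species responds to neighborhood composition, this study built a series of tree survival models of neighborhood effects, based on survival monitoring data and functional trait data for 90 common tree species in the 20 ha forest dynamics plot in the south subtropical broad-leaved forest of Dinghushan. The results show that the survival of about 58% of the species responded sensitively to neighborhood composition, and that differences in functional traits among co-occurring species affected the survival dynamics of 50% of the species. Differences among species in their response to neighborhood composition were related to shade tolerance: species with weaker shade tolerance were more likely to be sensitive to their neighbors. Low specific leaf area, together with high leaf dry matter content, wood density and maximum diameter at breast height, indicates stronger shade tolerance, and niche differentiation related to light-use strategy may explain species coexistence at the neighborhood scale. This study offers a new perspective for quantifying interactions among neighbors and for explaining species coexistence in local communities.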

19.
Predicting protein function is one of the most challenging problems of the post-genomic era. The development of experimental methods for genome scale analysis of molecular interaction networks has provided new approaches to inferring protein function. In this paper we introduce a new graph-based semi-supervised classification algorithm, Sequential Linear Neighborhood Propagation (SLNP), which addresses the problem of the classification of partially labeled protein interaction networks. The proposed SLNP first constructs a sequence of node sets according to their shortest distance to the labeled nodes, and then predicts the function of the unlabeled proteins, starting from the set closest to the labeled nodes, using Linear Neighborhood Propagation. Its performance is assessed on the Saccharomyces cerevisiae PPI network data sets, with good results compared with three current state-of-the-art algorithms, especially in settings where only a small fraction of the proteins are labeled.
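The first step described in this abstract, grouping unlabeled nodes by their shortest distance to the labeled nodes, can be sketched with a multi-source breadth-first search. This covers the layering only; the Linear Neighborhood Propagation step is not reproduced, and the function name and adjacency-dictionary input are assumptions for illustration.

```python
from collections import deque


def layers_by_distance(adj, labeled):
    """Group unlabeled nodes by shortest distance to any labeled node.

    `adj` maps each node to an iterable of neighbors. Returns a list of
    sorted node lists: distance 1 first, then distance 2, and so on.
    Nodes unreachable from every labeled node are omitted.
    """
    dist = {n: 0 for n in labeled}
    queue = deque(labeled)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = {}
    for node, d in dist.items():
        if d > 0:
            layers.setdefault(d, []).append(node)
    return [sorted(layers[d]) for d in sorted(layers)]
```

A propagation scheme could then process the returned layers in order, so that each layer is predicted from nodes that are already labeled or already predicted.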

20.
F(st) in a Hierarchical Island Model   (Total citations: 1; self-citations: 0; citations by others: 1)
M. Slatkin, L. Voelm. Genetics, 1991, 127(3): 627-629
It is shown that in a hierarchical island model, in which demes within a neighborhood exchange migrants at a much higher rate than do demes in different neighborhoods, hierarchical F statistics introduced by S. Wright can indicate the extent of gene flow within and between neighborhoods. At equilibrium, the within-neighborhood inbreeding coefficient, F_SN, is approximately 1/(1 + 4Nm_1), where N is the deme size and m_1 is the migration rate among demes in the same neighborhood. The between-neighborhood inbreeding coefficient, F_NT, is approximately 1/(1 + 4Ndm_2), where d is the number of demes in a neighborhood and m_2 is the migration rate among demes in different neighborhoods.
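The two equilibrium approximations quoted in this abstract are simple enough to evaluate directly. The sketch below only transcribes them; the function names and the example parameter values are assumptions chosen for illustration.

```python
def within_neighborhood_f(deme_size, m1):
    """Approximate equilibrium within-neighborhood coefficient: 1 / (1 + 4 N m1)."""
    return 1.0 / (1.0 + 4.0 * deme_size * m1)


def between_neighborhood_f(deme_size, demes_per_neighborhood, m2):
    """Approximate equilibrium between-neighborhood coefficient: 1 / (1 + 4 N d m2)."""
    return 1.0 / (1.0 + 4.0 * deme_size * demes_per_neighborhood * m2)


if __name__ == "__main__":
    # Hypothetical values: N = 100, d = 5, m1 = 0.01, m2 = 0.001.
    print(within_neighborhood_f(100, 0.01))
    print(between_neighborhood_f(100, 5, 0.001))
```

Both coefficients fall toward zero as the corresponding number of migrants per generation grows, which is why they can indicate the extent of gene flow at each level of the hierarchy.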
