Similar articles
Found 20 similar articles (search time: 15 ms)
1.
2.
MOTIVATION: Much research has been dedicated to large-scale protein interaction networks, including the analysis of scale-free topologies, network modules and the relation of domain-domain to protein-protein interaction networks. Identifying locally significant proteins that mediate the function of modules is still an open problem. METHOD: We use a layered clustering algorithm for interaction networks, which groups proteins by the similarity of their direct neighborhoods. We identify locally significant proteins, called mediators, which link different clusters. We apply the algorithm to a yeast network. RESULTS: Clusters and mediators are organized in hierarchies, where clusters are mediated by and act as mediators for other clusters. We compare the clusters and mediators to known yeast complexes and find agreement with a precision of 71% and a recall of 61%. We analyzed the functions, processes and locations of mediators and clusters. We found that 55% of mediators to a cluster are enriched with a set of diverse processes and locations, often related to translocation of biomolecules. Additionally, 82% of clusters are enriched with one or more functions. The important role of mediators is further corroborated by a comparatively higher degree of conservation across genomes. We illustrate the above findings with an example of membrane protein translocation from the cytoplasm to the inner nuclear membrane. AVAILABILITY: All software is freely available under Supplementary information.
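The abstract above groups proteins by the similarity of their direct neighborhoods but does not say which similarity measure is used. The following is a minimal sketch that assumes a Jaccard index over neighbor sets; the function name and the toy network are hypothetical and are not taken from the paper.

```python
def neighborhood_similarity(adjacency, u, v):
    """Jaccard similarity of the direct neighborhoods of proteins u and v.

    Assumption: Jaccard is used here only for illustration; the abstract
    does not specify the similarity measure.
    """
    neighbors_u = adjacency[u]
    neighbors_v = adjacency[v]
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical toy network: protein -> set of direct interaction partners.
toy_network = {
    "P1": {"P3", "P4", "P5"},
    "P2": {"P3", "P4", "P6"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
    "P5": {"P1"},
    "P6": {"P2"},
}

print(neighborhood_similarity(toy_network, "P1", "P2"))
print(neighborhood_similarity(toy_network, "P3", "P4"))
```

In this toy network P3 and P4 interact with exactly the same partners, so they would be the first candidates to be grouped together, even though they do not interact with each other.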

3.

Background  

Modern biology aims to unravel complex biological structures such as protein-protein interaction (PPI) networks. A key concept in the organization of PPI networks is the existence of dense subnetworks (functional modules) within them. In recent approaches, clustering algorithms were applied to these networks, and the resulting subnetworks were evaluated by estimating their coverage of well-established protein complexes. However, most of these algorithms operate on an unweighted graph structure, which fails to give more weight to the interactions that would contribute to biologically more valid and coherent functional modules.

4.
5.
6.
MOTIVATION: Determining orthology relations among genes across multiple genomes is an important problem in the post-genomic era. Identifying orthologous genes can not only help predict functional annotations for newly sequenced or poorly characterized genomes, but can also help predict new protein-protein interactions. Unfortunately, determining orthology relations through computational methods is not straightforward due to the presence of paralogs. Traditional approaches have relied on pairwise sequence comparisons to construct graphs, which were then partitioned into putative clusters of orthologous groups. These methods do not attempt to preserve the non-transitivity and hierarchical nature of the orthology relation. RESULTS: We propose a new method, COCO-CL, for hierarchical clustering of homology relations and identification of orthologous groups of genes. Unlike previous approaches, which are based on pairwise sequence comparisons, our method explores the correlation of evolutionary histories of individual genes in a more global context. COCO-CL can be used as a semi-independent method to delineate the orthology/paralogy relation for a refined set of homologous proteins obtained using a less-conservative clustering approach, or as a refiner that removes putative out-paralogs from clusters computed using a more inclusive approach. We analyze our clustering results manually, with support from literature and functional annotations. Since our orthology determination procedure does not employ a species tree to infer duplication events, it can be used in situations when the species tree is unknown or uncertain. CONTACT: jothi@mail.nih.gov, przytyck@mail.nih.gov SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.

7.
Advances in large-scale technologies in proteomics, such as yeast two-hybrid screening and mass spectrometry, have made it possible to generate large Protein Interaction Networks (PINs). Recent methods for identifying dense sub-graphs in such networks have been based solely on graph theoretic properties. Therefore, there is a need for an approach that will allow us to combine domain-specific knowledge with topological properties to generate functionally relevant sub-graphs from large networks. This article describes two alternative network measures for analysis of PINs, which combine functional information with topological properties of the networks. These measures, called weighted clustering coefficient and weighted average nearest-neighbors degree, use weights representing the strengths of interactions between the proteins, calculated according to their semantic similarity, which is based on the Gene Ontology terms of the proteins. We perform a global analysis of the yeast PIN by systematically comparing the weighted measures with their topological counterparts. To show the usefulness of the weighted measures, we develop an algorithm for identification of functional modules, called SWEMODE (Semantic WEights for MODule Elucidation), that identifies dense sub-graphs containing functionally similar proteins. The proposed method is based on the ranking of nodes, i.e., proteins, according to their weighted neighborhood cohesiveness. The highest ranked nodes are considered as seeds for candidate modules. The algorithm then iterates through the neighborhood of each seed protein, to identify densely connected proteins with high functional similarity, according to the chosen parameters. Using a yeast two-hybrid data set of experimentally determined protein-protein interactions, we demonstrate that SWEMODE is able to identify dense clusters containing proteins that are functionally similar. Many of the identified modules correspond to known complexes or subunits of these complexes.
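The abstract above names a weighted clustering coefficient but does not give its formula. As an illustration only, the sketch below assumes the widely used formulation of Barrat et al., in which each closed triangle around a node contributes the sum of the two edge weights that join the node to the triangle. The function name and the example weights (standing in for semantic similarities) are hypothetical.

```python
from itertools import combinations


def weighted_clustering(weights, node):
    """Weighted clustering coefficient of `node` (Barrat-style, assumed).

    `weights` maps each node to a dict {neighbor: edge weight}; the graph
    is undirected, so every edge appears in both directions.
    """
    neighbors = weights[node]
    degree = len(neighbors)
    if degree < 2:
        return 0.0
    strength = sum(neighbors.values())
    total = 0.0
    # Each unordered pair of neighbors that is itself connected closes a triangle.
    for j, h in combinations(neighbors, 2):
        if h in weights[j]:
            total += neighbors[j] + neighbors[h]
    return total / (strength * (degree - 1))


# Hypothetical weights, e.g. semantic similarity of GO annotations.
example = {
    "A": {"B": 0.9, "C": 0.8, "D": 0.1},
    "B": {"A": 0.9, "C": 0.7},
    "C": {"A": 0.8, "B": 0.7},
    "D": {"A": 0.1},
}

print(round(weighted_clustering(example, "A"), 4))
```

With all weights set to 1 the expression reduces to the ordinary topological clustering coefficient (1/3 for node A here). The weighted value is higher because the only triangle around A is formed by its two strongest interactions.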

8.
Feature selection is widely established as one of the fundamental computational techniques in mining microarray data. Because class labels are often unavailable in practice, unsupervised feature selection is more practically important but correspondingly more difficult. Motivated by cluster ensemble techniques, which combine multiple clustering solutions into a consensus solution of higher accuracy and stability, recent efforts in unsupervised feature selection proposed to use these consensus solutions as oracles. However, these methods depend on both the particular cluster ensemble algorithm used and knowledge of the true cluster number. They are unsuitable when the true cluster number is not available, which is common in practice. In view of these problems, a new unsupervised feature ranking method is proposed to evaluate the importance of the features based on consensus affinity. Unlike previous work, our method compares the corresponding affinity of each feature between a pair of instances based on the consensus matrix of clustering solutions. As a result, our method alleviates the need to know the true number of clusters and the dependence on particular cluster ensemble approaches. Experiments on real gene expression data sets demonstrate significant improvement of the feature ranking results when compared to several state-of-the-art techniques.
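The method above relies on a consensus matrix of clustering solutions. A minimal sketch of how such a matrix is commonly built is given below: each entry is the fraction of clustering runs in which two instances share a cluster. The function name and the three example labelings are hypothetical, and the feature-ranking step itself is not reproduced because the abstract does not describe it in detail.

```python
def consensus_matrix(clusterings):
    """Fraction of clusterings in which each pair of instances co-clusters.

    `clusterings` is a list of label lists, one per clustering run, all of
    the same length. Label values need not agree between runs.
    """
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(clusterings)
    return [[c / runs for c in row] for row in counts]


# Three hypothetical clustering runs over four samples.
runs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = consensus_matrix(runs)
for row in matrix:
    print([round(value, 2) for value in row])
```

Because only co-membership is counted, the matrix does not require the runs to use the same number of clusters or the same label names.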

9.
Ensemble clustering methods have become increasingly important to ease the task of choosing the most appropriate cluster algorithm for a particular data analysis problem. The consensus clustering (CC) algorithm is a recognized ensemble clustering method that uses an artificial intelligence technique to optimize a fitness function. We formally prove the existence of a subspace of the search space for CC that contains all solutions of maximal fitness, and suggest two greedy algorithms to search this subspace. We evaluate the algorithms on two gene expression data sets and one synthetic data set, and compare the results with those of other ensemble clustering approaches.

10.

Background

Recent computational techniques have facilitated analyzing genome-wide protein-protein interaction data for several model organisms. Various graph-clustering algorithms have been applied to protein interaction networks on the genomic scale for predicting the entire set of potential protein complexes. In particular, the density-based clustering algorithms which are able to generate overlapping clusters, i.e. the clusters sharing a set of nodes, are well-suited to protein complex detection because each protein could be a member of multiple complexes. However, their accuracy is still limited because of complex overlap patterns of their output clusters.

Results

We present a systematic approach of refining the overlapping clusters identified from protein interaction networks. We have designed novel metrics to assess cluster overlaps: overlap coverage and overlapping consistency. We then propose an overlap refinement algorithm. It takes as input the clusters produced by existing density-based graph-clustering methods and generates a set of refined clusters by parameterizing the metrics. To evaluate protein complex prediction accuracy, we used the f-measure by comparing each refined cluster to known protein complexes. The experimental results with the yeast protein-protein interaction data sets from BioGRID and DIP demonstrate that accuracy on protein complex prediction has increased significantly after refining cluster overlaps.

Conclusions

The effectiveness of the proposed cluster overlap refinement approach for protein complex detection has been validated in this study. Analyzing the overlaps of clusters from protein interaction networks is a crucial task for understanding the functional roles of proteins and the topological characteristics of functional systems.
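The evaluation above uses the f-measure to compare each refined cluster with known protein complexes. The sketch below shows one common way to compute it, assuming precision and recall are defined on shared members and that each cluster is scored against its best-matching complex. Both choices, and all names, are assumptions rather than details from the abstract.

```python
def f_measure(cluster, known_complex):
    """Harmonic mean of precision and recall based on shared members."""
    cluster = set(cluster)
    known_complex = set(known_complex)
    overlap = len(cluster & known_complex)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(known_complex)
    return 2 * precision * recall / (precision + recall)


def best_f_measure(cluster, complexes):
    """Score a predicted cluster against its best-matching known complex."""
    return max((f_measure(cluster, c) for c in complexes), default=0.0)


# Hypothetical predicted cluster and reference complexes.
predicted = {"A", "B", "C", "D"}
reference = [{"B", "C", "D", "E", "F"}, {"X", "Y"}]

print(round(best_f_measure(predicted, reference), 4))
```

Here three of the four predicted proteins belong to the first reference complex (precision 0.75) and three of its five members are recovered (recall 0.6).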

11.
Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’ or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies, and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but also on the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.

12.
The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs depending on the edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding on an appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods.

13.
Co-conservation (phylogenetic profiles) is a well-established method for predicting functional relationships between proteins. Several publicly available databases use this method and additional clustering strategies to develop networks of protein interactions (cluster co-conservation (CCC)). CCC has previously been limited to interactions within a single target species. We have extended CCC to develop protein interaction networks based on co-conservation between protein pairs across multiple species, cross-species cluster co-conservation.
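A phylogenetic profile records, for one protein, in which genomes a homolog is present; proteins with similar profiles are candidates for a functional relationship. The sketch below illustrates that idea only. The data are invented, the agreement score (fraction of genomes where two profiles match) is an assumption, and it does not reproduce the CCC clustering or its cross-species extension.

```python
def build_profiles(presence, genomes):
    """Turn {protein: set of genomes with a homolog} into binary profiles."""
    return {
        protein: tuple(1 if genome in present else 0 for genome in genomes)
        for protein, present in presence.items()
    }


def profile_agreement(profile_a, profile_b):
    """Fraction of genomes in which two proteins are both present or both absent."""
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


# Hypothetical presence/absence data across four reference genomes.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3", "g4"},
    "P3": {"g2"},
}
profiles = build_profiles(presence, genomes)

print(profiles["P1"], profiles["P2"])
print(profile_agreement(profiles["P1"], profiles["P2"]))
```

P1 and P2 agree in three of four genomes, so under this simple score they would be linked before P1 and P3, which agree in only one.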

14.
Various mammalian cells including tumor cells secrete extracellular vesicles (EVs), otherwise known as exosomes and microvesicles. EVs are nanosized bilayered proteolipids and play multiple roles in intercellular communication. Although many vesicular proteins have been identified, their functional interrelationships and the mechanisms of EV biogenesis remain unknown. By interrogating proteomic data using systems approaches, we have created a protein interaction network of human colorectal cancer cell-derived EVs which comprises 1491 interactions between 957 vesicular proteins. We discovered that EVs have well-connected clusters with several hub proteins similar to other subcellular networks. We also experimentally validated that direct protein interactions between cellular proteins may be involved in protein sorting during EV formation. Moreover, physically and functionally interconnected protein complexes form functional modules involved in EV biogenesis and functions. Specifically, we discovered that SRC signaling plays a major role in EV biogenesis, and confirmed that inhibition of SRC kinase decreased the intracellular biogenesis and cell surface release of EVs. Our study provides global insights into the cargo-sorting, biogenesis, and pathophysiological roles of these complex extracellular organelles.

15.
MOTIVATION: Network-centered studies in systems biology attempt to integrate the topological properties of biological networks with experimental data in order to make predictions and posit hypotheses. For any topology-based prediction, it is necessary to first assess the significance of the analyzed property in a biologically meaningful context. Therefore, devising network null models, carefully tailored to the topological and biochemical constraints imposed on the network, remains an important computational problem. RESULTS: We first review the shortcomings of the existing generic sampling scheme (switch randomization) and explain its unsuitability for application to metabolic networks. We then devise a novel polynomial-time algorithm for randomizing metabolic networks under the (bio)chemical constraint of mass balance. The tractability of our method follows from the concept of mass equivalence classes, defined on the representation of compounds in the vector space over chemical elements. We finally demonstrate the uniformity of the proposed method on seven genome-scale metabolic networks, and empirically validate the theoretical findings. The proposed method allows a biologically meaningful estimation of significance for metabolic network properties.
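To make the mass balance constraint concrete, the sketch below represents each compound as a vector of element counts and checks that a reaction's signed stoichiometric sum is zero for every element. It also groups compounds with identical composition, which is one plausible and simplified reading of "mass equivalence classes"; the paper's exact definition and its randomization algorithm are not reproduced here. All names are hypothetical.

```python
from collections import defaultdict


def is_mass_balanced(reaction, formulas):
    """True if every element cancels out across the reaction.

    `reaction` maps compound -> stoichiometric coefficient (negative for
    substrates, positive for products); `formulas` maps compound ->
    {element: count}.
    """
    net = defaultdict(int)
    for compound, coefficient in reaction.items():
        for element, count in formulas[compound].items():
            net[element] += coefficient * count
    return all(value == 0 for value in net.values())


def composition_classes(formulas):
    """Group compounds that share the same elemental composition."""
    classes = defaultdict(list)
    for name, composition in formulas.items():
        classes[tuple(sorted(composition.items()))].append(name)
    return [sorted(members) for members in classes.values()]


formulas = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "fructose": {"C": 6, "H": 12, "O": 6},
    "O2": {"O": 2},
    "CO2": {"C": 1, "O": 2},
    "H2O": {"H": 2, "O": 1},
}
respiration = {"glucose": -1, "O2": -6, "CO2": 6, "H2O": 6}
unbalanced = {"glucose": -1, "O2": -1, "CO2": 1, "H2O": 1}

print(is_mass_balanced(respiration, formulas))
print(is_mass_balanced(unbalanced, formulas))
print(composition_classes(formulas))
```

Swapping glucose for fructose in `respiration` keeps the reaction balanced, which hints at why grouping compounds by composition can make constrained randomization tractable.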

16.
Wang B, Gao L. Proteome Science, 2012, 10(Z1): S16

Background

Network alignment is one of the most common methods for comparing biological networks. Aligning the protein-protein interaction (PPI) networks of different species is of great importance for detecting evolutionarily conserved pathways or protein complexes across species through the identification of conserved interactions, and for improving our insight into biological systems. The global network alignment (GNA) problem is NP-complete, and only heuristic methods have been proposed for it so far. Current GNA methods generally fall into the category of global heuristic seed-and-extend approaches. These methods cannot obtain the best overall consistent alignment between networks because of their dependence on the choice of local seed. Furthermore, they concentrate on maximizing the number of aligned edges between two networks without considering the original structures of functional modules.

Methods

We present a novel seed selection strategy for global network alignment that constructs multiple seeds from pairs of hub nodes in the networks to be aligned. Starting from each hub seed, we use the membership similarity of nodes to quantify the extent to which nodes can topologically participate in the functional modules associated with the current seed, and we align the networks module by module. In this way, functional modules are not damaged during the heuristic alignment process. Our method also addresses a critical problem of most conventional algorithms, namely that the initially selected seeds directly influence the alignment result. The similarity measures between network nodes (e.g., proteins) include sequence similarity, centrality similarity, and dynamic membership similarity. We call our algorithm Multiple Hubs-based Alignment (MHA).

Results

When we apply our seed selection strategy to several pairs of real PPI networks, we observe that our method strikes a balance between extending the conserved interactions and keeping the functional modules unchanged. In the case study, we assess the effectiveness of MHA on the alignment of the yeast and fly PPI networks. Our method outperforms state-of-the-art algorithms at detecting conserved functional modules and, in particular, retrieves 86% more conserved interactions than IsoRank.

Conclusions

We believe that our seed selection strategy yields alignment results that are more topologically and biologically similar. It can also serve as a reference for, and a complement to, other heuristic methods in the search for more meaningful alignment results.

17.
Protein kinases are critical to cellular signalling and post-translational gene regulation, but their biological substrates are difficult to identify. We show that cyclin-dependent kinase (CDK) consensus motifs are frequently clustered in CDK substrate proteins. Based on this, we introduce a new computational strategy to predict the targets of CDKs and use it to identify new biologically interesting candidates. Our data suggest that regulatory modules may exist in protein sequence as clusters of short sequence motifs.
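The idea of motif clustering can be illustrated with a simple scan. The sketch below assumes the minimal CDK consensus S/T-P (written `[ST]P`) and scores a sequence by the largest number of motif starts that fall within a fixed-length window. The window length, the scoring rule and the example sequence are hypothetical and are not the statistic used in the paper.

```python
import re

# Assumption: minimal CDK consensus, serine or threonine followed by proline.
# The lookahead lets overlapping candidate sites be reported individually.
CDK_MOTIF = re.compile(r"(?=[ST]P)")


def motif_positions(sequence):
    """Zero-based start positions of every [ST]P match in the sequence."""
    return [match.start() for match in CDK_MOTIF.finditer(sequence)]


def max_motifs_in_window(sequence, window=30):
    """Largest number of motif starts found within any `window` residues."""
    positions = motif_positions(sequence)
    best = 0
    left = 0
    for right in range(len(positions)):
        # Shrink the window until the leftmost site is within range.
        while positions[right] - positions[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


# Hypothetical sequence: three sites close together, one far downstream.
sequence = "MSPAKTPRRSPQK" + "A" * 40 + "TPEK"

print(motif_positions(sequence))
print(max_motifs_in_window(sequence, window=30))
```

A real predictor would compare such a score against what is expected by chance for a protein of the same length and composition; that step is omitted here.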

18.
CFinder: locating cliques and overlapping modules in biological networks   Cited by: 6 (self-citations: 0; citations by others: 6)
Most cellular tasks are performed not by individual proteins, but by groups of functionally associated proteins, often referred to as modules. In a protein association network modules appear as groups of densely interconnected nodes, also called communities or clusters. These modules often overlap with each other and form a network of their own, in which nodes (links) represent the modules (overlaps). We introduce CFinder, a fast program locating and visualizing overlapping, densely interconnected groups of nodes in undirected graphs, and allowing the user to easily navigate between the original graph and the web of these groups. We show that in gene (protein) association networks CFinder can be used to predict the function(s) of a single protein and to discover novel modules. CFinder is also very efficient for locating the cliques of large sparse graphs. Availability: CFinder (for Windows, Linux and Macintosh) and its manual can be downloaded from http://angel.elte.hu/clustering. Supplementary information: Supplementary data are available on Bioinformatics online.
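The abstract does not describe how overlapping groups are found. The sketch below illustrates the clique percolation idea commonly associated with CFinder: maximal cliques of at least k nodes are merged into one community whenever they share at least k-1 nodes. It is a small pure-Python illustration with hypothetical function names, not CFinder's own implementation, and it is not tuned for large graphs.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def k_clique_communities(adjacency, k):
    """Union of maximal cliques (size >= k) chained by overlaps of >= k-1 nodes."""
    cliques = [c for c in maximal_cliques(adjacency) if len(c) >= k]
    visited = set()
    communities = []
    for start in range(len(cliques)):
        if start in visited:
            continue
        visited.add(start)
        stack = [start]
        members = set()
        while stack:
            index = stack.pop()
            members |= cliques[index]
            for other in range(len(cliques)):
                if other not in visited and len(cliques[index] & cliques[other]) >= k - 1:
                    visited.add(other)
                    stack.append(other)
        communities.append(frozenset(members))
    return communities


# Hypothetical graph: two triangles sharing an edge, plus a third triangle
# that touches them only at node "d".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

for community in k_clique_communities(graph, 3):
    print(sorted(community))
```

Node "d" ends up in both communities, which is the kind of overlap between modules that the abstract describes.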

19.
MOTIVATION: Extracting functional information from protein-protein interactions (PPI) poses significant challenges arising from the noisy, incomplete, generic and static nature of data obtained from high-throughput screening. Typical proteins are composed of multiple domains, often regarded as their primary functional and structural units. Motivated by these considerations, domain-domain interactions (DDI) for network-based analyses have received significant recent attention. This article performs a formal comparative investigation of the relationship between functional coherence and topological proximity in PPI and DDI networks. Our investigation provides the necessary basis for continued and focused investigation of DDIs as abstractions for functional characterization and modularization of networks. RESULTS: We investigate the problem of assessing the functional coherence of two biomolecules (or segments thereof) in a formal framework. We establish essential attributes of admissible measures of functional coherence, and demonstrate that existing, well-accepted measures are ill-suited to comparative analyses involving different entities (i.e. domains versus proteins). We propose a statistically motivated functional similarity measure that takes into account functional specificity as well as the distribution of functional attributes across entity groups to assess functional similarity in a statistically meaningful and biologically interpretable manner. Results on diverse data, including high-throughput and computationally predicted PPIs, as well as structural and computationally inferred DDIs for different organisms show that: (i) the relationship between functional similarity and network proximity is captured in a much more (biologically) intuitive manner by our measure, compared to existing measures and (ii) network proximity and functional similarity are significantly more correlated in DDI networks than in PPI networks, and that structurally determined DDIs provide better functional relevance as compared to computationally inferred DDIs.

20.

Background  

Genes work coordinately as gene modules or gene networks. Various computational approaches have been proposed to find gene modules based on gene expression data; for example, gene clustering is a popular method for grouping genes with similar gene expression patterns. However, traditional gene clustering often yields unsatisfactory results for regulatory module identification because the resulting gene clusters are co-expressed but not necessarily co-regulated.
