Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms, and introduces new strategies which naturally lead to a faster parallel searching system, especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.

2.
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; “central” positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions. Proteins 2015; 83:2293–2306. © 2015 Wiley Periodicals, Inc.
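
To make the idea of eigenvector centrality on a weighted network concrete, the following is a minimal sketch using shifted power iteration. It is not the authors' pipeline: the function name, the toy weight matrix and the use of a plain symmetric matrix of non-negative scores are all assumptions made for illustration.

```python
def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality for a symmetric, non-negative
    weight matrix (list of lists) using power iteration.

    Iterating with (W + I) instead of W keeps the same leading eigenvector
    but avoids oscillation on bipartite graphs.
    """
    n = len(weights)
    x = [1.0 / n ** 0.5] * n  # uniform unit-length starting vector
    for _ in range(iterations):
        # One multiplication by (W + I).
        y = [x[i] + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical toy network: position 0 is linked to positions 1, 2 and 3.
toy_weights = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
scores = eigenvector_centrality(toy_weights)
most_central = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case the hub position receives the largest score because its centrality is fed by all three neighbours, which is the sense in which a centrality score summarises many pairwise scores at once.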

3.
The relationship between the design and functionality of molecular networks is now a key issue in biology. Comparison of regulatory networks performing similar tasks can provide insights into how network architecture is constrained by the functions it directs. Here, we discuss methods of network comparison based on network architecture and signaling logic. Introducing local and global signaling scores for the difference between two networks, we quantify similarities between evolutionarily closely and distantly related bacteriophages. Despite the large evolutionary separation between phage λ and 186, their networks are found to be similar when difference is measured in terms of global signaling. We finally discuss how network alignment can be used to pinpoint protein similarities viewed from the network perspective.

4.
Many complex networks such as computer and social networks exhibit modular structures, where links between nodes are much denser within modules than between modules. It is widely believed that cellular networks are also modular, reflecting the relative independence and coherence of different functional units in a cell. While many authors have claimed that observations from the yeast protein–protein interaction (PPI) network support the above hypothesis, the observed structural modularity may be an artifact because the current PPI data include interactions inferred from protein complexes through approaches that create modules (e.g., assigning pairwise interactions among all proteins in a complex). Here we analyze the yeast PPI network including protein complexes (PIC network) and excluding complexes (PEC network). We find that both PIC and PEC networks show a significantly greater structural modularity than that of randomly rewired networks. Nonetheless, there is little evidence that the structural modules correspond to functional units, particularly in the PEC network. More disturbingly, there is no evolutionary conservation among yeast, fly, and nematode modules at either the whole-module or protein-pair level. Neither is there a correlation between the evolutionary or phylogenetic conservation of a protein and the extent of its participation in various modules. Using computer simulation, we demonstrate that a higher-than-expected modularity can arise during network growth through a simple model of gene duplication, without natural selection for modularity. Taken together, our results suggest the intriguing possibility that the structural modules in the PPI network originated as an evolutionary byproduct without biological significance.

5.
We present a new method for protein structure comparison that combines indexing and dynamic programming (DP). The method is based on simple geometric features of triplets of secondary structures of proteins. These features provide indexes to a hash table that allows fast retrieval of similarity information for a query protein. After the query protein is matched with all proteins in the hash table producing a list of putative similarities, the dynamic programming algorithm is used to align the query protein with each protein of this list. Since the pairwise comparison with DP is applied only to a small subset of proteins and, furthermore, DP re-uses information that is already computed and stored in the hash table, the approach is very fast even when searching the entire PDB. We have done extensive experimentation showing that our approach achieves results of quality comparable to that of other existing approaches but is generally faster.

6.
MOTIVATION: A global view of the protein space is essential for functional and evolutionary analysis of proteins. In order to achieve this, a similarity network can be built using pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure and therefore their utility depends highly on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used. RESULTS: We propose a novel approach for analyzing multi-attribute similarity networks by combining random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence and structure based similarity measures. For each attribute of the similarity network, one can compute a measure of affinity from a given protein to every other protein in the network using random walks. This process makes use of the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach.
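
One common way to turn a similarity network into affinities from a given node is a random walk with restart. The sketch below illustrates that generic technique only; the abstract does not specify which random-walk variant the authors used, and the function name, the restart probability of 0.3 and the toy path graph are assumptions. The Bayesian combination step is not shown.

```python
def random_walk_affinity(adj, source, restart=0.3, iterations=100):
    """Affinity of every node to `source` via a random walk with restart.

    adj: square list of lists of non-negative similarity weights.
    At each step the walker returns to `source` with probability `restart`,
    otherwise it follows an edge chosen in proportion to its weight.
    A walker on a node with no edges also returns to `source`.
    """
    n = len(adj)
    row_sums = [sum(row) for row in adj]
    p = [0.0] * n
    p[source] = 1.0
    for _ in range(iterations):
        nxt = [0.0] * n
        nxt[source] += restart  # restart mass (p always sums to 1)
        for i in range(n):
            move = (1.0 - restart) * p[i]
            if row_sums[i] == 0:
                nxt[source] += move  # isolated node: jump back to the source
                continue
            for j in range(n):
                nxt[j] += move * adj[i][j] / row_sums[i]
        p = nxt
    return p


# Hypothetical toy similarity network: a path 0 - 1 - 2 - 3 with unit weights.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
affinity = random_walk_affinity(path, source=0)
```

Unlike a purely local ranking, which would only see the direct neighbour of node 0, the walk assigns graded affinities to nodes 2 and 3 through the intermediate links.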

7.
Increasingly large numbers of proteins require methods for functional annotation. This is typically based on pairwise inference from the homology of either protein sequence or structure. Recently, similarity networks have been presented to leverage both the ability to visualize relationships between proteins and assess the transferability of functional inference. Here we present PANADA, a novel toolkit for the visualization and analysis of protein similarity networks in Cytoscape. Networks can be constructed based on pairwise sequence or structural alignments either on a set of proteins or, alternatively, by database search from a single sequence. The PANADA web server, a downloadable executable, examples and extensive help files are available at http://protein.bio.unipd.it/panada/.

8.
Protein interaction networks are known to exhibit remarkable structures: scale-free, small-world and modular. To explain the evolutionary processes of protein interaction networks possessing scale-free and small-world structures, preferential attachment and duplication-divergence models have been proposed as mathematical models. Protein interaction networks are also known to exhibit another remarkable structural characteristic, modular structure. How did protein interaction networks come to exhibit modularity during their evolution? Here, we propose a hypothesis of modularity in the evolution of the yeast protein interaction network based on molecular evolutionary evidence. We assigned yeast proteins to six evolutionary ages by constructing a phylogenetic profile. We found that almost half of the hub proteins are evolutionarily new. Examining the evolutionary processes of protein complexes, functional modules and topological modules, we also found that member proteins of these modules tend to appear in one or two evolutionary ages. Moreover, proteins in protein complexes and topological modules show significantly lower evolutionary rates than those not in these modules. Our results suggest a hypothesis of modularity in the evolution of the yeast protein interaction network as systems evolution.

9.
One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.

10.
MOTIVATION: We consider the problem of finding similarities in protein structure databases. Current techniques sequentially compare the given query protein to all of the proteins in the database to find similarities. Therefore, the cost of similarity queries increases linearly as the volume of the protein databases increases. As the sizes of experimentally determined and theoretically estimated protein structure databases grow, there is a need for scalable searching techniques. RESULTS: Our techniques extract feature vectors on triplets of SSEs (Secondary Structure Elements). Later, these feature vectors are indexed using a multidimensional index structure. For a given query protein, this index structure is used to quickly prune away unpromising proteins in the database. The remaining proteins are then aligned using a popular alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Experimental results show that our techniques improve the pruning time of VAST 3 to 3.5 times while maintaining similar sensitivity.

11.
MOTIVATION: Analysis of large biological data sets using a variety of parallel processor computer architectures is a common task in bioinformatics. The efficiency of the analysis can be significantly improved by properly handling redundancy present in these data combined with taking advantage of the unique features of these compute architectures. RESULTS: We describe a generalized approach to this analysis, but present specific results using the program CEPAR, an efficient implementation of the Combinatorial Extension algorithm in a massively parallel (PAR) mode for finding pairwise protein structure similarities and aligning protein structures from the Protein Data Bank. CEPAR design and implementation are described and results provided for the efficiency of the algorithm when run on a large number of processors. AVAILABILITY: Source code is available by contacting one of the authors.

12.
Evolutionary networks in the formatted protein sequence space.   Total citations: 4, self-citations: 0, citations by others: 4
In our recent work, a new approach to establish sequence relatedness, by walking through the protein sequence space, was introduced. The sequence space is built from 20 amino acid long fragments of proteins from a very large collection of fully sequenced prokaryotic genomes. The fragments, points in the space, are connected if they are closely related (high sequence identity). The connected fragments form a variety of networks of sequence kinship. In this research the networks in the formatted sequence space and their topology are analyzed. For lower identity thresholds a huge network of complex structure is formed, involving up to 10% of the points in the space. When the threshold is increased, the major network splits into a set of smaller clusters with a wide diversity of sizes and topologies. Such "evolutionary networks" may serve as a powerful sequence annotation tool that allows one to reveal fine details in the evolutionary history of proteins.

13.
Noble WS  Kuang R  Leslie C  Weston J 《The FEBS journal》2005,272(20):5119-5128
Perhaps the most widely used applications of bioinformatics are tools such as psi-blast for searching sequence databases. We describe a recently developed protein database search algorithm called rankprop. rankprop relies upon a precomputed network of pairwise protein similarities. The algorithm performs a diffusion operation from a specified query protein across the protein similarity network. The resulting activation scores, assigned to each database protein, encode information about the global structure of the protein similarity network. This type of algorithm has a rich history in associationist psychology, artificial intelligence and web search. We describe the rankprop algorithm and its relatives, and we provide evidence that the algorithm successfully improves upon the rankings produced by psi-blast.

14.

Background

Details of the mechanisms and selection pressures that shape the emergence and development of complex biological systems, such as the human immune system, are poorly understood. A recent definition of a reference set of proteins essential for the human immunome, combined with information about protein interaction networks for these proteins, facilitates evolutionary study of this biological machinery.

Results

Here, we present a detailed study of the development of the immunome protein interaction network during eight evolutionary steps from Bilateria ancestors to human. New nodes show preferential attachment to high degree proteins. The efficiency of the immunome protein interaction network increases during the evolutionary steps, whereas the vulnerability of the network decreases.

Conclusion

Our results shed light on selective forces acting on the emergence of biological networks. It is likely that the high efficiency and low vulnerability are intrinsic properties of many biological networks, which arise from the effects of evolutionary processes yet to be uncovered.

15.
Residue networks representing 595 nonhomologous proteins are studied. These networks exhibit universal topological characteristics as they belong to the topological class of modular networks formed by several highly interconnected clusters separated by topological cavities. There are some networks that tend to deviate from this universality. These networks represent small-size proteins having <200 residues. This article explains such differences in terms of the domain structure of these proteins. On the other hand, the topological cavities characterizing protein residue networks match very well with protein binding sites. This study investigates the effect of the cutoff value used in building the residue network. For small cutoff values, <5 Å, the cavities found are very large, corresponding almost to the whole protein surface. On the contrary, for large cutoff values, >10.0 Å, only very large cavities are detected and the networks look very homogeneous. These findings are useful for practical purposes as well as for identifying protein-like complex networks. Finally, this article shows that the main topological class of residue networks is not reproduced by random networks growing according to the Erdős-Rényi model or the preferential attachment method of Barabási-Albert. However, the Watts-Strogatz model reproduces very well the topological class as well as other topological properties of residue networks. A more biologically appealing modification of the Watts-Strogatz model to describe residue networks is proposed.
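
The role of the cutoff can be illustrated with a small sketch that links two residues whenever their representative points lie within the cutoff distance. The abstract does not say which atoms or coordinates were used, so the one-point-per-residue representation, the function name and the toy coordinates below are assumptions.

```python
import math


def residue_network(coords, cutoff):
    """Return the set of undirected edges (i, j), i < j, joining residues
    whose representative points are at most `cutoff` angstroms apart."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))
    return edges


# Hypothetical toy chain: four residues spaced 3.8 angstroms apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

sparse = residue_network(toy_coords, cutoff=5.0)   # only sequence neighbours
dense = residue_network(toy_coords, cutoff=10.0)   # adds second neighbours
```

Raising the cutoff only ever adds edges, so the network becomes denser and more homogeneous as the cutoff grows.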

16.
Protein networks, describing physical interactions as well as functional associations between proteins, have been unravelled for many organisms in the recent past. Databases such as STRING provide excellent resources for the analysis of such networks. In this contribution, we revisit the organisation of protein networks, particularly the centrality–lethality hypothesis, which hypothesises that nodes with higher centrality in a network are more likely to produce lethal phenotypes on removal, compared to nodes with lower centrality. We consider the protein networks of a diverse set of 20 organisms, with essentiality information available in the Database of Essential Genes, and assess the relationship between centrality measures and lethality. For each of these organisms, we obtained networks of high-confidence interactions from the STRING database, and computed network parameters such as degree, betweenness centrality, closeness centrality and pairwise disconnectivity indices. We observe that the networks considered here are predominantly disassortative. Further, we observe that essential nodes in a network have a significantly higher average degree and betweenness centrality, compared to the network average. Most previous studies have evaluated the centrality–lethality hypothesis for Saccharomyces cerevisiae and Escherichia coli; we here observe that the centrality–lethality hypothesis holds good for a large number of organisms, with certain limitations. Betweenness centrality may also be a useful measure to identify essential nodes, but measures like closeness centrality and pairwise disconnectivity are not significantly higher for essential nodes.

17.
MOTIVATION: It is known that the physico-chemical characteristics of proteins underlying specific folding of the polypeptide chain and the protein function are evolutionarily conserved. Detection of such characteristics while analyzing homologous sequences would substantially expand the knowledge on protein function, structure, and evolution. These characteristics are maintained constant, in particular, by co-ordinated substitutions. In this process, the destabilizing effect of a substitution may be compensated by another substitution at a different position within the same protein, making the overall change in this protein characteristic insignificant. Consequently, the patterns of co-ordinated substitutions contain important information on conserved physico-chemical properties of proteins, requiring their investigation and development of the corresponding methods and software for correlation analysis of protein sequences available to a wide range of users. RESULTS: A software package for analyzing correlated amino acid substitutions at different positions within aligned protein sequences was developed. The approach implies searching for evolutionarily conserved physico-chemical characteristics of proteins based on the information on the pairwise correlations of amino acid substitutions at different protein positions. The software was applied to analyze DNA-binding domains of the homeodomain class. As a result, two conserved physico-chemical characteristics were found to be preserved due to the co-ordinated substitutions at certain groups of positions in the protein sequence. Possible functional roles of these characteristics are discussed. AVAILABILITY: The program package is available at http://wwwmgs.bionet.nsc.ru/programs/CRASP/.

18.
Protein interaction networks display approximate scale-free topology, in which hub proteins that interact with a large number of other proteins determine the overall organization of the network. In this study, we aim to determine whether hubs are distinguishable from other networked proteins by specific sequence features. Proteins of different connectednesses were compared in the interaction networks of Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, and Homo sapiens with respect to the distribution of predicted structural disorder, sequence repeats, low complexity regions, and chain length. Highly connected proteins ("hub proteins") contained significantly more of, and a greater proportion of, these sequence features and tended to be longer overall as compared to less connected proteins. These sequence features provide two different functional means for realizing multiple interactions: (1) extended interaction surface and (2) flexibility and adaptability, providing a mechanism for the same region to bind distinct partners. Our view contradicts the prevailing view that scaling in protein interactomes arose from gene duplication and preferential attachment of equivalent proteins. We propose an alternative evolutionary network specialization process, in which certain components of the protein interactome improved their fitness for binding by becoming longer or accruing regions of disorder and/or internal repeats and have therefore become specialized in network organization.

19.
The topology of regulatory networks contains clues to their overall design principles and evolutionary history. We find that while in- and out-degrees of a given protein in the regulatory network are not correlated with each other, there exists a strong negative correlation between the out-degree of a regulatory protein and in-degrees of its targets. Such correlation positions large regulatory modules on the periphery of the network and makes them rather well separated from each other. We also address the question of relative importance of different classes of proteins quantified by the lethality of null-mutants lacking one of them as well as by the level of their evolutionary conservation. It was found that in the yeast regulatory network highly connected proteins are in fact less important than their low-connected counterparts.

20.
MOTIVATION: Probabilistic Boolean networks (PBNs) have been proposed to model genetic regulatory interactions. The steady-state probability distribution of a PBN gives important information about the captured genetic network. The computation of the steady-state probability distribution usually includes construction of the transition probability matrix and computation of the steady-state probability distribution. The size of the transition probability matrix is 2^n-by-2^n, where n is the number of genes in the genetic network. Therefore, the computational costs of these two steps are very expensive and it is essential to develop a fast approximation method. RESULTS: In this article, we propose an approximation method for computing the steady-state probability distribution of a PBN based on neglecting some Boolean networks (BNs) with very small probabilities during the construction of the transition probability matrix. An error analysis of this approximation method is given and a theoretical result on the distribution of BNs in a PBN with at most two Boolean functions for one gene is also presented. These give a foundation and support for the approximation method. Numerical experiments based on a genetic network are given to demonstrate the efficiency of the proposed method.
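
The two steps named in this abstract, building the 2^n-by-2^n transition matrix and computing its steady state, can be sketched as follows. This is a simplified illustration and not the authors' method: each Boolean network is represented here as a whole-network state map with a selection probability, the probabilities of the retained networks are renormalised after the small ones are dropped, and the function names and the toy two-gene example are all assumptions.

```python
def transition_matrix(networks, n_genes, threshold=0.0):
    """Build the 2^n-by-2^n transition matrix of a toy PBN.

    networks: list of (probability, next_state) pairs, where next_state[s]
    is the state reached from state s under that Boolean network.
    Networks with probability below `threshold` are neglected and the
    remaining probabilities are renormalised (an assumption of this sketch).
    """
    size = 2 ** n_genes
    kept = [(p, nxt) for p, nxt in networks if p >= threshold]
    total = sum(p for p, _ in kept)
    matrix = [[0.0] * size for _ in range(size)]
    for p, nxt in kept:
        for state in range(size):
            matrix[state][nxt[state]] += p / total
    return matrix


def steady_state(matrix, iterations=500):
    """Approximate the steady-state distribution by repeated multiplication.
    Assumes the chain is irreducible and aperiodic, as in the toy example."""
    size = len(matrix)
    dist = [1.0 / size] * size
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]
    return dist


# Hypothetical two-gene PBN (four states) with three Boolean networks.
toy_networks = [
    (0.60, [1, 2, 3, 0]),  # cycles through the states
    (0.39, [0, 0, 0, 0]),  # sends every state to state 0
    (0.01, [3, 3, 3, 3]),  # rare network, a candidate for neglect
]
exact = steady_state(transition_matrix(toy_networks, n_genes=2))
approx = steady_state(transition_matrix(toy_networks, n_genes=2, threshold=0.05))
max_error = max(abs(a - b) for a, b in zip(exact, approx))
```

In this toy case dropping the rare network changes the steady state only slightly, while the matrix is assembled from two state maps instead of three; the saving grows with the number of low-probability networks.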
