Similar articles
 20 similar articles found
1.

Background

With the ever-increasing amount of available data on biological networks, modeling and understanding the structure of these large networks is an important problem with profound biological implications. Cellular functions and biochemical events are carried out in a coordinated manner by groups of proteins interacting with each other in biological modules. Identifying such modules in protein interaction networks is very important for understanding the structure and function of these fundamental cellular networks. Developing an effective computational method to uncover biological modules is therefore both highly challenging and indispensable.

Results

The purpose of this study is to introduce a new quantitative measure, modularity density, into the field of biomolecular networks and to develop new algorithms for detecting functional modules in protein-protein interaction (PPI) networks. Specifically, we adopt simulated annealing (SA) to maximize the modularity density and evaluate its efficiency on simulated networks. To address the computational complexity of the SA procedure, we devise a spectral method for optimizing the index and apply it to a yeast PPI network.
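The abstract does not state a formula for modularity density. As a rough illustration only, the sketch below assumes one commonly cited form: for each module, twice the number of internal edges minus the number of boundary edges, divided by the module size, summed over all modules. The function name `modularity_density` and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def modularity_density(edges, communities):
    """Assumed form: sum over modules of (2*internal - boundary) / size."""
    # Map each node to the index of the module that contains it.
    membership = {}
    for idx, nodes in enumerate(communities):
        for node in nodes:
            membership[node] = idx

    internal = defaultdict(int)  # edges with both endpoints in module idx
    boundary = defaultdict(int)  # edges with exactly one endpoint in module idx
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            boundary[cu] += 1
            boundary[cv] += 1

    total = 0.0
    for idx, nodes in enumerate(communities):
        # In an undirected graph each internal edge contributes twice
        # to the sum of internal degrees.
        total += (2 * internal[idx] - boundary[idx]) / len(nodes)
    return total


# Toy network: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
split = [{"a", "b", "c"}, {"d", "e", "f"}]
merged = [{"a", "b", "c", "d", "e", "f"}]
print(modularity_density(toy_edges, split))
print(modularity_density(toy_edges, merged))
```

Under this assumed definition, splitting the toy network into its two triangles scores higher than keeping it as one module, which is the behaviour an optimizer such as simulated annealing would exploit.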

Conclusions

Our analysis of the modules detected by the present method suggests that most of them are biologically significant in the context of protein complexes. Comparison with MCL and modularity-based methods shows the efficiency of our method.

2.
Protein interaction networks are known to exhibit remarkable structures: scale-free, small-world, and modular. To explain the evolutionary processes that give rise to scale-free and small-world structures, preferential attachment and duplication-divergence models have been proposed as mathematical models. Protein interaction networks are also known to exhibit another remarkable structural characteristic, modular structure. How did protein interaction networks come to exhibit modularity in the course of their evolution? Here, we propose a hypothesis of modularity in the evolution of the yeast protein interaction network based on molecular evolutionary evidence. We assigned yeast proteins to six evolutionary ages by constructing a phylogenetic profile. We found that almost half of the hub proteins are evolutionarily new. Examining the evolutionary processes of protein complexes, functional modules and topological modules, we also found that member proteins of these modules tend to appear in one or two evolutionary ages. Moreover, proteins in protein complexes and topological modules show significantly lower evolutionary rates than those not in these modules. Our results suggest a hypothesis of modularity in the evolution of the yeast protein interaction network as systems evolution.

3.
Functional modularity is a key attribute of cellular systems and has important roles in evolution. However, the extent to which functional modularity affects protein evolution is largely unknown. Here, we analyzed the evolution of both sequence and expression level of proteins in the yeast Saccharomyces cerevisiae and found that proteins within the same functional module evolve at more similar rates than proteins from different modules. We also found stronger co-evolution of expression levels between proteins within functional modules than between proteins in different modules. These results suggest that a coordinated evolution of both the sequence and expression level of proteins is constrained by functional modularity.

4.
Our efforts to classify the functional units of many proteins, the modules, are reviewed. The data from the sequencing projects for various model organisms are extremely helpful in deducing the evolution of proteins and modules. For example, a dramatic increase of modular proteins can be observed from yeast to C. elegans in accordance with new protein functions that had to be introduced in multicellular organisms. Our sequence characterization of modules relies on sensitive similarity search algorithms and the collection of multiple sequence alignments for each module. To trace the evolution of modules and to further automate the classification, we have developed a sequence and a module alerting system that checks newly arriving sequence data for the presence of already classified modules. Using these systems, we were able to identify an unexpected similarity between extracellular C1Q modules with bacterial proteins.

5.
6.
The biotechnological application of enzymes necessitates a permanent quest for new biocatalysts. Among others, improvement of catalytic activity, modification of substrate specificity, or increase in stability of the enzymes are desirable goals. The exploration of homologous enzymes from various sources or DNA-based methods, like site-directed mutagenesis or directed evolution, yield an incredible variety of biocatalysts but they all rely on the restricted number of canonical amino acids. Chemistry offers an almost unlimited palette of additional modifications which can endow the proteins with improved or even completely new properties. Numerous techniques to furnish proteins with non-natural amino acids or non-proteinogenic modules have been introduced and are reviewed with special focus on expressed protein ligation, a method that combines the potential of protein biosynthesis and chemical synthesis. An erratum to this article can be found at

7.
Programs exist for searching protein sequences for potential membrane-penetrating segments (hydrophobic regions) and for lipid-binding sites with highly defined tertiary structures, such as PH, FERM, C2, ENTH, and other domains. However, a rapidly growing number of membrane-associated proteins (including cytoskeletal proteins, kinases, GTP-binding proteins, and their effectors) bind lipids through less structured regions. Here, we describe the development and testing of a simple computer search program that identifies unstructured potential membrane-binding sites. Initially, we found that both basic and hydrophobic amino acids, irrespective of sequence, contribute to the binding to acidic phospholipid vesicles of synthetic peptides that correspond to the putative membrane-binding domains of Acanthamoeba class I myosins. Based on these results, we modified a hydrophobicity scale, giving Arg and Lys positive, rather than negative, values. Using this basic and hydrophobic scale with a standard search algorithm, we successfully identified previously determined unstructured membrane-binding sites in all 16 proteins tested. Importantly, basic and hydrophobic searches identified previously unknown potential membrane-binding sites in class I myosins, PAKs and CARMIL (capping protein, Arp2/3, myosin I linker; a membrane-associated cytoskeletal scaffold protein), and synthetic peptides and protein domains containing these newly identified sites bound to acidic phospholipids in vitro.
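The abstract names neither the modified scale nor the search algorithm in detail. The sketch below assumes the "standard search algorithm" is a sliding-window average, and every score in `BH_SCALE` is a made-up placeholder (basic and hydrophobic residues positive, acidic residues negative), not the published scale. The names `window_scores` and `candidate_sites` are likewise hypothetical.

```python
# Placeholder scores for illustration only; NOT the published scale.
# Residues missing from the table are scored 0.0.
BH_SCALE = {
    "K": 2.0, "R": 2.0,            # basic residues given positive values
    "F": 1.5, "W": 1.5,            # hydrophobic residues
    "L": 1.0, "I": 1.0,
    "V": 0.5, "M": 0.5, "Y": 0.5,
    "D": -2.0, "E": -2.0,          # acidic residues penalized
}


def window_scores(sequence, window=5):
    """Mean scale value for every window of the given width."""
    if window < 1 or window > len(sequence):
        return []
    values = [BH_SCALE.get(res, 0.0) for res in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_sites(sequence, window=5, threshold=1.0):
    """Return merged (start, end) spans, 0-based and end-exclusive,
    covering all windows whose mean score reaches the threshold."""
    sites = []
    for start, score in enumerate(window_scores(sequence, window)):
        if score < threshold:
            continue
        end = start + window
        if sites and start <= sites[-1][1]:
            # Overlapping or adjacent window: extend the previous span.
            sites[-1] = (sites[-1][0], end)
        else:
            sites.append((start, end))
    return sites


print(candidate_sites("DDDDDKRKLFDDDDD", window=5, threshold=1.5))
```

In practice the window width and threshold would need calibration against peptides with known binding behaviour; the values here are arbitrary.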

8.
Summary: Chou-Fasman parameters, measuring preferences of each amino acid for different conformational regions in proteins, were used to obtain an amino acid difference index of conformational parameter distance (CPD) values. CPD values were found to be significantly lower for amino acid exchanges representing, in the genetic code, transitions of purines, G↔A, than for exchanges representing either transitions of pyrimidines, C↔U, or transversions of purines and pyrimidines. Inasmuch as the distribution of CPD values in these non-G↔A exchanges resembles that obtained for amino acid pairs with double or triple base differences in their underlying codons, we conclude that the genetic code was not particularly designed to minimize effects of mutation on protein conformation. That natural selection minimizes these changes, however, was shown by tabulating results obtained by the maximum parsimony method for eight protein genealogies with a total occurrence of 4574 base substitutions. At the beginning position of the codons G↔A transitions were in very great excess over other base substitutions, and, conversely, C↔U transitions were deficient. At the middle position of the codons only fast evolving proteins showed an excess of G↔A transitions, as though selection mainly preserved conformation in these proteins while weeding out mutations affecting chemical properties of functional sites in slow evolving proteins. In both fast and slow evolving proteins the net direction of transitions and transversions was found to be from G beginning codons to non-G beginning codons resulting in more commonly occurring amino acids, especially alanine with its generalized conformational properties, being replaced at suitable sites by amino acids with more specialized conformational and chemical properties. Historical circumstances pertaining to the origin of the genetic code and the nature of primordial proteins could account for such directional changes leading to increases in the functional density of proteins.
In order to further explore the course of protein evolution, a modified parsimony algorithm was developed for constructing protein genealogies on the basis of minimum CPD length. The algorithm's ability to judge with finer discrimination that in protein evolution certain pathways of amino acid substitution should occur more readily than others was considered a potential advantage over strict maximum parsimony. In developing this CPD algorithm, the path of minimum CPD length through intermediate amino acids allowed by the genetic code for each pair of amino acids was determined. It was found that amino acid exchanges representing two base changes have a considerably lower average CPD value per base substitution than the amino acid exchanges representing single base changes. Amino acid exchanges representing three base changes have yet a further marked reduction in CPD per base change. This shows how extreme constraining effects of stabilizing selection can be circumvented, for by way of intermediate amino acids almost any amino acid can ultimately be substituted for another without damage to an evolving protein's conformation during the process.

9.
Domains are the building blocks of proteins and play a crucial role in protein-protein interactions. Here, we propose a new approach for the analysis and prediction of domain-domain interfaces. Our method, which relies on the representation of domains as residue-interacting networks, finds an optimal decomposition of domain structures into modules. The resulting modules comprise highly cooperative residues, which exhibit few connections with other modules. We found that non-overlapping binding sites in a domain, involved in different domain-domain interactions, are generally contained in different modules. This observation indicates that our modular decomposition is able to separate protein domains into regions with specialized functions. Our results show that modules with high modularity values identify binding site regions, demonstrating the predictive character of modularity. Furthermore, the combination of modularity with other characteristics, such as sequence conservation or surface patches, was found to improve our predictions. In an attempt to give a physical interpretation to the modular architecture of domains, we analyzed in detail six examples of protein domains with available experimental binding data. The modular configuration of the TEM1-beta-lactamase binding site illustrates the energetic independence of hotspots located in different modules and the cooperativity of those sited within the same modules. The energetic and structural cooperativity between intramodular residues is also clearly shown in the example of the chymotrypsin inhibitor, where non-binding site residues have a synergistic effect on binding. Interestingly, the binding site of the T cell receptor beta chain variable domain 2.1 is contained in one module, which includes structurally distant hot regions displaying positive cooperativity. These findings support the idea that modules possess certain functional and energetic independence. 
A modular organization of binding sites confers robustness and flexibility to the performance of the functional activity, and facilitates the evolution of protein interactions.

10.
The modular nature of protein folds suggests that present day proteins evolved via duplication and recombination of smaller functional elements. However, the reconstruction of these putative evolutionary pathways after many millions of years of evolutionary drift has thus far proven difficult, with all attempts to date failing to produce a functional protein. Tachylectin-2 is a monomeric 236 amino acid, five-bladed beta-propeller with five sugar-binding sites. This protein was isolated from a horseshoe crab that emerged ca. 500 million years ago. The modular, yet ancient, nature of Tachylectin-2 makes it an excellent model for exploring the evolution of proteins from smaller subunits. To this end, we generated genetically diverse libraries by incremental truncation of the Tachylectin-2 gene and screened them for functional lectins. A number of approximately 100 amino acid residue segments were isolated with the ability to assemble into active homo-pentamers. The topology of most of these segments follows a "hidden" module that differs from the modules observed in wild-type Tachylectin-2, yet their biophysical properties and sugar binding activities resemble the wild-type's. Since the pentamer's molecular mass is twofold higher than that of the wild-type (approximately 500 amino acid residues), the structure of these oligomeric forms is likely to also differ. Our laboratory evolution experiments highlight the versatility and modularity of the beta-propeller fold, while substantiating the hypothesis that proteins with high internal symmetry, such as beta-propellers, evolved from short, functional gene segments that, at later stages, duplicated, fused, and rearranged, to yield the folds we recognise today.

11.
Summary: The markedly nonuniform, even systematic distribution of sequences in the protein universe has been analyzed by methods of protein taxonomy. Mapping of the natural hierarchical system of proteins has revealed some dense cores, i.e., well-defined clusterings of proteins that seem to be natural structural groupings, possibly seeds for a future protein taxonomy. The aim was not to force proteins into more or less man-made categories by discriminant analysis, but to find structurally similar groups, possibly of common evolutionary origin. Single-valued distance measures between pairs of superfamilies from the Protein Identification Resource were defined by two χ²-like methods on tripeptide frequencies and the variable-length subsequence identity method derived from dot-matrix comparisons. Distance matrices were processed by several methods of cluster analysis to detect phylogenetic continuum between highly divergent proteins. Only well-defined clusters characterized by relatively unique structural, intracellular environmental, organismal, and functional attribute states were selected as major protein groups, including subsets of viral and Escherichia coli proteins, hormones, inhibitors, plant, ribosomal, serum and structural proteins, amino acid synthases, and clusters dominated by certain oxidoreductases and apolar and DNA-associated enzymes. The limited repertoire of functional patterns due to small genome size, the high rate of recombination, specific features of the bacterial membranes, or of the virus cycle canalize certain proteins of viruses and Gram-negative bacteria, respectively, to organismal groups.

12.
Domains are the building blocks of proteins and play a crucial role in protein–protein interactions. Here, we propose a new approach for the analysis and prediction of domain–domain interfaces. Our method, which relies on the representation of domains as residue-interacting networks, finds an optimal decomposition of domain structures into modules. The resulting modules comprise highly cooperative residues, which exhibit few connections with other modules. We found that non-overlapping binding sites in a domain, involved in different domain–domain interactions, are generally contained in different modules. This observation indicates that our modular decomposition is able to separate protein domains into regions with specialized functions. Our results show that modules with high modularity values identify binding site regions, demonstrating the predictive character of modularity. Furthermore, the combination of modularity with other characteristics, such as sequence conservation or surface patches, was found to improve our predictions. In an attempt to give a physical interpretation to the modular architecture of domains, we analyzed in detail six examples of protein domains with available experimental binding data. The modular configuration of the TEM1-β-lactamase binding site illustrates the energetic independence of hotspots located in different modules and the cooperativity of those sited within the same modules. The energetic and structural cooperativity between intramodular residues is also clearly shown in the example of the chymotrypsin inhibitor, where non–binding site residues have a synergistic effect on binding. Interestingly, the binding site of the T cell receptor β chain variable domain 2.1 is contained in one module, which includes structurally distant hot regions displaying positive cooperativity. These findings support the idea that modules possess certain functional and energetic independence. 
A modular organization of binding sites confers robustness and flexibility to the performance of the functional activity, and facilitates the evolution of protein interactions.

13.
The Shannon information entropy of protein sequences.   Cited 6 times (self-citations: 1, citations by others: 5)
A comprehensive database is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis and k-tuplet analysis give Shannon entropies of approximately 2.5 bits/amino acid. This entropy is much smaller than the value of 4.18 bits/amino acid obtained from the nonuniform composition of amino acids in proteins. The "Chou-Fasman gambler" is an algorithm based on the Chou-Fasman rules for protein structure. It uses both sequence and secondary structure information to guess at the number of possible amino acids that could appropriately substitute into a sequence. As in the case for the English language, the gambler algorithm gives significantly lower entropies than the k-tuplet analysis. Using these entropies, the number of most probable protein sequences can be calculated. The number of most probable protein sequences is much less than the number of possible sequences but is still much larger than the number of sequences thought to have existed throughout evolution. Implications of these results for mutagenesis experiments are discussed.
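To make the composition-based figure concrete, the sketch below computes the zeroth-order Shannon entropy, H = -Σ p_i log2 p_i, from residue frequencies alone. It is not the k-tuplet, Zipf, or gambler analysis described in the abstract, and the function name `composition_entropy` is hypothetical.

```python
import math
from collections import Counter


def composition_entropy(sequence):
    """Zeroth-order Shannon entropy in bits per residue,
    using only the frequency of each residue type."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values())


# A uniform use of all 20 amino acids gives the upper bound log2(20).
uniform = "ACDEFGHIKLMNPQRSTVWY"
print(composition_entropy(uniform))
print(composition_entropy("ACAC"))
```

A uniform composition gives the upper bound of log2(20), about 4.32 bits per residue. Any nonuniform composition gives less, which is consistent with the 4.18 bits/amino acid quoted above, and conditioning on neighbouring residues can only lower the estimate further.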

14.
Alignment-free methods based on Chaos Game Representation (CGR), also known as sequence signature approaches, have proven of great interest for DNA sequence analysis. Indeed, they have been successfully applied to sequence comparison, phylogeny, detection of horizontal transfers, and extraction of representative motifs in regulation sequences. Transposing such methods to proteins poses several fundamental questions related to representation space dimensionality. Several studies have tackled these points, but none has, so far, brought the application of CGRs to proteins to its full expected potential. Yet, several studies have shown that techniques based on n-peptide frequencies can be relevant for proteins. Here, we investigate the effectiveness of a strategy based on the CGR approach using a fixed reverse encoding of amino acids into nucleic sequences. We first explore its relevance to protein classification into functional families. We then attempt to apply it to the prediction of protein structural classes. Our results suggest that the reverse encoding approach could be relevant in both cases. We show that it is able to classify functional families of proteins by extracting signatures close to the ProSite patterns. Applied to structural classification, the approach reaches scores of correct classification close to 84%, i.e. close to the scores of related methods in the field. Various optimizations of the approach are still possible, which open the door for future applications.
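A minimal sketch of the idea, under stated assumptions: the CGR iteration moves halfway from the current point toward the corner assigned to each nucleotide, starting from the centre of the unit square. The corner layout in `CORNERS` is one common convention, and `REVERSE_CODE` simply picks one valid standard-code codon per amino acid; the abstract does not say which fixed encoding the authors used, so both tables are assumptions.

```python
# Assumed corner layout for the unit square (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

# Assumed fixed reverse encoding: one standard-code codon per amino acid.
# The encoding used in the paper is not given in the abstract.
REVERSE_CODE = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def reverse_encode(protein):
    """Translate a protein back into DNA with the fixed codon table."""
    return "".join(REVERSE_CODE[res] for res in protein.upper())


def cgr_points(dna):
    """Chaos game: from the centre, step halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in dna.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


signature = cgr_points(reverse_encode("MKW"))
print(len(signature), signature[0])
```

Counting how many points fall into each cell of a regular grid over the square turns the point list into a fixed-size frequency matrix, which is the kind of signature such methods typically compare.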

15.
Proteins that assimilate particular elements were found to avoid using amino acids containing the element, which indicates that the metabolic constraints of amino acids may influence the evolution of proteins. We suspected that low contents of carbon, nitrogen, and sulfur may also be selected for economy in highly abundant proteins that consume large amounts of the resources of cells. By analyzing recently available proteomic data in Escherichia coli, Saccharomyces cerevisiae, and Schizosaccharomyces pombe, we found that at least the carbon and nitrogen contents in amino acid side chains are negatively correlated with protein abundance. An amino acid with a high number of carbon atoms in its side chain generally requires relatively more energy for its synthesis. Thus, it may be selected against in highly abundant proteins either because of economy in building blocks or because of economy in energy. Previous studies showed that highly abundant proteins preferentially use cheap (in terms of energy) amino acids. We found that the carbon content is still negatively correlated with protein abundance after controlling for the energetic cost of the amino acids. However, the negative correlation between protein abundance and energetic cost disappeared after controlling for carbon content. Building blocks seem to be more restricted than energy. It seems that the amino acid sequences of highly abundant proteins have to compromise between optimization for their biological functions and reducing the consumption of limiting resources. By contrast, the amino acid sequences of weakly expressed proteins are more likely to be optimized for their biological functions. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users.

16.
The accumulation of divergent histone H4 amino acid sequences within and between ciliate lineages challenges traditional views of the evolution of this essential eukaryotic protein. We analyzed histone H4 sequences from 13 species of ciliates and compared these data with sequences from well-sampled eukaryotic clades. Ciliate histone H4s differ from one another at as many as 46% of their amino acids, in contrast with the highly conserved character of this protein in most other eukaryotes. Equally striking, we find paralogs of histone H4 within ciliate genomes that differ by up to 25% of their amino acids, whereas paralogs in other eukaryotes share identical or nearly identical amino acid sequences. Moreover, the most divergent H4 proteins within ciliates are found in the lineages with highly processed macronuclear genomes. Our analyses demonstrate that the dual nature of ciliate genomes (the presence of a "germline" micronucleus and a "somatic" macronucleus within each cell) allowed the dramatic variation in ciliate histone genes by altering functional constraints or enabling adaptive evolution of the histone H4 protein, or both.

17.
The analysis of disulphide bond containing proteins in the Protein Data Bank (PDB) revealed that out of 27,209 protein structures analyzed, 12,832 proteins contain at least one intra-chain disulphide bond and 811 proteins contain at least one inter-chain disulphide bond. The intra-chain disulphide bond containing proteins can be grouped into 256 categories based on the number of disulphide bonds and the disulphide bond connectivity patterns (DBCPs) that were generated according to the position of half-cystine residues along the protein chain. The PDB entries corresponding to these 256 categories represent 509 unique SCOP superfamilies. A simple web-based computational tool is made freely available at the website http://www.ccmb.res.in/bioinfo/dsbcp that allows flexible queries to be made on the database in order to retrieve useful information on the disulphide bond containing proteins in the PDB. The database is useful for identifying the different SCOP superfamilies associated with a particular disulphide bond connectivity pattern, or vice versa. A query can be defined on a single field or on a combination of the following fields: PDB code, protein name, SCOP superfamily name, number of disulphide bonds, disulphide bond connectivity pattern, and number of amino acid residues in a protein chain. The query then retrieves the information that matches the criteria. The database may therefore be useful for selecting suitable protein structural templates in order to model more distantly related protein homologs/analogs using comparative modeling methods.

18.

Background

Identifying protein complexes is crucial to understanding the principles of cellular organization and functional mechanisms. Because much evidence indicates that subgraphs with high density or high modularity in a PPI network usually correspond to protein complexes, PPI-based protein complex detection methods have focused on either the density of a subgraph or its modularity. However, dense subgraphs may have low modularity, and subgraphs with high modularity may have low density, so protein complexes may appear as subgraphs with low modularity or with low density in the PPI network. Because density-based methods struggle to mine protein complexes with low density, and modularity-based methods struggle to mine protein complexes with low modularity, both kinds of method are limited in identifying protein complexes of varying density and modularity.

Results

To identify protein complexes with various densities and modularities, including those with low density but high modularity and those with low modularity but high density, we define a novel subgraph fitness, $f_\rho = (\text{density})^{\rho} \cdot (\text{modularity})^{1-\rho}$, and propose a novel algorithm, named LF_PIN, to identify protein complexes by expanding seed edges into subgraphs with locally maximal fitness. Experimental results for LF-PIN on S. cerevisiae show that, compared with setting the fitness equal to density ($\rho = 1$) or to modularity ($\rho = 0$), LF-PIN identifies known protein complexes more effectively when the fitness is determined by both density and modularity ($0 < \rho < 1$). Compared with seven competing protein complex detection methods (CMC, Core-Attachment, CPM, DPClus, HC-PIN, MCL, and NFC) on S. cerevisiae and E. coli, LF-PIN performs better in terms of matching known complexes and functional enrichment. Moreover, LF-PIN performs better at identifying protein complexes with low density or low modularity.
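The fitness formula itself is stated in the abstract, but the definitions of subgraph density and modularity are not. The sketch below therefore assumes density = 2m / (n(n-1)) for a subgraph with n nodes and m internal edges, and modularity = internal edges / (internal + boundary edges). All function names are hypothetical, and the seed-expansion search of the published algorithm is not reproduced.

```python
def local_fitness(density, modularity, rho):
    """f_rho = density**rho * modularity**(1 - rho), with 0 <= rho <= 1."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (density ** rho) * (modularity ** (1.0 - rho))


def subgraph_measures(edges, nodes):
    """Return (density, modularity) of a node set under the ASSUMED
    definitions described in the lead-in (not taken from the paper)."""
    nodes = set(nodes)
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    boundary = sum(1 for u, v in edges if (u in nodes) != (v in nodes))
    n = len(nodes)
    density = 2 * internal / (n * (n - 1)) if n > 1 else 0.0
    touching = internal + boundary
    modularity = internal / touching if touching else 0.0
    return density, modularity


# Toy network: a triangle a-b-c with one extra edge leaving it.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
dens, modu = subgraph_measures(toy_edges, {"a", "b", "c"})
for rho in (0.0, 0.5, 1.0):
    print(rho, local_fitness(dens, modu, rho))
```

Setting rho to 1 or 0 recovers the pure density and pure modularity criteria mentioned in the text, so a single parameter sweeps between the two families of methods.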

Conclusions

By considering both density and modularity, LF-PIN outperforms other protein complex detection methods that consider only density or only modularity, especially in identifying known protein complexes with low density or low modularity.

19.
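Fan Yi (范燚), Han Xinhuan (韩新焕), Yu Yun (郁芸). Chinese Journal of Bioinformatics (《生物信息学》), 2012, 10(3): 169-173
The amino acid sequence of the human BRCA1 protein was retrieved, and bioinformatics methods were used to run a similarity search, yielding a series of BRCA1 protein amino acid sequences. Eleven of these sequences were selected for multiple sequence analysis and evolutionary analysis of the BRCA1 protein, and three-dimensional homology models of the BRCT domain of BRCA1 were built and compared. The results show that the amino acid sequences at certain specific positions in BRCA1 are highly conserved; that identifying conserved amino acid sites, combined with evolutionary analysis, allows a preliminary guess at the pathogenicity of missense mutations in the gene; and that BRCA1 proteins from closely related species are closely related phylogenetically and have extremely similar three-dimensional structures. These findings provide guidance for studying the relationship between the structure and function of the BRCA1 protein.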

20.
Knowledge of structural class plays an important role in understanding protein folding patterns. In this study, a simple and powerful computational method, which combines a support vector machine with PSI-BLAST profiles, is proposed to predict protein structural class for low-similarity sequences. The evolutionary information encoded in the PSI-BLAST profiles is converted into a series of fixed-length feature vectors by extracting amino acid composition and dipeptide composition from the profiles. The resulting vectors are then fed to a support vector machine classifier for the prediction of protein structural class. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence similarity lower than 40% and 25%, respectively. The overall accuracies reach 70.7% and 72.9% for the 1189 and 25PDB datasets, respectively. Comparison of our results with other methods shows that our method is very promising for predicting protein structural class, particularly for low-similarity datasets, and may at least play an important complementary role to existing methods.
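The abstract does not give the extraction formulas. One plausible reading, sketched below purely as an assumption, treats the profile as an L x 20 matrix of per-position scores. The composition part is the mean of each column (20 values), and the dipeptide part is the mean product of scores at consecutive positions for every ordered pair of columns (400 values), giving a fixed length of 420 regardless of L. The function name `profile_features` is hypothetical, and parsing PSI-BLAST output and training the SVM are outside the sketch.

```python
def profile_features(profile):
    """Fixed-length features from an L x W profile (W = 20 for proteins).

    Assumed definitions (not taken from the paper):
      composition[a]  = mean over positions i of profile[i][a]
      dipeptide[a][b] = mean over i of profile[i][a] * profile[i + 1][b]
    """
    length = len(profile)
    if length < 2:
        raise ValueError("profile needs at least two positions")
    width = len(profile[0])

    composition = [sum(row[a] for row in profile) / length
                   for a in range(width)]
    dipeptide = [
        sum(profile[i][a] * profile[i + 1][b]
            for i in range(length - 1)) / (length - 1)
        for a in range(width)
        for b in range(width)
    ]
    return composition + dipeptide


def one_hot(index, width=20):
    """Helper for the toy example: a row with a single 1.0 entry."""
    return [1.0 if k == index else 0.0 for k in range(width)]


# Toy three-position profile: column 0, then column 1, then column 0.
toy_profile = [one_hot(0), one_hot(1), one_hot(0)]
features = profile_features(toy_profile)
print(len(features))
```

Because the output length depends only on the number of columns, proteins of different lengths map to vectors of the same size, which is what a classifier such as a support vector machine requires.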
