Similar literature
 Found 20 similar articles; search took 62 ms
1.
Protein interaction networks display approximate scale-free topology, in which hub proteins that interact with a large number of other proteins determine the overall organization of the network. In this study, we aim to determine whether hubs are distinguishable from other networked proteins by specific sequence features. Proteins of different connectedness were compared in the interaction networks of Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, and Homo sapiens with respect to the distribution of predicted structural disorder, sequence repeats, low complexity regions, and chain length. Highly connected proteins ("hub proteins") contained significantly more of, and a greater proportion of, these sequence features and tended to be longer overall than less connected proteins. These sequence features provide two different functional means for realizing multiple interactions: (1) extended interaction surface and (2) flexibility and adaptability, providing a mechanism for the same region to bind distinct partners. Our view contradicts the prevailing view that scaling in protein interactomes arose from gene duplication and preferential attachment of equivalent proteins. We propose an alternative evolutionary network specialization process, in which certain components of the protein interactome improved their fitness for binding by becoming longer or accruing regions of disorder and/or internal repeats and have therefore become specialized in network organization.

2.
Proteins that can interact with multiple partners play central roles in the network of protein-protein interactions. They are called hub proteins, and recently it was suggested that an abundance of intrinsically disordered regions on their surfaces facilitates their binding to multiple partners. However, in those studies, the hub proteins were identified as proteins with multiple partners, regardless of whether the interactions were transient or permanent. As a result, a certain number of hub proteins are subunits of stable multi-subunit proteins, such as supramolecules. It is well known that stable complexes and transient complexes have different structural features, and thus statistics based on the current definition of hub proteins will hide the true nature of hub proteins. Therefore, in this paper, we first describe a new approach to identify proteins with multiple partners dynamically, using the Protein Data Bank, and then perform statistical analyses of the structural features of these proteins. We refer to these proteins as transient hub proteins or sociable proteins, to distinguish them from hub proteins. We found that the main difference between sociable and nonsociable proteins is not the abundance of disordered regions, in contrast to the previous studies, but rather the structural flexibility of the entire protein. We also found a greater predominance of charged and polar residues in sociable proteins than previously reported.

3.
4.
Tobi D 《Proteins》2012,80(4):1167-1176
A novel methodology for comparison of protein dynamics is presented. Protein dynamics is calculated using the Gaussian network model and the modes of motion are globally aligned using the dynamic programming algorithm of Needleman and Wunsch, commonly used for sequence alignment. The alignment is fast and can be used to analyze large sets of proteins. The methodology is applied to the four major classes of the SCOP database: "all alpha proteins," "all beta proteins," "alpha and beta proteins," and "alpha/beta proteins". We show that different domains may have similar global dynamics. In addition, we report that the dynamics of "all alpha proteins" domains are less specific to structural variations within a given fold or superfamily compared with the other classes. We report that domain pairs with the most similar and the least similar global dynamics tend to be of similar length. The significance of the methodology is that it suggests a new and efficient way of mapping between the global structural features of protein families/subfamilies and their encoded dynamics.
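The global alignment step named in the abstract above can be illustrated with a minimal Needleman-Wunsch sketch. The abstract does not say how two mode profiles are scored against each other, so the `score` callback, the gap penalty, and the toy inputs here are placeholders chosen for illustration, not the author's settings.

```python
def needleman_wunsch(a, b, score, gap=-1.0):
    """Global alignment of sequences a and b.

    Returns (best_score, aligned_index_pairs). `score(x, y)` is a
    user-supplied similarity (hypothetical here); `gap` is a linear gap penalty.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # match/mismatch
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right corner to recover matched positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Toy usage: integers stand in for per-residue mode values.
toy_score = lambda x, y: 1.0 if x == y else -1.0
best, matched = needleman_wunsch([1, 2, 3], [1, 3], toy_score)
```

For real mode profiles one would replace `toy_score` with a continuous similarity between mode amplitudes; that choice is left open here.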

5.
Proper protein localization is essential for critical cellular processes, including vesicle‐mediated transport and protein translocation. Tail‐anchored (TA) proteins are integrated into organellar membranes via the C‐terminus, orienting the N‐terminus towards the cytosol. Localization of TA proteins occurs posttranslationally and is governed by the C‐terminus, which contains the integral transmembrane domain (TMD) and targeting sequence. Targeting of TA proteins is dependent on the hydrophobicity of the TMD as well as the length and composition of flanking amino acid sequences. We previously identified an unusual homologue of elongator protein, Elp3, in the apicomplexan parasite Toxoplasma gondii as a TA protein targeting the outer mitochondrial membrane. We sought to gain further insight into TA proteins and their targeting mechanisms using this early‐branching eukaryote as a model. Our bioinformatics analysis uncovered 59 predicted TA proteins in Toxoplasma, 9 of which were selected for follow‐up analyses based on representative features. We identified novel TA proteins that traffic to specific organelles in Toxoplasma, including the parasite endoplasmic reticulum, mitochondrion, and Golgi apparatus. Domain swap experiments elucidated that targeting of TA proteins to these specific organelles was strongly influenced by the TMD sequence, including charge of the flanking C‐terminal sequence.

6.

Background

Protein-protein interactions are critical to elucidating the role played by individual proteins in important biological pathways. Of particular interest are hub proteins that can interact with large numbers of partners and often play essential roles in cellular control. Depending on the number of binding sites, protein hubs can be classified at a structural level as singlish-interface hubs (SIH) with one or two binding sites, or multiple-interface hubs (MIH) with three or more binding sites. In terms of kinetics, hub proteins can be classified as date hubs (i.e., interact with different partners at different times or locations) or party hubs (i.e., simultaneously interact with multiple partners).

Methodology

Our approach works in three phases: Phase I predicts whether a protein is likely to bind another protein. Phase II determines whether a protein-binding (PB) protein is a hub. Phase III classifies PB proteins as singlish-interface versus multiple-interface hubs and date versus party hubs. At each stage, we use sequence-based predictors trained using several standard machine learning techniques.

Conclusions

Our method is able to predict whether a protein is a protein-binding protein with an accuracy of 94% and a correlation coefficient of 0.87; identify hubs from non-hubs with 100% accuracy for 30% of the data; distinguish date hubs/party hubs with 69% accuracy and area under ROC curve of 0.68; and SIH/MIH with 89% accuracy and area under ROC curve of 0.84. Because our method is based on sequence information alone, it can be used even in settings where reliable protein-protein interaction data or structures of protein-protein complexes are unavailable to obtain useful insights into the functional and evolutionary characteristics of proteins and their interactions.

Availability

We provide a web server for our three-phase approach: http://hybsvm.gdcb.iastate.edu.

7.

Background

Membrane proteins perform essential roles in diverse cellular functions and are regarded as major pharmaceutical targets. The significance of membrane proteins has led to the development of dozens of resources related to membrane proteins. However, most of these resources are built for specific well-known membrane protein groups, making it difficult to find common and specific features of various membrane protein groups.

Methods

We collected human membrane proteins from the dispersed resources and predicted novel membrane protein candidates by using ortholog information and our membrane protein classifiers. The membrane proteins were classified according to the type of interaction with the membrane, subcellular localization, and molecular function. We also built a new feature dataset to characterize the membrane proteins in various aspects, including membrane protein topology, domain, biological process, disease, and drug. Moreover, protein structure and ICD-10-CM based integrated disease and drug information were newly included. To analyze the comprehensive information on membrane proteins, we implemented analysis tools to identify novel sequence and functional features of the classified membrane protein groups and to extract features from protein sequences.

Results

We constructed HMPAS with 28,509 collected known membrane proteins and 8,076 newly predicted candidates. This system provides integrated information on human membrane proteins individually and in groups organized by 45 subcellular locations and 1,401 molecular functions. As a case study, we identified associations between the membrane proteins and diseases and show that membrane proteins are promising targets for diseases related to the nervous system and the circulatory system. A web-based interface to this system was constructed so that researchers can not only retrieve organized information on individual proteins but also use the tools to analyze the membrane proteins.

Conclusions

HMPAS provides comprehensive information about human membrane proteins, including specific features of certain membrane protein groups. In this system, users can obtain information on individual proteins and specified groups, focused on their conserved sequence features, the cellular processes they are involved in, and associated diseases. HMPAS may serve as a valuable resource for the inference of novel cellular mechanisms and pharmaceutical targets associated with human membrane proteins. HMPAS is freely available at http://fcode.kaist.ac.kr/hmpas.

8.
9.
It has been claimed that proteins with more interaction partners (hubs) are both physiologically more important (i.e., less dispensable) and, owing to an assumed high density of binding sites, slow evolving. Not all analyses, however, support these results, probably because of biased and less-than-reliable global protein interaction data. Here we provide the first examination of these issues using a comprehensive literature-curated dataset of well-substantiated protein interactions in Saccharomyces cerevisiae. Whereas use of less reliable yeast two-hybrid data alone can reject the possibility that local connectivity correlates with measures of dispensability, in higher quality datasets a relatively robust correlation is observed. In contrast, local connectivity does not correlate with the rate of protein evolution even in reliable datasets. This perhaps surprising lack of correlation with evolutionary rate appears in part to arise from the fact that hub proteins do not have a higher density of residues associated with binding. However, hub proteins do have at least one other set of unusual features, namely rapid turnover and regulation, as manifest in high mRNA decay rates and a large number of phosphorylation sites. This, we suggest, is an adaptation to minimize unwanted activation of pathways that might be mediated by adventitious binding to hubs, were they to actively persist longer than required at any given time point. We conclude that hub proteins are more important for cellular growth rate and under tight regulation but are not slow evolving.

10.

Background

Protein complexes are important for understanding principles of cellular organization and functions. With the availability of large amounts of high-throughput protein-protein interactions (PPI), many algorithms have been proposed to discover protein complexes from PPI networks. However, existing algorithms generally do not take into consideration the fact that not all the interactions in a PPI network take place at the same time. As a result, predicted complexes often contain many spuriously included proteins, precluding them from matching true complexes.

Results

We propose two methods to tackle this problem: (1) The localization GO term decomposition method: We utilize cellular component Gene Ontology (GO) terms to decompose PPI networks into several smaller networks such that the proteins in each decomposed network are annotated with the same cellular component GO term. (2) The hub removal method: This method is based on the observation that hub proteins are more likely to fuse clusters that correspond to different complexes. To avoid this, we remove hub proteins from PPI networks, and then apply a complex discovery algorithm on the remaining PPI network. The removed hub proteins are added back to the generated clusters afterwards. We tested the two methods on the yeast PPI network downloaded from BioGRID. Our results show that these methods can improve the performance of several complex discovery algorithms significantly. Further improvement in performance is achieved when we apply them in tandem.

Conclusions

The performance of complex discovery algorithms is hindered by the fact that not all the interactions in a PPI network take place at the same time. We tackle this problem by using localization GO terms or hubs to decompose a PPI network before complex discovery, which achieves considerable improvement.
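The hub removal method described in the Results above (remove hubs, cluster the remaining network, add the hubs back) can be sketched as follows. Several pieces are assumptions made only for illustration: the degree threshold `hub_degree`, the use of connected components as a stand-in for a real complex discovery algorithm, and the `min_links` rule for re-attaching a hub to a cluster. None of these are specified in the abstract.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) interaction pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def connected_components(graph, nodes):
    """Stand-in clustering step: components of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            component.add(u)
            for v in graph[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(component)
    return clusters


def cluster_with_hub_removal(edges, hub_degree, min_links=2):
    graph = build_graph(edges)
    # 1. Proteins with degree >= hub_degree are treated as hubs (assumed rule).
    hubs = {node for node, nbrs in graph.items() if len(nbrs) >= hub_degree}
    # 2. Cluster the network with the hubs removed.
    cores = connected_components(graph, set(graph) - hubs)
    # 3. Add each hub back to every cluster it touches at least min_links times.
    clusters = [set(core) for core in cores]
    for hub in hubs:
        for core, cluster in zip(cores, clusters):
            if len(graph[hub] & core) >= min_links:
                cluster.add(hub)
    return clusters


# Toy network: two triangles that are joined only through the hub "H".
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F")]
toy_edges += [("H", p) for p in "ABCDEF"]
toy_clusters = cluster_with_hub_removal(toy_edges, hub_degree=5)
```

In this toy network, clustering without hub removal would fuse both triangles into one component, which is the failure mode the abstract describes.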

11.
Pang E  Tan T  Lin K 《Molecular bioSystems》2012,8(3):766-771
Domain-domain interactions are a critical type of mechanism mediating protein-protein interactions (PPIs). For a given protein domain, its ability to combine with distinct domains is usually referred to as promiscuity or versatility. Interestingly, a previous study has reported that a domain's promiscuity may reflect its ability to interact with other domains in human proteins. In this work, promiscuous domains were first identified from the yeast genome. Then, we sought to determine what roles promiscuous domains might play in the PPI network. Mapping the promiscuous domains onto the proteins in this network revealed that, consistent with previous knowledge, the hub proteins were significantly enriched with promiscuous domains. We also found that the set of hub proteins was not the same as the set of proteins with promiscuous domains, although there was some overlap. Analysis of the topological properties of this yeast PPI network showed that the characteristic path length of the network increased significantly after deleting proteins with promiscuous domains. This indicated that communication paths between proteins became longer and that network stability decreased. These observations suggested that, like the hub proteins, proteins with promiscuous domains might play a role in maintaining network stability. In addition, functional analysis revealed that proteins with promiscuous domains mainly participated in the "Folding, Sorting, and Degradation" and "Replication and Repair" biological pathways, and that they significantly execute key molecular functions, such as "nucleoside-triphosphatase activity (GO:0017111)."
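The path-length comparison described in the abstract above can be illustrated with a small breadth-first-search sketch. Here the characteristic path length is taken as the mean shortest-path length over reachable node pairs; how the original study treated disconnected pairs is not stated, so that convention, the function name, and the toy network are assumptions.

```python
from collections import deque


def characteristic_path_length(adj, removed=frozenset()):
    """Mean shortest-path length over reachable pairs, ignoring `removed` nodes.

    `adj` maps each node to the set of its neighbours (undirected network).
    """
    total, pairs = 0, 0
    for source in adj:
        if source in removed:
            continue
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("inf")


# Toy network: a chain A-B-C-D plus a highly connected node "H".
toy_adj = {
    "A": {"B", "H"},
    "B": {"A", "C", "H"},
    "C": {"B", "D", "H"},
    "D": {"C", "H"},
    "H": {"A", "B", "C", "D"},
}
cpl_before = characteristic_path_length(toy_adj)
cpl_after = characteristic_path_length(toy_adj, removed={"H"})
```

Deleting the well-connected node lengthens the average path in this toy case, mirroring the direction of the effect reported for proteins with promiscuous domains.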

12.
Protein sequence features are explored in relation to the production of over-expressed extracellular proteins by fungi. Knowledge of features influencing protein production and secretion could be employed to improve enzyme production levels in industrial bioprocesses via protein engineering. A large set of fungal genes, over 600 homologous and nearly 2,000 heterologous, was overexpressed in Aspergillus niger using a standardized expression cassette and scored for high versus no production. Subsequently, sequence-based machine learning techniques were applied for identifying relevant DNA and protein sequence features. The amino-acid composition of the protein sequence was found to be most predictive and interpretation revealed that, for both homologous and heterologous gene expression, the same features are important: tyrosine and asparagine composition was found to have a positive correlation with high-level production, whereas for unsuccessful production, contributions were found for methionine and lysine composition. The predictor is available online at http://bioinformatics.tudelft.nl/hipsec. Subsequent work aims at validating these findings by protein engineering as a method for increasing expression levels per gene copy.

13.
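Constructing protein interaction networks can provide a theoretical reference for exploring key proteins during tea plant growth and predicting their functions. Using the Fuding Dabaicha tea cultivar from the Duyun area of Guizhou as the study material, third-generation Nanopore sequencing and homology alignment were used to construct protein interaction networks of the genes differentially expressed among roots, stems, and leaves, and key proteins and their functions were then predicted from the networks. The results showed that the leaf-versus-root, leaf-versus-stem, stem-versus-root, and root-stem-leaf differential-gene protein interaction networks contained 53, 39, 42, and 53 key proteins, respectively, and the functions of the key proteins were predicted. For example, TEA003744 may have adenylate kinase activity, protein serine/threonine kinase activity, and ATP binding, and may participate in photosynthesis and protein phosphorylation; TEA026776 may be a developmental protein involved in cell differentiation and protein phosphorylation, with ATP-binding and protein kinase activity; and TEA019056 may have ATP binding, GTP binding, and GTPase activity, and may participate in peroxisome organization and protein phosphorylation. The functions of the highest-scoring functional module in each of the four networks were then predicted, and topological property analysis, functional module analysis, cluster annotation, and clustering analysis were performed. The results can provide a theoretical basis for identifying protein functions, finding key proteins, and breeding superior cultivars.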

14.
Porcine pleuropneumonia caused by Actinobacillus pleuropneumoniae has led to severe economic losses in the pig industry worldwide. A. pleuropneumoniae displays various levels of antimicrobial resistance, creating an urgent need to identify new drug targets. Protein–protein interaction (PPI) networks can aid the identification of drug targets by revealing proteins that are essential during the bacterial life cycle. The aim of this study is to identify drug target candidates of A. pleuropneumoniae from essential proteins in its PPI network. The homologous protein mapping method (HPM) was used to construct the A. pleuropneumoniae PPI network. Afterwards, the subnetwork centered on H-NS was selected to verify the PPI network using bacterial two-hybrid assays. Drug target candidates were identified from the hub proteins by analyzing the topology of the network using interaction degree and by homology comparison with the pig proteome. An A. pleuropneumoniae PPI network containing 2737 non-redundant interaction pairs among 533 proteins was constructed. These proteins were distributed in 21 COG functional categories and 28 KEGG metabolic pathways. The A. pleuropneumoniae PPI network was scale-free, and similar topological tendencies were found when it was compared with the PPI networks of other bacteria. Furthermore, 56.3% of the H-NS subnetwork interactions were validated. In total, 57 highly connected proteins (hub proteins) were identified from the A. pleuropneumoniae PPI network. Finally, 9 potential drug targets with no homologs in swine were identified from the hub proteins. This study provides drug target candidates, which are promising for further investigations to explore lead compounds against A. pleuropneumoniae.

15.
Homology detection and protein structure prediction are central themes in bioinformatics. Establishing relationships between protein sequences, or predicting their structure, by sequence comparison methods is limited when sequence similarity is low. Recent work demonstrates that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The "Conservatism-of-Conservatism" is an effective profile analysis method to identify structural features between proteins having the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes, which were compared using the FASTA program with an ad hoc matrix. We tested the performance of our method in identifying relationships between proteins with similar folds using a nonredundant subset of sequences having less than 40% identity. We then compared our results, using Coverage Versus Error per query curves, to those obtained by methods such as PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles), showed higher accuracy in detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of evolutionary information enclosed in protein profiles.

16.
The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We have used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a 2-class K-nearest neighbor classifier to classify thermophilic and mesophilic proteins, yielding the KNN-ID classifier. Instead of extracting features from protein sequences as done previously, our approach is based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences is first calculated to quantify how similar one sequence is to the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with multiple recently published methods showed that the KNN-ID classifier proposed in this study outperforms the other methods. The improved predictive performance indicates that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy is discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm.
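A rough sketch of a KNN classifier driven by an increment-of-diversity distance may help make the idea above concrete. The abstract gives no formulas, so this uses the commonly cited definition, D(X) = N ln N − Σ nᵢ ln nᵢ over symbol counts and ID(X, Y) = D(X + Y) − D(X) − D(Y), applied to raw amino acid counts; whether the paper used exactly this form is an assumption, and the training sequences below are made-up toy strings, not real proteins.

```python
import math
from collections import Counter


def diversity(counts):
    """D(X) = N ln N - sum(n_i ln n_i) for a Counter of symbol counts."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts.values() if c > 0)


def increment_of_diversity(seq_a, seq_b):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar composition."""
    counts_a, counts_b = Counter(seq_a), Counter(seq_b)
    return diversity(counts_a + counts_b) - diversity(counts_a) - diversity(counts_b)


def knn_predict(query, training, k=3):
    """Majority label among the k training sequences with the smallest ID."""
    ranked = sorted(training, key=lambda item: increment_of_diversity(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: compositions chosen only to make the classes separable.
toy_training = [
    ("KKEEKKEE", "thermophilic"),
    ("KEKEKEKK", "thermophilic"),
    ("QQSSQQSS", "mesophilic"),
    ("QSQSQSQQ", "mesophilic"),
]
toy_label = knn_predict("KEKKEEKE", toy_training, k=3)
```

Two sequences with identical composition have an ID of zero, while sequences with disjoint alphabets give a large positive value, which is what lets the measure act as a distance for nearest-neighbor voting.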

17.
Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high-dimensional input spaces for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space, using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
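The hashing step described above can be sketched in a few lines. The hash function (SHA-256 truncated to 32 bits), the bucket count, and the function names are illustrative choices, not the ones used in the paper.

```python
import hashlib


def kgrams(sequence, k):
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def hashed_features(sequence, k=3, n_buckets=16):
    """Map k-gram counts into a fixed-length vector by hashing each k-gram.

    Different k-grams may collide in the same bucket; their counts are summed.
    """
    vector = [0] * n_buckets
    for gram in kgrams(sequence, k):
        # A stable hash is used because Python's built-in hash() of str
        # is randomized between interpreter runs.
        digest = hashlib.sha256(gram.encode("ascii")).digest()
        bucket = int.from_bytes(digest[:4], "big") % n_buckets
        vector[bucket] += 1
    return vector


toy_sequence = "MKVLAAGIVGLLLAQ"  # made-up sequence for illustration
toy_vector = hashed_features(toy_sequence, k=3, n_buckets=16)
```

With 20 amino acids, a bag of 3-grams has up to 8,000 dimensions, whereas the hashed vector has a fixed length of `n_buckets` regardless of k.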

18.

Background

In spite of the scale-free degree distribution that characterizes most protein interaction networks (PINs), it is common to define an ad hoc degree scale that defines “hub” proteins having special topological and functional significance. This raises the concern that some conclusions on the functional significance of proteins based on network properties may not be robust.

Methodology

In this paper we present three objective methods to define hub proteins in PINs: one is a purely topological method and two others are based on gene expression and function. By applying these methods to four distinct PINs, we examine the extent of agreement among these methods and implications of these results on network construction.

Conclusions

We find that the methods agree well for networks that contain a balance between error-free and unbiased interactions, indicating that the hub concept is meaningful for such networks.

19.
The baseplate of phage T4 is an important model system in viral supramolecular assembly. The baseplate consists of six wedges surrounding the central hub. We report the first successful attempt at complete wedge assembly using an in vitro approach based on recombinant proteins. The cells expressing the individual wedge proteins were mixed in a combinatorial manner and then lysed. Using this approach, we could both reliably isolate the complete wedge along with a series of intermediate complexes as well as determine the exact sequence of assembly. The individual proteins and intermediate complexes at each step of the wedge assembly were successfully purified and characterized by sedimentation velocity and electron microscopy. Although our results mostly confirmed the hypothesized sequential wedge assembly pathway as established using phage mutants, interestingly, we also detected some protein interactions not following the specified order. It was found that association of gene product 53 to the immediate precursor complex induces spontaneous association of the wedges to form a six-fold star-shaped baseplate-like structure in the absence of the hub. The formation of the baseplate-like structure was facilitated by the addition of gene product 25. The complete wedge in the star-shaped supramolecular complex has a structure similar to the baseplate in the expanded “star” conformation found after infection. Based on the results of the present and previous studies, we assume that the strict order of wedge assembly is due to the induced conformational change caused by every new binding event. The significance of a 40-S star-shaped baseplate structure, which was previously reported and was also found in this study, is discussed in the light of a new paradigm for T4 baseplate assembly involving the star-shaped wedge ring and the central hub. 
Importantly, the methods described in this article suggest a novel methodology for future structural characterization of supramolecular protein assemblies.

20.
Bastolla U  Bruscolini P  Velasco JL 《Proteins》2012,80(9):2287-2304
In comparison with intense investigation of the structural determinants of protein folding rates, the sequence features favoring fast folding have received little attention. Here, we investigate this subject using simple models of protein folding and a statistical analysis of the Protein Data Bank (PDB). The mean-field model by Plotkin and coworkers predicts that the folding rate is accelerated by stronger-than-average interactions at short distance along the sequence. We confirmed this prediction using the Finkelstein model of protein folding, which accounts for realistic features of polymer entropy. We then tested this prediction on the PDB. We found that native interactions are strongest at contact range l = 8. However, since short range contacts tend to be exposed and they are frequently formed in misfolded structures, selection for folding stability tends to make them less attractive, that is, stability and kinetics may have contrasting requirements. Using a recently proposed model, we predicted the relationship between contact range and contact energy based on buriedness and contact frequency. Deviations from this prediction induce a positive correlation between contact range and contact energy, that is, short range contacts are stronger than expected, for 2/3 of the proteins. This correlation increases with the absolute contact order (ACO), as expected if proteins that tend to fold slowly due to large ACO are subject to stronger selection for sequence features favoring fast folding. Our results suggest that the selective pressure for fast folding is detectable only for one third of the proteins in the PDB, in particular those with large contact order.
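For readers unfamiliar with the term, the absolute contact order mentioned above is usually defined as the mean sequence separation |i − j| over native contacts, and the relative contact order as that value divided by chain length. The sketch below follows that usual definition; the abstract itself does not spell it out, and the contact list is a made-up example.

```python
def absolute_contact_order(contacts):
    """Mean sequence separation |i - j| over a list of native contacts (i, j)."""
    if not contacts:
        raise ValueError("at least one contact is required")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


def relative_contact_order(contacts, chain_length):
    """Absolute contact order normalized by the number of residues."""
    return absolute_contact_order(contacts) / chain_length


# Hypothetical residue-index pairs in contact in some native structure.
toy_contacts = [(1, 5), (2, 10), (3, 4)]
toy_aco = absolute_contact_order(toy_contacts)
```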
