Similar literature
 Found 20 similar documents; search took 296 ms
1.
More than 30 organisms have been sequenced entirely. Here, we applied a variety of simple bioinformatics tools to analyze 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes, and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar among the three. We predicted that approximately 15%-30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with seven transmembrane helices in eukaryotes and more with six and 12 transmembrane helices in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4%-5%), and we predicted approximately 15%-25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homolog in current databases, and 30%-40% of the proteins fell into structural families with >100 members. A classification by cellular function verified that eukaryotes have a higher proportion of proteins for communication with the environment. Finally, we found at least one homolog of experimentally known structure for approximately 20%-45% of all proteins; the regions with structural homology covered 20%-30% of all residues. These numbers may or may not suggest that there are 1200-2600 folds in the universe of protein structures. All predictions are available at http://cubic.bioc.columbia.edu/genomes.

2.
Fuchs A, Kirschner A, Frishman D. Proteins, 2009, 74(4): 857-871
Despite rapidly increasing numbers of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a clearly broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods specifically tailored for this class of molecules. One important prediction target capturing all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We have recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here, we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction of soluble proteins, such as windowed residue profiles and residue distance in the sequence, our network also incorporates features that apply to membrane proteins only, such as residue position within the transmembrane segment and its orientation toward the lipophilic environment. The obtained neural network can predict contacts between residues in transmembrane segments with nearly 26% accuracy. It is therefore the first published contact predictor developed specifically for membrane proteins performing with equal accuracy to state-of-the-art contact predictors available for soluble proteins. The predicted helix-helix contacts were employed in a second step to identify interacting helices. For our dataset consisting of 62 membrane proteins of solved structure, we gained an accuracy of 78.1%. Because the reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.

3.
Methods that predict membrane helices have become increasingly useful in the context of analyzing entire proteomes, as well as in everyday sequence analysis. Here, we analyzed 27 advanced and simple methods in detail. To resolve contradictions in previous works and to reevaluate transmembrane helix prediction algorithms, we introduced an analysis that distinguished between performance on redundancy-reduced high- and low-resolution data sets, established thresholds for significant differences in performance, and implemented both per-segment and per-residue analysis of membrane helix predictions. Although some of the advanced methods performed better than others, we showed in a thorough bootstrapping experiment based on various measures of accuracy that no method performed consistently best. In contrast, most simple hydrophobicity scale-based methods were significantly less accurate than any advanced method as they overpredicted membrane helices and confused membrane helices with hydrophobic regions outside of membranes. In contrast, the advanced methods usually distinguished correctly between membrane-helical and other proteins. Nonetheless, few methods reliably distinguished between signal peptides and membrane helices. We could not verify a significant difference in performance between eukaryotic and prokaryotic proteins. Surprisingly, we found that proteins with more than five helices were predicted at a significantly lower accuracy than proteins with five or fewer. The important implication is that structurally unsolved multispanning membrane proteins, which are often important drug targets, will remain problematic for transmembrane helix prediction algorithms. Overall, by establishing a standardized methodology for transmembrane helix prediction evaluation, we have resolved differences among previous works and presented novel trends that may impact the analysis of entire proteomes.
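As a rough illustration of the "simple hydrophobicity scale-based" class of methods that this abstract contrasts with advanced predictors, the sketch below averages Kyte-Doolittle hydropathy values over a sliding window and merges windows above a cutoff into candidate segments. The abstract does not say which scales or parameters were evaluated; the window length of 19, the cutoff of 1.6, and the function names are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_segments(seq, window=19, cutoff=1.6):
    """Merge windows whose mean hydropathy reaches the cutoff.

    Returns (start, end) spans, 0-based with end exclusive.
    The window and cutoff defaults are illustrative assumptions.
    """
    segments = []
    for i, mean in enumerate(window_hydropathy(seq, window)):
        if mean < cutoff:
            continue
        start, end = i, i + window
        if segments and start <= segments[-1][1]:
            # Overlaps or touches the previous span: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


# A 20-residue leucine stretch flanked by charged residues.
toy = "D" * 10 + "L" * 20 + "K" * 10
print(candidate_segments(toy))
```

On this toy sequence the reported span is wider than the 20-residue hydrophobic core, because windows that only partly overlap the core still pass the cutoff. The same mechanism would flag any sufficiently hydrophobic stretch, inside a membrane or not, which is in line with the overprediction the abstract reports for simple methods.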

4.
The prediction of a protein's structure from its amino acid sequence has been a long-standing goal of molecular biology. In this work, a new set of conformational parameters for membrane spanning alpha helices was developed using the information from the topology of 70 membrane proteins. Based on these conformational parameters, a simple algorithm has been formulated to predict the transmembrane alpha helices in membrane proteins. A FORTRAN program has been developed which takes the amino acid sequence as input and gives the predicted transmembrane alpha-helices as output. The present method correctly identifies 295 transmembrane helical segments in 70 membrane proteins with only two overpredictions. Furthermore, this method predicts all 45 transmembrane helices in the photosynthetic reaction center, bacteriorhodopsin and cytochrome c oxidase to an 86% level of accuracy and so is better than all other methods published to date.

5.
Experimental structure determination continues to be challenging for membrane proteins. Computational prediction methods are therefore needed and widely used to supplement experimental data. Here, we re‐examined the state of the art in transmembrane helix prediction based on a nonredundant dataset with 190 high‐resolution structures. Analyzing 12 widely‐used and well‐known methods using a stringent performance measure, we largely confirmed the expected high level of performance. On the other hand, all methods performed worse for proteins that could not have been used for development. A few results stood out: First, all methods predicted proteins in eukaryotes better than those in bacteria. Second, methods worked less well for proteins with many transmembrane helices. Third, most methods correctly discriminated between soluble and transmembrane proteins. However, several older methods often mistook signal peptides for transmembrane helices. Some newer methods have overcome this shortcoming. In our hands, PolyPhobius and MEMSAT‐SVM outperformed other methods. Proteins 2015; 83:473–484. © 2014 Wiley Periodicals, Inc.

6.
Adamian L, Liang J. Proteins, 2006, 63(1): 1-5
Analysis of a database of structures of membrane proteins shows that membrane proteins composed of 10 or more transmembrane (TM) helices often contain buried helices that are inaccessible to phospholipids. We introduce a method for identifying TM helices that are least phospholipid accessible and for prediction of fully buried TM helices in membrane proteins from sequence information alone. Our method is based on the calculation of residue lipophilicity and evolutionary conservation. Given that the number of buried helices in a membrane protein is known, our method achieves an accuracy of 78% and a Matthews correlation coefficient of 0.68. A server for this tool (RANTS) is available online at http://gila.bioengr.uic.edu/lab/.
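For readers unfamiliar with the two figures quoted in this abstract, the following sketch shows how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix (here, buried versus exposed helices). The function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient).

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when a margin is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts, for illustration only.
print(accuracy_and_mcc(tp=8, tn=30, fp=3, fn=4))
```

Unlike accuracy, the coefficient stays near zero for a predictor that simply labels every helix as exposed, which matters here because buried helices are the minority class.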

7.
It has been many years since position-specific residue preference around the ends of a helix was revealed. However, none of the existing secondary structure prediction methods exploit this preference feature, resulting in low accuracy in predicting the ends of secondary structures. In this study, we collected a relatively large data set consisting of 1860 high-resolution, non-homologous proteins from the PDB, and further analyzed the residue distributions around the ends of regular secondary structures. It was found that there exist position-specific residue preferences (PSRP) around the ends of not only helices but also strands. Based on these unique features, we proposed a novel strategy and developed a tool named E-SSpred that treats the secondary structure as a whole and builds models to predict entire secondary structure segments directly by integrating relevant features. In E-SSpred, the support vector machine (SVM) method is adopted to model and predict the ends of helices and strands according to the unique residue distributions around them. A simple linear discriminant analysis method is applied to model and predict entire secondary structure segments by integrating end-prediction results, tri-peptide composition, and length distribution features of secondary structures, as well as the prediction results of the well-known program PSIPRED. The results of fivefold cross-validation on a widely used data set demonstrate that the accuracy of E-SSpred in predicting ends of secondary structures is about 10% higher than that of PSIPRED, and the overall prediction accuracy (Q3 value) of E-SSpred (82.2%) is also better than that of PSIPRED (80.3%). The E-SSpred web server is available at http://bioinfo.hust.edu.cn/bio/tools/E-SSpred/index.html.

8.
Integral membrane proteins (of the α-helical class) are of central importance in a wide variety of vital cellular functions. Despite considerable effort on methods to predict the location of the helices, little attention has been directed toward developing an automatic method to pack the helices together. In principle, the prediction of membrane proteins should be easier than the prediction of globular proteins: there is only one type of secondary structure and all helices pack with a common alignment across the membrane. This allows all possible structures to be represented on a simple lattice and exhaustively enumerated. Prediction success lies not in generating many possible folds but in recognizing which corresponds to the native. Our evaluation of each fold is based on how well the exposed surface predicted from a multiple sequence alignment fits its allocated position. Just as exposure to solvent in globular proteins can be predicted from sequence variation, so exposure to lipid can be recognized by variable-hydrophobic (variphobic) positions. Application to both bacteriorhodopsin and the eukaryotic rhodopsin/opsin families revealed that the angular size of the lipid-exposed faces must be predicted accurately to allow selection of the correct fold. With the inherent uncertainties in helix prediction and parameter choice, this accuracy could not be guaranteed but the correct fold was typically found in the top six candidates. Our method provides the first completely automatic method that can proceed from a scan of the protein sequence databanks to a predicted three-dimensional structure with no intervention required from the investigator. Within the limited domain of the seven helix bundle proteins, a good chance can be given of selecting the correct structure. However, the limited number of sequences available with a corresponding known structure makes further characterization of the method difficult. © 1994 John Wiley & Sons, Inc.

9.
Helices in membrane spanning regions are more tightly packed than the helices in soluble proteins. Thus, we introduce a method that uses a simple scale of burial propensity and a new algorithm to predict transmembrane helical (TMH) segments and a positive-inside rule to predict amino-terminal orientation. The method (the topology predictor of transmembrane helical proteins using mean burial propensity [THUMBUP]) correctly predicted the topology of 55 of 73 proteins (or 75%) with known three-dimensional structures (the 3D helix database). This level of accuracy can be reached by MEMSAT 1.8 (a 200-parameter model-recognition method) and a new HMM-based method (a 111-parameter hidden Markov model, UMDHMM(TMHP)) if they are retrained with the 73-protein database. Thus, a method based on a physicochemical property can provide topology prediction as accurate as those methods based on more complicated statistical models and learning algorithms for the proteins with accurately known structures. Commonly used HMM-based methods and MEMSAT 1.8 were trained with a combination of the partial 3D helix database and a 1D helix database of TMH proteins in which topology information was obtained by gene fusion and other experimental techniques. These methods provide a significantly poorer prediction for the topology of TMH proteins in the 3D helix database. This suggests that the 1D helix database, because of its inaccuracy, should be avoided as either a training or testing database. A Web server of THUMBUP and UMDHMM(TMHP) is established for academic users at http://www.smbs.buffalo.edu/phys_bio/service.htm. The 3D helix database is also available from the same Web site.
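The positive-inside rule mentioned in this abstract can be sketched in a few lines: once the transmembrane segments are known, loops alternate between the two sides of the membrane, and the side whose loops carry more lysine and arginine residues is taken to be cytoplasmic. This is a generic illustration and not the THUMBUP implementation; the function name, the tie-breaking choice, and the toy sequences are assumptions.

```python
def predict_n_terminus(seq, tm_segments):
    """Guess whether the N-terminus is cytoplasmic ("in") or not ("out").

    tm_segments is a sorted list of (start, end) spans, 0-based with
    end exclusive, marking predicted transmembrane helices.
    """
    # Cut the sequence into the loops between and around the helices.
    loops = []
    previous_end = 0
    for start, end in tm_segments:
        loops.append(seq[previous_end:start])
        previous_end = end
    loops.append(seq[previous_end:])

    def positive(loop):
        return loop.count("K") + loop.count("R")

    # Even-indexed loops lie on the same side as the N-terminus.
    n_side = sum(positive(loop) for loop in loops[0::2])
    far_side = sum(positive(loop) for loop in loops[1::2])
    # Ties are resolved as "in"; this is an arbitrary choice.
    return "in" if n_side >= far_side else "out"


# Toy two-helix protein with positively charged termini.
toy = "MKKRK" + "L" * 20 + "DDSE" + "L" * 20 + "RKK"
print(predict_n_terminus(toy, [(5, 25), (29, 49)]))
```

A production method would typically weight only residues close to the membrane and treat long loops separately; the sketch counts every lysine and arginine in each loop for simplicity.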

10.
Newly synthesized proteins in eukaryotic cells can only function well after they are accurately transported to specific organelles. The establishment of protein databases and the development of programs have accelerated the study of protein subcellular locations, but comparisons and evaluations of the prediction accuracy of subcellular location programs in plants are lacking. In this study, we built a random test set of maize proteins to evaluate the accuracy of six commonly used programs of subcellular locations: iLoc-Plant, Plant-mPLoc, CELLO, WoLF PSORT, SherLoc2, and Predotar. Our results showed that the accuracy of prediction varied greatly depending on the programs and subcellular locations involved. The programs using homology search methods (iLoc-Plant and Plant-mPLoc) performed better than those using feature search methods (CELLO, WoLF PSORT, SherLoc2, and Predotar). In particular, iLoc-Plant achieved an 84.9% accuracy for proteins whose subcellular locations have been experimentally determined and a 74.3% accuracy for all of the proteins in the test set. Regarding locations, the highest prediction accuracies for subcellular locations were obtained for the nucleus, followed by the cytoplasm, mitochondria, plastids, endoplasmic reticulum, and vacuoles, while the lowest were obtained for cell membrane, secreted, and multiple-location proteins. We discussed the accuracy of the six programs in this article. This study will assist plant biologists in choosing appropriate programs to predict the location of proteins and provide clues regarding their function, especially for hypothetical or novel proteins.

11.
Park Y, Helms V. Biopolymers, 2006, 83(4): 389-399
Given the difficulty in determining high-resolution structures of helical membrane proteins, sequence-based prediction methods can be useful in elucidating diverse physiological processes mediated by this important class of proteins. Predicting the angular orientations of transmembrane (TM) helices about the helix axes, based on the helix parameters from electron microscopy data, is a classical problem in this regard. This problem has triggered the development of a number of different empirical scales. Recently, sequence conservation patterns have also been used for improved predictions. Empirical scales and sequence conservation patterns (collectively termed "prediction scales") have also found frequent applications in other research areas of membrane proteins: for example, in structure modeling and in prediction of buried TM helices. This trend is expected to grow in the near future unless there are revolutionary developments in the experimental characterization of membrane proteins. Thus, it is timely and imperative to carry out a comprehensive benchmark test over the prediction scales proposed so far to determine their pros and cons. In the current analysis, we use exposure patterns of TM helices as a gold standard, because if one develops a prediction scale that correlates perfectly with exposure patterns of TM helices, it will enable one to predict buried residues (or buried faces) of TM helices with an accuracy of 100%. Our analysis reveals several important points. (1) It demonstrates that sequence conservation patterns are much more strongly correlated with exposure patterns of TM helices than empirical scales. (2) Scales that were specifically parameterized using structure data (structure-based scales) display stronger correlation than hydrophobicity-based scales, as expected. (3) A nonnegligible difference is observed among the structure-based scales in their correlational property, suggesting that not every learning algorithm is equally effective. (4) A straightforward framework of optimally combining sequence conservation patterns and empirical scales is proposed, which reveals that improvements gained from combining the two sources of information are not dramatic in almost all cases. In turn, this calls for the development of fundamentally different scales that capture the essentials of membrane protein folding for substantial improvements.

12.
Higher-order interactions are important for protein folding and assembly. We introduce the concept of interhelical three-body interactions as derived from Delaunay triangulation and alpha shapes of protein structures. In addition to glycophorin A, where triplets are strongly correlated with protein stability, we found that tight interhelical triplet interactions exist extensively in other membrane proteins, where many types of triplets occur far more frequently than in soluble proteins. We developed a probabilistic model for estimating the value of membrane helical interaction triplet (MHIT) propensity. Because the number of known structures of membrane proteins is limited, we developed a bootstrap method for determining the 95% confidence intervals of estimated MHIT values. We identified triplets that have high propensity for interhelical interactions and are unique to membrane proteins, e.g. AGF, AGG, GLL, GFF and others. A significant fraction (32%) of triplet types contains triplets that may be involved in interhelical hydrogen bond interactions, suggesting the prevalent and important roles of H-bond in the assembly of TM helices. There are several well-defined spatial conformations for triplet interactions on helices with similar parallel or antiparallel orientations and with similar right-handed or left-handed crossing angles. Often, they contain small residues and correspond to the regions of the closest contact between helices. Sequence motifs such as GG4 and AG4 can be part of the three-body interactions that have similar conformations, which in turn can be part of a higher-order cooperative four residue spatial motif observed in helical pairs from different proteins. In many cases, spatial motifs such as serine zipper and polar clamp are part of triplet interactions. On the basis of the analysis of the archaeal rhodopsin family of proteins, tightly packed triplet interactions can be achieved with several different choices of amino acid residues.
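The abstract mentions a bootstrap method for 95% confidence intervals but does not describe the procedure. A minimal sketch of the percentile bootstrap, one common variant, is given below under that assumption: the observations are resampled with replacement, the statistic is recomputed each time, and the interval is read off the sorted estimates. The function name, the number of resamples, and the toy data are hypothetical.

```python
import random


def bootstrap_ci(samples, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Resamples the observations with replacement and returns the
    (lower, upper) bounds covering 1 - alpha of the estimates.
    """
    rng = random.Random(seed)  # Fixed seed so the sketch is repeatable.
    n = len(samples)
    estimates = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Toy data: 1 marks a contact of some triplet type, 0 marks its absence.
observations = [1] * 30 + [0] * 70
print(bootstrap_ci(observations, mean))
```

With few solved membrane protein structures, intervals of this kind make explicit how much an estimated propensity could move if a slightly different set of structures had been available.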

13.
MOTIVATION: Prediction methods are of great importance for membrane proteins as experimental information is harder to obtain than for globular proteins. As more membrane protein structures are solved it is clear that topology information only provides a simplified picture of a membrane protein. Here, we describe a novel challenge for the prediction of alpha-helical membrane proteins: to predict the distance between a residue and the center of the membrane, a measure we define as the Z-coordinate. Even though the traditional way of depicting membrane protein topology is useful, it is advantageous to have a measure that is based on a more "physical" property such as the Z-coordinate, since it implicitly contains information about re-entrant helices, interfacial helices, the tilt of a transmembrane helix and loop lengths. RESULTS: We show that the Z-coordinate can be predicted using either artificial neural networks, hidden Markov models or combinations of both. The best method, ZPRED, uses the output from a hidden Markov model together with a neural network. The average error of ZPRED is 2.55 Å and 68.6% of the residues are predicted within 3 Å of the target Z-coordinate in the 5-25 Å region. ZPRED is also able to predict the maximum protrusion of a loop to within 3 Å for 78% of the loops in the dataset. AVAILABILITY: Supplementary information and training data is available at http://www.sbc.su.se/~erikgr/.

14.
In the fold recognition approach to structure prediction, a sequence is tested for compatibility with an already known fold. For membrane proteins, however, few folds have been determined experimentally. Here the feasibility of computing the vast majority of likely membrane protein folds is tested. The results indicate that conformation space can be effectively sampled for small numbers of helices. The vast majority of potential monomeric membrane protein structures can be represented by about 30 folds for three helices, but this number increases exponentially to about 1,500,000 folds for seven helices. The generated folds could serve as templates for fold recognition or as starting points for conformational searches that are well distributed throughout conformation space.

15.
Previously, we introduced a neural network system predicting locations of transmembrane helices (HTMs) based on evolutionary profiles (PHDhtm, Rost B, Casadio R, Fariselli P, Sander C, 1995, Protein Sci 4:521-533). Here, we describe an improvement and an extension of that system. The improvement is achieved by a dynamic programming-like algorithm that optimizes helices compatible with the neural network output. The extension is the prediction of topology (orientation of first loop region with respect to membrane) by applying to the refined prediction the observation that positively charged residues are more abundant in extra-cytoplasmic regions. Furthermore, we introduce a method to reduce the number of false positives, i.e., proteins falsely predicted with membrane helices. The evaluation of prediction accuracy is based on a cross-validation and a double-blind test set (in total 131 proteins). The final method appears to be more accurate than other methods published: (1) For almost 89% (+/-3%) of the test proteins, all HTMs are predicted correctly. (2) For more than 86% (+/-3%) of the proteins, topology is predicted correctly. (3) We define reliability indices that correlate with prediction accuracy: for one half of the proteins, segment accuracy rises to 98%; and for two-thirds, accuracy of topology prediction is 95%. (4) The rate of proteins for which HTMs are predicted falsely is below 2% (+/-1%). Finally, the method is applied to 1,616 sequences of Haemophilus influenzae. We predict 19% of the genome sequences to contain one or more HTMs. This appears to be lower than what we predicted previously for the yeast VIII chromosome (about 25%).

16.
The advent of whole genome sequencing has led to an increasing number of proteins with known amino acid sequences. Despite many efforts, the number of proteins with resolved three-dimensional structures is still low. One of the challenging tasks structural biologists face is the prediction of the interaction of a metal ion with a protein for which the structure is unknown. Based on the information available in the Protein Data Bank, a site (METALACTIVE INTERACTION) has been generated which displays information on significant high-preferential and low-preferential combinations of endogenous ligands for 49 metal ions. Users can also obtain information about the residues present in the first and second coordination spheres, which play a major role in maintaining the structure and function of metalloproteins in biological systems. In this paper, a novel computational tool (ZINCCLUSTER) is developed, which can predict the zinc metal binding sites of proteins even if only the primary sequence is known. The purpose of this tool is to predict the active site cluster of an uncharacterized protein based on its primary sequence or a 3D structure. The tool can predict amino acids interacting with a metal or vice versa. This tool is based on the occurrence of significant triplets, and in tests it showed higher prediction accuracy than other available techniques.

17.
We review recent computational advances in the study of membrane proteins, focusing on those that have at least one transmembrane helix. Transmembrane protein regions are, in many respects, easier to investigate computationally than experimentally, due to the uniformity of their structure and interactions (e.g. consisting predominately of nearly parallel helices packed together) on one hand and the challenges of solubility on the other. We present the progress made on identifying and classifying membrane proteins into families, predicting their structure from amino-acid sequence patterns (using many different methods), and analyzing their interactions and packing. The total result of this work allows us for the first time to begin to think about the membrane protein interactome, the set of all interactions between distinct transmembrane helices in the lipid bilayer.

18.
In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures represent less than 1% of known protein structures to date. Although great success has been achieved in developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we first summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we then give some future directions in this research field.

20.
Transmembrane helices and the helical bundles which they form are the major building blocks of membrane proteins. Since helices are characterized by a given periodicity, it is possible to search for patterns of traits which typify one side of the helix and not the other (e.g. amphipathic helices contain polar and apolar sides). Using Fourier transformation we have analyzed solved membrane protein structures as well as sequences of membrane proteins from the Swiss-Prot database. The traits searched included aromaticity, volume and ionization. While a number of motifs were already recognized in the literature, many were not. One particular example involved helix VII of lactose permease which contains seven aromatic residues on six helical turns. Similarly six glycine residues in four consecutive helical turns were identified as forming a motif in the chloride channel. A tabulation of all the findings is presented as well as a possible rationalization of the function of the motif.
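The Fourier approach described in this abstract can be illustrated with a short sketch: a per-residue trait is treated as a signal, and its power is evaluated at a chosen rotation angle per residue. An ideal α-helix has 3.6 residues per turn, so a trait confined to one face of the helix should peak near 100° per residue. The abstract gives no implementation details; the function names, the angle grid, and the synthetic input are assumptions.

```python
import cmath
import math


def periodicity_power(values, degrees_per_residue):
    """Fourier power of a per-residue trait at a given rotation angle."""
    mean = sum(values) / len(values)
    omega = math.radians(degrees_per_residue)
    total = sum(
        (value - mean) * cmath.exp(1j * omega * n)
        for n, value in enumerate(values)
    )
    return abs(total) ** 2


def strongest_period(values, angles=range(1, 181)):
    """Angle (degrees per residue) with the highest Fourier power."""
    return max(angles, key=lambda angle: periodicity_power(values, angle))


# Synthetic trait that repeats every 3.6 residues over ten helical turns.
trait = [math.cos(math.radians(100) * n) for n in range(36)]
print(strongest_period(trait))
```

For real sequences the trait values would come from a per-residue property, for example 1 for aromatic residues and 0 otherwise, and a peak near 100° would suggest that the property clusters on one face of the helix.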
