Similar literature
Found 20 similar articles.
1.
A prerequisite for the survival of (micro)organisms at high temperatures is an adaptation of protein stability to extreme environmental conditions. In contrast to soluble proteins, where many factors have already been identified, the mechanisms by which the thermostability of membrane proteins is enhanced are almost unknown. The hydrophobic membrane environment constrains possible stabilizing factors for transmembrane domains, so that a difference might be expected between soluble and membrane proteins. Here we present sequence analysis of predicted transmembrane helices of the genomes from eight thermophilic and 12 mesophilic organisms. A comparison of the amino acid compositions indicates that more polar residues can be found in the transmembrane helices of thermophilic organisms. Particularly, the amino acids aspartic acid and glutamic acid replace the corresponding amides. Cysteine residues are found to be significantly decreased by about 70% in thermophilic membrane domains suggesting a non-specific function of most cysteine residues in transmembrane domains of mesophilic organisms. By a pair-motif analysis of the two sets of transmembrane helices, we found that the small residues glycine and serine contribute more to transmembrane helix-helix interactions in thermophilic organisms. This may result in a tighter packing of the helices allowing more hydrogen bond formation.
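As a rough illustration of the composition comparison described in this abstract, the following Python sketch computes per-residue fractions for two sets of helix sequences and the difference between them. The function names and any input sequences are hypothetical; the study's own helix prediction and statistics are not reproduced here.

```python
from collections import Counter


def aa_composition(helices):
    """Return the fraction of each residue over all helix sequences."""
    counts = Counter("".join(helices))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def composition_difference(thermo, meso):
    """Per-residue fraction in the first set minus the fraction in the second."""
    t, m = aa_composition(thermo), aa_composition(meso)
    residues = sorted(set(t) | set(m))
    return {aa: t.get(aa, 0.0) - m.get(aa, 0.0) for aa in residues}
```

A positive value for a residue means it is relatively enriched in the first (here, thermophilic) set.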

2.
We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance, and show that it correctly predicts 97-98% of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99%, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to reliably predict integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30% of all genes in most genomes encode membrane proteins, which is in agreement with previous estimates. We further discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
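To make the hidden Markov model idea concrete, here is a minimal Viterbi decoding sketch in Python. It is a toy two-state model (membrane "M" versus loop "L") over a reduced two-letter alphabet ("h" hydrophobic, "p" polar). The states, alphabet, and all probabilities are assumptions chosen for illustration; they are not TMHMM's architecture or trained parameters.

```python
import math

# Toy model (hypothetical values, not taken from TMHMM).
STATES = ("M", "L")
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "L": {"h": 0.2, "p": 0.8}}


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for a non-empty sequence."""
    # score[s]: best log-probability of any path that ends in state s.
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # one dict of back-pointers per position after the first
    for symbol in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(trans[best][s])
                        + math.log(emit[s][symbol]))
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))
```

With these toy parameters, a run of four hydrophobic symbols flanked by polar ones is long enough to outweigh the cost of two state switches, so `viterbi("pphhhhpp")` labels the middle stretch as membrane.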

3.
More than 30 organisms have been sequenced entirely. Here, we applied a variety of simple bioinformatics tools to analyze 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes, and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar among the three. We predicted that approximately 15%-30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with seven transmembrane helices in eukaryotes and more with six and 12 transmembrane helices in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4%-5%), and we predicted approximately 15%-25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homolog in current databases, and 30%-40% of the proteins fell into structural families with >100 members. A classification by cellular function verified that eukaryotes have a higher proportion of proteins for communication with the environment. Finally, we found at least one homolog of experimentally known structure for approximately 20%-45% of all proteins; the regions with structural homology covered 20%-30% of all residues. These numbers may or may not suggest that there are 1200-2600 folds in the universe of protein structures. All predictions are available at http://cubic.bioc.columbia.edu/genomes.

4.
Transmembrane proteins make up at least one-fifth of the genome of most organisms and are critical components of key pathways for cell survival and interactions with the environment. The function of helices found at the membrane surface in transmembrane proteins has not been greatly explored, but it is likely that they play an ancillary role to membrane-spanning helices and are analogous to the surface-active helices of peripheral membrane proteins, being involved in lipid association, membrane perturbation, transmembrane signal transduction and regulation, and transmembrane helical bundle formation. Due to the difficulties in obtaining high-resolution structural data for this class of proteins, structure-from-sequence predictive methods continue to be developed as a means to obtain structural models for these largely intractable systems. A simple but effective variant of the hydrophobic moment analysis of amino acid sequences is described here as part of a protocol for distinguishing helical sequences that are parallel to or 'horizontal' at the membrane bilayer/aqueous phase interface from helices that are membrane-embedded or located in extra-membranous domains. This protocol, when tested on transmembrane protein amino acid sequences not used in its development, was found to be 84-91% accurate when the results were compared to the partition locations in the corresponding structures determined by X-ray crystallography, and 72% accurate in determining which helices lie horizontal or near horizontal at the lipid interface.
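For orientation, the classic hydrophobic moment that such variants build on can be sketched in a few lines of Python. This is the standard form (residue hydrophobicities summed as vectors rotated by a fixed angle per residue), not the authors' variant, which the abstract does not specify. The Kyte-Doolittle scale, the 100 degree angle for an alpha-helix, and the per-residue normalization are assumptions made for this sketch.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, delta_deg=100.0, scale=KYTE_DOOLITTLE):
    """Mean hydrophobic moment per residue for a helical segment."""
    if not seq:
        raise ValueError("sequence must not be empty")
    delta = math.radians(delta_deg)
    # Each residue contributes a vector of length H_n at angle n * delta.
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)
```

A uniformly hydrophobic helix gives a moment near zero because the vectors cancel around the helix axis, whereas a helix with one hydrophobic face and one polar face gives a large moment, which is the property that makes the measure useful for spotting interfacial helices.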

5.
MOTIVATION: An amphiphilicity index of amino acid residues was developed for improving the method of transmembrane helix prediction. RESULTS: The transfer energy of a hydrocarbon stem group beyond the gamma-carbon was calculated from the accessible surface area, and used to index the amphiphilicity of the residue. Non-zero amphiphilicity index values were obtained for lysine, arginine, histidine, glutamic acid, glutamine, tyrosine and tryptophan. Those residues were found to be abundant in the end regions of transmembrane helices, indicating their preference for the membrane-water interface. The moving average of the amphiphilicity index actually showed significant peaks in the end regions of most transmembrane helices. A dispersion diagram of average amphiphilicity index versus average hydrophobicity index was devised to facilitate discrimination of transmembrane helices. AVAILABILITY: The amphiphilicity index has been incorporated into a system, SOSUI, for the discrimination of membrane proteins and the prediction of transmembrane helical regions (http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html).
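The moving-average profile mentioned in this abstract can be sketched as follows in Python. The index values themselves are not given in the abstract, so the function takes them as a parameter; the window size of 7 and the function names are assumptions for illustration, not SOSUI's settings.

```python
def moving_average(values, window=7):
    """Centered moving average; windows are truncated at the sequence ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averaged = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def index_profile(seq, index, window=7):
    """Smoothed per-residue profile; residues missing from `index` count as 0."""
    return moving_average([index.get(aa, 0.0) for aa in seq], window)
```

Peaks in such a profile mark stretches where residues with non-zero index values cluster, which is how the abstract describes the end regions of transmembrane helices.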

6.
Genomics and proteomics have added valuable information to our knowledgebase of the human biological system including the discovery of therapeutic targets and disease biomarkers. However, molecular profiling studies commonly result in the identification of novel proteins of unknown localization. A class of proteins of special interest is membrane proteins, in particular plasma membrane proteins. Despite their biological and medical significance, the 3-dimensional structures of less than 1% of plasma membrane proteins have been determined. In order to aid in identification of membrane proteins, a number of computational methods have been developed. These tools operate by predicting the presence of transmembrane segments. Here, we utilized five topology prediction methods (TMHMM, SOSUI, waveTM, HMMTOP, and TopPred II) in order to estimate the ratio of integral membrane proteins in the human proteome. These methods employ different algorithms and include a newly developed method (waveTM) that has yet to be tested on a large proteome database. Since these tools are prone to error, mainly as a result of falsely predicting signal peptides as transmembrane segments, we have utilized an additional method, SignalP. Based on our analyses, the ratio of human proteins with transmembrane segments is estimated to fall between 15% and 39% with a consensus of 13%. Agreement among the programs is reduced further when both a positive identification of a membrane protein and the number of transmembrane segments per protein are considered. Such a broad range of prediction depends on the selectivity of the individual method in predicting integral membrane proteins. These methods can play a critical role in determining protein structure and, hence, identifying suitable drug targets in humans.
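One plausible reading of the per-method range and the lower consensus figure is that the consensus counts only proteins flagged by every method. Under that assumption, a minimal Python sketch looks like this; the function name and the input layout (a mapping from method name to the set of protein identifiers it flags) are hypothetical.

```python
def membrane_fractions(predictions, n_proteins):
    """Return (fraction flagged per method, fraction flagged by all methods)."""
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    sets = {name: set(ids) for name, ids in predictions.items()}
    per_method = {name: len(ids) / n_proteins for name, ids in sets.items()}
    # Consensus = intersection of all methods' positive calls.
    consensus = set.intersection(*sets.values()) if sets else set()
    return per_method, len(consensus) / n_proteins
```

Because an intersection can never be larger than its smallest member, a consensus value below the lowest single-method estimate is the expected outcome of this kind of comparison.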

7.
We have performed a comparative analysis of amino acid distributions in predicted integral membrane proteins from a total of 107 genomes. A procedure for identification of membrane spanning helices was optimized on a homology-reduced data set of 170 multi-spanning membrane proteins with experimentally determined topologies. The optimized method was then used for extraction of highly reliable partial topologies from all predicted membrane proteins in each genome, and the average biases in amino acid distributions between loops on opposite sides of the membrane were calculated. The results strongly support the notion that a biased distribution of Lys and Arg residues between cytoplasmic and extra-cytoplasmic segments (the positive-inside rule) is present in most if not all organisms.
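The bias behind the positive-inside rule can be illustrated with a short Python sketch that compares the Lys+Arg fraction of cytoplasmic loops with that of extra-cytoplasmic loops. The function names are hypothetical, and this simple difference of fractions is an assumed stand-in for the paper's bias measure, which the abstract does not define.

```python
def kr_fraction(segments):
    """Fraction of residues that are Lys (K) or Arg (R) across all segments."""
    residues = "".join(segments)
    if not residues:
        return 0.0
    return (residues.count("K") + residues.count("R")) / len(residues)


def positive_inside_bias(inside_loops, outside_loops):
    """Lys+Arg fraction of cytoplasmic loops minus that of extra-cytoplasmic loops."""
    return kr_fraction(inside_loops) - kr_fraction(outside_loops)
```

A positive value is what the positive-inside rule predicts: more positively charged residues on the cytoplasmic side.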

8.
Seventy integral membrane proteins from the Mycobacterium tuberculosis genome have been cloned and expressed in Escherichia coli. A combination of T7 promoter-based vectors with hexa-His affinity tags and BL21 E. coli strains with additional tRNA genes to supplement sparsely used E. coli codons has been most successful. The expressed proteins have a wide range of molecular weights and numbers of transmembrane helices. Expression of these proteins has been observed in the membrane and insoluble fraction of E. coli cell lysates and, in some cases, in the soluble fraction. The highest expression levels in the membrane fraction were restricted to a narrow range of molecular weights and relatively few transmembrane helices. In contrast, overexpression in insoluble aggregates was distributed over a broad range of molecular weights and numbers of transmembrane helices.

9.
We have carried out detailed statistical analyses of integral membrane proteins of the helix-bundle class from eubacterial, archaean, and eukaryotic organisms for which genome-wide sequence data are available. Twenty to 30% of all ORFs are predicted to encode membrane proteins, with the larger genomes containing a higher fraction than the smaller ones. Although there is a general tendency that proteins with a smaller number of transmembrane segments are more prevalent than those with many, unicellular organisms appear to prefer proteins with 6 and 12 transmembrane segments, whereas Caenorhabditis elegans and Homo sapiens have a slight preference for proteins with seven transmembrane segments. In all organisms, there is a tendency that membrane proteins either have many transmembrane segments with short connecting loops or few transmembrane segments with large extra-membranous domains. Membrane proteins from all organisms studied, except possibly the archaeon Methanococcus jannaschii, follow the so-called "positive-inside" rule; i.e., they tend to have a higher frequency of positively charged residues in cytoplasmic than in extra-cytoplasmic segments.

10.
MOTIVATION: Knowledge of the transmembrane helical topology can help identify binding sites and infer functions for membrane proteins. However, because membrane proteins are hard to solubilize and purify, only a very small number of membrane proteins have experimentally determined structures and topologies. This has motivated various computational methods for predicting the topology of membrane proteins. RESULTS: We present an improved hidden Markov model, TMMOD, for the identification and topology prediction of transmembrane proteins. Our model uses TMHMM as a prototype, but differs from TMHMM in the architecture of the submodels for loops on both sides of the membrane and also in the model training procedure. In cross-validation experiments using a set of 83 transmembrane proteins with known topology, TMMOD outperformed TMHMM and other existing methods, with an accuracy of 89% for both topology and locations. In another experiment using a separate set of 160 transmembrane proteins, TMMOD achieved accuracies of 84% for topology and 89% for locations. When used to distinguish transmembrane proteins from non-transmembrane proteins, particularly those with signal peptides, TMMOD has consistently fewer false positives than TMHMM does. Application of TMMOD to a collection of complete genomes shows that the number of predicted membrane proteins accounts for approximately 20-30% of all genes in those genomes, and that the topology where both the N- and C-termini are in the cytoplasm is dominant in these organisms except for Caenorhabditis elegans. AVAILABILITY: http://liao.cis.udel.edu/website/servers/TMMOD/

11.
We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in some respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g., analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into 'fold families.' This library can be built up automatically using a structure comparison program, and we describe how important objective statistical measures are for assessing similarities within the library and between the library and genome sequences. After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and 'top-10' statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms, i.e., in different kingdoms, have a remarkably similar structure, being comprised of repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of 'fold-counting' is that only a small subset of the structures in a complete genome are currently known and this subset is prone to sampling bias.
One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have been consistent: microbial genomes have similar fractions of strands and helices even though they have significantly different amino acid composition. The fraction of membrane proteins with a given number of TM helices falls off rapidly with more TM elements, approximately according to a Zipf law. This latter finding indicates that there is no preference for the highly studied 7-TM proteins in microbial genomes. Continuously updated tables and further information pertinent to this review are available over the web at http://bioinfo.mbb.yale.edu/genome.

12.
Using the data on proteins encoded in complete genomes, combined with a rigorous theory of the sampling process, we estimate the total number of protein folds and families, as well as the number of folds and families in each genome. The total number of folds in globular, water-soluble proteins is estimated at about 1000, with structural information currently available for about one-third of that number. The sequenced genomes of unicellular organisms encode from approximately 25%, for the minimal genomes of the Mycoplasmas, to 70-80% for larger genomes, such as Escherichia coli and yeast, of the total number of folds. The number of protein families with significant sequence conservation was estimated to be between 4000 and 7000, with structures available for about 20% of these.

13.
Alongside the well-studied membrane spanning helices, alpha-helical transmembrane (TM) proteins contain several functionally and structurally important types of substructures. Here, existing 3D structures of transmembrane proteins have been used to define and study the concept of reentrant regions, i.e. membrane penetrating regions that enter and exit the membrane on the same side. We find that these regions can be divided into three distinct categories based on secondary structure motifs, namely long regions with a helix-coil-helix motif, regions of medium length with the structure helix-coil or coil-helix and regions of short to medium length consisting entirely of irregular secondary structure. The residues situated in reentrant regions are significantly smaller on average compared to other regions and reentrant regions can be detected in the inter-transmembrane loops with an accuracy of approximately 70% based on their amino acid composition. Using TOP-MOD, a novel method for predicting reentrant regions, we have scanned the genomes of Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The results suggest that more than 10% of transmembrane proteins contain reentrant regions and that the occurrence of reentrant regions increases linearly with the number of transmembrane regions. Reentrant regions seem to be most commonly found in channel proteins and least commonly in signal receptors.

14.
As a whole, integral membrane proteins represent about one third of sequenced genomes, and more than 50% of currently available drugs target membrane proteins, often cell surface receptors. Some membrane protein classes, with a defined number of transmembrane (TM) helices, are receiving much attention because of their great functional and pharmacological importance, such as G protein-coupled receptors possessing 7 TM segments. Although they represent roughly half of all membrane proteins, bitopic proteins (with only 1 TM helix) have so far been less well characterized, even though they include many essential families of receptors, such as adhesion molecules and receptor tyrosine kinases, many of which are excellent targets for biopharmaceuticals (peptides, antibodies, etc.). A growing body of evidence suggests a major role for interactions between TM domains of these receptors in signaling, through homo- and heteromeric associations, conformational changes, assembly of signaling platforms, etc. Significantly, mutations within single domains are frequent in human disease, such as cancer or developmental disorders. This review attempts to give an overview of current knowledge about these interactions, from structural data to therapeutic perspectives, focusing on bitopic proteins involved in cell signaling.
Key words: bitopic membrane proteins, transmembrane domains, transmembrane signaling, helix-helix interactions, receptors

15.
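Wavelet-based sequence analysis and prediction of transmembrane segments in membrane proteins   Total citations: 2, self-citations: 0, citations by others: 2
Membrane proteins are a structurally distinctive class of proteins that occur in all kinds of cells and perform important physiological functions. At present, the structures of only a few membrane proteins have been determined experimentally, so computational prediction of membrane protein structure is one of the main topics in protein structure prediction. Membrane proteins generally form conserved transmembrane helices in the membrane and have pronounced sequence features, which makes them well suited to locating transmembrane helical segments by prediction. Researchers internationally have attempted such predictions using artificial neural networks, multiple sequence alignment, and statistical methods, with some success. For the ... in the protein sequence database, we ...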

16.
Transmembrane helices predicted at 95% accuracy.   Total citations: 27, self-citations: 1, citations by others: 27
We describe a neural network system that predicts the locations of transmembrane helices in integral membrane proteins. By using evolutionary information as input to the network system, the method significantly improved on a previously published neural network prediction method that had been based on single sequence information. The input data were derived from multiple alignments for each position in a window of 13 adjacent residues: amino acid frequency, conservation weights, number of insertions and deletions, and position of the window with respect to the ends of the protein chain. Additional input was the amino acid composition and length of the whole protein. A rigorous cross-validation test on 69 proteins with experimentally determined locations of transmembrane segments yielded an overall two-state per-residue accuracy of 95%. About 94% of all segments were predicted correctly. When applied to known globular proteins as a negative control, the network system incorrectly predicted fewer than 5% of globular proteins as having transmembrane helices. The method was applied to all 269 open reading frames from the complete yeast VIII chromosome. For 59 of these, at least two transmembrane helices were predicted. Thus, the prediction is that about one-fourth of all proteins from yeast VIII contain one transmembrane helix, and some 20%, more than one.
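The "two-state per-residue accuracy" quoted in this abstract is the fraction of residues whose predicted state (helix or not) matches the observed state. A minimal Python sketch, assuming predictions and observations are given as equal-length strings with one label per residue (the "H"/"L" encoding here is hypothetical):

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

Note that this measure can look high even when whole segments are missed, because most residues in a typical protein are not in transmembrane helices; this is presumably why the abstract reports per-segment accuracy separately.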

17.
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ~1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/.

18.
Previously, we introduced a neural network system predicting locations of transmembrane helices (HTMs) based on evolutionary profiles (PHDhtm, Rost B, Casadio R, Fariselli P, Sander C, 1995, Protein Sci 4:521-533). Here, we describe an improvement and an extension of that system. The improvement is achieved by a dynamic programming-like algorithm that optimizes helices compatible with the neural network output. The extension is the prediction of topology (orientation of first loop region with respect to membrane) by applying to the refined prediction the observation that positively charged residues are more abundant in cytoplasmic than in extra-cytoplasmic regions. Furthermore, we introduce a method to reduce the number of false positives, i.e., proteins falsely predicted with membrane helices. The evaluation of prediction accuracy is based on a cross-validation and a double-blind test set (in total 131 proteins). The final method appears to be more accurate than other methods published: (1) For almost 89% (+/-3%) of the test proteins, all HTMs are predicted correctly. (2) For more than 86% (+/-3%) of the proteins, topology is predicted correctly. (3) We define reliability indices that correlate with prediction accuracy: for one half of the proteins, segment accuracy rises to 98%; and for two-thirds, accuracy of topology prediction is 95%. (4) The rate of proteins for which HTMs are predicted falsely is below 2% (+/-1%). Finally, the method is applied to 1,616 sequences of Haemophilus influenzae. We predict 19% of the genome sequences to contain one or more HTMs. This appears to be lower than what we predicted previously for the yeast VIII chromosome (about 25%).

19.
Adamian L, Liang J. Proteins 2006, 63(1):1-5
Analysis of a database of structures of membrane proteins shows that membrane proteins composed of 10 or more transmembrane (TM) helices often contain buried helices that are inaccessible to phospholipids. We introduce a method for identifying TM helices that are least phospholipid accessible and for prediction of fully buried TM helices in membrane proteins from sequence information alone. Our method is based on the calculation of residue lipophilicity and evolutionary conservation. Given that the number of buried helices in a membrane protein is known, our method achieves an accuracy of 78% and a Matthews correlation coefficient of 0.68. A server for this tool (RANTS) is available online at http://gila.bioengr.uic.edu/lab/.
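For readers unfamiliar with the Matthews correlation coefficient reported here, it is computed from the four cells of a binary confusion matrix and ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). A minimal Python sketch follows; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined case; 0.0 by convention
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, the coefficient stays informative when the two classes are unbalanced, as they are when only a few helices per protein are buried.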

20.
Liu Y, Engelman DM, Gerstein M. Genome Biology 2002, 3(10):research0054.1-research0054.12

Background

Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families.

Results

Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; we now find these motifs are particularly well conserved in families, however, especially those corresponding to transporters, symporters, and channels.

Conclusions

We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families.
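The GxxxG (GG4) and GxxxxxxG (GG7) motifs discussed in the Results can be located in a helix sequence with a short regular expression. The Python sketch below is only an illustration of the motif definition, with a hypothetical function name; it does not reproduce the paper's conservation analysis. A lookahead is used so that overlapping occurrences are all reported.

```python
import re


def find_gg_motif(seq, spacing):
    """Return 0-based start positions of glycine pairs `spacing` residues apart.

    spacing=4 matches GxxxG (GG4); spacing=7 matches GxxxxxxG (GG7).
    """
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    # Zero-width lookahead so that overlapping motifs are not skipped.
    pattern = re.compile(r"(?=G.{%d}G)" % (spacing - 1))
    return [match.start() for match in pattern.finditer(seq)]
```

For example, `find_gg_motif("GAAAGAAAG", 4)` reports both overlapping GxxxG occurrences, at positions 0 and 4.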
