Low-resolution experiments suggest that most membrane helices span over 17-25 residues and that most loops between two helices are longer than 15 residues. Both constraints have been used explicitly in the development of prediction methods. Here, we compared the largest possible sequence-unique data sets from high- and low-resolution experiments. For the high-resolution data, we found that only half of the helices fall into the expected length interval and that half of the loops were shorter than 10 residues. We compared the accuracy of detecting short loops and long helices for 28 advanced and simple prediction methods: All methods predicted short loops less accurately than longer ones. In particular, loops shorter than 7 residues appeared to be very difficult to detect by current methods. Similarly, all methods tended to be more accurate for longer than for shorter helices. However, helices with more than 32 residues were predicted less accurately than all other helices. Our findings may suggest particular strategies for improving predictions of membrane helices.

Experimental structure determination continues to be challenging for membrane proteins. Computational prediction methods are therefore needed and widely used to supplement experimental data. Here, we re-examined the state of the art in transmembrane helix prediction based on a nonredundant dataset with 190 high-resolution structures. Analyzing 12 widely-used and well-known methods using a stringent performance measure, we largely confirmed the expected high level of performance. On the other hand, all methods performed worse for proteins that could not have been used for development. A few results stood out: First, all methods predicted proteins in eukaryotes better than those in bacteria. Second, methods worked less well for proteins with many transmembrane helices. Third, most methods correctly discriminated between soluble and transmembrane proteins. However, several older methods often mistook signal peptides for transmembrane helices. Some newer methods have overcome this shortcoming. In our hands, PolyPhobius and MEMSAT-SVM outperformed other methods. Proteins 2015; 83:473–484. © 2014 Wiley Periodicals, Inc.

A technique for prediction of protein membrane topology (intra- and extracellular sidedness) has been developed. Membrane-spanning segments are first predicted using an algorithm based upon multiply aligned amino acid sequences. The compositional differences in the protein segments exposed at each side of the membrane are then investigated. The ratios are calculated for Asn, Asp, Gly, Phe, Pro, Trp, Tyr, and Val, mostly found on the extracellular side, and for Ala, Arg, Cys, and Lys, mostly occurring on the intracellular side. The consensus over these 12 residue distributions is used for sidedness prediction. The method was developed with a set of 42 protein families, of which all but one were correctly predicted with the new algorithm. This represents an improvement over previous techniques. The new method, applied to a set of 12 membrane protein families different from the test set and with recently determined topologies, performed well, with 11 of 12 sidedness assignments agreeing with experimental results. The method has also been applied to several membrane protein families for which the topology has yet to be determined. An electronic prediction service is available at the E-mail address tmap@embl-heidelberg.de and on WWW via http://www.emblheidelberg.de.

Zpred2 is an improved version of ZPRED, a predictor for the Z-coordinates of alpha-helical membrane proteins, that is, the distance of the residues from the center of the membrane. Using principal component analysis and a set of neural networks, Zpred2 analyzes data extracted from the amino acid sequence, the predicted topology, and evolutionary profiles. Zpred2 achieves an average error of 2.18 Å (2.17 Å when an independent test set is used), an improvement of 15% compared to the previous version. We show that this accuracy is sufficient to enable the prediction of helix lengths with a correlation coefficient of 0.41. As a comparison, two state-of-the-art HMM-based topology prediction methods manage to predict the helix lengths with a correlation coefficient of less than 0.1. In addition, we applied Zpred2 to two other problems, re-entrant region identification and model validation. Re-entrant regions could be detected with a certain consistency, but not better than with previous approaches, while incorrect models as well as mispredicted helices of transmembrane proteins could be distinguished based on the Z-coordinate predictions.

Kernytsky A, Rost B. Proteins 2009; 75(1):75-88
Many important characteristics of proteins such as biochemical activity and subcellular localization present a challenge to machine-learning methods: it is often difficult to encode the appropriate input features at the residue level for the purpose of making a prediction for the entire protein. The problem is usually that the biophysics of the connection between a machine-learning method's input (sequence feature) and its output (observed phenomenon to be predicted) remains unknown; in other words, we may only know that a certain protein is an enzyme (output) without knowing which region may contain the active site residues (input). The goal then becomes to dissect a protein into a vast set of sequence-derived features and to correlate those features with the desired output. We introduce a framework that begins with a set of global sequence features and then vastly expands the feature space by generically encoding the coexistence of residue-based features. It is this combination of individual features, that is, the step from the fractions of serine and buried residues (input space 20 + 2) to the fraction of buried serine (input space 20 * 2), that implicitly shifts the search space from global feature inputs to features that can capture very local evidence, such as the individual residues of a catalytic triad. The vast feature space created is explored by a genetic algorithm (GA) paired with neural networks and support vector machines. We find that the GA is critical for selecting combinations of features that are neither too general, resulting in poor performance, nor too specific, leading to overtraining. The final framework manages to effectively sample a feature space that is far too large for exhaustive enumeration. We demonstrate the power of the concept by applying it to prediction of protein enzymatic activity.
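The step from separate fractions (20 + 2 inputs) to co-occurrence fractions (20 * 2 inputs) described in this abstract can be illustrated with a minimal sketch. The two-state buried/exposed labels, the function names, and the dictionary layout are assumptions made here for illustration; the abstract does not specify how the authors encode their features.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("buried", "exposed")  # hypothetical two-state per-residue annotation


def global_features(sequence, states):
    """Separate fractions: 20 residue types + 2 states = 22 features."""
    n = len(sequence)
    aa_counts = Counter(sequence)
    state_counts = Counter(states)
    features = {aa: aa_counts[aa] / n for aa in AMINO_ACIDS}
    features.update({s: state_counts[s] / n for s in STATES})
    return features


def cooccurrence_features(sequence, states):
    """Joint fractions: 20 residue types x 2 states = 40 features."""
    n = len(sequence)
    pair_counts = Counter(zip(sequence, states))
    return {
        (aa, s): pair_counts[(aa, s)] / n
        for aa in AMINO_ACIDS
        for s in STATES
    }


if __name__ == "__main__":
    seq = "SASA"
    annotation = ["buried", "exposed", "exposed", "exposed"]
    print(global_features(seq, annotation)["S"])                     # fraction of serine
    print(cooccurrence_features(seq, annotation)[("S", "buried")])   # fraction of buried serine
```

The joint encoding distinguishes a protein whose serines are buried from one whose serines are exposed, which the separate fractions cannot do.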

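Evaluation of methods for predicting transmembrane regions of membrane proteins
The large number of protein sequences produced by genome projects creates an urgent need for theoretical prediction of transmembrane regions. Evaluating the existing transmembrane region prediction methods can both help biologists choose a suitable method and guide bioinformaticians in developing new algorithms. Using the latest membrane protein database as the basic test set and water-soluble protein sequences as a control group, we evaluated and analyzed the published transmembrane region prediction methods that offer online services. In this comparison, HMMTOP showed the best overall prediction performance among all methods.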

Transmembrane helices predicted at 95% accuracy.
We describe a neural network system that predicts the locations of transmembrane helices in integral membrane proteins. By using evolutionary information as input to the network system, the method significantly improved on a previously published neural network prediction method that had been based on single sequence information. The input data were derived from multiple alignments for each position in a window of 13 adjacent residues: amino acid frequency, conservation weights, number of insertions and deletions, and position of the window with respect to the ends of the protein chain. Additional input was the amino acid composition and length of the whole protein. A rigorous cross-validation test on 69 proteins with experimentally determined locations of transmembrane segments yielded an overall two-state per-residue accuracy of 95%. About 94% of all segments were predicted correctly. When applied to known globular proteins as a negative control, the network system incorrectly predicted fewer than 5% of globular proteins as having transmembrane helices. The method was applied to all 269 open reading frames from the complete yeast VIII chromosome. For 59 of these, at least two transmembrane helices were predicted. Thus, the prediction is that about one-fourth of all proteins from yeast VIII contain one transmembrane helix, and some 20%, more than one.
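The two-state per-residue accuracy quoted in this abstract is the fraction of residues whose predicted state matches the observed state. A minimal sketch follows; the single-letter encoding ("H" for transmembrane helix, "L" for everything else) and the function name are assumptions for illustration, not the authors' evaluation code.

```python
def per_residue_accuracy(observed, predicted):
    """Fraction of positions where the predicted state equals the observed state.

    Both arguments are equal-length strings over a two-letter alphabet,
    here assumed to be "H" (transmembrane helix) and "L" (not helix).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


if __name__ == "__main__":
    # One helix end is predicted one residue too short: 7 of 8 residues agree.
    print(per_residue_accuracy("HHHHLLLL", "HHHLLLLL"))
```

Per-residue scores of this kind can stay high even when a whole helix is missed, which is presumably why the abstract also reports a per-segment figure.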

We have performed a comparative analysis of amino acid distributions in predicted integral membrane proteins from a total of 107 genomes. A procedure for identification of membrane spanning helices was optimized on a homology-reduced data set of 170 multi-spanning membrane proteins with experimentally determined topologies. The optimized method was then used for extraction of highly reliable partial topologies from all predicted membrane proteins in each genome, and the average biases in amino acid distributions between loops on opposite sides of the membrane were calculated. The results strongly support the notion that a biased distribution of Lys and Arg residues between cytoplasmic and extra-cytoplasmic segments (the positive-inside rule) is present in most if not all organisms.
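The Lys/Arg bias behind the positive-inside rule can be measured with a short calculation once loops have been assigned to a side of the membrane. The sketch below assumes a hypothetical loop annotation of the form (start, end, side) with 0-based, half-open coordinates; it is not the procedure used in the study, which the abstract does not describe in detail.

```python
def positive_charge_fraction(sequence, loops):
    """Fraction of Lys + Arg residues in loops on each side of the membrane.

    loops: iterable of (start, end, side) tuples, 0-based and half-open,
    with side equal to "in" (cytoplasmic) or "out" (extra-cytoplasmic).
    """
    totals = {"in": [0, 0], "out": [0, 0]}  # [K+R count, residue count]
    for start, end, side in loops:
        segment = sequence[start:end]
        totals[side][0] += segment.count("K") + segment.count("R")
        totals[side][1] += len(segment)
    return {
        side: (kr / n if n else 0.0)
        for side, (kr, n) in totals.items()
    }


if __name__ == "__main__":
    # Toy protein: a 4-residue inside loop, a 20-residue helix, a 4-residue outside loop.
    seq = "KRKA" + "L" * 20 + "DSEG"
    print(positive_charge_fraction(seq, [(0, 4, "in"), (24, 28, "out")]))
```

A higher value for "in" than for "out" is the pattern the positive-inside rule predicts.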

Transmembrane proteins (TMPs) are important drug targets because they are essential for signaling, regulation, and transport. Despite important breakthroughs, experimental structure determination remains challenging for TMPs. Various methods have bridged the gap by predicting transmembrane helices (TMHs), but room for improvement remains. Here, we present TMSEG, a novel method identifying TMPs and accurately predicting their TMHs and their topology. The method combines machine learning with empirical filters. Testing it on a non-redundant dataset of 41 TMPs and 285 soluble proteins, and applying strict performance measures, TMSEG outperformed the state-of-the-art in our hands. TMSEG correctly distinguished helical TMPs from other proteins with a sensitivity of 98 ± 2% and a false positive rate as low as 3 ± 1%. Individual TMHs were predicted with a precision of 87 ± 3% and recall of 84 ± 3%. Furthermore, in 63 ± 6% of helical TMPs the placement of all TMHs and their inside/outside topology was correctly predicted. There are two main features that distinguish TMSEG from other methods. First, the errors in finding all helical TMPs in an organism are significantly reduced. For example, in human this leads to 200 and 1600 fewer misclassifications compared to the second and third best method available, and 4400 fewer mistakes than by a simple hydrophobicity-based method. Second, TMSEG provides an add-on improvement for any existing method to benefit from. Proteins 2016; 84:1706–1716. © 2016 Wiley Periodicals, Inc.

We report a comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in 235 high-resolution structures of integral membrane proteins. The properties of 1551 transmembrane helices in the structures were compared with those obtained by analysis of the same amino acid sequences using topology prediction tools. Explanations for the 81 (5.2%) missing or additional transmembrane helices in the prediction results were identified. Main reasons for missing transmembrane helices were mis-identification of N-terminal signal peptides, breaks in α-helix conformation or charged residues in the middle of transmembrane helices and transmembrane helices with unusual amino acid composition. The main reason for additional transmembrane helices was mis-identification of amphipathic helices, extramembrane helices or hairpin re-entrant loops. Transmembrane helix length had an overall median of 24 residues and an average of 24.9 ± 7.0 residues and the most common length was 23 residues. The overall content of residues in transmembrane helices as a percentage of the full proteins had a median of 56.8% and an average of 55.7 ± 16.0%. Amino acid composition was analysed for the full proteins, transmembrane helices and extramembrane regions. Individual proteins or types of proteins with transmembrane helices containing extremes in contents of individual amino acids or combinations of amino acids with similar physicochemical properties were identified and linked to structure and/or function. In addition to overall median and average values, all results were analysed for proteins originating from different types of organism (prokaryotic, eukaryotic, viral) and for subgroups of receptors, channels, transporters and others.

Hydropathy plot methods form a cornerstone of membrane protein research, especially in the early stages of biochemical and structural characterization. Membrane Protein Explorer (MPEx), described in this article, is a refined and versatile hydropathy-plot software tool for analyzing membrane protein sequences. MPEx is highly interactive and facilitates the characterization and identification of favorable protein transmembrane regions using experiment-based physical and biological hydrophobicity scales. Besides allowing the consequences of sequence mutations to be examined, it provides tools for aiding the design of membrane-active peptides. MPEx is freely available as a Java Web Start application from our web site at http://blanco.biomol.uci.edu/mpex .

The Profiles-3D application, an inverse-folding methodology appropriate for water-soluble proteins, has been modified to allow the determination of structural properties of integral-membrane proteins (IMPs) and for testing the validity of solved and model structures of IMPs. The modification, known as reverse-environment prediction of integral membrane protein structure (REPIMPS), takes into account the fact that exposed areas of side chains for many residues in IMPs are in contact with lipid and not the aqueous phase. This (1) allows lipid-exposed residues to be classified into the correct physicochemical environment class, (2) significantly improves compatibility scores for IMPs whose structures have been solved, and (3) reduces the possibility of rejecting a three-dimensional structure for an IMP because the presence of lipid was not included. Validation tests of REPIMPS showed that it (1) can locate the transmembrane domain of IMPs with single transmembrane helices more frequently than a range of other methodologies, (2) can rotationally orient transmembrane helices with respect to the lipid environment and surrounding helices in IMPs with multiple transmembrane helices, and (3) has the potential to accurately locate transmembrane domains in IMPs with multiple transmembrane helices. We conclude that correcting for the presence of the lipid environment surrounding the transmembrane segments of IMPs is an essential step for reasonable modeling and verification of the three-dimensional structures of these proteins.


Schlessinger A, Rost B. Proteins 2005; 61(1):115-126
Structural flexibility has been associated with various biological processes such as molecular recognition and catalytic activity. In silico studies of protein flexibility have attempted to characterize and predict flexible regions based on simple principles. B-values derived from experimental data are widely used to measure residue flexibility. Here, we present the most comprehensive large-scale analysis of B-values. We used this analysis to develop a neural network-based method that predicts flexible-rigid residues from amino acid sequence. The system uses both global and local information (i.e., features from the entire protein such as secondary structure composition, protein length, and fraction of surface residues, and features from a local window of sequence-consecutive residues). The most important local feature was the evolutionary exchange profile reflecting sequence conservation in a family of related proteins. To illustrate its potential, we applied our method to 4 different case studies, each of which related our predictions to aspects of function. The first 2 were the prediction of regions that undergo conformational switches upon environmental changes (switch II region in Ras) and the prediction of surface regions, the rigidity of which is crucial for their function (tunnel in propeller folds). Both were correctly captured by our method. The third study established that residues in active sites of enzymes are predicted by our method to have unexpectedly low B-values. The final study demonstrated how well our predictions correlated with NMR order parameters to reflect motion. Our method had not been set up to address any of the tasks in those 4 case studies. Therefore, we expect that this method will assist in many attempts at inferring aspects of function.

Lee S, Lee BC, Kim D. Proteins 2006; 62(4):1107-1114
Knowing a protein's structure and inferring its function from that structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Previous attempts assumed that the secondary structure content of a protein can be predicted successfully using information on its amino acid composition. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the method using support vector regression is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
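The homolog-averaged amino acid composition mentioned in this abstract reduces each protein to 20 numbers. The sketch below shows one plausible reading: compute the composition of each homolog and average position-wise. It assumes the homologous sequences are already available as plain strings (the PSI-BLAST step and the regression models are omitted), and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20 amino acid fractions of one sequence, in AMINO_ACIDS order.

    Characters outside the 20 standard residues (gaps, X) are ignored.
    """
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def homolog_averaged_composition(sequences):
    """Average the per-sequence compositions over a set of homologs."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    compositions = [composition(s) for s in sequences]
    return [sum(column) / len(compositions) for column in zip(*compositions)]


if __name__ == "__main__":
    averaged = homolog_averaged_composition(["AAAA", "AACC"])
    print(dict(zip(AMINO_ACIDS, averaged))["A"])
```

The resulting 20-element vector is the kind of input that a regression model could then map to secondary structure content.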

Choi S, Jeon J, Yang JS, Kim S. Proteins 2008; 71(1):68-80
Symmetry plays significant roles in protein structure and function. Particularly, symmetric interfaces are known to act as switches for two-state conformational change. Membrane proteins often undergo two-state conformational change during the transport process of ion channels or the active/inactive transitions in receptors. Here, we provide the first comprehensive analyses of internal repeat symmetry in membrane proteins. We examined the known membrane protein structures and found that, remarkably, nearly half of them have internal repeat symmetry. Moreover, we found that the conserved cores of these internal repeats are positioned at the interface of symmetric units when they are mapped on structures. Because of the large sequence divergence that occurs between internal repeats, the inherent symmetry present in protein sequences often has only been detected after structure determination. We therefore developed a sensitive procedure to predict the internal repeat symmetry from sequence information and identified 4653 proteins that are likely to have internal repeat symmetry.

Summary: Hydropathy plots of amino acid sequences reveal the approximate locations of the transbilayer helices of membrane proteins of known structure and are thus used to predict the helices of proteins of unknown structure. Because the three-dimensional structures of membrane proteins are difficult to obtain, it is important to be able to extract as much information as possible from hydropathy plots. We describe an augmented hydropathy plot analysis of the three membrane proteins of known structure, which should be useful for the systematic examination and comparison of membrane proteins of unknown structure. The sliding-window analysis utilizes the floating interfacial hydrophobicity scale [IFH(h)] of Jacobs and White (Jacobs, R.E., White, S.H., 1989. Biochemistry 28:3421–3437) and the reverse-turn (RT) frequencies of Levitt (Levitt, M., 1977, Biochemistry 17:4277–4285). The IFH(h) scale allows one to examine the consequences of different assumptions about the average hydrogen bond status (h=0 to 1) of polar side chains. Hydrophobicity plots of the three proteins show that (i) the intracellular helix-connecting links and chain ends can be distinguished from the extracellular ones and (ii) the main peaks of hydrophobicity are bounded by minor ones which bracket the helix ends. RT frequency plots show that (iii) the centers of helices are usually very close to wide-window minima of average RT frequency and (iv) helices are always bounded by narrow-window maxima of average RT frequency. The analysis suggests that side-chain hydrogen bonding with membrane components during folding may play a key role in insertion.
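A sliding-window hydropathy profile of the kind described here averages a per-residue hydrophobicity value over a fixed window at every position. The sketch below is only an illustration of that general technique: it substitutes the widely used Kyte-Doolittle scale because the IFH(h) values are not given in the abstract, and the 19-residue default window is an assumption rather than a parameter taken from the paper.

```python
# Kyte-Doolittle hydropathy values, used here in place of the IFH(h) scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19, scale=KYTE_DOOLITTLE):
    """Return the mean scale value of every full window along the sequence.

    The result has len(sequence) - window + 1 entries; entry i covers
    residues i .. i + window - 1. An empty list is returned when the
    sequence is shorter than the window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    values = [scale[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Toy sequence: a polar stretch, a hydrophobic stretch, a polar stretch.
    toy = "KDESK" + "LIVLALLVIALLIVLAILV" + "KDESK"
    profile = hydropathy_profile(toy)
    peak = max(range(len(profile)), key=profile.__getitem__)
    print(peak, round(profile[peak], 2))
```

Peaks in the profile mark candidate membrane-spanning segments; the same function can be run with a narrower window or a different scale dictionary, for example a reverse-turn frequency table.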

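cDNA sequence analysis and higher-order protein structure prediction of alkaline phosphatase from Japanese flounder
To study the role of alkaline phosphatase (EC; alkaline phosphatase, ALP) in the development and metamorphosis of the Japanese flounder (Paralichthys olivaceus), the full-length cDNA of the flounder ALP gene was cloned by RACE, and bioinformatics was used to analyze the nucleotide sequence and predict the protein structure. The results show that the flounder ALP cDNA is 1,811 bp long and encodes a protein of 476 amino acids with a molecular weight of 52,293.1 and an isoelectric point of 7.67. The GC content of the coding region varies considerably among ALP homologs and is markedly higher in vertebrates than in invertebrates and bacteria. Molecular phylogenetic analysis shows that flounder ALP is highly homologous to the tissue-nonspecific ALP of the green spotted puffer (Tetraodon nigroviridis) and zebrafish (Danio rerio), and that the molecular phylogenetic tree agrees with the species tree. Several important functional sites in the protein sequence, including metal ion binding sites, N-glycosylation sites, and serine phosphorylation sites, are highly conserved. Flounder ALP and human placental ALP (PALP) share 43% similarity in protein sequence, and their 3D structures are very close. A comparison of amino acid spatial positions shows that cysteines 141 and 203 of flounder ALP correspond to cysteines 121 and 183 of human PALP and presumably can form a disulfide bond. In the active centers of the two enzymes, the amino acid residues that bind the three metal ions are highly conserved: 2 of the 9 amino acids around the Zn ions differ, and only 2 of the 7 amino acids around the Mg ion differ, including a similar pair, serine 155 and threonine 175.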
