Similar literature
20 similar articles found (search time: 15 ms)
1.
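Membrane proteins are a structurally distinctive class of proteins and form the material basis for many cellular functions. According to how they are associated with the cell membrane, they are mainly divided into six types. In this work, a reduced (compressed) amino acid alphabet is used to compress the information in the original membrane protein sequences; amino acid composition and sequence-order features are then extracted from the compressed sequences, and a classification model is built with a support vector machine. Five-fold cross-validation shows that, for the six membrane protein types, the method reaches a highest prediction accuracy of more than 98% and an average prediction accuracy above 85%. It can therefore effectively separate the six types of membrane proteins and lays a foundation for further analysis of membrane protein structure and function.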
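
The compression and feature-extraction steps described in this abstract can be sketched roughly as follows. The abstract does not say which reduced alphabet the authors used, so the six-group scheme `GROUPS` below is a hypothetical choice, and the function names are illustrative only. The resulting vector is the kind of input that could then be passed to a support vector machine.

```python
from collections import Counter

# Hypothetical 6-group reduced alphabet (an assumption; the abstract does not
# specify the grouping). Each key is the symbol used for the whole group.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / sulfur-containing
    "R": "FWYH",    # aromatic
    "P": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine / proline
}
TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}
ALPHABET = sorted(GROUPS)


def compress(seq):
    """Rewrite a protein sequence in the reduced alphabet (unknown letters are dropped)."""
    return "".join(TO_GROUP[aa] for aa in seq.upper() if aa in TO_GROUP)


def composition(reduced):
    """Fraction of each reduced-alphabet symbol in the compressed sequence."""
    counts = Counter(reduced)
    n = len(reduced) or 1
    return [counts[g] / n for g in ALPHABET]


def dipeptide_frequencies(reduced):
    """Frequencies of adjacent symbol pairs, a simple sequence-order feature."""
    pairs = Counter(reduced[i:i + 2] for i in range(len(reduced) - 1))
    n = max(len(reduced) - 1, 1)
    return [pairs[a + b] / n for a in ALPHABET for b in ALPHABET]


def features(seq):
    """Composition (6 values) followed by pair frequencies (36 values)."""
    reduced = compress(seq)
    return composition(reduced) + dipeptide_frequencies(reduced)
```

With six groups the vector has 6 + 36 = 42 entries, far fewer than the 20 + 400 entries the same features would need on the full alphabet.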

2.
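Membrane proteins are important drug targets, and studying membrane protein types supports successful drug design, so correctly predicting membrane protein types is essential for drug development. This work uses a dataset of 274 mycobacterial membrane protein sequences with less than 40% sequence identity, takes optimized pseudo amino acid composition as the feature set, and applies a support vector machine classifier to predict mycobacterial membrane protein types. Under the jackknife test, the overall accuracy is 85.4% and the average accuracy is 72.2%. The results indicate that the method can be used to identify mycobacterial membrane protein types and should aid the development of anti-mycobacterial drugs.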
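
The gap between the overall accuracy and the average accuracy reported above is typical of imbalanced classes. The sketch below shows how the two numbers can be computed in a jackknife (leave-one-out) test. It is an illustration only: it assumes that "average accuracy" means the mean of the per-class accuracies, and it uses a nearest-neighbour rule as a stand-in for the support vector machine used in the paper.

```python
from collections import defaultdict


def nearest_label(query, samples):
    """Label of the sample closest to `query` (squared Euclidean distance)."""
    best = min(samples, key=lambda s: sum((a - b) ** 2 for a, b in zip(query, s[0])))
    return best[1]


def jackknife(samples, predict=nearest_label):
    """Leave-one-out test: every sample is predicted from all the others.

    `samples` is a list of (feature_vector, label) pairs.
    Returns (overall accuracy, mean per-class accuracy, per-class accuracies).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        totals[y] += 1
        hits[y] += int(predict(x, rest) == y)
    overall = sum(hits.values()) / len(samples)
    per_class = {c: hits[c] / totals[c] for c in totals}
    average = sum(per_class.values()) / len(per_class)
    return overall, average, per_class
```

Because the overall figure weights every sample equally, a large, easy class can pull it well above the per-class mean, which is one plausible reading of the 85.4% versus 72.2% result.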

3.
A number of methods for predicting the folding type of a protein based on its amino acid composition have been developed during the past few years. In order to perform an objective and fair comparison of different prediction methods, a Monte Carlo simulation method was proposed to calculate the asymptotic limit of the prediction accuracy [Zhang and Chou (1992), Biophys. J. 63, 1523–1529, referred to as simulation method I]. However, simulation method I was based on an oversimplified assumption, i.e., that there are no correlations between the compositions of different amino acids. By taking into account such correlations, a new method, referred to as simulation method II, has been proposed to recalculate the objective accuracy of prediction for the least Euclidean distance method [Nakashima et al. (1986), J. Biochem. 99, 152–162] and the least Minkowski distance method [Chou (1989), Prediction in Protein Structure and the Principles of Protein Conformation, Plenum Press, New York, pp. 549–586], respectively. The results show that the prediction accuracy of the former is still better than that of the latter, as found by simulation method I; however, after incorporating the correlative effect, the objective prediction accuracies become lower for both methods. The reason for this phenomenon is discussed in detail. The simulation method and the idea developed in this paper can be applied to examine any other statistical prediction method, including the computer-simulated neural network method.

4.
5.
SLLE for predicting membrane protein types   (total citations: 2; self-citations: 0; citations by others: 2)
Introduction of the concept of pseudo amino acid composition (PROTEINS: Structure, Function, and Genetics 43 (2001) 246; Erratum: ibid. 44 (2001) 60) has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, and hence can significantly enhance the prediction quality of membrane protein type. As a continuous effort along such a line, the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction is introduced (Science 22 (2000) 2323). The advantage of using SLLE is that it can reduce the operational space by extracting the essential features from the high-dimensional pseudo amino acid composition space, and that the cluster-tolerant capacity can be increased accordingly. As a consequence, by combining these two approaches, high success rates have been observed in the self-consistency, jackknife and independent dataset tests, using the simplest nearest neighbour classifier. The current approach represents a new strategy to deal with the problems of protein attribute prediction, and hence may become a useful vehicle in the area of bioinformatics and proteomics.
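
To make "representing a protein sample in terms of a set of discrete numbers" concrete, here is a minimal sketch of a pseudo amino acid composition vector. It is a simplification under stated assumptions: the original formulation combines several physicochemical properties, whereas this version uses only the Kyte-Doolittle hydropathy scale (standardized to zero mean and unit variance), and the defaults `lam=3` and `w=0.05` are illustrative choices, not values taken from this abstract.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values, used here as the single property
# (an assumption made for brevity).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)
_values = list(KD.values())
_mu, _sd = mean(_values), pstdev(_values)
H = {aa: (v - _mu) / _sd for aa, v in KD.items()}  # standardized property


def pse_aac(seq, lam=3, w=0.05):
    """Return 20 composition terms followed by `lam` sequence-order terms."""
    residues = [aa for aa in seq.upper() if aa in H]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [residues.count(aa) / n for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart.
    thetas = [
        sum((H[residues[i]] - H[residues[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The first 20 entries carry the conventional composition and the last `lam` entries carry the sequence-order information; the whole vector sums to 1, so it can be handed to a nearest-neighbour classifier or to a dimensionality-reduction step such as the one described above.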

6.
Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides some valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural-network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor.
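
For reference, the performance measures named in this abstract can be computed from the four cells of a binary confusion matrix as in the sketch below. The function name and the one-vs-rest framing are assumptions for illustration; the abstract does not describe how the authors applied these measures to the multi-class problem.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, F-measure and Matthews correlation coefficient.

    For a multi-class task, one membrane protein type can be treated as the
    positive class and all other types as negative (one-vs-rest).
    Assumes tp + fn, tn + fp and tp + fp are all non-zero.
    """
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

Unlike plain accuracy, the Matthews correlation coefficient stays near zero for a classifier that merely predicts the majority class, which makes it a useful companion measure on imbalanced membrane protein datasets.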

7.
Membrane proteins are vitally important for many biological processes and have become an attractive target for both basic research and drug design. Knowledge of membrane protein types often provides useful clues in deducing the functions of uncharacterized membrane proteins. With the unprecedented increase in newly found protein sequences in the post-genomic era, there is strong demand for an automated method that identifies the types of membrane proteins quickly and accurately from their amino acid sequences. Although quite a few identifiers have been developed in this regard through various approaches, such as covariant discriminant (CD), support vector machine (SVM), artificial neural network (ANN), and K-nearest neighbor (KNN) classifiers, the way they operate the identification is basically individual. As is well known, wise persons usually take into account the opinions of several experts rather than rely on only one when they are making critical decisions. Likewise, a sophisticated identifier should be trained by several different modes. In view of this, based on the frame of pseudo-amino acid composition, which can incorporate a considerable amount of sequence-order effects, a novel approach called "stacked generalization" or "stacking" has been introduced. Unlike the "bagging" and "boosting" approaches, which only combine classifiers of the same type, the stacking approach can combine several different types of classifiers through a meta-classifier to maximize the generalization accuracy. The results thus obtained were very encouraging. It is anticipated that the stacking approach may also hold a high potential to improve the identification quality for, among many other protein attributes, subcellular location, enzyme family class, protease type, and protein-protein interaction type. The stacked generalization classifier is available as a web-server named "SG-MPt_Pred" at: http://202.120.37.186/bioinf/wangsq/service.htm.

8.
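Proteins and amino acids are important nutrients in plants, and studying the protein and amino acid composition of medicinal plants is important for the rational development and comprehensive utilization of medicinal plant resources. However, research to date has focused largely on the secondary metabolites of medicinal plants, such as flavonoids [1-2], while their primary metabolites have received comparatively little attention [3-4].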

9.
Integral membrane proteins are central to many cellular processes and constitute approximately 50% of potential targets for novel drugs. However, the number of outer membrane proteins (OMPs) present in the public structure database is very limited due to the difficulties in determining structure with experimental methods. Therefore, discriminating OMPs from non-OMPs with computational methods is of medical importance and is also needed for genome sequencing projects. In this study, some sequence-derived structural and physicochemical features of proteins were incorporated with amino acid composition to discriminate OMPs from non-OMPs using support vector machines. The discrimination performance of the proposed method is evaluated on a benchmark dataset of 208 OMPs, 673 globular proteins, and 206 α-helical membrane proteins. A high overall accuracy of 97.8% was observed in the 5-fold cross-validation test. In addition, the current method distinguished OMPs from globular proteins and α-helical membrane proteins with overall accuracies of 98.2% and 96.4%, respectively. The prediction performance is superior to the state-of-the-art methods in the literature. It is anticipated that the current method might be a powerful tool for the discrimination of OMPs.

10.
Cell membranes are vitally important to living cells. Although the infrastructure of the biological membrane is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Knowledge of membrane protein types often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences generated in the post-genomic era, there is strong demand for a high-throughput tool that identifies the type of newly found membrane proteins according to their primary sequences, so that they can be annotated in a timely manner for reference in both basic research and drug discovery. To realize this, the key is to establish a powerful identifier that can catch the characteristic sequence patterns of different membrane protein types. However, this is not easy because these patterns are buried in a pile of long and complicated sequences. In this paper, based on the concept of the pseudo-amino acid composition [K.C. Chou, PROTEINS: Struct., Funct., Genet. 43 (2001) 246-255], the low-frequency Fourier spectrum analysis is introduced. The merits of doing so are that the sequence pattern information can be more effectively incorporated into a set of discrete components, and that all the existing prediction algorithms can be straightforwardly used on such a formulation for protein samples. High success rates were observed in the re-substitution test, jackknife test, and independent dataset test, indicating that the low-frequency Fourier spectrum approach may become a very useful tool for membrane protein type prediction. The novel approach also holds a high potential for predicting many other attributes of proteins.
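
As a rough illustration of how a low-frequency Fourier spectrum turns a sequence into "a set of discrete components", the sketch below maps each residue to a number and keeps the amplitudes of the first few discrete Fourier transform components. The numeric coding is an assumption: the abstract does not give the property used, so Kyte-Doolittle hydropathy stands in for it, and `n_components=5` is an arbitrary illustrative default.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here as a stand-in numeric coding
# (an assumption; the paper's own coding is not given in the abstract).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def low_frequency_spectrum(seq, n_components=5):
    """Amplitudes of the first `n_components` DFT components of the coded sequence."""
    signal = [KD[aa] for aa in seq.upper() if aa in KD]
    n = len(signal)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")
    amplitudes = []
    for k in range(n_components):
        # k-th DFT coefficient, computed directly from the definition.
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        amplitudes.append(abs(coeff) / n)
    return amplitudes
```

The amplitudes form a fixed-length vector regardless of sequence length, which is why such a formulation can be combined with the composition terms and passed to any standard classifier.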

11.
Membrane proteins play an important role in biochemical processes such as signal transduction and transmembrane transport. Membrane proteins are usually classified into five types [Chou, K.C., Elrod, D.W., 1999. Prediction of membrane protein types and subcellular locations. Proteins: Struct. Funct. Genet. 34, 137-153] or six types [Chou, K.C., Cai, Y.D., 2005. J. Chem. Inf. Modelling 45, 407-413]. Designing in silico methods to identify and classify membrane proteins can help us understand the structure and function of unknown proteins. This paper introduces an integrative approach, IAMPC, to classify membrane proteins based on protein sequences and protein profiles. Its modules extract the amino acid composition of the whole profiles, the amino acid composition of N-terminal and C-terminal profiles, the amino acid composition of profile segments and the dipeptide composition of the whole profiles. In the computational experiment, the overall accuracy of the proposed approach is comparable with that of the functional-domain-based method. In addition, the performance of the proposed approach is complementary to the functional-domain-based method for different membrane protein types.

12.
Digital coding of amino acids based on hydrophobic index   (total citations: 1; self-citations: 0; citations by others: 1)
Analysis of amino acid sequences can provide useful insights into the tertiary structures of proteins and their biological functions. One of the critical problems in amino acid analysis is how to establish a digital coding system to better reflect the properties of amino acids and their degeneracy. Based on the hydrophobic index, a one-to-one relationship has been established between the amino acid sequence and the digital signal process. Such a "bridge" will make it possible to apply all the existing powerful methods in the signal processing area to the analysis of amino acid sequences.

13.
Knowledge of membrane protein type often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences emerging during the post-genomic era, it is highly desirable to develop an automated method that can serve as a high-throughput tool for identifying the types of newly found membrane proteins according to their primary sequences, so that they can be annotated in a timely manner for reference in both basic research and drug discovery. Based on the concept of pseudo-amino acid composition [K.C. Chou, Proteins: Struct. Funct. Genet. 43 (2001) 246-255; Erratum: Proteins: Struct. Funct. Genet. 44 (2001) 60] that has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, a novel predictor, the so-called "optimized evidence-theoretic K-nearest neighbor" or "OET-KNN" classifier, was proposed. It was demonstrated via the self-consistency test, jackknife test, and independent dataset test that the new predictor, compared with many previous ones, yielded higher success rates in most cases. The new predictor can also be used to improve the prediction quality for, among many other protein attributes, structural class, subcellular localization, enzyme family class, and G-protein coupled receptor type. The OET-KNN classifier will be available as a web-server at http://www.pami.sjtu.edu.cn/kcchou.

14.
Summary. The amino acid permeability of membranes is of interest because amino acids are among the key solutes involved in cell function. Membrane permeability coefficients (P) for amino acid classes, including neutral, polar, hydrophobic, and charged species, have been measured and compared using a variety of techniques. Decreasing lipid chain length increased permeability slightly (5-fold), while variations in pH had only minor effects on the permeability coefficients of the amino acids tested in liposomes. Increasing the membrane surface charge increased the permeability of amino acids of the opposite charge, while increasing the cholesterol content decreased membrane permeability. The permeability coefficients for most amino acids tested were surprisingly similar to those previously measured for monovalent cations such as sodium and potassium (approximately 10⁻¹² to 10⁻¹³ cm·s⁻¹). This observation suggests that the permeation rates for the neutral, polar and charged amino acids are controlled by bilayer fluctuations and transient defects, rather than partition coefficients and Born energy barriers. Hydrophobic amino acids were 10² times more permeable than the hydrophilic forms, reflecting their increased partition coefficient values. External pH had dramatic effects on the permeation rates for the modified amino acid lysine methyl ester in response to transmembrane pH gradients. It was established that lysine methyl ester and other modified short peptides permeate rapidly (P = 10⁻² cm·s⁻¹) as neutral (deprotonated) molecules. It was also shown that charge distributions dramatically alter permeation rates for modified di-peptides.
These results may relate to the movement of peptides through membranes during protein translocation and to the origin of cellular membrane transport on the early Earth.
Abbreviations: DCP, dicetylphosphate; DMPC, dimyristoyl phosphatidylcholine; EPC, egg phosphatidylcholine; LUV, large unilamellar vesicle; MLV, multilamellar vesicle; PLM, planar lipid membrane; SUV, small unilamellar vesicle; ΔpH, transmembrane pH gradient

15.
The sweetness-suppressing polypeptide gurmarin isolated from Gymnema sylvestre consists of 35 amino acid residues and contains three intramolecular disulfide bonds. Nuclear magnetic resonance analysis showed that the hydrophobic side chains of Tyr-13, Tyr-14, Trp-28, and Trp-29 in gurmarin are oriented outward. Together with the hydrophobic side chains of Leu-9, Ile-11, and Pro-12, they form a hydrophobic cluster, and therefore these hydrophobic groups are assumed to act as the site for interaction with the receptor protein. To examine the roles of these hydrophobic amino acids, they were replaced by Gly. The resulting [Gly13,14,28,29]gurmarin and [Gly9,11,13,14,28,29]gurmarin did not suppress the responses to sucrose, glucose, fructose, or Gly. This result strongly suggests that these hydrophobic amino acids are involved in the interaction with the receptor protein. © 1998 John Wiley & Sons, Inc. Biopoly 45: 231–238, 1998

16.
Evidence from multiple laboratories has implicated Ssy1, a nontransporting amino acid permease, as the receptor component of the yeast plasma membrane (PM)‐localized SPS (Ssy1‐Ptr3‐Ssy5)‐sensor. Upon binding external amino acids, Ssy1 is thought to initiate signaling events leading to the induction of amino acid permease gene expression. In striking contrast, Kralt et al. (2015) (Traffic 16: 135‐147) have questioned the role of Ssy1 in amino acid sensing and reported that Ssy1 is a component of the endoplasmic reticulum (ER), where it reportedly participates in the formation of ER‐PM junctions. Here, we have re‐examined the intracellular location of Ssy1 and tested the role of ER‐PM junctions in SPS sensor signaling. We show that the C‐terminus of Ssy1 carries a functional ER‐export motif required for proper localization of Ssy1 to the PM. Furthermore, ER‐PM junctions are dispensable for PM‐localization and function of Ssy1; Ssy1 localizes to the PM in a Δtether strain lacking ER‐PM junctions (ist2Δ scs2Δ scs22Δ tcb1Δ tcb2Δ tcb3Δ), and this strain retains the ability to initiate signals induced by extracellular amino acids. The data demonstrate that Ssy1 functions as the primary amino acid receptor and that it carries out this function at the PM.

17.
The outer membrane proteins (OMPs) are β-barrel membrane proteins that perform many biological functions. Discriminating OMPs from non-OMPs is a very important task for understanding some biochemical processes. In this study, a method that combines increment of diversity with modified Mahalanobis Discriminant, called IDQD, is presented to predict 208 OMPs, 206 transmembrane helical proteins (TMHPs) and 673 globular proteins (GPs) by using Chou's pseudo amino acid compositions as parameters. The overall accuracy of jackknife cross-validation is 93.2% and 96.1%, respectively, for three datasets (OMPs, TMHPs and GPs) and two datasets (OMPs and non-OMPs). These predicted results suggest that the method can be effectively applied to discriminate OMPs, TMHPs and GPs. They also indicate that the pseudo amino acid composition can better reflect the core feature of membrane proteins than the classical amino acid composition.

18.
Cells have developed an incredible machinery to facilitate the insertion of membrane proteins into the membrane. While we have a fairly good understanding of the mechanism and determinants of membrane integration, more data is needed to understand the insertion of membrane proteins with more complex insertion and folding pathways. This review will focus on marginally hydrophobic transmembrane helices and their influence on membrane protein folding. These weakly hydrophobic transmembrane segments are by themselves not recognized by the translocon and therefore rely on local sequence context for membrane integration. How can such segments reside within the membrane? We will discuss this in the light of features found in the protein itself as well as the environment it resides in. Several characteristics in proteins have been described to influence the insertion of marginally hydrophobic helices. Additionally, the influence of biological membranes is significant. To begin with, the actual cost for having polar groups within the membrane may not be as high as expected; the presence of proteins in the membrane as well as characteristics of some amino acids may enable a transmembrane helix to harbor a charged residue. The lipid environment has also been shown to directly influence the topology as well as membrane boundaries of transmembrane helices—implying a dynamic relationship between membrane proteins and their environment.

19.
We report a comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in 235 high-resolution structures of integral membrane proteins. The properties of 1551 transmembrane helices in the structures were compared with those obtained by analysis of the same amino acid sequences using topology prediction tools. Explanations for the 81 (5.2%) missing or additional transmembrane helices in the prediction results were identified. Main reasons for missing transmembrane helices were mis-identification of N-terminal signal peptides, breaks in α-helix conformation or charged residues in the middle of transmembrane helices and transmembrane helices with unusual amino acid composition. The main reason for additional transmembrane helices was mis-identification of amphipathic helices, extramembrane helices or hairpin re-entrant loops. Transmembrane helix length had an overall median of 24 residues and an average of 24.9 ± 7.0 residues and the most common length was 23 residues. The overall content of residues in transmembrane helices as a percentage of the full proteins had a median of 56.8% and an average of 55.7 ± 16.0%. Amino acid composition was analysed for the full proteins, transmembrane helices and extramembrane regions. Individual proteins or types of proteins with transmembrane helices containing extremes in contents of individual amino acids or combinations of amino acids with similar physicochemical properties were identified and linked to structure and/or function. In addition to overall median and average values, all results were analysed for proteins originating from different types of organism (prokaryotic, eukaryotic, viral) and for subgroups of receptors, channels, transporters and others.

20.
Lee S, Lee BC, Kim D. Proteins, 2006, 62(4): 1107-1114
Knowing protein structure and inferring its function from the structure are among the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure contents. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using the information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment produced by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
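
The "homolog-averaged amino acid composition" can be sketched as below. This is one plausible reading rather than the authors' exact procedure: it assumes every aligned homolog is weighted equally, that gap characters are ignored, and that the aligned sequences have already been extracted from the PSI-BLAST output. The function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-element amino acid composition; gaps and unknown letters are ignored."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def homolog_averaged_composition(aligned_seqs):
    """Average the per-sequence compositions over all sequences in an alignment.

    Assumption: every homolog contributes with equal weight.
    """
    comps = [composition(s) for s in aligned_seqs]
    return [sum(column) / len(comps) for column in zip(*comps)]
```

The result is still just 20 numbers, which is what allows the abstract's regression models to be trained on it exactly as they would be on a single-sequence composition.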
