Similar literature
20 similar documents found
1.
Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou KC. Amino Acids, 2005, 28(1): 57-61
Summary. Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated, high-throughput method for identifying their subcellular location in a timely manner. Based on the concept of the pseudo amino acid composition, by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246–255), the complexity measure approach is introduced. The advantage of incorporating the complexity measure factor as one of the pseudo amino acid components for a protein is that it can reflect the protein's overall sequence-order feature more effectively than the conventional correlation factors. With such a formulation to represent the samples of protein sequences, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107–118) was adopted to conduct prediction. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing the concept of the complexity measure into the prediction of protein subcellular location is quite promising and might also hold great potential as a useful vehicle for other areas of molecular biology.

2.
SLLE for predicting membrane protein types   Total citations: 2, self-citations: 0, citations by others: 2
Introduction of the concept of pseudo amino acid composition (PROTEINS: Structure, Function, and Genetics 43 (2001) 246; Erratum: ibid. 44 (2001) 60) has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, and hence can significantly enhance the prediction quality of membrane protein type. As a continuation of this line of work, the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction is introduced (Science 22 (2000) 2323). The advantage of using SLLE is that it can reduce the operational space by extracting the essential features from the high-dimensional pseudo amino acid composition space, and that the cluster-tolerant capacity can be increased accordingly. Consequently, by combining these two approaches, high success rates have been observed in the self-consistency, jackknife, and independent dataset tests using the simplest nearest neighbour classifier. The current approach represents a new strategy for dealing with problems of protein attribute prediction, and hence may become a useful vehicle in the areas of bioinformatics and proteomics.

3.
Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior work has demonstrated that sequence-order effects are relevant for discrimination, but little work has explored how to incorporate the sequence-order information along with the amino acid physicochemical properties into the prediction. To incorporate sequence-order effects into protein remote homology detection, the physicochemical distance transformation (PDT) method is proposed. Each protein sequence is converted into a series of numbers using the physicochemical property scores in the amino acid index (AAIndex), and the series is then converted into a fixed-length vector by PDT. With this approach, the sequence-order information can be included in the feature vector at little computational cost. Finally, the feature vectors are input into a support vector machine classifier to detect protein remote homologies. Our experiments on a well-known benchmark show that the proposed method, SVM-PDT, achieves performance superior or comparable to current state-of-the-art methods, and its computational cost is considerably lower than that of other methods. When the evolutionary information extracted from the frequency profiles is combined with the PDT method, the profile-based PDT approach improves the performance by 3.4% and 11.4% in terms of ROC score and ROC50 score, respectively. The local sequence-order information of the protein can be efficiently captured by the proposed PDT, and the physicochemical properties extracted from the amino acid index are incorporated into the prediction. The physicochemical distance transformation provides a general framework, which would be a valuable tool for protein-level study.

4.
The amino acid gamma-aminobutyric-acid receptors (GABAARs) belong to the ligand-gated ion channel (LGIC) superfamily. GABAARs are highly diverse in the central nervous system. These channels play a key role in regulating behavior. As a result, the prediction of GABAARs from the amino acid sequence would be helpful for research on these receptors. We have developed a method to predict these proteins using features obtained from Chou's pseudo-amino acid composition concept and a support vector machine as a powerful machine learning approach. The predictor efficiency was assessed by five-fold cross-validation. This method achieved an overall accuracy of 94.12% and a Matthews correlation coefficient (MCC) of 0.88. Furthermore, to evaluate the effect and power of each feature, the minimum Redundancy and Maximum Relevance (mRMR) feature selection method was implemented. An interesting finding in this study is the presence of all six characters (hydrophobicity, hydrophilicity, side chain mass, pK1, pK2 and pI), or combinations of the characters, among the 5 highest-ranked features (pK2 and pI, hydrophobicity and mass, pK1, hydrophilicity and mass) obtained from the mRMR feature selection method. The results show biologically justifiable ranked attributes of pK2 and pI; hydrophobicity, hydrophilicity and mass; mass and pK1; pK2 and mass. Based on our results, using the concept of Chou's pseudo-amino acid composition together with a support vector machine is an effective approach for the prediction of GABAARs.
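
The two evaluation measures reported above can be computed from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code; the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero (the usual convention),
    because the denominator would otherwise be zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(accuracy(tp=45, tn=40, fp=10, fn=5))
print(mcc(tp=45, tn=40, fp=10, fn=5))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ greatly in size.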

5.
Shi JY, Zhang SW, Pan Q, Cheng YM, Xie J. Amino Acids, 2007, 33(1): 69-74
As more and more genomes have been sequenced in recent years, there is an urgent need to develop a reliable method to predict the subcellular localization of the explosion of newly found proteins. However, many well-known prediction methods based on amino acid composition have problems utilizing the sequence-order information. Here, based on the concept of Chou's pseudo amino acid composition (PseAA), a new feature extraction method, the multi-scale energy (MSE) approach, is introduced to incorporate the sequence-order information. First, a protein sequence was mapped to a digital signal using the amino acid index. Then, by wavelet transform, the mapped signal was broken down into several scales, in which the energy factors were calculated and further formed into an MSE feature vector. Following this, by combining this MSE feature vector with the amino acid composition (AA), we constructed a series of MSEPseAA feature vectors to represent the protein subcellular localization sequences. Finally, according to a new kind of normalization approach, the MSEPseAA feature vectors were normalized to form the improved MSEPseAA vectors, named IEPseAA. Using the IEPseAA technique, the C-support vector machine (C-SVM), and three multi-class SVM strategies, quite promising results were obtained, indicating that MSE is quite effective in reflecting the sequence-order effects and might become a useful tool for predicting other attributes of proteins as well.
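
The first two steps of the MSE approach (mapping the sequence to a numeric signal, then computing per-scale wavelet energies) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract names neither the wavelet nor the amino acid index, so an orthonormal Haar wavelet and the Kyte-Doolittle hydropathy values are used here as stand-ins, and the function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values; an assumed stand-in for the amino acid
# index, which the abstract does not name.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length input by repeating the last value.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail


def multiscale_energy(sequence, levels=3, index=KD):
    """Detail energy at each scale, followed by the final approximation energy."""
    approx = [index[residue] for residue in sequence]
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


print(multiscale_energy("MKKLLPTAAAGLLLLAAQPAMA", levels=3))
```

The result has a fixed length (`levels + 1`) for any sequence long enough to be decomposed `levels` times, which is what allows sequences of different lengths to share one feature space. Combining these energies with the amino acid composition and the normalization step described in the abstract are not shown.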

6.
7.
Kovacs JM, Mant CT, Hodges RS. Biopolymers, 2006, 84(3): 283-297
Understanding the hydrophilicity/hydrophobicity of amino acid side chains in peptides/proteins is one of the most important aspects of biology. Though many hydrophilicity/hydrophobicity scales have been generated, an "intrinsic" scale has yet to be achieved. "Intrinsic" implies the maximum possible hydrophilicity/hydrophobicity of side chains in the absence of nearest-neighbor or conformational effects that would decrease the full expression of the side-chain hydrophilicity/hydrophobicity when the side chain is in a polypeptide chain. Such a scale is the fundamental starting point for determining the parameters that affect side-chain hydrophobicity and for quantifying such effects in peptides and proteins. A 10-residue peptide sequence, Ac-X-G-A-K-G-A-G-V-G-L-amide, was designed to enable the determination of the intrinsic values, where position X was substituted by all 20 naturally occurring amino acids and norvaline, norleucine, and ornithine. The coefficients were determined by reversed-phase high-performance liquid chromatography using six different mobile phase conditions involving different pH values (2, 5, and 7), ion-pairing reagents, and the presence and absence of different salts. The results show that the intrinsic hydrophilicity/hydrophobicity of amino acid side chains in peptides (proteins) is independent of pH, buffer conditions, and whether C(8) or C(18) reversed-phase columns were used for 17 side chains (Gly, Ala, Cys, Pro, Val, nVal, Leu, nLeu, Ile, Met, Tyr, Phe, Trp, Ser, Thr, Asn, and Gln), and dependent on pH and buffer conditions, including the type of salt or ion-pairing reagent, for potentially charged side chains (Orn, Lys, His, Arg, Asp, and Glu).

8.
Knowledge of membrane protein type often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences emerging during the post-genomic era, it is highly desirable to develop an automated method that can serve as a high-throughput tool for identifying the types of newly found membrane proteins from their primary sequences, so that they can be annotated in a timely manner for reference in both basic research and drug discovery. Based on the concept of pseudo-amino acid composition [K.C. Chou, Proteins: Struct. Funct. Genet. 43 (2001) 246-255; Erratum: Proteins: Struct. Funct. Genet. 44 (2001) 60], which has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, a novel predictor, the so-called "optimized evidence-theoretic K-nearest neighbor" or "OET-KNN" classifier, was proposed. It was demonstrated via the self-consistency test, jackknife test, and independent dataset test that the new predictor, compared with many previous ones, yielded higher success rates in most cases. The new predictor can also be used to improve the prediction quality for, among many other protein attributes, structural class, subcellular localization, enzyme family class, and G-protein coupled receptor type. The OET-KNN classifier will be available as a web-server at http://www.pami.sjtu.edu.cn/kcchou.

9.
X-ray crystallographic protein structures often contain disordered regions that are observed as missing electron density. Diffraction data may give little or no direct evidence as to the specific nature of disordered regions. We have developed a weighted window-based disorder predictor optimized using crystallographic data. Performance of a predictor is strongly influenced by chain termini. Optimized score adjustment values for amino- and carboxy-terminal positions demonstrate a simple, monotonic relationship between disorder and residue distance from termini. This optimized disorder predictor performs similarly to DISOPRED2 on crystallographically disordered regions. Data-optimized residue disorder propensities show strong linear correlation with experimentally determined amino acid transfer energies between water and hydrogen-bonding organic solvents, which primarily reflect residue hydrophobicity (exemplified by the Nozaki-Tanford hydrophobicity scale). Disorder propensities do not correlate as well with transfer energies between water and apolar solvents, which primarily reflect a different hydropathic property: residue hydrophilicity (also reflected by the Kyte-Doolittle hydropathy scale). Our results suggest that while hydrophobic side-chain interactions are primarily involved in determining stability of the folded conformation, hydrogen bonding and similar polar interactions are primarily involved in conformational and interaction specificity.

10.
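Using bioinformatics tools and online resources, the nucleic acid and amino acid sequences of inositol-1-phosphate synthase (MIPS) registered in GenBank from maize, wheat, sesame, common bean, grey mangrove (Avicennia marina), red clover, and other plants were analyzed, and their composition, leader peptides, transmembrane domains, hydrophobicity/hydrophilicity, molecular phylogenetic relationships, and protein secondary and tertiary structures were predicted and inferred. The results show that plant inositol-1-phosphate synthase has been highly conserved during evolution and is probably a hydrophilic protein localized in the cytoplasm. Structure prediction indicates that, although MIPS proteins from different sources differ somewhat in structure, their catalytic sites and NAD+-binding regions are essentially the same and strongly conserved.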

11.
Conotoxins are disulfide-rich small peptides that target a broad spectrum of ion channels and neuronal receptors. They offer promising avenues in the treatment of chronic pain, epilepsy and cardiovascular diseases. Assignment of newly sequenced mature conotoxins into appropriate superfamilies using a computational approach could provide valuable preliminary information on the biological and pharmacological functions of the toxins. However, creation of protein sequence patterns for the reliable identification and classification of new conotoxin sequences may not be effective due to the hypervariability of mature toxins. With the aim of formulating an in silico approach for the classification of conotoxins into superfamilies, we have incorporated the concept of pseudo-amino acid composition to represent a peptide in a mathematical framework that includes the sequence-order effect along with conventional amino acid composition. The polarity index attribute, which encodes information such as residue surface buriability, polarity, and hydropathy, was used to store the sequence-order effect. Several methods, including BLAST, the ISort (Intimate Sorting) predictor, the least Hamming distance algorithm, the least Euclidean distance algorithm and multi-class support vector machines (SVMs), were explored for superfamily identification. The SVMs outperform the other methods, providing an overall accuracy of 88.1% for all correct predictions with a generalized squared correlation of 0.75 using the jackknife cross-validation test for the A, M, O and T superfamilies and a negative set consisting of short cysteine-rich sequences from different eukaryotes having diverse functions. The computed sensitivity and specificity for the superfamilies were found to be in the range of 84.0-94.1% and 80.0-95.5%, respectively, attesting to the efficacy of multi-class SVMs for the successful in silico classification of the conotoxins into their superfamilies.

12.
The complete amino acid sequence of the variable region of a Bence Jones protein NIG-77 from an individual with myeloma-associated systemic amyloidosis has been determined. This protein represents a complete light chain consisting of 216 residues and it has a sequence characteristic of the V lambda I subgroup, which is closely homologous to that of another amyloidogenic V lambda I Bence Jones protein, NIG-51, differing by 20 of 111 residues (82% homology). In contrast, it differs by 29 residues (74% homology) from that of the non-amyloidogenic V lambda I light chain NIG-64. This finding shows that, in accordance with our previous report(1), the V lambda I-related light chains can further be divided into two distinct subsubgroups, V lambda I-1 and V lambda I-2, and the latter seems to be more prone to association with the amyloid process.
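
The percentages quoted above follow directly from the residue differences. A small arithmetic check, assuming both comparisons are made over the same 111 variable-region positions (the abstract states 111 explicitly only for the NIG-51 comparison); the function name is hypothetical:

```python
def percent_identity(differences: int, compared: int) -> float:
    """Percentage of compared positions that are identical."""
    return 100.0 * (compared - differences) / compared


print(round(percent_identity(20, 111)))  # NIG-77 vs NIG-51
print(round(percent_identity(29, 111)))  # NIG-77 vs NIG-64
```

Rounded to whole percentages, these reproduce the 82% and 74% figures given in the abstract.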

13.
As a result of genome and other sequencing projects, the gap between the number of known protein sequences and the number of known protein structural classes is widening rapidly. In order to narrow this gap, it is vitally important to develop a computational prediction method for quickly and accurately determining the protein structural class. In this paper, a novel predictor is developed for predicting protein structural class. It employs a support vector machine learning system and uses a different pseudo-amino acid composition (PseAA), which was introduced to take sequence-order effects into account, to some extent, when representing protein samples. As a demonstration, the jackknife cross-validation test was performed on a working dataset that contains 204 non-homologous proteins. The predicted results are very encouraging, indicating that the current predictor based on the PseAA may play an important complementary role to the elegant covariant discriminant predictor and other existing algorithms.

14.
Relatively low success rates of X-ray crystallography, which is the most popular method for solving protein structures, motivate the development of novel methods that support the selection of tractable protein targets. This aspect is particularly important in the context of current structural genomics efforts, which allow for a certain degree of flexibility in target selection. We propose CRYSpred, a novel in silico crystallization propensity predictor that uses a set of 15 novel features, which utilize a broad range of inputs including charge, hydrophobicity, and amino acid composition derived from the protein chain, and the solvent accessibility and disorder predicted from the protein sequence. Our method outperforms seven modern crystallization propensity predictors on three benchmark test datasets that are independent of the training dataset. The strong predictive performance of CRYSpred is attributed to the careful design of the features, the use of a comprehensive set of inputs, and the use of the Support Vector Machine classifier. The inputs utilized by CRYSpred are well aligned with the existing rules of thumb that are used in structural genomics studies.

15.
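In silico cloning and bioinformatics analysis of a sugarcane chitinase gene   Total citations: 1, self-citations: 0, citations by others: 1
The sugarcane chitinase gene SCCHI1 was obtained by in silico cloning, and bioinformatics methods were used to predict and analyze its encoded protein in terms of amino acid composition, physicochemical properties, transmembrane domains, hydrophobicity/hydrophilicity, subcellular localization, higher-order structure, and functional domains. The results show that the SCCHI1 gene is 1236 bp long and contains a complete 990 bp ORF encoding 329 amino acids. SCCHI1 belongs to glycoside hydrolase family 19, contains an N-terminal signal peptide, a linker region, and a catalytic region, and is highly similar to the chitinase genes of sorghum and other plants. These results provide a reference for the molecular cloning, functional characterization, and application of the SCCHI1 gene.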

16.
It is a critical challenge to develop automated methods for quickly and accurately determining the structures of proteins because of the increasingly widening gap between the number of sequence-known proteins and that of structure-known proteins in the post-genomic age. Knowledge of protein structural class can provide useful information towards the determination of protein structure. Thus, it is highly desirable to develop computational methods for identifying the structural classes of newly found proteins based on their primary sequence. In this study, according to the concept of Chou's pseudo amino acid composition (PseAA), eight PseAA vectors are used to represent protein samples. Each of the PseAA vectors is a 40-D (dimensional) vector, which is constructed from the conventional amino acid composition (AA) and a series of sequence-order correlation factors as originally introduced by Chou. The difference among the eight PseAA representations is that different physicochemical properties are used to incorporate the sequence-order effects for the protein samples. Based on such a framework, a dual-layer fuzzy support vector machine (FSVM) network is proposed to predict protein structural classes. In the first layer of the FSVM network, eight FSVM classifiers trained on different PseAA vectors are established. The second-layer FSVM classifier is applied to reclassify the outputs of the first layer. The results thus obtained are quite promising, indicating that the new method may become a useful tool for predicting not only the structural classification of proteins but also their other attributes.
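
A 40-D PseAA vector of the kind described above (20 composition components plus 20 sequence-order correlation factors) can be sketched as follows. This is a simplified illustration, not the authors' code: it uses a single physicochemical property per vector, and the choice of the Kyte-Doolittle hydropathy scale, the weight of 0.05, and all function names are assumptions made for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values; an assumed stand-in for whichever
# physicochemical property a given PseAA vector is built on.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit standard deviation over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


def pseaa(sequence, lam=20, weight=0.05, scale=KD):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    n = len(sequence)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(scale)
    # Conventional amino acid composition (occurrence frequencies).
    freqs = [sequence.count(a) / n for a in AMINO_ACIDS]
    # k-th tier sequence-order correlation factor: mean squared property
    # difference between residues k positions apart.
    thetas = [
        sum((h[sequence[i]] - h[sequence[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


vector = pseaa("ACDEFGHIKLMNPQRSTVWY" * 2, lam=20)
print(len(vector))
```

With `lam=20` the vector has 40 components that sum to 1. Swapping in a different property for `scale` gives a different PseAA representation of the same sequence, which is the sense in which the eight vectors in the abstract differ from one another.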

17.
18.
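Bioinformatics analysis of the muscarinic acetylcholine receptor gene of the western honey bee   Total citations: 1, self-citations: 0, citations by others: 1
Bioinformatics methods were used to analyze the nucleic acid and amino acid sequences of the muscarinic acetylcholine receptor of the western honey bee, Apis mellifera, and to predict and infer its composition, hydrophobic/hydrophilic regions, transmembrane topology, and molecular phylogenetic relationships. The results show that the receptor maps to chromosome 8 and consists of 618 amino acids, with a molecular weight of 69,906.5 Da and an isoelectric point (pI) of 8.56. It is a G-protein-coupled receptor containing N-glycosylation sites, protein kinase C phosphorylation sites, and cAMP/cGMP-dependent protein kinase phosphorylation sites.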

19.
A novel approach, CE-Ploc, is proposed for predicting protein subcellular locations by exploiting diversity in both the feature and decision spaces. The diversity in a sequence of feature spaces is exploited using the hydrophobicity and hydrophilicity of amphiphilic pseudo amino acid composition and a specific learning mechanism. Diversity in learning mechanisms is exploited by fusion of classifiers that are based on different learning mechanisms. Significant improvement in prediction performance is observed using jackknife and independent dataset tests.

20.
An algorithm for predicting the subcellular location of prokaryotic proteins is proposed in this paper. In addition to the amino acid composition, the auto-correlation functions based on the hydrophobicity profile of amino acids along the primary sequence of the query protein have been used. Consequently, the best predictive accuracy to date has been achieved. For the 997 prokaryotic proteins in the database used here (688 cytoplasmic, 107 extracellular and 202 periplasmic), the overall predictive accuracies are as high as 97.7% and 90.4% in the resubstitution and jackknife tests, respectively, using the hydrophilicity values of Hopp and Woods. The underlying mechanism of the improvement is also discussed. This work would be useful for a systematic analysis of the large number of prokaryotic genome sequences. The computer programs used in this paper are available on request via email.
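
A feature vector combining amino acid composition with auto-correlation functions of a hydrophilicity profile can be sketched as below. The abstract does not give the exact form of the auto-correlation function, so the mean lagged product used here is an assumption, as are the function names and the maximum lag. The Hopp-Woods values are given as commonly tabulated and should be checked against the original publication before use.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hopp-Woods hydrophilicity values as commonly tabulated (verify before use).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def autocorrelation(profile, max_lag):
    """Mean product of profile values k positions apart, for k = 1..max_lag."""
    n = len(profile)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the profile length")
    return [
        sum(profile[i] * profile[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def feature_vector(sequence, max_lag=5, scale=HOPP_WOODS):
    """Amino acid composition (20 values) followed by max_lag auto-correlation values."""
    composition = [sequence.count(a) / len(sequence) for a in AMINO_ACIDS]
    profile = [scale[residue] for residue in sequence]
    return composition + autocorrelation(profile, max_lag)


print(len(feature_vector("MKKLLPTAAAGLLLLAAQPAMA", max_lag=5)))
```

The composition part ignores residue order entirely; the auto-correlation part adds a fixed number of order-dependent terms, so proteins of any length map to vectors of the same size.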
