Similar literature
20 similar articles found.
1.
An approach of encoding for prediction of splice sites using SVM (total citations: 1; self-citations: 0; citations by others: 1)
Huang J  Li T  Chen K  Wu J 《Biochimie》2006,88(7):923-929
In splice site prediction, accuracy remains below 90% even though the sequences adjacent to splice sites are highly conserved. To improve prediction accuracy, much attention has been paid to the performance of the algorithms used, and little to the more fundamental issue of nucleotide encoding. In this paper, a predictor based on support vector machines (SVM) is constructed to distinguish true from false splice sites in higher eukaryotes. Four types of encoding were applied to generate the input for the SVM: mono-nucleotide (MN) encoding, MN with frequency difference between the true sites and false sites (FDTF) encoding, pair-wise nucleotide (PN) encoding, and PN with FDTF encoding. The results showed that PN with FDTF encoding led to the most reliable recognition of splice sites: the accuracies for predicting true donor sites and false sites were 96.3% and 93.7%, respectively, and the accuracies for predicting true acceptor sites and false sites were 94.0% and 93.2%, respectively.
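
The MN and PN encodings named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact scheme, so this assumes a plain one-hot layout (4 values per base, 16 per adjacent pair); the function names are hypothetical, and the FDTF weighting is not reproduced because its definition is not stated here.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 ordered dinucleotides: AA, AC, ..., TT
PAIRS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def mono_nucleotide_encode(seq):
    """One-hot encode each base (assumed MN layout): 4 values per position."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def pairwise_nucleotide_encode(seq):
    """One-hot encode each adjacent base pair (assumed PN layout): 16 values per pair."""
    seq = seq.upper()
    vec = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        vec.extend(1.0 if pair == p else 0.0 for p in PAIRS)
    return vec
```

Either vector could then be passed to any SVM implementation; the PN form captures dependence between neighbouring positions that the MN form cannot.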

2.
In the post-genome era, the prediction of protein function is one of the most demanding tasks in bioinformatics. Machine learning methods such as support vector machines (SVMs) greatly help to improve the classification of protein function. In this work, we integrated SVMs, protein sequence amino acid composition, and associated physicochemical properties to predict nucleic-acid-binding proteins. We developed binary classifiers for rRNA-, RNA-, and DNA-binding proteins, which play an important role in the control of many cell processes. Each SVM predicts whether a protein belongs to the rRNA-, RNA-, or DNA-binding protein class. Self-consistency and jackknife tests were performed on protein data sets in which sequence identity was < 25%. Test results show that the accuracies of the rRNA-, RNA-, and DNA-binding SVM predictions are approximately 84%, 78%, and 72%, respectively. Predictions were also performed on an ambiguous and a negative data set. The predicted scores of proteins in the ambiguous data set from the RNA- and DNA-binding SVM models were distributed around zero, while most proteins in the negative data set received negative scores from all three SVMs. The score distributions agree well with prior knowledge of those proteins and show the effectiveness of sequence-associated physicochemical properties in protein function prediction. The software is available from the author upon request.
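
As an illustration of the amino acid composition feature mentioned above, the following is a minimal sketch, not the authors' software. It assumes the standard 20-letter alphabet and simply ignores any other symbol; the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue, in AMINO_ACIDS order."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols such as X or gaps
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

The resulting 20-dimensional vector has a fixed length regardless of protein size, which is what makes it convenient as SVM input.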

3.
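A recombinant Escherichia coli single-stranded DNA-binding protein (r-SSBP) with a molecular weight of about 24 kDa was expressed by genetic engineering. Its binding to single-stranded DNA (ssDNA) was characterized by gel retardation electrophoresis and by its effect on the DNA melting temperature (Tm). The results show that r-SSBP binds ssDNA and lowers the Tm of DNA, and that it also increases the Tm difference between DNA containing a single mismatched base and perfectly matched DNA, a property with potential value for improving the specificity of single nucleotide polymorphism detection. In addition, r-SSBP was applied in the high-sensitivity pyrosequencing system developed by our group to sequence ssDNA templates of known sequence; the results show that r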

4.
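Identifying bacterial sRNAs plays an important role in understanding the interactions between bacteria and their environment. This article presents a strategy for building bacterial sRNA prediction models by adding experimentally verified sRNAs to the training set, and illustrates its feasibility using sRNA prediction in Escherichia coli K-12 as an example. The model built with this strategy, sRNASVM, reaches a 10-fold cross-validation accuracy of 92.45%, higher than the accuracies currently reported in the literature. The model should therefore provide good bioinformatics support for the experimental discovery of sRNAs. The model and detailed results can be downloaded from http://ccb.bmi.ac.cn/srnasvm/.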

5.
6.
Chung JL  Wang W  Bourne PE 《Proteins》2006,62(3):630-640
A rapid increase in the number of experimentally derived three-dimensional structures provides an opportunity to better understand and subsequently predict protein-protein interactions. In this study, structurally conserved residues were derived from multiple structure alignments of the individual components of known complexes and the assigned conservation score was weighted based on the crystallographic B factor to account for the structural flexibility that will result in a poor alignment. Sequence profile and accessible surface area information was then combined with the conservation score to predict protein-protein binding sites using a Support Vector Machine (SVM). The incorporation of the conservation score significantly improved the performance of the SVM. About 52% of the binding sites were precisely predicted (greater than 70% of the residues in the site were identified); 77% of the binding sites were correctly predicted (greater than 50% of the residues in the site were identified), and 21% of the binding sites were partially covered by the predicted residues (some residues were identified). The results support the hypothesis that in many cases protein interfaces require some residues to provide rigidity to minimize the entropic cost upon complex formation.

7.
ERp57 belongs to the protein disulfide isomerases, a family of homologous proteins mainly localized in the endoplasmic reticulum and characterized by the presence of a thioredoxin-like folding domain. ERp57 is a protein chaperone with thiol-dependent protein disulfide isomerase activity and additional activities, and it has recently been shown to be involved, in cooperation with calnexin or calreticulin, in the correct folding of glycoproteins. However, we have demonstrated that the same protein is also present in the nucleus, mainly associated with the internal nuclear matrix fraction. In vitro studies have shown that ERp57 has DNA-binding properties that depend strongly on its redox state, the oxidized form being the competent one. A comparison of a recombinant form of ERp57 with several deletion mutants, obtained as fusion proteins and expressed in Escherichia coli, allowed us to identify the C-terminal a′ domain as directly involved in the DNA-binding activity of ERp57.

8.
Single-stranded DNA-binding protein (SSB) is an essential protein necessary for the functioning of the DNA replication, repair and recombination machineries. Here we report the structure of the DNA-binding domain of Mycobacterium tuberculosis SSB (MtuSSB) in four different crystals distributed in two forms. The structure of one of the forms was solved by a combination of isomorphous replacement and anomalous scattering. This structure was used to determine the structure of the other form by molecular replacement. The polypeptide chain in the structure exhibits the oligonucleotide binding fold. The globular core of the molecule in different subunits in the two forms and those in Escherichia coli SSB (EcoSSB) and human mitochondrial SSB (HMtSSB) have similar structure, although the three loops exhibit considerable structural variation. However, the tetrameric MtuSSB has an as yet unobserved quaternary association. This quaternary structure with a unique dimeric interface lends the oligomeric protein greater stability, which may be of significance to the functioning of the protein under conditions of stress. Also, as a result of the variation in the quaternary structure the path adopted by the DNA to wrap around MtuSSB is expected to be different from that of EcoSSB.

9.
Wang B  Chen P  Huang DS  Li JJ  Lok TM  Lyu MR 《FEBS letters》2006,580(2):380-384
This paper proposes a novel method that predicts protein interaction sites in heterocomplexes using residue spatial sequence profile and evolution rate approaches. The former represents the information in multiple sequence alignments, while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors using a support vector machine algorithm are constructed to predict whether a surface residue is part of a protein-protein interface. The efficiency and effectiveness of the proposed approach are verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.

10.
A classification model of a DNA-binding protein chain was created based on identification of alpha helices within the chain likely to bind to DNA. Using the model, all chains in the Protein Data Bank were classified. For many of the chains classified with high confidence, previous documentation for DNA-binding was found, yet no sequence homology to the structures used to train the model was detected. The result indicates that the chain model can be used to supplement sequence based methods for annotating the function of DNA-binding. Four new candidates for DNA-binding were found, including two structures solved through structural genomics efforts. For each of the candidate structures, possible sites of DNA-binding are indicated by listing the residue ranges of alpha helices likely to interact with DNA.

11.
Recently, two different models have been developed for predicting gamma-turns in proteins by Kaur and Raghava [2002. An evaluation of beta-turn prediction methods. Bioinformatics 18, 1508-1514; 2003. A neural-network based method for prediction of gamma-turns in proteins from multiple sequence alignment. Protein Sci. 12, 923-929]. However, the major limitation of these methods is their inability to predict gamma-turn types. There is thus a need for an approach that predicts gamma-turn types and is useful in overall tertiary structure prediction. In this work, the support vector machine (SVM), a powerful model, is proposed for predicting gamma-turn types in proteins. The high prediction accuracy showed that the formation of gamma-turn types is evidently correlated with the sequence of tripeptides, and hence can be approximately predicted from the sequence information of the tripeptides alone.

12.
Using ultraviolet light, both the 33,000-dalton single-stranded DNA-binding protein from T4 bacteriophage (gp32) as well as a 25,000-dalton limited trypsin cleavage product of gp32 (core gp32*) that retains high affinity for single-stranded DNA can be crosslinked to an oligodeoxynucleotide, p(dT)8. After photolysis, a single tryptic peptide crosslinked to p(dT)8 was isolated by anion-exchange high-performance liquid chromatography. Gas-phase sequencing of this modified peptide gave the following sequence: Gln-Val-Ser-Gly-(X)-Ser-Asn-Tyr-Asp-Glu-Ser-Lys, which corresponds to residues 179-190 in gp32. Based on the absence of the expected phenylthiohydantoin derivative of phenylalanine 183 at cycle 5 (X) we infer that crosslinking has occurred at this position and that phenylalanine 183 is at the interface of the gp32:p(dT)8 complex in an orientation that allows covalent bond formation with the thymine radical produced by ultraviolet irradiation.

13.
Structural genomics projects as well as ab initio protein structure prediction methods provide structures of proteins with no sequence or fold similarity to proteins with known functions. These are often low-resolution structures that may only include the positions of C alpha atoms. We present a fast and efficient method to predict DNA-binding proteins from just the amino acid sequences and low-resolution, C alpha-only protein models. The method uses the relative proportions of certain amino acids in the protein sequence, the asymmetry of the spatial distribution of certain other amino acids, and the dipole moment of the molecule. These quantities are used in a linear formula, with coefficients derived from logistic regression performed on a training set, and DNA-binding is predicted based on whether the result is above a certain threshold. We show that the method is insensitive to errors in the atomic coordinates and provides correct predictions even on inaccurate protein models. We demonstrate that the method is capable of predicting proteins with novel binding site motifs and structures solved in an unbound state. The accuracy of our method is close to that of another published method that uses all-atom structures, time-consuming calculations, and information on conserved residues.
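
The "linear formula plus threshold" decision rule described above can be sketched as follows. This is only an illustration of the general form: the feature values, coefficients, intercept, and threshold are all placeholders supplied by the caller, not the values fitted in the paper, and the function names are hypothetical.

```python
import math


def linear_score(features, coefficients, intercept):
    """Linear combination intercept + sum(c_i * x_i) of the input features."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def predict_dna_binding(features, coefficients, intercept, threshold=0.0):
    """Predict DNA-binding when the linear score exceeds the threshold."""
    return linear_score(features, coefficients, intercept) > threshold


def logistic(score):
    """Map a linear score to a probability, as in logistic regression."""
    return 1.0 / (1.0 + math.exp(-score))
```

With a threshold of 0 on the linear score, the rule is equivalent to a probability cut-off of 0.5 after the logistic transform; moving the threshold trades sensitivity against specificity.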

14.
Huang JH  Cao DS  Yan J  Xu QS  Hu QN  Liang YZ 《Biochimie》2012,94(8):1697-1704
As the most frequent drug target, G protein-coupled receptors (GPCRs) are a large family of seven-transmembrane receptors that sense molecules outside the cell and activate signal transduction pathways inside it. The activity and lifetime of activated receptors are regulated by receptor phosphorylation. Therefore, determining the exact positions of phosphorylation sites in GPCR sequences could provide useful clues for drug design and other biotechnology applications. Experimental identification of phosphorylation sites is expensive and laborious. Hence, there is significant interest in developing computational methods for reliable prediction of phosphorylation sites from amino acid sequences. In this article, we present a simple and effective method to recognize phosphorylation sites of human GPCRs by combining amino acid hydrophobicity with a support vector machine. The prediction accuracy, sensitivity, specificity, Matthews correlation coefficient, and area under the curve values for phosphoserine, phosphothreonine, and phosphotyrosine were 0.964, 0.790, 0.999, 0.866, 0.941; 0.954, 0.800, 0.985, 0.828, 0.958; and 0.976, 0.820, 0.993, 0.861, 0.959, respectively. Such a fast and accurate prediction method will speed up the identification of suitable GPCR sites and so facilitate drug discovery.
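
For readers unfamiliar with the metrics reported above, the sketch below computes accuracy, sensitivity, specificity, and the Matthews correlation coefficient from a binary confusion matrix using their standard definitions. The function name and the example counts are illustrative and do not come from the paper; the area under the curve is omitted because it needs ranked scores rather than counts.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from true/false positive and negative counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

A pattern such as high specificity with lower sensitivity, as in the figures above, is typical when negative sites far outnumber positive ones, which is why the Matthews correlation coefficient is usually reported alongside accuracy.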

15.
We combined computational and experimental methods to interrogate the binding determinants of angiopoietin-2 (Ang2) to its receptor tyrosine kinase (RTK) Tie2, a central signaling system in angiogenesis, inflammation, and tumorigenesis. We used physics-based electrostatic and surface-area calculations to identify the subset of interfacial Ang2 and Tie2 residues that can affect binding directly. Using random and site-directed mutagenesis and yeast surface display (YSD), we validated these predictions and identified additional Ang2 positions that affected receptor binding. We then used burial-based calculations to classify the larger set of Ang2 residues that are buried in the Ang2 core, whose mutations can perturb the Ang2 structure and thereby affect interactions with Tie2 indirectly. Our analysis showed that the Ang2-Tie2 interface is dominated by nonpolar contributions, with only three Ang2 and two Tie2 residues that contribute electrostatically to intermolecular interactions. Individual interfacial residues contributed only moderately to binding, suggesting that engineering of this interface will require multiple mutations to reach major effects. Conversely, substitutions in substantially buried Ang2 residues were more prevalent in our experimental screen, reduced binding substantially, and are therefore more likely to have a deleterious effect that might contribute to oncogenesis. Computational analysis of additional RTK-ligand complexes, c-Kit-SCF and M-CSF-c-FMS, and comparison to previous YSD results, further show the utility of our combined methodology.

16.
Ding S  Zhang S  Li Y  Wang T 《Biochimie》2012,94(5):1166-1171
Knowledge of structural classes plays an important role in understanding protein folding patterns. In this paper, features are extracted from the predicted secondary structure sequence and the corresponding E–H sequence. An 11-dimensional feature vector is then selected using a wrapper feature selection algorithm and a support vector machine (SVM). Of the 11 selected features, 4 are newly designed to model the differences between the α/β and α + β classes, and the other 7 were proposed by previous researchers. A total of 5 datasets are used to design and test the proposed method. The results show that the method achieves prediction accuracies competitive with existing methods (SCPRED, RKS-PPSC and MODAS), and that the 4 new features are essential for differentiating the α/β and α + β classes. A standalone version of the method, written in Java, can be downloaded from http://web.xidian.edu.cn/slzhang/paper.html.

17.
Membrane-binding peripheral proteins play important roles in many biological processes, including cell signaling and membrane trafficking. Unlike integral membrane proteins, these proteins bind the membrane mostly in a reversible manner. Since peripheral proteins do not have canonical transmembrane segments, it is difficult to identify them from their amino acid sequences. As a first step toward genome-scale identification of membrane-binding peripheral proteins, we built a kernel-based machine learning protocol. Key features of known membrane-binding proteins, including electrostatic properties and amino acid composition, were calculated from their amino acid sequences and tertiary structures, which were then incorporated into the support vector machine to perform the classification. A data set of 40 membrane-binding proteins and 230 non-membrane-binding proteins was used to construct and validate the protocol. Cross-validation and holdout evaluation of the protocol showed that the accuracy of the prediction reached up to 93.7% and 91.6%, respectively. The protocol was applied to the prediction of membrane-binding properties of four C2 domains from novel protein kinases C. Although these C2 domains have 50% sequence identity, only one of them was predicted to bind the membrane, which was verified experimentally with surface plasmon resonance analysis. These results suggest that our protocol can be used for predicting membrane-binding properties of a wide variety of modular domains and may be further extended to genome-scale identification of membrane-binding peripheral proteins.

18.
In this study, predictors of protein submitochondria location are developed based on various sequence features. Knowing the submitochondria location of a mitochondrial protein can give a much better understanding of its function. We use ten representative models of protein samples, such as pseudo amino acid composition, dipeptide composition, functional domain composition, a combined discrete model based on predicted solvent accessibility and secondary structure elements, and a discrete model of pairwise sequence similarity. We construct a predictor based on support vector machines (SVMs) for each representative model. In the leave-one-out cross-validation test, the overall prediction accuracy of the predictor based on the discrete model of pairwise sequence similarity is 1% better than that of the best existing computational system for this problem. Moreover, we develop a method based on ordered weighted averaging (OWA), a data fusion operator. OWA is applied to the 11 best SVM-based classifiers constructed from the various sequence features. This method is called Mito-Loc. The overall leave-one-out cross-validation accuracy obtained by Mito-Loc is about 95%, which indicates that the proposed approach outperforms the best previously reported approach.
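
The ordered weighted averaging operator mentioned above has a simple standard definition: the inputs are sorted in descending order and then combined with a fixed weight vector that sums to 1. The sketch below shows that operator only; how Mito-Loc chooses its weights and which classifier outputs it fuses are not stated in the abstract, so the example values are hypothetical.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to the values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical scores from three classifiers for one class
fused = owa([0.2, 0.9, 0.5], [0.5, 0.3, 0.2])
```

Because the weights attach to rank rather than to a particular classifier, the weight vector (1, 0, ..., 0) reduces OWA to the maximum, (0, ..., 0, 1) to the minimum, and equal weights to the arithmetic mean.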

19.
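Research on protein function is the basis for unraveling the mysteries of life, and machine learning techniques are already widely used in this field. Using the support vector machine (SVM) method, a general platform for predicting protein functional sites was built. The platform first extracts non-homologous protein sequences and then encodes features for them, including basic sequence information, physicochemical properties, structural information, and sequence conservation. The encoded samples are used as training data for the SVM, and evaluation metrics such as sensitivity, specificity, Matthews correlation coefficient, accuracy, and the ROC curve are obtained. After repeated testing, the SVM model with the best evaluation metrics can be used to predict functional sites on protein sequences. Besides predicting protein functional sites, the platform can also be applied to the prediction and analysis of disease-associated single nucleotide polymorphisms (SNPs), the prediction of protein domains, the study of interactions between biomolecules, and similar tasks.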

20.
《Biochimie》2013,95(9):1741-1744
In this study, a 12-dimensional feature vector is constructed to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Among the 12 features, 6 novel features are specially designed to improve the prediction accuracies for the α/β and α + β classes, based on the distributions of α-helices and β-strands and the characteristics of parallel and anti-parallel β-sheets. To evaluate our method, the jackknife cross-validation test is employed on two widely used datasets, 25PDB and 1189, with sequence similarity lower than 40% and 25%, respectively. Our method outperforms recently reported methods in most cases, and the 6 newly designed features have a significant positive effect on prediction accuracy, especially for the α/β and α + β classes.
