Similar Literature
 Found 20 similar articles; search took 328 ms
1.
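β-turns, a type of protein secondary structure, play important roles in protein folding, protein stability, and molecular recognition. Existing β-turn prediction methods do not map structural information from homologous sequences already present in structure databases such as the PDB onto the protein sequence to be predicted. The PDB now holds more than 70,000 structures, so a newly determined sequence has a good chance of having a homolog in the PDB. This paper combines homologous structural information extracted from the PDB (for each query sequence, only homologs deposited in the PDB before that sequence are used) with NetTurnP predictions to propose a new β-turn prediction method, BTMapping. On the classic BT426 dataset and the EVA937 dataset constructed in this work, prediction accuracy measured by the Matthews correlation coefficient is 0.56 and 0.52, respectively, versus 0.50 and 0.46 for NetTurnP alone; measured by Qtotal, it is 81.4% and 80.4%, versus 78.2% and 77.3% for NetTurnP alone. The results confirm that combining homologous structural information with an advanced β-turn predictor such as NetTurnP helps improve β-turn identification. The BTMapping program and related datasets are available at http://www.bio530.weebly.com.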
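The two accuracy measures quoted in this abstract can be computed from per-residue confusion counts. The following is a minimal sketch using the standard definitions, under the assumption that Qtotal means the percentage of residues classified correctly; the function names `mcc` and `q_total` are illustrative and do not come from BTMapping.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a whole row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def q_total(tp, tn, fp, fn):
    """Percentage of residues predicted correctly (assumed meaning of Qtotal)."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)
```

Because β-turn residues are a minority class, MCC tends to be the more informative of the two: a predictor that labels everything as non-turn can still reach a high Qtotal while its MCC stays at 0.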

2.
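Phosphorylation is the major post-translational modification of proteins and can be divided into kinase-specific and non-kinase-specific types. Based on the Dou dataset of non-kinase-specific phosphorylation sites, this paper develops a position-based chi-square difference table feature, χ²-pos, combines it with the pseudo amino acid sequence evolutionary information PsePSSM to represent sequences, and builds support vector machine classifiers on balanced positive and negative samples. In independent tests on S, T, and Y sites, the Matthews correlation coefficient, area under the ROC curve, and accuracy reach (0.59, 0.87, 79.74%), (0.55, 0.85, 77.68%), and (0.50, 0.81, 75.22%), respectively, clearly better than results reported in the literature. Fusing the χ²-pos and PsePSSM features has broad application prospects in protein phosphorylation site prediction.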

3.
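Predicting protein subcellular compartments by similarity alignment   Total citations: 1 (self-citations: 0, citations by others: 1)
Wang Xiongfei  Zhang Liang  Xue Wei  Zhao Nan  Xu Huanliang 《微生物学通报》2016,43(10):2298-2305
[Objective] To predict the subcellular compartment to which a protein belongs, providing a basis for further study of its biological function. [Methods] Using the amino acid composition, dipeptide composition, and pseudo amino acid composition of the protein sequence as sequence features, a K-nearest neighbor (KNN) classification algorithm improved with BLAST alignment was used to predict the subcellular compartment of protein sequences. [Results] Under the jackknife test, the success rates for the three features were 91.5%, 91.5%, and 89.3% on dataset CH317 and 93.9%, 92.9%, and 89.8% on dataset ZD98. [Conclusion] KNN improved with BLAST alignment is an effective method for predicting protein subcellular compartments.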
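Two of the sequence features named in this abstract, and the jackknife (leave-one-out) evaluation, can be sketched as follows. This is an illustration only: it uses a plain 1-nearest-neighbor rule with squared Euclidean distance rather than the BLAST-improved KNN of the paper, whose details are not given here, and all function names are hypothetical. Sequences are assumed to be non-empty strings over the 20 standard amino acids.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """20-D vector: fraction of each standard amino acid in the sequence."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-D vector: fraction of each adjacent residue pair in the sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[d] / total for d in DIPEPTIDES]


def jackknife_1nn(features, labels):
    """Leave-one-out success rate of a 1-NN classifier (assumed stand-in for the paper's KNN)."""
    correct = 0
    for i, x in enumerate(features):
        # Each sample is classified by its nearest neighbor among all other samples.
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, features[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(features)
```

In the jackknife test every sequence is held out once and predicted from the rest, so the reported success rate uses each sample for testing exactly once.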

4.
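Outer membrane proteins (OMPs) are a class of proteins with important biological functions. Predicting OMPs with bioinformatics methods can help in predicting their secondary and tertiary structures and in discovering new OMPs in genomes. This paper proposes computing amino acid composition features, dipeptide composition features, and weighted multi-order amino acid residue index correlation coefficient features from protein sequences, combining the three feature types, and using a support vector machine (SVM) to identify OMPs. Identification results were computed for several combined feature sets involving four residue indices, and the effect of the order and weight of the correlation coefficients on prediction performance is discussed. Ten-fold cross-validation and independent tests on the datasets show that the combined-feature method achieves maximum identification accuracies of 96.96% for OMPs and 97.33% for non-OMPs, better than several existing methods. Identification of OMPs in five bacterial genomes shows that the combined-feature method is highly specific, and its identification accuracy for OMPs of known structure in the PDB exceeds 99%. These results indicate that the method can serve as an effective tool for screening OMPs within genomes.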
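The abstract does not define its weighted multi-order residue index correlation features, so the following is only a sketch of one common form: a lag-k autocovariance of a per-residue index, scaled by a weight. The choice of the Kyte-Doolittle hydropathy scale, the single global `weight`, and the function name `correlation_features` are all assumptions made for illustration, not the paper's definitions.

```python
# Kyte-Doolittle hydropathy values, used here as an example residue index.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def correlation_features(seq, max_order=3, weight=0.5, index=KD_HYDROPATHY):
    """Weighted lag-k autocovariances of a residue index, for k = 1..max_order."""
    values = [index[a] for a in seq]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    features = []
    for k in range(1, max_order + 1):
        pairs = len(centered) - k
        if pairs <= 0:
            # Sequence too short for this order: emit a neutral value.
            features.append(0.0)
            continue
        cov = sum(centered[i] * centered[i + k] for i in range(pairs)) / pairs
        features.append(weight * cov)
    return features
```

In a combined feature vector, these values would be concatenated with the composition features before being passed to the classifier; the order and weight are the two parameters whose influence the abstract says was studied.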

6.
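Protein structural class prediction is an important research area in bioinformatics and protein science. Based on the pseudo amino acid discrete model framework proposed by Chou and starting from the protein sequence, a new pseudo amino acid composition method is designed to represent protein sequence samples. The frequencies of amino acid combinations in the sequence (10-D) and hydrophobic amino acid patterns (6-D) are extracted as additional features and, together with the conventional amino acid composition (20-D), form a 36-dimensional pseudo amino acid composition vector that represents the protein sequence. A genetic algorithm is used to optimize the weight coefficients of the additional features. The pseudo amino acid composition vector serves as the input data and a fuzzy support vector machine as the prediction tool. Three commonly used benchmark datasets are used to validate the algorithm's performance. Jackknife test results show that the method achieves high accuracy and is a promising tool for predicting protein function.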

7.
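To identify sequences in the prostate-specific antigen (PSA) promoter involved in androgen regulation, a 15 bp region (-396 to -382 bp) upstream of the ARE in the PSA promoter was found to be required for activation of the PSA promoter by the androgen receptor (AR); it was named RFA. Transfection and CAT assays showed that certain nucleotide substitutions in this sequence markedly reduce the androgen-induced activity of the PSA gene promoter. In vitro competitive binding assays confirmed that certain non-receptor protein factors in prostate cell nuclei bind the sequence specifically, whereas its mutant forms lose this ability; the sequence may be a novel accessory cis-element. Using RFA as a probe, RFA-binding proteins were isolated and purified by affinity chromatography. SDS-PAGE and preliminary protein identification indicated that the protein is highly homologous to the known multifunctional proteins hnRNP A1 and A2. RFA-binding proteins may act as coactivators that cooperate with AR in transactivating the PSA promoter. The results contribute to a deeper understanding of the mechanism of action and tissue specificity of the PSA promoter.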

8.
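The protein content characteristic of Lewy bodies (LBs) can be identified in Lewy body-like inclusions or aggresomes formed in vitro. Identifying LB protein components by proteomic methods is a new approach. PC12 cells were treated with 10 µmol/L of the synthetic proteasomal inhibitor PSI for 48 h to produce PSI-induced inclusions. To indicate possible LB protein components in vitro, a proteomic approach combining biochemical fractionation, two-dimensional electrophoresis (2-D), and identification via peptide mass fingerprints (PMF) identified 20 LB protein components: 2 proteins involved in synaptic transmitter synthesis, 6 subunits of the 26S proteasome, 2 cytoskeletal proteins, 2 mitochondrial proteins, 1 antioxidant protein, and 7 molecular chaperones and/or chaperone-like proteins. The results suggest that when proteasome inhibition occurs in PC12 cells, these 20 LB protein components may be enriched in PSI-induced inclusions.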

9.
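This study examines how the choice of data included in the analysis affects the identification ability of the ITS2 sequence as a DNA barcode in Cucurbitaceae plants. First, three datasets of Cucurbitaceae ITS2 sequences were built: Dataset1 consists of the experimental samples, Dataset2 combines the experimental samples with samples from the GenBank database, and Dataset3 is Dataset2 with some sequences removed. The effect of data selection on ITS2 identification ability was assessed by comparing interspecific and intraspecific variation, the barcoding gap, and identification success rates across the three datasets. The results show that ITS2 achieved 100% identification success at the genus level in all three datasets; at the species level, success rates were 100%, 67.8%, and 90.6% with the BLAST1 method and 100%, 52.5%, and 66.5% with the Nearest Distance method. Differences in which data are included can therefore cause large changes in identification success. Among the three datasets, only Dataset2 showed a barcoding gap that was not sufficiently distinct. Data inclusion criteria for DNA barcoding analyses therefore merit further study.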
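A nearest-distance identification of the kind mentioned in this abstract can be sketched as follows. The sketch assumes pre-aligned, equal-length sequences and uses the uncorrected p-distance; barcoding studies often use a corrected distance such as Kimura 2-parameter instead, and the abstract does not say which was used. The names `p_distance` and `nearest_distance_id` are illustrative.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 1.0
    return sum(x != y for x, y in compared) / len(compared)


def nearest_distance_id(query, references):
    """Return the species of the nearest reference, or None if the nearest hits disagree.

    `references` is a list of (species, aligned_sequence) pairs.
    """
    distances = [(p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in distances)
    nearest_species = {species for d, species in distances if d == best}
    return nearest_species.pop() if len(nearest_species) == 1 else None
```

Treating a tie between species as a failed identification is one reason success rates can drop when public sequences are added to a reference set: a single closely matching record from another species is enough to make a query ambiguous.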

10.
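Zou Lingyun  Wang Zhengzhi  Huang Jiaomin 《遗传学报》2007,34(12):1080-1087
Proteins must be in the correct subcellular location to perform their functions. This paper uses the PSI-BLAST tool to search protein sequences and extracts the position-specific scoring matrix from the position-specific profile as one class of protein features; the amino acid composition of the sequence divided into four equal segments and the 1st- to 7th-order dipeptide composition are computed as two further classes. Together, these three classes yield 12 feature vectors per protein sequence. A simple weighting function weights each class of feature vectors, which then serve as input to a neural network predictor, and the Levenberg-Marquardt algorithm replaces the conventional EBP algorithm for adjusting the network weights and thresholds, greatly speeding up training. Two protein datasets, with 4 and 12 classes of subcellular location, were subjected to leave-one-out testing and 5-fold cross-validation, respectively, giving overall prediction accuracies of 88.4% and 83.3%. On the 4-class dataset the method outperforms ordinary BP neural networks, hidden Markov models, and fuzzy K-nearest neighbor; on the 12-class dataset it outperforms a support vector machine classifier. Finally, the effect of different weighting ratios for the three feature classes on prediction accuracy is discussed; results for eight selected weighting ratios show that giving each feature class a suitable weight coefficient can further improve prediction accuracy.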

11.
DNA-binding proteins (DBPs) participate in various crucial processes in the life cycle of cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.

12.
ABSTRACT: BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naive Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. 
These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets constructed from several alternative definitions of interface residues and redundancy cutoffs, as well as including evaluations on independent test sets in the comparisons.

13.
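Objective: To establish an accurate, rapid, and inexpensive method for detecting bacteria that produce extended-spectrum β-lactamases (ESBLs). Methods: A multi-substrate synergy method was established using clavulanic acid as the ESBL inhibitor, three third-generation cephalosporins (TGCs) and aztreonam as substrates, and certain susceptibility results as a supplement. It was compared with the E-test method, with an agreement rate of 80%; discordant strains were verified by plasmid extraction, bacterial transformation, and conjugative transfer experiments. Results: When 116 strains suspected of producing ESBLs were tested with both the multi-substrate synergy method and the E-test method, the former gave a higher positive rate than the latter. Conclusion: The method is accurate, rapid, and inexpensive, and is particularly suited to routine use.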

14.
15.
Wang  Cui-cui  Fang  Yaping  Xiao  Jiamin  Li  Menglong 《Amino acids》2011,40(1):239-248
RNA–protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, and the assembly and function of the ribosome. In this work, we have introduced a computational method for predicting RNA-binding sites in proteins based on support vector machines, using a variety of features from amino acid sequence information including position-specific scoring matrix (PSSM) profiles, physicochemical properties and predicted solvent accessibility. Considering the influence of the surrounding residues of an amino acid and the dependency effect from the neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. The outer fivefold cross-validation method is evaluated on the data set of 77 RNA-binding proteins (RBP77). It achieves an overall accuracy of 88.66% with a Matthews correlation coefficient (MCC) of 0.69. Furthermore, an independent data set of 39 RNA-binding proteins (RBP39) is employed to further evaluate the performance; on it the method achieves an overall accuracy of 82.36% with an MCC of 0.44. The result shows that our method has good generalization abilities in predicting RNA-binding sites for novel proteins. Compared with other previous methods, our method performs well on the same data set. The prediction results suggest that the features used are effective in predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at .
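The sliding-window and smoothing-window encoding of PSSM profiles described in this abstract can be sketched as follows. The abstract does not give window sizes or the exact smoothing rule, so this sketch assumes smoothing means summing the PSSM rows in a centered window (truncated at the sequence ends) and that positions beyond the termini are zero-padded; the names `smooth_pssm` and `window_features` and the default sizes are illustrative.

```python
def smooth_pssm(pssm, smooth_size=3):
    """Replace each row by the column-wise sum of rows in a centered window."""
    half = smooth_size // 2
    n = len(pssm)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # truncate at the termini
        smoothed.append([sum(col) for col in zip(*pssm[lo:hi])])
    return smoothed


def window_features(pssm, center, window_size=5):
    """Concatenate the rows in a sliding window around `center` into one flat vector."""
    half = window_size // 2
    width = len(pssm[0])
    features = []
    for i in range(center - half, center + half + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])
        else:
            features.extend([0.0] * width)  # zero-pad beyond the sequence ends
    return features
```

For a standard 20-column PSSM, `window_features(smooth_pssm(pssm), i, 5)` yields a 100-dimensional vector for residue `i`, which would then be combined with the other per-residue features before classification.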

16.
17.
18.
19.
Ubiquitin functions to regulate protein turnover in a cell by closely regulating the degradation of specific proteins. Such a regulatory role is very important, and thus I have analyzed the proteins that are ubiquitin-like, using an artificial neural network, support vector machines and a hidden Markov model (HMM). The methods were trained and tested on a set of 373 ubiquitin proteins and 373 non-ubiquitin proteins, obtained from the Entrez protein database. The artificial neural network and support vector machine are trained and tested using both the physicochemical properties and PSSM matrices generated from PSI-BLAST, while in the HMM-based method the sequences are used directly for training and testing. Further, the performance measures of the methods, i.e. accuracy, specificity, sensitivity and Matthews correlation coefficient, are calculated for the test sequences. The highest accuracy of 90.2%, specificity of 87.04% and sensitivity of 94.08% was achieved using the support vector machine model with PSSM matrices. Accuracies of 86.82%, 83.37%, 80.18% and 72.11% were obtained for the support vector machine with physicochemical properties, neural network with PSSM matrices, neural network with physicochemical properties, and hidden Markov model, respectively. As the accuracy of the SVM model is better with both the physicochemical properties and the PSSM matrices, it is concluded that kernel methods such as SVM outperform neural networks and hidden Markov models.

20.
Functional annotation of protein sequences with low similarity to well characterized protein sequences is a major challenge of computational biology in the post-genomic era. The cyclin protein family is one such important family of proteins; it consists of sequences with low sequence similarity, which makes discovering novel cyclins and establishing orthologous relationships amongst the cyclins a difficult task. The currently identified cyclin motifs and cyclin-associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM) based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non-cyclin protein sequences. The training features of the protein sequences include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from leave-one-out cross-validation (jackknife test), self-consistency and holdout tests show that the SVM classifier trained with PSSM profile features was more accurate than the classifiers based on any of the other features alone or on hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results show that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.
