Similar literature
1.
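Principal component analysis, partial least squares regression, and a BP (back-propagation) neural network were used for pattern recognition of thermophilic and mesophilic proteins. The three methods fitted the training set with average accuracies of 92%, 95%, and 98%, respectively, and predicted the test set with average accuracies of 60%, 72.5%, and 72.5%, respectively; the highest prediction accuracy was 75% for thermophilic proteins and 85% for mesophilic proteins. A mathematical model was constructed and its biological significance was interpreted, establishing a new sequence-based method for identifying thermophilic and mesophilic proteins.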

2.
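A decision-tree ensemble classifier based on the Boosting mechanism was used for pattern recognition of thermophilic and mesophilic proteins. Performance was assessed by self-consistency check, cross-validation, and independent sample testing. LogitBoost, a newer Boosting algorithm, performed better, with accuracies of 100%, 88.4%, and 89.5%, respectively, outperforming neural networks. The influence of protein size on recognition was also examined. The results indicate that effectively combining Boosting algorithms with other single classifiers is expected to improve researchers' ability to identify relevant properties of biomolecules.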

3.
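Thermophilic proteins remain stable and active at high temperatures, which makes them ideal models for studying protein thermostability; a method for identifying protein thermostability would be very helpful for protein engineering and protein design. In current research, amino acid composition and physicochemical properties have long been considered related to protein thermostability. This study selected a reliable dataset of 915 thermophilic and 793 non-thermophilic proteins. Thermophilic proteins were characterized using the physicochemical properties and composition of amino acids, with dipeptide composition integrated into 9 groups of amino acid physicochemical properties to formulate the protein sequences. Support vector machine 5-fold cross-validation showed that with gap = 0, 290 features gave the highest accuracy, 92.74%. The resulting prediction model should therefore be an effective tool for analyzing protein thermostability.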

4.
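Comparison of the amino acid composition of proteins from thermophilic and mesophilic microorganisms   Total citations: 11, self-citations: 0, citations by others: 11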
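The thermophilic character of thermophilic microorganisms is closely related to the high thermostability of their proteins. To explore the thermostability mechanism of thermophilic proteins and compare the amino acid composition of proteins from thermophilic and mesophilic microorganisms, 110 pairs of homologous protein sequences from thermophilic and mesophilic microorganisms were collected, and the two groups were compared in the content of each amino acid, hydrophobic amino acid composition, hydrophobicity index, and charged amino acid composition. The two groups showed small but statistically significant differences in the content of several amino acids; thermophilic proteins had higher average hydrophobicity and charged amino acid composition than mesophilic proteins. Analysis of the "aliphatic index" of the two groups showed that the higher aliphatic index of thermophilic proteins is due to their higher leucine content and is unrelated to the other amino acids that affect the index; the significance of this index is therefore considered questionable. Comparing the amino acid composition of large numbers of homologous thermophilic and mesophilic proteins can reveal some general rules of protein thermostability.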
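The abstract above does not state the formula for the aliphatic index. As a minimal sketch, assuming the standard Ikai definition (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu), the index can be split into per-residue terms so that the leucine contribution is visible on its own. The function names `aliphatic_index_terms` and `aliphatic_index` are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict

# Weights of the standard Ikai aliphatic index (an assumption: the abstract
# does not give the formula the authors used).
WEIGHTS = {"A": 1.0, "V": 2.9, "I": 3.9, "L": 3.9}


def aliphatic_index_terms(sequence: str) -> Dict[str, float]:
    """Return the contribution of Ala, Val, Ile, and Leu to the index.

    Each term is weight * mole percent of that residue in the sequence.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: w * 100.0 * counts[aa] / len(seq) for aa, w in WEIGHTS.items()}


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index as the sum of the four per-residue terms."""
    return sum(aliphatic_index_terms(sequence).values())
```

Comparing the `"L"` term of a thermophilic sequence with that of its mesophilic homolog, alongside the other three terms, is one way to check whether a difference in the index comes from leucine alone.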

5.
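Through enumeration, isolation, and screening, the distribution and resource status of thermophilic bacteria and thermophilic-protease-producing bacteria in ambient-temperature environments were studied. The results show that a certain number of both exist in ambient-temperature environments. Compared with water bodies, soil is relatively richer in thermophilic bacteria, and fertile cultivated soil contains more thermophilic-protease-producing bacteria than barren soil. In aquatic environments, whether lake water, river water, or wastewater under treatment, a certain proportion of thermophilic bacteria and thermophilic-protease-producing bacteria is present under ambient-temperature conditions. In the aeration stage of brewery wastewater, thermophilic-protease-producing bacteria made up a large proportion of thermophilic bacteria, reaching 45%. One thermophilic strain screened in this study showed high thermophilic protease activity, reaching 642 U·ml^-1 at pH 7.6 and 68 °C. This study provides an important scientific basis for developing thermophilic-protease-producing bacterial resources and for their application in industry and environmental remediation.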

6.
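[Objective] Comparing the differences in amino acid and dipeptide composition between proteins of piezophilic and non-piezophilic microorganisms is important for understanding the stability mechanism of piezophilic proteins and for directed modification on that basis. [Methods] Using whole-proteome information from 4 microorganisms, the amino acid composition and dipeptide composition of the secondary structures of 639 pairs of orthologous sequences were tallied and their deviations calculated. [Results] The two groups differed markedly in β-sheets and random coils. In β-sheets, piezophilic proteins contained more valine, isoleucine, and leucine, and less arginine, lysine, and aspartic acid; in random coils, piezophilic proteins contained more valine and isoleucine, and less glycine and proline. Piezophilic proteins also contained more YM, MN, KD, QC, CI, MW, MM, CY, WQ, HC, RC, and QH, and fewer TW, MS, VD, DH, YE, CT, MW, CF, CK, CM, MY, QI, TH, MQ, QQ, and MC. [Conclusion] Dipeptides contain more structural and sequence information than amino acids and are more useful for understanding the stability mechanism of piezophilic proteins and guiding their directed modification.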

7.
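Chaetomium thermophilum has a powerful capacity to degrade lignocellulose, and developing it into an excellent host for recombinant protein expression has broad application prospects. The codon bias of an expression host has a major effect on recombinant protein expression levels. To determine the codon usage pattern of C. thermophilum and the factors influencing it, this study analyzed codon bias in 6,897 CDS sequences. The average GC3 content was 66.2%, higher than the average GC1 (59.1%) and GC2 (45.6%) contents. Effective number of codons (ENC) analysis and neutrality plot analysis showed that natural selection is the main factor affecting codon bias in C. thermophilum. Correlation analysis showed that the proportion of aromatic amino acids was highly significantly correlated with GC1 content and protein hydrophobicity, indicating that the base composition at the first codon position strongly influences whether an amino acid is aromatic. In addition, among the frequently used codons of C. thermophilum, 24 end in G/C; 23 high-expression superior codons and 1 high-expression optimal codon (CGC) were further identified. Comparison with the codon bias of other model fungi showed that Myceliophthora thermophila and Neurospora crassa differ little from C. thermophilum in codon usage frequency, whereas Saccharomyces cerevisiae differs significantly. This study provides a theoretical basis for codon optimization of target genes for heterologous expression of recombinant proteins in C. thermophilum, and for thermo...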
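GC1, GC2, and GC3 in the abstract above are the G+C percentages at the first, second, and third codon positions. The following is a minimal sketch of that calculation for a single in-frame CDS, not the study's pipeline: the function name `gc_by_codon_position` is hypothetical, and the simplification of counting every codon (some analyses exclude Met, Trp, and stop codons when computing GC3) is an assumption.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return (GC1, GC2, GC3) as percentages for an in-frame coding sequence.

    Simplification (assumption): all codons are counted, including ATG, TGG,
    and any stop codon present in the input.
    """
    seq = cds.strip().upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    values = []
    for position in range(3):
        bases = seq[position::3]  # every base at this codon position
        gc_count = sum(base in "GC" for base in bases)
        values.append(100.0 * gc_count / len(bases))
    return values[0], values[1], values[2]
```

Averaging the three values over all CDS sequences of a genome would give genome-level figures comparable in kind to the ones quoted in the abstract.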

8.
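The TTE0732 protein encoded by the tte0732 (Galu) gene of Thermoanaerobacter tengcongensis is a temperature-dependent protein. To study its role in thermal adaptation, the tte0732 gene was cloned by PCR, the prokaryotic expression vector pET-28a::tte0732 was constructed, and TTE0732 was expressed in Escherichia coli BL21; RNA expression of tte0732 at 50, 60, 75, and 80 °C was analyzed by qRT-PCR; and bioinformatics software was used to analyze the basic physicochemical properties of the amino acids encoded by Galu in thermophilic and mesophilic bacteria. The vector pET-28a::tte0732 was successfully constructed and highly expressed in E. coli BL21; TTE0732 had a molecular mass of 35 ku and was present mainly in soluble form. qRT-PCR showed high expression of tte0732 mRNA at 75 and 80 °C. Bioinformatics analysis showed that the complete ORF of tte0732 is 909 bp long and encodes 302 amino acids, with Ile (I) and Leu (L) contents higher than in mesophilic bacteria; the encoded protein is an acidic hydrophilic protein with an isoelectric point of 5.22 and 18 potential phosphorylation sites, and it has no transmembrane structure, signal peptide, or glycosylation sites. Its predicted secondary structure consists mainly of α-helices, random coils, and β-sheets. TTE0732 of T. tengcongensis is a hydrophilic protein that can be highly expressed in a prokaryotic system; these results provide a reference for research into the thermostability mechanism of thermophilic proteins.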

9.
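It has been reported that extracting features from the distribution of amino acid composition can effectively improve classification accuracy. This study uses that method for feature extraction and a newer ensemble classifier, random forest, to classify thermophilic and psychrophilic proteins from their primary structure. Evaluated by 10-fold cross-validation and independent sample testing, accuracy was best when the number of segments was 1, at 92.9% and 90.2%, respectively. This suggests that feature extraction based on amino acid composition distribution does not effectively improve accuracy with this algorithm, contrary to the reported results, although the same extraction method does moderately improve accuracy with SVM. After 6 new variables were introduced, accuracy rose to 93.2% and 92.2%, with areas under the ROC curve of 0.9771 and 0.9696, respectively, better than other ensemble classifiers.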

11.
Gromiha MM, Suresh MX. Proteins, 2008, 70(4): 1274-1279
Discriminating thermophilic proteins from their mesophilic counterparts is a challenging task, and doing so would help in designing stable proteins. In this work, we have systematically analyzed the amino acid compositions of 3075 mesophilic and 1609 thermophilic proteins belonging to 9 and 15 families, respectively. We found that the charged residues Lys, Arg, and Glu as well as the hydrophobic residues Val and Ile have higher occurrence in thermophiles than in mesophiles. Further, we have analyzed the performance of different methods based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, and so forth for discriminating mesophilic and thermophilic proteins. We found that most of the machine learning techniques discriminate these classes of proteins with similar accuracy. The neural network-based method could discriminate thermophiles from mesophiles with a five-fold cross-validation accuracy of 89% in a dataset of 4684 proteins. Moreover, this method was tested with 325 mesophiles in Xylella fastidiosa and 382 thermophiles in Aquifex aeolicus, and it could successfully discriminate them with an accuracy of 91%. These accuracy levels are better than those of other methods in the literature, and we suggest that this method could be effectively used to discriminate mesophilic and thermophilic proteins.

12.
The stability of thermophilic proteins has been viewed from different perspectives and there is yet no unified principle to understand this stability. It would be valuable to reveal the most important interactions for designing thermostable proteins for such applications as industrial protein engineering. In this work, we have systematically analyzed the importance of various interactions by computing different parameters such as surrounding hydrophobicity, inter‐residue interactions, ion‐pairs and hydrogen bonds. The importance of each interaction has been determined by its predicted relative contribution in thermophiles versus the same contribution in mesophilic homologues based on a dataset of 373 protein families. We predict that hydrophobic environment is the major factor for the stability of thermophilic proteins and found that 80% of thermophilic proteins analyzed showed higher hydrophobicity than their mesophilic counterparts. Ion pairs, hydrogen bonds, and interaction energy are also important and favored in 68%, 50%, and 62% of thermophilic proteins, respectively. Interestingly, thermophilic proteins with decreased hydrophobic environments display a greater number of hydrogen bonds and/or ion pairs. The systematic elimination of mesophilic proteins based on surrounding hydrophobicity, interaction energy, and ion pairs/hydrogen bonds, led to correctly identifying 95% of the thermophilic proteins in our analyses. Our analysis was also applied to another, more refined set of 102 thermophilic–mesophilic pairs, which again identified hydrophobicity as a dominant property in 71% of the thermophilic proteins. Further, the notion of surrounding hydrophobicity, which characterizes the hydrophobic behavior of residues in a protein environment, has been applied to the three‐dimensional structures of elongation factor‐Tu proteins and we found that the thermophilic proteins are enriched with a hydrophobic environment. 
The results obtained in this work highlight the importance of hydrophobicity as the dominating characteristic in the stability of thermophilic proteins, and we anticipate this will be useful in our attempts to engineer thermostable proteins. © Proteins 2013. © 2012 Wiley Periodicals, Inc.

13.
Identifying thermostability from amino acid sequence information would be helpful in computational screening for thermostable proteins. We have developed a method to discriminate thermophilic and mesophilic proteins based on support vector machines. Using self-consistency validation, 5-fold cross-validation, and an independent testing procedure with other datasets, this module achieved overall accuracies of 94.2%, 90.5%, and 92.4%, respectively. When evaluated using these three validation methods, the SVM-based module performed better than classifiers built using alternative machine learning and statistical algorithms, including artificial neural networks, Bayesian statistics, and decision trees. The influence of protein size on prediction accuracy was also addressed.

14.
A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained on a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4% in a jackknife test using nine amino acids, 38 0-gap dipeptides, and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8% and 96.9%. The overall accuracies of three independent tests were 93%, 93.4%, and 91.8%. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostable machines and an aid in the design of more stable proteins.
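A g-gap dipeptide, as the term is commonly used, pairs residue i with residue i + g + 1, so a 0-gap dipeptide is an adjacent pair and a 1-gap dipeptide skips one residue. The sketch below computes the full 400-dimensional g-gap composition under that assumed definition; it does not reproduce the GA-MLR feature selection, and the function name `g_gap_dipeptide_composition` is hypothetical.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def g_gap_dipeptide_composition(sequence: str, gap: int = 0) -> Dict[str, float]:
    """Return the frequency of each of the 400 g-gap dipeptides.

    A g-gap dipeptide is taken here (assumption) to be the pair formed by
    residue i and residue i + gap + 1. Pairs containing a non-standard
    residue are skipped.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    seq = sequence.strip().upper()
    counts = {"".join(pair): 0 for pair in product(AMINO_ACIDS, repeat=2)}
    step = gap + 1
    total = 0
    for i in range(len(seq) - step):
        pair = seq[i] + seq[i + step]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence too short for the requested gap")
    return {pair: count / total for pair, count in counts.items()}
```

A feature selector would then keep only a subset of these 400 values per gap, such as the 38 0-gap and 29 1-gap dipeptides reported in the abstract.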

15.
Li Y, Zhang J, Tai D, Middaugh CR, Zhang Y, Fang J. Proteins, 2012, 80(1): 81-92
Designing proteins with enhanced thermo-stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in the past years, a general strategy for stabilizing proteins remains elusive. Thus, effective and robust computational algorithms for designing thermo-stable proteins are in critical demand. Here we report PROTS, a protein thermo-stability potential based on sequential and structural four-residue fragments. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes of melting temperatures. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross-validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo-stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation-induced thermo-stability changes as well as classify thermophilic and mesophilic proteins. In addition, this white-box predictor allows easy interpretation of the factors that influence mutation-induced protein stability changes at the residue level.

16.
A novel classifier, the so-called LogitBoost classifier, was introduced to discriminate thermophilic and mesophilic proteins according to their primary structures. When the 20-amino acid composition was chosen as the feature vector, the overall accuracies of the self-consistency check and a five-fold cross-validation procedure were 97.0% and 86.6%, respectively. To test whether the method was also applicable to a wide range of biological targets, an independent testing dataset was also used, on which the LogitBoost-based method achieved an overall classification accuracy of 88.9%. Across the three validation approaches, LogitBoost outperformed AdaBoost and performed comparably with an RBF neural network and a support vector machine. The influence of protein size on discrimination was also addressed.

17.
A database of 392 homologous pairs of proteins from thermophilic and mesophilic organisms was created. Using this database, we found that proteins from thermophilic organisms contain more atom-atom contacts per residue than their mesophilic homologues. The increase in the number of contacts comes from exterior amino acid residues, which are accessible to the solvent. The amino acid compositions of interior (solvent-inaccessible) and exterior residues of proteins from thermophilic and mesophilic organisms were analyzed. We found that the exterior residues of proteins from thermophilic organisms contain more Lys, Arg, and Glu and fewer Ala, Asp, Asn, Gln, Ser, and Thr than those of proteins from mesophilic organisms. The amino acid compositions of the interior residues of the proteins considered do not differ.

18.
Nakariyakul S, Liu ZP, Chen L. Amino Acids, 2012, 42(5): 1947-1953
Detecting thermophilic proteins is an important task for engineering proteins that are stable at temperatures of interest. In this work, we develop a simple but efficient method to classify thermophilic proteins from mesophilic ones using amino acid and dipeptide compositions. Since most of the amino acid and dipeptide compositions are redundant, we propose a new forward floating selection technique to select only a useful subset of these compositions as features for support vector machine-based classification. We test the proposed method on a benchmark data set of 915 thermophilic and 793 mesophilic proteins. The results show that our method, using 28 amino acid and dipeptide compositions, achieves an accuracy of 93.3% in the jackknife cross-validation test, which is higher than both the existing methods and the use of all amino acid and dipeptide compositions.
