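Studying the subcellular location of eukaryotic proteins is the basis for understanding their function and for investigating the underlying mechanisms of protein-related signaling pathways. It can also help in understanding disease pathogenesis and in developing new drugs, so it is of great significance. With the completion of genome sequencing, eukaryotic protein sequence information has grown rapidly, posing further challenges for research on subcellular location. Traditional experimental methods cannot keep pace with the rapid growth in protein data. Computational prediction methods that use bioinformatics to handle large-scale data can yield subcellular location information for large numbers of eukaryotic proteins in a relatively short time, compensating for the shortcomings of experimental methods. As a result, computational prediction of the subcellular location of eukaryotic proteins has become one of the active research topics in bioinformatics. This paper reviews recent progress in eukaryotic protein subcellular location prediction from three aspects: extraction of protein feature information, computational prediction methods, and evaluation of prediction performance.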

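Glycosylation is a major post-translational modification of proteins. No fixed pattern is known for O-glycosylation, and high-accuracy identification of O-glycosylation sites is a challenging problem for machine learning. Based on the Steentoft dataset, the largest human O-glycosylation site dataset to date, this paper proposes for the first time a position-based chi-square difference table feature, χ²-pos, and fuses it with pseudo position-specific scoring matrix evolutionary information (PsePSSM) and the undirected composition of k-spaced amino acid pairs (Undirected-CKSAAP) to represent sequences. Five support vector machine classifiers with balanced positive and negative samples were built and combined by weighted voting. In independent testing, the accuracy, Matthews correlation coefficient, and area under the ROC curve reached 89.62%, 0.79, and 0.96, respectively, clearly better than results reported in the literature. The fusion of the three features χ²-pos, PsePSSM, and Undirected-CKSAAP has broad application prospects for predicting protein glycosylation, phosphorylation, and other sites.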
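The abstract above does not spell out how Undirected-CKSAAP is computed. The following is a minimal sketch under a common reading of the name: count residue pairs separated by k positions, treat the pairs AB and BA as the same feature (giving 210 unordered pairs), and normalize by the number of counted pairs. The function name `undirected_cksaap` and the normalization are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def undirected_cksaap(seq, k):
    """Frequencies of unordered residue pairs separated by k positions.

    Hypothetical helper: the pair (a, b) and the pair (b, a) are counted
    under the same key, so there are 20 * 21 / 2 = 210 features per k.
    """
    pairs = ["".join(p) for p in combinations_with_replacement(AMINO_ACIDS, 2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        # Sort the two residues so that direction is ignored.
        key = "".join(sorted((seq[i], seq[i + k + 1])))
        if key in counts:  # skip pairs containing non-standard residues
            counts[key] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


features = undirected_cksaap("ACAC", k=0)
print(len(features), features["AC"])
```

In practice the feature vectors for several values of k would be concatenated around each candidate site before being passed to a classifier; the window size and the range of k used in the paper are not stated here.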

Lin Hao, 《生物信息学》, 2009, 7(4): 252-254
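Because the subcellular location of a protein is strongly correlated with its primary sequence, the increment of diversity was used to describe the similarity between proteins in amino acid composition and dipeptide composition, and a modified Mahalanobis discriminant (here called the IDQD method) was used to predict the subcellular location of mycobacterial proteins. Datasets with different levels of sequence similarity were examined with the jackknife test. The results show that when the sequence similarity of the dataset is at most 70%, the prediction accuracy of the algorithm is stable at about 75%. On the full set of 852 proteins, the prediction success rate reached 87.7%, which is better than the accuracy of existing algorithms, indicating that IDQD is an effective method for predicting the subcellular location of mycobacterial proteins.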
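The abstract does not give the formula for the increment of diversity. A minimal sketch, assuming the commonly used definition D(X) = N log N - Σ n_i log n_i for a count vector X with total N, and ID(X, Y) = D(X + Y) - D(X) - D(Y), is shown below. Only the ID part is illustrated; the modified Mahalanobis discriminant of IDQD is not reproduced, and the class labels and counts are made up.

```python
import math


def diversity(counts):
    """Diversity measure D = N*log2(N) - sum(n_i*log2(n_i)), with 0*log(0) = 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log2(n) - sum(c * math.log2(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def nearest_class(query, class_counts):
    """Assign the query count vector to the class with the smallest ID."""
    return min(class_counts, key=lambda c: increment_of_diversity(query, class_counts[c]))


# Toy example with a three-letter alphabet (illustrative counts only).
classes = {"cytoplasm": [8, 1, 1], "membrane": [1, 1, 8]}
print(nearest_class([4, 1, 0], classes))
```

With real data the count vectors would hold the 20 amino acid counts or the 400 dipeptide counts of a query protein and of each location's training set.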

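Starting from protein sequences, the 168 non-redundant ATP-binding protein sequences compiled and used by Dr. G. P. S. Raghava were divided into segments, and the ATP-protein binding sites were analyzed statistically. On this basis, the 20 amino acids were reduced to 6 classes according to their hydropathy. Using amino acid composition and adjacent pairs of the 6 hydropathy classes as parameters, the increment of diversity (ID) method was used to reduce the dimensionality of both, and the reduced feature parameters were fed into a support vector machine. The results show that using the ID values of amino acid composition together with the ID values of the 6-class adjacent hydropathy pairs as feature parameters gave the best result: under seven-fold cross-validation, the overall prediction accuracy reached 99.67% and the correlation coefficient reached 0.9934, better than previous results.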

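Screening and identification of the interaction between apoptin and Nmi and of its interaction site   Total citations: 4 (self-citations: 1; citations by others: 3)
To study the molecular mechanism by which apoptin, a small protein derived from chicken anemia virus, induces apoptosis in tumor cells, a yeast two-hybrid system was used to screen a human leukocyte cDNA library for apoptin-interacting proteins. Nucleotide sequence analysis and homology searching showed that one clone of about 1.2 kb was highly homologous to Nmi (N-Myc interaction protein). Co-immunoprecipitation in cells showed that the specific interaction between apoptin and full-length Nmi could also be detected in mammalian cells. The interaction site was studied using three constructed apoptin mutants lacking, respectively, the C-terminal 11 amino acids, the internal amino acids 33-46, or both. The results show that amino acids 33-46 of apoptin (the nuclear export signal) are required for the interaction between apoptin and Nmi, whereas the C-terminal nuclear localization signal/DNA-binding sequence is not a necessary and sufficient condition for the interaction.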

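Zou Lingyun, Wang Zhengzhi, Huang Jiaomin, Acta Genetica Sinica (《遗传学报》), 2007, 34(12): 1080-1087
A protein must be at the correct subcellular location to perform its function. This article uses the PSI-BLAST tool to search protein sequences and extracts the position-specific scoring matrix from the position-specific profile as one class of protein features. The amino acid content of the sequence divided into four equal segments and the dipeptide content of orders 1 to 7 are computed as two further classes. Together these three classes yield 12 feature vectors per protein sequence. A simple weighting function is designed to weight the feature vectors of each class, which then serve as the input to a neural network predictor. The Levenberg-Marquardt algorithm is used instead of the traditional EBP (error back-propagation) algorithm to adjust the network weights and thresholds, greatly increasing training speed. Two protein datasets, with 4 and 12 subcellular locations, were evaluated by leave-one-out testing and 5-fold cross-validation, respectively, giving overall prediction accuracies of 88.4% and 83.3%. On the 4-location dataset the method outperformed an ordinary BP neural network, a hidden Markov model, and fuzzy K-nearest neighbors; on the 12-location dataset it outperformed a support vector machine classifier. Finally, the effect of different weighting ratios for the three feature classes on prediction accuracy is discussed. Results for the eight weighting ratios examined show that assigning suitable weight coefficients to the three classes can further improve prediction accuracy.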

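Predicting protein subcellular localization is important for studying protein function, interactions, and regulatory mechanisms. Based on a reduction of the amino acid alphabet according to physicochemical and structural properties, this paper describes the "composition", "transition", and "distribution" features that capture local and global sequence information and, together with numerical statistics of amino acid hydropathy, proposes a new protein feature representation (NSBH). Three classifiers, KNN, SVM, and a BP neural network, were each used for subcellular localization prediction. Comparison of the results of the individual methods and of feature fusion shows that the fused feature representation combined with the SVM classifier achieves better prediction accuracy. The effect of different parameters on the results is also discussed in detail, and the experiments and comparisons demonstrate the effectiveness of the method.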

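Research on protein function is fundamental to unraveling the mysteries of life, and machine learning has been widely applied in this field. Using the support vector machine (SVM) method, a general platform for predicting protein functional sites was built. The platform first extracts non-homologous protein sequences and then encodes them as features (including basic sequence information, physicochemical features, structural information, and sequence conservation). The encoded samples are used as training data for the SVM, yielding evaluation metrics such as sensitivity, specificity, Matthews correlation coefficient, accuracy, and the ROC curve. After repeated testing, the SVM model with the best evaluation metrics is obtained and can then be used to predict functional sites in protein sequences. Besides functional site prediction, the platform can also be applied to the prediction and analysis of disease-associated single nucleotide polymorphisms (SNPs), protein domain prediction, and interactions between biomolecules.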
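The evaluation metrics named in this abstract have standard definitions for a binary site/non-site classifier. The sketch below computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient from true and predicted labels; the function name `binary_metrics` and the 0/1 label encoding are assumptions for illustration, and the platform's own evaluation code is not described in the source.

```python
import math


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and MCC for labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Because functional sites are usually far rarer than non-sites, MCC is often more informative than accuracy alone on such imbalanced data.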

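Based on the view that the subcellular location of an apoptosis protein is determined mainly by its amino acid sequence, the increment of diversity combined with a support vector machine (ID_SVM) algorithm was applied to the n-peptide composition of local amino acid sequence and the hydropathy distribution of the sequence to predict the subcellular location of six classes of apoptosis proteins. In the re-substitution and jackknife tests, the overall prediction success rates of ID_SVM reached 94.6% and 84.2%, respectively; in 5-fold and 10-fold tests, the overall success rates both exceeded 83%. Comparing the predictive power of ID and ID_SVM shows that combining the increment of diversity with a support vector machine improves the success rate, indicating that ID_SVM is a very effective method for predicting the subcellular location of apoptosis proteins.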

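Prediction of protein functional sites   Total citations: 3 (self-citations: 1; citations by others: 2)
PROSITE, a software package for predicting protein functional sites, was developed on the IBM PC. Based on the database of conserved amino acid patterns of protein functional sites released by EMBL on CD-ROM, the software scans a given protein sequence against 443 conserved amino acid patterns in 19 categories to detect the protein's family, the positions of its functional regions, its active sites, and other properties. Validation on 52 sequences gave results consistent with the SWISS protein database. The software also offers flexible operation and multiple input and output modes.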
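Scanning a sequence against conserved patterns can be illustrated by translating a PROSITE-style pattern into a regular expression. The sketch below covers only the core pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors). It is an assumption-laden illustration, not the software described in the abstract, and it ignores rarer cases such as `>` inside brackets. The helper names are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'N-{P}-[ST]-{P}.' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split the residue specification from an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation pattern, with a made-up sequence for illustration.
print(find_sites("N-{P}-[ST]-{P}.", "MNGSAANPTQ"))
```

The lookahead wrapper lets overlapping occurrences be reported, which matters for short motifs that can start inside a previous match.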

MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
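The input feature named in this abstract, amino acid composition, is a 20-dimensional vector of residue frequencies. A minimal sketch of that feature extraction follows; the function name `amino_acid_composition` and the choice to ignore non-standard residues are assumptions, and the kernel and parameters of the authors' SVM are not given in the abstract.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


vector = amino_acid_composition("AACD")
print(vector[:3])
```

Vectors like this, one per protein, could then be passed to any SVM implementation (for example `sklearn.svm.SVC` in scikit-learn) together with the known location labels. Because the composition ignores residue order, it is largely unaffected by a missing or wrong stretch at the N-terminus, which fits the robustness claim in the abstract.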

The frequencies of amino acid residues are known to be biased at both terminal regions of amino acid sequences deduced from bacterial genomic DNA. To investigate whether or not the features of biases of amino acid residues at the terminal regions are related to the bacterial phylogeny, we calculated the normalized amino acid compositions at both terminal regions, and used these compositions to classify 144 bacteria by hierarchical clustering analysis. Our results showed that most of these bacteria were classified into taxonomic classes by the hierarchical clustering analysis that was based on the normalized amino acid compositions at the N-terminal region. Therefore, we concluded that the features of biases of the N-terminal amino acid residues were related to the bacterial phylogeny.

In silico prediction of protein subcellular localization based on amino acid sequence can reveal valuable information about the protein's innate roles in the cell. Unfortunately, such prediction is made difficult because of complex protein sorting signals. Some prediction methods are based on searching for similar proteins with known localization, assuming that known homologs exist. However, they may not perform well on proteins with no known homolog. In contrast, machine learning-based approaches attempt to infer a predictive model that describes the protein sorting signals. Alas, in doing so, they do not take advantage of known homologs (if they exist) by doing a simple "table lookup". Here, we capture the best of both worlds by combining both approaches. On a dataset with 12 locations, similarity-based and machine learning approaches independently achieve accuracies of 83.8% and 72.6%, respectively. Our hybrid approach yields an improved accuracy of 85.9%. We compared our method with three other methods' published results. For two of the methods, we used their published datasets for comparison. For the third we used the 12-location dataset. The Error Correcting Output Code algorithm was used to construct our predictive model. This algorithm gives attention to all the classes regardless of number of instances and led to high accuracy among each of the classes and a high prediction rate overall. We also illustrated how the machine learning classifier we use, built over a meaningful set of features, can produce interpretable rules that may provide valuable insights into complex protein sorting mechanisms.

The four major polypeptide chains (alpha, beta, gamma, delta) constituting the capsid protein of mouse Elberfeld (ME) virus were isolated by preparative electrophoresis on polyacrylamide gels, and the amino acid composition of each chain was determined. In addition, the molecular weights of the smallest chains of ME virus, mengovirus, and poliovirus, which had previously been determined by gel electrophoretic methods, were redetermined by gel filtration chromatography in 6 M guanidine hydrochloride. Each was found to have a molecular weight of about 7,300. Using the reevaluated molecular weights and the known amino acid compositions of the chains, the molar ratio of each chain in the ME virion was determined by quantitative analysis of the distribution of radioactivity in the electrophoretically separated chains of virus which had been specifically radiolabeled with leucine or with methionine. Equimolar proportions of all four chains were found in the virion.

Nakariyakul S  Liu ZP  Chen L 《Amino acids》2012,42(5):1947-1953
Detecting thermophilic proteins is an important task in engineering proteins that are stable at temperatures of interest. In this work, we develop a simple but efficient method to classify thermophilic proteins from mesophilic ones using the amino acid and dipeptide compositions. Since most of the amino acid and dipeptide compositions are redundant, we propose a new forward floating selection technique to select only a useful subset of these compositions as features for support vector machine-based classification. We test the proposed method on a benchmark data set of 915 thermophilic and 793 mesophilic proteins. The results show that our method using 28 amino acid and dipeptide compositions achieves an accuracy rate of 93.3% evaluated by the jackknife cross-validation test, which is higher not only than that of existing methods but also than that obtained using all amino acid and dipeptide compositions.

Recognition of sorting signals within the cytoplasmic tail of membrane proteins by adaptor protein complexes is a crucial step in membrane protein sorting. The three known adaptor complexes, AP1, AP2, and AP3, have all been shown to recognize tyrosine- and leucine-based sorting signals, which are the most common sorting signals within membrane protein cytoplasmic tails. Although tyrosine-based signals are recognized by the μ-chains of adaptor complexes, the subunit recognizing leucine-based sorting signals is less clear. In this report we show by surface plasmon resonance that the two leucine-based sorting signals within the cytoplasmic tail of the invariant chain bind independently from each other to AP1 and AP2 but not to AP3. We also show that both motifs can be recognized by the μ-chains of AP1 and AP2. Moreover, by using monomeric as well as trimeric invariant chain constructs, we show that adaptor binding does not require trimerization of the invariant chain.

The subcomponents C1r and C1s and their activated forms C-1r and C-1s were each found to have mol.wts. in dissociating solvents of about 83000. The amino acid compositions of each were similar, but there were significant differences in the monosaccharide analyses of subcomponents C1r and C1s, whether activated or not. Subcomponents C1r and C1s have only one polypeptide chain, but subcomponents C-1r and C-1s each contain two peptide chains of approx. mol.wts. 56000 ("a" chain) and 27000 ("b" chain). The amino acid analyses of the "a" chains from each activated subcomponent are similar, as are those of the "b" chains. The N-terminal amino acid sequence of 29 residues of the C-1s "a" chain was determined, but the C-1r "a" chain has a blocked N-terminal amino acid. The 20 N-terminal residues of both "b" chains are similar, but not identical, and both show obvious homology with other serine proteinases. The difference in polysaccharide content of the subcomponents C-1r and C-1s is most marked in the "b" chains. When tested on synthetic amino acid esters, subcomponent C-1s hydrolysed both lysine and tyrosine ester bonds, but subcomponent C-1r did not hydrolyse any amino acid esters tested nor any protein substrate except subcomponent C1s. The lysine esterase activity of subcomponent C1s provides a rapid and sensitive assay of the subcomponent.

A double-labeling procedure for amino acid analysis using 3H-labeled 1-fluoro-2,4-dinitrobenzene and 14C-labeled amino acids as internal standards is described. The procedure was tested by analyzing lysozyme and insulin B chain, and the results obtained were in good agreement with their accepted amino acid compositions. Analysis of samples containing from 100 pmol of each amino acid can be achieved at an accuracy comparable to that obtained by conventional automated amino acid analysis methods, many of which require considerably more material. An important advantage is that amino acids present in low molar proportions can be separated and measured more readily than by column chromatography.

Two constituent polypeptide chains of castor bean hemagglutinin (CBH-A) were isolated from the performic acid-oxidized or reduced-carboxymethylated CBH-A by chromatography on DEAE-cellulose or Sepharose 4B. From the analyses of the N-terminal amino acids, the amino acid compositions and the tryptic peptides of each chain, it was found that the larger chain with mol. wt. 34,000 and the smaller chain with mol. wt. 31,000 were homologous with the Ala and Ile chains of ricin D, respectively, and that the subunit structure of CBH-A is represented as (α′/β′)2 in relation to αβ of ricin D.

Lamp1 is a type I transmembrane glycoprotein that is localized primarily in lysosomes and late endosomes. Newly synthesized molecules are mostly transported from the trans-Golgi network directly to endosomes and then to lysosomes. A minor pathway involves transport via the plasma membrane. The 11-amino acid cytoplasmic tail of lamp1 contains a tyrosine-based motif that has been previously shown to mediate sorting in the trans-Golgi network and rapid internalization at the plasma membrane. We studied whether this motif also mediates sorting in endosomes. We found that mutant forms of lamp1 in which all the amino acids of the cytoplasmic tail were modified except for the RKR membrane anchor and the YXXI sorting motif still localized to dense lysosomes, indicating that the YXXI motif is sufficient to confer proper intracellular targeting. However, when the spacing of the YXXI motif relative to the membrane was changed by deleting one amino acid or adding five amino acids, lysosomal targeting was almost completely abolished. Kinetic studies showed that these mutants were trapped in a recycling pathway, involving trafficking between the plasma membrane and early endocytic compartments. These findings indicate that the YXXI signal of lamp1 is recognized at several sorting sites, including the trans-Golgi network, the plasma membrane, and the early/sorting endosomes. Small changes in the spacing of this motif relative to the membrane dramatically impair sorting in the early/sorting endosomes but have only a modest effect on internalization at the plasma membrane. The spacing of sorting signals relative to the membrane may prove to be an important determinant in the functioning of these signals.
