Similar articles
 20 similar articles found (search time: 15 ms)
1.
Phosphorylation is a crucial way to control the activity of proteins in many eukaryotic organisms in vivo. Experimental methods for determining phosphorylation sites in substrates are usually restricted by the in vitro conditions of the enzymes and are very intensive in time and labor. Although some in silico methods and web servers have been introduced for automatic detection of phosphorylation sites, more sophisticated methods are still urgently needed to further improve prediction performance. Protein primary sequences can help predict phosphorylation sites catalyzed by different protein kinases, and most computational approaches use a short local peptide to make the prediction. However, useful information may be lost if only the conserved residues that are not close to the phosphorylation site are considered in prediction, which would hamper the prediction results. A novel prediction method named IEPP (Information-Entropy based Phosphorylation Prediction) is presented in this paper for automatic detection of potential phosphorylation sites. In prediction, the sites around the phosphorylation sites are selected or excluded according to their entropy values. The algorithm was compared with other methods such as GSP and PPSP on the ABL, MAPK and PKA PK families. Superior prediction accuracy was obtained on several measures, such as sensitivity (Sn) and specificity (Sp). Furthermore, compared with some online prediction web servers on newly discovered phosphorylation sites, IEPP also yielded the best performance. IEPP is another useful computational resource for identifying PK-specific phosphorylation sites, and it has the advantages of simplicity, efficiency and convenience.
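The entropy-based selection of flanking positions described in this abstract can be illustrated with a minimal sketch. It is not the IEPP implementation: the toy peptides, the function names, the threshold value, and the choice to keep low-entropy (more conserved) positions are all assumptions, since the abstract does not say in which direction the entropy filter is applied.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def select_positions(peptides, max_entropy):
    """Return (selected indices, per-position entropies) for equal-length peptides.

    A position is selected when its entropy is at most max_entropy
    (assumed rule: keep the more conserved positions).
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    entropies = [column_entropy([p[i] for p in peptides]) for i in range(length)]
    selected = [i for i, h in enumerate(entropies) if h <= max_entropy]
    return selected, entropies

# Hypothetical toy peptides centred on a phosphorylated serine (index 3).
peptides = ["RRASLAG", "RKASVDG", "RRGSLEA", "KRASLPG"]
selected, entropies = select_positions(peptides, max_entropy=1.0)
print(selected)
```

With these toy peptides, index 5 carries four different residues and is the only position excluded; a real application would estimate the entropies from a curated set of kinase-specific substrates.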

2.
Protein phosphorylation is one of the essential post-translational modifications and plays a vital role in the regulation of many fundamental cellular processes. We propose a LightGBM-based computational approach that uses evolutionary, geometric, sequence environment, and amino acid-specific features to decipher phosphate binding sites from a protein sequence. When compared with other existing methods on 2429 protein sequences taken from the standard Phospho.ELM (P.ELM) benchmark data set featuring 11 organisms, our method reports a higher F1 score = 0.504 (harmonic mean of the precision and recall) and ROC AUC = 0.836 (area under the curve of the receiver operating characteristics). The computation time of our proposed approach is much less than that of the recently developed deep learning-based framework. Structural analysis of selected protein sequences indicates that our predictions are a superset of the phosphorylation sites recorded in the P.ELM data set. The foundation of our scheme is manual feature engineering and decision tree-based classification. Hence, it is intuitive, and one can interpret the final tree as a set of rules, resulting in a deeper understanding of the relationships between biophysical features and phosphorylation sites. Our innovative problem transformation method permits more control over precision and recall, as demonstrated by the fact that if we incorporate the output probability of the existing deep learning framework as an additional feature, our prediction improves (F1 score = 0.546; ROC AUC = 0.849). The implementation of our method can be accessed at http://cse.iitkgp.ac.in/~pralay/resources/PPSBoost/ and is mirrored at https://cosmos.iitkgp.ac.in/PPSBoost .
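Since the abstract reports F1 as the harmonic mean of precision and recall, a small worked sketch may help readers check such figures. The labels below are invented toy data and the function name is hypothetical; nothing here reproduces the paper's evaluation.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = phosphorylation site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical per-residue labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot reach a high F1 by trading recall away for precision or the reverse.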

3.

Background  

As a reversible and dynamic post-translational modification (PTM) of proteins, phosphorylation plays essential regulatory roles in a broad spectrum of biological processes. Although many studies have addressed the molecular mechanism of phosphorylation dynamics, the intrinsic features of substrate specificity are still elusive and remain to be delineated.

4.
Protein phosphorylation is a ubiquitous protein post-translational modification, which plays an important role in the cellular signaling systems underlying various physiological and pathological processes. Current in silico methods have mainly focused on the prediction of phosphorylation sites, but few methods have considered whether a phosphorylation site is functional or not. Since functional phosphorylation sites are more valuable for further experimental research and a proportion of phosphorylation sites have no direct functional effects, the prediction of functional phosphorylation sites is quite necessary for this research area. Previous studies have shown that functional phosphorylation sites are more conserved in evolution than non-functional phosphorylation sites. Thus, we developed a web server that integrates existing phosphorylation site prediction methods with both absolute and relative evolutionary conservation scores to predict the most likely functional phosphorylation sites. Using our method, we predicted the most likely functional sites of the human, rat and mouse proteomes and built a database of the predicted sites. By analyzing the overall prediction results, we demonstrated that protein phosphorylation plays an important role in all the enriched KEGG pathways. By analyzing the protein-specific prediction results, we demonstrated the usefulness of our method for individual protein studies. Our method should help to characterize the most likely functional phosphorylation sites for further studies in this research area.

5.
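Current status of information entropy applications in DNA sequence analysis   Total citations: 1, self-citations: 0, citations by others: 1
Zhan Qing (詹青). 生物信息学 (Chinese Journal of Bioinformatics), 2012, 10(1): 44-49
Information entropy theory is an important tool in bioinformatics research and is widely used in DNA sequence analysis. This paper reviews in detail recent progress in applying information entropy to a range of DNA sequence analysis problems and discusses future research directions.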

6.
7.
8.
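Randomness and spatiotemporal variability of daily precipitation in China based on information entropy   Total citations: 1, self-citations: 0, citations by others: 1
Daily precipitation processes in China show marked randomness and spatiotemporal variability, and an accurate understanding of their spatiotemporal patterns is important for practical work such as flood disaster prevention. Based on daily precipitation data from 520 meteorological stations across China for 1961-2013, this study used an information entropy index to examine the randomness of daily precipitation. The results showed that, during the study period, the randomness of daily precipitation was greater in southeastern China than in the northwest, and that the spatial distribution of randomness differed among precipitation grades. Light rain (0.1-10 mm, P_0) showed relatively high randomness with no obvious spatial differences; moderate rain (10-25 mm, P10) and heavy rain (25-50 mm, P25) showed the highest randomness with obvious differences; and rainstorms and above (≥50 mm, P50) showed the lowest randomness with the most obvious differences. Overall, the information entropy of daily precipitation showed an upward trend, indicating that under global climate change the randomness of daily precipitation has increased over most of China, most notably as a marked increase in the frequency of extreme rainstorms. The spatial distribution of daily precipitation entropy and its trend comprehensively reflect the spatial pattern of daily precipitation randomness in China and can provide a scientific basis for flood disaster prevention, agricultural planning and layout, and ecological and environmental planning.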

9.
Li T, Li F, Zhang X. Proteins, 2008, 70(2): 404-414
Protein phosphorylation plays important roles in a variety of cellular processes. Detecting possible phosphorylation sites and their corresponding protein kinases is crucial for studying the function of many proteins. This article presents a new prediction system, called PhoScan, to predict phosphorylation sites in a kinase-family-specific way. Common phosphorylation features and kinase-specific features are extracted from substrate sequences of different protein kinases based on the analysis of published experiments, and a scoring system is developed for evaluating the possibility that a peptide can be phosphorylated by the protein kinase at the specific site in its sequence context. PhoScan can achieve a specificity above 90% with sensitivity around 90% at the kinase-family level on the tested data. The system is applied to a set of human proteins collected from Swiss-Prot, and sets of putative phosphorylation sites are predicted for the protein kinase A, cyclin-dependent kinase, and casein kinase 2 families. PhoScan is available at http://bioinfo.au.tsinghua.edu.cn/phoscan/.

10.
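To capture the spatial distribution of regional soil quality accurately while making full use of the main factors that affect it (soil type, land use, lithology, topography, roads, industry type), this study used mutual information theory to screen 13 auxiliary variables (lithology type, land use type, soil type, distance to towns, distance to roads, distance to industrial land, distance to rivers, relative elevation, slope, aspect, plan curvature, profile curvature and tangential curvature) and then predicted soil quality in the study area with the See5.0 decision tree. The results showed that the main factors affecting soil quality in the study area were soil type, land use type, lithology type, distance to towns, distance to water bodies, relative elevation, distance to roads and distance to industrial land. The decision tree model that used the factors selected by mutual information as predictors was markedly more accurate than the model that used all factors; in the former, classification accuracy exceeded 80% for both the decision tree and the decision rules. By making full use of both continuous and categorical data, the combination of mutual information theory and decision trees not only reduced the number of input parameters of a conventional decision tree algorithm but also effectively predicted and evaluated regional soil quality grades.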
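The variable screening step in this abstract can be sketched as ranking candidate variables by their mutual information with the soil quality grade. The sketch below is an illustration under assumptions: the sample values, variable names and threshold are invented, continuous variables such as distances are assumed to have been binned beforehand, and the See5.0 step is not reproduced.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Hypothetical samples: two candidate variables and the soil quality grade.
grade = ["high", "high", "low", "low"]
candidates = {
    "soil_type": ["A", "A", "B", "B"],  # fully determines the grade here
    "aspect": ["N", "S", "N", "S"],     # independent of the grade here
}
scores = {name: mutual_information(values, grade)
          for name, values in candidates.items()}
threshold = 0.1  # assumed cut-off, not taken from the paper
selected = [name for name, score in scores.items() if score >= threshold]
print(scores, selected)
```

The selected variables would then be passed to whatever classifier is in use; ranking by mutual information makes no assumption of a linear relationship, which is why it suits mixed categorical and binned continuous inputs.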

11.
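Temporal information entropy and its application in remote sensing detection of spatiotemporal changes in vegetation cover   Total citations: 2, self-citations: 0, citations by others: 2
Wang Chaojun, Wu Feng, Zhao Hongrui, Lu Shenghan (王超军, 吴锋, 赵红蕊, 陆胜寒). 生态学报 (Acta Ecologica Sinica), 2017, 37(21): 7359-7367
Change detection based on remote sensing imagery is a current research hotspot and can support decision making for regional ecological conservation, resource management and development planning. At present, change detection mostly relies on two time phases and cannot fully reflect the continuous change of vegetation along the time dimension. By introducing information theory, this study proposed temporal information entropy as a way to characterize vegetation change over long time series. Taking the Yanhe River basin as the test area and using MODIS/NDVI data, the temporal information entropy method was applied to calculate vegetation cover change in the region for 2000-2010 and to clarify its spatiotemporal characteristics. The results showed that over these roughly 10 years vegetation cover in the basin mainly increased, with the increase covering 80.7% of the basin area. Areas of marked increase accounted for 13.9% of the basin, mainly in the northeast and southeast; areas of decrease accounted for 2.4%, mainly in the west and northwest; and areas of severe decrease accounted for 1.1%, mainly in the central and southwestern parts, which are priority areas for ecological restoration and management. Compared with regression analysis, the temporal information entropy method characterizes the intensity and trend of continuous vegetation cover change over long time series more objectively and can provide a more scientific theoretical basis for regional ecological protection and management.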

12.
We have applied concepts from information theory for a comparative analysis of donor (gt) and acceptor (ag) splice site regions in the genes of five different organisms by calculating their mutual information content (relative entropy) over a selected block of nucleotides. A similar pattern, in which the information content decreases as the block size increases, was observed for both regions in all the organisms studied. This result suggests that the information required for splicing might be contained in a consensus of ~6-8 nt at both regions. We infer from our study that even though the nucleotides show some degree of conservation in the flanking regions of the splice sites, a certain level of variability is still tolerated, which allows the splicing process to occur normally even if the extent of base pairing is not fully satisfied. We also suggest that this variability can be compensated for by the recognition of different splice sites by different spliceosomal factors.
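The per-position information content used in this kind of splice site analysis can be sketched as the relative entropy (Kullback-Leibler divergence) of the observed base frequencies against a background distribution. The uniform background, the toy windows and the function name below are assumptions for illustration; the abstract does not specify the background model or the block construction used by the authors.

```python
import math
from collections import Counter

def position_relative_entropy(sequences, background=None):
    """Per-position relative entropy (bits) of aligned sequences vs. a background."""
    if background is None:
        # Assumed uniform background over the four bases.
        background = {base: 0.25 for base in "ACGT"}
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    total = len(sequences)
    scores = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        # D(p || q) = sum over bases of p(b) * log2( p(b) / q(b) )
        scores.append(sum((n / total) * math.log2((n / total) / background[b])
                          for b, n in counts.items()))
    return scores

# Hypothetical aligned windows around a donor site (invariant "GT" at indices 1-2).
donor_windows = ["AGTA", "CGTA", "GGTG", "TGTG"]
scores = position_relative_entropy(donor_windows)
print(scores)
```

In this toy alignment the invariant positions reach the maximum of 2 bits under a uniform background, while the fully variable first position contributes nothing, so averaging over a wider block lowers the mean information per position.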

13.
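Knowing where proteins are localized within the nucleus of eukaryotic cells is important for the functional annotation of newly discovered proteins. With the rapid growth in the number of protein sequences in protein databases, predicting protein subnuclear localization by computational methods has become a research hotspot in protein science. Based on the discrete pseudo amino acid composition model proposed by Chou, a new method for predicting protein subnuclear localization is proposed. The approximate entropy of the protein sequence is computed as an additional feature to construct the pseudo amino acid composition that represents the sequence, and the AdaBoost classification algorithm is used as the prediction tool. Compared with previously reported subnuclear localization prediction methods, this method achieves higher accuracy.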

14.
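Wintersweet (Chimonanthus praecox) is a second-class endangered rare plant in China and an important traditional winter-flowering ornamental. Using 246 reported occurrence points and 19 climatic factors extracted from WorldClim, the potential suitable habitat of wintersweet in China was predicted with the maximum entropy (Maxent) model and a geographic information system (ArcGIS), and the prediction was tested and evaluated with the receiver operating characteristic (ROC) curve. The results showed that the potential suitable range is relatively concentrated, mainly in the Sichuan Basin in the southwest, central China, eastern China and the central and southern parts of northern China, while suitability is low elsewhere. Temperature is the decisive factor affecting the distribution of wintersweet; the probability of occurrence is highest when the mean temperature of the coldest quarter is close to 0 ℃, the isothermality range is 0-10 ℃, and the coefficient of variation of precipitation is about 45. Compared with its original range, the suitable area of wintersweet is shifting toward eastern and northern China. The ROC evaluation gave an area under the curve (AUC) of 0.986 for the Maxent model, indicating very high prediction accuracy.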

15.
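Liu Chenjian, Zhang Liming, Ren Yin (刘陈坚, 张黎明, 任引). 生态学报 (Acta Ecologica Sinica), 2020, 40(22): 8199-8206
Forest biomass directly affects the assessment of forest ecosystem services. How to use landsenses ecology to accurately predict the spatiotemporal trend of forest biomass at the regional scale is a key strategic issue bearing on major national policy making and the construction of ecological industry systems. This study aimed to build an ecological information diagnosis framework and to optimize the structure of the meliorization model (3PG2 model), so as to resolve the uncertainty in ecological prediction during forest landsense creation that arises from model structure design. Taking Nanjing County, Fujian, where Chinese fir forests are widely distributed, as the study area, appropriate threshold ranges and spatial statistical analysis were used to identify areas of uncertainty in simulated biomass, and an ecological information diagnosis framework comprising three parts (Geogdetector software, genetic techniques and a computer program) was built. Geogdetector was used to clarify the effects and mechanisms of multi-factor interactions on model simulation, genetic techniques were used to optimize the model structure and improve simulation accuracy, and the computer program together with the 3PG2 model was used to predict the spatiotemporal trend of Chinese fir forest biomass at the regional scale. The results showed that stand age was the dominant factor causing uncertainty in the biomass simulated by the 3PG2 model. The ecological information diagnosis framework built through landsenses ecology (mix-marching data and the meliorization model) can accurately predict forest biomass and support sustainable forest management at the regional scale.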

16.
In order to find evidence of consistent sequence conservation or the degree of base correlation in miRNA, some important sites in the sequences of reported miRNAs and their precursors (pre-miRNAs) were investigated via information entropy analysis. Twelve different groups of sites were obtained from special locations (head, tail) in miRNAs of different sources according to taxonomy (animal, plant and virus) and then analyzed by measuring the single base information redundancy (D1(L)) and the adjacent base related information redundancy (D2(L)). The results showed that D2(L) carries more information than D1(L), though D1(L) changes roughly consistently with D2(L) in each group. Viral pre-miRNAs are more conserved than those of animals or plants. In addition, U is dominant at most sites compared with the other nucleotides. It was also found that in the middle of several groups, the sites where miRNAs are cut from pre-miRNAs by the enzyme Dicer were significantly conserved. This phenomenon shows that conservation is an aspect of miRNA and may be involved in recognition and cutting by Dicer. These results provide another perspective for understanding the primary structure of pre-miRNA.

17.
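Prediction of long non-coding RNA based on RNA-Seq   Total citations: 1, self-citations: 0, citations by others: 1
With the development of next-generation biotechnology and bioinformatics, studies have found that eukaryotic transcriptomes contain large numbers of long non-coding RNAs (lncRNAs), which may play key functional roles in the regulation of gene expression. Current lncRNA research mainly uses high-throughput RNA-Seq sequencing and processes and analyzes the sequencing data with bioinformatics methods to mine information on lncRNA sequence, structure, expression and function. This paper introduces the RNA-Seq-based lncRNA prediction workflow, reviews the bioinformatics methods involved fairly comprehensively, discusses related problems and challenges, and offers an outlook for future research.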

18.
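Zhao Hongrui, Liu Xintong, Wang Chaojun (赵红蕊, 刘欣桐, 王超军). 生态学报 (Acta Ecologica Sinica), 2022, 42(9): 3749-3758
Regarding a river basin as a living, indivisible organic whole, and in order to characterize its ecological sustainability, this study proposed, from the perspective of entropy, a spatiotemporal information entropy method that combines spatial and temporal information entropy. Spatial information entropy characterizes how ordered the ecosystem pattern is in its spatial distribution, and temporal information entropy measures whether the dynamic evolution of the ecosystem is ordered; the spatiotemporal method combines pattern and dynamics to analyze the sustainability of basin ecosystems quantitatively. Taking the Yanhe River basin as the study area and using land use data and normalized difference vegetation index data, the method was applied to analyze the ecological sustainability of the basin for 2000-2018. The results showed that: (1) the ecosystem pattern of the Yanhe River basin changed toward order, and the basin as a whole was in a growth or recovery phase during this period; (2) temporal information entropy was spatially heterogeneous, with cultivated land, medium- and low-coverage grassland and other forest land showing higher values and stronger ecological resilience; (3) ecological sustainability in the study area was mainly "strong" or "relatively strong" (61%), widely distributed in the central and northern parts, indicating that the basin's ecological resilience increased overall and that its sustainability improved markedly. This work is a useful exploration of ecological sustainability from the entropy perspective and offers a reference for ecological protection and restoration in the Yanhe River basin and similar basins on the Loess Plateau.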

19.
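Based on the hypothesis that protein-protein interactions result from the joint action of multiple domains, a new method for predicting protein-protein interactions is proposed. A support vector machine is used to analyze the physicochemical properties of the amino acids in the sequences of domain combination pairs to obtain sequence feature values, and statistical analysis is used to obtain frequency feature values. The two kinds of features are then fused to estimate the likelihood that the domain combinations interact, and protein-protein interactions are predicted on that basis. The method can predict interactions between all domain combinations and performs well in predicting protein interaction relationships.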

20.
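Implementation and application of a tool for computing the entropy of amino acid sequence sets   Total citations: 1, self-citations: 0, citations by others: 1
Analysis of conserved and variable regions in amino acid sequences is a key step in the analysis and prediction of protein structure and function. To meet this need, this study developed the Entropy software, which implements entropy calculation for amino acid sequence sets, statistical analysis and automatic generation of a dominant sequence model, and used it to analyze the features of hemagglutinin amino acid sequences of influenza A virus. The software provides a reliable tool for conservation analysis of amino acid sequence sets.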
