Similar literature
 20 similar articles found (search time: 31 ms)
1.
Protein structure prediction in genomics   Total citations: 1, self-citations: 0, citations by others: 1
As the number of completely sequenced genomes rapidly increases, including now the complete Human Genome sequence, the post-genomic problems of genome-scale protein structure determination and the issue of gene function identification become ever more pressing. In fact, these problems can be seen as interrelated in that experimentally determining or predicting the structure of proteins encoded by genes of interest is one possible means to glean subtle hints as to the functions of these genes. The applicability of this approach to gene characterisation is reviewed, along with a brief survey of the reliability of large-scale protein structure prediction methods and the prospects for the development of new prediction methods.

2.
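Taking a drug from development to clinical use takes a long time, and investment during development can exceed one billion yuan. With the combination of pharmaceutical R&D and artificial intelligence and the rapid development of bioinformatics, data on drug activity have grown sharply, and traditional experimental approaches to drug activity prediction can no longer meet the needs of drug development. Using algorithms to assist drug development and to solve the problems that arise in it can greatly accelerate the process. Traditional machine learning methods, especially random forests, support vector machines, and artificial neural networks, can achieve high prediction accuracy for drug activity. Because deep learning uses multi-layer neural networks, its models can accept high-dimensional input variables without manually specified input features and can fit relatively complex functions, so applying it to drug development can further improve the efficiency of each stage. The deep learning models most widely used in drug activity prediction are deep neural networks (DNN), recurrent neural networks (RNN), and autoencoders (AE), while generative adversarial networks (GAN), owing to their ability to generate data, are often combined with other models for data augmentation. A review of recent research and applications of deep learning in predicting the activity of drug molecules indicates that deep learning models are more accurate and more efficient than traditional experimental methods and traditional machine learning methods. Deep learning models are therefore expected to become the most important computational aid in drug development over the next decade.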

3.
Low-resolution experiments suggest that most membrane helices span 17-25 residues and that most loops between two helices are longer than 15 residues. Both constraints have been used explicitly in the development of prediction methods. Here, we compared the largest possible sequence-unique data sets from high- and low-resolution experiments. For the high-resolution data, we found that only half of the helices fall into the expected length interval and that half of the loops were shorter than 10 residues. We compared the accuracy of detecting short loops and long helices for 28 advanced and simple prediction methods: all methods predicted short loops less accurately than longer ones. In particular, loops shorter than 7 residues appeared to be very difficult to detect by current methods. Similarly, all methods tended to be more accurate for longer than for shorter helices. However, helices with more than 32 residues were predicted less accurately than all other helices. Our findings may suggest particular strategies for improving predictions of membrane helices.

4.
A recent deluge of publicly available multi-omics data has fueled the development of machine learning methods aimed at investigating important questions in genomics. Although the motivations for these methods vary, a commonly adopted task is profile prediction, where predictions are made for one or more forms of biochemical activity along the genome, for example, histone modification, chromatin accessibility, or protein binding. In this review, we give an overview of studies that perform profile prediction, define two broad categories of profile prediction tasks, and discuss the types of scientific questions that can be answered in each.

5.
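Cancer has high incidence and mortality and poses a major threat to human health. Cancer prognosis analysis can effectively avoid overtreatment and the waste of medical resources, provides a scientific basis for medical decisions by clinicians and patients' families, and has become an essential part of cancer research. With the rapid development of artificial intelligence in recent years, automated analysis of cancer patients' prognosis has become possible. In addition, as healthcare informatization advances, the concept of smart healthcare has attracted wide attention. Because cancer patients are an important group within smart healthcare, effective intelligent prognosis analysis for them is much needed. This paper reviews existing machine-learning-based methods for cancer prognosis. First, it gives an overview of machine learning and cancer prognosis, introducing cancer prognosis and the relevant machine learning methods and analyzing how machine learning is applied to cancer prognosis. It then groups machine-learning-based prognosis methods into cancer susceptibility prediction, cancer recurrence prediction, and cancer survival prediction, and surveys their research status, the cancer types and datasets involved, the machine learning methods used, and their prognostic performance, characteristics, strengths, and weaknesses. Finally, it summarizes cancer prognosis methods and discusses future prospects.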

6.
CASP13 has investigated the impact of sparse NMR data on the accuracy of protein structure prediction. NOESY and 15N-1H residual dipolar coupling data, typical of that obtained for 15N,13C-enriched, perdeuterated proteins up to about 40 kDa, were simulated for 11 CASP13 targets ranging in size from 80 to 326 residues. For several targets, two prediction groups generated models that are more accurate than those produced using baseline methods. Real NMR data collected for a de novo designed protein were also provided to predictors, including one data set in which only backbone resonance assignments were available. Some NMR-assisted prediction groups also did very well with these data. CASP13 also assessed whether incorporation of sparse NMR data improves the accuracy of protein structure prediction relative to nonassisted regular methods. In most cases, incorporation of sparse, noisy NMR data results in models with higher accuracy. The best NMR-assisted models were also compared with the best regular predictions of any CASP13 group for the same target. For six of 13 targets, the most accurate model provided by any NMR-assisted prediction group was more accurate than the most accurate model provided by any regular prediction group; however, for the remaining seven targets, one or more regular prediction methods provided a more accurate model than even the best NMR-assisted model. These results suggest a novel approach for protein structure determination, in which advanced prediction methods are first used to generate structural models, and sparse NMR data are then used to validate and/or refine these models.

7.
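Theoretical methods and stages of protein structure prediction   Total citations: 2, self-citations: 0, citations by others: 2
Sun Xia  Yin Zhixiang 《生物学杂志》 (Journal of Biology) 2007,24(1):16-17,15
Protein structure prediction has long been a focus of research. This paper reviews several theoretical methods for protein structure prediction and its different stages.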

8.
Several fold recognition algorithms are compared to each other in terms of prediction accuracy and significance. It is shown that on standard benchmarks, hybrid methods, which combine scoring based on sequence-sequence and sequence-structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, the sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this adds a new significance to fold recognition methods as a possible first step in function prediction. Despite hybrid methods being more accurate at fold prediction than either the sequence or threading methods, each of the methods is correct in some cases where others have failed. This partly reflects the different perspectives on the sequence/structure relationship embedded in the various methods. To combine predictions from different methods, estimates of significance of predictions are made for all methods. With the help of such estimates, it is possible to develop a "jury" method, which has accuracy higher than any of the single methods. Finally, building full three-dimensional models for all top predictions helps to eliminate possible false positives where alignments that are optimal for the one-dimensional sequences lead to unresolvable steric conflicts in the full three-dimensional models.
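
One way to picture the "jury" idea in the abstract above is a vote weighted by each method's significance estimate. The sketch below is only an assumption about how such a combination could work, not the scheme used in the paper: the function name `jury_vote`, the method and fold labels, and the choice to sum significance values per fold are all hypothetical.

```python
from collections import defaultdict


def jury_vote(predictions):
    """Pick a consensus fold from several methods.

    `predictions` maps a method name to a (fold, significance) pair, where
    significance is a hypothetical confidence value in [0, 1].
    The fold with the largest summed significance wins (an assumed rule).
    """
    totals = defaultdict(float)
    for fold, significance in predictions.values():
        totals[fold] += significance
    return max(totals, key=totals.get)


# Hypothetical example: two weaker votes for fold_A outweigh one strong vote for fold_B.
example_predictions = {
    "sequence": ("fold_A", 0.6),
    "threading": ("fold_B", 0.9),
    "hybrid": ("fold_A", 0.5),
}
consensus = jury_vote(example_predictions)
```

Summing significance values is the simplest possible rule; a real jury method would need significance estimates that are calibrated across methods before they can be added.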

9.
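Epitope prediction is one of the major research directions in immunoinformatics and can provide important leads for experiments. B-cell epitopes, or antigenic determinants, are the parts of an antigen that are specifically recognized and bound by B-cell receptors or antibodies. In fact, nearly 90% of B-cell epitopes are conformational. Even when the tertiary structure of the antigen protein is known, B-cell epitope prediction remains a major challenge. Using examples, this paper describes the main current methods and algorithms for conformational B-cell epitope prediction: machine learning prediction, non-machine-learning computational prediction, identification methods based on phage display data, and some general protein-protein interface prediction methods that can also be applied to conformational B-cell epitope prediction. It also introduces the latest prediction software and web service resources and outlines future research trends.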

10.
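Functional annotation of genes by computational and mathematical methods has become both a hot topic and a challenge, and machine learning methods are the most widely used. Bioinformaticians continue to propose effective, fast, and accurate machine learning methods for gene function annotation, which has greatly advanced biomedicine. This paper reviews the application and progress of machine learning methods in gene function annotation. It introduces several commonly used methods, including support vector machines, the k-nearest neighbor algorithm, decision trees, random forests, neural networks, Markov random fields, logistic regression, clustering algorithms, and Bayesian classifiers, and discusses how to select data sources, improve algorithms, and raise prediction performance when applying machine learning to gene function annotation.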

11.
Bondugula R  Xu D 《Proteins》2007,66(3):664-670
Predicting secondary structures from a protein sequence is an important step for characterizing the structural properties of a protein. Existing methods for protein secondary structure prediction can be broadly classified into template-based or sequence-profile-based methods. We propose a novel framework that bridges the gap between the two fundamentally different approaches. Our framework integrates the information from the fuzzy k-nearest neighbor algorithm and position-specific scoring matrices using a neural network. It combines the strengths of the two methods and has a better potential to use the information in both the sequence and structure databases than existing methods. We implemented the framework in a software system, MUPRED. MUPRED has achieved three-state prediction accuracy (Q3) ranging from 79.2 to 80.14%, depending on which benchmark dataset is used. A higher Q3 can be achieved if a query protein has a significant sequence identity (>25%) to a template in PDB. MUPRED also estimates the prediction accuracy at the individual residue level more quantitatively than existing methods. The MUPRED web server and executables are freely available at http://digbio.missouri.edu/mupred.
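
For readers unfamiliar with the Q3 figure quoted above: Q3 is conventionally the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the reference assignment. The sketch below illustrates that definition only; it is not MUPRED code, and the function name `q3_score` and the example strings are hypothetical.

```python
def q3_score(predicted, observed):
    """Return the percentage of positions where the predicted three-state
    label (H, E or C) equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example in which 8 of 10 states agree.
example_predicted = "HHHHCCEEEC"
example_observed = "HHHCCCEEEE"
example_q3 = q3_score(example_predicted, example_observed)
```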

12.
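Extracellular matrix proteins play important roles in a range of cellular biological processes, and their dysregulation leads to many major diseases. Theoretical reference data for extracellular matrix proteins are the basis for identifying these proteins efficiently, and researchers have developed a series of machine-learning-based tools for predicting them. This paper first describes the basic workflow for building an extracellular matrix protein prediction tool from machine learning models, then summarizes the results of existing prediction tools one tool at a time, and finally presents the problems these tools currently face and possible ways to optimize them.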

13.
Yuan Z  Burrage K  Mattick JS 《Proteins》2002,48(3):566-570
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to find how they affect the prediction performance. Using a cut-off threshold of 15% that splits the dataset evenly (an equal number of exposed and buried residues), this method was able to achieve a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple alignment sequence input, respectively. The prediction of three and more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
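
The sliding-window input and the 15% exposure cut-off mentioned above could be prepared roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the one-hot encoding, the default window size of 5, the zero padding at the sequence ends, the strict `>` comparison, and all names are hypothetical, and no SVM is trained here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_windows(sequence, window=5):
    """One-hot encode a window of residues centred on each position.

    Positions that fall outside the sequence are encoded as all zeros
    (an assumed padding scheme). Returns one feature vector per residue.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    features = []
    for i in range(len(sequence)):
        vector = []
        for j in range(i - half, i + half + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            if 0 <= j < len(sequence):
                one_hot[AMINO_ACIDS.index(sequence[j])] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


def label_exposure(relative_accessibility, cutoff=15.0):
    """Label a residue 1 (exposed) if its relative accessibility in percent
    is above the cutoff, otherwise 0 (buried)."""
    return [1 if value > cutoff else 0 for value in relative_accessibility]
```

The resulting feature vectors and labels are the kind of input a two-class SVM would take; kernel choice and window size are the parameters the abstract reports exploring.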

14.
Fujitsuka Y  Chikenji G  Takada S 《Proteins》2006,62(2):381-398
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 Å for 12 of 38 proteins; this success rate was the same as that by the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, well-predicted proteins by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping sampled structural space onto two dimensions. For proteins where one of the two methods succeeded and the other failed, the successful method had a less scattered ensemble located around the native structure. For proteins where both methods succeeded, the two ensembles were often intermixed.

15.
This study focuses on predicting breathing patterns, which is crucial for dealing with system latency in the treatment of moving lung tumors. Predicting respiratory motion in real time is challenging, due to the inherent chaotic nature of breathing patterns, i.e. sensitive dependence on initial conditions. In this work, nonlinear prediction methods are used to predict the short-term evolution of the respiratory system for 62 patients, whose breathing time series were acquired using a respiratory position management (RPM) system. Single-step and N-point multi-step prediction are performed for sampling rates of 5 Hz and 10 Hz. We compare the nonlinear prediction methods with Adaptive Infinite Impulse Response (IIR) prediction filters in terms of prediction accuracy. A Local Average Model (LAM) and local linear models (LLMs) combined with a set of linear regularization techniques to solve ill-posed regression problems are implemented. For all sampling frequencies, both single-step and N-point multi-step predictions obtained using LAM and LLMs with regularization methods are more accurate than those of IIR prediction filters for the selected sample patients. Moreover, since the simple LAM performs as well as the more complicated LLMs in our patient sample, its use for nonlinear prediction is recommended.
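
A local average model of the kind named above is usually built on a delay embedding: find the past delay vectors closest to the most recent one and average what followed them. The sketch below follows that textbook recipe and is not the study's implementation; the function name, the Euclidean distance, and the defaults for embedding dimension `dim`, neighbour count `k`, and `horizon` are assumptions.

```python
import math


def local_average_predict(series, dim=3, k=2, horizon=1):
    """Predict the value `horizon` steps after the last sample by averaging
    the futures of the k past delay vectors closest to the latest one."""
    n = len(series)
    query = series[n - dim:]
    candidates = []
    # A delay vector ending at index `end` is usable only if its future value exists.
    for end in range(dim - 1, n - horizon):
        vector = series[end - dim + 1:end + 1]
        distance = math.dist(vector, query)
        candidates.append((distance, series[end + horizon]))
    if len(candidates) < k:
        raise ValueError("series too short for the chosen dim, k and horizon")
    candidates.sort(key=lambda item: item[0])
    nearest = candidates[:k]
    return sum(future for _, future in nearest) / k


# Hypothetical periodic signal standing in for a breathing trace.
toy_series = [0, 1, 2, 0, 1, 2, 0, 1, 2]
next_value = local_average_predict(toy_series, dim=2, k=2, horizon=1)
```

Setting `horizon` greater than 1 gives a direct multi-step prediction, one of several possible ways to realise the N-point prediction the abstract mentions.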

16.
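Based on the developmental threshold temperature and the effective accumulated temperature of Spodoptera litura, methods for forecasting its occurrence period were explored. The results show that the peak occurrence period of S. litura in the field largely coincided with the forecast dates.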
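
Forecasts of this kind usually rest on the effective accumulated temperature rule K = N(T - C), where C is the developmental threshold temperature, T the mean temperature, N the number of days, and K the degree-days a stage requires. The sketch below applies the daily form of that rule; it is an illustration rather than the paper's procedure, and the function name and the temperature, threshold, and requirement values are made up.

```python
def days_to_complete_stage(daily_mean_temps, threshold, required_degree_days):
    """Return the 1-based day on which accumulated effective temperature first
    reaches the requirement, or None if the temperature record is too short."""
    accumulated = 0.0
    for day, temp in enumerate(daily_mean_temps, start=1):
        # Only the part of the temperature above the threshold contributes.
        accumulated += max(0.0, temp - threshold)
        if accumulated >= required_degree_days:
            return day
    return None


# Made-up values: threshold of 10 degrees C and a requirement of 30 degree-days.
example_temps = [20.0, 22.0, 8.0, 25.0]
example_day = days_to_complete_stage(example_temps, 10.0, 30.0)
```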

17.
Membrane protein prediction methods   Total citations: 13, self-citations: 0, citations by others: 13
We survey computational approaches that tackle membrane protein structure and function prediction. While describing the main ideas that have led to the development of the most relevant and novel methods, we also discuss pitfalls, provide practical hints and highlight the challenges that remain. The methods covered include: sequence alignment, motif search, functional residue identification, transmembrane segment and protein topology predictions, homology and ab initio modeling. In general, predictions of functional and structural features of membrane proteins are improving, although progress is hampered by the limited amount of high-resolution experimental information available. While predictions of transmembrane segments and protein topology rank among the most accurate methods in computational biology, more attention and effort will be required in the future to improve database search, homology and ab initio modeling.

18.
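A protein's sequence determines its structure, and its structure determines its function. A new generation of accurate protein structure prediction tools has brought new opportunities and challenges to structural biology, structural bioinformatics, drug development, the life sciences, and many other fields, with the accuracy of single-chain protein structure prediction reaching a level comparable to that of experimental methods. This review outlines the theoretical basis, history, and latest progress of protein structure prediction, discusses how large numbers of predicted protein structures and AI-based methods are affecting experimental structural biology, and finally analyzes the unsolved problems and future research directions in the field.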

19.
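Gene prediction and function inference for SARS-CoV (BJ01)   Total citations: 1, self-citations: 1, citations by others: 1
A survey of the literature on SARS-CoV revealed shortcomings in existing gene prediction and functional studies. To support the development of effective drugs and vaccines, gene prediction and function inference were repeated for SARS-CoV (BJ01). Twelve gene prediction methods were compared on known genes in the genus Coronavirus, and the four better-performing methods, Heuristic models, Gene Identification, ZCURVE-CoV, and ORF FINDER, were selected to predict genes. ATGpr was then used to assess the likelihood of the first start codon and whether it conforms to the Kozak rule, and transcription regulatory sequences were searched for, to improve the accuracy of gene prediction. A total of 34 ORFs were predicted; after excluding 13 that were identical to, or differed only slightly from, those in NCBI and the relevant literature, 21 possible new genes longer than 50 amino acids remained. For the predicted proteins, ProtParam was used to analyze physicochemical properties, SignalP to check for signal peptides, BLAST and FASTA to search for similar sequences, and TMPred, TMHMM, PFAM, and HMMTOP to analyze domains or motifs, to improve the reliability of the functional inference. Based on the results from the four gene prediction methods, the match scores and expectation values against known genes of other coronaviruses, and the length differences between known and predicted genes, the 21 possible new genes were divided into four classes by likelihood. The results are also discussed.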
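
The ORF step in the abstract above can be illustrated with a deliberately simple scan. This is a generic sketch, not the behaviour of ORF FINDER or of any other tool named in the abstract: it looks only at the three forward-strand frames, uses the standard stop codons, requires an ATG start, and the function name and example sequence are hypothetical. The default `min_codons=50` mirrors the "longer than 50 amino acids" criterion.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=50):
    """Return (start, end) index pairs of forward-strand ORFs that begin with
    ATG, end with an in-frame stop codon, and encode more than `min_codons`
    amino acids. `end` is exclusive and includes the stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                coding_codons = (i - start) // 3  # stop codon not counted
                if coding_codons > min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Hypothetical toy sequence: ATG + 3 codons + stop, found in frame 2.
toy_dna = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"
toy_orfs = find_orfs(toy_dna, min_codons=3)
```

A fuller analysis would also scan the reverse complement and, as the abstract describes, check start-codon context and transcription regulatory sequences.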

20.
Pseudoknots are an essential feature of RNA tertiary structures. Simple H-type pseudoknots have been studied extensively in terms of biological functions, computational prediction, and energy models. Intramolecular kissing hairpins are a more complex and biologically important type of pseudoknot in which two hairpin loops form base pairs. They are hard to predict using free energy minimization due to high computational requirements. Heuristic methods that allow arbitrary pseudoknots strongly depend on the quality of energy parameters, which are not yet available for complex pseudoknots. We present an extension of the heuristic pseudoknot prediction algorithm DotKnot, which covers H-type pseudoknots and intramolecular kissing hairpins. Our framework allows for easy integration of advanced H-type pseudoknot energy models. For a test set of RNA sequences containing kissing hairpins and other types of pseudoknot structures, DotKnot outperforms competing methods from the literature. DotKnot is available as a web server at http://dotknot.csse.uwa.edu.au.
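
What makes a structure a pseudoknot is that some base pairs cross: pairs (i, j) and (k, l) with i < k < j < l. The sketch below checks a list of base pairs for that property. It only illustrates the definition and has nothing to do with how DotKnot predicts structures; the function name and the example pairs are hypothetical.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross,
    i.e. i < k < j < l after ordering each pair and sorting by start."""
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:
                return True
    return False


# Hypothetical examples: nested pairs do not cross, H-type pairs do.
nested_pairs = [(0, 9), (1, 8)]
h_type_pairs = [(0, 5), (3, 8)]
```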
