Similar articles
1.
One common and challenging problem faced by many bioinformatics applications, such as promoter recognition, splice site prediction, RNA gene prediction, drug discovery and protein classification, is the imbalance of the available datasets. In most of these applications, the positive examples are heavily outnumbered by the negative examples, which often leads to sub-optimal prediction models with a high negative recognition rate (Specificity = SP) and a low positive recognition rate (Sensitivity = SE). When class imbalance learning methods are applied, SE usually increases at the expense of some reduction in SP. In this paper, we point out that in these data-imbalanced bioinformatics applications, the goal of applying class imbalance learning methods should be to raise SE as much as possible while keeping the reduction in SP as small as possible. We explain that the existing performance measures used in class imbalance learning can still produce models that are sub-optimal with respect to this classification goal. To overcome this problem, we introduce a new performance measure called Adjusted Geometric-mean (AGm). The experimental results obtained on ten real-world imbalanced bioinformatics datasets demonstrate that the AGm metric can achieve a lower rate of reduction of SP than the existing performance metrics when SE is increased through class imbalance learning methods. This characteristic makes the AGm metric more suitable for achieving the proposed classification goal when learning from imbalanced bioinformatics datasets.
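The quantities this abstract builds on can be sketched as follows. This is a minimal illustration, assuming binary labels with 1 as the positive class; it computes SE, SP and the ordinary geometric mean (G-mean) that AGm adjusts. The abstract does not give the AGm formula, so it is deliberately not implemented here, and the function names and example labels are hypothetical.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SE, SP) for two equal-length sequences of class labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive example recognised
            else:
                fn += 1  # positive example missed
        else:
            if pred == positive:
                fp += 1  # negative example flagged as positive
            else:
                tn += 1  # negative example recognised
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return se, sp


def g_mean(se, sp):
    """Plain geometric mean of SE and SP (not the adjusted AGm)."""
    return math.sqrt(se * sp)


# Hypothetical imbalanced example: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
se, sp = sensitivity_specificity(y_true, y_pred)
gm = g_mean(se, sp)
```

In this toy case one missed positive halves SE while one false alarm costs only an eighth of SP, which is the asymmetry that makes a single summary score hard to choose on imbalanced data.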

2.
Deep learning has revolutionized research in image processing, speech recognition, natural language processing, and game playing, and will soon revolutionize research in proteomics and genomics. Through three examples in genomics, protein structure prediction, and proteomics, we demonstrate that deep learning is changing bioinformatics research, shifting it from algorithm-centric to data-centric approaches.

3.
Plant membrane proteomics   Cited by: 11 (self-citations: 0; citations by others: 11)
Plant membrane proteins are involved in many different functions according to their location in the cell. For instance, the chloroplast has two membrane systems, thylakoids and envelope, with specialized membrane proteins for photosynthesis and for metabolite and ion transport, respectively. Although sample preparation and analytical techniques for the study of membrane proteins have advanced recently, the characterization of these proteins, especially the hydrophobic ones, is still challenging. The present review highlights recent advances in methodologies for identification of plant membrane proteins from purified subcellular structures. The value of combining several complementary extraction procedures to take into account specific features of membrane proteins is discussed in the light of recent proteomics data, notably for chloroplast envelope, mitochondrial membranes and plasma membrane from Arabidopsis. These examples also illustrate how, on the one hand, proteomics can feed bioinformatics for a better definition of prediction tools and, on the other hand, prediction tools, although not 100% reliable, can give valuable information for biological investigations. In particular, membrane proteomics brings new insights into plant membrane systems, regarding both the membrane compartment where proteins work and their putative cellular function.

4.
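Cao Guojun, Shao Ningsheng. 《生命科学》 (Life Sciences), 2008, 20(2): 183-189
RNA technologies can be divided into those related to basic RNA research, those related to RNA applications, and RNA bioinformatics. Technologies for basic RNA research include RNA isolation, purification and identification; techniques for studying interactions between RNA and other biological macromolecules; techniques for studying RNA higher-order structure; and other related RNA techniques. Application-related technologies include RNA techniques used to produce other products and RNA techniques used directly in drug development. RNA bioinformatics comprises various databases, non-coding RNA prediction, RNA secondary structure prediction, and various design software. This article briefly introduces the principles of these classes of RNA technology and research progress in China and abroad, to help readers gain a fairly comprehensive understanding of the techniques used in the RNA field.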

5.
Recent developments in biochemical experimental techniques and bioinformatics have enabled us to create a variety of artificial biocatalysts with protein scaffolds (namely ‘artificial enzymes’). The construction methods for these catalysts include genetic mutation, chemical modification using synthetic molecules, and combinations of the two. Designed evolution strategies based on the structural information of host proteins have become increasingly popular as an effective approach to constructing artificial protein-based biocatalysts with desired reactivities. From the viewpoint of applying artificial enzymes to organic synthesis, this review article introduces recently constructed artificial enzymes that mediate oxidation, reduction and C–C bond formation/cleavage.

6.
The bioinformatics analysis of proteins containing tandem repeats requires special computer programs and databases, since the conventional approaches, developed predominantly for globular domains, have limited success. Here, I survey bioinformatics tools that have been developed recently for identification and proteome-wide analysis of protein repeats. The last few years have also been marked by the emergence of new 3D structures of these proteins. Appraisal of the known structures and their classification uncovers a straightforward relationship between their architecture and the length of the repetitive units. This relationship and the repetitive character of structural folds suggest rules for better prediction of the 3D structures of such proteins. Furthermore, bioinformatics approaches combined with low-resolution structural data from biophysical techniques, especially the recently emerged cryo-electron microscopy, lead to reliable prediction of protein repeat structures and their mode of binding with partners within molecular complexes. This hybrid approach can be used for structural and functional annotation of proteomes.

7.
Support vector machine applications in bioinformatics   Cited by: 14 (self-citations: 0; citations by others: 14)

8.

Background

Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor.

Results

Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction.

Conclusions

Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.

9.
Advances in biological and medical technologies have been providing us with explosive volumes of biological and physiological data, such as medical images, electroencephalography, and genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural networks and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples of deep learning applications are presented, including medical image classification, genomic sequence analysis, and protein structure classification and prediction. Finally, we offer our perspectives on future directions in the field of deep learning.

10.
Machine learning methods without tears: a primer for ecologists   Cited by: 1 (self-citations: 0; citations by others: 1)
Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machine learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors.

11.
The reliable assessment of the quality of protein structural models is fundamental to the progress of structural bioinformatics. The ModFOLD server provides access to two accurate techniques for the global and local prediction of the quality of 3D models of proteins. The first is ModFOLD, a fast Model Quality Assessment Program (MQAP) used for the global assessment of either single or multiple models. The second is ModFOLDclust, a more intensive method that carries out clustering of multiple models and provides per-residue local quality assessment. AVAILABILITY: http://www.biocentre.rdg.ac.uk/bioinformatics/ModFOLD/.

12.
Protein structure prediction by using bioinformatics can involve sequence similarity searches, multiple sequence alignments, identification and characterization of domains, secondary structure prediction, solvent accessibility prediction, automatic protein fold recognition, constructing three-dimensional models to atomic detail, and model validation. Not all protein structure prediction projects involve the use of all these techniques. A central part of a typical protein structure prediction is the identification of a suitable structural target from which to extrapolate three-dimensional information for a query sequence. The way in which this is done defines three types of projects. The first involves the use of standard and well-understood techniques. If a structural template remains elusive, a second approach using nontrivial methods is required. If a target fold cannot be reliably identified because inconsistent results have been obtained from nontrivial data analyses, the project falls into the third type of project and will be virtually impossible to complete with any degree of reliability. In this article, a set of protocols to predict protein structure from sequence is presented and distinctions among the three types of project are given. These methods, if used appropriately, can provide valuable indicators of protein structure and function.

13.
We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples and extract lessons from them showing how it has helped. Both examples suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. More generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.
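The make-style dependency handling mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' system: the file names and the `DEPENDENCIES` map are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for make's own dependency resolution.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each target maps to the files it is built from.
DEPENDENCIES = {
    "alignment.aln": {"sequences.fasta"},
    "tree.nwk": {"alignment.aln"},
    "report.html": {"alignment.aln", "tree.nwk"},
}


def build_order(dependencies):
    """Return every file in an order where prerequisites come first."""
    return list(TopologicalSorter(dependencies).static_order())


def needs_rebuild(dependencies, changed):
    """Return the targets that become stale when `changed` is modified."""
    stale = {changed}
    for target in build_order(dependencies):
        # A target is stale if any of its prerequisites is stale.
        if dependencies.get(target, set()) & stale:
            stale.add(target)
    return stale - {changed}


order = build_order(DEPENDENCIES)
stale_after_tree_change = needs_rebuild(DEPENDENCIES, "tree.nwk")
```

Keeping the dependencies as explicit data, instead of burying them in scripts, is what allows both the build order and the set of stale results to be derived automatically.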

14.
Artificial neural networks and their use in quantitative pathology   Cited by: 2 (self-citations: 0; citations by others: 2)
A brief general introduction to artificial neural networks is presented, examining in detail the structure and operation of a prototype net developed for the solution of a simple pattern recognition problem in quantitative pathology. The process by which a neural network learns through example and gradually embodies its knowledge as a distributed representation is discussed, using this example. The application of neurocomputer technology to problems in quantitative pathology is explored, using real-world and illustrative examples. Included are examples of the use of artificial neural networks for pattern recognition, database analysis and machine vision. In the context of these examples, characteristics of neural nets, such as their ability to tolerate ambiguous, noisy and spurious data and spontaneously generalize from known examples to handle unfamiliar cases, are examined. Finally, the strengths and deficiencies of a connectionist approach are compared to those of traditional symbolic expert system methodology. It is concluded that artificial neural networks, used in conjunction with other nonalgorithmic artificial intelligence techniques and traditional algorithmic processing, may provide useful software engineering tools for the development of systems in quantitative pathology.

15.
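A protein's sequence determines its structure, and its structure determines its function. A new generation of accurate protein structure prediction tools has brought new opportunities and challenges to structural biology, structural bioinformatics, drug discovery, the life sciences, and many other fields, and the accuracy of single-chain protein structure prediction has reached a level comparable to that of experimental methods. This review outlines the theoretical foundations, history, and latest advances in protein structure prediction, discusses how the large number of predicted protein structures and artificial-intelligence-based methods are influencing experimental structural biology, and finally analyzes the problems that remain unsolved in the field as well as future research directions.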

16.
Recent advances in membrane protein crystallography have greatly increased structural information of channels permeating metal ions. Structural bioinformatics techniques and molecular dynamics calculations are providing structural models of ion channels for which the three-dimensional structure is not known. Most of the reported structure prediction studies focus on K(+) channels and are based on the KcsA K(+) channel structure.

17.
Secondary structure prediction with support vector machines   Cited by: 8 (self-citations: 0; citations by others: 8)
MOTIVATION: A new method that uses support vector machines (SVMs) to predict protein secondary structure is described and evaluated. The study is designed to develop a reliable prediction method using an alternative technique and to investigate the applicability of SVMs to this type of bioinformatics problem. METHODS: Binary SVMs are trained to discriminate between two structural classes. The binary classifiers are combined in several ways to predict multi-class secondary structure. RESULTS: The average three-state prediction accuracy per protein (Q(3)) is estimated by cross-validation to be 77.07 ± 0.26% with a segment overlap (Sov) score of 73.32 ± 0.39%. The SVM performs similarly to the 'state-of-the-art' PSIPRED prediction method on a non-homologous test set of 121 proteins despite being trained on substantially fewer examples. A simple consensus of the SVM, PSIPRED and PROFsec achieves significantly higher prediction accuracy than the individual methods.
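The Q(3) measure reported in this abstract can be illustrated with a short sketch, assuming secondary structure is written as strings over the three states H (helix), E (strand) and C (coil). The abstract does not say how its "simple consensus" is formed, so the per-residue majority vote below is an assumption, and all names and example strings are hypothetical.

```python
from collections import Counter


def q3(true_ss, pred_ss):
    """Percentage of residues whose three-state label is predicted correctly."""
    if not true_ss or len(true_ss) != len(pred_ss):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


def mean_q3(pairs):
    """Average Q3 per protein over (true, predicted) pairs."""
    scores = [q3(true_ss, pred_ss) for true_ss, pred_ss in pairs]
    return sum(scores) / len(scores)


def majority_consensus(predictions):
    """Per-residue majority vote; ties go to the earliest-listed predictor."""
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # First label in predictor order that reaches the top count.
        consensus.append(next(s for s in column if counts[s] == best))
    return "".join(consensus)


single = q3("HHHHEEECCC", "HHHHEEECCH")
average = mean_q3([("HHEE", "HHEE"), ("CCCC", "CCHH")])
combined = majority_consensus(["HHEC", "HEEC", "CHEC"])
```

Averaging per protein, as here, weights short and long chains equally, which differs from pooling all residues into one accuracy figure.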

18.
19.
20.
Bacterial small RNAs (sRNAs) are an emerging class of regulatory RNAs of about 40-500 nucleotides in length that, by binding to their target mRNAs or proteins, participate in many biological processes such as sensing environmental changes and regulating gene expression. Identification of bacterial sRNAs and their targets has therefore become an important part of sRNA biology. Current strategies for discovering sRNAs and their targets usually involve bioinformatics prediction followed by experimental validation, which gives bioinformatics prediction a key role. Here, we provide an overview of prediction methods, focusing on the merits and limitations of each class of models. Finally, we present our thoughts on developing related bioinformatics models in the future.
