共查询到20条相似文献,搜索用时 0 毫秒
1.
Background
A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. 相似文献2.
Background
Current efforts in Metabolomics, such as the Human Metabolome Project, collect structures of biological metabolites as well as data for their characterisation, such as spectra for identification of substances and measurements of their concentration. Still, only a fraction of existing metabolites and their spectral fingerprints are known. Computer-Assisted Structure Elucidation (CASE) of biological metabolites will be an important tool to leverage this lack of knowledge. Indispensable for CASE are modules to predict spectra for hypothetical structures. This paper evaluates different statistical and machine learning methods to perform predictions of proton NMR spectra based on data from our open database NMRShiftDB. 相似文献3.
Barla A Jurman G Riccadonna S Merler S Chierici M Furlanello C 《Briefings in bioinformatics》2008,9(2):119-128
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data needs caution like the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only potential features easily outnumber samples by 10(3) times, but it is easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers e.g. Support Vector Machine (SVM) or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing stability and predictive value of the resulting biomarkers' list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies. 相似文献
4.
Junghyun Namkung 《Journal of microbiology (Seoul, Korea)》2020,58(3):206-216
Researches on the microbiome have been actively conducted worldwide and the results have shown human gut bacterial environment significantly impacts on immune system, psychological conditions, cancers, obesity, and metabolic diseases. Thanks to the development of sequencing technology, microbiome studies with large number of samples are eligible on an acceptable cost nowadays. Large samples allow analysis of more sophisticated modeling using machine learning approaches to study relationships between microbiome and various traits. This article provides an overview of machine learning methods for non-data scientists interested in the association analysis of microbiomes and host phenotypes. Once genomic feature of microbiome is determined, various analysis methods can be used to explore the relationship between microbiome and host phenotypes that include penalized regression, support vector machine (SVM), random forest, and artificial neural network (ANN). Deep neural network methods are also touched. Analysis procedure from environment setup to extract analysis results are presented with Python programming language. 相似文献
5.
PROMIS (protein machine induction system), a program for machine learning, was used to generalize rules that characterize the relationship between primary and secondary structure in globular proteins. These rules can be used to predict an unknown secondary structure from a known primary structure. The symbolic induction method used by PROMIS was specifically designed to produce rules that are meaningful in terms of chemical properties of the residues. The rules found were compared with existing knowledge of protein structure: some features of the rules were already recognized (e.g. amphipathic nature of alpha-helices). Other features are not understood, and are under investigation. The rules produced a prediction accuracy for three states (alpha-helix, beta-strand and coil) of 60% for all proteins, 73% for proteins of known alpha domain type, 62% for proteins of known beta domain type and 59% for proteins of known alpha/beta domain type. We conclude that machine learning is a useful tool in the examination of the large databases generated in molecular biology. 相似文献
6.
Machine learning methods without tears: a primer for ecologists 总被引:1,自引:0,他引:1
Machine learning methods, a family of statistical techniques with origins in the field of artificial intelligence, are recognized as holding great promise for the advancement of understanding and prediction about ecological phenomena. These modeling techniques are flexible enough to handle complex problems with multiple interacting elements and typically outcompete traditional approaches (e.g., generalized linear models), making them ideal for modeling ecological systems. Despite their inherent advantages, a review of the literature reveals only a modest use of these approaches in ecology as compared to other disciplines. One potential explanation for this lack of interest is that machine learning techniques do not fall neatly into the class of statistical modeling approaches with which most ecologists are familiar. In this paper, we provide an introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation. For each approach, we provide a brief background to the methodology, give examples of its application in ecology, describe model development and implementation, discuss strengths and weaknesses, explore the availability of statistical software, and provide an illustrative example. Although the ecological application of machine learning approaches has increased, there remains considerable skepticism with respect to the role of these techniques in ecology. Our review encourages a greater understanding of machin learning approaches and promotes their future application and utilization, while also providing a basis from which ecologists can make informed decisions about whether to select or avoid these approaches in their future modeling endeavors. 相似文献
7.
E. Decencière G. Cazuguel X. Zhang G. Thibault J.-C. Klein F. Meyer B. Marcotegui G. Quellec M. Lamard R. Danno D. Elie P. Massin Z. Viktor A. Erginay B. Laÿ A. Chabouis 《IRBM》2013,34(2):196-203
A complete prototype for the automatic detection of normal examinations on a teleophthalmology network for diabetic retinopathy screening is presented. The system combines pathological pattern mining methods, with specific lesion detection methods, to extract information from the images. This information, plus patient and other contextual data, is used by a classifier to compute an abnormality risk. Such a system should reduce the burden on readers on teleophthalmology networks. 相似文献
8.
9.
Naushad Shaik Mohammad Hussain Tajamul Indumathi Bobbala Samreen Khatoon Alrokayan Salman A. Kutala Vijay Kumar 《Molecular biology reports》2018,45(5):901-910
Molecular Biology Reports - In view of high mortality associated with coronary artery disease (CAD), development of an early predicting tool will be beneficial in reducing the burden of the... 相似文献
10.
Machine learning approaches for the prediction of signal peptides and other protein sorting signals 总被引:10,自引:0,他引:10
Prediction of protein sorting signals from the sequence of amino acids has great importance in the field of proteomics today. Recently, the growth of protein databases, combined with machine learning approaches, such as neural networks and hidden Markov models, have made it possible to achieve a level of reliability where practical use in, for example automatic database annotation is feasible. In this review, we concentrate on the present status and future perspectives of SignalP, our neural network-based method for prediction of the most well-known sorting signal: the secretory signal peptide. We discuss the problems associated with the use of SignalP on genomic sequences, showing that signal peptide prediction will improve further if integrated with predictions of start codons and transmembrane helices. As a step towards this goal, a hidden Markov model version of SignalP has been developed, making it possible to discriminate between cleaved signal peptides and uncleaved signal anchors. Furthermore, we show how SignalP can be used to characterize putative signal peptides from an archaeon, Methanococcus jannaschii. Finally, we briefly review a few methods for predicting other protein sorting signals and discuss the future of protein sorting prediction in general. 相似文献
11.
Background
Diverse modeling approaches viz. neural networks and multiple regression have been followed to date for disease prediction in plant populations. However, due to their inability to predict value of unknown data points and longer training times, there is need for exploiting new prediction softwares for better understanding of plant-pathogen-environment relationships. Further, there is no online tool available which can help the plant researchers or farmers in timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases. 相似文献12.
Puton T Kozlowski L Tuszynska I Rother K Bujnicki JM 《Journal of structural biology》2012,179(3):261-268
Understanding the molecular mechanism of protein-RNA recognition and complex formation is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes by X-ray crystallography and nuclear magnetic resonance spectroscopy (NMR) is tedious and difficult. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental observations, computational predictions can be sufficiently accurate to prompt functional hypotheses and guide experiments, e.g. to identify individual amino acid or nucleotide residues. In this article we review 10 methods for predicting protein-RNA interactions, seven of which predict RNA-binding sites from protein sequences and three from structures. We also developed a meta-predictor that uses the output of top three sequence-based primary predictors to calculate a consensus prediction, which outperforms all the primary predictors. In order to fully cover the software for predicting protein-RNA interactions, we also describe five methods for protein-RNA docking. The article highlights the strengths and shortcomings of existing methods for the prediction of protein-RNA interactions and provides suggestions for their further development. 相似文献
13.
Machine learning in bioinformatics 总被引:1,自引:0,他引:1
Larrañaga P Calvo B Santana R Bielza C Galdiano J Inza I Lozano JA Armañanzas R Santafé G Pérez A Robles V 《Briefings in bioinformatics》2006,7(1):86-112
This article reviews machine learning methods for bioinformatics. It presents modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization. Applications in genomics, proteomics, systems biology, evolution and text mining are also shown. 相似文献
14.
Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns. 相似文献
15.
Protein structure prediction methods for drug design 总被引:1,自引:0,他引:1
Along the long path from genomic data to a new drug, the knowledge of three-dimensional protein structure can be of significant help in several places.This paper points out such places, discusses the virtues of protein structure knowledge and reviews bioinformatics methods for gaining such knowledge on the protein structure. 相似文献
16.
17.
Machine learning for Big Data analytics in plants 总被引:2,自引:0,他引:2
18.
预测蛋白质间相互作用的生物信息学方法 总被引:8,自引:0,他引:8
后基因组时代的研究模式,已从原来的序列-结构-功能转向基因表达-系统动力学-生理功能。建立蛋白质间相互作用的完全网络,即蛋白质相互作用组(interactome),将有助于从系统角度加深对细胞结构和功能的认识,并为新药靶点的发现和药物设计提供理论基础。一系列系统分析蛋白质相互作用的实验方法已经建立,近年来,出现了多种预测蛋白质相互作用的生物信息学方法,这些方法不仅是对传统实验方法的有价值的补充,而且能够扩展实验方法的预测范围;同时,在开发这些方法的过程中建立了一些重要的分子进化和分子生物学慨念。本文综述了9种生物信息学方法的原理、方法评估、存在的问题.并分析了这个领域的发展前景。 相似文献
19.
Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review. 相似文献
20.
Saraswathi S Fernández-Martínez JL Kolinski A Jernigan RL Kloczkowski A 《Journal of molecular modeling》2012,18(9):4275-4289
Computational methods are rapidly gaining importance in the field of structural biology, mostly due to the explosive progress in genome sequencing projects and the large disparity between the number of sequences and the number of structures. There has been an exponential growth in the number of available protein sequences and a slower growth in the number of structures. There is therefore an urgent need to develop computational methods to predict structures and identify their functions from the sequence. Developing methods that will satisfy these needs both efficiently and accurately is of paramount importance for advances in many biomedical fields, including drug development and discovery of biomarkers. A novel method called fast learning optimized prediction methodology (FLOPRED) is proposed for predicting protein secondary structure, using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data that yield better and faster convergence to produce more accurate results. Protein secondary structures are predicted reliably, more efficiently and more accurately using FLOPRED. These techniques yield superior classification of secondary structure elements, with a training accuracy ranging between 83?% and 87?% over a widerange of hidden neurons and a cross-validated testing accuracy ranging between 81?% and 84?% and a segment overlap (SOV) score of 78?% that are obtained with different sets of proteins. These results are comparable to other recently published studies, but are obtained with greater efficiencies, in terms of time and cost. 相似文献