Similar articles
 Found 20 similar articles; search time 843 ms
1.
Pathway analysis using random forests classification and regression (total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers. RESULTS: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data. AVAILABILITY: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.

2.
Challenges and solutions in proteomics (total citations: 1; self-citations: 0; citations by others: 1)
The accelerated growth of proteomics data presents both opportunities and challenges. Large-scale proteomic profiling of biological samples such as cells, organelles or biological fluids has led to discovery of numerous key and novel proteins involved in many biological/disease processes including cancers, as well as to the identification of novel disease biomarkers and potential therapeutic targets. While proteomic data analysis has been greatly assisted by the many bioinformatics tools developed in recent years, a careful analysis of the major steps and flow of data in a typical high-throughput analysis reveals a few gaps that still need to be filled to fully realize the value of the data. To facilitate functional and pathway discovery for large-scale proteomic data, we have developed an integrated proteomic expression analysis system, iProXpress, which facilitates protein identification using a comprehensive sequence library and functional interpretation using integrated data. With its modular design, iProXpress complements and can be integrated with other software in a proteomic data analysis pipeline. This novel approach to complex biological questions involves the interrogation of multiple data sources, thereby facilitating hypothesis generation and knowledge discovery from the genomic-scale studies and fostering disease diagnosis and drug development.

3.
4.
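With the continuing development of deep sequencing and gene chip (microarray) technologies, genome, transcriptome and expression-profile data have accumulated in large quantities. To date, the genomes of more than 10 insect species have been sequenced, and transcriptome data have been reported for more than 30 insect species. Clearly, traditional biostatistical methods cannot handle biological data on this scale. Quantitative change leads to qualitative change: the massive accumulation of biological data has given rise to a new discipline, bioinformatics. Bioinformatics integrates the theories and research content of statistics, information science, biology and other disciplines, and has been widely applied in medicine, basic biology, agricultural science and entomology. Its goals are data storage, data management and data mining. Accordingly, building and maintaining biological databases, designing and developing biological software based on pattern recognition, machine learning, data mining and related methods, and using these tools for in-depth data mining are the main research topics of bioinformatics. This paper first briefly introduces the history and current state of bioinformatics and its applications in entomology, then reviews research progress in insect genomics and transcriptomics, and finally discusses the prospects for applying bioinformatics in entomological research.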

5.
6.
The elucidation of the entire genomic sequence of various organisms, from viruses to complex metazoans, most recently man, is undoubtedly the greatest triumph of molecular biology since the discovery of the DNA double helix. Over the past two decades, the focus of molecular biology has gradually moved from genomes to proteomes, the intention being to discover the functions of the genes themselves. The postgenomic era stimulated the development of new techniques (e.g. 2-DE and MS) and bioinformatics tools to identify the functions, reactions, interactions and location of the gene products in tissues and/or cells of living organisms. Both 2-DE and MS have been very successfully employed to identify proteins involved in biological phenomena (e.g. immunity, cancer, host-parasite interactions, etc.), although recently, several papers have emphasised the pitfalls of 2-DE experiments, especially in relation to experimental design, poor statistical treatment and the high rate of 'false positive' results with regard to protein identification. In the light of these perceived problems, we review the advantages and misuses of bioinformatics tools - from realisation of 2-DE gels to the identification of candidate protein spots - and suggest some useful avenues to improve the quality of 2-DE experiments. In addition, we present key steps which, in our view, need to be taken into consideration during such analyses. Lastly, we present novel biological entities named 'interactomes', and the bioinformatics tools developed to analyse the large protein-protein interaction networks they form, along with several new perspectives of the field.

7.
8.
This is the second article in a series, intended as a tutorial to provide the interested reader with an overview of the concepts not covered in part I, such as: the principles of ion-activation methods, the ability of mass-spectrometric methods to interface with various proteomic strategies, analysis techniques, bioinformatics and data interpretation and annotation. Although these are different topics, it is important that a reader has a basic and collective understanding of all of them for an overall appreciation of how to carry out and analyze a proteomic experiment. Different ion-activation methods for MS/MS, such as collision-induced dissociation (including postsource decay) and surface-induced dissociation, electron capture and electron-transfer dissociation, infrared multiphoton and blackbody infrared radiative dissociation, are discussed since they are used in proteomic research. The high dimensionality of data generated from proteomic studies requires an understanding of the underlying analytical procedures used to obtain these data, as well as the development of improved bioinformatics tools and data-mining approaches for efficient and accurate statistical analyses of biological samples from healthy and diseased individuals, in addition to determining the utility of the interpreted data. Currently available strategies for the analysis of the proteome by mass spectrometry, such as those employed for the analysis of substantially purified proteins and complex peptide mixtures, as well as hypothesis-driven strategies, are elaborated upon. Processing steps prior to the analysis of mass spectrometry data, statistics and the several informatics steps currently used for the analysis of shotgun proteomic experiments, as well as proteomics ontology, are also discussed.

9.
The search for and validation of novel disease biomarkers require the complementary power of professional study planning and execution, modern profiling technologies and related bioinformatics tools for data analysis and interpretation. Biomarkers have considerable impact on the care of patients and are urgently needed for advancing diagnostics, prognostics and treatment of disease. This survey article highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics, focusing on the problem of data preprocessing and consolidation, the data-driven search, verification, prioritization and biological interpretation of putative metabolic candidate biomarkers in disease. In particular, data mining tools suitable for application to omic data gathered from the most frequently used types of experimental designs, such as case-control or longitudinal biomarker cohort studies, are reviewed and case examples of selected discovery steps are delineated in more detail. This review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery, translating innovations and successes in profiling technologies and bioinformatics to clinical application.

10.
Massive DNA sequencing studies have expanded our insights and understanding of the ecological and functional characteristics of the gut microbiome. Advanced sequencing technologies allow us to understand the close association of the gut microbiome with human health and critical illnesses. In the future, analyses of the gut microbiome will provide key information associated with individual human health, which will help provide personalized health care for diseases. Numerous molecular biological analysis tools have been rapidly developed and employed for gut microbiome research; however, methodological differences among researchers lead to inconsistent data, limiting extensive data sharing. It is therefore essential to standardize the current methodologies and establish appropriate pipelines for human gut microbiome research. Herein, we review the methods and procedures currently available for studying the human gut microbiome, including fecal sample collection, metagenomic DNA extraction, massive DNA sequencing, and data analyses with bioinformatics. We believe that this review will contribute to the progress of gut microbiome research in the clinical and practical aspects of human health.

11.
Many statistical methods have been developed to screen for differentially expressed genes associated with specific phenotypes in microarray data. However, it remains a major challenge to synthesize the observed expression patterns with abundant biological knowledge for a more complete understanding of the biological functions among genes. Various methods including clustering analysis on genes, neural networks, Bayesian networks and pathway analysis have been developed toward this goal. In most of these procedures, the activation and inhibition relationships among genes have hardly been utilized in the modeling steps. We propose two novel Bayesian models to integrate the microarray data with the putative pathway structures obtained from the KEGG database and the directional gene–gene interactions in the medical literature. We define the symmetric Kullback–Leibler divergence of a pathway, and use it to identify the pathway(s) most supported by the microarray data. A Markov chain Monte Carlo sampling algorithm is given for posterior computation in the hierarchical model. The proposed method is shown to select the most supported pathway in an illustrative example. Finally, we apply the methodology to a real microarray data set to understand the gene expression profile of osteoblast lineage at defined stages of differentiation. We observe that our method correctly identifies the pathways that are reported to play essential roles in modulating bone mass.
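
As a rough illustration of the quantity this abstract mentions, the sketch below computes the generic symmetric Kullback–Leibler divergence between two discrete distributions. It is not the authors' pathway-level definition, which the abstract does not give; the function names and the example distributions are hypothetical, and the code assumes both distributions are positive wherever the other is.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical distributions over three expression states.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(symmetric_kl(p, q))
```

Unlike the one-sided divergence, the symmetrized form does not depend on which distribution is treated as the reference, which is convenient when comparing candidate structures against data.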

12.
13.
Bioinformatics is a central discipline in modern life sciences aimed at describing the complex properties of living organisms starting from large-scale data sets of cellular constituents such as genes and proteins. In order for this wealth of information to provide useful biological knowledge, databases and software tools for data collection, analysis and interpretation need to be developed. In this paper, we review recent advances in the design and implementation of bioinformatics resources devoted to the study of metals in biological systems, a research field traditionally at the heart of bioinorganic chemistry. We show how metalloproteomes can be extracted from genome sequences, how structural properties can be related to function, how databases can be implemented, and how hints on interactions can be obtained from bioinformatics.

14.
Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum.

15.
16.
The development of high-throughput technologies has generated the need for bioinformatics approaches to assess the biological relevance of gene networks. Although several tools have been proposed for analysing the enrichment of functional categories in a set of genes, none of them is suitable for evaluating the biological relevance of a gene network. We propose a procedure and develop a web-based resource (BIOREL) to estimate the functional bias (biological relevance) of any given genetic network by integrating different sources of biological information. The weights of the edges in the network may be either binary or continuous. These essential features make our web tool unique among many similar services. BIOREL provides standardized estimations of the network biases extracted from independent data. Through analyses of real data we demonstrate that the potential applications of BIOREL range from various benchmarking purposes to systematic analysis of network biology.

17.
Pathway analysis, also known as gene-set enrichment analysis, is a multilocus analytic strategy that integrates a priori biological knowledge into the statistical analysis of high-throughput genetics data. Originally developed for the studies of gene expression data, it has become a powerful analytic procedure for in-depth mining of genome-wide genetic variation data. Astonishing discoveries have been made in recent years, uncovering genes and biological mechanisms underlying common and complex disorders. However, as massive amounts of diverse functional genomics data accrue, there is a pressing need for newer generations of pathway analysis methods that can utilize multiple layers of high-throughput genomics data. In this review, we provide an intellectual foundation of this powerful analytic strategy, as well as an update of the state of the art in recent method developments. The goal of this review is threefold: (1) introduce the motivation and basic steps of pathway analysis for genome-wide genetic variation data; (2) review the merits and the shortcomings of classic and newly emerging integrative pathway analysis tools; and (3) discuss remaining challenges and future directions for further method developments.
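
To make the basic idea of pathway analysis concrete, here is a minimal sketch of one classic approach, an over-representation test based on the hypergeometric distribution. It is offered as an assumption-laden illustration rather than any specific method covered by the review; the function name and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: total number of genes considered
    n_pathway:  genes annotated to the pathway
    n_selected: genes flagged as significant in the study
    n_overlap:  flagged genes that also belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 genes, 100 in the pathway,
# 200 flagged by the study, 5 of which fall in the pathway.
print(enrichment_p_value(20000, 100, 200, 5))
```

A small p-value suggests the pathway contains more flagged genes than expected by chance. In practice the test is repeated across many pathways, so a multiple-testing correction would be needed on top of this.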

18.
BIOSILICO, 2003, 1(3): 89-96
The function(s) of a novel gene or gene product can be inferred by associating the gene or gene product with those whose functions are known. It is now common practice to associate two genes if they have similar sequences. In recent years, computational methods have been developed that associate genes on the basis of features beyond similarity, using a variety of biological data beyond single-gene sequences. This review describes several promising methods that associate genes or gene products. These associative methods employ similarity of sequences and structures, features from whole-genome analysis, co-expression patterns from microarray and EST data, interacting properties from proteomic data, and links from literature mining. Finally, we outline issues surrounding the validation and integration of these methods.
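
One of the associative strategies listed in this abstract, linking genes by co-expression, can be sketched as ranking annotated genes by the Pearson correlation of their expression profiles with that of an uncharacterized gene. This is only a toy illustration under that assumption; the gene names, profiles and helper functions below are hypothetical and do not come from the review.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_by_coexpression(query, profiles):
    """Return (correlation, gene) pairs, most similar profile first."""
    scored = [(pearson(query, prof), name) for name, prof in profiles.items()]
    return sorted(scored, reverse=True)


# Hypothetical expression profiles across four conditions.
known = {"geneA": [1, 2, 3, 4], "geneB": [4, 3, 2, 1]}
novel = [2, 4, 6, 8]
ranking = rank_by_coexpression(novel, known)
print(ranking[0][1])
```

The top-ranked annotated gene is then a candidate for sharing function with the query, a hypothesis that still requires the kind of validation the abstract calls for.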

19.
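Xu FL, Li L. 《生理科学进展》 (Progress in Physiological Sciences), 2002, 33(4): 322-326
Genes are the regulatory center of cellular life activities such as proliferation, differentiation and maturation, and are also decisive factors in the onset, progression and outcome of many diseases. Changes in gene expression inevitably lead to abnormalities in cells, tissues, organs and even the whole organism. Internal and external stimuli of all kinds, including trauma, can alter gene expression to varying degrees and ultimately impair health. With the gradual rise of bioinformatics, and with the continuing development of molecular biology and its spread into other disciplines, a series of practical techniques for studying changes in gene expression has been established (namely "differential gene expression analysis techniques", such as DNA microarrays), which are of great value for capturing the various changes in gene expression. These techniques have been widely applied in research on tumors and other diseases, and in recent years have also gradually entered the field of trauma research, advancing it to some extent.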

20.