Similar articles
1.

Background  

Many attempts are being made to understand biological subjects at a systems level. Biological databases are a major resource for these approaches, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception to the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing their data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed.

2.
3.
Results of high-throughput experiments can be challenging to interpret. Current approaches have relied on bulk processing the set of expression levels, in conjunction with easily obtained external evidence, such as co-occurrence. While such techniques can be used to reason probabilistically, they are not designed to shed light on what any individual gene, or a network of genes acting together, may be doing. Our belief is that today we have the information extraction ability and the computational power to perform more sophisticated analyses that consider the individual situation of each gene. The use of such techniques should lead to qualitatively superior results. The specific aim of this project is to develop computational techniques to generate a small number of biologically meaningful hypotheses based on observed results from high-throughput microarray experiments, gene sequences, and next-generation sequences. Through the use of relevant known biomedical knowledge, as represented in published literature and public databases, we can generate meaningful hypotheses that will aid biologists in interpreting their experimental data. We are currently developing novel approaches that exploit the rich information encapsulated in biological pathway graphs. Our methods perform a thorough and rigorous analysis of biological pathways, using complex factors such as the topology of the pathway graph and the frequency with which genes appear on different pathways, to provide more meaningful hypotheses to describe the biological phenomena captured by high-throughput experiments, when compared to other existing methods that only consider partial information captured by biological pathways.

4.
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.
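
The idea of summarizing a pathway activity from a discriminative gene subset can be sketched as follows. The abstract does not state the search procedure or the scoring statistic, so the greedy search, the Welch t-statistic, the sum/sqrt(k) aggregation, the function names (`t_score`, `activity`, `greedy_corgs`) and the toy data are all assumptions made for illustration. Expression values are assumed to be z-score normalized per gene.

```python
from math import sqrt
from statistics import mean, variance


def t_score(values, labels):
    """Welch t-statistic between samples labelled 1 and samples labelled 0."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def activity(expr, genes):
    """Per-sample activity: sum of the genes' values scaled by sqrt(#genes)."""
    n_samples = len(next(iter(expr.values())))
    return [sum(expr[g][i] for g in genes) / sqrt(len(genes))
            for i in range(n_samples)]


def greedy_corgs(expr, labels):
    """Grow a gene subset while the absolute t-score of its activity improves."""
    ranked = sorted(expr, key=lambda g: abs(t_score(expr[g], labels)),
                    reverse=True)
    chosen = [ranked[0]]
    best = abs(t_score(activity(expr, chosen), labels))
    for gene in ranked[1:]:
        score = abs(t_score(activity(expr, chosen + [gene]), labels))
        if score > best:
            chosen, best = chosen + [gene], score
    return chosen, best


# Hypothetical pathway with three genes measured in six samples
# (labels: 1 = case, 0 = control).
expr = {
    "g1": [1.0, 1.2, 0.8, -1.0, -0.9, -1.1],
    "g2": [0.9, 1.1, 1.0, -1.2, -0.8, -1.0],
    "g3": [1.0, -1.0, 0.5, 0.8, -0.9, 0.6],
}
labels = [1, 1, 1, 0, 0, 0]
chosen, best = greedy_corgs(expr, labels)
```

In this toy example the two genes that track the phenotype are combined and the uninformative gene is left out. A simple sum lets oppositely regulated genes cancel each other, which a fuller treatment would need to handle.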

5.
Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal-to-noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, such as pathway databases, GO terms and protein domain data, and is flexible enough to integrate new sources, if available.
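
The Noisy-OR idea can be illustrated with the textbook formula, in which the combined support for an edge is one minus the product of the per-source "no support" probabilities. This is only a sketch: the abstract does not give the paper's exact parameterization, and the function name `noisy_or`, the gene names and the confidence values below are hypothetical.

```python
from math import prod


def noisy_or(supports):
    """Combine per-source support values in [0, 1] for one candidate edge."""
    for p in supports:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each support must lie in [0, 1]")
    # The edge is unsupported only if every source fails to support it.
    return 1.0 - prod(1.0 - p for p in supports)


# Hypothetical confidences from three information sources for two edges.
edge_support = {
    ("geneA", "geneB"): [0.9, 0.1, 0.2],
    ("geneA", "geneC"): [0.2, 0.1, 0.2],
}
prior = {edge: noisy_or(ps) for edge, ps in edge_support.items()}
```

A single strongly supporting source dominates the result, which matches the description of picking up the strongest support among sources.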

6.
KEGGanim: pathway animations for high-throughput data   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: Gene expression analysis with microarrays has become one of the most widely used high-throughput methods for gathering genome-wide functional data. Emerging -omics fields such as proteomics and interactomics introduce new information sources. With the rise of systems biology, researchers need to concentrate on entire complex pathways that guide individual genes and related processes. Bioinformatics methods are needed to link the existing knowledge about pathways with the growing amounts of experimental data. RESULTS: We present KEGGanim, a novel web-based tool for visualizing experimental data in biological pathways. KEGGanim produces animations and images of KEGG pathways using public or user-uploaded high-throughput data. Pathway members are coloured according to experimental measurements, and animated over experimental conditions. KEGGanim visualization highlights dynamic changes over conditions and allows the user to observe important modules and key genes that influence the pathway. The simple user interface of KEGGanim provides options for filtering genes and experimental conditions. KEGGanim may be used with public or private data for 14 organisms with a large collection of public microarray data readily available. Most common gene and protein identifiers and microarray probesets are accepted for visualization input. AVAILABILITY: http://biit.cs.ut.ee/KEGGanim/.

7.
Pathway analysis using random forests classification and regression   Cited by: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers. RESULTS: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy for dissecting pathways and can provide biological insight into the study of microarray data. AVAILABILITY: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.

8.

Background  

To date, many genomic and pathway-related tools and databases have been developed to analyze microarray data. In published web-based applications to date, however, complex pathways have been displayed with static image files that may not be up-to-date or are time-consuming to rebuild. In addition, gene expression analyses focus on individual probes and genes with little or no consideration of pathways. These approaches reveal little information about pathways that are key to a full understanding of the building blocks of biological systems. Therefore, there is a need to provide useful tools that can generate pathways without manually building images and allow gene expression data to be integrated and analyzed at pathway levels for such experimental organisms as Arabidopsis.

9.
Goh WW, Lee YH, Chung M, Wong L. Proteomics, 2012, 12(4-5): 550-563
Proteomics provides important information on key players in biological systems or disease states that may not be inferable from indirect sources such as RNA or DNA. However, it suffers from coverage and consistency problems. The advent of network-based analysis methods can help in overcoming these problems but requires careful application and interpretation. This review briefly considers current trends in proteomics technologies and the causes of critical issues that need to be addressed, i.e., incomplete data coverage and inter-sample inconsistency. On the coverage issue, we argue that holistic analysis based on biological networks provides a suitable background on which more robust models and interpretations can be built, and we introduce some recently developed approaches. On consistency, group-based approaches based on identified clusters, as well as on properly integrated pathway databases, are particularly useful. Although protein interaction and pathway networks are still largely incomplete, given proper quality checks, applications and reasonably sized data sets, they yield valuable insights that greatly complement data generated from quantitative proteomics.

10.
Glioblastoma multiforme (GBM) is the most malignant of all brain tumors, with a very low median survival time of one year, as per the Central Brain Tumor Registry of the USA, 2001. Efforts are ongoing to understand the pathogenesis of this disease in full detail. Global gene expression changes in GBM pathogenesis have been studied by several groups using microarray technology (e.g. Carro et al., 2010). One of the many approaches to ‘understand the control mechanisms underlying the observed changes in the activity of a biological process’ (Cline et al., 2007) is the integration of gene expression and protein–protein interaction (PPI) datasets. Among several examples, aberrant activation of the Wnt/β-catenin signaling pathway as well as the sonic hedgehog (SHH) signaling pathway is reported in GBMs (Klaus & Birchmeier, 2008). Further, these two pathways are also involved in proliferation and clonogenicity of glioma cancer stem cells (Li et al., 2009), which are thought to play a role in glioma initiation, proliferation, and invasion, and are one of the important points of intervention. Hedgehog–Gli1 signaling is also found to regulate the expression of stemness genes. In this paper, analyses of the relationship between the significant differential expression of these and other genes and the connectivity as well as topological features of a PPI network are discussed. In this way, genes that may be overlooked when relying solely on expression profiles can be identified, and these may be biologically relevant as possible drug targets or disease biomarkers.

11.
Enormous amounts of data result from genome sequencing projects and new experimental methods. Within this tremendous amount of genomic data, 30-40 per cent of the genes identified in an organism remain unknown in terms of their biological function. As a consequence of this lack of information, the overall schema of all the biological functions occurring in a specific organism cannot be properly represented. To understand the functional properties of the genomic data, more experimental data must be collected. A pathway database is an effort to handle the current knowledge of biochemical pathways and in addition can be used for interpretation of sequence data. Some of the existing pathway databases can be interpreted as detailed functional annotations of genomes because they are tightly integrated with genomic information. However, experimental data are often lacking in these databases. This paper summarises a list of pathway databases and some of their corresponding biological databases, and also focuses on information about the content and the structure of these databases, the organisation of the data and the reliability of stored information from a biological point of view. Moreover, information about the representation of the pathway data and tools to work with the data is given. Advantages and disadvantages of the analysed databases are pointed out, and an overview for biological scientists on how to use these pathway databases is given.

12.
The functioning of even a simple biological system is much more complicated than the sum of its genes, proteins and metabolites. A premise of systems biology is that molecular profiling will facilitate the discovery and characterization of important disease pathways. However, as multiple levels of effector pathway regulation appear to be the norm rather than the exception, a significant challenge presented by high-throughput genomics and proteomics technologies is the extraction of the biological implications of complex data. Thus, integration of heterogeneous types of data generated from diverse global technology platforms represents the first challenge in developing the foundational databases needed for predictive modelling of cell and tissue responses. Given the apparent difficulty in defining the correspondence between gene expression and protein abundance measured in several systems to date, how do we make sense of these data and design the next experiment? In this review, we highlight current approaches and challenges associated with integration and analysis of heterogeneous data sets, focusing on global analysis obtained from high-throughput technologies.

13.
A switching mechanism in gene expression, where two genes are positively correlated in one condition and negatively correlated in the other condition, is a key to elucidating complex biological systems. Methods already exist for detecting switching mechanisms from microarrays. However, current approaches have problems in three real cases: outliers, expression values with a very small range and a small number of examples. ROS-DET overcomes these three problems while retaining the computational complexity of current approaches. We demonstrated that ROS-DET outperformed existing methods when all three of these situations are considered. Furthermore, for each of the top 10 pairs ranked by ROS-DET, we attempted to identify a pathway, i.e. consecutive biological phenomena, related to the corresponding two genes by checking the biological literature. In 8 out of the 10 pairs, we found two parallel pathways, with one of the two genes in each pathway and the two pathways converging on (or starting from) the same gene. This indicates that two parallel pathways would be cooperatively used under one experimental condition, corresponding to the positive correlation, and the two pathways might be alternatively used under the other condition, corresponding to the negative correlation. ROS-DET is available from http://www.bic.kyoto-u.ac.jp/pathway/kayano/ros-det.htm.
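
The notion of a switching pair can be made concrete with a naive baseline: compute the correlation of two genes separately in each condition and take the difference. This is not ROS-DET. It uses plain Pearson correlation, which is sensitive to exactly the outlier and small-sample issues the abstract raises. The names `pearson` and `switch_score` and the toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def switch_score(x1, y1, x2, y2):
    """Correlation in condition 1 minus correlation in condition 2, in [-2, 2]."""
    return pearson(x1, y1) - pearson(x2, y2)


# Toy expression values for two genes under two conditions.
gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y_cond1 = [2.0, 4.0, 6.0, 8.0]   # rises with gene_x
gene_y_cond2 = [8.0, 6.0, 4.0, 2.0]   # falls as gene_x rises
score = switch_score(gene_x, gene_y_cond1, gene_x, gene_y_cond2)
```

A score near 2 or -2 indicates a strong sign switch between conditions, while a score near 0 indicates similar correlation in both.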

14.
Computational analysis of gene expression data from microarrays has been useful for medical diagnosis and prognosis. The ability to analyze such data at the level of biological modules, rather than individual genes, has been recognized as important for improving our understanding of disease-related pathways. It has proved difficult, however, to infer pathways from microarray data by deriving modules of multiple synergistically interrelated genes, rather than individual genes. Here we propose a systems-based approach called Entropy Minimization and Boolean Parsimony (EMBP) that identifies, directly from gene expression data, modules of genes that are jointly associated with disease. Furthermore, the technique provides insight into the underlying biomolecular logic by inferring a logic function connecting the joint expression levels in a gene module with the outcome of disease. Coupled with biological knowledge, this information can be useful for identifying disease-related pathways, suggesting potential therapeutic approaches for interfering with the functions of such pathways. We present an example providing such gene modules associated with prostate cancer from publicly available gene expression data, and we successfully validate the results on additional independently derived data. Our results indicate a link between prostate cancer and cellular damage from oxidative stress combined with inhibition of apoptotic mechanisms normally triggered by such damage.
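
The entropy-minimization half of this idea can be sketched as choosing the gene set whose joint binarized state leaves the least uncertainty about the disease label. The abstract gives no formulas, so measuring this with the conditional entropy H(disease | gene states), the function name `conditional_entropy` and the toy data are assumptions. The Boolean parsimony step (simplifying the resulting logic function) is not shown.

```python
from collections import Counter
from math import log2


def conditional_entropy(states, outcomes):
    """H(outcome | state) in bits for paired observations."""
    n = len(states)
    joint = Counter(zip(states, outcomes))
    marginal = Counter(states)
    return -sum(c / n * log2(c / marginal[s]) for (s, _), c in joint.items())


# Toy binarized expression for two genes in eight samples; in this example
# the disease label equals (g1 AND NOT g2).
g1 = [1, 1, 1, 1, 0, 0, 0, 0]
g2 = [0, 0, 1, 1, 0, 1, 0, 1]
disease = [1, 1, 0, 0, 0, 0, 0, 0]

h_single = conditional_entropy(g1, disease)
h_pair = conditional_entropy(list(zip(g1, g2)), disease)
```

Here the pair of genes determines the label completely while either gene alone does not, which is the kind of joint association that single-gene analysis would miss.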

15.
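This study performed differential expression analysis on gene expression data from non-small cell lung carcinoma (NSCLC) and integrated the results with protein-protein interaction network (PPIN) data. The Heinz search algorithm was then used to identify NSCLC-related functional gene modules, and the genes in the modules were subjected to functional (GO term) and pathway (KEGG) enrichment analysis, with the aim of exploring the molecular mechanisms of lung cancer pathogenesis. Protein interaction network analysis yielded a functional module containing 96 genes and 117 interactions, as well as 8 candidate gene markers that play key roles in the onset and progression of NSCLC. The enrichment results show that these genes are mainly enriched in biological processes such as the catalysis of gene transcription and chromatin regulation, and play important roles in biological pathways including basal transcription factors, adherens junction, cell cycle, the Wnt signaling pathway and HTLV-I infection. This study predicts genes and biological pathways associated with NSCLC, which may be used for the early diagnosis and early treatment of lung cancer so as to reduce lung cancer mortality.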
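
Enrichment analyses of this kind are commonly run as hypergeometric over-representation tests. The abstract does not say which test was used, so the following is a generic sketch rather than the study's procedure, and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_pathway belong to the pathway."""
    upper = min(n_in_pathway, n_selected)
    favourable = sum(
        comb(n_in_pathway, i) * comb(n_background - n_in_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_selected)


# Hypothetical counts: a 96-gene module, a 150-gene pathway, 8 shared genes,
# and a background of 20000 annotated genes.
p_value = hypergeom_enrichment_p(20000, 150, 96, 8)
```

In practice the p-values for all tested GO terms or pathways would also be corrected for multiple testing.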

16.
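The onset and progression of complex diseases are closely linked to the dysfunction of biological pathways in the body, so using computer-aided methods to study the relationship between diseases and pathways from high-throughput data is of great importance. This paper proposes a new network-based global pathway identification method. The method uses protein interaction information and the gene-set composition of pathways to build a complex protein-pathway network. Then, based on expression profile data, a random walk algorithm is used to prioritize disease risk pathways at the global level. Finally, statistically significant risk pathways are identified by means of perturbation. Applying the method to colorectal cancer identified 15 pathways significantly associated with the onset and progression of the disease. Compared with other pathway identification methods (hypergeometric test, SPIA), the method identifies disease-related risk pathways more effectively.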
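
The random-walk step can be sketched as a random walk with restart over a network that contains both protein and pathway nodes. The restart variant, the seeding of the walk with expression-derived weights, the function name and the toy network are assumptions, since the abstract does not specify them. The perturbation-based significance test is not shown.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that jumps back to the
    seed distribution with probability `restart` at every step.

    adj   -- dict mapping each node to a non-empty list of neighbours
    seeds -- dict mapping seed nodes to non-negative weights
    """
    nodes = list(adj)
    total = sum(seeds.values())
    p0 = {v: seeds.get(v, 0.0) / total for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


# Toy protein-pathway network: P* are proteins, W* are pathways.
network = {
    "P1": ["P2", "W1"],
    "P2": ["P1", "P3", "W1"],
    "P3": ["P2", "W2"],
    "W1": ["P1", "P2"],
    "W2": ["P3"],
}
# Hypothetical seed weight, e.g. derived from differential expression.
scores = random_walk_with_restart(network, {"P1": 1.0})
pathway_ranking = sorted(("W1", "W2"), key=scores.get, reverse=True)
```

Pathway nodes that lie close to strongly weighted proteins accumulate more probability and so rank higher.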

17.
Gene expression profiling and protein studies of the type I interferon pathway have revealed important insights into the disease process in adult and juvenile dermatomyositis. The most prominent and consistent feature has been a characteristic whole blood gene signature indicating upregulation of the type I interferon pathway. Upregulation of the type I interferon protein signature has provided additional markers of disease activity and insight into the pathogenesis of the disease.

18.
Permanent atrial fibrillation (pmAF) has largely remained incurable because the existing information is not sufficient to explain the precise mechanisms underlying pmAF. Microarray analysis offers a broader and unbiased approach to identify and predict new biological features of pmAF. Considering the unbalanced sample numbers in most case-control microarray data, we designed an asymmetric principal component analysis algorithm and applied it to re-analyze differential gene expression data of pmAF patients and control samples for predicting new biological features. We identified 51 differentially expressed genes using the proposed method, of which 42 are new findings compared with two related works on the same data and the existing studies. The enrichment analysis illustrated the reliability of the identified differentially expressed genes. Moreover, we predicted three new pmAF-related signaling pathways using the identified differentially expressed genes via the KO-Based Annotation System. Our analysis and the existing studies support the view that the predicted signaling pathways may promote pmAF progression. These results warrant further experimental study. This work provides some new insights into molecular features of pmAF. It also has potentially important implications for improved understanding of the molecular mechanisms of pmAF.

19.
There is currently great interest in chromosome- and pathway-based techniques for genomics data analysis in order to understand the mechanism of disease. However, there are few studies addressing the abilities of machine learning methods in incorporating pathway information for analyzing microarray data. In this paper, we identified the characteristic pathways by combining the out-of-bag (OOB) classification error rates of random forests with pathway information. For each characteristic pathway, the correlation of gene expression was studied and the co-regulated gene patterns in different biological conditions were mined by the Mining Attribute Profile (MAP) algorithm. The discovered co-regulated gene patterns were clustered by the average-linkage hierarchical clustering technique. The results showed that the expression levels of genes in the same characteristic pathway were similar. Furthermore, two characteristic pathways were discovered to present co-regulated gene patterns, of which one contained 108 patterns and the other contained one pattern. The results of cluster analysis showed that the smallest similarity coefficient of clusters was more than 0.623, which indicated that the co-regulated patterns in different biological conditions were more similar within the same characteristic pathway. The methods discussed in this paper can provide additional insight into the study of microarray data.

20.