Similar Documents
20 similar documents found.
1.
KEGG spider is a web-based tool for interpreting experimentally derived gene lists in order to understand metabolic variation at the genomic level. KEGG spider implements a 'pathway-free' framework that overcomes a major bottleneck of enrichment analyses: it provides global models that unite genes from different metabolic pathways. Analyzing a number of experimentally derived gene lists, we demonstrate that KEGG spider provides deeper insights into metabolic variation than existing methods.
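KEGG spider is positioned against conventional per-pathway enrichment analysis. For contrast, here is a minimal sketch of that conventional approach. It runs a one-sided hypergeometric test of whether a gene list overlaps one pathway more than expected by chance. This is not KEGG spider's pathway-free model, and the function name and inputs are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_pathway: int, n_list: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe: genes measured; n_pathway: genes annotated to the pathway;
    n_list: genes in the experimental list; n_overlap: list genes in the pathway.
    """
    total = comb(n_universe, n_list)
    upper = min(n_pathway, n_list)
    # Sum the probabilities of observing n_overlap or more pathway genes in the list
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Example: all 5 list genes fall in a 5-gene pathway, out of 20 measured genes.
p = hypergeom_enrichment_p(20, 5, 5, 5)
print(p)  # equals 1 / comb(20, 5) = 1 / 15504
```

In practice this test is repeated for every pathway separately and then corrected for multiple testing. That per-pathway view is what KEGG spider's global models aim to go beyond.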

2.
Most methods for interpreting gene expression profiling experiments rely on the categorization of genes provided by the Gene Ontology (GO) and pathway databases. Because they are manually curated, such databases are never up to date and tend to be limited in focus and coverage. Automated literature mining tools provide an attractive alternative. We review how they can be employed to interpret gene expression profiling experiments. We illustrate that their comprehensive scope aids the interpretation of data from domains poorly covered by GO or alternative databases, and allows gene expression to be linked with diseases, drugs, tissues and other types of concepts. A framework for proper statistical evaluation of the associations between gene expression values and literature concepts was lacking; it is now implemented as a weighted extension of the global test. The weights are the literature association scores and reflect the importance of a gene for the concept of interest. In a direct comparison with classical GO-based gene sets, we show that using literature-based associations results in the identification of much more specific GO categories. We demonstrate how gene expression data can be linked to patient survival in breast cancer and to the action and metabolism of drugs. Coupling with online literature mining tools ensures transparency and allows further study of the identified associations. Literature mining tools are therefore powerful additions to the toolbox for interpreting high-throughput genomics data.
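In the linear-model form of the global test, the statistic for a gene set is, up to normalizing constants, a quadratic form (y - ȳ)ᵀ X W Xᵀ (y - ȳ). With a diagonal weight matrix W, this becomes a weighted sum of squared gene-outcome covariances. The sketch below rests on two assumptions. First, the diagonal weights stand in for literature association scores. Second, it uses a permutation p-value rather than the global test's own null distribution. It illustrates the idea only and is not the authors' implementation. All names and data are hypothetical.

```python
import numpy as np


def weighted_score(X, y, w):
    """Q = (y - ybar)' Xc diag(w) Xc' (y - ybar) = sum_j w_j * (xc_j' (y - ybar))^2.

    X: samples x genes, y: outcome per sample, w: one weight per gene.
    """
    Xc = X - X.mean(axis=0)          # centre each gene
    yc = y - y.mean()                # centre the outcome
    proj = Xc.T @ yc                 # covariance-like association per gene
    return float(np.sum(w * proj ** 2))


def permutation_pvalue(X, y, w, n_perm=999, seed=0):
    """Compare the observed Q with Q under random permutations of the outcome."""
    rng = np.random.default_rng(seed)
    observed = weighted_score(X, y, w)
    exceed = sum(weighted_score(X, rng.permutation(y), w) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(42)
X = rng.standard_normal((40, 4))                 # 40 samples, 4 genes linked to a concept
y = X[:, 0] + 0.5 * rng.standard_normal(40)      # outcome driven by gene 0
w = np.array([0.9, 0.1, 0.1, 0.1])               # hypothetical literature association scores
observed, p = permutation_pvalue(X, y, w)
print(round(observed, 1), p)
```

Giving gene 0 a high weight lets its strong association dominate Q. This mirrors the idea that the weights encode how important each gene is for the concept.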

3.
Recent advances in experimental technologies allow the detection of a complete cell proteome. Many proteomics studies report proteins that are expressed in a particular cell state or compartment, as well as proteins that are differentially expressed between cell states. Once a protein list is derived, a major challenge is to interpret the identified set of proteins in its biological context. Protein–protein interaction (PPI) data represent abundant information that can be used for this purpose. However, these data have not yet been fully exploited because no methodological framework existed to integrate this type of information. Here, we propose to infer a network model from an experimentally identified protein list based on the available information about the topology of the global PPI network. We also propose to use a Monte Carlo simulation procedure to compute the statistical significance of the inferred models. The method has been implemented as a freely available web-based tool, PPI spider (http://mips.helmholtz-muenchen.de/proj/ppispider). To support the practical significance of PPI spider, we collected several hundred recently published experimental proteomics studies that reported lists of proteins in various biological contexts. We reanalyzed them using PPI spider and demonstrated that in most cases PPI spider could provide statistically significant hypotheses that help in understanding the protein list.
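PPI spider's network-model inference is more elaborate than this. Still, the Monte Carlo idea can be sketched in a simplified form. The sketch uses a hypothetical statistic: the number of known interactions among the listed proteins. It then asks how often random protein sets of the same size, drawn uniformly from the interactome, score at least as high. The statistic, the uniform sampling, the function names and the toy network are all assumptions.

```python
import random


def internal_edges(proteins, edges):
    """Number of PPI edges whose two endpoints are both in `proteins`."""
    members = set(proteins)
    return sum(1 for a, b in edges if a in members and b in members)


def monte_carlo_pvalue(protein_list, edges, universe, n_sim=1000, seed=0):
    """Empirical p-value: how often does a random protein set of the same size
    have at least as many internal interactions as the observed list?"""
    rng = random.Random(seed)
    observed = internal_edges(protein_list, edges)
    k = len(protein_list)
    hits = sum(
        internal_edges(rng.sample(universe, k), edges) >= observed
        for _ in range(n_sim)
    )
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical toy interactome: a 5-protein clique plus a chain of other proteins.
universe = [f"P{i}" for i in range(100)]
clique = universe[:5]
edges = [(a, b) for i, a in enumerate(clique) for b in clique[i + 1:]]
edges += [(universe[i], universe[i + 1]) for i in range(5, 99)]
observed, p = monte_carlo_pvalue(clique, edges, universe)
print(observed, p)
```

The `+1` in numerator and denominator keeps the empirical p-value away from exactly zero when no simulation reaches the observed value.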

4.
Conventional statistical methods for interpreting microarray data require large numbers of replicates to provide sufficient sensitivity. We recently described a method for identifying differentially expressed genes in one-channel microarray data [1]. Based on the idea that the variance structure of microarray data can itself be a reliable measure of noise, this method allows statistically sound interpretation of as few as two replicates per treatment condition. Unlike the one-channel array, the two-channel platform compares gene expression in two RNA samples simultaneously, which leads to covariation of the measured signals. Hence, by accounting for covariation in the variance model, we can significantly increase the power of the statistical test. We believe that this approach has the potential to overcome limitations of existing methods. We present here a novel approach for the analysis of microarray data that models the variance structure of paired expression data within a Bayesian framework. We also describe a novel statistical test that can be used to identify differentially expressed genes. This method, bivariate microarray analysis (BMA), demonstrates dramatically improved sensitivity over existing approaches. We show that with only two array replicates, it is possible to detect gene expression changes that other methods detect, at best, with six array replicates. Further, we show that combining results from BMA with Gene Ontology annotation yields biologically significant results in a ligand-treated macrophage cell system.
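To see why covariation between the two channels matters, consider the log-ratio M for a single spot, with R and G the two channel intensities. A standard variance identity applies here. It is not the BMA model itself, which the abstract places in a Bayesian framework.

```latex
M = \log R - \log G, \qquad
\operatorname{Var}(M) = \operatorname{Var}(\log R) + \operatorname{Var}(\log G) - 2\,\operatorname{Cov}(\log R, \log G)
```

Treating the channels as independent drops the covariance term. That overstates Var(M) whenever the channel errors are positively correlated. A variance model that keeps this term can therefore support a more powerful test.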

5.
Relative changes in mRNA and protein levels induced in bacteria by sublethal doses of antibiotics are measured, and the results are visualised in the context of metabolic pathway diagrams. The mRNA levels present at a given time point after addition of the antibiotic are measured using Affymetrix microarrays. Additionally, the relative amount of each protein synthesised during 3-minute intervals sampled at the given times is measured by radio-labelling. This is followed by two-dimensional polyacrylamide gel electrophoresis and analysis of the images produced by phosphorimager exposure. Metabolic pathway diagrams are both constructed in-house and imported from KEGG (Kyoto Encyclopedia of Genes and Genomes). Both protein and mRNA expression data can be displayed in the pathway diagrams. The colour of the vectors or enzyme identifiers indicates the relative change in expression level and its reproducibility.
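The abstract does not specify the colour scale. A minimal sketch can still show how a relative expression change and its reproducibility could become a display colour. It assumes a hypothetical green-black-red scale on log2 ratios clipped at ±2, dimmed for poorly reproducible measurements. The function name, clipping value and reproducibility score are assumptions.

```python
import math


def expression_colour(ratio: float, reproducibility: float, clip: float = 2.0) -> tuple:
    """Map a fold change (treated / control) to an (r, g, b) tuple in 0..255.

    Up-regulation shades towards red, down-regulation towards green, no change is black.
    `reproducibility` in [0, 1] scales brightness, so unreliable changes look dimmer.
    """
    log_ratio = max(-clip, min(clip, math.log2(ratio)))          # clip extreme changes
    intensity = abs(log_ratio) / clip * max(0.0, min(1.0, reproducibility))
    level = round(255 * intensity)
    return (level, 0, 0) if log_ratio > 0 else (0, level, 0)


print(expression_colour(4.0, 1.0))   # (255, 0, 0): fourfold induction, fully reproducible
print(expression_colour(0.5, 0.5))   # (0, 64, 0): twofold repression, half reproducible
```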

7.

Background  

Lightweight genome viewer (lwgv) is a web-based tool for visualizing sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers while relying on standard flat-file formats, bypassing the database requirements of most visualization tools. Visualization as an aid to discovery requires displaying novel data together with static annotations in their chromosomal context. With database-backed systems, displaying dynamic results requires temporary tables that must be tracked for removal.
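The abstract does not name the flat-file formats lwgv reads. Assume a GFF3-like tab-separated layout (seqid, source, type, start, end, score, strand, phase, attributes). Under that assumption, a minimal sketch of a flat-file viewer's core query selects the features that overlap a chromosomal window, with no database involved. The function names and example features are hypothetical.

```python
def parse_gff_lines(lines):
    """Yield feature dicts from GFF3-style lines, skipping comments and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = dict(field.split("=", 1) for field in cols[8].split(";") if "=" in field)
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attrs": attrs,
        }


def features_in_window(features, seqid, start, end):
    """Features on `seqid` overlapping the closed interval [start, end] (1-based, as in GFF)."""
    return [f for f in features if f["seqid"] == seqid and f["start"] <= end and f["end"] >= start]


gff = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t100\t500\t.\t+\t.\tID=geneA",
    "chr1\tdemo\tgene\t900\t1200\t.\t-\t.\tID=geneB",
    "chr2\tdemo\tgene\t100\t400\t.\t+\t.\tID=geneC",
]
hits = features_in_window(parse_gff_lines(gff), "chr1", 450, 950)
print([f["attrs"]["ID"] for f in hits])  # ['geneA', 'geneB']
```

New results can be shown alongside static annotations by writing them to another flat file and passing both through the same query. No temporary tables need cleaning up afterwards.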

8.
What gives an organism the ability to regrow tissues and to recover function where another organism fails is the central problem of regenerative biology. The challenge is to describe the mechanisms of regeneration at the molecular level, delivering detailed insights into the many components that are cross-regulated. In other words, a broad, yet deep dissection of the system-wide network of molecular interactions is needed. Functional genomics has been used to elucidate gene regulatory networks (GRNs) in developing tissues, which, like regeneration, are complex systems. Therefore, we reason that the GRN approach, aided by next generation technologies, can also be applied to study the molecular mechanisms underlying the complex functions of regeneration. We ask what characteristics a model system must have to support a GRN analysis. Our discussion focuses on regeneration in the central nervous system, where loss of function has particularly devastating consequences for an organism. We examine a cohort of cells conserved across all vertebrates, the reticulospinal (RS) neurons, which lend themselves well to experimental manipulations. In the lamprey, a jawless vertebrate, there are giant RS neurons whose large size and ability to regenerate make them particularly suited for a GRN analysis. Adding to their value, a distinct subset of lamprey RS neurons reproducibly fail to regenerate, presenting an opportunity for side-by-side comparison of gene networks that promote or inhibit regeneration. Thus, determining the GRN for regeneration in RS neurons will provide a mechanistic understanding of the fundamental cues that lead to success or failure to regenerate.

10.
Bin Gao, Xu Liu, Hongzhe Li, Yuehua Cui. Biometrics, 2019, 75(4): 1063-1075
In a living organism, tens of thousands of genes are expressed and interact with each other to achieve necessary cellular functions. Gene regulatory networks contain information on regulatory mechanisms and the functions of gene expression. Thus, incorporating network structures, discerned either through biological experiments or statistical estimation, could increase the accuracy of selecting and estimating genes associated with a phenotype of interest. Here, we considered a gene selection problem using gene expression data and the graphical structures found in gene networks. Because gene expression measurements are intermediate phenotypes between a trait and its associated genes, we adopted an instrumental variable regression approach. We treated genetic variants as instrumental variables to address the endogeneity issue. We proposed a two-step estimation procedure. In the first step, we applied the LASSO algorithm to estimate the effects of genetic variants on gene expression measurements. In the second step, the projected expression measurements obtained from the first step were treated as input variables. A graph-constrained regularization method was adopted to improve the efficiency of gene selection and estimation. We theoretically showed the selection consistency of the estimation method and derived bounds on the estimates. Simulation and real data analyses were conducted to demonstrate the effectiveness of our method and to compare it with its counterparts.
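The two-step procedure can be sketched with one coordinate-descent solver used in both steps. Step 1 runs a LASSO of each expression column on the genetic variants. Step 2 runs a graph-penalized regression of the trait on the projected expression. The sketch rests on several assumptions that may differ from the paper. It assumes the graph constraint is a Laplacian quadratic penalty (λ₂/2)·βᵀLβ, one common form of graph-constrained regularization. It also assumes fixed tuning parameters, centred data without an intercept, and a hypothetical synthetic dataset. It is not the authors' code.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def penalised_cd(X, y, lam1, lam2=0.0, L=None, n_iter=1000, tol=1e-8):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam1*||b||_1 + (lam2/2) b'Lb.
    With lam2 = 0 this is the ordinary LASSO; a graph Laplacian L pulls the
    coefficients of connected genes towards each other."""
    n, p = X.shape
    L = np.zeros((p, p)) if L is None else L
    b = np.zeros(p)
    r = y.astype(float).copy()            # residual y - Xb (b starts at zero)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # correlation of x_j with the partial residual that excludes b_j
            z = X[:, j] @ (r + X[:, j] * old) / n
            # graph term: neighbours enter through the off-diagonal entries L[j, k]
            z -= lam2 * (L[j] @ b - L[j, j] * old)
            new = soft_threshold(z, lam1) / (col_sq[j] + lam2 * L[j, j])
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


def laplacian(p, edges):
    """Unnormalised graph Laplacian D - A of an undirected gene network."""
    L = np.zeros((p, p))
    for i, k in edges:
        L[i, i] += 1.0
        L[k, k] += 1.0
        L[i, k] -= 1.0
        L[k, i] -= 1.0
    return L


def two_step_iv(Z, X, y, edges, lam_first=0.05, lam1=0.05, lam2=0.1):
    """Step 1: LASSO of each expression column on the variants Z (instruments).
    Step 2: graph-penalised regression of the trait on the projected expression."""
    gamma = np.column_stack([penalised_cd(Z, X[:, j], lam_first) for j in range(X.shape[1])])
    X_hat = Z @ gamma                     # expression predicted from instruments only
    return penalised_cd(X_hat, y, lam1, lam2, laplacian(X.shape[1], edges))


# Hypothetical centred data: U confounds both expression and trait.
rng = np.random.default_rng(0)
n, q, p = 2000, 10, 5
Z = rng.standard_normal((n, q))
Gamma = np.zeros((q, p))
for j in range(p):
    Gamma[2 * j, j] = Gamma[2 * j + 1, j] = 1.0
U = rng.standard_normal(n)
X = Z @ Gamma + U[:, None] + rng.standard_normal((n, p))
y = 2 * X[:, 0] + 2 * X[:, 1] + U + rng.standard_normal(n)
beta = two_step_iv(Z, X, y, edges=[(0, 1), (2, 3)])
print(np.round(beta, 2))
```

Because X_hat depends only on Z, it is uncorrelated with the confounder U. This is why the second step is protected from the endogeneity that would bias a direct regression of y on X.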

11.
The genus Mycobacterium comprises significant pathogenic species that infect both humans and animals. One species within this genus, Mycobacterium tuberculosis, is the leading cause of human death from bacterial infection. Five mycobacterial genomes belonging to four different species (M. tuberculosis, Mycobacterium bovis, Mycobacterium leprae and Mycobacterium avium ssp. paratuberculosis) have been sequenced to date, and another 14 mycobacterial genomes are at various stages of completion. A comparative analysis of the gene products of key metabolic pathways revealed that the major differences among these species lie in the gene products constituting the cell wall and in the gene families encoding the acidic glycine-rich (PE/PPE/PGRS) proteins. Mycobacterium leprae has evolved by retaining a minimal gene set for most of the gene families, whereas M. avium ssp. paratuberculosis has acquired some virulence factors by lateral gene transfer.

12.
Geometric interpretation of gene coexpression network analysis
The merging of network theory and microarray data analysis techniques has spawned a new field: gene coexpression network analysis. While network methods are increasingly used in biology, the network vocabulary of computational biologists tends to be far more limited than that of, say, social network theorists. Here we review and propose several potentially useful network concepts. We take advantage of the relationship between network theory and the field of microarray data analysis to clarify the meaning of and the relationship among network concepts in gene coexpression networks. Network theory offers a wealth of intuitive concepts for describing the pairwise relationships among genes, which are depicted in cluster trees and heat maps. Conversely, microarray data analysis techniques (singular value decomposition, tests of differential expression) can also be used to address difficult problems in network theory. We describe conditions when a close relationship exists between network analysis and microarray data analysis techniques, and provide a rough dictionary for translating between the two fields. Using the angular interpretation of correlations, we provide a geometric interpretation of network theoretic concepts and derive unexpected relationships among them. We use the singular value decomposition of module expression data to characterize approximately factorizable gene coexpression networks, i.e., adjacency matrices that factor into node-specific contributions. High- and low-level views of coexpression networks allow us to study the relationships among modules and among module genes, respectively. We characterize coexpression networks where hub genes are significant with respect to a microarray sample trait and show that the network concept of intramodular connectivity can be interpreted as a fuzzy measure of module membership. We illustrate our results using human, mouse, and yeast microarray gene expression data.
The unification of coexpression network methods with traditional data mining methods can inform the application and development of systems biology methods.
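The network concepts named in this abstract can be made concrete with a minimal sketch. It assumes a soft-thresholded adjacency a_ij = |cor(x_i, x_j)|^β with β = 6, an illustrative choice. From that it computes intramodular connectivity, the module eigengene and module membership kME. The eigengene is the first singular vector of the standardized module expression, and kME is the correlation of each gene with it. The data are simulated from one hypothetical hidden factor. On such data, highly connected genes should also have high kME, in line with the abstract's reading of intramodular connectivity as a fuzzy measure of module membership.

```python
import numpy as np


def module_summary(expr, beta=6):
    """expr: samples x genes matrix for the genes of one module."""
    X = (expr - expr.mean(axis=0)) / expr.std(axis=0)   # standardise each gene
    cor = np.corrcoef(X, rowvar=False)                   # gene-gene correlations
    adj = np.abs(cor) ** beta                            # soft-thresholded adjacency
    np.fill_diagonal(adj, 0.0)
    k_in = adj.sum(axis=1)                               # intramodular connectivity
    # module eigengene: first left singular vector of the standardised data
    u, _, _ = np.linalg.svd(X, full_matrices=False)
    eigengene = u[:, 0]
    if np.corrcoef(eigengene, X.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene                           # SVD sign is arbitrary
    # module membership kME: correlation of each gene with the eigengene
    kme = np.array([np.corrcoef(X[:, j], eigengene)[0, 1] for j in range(X.shape[1])])
    return adj, k_in, eigengene, kme


rng = np.random.default_rng(1)
factor = rng.standard_normal(100)                        # hidden module driver
loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
noise = rng.standard_normal((100, loadings.size))
expr = factor[:, None] * loadings + noise * np.sqrt(1 - loadings ** 2)
adj, k_in, eigengene, kme = module_summary(expr)
print(np.round(kme, 2), np.round(k_in, 3))
```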

13.

Background  

The KEGG Pathway database is a valuable collection of metabolic pathway maps. Nevertheless, producing simulation-capable metabolic networks from KEGG Pathway data is a challenging task, despite the tools already developed for this purpose. Although KEGG Pathway maps were originally intended for illustration, their KGML (KEGG Markup Language) files can provide complete reaction sets and introduce species versioning, which is advantageous for simulation modelling of cellular metabolism. This project describes KEGGconverter, also implemented as a web-based application. It uses KGML files as its source to construct integrated pathway SBML models that are fully functional for simulation.
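KGML stores reactions as `<reaction>` elements. Each has a `type` attribute (reversible or irreversible) and `<substrate>`/`<product>` children. A minimal sketch of the first step such a converter needs, collecting reactions from a KGML document with Python's standard XML parser, could look like the code below. Writing valid, simulation-ready SBML (species, compartments, kinetics) is the larger part and is not shown. This is not KEGGconverter's code, and the identifiers in the toy fragment are made up.

```python
import xml.etree.ElementTree as ET


def kgml_reactions(kgml_text):
    """Extract name, reversibility, substrates and products from KGML <reaction> elements."""
    root = ET.fromstring(kgml_text)
    reactions = []
    for rxn in root.iter("reaction"):
        reactions.append({
            "name": rxn.get("name"),
            "reversible": rxn.get("type") == "reversible",
            "substrates": [s.get("name") for s in rxn.findall("substrate")],
            "products": [p.get("name") for p in rxn.findall("product")],
        })
    return reactions


# Toy KGML fragment with made-up identifiers.
kgml = """<pathway name="path:demo00001" org="demo" number="00001">
  <reaction id="1" name="rn:Rdemo1" type="irreversible">
    <substrate id="10" name="cpd:Cdemo1"/>
    <product id="11" name="cpd:Cdemo2"/>
  </reaction>
  <reaction id="2" name="rn:Rdemo2" type="reversible">
    <substrate id="11" name="cpd:Cdemo2"/>
    <product id="12" name="cpd:Cdemo3"/>
    <product id="13" name="cpd:Cdemo4"/>
  </reaction>
</pathway>"""

for r in kgml_reactions(kgml):
    arrow = "<=>" if r["reversible"] else "=>"
    print(r["name"], " + ".join(r["substrates"]), arrow, " + ".join(r["products"]))
```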

15.

Background

Centralized silos of genomic data are architecturally easier to design, develop and deploy initially than distributed models. However, as interoperability pains in EHR/EMR, HIE and other collaboration-centric life sciences domains have taught us, the core challenge of networking genomics systems lies not in the construction of individual silos but in the interoperability of those deployments. That interoperability must embrace the heterogeneous needs, terms and infrastructure of collaborating parties. This article demonstrates the adaptation of BitTorrent to private collaboration networks in an authenticated, authorized and encrypted manner while retaining the characteristics of standard BitTorrent.

Results

The BitTorious portal was successfully used to manage many concurrent BitTorrent clients across the United States, exchanging genomics data payloads in excess of 500 GiB using the uTorrent client software on Linux, OS X and Windows platforms. Individual nodes were sporadically interrupted to verify the system's resilience to outages of a single client node, as well as the recovery of nodes resuming operation on intermittent Internet connections.

Conclusions

The authorization-based extension of BitTorrent and the accompanying BitTorious reference tracker and user management web portal provide a free, standards-based, general-purpose and extensible data distribution system for large ‘omics collaborations.
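Standard BitTorrent (v1) splits a payload into fixed-size pieces and distributes a SHA-1 hash for each piece in the torrent metadata. A client can then verify each piece independently. After an interruption, it re-fetches only the pieces that are missing or fail verification. A minimal sketch of that mechanism follows. The piece size and function names are illustrative, and the sketch omits the tracker, authentication and encryption. It is not BitTorious code.

```python
import hashlib


def piece_hashes(data: bytes, piece_len: int) -> list:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_len]).digest() for i in range(0, len(data), piece_len)]


def pieces_to_refetch(received: dict, expected: list) -> list:
    """Indices of pieces missing from, or failing verification in, `received` (index -> bytes)."""
    bad = []
    for idx, digest in enumerate(expected):
        chunk = received.get(idx)
        if chunk is None or hashlib.sha1(chunk).digest() != digest:
            bad.append(idx)
    return bad


payload = bytes(range(256)) * 40          # 10,240 bytes of stand-in data
PIECE = 4096
expected = piece_hashes(payload, PIECE)   # would be published in the torrent metadata
# Simulate an interrupted transfer: piece 1 never arrived, piece 2 was corrupted.
received = {0: payload[0:4096], 2: b"x" * (len(payload) - 8192)}
print(len(expected), pieces_to_refetch(received, expected))  # 3 [1, 2]
```

Per-piece verification is what makes resumption on intermittent connections cheap. A returning node compares what it holds against the published hashes and requests only the difference.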

16.
Functional genomics: learning to think about gene expression data.
R. Brent. Current Biology, 1999, 9(9): R338-R341
Three recent studies of gene expression patterns in whole cells provide examples of the inferences one can make from this type of information. They also provide examples of the non-traditional types of reasoning we will need to use to make such inferences.

18.
In this report, a genome-scale reconstruction of Bacillus subtilis metabolism and its iterative development based on the combination of genomic, biochemical, and physiological information and high-throughput phenotyping experiments is presented. The initial reconstruction was converted into an in silico model and expanded in a four-step iterative fashion. First, network gap analysis was used to identify 48 missing reactions that are needed for growth but were not found in the genome annotation. Second, the computed growth rates under aerobic conditions were compared with high-throughput phenotypic screen data; the initial in silico model could predict the outcomes qualitatively in 140 of 271 cases considered. Detailed analysis of the incorrect predictions resulted in the addition of 75 reactions to the initial reconstruction, after which 200 of 271 cases were correctly computed. Third, in silico computations of the growth phenotypes of knock-out strains were found to be consistent with experimental observations in 720 of 766 cases evaluated. Fourth, the integrated analysis of the large-scale substrate utilization and gene essentiality data with the genome-scale metabolic model revealed the requirement of 80 specific enzymes (transport, 53; intracellular reactions, 27) that were not in the genome annotation. Subsequent sequence analysis resulted in the identification of genes that could be putatively assigned to 13 intracellular enzymes. The final reconstruction accounted for 844 open reading frames and consisted of 1020 metabolic reactions and 988 metabolites. Hence, the in silico model can be used to generate experimentally verifiable hypotheses on the metabolic functions of various genes.
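Network gap analysis looks for metabolites that the reconstruction can consume but never produce, or produce but never consume. Such dead ends can block flux to biomass. A simplified sketch of that check follows. It treats reversible reactions as both producing and consuming their participants, and it exempts metabolites exchanged with the medium. It uses a hypothetical toy network, not the B. subtilis model. Published gap analyses are typically more involved than this check.

```python
def dead_end_metabolites(reactions, boundary=()):
    """Return (never_produced, never_consumed) metabolite sets.

    reactions: name -> (substrates, products, reversible)
    boundary: metabolites exchanged with the medium, exempt from the check.
    """
    produced, consumed = set(), set()
    for substrates, products, reversible in reactions.values():
        consumed.update(substrates)
        produced.update(products)
        if reversible:                      # the reaction can also run backwards
            consumed.update(products)
            produced.update(substrates)
    every = produced | consumed
    exempt = set(boundary)
    return every - produced - exempt, every - consumed - exempt


toy = {
    "R1": (["glc"], ["g6p"], False),
    "R2": (["g6p", "x"], ["biomass_precursor"], False),   # nothing makes "x": a gap
    "R3": (["biomass_precursor"], ["biomass"], False),
}
print(dead_end_metabolites(toy, boundary=["glc", "biomass"]))  # ({'x'}, set())
```

A metabolite such as "x" flags a missing reaction. In a real reconstruction, that would prompt a search for a candidate enzyme or transporter to fill the gap.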

19.
Chromosome- and pathway-based techniques for genomics data analysis currently attract great interest as a means of understanding disease mechanisms. However, few studies have addressed how well machine learning methods can incorporate pathway information when analyzing microarray data. In this paper, we identified characteristic pathways by combining the out-of-bag (OOB) classification error rates of random forests with pathway information. For each characteristic pathway, we studied the correlation of gene expression. We then mined co-regulated gene patterns across different biological conditions with the Mining Attribute Profile (MAP) algorithm. The discovered co-regulated gene patterns were clustered using average-linkage hierarchical clustering. The results showed that the expression levels of genes in the same characteristic pathway were similar. Furthermore, two characteristic pathways were found to contain co-regulated gene patterns: one contained 108 patterns and the other contained one pattern. Cluster analysis showed that the smallest similarity coefficient of the clusters was greater than 0.623, indicating that co-regulated patterns across different biological conditions were more similar within the same characteristic pathway. The methods discussed in this paper can provide additional insight into the study of microarray data.
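Average-linkage hierarchical clustering repeatedly merges the two clusters with the smallest mean pairwise distance. A naive sketch follows, with assumptions labelled here. It uses 1 - Pearson correlation as the distance and stops at a chosen number of clusters. The abstract's similarity coefficient may be defined differently. The random-forest and MAP steps are not shown, and the pattern data are made up.

```python
import numpy as np


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on 1 - Pearson correlation."""
    D = 1.0 - np.corrcoef(profiles)          # rows are patterns
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between members of the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                       # b > a, so index a is unaffected
    return clusters


patterns = np.array([
    [1, 2, 3, 4, 5],
    [1, 2, 3, 5, 5],
    [2, 3, 4, 5, 6],
    [5, 4, 3, 2, 1],
    [5, 5, 3, 2, 1],
    [6, 4, 3, 2, 0],
], dtype=float)
groups = average_linkage(patterns, n_clusters=2)
print(sorted(sorted(g) for g in groups))  # [[0, 1, 2], [3, 4, 5]]
```

This naive loop recomputes all cluster-pair distances at every merge. That is fine for a few hundred patterns, but library implementations update distances incrementally.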

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417