Similar literature
20 similar documents found.
1.
Cellular metabolism is most often described and interpreted in terms of the biochemical reactions that make up the metabolic network. Genomics is providing near complete information regarding the genes/gene products participating in cellular metabolism for a growing number of organisms. As the true functional units of metabolic systems are their pathways, the time has arrived to define metabolic pathways in the context of whole-cell metabolism for the analysis of the structural design and capabilities of the metabolic network. In this study, we present the theoretical foundations for the identification of the unique set of systemically independent biochemical pathways, termed extreme pathways, based on system stoichiometry and limited thermodynamics. These pathways represent the edges of the steady-state flux cone derived from convex analysis, and they can be used to represent any flux distribution achievable by the metabolic network. An algorithm is presented to determine the set of extreme pathways for a system of any complexity and a classification scheme is introduced for the characterization of these pathways. The property of systemic independence is discussed along with its implications for issues related to metabolic regulation and the evolution of cellular metabolic networks. The underlying pathway structure that is determined from the set of extreme pathways now provides us with the ability to analyse, interpret, and perhaps predict metabolic function from a pathway-based perspective in addition to the traditional reaction-based perspective. The algorithm and classification scheme developed can be used to describe the pathway structure in annotated genomes to explore the capabilities of an organism.
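The steady-state flux cone mentioned in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical four-reaction toy network (not taken from the paper) in which every reaction is irreversible, so that a flux vector v is feasible when S·v = 0 and v ≥ 0. The names `in_flux_cone`, `combine`, `S`, `P1` and `P2` are illustrative only, and the sketch does not implement the paper's algorithm for enumerating extreme pathways; it only checks that a non-negative combination of two pathway vectors stays inside the cone.

```python
def in_flux_cone(S, v, tol=1e-9):
    """Return True if v satisfies S.v = 0 and v >= 0 (all reactions assumed irreversible)."""
    nonnegative = all(x >= -tol for x in v)
    balanced = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    return nonnegative and balanced


def combine(pathways, weights):
    """Non-negative linear combination of pathway flux vectors."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(pathways[0])
    return [sum(w * p[i] for p, w in zip(pathways, weights)) for i in range(n)]


# Hypothetical toy network. Reactions: r1: -> A, r2: A -> B, r3: B ->, r4: A ->
# Rows are metabolites (A, B), columns are reactions (r1..r4).
S = [
    [1, -1, 0, -1],  # A
    [0, 1, -1, 0],   # B
]

# Two pathway vectors for this toy network (assumed here, not computed).
P1 = [1, 1, 1, 0]  # uptake of A, conversion to B, export of B
P2 = [1, 0, 0, 1]  # uptake of A, direct export of A

v = combine([P1, P2], [2.0, 3.0])
print(v, in_flux_cone(S, v))
```

A vector such as `[1, 1, 0, 0]` fails the check because metabolite B would accumulate, which is what the steady-state condition rules out.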

2.

Background  

In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research.

3.
New metabolic profiling technologies provide data on a wider range of metabolites than traditional targeted approaches. Metabolomic technologies currently facilitate acquisition of multivariate metabolic data using diverse, mostly hyphenated, chromatographic detection systems, such as GC-MS or liquid chromatography coupled to mass spectrometry, Fourier-transformed infrared spectroscopy or NMR-based methods. Analysis of the resulting data can be performed through a combination of non-supervised and supervised statistical methods, such as independent component analysis and analysis of variance, respectively. These methods reduce the complex data sets to information, which is relevant for the discovery of metabolic markers or for hypothesis-driven, pathway-based analysis. Plant responses to salinity involve changes in the activity of genes and proteins, which invariably lead to changes in plant metabolism. Here, we highlight a selection of recent publications in the salt stress field, and use gas chromatography time-of-flight mass spectrometry profiles of polar fractions from the plant models, Arabidopsis thaliana, Lotus japonicus and Oryza sativa to demonstrate the power of metabolite profiling. We present evidence for conserved and divergent metabolic responses among these three species and conclude that a change in the balance between amino acids and organic acids may be a conserved metabolic response of plants to salt stress.

4.
The increasing availability of large metabolomics datasets enhances the need for computational methodologies that can organize the data in a way that can lead to the inference of meaningful relationships. Knowledge of the metabolic state of a cell and how it responds to various stimuli and extracellular conditions can offer significant insight into the regulatory functions and how to manipulate them. Constraint-based methods, such as Flux Balance Analysis (FBA) and Thermodynamics-based flux analysis (TFA), are commonly used to estimate the flow of metabolites through genome-wide metabolic networks, making it possible to identify the ranges of flux values that are consistent with the studied physiological and thermodynamic conditions. However, unless key intracellular fluxes and metabolite concentrations are known, constraint-based models lead to underdetermined problem formulations. This lack of information propagates as uncertainty in the estimation of fluxes and basic reaction properties such as the determination of reaction directionalities. Therefore, knowledge of which metabolites, if measured, would contribute the most to reducing this uncertainty can significantly improve our ability to define the internal state of the cell. In the present work we combine constraint-based modeling, Design of Experiments (DoE) and Global Sensitivity Analysis (GSA) into the Thermodynamics-based Metabolite Sensitivity Analysis (TMSA) method. TMSA ranks metabolites comprising a metabolic network based on their ability to constrain the gamut of possible solutions to a limited, thermodynamically consistent set of internal states. TMSA is modular and can be applied to a single reaction, a metabolic pathway or an entire metabolic network. This is, to our knowledge, the first attempt to use metabolic modeling in order to provide a significance ranking of metabolites to guide experimental measurements.

5.

Motivation

When we were asked for help with high-level microarray data analysis (on the Affymetrix HGU-133A microarray), we faced the problem of selecting an appropriate method. We wanted to select a method that would yield "the best result" (detecting as many "really" differentially expressed genes (DEGs) as possible, without false positives and false negatives). However, life scientists could not help us: they use their "favorite" method without special argumentation. We also did not find any norm or recommendation. Therefore, we decided to examine the question for our own purposes. We considered whether the results obtained using different methods of high-level microarray data analysis (Significance Analysis of Microarrays, Rank Products, Bland-Altman, Mann-Whitney test, T test and the Linear Models for Microarray Data) would be in agreement. Initially, we conducted a comparative analysis of the results on eight real data sets from microarray experiments (from the Array Express database). The results were surprising. On the same array set, the sets of DEGs found by different methods differed significantly. We also applied the methods to artificial data sets and determined some measures that allow the preparation of the overall scoring of tested methods for future recommendation.

Results

We found a very low level of concordance among the results from the tested methods on real array sets. The number of common DEGs (detected by all six methods on fixed array sets, checked on eight array sets) ranged from 6 to 433 (22,283 total array readings). Results on artificial data sets were better than those on the real data. However, they were not fully satisfying. We scored tested methods on accuracy, recall, precision, f-measure and Matthews correlation coefficient. Based on the overall scoring, the best methods were SAM and LIMMA. We also found TT to be acceptable. The worst scoring was MW. Based on our study, we recommend: 1. Carefully taking into account the needs of the study when choosing a method, 2. Making high-level analysis with more than one method and then only taking the genes that are common to all methods (which seems to be reasonable) and 3. Being very careful (while summarizing facts) about sets of differentially expressed genes: different methods discover different sets of DEGs.
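The scoring measures named in this abstract (accuracy, recall, precision, f-measure, Matthews correlation coefficient) and the "genes common to all methods" recommendation can be sketched as follows. This is a minimal illustration using the standard textbook definitions of those measures; the function names `score` and `consensus_degs`, the confusion counts and the gene identifiers are hypothetical, and nothing here reproduces the paper's actual scoring procedure or data.

```python
import math


def score(tp, fp, tn, fn):
    """Standard classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def consensus_degs(deg_sets):
    """Genes reported as differentially expressed by every method."""
    return set.intersection(*(set(s) for s in deg_sets))


# Hypothetical counts for one method on one artificial data set.
print(score(tp=8, fp=2, tn=88, fn=2))

# Hypothetical DEG lists from three methods; only g3 is common to all.
print(consensus_degs([{"g1", "g2", "g3"}, {"g2", "g3"}, {"g3", "g4"}]))
```

Taking the intersection trades sensitivity for specificity: a gene missed by any single method is dropped, which is consistent with the small consensus sets the abstract reports.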

6.

Background  

Gene set analysis (GSA) is a widely used strategy for gene expression data analysis based on pathway knowledge. GSA focuses on sets of related genes and has established major advantages over individual gene analyses, including greater robustness, sensitivity and biological relevance. However, previous GSA methods have limited usage as they cannot handle datasets of different sample sizes or experimental designs.

7.
Schizophrenia is a complex genetic disorder. Gene set-based analytic (GSA) methods have been widely applied for exploratory analyses of large, high-throughput datasets, but less commonly employed for biological hypothesis testing. Our primary hypothesis is that variation in ion channel genes contributes to the genetic susceptibility to schizophrenia. We applied Exploratory Visual Analysis (EVA), one GSA application, to analyze European-American (EA) and African-American (AA) schizophrenia genome-wide association study datasets for statistical enrichment of ion channel gene sets, comparing GSA results derived under three SNP-to-gene mapping strategies: (1) GENIC; (2) 500-Kb; (3) 2.5-Mb and three complementary SNP-to-gene statistical reduction methods: (1) minimum p value (pMIN); (2) a novel method, proportion of SNPs per Gene with p values below a pre-defined α-threshold (PROP); and (3) the truncated product method (TPM). In the EA analyses, ion channel gene set(s) were enriched under all mapping and statistical approaches. In the AA analysis, ion channel gene set(s) were significantly enriched under pMIN for all mapping strategies and under PROP for broader mapping strategies. Less extensive enrichment in the AA sample may reflect true ethnic differences in susceptibility, sampling or case ascertainment differences, or higher dimensionality relative to sample size of the AA data. More consistent findings under broader mapping strategies may reflect enhanced power due to increased SNP inclusion, enhanced capture of effects over extended haplotypes or significant contributions from regulatory regions. While extensive pMIN findings may reflect gene size bias, the extent and significance of PROP and TPM findings suggest that common variation at ion channel genes may capture some of the heritability of schizophrenia.
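The three SNP-to-gene reduction statistics named in this abstract can be sketched for a single gene as below. This is a rough illustration under stated assumptions: pMIN is taken as the smallest SNP p value, PROP as the fraction of SNPs with p below a threshold `alpha`, and the TPM statistic as the product of the p values at or below a truncation point `tau`. The function names, default thresholds and p values are hypothetical, and converting the TPM statistic into a gene-level p value requires a null distribution that is not shown here.

```python
import math


def p_min(pvals):
    """Smallest SNP p value mapped to the gene."""
    return min(pvals)


def prop(pvals, alpha=0.05):
    """Proportion of the gene's SNPs with p value below alpha."""
    return sum(p < alpha for p in pvals) / len(pvals)


def tpm_statistic(pvals, tau=0.05):
    """Truncated product: product of p values <= tau (1 if none qualify)."""
    return math.prod(p for p in pvals if p <= tau)


# Hypothetical SNP p values mapped to one gene.
snp_pvals = [0.01, 0.2, 0.04, 0.5]
print(p_min(snp_pvals), prop(snp_pvals), tpm_statistic(snp_pvals))
```

A gene with many SNPs has more chances of producing a small minimum p value, which is the gene size bias the abstract raises for pMIN; PROP normalizes by the number of SNPs.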

8.
Here we present the Coon OMSSA Proteomic Analysis Software Suite (COMPASS): a free and open-source software pipeline for high-throughput analysis of proteomics data, designed around the Open Mass Spectrometry Search Algorithm. We detail a synergistic set of tools for protein database generation, spectral reduction, peptide false discovery rate analysis, peptide quantitation via isobaric labeling, protein parsimony and protein false discovery rate analysis, and protein quantitation. We strive for maximum ease of use, utilizing graphical user interfaces and working with data files in the original instrument vendor format. Results are stored in plain text comma-separated value files, which are easy to view and manipulate with a text editor or spreadsheet program. We illustrate the operation and efficacy of COMPASS through the use of two LC-MS/MS data sets. The first is a data set of a highly annotated mixture of standard proteins and manually validated contaminants that exhibits the identification workflow. The second is a data set of yeast peptides, labeled with isobaric stable isotope tags and mixed in known ratios, to demonstrate the quantitative workflow. For these two data sets, COMPASS performs as well as or better than the current de facto standard, the Trans-Proteomic Pipeline.

9.
Abstract. Random rearrangement of entry order in three data sets often changed ordination and classification results based on Reciprocal Averaging. Results varied with the data set and method used. Eliminating infrequently occurring species largely reduced, but did not always eliminate, the variability. Overall, results appeared related to data set complexity, the type of data or transformation, and the analysis method used. Detrended Correspondence Analysis had the greatest variability of the ordination methods tested. Results from quantitative data were usually more variable than presence/absence data. Variation in cluster analysis was related to the number of tie values in the similarity matrix. Detailed tests using randomization of entry order of individual data sets with each of the programs to be used are needed to individually assess the effects on the results.
Keywords: Cluster analysis; DECORANA; Ecological group; Entry order; Environmental gradient; TWINSPAN

10.
MOTIVATION: Experimental gene expression data sets, such as those generated by microarray or gene chip experiments, typically have significant noise and complicated interconnectivities that make understanding even simple regulatory patterns difficult. Given these complications, characterizing the effectiveness of different analysis techniques to uncover network groups and structures remains a challenge. Generating simulated expression patterns with known biological features of expression complexity, diversity and interconnectivities provides a more controlled means of investigating the appropriateness of different analysis methods. A simulation-based approach can systematically evaluate different gene expression analysis techniques and provide a basis for improved methods in dynamic metabolic network reconstruction. RESULTS: We have developed an on-line simulator, called eXPatGen, to generate dynamic gene expression patterns typical of microarray experiments. eXPatGen provides a quantitative network structure to represent key biological features, including the induction, repression, and cascade regulation of messenger RNA (mRNA). The simulation is modular such that the expression model can be replaced with other representations, depending on the level of biological detail required by the user. Two example gene networks, of 25 and 100 genes respectively, were simulated. Two standard analysis techniques, clustering and PCA analysis, were performed on the resulting expression patterns in order to demonstrate how the simulator might be used to evaluate different analysis methods and provide experimental guidance for biological studies of gene expression. AVAILABILITY: http://www.che.udel.edu/eXPatGen/

11.
Identifying differential features between conditions is a popular approach to understanding molecular features and their mechanisms underlying a biological process of particular interest. Although many tests for identifying differential expression of gene or gene sets have been proposed, there was limited success in developing methods for differential interactions of genes between conditions because of its computational complexity. We present a method for Evaluation of Dependency DifferentialitY (EDDY), which is a statistical test for differential dependencies of a set of genes between two conditions. Unlike previous methods focused on differential expression of individual genes or correlation changes of individual gene–gene interactions, EDDY compares two conditions by evaluating the probability distributions of dependency networks from genes. The method has been evaluated and compared with other methods through simulation studies, and application to glioblastoma multiforme data resulted in informative cancer and glioblastoma multiforme subtype-related findings. The comparison with Gene Set Enrichment Analysis, a differential expression-based method, revealed that EDDY identifies the gene sets that are complementary to those identified by Gene Set Enrichment Analysis. EDDY also showed much lower false positives than Gene Set Co-expression Analysis, a method based on correlation changes of individual gene–gene interactions, thus providing more informative results. The Java implementation of the algorithm is freely available to noncommercial users. Download from: http://biocomputing.tgen.org/software/EDDY.

12.

Background

Gene Set Analysis (GSA) identifies gene sets that are differentially expressed between phenotypes. The results of published papers in this field are inconsistent, and there is no consensus on the best method. In this paper, two new GSA methods are introduced and compared with previous ones.

Methods

The MMGSA and MRGSA methods based on multivariate nonparametric techniques were presented. The implementation of five GSA methods (Hotelling's T2, Globaltest, Abs_Cat, Med_Cat and Rs_Cat) and the novel methods to detect differential gene expression between phenotypes were compared using simulated and real microarray data sets.

Results

On a real dataset, the power of MMGSA and MRGSA was as good as that of Globaltest and Tsai. The MRGSA method did not perform well on the simulated dataset.

Conclusions

The Globaltest method performed best on both the real and the simulated datasets. MMGSA performed well on simulated data for small gene sets. The GLS methods did not perform well on the simulated data, except the Med_Cat method for large gene sets.

13.
Contemporary drug discovery and development (DDD) is dominated by a molecular target-based paradigm. Molecular targets that are potentially important in disease are physically characterized; chemical entities that interact with these targets are identified by ex vivo high-throughput screening assays, and optimized lead compounds enter testing as drugs. Contrary to highly publicized claims, the ascendance of this approach has in fact resulted in the lowest rate of new drug approvals in a generation. The primary explanation for low rates of new drugs is attrition, or the failure of candidates identified by molecular target-based methods to advance successfully through the DDD process. In this essay, I advance the thesis that this failure was predictable, based on modern principles of metabolic control that have emerged and been applied most forcefully in the field of metabolic engineering. These principles, such as the robustness of flux distributions, address connectivity relationships in complex metabolic networks and make it unlikely a priori that modulating most molecular targets will have predictable, beneficial functional outcomes. These same principles also suggest, however, that unexpected therapeutic actions will be common for agents that have any effect (i.e., that complexity can be exploited therapeutically). A potential operational solution (pathway-based DDD), based on observability rather than predictability, is described, focusing on emergent properties of key metabolic pathways in vivo. Recent examples of pathway-based DDD are described. In summary, the molecular target-based DDD paradigm is built on a naïve and misleading model of biologic control and is not heuristically adequate for advancing the mission of modern therapeutics. New approaches that take account of and are built on principles described by metabolic engineers are needed for the next generation of DDD.

14.
Constraint-based approaches recently brought new insight into our understanding of metabolism. By making very simple assumptions such as that the system is at steady-state and some reactions are irreversible, and without requiring kinetic parameters, general properties of the system can be derived. A central concept in this methodology is the notion of an elementary mode (EM for short) which represents a minimal functional subsystem. The computation of EMs still forms a limiting step in metabolic studies and several algorithms have been proposed to address this problem leading to increasingly faster methods. However, although a theoretical upper bound on the number of elementary modes that a network may possess has been established, surprisingly, the complexity of this problem has never been systematically studied. In this paper, we give a systematic overview of the complexity of optimisation problems related to modes. We first establish results regarding network consistency. Most consistency problems are easy, i.e., they can be solved in polynomial time. We then establish the complexity of finding and counting elementary modes. We show in particular that finding one elementary mode is easy but that this task becomes hard when a specific EM (i.e. an EM containing some specified reactions) is sought. We then show that counting the number of elementary modes is #P-complete. We emphasize that the easy problems can be solved using currently existing software packages. We then analyse the complexity of a closely related task which is the computation of so-called minimum reaction cut sets and we show that this problem is hard. We then present two positive results, both of which make it possible to avoid computing EMs prior to the computation of reaction cuts. The first one is a polynomial approximation algorithm for finding a minimum reaction cut set. The second one is a test for verifying whether a set of reactions constitutes a reaction cut; this test can be readily included in existing algorithms to improve their performance. Finally, we discuss the complexity of other cut-related problems.

15.
We have constructed a perceptron type neural network for E. coli promoter prediction and improved its ability to generalize with a new technique for selecting the sequence features shown during training. We have also reconstructed five previous prediction methods and compared the effectiveness of those methods and our neural network. Surprisingly, the simple statistical method of Mulligan et al. performed the best amongst the previous methods. Our neural network was comparable to Mulligan's method when false positives were kept low and better than Mulligan's method when false negatives were kept low. We also showed the correlation between the prediction rates of neural networks achieved by previous researchers and the information content of their data sets.

16.
17.
MOTIVATION: Because of the complexity of metabolic networks and their regulation, formal modelling is a useful method to improve the understanding of these systems. An essential step in network modelling is to validate the network model. Petri net theory provides algorithms and methods, which can be applied directly to metabolic network modelling and analysis in order to validate the model. The metabolism between sucrose and starch in the potato tuber is of great research interest. Even if the metabolism is one of the best studied in sink organs, it is not yet fully understood. RESULTS: We provide an approach for model validation of metabolic networks using Petri net theory, which we demonstrate for the sucrose breakdown pathway in the potato tuber. We start with hierarchical modelling of the metabolic network as a Petri net and continue with the analysis of qualitative properties of the network. The results characterize the net structure and give insights into the complex net behaviour.
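How a metabolic reaction maps onto a Petri net can be sketched with a minimal token game: metabolites are places, reactions are transitions, and a transition may fire when every input place holds enough tokens. The example below assumes a single hypothetical transition for sucrose hydrolysis by invertase (sucrose to glucose plus fructose). It is a toy illustration, not the hierarchical model or the qualitative analysis described in the paper, and the names `enabled` and `fire` are made up for this sketch.

```python
def enabled(marking, pre):
    """A transition is enabled if each input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in pre.items())


def fire(marking, pre, post):
    """Return the marking after firing; the input marking is left unchanged."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    new_marking = dict(marking)
    for place, n in pre.items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in post.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Hypothetical one-transition net: invertase consumes one sucrose token
# and produces one glucose token and one fructose token.
invertase_pre = {"sucrose": 1}
invertase_post = {"glucose": 1, "fructose": 1}

marking = {"sucrose": 2}
marking = fire(marking, invertase_pre, invertase_post)
print(marking)
```

Firing the transition a second time empties the sucrose place, after which `enabled` returns False; reasoning about which markings are reachable in this way is the starting point for the qualitative properties a Petri net analysis examines.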

18.
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the Significance Analysis of Microarrays method fold change criteria are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control among the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
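The idea of a permutation test, which this abstract cites as the basis of the Significance Analysis of Microarrays, can be shown with a small exact example for one gene. This sketch is neither SAM nor the Resampling-based empirical Bayes Methods: it assumes the simplest possible statistic (absolute difference in group means), enumerates every relabelling of the samples, and uses made-up expression values. The function name `exact_permutation_pvalue` is hypothetical.

```python
from itertools import combinations


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation p value for a difference in group means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(a))
    n_splits = 0
    n_extreme = 0
    # Enumerate every way of assigning n_a of the pooled samples to group a.
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


# Hypothetical expression values for one gene in two conditions.
control = [1, 2, 3]
treated = [10, 11, 12]
print(exact_permutation_pvalue(control, treated))
```

Because the null distribution is built from the data themselves, no normality assumption is needed. With three samples per group there are only 20 relabellings, so the smallest attainable two-sided p value is 0.1, which illustrates why small designs limit what a permutation approach can detect.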

19.
We have previously shown that the metabolism for most efficient cell growth can be realized by a combination of two types of elementary modes. One mode produces biomass while the second mode generates only energy. The identity of the four most efficient biomass and energy pathway pairs changes, depending on the degree of oxygen limitation. The identification of such pathway pairs for different growth conditions offers a pathway-based explanation of maintenance energy generation. For a given growth rate, experimental aerobic glucose consumption rates can be used to estimate the contribution of each pathway type to the overall metabolic flux pattern. All metabolic fluxes are then completely determined by the stoichiometries of involved pathways defining all nutrient consumption and metabolite secretion rates. We present here equations that permit computation of network fluxes on the basis of unique pathways for the case of optimal, glucose-limited Escherichia coli growth under varying levels of oxygen stress. Predicted glucose and oxygen uptake rates and some metabolite secretion rates are in remarkable agreement with experimental observations supporting the validity of the presented approach. The entire most efficient, steady-state, metabolic rate structure is explicitly defined by the developed equations without need for additional computer simulations. The approach should be generally useful for analyzing and interpreting genomic data by predicting concise, pathway-based metabolic rate structures.

20.