Similar articles
1.
MOTIVATION: Datasets resulting from metabolomics or metabolic profiling experiments are becoming increasingly complex. Such datasets may contain underlying factors, such as time (time-resolved or longitudinal measurements), doses or combinations thereof. Currently used biostatistics methods do not take the structure of such complex datasets into account. However, incorporating this structure into the data analysis is important for understanding the biological information in these datasets. RESULTS: We describe ASCA, a new method that can deal with complex multivariate datasets containing an underlying experimental design, such as metabolomics datasets. It is a direct generalization of analysis of variance (ANOVA) for univariate data to the multivariate case. The method allows for easy interpretation of the variation induced by the different factors of the design. The method is illustrated with a dataset from a metabolomics experiment with time and dose factors.
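The abstract does not spell out the algorithm, so the following is only a minimal sketch of the ANOVA-style idea it describes: a samples-by-variables matrix from a balanced design with time and dose factors is split into an overall mean, one effect matrix per factor, and a remainder (interaction plus residual). The function names, the toy data and the choice to leave the remainder unsplit are assumptions made for illustration; any component analysis of the effect matrices is omitted.

```python
def column_means(rows):
    """Mean of each column over a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def anova_partition(X, time, dose):
    """Split X (samples x variables) into grand mean, time effect,
    dose effect and remainder, assuming a balanced two-factor design."""
    grand = column_means(X)
    centered = [[x - g for x, g in zip(row, grand)] for row in X]

    def effect(labels):
        # Each sample receives the mean of the centered rows at its level.
        groups = {}
        for label, row in zip(labels, centered):
            groups.setdefault(label, []).append(row)
        level_mean = {lab: column_means(rows) for lab, rows in groups.items()}
        return [level_mean[lab] for lab in labels]

    X_time = effect(time)
    X_dose = effect(dose)
    # Whatever the two main effects do not explain (interaction + residual).
    rest = [[c - a - b for c, a, b in zip(cr, ar, br)]
            for cr, ar, br in zip(centered, X_time, X_dose)]
    return grand, X_time, X_dose, rest


# Hypothetical toy data: 2 time points x 2 doses, 2 measured variables.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [6.0, 8.0]]
time = ["t1", "t1", "t2", "t2"]
dose = ["low", "high", "low", "high"]
grand, X_time, X_dose, rest = anova_partition(X, time, dose)
```

Adding the grand mean, the two effect matrices and the remainder row by row recovers the original matrix, which is what makes the variation attributable to each factor separately.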

2.
3.
4.
Exciting funding initiatives are emerging in Europe and the US for metabolomics data production, storage, dissemination and analysis. This is based on a rich ecosystem of resources around the world, which has been built over the past ten years, including but not limited to resources such as MassBank in Japan and the Human Metabolome Database in Canada. Now, the European Bioinformatics Institute has launched MetaboLights, a database for metabolomics experiments and the associated metadata (http://www.ebi.ac.uk/metabolights). It is the first comprehensive, cross-species, cross-platform metabolomics database maintained by one of the major open access data providers in molecular biology. In October, the European COSMOS consortium will start its work on metabolomics data standardization, publication and dissemination workflows. The NIH in the US is establishing 6–8 metabolomics service cores as well as a national metabolomics repository. This communication reports on MetaboLights as a new resource for metabolomics research, summarises the related developments and outlines how they may consolidate knowledge management in this third large omics field next to proteomics and genomics.

5.
CressExpress is a user-friendly, online, coexpression analysis tool for Arabidopsis (Arabidopsis thaliana) microarray expression data that computes patterns of correlated expression between user-entered query genes and the rest of the genes in the genome. Unlike other coexpression tools, CressExpress allows characterization of tissue-specific coexpression networks through user-driven filtering of input data based on sample tissue type. CressExpress also performs pathway-level coexpression analysis on each set of query genes, identifying and ranking genes based on their common connections with two or more query genes. This allows identification of novel candidates for involvement in common processes and functions represented by the query group. Users launch experiments using an easy-to-use Web-based interface and then receive the full complement of results, along with a record of tool settings and parameters, via an e-mail link to the CressExpress Web site. Data sets featured in CressExpress are strictly versioned and include expression data from MAS5, GCRMA, and RMA array processing algorithms. To demonstrate applications for CressExpress, we present coexpression analyses of cellulose synthase genes, indolic glucosinolate biosynthesis, and flowering. We show that subselecting sample types produces a richer network for genes involved in flowering in Arabidopsis. CressExpress provides direct access to expression values via an easy-to-use URL-based Web service, allowing users to determine quickly if their query genes are coexpressed with each other and likely to yield informative pathway-level coexpression results. The tool is available at http://www.cressexpress.org.
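The pathway-level step described above (ranking genes by their common connections with two or more query genes) can be illustrated with a small sketch. This is not the CressExpress implementation: the use of Pearson correlation, the 0.9 threshold, the function names and the expression values are all assumptions chosen for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def shared_partners(expression, queries, threshold=0.9):
    """Rank non-query genes by how many query genes they correlate with,
    keeping only genes connected to at least two query genes."""
    counts = {}
    for gene, profile in expression.items():
        if gene in queries:
            continue
        hits = [q for q in queries
                if pearson(expression[q], profile) >= threshold]
        if len(hits) >= 2:
            counts[gene] = len(hits)
    # Most shared connections first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical expression profiles across four arrays.
expression = {
    "Q1": [1.0, 2.0, 3.0, 4.0],
    "Q2": [2.0, 4.0, 6.0, 8.5],
    "G1": [1.1, 2.0, 3.2, 3.9],   # tracks both query genes
    "G2": [4.0, 3.0, 2.0, 1.0],   # anti-correlated
    "G3": [1.0, 3.0, 2.0, 4.0],   # only loosely correlated
}
ranking = shared_partners(expression, ["Q1", "Q2"])
```

Filtering the input arrays by tissue type before calling such a function would correspond to the tissue-specific analysis the abstract mentions.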

6.
In many metabolomics studies, NMR spectra are divided into bins of fixed width. This spectral quantification technique, known as uniform binning, is used to reduce the number of variables for pattern recognition techniques and to mitigate effects from variations in peak positions; however, shifts in peaks near the boundaries can cause dramatic quantitative changes in adjacent bins due to non-overlapping boundaries. Here we describe a new Gaussian binning method that incorporates overlapping bins to minimize these effects. A Gaussian kernel weights the signal contribution relative to distance from bin center, and the overlap between bins is controlled by the kernel standard deviation. Sensitivity to peak shift was assessed for a series of test spectra where the offset frequency was incremented in 0.5 Hz steps. For a 4 Hz shift within a bin width of 24 Hz, the error for uniform binning increased by 150%, while the error for Gaussian binning increased by 50%. Further, using a urinary metabolomics data set (from a toxicity study) and principal component analysis (PCA), we showed that the information content in the quantified features was equivalent for Gaussian and uniform binning methods. The separation between groups in the PCA scores plot, measured by the J2 quality metric, is as good as or better for Gaussian binning than for uniform binning. The Gaussian method is shown to be robust with regard to peak shift, while still retaining the information needed by classification and multivariate statistical techniques for NMR-metabolomics data.
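To make the contrast concrete, here is a minimal sketch of both binning schemes applied to a single idealized peak that moves across a bin boundary. It assumes an unnormalized Gaussian weight exp(-(f - c)^2 / (2 sigma^2)) around each bin center c; the paper's exact kernel normalization, the choice of sigma = 12 Hz and the spike-shaped test spectrum are assumptions made for illustration.

```python
import math


def uniform_bins(freqs, signal, centers, width):
    """Sum the signal inside non-overlapping bins of fixed width."""
    return [sum(s for f, s in zip(freqs, signal)
                if c - width / 2 <= f < c + width / 2)
            for c in centers]


def gaussian_bins(freqs, signal, centers, sigma):
    """Sum the signal weighted by a Gaussian kernel around each bin center,
    so that neighbouring bins overlap."""
    return [sum(s * math.exp(-((f - c) ** 2) / (2 * sigma ** 2))
                for f, s in zip(freqs, signal))
            for c in centers]


freqs = list(range(-12, 36))      # frequency axis in Hz, 1 Hz spacing
centers = [0, 24]                 # two 24 Hz bins meeting at 12 Hz


def spike(position):
    """Idealized spectrum: a single unit-height peak at `position`."""
    return [1.0 if f == position else 0.0 for f in freqs]


# The peak shifts by 2 Hz, from just below to just above the boundary.
u_before = uniform_bins(freqs, spike(11), centers, width=24)
u_after = uniform_bins(freqs, spike(13), centers, width=24)
g_before = gaussian_bins(freqs, spike(11), centers, sigma=12)
g_after = gaussian_bins(freqs, spike(13), centers, sigma=12)
```

With uniform binning the whole peak jumps from the first bin to the second, whereas the Gaussian-weighted values of both bins change only slightly, which is the boundary effect the overlapping bins are meant to damp.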

7.
Smilde et al. Bioinformatics (2005), 21(13); 3043–3048 The above paper by Smilde et al. inappropriately quotes results

8.

Background  

Extracting and visualizing protein-protein interactions (PPI) from the text literature is a meaningful topic in protein science, as it assists the identification of interactions among proteins. There is a lack of tools to extract PPI and to visualize and classify the results.

9.

Background  

Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets.

10.

Introduction

In metabolomics studies, unwanted variation inevitably arises from various sources. Normalization, that is the removal of unwanted variation, is an essential step in the statistical analysis of metabolomics data. However, metabolomics normalization is often considered an imprecise science due to the diverse sources of variation and the availability of a number of alternative strategies that may be implemented.

Objectives

We highlight the need for comparative evaluation of different normalization methods and present software strategies to help ease this task for both data-oriented and biological researchers.

Methods

We present NormalizeMets—a joint graphical user interface within the familiar Microsoft Excel and freely-available R software for comparative evaluation of different normalization methods. The NormalizeMets R package along with the vignette describing the workflow can be downloaded from https://cran.r-project.org/web/packages/NormalizeMets/. The Excel Interface and the Excel user guide are available on https://metabolomicstats.github.io/ExNormalizeMets.

Results

NormalizeMets allows for comparative evaluation of normalization methods using criteria that depend on the given dataset and the ultimate research question. Hence it guides researchers to assess, select and implement a suitable normalization method using either the familiar Microsoft Excel and/or freely-available R software. In addition, the package can be used for visualisation of metabolomics data using interactive graphical displays and to obtain end statistical results for clustering, classification, biomarker identification adjusting for confounding variables, and correlation analysis.

Conclusion

NormalizeMets is designed for comparative evaluation of normalization methods, and can also be used to obtain end statistical results. The use of freely-available R software offers an attractive proposition for programming-oriented researchers, and the Excel interface offers a familiar alternative for most biological researchers. The package handles the data locally on the user’s own computer, allowing reproducible code to be stored locally.

11.
Protein folding is one of the unsolved problems in molecular biology. In recent years, with the identification of several diseases as protein folding disorders and with the explosion of genome information and the need for efficient ways to predict protein structure, protein folding became a central issue in molecular sciences research. Using molecular dynamics unfolding simulations of an amyloidogenic protein, transthyretin, as an example, we put forward a series of ideas on how simulations of this type may be used to infer rules and unfolding behavior in amyloidogenic proteins, and to extrapolate rules for protein folding in different structural classes of proteins. These, in turn, could help in the development of protein structure prediction methods. The need to analyse different proteins and to run multiple simulations creates a huge amount of data which has to be stored, managed, analyzed and shared (database and Grid technology; data mining). Once the data is captured, the next challenge is to find meaningful patterns (associations, correlations, clusters, rules, relationships) among molecular properties, or their relative importance at different stages of the folding or unfolding processes. This clearly poses new and interesting challenges to the bioinformatics community.

12.
One of the central problems in bioinformatics is data retrieval and integration. The existing biological databases are geographically distributed across the Internet, complex and heterogeneous in data types and data structures, and constantly changing. With the current rapid growth of biomedical data, the challenge is how large volumes of data retrieved from multiple databases can be transformed and integrated automatically and flexibly. This article describes a powerful new tool, the Kleisli system, for complex queries across multiple databases and data integration.

13.
14.

Background  

Molecular experiments using multiplex strategies such as cDNA microarrays or proteomic approaches generate large datasets requiring biological interpretation. Text-based data mining tools have recently been developed to query large biological datasets of this type. PubMatrix is a web-based tool that allows simple text-based mining of the NCBI literature search service PubMed using any two lists of keyword terms, resulting in a frequency matrix of term co-occurrence.
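The idea of a term co-occurrence frequency matrix built from two keyword lists can be sketched as follows. PubMatrix itself queries PubMed; this example instead counts matches in a small in-memory list of made-up titles, and the function name, the case-insensitive substring matching and the sample texts are all assumptions for illustration.

```python
def cooccurrence_matrix(documents, row_terms, col_terms):
    """Count, for every (row term, column term) pair, the number of
    documents that mention both terms (case-insensitive substring match)."""
    lowered = [doc.lower() for doc in documents]
    return [[sum(1 for doc in lowered
                 if r.lower() in doc and c.lower() in doc)
             for c in col_terms]
            for r in row_terms]


# Hypothetical stand-ins for literature records.
documents = [
    "BRCA1 mutations in breast cancer",
    "TP53 and BRCA1 in ovarian cancer",
    "TP53 signalling in apoptosis",
]
genes = ["BRCA1", "TP53"]
topics = ["cancer", "apoptosis"]
matrix = cooccurrence_matrix(documents, genes, topics)
```

Each row of `matrix` corresponds to a term from the first list and each column to a term from the second, so a row can be read as the literature profile of one gene across the chosen topics.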

15.
16.
The rapid elevation in rat brain temperature achievable with focused beam microwave irradiation (FBMI) leads to a permanent inactivation of enzymes, thereby minimizing enzyme-dependent post-mortem metabolic changes. An additional characteristic of FBMI is that the NMR properties of the tissue are close to those of the in vivo condition and remain so for at least 12 h. These features create an opportunity to develop magnetic resonance spectroscopy and imaging on microwave-irradiated samples into a technique with a resolution, coverage and sensitivity superior to any experiment performed directly in vivo. Furthermore, when combined with pre-FBMI infusion of 13C-labeled substrates, like [1-13C]-glucose, the technique can generate maps of metabolic fluxes, like the tricarboxylic acid and glutamate-glutamine neurotransmitter cycle fluxes, at an unprecedented spatial resolution.

17.
To take full advantage of the power of functional genomics technologies and in particular those for metabolomics, both the analytical approach and the strategy chosen for data analysis need to be as unbiased and comprehensive as possible. Existing approaches to analyze metabolomic data still do not allow a fast and unbiased comparative analysis of the metabolic composition of the hundreds of genotypes that are often the target of modern investigations. We have now developed a novel strategy to analyze such metabolomic data. This approach consists of (1) full mass spectral alignment of gas chromatography (GC)-mass spectrometry (MS) metabolic profiles using the MetAlign software package, (2) followed by multivariate comparative analysis of metabolic phenotypes at the level of individual molecular fragments, and (3) multivariate mass spectral reconstruction, a method allowing metabolite discrimination, recognition, and identification. This approach has allowed a fast and unbiased comparative multivariate analysis of the volatile metabolite composition of ripe fruits of 94 tomato (Lycopersicon esculentum Mill.) genotypes, based on intensity patterns of >20,000 individual molecular fragments throughout 198 GC-MS datasets. Variation in metabolite composition, both between- and within-fruit types, was found and the discriminative metabolites were revealed. In the entire genotype set, a total of 322 different compounds could be distinguished using multivariate mass spectral reconstruction. A hierarchical cluster analysis of these metabolites resulted in clustering of structurally related metabolites derived from the same biochemical precursors. The approach chosen will further enhance the comprehensiveness of GC-MS-based metabolomics approaches and will therefore prove a useful addition to nontargeted functional genomics research.

18.

Principal component analysis (PCA) is probably one of the most used methods for exploratory data analysis. However, it may not always be effective when there are multiple influential factors. In this paper, the use of multiblock PCA for analysing such types of data is demonstrated through a real metabolomics study combined with a series of data simulating two underlying influential factors with different types of interactions based on 2 × 2 experimental designs. The performance of multiblock PCA is compared with those of PCA and also ANOVA-PCA, which is another PCA extension developed to solve similar problems. The results demonstrate that multiblock PCA is highly efficient at analysing such types of data which contain multiple influential factors. These models give the most comprehensive view of data compared to the other two methods. The combination of super scores and block scores shows not only the general trends of change caused by each of the influential factors but also the subtle changes within each combination of the factors and their levels. It is also highly resistant to the addition of ‘irrelevant’ competing information, and the first PC remains the most discriminant one, which neither of the other two methods achieved. The reason for this property was demonstrated using a 2 × 3 experimental design. Finally, the validity of the results shown by the multiblock PCA was tested using permutation tests, and the results suggested that the inherent risk of over-fitting of this type of approach is low.


19.
DAGchainer: a tool for mining segmental genome duplications and synteny   Total citations: 8, self-citations: 0, citations by others: 8
SUMMARY: Given the positions of protein-coding genes along genomic sequence and probability values for protein alignments between genes, DAGchainer identifies chains of gene pairs sharing conserved order between genomic regions, by identifying paths through a directed acyclic graph (DAG). These chains of collinear gene pairs can represent segmentally duplicated regions and genes within a single genome or syntenic regions between related genomes. Automated mining of the Arabidopsis genome for segmental duplications illustrates the use of DAGchainer.
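The chaining idea can be illustrated with a simplified dynamic-programming sketch: matched gene pairs are nodes, an edge runs from one pair to another when both gene positions increase, and the best-scoring path through this DAG is the collinear chain. This is not DAGchainer's actual scoring scheme; the plain additive scores, the absence of gap penalties and the example pairs are assumptions for illustration.

```python
def best_collinear_chain(pairs):
    """pairs: list of (pos_a, pos_b, score) for matched genes.
    Returns (total_score, chain) for the highest-scoring chain in which
    positions increase in both regions (a path through the implied DAG)."""
    if not pairs:
        return 0.0, []
    order = sorted(pairs)                 # topological order by pos_a
    best = [p[2] for p in order]          # best chain score ending at each pair
    prev = [None] * len(order)            # back-pointers for traceback
    for j, (aj, bj, sj) in enumerate(order):
        for i in range(j):
            ai, bi, _ = order[i]
            if ai < aj and bi < bj and best[i] + sj > best[j]:
                best[j] = best[i] + sj
                prev[j] = i
    end = max(range(len(order)), key=best.__getitem__)
    total = best[end]
    chain = []
    while end is not None:                # walk back along the best path
        chain.append(order[end])
        end = prev[end]
    chain.reverse()
    return total, chain


# Hypothetical gene pairs: (position in region A, position in region B, score).
pairs = [(1, 10, 5.0), (2, 11, 4.0), (3, 3, 9.0), (4, 12, 6.0), (5, 2, 1.0)]
total, chain = best_collinear_chain(pairs)
```

Here the pairs at positions (1, 10), (2, 11) and (4, 12) form the best chain, while (3, 3) and (5, 2) break collinearity and are left out. A real tool would also limit the distance between neighbouring pairs and handle reverse-orientation chains.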

