Similar literature
1.

Background

In translational cancer research, gene expression data are collected together with clinical data and genomic data arising from other chip-based high-throughput technologies. Software tools are required for the joint analysis of such high-dimensional data sets together with clinical data.

Results

We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms.

Conclusions

The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomic and clinical data.

2.
There is tremendous scientific interest in the analysis of gene expression data in clinical settings, such as oncology. In this paper, we describe the importance of adjusting for confounders and other prognostic factors in order to select differentially expressed genes for follow-up validation studies. We develop two approaches to the analysis of microarray data in non-randomized clinical settings. The first is an extension of the current significance analysis of microarray procedures, where other covariates are taken into account. The second is a novel covariate-adjusted regression modelling based on the receiver operating characteristic (ROC) curve for the analysis of gene expression data. The ideas are illustrated using data from a prostate cancer molecular profiling study.

3.
4.
Gene expression data preprocessing   Cited by: 4 (self-citations: 0; citations by others: 4)
We present an interactive web tool for preprocessing microarray gene expression data. It analyses the data, suggests the most appropriate transformations and proceeds with them after user agreement. The normal preprocessing steps include scale transformations, management of missing values, replicate handling, flat pattern filtering and pattern standardization, and they are required before performing any pattern analysis. The processed data set can be sent to other pattern analysis tools.
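
The preprocessing steps named in this abstract can be illustrated with a minimal sketch. It is not the web tool's implementation: the function name `preprocess`, the `min_range` threshold, the choice of a log2 transformation and the mean-based filling of missing values are all assumptions made for illustration, and replicate handling is left out.

```python
import math
from statistics import mean, pstdev


def preprocess(rows, min_range=1.0):
    """Preprocess {gene: [values]}, where None marks a missing value.

    Hypothetical helper: the steps mirror the abstract, the details do not
    come from the original tool.
    """
    processed = {}
    for gene, values in rows.items():
        # Scale transformation: log2; non-positive values are treated as missing.
        logged = [math.log2(v) if v is not None and v > 0 else None for v in values]
        observed = [v for v in logged if v is not None]
        if not observed:
            continue  # nothing was measured for this gene
        # Missing value management: fill gaps with the gene's own mean.
        gene_mean = mean(observed)
        filled = [gene_mean if v is None else v for v in logged]
        # Flat pattern filtering: drop genes whose profile barely changes.
        if max(filled) - min(filled) < min_range:
            continue
        # Pattern standardization: zero mean and unit variance per gene.
        spread = pstdev(filled)
        if spread == 0:
            continue
        processed[gene] = [(v - gene_mean) / spread for v in filled]
    return processed
```

Calling `preprocess({"geneA": [1.0, 2.0, 4.0, None], "geneB": [8.0, 8.0, 8.0, 8.0]})` keeps `geneA` as a standardized profile and drops the flat `geneB`; the gene names and values are toy data.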

5.
Assessing reliability of gene clusters from gene expression data   Cited by: 5 (self-citations: 0; citations by others: 5)
The rapid development of microarray technologies has raised many challenging problems in experiment design and data analysis. Although many numerical algorithms have been successfully applied to analyze gene expression data, the effects of variations and uncertainties in measured gene expression levels across samples and experiments have been largely ignored in the literature. In this article, in the context of hierarchical clustering algorithms, we introduce a statistical resampling method to assess the reliability of gene clusters identified from any hierarchical clustering method. Using the clustering trees constructed from the resampled data, we can evaluate the confidence value for each node in the observed clustering tree. A majority-rule consensus tree can be obtained, showing only those clusters that occur in a majority of the resampled trees. We illustrate our proposed methods with applications to two published data sets. Although the methods are discussed in the context of hierarchical clustering methods, they can be applied with other cluster-identification methods for gene expression data to assess the reliability of any gene cluster of interest.

6.
Breast cancer outcome can be predicted using models derived from gene expression data or clinical data. Only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. We rigorously compare three different integration strategies (early, intermediate, and late integration) as well as classifiers employing no integration (only one data type) using five classifiers of varying complexity. We perform our analysis on a set of 295 breast cancer samples, for which gene expression data and an extensive set of clinical parameters are available, as well as four breast cancer datasets containing 521 samples that we used as independent validation. On the 295 samples, a nearest mean classifier employing a logical OR operation (late integration) on clinical and expression classifiers significantly outperforms all other classifiers. Moreover, regardless of the integration strategy, the nearest mean classifier achieves the best performance. All five classifiers achieve their best performance when integrating clinical and expression data. Repeating the experiments using the 521 samples from the four independent validation datasets also indicated a significant performance improvement when integrating clinical and gene expression data. Whether integration also improves performance on other datasets (e.g. other tumor types) has not been investigated, but seems worth pursuing. Our work suggests that future models for predicting breast cancer outcome should exploit both data types by employing a late OR or intermediate integration strategy based on nearest mean classifiers.
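
Late integration with a logical OR, as described in this abstract, can be sketched roughly as follows. This is a toy illustration rather than the authors' pipeline: the function names, the squared Euclidean distance, and the label convention (1 for poor outcome, 0 for good outcome) are assumptions.

```python
def fit_nearest_mean(samples, labels):
    """Return the class centroids as {label: mean feature vector}."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_nearest_mean(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def late_or(clin_centroids, expr_centroids, clin_x, expr_x):
    """Late integration: classify each data type separately, then OR the calls.

    Assumed convention: label 1 means poor outcome, label 0 good outcome.
    """
    clin_call = predict_nearest_mean(clin_centroids, clin_x)
    expr_call = predict_nearest_mean(expr_centroids, expr_x)
    return 1 if (clin_call == 1 or expr_call == 1) else 0
```

With this rule a sample is called poor outcome as soon as either the clinical or the expression classifier says so, which is what distinguishes late OR integration from training one classifier on concatenated features (early integration).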

7.
MOTIVATION: Experimental limitations have resulted in the popularity of parametric statistical tests as a method for identifying differentially regulated genes in microarray data sets. However, these tests assume that the data follow a normal distribution. To date, the assumption that replicate expression values for any gene are normally distributed has not been critically addressed for Affymetrix GeneChip data. RESULTS: The normality of the expression values calculated using four different commercial and academic software packages was investigated using a data set consisting of the same target RNA applied to 59 human Affymetrix U95A GeneChips using a combination of statistical tests and visualization techniques. For the majority of probe sets obtained from each analysis suite, the expression data showed a good correlation with normality. The exception was a large number of low-expressed genes in the data set produced using Affymetrix Microarray Suite 5.0, which showed a striking non-normal distribution. In summary, our data provide strong support for the application of parametric tests to GeneChip data sets without the need for data transformation.

8.
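With the wide application of DNA chip technology, gene expression data analysis has become one of the research hotspots in the life sciences. This paper surveys the types of gene expression clustering techniques, the classification and characteristics of the algorithms, and the visualization and annotation of results; describes several popular and novel algorithms; introduces 17 recent software packages and online web service tools; and outlines research trends for such software tools.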

9.
The microarray-based analysis of gene expression has become a workhorse for biomedical research. Managing the amount and diversity of data that such experiments produce is a task that must be supported by appropriate software tools, which has led to the creation of literally hundreds of systems. As a consequence, choosing the right tool for a given project is difficult even for the expert. We report on the results of a survey encompassing 78 such tools, of which 22 were inspected in detail and seven were tested hands-on. We report on our experiences with a focus on completeness of functionality, ease of use, and the effort needed for installation and maintenance. Our survey thereby provides a valuable guideline for any project considering the use of a microarray data management system. It reveals which tasks are covered by mature tools and also shows that important requirements, especially in the area of integrated analysis of different experimental data, are not yet met satisfactorily by existing systems.

10.
Microarrays and, more recently, RNA sequencing have led to an increase in available gene expression data. How to manage and store these data is becoming a key issue. In response we have developed EXP-PAC, a web-based software package for storage, management and analysis of gene expression and sequence data. Unique to this package are SQL-based querying of gene expression data sets, distributed normalization of raw gene expression data and analysis of gene expression data across experiments and species. This package has been populated with lactation data in the international milk genomic consortium web portal (http://milkgenomics.org/). Source code is also available, which can be hosted on a Windows, Linux or Mac APACHE server connected to a private or public network (http://mamsap.it.deakin.edu.au/~pcc/Release/EXP_PAC.html).

11.
Allan R Brasier 《BioTechniques》2002,32(1):100-2, 104, 106, 108-9
High-density oligonucleotide arrays are widely employed for detecting global changes in gene expression profiles of cells or tissues exposed to specific stimuli. Presented with large amounts of data, investigators can spend significant amounts of time analyzing and interpreting these array data. In our application of GeneChip arrays to analyze changes in gene expression in virus-infected epithelium, we have needed to develop additional computational tools that may be of utility to other investigators using this methodology. Here, I describe two executable programs to facilitate data extraction and multiple data point analysis. These programs run in a virtual DOS environment on Microsoft Windows 95/98/2K operating systems on a desktop PC. Both programs can be freely downloaded from the BioTechniques Software Library (www.BioTechniques.com). The first program, Retriever, extracts primary data from an array experiment contained in an Affymetrix text file using user-supplied individual identification strings (e.g., the probe set identification numbers). With specific data retrieved for individual genes, hybridization profiles can be examined and data normalized. The second program, CompareTable, is used to facilitate comparison analysis of two experimental replicates. CompareTable compares two lists of genes, identifies common entries, extracts their data, and writes an output text file containing only those genes present in both of the experiments. The output files generated by these two programs can be opened and manipulated by any software application recognizing tab-delimited text files (e.g., Microsoft NotePad or Excel).
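
The comparison step attributed to CompareTable can be approximated in a few lines. The sketch below is not the original DOS program; it assumes that each input is tab-delimited text whose first column is the gene or probe set identifier, and the names `read_table` and `compare_tables` are hypothetical.

```python
import csv
import io


def read_table(text):
    """Parse tab-delimited text into {identifier: remaining columns}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row[1:] for row in reader if row}


def compare_tables(text_a, text_b):
    """Return tab-delimited text holding only the identifiers found in both inputs.

    Each output row carries the identifier, then the data columns from the
    first table, then those from the second table.
    """
    table_a, table_b = read_table(text_a), read_table(text_b)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for identifier, columns in table_a.items():  # keeps the first table's order
        if identifier in table_b:
            writer.writerow([identifier] + columns + table_b[identifier])
    return out.getvalue()
```

For example, `compare_tables("g1\t10\ng2\t20\ng3\t30\n", "g3\t33\ng1\t11\n")` returns `"g1\t10\t11\ng3\t30\t33\n"`; the identifiers and values are made up for illustration.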

12.

Background  

Genome-wide expression signatures are emerging as potential markers for overall survival and disease recurrence risk, as evidenced by the recent commercialization of gene expression-based biomarkers in breast cancer. Similar predictions have recently been carried out using genome-wide copy number alterations and microRNAs. Existing software packages for microarray data analysis provide functions to define expression-based survival gene signatures. However, there is no software that can perform survival analysis using SNP array data or draw survival curves interactively for expression-based sample clusters.

13.
The first problem to be solved in gene expression profiling is choosing the appropriate gene array, detection procedure, image analysis and data generation depending on the organism of interest, equipment and budget. The next is how to deduce biologically meaningful data. We assessed gene expression data from chemiluminescent detection and empirically found criteria for the reliable identification of biologically meaningful expression ratios. Current statistical assessments are often applied without regard to problems that occur in practice, so interesting results are dismissed as irrelevant. Avoiding this requires a laborious data check, which we suggest automating. Our empirically found criteria were transformed into and validated by a knowledge-based system. This system is adaptable to all other methods of expression profiling. To evaluate our entire procedure, we compared the experience-based and the new knowledge-based assessment of the expression data from our chemiluminescent and, additionally, radioactive detection of several experiments with published data. With our adaptation of chemiluminescence detection to commercially available Escherichia coli gene arrays we present a useful alternative to common procedures in gene expression monitoring. Moreover, with our consideration of plasmid-harbouring E. coli strains we provide the opportunity to monitor gene expression during processes requiring any plasmids (e.g. recombinant protein expression).

14.
15.
MOTIVATION: The study of the dynamics of regulatory processes has led to increased interest in the analysis of temporal gene expression level data. To address the dynamics of regulation, expression data are collected repeatedly over time. It is difficult to statistically represent the resulting high-dimensional data. When regulatory processes determine gene expression, time-warping is likely to be present, i.e. the sample of gene expression trajectories reflects variation not only in terms of the expression amplitudes, but also in terms of the temporal structure of gene expression. RESULTS: A non-parametric time-synchronized iterative mean updating technique is proposed to find an overall representation that corresponds to a mode of a sample of expression profiles, viewed as a random sample in function space. The proposed algorithm explores the application of previous work of Hall and Heckman to genome-wide expression data and provides an extension that includes random time-warping with the aim of synchronizing timescales across genes. The proposed algorithm is universally applicable for the construction of modes for functional data with time-warping. We demonstrate the construction of mode functions for a sample of Drosophila gene expression data. The algorithm can be applied to define clusters among the observed trajectories of gene expression, without any kind of prior non-time-warped clustering, as illustrated in the numerical example.

16.
Reid R  Dix DJ  Miller D  Krawetz SA 《BioTechniques》2001,30(4):762-6, 768
The use of commercial microarrays is rapidly becoming the method of choice for profiling gene expression and assessing various disease states. Research Genetics has provided a series of biological and software tools to the research community for these analyses. The fidelity of data analysis using these tools is dependent on a series of well-defined reference control points in the array. During the course of our investigations, it became apparent that in some instances the reference control points that are required for analysis became lost in background noise. This effectively halted the analysis and the recovery of any information contained within that experiment. To recover these data and to increase analytical veracity, the simple strategy of superimposing a template of reference control points onto the experimental array was developed. The utility of this tool is established in this communication.

17.
MOTIVATION: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data; they exploit local similarity structures in the data as well as a least squares optimization process. RESULTS: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen by k-nearest neighbors or k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method based on LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data. AVAILABILITY: The software is available at http://www.cs.umn.edu/~hskim/tools.html CONTACT: hpark@cs.umn.edu

18.
Gene expression data analysis   Cited by: 33 (self-citations: 0; citations by others: 33)
Brazma A  Vilo J 《FEBS letters》2000,480(1):17-24
Microarrays are one of the latest breakthroughs in experimental molecular biology, which allow monitoring of gene expression for tens of thousands of genes in parallel and are already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be transformed into gene expression matrices: tables where rows represent genes, columns represent various samples such as tissues or experimental conditions, and numbers in each cell characterize the expression level of the particular gene in the particular sample. These matrices have to be analyzed further, if any knowledge about the underlying biological processes is to be extracted. In this paper we concentrate on discussing bioinformatics methods used for such analysis. We briefly discuss supervised and unsupervised data analysis and its applications, such as predicting gene function classes and cancer classification. Then we discuss how the gene expression matrix can be used to predict putative regulatory signals in the genome sequences. In conclusion we discuss some possible future directions.
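
The gene expression matrix described in this abstract (rows for genes, columns for samples, one expression level per cell) maps onto a very small data structure. The sketch below uses made-up gene names, sample names and values; the helper functions are hypothetical and only show how the two axes are addressed.

```python
# Toy gene expression matrix: one row per gene, one column per sample.
genes = ["geneA", "geneB", "geneC"]
samples = ["tissue1", "tissue2"]
matrix = [
    [2.0, 4.0],  # geneA
    [1.0, 1.0],  # geneB
    [0.5, 3.5],  # geneC
]


def expression(gene, sample):
    """Expression level of one gene in one sample (a single cell)."""
    return matrix[genes.index(gene)][samples.index(sample)]


def gene_profile(gene):
    """One row: the gene's expression across all samples."""
    return dict(zip(samples, matrix[genes.index(gene)]))


def sample_column(sample):
    """One column: the expression of every gene in a single sample."""
    j = samples.index(sample)
    return {gene: row[j] for gene, row in zip(genes, matrix)}
```

Row-wise access (`gene_profile`) is what gene clustering operates on, while column-wise access (`sample_column`) is what sample classification uses.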

19.
ArrayExpress is a public microarray repository founded on the Minimum Information About a Microarray Experiment (MIAME) principles that stores MIAME-compliant gene expression data. Plant-based data sets represent approximately one-quarter of the experiments in ArrayExpress. The majority are based on Arabidopsis (Arabidopsis thaliana); however, there are other data sets based on Triticum aestivum, Hordeum vulgare, and Populus subsp. AtMIAMExpress is an open-source Web-based software application for the submission of Arabidopsis-based microarray data to ArrayExpress. AtMIAMExpress exports data in MAGE-ML format for upload to any MAGE-ML-compliant application, such as J-Express and ArrayExpress. It was designed as a tool for users with minimal bioinformatics expertise, has comprehensive help and user support, and represents a simple solution to meeting the MIAME guidelines for the Arabidopsis community. Plant data are queryable both in ArrayExpress and in the Data Warehouse databases, which support queries based on gene-centric and sample-centric annotation. The AtMIAMExpress submission tool is available at http://www.ebi.ac.uk/at-miamexpress/. The software is open source and is available from http://sourceforge.net/projects/miamexpress/. For information, contact miamexpress@ebi.ac.uk.

20.
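Objective: At present, methods and software for differential analysis of digital gene expression profiles are very scarce. Those that exist require knowledge of the R language and similar tools and are cumbersome to operate, which makes digital expression profile analysis difficult. DGE-P is differential analysis software developed specifically for digital gene expression profiles. Methods: DGE-P uses fold-change analysis and a differential gene detection method for digital gene expression profiles to test the significance of differences in data that have been normalized by the software. Results: DGE-P comprises three modules: abundance statistics, data normalization, and calculation of fold change and p-values. It reports two values: the fold change and the p-value from the differential gene detection method. Conclusions: Unlike earlier differential analysis software, DGE-P is designed specifically for digital expression profile analysis and overcomes the inability of other software to avoid errors when no replicate data are available. DGE-P is also easier to use than other software: it runs on Windows and is simple to operate.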
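
The normalization and fold-change modules mentioned in this abstract can be illustrated with a short sketch. The abstract does not say how DGE-P normalizes counts or computes its p-value, so everything here is an assumption: tags-per-million scaling, a pseudocount to avoid division by zero, and the function names are all hypothetical, and no p-value is computed.

```python
import math


def tags_per_million(counts):
    """Scale raw tag counts so that each library sums to one million.

    Assumes the library contains at least one tag (total > 0).
    """
    total = sum(counts.values())
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """log2(B / A) on normalized counts for genes present in both libraries."""
    norm_a = tags_per_million(counts_a)
    norm_b = tags_per_million(counts_b)
    result = {}
    for gene in norm_a:
        if gene in norm_b:
            ratio = (norm_b[gene] + pseudocount) / (norm_a[gene] + pseudocount)
            result[gene] = math.log2(ratio)
    return result
```

Normalizing before taking the ratio matters because the two libraries may differ in sequencing depth; without it, a deeper library would make every gene look up-regulated.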
