Similar articles
 20 similar articles found
1.
We consider the problem of comparing the gene expression levels of cells grown under two different conditions using cDNA microarray data. We use a quality index, computed from duplicate spots on the same slide, to filter out outlying spots, poor-quality genes and problematic slides. We also perform calibration experiments to show that normalization between fluorescent labels is needed and that the normalization is slide dependent and non-linear. A rank-invariant method is suggested to select non-differentially expressed genes and to construct normalization curves in comparative experiments. After normalization, the residuals from the calibration data are used to provide prior information on variance components in the analysis of comparative experiments. Based on a hierarchical model that incorporates several levels of variation, a method for assessing the significance of gene effects in comparative experiments is presented. The analysis is demonstrated via two groups of experiments with 125 and 4129 genes, respectively, in Escherichia coli grown in glucose and acetate.
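The rank-invariant selection mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the selection criterion, so the rule below (keep genes whose rank differs between the two channels by at most `max_shift`) and all function names are assumptions made for illustration only.

```python
def ranks(values):
    """Return the 0-based rank of each value (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def rank_invariant(channel_1, channel_2, max_shift):
    """Indices of genes whose rank changes by at most max_shift between channels.

    Hypothetical criterion: the paper's actual rule is not stated in the abstract.
    """
    r1, r2 = ranks(channel_1), ranks(channel_2)
    return [i for i in range(len(channel_1)) if abs(r1[i] - r2[i]) <= max_shift]


# Toy intensities for five genes; gene 2 changes strongly between channels.
cy3 = [100, 200, 300, 400, 500]
cy5 = [110, 190, 900, 420, 480]
selected = rank_invariant(cy3, cy5, max_shift=1)
```

Genes selected this way would then serve as the points through which a normalization curve is fitted.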

2.
There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels (e.g. differences in labeling efficiency between the two fluorescent dyes). The term normalization refers to the process of removing such variation. A constant adjustment is often used to force the distribution of the intensity log ratios to have a median of zero for each slide. However, such global normalization approaches are not adequate in situations where dye biases can depend on overall spot intensity and/or spatial location within the array. This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments. The selection of appropriate controls for normalization is discussed and a novel set of controls (microarray sample pool, MSP) is introduced to aid in intensity-dependent normalization. Lastly, to allow for comparisons of expression levels across slides, a robust method based on maximum likelihood estimation is proposed to adjust for scale differences among slides.
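As a point of reference, the constant (global) adjustment that this abstract describes as inadequate can be sketched in a few lines. The function names and the toy intensities are illustrative assumptions; this is the baseline being criticized, not the local-regression method the article proposes.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Return M = log2(R / G) for each spot."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(m_values):
    """Global normalization: shift all log ratios so the slide median is zero."""
    shift = median(m_values)
    return [m - shift for m in m_values]


# Toy two-channel intensities for five spots on one slide.
red = [1200.0, 800.0, 1500.0, 400.0, 3000.0]
green = [600.0, 400.0, 700.0, 210.0, 1500.0]
normalized = median_center(log_ratios(red, green))
```

Because the same shift is applied to every spot, any dye bias that varies with intensity or position on the array is left in the data.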

3.
4.
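Reliability validation analysis of gene expression profiling microarrays   Total citations: 7 (self-citations: 0, citations by others: 7)
cDNA microarrays are an emerging technology for assessing changes in mRNA expression levels across the full range of transcripts. The reproducibility of cDNA microarray experiments was tested through self-comparison experiments using RNA from the same tissue and differential analysis experiments using RNA from different tissues. The correlation coefficient (R), coefficient of variation (CV) and false positive rate (FPR) were used to analyze the reliability of the cDNA microarray data, giving an overall assessment of the experimental data. The results confirm that the cDNA expression profile data obtained with this microarray system generally have a correlation coefficient greater than 0.9, an average coefficient of variation of about 15%, and a false positive rate within 3%. The concept of a consistency rate (CR) is also proposed as a new parameter for measuring the reproducibility of a cDNA microarray system, and the advantages of this parameter over the commonly used correlation coefficient and coefficient of variation are described. In addition, the sources of systematic error in microarray data are analyzed by comparing the effects of spotting concentration during array preparation, mRNA versus total RNA, different array batches, and different labeling processes. It is further proposed that repeating the experiment twice can overcome the great majority of false positives introduced by the experimental system.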
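Two of the reproducibility measures named in this abstract, the correlation coefficient (R) and the coefficient of variation (CV), can be computed as in the sketch below. The abstract does not say which estimator was used, so the sample standard deviation and the toy values are assumptions.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two replicate measurements."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean (assumed estimator)."""
    return stdev(values) / mean(values)


# Toy data: expression of four genes in two replicate hybridizations.
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [2.1, 3.9, 6.2, 7.8]
r = pearson_r(replicate_1, replicate_2)

# Toy data: one gene measured on three arrays.
cv = coefficient_of_variation([98.0, 102.0, 100.0])
```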

5.
MOTIVATION: Recent technological advances such as cDNA microarray technology have made it possible to simultaneously interrogate thousands of genes in a biological specimen. A cDNA microarray experiment produces a gene expression 'profile'. Often interest lies in discovering novel subgroupings, or 'clusters', of specimens based on their profiles, for example, the identification of new tumor taxonomies. Cluster analysis techniques such as hierarchical clustering and self-organizing maps have frequently been used for investigating structure in microarray data. However, clustering algorithms always detect clusters, even on random data, and it is easy to misinterpret the results without some objective measure of the reproducibility of the clusters. RESULTS: We present statistical methods for testing for overall clustering of gene expression profiles, and we define easily interpretable measures of cluster-specific reproducibility that facilitate understanding of the clustering structure. We apply these methods to elucidate structure in cDNA microarray gene expression profiles obtained on melanoma tumors and on prostate specimens.

6.
Normalization removes or minimizes the biases of systematic variation that exists in experimental data sets. This study presents a systematic variation normalization (SVN) procedure for removing systematic variation in two-channel microarray gene expression data. Based on an analysis of how systematic variation contributes to variability in microarray data sets, our normalization procedure includes background subtraction determined from the distribution of pixel intensity values from each data acquisition channel and log conversion, linear or non-linear regression, restoration or transformation, and multiarray normalization. In the case when a non-linear regression is required, an empirical polynomial approximation approach is used. Either the high terminated points or their averaged values in the distributions of the pixel intensity values observed in control channels may be used for rescaling multiarray datasets. These pre-processing steps remove systematic variation in the data attributable to variability in microarray slides, assay batches, the array process, or experimenters. Biologically meaningful comparisons of gene expression patterns between control and test channels or among multiple arrays are therefore unbiased when made with normalized, but not unnormalized, datasets.

7.
MOTIVATION: A major focus of current cancer research is to identify genes that can be used as markers for prognosis and diagnosis, and as targets for therapy. Microarray technology has been applied extensively for this purpose, even though it has been reported that the agreement between microarray platforms is poor. A critical question is: how can we best combine the measurements of matched genes across microarray platforms to develop diagnostic and prognostic tools related to the underlying biology? RESULTS: We introduce a statistical approach within a Bayesian framework to combine the microarray data on matched genes from three investigations of gene expression profiling of B-cell chronic lymphocytic leukemia (CLL) and normal B cells (NBC) using three different microarray platforms: oligonucleotide arrays, cDNA arrays printed on glass slides and cDNA arrays printed on nylon membranes. Using this approach, we identified a number of genes that were consistently differentially expressed between CLL and NBC samples.

8.
We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called Latent Process Decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, the ability to objectively assess the optimal number of sample clusters. Second, the ability to represent samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster and, thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To illustrate its wider applicability, we also demonstrate its performance on a microarray data set for yeast.

9.
MOTIVATION: We face the absence of optimized standards to guide normalization, comparative analysis, and interpretation of data sets. One aspect of this is that current methods of statistical analysis do not adequately utilize the information inherent in the large data sets generated in a microarray experiment and require a tradeoff between detection sensitivity and specificity. RESULTS: We present a multistep procedure for analysis of mRNA expression data obtained from cDNA array methods. To identify and classify differentially expressed genes, results from a standard paired t-test of normalized data are compared with those from a novel method, denoted an associative analysis. This method associates experimental gene expressions presented as residuals in regression analysis against control averaged expressions to a common standard: the family of similarly computed residuals for low variability genes derived from control experiments. By associating changes in expression of a given gene to a large family of equally expressed genes of the control group, this method utilizes the large data sets inherent in microarray experiments to increase both specificity and sensitivity. The overall procedure is illustrated by tabulation of genes whose expression differs significantly between Snell dwarf mice (dw/dw) and their phenotypically normal littermates (dw/+, +/+). Of the 2,352 genes examined, only 450-500 were expressed above the background levels observed in nonexpressed genes, and of these 120 were established as differentially expressed in dwarf mice at a significance level that excludes appearance of false positive determinations.

10.
11.
Systematic variations can occur at various steps of a cDNA microarray experiment and affect the measurement of gene expression levels. Accepted standards integrated into every cDNA microarray analysis can assess these variabilities and aid the interpretation of cDNA microarray experiments from different sources. A universally applicable approach to evaluate parameters such as input and output ratios, signal linearity, hybridization specificity and consistency across an array, as well as normalization strategies, is the utilization of exogenous control genes as spike-in and negative controls. We suggest that the use of such control sets, together with a sufficient number of experimental repeats, in-depth statistical analysis and thorough data validation should be made mandatory for the publication of cDNA microarray data.

12.
We consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available gene expression data sets. We compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, significance analysis of microarrays (SAM), Efron's empirical Bayes, and EBarrays in both its lognormal-normal and gamma-gamma forms. In an experiment with HIV data, our method performed better than these alternatives, on the basis of between-replicate agreement and disagreement.

13.
14.

Background

Microarray technology allows the monitoring of expression levels for thousands of genes simultaneously. This novel technique helps us to understand gene regulation as well as gene-by-gene interactions more systematically. In microarray experiments, however, many undesirable systematic variations are observed. Even in replicated experiments, some variations are commonly observed. Normalization is the process of removing some sources of variation which affect the measured gene expression levels. Although a number of normalization methods have been proposed, it has been difficult to decide which methods perform best. Normalization plays an important role in the early stages of microarray data analysis, and the results of subsequent analyses are highly dependent on it.

Results

In this paper, we use the variability among the replicated slides to compare performance of normalization methods. We also compare normalization methods with regard to bias and mean square error using simulated data.

Conclusions

Our results show that intensity-dependent normalization often performs better than global normalization methods, and that linear and nonlinear normalization methods perform similarly. These conclusions are based on analysis of 36 cDNA microarrays of 3,840 genes obtained in an experiment to search for changes in gene expression profiles during neuronal differentiation of cortical stem cells. Simulation studies confirm our findings.
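To make the contrast between global and intensity-dependent normalization concrete, the sketch below subtracts the median log ratio within each intensity bin instead of one slide-wide median. This binned median is a crude stand-in chosen for brevity; it is an assumption for illustration and not one of the methods compared in the paper, which would typically fit a smooth curve such as loess.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return (A, M) pairs: A = mean log2 intensity, M = log2 ratio."""
    return [
        (0.5 * (math.log2(r) + math.log2(g)), math.log2(r) - math.log2(g))
        for r, g in zip(red, green)
    ]


def binned_normalize(pairs, n_bins=2):
    """Subtract the median M within each intensity (A) bin.

    The correction depends on intensity, but it is piecewise constant
    rather than a smooth fitted curve.
    """
    order = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
    size = math.ceil(len(order) / n_bins)
    out = [0.0] * len(pairs)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        shift = median(pairs[i][1] for i in idx)
        for i in idx:
            out[i] = pairs[i][1] - shift
    return out


# Toy slide: the dye bias is larger for the three bright spots.
red = [128.0, 64.0, 256.0, 4096.0, 2048.0, 8192.0]
green = [64.0, 64.0, 128.0, 1024.0, 512.0, 4096.0]
corrected = binned_normalize(ma_values(red, green), n_bins=2)
```

A single global shift could not remove both the bias of the dim spots and the larger bias of the bright spots at once, which is the situation in which the abstract reports that intensity-dependent methods often perform better.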

15.
Hokeun Sun, Hongzhe Li. Biometrics, 2012, 68(4): 1197-1206
Summary: Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how far each observation deviates, where the deviation is measured by the observation's own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso.

16.
17.
The major goal of two-color cDNA microarray experiments is to measure the relative gene expression level (i.e., relative amount of mRNA) of each gene between samples in studies of gene expression. More specifically, given an N-sample experiment, we need all N(N - 1)/2 relative expression levels of all sample pairs of each gene for identification of the differentially expressed genes and for clustering of gene expression patterns. However, the intensities observed from two-color cDNA microarray experiments do not simply represent the relative gene expression level. They are composed of signal (gene expression level), noise, and other factors. In discussions on the experimental design of two-color cDNA microarray experiments, little attention has been given to the fact that different combinations of test and control samples will produce microarray intensity data with varying intrinsic composition of factors. As a consequence, not all experimental designs for two-color cDNA microarray experiments are able to provide all possible relative gene expression levels. This phenomenon has never been addressed. To obtain all possible relative gene expression levels, a novel method for evaluating two-color cDNA microarray experimental designs is necessary so that an accurate choice can be made. In this study, we propose a model-based approach to illustrate how the factor composition of microarray intensities changes with different experimental designs in two-color cDNA microarray experiments. By analyzing 12 experimental designs (including 5 general forms), we demonstrate that not all experimental designs are able to provide all possible relative gene expression levels due to the differences in factor composition. Our results indicate that whether an experimental design can provide all possible relative expression levels of all sample pairs for each gene should be the first criterion to be considered in an evaluation of experimental designs for two-color cDNA microarray experiments.
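The count of N(N - 1)/2 sample pairs quoted in this abstract is simply the number of unordered pairs among N samples. A minimal sketch, with hypothetical sample labels, enumerates them:

```python
from itertools import combinations


def sample_pairs(samples):
    """Return every unordered pair of samples; there are N(N - 1)/2 of them."""
    return list(combinations(samples, 2))


# Hypothetical labels for a four-sample experiment.
samples = ["A", "B", "C", "D"]
pairs = sample_pairs(samples)
n = len(samples)
```

For four samples this gives six pairs, each of which needs a relative expression level per gene according to the abstract.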

18.
Gene expression profiling on microarrays is widely used to measure the expression of large numbers of genes in a single experiment. Because of the high cost of this method, feasible numbers of replicates are limited, thus impairing the power of statistical analysis. As a step toward reducing technically induced variation, we developed a procedure of sample preparation and analysis that minimizes the number of sample manipulation steps, introduces quality control before array hybridization, and allows recovery of the prepared mRNA for independent validation of results. Sample preparation is based on mRNA separation using oligo(dT) magnetic beads, which are subsequently used for first-strand cDNA synthesis on the beads. cDNA covalently bound to the magnetic beads is used as template for second-strand cDNA synthesis, leaving the intact mRNA in solution for further analysis. The quality of the synthesized cDNA can be assessed by quantitative polymerase chain reaction using 3'- and 5'-specific primer pairs for housekeeping genes such as glyceraldehyde-3-phosphate dehydrogenase. Second-strand cDNA is chemically labeled with fluorescent dyes to avoid dye bias in enzymatic labeling reactions. After hybridization of two differently labeled samples to microarray slides, arrays are scanned and images analyzed automatically with high reproducibility. Quantile-normalized data from five biological replicates display a coefficient of variation 45% for 90% of profiled genes, allowing detection of twofold changes with false positive and false negative rates of 10% each. We demonstrate successful application of the procedure for expression profiling in plant leaf tissue. However, the method could be easily adapted for samples of animal (including human) or microbial origin.
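The abstract reports quantile-normalized data without describing the implementation. The sketch below shows the usual textbook form of quantile normalization, in which each array's values are replaced by the mean of the values sharing the same rank across arrays. It is an assumption that the authors used this variant, and ties are broken by position here rather than averaged.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution of values.

    arrays: list of equal-length lists, one list of intensities per array.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = rank_means[rank]
        result.append(normalized)
    return result


# Toy data: three genes measured on two arrays.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
```

After normalization both arrays contain the same set of values, and only the assignment of those values to genes differs.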

19.
Lee EK, Park T. Bioinformation, 2007, 1(10): 423-428
In microarray experiments many undesirable systematic variations are commonly observed. Often investigators analyzing microarray data need to make subjective decisions about the quality of the experiment, by examining its chip image and a simple scatter plot. Thus, a more rigorous but simple method is desirable to determine the quality of microarray data. We propose two exploratory methods to investigate the quality of microarray experiments with replicated chips. The first method is based on correlations among chips and the second on the actual intensity values for each gene. The proposed methods are illustrated using a real microarray data set. The methods provide an initial estimation for determining the quality of microarray experiments.

20.
Microarray technologies, which can measure tens of thousands of gene expression values simultaneously in a single experiment, have become a common research method for biomedical researchers. Computational tools to analyze microarray data for biological discovery are needed. In this paper, we investigate the feasibility of using formal concept analysis (FCA) as a tool for microarray data analysis. The method of FCA builds a (concept) lattice from the experimental data together with additional biological information. For microarray data, each vertex of the lattice corresponds to a subset of genes that are grouped together according to their expression values and some biological information related to gene function. The lattice structure of these gene sets might reflect biological relationships in the dataset. Similarities and differences between experiments can then be investigated by comparing their corresponding lattices according to various graph measures. We apply our method to microarray data derived from influenza-infected mouse lung tissue and healthy controls. Our preliminary results show the promise of our method as a tool for microarray data analysis.
