Similar Documents
20 similar documents found.
1.
A new integrated image analysis package with quantitative quality control schemes is described for cDNA microarray technology. The package employs an iterative algorithm that utilizes both intensity characteristics and spatial information of the spots on a microarray image for signal–background segmentation and defines five quality scores for each spot to record irregularities in spot intensity, size and background noise levels. A composite score q_com is defined based on these individual scores to give an overall assessment of spot quality. Using q_com we demonstrate that the inherent variability in intensity ratio measurements is closely correlated with spot quality: spots with higher quality give less variable measurements, and vice versa. In addition, gauging data by q_com can improve data reliability dramatically and efficiently. We further show that the variability in ratio measurements drops exponentially with increasing q_com and, for the majority of spots at the high-quality end, this improvement is mainly due to an improvement in correlation between the two dyes. Based on these studies, we discuss the potential of quantitative quality control for microarray data and the possibility of filtering and normalizing microarray data using a quality metrics-dependent scheme.
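The paper does not give q_com in closed form here, so the following is only a minimal sketch of quality-gated filtering: several [0, 1] per-spot quality scores are combined into one composite (a weighted geometric mean is assumed, not the paper's formula) and spots below a cutoff are discarded. The function name, the weighting, and the 0.5 cutoff are all illustrative.

```python
import numpy as np

def composite_quality(scores: np.ndarray, weights=None) -> np.ndarray:
    """Combine per-spot quality scores (one column per criterion, each
    scaled to [0, 1]) into a single composite score per spot.

    A weighted geometric mean is used here so that any single very poor
    score drags the composite down; the published q_com may combine the
    five scores differently.
    """
    scores = np.clip(scores, 1e-6, 1.0)          # avoid log(0)
    if weights is None:
        weights = np.ones(scores.shape[1]) / scores.shape[1]
    return np.exp(np.log(scores) @ weights)       # weighted geometric mean

# Example: 4 spots scored on intensity, size, and background noise.
q = np.array([[0.90, 0.8, 0.95],
              [0.10, 0.9, 0.90],    # poor intensity score
              [0.85, 0.7, 0.80],
              [0.95, 0.9, 0.99]])
q_com = composite_quality(q)
keep = q_com >= 0.5                 # quality-gated filtering
print(q_com.round(2), keep)         # second spot falls below the cutoff
```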

2.
Normalization removes or minimizes the biases of systematic variation that exist in experimental data sets. This study presents a systematic variation normalization (SVN) procedure for removing systematic variation in two-channel microarray gene expression data. Based on an analysis of how systematic variation contributes to variability in microarray data sets, our normalization procedure includes background subtraction determined from the distribution of pixel intensity values from each data acquisition channel, log conversion, linear or non-linear regression, restoration or transformation, and multiarray normalization. In the case when a non-linear regression is required, an empirical polynomial approximation approach is used. Either the high-end points or their averaged values in the distributions of the pixel intensity values observed in control channels may be used for rescaling multiarray data sets. These preprocessing steps remove systematic variation in the data attributable to variability in microarray slides, assay batches, the array process, or experimenters. Biologically meaningful comparisons of gene expression patterns between control and test channels or among multiple arrays are therefore unbiased when based on normalized, but not unnormalized, data sets.
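A minimal sketch of the flavor of such a pipeline for one two-channel array, not the published SVN procedure: per-channel background subtraction, log conversion, and an empirical polynomial regression of the log-ratio on average log-intensity whose fit is removed. The polynomial degree, the positivity floor, and the synthetic data are assumptions.

```python
import numpy as np

def svn_style_normalize(red, green, bg_red, bg_green, degree=2):
    """Background subtraction, log conversion, then a polynomial
    regression of the log-ratio M on average log-intensity A whose
    fit is subtracted out (an empirical polynomial approximation;
    degree and floor are illustrative)."""
    r = np.maximum(red - bg_red, 1.0)            # background subtraction,
    g = np.maximum(green - bg_green, 1.0)        # floored to stay positive
    m = np.log2(r) - np.log2(g)                  # log-ratio M
    a = 0.5 * (np.log2(r) + np.log2(g))          # average log-intensity A
    coef = np.polyfit(a, m, degree)              # empirical polynomial fit
    return m - np.polyval(coef, a)               # residual = normalized M

rng = np.random.default_rng(0)
red = rng.lognormal(8, 1, 5000)
green = red * 2 ** (0.001 * red ** 0.5)          # intensity-dependent bias
m_norm = svn_style_normalize(red, green, 50.0, 40.0)
print(m_norm.mean().round(3))                    # centered near zero
```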

3.
MOTIVATION: High-throughput microarray technologies enable measurements of the expression levels of thousands of genes in parallel. However, microarray printing, hybridization and washing may create substantial variability in the quality of the data. As erroneous measurements may have a drastic impact on the results by disturbing the normalization schemes and by introducing expression patterns that lead to incorrect conclusions, it is crucial to discard low-quality observations in the early phases of a microarray experiment. A typical microarray experiment consists of tens of thousands of spots on a microarray, making manual extraction of poor-quality spots impossible. Thus, there is a need for a reliable and general microarray spot quality control strategy. RESULTS: We suggest a novel strategy for spot quality control using Bayesian networks, which have many appealing properties in the spot quality control context. We illustrate how a non-linear least-squares-based Gaussian fitting procedure can be used to extract features for a spot on a microarray. The features used in this study are: spot intensity, size of the spot, roundness of the spot, alignment error, background intensity, background noise, and bleeding. We conclude that Bayesian networks are a reliable and useful model for microarray spot quality assessment. SUPPLEMENTARY INFORMATION: http://sigwww.cs.tut.fi/TICSP/SpotQuality/
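The Gaussian-fitting step can be sketched as follows: fit a two-dimensional Gaussian to each spot's image patch by non-linear least squares and read quality features off the fitted parameters. The feature set below is simplified relative to the paper (roundness, alignment error and bleeding are omitted), and all constants are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_spot(patch):
    """Fit a 2-D Gaussian to a spot patch and return simple features:
    peak intensity, spot size, background level, residual noise."""
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]

    def model(p):
        amp, x0, y0, sx, sy, bg = p
        return bg + amp * np.exp(-((xx - x0) ** 2 / (2 * sx ** 2)
                                   + (yy - y0) ** 2 / (2 * sy ** 2)))

    p0 = [patch.max() - patch.min(), w / 2, h / 2, w / 4, h / 4, patch.min()]
    fit = least_squares(lambda p: (model(p) - patch).ravel(), p0)
    amp, x0, y0, sx, sy, bg = fit.x
    noise = (model(fit.x) - patch).std()          # residual after the fit
    return {"intensity": amp,
            "size": 2.355 * np.sqrt(abs(sx * sy)),  # FWHM-like diameter
            "background": bg, "noise": noise}

# Synthetic 15x15 spot for demonstration.
yy, xx = np.mgrid[0:15, 0:15]
spot = 100 + 800 * np.exp(-((xx - 7) ** 2 + (yy - 7) ** 2) / 8.0)
print({k: round(v, 1) for k, v in fit_spot(spot).items()})
```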

4.
5.
Multidimensional local false discovery rate for microarray studies
MOTIVATION: The false discovery rate (fdr) is a key tool for statistical assessment of differential expression (DE) in microarray studies. Overall control of the fdr alone, however, is not sufficient to address the problem of genes with small variance, which generally suffer from a disproportionately high rate of false positives. It is desirable to have an fdr-controlling procedure that automatically accounts for gene variability. METHODS: We generalize the local fdr as a function of multiple statistics, combining a common test statistic for assessing DE with its standard error information. We use a non-parametric mixture model for DE and non-DE genes to describe the observed multi-dimensional statistics, and estimate the distribution for non-DE genes via the permutation method. We demonstrate this fdr2d approach for simulated and real microarray data. RESULTS: The fdr2d allows objective assessment of DE as a function of gene variability. We also show that the fdr2d performs better than commonly used modified test statistics. AVAILABILITY: An R-package OCplus containing functions for computing fdr2d() and other operating characteristics of microarray data is available at http://www.meb.ki.se/~yudpaw
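One way to make the fdr2d construction concrete: the local fdr at a point (t, log SE) is estimated as p0·f0/f, with the overall density f taken from the observed statistics and the null density f0 from permutation statistics. This sketch uses plain 2-D histograms where the paper uses smoothed density estimates; the bin count, the choice of log(SE) as second coordinate, and p0 = 1 are assumptions.

```python
import numpy as np

def fdr2d_sketch(t_obs, se_obs, t_perm, se_perm, bins=20, p0=1.0):
    """Crude two-dimensional local fdr: bin (t, log SE) pairs, estimate
    the null density from permutation statistics and the overall density
    from observed statistics, and return their ratio per gene."""
    x = np.column_stack([t_obs, np.log(se_obs)])
    x0 = np.column_stack([t_perm, np.log(se_perm)])
    edges = [np.linspace(min(x[:, j].min(), x0[:, j].min()),
                         max(x[:, j].max(), x0[:, j].max()), bins + 1)
             for j in range(2)]
    f, _, _ = np.histogram2d(x[:, 0], x[:, 1], bins=edges, density=True)
    f0, _, _ = np.histogram2d(x0[:, 0], x0[:, 1], bins=edges, density=True)
    ix = np.minimum(np.digitize(x[:, 0], edges[0]) - 1, bins - 1)
    iy = np.minimum(np.digitize(x[:, 1], edges[1]) - 1, bins - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        lfdr = p0 * f0[ix, iy] / f[ix, iy]        # null / overall density
    return np.clip(np.nan_to_num(lfdr, nan=1.0), 0.0, 1.0)
```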

6.
PURPOSE OF REVIEW: To highlight developments in microarray data analysis for the identification of differentially expressed genes, particularly via control of the false discovery rate. RECENT FINDINGS: The emergence of high-throughput technology such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises due to testing tens of thousands of hypotheses, rendering the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach of dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool, replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods to take into account additional information from the microarrays to improve the false discovery rate. SUMMARY: There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.
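As a concrete reference point for the FDR framework discussed above, here is the standard Benjamini–Hochberg step-up procedure, a common choice (the review also covers local-fdr variants). The simulated t-test data are purely illustrative.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the hypotheses with
    the k smallest P values, where k is the largest i such that
    p_(i) <= q*i/m. Returns a boolean mask at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()           # largest i with p_(i) <= qi/m
        reject[order[:k + 1]] = True
    return reject

# 10,000 null genes plus 100 shifted ones, one-sample t-test per gene.
rng = np.random.default_rng(1)
x = rng.normal(0, 1, (10100, 8))
x[:100] += 1.5
p = stats.ttest_1samp(x, 0, axis=1).pvalue
print(benjamini_hochberg(p).sum(), "genes called at FDR 0.05")
```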

7.
Commonly accepted intensity-dependent normalization in spotted microarray studies takes account of measurement errors in the differential expression ratio but ignores measurement errors in the total intensity, although the definitions imply the same measurement error components are involved in both statistics. Furthermore, identification of differentially expressed genes is usually considered separately following normalization, which is statistically problematic. By incorporating the measurement errors in both total intensities and differential expression ratios, we propose a measurement-error model for intensity-dependent normalization and identification of differentially expressed genes. This model is also flexible enough to incorporate intra-array and inter-array effects. A Bayesian framework is proposed for the analysis of the proposed measurement-error model to avoid the potential risk of using the common two-step procedure. We also propose a Bayesian identification of differentially expressed genes to control the false discovery rate instead of the ad hoc thresholding of the posterior odds ratio. The simulation study and an application to real microarray data demonstrate promising results.

8.
In this paper, fluorescent microarray images and various analysis techniques are described to improve the microarray data acquisition process. Signal intensities produced by rarely expressed genes are initially detected correctly, but they are often lost in subsequent background correction, log transformation, or ratio calculation. Our analyses indicate that a simple correlation between the mean and median signal intensities may be the best way to eliminate inaccurate microarray signals. Unlike traditional quality control methods, this mean–median correlation retains low-intensity signals while eliminating inaccurate ones. With larger amounts of microarray data being generated, it becomes increasingly difficult to analyze data visually. Our method allows the automatic, quantitative identification of accurate and reliable signals, which can then be used for normalization. We found that a mean-to-median correlation of 85% or higher not only retains more data than current methods, but the retained data are also more accurate than those passing traditional thresholds or common spot-flagging algorithms. We have also found that by using pin microtapping and microvibrations, we can control spot quality independently of initial PCR volume.
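A minimal sketch of the mean–median agreement idea: for each spot, compare the mean and median of its pixel intensities and keep the spot only when they agree to within 85%. The ratio statistic below is one plausible reading of the paper's criterion, not necessarily its exact definition.

```python
import numpy as np

def mean_median_filter(spot_pixels, threshold=0.85):
    """Flag spots whose mean and median pixel intensities agree to
    within `threshold` (0.85 echoes the 85% figure above).
    `spot_pixels` is a list of 1-D pixel-intensity arrays, one per spot.
    Donut or speckle artifacts push the mean away from the median."""
    keep = []
    for px in spot_pixels:
        mean, med = px.mean(), np.median(px)
        agreement = min(mean, med) / max(mean, med)
        keep.append(agreement >= threshold)
    return np.array(keep)

rng = np.random.default_rng(2)
clean = rng.normal(1000, 50, 80)                        # uniform spot
speckled = np.concatenate([rng.normal(300, 30, 70),
                           rng.normal(8000, 500, 10)])  # bright artifact
print(mean_median_filter([clean, speckled]))            # [ True False]
```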

9.
Analysis of repeatability in spotted cDNA microarrays
We report a strategy for analysis of data quality in cDNA microarrays based on the repeatability of repeatedly spotted clones. We describe how repeatability can be used to control data quality by developing adaptive filtering criteria for microarray data containing clones spotted in multiple spots. We have applied the method on five publicly available cDNA microarray data sets and one previously unpublished data set from our own laboratory. The results demonstrate the feasibility of the approach as a foundation for data filtering, and indicate a high degree of variation in data quality, both across the data sets and between arrays within data sets.
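The repeatability idea can be sketched as follows for clones printed in two or more spots: measure how much a clone's replicate spots disagree and drop clones exceeding a cutoff. The range statistic and fixed threshold below are stand-ins for the paper's adaptive, data-set-specific criteria.

```python
import numpy as np

def repeatability_filter(log_ratios, clone_ids, max_range=1.0):
    """For each clone, compute the spread of its replicate spots'
    log-ratios and keep the clone only if the spread is within
    `max_range` (illustrative; the paper adapts the criterion
    to each data set)."""
    keep = {}
    for cid in np.unique(clone_ids):
        vals = log_ratios[clone_ids == cid]
        keep[cid] = (vals.max() - vals.min()) <= max_range
    return np.array([keep[c] for c in clone_ids])

clones = np.array(["g1", "g1", "g2", "g2", "g3", "g3"])
m = np.array([0.8, 0.9, -0.1, 1.4, 0.0, 0.1])
print(repeatability_filter(m, clones))   # g2's duplicates disagree -> dropped
```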

10.
MOTIVATION: There are two general methods for making gene-expression microarrays: one is to hybridize a single test set of labeled targets to the probe, and measure the background-subtracted intensity at each probe site; the other is to hybridize both a test and a reference set of differentially labeled targets to a single detector array, and measure the ratio of the background-subtracted intensities at each probe site. Which method is better depends on the variability in the cell system and the random factors resulting from the microarray technology. It also depends on the purpose for which the microarray is being used. Classification is a fundamental application and it is the one considered here. RESULTS: This paper describes a model-based simulation paradigm that compares the classification accuracy provided by these methods over a variety of noise types and presents the results of a study modeled on noise typical of cDNA microarray data. The model consists of four parts: (1) the measurement equation for genes in the reference state; (2) the measurement equation for genes in the test state; (3) the ratio and normalization procedure for a dual-channel system; and (4) the intensity and normalization procedure for a single-channel system. In the reference state, the mean intensities are modeled as a shifted exponential distribution, and the intensity for a particular gene is modeled via a normal distribution, Normal(I, αI), about its mean intensity I, with α being the coefficient of variation of the cell system. In the test state, some genes have their intensities up-regulated by a random factor. The model includes a number of random factors affecting intensity measurement: deposition gain d, labeling gain, and post-image-processing residual noise. The key conclusion of the study is that the coefficient of variation governing the randomness of the intensities and the deposition gain are the most important factors for determining whether a single-channel or dual-channel system provides superior classification, and the decision region in the α–d plane is approximately linear.
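A toy version of that measurement model (all constants illustrative; only the deposition-gain random factor is modeled, while labeling gain and residual noise are omitted): mean intensities follow a shifted exponential, per-gene intensities follow Normal(I, αI), a fraction of genes is up-regulated in the test state, and each spot carries a random deposition gain.

```python
import numpy as np

def simulate_array(n_genes=1000, alpha=0.2, frac_up=0.1, seed=0):
    """Shifted-exponential mean intensities, Normal(I, alpha*I) gene
    intensities, random up-regulation in the test state, and a per-spot
    deposition gain shared by both channels of the same spot."""
    rng = np.random.default_rng(seed)
    mean_i = 100.0 + rng.exponential(1000.0, n_genes)   # shifted exponential
    ref = rng.normal(mean_i, alpha * mean_i)            # reference state
    up = rng.random(n_genes) < frac_up
    fold = np.where(up, rng.uniform(2.0, 8.0, n_genes), 1.0)
    test = rng.normal(mean_i * fold, alpha * mean_i * fold)
    d = rng.lognormal(0.0, 0.3, n_genes)                # deposition gain d
    return d * ref, d * test, up

ref, test, up = simulate_array()
ratio = test / ref                # dual-channel readout: d cancels in the ratio
print(np.median(ratio[up]).round(2), np.median(ratio[~up]).round(2))
```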

11.
In this study we present two novel normalization schemes for cDNA microarrays. They are based on iterative local regression and optimization of model parameters by generalized cross-validation. Permutation tests assessing the efficiency of normalization demonstrated that the proposed schemes have an improved ability to remove systematic errors and to reduce variability in microarray data. The analysis also reveals that without parameter optimization local regression is frequently insufficient to remove systematic errors in microarray data.
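A sketch of normalization by iterated local regression, with a fixed ladder of lowess spans standing in for the paper's generalized cross-validation of model parameters; the span values, tolerance, and synthetic bias are assumptions.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def iterative_local_norm(m, a, spans=(0.7, 0.5, 0.3), tol=1e-3):
    """Repeatedly fit a lowess curve of M (log-ratio) on A (mean
    log-intensity) and subtract it; GCV would choose the span instead
    of this fixed schedule."""
    m = m.copy()
    for span in spans:
        fit = lowess(m, a, frac=span, return_sorted=False)
        m -= fit
        if np.abs(fit).mean() < tol:        # stop once corrections are tiny
            break
    return m

rng = np.random.default_rng(3)
a = rng.uniform(6, 14, 2000)
m = 0.3 * np.sin(a) + rng.normal(0, 0.2, 2000)   # intensity-dependent bias
print(np.abs(iterative_local_norm(m, a)).mean().round(3))
```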

12.
13.
A microarray experiment includes many steps, each of which may introduce systematic variation. For a sound analysis, this systematic bias must be identified and removed before the data are analyzed. Based on the M–A dependency observed by Dudoit et al. (2002), we suggest that, instead of lowess normalization, a new normalization method based on ANCOVA be used for genes with replicates. Simulation studies have shown that the performance of the suggested ANCOVA method is superior to the available approaches with regard to Fisher's Z score and concordance rate. We use microarray data from bladder cancer to illustrate the application of our approach. The advantage of the ANCOVA method over existing normalization approaches is further confirmed through real-time PCR.
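A minimal single-model sketch of the ANCOVA idea: instead of fitting a lowess curve, regress M jointly on the covariate A and per-array effects, and keep the residuals as normalized values. The paper's model additionally exploits gene replicates; the design matrix below is a simplified stand-in.

```python
import numpy as np

def ancova_normalize(m, a, array_id):
    """Fit M ~ A + array effects in one linear model and return the
    residuals, so the intensity trend and array offsets are removed
    together rather than in separate steps."""
    arrays = np.unique(array_id)
    dummies = (array_id[:, None] == arrays[None, :]).astype(float)
    X = np.column_stack([a, dummies])        # slope for A + per-array means
    beta, *_ = np.linalg.lstsq(X, m, rcond=None)
    return m - X @ beta                      # residuals = normalized M

rng = np.random.default_rng(4)
array_id = np.repeat(np.arange(3), 500)
a = rng.uniform(6, 14, 1500)
m = 0.1 * a + np.array([0.5, -0.2, 0.3])[array_id] + rng.normal(0, 0.2, 1500)
print(np.abs(ancova_normalize(m, a, array_id)).mean().round(3))
```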

14.
Quality control of a microarray experiment has become an important issue for both research and regulation. External RNA controls (ERCs), which can be either added at the total RNA level (tERCs) or introduced right before hybridization (cERCs), are designed and recommended by commercial microarray platforms for assessing the performance of a microarray experiment. However, the utility of ERCs has not been fully realized, mainly due to the lack of sufficient data resources. The US Food and Drug Administration (FDA)-led community-wide MicroArray Quality Control (MAQC) study generated a large amount of microarray data with ERCs implemented across several commercial microarray platforms. The MAQC study investigated the utility of ERCs in quality control by assessing their concentration–response behavior. In this work, an ERC-based correlation analysis was conducted to assess the quality of a microarray experiment. We found that the pairwise correlations of tERCs are sample independent, indicating that array data obtained from different biological samples can be treated as technical replicates in the analysis of tERCs. Consequently, the commonly used quality control method of applying correlation analysis to technical replicates can be adopted for assessing array performance across different biological samples using tERCs. The proposed approach is sensitive in identifying outlying assays and does not depend on the choice of normalization method.
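A sketch of the correlation-based check this enables: because tERC signals are sample independent, arrays from different biological samples can be correlated on their ERC probes as if they were technical replicates, and assays with low median pairwise correlation flagged as outliers. The 0.95 cutoff, probe count, and noise model are illustrative.

```python
import numpy as np

def flag_outlier_arrays(erc, min_corr=0.95):
    """`erc` is an arrays x ERC-probes intensity matrix. Correlate
    (log) ERC intensities between every pair of arrays and flag
    arrays whose median pairwise correlation falls below a cutoff."""
    c = np.corrcoef(np.log2(erc))            # array-by-array correlations
    np.fill_diagonal(c, np.nan)              # ignore self-correlation
    med = np.nanmedian(c, axis=1)
    return med < min_corr, med

rng = np.random.default_rng(5)
truth = rng.lognormal(8, 1, 24)                 # 24 ERC probes
erc = truth * rng.lognormal(0, 0.05, (6, 24))   # 6 well-behaved assays
erc[3] *= rng.lognormal(0, 1.0, 24)             # one noisy, outlying assay
bad, med = flag_outlier_arrays(erc)
print(bad, med.round(3))
```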

15.

Background  

The quality of microarray data can seriously affect the accuracy of downstream analyses. To reduce variability and enhance signal reproducibility in these data, many normalization methods have been proposed and evaluated, most of them for data obtained from cDNA microarrays and Affymetrix GeneChips. CodeLink Bioarrays are a more recently introduced single-color oligonucleotide microarray platform. To date, no reported studies have evaluated normalization methods for CodeLink Bioarrays.

16.

Background  

To cancel out experimental variation, microarray data must be normalized prior to analysis. Where an appropriate model for the statistical distribution of the data is available, a parametric method can normalize a group of data sets that share a common distribution. Although such models have been proposed for microarray data, they have not always fitted the distribution of real data and have thus been inappropriate for normalization. Consequently, microarray data have in most cases been normalized with non-parametric methods that adjust data in a pair-wise manner. However, data analysis and the integration of the resulting knowledge among experiments have been difficult, since such normalization concepts lack a universal standard.

17.
Yin BC, Li H, Ye BC. Analytical Biochemistry 2008, 383(2):270-278
DNA microarray technology has become powerful and popular in mutation/single nucleotide polymorphism (SNP) discovery and genotyping. However, this method is often associated with considerable signal noise of non-biological origin that may compromise data quality and interpretation. To achieve a high degree of reliability, accuracy, and sensitivity in data analysis, an effective normalization method to minimize technical variability is highly desirable. In the current study, a simple and robust normalization method is described. The method is based on introducing a reference probe co-immobilized with the SNP probes on the microarray for a dual-probe hybridization (DPH) reaction. The reference probe serves as an intraspot control for the customized microarrays. Using this method, the inter-assay coefficient of variation (CV) was reduced significantly, by approximately 10%. After DPH normalization, the CVs and ranges of the ratios were reduced two- to five-fold. The relative magnitudes of variation from different sources were also analyzed by analysis of variance. Glass slides were shown to contribute the most to the variance, whereas sampling and residual errors made relatively modest contributions. The results show that this DPH-based, spot-dependent normalization method is an effective solution for reducing the experimental variation associated with microarray genotyping data.
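A minimal reading of the DPH normalization: each spot's SNP-probe signal is referred to the co-immobilized reference probe of the same spot, so multiplicative spot-level effects (deposition, slide) cancel via the intraspot control. The simple per-spot ratio below is an assumption about the exact form; the demonstration shows the CV reduction one would expect.

```python
import numpy as np

def dph_normalize(snp_signal, ref_signal):
    """Express each spot's SNP-probe signal relative to the reference
    probe of the same spot; multiplicative spot effects cancel."""
    return snp_signal / ref_signal

rng = np.random.default_rng(6)
spot_gain = rng.lognormal(0, 0.4, 100)        # slide/print variability
snp = 500.0 * spot_gain * rng.lognormal(0, 0.05, 100)
ref = 200.0 * spot_gain * rng.lognormal(0, 0.05, 100)
raw_cv = snp.std() / snp.mean()
norm = dph_normalize(snp, ref)
print(round(raw_cv, 2), round(norm.std() / norm.mean(), 2))  # CV drops sharply
```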

18.
Preprocessing of gene expression microarray data
Preprocessing of microarray data is a critical step. It prepares the data for downstream analysis through data filtering to obtain the required data, data transformation to satisfy the normality requirements of subsequent analyses, missing-value estimation to complete incomplete data, and data normalization to correct systematic errors. Preprocessing is no less important than the downstream analysis itself, as it directly determines whether subsequent analyses can yield the expected results. This article focuses on reviewing data preprocessing for cDNA microarrays and briefly outlines data preprocessing for oligonucleotide microarrays.
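The four steps the review names map directly onto a small pipeline. The sketch below fixes one illustrative choice at each step (flag/floor filtering, log2 transformation, per-gene median imputation, per-array median centering) where the review surveys several alternatives.

```python
import numpy as np

def preprocess(expr, flag, floor=50.0):
    """Toy end-to-end preprocessing in the order described above.
    `expr` is genes x arrays; `flag` marks unusable measurements."""
    x = expr.astype(float).copy()
    x[flag | (x < floor)] = np.nan               # 1. filtering
    x = np.log2(x)                               # 2. transformation
    gene_med = np.nanmedian(x, axis=1, keepdims=True)
    x = np.where(np.isnan(x), gene_med, x)       # 3. missing-value imputation
    x -= np.median(x, axis=0, keepdims=True)     # 4. per-array normalization
    return x

rng = np.random.default_rng(7)
expr = rng.lognormal(8, 1, (200, 4)) * [1.0, 1.3, 0.8, 1.1]  # array biases
flag = rng.random((200, 4)) < 0.05
print(np.median(preprocess(expr, flag), axis=0))  # arrays now centered at 0
```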

19.
MOTIVATION: We face the absence of optimized standards to guide normalization, comparative analysis, and interpretation of data sets. One aspect of this is that current methods of statistical analysis do not adequately utilize the information inherent in the large data sets generated in a microarray experiment and require a trade-off between detection sensitivity and specificity. RESULTS: We present a multistep procedure for the analysis of mRNA expression data obtained from cDNA array methods. To identify and classify differentially expressed genes, results from a standard paired t-test of normalized data are compared with those from a novel method, denoted associative analysis. This method associates experimental gene expressions, presented as residuals in a regression analysis against averaged control expressions, with a common standard: the family of similarly computed residuals for low-variability genes derived from control experiments. By associating changes in the expression of a given gene with a large family of equally expressed genes in the control group, this method utilizes the large data sets inherent in microarray experiments to increase both specificity and sensitivity. The overall procedure is illustrated by tabulation of genes whose expression differs significantly between Snell dwarf mice (dw/dw) and their phenotypically normal littermates (dw/+, +/+). Of the 2,352 genes examined, only 450-500 were expressed above the background levels observed in non-expressed genes, and of these, 120 were established as differentially expressed in dwarf mice at a significance level that excludes false positive determinations.
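One plausible concretization of the associative idea (simplified: normality of the reference residuals is assumed, where the paper works with the empirical family of low-variability control residuals): regress test expressions on the averaged control expressions, then refer each gene's residual to the residual distribution of the low-variability reference family.

```python
import numpy as np
from scipy import stats

def associative_test(test_expr, control_mean, low_var_resid):
    """Regress test on control means; refer each gene's residual to the
    distribution of residuals from low-variability control genes and
    return two-sided P values against that common standard."""
    slope, intercept, *_ = stats.linregress(control_mean, test_expr)
    resid = test_expr - (intercept + slope * control_mean)
    z = (resid - low_var_resid.mean()) / low_var_resid.std()
    return 2 * stats.norm.sf(np.abs(z))

rng = np.random.default_rng(8)
ctrl = rng.normal(10, 2, 300)
test = ctrl + rng.normal(0, 0.3, 300)
test[:5] += 2.0                                   # 5 truly changed genes
p = associative_test(test, ctrl, rng.normal(0, 0.3, 500))
print((p < 0.001).nonzero()[0])                   # mostly the first five
```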

20.
The finite mixture model approach has attracted much attention in the analysis of microarray data due to its robustness to the excessive variability that is common in such data. Pan (2003) proposed using the normal mixture model method (MMM) to estimate the distribution of a test statistic and its null distribution. However, given that the test statistic is often of t-type, our studies find that the rejection region from MMM is often significantly larger than the correct rejection region, resulting in an inflated type I error. This motivates us to propose the t-mixture model (TMM) approach. In this paper, we demonstrate that TMM provides significantly more accurate control of the probability of making type I errors (and hence of the family-wise error rate) than MMM. Finally, TMM is applied to the well-known leukemia data of Golub et al. (1999), and the results are compared with those obtained from MMM.
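The inflation the abstract describes can be reproduced in a few lines: draw null statistics from a t distribution, then compare the realized type I error when the null is modeled as normal (as MMM effectively does) versus as t (as TMM does). Degrees of freedom and level are illustrative; this shows the phenomenon, not the TMM fitting procedure itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
df = 4                                           # small-sample t statistic
null_t = rng.standard_t(df, 100_000)             # statistics under the null

alpha = 0.01
cut_normal = stats.norm.ppf(1 - alpha / 2)       # normal-null cutoff (MMM-like)
cut_t = stats.t.ppf(1 - alpha / 2, df)           # t-null cutoff (TMM-like)

# The normal cutoff is smaller, so its rejection region is larger
# and the realized type I error exceeds the nominal level.
err_normal = np.mean(np.abs(null_t) > cut_normal)
err_t = np.mean(np.abs(null_t) > cut_t)
print(f"nominal {alpha}: normal null -> {err_normal:.4f}, "
      f"t null -> {err_t:.4f}")
```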
