Similar literature
Found 20 similar articles (search time: 674 ms)
1.
Normalization removes or minimizes the biases of systematic variation that exists in experimental data sets. This study presents a systematic variation normalization (SVN) procedure for removing systematic variation in two channel microarray gene expression data. Based on an analysis of how systematic variation contributes to variability in microarray data sets, our normalization procedure includes background subtraction determined from the distribution of pixel intensity values from each data acquisition channel and log conversion, linear or non-linear regression, restoration or transformation, and multiarray normalization. In the case when a non-linear regression is required, an empirical polynomial approximation approach is used. Either the high terminated points or their averaged values in the distributions of the pixel intensity values observed in control channels may be used for rescaling multiarray datasets. These pre-processing steps remove systematic variation in the data attributable to variability in microarray slides, assay-batches, the array process, or experimenters. Biologically meaningful comparisons of gene expression patterns between control and test channels or among multiple arrays are therefore unbiased using normalized but not unnormalized datasets.

2.
There are many options in handling microarray data that can affect study conclusions, sometimes drastically. Working with a two-color platform, this study uses ten spike-in microarray experiments to evaluate the relative effectiveness of some of these options for the experimental goal of detecting differential expression. We consider two data transformations, background subtraction and intensity normalization, as well as six different statistics for detecting differentially expressed genes. Findings support the use of an intensity-based normalization procedure and also indicate that local background subtraction can be detrimental for effectively detecting differential expression. We also verify that robust statistics outperform t-statistics in identifying differentially expressed genes when there are few replicates. Finally, we find that choice of image analysis software can also substantially influence experimental conclusions.

3.

Background  

To cancel experimental variations, microarray data must be normalized prior to analysis. Where an appropriate model for statistical data distribution is available, a parametric method can normalize a group of data sets that have common distributions. Although such models have been proposed for microarray data, they have not always fit the distribution of real data and thus have been inappropriate for normalization. Consequently, microarray data in most cases have been normalized with non-parametric methods that adjust data in a pair-wise manner. However, data analysis and the integration of resultant knowledge among experiments have been difficult, since such normalization concepts lack a universal standard.

4.
The reference design is a practical and popular choice for microarray studies using two-color platforms. In the reference design, the reference RNA uses half of all array resources, leading investigators to ask: What is the best reference RNA? We propose a novel method for evaluating reference RNAs and present the results of an experiment that was specially designed to evaluate three common choices of reference RNA. We found no compelling evidence in favor of any particular reference. In particular, a commercial reference showed no advantage in our data. Our experimental design also enabled a new way to test the effectiveness of pre-processing methods for two-color arrays. Our results favor using intensity normalization and forgoing background subtraction. Finally, we evaluate the sensitivity and specificity of data quality filters, and we propose a new filter that can be applied to any experimental design and does not rely on replicate hybridizations.

5.
MOTIVATION: We face the absence of optimized standards to guide normalization, comparative analysis, and interpretation of data sets. One aspect of this is that current methods of statistical analysis do not adequately utilize the information inherent in the large data sets generated in a microarray experiment and require a tradeoff between detection sensitivity and specificity. RESULTS: We present a multistep procedure for analysis of mRNA expression data obtained from cDNA array methods. To identify and classify differentially expressed genes, results from a standard paired t-test of normalized data are compared with those from a novel method, denoted an associative analysis. This method associates experimental gene expressions, presented as residuals in regression analysis against control averaged expressions, to a common standard: the family of similarly computed residuals for low variability genes derived from control experiments. By associating changes in expression of a given gene to a large family of equally expressed genes of the control group, this method utilizes the large data sets inherent in microarray experiments to increase both specificity and sensitivity. The overall procedure is illustrated by tabulation of genes whose expression differs significantly between Snell dwarf mice (dw/dw) and their phenotypically normal littermates (dw/+, +/+). Of the 2,352 genes examined, only 450-500 were expressed above the background levels observed in nonexpressed genes, and of these 120 were established as differentially expressed in dwarf mice at a significance level that excludes appearance of false positive determinations.

6.
Gene set analysis (GSA) incorporates biological information into statistical knowledge to identify gene sets differentially expressed between two or more phenotypes. It allows us to gain an insight into the functional working mechanism of cells beyond the detection of differentially expressed gene sets. To evaluate the competence of GSA approaches, three self-contained GSA approaches based on different statistical methods were chosen: Category, Globaltest and Hotelling's T2. Their power to identify differential expression was assessed via simulation and real microarray data. Category does not account for the correlation structure, while the other two deal with correlations.

7.
SUMMARY: We introduce a novel Matlab toolbox for microarray data analysis. This toolbox uses normalization based upon a normally distributed background and differential gene expression based on five statistical measures. The objects in this toolbox are open source and can be implemented to suit your application. AVAILABILITY: MDAT v1.0 is a Matlab toolbox and requires Matlab to run. MDAT is freely available at http://microarray.omrf.org/publications/2004/knowlton/MDAT.zip.

8.
9.
Here we present a methodology for the normalization of element signal intensities to a mean intensity calculated locally across the surface of a DNA microarray. These methods allow the detection and/or correction of spatially systematic artifacts in microarray data. These include artifacts that can be introduced during the robotic printing, hybridization, washing, or imaging of microarrays. Using array element signal intensities alone, this local mean normalization process can correct for such artifacts because they vary across the surface of the array. The local mean normalization can be used for quality control and data correction purposes in the analysis of microarray data. These algorithms assume that array elements are not spatially ordered with regard to sequence or biological function and require that this spatial mapping is identical between the two sets of intensities to be compared. The tool described in this report was developed in the R statistical language and is freely available on the Internet as part of a larger gene expression analysis package. This Web implementation is interactive and user-friendly and allows the easy use of the local mean normalization tool described here, without programming expertise or downloading of additional software.
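
The idea of normalizing each element to a locally calculated mean can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the square window, the `radius` parameter, and the rescaling by the global mean are all assumptions made for illustration.

```python
from statistics import mean

def local_mean_normalize(grid, radius=1):
    """Divide each element by the mean of its square neighbourhood,
    then rescale by the global mean (window shape and rescaling are assumptions)."""
    rows, cols = len(grid), len(grid[0])
    global_mean = mean(v for row in grid for v in row)
    normalized = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the neighbourhood, clipped at the array edges.
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out_row.append(grid[r][c] * global_mean / mean(window))
        normalized.append(out_row)
    return normalized
```

A spatial artifact that raises all intensities in one region also raises the local mean there, so the ratio to the local mean is largely unaffected. This only works if elements are not spatially ordered by sequence or function, as the abstract notes.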

10.
Comparison of normalization methods with microRNA microarray   Cited by: 3 (self-citations: 0, citations by others: 3)
Hua YJ, Tu K, Tang ZY, Li YX, Xiao HS. Genomics, 2008, 92(2): 122-128
MicroRNAs (miRNAs) are a group of RNAs that play important roles in regulating gene expression and protein translation. In a previous study, we established an oligonucleotide microarray platform to detect miRNA expression. Because it contained only hundreds of probes, data normalization was difficult. In this study, the microarray data for eight miRNAs extracted from inflamed rat dorsal root ganglion (DRG) tissue were normalized using 15 methods and compared with the results of real-time polymerase chain reaction. It was found that the miRNA microarray data normalized by the print-tip loess method were the most consistent with results from real-time polymerase chain reaction. Moreover, the same pattern was also observed in 14 different types of rat tissue. This study compares a variety of normalization methods and will be helpful in the preprocessing of miRNA microarray data.

11.

Background  

In microarray experiments, many undesirable systematic variations are commonly observed. Normalization is the process of removing such variation that affects the measured gene expression levels. Normalization plays an important role in the early stages of microarray data analysis. The subsequent analysis results are highly dependent on normalization. One major source of variation is the background intensities. Recently, some methods have been employed for correcting the background intensities. However, all these methods focus on defining signal intensities appropriately from foreground and background intensities in the image analysis. Although a number of normalization methods have been proposed, no systematic methods have been proposed using the background intensities in the normalization process.

12.
The analysis of two-colour cDNA microarray data usually involves subtracting background values from foreground values prior to normalization and further analysis. This approach has the advantage of reducing bias and the disadvantage of inflating the variance of low-abundance spots. Whenever background subtraction is considered, it implicitly assumes locally constant background values. In practice, this assumption is often not met, which casts doubt on the usefulness of simple background subtraction. In order to improve background correction, we propose local background smoothing within the pre-processing pipeline of cDNA microarray data prior to background correction. For this purpose, we employ a geostatistical framework with ordinary kriging using both isotropic and anisotropic models of spatial correlation and 2-D locally weighted regression. We show that application of local background smoothing prior to background correction is beneficial in comparison to using raw background estimates. This is done using data of a self-versus-self experiment in Arabidopsis where subsets of differentially expressed genes were simulated. Using locally smoothed background values in conjunction with existing background correction methods increases the power, increases the accuracy and decreases the number of false positive results.

13.
Significance of gene ranking for classification of microarray samples   Cited by: 1 (self-citations: 0, citations by others: 1)
Many methods for classification and gene selection with microarray data have been developed. These methods usually give a ranking of genes. Evaluating the statistical significance of the gene ranking is important for understanding the results and for further biological investigations, but this question has not been well addressed for machine learning methods in existing works. Here, we address this problem by formulating it in the framework of hypothesis testing and propose a solution based on resampling. The proposed r-test methods convert gene ranking results into position p-values to evaluate the significance of genes. The methods are tested on three real microarray data sets and three simulation data sets with support vector machines as the method of classification and gene selection. The obtained position p-values help to determine the number of genes to be selected and enable scientists to analyze selection results by sophisticated multivariate methods under the same statistical inference paradigm as for simple hypothesis testing methods.

14.

Background

We present a novel and systematic approach to analyze temporal microarray data. The approach includes normalization, clustering and network analysis of genes.

Methodology

Genes are normalized using an error model based uniform normalization method aimed at identifying and estimating the sources of variations. The model minimizes the correlation among error terms across replicates. The normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality along with partial Granger causality is applied in both time and frequency domains to selected as well as all the genes to reveal the interesting networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000 genes observed over 22 time points over 22 days. Three circuits: a circadian gene circuit, an ethylene circuit and a new global circuit showing a hierarchical structure to determine the initiators of leaf senescence are analyzed in detail.

Conclusions

We use a totally data-driven approach to form biological hypotheses. Clustering using the power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domain using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray data, such methods can be useful tools in uncovering the hidden biological interactions. We show our method in a step-by-step manner with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of potential interest to Arabidopsis researchers.

15.
Genome-wide RNA interference (RNAi) screening allows investigation of the role of individual genes in a process of choice. Most RNAi screens identify a large number of genes with a continuous gradient in the assessed phenotype. Screeners must decide whether to examine genes with the most robust phenotype or the full gradient of genes that cause an effect and how to identify candidate genes. The authors have used RNAi in Drosophila cells to examine viability in a 384-well plate format and compare 2 screens, untreated control and treatment. They compare multiple normalization methods, which take advantage of different features within the data, including quantile normalization, background subtraction, scaling, cellHTS2 (Boutros et al. 2006), and interquartile range measurement. Considering the false-positive potential that arises from RNAi technology, a robust validation method was designed for the purpose of gene selection for future investigations. In a retrospective analysis, the authors describe the use of validation data to evaluate each normalization method. Although no method worked ideally, a combination of 2 methods, background subtraction followed by quantile normalization and cellHTS2, at different thresholds, captures the most dependable and diverse candidate genes. Thresholds are suggested depending on whether a few candidate genes are desired or a more extensive systems-level analysis is sought. The normalization approaches and experimental design to perform validation experiments are likely to apply to those high-throughput screening systems attempting to identify genes for systems-level analysis.
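
Quantile normalization, one of the methods compared above, can be sketched in a few lines of Python. This is the generic textbook form, not necessarily the variant the authors used: treating each plate or screen as one column and breaking ties by position are assumptions.

```python
def quantile_normalize(columns):
    """Force every column (one per sample or plate) onto the same distribution.

    columns: list of equal-length lists of numbers.
    Ties are broken by position rather than averaged (a simplifying assumption).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across columns.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices, smallest value first
        new_col = [0.0] * n
        for rank, idx in enumerate(order):
            new_col[idx] = reference[rank]
        normalized.append(new_col)
    return normalized
```

After this step every column contains exactly the same set of values, so only the rank of each well within its column carries information.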

16.
Combinatorial image analysis of DNA microarray features   Cited by: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: DNA and protein microarrays have become an established leading-edge technology for large-scale analysis of gene and protein content and activity. Contact-printed microarrays have emerged as a relatively simple and cost-effective method of choice, but their reliability is especially susceptible to the quality of pixel information obtained from digital scans of spotted features in the microarray image. RESULTS: We address the statistical computation requirements for optimizing data acquisition and processing of digital scans. We consider the use of median filters to reduce noise levels in images and top-hat filters to correct for trends in background values. We also consider, as alternative estimators of spot intensity, discs of fixed radius, proportions of histograms and k-means clustering, either with or without a square-root intensity transformation and background subtraction. Using combinatoric procedures, we identify optimal filter and estimator parameters for achieving consistency among the replicates of a gene on each microarray. Our results, using test data from microarrays of HCMV, indicate that a highly effective approach for improving reliability and quality of microarray data is to apply a 21 by 21 top-hat filter, then estimate spot intensity as the mean of the largest 20% of pixel values in the target region, after a square-root transformation, and corrected for background, by subtracting the mean of the smallest 70% of pixel values. AVAILABILITY: Fortran90 subroutines implementing these methods are available from the authors, or at http://www.bioss.ac.uk/~chris.
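
The recommended spot-intensity estimator can be written as a short Python sketch. It covers only the histogram-proportion step, not the 21 by 21 top-hat filter, and it is not the authors' Fortran90 code. The function name, the rounding of pixel counts, and the choice to compute the background mean on the square-root scale are assumptions.

```python
import math

def spot_intensity(pixels, top_fraction=0.20, bottom_fraction=0.70):
    """Mean of the largest 20% of square-root pixel values minus
    the mean of the smallest 70% (both on the square-root scale, by assumption)."""
    values = sorted(math.sqrt(p) for p in pixels)
    n = len(values)
    n_top = max(1, round(n * top_fraction))        # pixels treated as spot signal
    n_bottom = max(1, round(n * bottom_fraction))  # pixels treated as background
    foreground = sum(values[-n_top:]) / n_top
    background = sum(values[:n_bottom]) / n_bottom
    return foreground - background
```

For example, a target region of ten pixels with values `[4] * 7 + [9] + [100] * 2` gives a foreground of 10 and a background of 2 on the square-root scale, so the estimate is 8.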

17.
Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
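
A score of this kind can be sketched in Python as follows. The sketch assumes the statistic is the within-class sum of squared Euclidean distances divided by the total sum of squares, which is what the name suggests. The function name and data layout are hypothetical, and the abstract does not give the exact formula.

```python
def swiss_score(samples, labels):
    """Within-class sum of squares divided by total sum of squares.

    samples: list of feature vectors (one per sample).
    labels:  a priori class label for each sample.
    Lower values mean the classes are more tightly clustered.
    """
    def centroid(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def sum_sq(vectors, center):
        # Sum of squared Euclidean distances from each vector to the center.
        return sum((x - c) ** 2 for v in vectors for x, c in zip(v, center))

    total_ss = sum_sq(samples, centroid(samples))
    within_ss = 0.0
    for cls in set(labels):
        members = [v for v, lab in zip(samples, labels) if lab == cls]
        within_ss += sum_sq(members, centroid(members))
    return within_ss / total_ss  # undefined if all samples are identical
```

To compare two processing methods, one would compute the score on the same samples and labels after each method and prefer the lower value. Whether a difference is meaningful needs a separate significance assessment, which this sketch does not cover.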

18.
Microarray-based analysis of single nucleotide polymorphisms (SNPs) has many applications in large-scale genetic studies. To minimize the influence of experimental variation, microarray data usually need to be processed in different aspects including background subtraction, normalization and low-signal filtering before genotype determination. Although many sophisticated algorithms exist for these purposes, biases are still present. In the present paper, new algorithms for SNP microarray data analysis and the software, AccuTyping, developed based on these algorithms are described. The algorithms take advantage of a large number of SNPs included in each assay, and the fact that the top and bottom 20% of SNPs can be safely treated as homozygous after sorting based on their ratios between the signal intensities. These SNPs are then used as controls for color channel normalization and background subtraction. Genotype calls are made based on the logarithms of signal intensity ratios using two cutoff values, which were determined after training the program with a dataset of approximately 160,000 genotypes and validated by non-microarray methods. AccuTyping was used to determine >300,000 genotypes of DNA and sperm samples. The accuracy was shown to be >99%. AccuTyping can be downloaded from http://www2.umdnj.edu/lilabweb/publications/AccuTyping.html.
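
The final calling step, a log intensity ratio compared against two cutoffs, can be illustrated with the Python sketch below. It is not AccuTyping's code. The cutoff values of -1 and 1, the base-2 logarithm, and the allele labels are hypothetical placeholders, since the abstract does not state the trained values.

```python
import math

def call_genotype(signal_a, signal_b, low_cutoff=-1.0, high_cutoff=1.0):
    """Call a genotype from two normalized, background-corrected channel signals.

    The cutoffs are placeholders, not the values trained in the paper.
    """
    log_ratio = math.log2(signal_a / signal_b)
    if log_ratio >= high_cutoff:
        return "AA"  # allele A signal dominates: homozygous A
    if log_ratio <= low_cutoff:
        return "BB"  # allele B signal dominates: homozygous B
    return "AB"      # comparable signals: heterozygous
```

The sketch assumes both signals are strictly positive, which in practice means low-signal SNPs have already been filtered out as the abstract describes.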

19.
The upcoming availability of public microarray repositories and of large compendia of gene expression information opens up a new realm of possibilities for microarray data analysis. An essential challenge is the efficient integration of microarray data generated by different research groups on different array platforms. This review focuses on the problems associated with this integration, which are: (1) the efficient access to and exchange of microarray data; (2) the validation and comparison of data from different platforms (cDNA and short and long oligonucleotides); and (3) the integrated statistical analysis of multiple data sets.

20.
Two-color DNA microarrays are commonly used for the analysis of global gene expression. They provide information on relative abundance of thousands of mRNAs. However, the generated data need to be normalized to minimize systematic variations so that biologically significant differences can be more easily identified. A large number of normalization procedures have been proposed and many software packages for microarray data analysis are available. Here, we have applied two normalization methods (median and loess) from two microarray data analysis software packages. They were examined using a sample data set. We found that the number of genes identified as differentially expressed varied significantly depending on the method applied. The obtained results, i.e. lists of differentially expressed genes, were consistent only when we used median normalization methods. Loess normalization implemented in the two software packages provided less coherent and for some probes even contradictory results. In general, our results provide an additional piece of evidence that the normalization method can profoundly influence final results of DNA microarray-based analysis. The impact of the normalization method depends greatly on the algorithm employed. Consequently, the normalization procedure must be carefully considered and optimized for each individual data set.
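
Median normalization, the simpler of the two methods compared here, can be sketched in Python as below. This is the common global form, in which log ratios are shifted so that their median is zero. It is not taken from either software package in the study, and the function and argument names are illustrative.

```python
import math
from statistics import median

def median_normalize(red, green):
    """Return log2(red/green) ratios shifted so that their median is zero.

    red, green: background-corrected, strictly positive channel intensities.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    shift = median(log_ratios)  # global dye bias estimate
    return [m - shift for m in log_ratios]
```

Because a single constant is subtracted from every spot, there is little room for implementations to differ, which may help explain why the median results agreed across packages. Loess fits an intensity-dependent curve whose span and fitting details can vary between implementations.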
