Similar Articles

20 similar articles found.
1.
MOTIVATION: Authors of several recent papers have independently introduced a family of transformations (the generalized-log family), which stabilizes the variance of microarray data up to the first order. However, for data from two-color arrays, tests for differential expression may require that the variance of the difference of transformed observations be constant, rather than that of the transformed observations themselves. RESULTS: We introduce a transformation within the generalized-log family which stabilizes, to the first order, the variance of the difference of transformed observations. We also introduce transformations from the 'started-log' and log-linear-hybrid families which provide good approximate variance stabilization of differences. Examples using control-control data show that any of these transformations may provide sufficient variance stabilization for practical applications, and all perform well compared to log ratios.
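The generalized-log family described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's exact parameterization; the tuning constant `c` is an assumed value, and in practice it is estimated from the data.

```python
import numpy as np

def glog(x, c=4.0):
    # Generalized-log transform: finite at zero, approaches log(2x) for
    # large x. The constant c is a tuning parameter; c = 4.0 here is
    # illustrative, not a value from the paper.
    return np.log(x + np.sqrt(x**2 + c))

# A plain log diverges at zero intensity; glog stays finite there...
low = glog(0.0)    # = log(sqrt(c)) = log(2) for c = 4
# ...and agrees with log(2x) to high accuracy at large intensities.
high = glog(1e4)
```

This is the property the abstract relies on: near background the transform behaves linearly and tames variance inflation, while for bright spots it reduces to the familiar log scale.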

2.
A comparison was made between mathematical variations of the square root and Schoolfield models for predicting growth rate as a function of temperature. The statistical consequences of square root and natural logarithm transformations of growth rate used in several variations of the Schoolfield and square root models were examined. Growth rate variances of Yersinia enterocolitica in brain heart infusion broth increased as a function of temperature. The ability of the two data transformations to correct for the heterogeneity of variance was evaluated. A natural logarithm transformation of growth rate was more effective than a square root transformation at correcting for the heterogeneity of variance. The square root model was more accurate than the Schoolfield model when both models used natural logarithm transformation.
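The abstract's central finding, that a log transform corrects heterogeneity of variance better than a square root transform when spread grows with the mean, can be checked with a small simulation. The multiplicative lognormal error model and all numbers below are assumptions for illustration, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical replicate growth rates whose spread grows with the mean,
# mimicking the heterogeneity of variance the study reports.
means = np.array([0.05, 0.2, 0.8])  # growth rates at three temperatures
reps = means[:, None] * rng.lognormal(0.0, 0.25, size=(3, 2000))

raw_var  = reps.var(axis=1)
sqrt_var = np.sqrt(reps).var(axis=1)
log_var  = np.log(reps).var(axis=1)

def ratio(v):
    # Largest-to-smallest per-level variance; 1.0 = perfectly stabilized.
    return float(v.max() / v.min())
```

Under multiplicative error the log-transformed variances are essentially equal across temperature levels, while raw and square-root-transformed variances still climb with the mean, echoing the abstract's conclusion.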

4.
MOTIVATION: Standard statistical techniques often assume that data are normally distributed, with constant variance not depending on the mean of the data. Data that violate these assumptions can often be brought in line with the assumptions by application of a transformation. Gene-expression microarray data have a complicated error structure, with a variance that changes with the mean in a non-linear fashion. Log transformations, which are often applied to microarray data, can inflate the variance of observations near background. RESULTS: We introduce a transformation that stabilizes the variance of microarray data across the full range of expression. Simulation studies also suggest that this transformation approximately symmetrizes microarray data.

5.

Background  

Classifying nuclear magnetic resonance (NMR) spectra is a crucial step in many metabolomics experiments. Since several multivariate classification techniques depend upon the variance of the data, it is important to first minimise any contribution from unwanted technical variance arising from sample preparation and analytical measurements, and thereby maximise any contribution from wanted biological variance between different classes. The generalised logarithm (glog) transform was developed to stabilise the variance in DNA microarray datasets, but has rarely been applied to metabolomics data. In particular, it has not been rigorously evaluated against other scaling techniques used in metabolomics, nor tested on all forms of NMR spectra including 1-dimensional (1D) ¹H, projections of 2D ¹H J-resolved (pJRES), and intact 2D J-resolved (JRES).

6.
We introduce a statistical model for microarray gene expression data that comprises data calibration, the quantification of differential expression, and the quantification of measurement error. In particular, we derive a transformation h for intensity measurements, and a difference statistic Δh whose variance is approximately constant along the whole intensity range. This forms a basis for statistical inference from microarray data, and provides a rational data pre-processing strategy for multivariate analyses. For the transformation h, the parametric form h(x) = arsinh(a + bx) is derived from a model of the variance-versus-mean dependence for microarray intensity data, using the method of variance stabilizing transformations. For large intensities, h coincides with the logarithmic transformation, and Δh with the log-ratio. The parameters of h together with those of the calibration between experiments are estimated with a robust variant of maximum-likelihood estimation. We demonstrate our approach on data sets from different experimental platforms, including two-colour cDNA arrays and a series of Affymetrix oligonucleotide arrays.
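The transform h(x) = arsinh(a + bx) stated in the abstract is easy to demonstrate numerically. The parameter defaults below are placeholders; in the paper a and b are estimated by robust maximum likelihood, which is not reproduced here.

```python
import numpy as np

def h(x, a=0.0, b=1.0):
    # h(x) = arsinh(a + b*x) from the abstract; calibration parameters
    # a and b are estimated from data in the paper, so these defaults
    # are placeholders for illustration.
    return np.arcsinh(a + b * x)

# For large intensities arsinh(y) ~ log(2y), so h collapses to the log
# scale and the difference statistic Delta-h becomes the usual log-ratio.
big = 1e4
log_ratio = np.log(2 * big) - np.log(2 * (big / 4))  # = log(4)
delta_h   = h(big) - h(big / 4)
```

Unlike the plain log, h remains well defined at zero and even at the negative values that background subtraction can produce, which is why Δh behaves sensibly across the whole intensity range.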

7.
Transformation and normalization of oligonucleotide microarray data
MOTIVATION: Most methods of analyzing microarray data or doing power calculations have an underlying assumption of constant variance across all levels of gene expression. The most common transformation, the logarithm, results in data that have constant variance at high levels but not at low levels. Rocke and Durbin showed that data from spotted arrays fit a two-component model, and Durbin, Hardin, Hawkins, and Rocke, Huber et al. and Munson provided a transformation that stabilizes the variance as well as symmetrizes and normalizes the error structure. We wish to evaluate the applicability of this transformation to the error structure of GeneChip microarrays. RESULTS: We demonstrate in an example study a simple way to use the two-component model of Rocke and Durbin and the data transformation of Durbin, Hardin, Hawkins and Rocke, Huber et al. and Munson on Affymetrix GeneChip data. In addition we provide a method for normalization of Affymetrix GeneChips simultaneous with the determination of the transformation, producing a data set without chip or slide effects but with constant variance and with symmetric errors. This transformation/normalization process can be thought of as a machine calibration in that it requires a few biologically constant replicates of one sample to determine the constant needed to specify the transformation and normalize. It is hypothesized that this constant needs to be found only once for a given technology in a lab, perhaps with periodic updates. It does not require extensive replication in each study. Furthermore, the variance of the transformed pilot data can be used to do power calculations using standard power analysis programs. AVAILABILITY: S-PLUS code for the transformation/normalization for four replicates is available from the first author upon request. A program written in C is available from the last author.

8.
MOTIVATION AND RESULTS: Durbin et al. (2002), Huber et al. (2002) and Munson (2001) independently introduced a family of transformations (the generalized-log family) which stabilizes the variance of microarray data up to the first order. We introduce a method for estimating the transformation parameter in tandem with a linear model based on the procedure outlined in Box and Cox (1964). We also discuss means of finding transformations within the generalized-log family which are optimal under other criteria, such as minimum residual skewness and minimum mean-variance dependency. AVAILABILITY: R and Matlab code and test data are available from the authors on request.
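The Box and Cox (1964) procedure the abstract builds on selects a transformation parameter by maximizing a profile log-likelihood. A minimal sketch of that idea, applied to the classic Box-Cox power family rather than the paper's generalized-log family, looks like this; the simulated data and grid are illustrative.

```python
import numpy as np

def boxcox(y, lam):
    # Box-Cox (1964) power transform; lam = 0 is the log limit.
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

def profile_loglik(y, lam):
    # Profile log-likelihood of lam under the normal model:
    # -n/2 * log(variance of transformed data) + (lam - 1) * sum(log y).
    z = boxcox(y, lam)
    return -0.5 * len(y) * np.log(z.var()) + (lam - 1.0) * np.log(y).sum()

rng = np.random.default_rng(1)
y = rng.lognormal(2.0, 0.5, size=500)   # lognormal: true optimum lam = 0
grid = np.round(np.linspace(-1.0, 1.5, 26), 2)
best = grid[np.argmax([profile_loglik(y, l) for l in grid])]
```

For lognormal data the maximizer lands near λ = 0, recovering the log as the variance-stabilizing member of the family; the paper applies the same maximization logic to the generalized-log parameter instead.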

9.
MOTIVATION: To study lowly expressed genes in microarray experiments, it is useful to increase the photometric gain in the scanning. However, a large gain may cause some pixels for highly expressed genes to become saturated. Spatial statistical models that model spot shapes on the pixel level may be used to infer information about the saturated pixel intensities. Other possible applications for spot shape models include data quality control and accurate determination of spot centres and spot diameters. RESULTS: Spatial statistical models for spotted microarrays are studied including pixel level transformations and spot shape models. The models are applied to a dataset from 50-mer oligonucleotide microarrays with 452 selected Arabidopsis genes. Logarithmic, Box-Cox and inverse hyperbolic sine transformations are compared in combination with four spot shape models: a cylindric plateau shape, an isotropic Gaussian distribution, a difference of two scaled Gaussian distributions suggested in the literature, as well as a proposed new polynomial-hyperbolic spot shape model. A substantial improvement is obtained for the dataset studied by the polynomial-hyperbolic spot shape model in combination with the Box-Cox transformation. The spatial statistical models are used to correct spot measurements with saturation by extrapolating the censored data. AVAILABILITY: Source code for R is available at http://www.matfys.kvl.dk/~ekstrom/spotshapes/

10.
For some applications of the Wilcoxon–Mann–Whitney statistic its variance has to be estimated, for example for the test of Potthoff (1963) to detect differences in medians of two symmetric distributions, as well as for the computation of approximate confidence bounds for the probability P(X1 < X2), cf. Govindarajulu (1968). In the present paper an easy-to-compute variance estimator is proposed which uses only the ranks of the data, with the additional property that it is unbiased for the finite variance. Because of its invariance under any monotone transformation of the data, its applicability is not confined to quantitative data; the estimator may be applied to ordinal data just as well. Some properties are discussed and a numerical example is given.

11.

Background  

In a high throughput setting, effective flow cytometry data analysis depends heavily on proper data preprocessing. While usual preprocessing steps of quality assessment, outlier removal, normalization, and gating have received considerable scrutiny from the community, the influence of data transformation on the output of high throughput analysis has been largely overlooked. Flow cytometry measurements can vary over several orders of magnitude, cell populations can have variances that depend on their mean fluorescence intensities, and may exhibit heavily skewed distributions. Consequently, the choice of data transformation can influence the output of automated gating. An appropriate data transformation aids in data visualization and gating of cell populations across the range of data. Experience shows that the choice of transformation is data-specific. Our goal here is to compare the performance of different transformations applied to flow cytometry data in the context of automated gating in a high throughput, fully automated setting. We examine the most common transformations used in flow cytometry, including the generalized hyperbolic arcsine, biexponential, linlog, and generalized Box-Cox, all within the Bioconductor flowCore framework that is widely used in high throughput, automated flow cytometry data analysis. All of these transformations have adjustable parameters whose effects upon the data are non-intuitive for most users. By making some modelling assumptions about the transformed data, we develop maximum likelihood criteria to optimize parameter choice for these different transformations.
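The generalized hyperbolic arcsine the abstract lists first is the simplest of these transforms to demonstrate. The cofactor below is the "adjustable parameter whose effect is non-intuitive" that the authors optimize by maximum likelihood; the value 150 is only an assumed, illustrative choice, not one from the paper.

```python
import numpy as np

def asinh_transform(x, cofactor=150.0):
    # Generalized hyperbolic arcsine with an adjustable cofactor:
    # near-linear for |x| << cofactor (so compensated negative values
    # survive), logarithmic for x >> cofactor. The cofactor value here
    # is an illustrative assumption.
    return np.arcsinh(x / cofactor)

x = np.array([-100.0, 0.0, 1.5, 1e5])
t = asinh_transform(x)
```

The key behaviors: zero maps to zero, negative (compensated) values transform symmetrically instead of being dropped as a log would require, and large signals are compressed onto a log-like scale.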

12.
An approximate method for estimating the sample size in simple random sampling and a systematic way of transformation of sample data are derived by using the parameters α and β of the regression of mean crowding on mean density in the spatial distribution per quadrat of animal populations (Iwao, 1968). If the values of α and β have been known for the species concerned, the sample size needed to attain a desired precision can be estimated by simply knowing the approximate level of mean density of the population to be sampled. Also, an appropriate variance stabilizing transformation of sample data can be obtained by the method given here without restrictions on the distribution pattern of the frequency counts.
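A commonly quoted form of the sample-size result based on Iwao's mean crowding regression m* = α + βm can be sketched as below. The formula is the textbook version, not necessarily the paper's exact derivation, and the α, β values are hypothetical.

```python
def iwao_sample_size(mean_density, alpha, beta, D=0.1):
    # Approximate quadrat count for a desired relative precision D
    # (standard error / mean), from Iwao's regression m* = alpha + beta*m.
    # A commonly quoted form of the result is
    #     n = ((alpha + 1)/m + beta - 1) / D**2 ;
    # alpha and beta below are hypothetical, not from the paper.
    return ((alpha + 1.0) / mean_density + beta - 1.0) / D**2

# With alpha and beta known for a species, only a rough guess of the
# mean density is needed, as the abstract states.
n_sparse = iwao_sample_size(mean_density=0.5, alpha=0.0, beta=1.5)
n_dense  = iwao_sample_size(mean_density=5.0, alpha=0.0, beta=1.5)
```

Sparser populations demand more quadrats for the same relative precision, which is the practical content of the method.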

13.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which pooled design becomes preferable to non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
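The trade-off the abstract analyzes can be illustrated with a generic normal-approximation power calculation: pooling averages biological variance across subjects before technical noise is added. This is a deliberately simplified sketch, not the paper's exact formulas, and all the numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

def pooled_power(delta, sigma_b, sigma_t, n_arrays, pool_size, alpha=0.001):
    # Normal-approximation power for a two-class comparison where each
    # array measures a pool of `pool_size` biological subjects: pooling
    # averages biological variance (sigma_b^2 / pool_size) before
    # technical noise (sigma_t^2) is added. A generic simplification,
    # not the exact formulas of the paper above.
    var_per_array = sigma_b**2 / pool_size + sigma_t**2
    se = sqrt(2.0 * var_per_array / n_arrays)  # difference of two class means
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 1.0 - NormalDist().cdf(z - abs(delta) / se)

# Pooling helps most when biological variation dominates technical noise.
p1 = pooled_power(delta=1.0, sigma_b=0.8, sigma_t=0.3, n_arrays=5, pool_size=1)
p5 = pooled_power(delta=1.0, sigma_b=0.8, sigma_t=0.3, n_arrays=5, pool_size=5)
```

Holding the number of arrays fixed, larger pools raise power, which is the basic mechanism behind the cost-effectiveness conditions the paper derives.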

14.
15.
In applied entomological experiments, when the response is a count-type variable, certain transformation remedies such as the square root, logarithm (log), or rank transformation are often used to normalize data before analysis of variance. In this study, we examine the usefulness of these transformations by reanalyzing field-collected data from a split-plot experiment and by performing a more comprehensive simulation study of factorial and split-plot experiments. For field-collected data, significant interactions were dependent upon the type of transformation. For the simulation study, Poisson distributed errors were used for a 2 by 2 factorial arrangement, in both randomized complete block and split-plot settings. Various sizes of main effects were induced, and type I error rates and powers of the tests for interaction were examined for the raw response values, log-, square root-, and rank-transformed responses. The aligned rank transformation also was investigated because it has been shown to perform well in testing interactions in factorial arrangements. We found that for testing interactions, the untransformed response and the aligned rank response performed best (preserved nominal type I error rates), whereas the other transformations had inflated error rates when main effects were present. No evaluations of the tests for main effects or simple effects have been conducted. Potentially these transformations will still be necessary when performing these tests.

16.
Bertail P, Tressou J. Biometrics 2006, 62(1): 66-74
This article proposes statistical tools for quantitative evaluation of the risk due to the presence of some particular contaminants in food. We focus on the estimation of the probability of the exposure to exceed the so-called provisional tolerable weekly intake (PTWI), when both consumption data and contamination data are independently available. A Monte Carlo approximation of the plug-in estimator, which may be seen as an incomplete generalized U-statistic, is investigated. We obtain the asymptotic properties of this estimator and propose several confidence intervals, based on two estimators of the asymptotic variance: (i) a bootstrap type estimator and (ii) an approximate jackknife estimator relying on the Hoeffding decomposition of the original U-statistics. As an illustration, we present an evaluation of the exposure to Ochratoxin A in France.
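A stripped-down Monte Carlo version of the plug-in exceedance estimator can be sketched as follows. All distributions, units, and the threshold are hypothetical stand-ins; in particular the PTWI value is not the real Ochratoxin A figure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw consumption and contamination independently, as the abstract
# assumes, form exposures, and count exceedances of the PTWI.
n_sim = 100_000
consumption   = rng.gamma(shape=2.0, scale=50.0, size=n_sim)     # g/week
contamination = rng.lognormal(mean=-1.0, sigma=0.6, size=n_sim)  # ng/g
exposure = consumption * contamination / 60.0                    # per kg bw

ptwi = 5.0  # illustrative threshold only
p_exceed = float((exposure > ptwi).mean())  # plug-in Monte Carlo estimate
se_hat = float(np.sqrt(p_exceed * (1.0 - p_exceed) / n_sim))  # binomial SE
```

The binomial standard error here only reflects simulation noise; the paper's bootstrap and jackknife intervals additionally account for the sampling variability of the underlying consumption and contamination data.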

17.
MOTIVATION: Many standard statistical techniques are effective on data that are normally distributed with constant variance. Microarray data typically violate these assumptions since they come from non-Gaussian distributions with a non-trivial mean-variance relationship. Several methods have been proposed that transform microarray data to stabilize variance and draw its distribution towards the Gaussian. Some methods, such as log or generalized log, rely on an underlying model for the data. Others, such as the spread-versus-level plot, do not. We propose an alternative data-driven multiscale approach, called the Data-Driven Haar-Fisz for microarrays (DDHFm) with replicates. DDHFm has the advantage of being 'distribution-free' in the sense that no parametric model for the underlying microarray data is required to be specified or estimated; hence, DDHFm can be applied very generally, not just to microarray data. RESULTS: DDHFm achieves very good variance stabilization of microarray data with replicates and produces transformed intensities that are approximately normally distributed. Simulation studies show that it performs better than other existing methods. Application of DDHFm to real one-color cDNA data validates these results. AVAILABILITY: The R package of the Data-Driven Haar-Fisz transform (DDHFm) for microarrays is available in Bioconductor and CRAN.

18.
In recent years osteologists have frequently used non-metric (dichotomous) cranial data to measure biological distance between skeletal samples of Homo sapiens. Applying methods used earlier by biologists, these workers begin by attempting to stabilize the variance of the measures used by transforming the observed trait frequencies using some type of inverse sine transformation. The frequently used Grewal-Smith transformation does not work well for small samples of the size often considered by osteologists. As a consequence the mean measure of divergence between populations determined by this method is strongly influenced by a bias which depends on sample size. This paper compares several transformations in terms of how close the actual variance of the transformed frequency corresponds to its nominal value. It is suggested that the traditional (Grewal-Smith) inverse sine transformation not be used, and several alternatives are considered.
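The "nominal value" the abstract compares against is the classical property that an inverse sine transform of a binomial proportion has variance approximately 1/(4n), independent of the true frequency. A small simulation shows the approximation holding at a moderate sample size; the parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def angular(p_hat):
    # Classical inverse sine (angular) transform; its nominal variance
    # is 1/(4n), independent of the true trait frequency.
    return np.arcsin(np.sqrt(p_hat))

# At a reasonably large sample size the nominal variance is accurate.
n, trials = 200, 50_000
p_hat = rng.binomial(n, 0.3, size=trials) / n
actual = float(angular(p_hat).var())
nominal = 1.0 / (4 * n)
```

For the small skeletal samples the abstract discusses, the actual variance drifts away from 1/(4n), which is exactly the sample-size-dependent bias that motivates the alternative transformations the paper evaluates.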

19.
The conceptual simplicity of DNA microarray technology often belies the complex nature of the measurement errors inherent in the methodology. As the technology has developed, the importance of understanding the sources of uncertainty in the measurements and developing ways to control their influence on the conclusions drawn has become apparent. In this review, strategies for modeling measurement errors and minimizing their effect on the outcome of experiments using a variety of techniques are discussed in the context of spotted, dual-color microarrays. First, methods designed to reduce the influence of random variability through data filtering, replication, and experimental design are introduced. This is followed by a review of data analysis methods that partition the variance into random effects and one or more systematic effects, specifically two-sample significance testing and analysis of variance (ANOVA) methods. Finally, the current state of measurement error models for spotted microarrays and their role in variance stabilizing transformations are discussed.

20.
Diagnostic or screening tests are widely used in medical fields to classify patients according to their disease status. Several statistical models for meta-analysis of diagnostic test accuracy studies have been developed to synthesize test sensitivity and specificity of a diagnostic test of interest. Because of the correlation between test sensitivity and specificity, modeling the two measures using a bivariate model is recommended. In this paper, we extend the current standard bivariate linear mixed model (LMM) by proposing two variance-stabilizing transformations: the arcsine square root and the Freeman–Tukey double arcsine transformation. We compared the performance of the proposed methods with the standard method through simulations using several performance measures. The simulation results showed that our proposed methods performed better than the standard LMM in terms of bias, root mean square error, and coverage probability in most of the scenarios, even when data were generated assuming the standard LMM. We also illustrated the methods using two real data sets.
