Similar articles
 20 similar articles found (search time: 15 ms)
1.
Analysis of variance components in gene expression data   (Cited by: 5 total, 0 self-citations, 5 by others)
MOTIVATION: A microarray experiment is a multi-step process, and each step is a potential source of variation. There are two major sources of variation: biological variation and technical variation. This study presents a variance-components approach to investigating animal-to-animal, between-array, within-array and day-to-day variations for two data sets. The first data set involved estimation of technical variances for pooled control and pooled treated RNA samples. The variance components included between-array, and two nested within-array variances: between-section (the upper- and lower-sections of the array are replicates) and within-section (two adjacent spots of the same gene are printed within each section). The second experiment was conducted on four different weeks. Each week there were reference and test samples with a dye-flip replicate in two hybridization days. The variance components included week-to-week, animal-to-animal and between-array and within-array variances. RESULTS: We applied the linear mixed-effects model to quantify different sources of variation. In the first data set, we found that the between-array variance is greater than the between-section variance, which, in turn, is greater than the within-section variance. In the second data set, for the reference samples, the week-to-week variance is larger than the between-array variance, which, in turn, is slightly larger than the within-array variance. For the test samples, the week-to-week variance has the largest variation. The animal-to-animal variance is slightly larger than the between-array and within-array variances. However, in a gene-by-gene analysis, the animal-to-animal variance is smaller than the between-array variance in four out of five housekeeping genes. In summary, the largest variation observed is the week-to-week effect. Another important source of variability is the animal-to-animal variation. 
Finally, we describe the use of variance-component estimates to determine optimal numbers of animals, arrays per animal and sections per array in planning microarray experiments.
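As a rough illustration of how variance-component estimates can feed into design planning, the sketch below computes the variance of a group mean under a balanced, fully nested design (sections within arrays within animals). The function name, the balanced-design assumption and all numeric values are illustrative assumptions and are not taken from the study.

```python
def variance_of_group_mean(var_animal, var_array, var_section,
                           n_animals, arrays_per_animal, sections_per_array):
    """Variance of a group mean under a balanced nested design.

    Assumes independent random effects for animal, array-within-animal
    and section-within-array (an illustrative simplification).
    """
    n_arrays = n_animals * arrays_per_animal
    n_sections = n_arrays * sections_per_array
    return (var_animal / n_animals
            + var_array / n_arrays
            + var_section / n_sections)


# Hypothetical variance components (not from the paper).
components = dict(var_animal=0.04, var_array=0.02, var_section=0.01)

# Two candidate designs that both use eight arrays in total.
few_animals = variance_of_group_mean(**components, n_animals=2,
                                     arrays_per_animal=4, sections_per_array=2)
many_animals = variance_of_group_mean(**components, n_animals=8,
                                      arrays_per_animal=1, sections_per_array=2)
print(few_animals, many_animals)
```

With these made-up components, spreading the same eight arrays over more animals lowers the variance of the mean, because the animal-level component is divided only by the number of animals.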

2.
This work is a statistical analysis of the reproducibility of a MALDI-TOF mass spectrometry experiment. Its aim is to evaluate measurement variability and compare peak intensities from two types of MALDI-TOF platforms. We compared and commented on the abilities of Principal Component Analysis and mixed-model analysis of variance to evaluate the biological variability and the technical variability of peak intensities in different patients. The properties and hypotheses of both methods are summarized and applied to spectra from plasma of patients with Hodgkin lymphoma. Principal Component Analysis rapidly checks the balance between the two variabilities; however, a mixed-model analysis of variance is necessary to quantify the biological and technical components of the experimental variance as well as their interactions and to split the total variance into between-subjects and within-subject components. The latter method helped to assess the reproducibility of measurements from two MALDI-TOF platforms and to decompose the technical variability according to the experimental design.

3.
We studied the general problem of interpreting and detecting differences in phenotypic variability among the genotypes at a locus, from both a biological and a statistical point of view. The scales on which we measure interval-scale quantitative traits are man-made and have little intrinsic biological relevance. Before claiming a biological interpretation for genotype differences in variance, we should be sure that no monotonic transformation of the data can reduce or eliminate these differences. We show theoretically that for an autosomal diallelic SNP, when the three corresponding means are distinct so that the variance can be expressed as a quadratic function of the mean, there implicitly exists a transformation that will tend to equalize the three variances; we also demonstrate how to find a transformation that will do this. We investigate the validity of Bartlett’s test, Box’s modification of it, and a modified Levene’s test to test for differences in variances when normality does not hold. We find that, although they may detect differences in variability, these tests do not necessarily detect differences in variance. The same is true for permutation tests that use these three statistics.

4.
5.
Two-dimensional difference gel electrophoresis (2-D DIGE) allows for reliable quantification of global protein abundance changes. The threshold of significance for protein abundance changes depends on the experimental variation (biological and technical). This study estimates biological, technical and total variation inherent to 2-D DIGE analysis of environmental bacteria, using the model organisms "Aromatoleum aromaticum" EbN1 and Phaeobacter gallaeciensis DSM 17395. The soluble proteomes of both bacteria were analyzed from replicate cultures. For strains EbN1 and DSM 17395, respectively, the coefficient of variation (CV) revealed a total variation of below 19 and 15%, an average technical variation of 12 and 7%, and an average biological variation of 18 and 17%. Multivariate analysis of variance confirmed domination of biological over technical variance to be significant in most cases. To visualize variances, the complex protein data have been plotted with a multidimensional scaling technique. Furthermore, comparison of different treatment groups (different substrate conditions) demonstrated that variability within groups is significantly smaller than differences caused by treatment.

6.
In quantitative shotgun proteomic analyses by liquid chromatography and mass spectrometry, a rigid study design is necessary in order to obtain statistically relevant results. Hypothesis testing, sample size calculation and power estimation are fundamental concepts that require consideration when designing an experiment. For this reason, the reproducibility and variability of the proteomic platform need to be assessed. In this study, we evaluate the technical (sample preparation), labeling (isobaric labels), and total (biological + technical + labeling + experimental) variability and reproducibility of a workflow that employs a shotgun LC-MS/MS approach in combination with TMT peptide labeling for the quantification of the peripheral blood mononuclear cell (PBMC) proteome. We illustrate that the variability induced by TMT labeling is small when compared to the technical variation. The latter is also responsible for a substantial part of the total variation. Prior knowledge about the experimental variability allows for a correct design, a prerequisite for the detection of biologically significant disease-specific differential proteins in clinical proteomics experiments.

7.
Accommodating general patterns of confounding in sample size/power calculations for observational studies is extremely challenging, both technically and scientifically. While employing previously implemented sample size/power tools is appealing, they typically ignore important aspects of the design/data structure. In this paper, we show that sample size/power calculations that ignore confounding can be much more unreliable than is conventionally thought; using real data from the US state of North Carolina, naive calculations yield sample size estimates that are half those obtained when confounding is appropriately acknowledged. Unfortunately, eliciting realistic design parameters for confounding mechanisms is difficult. To overcome this, we propose a novel two-stage strategy for observational study design that can accommodate arbitrary patterns of confounding. At the first stage, researchers establish bounds for power that facilitate the decision of whether or not to initiate the study. At the second stage, internal pilot data are used to estimate key scientific inputs that can be used to obtain realistic sample size/power. Our results indicate that the strategy is effective at replicating gold standard calculations based on knowing the true confounding mechanism. Finally, we show that consideration of the nature of confounding is a crucial aspect of the elicitation process; depending on whether the confounder is positively or negatively associated with the exposure of interest and outcome, naive power calculations can either under- or overestimate the required sample size. Throughout, simulation is advocated as the only general means to obtain realistic estimates of statistical power; we describe, and provide in an R package, a simple algorithm for estimating power for a case-control study.

8.
Cultivated bread wheat (Triticum aestivum L.) is an allohexaploid species resulting from the natural hybridization and chromosome doubling of allotetraploid durum wheat (T. turgidum) and a diploid goatgrass Aegilops tauschii Coss (Ae. tauschii). Synthetic hexaploid wheat (SHW) was developed through the interspecific hybridization of Ae. tauschii and T. turgidum, and then crossed to T. aestivum to produce synthetic hexaploid wheat derivatives (SHWDs). Owing to this founding variability, one may infer that the genetic variances of native wild populations vs. improved wheat may vary due to their differential origin and evolutionary history. In this study, we partitioned the additive variance of SHW and SHWD with respect to their breed origin by fitting a hierarchical Bayesian model with heterogeneous covariance structure for breeding values to estimate variance components for each breed category, and segregation variance. Two data sets were used to test the proposed hierarchical Bayesian model, one from a multi-year multi-location field trial of SHWD and the other comprising the two species of SHW. For the SHWD, the Bayesian estimates of additive variances of grain yield from each breed category were similar for T. turgidum and Ae. tauschii, but smaller for T. aestivum. Segregation variances between Ae. tauschii–T. aestivum and T. turgidum–T. aestivum populations explained a sizable proportion of the phenotypic variance. Bayesian additive variance components and the Best Linear Unbiased Predictors (BLUPs) estimated by two well-known software programs were similar for multi-breed origin and for the sum of the breeding values by origin for both data sets. Our results support the suitability of models with heterogeneous additive genetic variances to predict breeding values in wheat crosses with variable ploidy levels.

9.
10.
Judit M. Nagy. Proteomics 2010, 10(10): 1903-1905
The Biological Reference Material Initiative Workshop held at the Toronto HUPO congress on 26 September 2009 focused on the development of new biological reference materials and tools for the assessment of reproducibility, the solutions to many of the technical challenges in proteomics and protein‐based molecular diagnostics. This half‐day meeting included presentations from leading scientists from the worldwide proteomic community, who shared a common interest in standardization and increased accuracy of proteomic data. The conclusion was that proteomics is highly sensitive to both biological and technical variability. It is this biological and technical variance, when not accounted for by experiment design, that invalidates proteomic experiments, but both of these issues can be dealt with by tackling reproducibility.

11.
Experimental variance is a major challenge when dealing with high-throughput sequencing data. This variance has several sources: sampling replication, technical replication, variability within biological conditions, and variability between biological conditions. The high per-sample cost of RNA-Seq often precludes the large number of experiments needed to partition observed variance into these categories as per standard ANOVA models. We show that the partitioning of within-condition to between-condition variation cannot reasonably be ignored, whether in single-organism RNA-Seq or in Meta-RNA-Seq experiments, and further find that commonly-used RNA-Seq analysis tools, as described in the literature, do not enforce the constraint that the sum of relative expression levels must be one, and thus report expression levels that are systematically distorted. These two factors lead to misleading inferences if not properly accommodated. As it is usually only the biological between-condition and within-condition differences that are of interest, we developed ALDEx, an ANOVA-like differential expression procedure, to identify genes with greater between- to within-condition differences. We show that the presence of differential expression and the magnitude of these comparative differences can be reasonably estimated with even very small sample sizes.

12.

Introduction

A central issue in the design of microarray-based analysis of global gene expression is that variability resulting from experimental processes may obscure changes resulting from the effect being investigated. This study quantified the variability in gene expression at each level of a typical in vitro stimulation experiment using human peripheral blood mononuclear cells (PBMC). The primary objective was to determine the magnitude of biological and technical variability relative to the effect being investigated, namely gene expression changes resulting from stimulation with lipopolysaccharide (LPS).

Methods and Results

Human PBMC were stimulated in vitro with LPS, with replication at 5 levels: 5 subjects each on 2 separate days with technical replication of LPS stimulation, amplification and hybridisation. RNA from samples stimulated with LPS and unstimulated samples were hybridised against common reference RNA on oligonucleotide microarrays. There was a closer correlation in gene expression between replicate hybridisations (0.86–0.93) than between different subjects (0.66–0.78). Deconstruction of the variability at each level of the experimental process showed that technical variability (standard deviation (SD) 0.16) was greater than biological variability (SD 0.06), although both were low (SD<0.1 for all individual components). There was variability in gene expression both at baseline and after stimulation with LPS and proportion of cell subsets in PBMC was likely partly responsible for this. However, gene expression changes after stimulation with LPS were much greater than the variability from any source, either individually or combined.

Conclusions

Variability in gene expression was very low and likely to improve further as technical advances are made. The finding that stimulation with LPS has a markedly greater effect on gene expression than the degree of variability provides confidence that microarray-based studies can be used to detect changes in gene expression of biological interest in infectious diseases.

13.
Karp NA, Lilley KS. Proteomics 2005, 5(12): 3105-3115
DIGE is a powerful tool for measuring changes in protein expression between samples. Here we assess the assumptions of normality and heterogeneity of variance that underlie the univariate statistical tests routinely used to detect proteins with expression changes. Furthermore, the technical variance experienced in a multigel experiment is assessed here and found to be reproducible within- and across-sample types. Utilising the technical variance measured, a power study is completed for several "typical" fold changes in expression commonly used as thresholds by researchers. Based on this study using DeCyder, guidance is given on the number of gel replicates that are needed for the experiment to have sufficient sensitivity to detect expression changes. A two-dye system based on utilising just Cy3 and Cy5 was found to be more reproducible than the three-dye system. A power and cost-benefit analysis performed here suggests that the traditional three-dye system would use fewer resources in studies where multiple samples are compared. Technical variance was shown to encompass both experimental and analytical noise and thus is dependent on the analytical software utilised. Data is provided as a resource to the community to assess alternative software and upgrades.

14.
State‐space models (SSMs) are a popular tool for modeling animal abundances. Inference difficulties for simple linear SSMs are well known, particularly in relation to simultaneous estimation of process and observation variances. Several remedies to overcome estimation problems have been studied for relatively simple SSMs, but whether these challenges and proposed remedies apply for nonlinear stage‐structured SSMs, an important class of ecological models, is less well understood. Here we identify improvements for inference about nonlinear stage‐structured SSMs fit with biased sequential life stage data. Theoretical analyses indicate parameter identifiability requires covariates in the state processes. Simulation studies show that plugging in externally estimated observation variances, as opposed to jointly estimating them with other parameters, reduces bias and standard error of estimates. In contrast to previous results for simple linear SSMs, strong confounding between jointly estimated process and observation variance parameters was not found in the models explored here. However, when observation variance was also estimated in the motivating case study, the resulting process variance estimates were implausibly low (near‐zero). As SSMs are used in increasingly complex ways, understanding when inference can be expected to be successful, and what aids it, becomes more important. Our study illustrates (a) the need for relevant process covariates and (b) the benefits of using externally estimated observation variances for inference about nonlinear stage‐structured SSMs.

15.
Lymphoblastoid cell lines (LCLs), originally collected as renewable sources of DNA, are now being used as a model system to study genotype–phenotype relationships in human cells, including searches for QTLs influencing levels of individual mRNAs and responses to drugs and radiation. In the course of attempting to map genes for drug response using 269 LCLs from the International HapMap Project, we evaluated the extent to which biological noise and non-genetic confounders contribute to trait variability in LCLs. While drug responses could be technically well measured on a given day, we observed significant day-to-day variability and substantial correlation to non-genetic confounders, such as baseline growth rates and metabolic state in culture. After correcting for these confounders, we were unable to detect any QTLs with genome-wide significance for drug response. A much higher proportion of variance in mRNA levels may be attributed to non-genetic factors (intra-individual variance, i.e., biological noise, levels of the EBV virus used to transform the cells, ATP levels) than to detectable eQTLs. Finally, in an attempt to improve power, we focused analysis on those genes that had both detectable eQTLs and correlation to drug response; we were unable to detect evidence that eQTL SNPs are convincingly associated with drug response in the model. While LCLs are a promising model for pharmacogenetic experiments, biological noise and in vitro artifacts may reduce power and have the potential to create spurious association due to confounding.

16.
Mass spectrometric profiling approaches such as MALDI‐TOF and SELDI‐TOF are increasingly being used in disease marker discovery, particularly in the lower molecular weight proteome. However, little consideration has been given to the issue of sample size in experimental design. The aim of this study was to develop a protocol for the use of sample size calculations in proteomic profiling studies using MS. These sample size calculations can be based on a simple linear mixed model which allows the inclusion of estimates of biological and technical variation inherent in the experiment. The use of a pilot experiment to estimate these components of variance is investigated and is shown to work well when compared with larger studies. Examination of data from a number of studies using different sample types and different chromatographic surfaces shows the need for sample‐ and preparation‐specific sample size calculations.
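The idea of a sample size calculation that includes both biological and technical variation can be sketched with the standard normal-approximation formula for a two-group comparison of means. This is only an illustration under stated assumptions: the function name, the default significance level and power, and the example variances are hypothetical, and the study's own protocol may differ.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(delta, var_biological, var_technical,
                       technical_replicates=1, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided two-group comparison.

    Each subject contributes the mean of its technical replicates, so the
    variance of a subject-level value is
    var_biological + var_technical / technical_replicates.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var_subject_mean = var_biological + var_technical / technical_replicates
    n = 2 * (z_alpha + z_beta) ** 2 * var_subject_mean / delta ** 2
    return ceil(n)


# Hypothetical pilot estimates on a log-intensity scale.
single = subjects_per_group(delta=1.0, var_biological=1.0, var_technical=1.0)
quadruplicate = subjects_per_group(delta=1.0, var_biological=1.0,
                                   var_technical=1.0, technical_replicates=4)
print(single, quadruplicate)
```

Under these assumptions, adding technical replicates reduces only the technical term, so the required number of subjects never drops below what the biological variance alone demands.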

17.
Effects of marker chromosomes on relative viability   (Cited by: 2 total, 2 self-citations, 0 by others)
Cockerham CC, Mukai T. Genetics 1978, 90(4): 827-849
Viability relative to Cy/Pm as a standard was studied in Drosophila melanogaster. One experiment, E1, consisted of progeny from eleven distinct 7 x 7 factorial mating designs with reciprocals for second chromosomes extracted from a natural population. The other experiment, E2, consisted of two distinct sets of heterozygotes with reciprocals and corresponding homozygotes. It was established from E1 that there are little to no synergistic effects among different genotypes in a vial and that Cy and Pm heterozygotes vary almost as much as would be expected if one chromosome were held constant for wild-type heterozygotes. In wild-type heterozygotes, variances were estimated to be 0.0099 for average chromosomal effects, 0.0054 for interactions of chromosomes, 0.0021 for maternal effects, 0.0079 for paternal effects, and -0.0010 for the remaining interaction effects, all being significantly different from zero except the last. The variances of Cy and Pm heterozygotes, covariance of Cy and Pm heterozygotes, and covariances of Cy and Pm heterozygotes with wild-type heterozygotes, as well as the comparable statistics available in E2, all showed a large paternal component of variance and a smaller maternal component of variance, both unexpected results.—From E2 the variance of homozygotes, excluding error variance, was estimated to be 0.0149, and the covariances of homozygotes with wild-type heterozygotes to be 0.0056 for maternally derived chromosomes common and 0.0126 for paternally derived chromosomes common, again showing the larger paternal than maternal influence. The average genetic regression of heterozygotes on homozygotes of 0.61 was reduced only slightly to 0.56 by correcting for maternal and paternal variances. 
These genetic regressions, generally utilized as estimators of the average degree of dominance, are larger than any previously reported. Differential meiotic drive in Cy and Pm parents was shown to be compatible with the large paternal and maternal variances, but other causes cannot be ruled out. Approximations were developed for translating various variances, covariances, and regressions between single- and double-marker experiments, assuming that marker chromosomes behave as typical wild-type chromosomes in one case and assuming a (partially) recessive model with the population in mutation selection balance in another case. Various features, particularly the estimation of dominance, were compared and discussed between the two cases.

18.
We consider sample size calculations for testing differences in means between two samples and allowing for different variances in the two groups. Typically, the power functions depend on the sample size and a set of parameters assumed known, and the sample size needed to obtain a prespecified power is calculated. Here, we account for two sources of variability: we allow the sample size in the power function to be a stochastic variable, and we consider estimating the parameters from preliminary data. An example of the first source of variability is nonadherence (noncompliance). We assume that the proportion of subjects who will adhere to their treatment regimen is not known before the study, but that the proportion is a stochastic variable with a known distribution. Under this assumption, we develop simple closed form sample size calculations based on asymptotic normality. The second source of variability is in parameter estimates that are estimated from prior data. For example, we account for variability in estimating the variance of the normal response from existing data which are assumed to have the same variance as the study for which we are calculating the sample size. We show that we can account for the variability of the variance estimate by simply using a slightly larger nominal power in the usual sample size calculation, which we call the calibrated power. We show that the calculation of the calibrated power depends only on the sample size of the existing data, and we give a table of calibrated power by sample size. Further, we consider the calculation of the sample size in the rarer situation where we account for the variability in estimating the standardized effect size from some existing data. This latter situation, as well as several of the previous ones, is motivated by sample size calculations for a Phase II trial of a malaria vaccine candidate.

19.
In genetic toxicology it is important to know whether chemicals should be regarded as clearly hazardous or whether they can be considered sufficiently safe, the latter being the case, from the genotoxicologist's view, if their genotoxic effects are nil or at least significantly below a predefined minimal effect level. A previously presented statistical decision procedure which allows one to make precisely this distinction is now extended to the question of how optimal experimental sample size can be determined in advance for genotoxicity experiments using the somatic mutation and recombination tests (SMART) of Drosophila. Optimally, the statistical tests should have high power to minimise the chance of statistically inconclusive results. Based on the normal test, the statistical principles are explained, and in an application to the wing spot assay, it is shown how the practitioner can proceed to optimise sample size to achieve numerically satisfactory conditions for statistical testing. The somatic genotoxicity assays of Drosophila are in principle based on somatic spots (mutant clones) that are recovered in variable numbers on individual flies. The underlying frequency distributions are expected to be of the Poisson type. However, some care seems indicated with respect to this latter assumption, because pooling of data over individuals, sexes, and experiments, for example, can (but need not) lead to data which are overdispersed, i.e., the data may show more variability than theoretically expected. It is an undesired effect of overdispersion that in comparisons of pooled totals it can lead to statistical testing which is too liberal, because overall it yields too many seemingly significant results. If individual variability considered alone is not in contradiction with Poisson expectation, however, experimental planning can help to minimise the undesired effects of overdispersion on statistical testing of pooled totals.
The practical rule is to avoid disproportionate sampling. It is recalled that for optimal power in statistical testing, it is preferable to use equal total numbers of flies in the control and treated series. Statistical tests which are based on Poisson expectations are too liberal if there is overdispersion in the data due to excess individual variability. In this case we propose to use the U test as a non-parametric two-sample test and to adjust the estimated optimal sample size according to (i) the overdispersion observed in a large historical control and (ii) the relative efficiency of the U test in comparison to the t test and related parametric tests.
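One common way to check count data for overdispersion relative to a Poisson model is the dispersion index, the ratio of sample variance to sample mean, which is close to 1 when the Poisson assumption holds. The sketch below is an illustration only: the function name and the spot counts are hypothetical, and it is not the decision procedure described in the abstract.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values well above 1 suggest more variability than a Poisson
    model would predict (overdispersion).
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when all counts are zero")
    return variance(counts) / m


# Hypothetical numbers of spots recovered on ten individual flies.
spots_per_fly = [0, 1, 0, 2, 0, 0, 5, 0, 1, 1]
index = dispersion_index(spots_per_fly)
print(index)
```

An index well above 1, as with these made-up counts, would be a reason to be cautious about tests that rely on Poisson expectations.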

20.
The internal pilot study design enables the estimation of nuisance parameters required for sample size calculation on the basis of data accumulated in an ongoing trial. In this way, misspecifications made when determining the sample size in the planning phase can be corrected employing updated knowledge. According to regulatory guidelines, blindness of all personnel involved in the trial has to be preserved and the specified type I error rate has to be controlled when the internal pilot study design is applied. Especially in the late phase of drug development, most clinical studies are run in more than one centre. In these multicentre trials, one may have to deal with an unequal distribution of the patient numbers among the centres. Depending on the type of the analysis (weighted or unweighted), unequal centre sample sizes may lead to a substantial loss of power. Like the variance, the magnitude of imbalance is difficult to predict in the planning phase. We propose a blinded sample size recalculation procedure for the internal pilot study design in multicentre trials with normally distributed outcome and two balanced treatment groups that are analysed applying the weighted or the unweighted approach. The method addresses both uncertainty with respect to the variance of the endpoint and the extent of disparity of the centre sample sizes. The actual type I error rate as well as the expected power and sample size of the procedure are investigated in simulation studies. For the weighted analysis as well as for the unweighted analysis, the maximal type I error rate was not or only minimally exceeded. Furthermore, application of the proposed procedure led to an expected power that achieves the specified value in many cases and is throughout very close to it.
