Similar Documents
20 similar documents found (search time: 15 ms)
1.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence in the significance of the data. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and to check the approximate results against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have quantitatively characterized the effect of pooling samples on the efficiency of microarray experiments for detecting differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replication. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which a pooled design becomes preferable to a non-pooled design can then be derived from the unit cost of a microarray and that of a biological subject. This paper thus provides guidance on sample pooling and cost-effectiveness. The formulation is outlined in the context of microarray comparative studies, but its applicability is not limited to microarray experiments; it extends to a wide range of biomedical comparative studies where sample pooling may be involved.
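The paper's exact formulas are not reproduced in this listing, but the variance bookkeeping behind pooled-design power calculations can be sketched. Below is a minimal, illustrative Python sketch using the standard normal approximation, assuming each array measures an equal-contribution pool and that per-array variance decomposes additively into biological and technical components; all parameter names and values are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def pooled_design_power(delta, sigma_b2, sigma_t2, n_arrays, pool_size, alpha=0.05):
    """Normal-approximation power for a two-class comparison in which each
    array measures an equal-contribution pool of `pool_size` subjects and
    each class is measured on `n_arrays` arrays. Per-array variance is
    (biological variance / pool size) + technical variance."""
    var_per_array = sigma_b2 / pool_size + sigma_t2
    se = np.sqrt(2.0 * var_per_array / n_arrays)   # SE of the class difference
    z_crit = norm.ppf(1.0 - alpha / 2.0)
    return norm.cdf(abs(delta) / se - z_crit)

# Pooling 3 subjects per array vs. no pooling, 6 arrays per class
print(pooled_design_power(1.0, sigma_b2=0.5, sigma_t2=0.1, n_arrays=6, pool_size=3))
print(pooled_design_power(1.0, sigma_b2=0.5, sigma_t2=0.1, n_arrays=6, pool_size=1))
```

In this toy setting, pooling raises power at a fixed array count because the biological variance is divided across pool members; the cost trade-off the paper analyzes then comes from weighing extra subjects against saved arrays.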

2.
Neuroproteomics aims to study the molecular organisation of the nervous system at the protein level. Two-dimensional electrophoresis is the most frequently used technique in quantitative proteomics. The aim of this study was to assess the experimental and biological variation on this proteomic platform using mouse brain tissue. Mice are the most commonly used laboratory animals for modelling human disease or investigating the effect of a drug candidate or a treatment. Experimental design plays a crucial role in quantitative proteomics, so understanding and minimizing the sources of variation is essential. Our results indicate that technical variance is the dominant contributor to the total variance in mouse brain and that genetic background has a negligible effect on the total variation. The results also characterise the variation to be expected when using mouse brain for proteomic studies and should therefore be useful for experimental design in other proteomics laboratories.

3.
Optimal experimental design is important for the efficient use of modern high-throughput technologies such as microarrays and proteomics. Multiple factors, including the reliability of the measurement system (which itself must be estimated from prior experimental work), can influence design decisions. In this study, we describe how the optimal number of replicate measures (technical replicates) for each biological sample (biological replicate) can be determined. Different allocations of biological and technical replicates were evaluated by minimizing the variance of the ratio of technical variance (measurement error) to the total variance (the sum of sampling error and measurement error). We demonstrate that if the number of biological replicates and the number of technical replicates per biological sample are variable, while the total number of available measures is fixed, then the optimal allocation of replicates for measurement-evaluation experiments is two technical replicates for each biological replicate. We therefore recommend two technical replicates per biological replicate when the goal is to evaluate the reproducibility of measurements.
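As a rough illustration of this allocation result (not the authors' derivation), the following Monte Carlo sketch compares how variable the estimated technical-to-total variance ratio is under different splits of a fixed budget of 24 measures. The variance parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def ratio_estimates(n_bio, n_tech, sigma_b=1.0, sigma_e=0.5, n_sim=2000):
    """Simulate a one-way random-effects design and estimate the ratio of
    technical (error) variance to total variance from the ANOVA mean squares."""
    est = np.empty(n_sim)
    for s in range(n_sim):
        y = (rng.normal(0, sigma_b, (n_bio, 1))          # biological effects
             + rng.normal(0, sigma_e, (n_bio, n_tech)))  # technical error
        mse = np.mean(np.var(y, axis=1, ddof=1))         # within-subject MS
        msb = n_tech * np.var(y.mean(axis=1), ddof=1)    # between-subject MS
        s2_b = max((msb - mse) / n_tech, 0.0)            # biological variance
        est[s] = mse / (s2_b + mse)                      # technical / total
    return est

# 24 total measures split different ways: variance of the ratio estimate
for n_bio, n_tech in [(12, 2), (8, 3), (6, 4), (4, 6)]:
    print(n_bio, n_tech, round(np.var(ratio_estimates(n_bio, n_tech)), 5))
```

Under these assumed variances, the (12 biological, 2 technical) split tends to give the most stable ratio estimate, mirroring the paper's recommendation.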

4.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and how this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced to show the sources of variation present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance-reduction methods, whereas technical replication of whole arrays is very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost-effective. Normalisation by centring on a small number of spots may reduce array effects but can introduce considerable variation into the results; centring on the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression can introduce variability. Except for methods based on spiking controls, all normalisation requires that most genes are not differentially expressed. Methods based on spatial location and/or intensity also require that the non-differentially expressed genes are distributed at random with respect to location and intensity. Spotting designs should be planned carefully so that spot replicates are widely spaced on the array and genes with similar expression patterns are not clustered. DISCUSSION: The tools of statistical design of experiments can be applied to microarray experiments to improve both the efficiency and the validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.

5.
Quantitative proteomics investigates physiology at the molecular level by measuring relative differences in protein expression between samples under different experimental conditions. A major obstacle to reliably determining quantitative changes in protein expression is the error imposed by technical and biological variation. In drug discovery and development, the issue of biological variation often grows with the developmental stage of the research, spanning from in vitro assays to clinical trials. In this paper we present case studies to raise awareness of the issues of technical and biological variation and their impact on quantitative proteomics. We determined the technical variation of the two-dimensional electrophoresis process to be a 20-30% coefficient of variation. Biological variation observed from experiment to experiment, by contrast, spanned a broader range depending on the sample type. This was demonstrated with case studies in which variation was monitored across experiments with bacteria, established cell lines, primary cultures, and drug-treated human subjects. We discuss technical and biological variation as key factors to consider during experimental design, and offer insight into preparing experiments that overcome this challenge to provide statistically significant outcomes in quantitative proteomic research.

6.
7.
Knowledge of the extent of total variation between samples from different individuals is of great importance for the design of not only proteomics studies but every clinical study. This variation defines the smallest statistically significant signal difference detectable when comparing two groups of individuals. We isolated platelets from 20 healthy human volunteers aged 56-100 years, because this age group is the one most commonly encountered in the clinic. We determined the technical and total variation in a proteome analysis using two-dimensional DIGE with IPGs in the pI ranges 4-7 and 6-9. Only spots that were reproducibly detectable in at least 90% of all gels (n = 908) were included in the study. All spots had a similar technical variation, with a median coefficient of variation (cv) of about 7%. In contrast, spots showed a more diverse total variation between individuals, with a surprisingly low median cv of only 18%. Because most known biomarkers show an effect size within a 1-2-fold range of their cv, any future clinical proteomics study with platelets will require an analytical method able to detect such small quantitative differences. In addition, we calculated the minimal number of samples (sample size) needed to detect given protein expression differences with statistical significance.
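The sample-size calculation described at the end can be sketched with the textbook normal-approximation formula for comparing two group means, expressing both the detectable effect and the variability relative to the mean. This is a hedged sketch, not the authors' exact computation.

```python
from math import ceil
from scipy.stats import norm

def n_per_group(rel_diff, cv, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sample comparison
    of means, with the detectable difference and the variability both given
    relative to the mean (e.g. rel_diff=0.20 for a 20% difference)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z * cv / rel_diff) ** 2)

# Detect a 20% expression difference given the reported total cv of 18%
print(n_per_group(0.20, 0.18))   # subjects per group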

8.
Many studies have shown that segregating quantitative trait loci (QTL) can be detected via linkage to genetic markers. Power to detect a QTL effect on the trait mean, as a function of the number of individuals genotyped for the marker, is increased by selectively genotyping individuals with extreme values of the quantitative trait. Computer simulations were employed to study the effect of various sampling strategies on the statistical power to detect QTL variance effects. If only individuals with extreme phenotypes are selected for genotyping, then power to detect a variance effect is lower than with random sampling. If 0.2 of the total number of individuals genotyped are selected from the center of the distribution, then power to detect a variance effect equals that obtained with random selection. Power to detect a variance effect was maximal when 0.2 to 0.5 of the individuals selected for genotyping were taken from the tails of the distribution and the remainder from the center.
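A toy version of such a simulation is sketched below; it is not the authors' simulation, and whether the power ordering reproduces depends on the variance test and parameters chosen (Levene's test and the effect size here are illustrative assumptions).

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(6)

def power_variance_qtl(frac_tails, n_pop=1000, n_geno=200, n_sim=400):
    """Power to detect a marker genotype that changes the trait VARIANCE
    when only n_geno individuals are genotyped: frac_tails of them are
    taken from the phenotypic extremes, the rest from the center."""
    n_t = int(frac_tails * n_geno)
    hits = 0
    for _ in range(n_sim):
        g = rng.binomial(1, 0.5, n_pop)                  # marker genotype
        y = rng.normal(0.0, np.where(g == 1, 1.3, 1.0))  # variance effect only
        order = np.argsort(np.abs(y - np.median(y)))     # center ... extremes
        idx = np.r_[order[::-1][:n_t], order[:n_geno - n_t]]
        a, b = y[idx][g[idx] == 1], y[idx][g[idx] == 0]
        hits += levene(a, b).pvalue < 0.05
    return hits / n_sim

for f in (1.0, 0.4, 0.0):   # tails only, mixed, center only
    print(f, power_variance_qtl(f))
```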

9.
The effect of sample duration on the quantification of stream drift
1. We performed computer simulations and a field experiment to determine the effect that sample duration, and thus sample volume, has on estimates of drift density and sample variance. 2. In computer simulations, when the spatial arrangement of individuals in the water column approximated random and contagious-random distributions, the estimated mean drift density was not significantly affected by sample duration, but sample variance decreased curvilinearly as sample duration increased. 3. Similar results were obtained in field experiments in habitats of high and low water velocity. 4. Our findings from an Albertan stream indicate that the relationship between sample variance (i.e. coefficient of variation) and the duration of drift samples is curvilinear. This relationship affected the number of samples required to achieve a specific level of precision (i.e. a standard error within 10% of the mean). For estimates in low and high current velocities, sample variation was halved by increasing the duration of sample collection from 10 to 20 min. The increased precision obtained with samples of 20 min duration reduced the amount of drift material that needed to be processed by approximately 50% compared with an equivalent 10% level of precision for samples of 10 min duration. This reduction in the number of samples required to obtain a given level of precision has important consequences for the cost of processing drift samples. 5. Thus, to optimize studies of stream invertebrate drift, both in terms of sample precision and processing effort, researchers must consider the effect that sample volume has on the variance of drift density estimates. Because researchers generally use drift nets with similar-sized apertures (>300 cm²), the problem for specific field applications becomes one of optimizing sample duration relative to variance estimates for drift density.
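The precision criterion in point 4 (standard error within 10% of the mean) maps onto a one-line sample-size rule under the usual assumption of independent samples; a generic sketch, with numbers purely illustrative.

```python
from math import ceil

def n_samples(cv, rel_se=0.10):
    """Replicate samples needed so the standard error of the mean is within
    `rel_se` of the mean: SE/mean = cv / sqrt(n) <= rel_se, so n >= (cv/rel_se)^2."""
    return ceil((cv / rel_se) ** 2)

# n grows with cv squared, so halving the cv quarters the workload,
# and halving the *variance* (cv reduced by sqrt(2)) halves the required n.
print(n_samples(0.6), n_samples(0.3))   # e.g. 36 vs. 9 samples
```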

10.
Understanding biological systems requires that the scientific community pay further attention to the challenges faced in proteomics; the complexity of the proteome spans many orders of magnitude. This means that significant technical and data-analytic innovations will be needed for a full understanding of biology. Current state-of-the-art MS is probably our best choice for studying protein complexity, and exploring new ways to use MS and MS-derived data should be given higher priority. We present here a brief overview of visualization and statistical analysis strategies for quantitative peptide values on an individual-protein basis. These analysis strategies can help pinpoint protein modifications and splice and genomic variants of biological relevance. We demonstrate their application using a bottom-up proteomics dataset obtained in a drug-profiling experiment. We have also observed that the presented methods are useful for studying peptide distributions in clinical samples from a large number of individuals. We expect the presented data analysis strategy to be useful in the future for defining functional protein variants in biological model systems and disease studies; robust software implementing these strategies is therefore urgently needed.

11.
The efficiency of pooling mRNA in microarray experiments
In a microarray experiment, messenger RNA samples are often pooled across subjects, either out of necessity or in an effort to reduce the effect of biological variation. A basic problem in such experiments is to estimate the nominal expression levels of a large number of genes. Pooling samples affects expression estimation, but the exact effects are not yet known, as the approach has not been systematically studied in this context. We consider how mRNA pooling affects expression estimates by assessing the finite-sample performance of different estimators for designs with and without pooling. Conditions under which it is advantageous to pool mRNA are defined, and general properties of estimates from both pooled and non-pooled designs are derived under these conditions. A formula is given for the total number of subjects and arrays required in a pooled experiment to obtain gene expression estimates and confidence intervals comparable to those obtained from the no-pooling case. The formula demonstrates that by pooling a perhaps increased number of subjects, one can decrease the number of arrays required in an experiment without loss of precision. The assumptions that facilitate derivation of this formula are examined using data from a quantitative real-time PCR experiment. The calculations are not specific to one particular method of quantifying gene expression, as they assume only that a single, normalized estimate of expression is obtained for each gene. As such, the results should be generally applicable to a number of technologies, provided sufficient pre-processing and normalization methods are available and applied.

12.
Judit M. Nagy, Proteomics, 2010, 10(10): 1903-1905
The Biological Reference Material Initiative Workshop, held at the Toronto HUPO congress on 26 September 2009, focused on the development of new biological reference materials and tools for assessing reproducibility, and on solutions to many of the technical challenges in proteomics and protein-based molecular diagnostics. This half-day meeting included presentations from leading scientists of the worldwide proteomics community who share a common interest in standardization and in increasing the accuracy of proteomic data. The conclusion was that proteomics is highly sensitive to both biological and technical variability. It is this biological and technical variance, when not accounted for in the experimental design, that invalidates proteomic experiments; both issues can be addressed by tackling reproducibility.

13.
Experimental variance is a major challenge when dealing with high-throughput sequencing data. This variance has several sources: sampling replication, technical replication, variability within biological conditions, and variability between biological conditions. The high per-sample cost of RNA-Seq often precludes the large number of experiments needed to partition observed variance into these categories, as standard ANOVA models would require. We show that the partitioning of within-condition and between-condition variation cannot reasonably be ignored, whether in single-organism RNA-Seq or in Meta-RNA-Seq experiments, and further find that commonly used RNA-Seq analysis tools, as described in the literature, do not enforce the constraint that the sum of relative expression levels must be one, and thus report expression levels that are systematically distorted. These two factors lead to misleading inferences if not properly accommodated. As it is usually only the biological between-condition and within-condition differences that are of interest, we developed ALDEx, an ANOVA-like differential expression procedure, to identify genes with greater between-condition than within-condition differences. We show that the presence of differential expression and the magnitude of these comparative differences can be reasonably estimated with even very small sample sizes.
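ALDEx itself is not reproduced here, but the sum-to-one constraint the authors highlight is commonly handled with a centred log-ratio (clr) transform from compositional data analysis; below is a minimal sketch of that transform (the pseudocount value is an illustrative choice).

```python
import numpy as np

def clr(counts, pseudo=0.5):
    """Centred log-ratio transform of a vector of read counts: maps relative
    data, where only ratios between features are meaningful, onto a scale
    where between- and within-condition differences can be compared."""
    x = np.asarray(counts, dtype=float) + pseudo   # pseudocount avoids log(0)
    logx = np.log(x / x.sum())                     # log relative abundances
    return logx - logx.mean()                      # centre by the geometric mean

print(np.round(clr([100, 50, 0, 850]), 3))
```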

14.
Estimates of genetic diversity in major geographic regions are frequently made by pooling all individuals into regional aggregates. This method can potentially bias results if there are differences in population substructure within regions, since increased variation among local populations could inflate regional diversity. A preferred method of estimating regional diversity is to compute the mean diversity within local populations. Both methods are applied to a global sample of craniometric data consisting of 57 measurements taken on 1734 crania from 18 local populations in six geographic regions: sub-Saharan Africa, Europe, East Asia, Australasia, Polynesia, and the Americas. Each region is represented by three local populations. Both methods for estimating regional diversity show sub-Saharan Africa to have the highest levels of phenotypic variation, consistent with many genetic studies. Polynesia and the Americas both show high levels of regional diversity when regional aggregates are used, but the lowest mean local population diversity. Regional estimates of F(ST) made using quantitative genetic methods show that both Polynesia and the Americas also have the highest levels of differentiation among local populations, which inflates regional diversity. Regional differences in F(ST) are directly related to the geographic dispersion of samples within each region; higher F(ST) values occur when the local populations are geographically dispersed. These results show that geographic sampling can affect results, and suggest caution in making inferences regarding regional diversity when population substructure is ignored.

15.
Weinberg CR, Umbach DM, Biometrics, 1999, 55(3): 718-726
Assays can be so expensive that interesting hypotheses become impractical to study epidemiologically. One need not, however, perform an assay for everyone providing a biological specimen. We propose pooling equal-volume aliquots from randomly grouped sets of cases and randomly grouped sets of controls, and then assaying the smaller number of pooled samples. If an effect modifier is of concern, the pooling can be done within strata defined by that variable. For covariates assessed on individuals (e.g., questionnaire data), set-based counterparts are calculated by adding the values for the individuals in each set. The pooling set then becomes the unit of statistical analysis. We show that, with appropriate specification of a set-based logistic model, standard software yields a valid estimated exposure odds ratio, provided the multiplicative formulation is correct. Pooling minimizes the depletion of irreplaceable biological specimens and can enable additional exposures to be studied economically. Statistical power suffers very little compared with the usual individual-based analysis. In settings where high assay costs constrain the number of people an investigator can afford to study, specimen pooling can make it possible to study more people, and hence improve the study's statistical power, with no increase in cost.
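A toy simulation of the set-based layout follows (not the authors' full likelihood, which requires the appropriate set-based model specification): individuals are randomly grouped within case and control strata, covariates are summed per set, and a standard logistic fit on the sets approximately recovers the individual-level log odds ratio under the paper's conditions (equal set sizes, correct multiplicative model). All names and parameter values are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical individual-level data: exposure x raises the odds of disease
n, beta = 4000, 0.7
x = rng.normal(size=n)
case = rng.random(n) < 1.0 / (1.0 + np.exp(-(-2.0 + beta * x)))

def set_sums(values, size=3):
    """Randomly group individuals into sets and sum covariates per set."""
    v = rng.permutation(values)
    v = v[: len(v) // size * size]        # drop the remainder
    return v.reshape(-1, size).sum(axis=1)

x_case, x_ctrl = set_sums(x[case]), set_sums(x[~case])
y = np.r_[np.ones(len(x_case)), np.zeros(len(x_ctrl))]
X = sm.add_constant(np.r_[x_case, x_ctrl])

print(sm.Logit(y, X).fit(disp=0).params[1])   # roughly beta, the log odds ratio
```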

16.
Sequencing pools of individuals, rather than each individual separately, reduces the cost of estimating allele frequencies at many loci in many populations. Theoretical and empirical studies show that sequencing pools comprising a limited number of individuals (typically fewer than 50) provides reliable allele frequency estimates, provided that the DNA pooling and DNA sequencing steps are carefully controlled. Unequal contributions of different individuals to the DNA pool and the mean and variance of sequencing depth can both affect the standard error of allele frequency estimates. To our knowledge, no study has separately investigated the effect of these two factors on allele frequency estimates, so there is currently no method to estimate a priori the relative importance of unequal individual DNA contributions independently of sequencing depth. We develop a new analytical model for allele frequency estimation that explicitly distinguishes these two effects. Our model shows that the DNA pooling variance in a pooled sequencing experiment depends on only two factors: the number of individuals in the pool and the coefficient of variation of individual DNA contributions to the pool. We present a new method to experimentally estimate this coefficient of variation when planning a pooled sequencing design in which samples are pooled either before or after DNA extraction. Using this analytical and experimental framework, we provide guidelines for optimizing the design of pooled sequencing experiments. Finally, we sequence replicated pools of inbred lines of the plant Medicago truncatula and show that the predictions from our model generally hold true when estimating the frequency of known multilocus haplotypes by pooled sequencing.
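The two variance sources the model separates (pool composition and sequencing depth) can be explored with a Monte Carlo sketch. The gamma model for individual contributions is an illustrative assumption, not the authors' experimental method; parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

def poolseq_se(p, n_ind, cv_contrib, depth, n_sim=5000):
    """Simulated standard error of a pool-seq allele-frequency estimate.
    Individual DNA contributions are gamma-distributed with the given cv;
    reads are then drawn binomially at the given total depth."""
    est = np.empty(n_sim)
    for s in range(n_sim):
        geno = rng.binomial(2, p, n_ind) / 2.0          # diploid genotypes
        if cv_contrib > 0:
            w = rng.gamma(1.0 / cv_contrib**2, 1.0, n_ind)
        else:
            w = np.ones(n_ind)                          # perfectly even pool
        w /= w.sum()                                    # contribution shares
        est[s] = rng.binomial(depth, w @ geno) / depth  # sequencing step
    return est.std()

for cv in (0.0, 0.3, 0.8):   # even vs. increasingly uneven pipetting
    print(cv, round(poolseq_se(0.3, n_ind=40, cv_contrib=cv, depth=100), 4))
```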

17.

Background  

Typically, pooling of mRNA samples in microarray experiments means mixing mRNA from several biological-replicate samples before hybridization onto a microarray chip. Here we describe an alternative "smart pooling" strategy in which different samples, not necessarily biological replicates, are pooled in an information-theoretically efficient way. Further, each sample is tested on multiple chips, but always in pools made up of different samples. The end goal is to exploit the compressibility of microarray data to reduce the number of chips used and increase robustness to measurement noise.
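The compressibility idea can be illustrated with a generic compressed-sensing toy (not the paper's pooling construction): a random pooling matrix plus sparse recovery typically identifies the few active samples from far fewer measurements than samples. All sizes and the recovery threshold are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)

# 100 samples, of which only 5 carry signal; 30 pooled measurements
n, m, k = 100, 30, 5
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.uniform(2.0, 5.0, k)

A = rng.binomial(1, 0.5, (m, n)) / np.sqrt(m)   # random pooling design
y = A @ x + rng.normal(0.0, 0.01, m)            # noisy pooled measurements

x_hat = Lasso(alpha=0.01).fit(A, y).coef_       # sparse recovery
print(sorted(np.flatnonzero(x)), sorted(np.flatnonzero(x_hat > 0.5)))
```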

18.
Analysis of primary animal and human tissues is key to biological and biomedical research. Comparative proteomic analysis of primary biological material would benefit from uncomplicated experimental workflows capable of evaluating an unlimited number of samples. In this report we describe the application of label-free proteomics to the quantitative analysis of five mouse core proteomes. We developed a computer program and normalization procedures that allow the quantitative data inherent in LC-MS/MS experiments to be exploited for relative and absolute quantification of proteins in complex mixtures. Important features of this approach include (i) its ability to compare an unlimited number of samples, (ii) its applicability to primary tissues and cultured cells, (iii) its straightforward workflow without chemical reaction steps, and (iv) its usefulness not only for relative quantification but also for estimating absolute protein abundance. We applied this approach to quantitatively characterize the most abundant proteins in murine brain, heart, kidney, liver, and lung. We matched 8,800 MS/MS peptide spectra to 1,500 proteins and generated 44,000 independent data points to profile the approximately 1,000 most abundant proteins in mouse tissues. This dataset provides a quantitative profile of the fundamental proteome of a mouse, identifies the major similarities and differences between organ-specific proteomes, and serves as a paradigm of how label-free quantitative MS can be used to characterize the phenotype of mammalian primary tissues at the molecular level.
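The paper's own program and normalization are not shown here; as a generic illustration of label-free abundance estimation, the normalized spectral abundance factor (NSAF) is one widely used index that scales spectral counts by protein length. The numbers below are made up.

```python
import numpy as np

def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: spectral counts scaled by
    protein length, then normalized to sum to one across the mixture,
    giving a simple label-free index of relative protein abundance."""
    saf = np.asarray(spectral_counts, float) / np.asarray(lengths, float)
    return saf / saf.sum()

print(np.round(nsaf([120, 40, 8], [450, 300, 900]), 4))
```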

19.
Estimating p-values in small microarray experiments
MOTIVATION: Microarray data typically have small numbers of observations per gene, which can result in low power for statistical tests. Test statistics that borrow information from data across all of the genes can improve power, but these statistics have non-standard distributions, and their significance must be assessed by permutation analysis. When sample sizes are small, the number of distinct permutations can be severely limited, and pooling the permutation-derived test statistics across all genes has been proposed. However, the null distribution of the test statistics under permutation is not the same for equally and differentially expressed genes. This can have a negative impact on both p-value estimation and the power of information-borrowing statistics. RESULTS: We investigate permutation-based methods for estimating p-values. One method, which pools permutation statistics from a selected subset of the data, is shown to have the correct type I error rate and to provide accurate estimates of the false discovery rate (FDR). We provide guidelines for selecting an appropriate subset. We also demonstrate that information-borrowing statistics have substantially greater power than the t-test in small experiments.
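A naive version of the pooling under discussion can be sketched as follows: per-gene permutation statistics are combined into one large null distribution. Note that, per the paper, pooling across all genes mixes in statistics from differentially expressed genes and distorts the null; the paper's remedy is to pool from a selected subset. The sketch below is the naive variant, with invented data.

```python
import numpy as np

rng = np.random.default_rng(4)

def pooled_perm_pvals(x, y, n_perm=200):
    """Permutation p-values for per-gene mean differences, pooling the
    permuted statistics across all genes to enlarge the null sample."""
    n1 = x.shape[1]
    data = np.hstack([x, y])
    obs = x.mean(axis=1) - y.mean(axis=1)
    null = []
    for _ in range(n_perm):
        d = data[:, rng.permutation(data.shape[1])]     # shuffle labels
        null.append(d[:, :n1].mean(axis=1) - d[:, n1:].mean(axis=1))
    null = np.abs(np.concatenate(null))                 # pooled null values
    exceed = (null[None, :] >= np.abs(obs)[:, None]).sum(axis=1)
    return (1 + exceed) / (1 + null.size)

x = rng.normal(0.0, 1.0, (50, 4)); x[:5] += 2.0   # 5 truly changed genes
y = rng.normal(0.0, 1.0, (50, 4))
print(np.round(pooled_perm_pvals(x, y)[:5], 4))
```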

20.
MOTIVATION: Microarrays can simultaneously measure the expression levels of many genes and are widely applied to study complex biological problems at the genetic level. To contain costs, instead of obtaining a microarray for each individual, mRNA from several subjects can first be pooled and then measured with a single array. mRNA pooling is also necessary when there is not enough mRNA from any one subject. Several studies have investigated the impact of pooling mRNA on inferences about gene expression, but they have typically modeled the pooling process as if it occurred on some transformed scale. This assumption is unrealistic. RESULTS: We propose modeling the gene expression level in a pool as a weighted average of the mRNA expression of all individuals in the pool on the original measurement scale, where the weights correspond to individual sample contributions to the pool. Based on these improved statistical models, we develop the appropriate F statistics to test for differentially expressed genes. We present formulae to calculate the power of various statistical tests under different strategies for pooling mRNA, and we compare the resulting power estimates to those obtained by following the approach proposed by Kendziorski et al. (2003). We find that the Kendziorski estimate tends to exceed the true power and that the estimate we propose, while somewhat conservative, is less biased. We argue that it is possible to design a study that includes mRNA pooling at a significantly reduced cost but with little loss of information.
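The modelling point, namely that a pool averages mRNA on the original scale, so the log of the pooled signal is not the average of the individual log signals, can be seen numerically. A two-line toy with made-up lognormal expression values and equal contribution weights:

```python
import numpy as np

rng = np.random.default_rng(5)

x = rng.lognormal(mean=2.0, sigma=0.8, size=5)   # individual expression levels
w = np.full(5, 1.0 / 5.0)                        # equal pool contributions

# log of the weighted average (what a pooled array measures, on a log scale)
# vs. the weighted average of logs (the unrealistic transformed-scale model)
print(np.log(w @ x), w @ np.log(x))
```

By Jensen's inequality the first quantity always exceeds the second, which is why treating pooling as if it happened on the log scale misstates power.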
