Similar Documents
20 similar documents found.
1.
A significant challenge facing high-throughput phenotyping of in-vivo knockout mice is ensuring that phenotype calls are robust and reliable. Central to this problem is selecting an appropriate statistical analysis that models both the experimental design (the workflow and the way control mice are selected for comparison with knockout animals) and the sources of variation. Recently we proposed a mixed model suitable for small batch-oriented studies, where controls are not phenotyped concurrently with mutants. Here we evaluate this method both for its sensitivity to detect phenotypic effects and for its control of false positives, across a range of workflows used at mouse phenotyping centers. We found that the sensitivity and the control of false positives depend on the workflow. We show that phenotypes in control mice fluctuate unexpectedly between batches, and this can inflate the false positive rate of phenotype calls when only a small number of batches are tested, as the knockout effect becomes confounded with temporal fluctuations in control mice. This effect was observed in both behavioural and physiological assays. Based on this analysis, we recommend two approaches (workflow and accompanying control strategy) and associated analyses that would be robust for use in high-throughput phenotyping pipelines. Our results show the importance of modelling all sources of variability in high-throughput phenotyping studies.
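A minimal sketch of the kind of batch-aware mixed model described above, using statsmodels on synthetic data: genotype enters as a fixed effect and batch as a random intercept, so temporal fluctuation in controls is not attributed to the knockout. All column names and numbers here are hypothetical, not taken from the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_batches, per_batch = 8, 10
batch = np.repeat(np.arange(n_batches), per_batch)
genotype = np.tile([0] * 5 + [1] * 5, n_batches)     # 0 = control, 1 = knockout
batch_shift = rng.normal(0, 0.5, n_batches)[batch]   # between-batch fluctuation
phenotype = 10 + 0.8 * genotype + batch_shift + rng.normal(0, 1, batch.size)
df = pd.DataFrame({"phenotype": phenotype, "genotype": genotype, "batch": batch})

# Batch as random intercept: the knockout effect is tested against
# between-batch variation instead of being confounded with it.
fit = smf.mixedlm("phenotype ~ genotype", df, groups=df["batch"]).fit(reml=True)
print(fit.summary())  # the genotype coefficient is the phenotype call of interest
```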

2.
If biological questions are to be answered using quantitative proteomics, it is essential to design experiments that have sufficient power to detect changes in expression. Sample subpooling is a strategy that can reduce variance while still allowing studies to encompass biological variation. Underlying sample pooling strategies is the biological-averaging assumption: that the measurements taken on the pool equal the average of the measurements taken on the individuals. This study finds no evidence of a systematic bias triggered by sample pooling for DIGE and shows that pooling can be useful in reducing biological variation. For the first time in quantitative proteomics, the two sources of variance were decoupled, and it was found that technical variance predominates for mouse brain while biological variance predominates for human brain. A power analysis found that as the number of individuals pooled increased, the number of replicates needed declined but the number of biological samples increased. Repeat measures of biological samples decreased the number of samples required but increased the number of gels needed. An example cost-benefit analysis demonstrates how researchers can optimise their experiments while taking into account the available resources.
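A rough normal-approximation sketch of the pooling trade-off described above, under the biological-averaging assumption: pooling k individuals divides the biological variance contribution by k, which lowers the number of pooled samples required while raising the number of individuals consumed. The variance figures and effect size below are made up for illustration, not the paper's estimates.

```python
from scipy.stats import norm

def samples_per_group(delta, var_bio, var_tech, pool_size,
                      alpha=0.05, power=0.8):
    """Pooled samples per group for a two-sample comparison of means."""
    var_eff = var_bio / pool_size + var_tech   # pooling shrinks biological variance
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * var_eff * (z / delta) ** 2

for k in (1, 2, 4, 8):
    m = samples_per_group(delta=1.0, var_bio=1.0, var_tech=0.25, pool_size=k)
    print(f"pool size {k}: {m:.1f} samples/group, {m * k:.0f} individuals/group")
```

As the loop shows, replicates fall and total individuals rise with pool size, matching the trade-off reported in the abstract.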

3.
This simulation study was designed to examine the power and type I error rate of QTL mapping using cofactor analysis in half-sib designs. A number of scenarios with different power to identify QTL were simulated by varying family size, heritability, QTL effect and map density, and three threshold levels for cofactor selection were considered. Generally, cofactor analysis did not increase the power of QTL mapping in a half-sib design, but it increased the type I error rate. The exception was with small family size, where the number of correctly identified QTL increased by 13% when heritability was high and 21% when heritability was low. However, in the same scenarios the number of false positives increased by 49% and 45%, respectively. With a liberal threshold level of 10% for cofactor selection combined with low heritability, the number of correctly identified QTL increased by 14%, but there was a 41% increase in the number of false positives. The power of QTL mapping also did not increase with cofactor analysis in scenarios with unequal QTL effects, sparse marker density and large QTL effect (25% of the genetic variance), but the type I error rate tended to increase. A priori, cofactor analysis was expected to have higher power than individual-chromosome analysis, especially in experiments with lower power to detect QTL. Our study shows that cofactor analysis increased the number of false positives in all scenarios with low heritability, and the increase was up to 50% in low-power experiments and with lower thresholds for cofactors.

4.
Weller JI, Song JZ, Heyen DW, Lewin HA, Ron M. Genetics, 1998, 150(4):1699-1706
Saturated genetic marker maps are being used to map individual genes affecting quantitative traits. Controlling the "experimentwise" type-I error severely lowers power to detect segregating loci. For preliminary genome scans, we propose controlling the "false discovery rate," that is, the expected proportion of true null hypotheses within the class of rejected null hypotheses. Examples are given based on a granddaughter design analysis of dairy cattle and simulated backcross populations. By controlling the false discovery rate, power to detect true effects is not dependent on the number of tests performed. If no detectable genes are segregating, controlling the false discovery rate is equivalent to controlling the experimentwise error rate. If quantitative loci are segregating in the population, statistical power is increased as compared to control of the experimentwise type-I error. The difference between the two criteria increases with the increase in the number of false null hypotheses. The false discovery rate can be controlled at the same level whether the complete genome or only part of it has been analyzed. Additional levels of contrasts, such as multiple traits or pedigrees, can be handled without the necessity of a proportional decrease in the critical test probability.
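The criterion proposed here is typically implemented with the Benjamini-Hochberg step-up procedure; a minimal sketch follows, with illustrative p-values at the bottom.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = p.size
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()     # largest i with p_(i) <= q*i/m
        reject[order[:k + 1]] = True       # reject the k+1 smallest p-values
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.32, 0.9]
print(benjamini_hochberg(pvals, q=0.05))
```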

5.
Genome-wide association studies have been instrumental in identifying genetic variants associated with complex traits such as human disease or gene expression phenotypes. It has been proposed that extending existing analysis methods by considering interactions between pairs of loci may uncover additional genetic effects. However, the large number of possible two-marker tests presents significant computational and statistical challenges. Although several strategies to detect epistatic effects have been proposed and tested for specific phenotypes, so far there has been no systematic attempt to compare their performance using real data. We made use of thousands of gene expression traits from linkage and eQTL studies to compare the performance of different strategies. We found that using information from marginal associations between markers and phenotypes to detect epistatic effects yielded a lower false discovery rate (FDR) than a strategy solely using biological annotation in yeast, whereas results from human data were inconclusive. For future studies whose aim is to discover epistatic effects, we recommend incorporating information about marginal associations between SNPs and phenotypes instead of relying solely on biological annotation. Improved methods to discover epistatic effects will result in a more complete understanding of complex genetic effects.

6.
Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5×10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
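A toy Monte Carlo power calculation in the spirit of the study above, using a simple burden-style comparison of per-individual rare-allele counts at the exome-wide threshold quoted in the abstract. The generative model (Poisson counts, Mann-Whitney test) is a deliberately simplified stand-in for the paper's 1000 Genomes-based simulator, and every parameter is hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def burden_power(n_cases=1500, n_controls=1500, lam=0.5, effect=0.4,
                 alpha=2.5e-6, n_sims=2000, seed=1):
    """Fraction of simulated loci reaching exome-wide significance."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        controls = rng.poisson(lam, n_controls)           # rare-allele burden
        cases = rng.poisson(lam * (1 + effect), n_cases)  # enriched in cases
        p = mannwhitneyu(cases, controls, alternative="two-sided").pvalue
        hits += p < alpha
    return hits / n_sims

print(burden_power())
```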

7.
Flow cytometry (FCM) is widely used in cancer research for diagnosis, detection of minimal residual disease, as well as immune monitoring and profiling following immunotherapy. In all these applications, the challenge is to detect extremely rare cell subsets while avoiding spurious positive events. To achieve this objective, it helps to be able to analyze FCM data using multiple markers simultaneously, since the additional information provided often helps to minimize the number of false positive and false negative events, hence increasing both sensitivity and specificity. However, with manual gating, at most two markers can be examined in a single dot plot, and a sequential strategy is often used. As the sequential strategy discards events that fall outside preceding gates at each stage, the effectiveness of the strategy is difficult to evaluate without laborious and painstaking back-gating. Model-based analysis is a promising computational technique that works using information from all marker dimensions simultaneously, and offers an alternative approach to flow analysis that can usefully complement manual gating in the design of optimal gating strategies. Results from model-based analysis will be illustrated with examples from FCM assays commonly used in cancer immunotherapy laboratories.

8.
Mott R, Flint J. Genetics, 2002, 160(4):1609-1618
We describe a method to simultaneously detect and fine map quantitative trait loci (QTL) that is especially suited to the mapping of modifier loci in mouse mutant models. The method exploits the high level of historical recombination present in a heterogeneous stock (HS), an outbred population of mice derived from known founder strains. The experimental design is an F2 cross between the HS and a genetically distinct line, such as one carrying a knockout or transgene. QTL detection is performed by a standard genome scan with approximately 100 markers and fine mapping by typing the same animals using densely spaced markers over those candidate regions detected by the scan. The analysis uses an extension of the dynamic-programming technique employed previously to fine map QTL in HS mice. We show by simulation that a QTL accounting for 5% of the total variance can be detected and fine mapped with >50% probability to within 3 cM by genotyping approximately 1500 animals.

9.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetics Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with densities of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with an average density of 0.25 cM and then, using a sub-sample of these markers, created maps with densities of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in the ability to detect trait loci. Additionally, as map density increased, there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions, but no clear relationship between regions of high LD and the locations of false positive linkage signals was observed.

10.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the data significance. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which pooled design becomes preferable to non-pooled design can then be derived given the unit cost associated with a microarray and that with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of performing microarray comparative studies, but its applicability is not limited to microarray experiments. It is also applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
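An exact power calculation of this general kind can be sketched with the noncentral t distribution. The sketch below assumes a pooled-design variance model of var_bio/pool_size + var_tech per array; this variance model and all numbers are illustrative assumptions, not the paper's formulas.

```python
from scipy.stats import t, nct

def pooled_ttest_power(delta, var_bio, var_tech, pool_size,
                       arrays_per_group, alpha=0.001):
    """Exact two-sided two-sample t-test power under a pooled design."""
    var_eff = var_bio / pool_size + var_tech       # variance of one array
    se = (2 * var_eff / arrays_per_group) ** 0.5   # SE of the group difference
    df = 2 * arrays_per_group - 2
    tcrit = t.ppf(1 - alpha / 2, df)
    ncp = delta / se                               # noncentrality parameter
    return nct.sf(tcrit, df, ncp) + nct.cdf(-tcrit, df, ncp)

print(pooled_ttest_power(delta=1.0, var_bio=1.0, var_tech=0.3,
                         pool_size=3, arrays_per_group=6))
```

Scanning this function over pool sizes and array counts, together with per-array and per-subject costs, reproduces in outline the cost-effectiveness comparison the abstract describes.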

11.
The availability of genome-wide RNAi libraries has enabled researchers to rapidly assess the functions of thousands of genes; however, the fact that these screens are run in living biological systems adds complications above and beyond those normally seen in high-throughput screening (HTS). Specifically, errors due to variance in both measurement and biology are large in such screens, leading to the conclusion that the majority of "hits" are expected to be false positives. Here, we outline basic guidelines for screen development that will help the researcher to control these forms of variance. By running a large number of positive and negative control genes, error of measurement can be accurately estimated and false negatives reduced. Likewise, by using a complex readout for the screen that is not easily mimicked by other biological pathways and phenomena, false positives can be minimized. By controlling variance in these ways, the researcher can maximize the utility of genome-wide RNAi screening.

12.
The power of QTL mapping by a mixed-model approach has been studied for hybrid crops but remains unknown in self-pollinated crops. Our objective was to evaluate the usefulness of mixed-model QTL mapping in the context of a breeding program for a self-pollinated crop. Specifically, we simulated a soybean (Glycine max L. Merr.) breeding program and applied a mixed-model approach that comprised three steps: variance component estimation, single-marker analyses, and multiple-marker analysis. Average power to detect QTL ranged from <1% to 47% depending on the significance level (0.01 or 0.0001), the number of QTL (20 or 80), the heritability of the trait (0.40 or 0.70), the population size (600 or 1,200 inbreds), and the number of markers (300 or 600). The corresponding false discovery rate ranged from 2 to 43%. Larger populations, higher heritability, and fewer QTL controlling the trait led to a substantial increase in power and to a reduction in the false discovery rate and bias. A stringent significance level reduced both the power and the false discovery rate. There was greater power to detect major QTL than minor QTL. Power was higher and the false discovery rate was lower in hybrid crops than in self-pollinated crops. We conclude that mixed-model QTL mapping is useful for gene discovery in plant breeding programs of self-pollinated crops.

13.
Karp NA, Lilley KS. Proteomics, 2005, 5(12):3105-3115
DIGE is a powerful tool for measuring changes in protein expression between samples. Here we assess the assumptions of normality and homogeneity of variance that underlie the univariate statistical tests routinely used to detect proteins with expression changes. Furthermore, the technical variance experienced in a multigel experiment is assessed here and found to be reproducible within and across sample types. Utilising the technical variance measured, a power study is completed for several "typical" fold changes in expression commonly used as thresholds by researchers. Based on this study using DeCyder, guidance is given on the number of gel replicates needed for the experiment to have sufficient sensitivity to detect expression changes. A two-dye system based on utilising just Cy3 and Cy5 was found to be more reproducible than the three-dye system. A power and cost-benefit analysis performed here suggests that the traditional three-dye system would use fewer resources in studies where multiple samples are compared. Technical variance was shown to encompass both experimental and analytical noise and is thus dependent on the analytical software utilised. Data are provided as a resource to the community to assess alternative software and upgrades.

14.
Rosenbaum PR. Biometrics, 2011, 67(3):1017-1027
In an observational or nonrandomized study of treatment effects, a sensitivity analysis indicates the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naïve analysis that presumes adjustments for observed covariates suffice to remove all bias. The power of a sensitivity analysis is the probability that it will reject a false hypothesis about treatment effects while allowing for a departure from random assignment of a specified magnitude; in particular, if this specified magnitude is "no departure", then this is the same as the power of a randomization test in a randomized experiment. A new family of u-statistics is proposed that includes Wilcoxon's signed rank statistic but also includes other statistics with substantially higher power when a sensitivity analysis is performed in an observational study. Wilcoxon's statistic has high power to detect small effects in large randomized experiments (that is, it often has good Pitman efficiency), but small effects are invariably sensitive to small unobserved biases. Members of this family of u-statistics that emphasize medium to large effects can have substantially higher power in a sensitivity analysis. For example, in one situation with 250 pair differences that are Normal with expectation 1/2 and variance 1, the power of a sensitivity analysis that uses Wilcoxon's statistic is 0.08, while the power of another member of the family of u-statistics is 0.66. The topic is examined by performing a sensitivity analysis in three observational studies, using an asymptotic measure called the design sensitivity, and by simulating power in finite samples. The three examples are drawn from epidemiology, clinical medicine, and genetic toxicology.

15.
Two-stage designs for experiments with a large number of hypotheses
MOTIVATION: When a large number of hypotheses are investigated, the false discovery rate (FDR) is commonly applied in gene expression analysis or gene association studies. Conventional single-stage designs may lack power due to low sample sizes for the individual hypotheses. We propose two-stage designs where the first stage is used to screen the 'promising' hypotheses, which are then investigated at the second stage with an increased sample size. A multiple test procedure based on sequential individual P-values is proposed to control the FDR for the case of independent normal distributions with known variance. RESULTS: The power of optimal two-stage designs is impressively larger than the power of the corresponding single-stage design with equal costs. Extensions to the case of unknown variances and correlated test statistics are investigated by simulations. Moreover, it is shown that the simple multiple test procedure using first-stage data for screening purposes and deriving the test decisions only from second-stage data is a very powerful option.
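A small simulation of the two-stage idea: screen all hypotheses cheaply at stage 1, then test only the 'promising' ones with independent stage-2 data, deriving decisions from stage-2 p-values alone, as in the simple procedure the authors describe. All design constants are illustrative, and FDR control in this sketch is approximate rather than the paper's exact procedure.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
m, m_true, n1, n2, screen_p, q = 1000, 50, 5, 20, 0.2, 0.05
effects = np.r_[np.full(m_true, 1.0), np.zeros(m - m_true)]

def pvals(n):
    """Two-sample t-test p-values for all m hypotheses at n samples/group."""
    x = rng.normal(0, 1, (m, n))                 # control group
    y = rng.normal(effects[:, None], 1, (m, n))  # treated group
    return ttest_ind(x, y, axis=1).pvalue

stage1 = pvals(n1) < screen_p                    # cheap screen
p2 = pvals(n2)                                   # independent stage-2 data
idx = np.nonzero(stage1)[0]                      # 'promising' hypotheses

# Benjamini-Hochberg only over the screened set, using stage-2 p-values
p_sel = np.sort(p2[idx])
ok = np.nonzero(p_sel <= q * np.arange(1, p_sel.size + 1) / p_sel.size)[0]
print("screened:", idx.size, "rejected:", 0 if ok.size == 0 else ok.max() + 1)
```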

16.
Shotgun proteomics via mass spectrometry (MS) is a powerful technology for biomarker discovery that has the potential to lead to noninvasive disease screening mechanisms. Successful application of MS-based proteomics technologies for biomarker discovery requires accurate expectations of bias, reproducibility, variance, and the true detectable differences in the platforms chosen for analyses. Characterization of the variability inherent in MS assays is vital and should affect the interpretation of measurements of observed differences in biological samples. Here we describe the observed biases, variance structure, and ability to detect known differences in spike-in data sets in which the true relative abundances among defined samples were known and were subsequently measured with the iTRAQ technology on two MS platforms. Global biases were observed within these data sets. Measured variability was a function of mean abundance. Fold changes were biased toward the null, and the variance of a fold change was a function of protein mass and abundance. The information presented herein will be valuable for experimental design and analysis of the resulting data.

17.
OBJECTIVES: Some traits, while naturally polychotomous, are routinely dichotomized for genetic analysis. Dichotomization, intuitively, leads to a loss of power to detect linkage, as some phenotypic variability is discarded. This paper examines this power loss in the context of a trichotomous trait. METHODS: To examine this power loss, we performed a simulation study where a trichotomous trait was simulated in a sample of 1,000 sib-pairs under various genetic models. The study was replicated 1,000 times. Linkage analysis using a variance components method, as implemented in Mx, was then performed on the trichotomous trait and compared with that on a dichotomized version of the trait. RESULTS: A comparison of the power and false positive rates of the analyses shows that power to detect linkage was increased by up to 22 percentage points simply by examining the trait as a trichotomy instead of a dichotomy. Under all models examined, the trichotomous analysis outperformed the dichotomous version. CONCLUSIONS: Comparable levels of false positive rates under both methods confirm that this power gain comes solely from the information lost upon dichotomization. Thus, dichotomizing tri- or poly-chotomous traits can lead to crippling power loss, especially in the case of many loci of small effect.
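The power cost of dichotomization is easy to demonstrate by simulation. The sketch below uses a simple marker-association setting (rather than sib-pair variance-components linkage, for brevity): the same latent liability is analysed as a trichotomy and as a median-split dichotomy, and the finer categorisation retains more power. Effect sizes and sample sizes are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n, n_sims, alpha = 500, 2000, 0.05
hits3 = hits2 = 0
for _ in range(n_sims):
    g = rng.binomial(2, 0.3, n)                  # marker genotype (0/1/2)
    liab = 0.15 * g + rng.normal(0, 1, n)        # small genetic effect
    tri = np.digitize(liab, np.quantile(liab, [1/3, 2/3]))   # three classes
    dic = (liab > np.quantile(liab, 0.5)).astype(int)        # median split
    hits3 += spearmanr(g, tri).pvalue < alpha
    hits2 += spearmanr(g, dic).pvalue < alpha
print(f"power: trichotomy {hits3/n_sims:.2f} vs dichotomy {hits2/n_sims:.2f}")
```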

18.
Experiments using quantitative real-time PCR to test hypotheses are limited by technical and biological variability; we seek to minimise sources of confounding variability through optimum use of biological and technical replicates. The quality of an experimental design is commonly assessed by calculating its prospective power. Such calculations rely on knowledge of the expected variances of the measurements of each group of samples and the magnitude of the treatment effect; the estimation of which is often uninformed and unreliable. Here we introduce a method that exploits a small pilot study to estimate the biological and technical variances in order to improve the design of a subsequent large experiment. We measure the variance contributions at several 'levels' of the experiment design and provide a means of using this information to predict both the total variance and the prospective power of the assay. A validation of the method is provided through a variance analysis of representative genes in several bovine tissue-types. We also discuss the effect of normalisation to a reference gene in terms of the measured variance components of the gene of interest. Finally, we describe a software implementation of these methods, powerNest, that gives the user the opportunity to input data from a pilot study and interactively modify the design of the assay. The software automatically calculates expected variances, statistical power, and optimal design of the larger experiment. powerNest enables the researcher to minimise the total confounding variance and maximise prospective power for a specified maximum cost for the large study.
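A sketch of the pilot-study calculation behind a tool like powerNest: estimate biological and technical variance components from a small nested pilot using the classic one-way ANOVA moment estimators, then predict the variance of a group mean in the planned design. The data layout and numbers are hypothetical, and this is not powerNest's actual code.

```python
import numpy as np

def variance_components(pilot):
    """pilot: array of shape (subjects, technical_reps), e.g. Cq values."""
    n_subj, n_reps = pilot.shape
    ms_within = pilot.var(axis=1, ddof=1).mean()        # technical variance
    ms_between = n_reps * pilot.mean(axis=1).var(ddof=1)
    var_tech = ms_within
    var_bio = max((ms_between - ms_within) / n_reps, 0.0)
    return var_bio, var_tech

def mean_variance(var_bio, var_tech, n_subjects, n_reps):
    """Predicted variance of a group mean in the planned design."""
    return var_bio / n_subjects + var_tech / (n_subjects * n_reps)

rng = np.random.default_rng(4)
truth = rng.normal(20, 0.8, (6, 1))          # 6 animals, biological spread
pilot = truth + rng.normal(0, 0.3, (6, 3))   # 3 technical reps each
vb, vt = variance_components(pilot)
print(vb, vt, mean_variance(vb, vt, n_subjects=10, n_reps=2))
```

Because technical variance is divided by n_subjects × n_reps while biological variance is divided only by n_subjects, extra technical replicates pay off only when var_tech dominates, which is exactly the trade-off the planned-design step optimises.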

19.
We present results concerning the power to detect past population growth using three microsatellite-based statistics available in the current literature: (1) that based on between-locus variability, (2) that based on the shape of allele size distribution, and (3) that based on the imbalance between variance and heterozygosity at a locus. The analysis is based on the single-step stepwise mutation model. The power of the statistics is evaluated for constant, as well as variable, mutation rates across loci. The latter case is important, since it is a standard procedure to pool data collected at a number of loci, and mutation rates at microsatellite loci are known to be different. Our analysis indicates that the statistic based on the imbalance between allele size variance and heterozygosity at a locus has the highest power for detection of population growth, particularly when mutation rates vary across loci.
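The imbalance statistic referred to in (3) can be sketched as follows under the single-step stepwise mutation model, where the variance-based and heterozygosity-based theta estimators agree at mutation-drift equilibrium, so their log-ratio, averaged over loci, signals demographic change (by the usual imbalance-index convention, values below zero suggest past growth). This is a simplified per-locus version with toy data, not the authors' estimator.

```python
import numpy as np

def ln_beta(allele_sizes_by_locus):
    """Mean log imbalance over loci; values below 0 suggest past growth."""
    logs = []
    for sizes in allele_sizes_by_locus:
        sizes = np.asarray(sizes, dtype=float)
        v = sizes.var(ddof=1)                            # allele size variance
        _, counts = np.unique(sizes, return_counts=True)
        h = 1.0 - np.sum((counts / counts.sum()) ** 2)   # heterozygosity
        theta_v = 2.0 * v                                # SMM: V = theta/2
        theta_h = ((1.0 / (1.0 - h)) ** 2 - 1.0) / 2.0   # SMM: H = 1 - 1/sqrt(1+2*theta)
        logs.append(np.log(theta_h) - np.log(theta_v))   # per-locus ln(beta)
    return float(np.mean(logs))

rng = np.random.default_rng(7)
loci = [np.round(rng.normal(20, 2.5, 100)) for _ in range(10)]  # toy allele sizes
print(ln_beta(loci))
```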

20.
Wang J, Jia M, Zhu L, Yuan Z, Li P, Chang C, Luo J, Liu M, Shi T. PLoS ONE, 2010, 5(10):e13721
Many methods, including parametric, nonparametric, and Bayesian methods, have been used for detecting differentially expressed genes based on the assumption that biological systems are linear, which ignores the nonlinear characteristics of most biological systems. More importantly, those methods do not simultaneously consider means, variances, and higher moments, resulting in a relatively high false positive rate. To overcome these limitations, the SWang test is proposed to determine differentially expressed genes according to the equality of distributions between case and control. Our method not only latently incorporates functional relationships among genes to accommodate nonlinear biological systems but also considers the mean, variance, skewness, and kurtosis of expression profiles simultaneously. To illustrate the biological significance of higher moments, we construct a nonlinear gene interaction model, demonstrating that skewness and kurtosis can contain useful information about functional association among genes in microarrays. Simulations and real microarray results show that the false positive rate of SWang is lower than that of currently popular methods (T-test, F-test, SAM, and Fold-change), with much higher statistical power. Additionally, SWang can uniquely detect significant genes in real microarray data with imperceptible differential expression but greater variability in kurtosis and skewness. Those identified genes were confirmed by previously published literature or by RT-PCR experiments performed in our lab.
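The SWang test itself is not reproduced here; the sketch below is a generic permutation test in the same spirit, comparing case and control distributions through a crude unstandardized distance over mean, variance, skewness, and kurtosis jointly. The distance, the data, and all parameters are illustrative assumptions.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def moment_distance(x, y):
    """Crude distance over the first four sample moments."""
    mx = np.array([x.mean(), x.var(ddof=1), skew(x), kurtosis(x)])
    my = np.array([y.mean(), y.var(ddof=1), skew(y), kurtosis(y)])
    return np.abs(mx - my).sum()

def moment_perm_test(x, y, n_perm=5000, seed=5):
    rng = np.random.default_rng(seed)
    obs = moment_distance(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)          # relabel under the null of equal distributions
        count += moment_distance(pooled[:x.size], pooled[x.size:]) >= obs
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(6)
x = rng.normal(0, 1, 40)             # control expression
y = rng.standard_t(3, 40) * 0.9      # similar mean, heavier tails
print(moment_perm_test(x, y))        # detectable via higher moments only
```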
