Similar articles
1.

Background  

Before conducting a microarray experiment, one important issue is the number of arrays required to have adequate power to identify differentially expressed genes. This paper discusses some crucial issues in the problem formulation, parameter specifications, and approaches that are commonly proposed for sample size estimation in microarray experiments. Common methods formulate the problem as the minimum sample size necessary to achieve, on average, a specified sensitivity (the proportion of truly differentially expressed genes that are detected) at a specified false discovery rate (FDR) level and a specified expected proportion (π1) of truly differentially expressed genes in the array. Unfortunately, the probability of attaining the specified sensitivity under such a formulation can be low. We formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. A permutation method using a small pilot dataset to estimate sample size is proposed. This method accounts for correlation and effect size heterogeneity among genes.

2.
We propose the use of a stationary probability distribution for the analysis of data on population size. Predicting this long-term population property from short-term individual events is accomplished by the use of the asymptotic theory of stochastic processes. A WKB approximation to the stationary density is obtained and then applied to observations on the flour beetle Tribolium.

3.
It is crucial for researchers to optimize RNA-Seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operating characteristic (ROC) curves, and other metrics including areas under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing either sample size or sequencing depth increases power; however, increasing sample size is more effective than increasing sequencing depth, especially once the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNAs) yield lower power than protein-coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a locally optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.

4.
5.
Summary. Estimates of survival, migration rates, and population size are developed for a triple catch marking experiment on n (n ≥ 2) areas, with migration among all areas and death in all areas occurring, but no recruitment (birth). This represents the extension to three sampling times of the method of Chapman and Junge (1956) for estimates in a stratified population. The method is further extended to allow for ‘losses on capture’.

6.
Data relating to middle phalangeal hair (MPH) among unrelated individuals of both sexes born and living in Sardinia are presented. In both sexes, MPH generally occurs on digits 3-4-5 of both hands. The observed sex differences are not statistically significant. The Sardinian sample appears to have a markedly lower frequency of individuals with MPH compared with Mediterranean and other European populations.

7.
Sex assessment from tooth measurements can be of major importance for forensic and bioarchaeological investigations, especially when only teeth or jaws are available. The purpose of this study is to assess the reliability and applicability of establishing sex identity in a sample from a Greek population using the discriminant function proposed by Rösing et al. (1995).

8.
We study the properties of gene genealogies for large samples using a continuous approximation introduced by R. A. Fisher. We show that the major effect of large sample size, relative to the effective size of the population, is to increase the proportion of polymorphisms at which the mutant type is found in a single copy in the sample. We derive analytical expressions for the expected number of these singleton polymorphisms and for the total number of polymorphic, or segregating, sites that are valid even when the sample size is much greater than the effective size of the population. We use simulations to assess the accuracy of these predictions and to investigate other aspects of large-sample genealogies. Lastly, we apply our results to some data from Pacific oysters sampled from British Columbia. This illustrates that, when large samples are available, it is possible to estimate the mutation rate and the effective population size separately, in contrast to the case of small samples in which only the product of the mutation rate and the effective population size can be estimated.

9.
We consider maximum likelihood estimation of the size of a target population to which has been added a known number of planted individuals. The standard equal-catchability model used in mark-recapture is assumed to be applicable to the augmented population. After proving the unimodality of the profile likelihood for the target population size, we obtain both the maximum likelihood estimator of this size and interval estimators based on its asymptotic distribution.
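The planted-individuals idea can be illustrated with a deliberately simplified sketch. It assumes a single sample drawn without replacement from the augmented population (a hypergeometric model), which is an assumption made here for illustration and not necessarily the model of the paper. The function names `log_choose` and `planted_mle`, and the numbers in the usage note, are hypothetical.

```python
from math import lgamma


def log_choose(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def planted_mle(planted: int, sample_size: int, planted_in_sample: int,
                max_n: int = 10_000) -> int:
    """Grid-search MLE of the target population size N.

    Assumed model: `planted` known individuals are added to N target
    individuals, and `sample_size` animals are drawn without replacement,
    of which `planted_in_sample` are planted ones.
    """
    if planted_in_sample <= 0:
        raise ValueError("no planted individuals recaptured: likelihood has no finite maximum")
    target_in_sample = sample_size - planted_in_sample
    best_n, best_ll = target_in_sample, float("-inf")
    for n in range(target_in_sample, max_n + 1):
        # Log-likelihood up to a constant that does not depend on n.
        ll = log_choose(n, target_in_sample) - log_choose(n + planted, sample_size)
        if ll > best_ll:
            best_n, best_ll = n, ll
    return best_n
```

For example, `planted_mle(50, 40, 12)` searches over N for 50 planted animals and a sample of 40 containing 12 of them. Under this simplified model the maximizer agrees with the Petersen-style value floor(R(n − r)/r), here floor(50 × 28 / 12).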

10.
Population subdivision due to habitat loss and modification, exploitation of wild populations and altered spatial population dynamics is of increasing concern in nature. Detecting population fragmentation is therefore crucial for conservation management. Using computer simulations, we show that a single-sample estimator of Ne based on linkage disequilibrium is a highly sensitive and promising indicator of recent population fragmentation and bottlenecks, even with some continued gene flow. For example, fragmentation of a panmictic population of Ne = 1,000 into demes of Ne = 100 can be detected with high probability after a single generation when estimates from this method are compared to prefragmentation estimates, given data for ~20 microsatellite loci in samples of 50 individuals. We consider a range of loci (10–40) and individuals (25–100) typical of current studies of natural populations and show that increasing the number of loci gives nearly the same increase in precision as increasing the number of individuals sampled. We also evaluated effects of incomplete fragmentation and found this Ne-reduction signal is still apparent in the presence of considerable migration (m ~ 0.10–0.25). Single-sample genetic estimates of Ne thus show considerable promise for early detection of population fragmentation and decline.
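As a rough illustration of how a single-sample linkage disequilibrium estimate works, the sketch below uses the commonly quoted approximation for unlinked loci, E[r²] ≈ 1/(3Ne) + 1/S, where r² is the mean squared correlation between alleles at pairs of loci and S is the number of individuals sampled. The function name and inputs are hypothetical, and the bias corrections applied by published estimators are omitted, so this is not the estimator evaluated in the abstract.

```python
def ne_from_ld(mean_r2: float, sample_size: int) -> float:
    """Point estimate of Ne from the mean r^2 across pairs of unlinked loci.

    Rearranges E[r^2] ~ 1/(3*Ne) + 1/S into Ne ~ 1 / (3 * (r^2 - 1/S)).
    Returns infinity when sampling noise alone can explain the observed r^2.
    """
    drift_r2 = mean_r2 - 1.0 / sample_size  # subtract the sampling contribution
    if drift_r2 <= 0:
        return float("inf")
    return 1.0 / (3.0 * drift_r2)
```

With 50 individuals, the sampling term 1/S is 0.02, so a smaller Ne shows up as an excess of mean r² over that baseline. This is why a drop from Ne = 1,000 to Ne = 100 is, in principle, visible in one sample.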

11.
Jiang  Wei  Yu  Weichuan 《BMC genomics》2016,17(1):19-32
Background

A replication study is a commonly used verification method to filter out false positives in genome-wide association studies (GWAS). If an association is confirmed in a replication study, it can be regarded with high confidence as a true positive. To design a replication study, traditional approaches calculate power by treating the replication study as another independent primary study. These approaches do not use the information provided by the primary study. In addition, they require a minimum detectable effect size to be specified, which may be subjective. One might consider replacing the minimum effect size with the observed effect sizes in the power calculation. However, this leaves the designed replication study underpowered: only the positive associations from the primary study are of interest, so the “winner’s curse” arises.

Results

An Empirical Bayes (EB) based method is proposed to estimate the power of a replication study for each association. The corresponding credible interval is also estimated. Simulation experiments show that our method outperforms other plug-in based estimators in overcoming the winner’s curse and in estimation accuracy. The coverage probability of the credible interval is well calibrated in the simulation experiments. A weighted average method is used to estimate the average power over all underlying true associations, which in turn is used to determine the sample size of the replication study. Sample sizes are estimated for 6 diseases from the Wellcome Trust Case Control Consortium (WTCCC) using our method. They are higher than the sample sizes estimated by plugging observed effect sizes into the power calculation.

Conclusions

Our new method can objectively determine the sample size of a replication study by using information extracted from the primary study, and it alleviates the winner’s curse. Thus, it is a better choice when designing replication studies of GWAS. The R package is available at: http://bioinformatics.ust.hk/RPower.html.


12.
Mathematical simulation has been used to analyze how the sample size affects the accuracy of the estimation of molecular variation in a population. The sample size was varied from 1/200 to 1/4 of the total size of the simulated population. The possible effect of the length of the nucleotide sequences compared has also been estimated; it was varied from 500 to 15 000 bp. A tendency towards underestimation of the mean nucleotide diversity (π) by about 25% of the expected value has been found. The sample size and/or the length of the nucleotide sequence used have been shown to affect the scatter of the π values more than the accuracy of its measurement (the proportion of correct estimates of π is about 14%). The assumption is made that the sample size affects the probability of accepting a false null hypothesis in analysis of the demographic history of a species.
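For readers unfamiliar with the statistic, nucleotide diversity is usually computed as the average proportion of sites that differ between two sequences, taken over all pairs in the sample. The following is a minimal sketch of that standard definition; the function name and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)
```

Because the value is an average over pairs, both the number of sequences sampled and the sequence length affect how widely an estimate scatters, which is the effect the abstract examines.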

13.
14.
Asymptotic distribution of the sample roots for a nonnormal population   Total citations: 1, self-citations: 0, citations by others: 1
WATERNAUX  CHRISTINE M. 《Biometrika》1976,63(3):639-645

15.
16.
K H Pollock  M C Otto 《Biometrics》1983,39(4):1035-1049
In this paper the problem of finding robust estimators of population size in closed K-sample capture-recapture experiments is considered. Particular attention is paid to models where heterogeneity of capture probabilities is allowed. First, a general estimation procedure is given which does not depend on any assumptions about the form of the distribution of capture probabilities. This is followed by a detailed discussion of the usefulness of the generalized jackknife technique to reduce bias. Numerical comparisons of the bias and variance of various estimators are given. Finally, a general discussion is given with several recommendations on estimators to be used in practice.

17.
Capture-recapture data on common voles Microtus arvalis (Pallas, 1779) in central Europe have been almost exclusively analysed by means of the enumeration technique (minimum number alive or calendar of catches). Here we compare enumeration and Jolly-Seber (JS) estimation of population size in the common vole using live-trapping data from an alfalfa field population in southern Moravia, Czech Republic. Over the entire study the enumeration estimate of the population size was smaller by an average of 28% than the JS estimate. The negative bias increased with density, decreased with both capture probability and the survival rate, and was more pronounced in males at high density. We conclude that the method of direct enumeration is not reliable for estimating population size in the common vole.
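The enumeration (minimum number alive) count mentioned in this abstract can be sketched as follows. An animal is counted on an occasion if it was caught then, or if it was caught both on some earlier and on some later occasion. The function name and the capture histories in the usage note are hypothetical.

```python
def minimum_number_alive(histories: list[list[int]]) -> list[int]:
    """Minimum number alive (MNA) per trapping occasion.

    `histories` holds one 0/1 capture history per animal, with one entry
    per trapping occasion (1 = caught, 0 = not caught).
    """
    if not histories:
        return []
    n_occasions = len(histories[0])
    mna = []
    for t in range(n_occasions):
        count = 0
        for h in histories:
            caught_now = h[t] == 1
            # Missed at t, but seen before and after, so it must have been alive.
            known_alive = any(h[:t]) and any(h[t + 1:])
            if caught_now or known_alive:
                count += 1
        mna.append(count)
    return mna
```

For instance, `minimum_number_alive([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])` counts the first animal on the middle occasion even though it was not trapped then. Animals that are present but never bracketed by captures are not counted at all, so the count can only fall short of the true population size, in line with the negative bias reported above.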

18.
Bayesian methods for estimation of the size of a closed population   Total citations: 4, self-citations: 0, citations by others: 4

19.
Species distribution models are used for a range of ecological and evolutionary questions, but often are constructed from few and/or biased species occurrence records. Recent work has shown that the presence‐only model Maxent performs well with small sample sizes. While the apparent accuracy of such models with small samples has been studied, less emphasis has been placed on the effect of small or biased species records on the secondary modeling steps, specifically accuracy assessment and threshold selection, particularly with profile (presence‐only) modeling techniques. When testing the effects of small sample sizes on distribution models, accuracy assessment has generally been conducted with complete species occurrence data, rather than similarly limited (e.g. few or biased) test data. Likewise, selection of a probability threshold – a probability value that classifies a model into discrete areas of presences and absences – has also generally been conducted with complete data. In this study we subsampled distribution data for an endangered rodent across multiple years to assess the effects of different sample sizes and types of bias on threshold selection, and examine the differences between apparent and actual accuracy of the models. Although some previously recommended threshold selection techniques showed little difference in threshold selection, the most commonly used methods performed poorly. Apparent model accuracy calculated from limited data was much higher than true model accuracy, but the true model accuracy was lower than it could have been with a more optimal threshold. That is, models with thresholds and accuracy calculated from biased and limited data had inflated reported accuracy, but were less accurate than they could have been if better data on species distribution were available and an optimal threshold were used.

20.
H. A. Verhoef  A. J. van  Selm 《Ecography》1983,6(4):387-388
The significance of variations in soil moisture for the distribution and abundance of the four collembolan species Tomocerus minor, Orchesella cincta, Lepidocyrtus lignorum and Entomobrya nivalis has been studied in a pine forest. During a relatively dry summer, the distribution and abundance of these species were examined on two sites with an initially different soil water content and depth of the litter/humus layer. The distribution of the species investigated could be described with the negative binomial distribution. During the sampling period, Lloyd's index of patchiness for T. minor and E. nivalis was subject to changes. For T. minor this was probably related to soil water content.
The densities of T. minor and O. cincta were higher in the wet site than in the drier site. At the beginning and at the end of the sampling period the drought-tolerant E. nivalis reached equal densities in both sites. The density fluctuations of the four species appeared to be totally different during the sampling period: the drought-sensitive species T. minor decreased strongly, L. lignorum remained constant and the drought-tolerant species O. cincta and E. nivalis increased strongly. These latter two species were able to survive the dry periods and to attain high densities by reproduction. The results agree with laboratory data on distribution and survival in relation to humidity.
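Lloyd's index of patchiness, referred to in this abstract, is conventionally defined as mean crowding divided by mean density, where mean crowding is m* = m + (s²/m − 1) for sample mean m and variance s². The sketch below follows that standard definition; the function name and the counts in the usage note are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def lloyd_patchiness(counts: list[int]) -> float:
    """Lloyd's index of patchiness (mean crowding / mean) for counts per sample unit."""
    m = mean(counts)
    if m == 0:
        raise ValueError("index is undefined when the mean count is zero")
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    mean_crowding = m + (s2 / m - 1.0)
    return mean_crowding / m
```

For counts such as `[0, 0, 2, 6]`, the mean is 2 and the variance is 8, giving an index above 1, which indicates aggregation. For counts that follow a negative binomial distribution with parameter k, the index equals 1 + 1/k, so changes in the index over a sampling period correspond to changes in the degree of clumping.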
