Similar Articles
 20 similar articles found (search time: 31 ms)
1.
For two independent binomial proportions, Barnard (1947) introduced a method for constructing a non-asymptotic unconditional test by maximisation of the probabilities over the ‘classical’ null hypothesis H0 = {(θ1, θ2) ∈ [0, 1]²: θ1 = θ2}. It is shown that this method is also useful when studying test problems for different null hypotheses such as, for example, shifted null hypotheses of the form H0 = {(θ1, θ2) ∈ [0, 1]²: θ2 ≤ θ1 ± Δ} for non-inferiority and one-sided superiority problems (including the classical null hypothesis with a one-sided alternative hypothesis). We derive some results for the more general ‘shifted’ null hypotheses of the form H0 = {(θ1, θ2) ∈ [0, 1]²: θ2 ≤ g(θ1)}, where g is a non-decreasing curvilinear function of θ1. Two examples of such null hypotheses in the regulatory setting are given. It is shown that the usual asymptotic approximations by the normal distribution may be quite unreliable. Non-asymptotic unconditional tests (and the corresponding p-values) may therefore be an alternative, particularly because the effort to compute non-asymptotic unconditional p-values for these more complex situations does not increase compared to the classical situation. For ‘classical’ null hypotheses it is known that the number of possible p-values derived by the unconditional method is very large, albeit finite, and the same is true for the null hypotheses studied in this paper. In most of the situations investigated, Barnard's CSM test (1947), when adapted to the respective null space, is again a very powerful test. A theorem is provided which, in addition to allowing fast algorithms for computing unconditional non-asymptotic p-values, fills a methodological gap in the calculation of exact unconditional p-values as implemented, for example, in StatXact 3 for Windows (1995).
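The maximisation step can be sketched in a few lines: for the classical one-sided null, the unconditional p-value is the supremum over the common nuisance proportion θ of the null probability of all tables at least as extreme as the observed one. The sketch below makes two simplifying assumptions that are not in the abstract: extremeness is ordered by the pooled Z statistic (Barnard's CSM ordering is different), and the supremum is approximated on a grid rather than maximised exactly.

```python
from math import comb, sqrt

def z_stat(x, y, n1, n2):
    """Pooled Z statistic ordering the sample space for H1: theta2 > theta1
    (an illustrative choice; Barnard's CSM ordering is different)."""
    pooled = (x + y) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return 0.0
    return (y / n2 - x / n1) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

def unconditional_p(x_obs, y_obs, n1, n2, grid=200):
    """Unconditional p-value for the classical null theta1 = theta2:
    the tail probability of all tables at least as extreme as the observed
    one, maximised over a grid of the common nuisance proportion theta."""
    t_obs = z_stat(x_obs, y_obs, n1, n2)
    extreme = [(x, y) for x in range(n1 + 1) for y in range(n2 + 1)
               if z_stat(x, y, n1, n2) >= t_obs]
    p_max = 0.0
    for i in range(1, grid):
        th = i / grid
        tail = sum(comb(n1, x) * th**x * (1 - th)**(n1 - x)
                   * comb(n2, y) * th**y * (1 - th)**(n2 - y)
                   for x, y in extreme)
        p_max = max(p_max, tail)
    return p_max
```

For the shifted nulls of the paper, the maximisation would instead run over the boundary θ2 = g(θ1); the table enumeration itself is unchanged, which is one way to see why the computational effort does not grow for the more complex null spaces.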

2.

Background

Evaluating the significance of a group of genes or proteins in a pathway or biological process for a disease can help researchers understand the mechanism of the disease. For example, identifying related pathways or gene functions for the chromatin states of tumor-specific T cells will help determine whether T cells can be reprogrammed, and further help design cancer treatment strategies. Some existing p-value combination methods can be used in this scenario. However, these methods suffer from different disadvantages, and thus it remains challenging to design a more powerful and robust statistical method.

Results

The existing group combined p-value (GCP) method first partitions p-values into several groups using a set of truncation points, but it is often sensitive to these truncation points. Another method, the adaptive rank truncated product method (ARTP), makes use of multiple truncation integers to adaptively combine the smallest p-values, but it loses statistical power since it ignores the larger p-values. To tackle these problems, we propose a robust p-value combination method (rPCMP) that considers multiple partitions of p-values with different sets of truncation points. The proposed rPCMP statistic has a three-layer hierarchical structure. The inner layer is a statistic that combines p-values in a specified interval defined by two threshold points, the intermediate layer uses a GCP statistic that optimizes the inner-layer statistic over a partition set of threshold points, and the outer layer integrates the GCP statistics from multiple partitions of p-values. The empirical distribution of the statistic under the null hypothesis can be estimated by a permutation procedure.

Conclusions

Our proposed rPCMP method is shown to be more robust and to have higher statistical power. Simulation studies show that our method effectively controls the type I error rate and has higher statistical power than existing methods. We finally apply rPCMP to an ATAC-seq dataset to discover gene functions related to chromatin states in mouse tumor T cells.
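As context for the baseline being improved upon, an ARTP-style adaptive combination can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions that are not from the paper: independent p-values, so the permutation null can be simulated as uniforms, and a fixed small set of truncation integers. It is not the authors' rPCMP code.

```python
import math
import random

def rtp(pvals, k):
    """Rank truncated product statistic: sum of logs of the k smallest
    p-values (smaller means stronger evidence)."""
    return sum(math.log(p) for p in sorted(pvals)[:k])

def artp_p_value(pvals, ks=(1, 3, 5), n_perm=500, rng=None):
    """ARTP-style adaptive combination: compute a permutation p-value for
    each truncation point k, take the minimum over k, and adjust that
    minimum for selection using the same simulated null sample.  The null
    is simulated as independent uniforms (a simplifying assumption)."""
    rng = rng or random.Random(0)
    m = len(pvals)
    null = []
    for _ in range(n_perm):
        sample = [rng.random() for _ in range(m)]
        null.append([rtp(sample, k) for k in ks])
    obs = [rtp(pvals, k) for k in ks]
    # Per-k permutation p-values for the observed data.
    per_k = [sum(row[i] <= obs[i] for row in null) / n_perm
             for i in range(len(ks))]
    t_obs = min(per_k)
    # Adjust for taking the minimum over k, reusing the same null sample.
    t_null = []
    for row in null:
        per_k_b = [sum(other[i] <= row[i] for other in null) / n_perm
                   for i in range(len(ks))]
        t_null.append(min(per_k_b))
    return sum(t <= t_obs for t in t_null) / n_perm
```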

3.
A two-tailed P-value is proposed for testing two-sided departures from Hardy-Weinberg equilibrium at a diallelic locus. The calculation of P uses the exact conditional distribution of the test statistic (the observed number of heterozygotes in the sample). The proposed P-value is always two-tailed, unlike other P-values proposed in the literature.

4.
The problem of combining p-values from independent experiments is discussed. It is shown that Fisher's solution to the problem can be derived from a “weight-free” method that has been suggested for the purpose of ranking vector observations (Biometrics 19: 85–97, 1963). The method implies that the value p = 0.37 is a critical one: p-values below 0.37 suggest that the null hypothesis is more likely to be false, whereas p-values above 0.37 suggest that it is more likely to be true.
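Fisher's solution combines k independent p-values through X = -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the global null; the critical value p = 0.37 arises because a single p-value's contribution -2 ln p equals its null expectation of 2 exactly when p = e^(-1) ≈ 0.368. A minimal sketch, using the closed-form chi-squared survival function available for even degrees of freedom:

```python
import math

def fisher_combined_p(pvals):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-squared with 2k df
    under the global null.  For even df the survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)**i / i!, so no special
    functions are needed."""
    k = len(pvals)
    x = -2.0 * sum(math.log(p) for p in pvals)
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i)
                                 for i in range(k))
```

With a single p-value the method returns that p-value unchanged, and each p-value below e^(-1) pulls the combined statistic above its null expectation, matching the 0.37 break-even point described in the abstract.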

5.
On weighted Hochberg procedures
Tamhane, Ajit C.; Liu, Lingyun. Biometrika 2008, 95(2): 279-294
We consider different ways of constructing weighted Hochberg-type step-up multiple test procedures including closed procedures based on weighted Simes tests and their conservative step-up short-cuts, and step-up counterparts of two weighted Holm procedures. It is shown that the step-up counterparts have some serious pitfalls such as lack of familywise error rate control and lack of monotonicity in rejection decisions in terms of p-values. Therefore an exact closed procedure appears to be the best alternative, its only drawback being lack of simple stepwise structure. A conservative step-up short-cut to the closed procedure may be used instead, but with accompanying loss of power. Simulations are used to study the familywise error rate and power properties of the competing procedures for independent and correlated p-values. Although many of the results of this paper are negative, they are useful in highlighting the need for caution when procedures with similar pitfalls may be used.

6.
We have a statistic for assessing an observed data point relative to a statistical model but find that its distribution function depends on the parameter. To obtain the corresponding p-value, we require the minimally modified statistic that is ancillary; this process is called Studentization. We use recent likelihood theory to develop a maximal third-order ancillary; this gives immediately a candidate Studentized statistic. We show that the corresponding p-value is higher-order U(0, 1), is equivalent to a repeated bootstrap version of the initial statistic and agrees with a special Bayesian modification of the original statistic. More importantly, the modified statistic and p-value are available by Markov chain Monte Carlo simulations and, in some cases, by higher-order approximation methods. Examples, including the Behrens–Fisher problem, are given to indicate the ease and flexibility of the approach.

7.
Multiple testing (MT) with false discovery rate (FDR) control has been widely conducted in the “discrete paradigm” where p-values have discrete and heterogeneous null distributions. However, in this scenario existing FDR procedures often lose some power and may yield unreliable inference, and for this scenario there does not seem to be an FDR procedure that partitions hypotheses into groups, employs data-adaptive weights and is nonasymptotically conservative. We propose a weighted p-value-based FDR procedure, “weighted FDR (wFDR) procedure” for short, for MT in the discrete paradigm that efficiently adapts to both heterogeneity and discreteness of p-value distributions. We theoretically justify the nonasymptotic conservativeness of the wFDR procedure under independence, and show via simulation studies that, for MT based on p-values of binomial test or Fisher's exact test, it is more powerful than six other procedures. The wFDR procedure is applied to two examples based on discrete data, a drug safety study, and a differential methylation study, where it makes more discoveries than two existing methods.
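For orientation, a generic weighted p-value step-up in the spirit of weighted Benjamini-Hochberg can be sketched as below. This is not the wFDR procedure itself (whose grouping and weights are data-adaptive and tuned to discrete nulls); it only illustrates the basic mechanism of dividing each p-value by its weight before the step-up comparison.

```python
def weighted_bh(pvals, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up: divide each p-value by its
    weight (weights must average to 1), sort the ratios, and find the
    largest rank j with p_(j)/w_(j) <= alpha * j / m.  Returns the set
    of rejected hypothesis indices."""
    m = len(pvals)
    assert abs(sum(weights) / m - 1.0) < 1e-9, "weights must average to 1"
    order = sorted(range(m), key=lambda i: pvals[i] / weights[i])
    k = 0
    for j, i in enumerate(order, start=1):
        if pvals[i] / weights[i] <= alpha * j / m:
            k = j
    return set(order[:k])
```

Upweighting a hypothesis relaxes its effective threshold at the cost of the others, which is why weight choice drives the power of such procedures.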

8.
E-values have been the dominant statistic for protein sequence analysis for the past two decades: from identifying statistically significant local sequence alignments to evaluating matches to hidden Markov models describing protein domain families. Here we formally show that for “stratified” multiple hypothesis testing problems—that is, those in which statistical tests can be partitioned naturally—controlling the local False Discovery Rate (lFDR) per stratum, or partition, yields the most predictions across the data at any given threshold on the FDR or E-value over all strata combined. For the important problem of protein domain prediction, a key step in characterizing protein structure, function and evolution, we show that stratifying statistical tests by domain family yields excellent results. We develop the first FDR-estimating algorithms for domain prediction, and evaluate how well thresholds based on q-values, E-values and lFDRs perform in domain prediction using five complementary approaches for estimating empirical FDRs in this context. We show that stratified q-value thresholds substantially outperform E-values. Contradicting our theoretical results, q-values also outperform lFDRs; however, our tests reveal a small but coherent subset of domain families, biased towards models for specific repetitive patterns, for which weaknesses in random sequence models yield notably inaccurate statistical significance measures. Usage of lFDR thresholds outperforms q-values for the remaining families, which have as-expected noise, suggesting that further improvements in domain predictions can be achieved with improved modeling of random sequences. Overall, our theoretical and empirical findings suggest that the use of stratified q-values and lFDRs could result in improvements in a host of structured multiple hypothesis testing problems arising in bioinformatics, including genome-wide association studies, orthology prediction, and motif scanning.

9.
Higher-order inference about a scalar parameter in the presence of nuisance parameters can be achieved by bootstrapping, in circumstances where the parameter of interest is a component of the canonical parameter in a full exponential family. The optimal test, which is approximated, is a conditional one based on conditioning on the sufficient statistic for the nuisance parameter. A bootstrap procedure that ignores the conditioning is shown to have desirable conditional properties in providing third-order relative accuracy in approximation of p-values associated with the optimal test, in both continuous and discrete models. The bootstrap approach is equivalent to third-order analytical approaches, and is demonstrated in a number of examples to give very accurate approximations even for very small sample sizes.

10.
Latent class models have recently been developed for the joint analysis of a longitudinal quantitative outcome and a time to event. These models assume that the population is divided into G latent classes characterized by different risk functions for the event and different profiles of evolution for the markers, described by a mixed model for each class. However, the key assumption of conditional independence between the marker and the event given the latent classes is difficult to evaluate because the latent classes are not observed. Using a joint model with latent classes and shared random effects, we propose a score test for the null hypothesis of independence between the marker and the outcome given the latent classes versus the alternative hypothesis that the risk of event depends on one or several random effects from the mixed model in addition to the latent classes. A simulation study was performed to compare the behavior of the score test to other previously proposed tests, including situations where the alternative hypothesis or the baseline risk function is misspecified. In all the investigated situations, the score test was the most powerful. The methodology was applied to develop a prognostic model for recurrence of prostate cancer given the evolution of prostate-specific antigen in a cohort of patients treated by radiation therapy.

11.
Houseman EA, Coull BA, Betensky RA. Biometrics 2006, 62(4): 1062-1070
Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.

12.
This note considers association between nonnegative random variables in which the two observed survival times depend on an unobservable random variable via the proportional hazards model. When the random variables are subject to censoring, the conditional hazard functions provide a reasonable means of describing the association between the two variables. A numerical example demonstrating association in disease incidence in ordered pairs of individuals is analysed. Also, examples of distributions satisfying the notions of dependence considered are provided.

13.
A superpopulation model generates the probabilities of a Bernoulli random variable. The ranks of the variables involved are considered as survey weights. The distribution of each linear rank statistic is derived under the null hypothesis for the two-sample problem and for the case of k ≥ 2 samples when simple random sampling or stratified sampling is used. The growth of a population of insects and the behavior of patients with insomnia are studied using these procedures.

14.
For clinical trials with interim analyses, conditional rejection probabilities play an important role when stochastic curtailment or design adaptations are performed. The conditional rejection probability is the conditional probability of finally rejecting the null hypothesis given the interim data. It is computed either under the null or the alternative hypothesis. We investigate the properties of the conditional rejection probability for the one-sided, one-sample t-test and show that it can be non-monotone in the interim mean of the data and non-monotone in the non-centrality parameter for the alternative. We give several proposals for implementing design adaptations (based on the conditional rejection probability) for the t-test and give a numerical example. Additionally, the conditional rejection probability given the interim t-statistic is investigated. It does not depend on the unknown σ and can be used in stochastic curtailment procedures.

15.
A two-tailed P-value is presented for a significance test in two-by-two contingency tables. There is no extraneous quasi-observation such as is needed in the exact randomized uniformly most powerful unbiased (UMPU) test of the hypothesis of independence. The proposed P-value can never exceed unity and is always two-tailed, unlike other P-values proposed in the literature.

16.
Several asymptotic tests have been proposed for testing the null hypothesis of marginal homogeneity in square contingency tables with r categories. A simulation study was performed to compare the power of four finite conservative conditional test procedures and two asymptotic tests under twelve different contingency schemes for small sample sizes. While an asymptotic test proposed by Stuart (1955) showed rather satisfactory behaviour for moderate sample sizes, an asymptotic test proposed by Bhapkar (1966) was quite anticonservative. With no a priori information, the performance of (r - 1) simultaneous conditional binomial tests with a Bonferroni adjustment proved to be quite efficient. With assumptions about where to expect deviations from the null hypothesis, other procedures favouring the larger or smaller conditional sample sizes, respectively, can be highly efficient. The procedures are illustrated by means of a numerical example from clinical psychology.
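The conditional binomial idea can be sketched as follows: under marginal homogeneity, each off-diagonal count n_ij, conditional on n_ij + n_ji, is Binomial(n_ij + n_ji, 1/2), so exact binomial tests with a Bonferroni adjustment control the familywise level. As an illustrative assumption, the sketch tests every symmetric pair of cells; the paper's own partition into (r - 1) simultaneous tests may group the cells differently.

```python
from math import comb

def binom_two_sided(x, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    no more likely than the observed count (point-probability method)."""
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    return min(1.0, sum(q for q in probs if q <= probs[x] + 1e-12))

def marginal_homogeneity_bonferroni(table, alpha=0.05):
    """Simultaneous conditional binomial tests for marginal homogeneity in
    a square table: for each pair i < j, n_ij | (n_ij + n_ji) ~ Bin(., 1/2)
    under H0.  Returns (i, j, Bonferroni-adjusted p, reject?) per pair."""
    r = len(table)
    tests = [(i, j, binom_two_sided(table[i][j], table[i][j] + table[j][i]))
             for i in range(r) for j in range(i + 1, r)]
    m = len(tests)
    return [(i, j, min(1.0, p * m), p * m < alpha) for i, j, p in tests]
```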

17.

Background

The q-value is a widely used statistical method for estimating the false discovery rate (FDR), a conventional significance measure in the analysis of genome-wide expression data. The q-value is a random variable and may underestimate the FDR in practice. An underestimated FDR can lead to unexpected false discoveries in follow-up validation experiments. This issue has not been well addressed in the literature, especially when a permutation procedure is necessary for p-value calculation.

Results

We proposed a statistical method for the conservative adjustment of q-values. In practice, it is usually necessary to calculate p-values by a permutation procedure; this was also considered in our adjustment method. We used simulated data as well as experimental microarray and sequencing data to illustrate the usefulness of our method.

Conclusions

The conservativeness of our approach has been mathematically confirmed in this study. We have demonstrated the importance of the conservative adjustment of q-values, particularly when the proportion of differentially expressed genes is small or the overall differential expression signal is weak.
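For reference, the standard (unadjusted) q-value computation that such work starts from can be sketched as below, using Storey's pi0 estimate at a single tuning point lambda followed by step-up monotonization; the conservative adjustment proposed in the paper is not reproduced here.

```python
def q_values(pvals, lam=0.5):
    """Standard q-value estimate in Storey's style: pi0 is estimated from
    the fraction of p-values above lam, each p-value gets the raw FDR
    estimate pi0 * p * m / rank, and the result is made monotone walking
    from the largest p-value down to the smallest."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    prev = 1.0
    for rank in range(m, 0, -1):   # largest p-value first
        i = order[rank - 1]
        prev = min(prev, pi0 * pvals[i] * m / rank)
        q[i] = prev
    return q
```

Because pi0 is itself a random estimate, q-values computed this way can undershoot the true FDR, which is the underestimation problem the paper's conservative adjustment targets.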

18.
Polymorphisms of the methyl-CpG binding domain 1 (MBD1) gene may influence MBD1 activity on gene expression profiles, thereby modulating individual susceptibility to lung cancer. To test this hypothesis, we investigated the associations between four MBD1 polymorphisms and lung cancer risk in a Chinese population. Single-locus analysis revealed significant associations between two polymorphisms (rs125555 and rs140689) and lung cancer risk (p = 0.011 and p = 0.005, respectively). Since the two polymorphisms were in linkage disequilibrium, further haplotype analyses were performed and revealed a significant association with lung cancer (global test p-value = 0.0041). Our results suggest that MBD1 polymorphisms might be involved in the development of lung cancer. Validation of these findings in larger studies of other populations is needed.

19.
The finger ridge count (a measure of pattern size) is one of the most heritable complex traits studied in humans and has been considered a model human polygenic trait in quantitative genetic analysis. Here, we report the results of the first genome-wide linkage scan for finger ridge count in a sample of 2,114 offspring from 922 nuclear families. Both univariate linkage to the absolute ridge count (a sum of all the ridge counts on all ten fingers), and multivariate linkage analyses of the counts on individual fingers, were conducted. The multivariate analyses yielded significant linkage to 5q14.1 (Logarithm of odds [LOD] = 3.34, pointwise-empirical p-value = 0.00025) that was predominantly driven by linkage to the ring, index, and middle fingers. The strongest univariate linkage was to 1q42.2 (LOD = 2.04, point-wise p-value = 0.002, genome-wide p-value = 0.29). In summary, the combination of univariate and multivariate results was more informative than simple univariate analyses alone. Patterns of quantitative trait loci factor loadings consistent with developmental fields were observed, and the simple pleiotropic model underlying the absolute ridge count was not sufficient to characterize the interrelationships between the ridge counts of individual fingers.

20.
Rohlfs RV, Weir BS. Genetics 2008, 180(3): 1609-1616
It is well established that test statistics and P-values derived from discrete data, such as genetic markers, are also discrete. In most genetic applications, the null distribution for a discrete test statistic is approximated with a continuous distribution, but this approximation may not be reasonable. In some cases using the continuous approximation for the expected null distribution may cause truly null test statistics to appear nonnull. We explore the implications of using continuous distributions to approximate the discrete distributions of Hardy–Weinberg equilibrium test statistics and P-values. We derive exact P-value distributions under the null and alternative hypotheses, enabling a more accurate analysis than is possible with continuous approximations. We apply these methods to biological data and find that using continuous distribution theory with exact tests may underestimate the extent of Hardy–Weinberg disequilibrium in a sample. The implications may be most important for the widespread use of whole-genome case–control association studies and Hardy–Weinberg equilibrium (HWE) testing for data quality control.
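The discrete null distribution in question is fully enumerable: given the allele counts, the heterozygote count follows the classical Levene (1949) conditional distribution, so the exact P-value and its discrete support can be computed directly rather than approximated. A minimal sketch:

```python
from math import factorial

def hwe_het_distribution(n, n_a):
    """Exact conditional distribution of the heterozygote count h given
    n diploid individuals and n_a copies of allele A (Levene 1949):
    P(h) = 2**h * n! * n_a! * n_b! / (n_aa! * h! * n_bb! * (2n)!),
    where n_aa = (n_a - h)/2 and n_bb = (n_b - h)/2."""
    n_b = 2 * n - n_a
    dist = {}
    for h in range(min(n_a, n_b) % 2, min(n_a, n_b) + 1, 2):
        n_aa = (n_a - h) // 2
        n_bb = (n_b - h) // 2
        dist[h] = (2**h * factorial(n) * factorial(n_a) * factorial(n_b)
                   / (factorial(n_aa) * factorial(h) * factorial(n_bb)
                      * factorial(2 * n)))
    return dist

def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact HWE P-value: total probability of heterozygote
    counts no more likely than the observed one.  The discreteness of the
    returned values is exactly the phenomenon the abstract discusses."""
    n = n_aa + n_ab + n_bb
    dist = hwe_het_distribution(n, 2 * n_aa + n_ab)
    p_obs = dist[n_ab]
    return min(1.0, sum(p for p in dist.values() if p <= p_obs + 1e-12))
```

Because the support of this distribution is a short list of heterozygote counts, the attainable P-values form a small finite set, which is why continuous approximations to their null distribution can mislead.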
