Similar articles
 Found 20 similar articles; search took 31 ms
1.
For the analysis of combinations of 2×2 non-contingency tables as obtained from density follow-up studies (relating a number of events to a number of person-years of follow-up), an analogue of the Mantel-Haenszel test for 2×2 contingency tables is widely used. In this paper the small-sample properties of this test, both with and without continuity correction, are evaluated. The improvement of the test statistic obtained by using the first four cumulants via the Edgeworth expansion is also studied. Results on continuity correction agree with similar studies on the Mantel-Haenszel statistic for 2×2 contingency tables: continuity correction gives a p-value which approximates the exact p-value better than the p-value obtained without this correction; both the exact test and its approximations show considerable conservatism in small samples; the uncorrected Mantel-Haenszel test statistic gives a p-value that agrees more closely with the nominal significance level, but can be anti-conservative. The p-value based on the first four cumulants gives a better approximation of the exact p-value than the continuity-corrected test, especially when the distribution has marked skewness.
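
As an illustration of the kind of statistic this abstract evaluates, the following is a minimal sketch of the textbook person-time analogue of the Mantel-Haenszel test, with an optional continuity correction. It assumes that, within each stratum, the exposed event count given the stratum's total events is binomial with probability equal to the exposed share of person-years. The function name `mh_person_time` and the input layout are hypothetical, and the Edgeworth-based refinement studied in the paper is not shown.

```python
import math


def mh_person_time(strata, continuity=True):
    """Mantel-Haenszel-type chi-square test for stratified person-time data.

    strata: iterable of (a, t1, b, t0) tuples, where a and b are event counts
    and t1 and t0 are person-years in the exposed and unexposed groups.
    Returns (chi-square statistic with 1 df, approximate two-sided p-value).
    """
    observed = expected = variance = 0.0
    for a, t1, b, t0 in strata:
        m = a + b            # total events in the stratum
        t = t1 + t0          # total person-years in the stratum
        observed += a
        expected += m * t1 / t              # null mean of a given m
        variance += m * t1 * t0 / t ** 2    # null (binomial) variance of a
    diff = abs(observed - expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)         # continuity correction
    chi2 = diff ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    data = [(10, 100.0, 5, 100.0), (4, 50.0, 3, 80.0)]  # made-up strata
    print(mh_person_time(data, continuity=False))
    print(mh_person_time(data, continuity=True))
```

The corrected statistic is never larger than the uncorrected one, so its p-value is never smaller, which matches the conservatism described in the abstract.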

2.
3.
For genome-wide association studies in family-based designs, a new, universally applicable approach is proposed. Using a modified Liptak’s method, we combine the p-value of the family-based association test (FBAT) statistic with the p-value for the Van Steen-statistic. The Van Steen-statistic is independent of the FBAT-statistic and utilizes information that is ignored by traditional FBAT-approaches. The new test statistic takes advantage of all available information about the genetic association, while, by virtue of its design, it achieves complete robustness against confounding due to population stratification. The approach is suitable for the analysis of almost any trait type for which FBATs are available, e.g. binary, continuous, time-to-onset, multivariate, etc. The efficiency and the validity of the new approach depend on the specification of a nuisance/tuning parameter and the weight parameters in the modified Liptak’s method. For different trait types and ascertainment conditions, we discuss general guidelines for the optimal specification of the tuning parameter and the weight parameters. Our simulation experiments and an application to an Alzheimer study show the validity and the efficiency of the new method, which achieves power levels that are comparable to those of population-based approaches.
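
To make the combination step concrete, here is a minimal sketch of the plain weighted Liptak (weighted Z) method for independent one-sided p-values. The abstract refers to a modified version whose details are not given here, so this shows only the standard form; the function name `liptak_combine` and the example weights are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def liptak_combine(p_values, weights):
    """Combine independent one-sided p-values with the weighted Z method.

    Each p-value (strictly between 0 and 1) is mapped to a standard normal
    quantile; the weighted sum is rescaled to unit variance under the null.
    """
    nd = NormalDist()
    z = sum(w * nd.inv_cdf(1.0 - p) for p, w in zip(p_values, weights))
    z /= sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(z)


if __name__ == "__main__":
    # Two hypothetical p-values, the first weighted twice as heavily.
    print(liptak_combine([0.01, 0.20], [2.0, 1.0]))
```

With a single p-value the function returns that p-value unchanged, and the choice of weights only affects power, not validity, as long as the inputs are independent and the weights are fixed in advance.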

4.
Multiple testing (MT) with false discovery rate (FDR) control has been widely conducted in the “discrete paradigm” where p-values have discrete and heterogeneous null distributions. However, in this scenario existing FDR procedures often lose some power and may yield unreliable inference, and for this scenario there does not seem to be an FDR procedure that partitions hypotheses into groups, employs data-adaptive weights and is nonasymptotically conservative. We propose a weighted p-value-based FDR procedure, “weighted FDR (wFDR) procedure” for short, for MT in the discrete paradigm that efficiently adapts to both heterogeneity and discreteness of p-value distributions. We theoretically justify the nonasymptotic conservativeness of the wFDR procedure under independence, and show via simulation studies that, for MT based on p-values of the binomial test or Fisher's exact test, it is more powerful than six other procedures. The wFDR procedure is applied to two examples based on discrete data, a drug safety study and a differential methylation study, where it makes more discoveries than two existing methods.

5.
Increasing locations are often accompanied by an increase in variability. In this case apparent heteroscedasticity can indicate that there are treatment effects, and it is appropriate to consider an alternative involving differences in location as well as in scale. As a location‐scale test the sum of a location and a scale test statistic can be used. However, the power can be raised by weighting the sum. In order to select values for this weighting, an adaptive design with an interim analysis is proposed: the data of the first stage are used to calculate the weights, and with the second stage's data a weighted location‐scale test is carried out. The p‐values of the two stages are combined through Fisher's combination test. With a Lepage‐type location‐scale test it is illustrated that the resultant adaptive test can be more powerful than the ‘optimum’ test with no interim analysis. The principle of using first-stage data to calculate weights that cannot reasonably be chosen a priori may also be useful for other tests which utilize weighted statistics. Furthermore, the proposed test is illustrated with an example from experimental ecology.

6.

Background

Evaluating the significance of a group of genes or proteins in a pathway or biological process for a disease could help researchers understand the mechanism of the disease. For example, identifying related pathways or gene functions for chromatin states of tumor-specific T cells will help determine whether T cells can be reprogrammed or not, and further help design the cancer treatment strategy. Some existing p-value combination methods can be used in this scenario. However, these methods suffer from different disadvantages, and thus it is still challenging to design more powerful and robust statistical methods.

Results

The existing group combined p-value (GCP) method first partitions p-values into several groups using a set of truncation points, but the method is often sensitive to these truncation points. Another method, the adaptive rank truncated product (ARTP) method, makes use of multiple truncation integers to adaptively combine the smallest p-values, but it loses statistical power since it ignores the larger p-values. To tackle these problems, we propose a robust p-value combination method (rPCMP) that considers multiple partitions of p-values with different sets of truncation points. The proposed rPCMP statistic has a three-layer hierarchical structure. The inner layer considers a statistic which combines p-values in a specified interval defined by two threshold points, the intermediate layer uses a GCP statistic which optimizes the statistic from the inner layer for a partition set of threshold points, and the outer layer integrates the GCP statistics from multiple partitions of p-values. The empirical distribution of the statistic under the null hypothesis can be estimated by a permutation procedure.

Conclusions

Our proposed rPCMP method has been shown to be more robust and to have higher statistical power. A simulation study shows that our method can effectively control the type I error rate and has higher statistical power than the existing methods. We finally apply our rPCMP method to an ATAC-seq dataset to discover the gene functions related to chromatin states in mouse tumor T cells.

7.
Relatively little is known about the genetic aberrations of conjunctival melanomas (CoM) and their correlation with clinical and histomorphological features as well as prognosis. The aim of this large collaborative multicenter study was to determine potential key biomarkers for metastatic risk and any druggable targets for high metastatic risk CoM. Using Affymetrix single nucleotide polymorphism genotyping arrays on 59 CoM, we detected frequent amplifications on chromosome (chr) 6p and deletions on 7q, and characterized mutation‐specific copy number alterations. Deletions on chr 10q11.21‐26.2, a region harboring the tumor suppressor genes, PDCD4, SUFU, NEURL1, PTEN, RASSF4, DMBT1, and C10orf90 and C10orf99, significantly correlated with metastasis (Fisher's exact, p ≤ 0.04), lymphatic invasion (Fisher's exact, p ≤ 0.02), increasing tumor thickness (Mann–Whitney, p ≤ 0.02), and BRAF mutation (Fisher's exact, p ≤ 0.05). This enhanced insight into CoM biology is a step toward identifying patients at risk of metastasis and potential therapeutic targets for systemic disease.

8.
A two-tailed P-value is proposed for testing two-sided departures from Hardy-Weinberg equilibrium at a diallelic locus. The calculation of P uses the exact conditional distribution of the test statistic P, the observed number of heterozygotes in the sample. The proposed P-value is always two-tailed, unlike other P-values proposed in the literature.

9.
We have a statistic for assessing an observed data point relative to a statistical model but find that its distribution function depends on the parameter. To obtain the corresponding p-value, we require the minimally modified statistic that is ancillary; this process is called Studentization. We use recent likelihood theory to develop a maximal third-order ancillary; this gives immediately a candidate Studentized statistic. We show that the corresponding p-value is higher-order Un(0, 1), is equivalent to a repeated bootstrap version of the initial statistic and agrees with a special Bayesian modification of the original statistic. More importantly, the modified statistic and p-value are available by Markov chain Monte Carlo simulations and, in some cases, by higher-order approximation methods. Examples, including the Behrens–Fisher problem, are given to indicate the ease and flexibility of the approach.

10.
Given independent multivariate random samples {Xij: j = 1, …, ni} from Fi, for i = 1,2, a test is desired for H0: F1 = F2 against general alternatives. Consider the k · (n1 + n2) possible ways of choosing one observation from the combined samples and then one of its k nearest neighbors, and let Sk be the proportion of these choices in which the point and neighbor are in the same sample. Schilling (1986) proposed Sk as a test statistic, but did not indicate how to determine k. We suggest as test statistic W = N Σ kSk, which we show is equivalent to a sum of N Wilcoxon rank sums, and also to a sum of two two-sample U-statistics of degrees (1, 2) and (2, 1). Simulation with multivariate normal data suggests that our test is generally more powerful than Schilling's test using k = 1, 2, or 3. We illustrate its use with Fisher's iris data.
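
The definition of Sk given in this abstract can be sketched directly. The code below is a minimal brute-force illustration assuming Euclidean distance and arbitrary (index-order) tie-breaking, neither of which is specified in the abstract; the function name `same_sample_proportion` is hypothetical. The combined statistic W is not implemented because the range of its sum is not stated here.

```python
def same_sample_proportion(sample1, sample2, k):
    """Proportion of (point, neighbor) pairs that share a sample label.

    Every point in the pooled data is paired with each of its k nearest
    neighbors (squared Euclidean distance), giving k * (n1 + n2) pairs.
    """
    points = [(tuple(x), 0) for x in sample1] + [(tuple(x), 1) for x in sample2]
    n = len(points)
    same = 0
    for i, (xi, label_i) in enumerate(points):
        # Distances from point i to every other pooled point, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, (xj, _) in enumerate(points)
            if j != i
        )
        same += sum(points[j][1] == label_i for _, j in dists[:k])
    return same / (k * n)


if __name__ == "__main__":
    first = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
    second = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(same_sample_proportion(first, second, k=2))
```

Values of Sk near 1 indicate that the two samples occupy different regions, while values near the proportion expected under random labelling are consistent with F1 = F2.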

11.
Li MX  Yeung JM  Cherny SS  Sham PC 《Human genetics》2012,131(5):747-756
Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (M_e) for the adjustment of multiple testing, but current methods of calculation for M_e are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate M_e. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the M_e, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10^−7 as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ~5 × 10^−8 for current or merged commercial genotyping arrays, ~10^−8 for all common SNPs in the 1000 Genomes Project dataset and ~5 × 10^−8 for the common SNPs only within genes.
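
The link between an effective number of independent tests and a significance threshold can be illustrated with two standard corrections. This is a sketch only: it assumes the threshold is obtained by a Bonferroni or Šidák adjustment applied to the effective number of markers, which the abstract does not state, and it does not reproduce how GEC estimates that number. The function names are hypothetical.

```python
import math


def bonferroni_threshold(effective_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate below alpha."""
    return alpha / effective_tests


def sidak_threshold(effective_tests, alpha=0.05):
    """Sidak threshold 1 - (1 - alpha)**(1 / M), computed stably for large M."""
    return -math.expm1(math.log1p(-alpha) / effective_tests)


if __name__ == "__main__":
    m_e = 1_000_000  # made-up effective number of independent markers
    print(bonferroni_threshold(m_e))
    print(sidak_threshold(m_e))
```

Under these assumptions, an effective number of about one million independent markers corresponds to a threshold of about 5 × 10^−8.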

12.
The increasing interest in subpopulation analysis has led to the development of various new trial designs and analysis methods in the fields of personalized medicine and targeted therapies. In this paper, subpopulations are defined in terms of an accumulation of disjoint population subsets and will therefore be called composite populations. The proposed trial design is applicable to any set of composite populations, considering normally distributed endpoints and random baseline covariates. Treatment effects for composite populations are tested by combining p-values, calculated on the subset levels, using the inverse normal combination function to generate test statistics for those composite populations, while the closed testing procedure accounts for multiple testing. Critical boundaries for intersection hypothesis tests are derived using multivariate normal distributions, reflecting the joint distribution of composite population test statistics given that no treatment effect exists. For sample size calculation and sample size recalculation, multivariate normal distributions are derived which describe the joint distribution of composite population test statistics under an assumed alternative hypothesis. Simulations demonstrate the absence of any practically relevant inflation of the type I error rate. The target power after sample size recalculation is typically met or close to being met.

13.
With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data.

Software Availability

A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
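
The thresholded null hypothesis described in this abstract can be sketched with the standard Fisher Z approximation. This is not CorSig's actual code: it is a minimal illustration assuming that atanh(r) is approximately normal with mean atanh(ρ) and standard error 1/sqrt(n − 3), and the function name `pcc_exceeds_cutoff_pvalue` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pcc_exceeds_cutoff_pvalue(r, n, tau=0.0):
    """One-sided p-value for H0: rho <= tau versus H1: rho > tau.

    r:   observed Pearson correlation (strictly between -1 and 1)
    n:   number of paired observations (must exceed 3)
    tau: user-specified correlation cutoff
    """
    z = (atanh(r) - atanh(tau)) * sqrt(n - 3)
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Same observed correlation tested against two different cutoffs.
    print(pcc_exceeds_cutoff_pvalue(0.9, 30, tau=0.0))
    print(pcc_exceeds_cutoff_pvalue(0.9, 30, tau=0.5))
```

Raising the cutoff τ makes the same observed r less significant, which is how a single p-value can reflect both correlation strength and sample size.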

14.
Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this ‘co-piloting’ currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.

15.
This paper is concerned with the power behaviour of four goodness-of-fit test statistics in sparse multinomials with k cells. Most previous work has been concerned only with Pearson's X^2 and the likelihood ratio test statistics. We consider in this study two additional test statistics, namely, the Cressie-Read test statistic I(2/3) and the modified Freeman-Tukey (FT) test statistic. Because k ≥ 10 in this study, a Monte Carlo procedure based on 1000 simulated samples is used to estimate the powers for the four test statistics. Alternatives on various line segments are employed. Results suggest that none of the test statistics completely dominates the others and that the choice of which test to use depends on the nature of the alternative hypothesis. These results are consistent with those obtained by West and Kempthorne (1972), although Pearson's X^2 test statistic may be preferred because of its closer approximation to the χ^2 distribution in terms of the attained α levels.
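
Three of the statistics compared in this abstract, and a close relative of the fourth, are members of the Cressie-Read power-divergence family, which the sketch below computes for a chosen index λ. It assumes the standard parameterization in which λ = 1 gives Pearson's X^2, λ → 0 gives the likelihood ratio statistic, λ = 2/3 gives I(2/3) and λ = −1/2 gives the Freeman-Tukey statistic; the paper's "modified" Freeman-Tukey variant may differ from the λ = −1/2 member. The function name `power_divergence` is hypothetical.

```python
from math import log


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic for multinomial counts.

    observed, expected: cell counts with equal totals (expected > 0).
    lam: family index, restricted here to lam > -1.
    """
    if lam <= -1:
        raise ValueError("this sketch only supports lam > -1")
    if lam == 0:
        # Likelihood ratio statistic; empty cells contribute nothing.
        return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    # O * ((O / E)**lam - 1) rewritten so that empty cells are handled safely.
    total = sum(o ** (lam + 1) * e ** (-lam) - o for o, e in zip(observed, expected))
    return 2.0 * total / (lam * (lam + 1))


if __name__ == "__main__":
    obs = [10, 20, 30]
    exp = [20.0, 20.0, 20.0]
    for lam in (1, 0, 2 / 3, -0.5):
        print(lam, power_divergence(obs, exp, lam))
```

All members share the same asymptotic chi-square distribution with k − 1 degrees of freedom under a fully specified null, so differences between them show up mainly in sparse tables such as those studied here.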

16.
We propose a measure of multivariate kurtosis suggested from Mardia's measure of multivariate skewness b1,p, and examine its relationship both to Mardia's measure of multivariate kurtosis b2,p, and to a smooth test of multivariate kurtosis ǔ42.

17.
MISRA (1978) sets confidence intervals for a double linear compound of multivariate normal regression coefficients by using ROY'S maximum root test criterion. The exact test statistic to be used is STUDENT'S t. The t statistic gives narrower confidence bounds than those given by ROY's maximum root statistic. A result given by MORRISON (1975, p. 18, equation 10) for profile analysis is also obtained by using the STUDENT'S t test.

18.
The problem of combining p-values from independent experiments is discussed. It is shown that Fisher's solution to the problem can be derived from a “weight-free” method that has been suggested for the purpose of ranking vector observations (Biometrics 19: 85–97, 1963). The method implies that the value p = 0.37 is a critical one: p-values below 0.37 suggest that the null hypothesis is more likely to be false, whereas p-values above 0.37 suggest that it is more likely to be true.
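
Fisher's solution referred to here can be sketched as follows. The code assumes independent p-values and uses the closed form of the chi-square upper tail for even degrees of freedom, so no external library is needed; the function name `fisher_combine` is hypothetical. One standard way to see why a value near 0.37 is special, which may or may not be the argument used in the paper: each term −2 ln p has null mean 2, and −2 ln p exceeds 2 exactly when p < 1/e ≈ 0.368.

```python
from math import exp, log


def fisher_combine(p_values):
    """Fisher's combined p-value for independent p-values in (0, 1].

    The statistic x = -2 * sum(ln p_i) is chi-square with 2k df under the
    null, whose upper tail is exp(-x/2) * sum_{j<k} (x/2)**j / j!.
    """
    k = len(p_values)
    half = -sum(log(p) for p in p_values)  # equals x / 2
    term = total = 1.0
    for j in range(1, k):
        term *= half / j                   # (x/2)**j / j!
        total += term
    return exp(-half) * total


if __name__ == "__main__":
    print(fisher_combine([0.05, 0.05]))
    print(fisher_combine([exp(-1.0), exp(-1.0)]))
```

A p-value below 1/e pushes the statistic above its null expectation and a p-value above 1/e pulls it below, so inputs near 0.37 are close to neutral in the combination.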

19.
Li Q  Yu K  Li Z  Zheng G 《Human genetics》2008,123(6):617-623
In genome-wide association studies (GWAS), single-marker analysis is usually employed to identify the most significant single nucleotide polymorphisms (SNPs). The trend test has been proposed for analysis of case-control association. Three trend tests, optimal for the recessive, additive and dominant models respectively, are available. When the underlying genetic model is unknown, the maximum of the three trend test results (MAX) has been shown to be robust against genetic model misspecification. Since the asymptotic distribution of MAX depends on the allele frequency of the SNP, ranking SNPs by the P-value of MAX may give different results from ranking them by the MAX statistic. Calculating the P-value of MAX for 300,000 (300 K) or more SNPs is computationally intensive, and software to obtain the P-value of MAX is not widely available. On the other hand, the MAX statistic is very easy to calculate without complex computer programs. Thus, we study whether or not one could use the MAX statistic instead of its P-value to rank SNPs in GWAS. The approaches using the MAX and its P-value to rank SNPs are referred to as MAX-rank and P-rank. By applying MAX-rank and P-rank to simulated and four real datasets from GWAS, we found the ranks of SNPs with true association are very similar using both approaches. Thus, we recommend using MAX-rank for genome-wide scans. After the top-ranked SNPs are identified, their P-values based on MAX can be calculated and compared with the significance level. The work of Q. Li was partially supported by the Knowledge Innovation Program of the Chinese Academy of Sciences, No. 30465W0 and 30475V0. The research of Z Li was partially sponsored by NIH grant EY014478.

20.
Regarding the paper “Stratified Fisher's exact test and its sample size calculation” by Sin‐Ho Jung, Biometrical Journal (2014) 56 (1): 129–140. Article: http://dx.doi.org/10.1002/bimj.201300048
