Related Articles (20 results)
1.
Multipoint (MP) linkage analysis represents a valuable tool for whole-genome studies but suffers from the disadvantage that its probability distribution is unknown and varies as a function of marker information and density, genetic model, number and structure of pedigrees, and the affection status distribution [Xing and Elston: Genet Epidemiol 2006;30:447-458; Hodge et al.: Genet Epidemiol 2008;32:800-815]. This implies that the MP significance criterion can differ for each marker and each dataset, and this fact makes planning and evaluation of MP linkage studies difficult. One way to circumvent this difficulty is to use simulations or permutation testing. Another approach is to use an alternative statistical paradigm to assess the statistical evidence for linkage, one that does not require computation of a p value. Here we show how to use the evidential statistical paradigm for planning, conducting, and interpreting MP linkage studies when the disease model is known (lod analysis) or unknown (mod analysis). As a key feature, the evidential paradigm decouples uncertainty (i.e. error probabilities) from statistical evidence. In the planning stage, the user calculates error probabilities, as functions of one's design choices (sample size, choice of alternative hypothesis, choice of likelihood ratio (LR) criterion k) in order to ensure a reliable study design. In the data analysis stage one no longer pays attention to those error probabilities. In this stage, one calculates the LR for two simple hypotheses (i.e. trait locus is unlinked vs. trait locus is located at a particular position) as a function of the parameter of interest (position). The LR directly measures the strength of evidence for linkage in a given data set and remains completely divorced from the error probabilities calculated in the planning stage. An important consequence of this procedure is that one can use the same criterion k for all analyses. This contrasts with the situation described above, in which the value one uses to conclude significance may differ for each marker and each dataset in order to accommodate a fixed test size, α. In this study we accomplish two goals that lead to a general algorithm for conducting evidential MP linkage studies. (1) We provide two theoretical results that translate into guidelines for investigators conducting evidential MP linkage: (a) Comparing mods to lods, error rates (including probabilities of weak evidence) are generally higher for mods when the null hypothesis is true, but lower for mods in the presence of true linkage. Royall [J Am Stat Assoc 2000;95:760-780] has shown that errors based on lods are bounded and generally small. Therefore when the true disease model is unknown and one chooses to use mods, one needs to control misleading evidence rates only under the null hypothesis; (b) for any given pair of contiguous marker loci, error rates under the null are greatest at the midpoint between the markers spaced furthest apart, which provides an obvious simple alternative hypothesis to specify for planning MP linkage studies. (2) We demonstrate through extensive simulation that this evidential approach can yield low error rates under the null and alternative hypotheses for both lods and mods, despite the fact that mod scores are not true LRs. Using these results we provide a coherent approach to implement a MP linkage study using the evidential paradigm.
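A minimal planning-stage sketch of the evidential calculations described above, using i.i.d. normal observations as a stand-in for a linkage likelihood ratio (not the MP linkage likelihood itself): for a chosen LR criterion k, it estimates by simulation the probabilities of misleading, favorable, and weak evidence under the null and under a chosen alternative. All function names and numerical settings are illustrative assumptions; under the null, the misleading-evidence rate is bounded by 1/k (Royall).

```python
import numpy as np

rng = np.random.default_rng(2024)

def lr_simple_vs_simple(x, mu0=0.0, mu1=0.5, sigma=1.0):
    """Likelihood ratio L(mu1)/L(mu0) for i.i.d. normal data (stand-in for a linkage LR)."""
    loglr = (np.sum((x - mu0) ** 2) - np.sum((x - mu1) ** 2)) / (2.0 * sigma ** 2)
    return np.exp(loglr)

def error_probs(n, k=32.0, mu_true=0.0, mu0=0.0, mu1=0.5, reps=20000):
    """Planning-stage error probabilities for LR criterion k under a given truth."""
    lrs = np.array([lr_simple_vs_simple(rng.normal(mu_true, 1.0, n), mu0, mu1)
                    for _ in range(reps)])
    return {"P(strong for H1)": np.mean(lrs >= k),
            "P(strong for H0)": np.mean(lrs <= 1.0 / k),
            "P(weak evidence)": np.mean((lrs > 1.0 / k) & (lrs < k))}

for n in (30, 60, 120):
    under_null = error_probs(n, k=32.0, mu_true=0.0)   # misleading-evidence rate <= 1/k = 0.031
    under_alt = error_probs(n, k=32.0, mu_true=0.5)
    print(n, under_null, under_alt)
```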

2.
Strug LJ, Hodge SE. Human Heredity 2006;61(4):200-209
The 'multiple testing problem' currently bedevils the field of genetic epidemiology. Briefly stated, this problem arises with the performance of more than one statistical test and results in an increased probability of committing at least one Type I error. The accepted/conventional way of dealing with this problem is based on the classical Neyman-Pearson statistical paradigm and involves adjusting one's error probabilities. This adjustment is, however, problematic because in the process of doing that, one is also adjusting one's measure of evidence. Investigators have actually become wary of looking at their data, for fear of having to adjust the strength of the evidence they observed at a given locus on the genome every time they conduct an additional test. In a companion paper in this issue (Strug & Hodge I), we presented an alternative statistical paradigm, the 'evidential paradigm', to be used when planning and evaluating linkage studies. The evidential paradigm uses the lod score as the measure of evidence (as opposed to a p value), and provides new, alternatively defined error probabilities (alternative to Type I and Type II error rates). We showed how this paradigm separates or decouples the two concepts of error probabilities and strength of the evidence. In the current paper we apply the evidential paradigm to the multiple testing problem - specifically, multiple testing in the context of linkage analysis. We advocate using the lod score as the sole measure of the strength of evidence; we then derive the corresponding probabilities of being misled by the data under different multiple testing scenarios. We distinguish two situations: performing multiple tests of a single hypothesis, vs. performing a single test of multiple hypotheses. For the first situation the probability of being misled remains small regardless of the number of times one tests the single hypothesis, as we show. For the second situation, we provide a rigorous argument outlining how replication samples themselves (analyzed in conjunction with the original sample) constitute appropriate adjustments for conducting multiple hypothesis tests on a data set.

3.
Zhang SD. PLoS ONE 2011;6(4):e18874
BACKGROUND: Biomedical researchers are now often faced with situations where it is necessary to test a large number of hypotheses simultaneously, e.g., in comparative gene expression studies using high-throughput microarray technology. To properly control false positive errors, the FDR (false discovery rate) approach has become widely used in multiple testing. Accurate estimation of the FDR requires that the proportion of true null hypotheses be accurately estimated. To date, many methods for estimating this quantity have been proposed. Typically, when a new method is introduced, some simulations are carried out to show its improved accuracy. However, these simulations are often limited, covering only a few points in the parameter space. RESULTS: Here I have carried out extensive in silico experiments to compare some commonly used methods for estimating the proportion of true null hypotheses. The coverage of these simulations is unprecedentedly thorough over the parameter space compared to typical simulation studies in the literature. This work therefore enables global conclusions to be drawn about the performance of these different methods. It was found that a very simple method gives the most accurate estimation over a dominantly large area of the parameter space. Given its simplicity and its overall superior accuracy, I recommend its use as the first choice for estimating the proportion of true null hypotheses in multiple testing.
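The estimand discussed above, the proportion of true null hypotheses, can be illustrated with one widely used estimator, Storey's λ-threshold estimator. This is offered only as a sketch of the quantity being estimated; it is not necessarily the "very simple method" the author recommends, and the function name and toy data are assumptions.

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses:
    pi0_hat = #{p_i > lambda} / (m * (1 - lambda)), truncated at 1."""
    p = np.asarray(pvalues)
    m = p.size
    return min(1.0, np.sum(p > lam) / (m * (1.0 - lam)))

# Toy check: 8000 true nulls (uniform p-values) + 2000 signals (small p-values).
rng = np.random.default_rng(1)
p_null = rng.uniform(size=8000)
p_alt = rng.beta(0.2, 5.0, size=2000)        # concentrated near zero
pi0_hat = storey_pi0(np.concatenate([p_null, p_alt]))
print(round(pi0_hat, 3))                     # should be close to the true pi0 = 0.8
```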

4.
This paper presents a look at the underused procedure of testing for Type II errors when "negative" results are encountered during research. It recommends setting a statistical alternative hypothesis based on anthropologically derived information and calculating the probability of committing this type of error. In this manner, the process is similar to that used for testing Type I errors, which is clarified by examples from the literature. It is hoped that researchers will use the information presented here as a means of attaching levels of probability to acceptance of null hypotheses.
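A hedged sketch of the kind of calculation the paper advocates: fix an anthropologically motivated alternative (here a hypothetical mean difference), then compute the probability of a Type II error for a two-sample comparison at a given α using a large-sample normal approximation. The numbers are purely illustrative.

```python
from math import sqrt
from scipy.stats import norm

def type_ii_error(delta, sigma, n_per_group, alpha=0.05):
    """Approximate Type II error (beta) for a two-sided two-sample z-test
    when the true mean difference is delta and the common SD is sigma."""
    se = sigma * sqrt(2.0 / n_per_group)
    z_crit = norm.ppf(1.0 - alpha / 2.0)
    ncp = delta / se                                  # standardized true difference
    return norm.cdf(z_crit - ncp) - norm.cdf(-z_crit - ncp)

# Example: a 2 mm difference in a cranial measurement (hypothetical alternative),
# SD 4 mm, 30 individuals per group.
beta = type_ii_error(delta=2.0, sigma=4.0, n_per_group=30, alpha=0.05)
print(f"P(Type II error) = {beta:.2f}, power = {1 - beta:.2f}")
```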

5.

Background  

Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness-of-fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data often exhibit complex correlation structures. Accurate type I error control that adjusts for multiple testing requires the joint null distribution of test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of their computational ease and intuitive interpretation.
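A simplified sketch of the ingredients described above, not Storey et al.'s implementation: per gene, a null (constant) fit is compared with a cubic-polynomial fit in time (standing in for natural cubic splines) via an F-type goodness-of-fit statistic, and the joint null distribution is approximated by permuting the time labels, which keeps the gene-gene correlation intact. Array names and settings are assumptions, and permuting time points is only valid under the no-time-course null with exchangeable time points.

```python
import numpy as np

rng = np.random.default_rng(7)
n_genes, times = 200, np.arange(10, dtype=float)     # 10 time points
Y = rng.normal(size=(n_genes, times.size))           # toy expression matrix
Y[:20] += 0.8 * np.sin(times / 3.0)                  # 20 genes with a genuine time trend

def gof_stat(Y, t):
    """F-type statistic comparing a cubic time trend (alternative) to a constant mean (null)."""
    X = np.vander(t, 4, increasing=True)              # columns: 1, t, t^2, t^3
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    rss_alt = np.sum((Y.T - X @ beta) ** 2, axis=0)
    rss_null = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return (rss_null - rss_alt) / rss_alt

obs = gof_stat(Y, times)

# Permute the time labels to approximate the joint null distribution;
# permuting whole columns keeps the gene-gene correlation structure intact.
B, exceed = 500, np.zeros(n_genes)
for _ in range(B):
    perm = rng.permutation(times.size)
    exceed += gof_stat(Y[:, perm], times) >= obs
pvals = (exceed + 1) / (B + 1)
print(np.sum(pvals < 0.05))
```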

6.
I analyze and summarize the empirical evidence supporting alternative hypotheses posed to explain the evolution of rodent group-living. Eight hypotheses are considered: two rely on net fitness benefits to individuals, five rely on ecological and life-history constraints, and one uses elements of both. I expose the logic behind each hypothesis, identify its key predictions, examine how the available evidence on rodent socioecology supports or rejects its predictions, and identify some priorities for future research. I show that empirical support for most hypotheses is meager due to a lack of relevant studies. Also, empirical support for a particular hypothesis, when it exists, comes from studies of the same species used to formulate the original hypothesis. Two exceptions are the hypothesis that individual rodents live in groups to reduce their predation risk and the hypothesis that group-living was adopted by individuals to reduce their cost of thermoregulation. Finally, most hypotheses have been examined without regard to competing hypotheses and often in a restricted taxonomic context. This is clearly an unfortunate situation given that most competing hypotheses are not mutually exclusive. I suggest that in the future comparative approaches should be used. These studies should examine simultaneously the relevance of different benefits and constraints hypothesized to explain the evolution of rodent sociality.

7.
In this paper, we propose a Bayesian design framework for a biosimilars clinical program that entails conducting concurrent trials in multiple therapeutic indications to establish equivalent efficacy for a proposed biologic compared to a reference biologic in each indication to support approval of the proposed biologic as a biosimilar. Our method facilitates information borrowing across indications through the use of a multivariate normal correlated parameter prior (CPP), which is constructed from easily interpretable hyperparameters that represent direct statements about the equivalence hypotheses to be tested. The CPP accommodates different endpoints and data types across indications (e.g., binary and continuous) and can, therefore, be used in a wide context of models without having to modify the data (e.g., rescaling) to provide reasonable information-borrowing properties. We illustrate how one can evaluate the design using Bayesian versions of the type I error rate and power with the objective of determining the sample size required for each indication such that the design has high power to demonstrate equivalent efficacy in each indication, reasonably high power to demonstrate equivalent efficacy simultaneously in all indications (i.e., globally), and reasonable type I error control from a Bayesian perspective. We illustrate the method with several examples, including designing biosimilars trials for follicular lymphoma and rheumatoid arthritis using binary and continuous endpoints, respectively.

8.
This paper addresses methodological issues concerning the sample size required for statistical evaluation of bridging evidence for registration of pharmaceutical products in a new region. The bridging data can be either in the Complete Clinical Data Package (CCDP) generated during clinical drug development for submission to the original region, or from a bridging study conducted in the new region after the pharmaceutical product was approved in the original region. When the data are in the CCDP, a randomized parallel dose-response design stratified by ethnic factors and region will generate internally valid data for concurrently evaluating similarity between the regions and assessing the ability to extrapolate to the new region. A formula for the sample size under this design is derived. The required sample size for evaluation of similarity between the regions can be at least four times as large as that needed for evaluation of treatment effects only. For a bridging study conducted in the new region, in which the data of the foreign and new regions are not generated concurrently, a hierarchical model approach to incorporating the foreign bridging information into the data generated by the bridging study is suggested, and the required sample size is evaluated. In general, the required sample size for the bridging trials in the new region is inversely proportional to the equivalence limits, the variability of the primary endpoints, and the number of patients in the trials conducted in the original region.
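The paper derives its own sample-size formula; as a hedged illustration of how equivalence-type sample sizes behave, the sketch below uses a standard large-sample two-one-sided-tests (TOST) approximation and contrasts it with an ordinary superiority calculation. The margin, SD, and assumed true difference are illustrative assumptions; note how the required n grows as the assumed true difference approaches the equivalence limit.

```python
from math import ceil
from scipy.stats import norm

def n_equivalence(sigma, margin, true_diff=0.0, alpha=0.05, power=0.80):
    """Approximate per-group n for a TOST equivalence comparison of two means
    (large-sample normal approximation)."""
    z_a = norm.ppf(1.0 - alpha)
    z_b = norm.ppf(1.0 - (1.0 - power) / 2.0)
    return ceil(2.0 * sigma ** 2 * (z_a + z_b) ** 2 / (margin - abs(true_diff)) ** 2)

def n_superiority(sigma, effect, alpha=0.05, power=0.80):
    """Approximate per-group n to detect a difference `effect` in a two-sided test."""
    z_a = norm.ppf(1.0 - alpha / 2.0)
    z_b = norm.ppf(power)
    return ceil(2.0 * sigma ** 2 * (z_a + z_b) ** 2 / effect ** 2)

print(n_superiority(sigma=1.0, effect=0.5))                 # detect a 0.5-SD treatment effect
print(n_equivalence(sigma=1.0, margin=0.5, true_diff=0.0))  # similarity, no true difference
print(n_equivalence(sigma=1.0, margin=0.5, true_diff=0.2))  # similarity, small true difference
```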

9.
For genetic association studies with multiple phenotypes, we propose a new strategy for multiple testing with family-based association tests (FBATs). The strategy increases the power by both using all available family data and reducing the number of hypotheses tested while being robust against population admixture and stratification. By use of conditional power calculations, the approach screens all possible null hypotheses without biasing the nominal significance level, and it identifies the subset of phenotypes that has optimal power when tested for association by either univariate or multivariate FBATs. An application of our strategy to an asthma study shows the practical relevance of the proposed methodology. In simulation studies, we compare our testing strategy with standard methodology for family studies. Furthermore, the proposed principle of using all data without biasing the nominal significance in an analysis prior to the computation of the test statistic has broad and powerful applications in many areas of family-based association studies.

10.
In many phase III clinical trials, it is desirable to separately assess the treatment effect on two or more primary endpoints. Consider the MERIT-HF study, where two endpoints of primary interest were time to death and the earliest of time to first hospitalization or death (The International Steering Committee on Behalf of the MERIT-HF Study Group, 1997, American Journal of Cardiology 80[9B], 54J-58J). It is possible that treatment has no effect on death but a beneficial effect on first hospitalization time, or it has a detrimental effect on death but no effect on hospitalization. A good clinical trial design should permit early stopping as soon as the treatment effect on both endpoints becomes clear. Previous work in this area has not resolved how to stop the study early when one or more endpoints have no treatment effect or how to assess and control the many possible error rates for concluding wrong hypotheses. In this article, we develop a general methodology for group sequential clinical trials with multiple primary endpoints. This method uses a global alpha-spending function to control the overall type I error and a multiple decision rule to control error rates for concluding wrong alternative hypotheses. The method is demonstrated with two simulated examples based on the MERIT-HF study.
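A hedged sketch of the basic single-endpoint ingredient of such designs, the alpha-spending function: Lan-DeMets O'Brien-Fleming-type and Pocock-type spending evaluated at a set of interim information fractions, showing the increment of alpha available at each look. The paper's global alpha-spending function across multiple endpoints and its multiple decision rule are beyond this sketch; the information fractions are assumptions.

```python
from math import exp, log, sqrt
from scipy.stats import norm

def obf_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: alpha*(t) = 2 - 2*Phi(z_{1-alpha/2}/sqrt(t))."""
    return 2.0 - 2.0 * norm.cdf(norm.ppf(1.0 - alpha / 2.0) / sqrt(t))

def pocock_spending(t, alpha=0.05):
    """Pocock-type spending: alpha*(t) = alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1.0 + (exp(1.0) - 1.0) * t)

info_times = [0.25, 0.50, 0.75, 1.00]       # fractions of total information at each look
for name, fn in [("OBF", obf_spending), ("Pocock", pocock_spending)]:
    spent, prev = [], 0.0
    for t in info_times:
        cum = fn(t)
        spent.append(cum - prev)            # alpha newly available at this look
        prev = cum
    print(name, [round(s, 4) for s in spent])
```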

11.
Case-control studies are primary study designs used in genetic association studies. Sasieni (Biometrics 1997, 53, 1253-1261) pointed out that the allelic chi-square test used in genetic association studies is invalid when Hardy-Weinberg equilibrium (HWE) is violated in the combined population. It is therefore important to know how far the type I error rate deviates from the nominal level when HWE is violated. We examine bounds on the type I error rate of the allelic chi-square test. We also investigate the power of the goodness-of-fit test for HWE, which can be used as a guideline for selecting between the allelic chi-square test and the modified allelic chi-square test, the latter of which was proposed for cases of violated HWE. In small samples, the power is not large enough to detect Wright's inbreeding model with small values of the inbreeding coefficient. Therefore, when the null hypothesis of HWE is only barely accepted, the modified test should be considered as an alternative method.
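A sketch of the two tests discussed: the allelic (2 x 2 allele-count) chi-square test and the goodness-of-fit test for HWE, both computed from genotype counts. The counts below are hypothetical, and the modified allelic test for violated HWE is not reproduced here.

```python
import numpy as np
from scipy.stats import chi2

def allelic_chisq(case_geno, control_geno):
    """Allelic chi-square test from genotype counts (AA, Aa, aa) in cases and controls."""
    def allele_counts(g):
        aa, ab, bb = g
        return np.array([2 * aa + ab, 2 * bb + ab], dtype=float)   # counts of A and a
    table = np.vstack([allele_counts(case_geno), allele_counts(control_geno)])
    row, col, n = table.sum(1, keepdims=True), table.sum(0, keepdims=True), table.sum()
    expected = row @ col / n
    stat = np.sum((table - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=1)

def hwe_chisq(geno):
    """Goodness-of-fit test of Hardy-Weinberg equilibrium from genotype counts (AA, Aa, aa)."""
    aa, ab, bb = map(float, geno)
    n = aa + ab + bb
    p = (2 * aa + ab) / (2 * n)
    expected = np.array([n * p ** 2, 2 * n * p * (1 - p), n * (1 - p) ** 2])
    stat = np.sum((np.array([aa, ab, bb]) - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=1)

print(allelic_chisq(case_geno=(60, 120, 70), control_geno=(90, 120, 40)))
print(hwe_chisq((150, 240, 110)))   # combined sample: (60+90, 120+120, 70+40)
```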

12.
One of the major challenges facing genome-scan studies to discover disease genes is the assessment of the genomewide significance. The assessment becomes particularly challenging if the scan involves a large number of markers collected from a relatively small number of meioses. Typically, this assessment has two objectives: to assess genomewide significance under the null hypothesis of no linkage and to evaluate true-positive and false-positive prediction error rates under alternative hypotheses. The distinction between these goals allows one to formulate the problem in the well-established paradigm of statistical hypothesis testing. Within this paradigm, we evaluate the traditional criterion of LOD score 3.0 and a recent suggestion of LOD score 3.6, using the Monte Carlo simulation method. The Monte Carlo experiments show that the type I error varies with the chromosome length, with the number of markers, and also with sample sizes. For a typical setup with 50 informative meioses on 50 markers uniformly distributed on a chromosome of average length (i.e., 150 cM), the use of LOD score 3.0 entails an estimated chromosomewide type I error rate of .00574, leading to a genomewide significance level > .05. In contrast, the corresponding type I error for LOD score 3.6 is .00191, giving a genomewide significance level of slightly < .05. However, with a larger sample size and a shorter chromosome, a LOD score between 3.0 and 3.6 may be preferred, on the basis of proximity to the targeted type I error. In terms of reliability, these two LOD-score criteria appear not to have appreciable differences. These simulation experiments also identified factors that influence power and reliability, shedding light on the design of genome-scan studies.
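A simplified Monte Carlo along the lines of the setup described above, assuming fully informative, phase-known meioses and two-point LOD scores rather than a full multipoint analysis: 50 meioses, 50 equally spaced markers on 150 cM (Haldane map function for adjacent-marker recombination), with the chromosomewide type I error estimated as the probability that the maximum LOD exceeds 3.0 or 3.6 under the null of no linkage. Exact numbers will differ from the published ones.

```python
import numpy as np

rng = np.random.default_rng(42)
n_meioses, n_markers, length_cm = 50, 50, 150.0
d_morgan = (length_cm / (n_markers - 1)) / 100.0
theta_adj = 0.5 * (1.0 - np.exp(-2.0 * d_morgan))    # Haldane: adjacent-marker recombination

def max_lod_null():
    """Max two-point LOD over the chromosome for an unlinked trait locus."""
    # Marker inheritance: Markov chain of 0/1 grandparental origins along the chromosome.
    origins = np.empty((n_meioses, n_markers), dtype=int)
    origins[:, 0] = rng.integers(0, 2, n_meioses)
    flips = rng.random((n_meioses, n_markers - 1)) < theta_adj
    for j in range(1, n_markers):
        origins[:, j] = origins[:, j - 1] ^ flips[:, j - 1]
    # Unlinked trait locus: origins independent of all markers.
    trait = rng.integers(0, 2, n_meioses)
    r = np.sum(origins != trait[:, None], axis=0)     # recombinants per marker
    theta_hat = np.clip(r / n_meioses, 1.0 / (2 * n_meioses), 0.5)
    lod = (r * np.log10(theta_hat / 0.5)
           + (n_meioses - r) * np.log10((1.0 - theta_hat) / 0.5))
    return lod.max()

reps = 5000
maxlods = np.array([max_lod_null() for _ in range(reps)])
for crit in (3.0, 3.6):
    print(crit, np.mean(maxlods >= crit))             # chromosomewide type I error estimate
```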

13.
In experiments with many statistical tests there is a need to balance type I and type II error rates while taking multiplicity into account. In the traditional approach, the nominal α-level such as 0.05 is adjusted by the number of tests, m, i.e., as 0.05/m. Assuming that some proportion of tests represent “true signals”, that is, originate from a scenario where the null hypothesis is false, power depends on the number of true signals and the respective distribution of effect sizes. One way to define power is for it to be the probability of making at least one correct rejection at the assumed α-level. We advocate an alternative way of establishing how “well-powered” a study is. In our approach, useful for studies with multiple tests, the ranking probability is controlled, defined as the probability of making at least r correct rejections while rejecting the r hypotheses with the smallest P-values. The two approaches are statistically related. The probability that the smallest P-value is a true signal (i.e., r = 1) is equal to the power at the level α/m, to an excellent approximation. Ranking probabilities are also related to the false discovery rate and to the Bayesian posterior probability of the null hypothesis. We study properties of our approach when the effect size distribution is replaced for convenience by a single “typical” value taken to be the mean of the underlying distribution. We conclude that its performance is often satisfactory under this simplification; however, substantial imprecision is to be expected when the number of tests is very large and the proportion of true signals is small. Precision is largely restored when three effect size values with the respective abundances are used instead of a single typical effect size value.

14.
Switching between testing for superiority and non-inferiority has been an important statistical issue in the design and analysis of active controlled clinical trials. In practice, it is often conducted with a two-stage testing procedure. It has been assumed that no type I error rate adjustment is required when either switching to test for non-inferiority once the data fail to support the superiority claim, or switching to test for superiority once the null hypothesis of non-inferiority is rejected with a pre-specified non-inferiority margin in a generalized historical control approach. However, when a cross-trial comparison approach is used for non-inferiority testing, controlling the type I error rate can become an issue with the conventional two-stage procedure. We propose adopting the single-stage simultaneous testing concept of Ng (2003) to test both the non-inferiority and superiority hypotheses simultaneously. The proposed procedure is based on Fieller's confidence interval procedure as described by Hauschke et al. (1999).
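The essence of a single-stage simultaneous test can be conveyed with a single confidence interval: both the non-inferiority and the superiority conclusions are read off the same (1 − α) interval for the treatment contrast, so no two-stage switching or alpha adjustment is involved. The sketch below uses a simple difference of means with a normal-approximation interval rather than the Fieller interval of Hauschke et al. (1999); all numbers are hypothetical.

```python
from scipy.stats import norm

def simultaneous_test(mean_t, mean_c, se_diff, ni_margin, alpha=0.05):
    """Read non-inferiority and superiority conclusions off one (1 - alpha) CI
    for the treatment-minus-control difference (larger is better)."""
    z = norm.ppf(1.0 - alpha / 2.0)
    diff = mean_t - mean_c
    lower, upper = diff - z * se_diff, diff + z * se_diff
    return {"CI": (round(lower, 3), round(upper, 3)),
            "non-inferior": lower > -ni_margin,      # entire CI above -margin
            "superior": lower > 0.0}                 # entire CI above zero

# Hypothetical trial: observed difference 0.8 with SE 0.5, non-inferiority margin 1.0.
print(simultaneous_test(mean_t=10.8, mean_c=10.0, se_diff=0.5, ni_margin=1.0))
```

Because a single interval drives both conclusions, concluding superiority automatically implies non-inferiority, which is the intuition behind avoiding a separate adjustment for switching.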

15.
We evaluate a common reasoning strategy used in community ecology and comparative psychology for selecting between competing hypotheses. This strategy labels one hypothesis as a “null” on the grounds of its simplicity and epistemically privileges it as accepted until rejected. We argue that this strategy is unjustified. The asymmetrical treatment of statistical null hypotheses is justified through the experimental and mathematical contexts in which they are used, but these contexts are missing in the case of the “pseudo-null hypotheses” found in our case studies. Moreover, statistical nulls are often not epistemically privileged in practice over their alternatives because failing to reject the null is usually a negative result about the alternative, experimental hypothesis. Scientists should eschew the appeal to pseudo-nulls. It is a rhetorical strategy that glosses over a commitment to valuing simplicity over other epistemic virtues in the name of good scientific and statistical methodology.

16.
Several independent clinical trials are usually conducted to demonstrate and support the evidence of efficacy of a new drug. When not all of the trials demonstrate a treatment effect because of a lack of statistically significant findings, the sponsor sometimes conducts a post hoc pooled test and uses the pooled result as extra statistical evidence. In this paper, we study the extent of type I error rate inflation with the post hoc pooled analysis and the power of the interaction test in assessing the homogeneity of the trials with respect to treatment effect size. We also compare the power of several test procedures, with or without a pooled test involved, and discuss the appropriateness of pooled tests under different alternative hypotheses.
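A hedged simulation sketch of the inflation being studied: under the null, several independent trials are run, and whenever the strict criterion (all individual trials significant) fails, a post hoc pooled z-test is tried as a rescue. Comparing the two rejection rates illustrates the inflation relative to the strict standard; the settings are illustrative assumptions, not the paper's exact scenarios.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)

def simulate_claim_rate(n_per_arm=100, n_trials=2, alpha_one_sided=0.025, reps=20000):
    """Under H0 (no treatment effect), estimate how often efficacy could be claimed if a
    post hoc pooled test is used whenever the individual trials are not all significant."""
    z_crit = norm.ppf(1.0 - alpha_one_sided)
    strict, with_pooling = 0, 0
    for _ in range(reps):
        means = rng.normal(0.0, 1.0, (n_trials, 2, n_per_arm)).mean(axis=2)
        z = (means[:, 0] - means[:, 1]) / np.sqrt(2.0 / n_per_arm)   # per-trial z statistics
        sig = z > z_crit
        strict += sig.all()
        z_pooled = z.sum() / np.sqrt(n_trials)        # post hoc pooled z-test
        with_pooling += sig.all() or (z_pooled > z_crit)
    return strict / reps, with_pooling / reps

# First rate: "all trials significant" (tiny under H0); second: with the pooled rescue allowed.
print(simulate_claim_rate())
```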

17.
Current use of terms to describe evolutionary patterns is vague and inconsistent. In this paper, logical definitions of terms that describe specific evolutionary patterns are proposed. Evolutionary inertia is defined in a manner analogous to inertia in physics. A character in a static state of evolutionary inertia represents evolutionary stasis while a character showing consistent directional evolutionary change represents evolutionary thrust. I argue that evolutionary stasis should serve as the null hypothesis in all character evolution studies. Deviations from this null model consistent with alternative hypotheses (e.g. random drift, adaptation) can then give us insight into evolutionary processes. Failure to reject a null hypothesis of evolutionary stasis should not be used as a serious explanation of data. The term evolutionary constraint is appropriate only when a selective advantage for a character state transition is established but this transition is prevented by specific, identified factors. One type of evolutionary constraint discussed is evolutionary momentum. A final pattern of evolutionary change discussed is closely related to evolutionary thrust and is referred to as evolutionary acceleration. I provide examples of how this set of definitions can improve our ability to communicate interpretations of evolutionary patterns.

18.
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP(q, g) = Pr(g(V_n, S_n) > q), and generalized expected value (gEV) error rates, gEV(g) = E[g(V_n, S_n)], for arbitrary functions g(V_n, S_n) of the numbers of false positives V_n and true positives S_n. Of particular interest are error rates based on the proportion g(V_n, S_n) = V_n/(V_n + S_n) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E[V_n/(V_n + S_n)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure.
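For reference, a sketch of the linear step-up comparator named above: the classical Benjamini-Hochberg (1995) procedure, with an optional plug-in estimate of the proportion of true nulls that turns it into an adaptive, Storey-Tibshirani-style variant. The function name and toy data are assumptions.

```python
import numpy as np

def benjamini_hochberg(pvalues, q=0.05, pi0=1.0):
    """Linear step-up procedure controlling the FDR at level q.
    With pi0 = 1 this is the classical BH (1995) procedure; plugging in an
    estimate pi0_hat < 1 gives an adaptive (Storey-Tibshirani-style) variant."""
    p = np.asarray(pvalues)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / (m * pi0)
    below = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True   # reject all p-values up to the largest passing rank
    return reject

rng = np.random.default_rng(3)
p = np.concatenate([rng.uniform(size=900), rng.beta(0.1, 8.0, size=100)])  # 900 nulls, 100 signals
print(benjamini_hochberg(p, q=0.05).sum(), "rejections")
```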

19.
Furihata S, Ito T, Kamatani N. Genetics 2006;174(3):1505-1516
The use of haplotype information in case-control studies is a focus of research on the association between phenotypes and genetic polymorphisms. We examined the validity of applying the likelihood-based algorithm, which was originally developed to analyze data from cohort studies or clinical trials, to data from case-control studies. This algorithm is implemented in a computer program called PENHAPLO. In this program, haplotype frequencies and penetrances are estimated using the expectation-maximization algorithm, and the haplotype-phenotype association is tested using the generalized likelihood ratio. We show that this algorithm is useful not only for cohort studies but also for case-control studies. Simulations under the null hypothesis (no association between haplotypes and phenotypes) showed that the type I error rates were accurately estimated. Simulations under alternative hypotheses showed that PENHAPLO is a robust method for the analysis of data from case-control studies even when the haplotypes are not in HWE, although the real penetrances cannot be estimated. The power of PENHAPLO was higher than that of other methods using the likelihood-ratio test for the comparison of haplotype frequencies. Analysis of real data indicated a significant association between haplotypes in the SAA1 gene and the AA-amyloidosis phenotype in patients with rheumatoid arthritis, supporting the validity of applying PENHAPLO to case-control data.

20.
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
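A minimal sketch in the spirit of (but not identical to) the approach above: per-locus rank-sum statistics for a nonparametric two-group comparison, with subjects permuted jointly across all loci so that the unknown dependence structure is preserved, and max-statistic adjusted p-values providing family-wise error control. The relative-effect statistics and closure-principle machinery of the paper are not reproduced; all data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n1, n2, n_loci = 15, 15, 40
X = rng.normal(size=(n1, n_loci))                    # epigenetic measurements, group 1
Y = rng.normal(size=(n2, n_loci))                    # group 2
Y[:, :5] += 1.2                                      # 5 loci with a genuine group difference

def rank_stats(X, Y):
    """Per-locus standardized rank-sum statistics (nonparametric two-group comparison)."""
    Z = np.vstack([X, Y])
    ranks = Z.argsort(axis=0).argsort(axis=0) + 1.0  # ranks per locus (ties ignored for simplicity)
    w = ranks[: X.shape[0]].sum(axis=0)
    mu = X.shape[0] * (Z.shape[0] + 1) / 2.0
    sd = np.sqrt(X.shape[0] * Y.shape[0] * (Z.shape[0] + 1) / 12.0)
    return np.abs(w - mu) / sd

obs = rank_stats(X, Y)
Z = np.vstack([X, Y])
B, max_null = 2000, np.empty(2000)
for b in range(B):
    perm = rng.permutation(Z.shape[0])               # permute subjects jointly across all loci
    max_null[b] = rank_stats(Z[perm[:n1]], Z[perm[n1:]]).max()
adj_p = (1 + np.sum(max_null[None, :] >= obs[:, None], axis=1)) / (B + 1)
print(np.nonzero(adj_p < 0.05)[0])                   # loci declared significant with FWER control
```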
