Similar articles
 20 similar articles found
1.
After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite-sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g., the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence, they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University intensive care unit.

2.
3.
Bayesian empirical likelihood   Cited by: 2 (self-citations: 0, citations by others: 2)
Lazar  Nicole A. 《Biometrika》2003,90(2):319-326

4.
Tang NS  Tang ML 《Biometrics》2002,58(4):972-980
In this article, we consider small-sample statistical inference for the rate ratio (RR) in a correlated 2 × 2 table with a structural zero in one of the off-diagonal cells. The existing Wald test statistic and logarithmic transformation test statistic are adopted for this purpose. Hypothesis testing and confidence interval construction based on large-sample theory are reviewed first. We then propose reliable small-sample exact unconditional procedures for hypothesis testing and confidence interval construction. We present empirical results showing that the proposed exact unconditional procedures give better confidence interval performance than the traditional large-sample procedures in small-sample designs. Unlike the findings given in Lui (1998, Biometrics 54, 706-711), our empirical studies show that the existing asymptotic procedures may not attain a prespecified confidence level even in moderate sample-size designs (e.g., n = 50). Our exact unconditional procedures, on the other hand, do not suffer from this problem. Hence, the asymptotic procedures should be applied with caution. We propose two approximate unconditional confidence interval construction methods that outperform the existing asymptotic ones in terms of coverage probability and expected interval width. Also, we empirically demonstrate that the approximate unconditional tests are more powerful than their associated exact unconditional tests. A real data set from a two-step tuberculosis testing study is used to illustrate the methodologies.

5.
6.
7.
In phylogenetic systematics a problem of great practical and theoretical interest is to construct one or more large phylogenies (evolutionary trees), i.e., supertrees, from a given set of small phylogenies with overlapping sets of leaf labels. Although the methods being used to solve this problem are usually given plausible biological or theoretical justifications, occasionally it is possible to see that the result of a supertree method (SM) is explosive, and therefore logically meaningless, in the sense that it has been inferred from logical propositions that are contradictory. This paper presents the basic ideas and issues of how explosions affect the inference of rooted trees by SMs. We define the relevant concepts, give examples, and show how sometimes it is possible to identify hot spots in the input from which an SM may make explosive inferences that cannot be logically justified.

8.
9.
Spatial extent inference (SEI) is widely used across neuroimaging modalities to adjust for multiple comparisons when studying brain‐phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF)‐based tools can have inflated family‐wise error rates (FWERs). This has led to substantial controversy as to which processing choices are necessary to control the FWER using GRF‐based SEI. The failure of GRF‐based methods is due to unrealistic assumptions about the spatial covariance function of the imaging data. A permutation procedure is the most robust SEI tool because it estimates the spatial covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi‐) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the spatial covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use the methods to study the association between performance and executive functioning in a working memory functional magnetic resonance imaging study. The sPBJ has similar or greater power to the PBJ and permutation procedures while maintaining the nominal type 1 error rate in reasonable sample sizes. We provide an R package to perform inference using the PBJ and sPBJ procedures.

10.
11.
The ends of eukaryotic chromosomes are protected by telomeres, nucleoprotein structures that are essential for chromosomal stability and integrity. Understanding how telomere length is controlled has significant medical implications, especially in the fields of aging and cancer. Two recent systematic genome‐wide surveys measuring the telomere length of deleted mutants in the yeast Saccharomyces cerevisiae have identified hundreds of telomere length maintenance (TLM) genes, which span a large array of functional categories and different localizations within the cell. This study presents a novel general method that integrates large‐scale screening mutant data with protein–protein interaction information to rigorously chart the cellular subnetwork underlying the function investigated. Applying this method to the yeast telomere length control data, we identify pathways that connect the TLM proteins to the telomere‐processing machinery, and predict new TLM genes and their effect on telomere length. We experimentally validate some of these predictions, demonstrating that our method is remarkably accurate. Our results both uncover the complex cellular network underlying TLM and validate a new method for inferring such networks.

12.
Almudevar A 《Biometrics》2001,57(3):757-763
The problem of assessing the variability in pedigree reconstruction using DNA markers is considered for the special case of single generation samples with no parents present. Error in pedigree reconstruction is measured through a metric imposed on the space of partitions of the individuals into family groups. A confidence set can therefore be taken to be a neighborhood of a point estimate, analogous to the estimation of a parameter in Euclidean space. The coverage probability is estimated using bootstrap techniques. Although the distributional properties of the sample depend on the population genotype frequencies, these are in practice usually unknown. Confidence sets conditioned on a statistic approximately sufficient for these frequencies are compared with confidence sets obtained by substituting frequency estimates directly into the sampling distribution. In two simulation studies, the difference is found to be of some consequence.

13.
In some cases model-based and model-assisted inferences can lead to very different estimators. These two paradigms are not so different if we search for an optimal strategy rather than just an optimal estimator, a strategy being a pair composed of a sampling design and an estimator. We show that, under a linear model, the optimal model-assisted strategy consists of a balanced sampling design with inclusion probabilities that are proportional to the standard deviations of the errors of the model and the Horvitz–Thompson estimator. If the heteroscedasticity of the model is 'fully explainable' by the auxiliary variables, then this strategy is also optimal in a model-based sense. Moreover, under balanced sampling and with inclusion probabilities that are proportional to the standard deviation of the model, the best linear unbiased estimator and the Horvitz–Thompson estimator are equal. Finally, it is possible to construct a single estimator for both the design and model variance. The inference can thus be valid under the sampling design and under the model.
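
As a rough illustration of the two ingredients named in this abstract, the sketch below computes inclusion probabilities proportional to assumed error standard deviations and evaluates the Horvitz–Thompson estimator of a population total. The function names and the example numbers are hypothetical, and the balanced sampling design itself is not implemented here; this is only a minimal sketch of the estimator, not the authors' procedure.

```python
from typing import List, Sequence


def inclusion_probabilities(sigma: Sequence[float], n: int) -> List[float]:
    """Inclusion probabilities proportional to sigma, scaled to sum to n.

    Assumption for this sketch: no scaled value exceeds 1; otherwise we
    raise instead of applying an iterative capping step.
    """
    total = sum(sigma)
    pi = [n * s / total for s in sigma]
    if any(p > 1 for p in pi):
        raise ValueError("a probability exceeds 1; capping is not handled here")
    return pi


def horvitz_thompson_total(y_sample: Sequence[float],
                           pi_sample: Sequence[float]) -> float:
    """Horvitz-Thompson estimate of the population total.

    Each sampled value is weighted by the inverse of its inclusion
    probability.
    """
    return sum(y / p for y, p in zip(y_sample, pi_sample))


# Hypothetical population of four units with error standard deviations 1..4
# and an expected sample size of 2.
pi = inclusion_probabilities([1.0, 2.0, 3.0, 4.0], n=2)

# Hypothetical sample of two units with their inclusion probabilities.
estimate = horvitz_thompson_total([10.0, 20.0], [0.5, 0.25])
```

Weighting by 1/π is what makes the estimator design-unbiased for any design with strictly positive inclusion probabilities, which is why the choice of the probabilities, rather than the form of the estimator, carries the optimality argument.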

14.
15.
Statistical inference on accumulation curves is considered from a design-based perspective. Preliminaries on probabilistic sampling of plants and species are given, emphasizing the fundamental role of independent replications of the sampling scheme. The role of rarefaction curves as a tool for making inference on the effectiveness of the sampling procedures to compile accurate species lists is outlined. Design-based and model-based inference are discussed and compared. Some future developments for design-based inference are considered.
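
For readers unfamiliar with rarefaction curves, the sketch below evaluates the classical individual-based rarefaction formula, i.e. the expected number of species in a random subsample of n individuals drawn without replacement. This is the textbook formula, not the design-based treatment discussed in the abstract, and the function name and abundance counts are hypothetical.

```python
from math import comb
from typing import Sequence


def expected_species(abundances: Sequence[int], n: int) -> float:
    """Expected species count in a subsample of n individuals.

    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N_i is the
    abundance of species i and N is the total number of individuals.
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total abundance")
    denom = comb(total, n)
    # comb(total - a, n) is 0 when n > total - a, i.e. species i is then
    # certain to appear in the subsample.
    return sum(1 - comb(total - a, n) / denom for a in abundances)


# Hypothetical community: three species with 5, 3 and 2 individuals.
curve = [expected_species([5, 3, 2], n) for n in range(0, 11)]
```

Plotting `curve` against n gives the rarefaction curve; it starts at 0, equals 1 for a single individual, and reaches the observed species count when n equals the total abundance.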

16.
The field of phylogenetic tree estimation has been dominated by three broad classes of methods: distance-based approaches, parsimony and likelihood-based methods (including maximum likelihood (ML) and Bayesian approaches). Here we introduce two new approaches to tree inference: pairwise likelihood estimation and a distance-based method that estimates the number of substitutions along the paths through the tree. Our results include the derivation of the formulae for the probability that two leaves will be identical at a site given a number of substitutions along the path connecting them. We also derive the posterior probability of the number of substitutions along a path between two sequences. The calculations for the posterior probabilities are exact for group-based, symmetric models of character evolution, but are only approximate for more general models.
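
To make the idea of "probability of identity given a number of substitutions" concrete, here is a minimal sketch under an assumed fully symmetric model in which every substitution replaces the current state with one of the other c − 1 states uniformly at random (a Jukes–Cantor-type model with c = 4 for nucleotides). The abstract does not state the paper's formulae, so this is an illustration of the concept under that assumption, not a reproduction of the authors' result; the function name is hypothetical.

```python
def prob_identical(k: int, n_states: int = 4) -> float:
    """P(two leaves share a state | exactly k substitutions on their path).

    Assumed model: each substitution moves to a uniformly chosen different
    state. The closed form follows from the eigenvalues 1 and -1/(c - 1)
    of the corresponding transition matrix.
    """
    c = n_states
    return 1 / c + (1 - 1 / c) * (-1 / (c - 1)) ** k


def prob_identical_by_recursion(k: int, n_states: int = 4) -> float:
    """Same quantity computed step by step, as a cross-check."""
    p_same = 1.0  # with zero substitutions the leaves are identical
    for _ in range(k):
        # Identity after one more substitution requires the states to differ
        # beforehand and the substitution to hit the matching state.
        p_same = (1 - p_same) / (n_states - 1)
    return p_same
```

Under this assumed model the probability is 1 for k = 0, 0 for k = 1, and oscillates toward 1/c as k grows, which shows why an observed identity carries information about the parity and size of k.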

17.
18.
Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH–Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

19.
In many cell types, the inositol trisphosphate receptor (IPR) is one of the important components that control intracellular calcium dynamics, and an understanding of this receptor (which is also a calcium channel) is necessary for an understanding of calcium oscillations and waves. Recent advances in experimental techniques now allow for the measurement of single-channel activity of the IPR in conditions similar to its native environment, and these data can be used to determine the rate constants in Markov models of the IPR. We illustrate a parameter estimation method based on Markov chain Monte Carlo, which can be used to fit directly to single-channel data and which determines, as an intrinsic part of the fit, the times at which the IPR is opening and closing. We show, using simulated data, the most complex Markov model that can be unambiguously determined from steady-state data and show that non-steady-state data are required to determine more complex models.

20.
Interference occurs between individuals when the treatment (or exposure) of one individual affects the outcome of another individual. Previous work on causal inference methods in the presence of interference has focused on the setting where it is a priori assumed that there is “partial interference,” in the sense that individuals can be partitioned into groups wherein there is no interference between individuals in different groups. Bowers et al. (2012, Political Anal, 21, 97–124) and Bowers et al. (2016, Political Anal, 24, 395–403) consider randomization-based inferential methods that allow for more general interference structures in the context of randomized experiments. In this paper, extensions of Bowers et al. that allow for failure time outcomes subject to right censoring are proposed. Permitting right-censored outcomes is challenging because standard randomization-based tests of the null hypothesis of no treatment effect assume that whether an individual is censored does not depend on treatment. The proposed extension of Bowers et al. to allow for censoring entails adapting the method of Wang et al. (2010, Biostatistics, 11, 676–692) for two-sample survival comparisons in the presence of unequal censoring. The methods are examined via simulation studies and utilized to assess the effects of cholera vaccination in an individually randomized trial of 73 000 children and women in Matlab, Bangladesh.
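
The "standard randomization-based test of the null hypothesis of no treatment effect" mentioned in this abstract can be sketched in its simplest form as follows. This is a plain Fisher-style randomization test with a difference-in-means statistic, assuming complete randomization, no interference, and no censoring; it is not the extension proposed in the paper, and the function name and toy data are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def randomization_p_value(outcomes: Sequence[float],
                          treated: Sequence[int]) -> float:
    """Exact two-sided randomization p-value for a difference in means.

    Under the sharp null of no effect, outcomes are fixed and only the
    assignment varies, so we enumerate every assignment with the same
    number of treated units. Requires 0 < len(treated) < len(outcomes).
    """
    n = len(outcomes)
    m = len(treated)

    def mean_difference(treated_set: set) -> float:
        t = [outcomes[i] for i in treated_set]
        c = [outcomes[i] for i in range(n) if i not in treated_set]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = abs(mean_difference(set(treated)))
    extreme = 0
    total = 0
    for assignment in combinations(range(n), m):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean_difference(set(assignment))) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical trial: units 3, 4 and 5 were treated.
p_value = randomization_p_value([1, 2, 3, 10, 11, 12], treated=[3, 4, 5])
```

Full enumeration is only feasible for small samples; for a trial of the size described above one would sample assignments at random instead, and the censoring issue raised in the abstract would still need separate handling.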
