Similar Articles
20 similar articles found (search time: 22 ms)
Summary This article develops semiparametric approaches for estimating propensity scores and causal survival functions from prevalent survival data. The analytical problem arises when prevalent sampling is adopted for collecting failure times and, as a result, the covariates are incompletely observed due to their association with failure time. The proposed procedure for estimating propensity scores shares features with the likelihood formulation of case-control studies, but requires additional care with the intercept term. The result shows that corrected propensity scores in the logistic regression setting can be obtained through the standard estimation procedure with a specific adjustment to the intercept. For causal estimation, two sources of missingness arise in the model: one is explained by the potential-outcome framework; the other is caused by the prevalent sampling scheme. Analysis that does not adjust for both sources of missingness leads to biased causal inferences. The proposed methods were partly motivated by, and applied to, the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked data for women diagnosed with breast cancer.

3.
A method of inverse sampling of controls in a matched case-control study is described in which, for each case, controls are sampled until a discordant set is achieved. For a binary exposure, inverse sampling is used to determine the number of controls for each case. When most individuals in a population have the same exposure, standard case-control sampling may result in many case-control sets being concordant with respect to exposure and thus uninformative in the conditional logistic analysis. The method using inverse control sampling is proposed as a solution to this problem in situations when it is practically feasible. In many circumstances, inverse control sampling is found to offer improved statistical efficiency relative to a comparable study with a fixed number of controls per case.
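The gain described in this abstract is easy to see in simulation. The sketch below is an illustration, not the paper's method: the exposure prevalence, the fixed-design control count, and the cap on inverse draws are all assumed values. It counts how many matched sets end up informative (exposure-discordant) under a fixed two-controls-per-case design versus sampling controls until discordance:

```python
import random

random.seed(1)
P_EXPOSED = 0.05   # rare binary exposure (assumed prevalence)
N_CASES = 1000
M_FIXED = 2        # controls per case under the standard design (assumed)
MAX_DRAWS = 50     # practical cap on inverse draws per case (assumed)

def draw_exposure():
    return random.random() < P_EXPOSED

def standard_design():
    """Count matched sets that are informative (exposure-discordant)
    when each case gets a fixed number of controls."""
    informative = 0
    for _ in range(N_CASES):
        case = draw_exposure()
        if any(draw_exposure() != case for _ in range(M_FIXED)):
            informative += 1
    return informative

def inverse_design():
    """Sample controls for each case until the set is discordant
    (or the cap is hit); return informative sets and total draws."""
    informative, draws = 0, 0
    for _ in range(N_CASES):
        case = draw_exposure()
        for _ in range(MAX_DRAWS):
            draws += 1
            if draw_exposure() != case:
                informative += 1
                break
    return informative, draws

print("fixed design, informative sets:", standard_design())
print("inverse design, informative sets and draws:", inverse_design())
```

With a 5% exposure, most fixed sets are concordant and wasted, while almost every inverse-sampled set is informative; the trade-off, visible in the draw count, is that unexposed cases need roughly 1/0.05 = 20 control draws each.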

5.
Fears TR, Gail MH. Biometrics, 2000, 56(1): 190-198.
We present a pseudolikelihood approach for analyzing a two-stage population-based case-control study with cluster sampling of controls. These methods were developed to analyze data from a study of nonmelanoma skin cancer (NMSC). This study was designed to evaluate the role of ultraviolet radiation (UVB) in NMSC risk while adjusting for age group, which is known for all subjects, and for other individual-level factors, such as susceptibility to sunburn, which are known only for participants in the case-control study. The methods presented yield estimates of relative and absolute risk, with standard errors, while accounting naturally for the two-stage sampling of the cohort and the cluster sampling of controls.

6.
In landscape genetics, it is largely unknown how choices regarding sampling density and study area size affect inferences about which habitat features impede versus facilitate gene flow. While it is recommended that sampling locations be spaced no further apart than the average individual's dispersal distance, for low-mobility species this could require a challenging number of sampling locations or an unrepresentative study area. We assessed the effects of sampling density and study area size on landscape genetic inferences for a dispersal-limited amphibian, Plethodon mississippi, via analysis of nested datasets. Microsatellite-based genetic distances among individuals were divided into three datasets representing sparse sampling across a large study area, dense sampling across a small study area, or sparse sampling across the same small study area. These datasets served as a proxy for gene flow (i.e., the response variable) in maximum-likelihood population-effects models that assessed the nature and strength of their relationship with each of five land-use classes (i.e., potential predictor variables). Comparisons of outcomes were based on the rank order of effect, sign of effect (i.e., resistance versus facilitation of gene flow), spatial scale of effect, and functional relationship with gene flow. The best-fit model for each dataset had the same sign of effect for hardwood forests, manmade structures, and pine forests, indicating that the impacts of these land-use classes on dispersal and gene flow in P. mississippi are robust to sampling scheme. Contrasting sampling densities led to different inferred functional relationships between agricultural areas and gene flow. Study area size appeared to influence the scale of effect of manmade structures and the sign of effect of pine forests. Our findings provide evidence that sampling density, study area size, and sampling effort influence inferences. Accordingly, we recommend iterative subsampling of empirical datasets and continued investigation, using simulations, into the sensitivities of landscape genetic analyses.

8.
Hawkins DL, Han CP. Biometrics, 2000, 56(3): 848-854.
Longitudinal studies often collect only aggregate data, which allows only inefficient transition probability estimates. Barring enormous aggregate samples, improving the efficiency of transition probability estimates seems to be impossible without additional partial-transition data. This paper discusses several sampling plans that collect data of both types, as well as a methodology that combines them into efficient estimates of transition probabilities. The method handles both fixed and time-dependent categorical covariates and requires no assumptions (e.g., time homogeneity, Markov) about the population evolution.
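A toy illustration of the partial-transition component (not the authors' combined estimator; the two-state chain and its transition matrix are assumed): the maximum-likelihood estimate of a transition matrix from observed (state_t, state_t+1) pairs is just row-normalized counts, whereas aggregate data alone identify only the marginal state frequencies:

```python
import numpy as np

rng = np.random.default_rng(0)
P_TRUE = np.array([[0.8, 0.2],    # hypothetical two-state transition matrix
                   [0.3, 0.7]])

# Partial-transition data: (state_t, state_t+1) pairs on a subsample.
n_pairs = 5000
s0 = rng.integers(0, 2, n_pairs)
s1 = (rng.random(n_pairs) < P_TRUE[s0, 1]).astype(int)

# MLE from paired data: row-normalized transition counts.
counts = np.zeros((2, 2))
np.add.at(counts, (s0, s1), 1)
p_hat = counts / counts.sum(axis=1, keepdims=True)

# Aggregate data alone would reveal only marginal frequencies like this:
marginal = np.bincount(s1, minlength=2) / n_pairs
print(p_hat)
print(marginal)
```

The marginal vector is compatible with many different transition matrices, which is why aggregate-only data give inefficient (indeed, only partially identified) transition estimates without some paired observations.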

9.
The problem of exact conditional inference for discrete multivariate case-control data has two forms. The first is grouped case-control data, where Monte Carlo computations can be done using the importance sampling method of Booth and Butler (1999, Biometrika 86, 321-332), or a proposed alternative sequential importance sampling method. The second form is matched case-control data. For this analysis we propose a new exact sampling method based on the conditional-Poisson distribution for conditional testing with one binary and one integral ordered covariate. This method makes computations on data sets with large numbers of matched sets fast and accurate. We provide detailed derivation of the constraints and conditional distributions for conditional inference on grouped and matched data. The methods are illustrated on several new and old data sets.

10.
Pfeiffer RM, Ryan L, Litonjua A, Pee D. Biometrics, 2005, 61(4): 982-991.
The case-cohort design for longitudinal data consists of a subcohort sampled at the beginning of the study that is followed repeatedly over time, and a case sample that is ascertained through the course of the study. Although some members in the subcohort may experience events over the study period, we refer to it as the "control-cohort." The case sample is a random sample of subjects not in the control-cohort, who have experienced at least one event during the study period. Different correlations among repeated observations on the same individual are accommodated by a two-level random-effects model. This design allows consistent estimation of all parameters estimable in a cohort design and is a cost-effective way to study the effects of covariates on repeated observations of relatively rare binary outcomes when exposure assessment is expensive. It is an extension of the case-cohort design (Prentice, 1986, Biometrika 73, 1-11) and the bidirectional case-crossover design (Navidi, 1998, Biometrics 54, 596-605). A simulation study compares the efficiency of the longitudinal case-cohort design to a full cohort analysis, and we find that in certain situations up to 90% efficiency can be obtained with half the sample size required for a full cohort analysis. A bootstrap method is presented that permits testing for intra-subject homogeneity in the presence of unidentifiable nuisance parameters in the two-level random-effects model. As an illustration we apply the design to data from an ongoing study of childhood asthma.

11.
Fitting regression models to case-control data by maximum likelihood
Scott AJ, Wild CJ. Biometrika, 1997, 84(1): 57-71.

12.
Chen Z, Wang YG. Biometrics, 2004, 60(4): 997-1004.
This article is motivated by a lung cancer study involving a regression model in which the response variable is too expensive to measure but the predictor variable can be measured easily at relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. Using the idea of ranked-set sampling (RSS), we develop sampling strategies that reduce cost and increase the efficiency of the regression analysis in this setting. The method is applied retrospectively to a lung cancer study, where the interest is in the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatid exchanges. Optimal sampling schemes with different optimality criteria, such as A-, D-, and integrated mean square error (IMSE)-optimality, are considered in the application. With set size 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is substantial: using the IMSE-optimal scheme, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred by SRS.
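A minimal sketch of the RSS idea underlying this abstract (illustrative only: the set size, the strength of the cheap predictor, and the Gaussian model are assumptions, and it estimates a mean rather than reproducing the paper's optimal A-/D-/IMSE schemes). In each cycle, for every target rank, a set of k units is ranked on the inexpensive predictor and the expensive response is measured only for the unit holding that rank; the Monte Carlo variance is then compared with SRS of the same measured size:

```python
import numpy as np

rng = np.random.default_rng(42)
K = 5          # set size (assumed)
CYCLES = 40    # cycles, so n = K * CYCLES responses are measured
REPS = 500     # Monte Carlo replications

def rss_mean():
    """One balanced-RSS estimate: for each target rank i, draw K units,
    rank them on the cheap predictor x, and measure the costly response
    y only for the unit holding rank i."""
    measured = []
    for _ in range(CYCLES):
        for i in range(K):
            x = rng.normal(size=K)                  # cheap predictor
            y = x + rng.normal(scale=0.5, size=K)   # costly response, correlated with x
            measured.append(y[np.argsort(x)[i]])
    return float(np.mean(measured))

def srs_mean():
    """SRS estimate from the same number of measured responses."""
    n = K * CYCLES
    y = rng.normal(size=n) + rng.normal(scale=0.5, size=n)
    return float(np.mean(y))

rss_var = np.var([rss_mean() for _ in range(REPS)])
srs_var = np.var([srs_mean() for _ in range(REPS)])
print(rss_var, srs_var)
```

Because ranking on x stratifies the measured responses across the distribution, the RSS mean has noticeably lower variance than the SRS mean at the same measurement cost; the paper's optimal schemes sharpen this further for regression targets.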

13.
Familial risk of disease is often assessed using case-control studies based on referent databases. A referent database is a collection of family histories of cases, typically assembled as a result of one family member being diagnosed with disease. This sampling scheme is equivalent to sampling families proportional to their size: the larger the family, the greater the probability of finding it in the referent registry. This phenomenon is known as length-biased sampling. The consequence of this kind of sampling is to bias the regression estimate associated with family history, which is typically inflated relative to its value in the actual population.
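The size-biased effect described above is easy to reproduce. In this sketch (the family-size distribution and sample sizes are assumed, not registry data), sampling families with probability proportional to size inflates the mean family size relative to population-based sampling:

```python
import numpy as np

rng = np.random.default_rng(7)
N_FAM = 100_000
sizes = rng.integers(1, 11, N_FAM)   # family sizes uniform on 1..10 (assumed)

# Population-based sampling: every family equally likely.
pop_mean = sizes.mean()

# Referent-registry sampling: a family is ascertained when some member
# is diagnosed, so selection probability is proportional to family size.
sampled = rng.choice(sizes, size=20_000, p=sizes / sizes.sum())
biased_mean = sampled.mean()
print(pop_mean, biased_mean)   # size-biased mean E[S^2]/E[S] exceeds E[S]
```

For sizes uniform on 1..10, the population mean is 5.5 while the size-biased mean is E[S^2]/E[S] = 38.5/5.5 = 7.0; the same mechanism inflates family-history regression coefficients unless the ascertainment is modeled.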

14.
Transportation infrastructures such as roads, railroads and canals can have major environmental impacts. Ecological road effects include the destruction and fragmentation of habitat, the interruption of ecological processes and increased erosion and pollution. Growing concern about these ecological road effects has led to the emergence of a new scientific discipline called road ecology. The goal of road ecology is to provide planners with scientific advice on how to avoid, minimize or mitigate negative environmental impacts of transportation. In this review, we explore the potential of molecular genetics to contribute to road ecology. First, we summarize general findings from road ecology and review studies that investigate road effects using genetic data. These studies generally focus only on barrier effects of roads on local genetic diversity and structure and only use a fraction of available molecular approaches. Thus, we propose additional molecular applications that can be used to evaluate road effects across multiple scales and dimensions of the biodiversity hierarchy. Finally, we make recommendations for future research questions and study designs that would advance molecular road ecology. Our review demonstrates that molecular approaches can substantially contribute to road ecology research and that interdisciplinary, long-term collaborations will be particularly important for realizing the full potential of molecular road ecology.

16.
Magic bullets and golden rules: data sampling in molecular phylogenetics
Data collection for molecular phylogenetic studies is based on samples of both genes and taxa. In an ideal world, with no limits on resources, as many genes could be sampled as deemed necessary to address a phylogenetic problem. Given limited resources in the real world, inadequate sequences (in terms of choice or number of genes) or restricted taxon sampling can reduce the reliability of, or information gained from, phylogenetic analyses. Recent empirical and simulation-based studies of data sampling in molecular phylogenetics have reached differing conclusions on how to deal with these problems: some advocate sampling more genes, others more taxa. There is certainly no ‘magic bullet’ that fits all phylogenetic problems, and no specific ‘golden rules’ have been deduced, other than that single genes may not always contain sufficient phylogenetic information. Nevertheless, several general conclusions and suggestions can be made. One suggestion is that sequencing a moderate number of genes (e.g., 6–10) might take precedence over sequencing a larger set, thereby permitting the sampling of more taxa for a phylogenetic study.

19.
Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). When the variable of interest is binary, ranking of the sample observations can be implemented using the estimated probabilities of success obtained from a logistic regression model developed for the binary variable. The main objective of this study is to use substantial data sets to investigate the application of RSS to estimation of a proportion for a population that is different from the one that provides the logistic regression. Our results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced. Further, the choice and the distribution of covariates in the logistic regression model are not overly crucial for the performance of a balanced RSS procedure.
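A rough sketch of ranking a binary outcome via logistic-regression scores, as this abstract describes (the coefficients are taken as if from a previously fitted model, and the set size and sample sizes are arbitrary assumptions): because balanced RSS averages rank-stratified successes, ranking on the fitted probability reduces the Monte Carlo variance of the estimated proportion relative to SRS:

```python
import numpy as np

rng = np.random.default_rng(11)
BETA0, BETA1 = -1.0, 2.0     # assumed coefficients from a prior logistic fit
K, CYCLES, REPS = 5, 40, 600 # set size, cycles, Monte Carlo replications

def simulate(n):
    """Covariate x and binary outcome y under the assumed logistic model."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(BETA0 + BETA1 * x)))
    return x, (rng.random(n) < p).astype(int)

def rss_prop():
    """Balanced RSS: rank each set of K units by the fitted score and
    record y for the unit holding the target rank."""
    ys = []
    for _ in range(CYCLES):
        for i in range(K):
            x, y = simulate(K)
            score = BETA0 + BETA1 * x   # monotone in the fitted probability
            ys.append(y[np.argsort(score)[i]])
    return float(np.mean(ys))

def srs_prop():
    """SRS estimate of the proportion from the same measured sample size."""
    return float(simulate(K * CYCLES)[1].mean())

rss_var = np.var([rss_prop() for _ in range(REPS)])
srs_var = np.var([srs_prop() for _ in range(REPS)])
print(rss_var, srs_var)
```

Consistent with the abstract, only the ranking needs the logistic scores to be monotone in the true probability, so imperfect or transported coefficients still help.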

20.
Fewster RM. Biometrics, 2011, 67(4): 1518-1531.
Summary In spatial surveys for estimating the density of objects in a survey region, systematic designs will generally yield lower variance than random designs. However, estimating the systematic variance is well known to be a difficult problem. Existing methods tend to overestimate the variance, so although the variance is genuinely reduced, it is over‐reported, and the gain from the more efficient design is lost. The current approaches to estimating a systematic variance for spatial surveys are to approximate the systematic design by a random design, or approximate it by a stratified design. Previous work has shown that approximation by a random design can perform very poorly, while approximation by a stratified design is an improvement but can still be severely biased in some situations. We develop a new estimator based on modeling the encounter process over space. The new “striplet” estimator has negligible bias and excellent precision in a wide range of simulation scenarios, including strip‐sampling, distance‐sampling, and quadrat‐sampling surveys, and including populations that are highly trended or have strong aggregation of objects. We apply the new estimator to survey data for the spotted hyena (Crocuta crocuta) in the Serengeti National Park, Tanzania, and find that the reported coefficient of variation for estimated density is 20% using approximation by a random design, 17% using approximation by a stratified design, and 11% using the new striplet estimator. This large reduction in reported variance is verified by simulation.
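The premise of this abstract, that systematic designs beat random designs under spatial trend, can be checked with a toy quadrat survey (a hypothetical one-dimensional population with an assumed linear trend, not the hyena data):

```python
import numpy as np

rng = np.random.default_rng(3)
G = 100                                        # quadrats along a transect
counts = rng.poisson(np.linspace(1, 39, G))    # strongly trended density (assumed)
N_SAMPLE = 10
SPACING = G // N_SAMPLE
REPS = 2000

def random_est():
    """Estimated total from a simple random sample of quadrats."""
    idx = rng.choice(G, N_SAMPLE, replace=False)
    return counts[idx].mean() * G

def systematic_est():
    """Estimated total from a systematic sample with a random start."""
    start = rng.integers(0, SPACING)
    return counts[np.arange(start, G, SPACING)].mean() * G

rand_var = np.var([random_est() for _ in range(REPS)])
sys_var = np.var([systematic_est() for _ in range(REPS)])
print(rand_var, sys_var)   # systematic variance is much smaller under trend
```

The catch the abstract addresses: a real survey yields only one systematic sample, and estimating `sys_var` from that single realization (rather than the 2000 replicates above) is the hard problem the striplet estimator is designed to solve without the overestimation of random- or stratified-design approximations.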

Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号