Similar Articles
1.
Cohort studies provide information on relative hazards and pure risks of disease. For rare outcomes, large cohorts are needed to observe a sufficient number of events, which makes it costly to obtain covariate information on all cohort members. We focus on nested case-control designs, which are used to estimate relative hazards in the Cox regression model. In 1997, Langholz and Borgan showed that pure risk can also be estimated from nested case-control data. However, these approaches do not take advantage of some covariates that may be available on all cohort members. Researchers have used weight calibration to increase the efficiency of relative hazard estimates from case-cohort studies and nested case-control studies. Our objective is to extend weight calibration approaches to nested case-control designs to improve the precision of estimates of relative hazards and pure risks. We show that additionally calibrating sample weights against follow-up times multiplied by relative hazards during the risk projection period improves estimates of pure risk. Efficiency gains for the relative hazards of variables that are available on the entire cohort also contribute to improved efficiency for pure risks. We develop explicit variance formulas for the weight-calibrated estimates. Simulations show how much precision is gained by calibration and confirm the validity of inference based on asymptotic normality. Examples are provided using data from the American Association of Retired Persons Diet and Health Cohort Study.
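For readers unfamiliar with the design, the control-sampling step of a standard nested case-control study can be sketched as below: at each event time, m controls are drawn at random from the cohort members still at risk. This is a minimal Python illustration, not the authors' method; it assumes entry at time 0, right-censored follow-up and no tied event times, and the function name `sample_nested_controls` and the toy cohort are hypothetical. Weight calibration is not shown.

```python
import random

def sample_nested_controls(times, events, m, seed=0):
    """Draw up to m controls for each case from its risk set (hypothetical helper).

    times[i]  : follow-up time of subject i (entry assumed at time 0)
    events[i] : 1 if subject i had the event at times[i], 0 if censored
    Returns a list of (case_index, control_indices) pairs.
    """
    rng = random.Random(seed)
    matched = []
    for case, (t_case, d) in enumerate(zip(times, events)):
        if not d:
            continue
        # Risk set: everyone still under observation at the case's event time,
        # excluding the case itself; future cases remain eligible as controls.
        at_risk = [j for j, t in enumerate(times) if t >= t_case and j != case]
        matched.append((case, rng.sample(at_risk, min(m, len(at_risk)))))
    return matched

# Toy cohort (hypothetical data)
times = [2.0, 5.0, 3.5, 8.0, 6.0, 1.0]
events = [1, 0, 1, 0, 1, 0]
for case, controls in sample_nested_controls(times, events, m=2):
    print(case, controls)
```

Because each risk set is sampled independently, a subject can serve as a control for several cases, and a subject who later becomes a case can be a control at an earlier event time.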

2.
Sampling is a key issue for answering most ecological and evolutionary questions. The importance of developing a rigorous sampling design tailored to specific questions has already been discussed in the ecological and sampling literature, which has provided useful tools and recommendations for sampling and analysing ecological data. However, sampling issues are often difficult to overcome in ecological studies because of apparent inconsistencies between theory and practice, which often lead to the implementation of simplified sampling designs that suffer from unknown biases. Moreover, we believe that classical sampling principles, which are based on the estimation of means and variances, are insufficient to fully address many ecological questions that rely on estimating relationships between a response and a set of predictor variables over time and space. Our objective is thus to highlight the importance of selecting an appropriate sampling space and an appropriate sampling design. We also emphasize the importance of using prior knowledge of the study system to estimate models or complex parameters, and thus to better understand ecological patterns and the processes generating them. Using a semi‐virtual simulation study as an illustration, we reveal how the choice of the space (e.g. geographic, climatic) in which the sampling is designed influences the patterns that can ultimately be detected. We also demonstrate that common sampling designs are inefficient at revealing response curves between ecological variables and climatic gradients. Further, we show that response‐surface methodology, which has rarely been used in ecology, is much more efficient than more traditional methods. Finally, we discuss the use of prior knowledge, simulation studies and model‐based designs in defining appropriate sampling designs.
We conclude with a call for the development of methods that estimate nonlinear, ecologically relevant parameters without bias, in order to make inferences while meeting the requirements of both sampling theory and fieldwork logistics.

3.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies that collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.

4.
T R Fears, C C Brown. Biometrics, 1986, 42(4): 955-960
There are a number of possible designs for case-control studies. The simplest uses two separate simple random samples, but an actual study may use more complex sampling procedures. Typically, stratification is used to control for the effects of one or more risk factors in which we are interested. It has been shown (Anderson, 1972, Biometrika 59, 19-35; Prentice and Pyke, 1979, Biometrika 66, 403-411) that the unconditional logistic regression estimators apply under stratified sampling, so long as the logistic model includes a term for each stratum. We consider the case-control problem with stratified samples and assume a logistic model that does not include terms for strata, i.e., for fixed covariates the (prospective) probability of disease does not depend on stratum. We assume knowledge of the proportion sampled in each stratum as well as the total number in the stratum. We use this knowledge to obtain the maximum likelihood estimators for all parameters in the logistic model including those for variables completely associated with strata. The approach may also be applied to obtain estimators under probability sampling.

5.
To manage rare populations effectively, accurate monitoring data are critical. Yet many monitoring programs are initiated without careful consideration of whether the chosen sampling designs will provide accurate estimates of population parameters. Obtaining accurate estimates is especially difficult when natural variability is high, or when limited budgets mean that only a small fraction of the population can be sampled. The Missouri bladderpod, Lesquerella filiformis Rollins, is a federally threatened winter annual that has an aggregated distribution pattern and exhibits dramatic interannual population fluctuations. Using the simulation program SAMPLE and 4 years of field data, we evaluated five candidate sampling designs appropriate for rare populations: (1) simple random sampling, (2) adaptive simple random sampling, (3) grid-based systematic sampling, (4) adaptive grid-based systematic sampling, and (5) GIS-based adaptive sampling. We compared the designs based on the precision of density estimates for fixed sample size, cost, and distance traveled. Sampling fraction and cost were the most important factors determining the precision of density estimates, and relative design performance changed across the range of sampling fractions. Adaptive designs did not provide uniformly more precise estimates than conventional designs, in part because the spatial distribution of L. filiformis was relatively widespread within the study site. Adaptive designs tended to perform better as the sampling fraction increased and when sampling costs, particularly distance traveled, were taken into account. Units occupied by L. filiformis were encountered at a higher rate with adaptive than with conventional designs. Overall, grid-based systematic designs were more efficient and easier to implement in practice than the others.

6.
There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample has more power to detect genetic effects than a family-based sample containing the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When both family and unrelated samples are available, statistical analyses are often performed on the family and unrelated samples separately, conditioning on parental information for the former, which reduces power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, simultaneously corrects for population stratification. We use the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes a conditional analysis of the family data unnecessary and is more powerful than separate analyses of the unrelated and family samples, or than a meta-analysis that combines the results of the usual separate analyses. This property is demonstrated in a variety of simulation models and in empirical data. The proposed approach applies equally to the analysis of qualitative and quantitative traits.

7.
S Wacholder, M Gail, D Pee. Biometrics, 1991, 47(1): 63-76
We develop approximate methods to compare the efficiencies and to compute the power of alternative potential designs for sampling from a cohort before beginning to collect exposure data. Our methods require only that the cohort be assembled, meaning that the numbers N_kj of individuals at risk at both t_k and t_j, for every pair of event times with t_j ≥ t_k, are available. To compute N_kj, one needs to know the entry, follow-up, censoring, and event history, but not the exposure, of each individual. Our methods apply to any "unbiased control sampling design," in which cases are compared to a random sample of noncases at risk at the time of an event. We apply our methods to approximate the efficiencies of the nested case-control design, the case-cohort design, and an augmented case-cohort design, relative to the full cohort design, in an assembled cohort of 17,633 members of an insurance cooperative who were followed for mortality from prostatic cancer. The approximation assumes that exposure is unrelated both to the hazard of an event and to the hazard of censoring. The approximations performed well in simulations when both assumptions held and when exposure was moderately related to censoring.
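The assembled-cohort quantities N_kj described above need only entry and exit times. Below is a minimal Python sketch, assuming each individual is continuously at risk on the interval (entry, exit] with no gaps, so that being at risk at both t_k and t_j ≥ t_k reduces to entry < t_k and exit ≥ t_j. The function name `at_risk_pairs` and the toy data are hypothetical, and the efficiency approximations themselves are not shown.

```python
def at_risk_pairs(entries, exits, event_times):
    """Map (k, j), j >= k, to N_kj: the number of individuals at risk at both
    the k-th and the j-th smallest event times (hypothetical helper).

    Assumes individual i is at risk on (entries[i], exits[i]] with no gaps, so
    being at risk at t_k and at t_j >= t_k means entries[i] < t_k and
    exits[i] >= t_j. No exposure information is used.
    """
    ts = sorted(event_times)
    counts = {}
    for k, tk in enumerate(ts):
        for j in range(k, len(ts)):
            counts[(k, j)] = sum(
                1 for a, b in zip(entries, exits) if a < tk and b >= ts[j]
            )
    return counts

# Four individuals, two event times (toy data)
print(at_risk_pairs([0, 0, 1, 2], [3, 5, 4, 6], [3, 5]))
# {(0, 0): 4, (0, 1): 2, (1, 1): 2}
```

The diagonal entries N_kk are the ordinary risk-set sizes. The double loop is quadratic in the number of event times, which is adequate for illustration but not tuned for a cohort of 17,633 members.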

8.
Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM-algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I was associated with ischemic heart disease. The study was based on a population of 3784 Danes and 231 cases of ischemic heart disease where controls were matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used to obtain information additional to the relative risk estimates of covariates.

9.
Case-cohort and nested case-control sampling methods have recently been introduced as a means of reducing cost in large cohort studies. Asymptotic distribution theory for relative rate estimation based on Cox-type partial likelihoods or pseudolikelihoods in case-cohort and nested case-control studies has already been worked out. However, many researchers use (stratified) frequency table methods for a first or primary summary of the most important evidence on exposure-disease or dose-response relationships, i.e. the classical Mantel-Haenszel analyses, trend tests and tests for heterogeneity of relative rates. These can be followed by exponential failure time regression methods on grouped or individual data to model the relationships between several factors and the response. In this paper we present the adaptations needed to use these methods with case-cohort designs, illustrating their use with data from a recent case-cohort study on the relationship between diet, lifestyle and cancer. We assume a very general setup allowing piecewise constant failure rates, possibly recurrent events per individual, independent censoring and left truncation.
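As a reminder of the classical Mantel-Haenszel summary mentioned above, the sketch below computes the Mantel-Haenszel estimate of a common incidence rate ratio from stratified full-cohort person-time data, RR_MH = Σ_i (a_i T0_i / T_i) / Σ_i (b_i T1_i / T_i), where a_i and b_i are the exposed and unexposed event counts, T1_i and T0_i the exposed and unexposed person-time, and T_i = T1_i + T0_i in stratum i. It does not include the case-cohort adaptation developed in the paper (where person-time would have to be estimated from the subcohort); the function name and the toy strata are hypothetical.

```python
def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel estimate of a common incidence rate ratio (full cohort).

    strata: iterable of (a, t1, b, t0) tuples, one per stratum, where
      a  = events among the exposed,   t1 = exposed person-time,
      b  = events among the unexposed, t0 = unexposed person-time.
    """
    num = 0.0
    den = 0.0
    for a, t1, b, t0 in strata:
        t = t1 + t0
        num += a * t0 / t  # exposed events weighted by unexposed share of time
        den += b * t1 / t  # unexposed events weighted by exposed share of time
    return num / den

# Two hypothetical strata, each with a stratum-specific rate ratio of 2
strata = [(10, 1000.0, 5, 1000.0), (20, 2000.0, 10, 2000.0)]
print(mantel_haenszel_rate_ratio(strata))  # 2.0
```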

10.
We investigated the genetic structure of Eryngium alpinum (Apiaceae) in an Alpine valley where the plant occurs in patches of various sizes. From a conservation perspective, our goal was to determine whether the valley consists of one or several genetic units. Habitat fragmentation and previous observations of restricted pollen/seed dispersal suggested pronounced genetic structure, but gene dispersal often follows a leptokurtic distribution, which may lead to weak genetic structure. We used nine microsatellite loci and two nested sampling designs (a 50 × 50 m grid throughout the valley and a 2 × 2 m grid in two 50 × 10 m quadrats). Within the valley as a whole, F-statistics and Bayesian approaches indicated high genetic homogeneity. This result might be explained by (1) underestimation of long-distance pollen/seed dispersal by in situ experiments and (2) fragmentation events that are too recent to have built up genetic structure. Spatial autocorrelation revealed isolation by distance across the whole valley, but this pattern was much more pronounced in the 50 × 10 m quadrats sampled with a 2 m mesh. This was probably associated with limited primary seed dispersal, leading to spatial clustering of half-sibs around maternal plants. We emphasize the value of nested sampling designs and of combining several statistical tools. © 2008 The Linnean Society of London, Biological Journal of the Linnean Society, 2008, 93, 667–677.

11.
The effect of plot shape, number of subplots and their spatial arrangement on the sample variance of a simple intensity estimator for spatially explicit point populations is analysed. We derive the sample variance and covariance for sampling designs involving more than one subplot. Some numerical approximations are also presented. If a clustered point pattern is to be sampled, the best strategy to reduce the sample variance is to use as many rectangular subplots as possible, for a prescribed total sample area, distributed over a grid. In contrast, if a regular point pattern is to be sampled, a single circular subplot should be used. If the point configuration is assumed to be Poisson, any subplot shape and spatial arrangement can be used, provided the subplots do not overlap. A case study in forestry is used to assess the validity of our results.
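The Poisson result above (subplot shape and arrangement do not matter as long as subplots do not overlap) can be checked with a small simulation. This Python sketch uses the simple estimator λ̂ = (number of points in the subplots) / (total subplot area) on a homogeneous Poisson pattern in the unit square, for which Var(λ̂) = λ / A, and compares one circular subplot with four rectangular subplots of the same total area A = 0.1. All names, the intensity of 200 and the subplot layout are illustrative assumptions; clustered and regular patterns are not simulated.

```python
import math
import random

def poisson_count(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for moderate means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_poisson_pattern(rng, intensity):
    """Homogeneous Poisson point pattern on the unit square."""
    n = poisson_count(rng, intensity)
    return [(rng.random(), rng.random()) for _ in range(n)]

def circle(cx, cy, area):
    r2 = area / math.pi  # squared radius that gives the requested area
    return (lambda x, y: (x - cx) ** 2 + (y - cy) ** 2 <= r2, area)

def rect(x0, y0, w, h):
    return (lambda x, y: x0 <= x <= x0 + w and y0 <= y <= y0 + h, w * h)

def intensity_estimate(points, subplots):
    """Points inside the (non-overlapping) subplots divided by their total area."""
    count = sum(1 for x, y in points for inside, _ in subplots if inside(x, y))
    return count / sum(area for _, area in subplots)

def empirical_variance(rng, intensity, subplots, reps=1000):
    est = [intensity_estimate(simulate_poisson_pattern(rng, intensity), subplots)
           for _ in range(reps)]
    mean = sum(est) / reps
    return sum((e - mean) ** 2 for e in est) / (reps - 1)

rng = random.Random(1)
one_circle = [circle(0.5, 0.5, 0.1)]
four_rects = [rect(x0, y0, 0.1, 0.25) for x0 in (0.1, 0.6) for y0 in (0.1, 0.6)]
# Theory for a Poisson pattern: Var = intensity / area = 200 / 0.1 = 2000 for
# both layouts; the printed values are simulation estimates of that quantity.
print(empirical_variance(rng, 200, one_circle))
print(empirical_variance(rng, 200, four_rects))
```

Replacing `simulate_poisson_pattern` with a clustered or inhibited process is where the shape and arrangement effects described in the abstract would be expected to appear.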

12.
Standard analyses of data from case-control studies that are nested in a large cohort ignore information available for cohort members not sampled for the sub-study. This paper reviews several methods designed to increase estimation efficiency by using more of the data, treating the case-control sample as a two- or three-phase stratified sample. When these methods were applied to a study of coronary heart disease among women in the hormone trials of the Women’s Health Initiative, modest gains in the precision of regression coefficients were observed, and the gains increased with the amount of cohort information used in the analysis. The gains were particularly evident for pseudo- or maximum likelihood estimates, whose validity depends on the assumed model being correct. Larger standard errors were obtained for coefficients estimated by inverse probability weighted methods, which are more robust to model misspecification. Such misspecification may have been responsible for an important difference in one key regression coefficient estimated using the weighted methods compared with the more efficient methods.

13.
Acta Oecologica, 2007, 31(1): 54-59
Species–area relationships (SARs) are one of the fundamental patterns in ecology. However, the influence of how they are constructed on the resulting SAR shapes has received astonishingly little attention. We use data from the distribution atlas of Polish butterflies to compare SARs constructed from four different designs: adding up species numbers of independent areas (species accumulation curves using contiguous and non-contiguous areas), using a nested design, and comparing species numbers of independent areas of different sizes. The way SARs are constructed turned out to influence the outcome. We attribute this influence to the pronounced faunal dissimilarity of more distant areas (spatial species turnover). The nested design resulted in significantly higher slopes and lower intercepts of power-function SARs than the other designs. SARs from all four sampling designs showed a pronounced downward curvature at small spatial scales. Only the nested design predicted species densities correctly. The implications of these results for the use of SARs in bioconservation are discussed.
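The power-function SAR referred to above, S = c A^z, is commonly fitted as a straight line on log-log axes, log S = log c + z log A; in that form the slope is z and the intercept is log c. A minimal Python sketch of the ordinary least-squares fit follows. The function name and the synthetic nested areas and species counts (generated from c = 10, z = 0.25) are hypothetical and are not the Polish butterfly data.

```python
import math

def fit_power_sar(areas, species):
    """Fit S = c * A**z by least squares on log S = log c + z * log A.

    Returns (c, z); the log-log intercept is log(c) and the slope is z.
    """
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in species]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    z = sxy / sxx
    c = math.exp(my - z * mx)
    return c, z

# Synthetic nested design: each area contains the previous one (hypothetical)
areas = [1, 4, 16, 64]
species = [10 * a ** 0.25 for a in areas]
c, z = fit_power_sar(areas, species)
print(round(c, 6), round(z, 6))  # 10.0 0.25
```

With real atlas data, the same fit would be applied separately to the SAR produced by each construction design, so that slopes and intercepts can be compared across designs.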

14.
Outcome-dependent sampling (ODS) schemes can be a cost-effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies. However, when the outcome is measured on a continuous scale, dichotomizing it could lead to a loss of efficiency. Recent epidemiologic studies have used ODS schemes in which, in addition to an overall random sample, a number of supplemental samples are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator is asymptotically normal. The likelihood ratio statistic based on the semiparametric empirical likelihood function has Wilks-type properties: under the null, it asymptotically follows a chi-square distribution that does not depend on the nuisance parameters. Our simulation results indicate that, for data obtained using an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability weighted pseudolikelihood estimators, and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.
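An ODS design of the kind described above can be sketched as a simple random sample plus supplemental samples drawn according to the continuous outcome. This Python sketch assumes that the supplements come from the lower and upper tails defined by two cut points; the function name, cut points, sample sizes and the synthetic outcome are all hypothetical, and the semiparametric empirical likelihood estimator is not implemented.

```python
import random

def ods_sample(y, n_srs, n_low, n_high, low_cut, high_cut, seed=0):
    """Outcome-dependent sampling sketch (hypothetical helper).

    Draws a simple random sample of size n_srs from all subjects, then
    supplemental samples of up to n_low and n_high subjects from those not
    already chosen whose outcome lies below low_cut or above high_cut.
    Returns a dict of index lists keyed by sampling component.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    srs = rng.sample(idx, n_srs)
    chosen = set(srs)
    low_pool = [i for i in idx if i not in chosen and y[i] < low_cut]
    high_pool = [i for i in idx if i not in chosen and y[i] > high_cut]
    low = rng.sample(low_pool, min(n_low, len(low_pool)))
    high = rng.sample(high_pool, min(n_high, len(high_pool)))
    return {"srs": srs, "low": low, "high": high}

# Synthetic continuous outcome for a cohort of 5000 (hypothetical)
gen = random.Random(42)
y = [gen.gauss(100, 15) for _ in range(5000)]
design = ods_sample(y, n_srs=200, n_low=50, n_high=50, low_cut=80, high_cut=120)
print({k: len(v) for k, v in design.items()})
```

Covariates would then be measured only on the selected subjects; the known sampling scheme is what an ODS-aware estimator exploits.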

15.
We consider two-stage sampling designs, including so-called nested case-control studies, in which one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample and collecting additional data on the subsample. This data structure can be viewed as a missing-data structure on the full-data structure collected in the second stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) for two-stage sampling designs and present simulation studies featuring this estimator.
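The inverse probability weighting idea behind IPCW methods for two-stage designs can be illustrated with a weighted mean: each second-stage subject is weighted by the inverse of its known selection probability. This Python sketch shows only that weighting step (in the ratio, or Hájek, form), not the targeted maximum likelihood update of IPCW-TMLE; the function name and the toy data are hypothetical.

```python
def ipw_mean(values, selected, prob):
    """Inverse-probability-weighted mean of a second-stage variable.

    values[i]   : measurement, used only where selected[i] is True
    selected[i] : True if subject i was drawn into the second-stage subsample
    prob[i]     : known probability that subject i was selected (> 0)
    Ratio (Hajek) form: sum(w * x) / sum(w) over selected subjects, w = 1 / p.
    """
    num = 0.0
    den = 0.0
    for x, s, p in zip(values, selected, prob):
        if s:
            w = 1.0 / p
            num += w * x
            den += w
    return num / den

# Hypothetical first stage: 2 cases and 6 controls. The second stage keeps
# every case (probability 1) and a random third of the controls (probability 1/3).
x = [10, 10, 2, 2, None, None, None, None]  # None = not measured
selected = [True, True, True, True, False, False, False, False]
prob = [1.0, 1.0] + [1 / 3] * 6
print(ipw_mean(x, selected, prob))
```

If the unmeasured controls resemble the measured ones (x = 2), the first-stage mean is (2 × 10 + 6 × 2) / 8 = 4; the weighted estimate recovers this value, while the unweighted subsample mean, (10 + 10 + 2 + 2) / 4 = 6, is biased toward the oversampled cases.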

16.
Linkage mapping of complex diseases is often followed by association studies between phenotypes and marker genotypes using case-control or family-based designs. Given fixed genotyping resources, it is important to know which study designs are the most efficient. To address this problem, we extended the likelihood-based method of Li et al., which assesses whether there is linkage disequilibrium between a disease locus and a SNP, to accommodate sibships of arbitrary size and disease-phenotype configuration. A key advantage of our method is the ability to combine data from different family structures. We consider scenarios in which genotypes are available for unrelated cases, affected sib pairs (ASPs), or only one sibling per ASP. We construct designs that use cases only and others that use unaffected siblings or unrelated unaffected individuals as controls. Different combinations of cases and controls result in seven study designs. We compare the efficiency of these designs when the number of individuals to be genotyped is fixed. Our results suggest that (1) when the disease is influenced by a single gene, the one-sibling-per-ASP-control design is the most efficient, followed by the ASP-control design, and familial cases contribute more association information than singleton cases; (2) when the disease is influenced by multiple genes, familial cases provide more association information than singleton cases, unless the effect of the locus being tested is much smaller than that of at least one other, untested disease locus; and (3) the case-control design can be useful for detecting genes with small effects in the presence of genes with much larger effects. Our findings will be helpful for researchers designing and analyzing complex disease-association studies and will facilitate genotyping resource allocation.

17.
18.
When a ground and vegetation cover factor related to soil erosion is mapped with the aid of remotely sensed data, a cost-efficient sample design is required to collect ground data and to obtain an accurate map. However, the supports used to collect ground data are often smaller than the pixels desirable for mapping, which complicates the development of procedures for sample design and mapping. For these purposes, a sampling and mapping method was developed by integrating stratification with an up-scaling method in geostatistics, block cokriging, using Landsat Thematic Mapper imagery. This method is based on spatial correlation and stratified sampling. It scales up not only the ground sample data but also the uncertainties associated with aggregating the data from smaller supports to larger pixels or blocks. The method combines the advantages of stratification and of sample design based on block cokriging variance, which leads to sample designs with variable grid spacing and thus significantly increases the unit cost-efficiency of sample data in sampling and mapping. The results of this study confirmed this gain in cost-efficiency.

19.
This paper proposes an approach to causal mediation analysis in nested case-control study designs, which are often combined with countermatching schemes, using conditional likelihood, and compares the method's performance with that of mediation analysis using the Cox model for the full cohort with a continuous or dichotomous mediator. Simulation studies are conducted to assess the proposed method and to investigate its efficiency relative to the full cohort. We illustrate the method using actual data from two studies of potential mediation of radiation risk conducted within the Adult Health Study cohort of atomic-bomb survivors. The performance of the proposed method becomes comparable to that of the full-cohort analysis, illustrating the potential for valid mediation analysis based on the reduced data obtained through the nested case-control design.

20.
Understanding the drivers of biodiversity is important for forecasting changes in the distribution of life on earth. However, most studies of biodiversity are limited by uneven sampling effort, with some regions or taxa better sampled than others. Numerous methods have been developed to account for differences in sampling effort, but most methods were developed for systematic surveys in which all study units are sampled using the same design and assemblages are sampled randomly. Databases compiled from multiple sources, such as from the literature, often violate these assumptions because they are composed of studies that vary widely in their goals and methods. Here, we compared the performance of several popular methods for estimating parasite diversity based on a large and widely used parasite database, the Global Mammal Parasite Database (GMPD). We created artificial datasets of host–parasite interactions based on the structure of the GMPD, then used these datasets to evaluate which methods best control for differential sampling effort. We evaluated the precision and bias of seven methods, including species accumulation and nonparametric diversity estimators, compared to analyzing the raw data without controlling for sampling variation. We find that nonparametric estimators, and particularly the Chao2 and second-order jackknife estimators, perform better than other methods. However, these estimators still perform poorly relative to systematic sampling, and effect sizes should be interpreted with caution because they tend to be lower than actual effect sizes. Overall, these estimators are more effective in comparative studies than for producing true estimates of diversity. We make recommendations for future sampling strategies and statistical methods that would improve estimates of global parasite diversity.
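The two estimators singled out above can be computed from incidence data, that is, from which species (here, parasites) are recorded in each sampling unit. This Python sketch uses the bias-corrected Chao2 formula, S_obs + ((m - 1)/m) Q1(Q1 - 1) / (2(Q2 + 1)), and the second-order jackknife, S_obs + Q1(2m - 3)/m - Q2(m - 2)^2 / (m(m - 1)), where m is the number of sampling units and Q1 and Q2 are the numbers of species recorded in exactly one and exactly two units. Treating each unit as, say, one host or one study is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter

def incidence_counts(units):
    """units: list of sets, each holding the species recorded in one sampling unit.

    Returns (S_obs, Q1, Q2, m).
    """
    freq = Counter(sp for unit in units for sp in set(unit))
    s_obs = len(freq)
    q1 = sum(1 for f in freq.values() if f == 1)  # species seen in one unit
    q2 = sum(1 for f in freq.values() if f == 2)  # species seen in two units
    return s_obs, q1, q2, len(units)

def chao2_bias_corrected(units):
    s_obs, q1, q2, m = incidence_counts(units)
    return s_obs + (m - 1) / m * q1 * (q1 - 1) / (2 * (q2 + 1))

def jackknife2(units):
    """Second-order jackknife for incidence data; requires m >= 2 units."""
    s_obs, q1, q2, m = incidence_counts(units)
    return s_obs + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))

# Four hypothetical sampling units (e.g. hosts) and the parasites found in each
units = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}, {"a", "c", "f"}]
print(round(chao2_bias_corrected(units), 3), round(jackknife2(units), 3))  # 6.75 9.083
```

Both estimators add to the observed richness a correction driven by rarely recorded species, which is why they respond to uneven sampling effort better than raw species counts do.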
