Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
In observational cohort studies with complex sampling schemes, truncation arises when the time to event of interest is observed only when it falls below or exceeds another random time, that is, the truncation time. In more complex settings, observation may require a particular ordering of event times; we refer to this as sequential truncation. Estimators of the event time distribution have been developed for simple left-truncated or right-truncated data. However, these estimators may be inconsistent under sequential truncation. We propose nonparametric and semiparametric maximum likelihood estimators for the distribution of the event time of interest in the presence of sequential truncation, under two truncation models. We show the equivalence of an inverse probability weighted estimator and a product limit estimator under one of these models. We study the large sample properties of the proposed estimators and derive their asymptotic variance estimators. We evaluate the proposed methods through simulation studies and apply the methods to an Alzheimer's disease study. We have developed an R package, seqTrun, for implementation of our method.

2.
The asymptotic variance and distribution of Spearman’s rank correlation have previously been known only under independence. For variables with finite support, the population version of Spearman’s rank correlation has been derived. Using this result, we show convergence to a normal distribution irrespective of dependence, and derive the asymptotic variance. A small simulation study indicates that the asymptotic properties are of practical importance.
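For readers unfamiliar with the statistic, the sample version of Spearman's rank correlation can be sketched as the Pearson correlation of midranks; midranks matter here because variables with finite support produce ties. This is a minimal illustration, not the authors' derivation, and the function names `midranks` and `spearman_rho` are hypothetical.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Sample Spearman correlation: Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

The sketch only computes the point estimate; the asymptotic variance discussed in the abstract is not reproduced here.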

3.
Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N², where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N.

4.
The outcome-dependent sampling (ODS) design, which allows observation of the exposure variable to depend on the outcome, has been shown to be cost efficient. In this article, we propose a new statistical inference method, an estimated penalized likelihood method, for a partial linear model in the setting of a 2-stage ODS with a continuous outcome. We develop the asymptotic properties and conduct simulation studies to demonstrate the performance of the proposed estimator. A real environmental study data set is used to illustrate the proposed method.

5.
Matrix models are often used to predict the dynamics of size-structured or age-structured populations. The asymptotic behaviour of such models is defined by their Malthusian growth rate lambda, and by their stationary distribution w that gives the asymptotic proportion of individuals in each stage. As the coefficients of the transition matrix are estimated from a sample of observations, lambda and w can be considered as random variables whose law depends on the distribution of the observations. The goal of this study is to specify the asymptotic law of lambda and w when using the maximum likelihood estimators of the coefficients of the transition matrix. We prove that lambda and w are asymptotically normal, and the expressions of the asymptotic variance of lambda and of the asymptotic covariance matrix of w are given. The convergence speed of lambda and w towards their asymptotic law is studied using simulations. The results are applied to a real case study that consists of a Usher model for a tropical rain forest in French Guiana. They make it possible to assess how many trees must be measured to reach a given precision in the estimated asymptotic diameter distribution, which is important information for tropical forest management.
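To make lambda and w concrete: for a non-negative transition matrix, lambda is the dominant eigenvalue and w is the matching right eigenvector scaled to sum to one. The sketch below shows only that computation, assuming NumPy is available; the matrix `U` is an invented three-class example in the Usher style, not data from the study, and the function name is hypothetical.

```python
import numpy as np


def growth_rate_and_stable_distribution(transition_matrix):
    """Return (lambda, w): dominant eigenvalue and normalized right eigenvector."""
    matrix = np.asarray(transition_matrix, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(matrix)
    # For a non-negative matrix the dominant (Perron) eigenvalue is real.
    dominant = np.argmax(eigenvalues.real)
    lam = eigenvalues[dominant].real
    w = np.abs(eigenvectors[:, dominant].real)
    return lam, w / w.sum()


# Hypothetical 3-class matrix: individuals stay in their class or move up one
# class; the first row also carries recruitment from the largest class.
U = np.array([
    [0.70, 0.10, 0.25],
    [0.20, 0.75, 0.00],
    [0.00, 0.15, 0.85],
])

lam, w = growth_rate_and_stable_distribution(U)
print("lambda:", lam)
print("stable class proportions:", w)
```

In the setting of the abstract, the entries of the matrix would be maximum likelihood estimates, so lambda and w computed this way inherit sampling variability from them.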

6.
Dai JY, LeBlanc M, Kooperberg C. Biometrics 2009, 65(1):178-187
Summary. Recent results for case–control sampling suggest that when the covariate distribution is constrained by gene-environment independence, semiparametric estimation exploiting such independence yields a great deal of efficiency gain. We consider the efficient estimation of the treatment–biomarker interaction in two-phase sampling nested within randomized clinical trials, incorporating the independence between a randomized treatment and the baseline markers. We develop a Newton–Raphson algorithm based on the profile likelihood to compute the semiparametric maximum likelihood estimate (SPMLE). Our algorithm accommodates both continuous phase-one outcomes and continuous phase-two biomarkers. The profile information matrix is computed explicitly via numerical differentiation. In certain situations where computing the SPMLE is slow, we propose a maximum estimated likelihood estimator (MELE), which is also capable of incorporating the covariate independence. This estimated likelihood approach uses a one-step empirical covariate distribution and is thus straightforward to maximize. It offers a closed-form variance estimate with limited increase in variance relative to the fully efficient SPMLE. Our results suggest that exploiting the covariate independence in two-phase sampling increases the efficiency substantially, particularly for estimating treatment–biomarker interactions.

7.
It is widely believed that risks of many complex diseases are determined by genetic susceptibilities, environmental exposures, and their interaction. Chatterjee and Carroll (2005, Biometrika 92, 399-418) developed an efficient retrospective maximum-likelihood method for analysis of case-control studies that exploits an assumption of gene-environment independence and leaves the distribution of the environmental covariates to be completely nonparametric. Spinka, Carroll, and Chatterjee (2005, Genetic Epidemiology 29, 108-127) extended this approach to studies where certain types of genetic information, such as haplotype phases, may be missing on some subjects. We further extend this approach to situations when some of the environmental exposures are measured with error. Using a polychotomous logistic regression model, we allow disease status to have K + 1 levels. We propose use of a pseudolikelihood and a related EM algorithm for parameter estimation. We prove consistency and derive the resulting asymptotic covariance matrix of parameter estimates when the variance of the measurement error is known and when it is estimated using replications. Inferences with measurement error corrections are complicated by the fact that the Wald test often behaves poorly in the presence of large amounts of measurement error. The likelihood-ratio (LR) techniques are known to be a good alternative. However, the LR tests are not technically correct in this setting because the likelihood function is based on an incorrect model, i.e., a prospective model in a retrospective sampling scheme. We corrected standard asymptotic results to account for the fact that the LR test is based on a likelihood-type function. The performance of the proposed method is illustrated using simulation studies emphasizing the case when genetic information is in the form of haplotypes and missing data arises from haplotype-phase ambiguity. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma.

8.
Summary. The nested case–control (NCC) design is a popular sampling method in large epidemiological studies because it is a cost-effective way to investigate the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to NCC data and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling and show that it can be well adapted to NCC designs where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite sample performance of our proposed estimators and to compare the efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.

9.
Summary. A time-specific log-linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time-to-event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance–covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long-term follow-up to estimate the patients' median residual lifetimes, adjusting for important prognostic factors.

10.
We discuss a method for simultaneously estimating the fixed parameters of a generalized linear mixed-effects model and the random-effects distribution, for which no parametric assumption is made. In addition, classifying subjects into clusters according to the random regression coefficients is a natural by-product of the proposed method. An alternative to the maximum-likelihood method, the maximum-penalized-likelihood method, is used to avoid estimating “too many” clusters. Consistency and asymptotic normality properties of the estimators are presented. We also provide robust variance estimators of the fixed-parameter estimators which remain consistent even in the presence of misspecification. The methodology is illustrated by an application to a weight loss study.

11.
Liang Y, Lu W, Ying Z. Biometrics 2009, 65(2):377-384
Summary. In analysis of longitudinal data, it is often assumed that observation times are predetermined and are the same across study subjects. Such an assumption, however, is often violated in practice. As a result, the observation times may be highly irregular. It is well known that if the sampling scheme is correlated with the outcome values, the usual statistical analysis may yield bias. In this article, we propose joint modeling and analysis of longitudinal data with possibly informative observation times via latent variables. A two-step estimation procedure is developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, and that the asymptotic variance can be consistently estimated using the bootstrap method. Simulation studies and a real data analysis demonstrate that our method performs well with realistic sample sizes and is appropriate for practical use.

12.
The case-cohort study involves two-phase sampling: simple random sampling from an infinite superpopulation at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model-based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design-based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators.
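The IPW idea in this abstract can be illustrated with the simplest case, a weighted estimate of a cohort total in which each sampled subject is weighted by the reciprocal of its known phase-two sampling fraction. This is a sketch of the basic weighting only, not of the calibration or weight-estimation adjustments the abstract describes; the function name and the numbers are hypothetical.

```python
def ipw_total(values, sampling_fractions):
    """Estimate a cohort total by weighting each sampled value by 1 / fraction."""
    return sum(v / f for v, f in zip(values, sampling_fractions))


# Hypothetical phase-two sample: cases are all sampled (fraction 1.0),
# non-cases come from a 25% subcohort (fraction 0.25).
sampled_values = [3.0, 1.5, 2.0, 0.5]
fractions = [1.0, 1.0, 0.25, 0.25]
print(ipw_total(sampled_values, fractions))
```

Each non-case in the example stands in for four cohort members, which is why its contribution is multiplied by four.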

13.
Variation partitioning is one of the most frequently used methods to infer the importance of environmental (niche based) and spatial (dispersal) processes in metacommunity structuring. However, the reliability of the method in predicting the role of the major structuring forces is less well known. We studied the effect of field sampling design on the result of variation partitioning of fish assemblages in a stream network. Along with four different sample sizes, a simple random sampling from a total of 115 stream segments (sampling objects) was applied in 400 iterations, and community variation of each random sample was partitioned into four fractions: pure environmentally (landscape variables) explained, pure spatially (MEM eigenvectors) explained, jointly explained by environment and space, and unexplained variance. Results were highly sensitive to sample size. Even at a given sample size, estimated variance fractions had remarkable random fluctuation, which can lead to inconsistent results on the relative importance of environmental and spatial variables on the structuring of metacommunities. Interestingly, all the four variance fractions correlated better with the number of the selected spatial variables than with any design properties. Sampling interval proved to be a fundamentally influential sampling design property because it affected the number of the selected spatial variables. Our findings suggest that the effect of sampling design on variation partitioning is related to the ability of the eigenvectors to model complex spatial patterns. Hence, properties of the sampling design should be more intensively considered in metacommunity studies.

14.
We present a new method of quantitative-trait linkage analysis that combines the simplicity and robustness of regression-based methods and the generality and greater power of variance-components models. The new method is based on a regression of estimated identity-by-descent (IBD) sharing between relative pairs on the squared sums and squared differences of trait values of the relative pairs. The method is applicable to pedigrees of arbitrary structure and to pedigrees selected on the basis of trait value, provided that population parameters of the trait distribution can be correctly specified. Ambiguous IBD sharing (due to incomplete marker information) can be accommodated in the method by appropriate specification of the variance-covariance matrix of IBD sharing between relative pairs. We have implemented this regression-based method and have performed simulation studies to assess, under a range of conditions, estimation accuracy, type I error rate, and power. For normally distributed traits and in large samples, the method is found to give the correct type I error rate and an unbiased estimate of the proportion of trait variance accounted for by the additive effects of the locus, although, in cases where asymptotic theory is doubtful, significance levels should be checked by simulations. In large sibships, the new method is slightly more powerful than variance-components models. The proposed method provides a practical and powerful tool for the linkage analysis of quantitative traits.

15.
Weinberg CR. Biometrics 1985, 41(1):117-127
In a study designed to assess the relationship between a dichotomous exposure and the eventual occurrence of a dichotomous outcome, frequency matching has been proposed as a way to balance the exposure cohorts with respect to the sampling distribution of potential confounding factors. This paper discusses the pooled estimator for the log relative risk, and provides an estimator for its variance which takes into account the dependency in the pooled outcomes induced by frequency matching. The pooled estimator has asymptotic relative efficiency less than but close to 1, relative to the usual, inverse variance weighted, stratified estimator. Simulations suggest, however, that the pooled estimator is likely to outperform the stratified estimator when samples are of moderate size. This estimator carries the added advantage that it consistently estimates a meaningful population parameter under heterogeneity of the relative risk across strata.

16.
On the asymptotics of penalized splines   (cited by: 1; self-citations: 0; citations by others: 1)
Li Yingxing; Ruppert David. Biometrika 2008, 95(2):415-436
We study the asymptotic behaviour of penalized spline estimators in the univariate case. We use B-splines and a penalty is placed on mth-order differences of the coefficients. The number of knots is assumed to converge to infinity as the sample size increases. We show that penalized splines behave similarly to Nadaraya-Watson kernel estimators with ‘equivalent’ kernels depending upon m. The equivalent kernels we obtain for penalized splines are the same as those found by Silverman for smoothing splines. The asymptotic distribution of the penalized spline estimator is Gaussian and we give simple expressions for the asymptotic mean and variance. Provided that it is fast enough, the rate at which the number of knots converges to infinity does not affect the asymptotic distribution. The optimal rate of convergence of the penalty parameter is given. Penalized splines are not design-adaptive.

17.
We describe the characteristics of a sampling procedure called random median sampling that was proposed to enhance the precision of population estimates. In performing random median sampling, we first select a sampling item at random from the sampling area. We roughly compare the abundance of individuals in the selected item with that of the adjacent two items in order to identify the item that has median abundance, i.e., the item that has the second largest abundance among the three items. We count the number of individuals of the item having the median abundance. This procedure is repeated n times in the sampling area (i = 1, 2, ..., n). Let m_i be the ith median abundance. The estimates of the mean abundance per sampling item and the variance of estimates are given by Σm_i/n and Σ(m_i − Σm_i/n)²/[n(n − 1)], respectively. This method is a local application of the median ranked set sampling that was proposed by Muttlak (J Appl Stat Sci 6:245–255, 1997). Random median sampling is effective when the correlation coefficient between adjacent items is small. If the correlation coefficient is close to zero, random median sampling reduces the variance of estimates to 45% or 32% of that in simple random sampling when the distribution follows a normal distribution or a Laplace distribution, respectively. The sample size required to achieve a given precision of estimate decreases accordingly. The effectiveness of random median sampling, however, is small if the correlation coefficient is large. The condition in which random median sampling is superior to simple random sampling is also discussed.
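The two estimators stated in this abstract, the mean Σm_i/n and the variance of the estimate Σ(m_i − mean)²/[n(n − 1)], can be sketched in a few lines. The sketch assumes, purely for illustration, that sampling items lie along a line so that "the adjacent two items" are the left and right neighbours, and that all abundances are known so the procedure can be simulated; in the field only the median item would be counted. Function names are hypothetical.

```python
import random


def random_median_sample(abundances, n, rng):
    """Repeat n times: pick a random interior item, keep the median of it and its two neighbours."""
    medians = []
    for _ in range(n):
        # Interior index, so that both neighbours exist (assumes a 1-D layout).
        i = rng.randrange(1, len(abundances) - 1)
        triple = sorted(abundances[i - 1:i + 2])
        medians.append(triple[1])
    return medians


def mean_and_variance_of_estimate(medians):
    """Return (sum(m_i)/n, sum((m_i - mean)^2) / (n (n - 1)))."""
    n = len(medians)
    mean = sum(medians) / n
    variance = sum((m - mean) ** 2 for m in medians) / (n * (n - 1))
    return mean, variance


# Hypothetical abundances along a transect.
rng = random.Random(0)
field = [rng.randint(0, 20) for _ in range(200)]
print(mean_and_variance_of_estimate(random_median_sample(field, 30, rng)))
```

The variance formula is the usual variance of a sample mean, applied to the n median abundances rather than to n randomly chosen items.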

18.
A diagnostic cut-off point of a biomarker measurement is needed for classifying a random subject to be either diseased or healthy. However, the cut-off point is usually unknown and needs to be estimated by some optimization criteria. One important criterion is the Youden index, which has been widely adopted in practice. The Youden index, which is defined as the maximum of (sensitivity + specificity − 1), directly measures the largest total diagnostic accuracy a biomarker can achieve. Therefore, it is desirable to estimate the optimal cut-off point associated with the Youden index. Sometimes, taking the actual measurements of a biomarker is very difficult and expensive, while ranking them without the actual measurement can be relatively easy. In such cases, ranked set sampling can give more precise estimation than simple random sampling, as ranked set samples are more likely to span the full range of the population. In this study, kernel density estimation is utilized to numerically solve for an estimate of the optimal cut-off point. The asymptotic distributions of the kernel estimators based on two sampling schemes are derived analytically and we prove that the estimators based on ranked set sampling are relatively more efficient than those based on simple random sampling and that both estimators are asymptotically unbiased. Furthermore, the asymptotic confidence intervals are derived. Intensive simulations are carried out to compare the proposed method using ranked set sampling with simple random sampling, with the proposed method outperforming simple random sampling in all cases. A real data set is analyzed for illustrating the proposed method.
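The definition of the Youden index, the maximum over cut-offs of sensitivity + specificity − 1, can be illustrated with a plain empirical search over observed values. Note that this swaps in empirical proportions for the kernel density estimation used in the study, and it assumes that larger biomarker values indicate disease; the function name and data are hypothetical.

```python
def youden_cutoff(healthy, diseased):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A subject is classified as diseased when its value is >= cutoff
    (assumption: larger biomarker values indicate disease).
    """
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(healthy) | set(diseased)):
        sensitivity = sum(x >= c for x in diseased) / len(diseased)
        specificity = sum(x < c for x in healthy) / len(healthy)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical biomarker measurements.
healthy = [1.2, 1.9, 2.4, 3.1, 3.3]
diseased = [2.8, 3.6, 4.0, 4.4, 5.1]
print(youden_cutoff(healthy, diseased))
```

Ties between cut-offs are resolved in favour of the smallest value, which is an arbitrary choice of this sketch.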

19.
Mandal S, Qin J, Pfeiffer RM. Biometrics 2023, 79(3):1701-1712
We propose and study a simple and innovative non-parametric approach to estimate the age-of-onset distribution for a disease from a cross-sectional sample of the population that includes individuals with prevalent disease. First, we estimate the joint distribution of two event times, the age of disease onset and the survival time after disease onset. We accommodate that individuals had to be alive at the time of the study by conditioning on their survival until the age at sampling. We propose a computationally efficient expectation–maximization (EM) algorithm and derive the asymptotic properties of the resulting estimates. From these joint probabilities we then obtain non-parametric estimates of the age-at-onset distribution by marginalizing over the survival time after disease onset to death. The method accommodates categorical covariates and can be used to obtain unbiased estimates of the covariate distribution in the source population. We show in simulations that our method performs well in finite samples even under large amounts of truncation for prevalent cases. We apply the proposed method to data from female participants in the Washington Ashkenazi Study to estimate the age-at-onset distribution of breast cancer associated with carrying BRCA1 or BRCA2 mutations.

20.
We investigate the patterns of abundance‐spatial occupancy relationships of adult parasite nematodes in mammal host populations (828 populations of nematodes from 66 different species of terrestrial mammals). A positive relationship between mean parasite abundance and host occupancy, i.e. prevalence, is found which suggests that local abundance is linked to spatial distribution across species. Moreover, the frequency distribution of the parasite prevalence is bimodal, which is consistent with a core‐satellite species distribution. In addition, a strong positive relationship between the abundance (log‐transformed) and its variance (log‐transformed) is observed, the distribution of worm abundance being lognormally distributed when abundance values have been corrected for host body size.
Hanski et al. proposed three distinct hypotheses, which might account for the positive relationship between abundance and prevalence in free and associated organisms: 1) ecological specialisation, 2) sampling artefact, and 3) metapopulation dynamics. In addition, Gaston and co‐workers listed five additional hypotheses. Four solutions were not applicable to our parasitological data due to the lack of relevant information in most host‐parasite studies. The fifth hypothesis, i.e. the confounded effects exerted by common history on observed patterns of parasite distributions, was considered using a phylogeny‐based comparison method. Testing the four possible hypotheses, we obtained the following results: 1) the variation of parasite distribution across host species is not due to phylogenetic confounding effects; 2) the positive relationship between mean abundance and prevalence of nematodes may not result from an ecological specialisation, i.e. host specificity, of these parasites; 3) both a positive abundance‐prevalence relationship and a negative coefficient of variation of abundance‐prevalence relationship are likely to occur which corroborates the sampling model developed by Hanski et al. We argue that demographic explanations may be of particular importance to explain the patterns of bimodality of prevalence when testing Monte‐Carlo simulations using epidemiological modelling frameworks, and when considering empirical findings. We conclude that both the bimodal distribution of parasite prevalence and the mean‐variance power function simply result from demographic and stochastic patterns (highlighted by the sampling model), which present compelling evidence that nematode parasite species might adjust their spatial distribution and burden in mammal hosts for simple epidemiological reasons.
