Similar Articles: 20 results found.
1.
Böhning D, Kuhnert R. Biometrics 2006, 62(4): 1207–1215.
This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.
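To make the Horvitz-Thompson step concrete, the sketch below fits a homogeneous zero-truncated Poisson by maximum likelihood and converts the fit into a population-size estimate. It deliberately omits the paper's mixture machinery, and the count data are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ztp_negloglik(lam, counts):
    # Zero-truncated Poisson: f(y) = exp(-lam) lam^y / (y! (1 - exp(-lam)));
    # the log(y!) term is constant in lam and therefore dropped.
    n = len(counts)
    return -(counts.sum() * np.log(lam) - n * lam - n * np.log1p(-np.exp(-lam)))

counts = np.array([1] * 50 + [2] * 20 + [3] * 5)   # hypothetical observed counts
res = minimize_scalar(ztp_negloglik, bounds=(1e-6, 20.0),
                      args=(counts,), method='bounded')
lam_hat = res.x
# Horvitz-Thompson: each observed unit is detected with probability 1 - P(Y = 0)
N_hat = len(counts) / (1.0 - np.exp(-lam_hat))
print(lam_hat, N_hat)
```

Under a mixture, the same final step applies, with P(Y = 0) computed from the fitted mixing distribution.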

2.
Estimation of a population size by means of capture-recapture techniques is an important problem occurring in many areas of the life and social sciences. We consider the frequencies-of-frequencies situation, where a count variable is used to summarize how often a unit has been identified in the target population of interest. The distribution of this count variable is zero-truncated, since zero identifications do not occur in the sample. As an application we consider the surveillance of scrapie in Great Britain. In this case study, holdings with scrapie that are not identified (zero counts) do not enter the surveillance database. The count variable of interest is the number of scrapie cases per holding. A common model for count distributions is the Poisson distribution and, to adjust for potential heterogeneity, a discrete mixture of Poisson distributions is used. Mixtures of Poissons usually provide an excellent fit, as will be demonstrated in the application of interest. However, as has recently been demonstrated, mixtures also suffer from the so-called boundary problem, resulting in overestimation of population size. It is suggested here to select the mixture model on the basis of the Bayesian Information Criterion. This strategy is further refined by employing a bagging procedure leading to a series of estimates of population size. Using the median of this series, highly influential size estimates are avoided. In limited simulation studies it is shown that the procedure leads to estimates with remarkably small bias.
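A minimal sketch of the bagging idea: resample the observed zero-truncated counts, refit, and take the median of the resulting Horvitz-Thompson estimates. For brevity it fits a single zero-truncated Poisson rather than the BIC-selected Poisson mixture the paper uses, and the counts are invented.

```python
import numpy as np
from scipy.optimize import brentq

def ztp_mle(counts):
    # The zero-truncated Poisson MLE solves lam / (1 - exp(-lam)) = mean(counts)
    ybar = counts.mean()
    return brentq(lambda lam: lam / (1.0 - np.exp(-lam)) - ybar, 1e-8, 50.0)

def ht_estimate(counts):
    lam = ztp_mle(counts)
    return len(counts) / (1.0 - np.exp(-lam))

rng = np.random.default_rng(1)
counts = np.array([1] * 120 + [2] * 40 + [3] * 10 + [4] * 3)  # hypothetical holdings
boot = [ht_estimate(rng.choice(counts, size=len(counts), replace=True))
        for _ in range(200)]
print(np.median(boot))   # the median damps highly influential resamples
```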

3.
We present the one-inflated zero-truncated negative binomial (OIZTNB) model, and propose its use as the truncated count distribution in Horvitz–Thompson estimation of an unknown population size. In the presence of unobserved heterogeneity, the zero-truncated negative binomial (ZTNB) model is a natural choice over the positive Poisson (PP) model; however, when one-inflation is present the ZTNB model either suffers from a boundary problem or provides extremely biased population size estimates. Monte Carlo evidence suggests that in the presence of one-inflation, the Horvitz–Thompson estimator under the ZTNB model can converge in probability to infinity. The OIZTNB model gives markedly different population size estimates compared to some existing truncated count distributions when applied to several capture–recapture datasets that exhibit both one-inflation and unobserved heterogeneity.
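A sketch of one way to fit an OIZTNB-style likelihood. The parameterization (log size, logit probability, logit one-inflation weight) is my own choice for unconstrained optimization, not necessarily the paper's, the final Horvitz-Thompson step is one plausible version, and the data are fabricated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import nbinom

def oiztnb_negloglik(theta, y):
    r, p, w = np.exp(theta[0]), expit(theta[1]), expit(theta[2])
    f = nbinom.pmf(y, r, p)               # untruncated negative binomial
    f0 = nbinom.pmf(0, r, p)
    g = (1.0 - w) * f / (1.0 - f0)        # zero-truncated NB component
    g = np.where(y == 1, w + g, g)        # extra point mass at y = 1
    return -np.sum(np.log(g))

y = np.array([1] * 200 + [2] * 30 + [3] * 12 + [4] * 6 + [5] * 3)  # hypothetical
res = minimize(oiztnb_negloglik, x0=np.array([0.0, 0.0, -1.0]),
               args=(y,), method='Nelder-Mead', options={'maxiter': 2000})
r, p = np.exp(res.x[0]), expit(res.x[1])
N_hat = len(y) / (1.0 - nbinom.pmf(0, r, p))  # Horvitz-Thompson under the NB part
print(N_hat)
```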

4.
Estimation of population size with a missing zero-class is an important problem encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimating the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable to count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and then using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search for a more robust estimator, we focused on three models that use the clusters with exactly one case, those with up to two cases, and those with up to three cases, respectively, to estimate the probability of the zero-class, and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. The loss in efficiency associated with the gain in robustness was examined in a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferable in general. In applications, we recommend obtaining estimates from all three models and making a choice by weighing the three estimates against robustness and the loss in efficiency.
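For reference, Zelterman's original unclustered estimator is simple enough to state in a few lines: it estimates the Poisson rate from the frequencies of ones and twos only, which is what makes it robust to misspecification elsewhere in the distribution. A sketch with invented counts:

```python
import numpy as np

counts = np.array([1] * 80 + [2] * 25 + [3] * 8 + [4] * 2)  # hypothetical counts
n = len(counts)
f1 = int(np.sum(counts == 1))       # number of units seen exactly once
f2 = int(np.sum(counts == 2))       # number of units seen exactly twice
lam_z = 2.0 * f2 / f1               # Zelterman's estimate of the Poisson rate
N_hat = n / (1.0 - np.exp(-lam_z))  # Horvitz-Thompson estimate of population size
print(lam_z, N_hat)
```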

5.
Zero-truncated data arise in various disciplines where counts are observed but the zero-count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, the nonstandard form of the likelihood means it cannot easily be implemented using well-known software packages, and additional programming is often required. Motivated by the Rao–Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, and allows readily available software to be applied. We evaluate the efficiency of this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
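To illustrate the "additional programming" that direct maximum likelihood entails, here is a sketch of zero-truncated Poisson regression fitted by hand. It is the baseline the weighted approach is designed to avoid, not the paper's method itself (whose weights I do not attempt to reproduce), and the data are simulated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def ztp_reg_negloglik(beta, X, y):
    # Zero-truncated Poisson regression with mu = exp(X beta):
    # log f(y) = y*log(mu) - mu - log(1 - exp(-mu)) - log(y!)
    mu = np.exp(X @ beta)
    return -np.sum(y * np.log(mu) - mu - np.log1p(-np.exp(-mu)) - gammaln(y + 1))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
X, y = X[y > 0], y[y > 0]            # zero counts are never observed
res = minimize(ztp_reg_negloglik, x0=np.zeros(2), args=(X, y), method='BFGS')
print(res.x)                          # compare with the true (0.5, 0.8)
```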

6.
We consider parametric distributions intended to model heterogeneity in population size estimation, especially parametric stochastic abundance models for species richness estimation. We briefly review (conditional) maximum likelihood estimation of the number of species, and summarize the results of fitting 7 candidate models to frequency-count data, from a database of >40000 such instances, mostly arising from microbial ecology. We consider error estimation, goodness-of-fit assessment, data subsetting, and other practical matters. We find that, although the array of candidate models can be improved, finite mixtures of a small number of components (point masses or simple diffuse distributions) represent a promising direction. Finally we consider the connections between parametric models for abundance and incidence data, again noting the usefulness of finite mixture models.

7.
Important aspects of population evolution have been investigated using nucleotide sequences. Under the neutral Wright–Fisher model, the scaled mutation rate represents twice the average number of new mutations per generation, and it is one of the key parameters in population genetics. In this study, we present various methods of estimation of this parameter, analytical studies of their asymptotic behavior, and simulation-based comparisons of the distributional behavior of these estimators. As knowledge of the genealogy is needed to compute the maximum likelihood estimator (MLE), an application with real data is also presented, using the jackknife to correct the bias of the MLE that can be generated by the estimation of the tree. We prove analytically that Watterson's estimator and the MLE are asymptotically equivalent, with the same rate of convergence to normality. Furthermore, we show that the MLE has a better rate of convergence than Watterson's estimator for values of the parameter greater than one, and that this relationship is reversed when the parameter is less than one.
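Watterson's estimator itself is a one-liner: divide the number of segregating sites by the harmonic number a_n = sum_{i=1}^{n-1} 1/i. A sketch with hypothetical inputs:

```python
import numpy as np

def watterson_theta(S, n):
    # Watterson's estimator theta_W = S / a_n, where S is the number of
    # segregating sites and n the number of sampled sequences.
    a_n = np.sum(1.0 / np.arange(1, n))
    return S / a_n

print(watterson_theta(S=42, n=20))   # hypothetical: 42 segregating sites, 20 sequences
```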

8.
Yao YC, Tai JJ. Biometrics 2000, 56(3): 795–800.
Segregation ratio estimation has long been important in human genetics. A simple truncated binomial model is considered that assumes complete ascertainment and a deterministic genotype-phenotype relationship. A simple but intuitively appealing estimator of the segregation ratio, previously proposed, is shown to have a negative bias. It is also shown that the bias of this estimator can be largely reduced via a randomization device, resulting in a new estimator that has the same large-sample behavior but with a negligible bias (decaying at a geometric rate). Numerical results are given to show the small-sample performance of this new estimator. An extension to incomplete ascertainment is also considered.
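For context, the truncated binomial likelihood under complete ascertainment conditions on each sibship containing at least one affected child. The sketch below computes the straightforward MLE of the segregation ratio; the paper's randomized, bias-reduced estimator is not reproduced here, and the sibship data are invented.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

def trunc_binom_negloglik(p, r, s):
    # Binomial truncated at zero: P(r | r >= 1) = Bin(r; s, p) / (1 - (1-p)^s)
    return -np.sum(np.log(binom.pmf(r, s, p)) - np.log1p(-(1.0 - p) ** s))

s = np.array([3, 3, 4, 2, 5, 3, 4, 4, 2, 3])   # hypothetical sibship sizes
r = np.array([1, 2, 1, 1, 2, 1, 1, 3, 1, 2])   # affected counts (all >= 1)
res = minimize_scalar(trunc_binom_negloglik, bounds=(1e-4, 1 - 1e-4),
                      args=(r, s), method='bounded')
print(res.x)   # MLE of the segregation ratio p
```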

9.
In follow-up studies, the disease event time can be subject to left truncation and right censoring. Furthermore, medical advancements have made it possible for patients to be cured of certain types of diseases. In this article, we consider a semiparametric mixture cure model for the regression analysis of left-truncated and right-censored data. The model combines a logistic regression for the probability of event occurrence with the class of transformation models for the time of occurrence. We investigate two techniques for estimating model parameters. The first approach is based on martingale estimating equations (EEs). The second approach is based on the conditional likelihood function given the truncation variables. The asymptotic properties of both proposed estimators are established. Simulation studies indicate that the conditional maximum-likelihood estimator (cMLE) performs well, while the estimator based on EEs is very unstable even though it is shown to be consistent. This is a special and intriguing phenomenon for the EE approach under the cure model. We provide insights into this issue and find that the EE approach can be improved significantly by assigning appropriate weights to the censored observations in the EEs. This finding is useful in overcoming the instability of the EE approach in more complicated situations where the likelihood approach is not feasible. We illustrate the proposed estimation procedures by analyzing the age at onset of the occiput-wall distance event for patients with ankylosing spondylitis.

10.
Taylor (1953) proposed a distance function in connection with the logit χ2 estimator. For product (associated) multinomial distributions, he showed that minimization of the distance function yields BAN estimators. Aithal (1986) and Rao (1989) considered a modified version of Taylor's distance function and showed that a member belonging to this class leads to a second-order efficient estimator. In this paper we consider Taylor's distance function and show that a member belonging to this class produces a second-order efficient estimator. In addition to the above two, the maximum likelihood (ML) estimator is also second-order efficient. In order to compare these three second-order efficient estimators, the small-sample variances of the estimators are estimated through a simulation study. The results indicate that the variance of the ML estimator is the smallest in most cases.

11.
Moming Li, Guoqing Diao, Jing Qin. Biometrics 2020, 76(4): 1216–1228.
We consider a two-sample problem where data come from symmetric distributions. Usual two-sample data with only magnitudes recorded, arising from case-control studies or logistic discriminant analyses, may constitute a symmetric two-sample problem. We propose a semiparametric model such that, in addition to symmetry, the log ratio of the two unknown density functions is modeled in a known parametric form. The new semiparametric model, tailor-made for symmetric two-sample data, can also be viewed as a biased sampling model subject to a symmetry constraint. A maximum empirical likelihood estimation approach is adopted to estimate the unknown model parameters, and the corresponding profile empirical likelihood ratio test is used to perform hypothesis testing regarding the two population distributions. Symmetry, however, comes with irregularity. It is shown that, under the null hypothesis of equal symmetric distributions, the maximum empirical likelihood estimator has degenerate Fisher information, and the test statistic has a mixture of χ2 distributions as its asymptotic distribution. Extensive simulation studies have been conducted to demonstrate promising statistical power under correct and misspecified models. We apply the proposed methods to two real examples.

12.
This paper proposes a modified randomization device for collecting information on sensitive issues. The estimator based on the suggested strategy is found to be unbiased for the population proportion and is better than Greenberg et al.'s (1969) estimator.
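For context, the unrelated-question device of Greenberg et al. (1969), the baseline this paper improves on, can be sketched in a few lines when the "yes" probability of the unrelated question is known. The survey numbers below are fabricated.

```python
import numpy as np

def greenberg_estimate(yes_prop, P, pi_u, n):
    # With probability P the sensitive question is asked, otherwise an
    # unrelated question whose 'yes' probability pi_u is known.
    pi_s = (yes_prop - (1.0 - P) * pi_u) / P        # estimated sensitive proportion
    var = yes_prop * (1.0 - yes_prop) / (n * P**2)  # from the binomial 'yes' count
    return pi_s, var

# hypothetical survey: 1000 respondents, 32% answered 'yes'
pi_s, var = greenberg_estimate(yes_prop=0.32, P=0.7, pi_u=0.5, n=1000)
print(pi_s, np.sqrt(var))
```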

13.
The nested case–control (NCC) design is a popular sampling method in large epidemiological studies because of its cost-effectiveness in investigating the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to the NCC data, and propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling, and show that it can be well adapted to NCC designs, where the sampling scheme is a dynamic process and is not independent for controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established, and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed for when the disease is rare. Extensive simulations are conducted to evaluate the finite-sample performance of our proposed estimators and to compare their efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to demonstrate the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.
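As a reference point, Thomas' estimator maximizes a partial likelihood in which each case is compared only with its sampled controls. A minimal sketch under a single covariate, with naively simulated matched sets rather than a genuine cohort:

```python
import numpy as np
from scipy.optimize import minimize

def thomas_neg_logpl(beta, case_x, control_x):
    # Sum over cases of -[beta'x_case - log(sum over sampled risk set of exp(beta'x))]
    nll = 0.0
    for xi, Xc in zip(case_x, control_x):
        risk = np.vstack([xi[None, :], Xc])          # case plus its m controls
        eta = risk @ beta
        nll -= eta[0] - np.log(np.sum(np.exp(eta)))
    return nll

rng = np.random.default_rng(2)
case_x = [rng.normal(size=1) + 0.5 for _ in range(100)]    # hypothetical case covariates
control_x = [rng.normal(size=(3, 1)) for _ in range(100)]  # 3 sampled controls per case
res = minimize(thomas_neg_logpl, x0=np.zeros(1), args=(case_x, control_x))
print(res.x)
```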

14.
Matrix population models are a standard tool for studying stage-structured populations, but they are not flexible in describing stage duration distributions. This study describes a method for modeling various such distributions in matrix models. The method uses a mixture of two negative binomial distributions (parametrized using a maximum likelihood method) to approximate a target (true) distribution. To examine the performance of the method, populations consisting of two life stages (juvenile and adult) were considered. The juvenile duration distribution followed a gamma distribution, lognormal distribution, or zero-truncated (over-dispersed) Poisson distribution, each of which represents a target distribution to be approximated by a mixture distribution. The true population growth rate based on a target distribution was obtained using an individual-based model, and the extent to which matrix models can approximate the target dynamics was examined. The results show that the method generally works well for the examined target distributions, but is prone to biased predictions under some conditions. In addition, the method works uniformly better than an existing method whose performance was also examined for comparison. Other details regarding parameter estimation and model development are also discussed.
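A sketch of the fitting step only: maximum likelihood estimation of a two-component negative binomial mixture against draws from a gamma-shaped target duration. The parameterization and optimizer settings are my own, and the paper's subsequent embedding of the fitted mixture into a matrix model is not shown.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import nbinom

def nb_mix_negloglik(theta, y):
    # Two-component negative binomial mixture, unconstrained parameterization
    r1, p1 = np.exp(theta[0]), expit(theta[1])
    r2, p2 = np.exp(theta[2]), expit(theta[3])
    w = expit(theta[4])
    f = w * nbinom.pmf(y, r1, p1) + (1.0 - w) * nbinom.pmf(y, r2, p2)
    return -np.sum(np.log(f + 1e-300))

rng = np.random.default_rng(3)
y = np.round(rng.gamma(shape=4.0, scale=2.0, size=500)).astype(int)  # target durations
res = minimize(nb_mix_negloglik, x0=np.array([1.0, 0.0, 2.0, -1.0, 0.0]),
               args=(y,), method='Nelder-Mead', options={'maxiter': 5000})
print(res.x)
```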

15.
The generalized maximum likelihood estimator (GMLE) is a nonparametric estimator of the joint distribution function F0 of a d-dimensional random vector, d ≥ 2, with interval-censored (IC) data. The GMLE of F0 with univariate IC data is uniquely defined at each follow-up time. However, this is no longer true in general with multivariate IC data, as demonstrated by a data set from an eye study. How to estimate the survival function and the covariance matrix of the estimator in such a case is a new practical issue in analyzing IC data. We propose a procedure for such a situation and apply it to the data set from the eye study. Our method always results in a GMLE with a nonsingular sample information matrix. We also give a theoretical justification for this procedure. Extension of our procedure to Cox's regression model is also mentioned.

16.
The estimation of the unknown parameters in the stratified Cox proportional hazards model is a typical example of the trade-off between bias and precision. The stratified partial likelihood estimator is unbiased when the number of strata is large, but suffers from instability when many strata are uninformative about the unknown parameters. The estimator obtained by ignoring the heterogeneity among strata, on the other hand, increases the precision of the estimates, although it pays the price of being biased. An estimation procedure based on the asymptotic properties of the above two estimators is proposed to compromise between bias and precision. Two examples from a radiosurgery study of brain metastases provide an interesting demonstration of such applications.

17.
Person-time incidence rates are frequently used in medical research. However, standard estimation theory for this measure of event occurrence is based on the assumption of independent and identically distributed (iid) exponential event times, which implies that the hazard function remains constant over time. Under this assumption, and assuming independent censoring, the observed person-time incidence rate is the maximum-likelihood estimator of the constant hazard, and the asymptotic variance of the log rate can be estimated consistently by the inverse of the number of events. However, in many practical applications the assumption of constant hazard is not very plausible. In the present paper, an average rate parameter is defined as the ratio of the expected event count to the expected total time at risk. This rate parameter equals the hazard function under constant hazard. For inference about the average rate parameter, an asymptotically robust variance estimator of the log rate is proposed. Under some very general conditions, the robust variance estimator is consistent under arbitrary iid event times, and is also consistent or asymptotically conservative when event times are independent but nonidentically distributed. In contrast, the standard maximum-likelihood estimator may become anticonservative under nonconstant hazard, producing confidence intervals with less-than-nominal asymptotic coverage. These results are derived analytically and illustrated with simulations. The two estimators are also compared in five datasets from oncology studies.
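A sketch contrasting the classical variance 1/D with a sandwich-type variance that treats subjects as iid units; the paper's robust estimator may differ in detail, and the subject-level data are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
d = rng.poisson(0.3, size=200)        # hypothetical event counts per subject
t = rng.uniform(0.5, 2.0, size=200)   # person-time at risk per subject

rate = d.sum() / t.sum()              # person-time incidence rate
var_mle = 1.0 / d.sum()               # classical: var(log rate) = 1 / D

# sandwich variance of log(rate): score residuals d_i - rate * t_i per subject
resid = d - rate * t
var_robust = np.sum(resid ** 2) / d.sum() ** 2

for v in (var_mle, var_robust):       # 95% CIs under each variance estimate
    lo, hi = rate * np.exp(np.array([-1.96, 1.96]) * np.sqrt(v))
    print(rate, lo, hi)
```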

18.
The classical normal-theory tests of the null hypothesis of common variance, and the classical estimates of scale, have long been known to be quite nonrobust to even mild deviations from normality assumptions for moderate sample sizes. Levene (1960) suggested a one-way ANOVA-type statistic as a robust test. Brown and Forsythe (1974) considered a modified version of Levene's test, replacing the sample means with sample medians as estimates of the population locations; their test is computationally the simplest among the three tests recommended by Conover, Johnson, and Johnson (1981) in terms of robustness and power. In this paper a new robust and powerful test for homogeneity of variances is proposed, based on a modification of Levene's test using the weighted likelihood estimates (Markatou, Basu, and Lindsay, 1996) of the population means. For two and three populations, the proposed test using the Hellinger-distance-based weighted likelihood estimates is observed to achieve better empirical level and power than the Brown-Forsythe test in symmetric distributions with thicker tails than the normal, and higher empirical power in skewed distributions when F-distribution critical values are used.
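The Brown-Forsythe baseline is available directly in scipy as Levene's test centered at the median; the weighted-likelihood modification proposed in the paper is not, so only the baseline is sketched here, on simulated heavy-tailed samples.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(5)
x = rng.standard_t(df=3, size=50)          # heavy-tailed sample, scale 1
y = 1.5 * rng.standard_t(df=3, size=50)    # same shape, larger scale

# Brown-Forsythe: Levene's test with medians as the location estimates
stat, pval = levene(x, y, center='median')
print(stat, pval)
```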

19.
The problem of estimating the ratio of two population proportions is considered, and a difference-type estimator is proposed using auxiliary information. The bias and mean squared error of the proposed estimator are derived and compared with those of the usual estimator and of Wynn's (1976) type estimator. An example is included for illustration.

20.
Molecular marker data provide a means of circumventing the problem of not knowing the population structure of a natural population, as observed similarities between a pair's genotypes provide information on their genetic relationship. Numerous method-of-moments (MOM) estimators have been developed for estimating relationship coefficients from this information. Here, I present a simplified form of Wang's (2002) relationship estimator that does not depend on a previously required weighting scheme, thus improving the efficiency of the estimator when used with genuinely related pairs. The new estimator is compared against other estimators under a range of conditions, including situations where the parameter estimates are truncated to lie within the legitimate parameter space. The advantages of the new estimator are most notable for the two-gene coefficient of relatedness. Truncating the MOM estimators results in parameter estimates whose properties are similar to maximum likelihood estimates: they generally have lower sampling variances, but are biased.
