Similar Articles
20 similar articles found (search time: 281 ms)
1.
In case-control studies with matched pairs, the traditional point estimator of the odds ratio (OR) is well known to be biased, with no exact finite variance, under binomial sampling. In this paper, we consider the use of inverse sampling, in which we continue to sample subjects to form matched pairs until we obtain a pre-determined number (>0) of index pairs with the case unexposed but the control exposed. In contrast to binomial sampling, we show that the uniformly minimum variance unbiased estimator (UMVUE) of the OR does exist under inverse sampling. We further derive an exact confidence interval for the OR in closed form. Finally, we develop an exact test and an asymptotic test of the null hypothesis H0: OR = 1, and discuss sample size determination for the minimum required number of index pairs to achieve a desired power at a given α-level.
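The inverse sampling scheme in item 1 can be illustrated with a small Monte Carlo sketch (this is not the paper's UMVUE derivation; the stopping rule and the probability 1/(1+OR) of an index pair among discordant pairs are the only facts assumed): sampling discordant pairs until a fixed number of index pairs is reached, the count of non-index discordant pairs per index pair is unbiased for the OR under this negative-binomial stopping.

```python
import random

def simulate_inverse_sampling_or(true_or, y_index, rng):
    """Sample discordant matched pairs until y_index 'index' pairs
    (case unexposed, control exposed) are observed; return the count
    of the other discordant type divided by y_index."""
    # Among discordant pairs, P(index pair) = 1 / (1 + OR).
    p_index = 1.0 / (1.0 + true_or)
    index_pairs, other_pairs = 0, 0
    while index_pairs < y_index:
        if rng.random() < p_index:
            index_pairs += 1
        else:
            other_pairs += 1
    return other_pairs / y_index  # unbiased for OR under this stopping rule

rng = random.Random(42)
estimates = [simulate_inverse_sampling_or(2.0, 10, rng) for _ in range(5000)]
print(sum(estimates) / len(estimates))  # close to the true OR of 2.0
```

Averaged over many replications the estimator centers on the true OR, which is the intuition behind the existence of an unbiased estimator under inverse sampling.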

2.
Combining multivariate bioassays
Linear multivariate theory is applied to the problem of combining several multivariate bioassays. The results include an asymptotic test of the hypothesis of a common log relative potency, the maximum likelihood estimator of the common log relative potency, and exact and asymptotic confidence interval estimators for the log relative potency.

3.
Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error if analyses are based on poor proxies. Because detailed documentation is often lacking and manual annotation is labor intensive, it is typically feasible to ascertain the current status of the disease by a follow-up time, rather than the exact onset time, for only a small subset of subjects. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on the current status and a large number of unlabeled observations with imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for the proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from the proxies and the limited labels. Starting from an initial estimator based solely on the labeled subset, we perform a one-step correction with the full data, augmenting against a mean-zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in finite samples. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from the Mass General Brigham Healthcare Biobank.

4.
Yin G, Cai J. Biometrics 2005;61(1):151-161.
As an alternative to the mean regression model, the quantile regression model has been studied extensively with independent failure time data. However, due to natural or artificial clustering, it is common to encounter multivariate failure time data in biomedical research, where the intracluster correlation needs to be accounted for appropriately. For right-censored correlated survival data, we investigate the quantile regression model and adapt an estimating equation approach for parameter estimation under the working independence assumption, as well as a weighted version for enhancing efficiency. We show that the parameter estimates are consistent and asymptotically normal. Because variance estimation using asymptotic approximation involves nonparametric functional density estimation, we employ bootstrap and perturbation resampling methods for the estimation of the variance-covariance matrix. We examine the proposed method in finite samples through simulation studies and illustrate it with data from a clinical trial on otitis media.

5.
We consider the problem of jointly modeling survival time and longitudinal data subject to measurement error. The survival times are modeled through the proportional hazards model, and a random effects model is assumed for the longitudinal covariate process. Under this framework, we propose an approximate nonparametric corrected-score estimator for the parameter that describes the association between the time-to-event and the longitudinal covariate. The term nonparametric refers to the fact that assumptions regarding the distribution of the random effects and that of the measurement error are unnecessary. The finite-sample performance of the approximate nonparametric corrected-score estimator is examined through simulation studies, and its asymptotic properties are also developed. Furthermore, the proposed estimator and some existing estimators are applied to real data from an AIDS clinical trial.

6.
In many studies in ecology, epidemiology, biomedicine, and other fields, we are confronted with panels of short time series for which we would like to obtain a biologically meaningful grouping. Here, we propose a bootstrap approach to test whether the regression functions or the variances of the error terms in a family of stochastic regression models are the same. Our general setting includes panels of time series models as a special case. We rigorously justify the use of the test by investigating its asymptotic properties, both theoretically and through simulations. The latter confirm that for finite sample sizes, the bootstrap provides a better approximation than classical asymptotic theory. We then apply the proposed tests to mink-muskrat data from 81 trapping regions in Canada. Ecologically interpretable groupings are obtained, which serve as a necessary first step before a fuller biological and statistical analysis of the food-chain interaction.

7.
MOTIVATION: Ranking feature sets is a key issue for classification, for instance, phenotype classification based on gene expression. Since ranking is often based on error estimation, and error estimators suffer from differing degrees of imprecision in small-sample settings, it is important to choose a computationally feasible error estimator that yields good feature-set ranking. RESULTS: This paper examines the feature-ranking performance of several kinds of error estimators: resubstitution, cross-validation, bootstrap, and bolstered error estimation. It does so for three classification rules: linear discriminant analysis, three-nearest-neighbor classification, and classification trees. Two measures of performance are considered. One counts the number of truly best feature sets appearing among the best feature sets discovered by the error estimator, and the other computes the mean absolute error between the top ranks of the truly best feature sets and their ranks as given by the error estimator. Our results indicate that bolstering is superior to bootstrap, and bootstrap is better than cross-validation, for discovering top-performing feature sets for classification with small samples. A key point is that bolstered error estimation is tens of times faster than bootstrap, and faster than cross-validation, and is therefore feasible for feature-set ranking when the number of feature sets is extremely large.
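Why resubstitution misleads feature ranking, as item 7 discusses, can be seen in a toy sketch (a hypothetical one-dimensional sample, not the paper's experiments): under 1-nearest-neighbor classification, each training point is its own nearest neighbor, so the resubstitution error is always zero, while leave-one-out cross-validation reveals the true misclassifications.

```python
def one_nn_label(x, train):
    """Predict by the nearest training point (1-NN on the real line)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Toy sample: (feature value, class label), with the classes overlapping.
data = [(0.0, 0), (1.0, 0), (2.0, 0), (2.5, 1), (3.5, 1), (4.5, 1)]

# Resubstitution: each point is its own nearest neighbor, so the error is 0.
resub = sum(one_nn_label(x, data) != y for x, y in data) / len(data)

# Leave-one-out cross-validation exposes the optimism.
loocv = sum(
    one_nn_label(x, data[:i] + data[i + 1:]) != y
    for i, (x, y) in enumerate(data)
) / len(data)

print(resub, loocv)  # 0.0 versus 1/3 on this toy sample
```

The gap between the two numbers is exactly the kind of small-sample optimism that motivates comparing resubstitution against cross-validation, bootstrap, and bolstered estimators.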

8.
Lu SE, Wang MC. Biometrics 2002;58(4):764-772.
The cohort case-control design is an efficient and economical design for studying risk factors for disease incidence or mortality in a large cohort. In the last few decades, a variety of cohort case-control designs have been developed and theoretically justified, but these designs have been applied exclusively to the analysis of univariate failure-time data. In this work, a cohort case-control design adapted to multivariate failure-time data is developed. A risk-set sampling method is proposed to sample controls from the nonfailures in a large cohort for each case, matched by failure time. This method leads to a pseudolikelihood approach for estimating the regression parameters in the marginal proportional hazards model (Cox, 1972, Journal of the Royal Statistical Society, Series B 34, 187-220), where the correlation structure between individuals within a cluster is left unspecified. The performance of the proposed estimator is demonstrated by simulation studies, and a bootstrap method is proposed for inference. The methodology is illustrated with data from a child vitamin A supplementation trial in Nepal (Nepal Nutrition Intervention Project-Sarlahi, or NNIPS).

9.
We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.

10.
Assessment of the misclassification error rate is of high practical relevance in many biomedical applications. As it is a complex problem, theoretical results on estimator performance are few, and most findings originate from Monte Carlo simulations that take place in the “normal setting”: the covariables of two groups have a multivariate normal distribution; the groups differ in location but have the same covariance matrix; and the linear discriminant function (LDF) is used for prediction. We perform a new simulation to compare existing nonparametric estimators in a more complex situation. The underlying distribution is based on a logistic model with six binary as well as continuous covariables. To study estimator performance for varying true error rates, three prediction rules (including nonparametric classification trees and parametric logistic regression) and sample sizes ranging from 100 to 1,000 are considered. In contrast to most published papers, we turn our attention to estimator performance based on simple, even inappropriate, prediction rules and relatively large training sets. For the most part, results agree with the usual findings. The most striking behavior was seen when applying (simple) classification trees for prediction: since the apparent error rate Êrr.app is biased, linear combinations incorporating Êrr.app underestimate the true error rate even for large sample sizes. The .632+ estimator, which was designed to correct for the overoptimism of Efron's .632 estimator for nonparametric prediction rules, performs best of all such linear combinations. The bootstrap estimator Êrr.B0 and the cross-validation estimator Êrr.cv, which do not depend on Êrr.app, seem to track the true error rate. Although the disadvantages of both estimators (the pessimism of Êrr.B0 and the high variability of Êrr.cv) shrink with increasing sample size, they are still visible.
We conclude that for the choice of a particular estimator, the asymptotic behavior of the apparent error rate is important. For the assessment of estimator performance, the variance of the true error rate is crucial, and in general the stability of prediction procedures is essential for the application of estimators based on resampling methods. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
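The .632 and .632+ combinations discussed above can be sketched numerically. The formula below follows the commonly cited Efron-Tibshirani form of the .632+ rule (relative overfitting rate R, adaptive weight w); the three input numbers are hypothetical, not from the paper's simulation.

```python
def err_632_plus(err_app, err_b0, gamma):
    """Assumed Efron-Tibshirani .632+ rule: weight the leave-one-out
    bootstrap error err_b0 against the apparent error err_app, with the
    weight adapted via the relative overfitting rate R; gamma is the
    no-information error rate."""
    r = 0.0
    if err_b0 > err_app and gamma > err_app:
        r = (err_b0 - err_app) / (gamma - err_app)
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * err_app + w * err_b0

err_app, err_b0, gamma = 0.05, 0.30, 0.50   # hypothetical values
err_632 = 0.368 * err_app + 0.632 * err_b0  # plain .632 estimator
print(err_632, err_632_plus(err_app, err_b0, gamma))
```

When the apparent error is optimistic (here 0.05 against a bootstrap error of 0.30), the .632+ estimate lies above the plain .632 estimate, which is exactly the upward correction the abstract describes.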

11.
Hidden Markov models have been successfully applied in various fields of time series analysis, especially the analysis of ion channel recordings. The maximum likelihood estimator (MLE) has recently been proven to be asymptotically normally distributed. Here, we investigate finite-sample properties of the MLE and of different types of likelihood ratio tests (LRTs) by means of simulation studies. The MLE is shown to reach its asymptotic behavior within sample sizes that are common in various applications, so reliable estimates and confidence intervals can be obtained. We give an approximate scaling function for the estimation error in finite samples and investigate the power of different LRTs suitable for ion channel applications, including tests for superimposed hidden Markov processes. Our results are applied to physiological sodium channel data.

12.
It is not uncommon to encounter a randomized clinical trial (RCT) in which there are confounders that need to be controlled for and patients who do not comply with their assigned treatments. In this paper, we concentrate on interval estimation of the proportion ratio (PR) of response probabilities between two treatments in a stratified noncompliance RCT. We develop and consider five asymptotic interval estimators for the PR: the interval estimator using the weighted least-squares (WLS) estimator, the interval estimator using the Mantel-Haenszel type of weight, the interval estimator derived from Fieller's theorem with the corresponding WLS optimal weight, the interval estimator derived from Fieller's theorem with the randomization-based optimal weight, and the interval estimator based on a stratified two-sample proportion test with the optimal weight suggested elsewhere. To evaluate and compare the finite-sample performance of these estimators, we apply Monte Carlo simulation to calculate the coverage probability and average length in a variety of situations. We discuss the limitations and usefulness of each of these interval estimators and provide a general guideline on which estimators may be used in various situations.
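One of the weighting schemes named in item 12, the Mantel-Haenszel type, can be sketched for the point estimate of a stratified ratio of proportions. The code uses the standard Mantel-Haenszel risk-ratio form; the stratum layout and counts are hypothetical and this is only the point estimator, not any of the paper's five interval estimators.

```python
def mh_proportion_ratio(strata):
    """Mantel-Haenszel-type weighted ratio of response proportions
    across strata; each stratum is (a, n1, c, n0), where a of n1
    subjects respond in arm 1 and c of n0 respond in arm 0."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# With a single stratum this reduces to the crude ratio of proportions:
# (10/100) / (5/100) = 2.0
print(mh_proportion_ratio([(10, 100, 5, 100)]))
two = mh_proportion_ratio([(10, 100, 5, 100), (30, 200, 20, 100)])
print(two)  # pooled across two hypothetical strata
```

The within-stratum weights n1*n0/(n1+n0) downweight unbalanced strata, which is what makes the Mantel-Haenszel combination stable across sparse strata.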

13.
Zhou XH, Tu W. Biometrics 2000;56(4):1118-1125.
In this paper, we consider the problem of interval estimation for the mean of diagnostic test charges. Diagnostic test charge data may contain zero values, and the nonzero values can often be modeled by a log-normal distribution. Under such a model, we propose three interval estimation procedures: a percentile-t bootstrap interval based on sufficient statistics and two likelihood-based confidence intervals. We show that the two likelihood-based one-sided confidence intervals are only first-order accurate, whereas the bootstrap-based one-sided confidence interval is second-order accurate; for two-sided confidence intervals, all three proposed methods are second-order accurate. A simulation study with finite sample sizes suggests that all three proposed intervals outperform a widely used interval based on the minimum variance unbiased estimator (MVUE), except for one-sided lower end-point intervals when the skewness is very small. Among the proposed one-sided intervals, the bootstrap interval has the best coverage accuracy. For the two-sided intervals, when the sample size is small, the bootstrap method still yields the best coverage accuracy unless the skewness is very small, in which case the bias-corrected ML method has the best accuracy; when the sample size is large, all three proposed intervals have similar coverage accuracy. Finally, we apply the proposed methods to a real example assessing diagnostic test charges among older adults with depression.

14.
Best linear unbiased allele-frequency estimation in complex pedigrees
McPeek MS, Wu X, Ober C. Biometrics 2004;60(2):359-367.
Many types of genetic analyses depend on estimates of allele frequencies. We consider the problem of allele-frequency estimation based on data from related individuals. The motivation for this work is data collected on the Hutterites, an isolated founder population, so we focus particularly on the case in which the relationships among the sampled individuals are specified by a large, complex pedigree for which maximum likelihood estimation is impractical. For this case, we propose to use the best linear unbiased estimator (BLUE) of allele frequency. We derive this estimator, which is equivalent to the quasi-likelihood estimator for this problem, and we describe an efficient algorithm for computing the estimate and its variance. We show that our estimator has certain desirable small-sample properties in common with the maximum likelihood estimator (MLE) for this problem. We treat both the case in which the parental origin of each allele is known and the case in which it is unknown. The results are extended to prediction of allele frequency in some set of individuals S based on genotype data collected on a set of individuals R. We compare the mean-squared error of the BLUE, the commonly used naive estimator (the sample frequency), and the MLE when the latter is feasible to calculate. The results indicate that although the MLE performs best of the three, the BLUE is close in performance to the MLE and is substantially easier to calculate, making it particularly useful for large complex pedigrees in which MLE calculation is impractical or infeasible. We apply our method to allele-frequency estimation in a Hutterite data set.
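The general form of a best linear unbiased mean estimator, of which item 14's allele-frequency BLUE is an instance, can be sketched: the weights are proportional to Σ⁻¹1, where Σ is the covariance (here, kinship-induced correlation) among the sampled individuals. The toy covariance matrix and values below are hypothetical, and the paper's efficient pedigree algorithm is not reproduced.

```python
def solve(A, b):
    """Tiny Gauss-Jordan solver with partial pivoting for small systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def blue_allele_freq(genotype_freqs, sigma):
    """BLUE of a common mean: weights proportional to Sigma^{-1} 1,
    normalized to sum to one."""
    w = solve(sigma, [1.0] * len(sigma))
    total = sum(w)
    return sum(wi * g for wi, g in zip(w, genotype_freqs)) / total

g = [0.5, 0.0, 1.0, 0.5]  # per-individual allele frequencies (genotype / 2)
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(blue_allele_freq(g, identity))  # unrelated individuals: the sample mean
```

When Σ is the identity (unrelated individuals) the BLUE reduces to the naive sample frequency; correlated individuals receive smaller weights, since their genotypes carry partially redundant information.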

15.
Ratio estimation with measurement error in the auxiliary variate
Gregoire TG, Salas C. Biometrics 2009;65(2):590-598.
With auxiliary information that is well correlated with the primary variable of interest, ratio estimation of the finite population total may be much more efficient than alternative estimators that do not make use of the auxiliary variate. The well-known properties of ratio estimators are perturbed when the auxiliary variate is measured with error. In this contribution, we examine the effect of measurement error in the auxiliary variate on the design-based statistical properties of three common ratio estimators, considering both systematic measurement error and measurement error that varies according to a fixed distribution. Aside from presenting expressions for the bias and variance of these estimators when they are contaminated with measurement error, we provide numerical results based on a specific population. Under systematic measurement error, the biasing effect is asymmetric around zero, and precision may be improved or degraded depending on the magnitude of the error. Under variable measurement error, the bias of the conventional ratio-of-means estimator increases slightly with increasing error dispersion, but far less than the bias of the conventional mean-of-ratios estimator. Similarly, the variance of the mean-of-ratios estimator incurs a greater loss of precision with increasing error dispersion than the other estimators we examine. Overall, the ratio-of-means estimator appears to be remarkably resistant to the effects of measurement error in the auxiliary variate.
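The contrast between the ratio-of-means and mean-of-ratios estimators under variable measurement error can be reproduced with a minimal simulation. The population, the exact ratio relation y = 2x, and the additive Gaussian error are all hypothetical choices, not the paper's specific population.

```python
import random

rng = random.Random(7)
B_TRUE, REPS, N, ERR_SD = 2.0, 2000, 200, 1.0  # hypothetical population setup

rom_sum = mor_sum = 0.0
for _ in range(REPS):
    x = [rng.uniform(5.0, 15.0) for _ in range(N)]   # true auxiliary values
    y = [B_TRUE * xi for xi in x]                    # exact ratio relation
    x_obs = [xi + rng.gauss(0.0, ERR_SD) for xi in x]  # variable error
    rom_sum += sum(y) / sum(x_obs)                   # ratio-of-means
    mor_sum += sum(yi / xo for yi, xo in zip(y, x_obs)) / N  # mean-of-ratios

rom, mor = rom_sum / REPS, mor_sum / REPS
print(rom, mor)  # mean-of-ratios drifts further from the true ratio of 2.0
```

The mean-of-ratios estimator is biased upward because 1/(x + e) is convex in the error e (Jensen's inequality), while the errors largely cancel in the pooled denominator of the ratio-of-means estimator, matching the abstract's conclusion about its resistance to measurement error.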

16.
The bootstrap error-estimation method is investigated in comparison with the known π-method and with a combined error estimation suggested by us, using simulated and normally distributed “populations” with 15 and 30 characters, respectively. For small sample sizes (below two to three times the number of characters per class), the estimates resulting from the bootstrap method are on average too small and can no longer be accepted. Significantly better results (with essentially lower computational expenditure) are obtained with the π-method and the combined estimation. The variability is essentially the same for all three methods. This holds both for rather poorly separated and for very well separated populations. A bootstrap estimation modified by us also gives unsatisfactory results.

17.
In various guises, feasible generalized least squares (FGLS) estimation has occupied an important place in regression analysis for more than 35 years. Past studies of the characteristics of FGLS estimators are largely based on large-sample evaluations, and the important issue of admissibility remains unexplored for the FGLS estimator. In this paper, an exact sufficient condition is given for the dominance, in finite samples under squared error loss, of a Stein-type shrinkage estimator over the FGLS estimator. In deriving the condition, we assume that the model's disturbance covariance matrix is unknown except for a scalar multiple. Further, for models with AR(1) disturbances, the dominance condition reduces to one that involves no unknown parameter. In other words, in the case of AR(1) disturbances, where the condition for risk dominance is met, the FGLS estimator is rendered inadmissible under squared error loss.

18.
When the sample size is not large or the underlying disease is rare, to assure collection of an appropriate number of cases and to control the relative error of estimation, one may employ inverse sampling, in which one continues sampling subjects until exactly the desired number of cases is obtained. This paper focuses on interval estimation of the simple difference between two proportions under independent inverse sampling. It develops three asymptotic interval estimators, based on the maximum likelihood estimator (MLE), the uniformly minimum variance unbiased estimator (UMVUE), and the asymptotic likelihood ratio test (ALRT). To compare the performance of these three estimators, the paper calculates the coverage probability and the expected length of the resulting confidence intervals on the basis of the exact distribution. It finds that when the underlying proportions of cases in the two comparison populations are small or moderate (≤0.20), all three asymptotic interval estimators perform reasonably well even for a pre-determined number of cases as small as 5. When the pre-determined number of cases is moderate or large (≥50), all three estimators are essentially equivalent in all the situations considered. Because applying the two interval estimators derived from the MLE and the UMVUE involves none of the numerical iteration needed for the ALRT, for simplicity one may use these two estimators without losing efficiency.
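The MLE-based interval in item 18 can be sketched in closed form. Under inverse sampling, the MLE of the case proportion is p̂ = y/n (y pre-determined cases out of n total sampled), with delta-method variance p²(1−p)/y; the Wald interval for the difference follows directly. The counts below are hypothetical, and the paper's UMVUE-based and ALRT intervals are not reproduced here.

```python
import math

def inverse_sampling_diff_ci(y1, n1, y2, n2, z=1.96):
    """Wald interval for p1 - p2 when each group is sampled until y cases
    are observed (inverse sampling); uses the delta-method variance
    p^2 * (1 - p) / y of the MLE p_hat = y / n."""
    p1, p2 = y1 / n1, y2 / n2
    se = math.sqrt(p1 ** 2 * (1 - p1) / y1 + p2 ** 2 * (1 - p2) / y2)
    d = p1 - p2
    return d - z * se, d + z * se

# Hypothetical data: 20 cases reached after 200 and 125 subjects, respectively.
lo, hi = inverse_sampling_diff_ci(20, 200, 20, 125)
print(round(lo, 4), round(hi, 4))
```

Note that the interval is centered at the MLE of the difference; a UMVUE-based version would instead plug in the unbiased estimate (y − 1)/(n − 1) of each proportion.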

19.
Small study effects occur when smaller studies show different, often larger, treatment effects than large ones, which may threaten the validity of systematic reviews and meta-analyses. The best-known causes of small study effects include publication bias, outcome reporting bias, and clinical heterogeneity. Methods to account for small study effects in univariate meta-analysis have been studied extensively, but detecting small study effects in a multivariate meta-analysis setting remains an untouched research area. One complication is that different types of selection processes can be involved in the reporting of multivariate outcomes: for example, some studies may be completely unpublished while others may selectively report multiple outcomes. In this paper, we propose a score test as an overall test of small study effects in multivariate meta-analysis. Two detailed case studies demonstrate the advantage of the proposed test over various naive applications of univariate tests in practice. Through simulation studies, the proposed test is found to retain nominal Type I error rates with considerable power in moderate-sample-size settings. Finally, we evaluate the concordance between the proposed test and the naive application of univariate tests using 44 systematic reviews with multiple outcomes from the Cochrane Database.

20.
Zhao and Tsiatis (1997) consider the problem of estimation of the distribution of the quality-adjusted lifetime when the chronological survival time is subject to right censoring. The quality-adjusted lifetime is typically defined as a weighted sum of the times spent in certain states up until death or some other failure time. They propose an estimator and establish the relevant asymptotics under the assumption of independent censoring. In this paper we extend the data structure with a covariate process observed until the end of follow-up and identify the optimal estimation problem. Because of the curse of dimensionality, no globally efficient nonparametric estimators, which have a good practical performance at moderate sample sizes, exist. Given a correctly specified model for the hazard of censoring conditional on the observed quality-of-life and covariate processes, we propose a closed-form one-step estimator of the distribution of the quality-adjusted lifetime whose asymptotic variance attains the efficiency bound if we can correctly specify a lower-dimensional working model for the conditional distribution of quality-adjusted lifetime given the observed quality-of-life and covariate processes. The estimator remains consistent and asymptotically normal even if this latter submodel is misspecified. The practical performance of the estimators is illustrated with a simulation study. We also extend our proposed one-step estimator to the case where treatment assignment is confounded by observed risk factors so that this estimator can be used to test a treatment effect in an observational study.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号