Similar articles
1.
The nonparametric transformation model makes no parametric assumptions on the forms of the transformation function and the error distribution. This model is appealing in its flexibility for modeling censored survival data. Current approaches for estimation of the regression parameters involve maximizing discontinuous objective functions, which are numerically infeasible to implement with multiple covariates. Based on the partial rank (PR) estimator (Khan and Tamer, 2004), we propose a smoothed PR estimator which maximizes a smooth approximation of the PR objective function. The estimator is shown to be asymptotically equivalent to the PR estimator but is much easier to compute when there are multiple covariates. We further propose using the weighted bootstrap, which is more stable than the usual sandwich technique with smoothing parameters, for estimating the standard error. The estimator is evaluated via simulation studies and illustrated with the Veterans Administration lung cancer data set.
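The core idea of the smoothed PR estimator can be sketched numerically. The snippet below is a minimal illustration, not the authors' estimator: it replaces the indicator in a generic pairwise rank objective with a scaled sigmoid, which makes the objective smooth in the regression coefficients. Censoring weights and the scale normalization used in practice are omitted, and all data and names are hypothetical.

```python
import math
import random

def sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def smoothed_rank_objective(beta, x, y, h=0.1):
    # Smooth surrogate for a pairwise rank objective:
    # sum over pairs with y_i > y_j of sigmoid((x_i - x_j)' beta / h).
    # As h -> 0 this approaches the discontinuous indicator objective.
    total = 0.0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if y[i] > y[j]:
                diff = sum((x[i][k] - x[j][k]) * beta[k]
                           for k in range(len(beta)))
                total += sigmoid(diff / h)
    return total

# toy data: y is a monotone transform of a single-index model in x
random.seed(0)
x = [[random.gauss(0, 1)] for _ in range(40)]
y = [math.exp(xi[0] + 0.3 * random.gauss(0, 1)) for xi in x]

q_right = smoothed_rank_objective([1.0], x, y)   # true direction of beta
q_wrong = smoothed_rank_objective([-1.0], x, y)  # reversed direction
```

Because the smoothed objective is differentiable, a gradient-based optimizer can be used even with many covariates; here the objective at the true coefficient direction exceeds the objective at the reversed direction.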

2.
Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and Bayes error for the problem. The methodology is illustrated by application on data from a well-known cancer classification study.
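The fixed-weight convex combination the abstract refers to can be sketched as follows. This is a hedged illustration with a simple nearest-centroid classifier (not the LDA setting analyzed in the paper); the sample sizes, class means, and number of bootstrap resamples are arbitrary choices.

```python
import random
import statistics

random.seed(1)

def nearest_centroid_fit(xs, ys):
    # class centroids for a one-dimensional two-class problem
    c0 = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    c1 = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    return c0, c1

def predict(model, x):
    c0, c1 = model
    return 0 if abs(x - c0) <= abs(x - c1) else 1

def error_rate(model, xs, ys):
    return sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(ys)

# a small two-class Gaussian sample, as in small-sample expression studies
xs = [random.gauss(0.0, 1.0) for _ in range(15)] + \
     [random.gauss(1.5, 1.0) for _ in range(15)]
ys = [0] * 15 + [1] * 15

model = nearest_centroid_fit(xs, ys)
resub = error_rate(model, xs, ys)        # optimistically biased

# basic ("zero") bootstrap: average error on points left out of each resample
boot_errs = []
for _ in range(100):
    idx = [random.randrange(len(ys)) for _ in range(len(ys))]
    drawn = set(idx)
    oob = [i for i in range(len(ys)) if i not in drawn]
    if not oob:
        continue
    m = nearest_centroid_fit([xs[i] for i in idx], [ys[i] for i in idx])
    boot_errs.append(sum(predict(m, xs[i]) != ys[i] for i in oob) / len(oob))
boot = statistics.mean(boot_errs)        # pessimistically biased

err632 = 0.368 * resub + 0.632 * boot    # fixed-weight convex combination
```

The paper's point is that the 0.632 weight used here is only asymptotically motivated; the finite-sample unbiased weight can differ substantially.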

3.
The problem of estimation of the ratio of population proportions is considered, and a difference-type estimator is proposed using auxiliary information. The bias and mean squared error of the proposed estimator are found and compared to those of the usual estimator and of Wynn's (1976) type estimator. An example is included for illustration.

4.
In this paper the properties of C-optimal designs constructed for estimating the median effective dose (ED50) within the framework of two-parameter linear logistic models are critically assessed. It is well known that this design criterion, which is based on the first-order approximation of the exact variance of the maximum likelihood estimate of the ED50, leads to a one-point design for which the maximum likelihood theory breaks down. The single dose used in this design is identical with the true but unknown value of the ED50. It will be shown that at this one-point design the asymptotic variance does not exist. A two-point design in the neighbourhood of the one-point design, symmetrical about the ED50 and associated with a small dose distance, would be nearly optimal, but extremely nonrobust if the best guess of the ED50 differs from the true value. In this situation the asymptotic variance of the two-point design converging towards the one-point design tends to infinity. Moreover, taking into consideration that for searching an optimal design the exact variance is of primary interest and the asymptotic variance serves only as an approximation of it, we calculate the exact variance of the estimator from balanced, symmetric two-point designs in the neighbourhood of the limiting one-point design for various dose distances and initial best guesses of the ED50. We compare the true variance of the estimate of the ED50 with the asymptotic variance and show that the approximations generally do not represent suitable substitutes for the exact variance, even for unrealistically large sample sizes. Kalish (1990) proposed a criterion based on the second-order asymptotic variance of the maximum likelihood estimate of the ED50 to overcome the degenerate one-point design as the solution of the optimization procedure. In fact, we are able to show that this variance approximation does not perform substantially better than the first-order variance.
From these considerations it follows that the C-optimality criterion is not useful in this estimation problem. Other criteria, such as F-optimality, should be used.
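The nonrobustness described above can be made concrete with the first-order (delta-method) asymptotic variance of the ED50 estimate under a two-parameter logistic model. The sketch below uses hypothetical parameter values (true ED50 = 0, slope 1) and shows that a narrow two-point design centered at the true ED50 has a finite variance, while the same narrow design centered at a wrong best guess has an enormous one.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def ed50_avar(doses, alpha, beta):
    # First-order asymptotic variance of the ML estimate of ED50 = -alpha/beta
    # for a design putting equal weight on each dose (per-observation scale).
    m11 = m12 = m22 = 0.0
    for x in doses:
        p = expit(alpha + beta * x)
        w = p * (1 - p) / len(doses)   # logistic information weight
        m11 += w
        m12 += w * x
        m22 += w * x * x
    det = m11 * m22 - m12 * m12
    # gradient of -alpha/beta with respect to (alpha, beta)
    g1, g2 = -1.0 / beta, alpha / beta ** 2
    # delta method: g' M^{-1} g
    return (g1 * g1 * m22 - 2 * g1 * g2 * m12 + g2 * g2 * m11) / det

alpha, beta = 0.0, 1.0   # true ED50 = -alpha/beta = 0
d = 0.01                 # small dose distance

centered = ed50_avar([-d, d], alpha, beta)        # centered at the true ED50
off = ed50_avar([0.5 - d, 0.5 + d], alpha, beta)  # centered at a wrong guess
```

With the design symmetric about the true ED50 the variance stays near 4/β² as d shrinks, which is what drives the optimality criterion toward the degenerate one-point design; mis-centering the same narrow design inflates the variance by orders of magnitude, since the information matrix becomes nearly singular.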

5.
MOTIVATION: Ranking feature sets is a key issue for classification, for instance, phenotype classification based on gene expression. Since ranking is often based on error estimation, and error estimators suffer from differing degrees of imprecision in small-sample settings, it is important to choose a computationally feasible error estimator that yields good feature-set ranking. RESULTS: This paper examines the feature-ranking performance of several kinds of error estimators: resubstitution, cross-validation, bootstrap and bolstered error estimation. It does so for three classification rules: linear discriminant analysis, three-nearest-neighbor classification and classification trees. Two measures of performance are considered. One counts the number of the truly best feature sets appearing among the best feature sets discovered by the error estimator and the other computes the mean absolute error between the top ranks of the truly best feature sets and their ranks as given by the error estimator. Our results indicate that bolstering is superior to bootstrap, and bootstrap is better than cross-validation, for discovering top-performing feature sets for classification when using small samples. A key issue is that bolstered error estimation is tens of times faster than bootstrap, and faster than cross-validation, and is therefore feasible for feature-set ranking when the number of feature sets is extremely large.

6.
When stratified random sampling is used for the estimation of the population mean, use of the ‘combined ratio estimator’ is well known. Some improved estimators for the population mean are proposed which are better than the combined ratio estimator and some other well-known existing estimators from the point of view of bias and mean square error. An empirical illustration is given.

7.
Ratio estimation with measurement error in the auxiliary variate
Gregoire TG, Salas C. Biometrics 2009, 65(2):590-598
Summary. With auxiliary information that is well correlated with the primary variable of interest, ratio estimation of the finite population total may be much more efficient than alternative estimators that do not make use of the auxiliary variate. The well-known properties of ratio estimators are perturbed when the auxiliary variate is measured with error. In this contribution we examine the effect of measurement error in the auxiliary variate on the design-based statistical properties of three common ratio estimators. We examine the case of systematic measurement error as well as measurement error that varies according to a fixed distribution. Aside from presenting expressions for the bias and variance of these estimators when they are contaminated with measurement error, we provide numerical results based on a specific population. Under systematic measurement error, the biasing effect is asymmetric around zero, and precision may be improved or degraded depending on the magnitude of the error. Under variable measurement error, bias of the conventional ratio-of-means estimator increased slightly with increasing error dispersion, but far less than the increased bias of the conventional mean-of-ratios estimator. In similar fashion, the mean-of-ratios estimator incurs a greater loss of precision with increasing error dispersion compared with the other estimators we examine. Overall, the ratio-of-means estimator appears to be remarkably resistant to the effects of measurement error in the auxiliary variate.
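The contrast between the ratio-of-means and mean-of-ratios estimators under variable measurement error can be reproduced with a small Monte Carlo sketch. Everything below is hypothetical (a synthetic population with true ratio 2, Gaussian measurement error on the auxiliary variate), not the population studied in the paper.

```python
import random
import statistics

random.seed(2)

# synthetic population: auxiliary x, and y exactly proportional to x
pop_x = [random.uniform(5, 15) for _ in range(1000)]
pop_y = [2.0 * x for x in pop_x]          # true ratio R = 2

def one_sample(n=20, err_sd=1.0):
    idx = random.sample(range(len(pop_x)), n)
    xs = [pop_x[i] + random.gauss(0, err_sd) for i in idx]  # noisy auxiliary
    ys = [pop_y[i] for i in idx]
    rom = statistics.mean(ys) / statistics.mean(xs)          # ratio of means
    mor = statistics.mean(y / x for x, y in zip(xs, ys))     # mean of ratios
    return rom, mor

reps = [one_sample() for _ in range(2000)]
rom_mean = statistics.mean(r for r, _ in reps)
mor_mean = statistics.mean(m for _, m in reps)
```

Averaging the error in the denominator before dividing makes the ratio-of-means estimator nearly unbiased here, while the mean-of-ratios estimator picks up the nonlinearity of 1/x observation by observation and drifts upward, consistent with the abstract's finding.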

8.
Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.
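The benefit of shrinking noisy per-gene variance estimates toward a shared value can be illustrated with a stripped-down sketch. This is not the BAGE model (no experiment effects, no empirical Bayes fitting); it simply shrinks each log sample variance toward the across-gene average with an arbitrary illustrative weight, under a lognormal spread of true variances.

```python
import math
import random
import statistics

random.seed(3)

# simulate per-gene sample variances: lognormal true variances, 4 replicates
n_genes, reps = 200, 4
true_var = [math.exp(random.gauss(0.0, 0.7)) for _ in range(n_genes)]
s2 = []
for v in true_var:
    xs = [random.gauss(0.0, math.sqrt(v)) for _ in range(reps)]
    s2.append(statistics.variance(xs))

# shrink each log sample variance toward the across-gene average
# (lam = 0.5 is a placeholder weight, not a fitted quantity)
lam = 0.5
log_mean = statistics.mean(math.log(s) for s in s2)
s2_shrunk = [math.exp(lam * log_mean + (1 - lam) * math.log(s)) for s in s2]

# compare estimation error on the log scale
mse_raw = statistics.mean((math.log(s) - math.log(v)) ** 2
                          for s, v in zip(s2, true_var))
mse_shrunk = statistics.mean((math.log(s) - math.log(v)) ** 2
                             for s, v in zip(s2_shrunk, true_var))
```

With only four replicates per gene the raw sample variances are extremely noisy, so even this crude shrinkage reduces the mean squared error; the paper's contribution is choosing the shrinkage target and weight in a principled way and borrowing across experiments as well.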

9.
The performance of diagnostic tests is often evaluated by estimating their sensitivity and specificity with respect to a traditionally accepted standard test regarded as a “gold standard” in making the diagnosis. Correlated samples of binary data arise in many fields of application. In site-specific studies, the fundamental unit for analysis is occasionally the site rather than the subject. Statistical methods that take into account the within-subject correlation should be employed to estimate the sensitivity and the specificity of diagnostic tests, since site-specific results within a subject can be highly correlated. I introduce several statistical methods for the estimation of the sensitivity and the specificity of site-specific diagnostic tests. I apply these techniques to the data from a study involving an enzymatic diagnostic test to motivate and illustrate the estimation of the sensitivity and the specificity of periodontal diagnostic tests. I present results from a simulation study for the estimation of diagnostic sensitivity when the data are correlated within subjects. Through a simulation study, I compare the performance of the binomial estimator pCBE, the ratio estimator, the weighted estimator pCWE, the intracluster correlation estimator pCIC, and the generalized estimating equation (GEE) estimator pCGEE in terms of biases, observed variances, mean squared errors (MSE), relative efficiencies of their variances, and 95 per cent coverage proportions. I recommend using pCBE when σ = 0 and the weighted estimator pCWE when σ = 0.6. When σ = 0.2 or σ = 0.4 and the number of subjects is at least 30, pCGEE performs well.

10.
A restricted maximum likelihood estimator for truncated height samples
A restricted maximum likelihood (ML) estimator is presented and evaluated for use with truncated height samples. In the common situation of a small sample truncated at a point not far below the mean, the ordinary ML estimator suffers from high sampling variability. The restricted estimator imposes an a priori value on the standard deviation and freely estimates the mean, exploiting the known empirical stability of the former to obtain less variable estimates of the latter. Simulation results validate the conjecture that restricted ML behaves like restricted ordinary least squares (OLS), whose properties are well established on theoretical grounds. Both estimators display smaller sampling variability when constrained, whether the restrictions are correct or not. The bias induced by incorrect restrictions sets up a decision problem involving a bias-precision tradeoff, which can be evaluated using the mean squared error (MSE) criterion. Simulated MSEs suggest that restricted ML estimation offers important advantages when samples are small and truncation points are high, so long as the true standard deviation is within roughly 0.5 cm of the chosen value.
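The restricted estimator above can be sketched directly: fix σ at an a priori value and maximize the left-truncated normal log-likelihood over the mean alone. The heights, truncation point, and grid search below are hypothetical choices for illustration (a proper implementation would use a numerical optimizer rather than a grid).

```python
import math
import random

random.seed(4)

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def trunc_loglik(mu, sigma, xs, a):
    # log-likelihood of a sample left-truncated at a, with sigma held fixed
    tail = 1.0 - norm_cdf((a - mu) / sigma)
    ll = 0.0
    for x in xs:
        z = (x - mu) / sigma
        ll += (-0.5 * z * z - math.log(sigma)
               - 0.5 * math.log(2 * math.pi) - math.log(tail))
    return ll

# heights ~ N(170, 6.5), truncated just below the mean at 165 cm
a, sigma_fixed = 165.0, 6.5
xs = [h for h in (random.gauss(170.0, 6.5) for _ in range(500)) if h >= a]

# restricted ML: sigma pinned a priori, mu estimated by grid search
grid = [160.0 + 0.01 * k for k in range(2001)]
mu_hat = max(grid, key=lambda m: trunc_loglik(m, sigma_fixed, xs, a))
```

Pinning σ removes the poorly identified scale parameter, so the profile over μ is well behaved even though only the upper part of the distribution is observed.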

11.
Regulatory authorities require that the sample size of a confirmatory trial is calculated prior to the start of the trial. However, the sample size quite often depends on parameters that might not be known in advance of the study. Misspecification of these parameters can lead to under- or overestimation of the sample size. Both situations are unfavourable as the first one decreases the power and the latter one leads to a waste of resources. Hence, designs have been suggested that allow a re-assessment of the sample size in an ongoing trial. These methods usually focus on estimating the variance. However, for some methods the performance depends not only on the variance but also on the correlation between measurements. We develop and compare different methods for blinded estimation of the correlation coefficient that are less likely to introduce operational bias when the blinding is maintained. Their performance with respect to bias and standard error is compared to the unblinded estimator. We simulated two different settings: one assuming that all group means are the same and one assuming that different groups have different means. Simulation results show that the naïve (one-sample) estimator is only slightly biased and has a standard error comparable to that of the unblinded estimator. However, if the group means differ, other estimators have better performance depending on the sample size per group and the number of groups.
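The naïve one-sample estimator discussed above can be sketched for the variance case: the blinded analyst pools all observations without group labels, while the unblinded analyst pools the within-group variances. All sample sizes and effect sizes below are hypothetical, and the correlation setting of the paper is simplified to a two-group variance comparison.

```python
import random
import statistics

random.seed(5)

def estimates(delta, n=100):
    # Blinded (one-sample) vs unblinded (pooled within-group) variance
    # estimates for two groups with common variance 1 and mean gap delta.
    g1 = [random.gauss(0.0, 1.0) for _ in range(n)]
    g2 = [random.gauss(delta, 1.0) for _ in range(n)]
    blinded = statistics.variance(g1 + g2)       # labels hidden
    unblinded = (statistics.variance(g1) + statistics.variance(g2)) / 2
    return blinded, unblinded

b0, u0 = estimates(0.0)   # equal means: blinded tracks the unblinded value
b1, u1 = estimates(1.0)   # different means: blinded inflated by ~delta^2/4
```

This reproduces the qualitative finding: with equal group means the blinded estimator is nearly unbiased, but a real mean difference leaks into the blinded variance, which is why alternative blinded estimators matter when groups differ.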

12.
Malka Gorfine. Biometrics 2001, 57(2):589-597
In this article, we investigate estimation of a secondary parameter in group sequential tests. We study the model in which the secondary parameter is the mean of the normal distribution in a subgroup of the subjects. The bias of the naive secondary parameter estimator is studied. It is shown that the sampling proportions of the subgroup have a crucial effect on the bias: as the sampling proportion of the subgroup at or just before the stopping time increases, the bias of the naive subgroup parameter estimator increases as well. An unbiased estimator for the subgroup parameter and an unbiased estimator for its variance are derived. Using simulations, we compare the mean squared error of the unbiased estimator to that of the naive estimator, and we show that the differences are negligible. As an example, the methods of estimation are applied to an actual group sequential clinical trial, the Beta-Blocker Heart Attack Trial.
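Why a naive mean is biased after a group sequential stopping rule can be shown with a deliberately simplified sketch: a two-stage design that stops early when the interim mean crosses a boundary. This is not the subgroup setting of the article; the boundary, stage sizes, and number of simulated trials are arbitrary.

```python
import random
import statistics

random.seed(6)

def naive_estimate(mu=0.0, n1=25, n2=25, boundary=0.2):
    # Two-stage design: stop at the interim analysis if the stage-1 mean
    # crosses the boundary; otherwise continue to full enrolment. The naive
    # estimator is the plain mean of all data collected by stopping time.
    stage1 = [random.gauss(mu, 1.0) for _ in range(n1)]
    m1 = statistics.mean(stage1)
    if m1 > boundary:
        return m1
    stage2 = [random.gauss(mu, 1.0) for _ in range(n2)]
    return statistics.mean(stage1 + stage2)

# true mean is 0, so the Monte Carlo average of naive estimates is the bias
bias = statistics.mean(naive_estimate() for _ in range(4000))
```

Trials that happen to look good at the interim stop early and freeze their inflated estimate, while unlucky trials get diluted by stage 2; averaged over trials the naive estimator is biased upward, which is the phenomenon the unbiased estimator in the article corrects.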

13.
In various guises, feasible generalized least squares (FGLS) estimation has occupied an important place in regression analysis for more than 35 years. Past studies on the characteristics of the FGLS estimators are largely based on large sample evaluations, and the important issue of admissibility remains unexplored in the case of the FGLS estimator. In this paper, an exact sufficient condition for the dominance of a Stein-type shrinkage estimator over the FGLS estimator in finite samples based on squared error loss is given. In deriving the condition, we assume that the model's disturbance covariance matrix is unknown except for a scalar multiple. Further, for models with AR(1) disturbances, it is observed that the dominance condition reduces to one that involves no unknown parameter. In other words, in the case of AR(1) disturbances and where the condition for risk dominance is met, the FGLS estimator is rendered inadmissible under squared error loss.

14.
Is cross-validation valid for small-sample microarray classification?
MOTIVATION: Microarray classification typically possesses two striking attributes: (1) classifier design and error estimation are based on remarkably small samples and (2) cross-validation error estimation is employed in the majority of the papers. Thus, it is necessary to have a quantifiable understanding of the behavior of cross-validation in the context of very small samples. RESULTS: An extensive simulation study has been performed comparing cross-validation, resubstitution and bootstrap estimation for three popular classification rules: linear discriminant analysis, 3-nearest-neighbor and decision trees (CART), using both synthetic and real breast-cancer patient data. Comparison is via the distribution of differences between the estimated and true errors. Various statistics for the deviation distribution have been computed: mean (for estimator bias), variance (for estimator precision), root-mean-square error (for composition of bias and variance) and quartile ranges, including outlier behavior. In general, while cross-validation error estimation is much less biased than resubstitution, it displays excessive variance, which makes individual estimates unreliable for small samples. Bootstrap methods provide improved performance relative to variance, but at a high computational cost and often with increased bias (albeit, much less than with resubstitution).

15.
Isofemale lines are commonly used in Drosophila and other genera for the purpose of assaying genetic variation. Isofemale lines can be kept in the laboratory for many generations before genetic work is carried out, and permit the confirmation of newly discovered alleles. A problem not realized by many workers is that the commonly used estimate of allele frequency from these lines is biased. This estimation bias occurs at all times after the first laboratory generation, regardless of whether single individuals or pooled samples are used in each well of an electrophoretic gel. This bias can potentially affect the estimation of population genetic parameters, and in the case of rare allele analysis it can cause gross overestimates of gene flow. This paper provides a correction for allele frequency estimates derived from isofemale lines for any time after the lines are established in the laboratory. When pooled samples are used, this estimator performs better than the standard estimator at all times after the first generation. The estimator is also insensitive to multiple inseminations. After the lines have drifted one Ne generation, multiple inseminations actually make the new estimator perform better than it does in singly inseminated females. Simulations show that estimates made using either estimator after the lines have drifted to fixation have a much greater error associated with their use than do those estimates made earlier in time using the correction. In general it is better to use corrected estimates of gene frequency soon after lines are established than to use uncorrected estimates made after the first laboratory generation. This work was supported by an NSERC fellowship to A.D.L.

16.

Background

When unaccounted-for group-level characteristics affect an outcome variable, traditional linear regression is inefficient and can be biased. The random- and fixed-effects estimators (RE and FE, respectively) are two competing methods that address these problems. While each estimator controls for otherwise unaccounted-for effects, the two estimators require different assumptions. Health researchers tend to favor RE estimation, while researchers from some other disciplines tend to favor FE estimation. In addition to RE and FE, an alternative method called within-between (WB) was suggested by Mundlak in 1978, although it is utilized infrequently.

Methods

We conduct a simulation study to compare RE, FE, and WB estimation across 16,200 scenarios. The scenarios vary in the number of groups, the size of the groups, within-group variation, goodness-of-fit of the model, and the degree to which the model is correctly specified. Estimator preference is determined by lowest mean squared error of the estimated marginal effect and root mean squared error of fitted values.

Results

Although there are scenarios when each estimator is most appropriate, the cases in which traditional RE estimation is preferred are less common. In finite samples, the WB approach outperforms both traditional estimators. The Hausman test guides the practitioner to the estimator with the smallest absolute error only 61% of the time, and in many sample sizes simply applying the WB approach produces smaller absolute errors than following the suggestion of the test.

Conclusions

Specification and estimation should be carefully considered and ultimately guided by the objective of the analysis and characteristics of the data. The WB approach has been underutilized, particularly for inference on marginal effects in small samples. Blindly applying any estimator can lead to bias, inefficiency, and flawed inference.

17.
Small area estimation methods typically combine direct estimates from a survey with predictions from a model in order to obtain estimates of population quantities with reduced mean squared error. When the auxiliary information used in the model is measured with error, using a small area estimator such as the Fay–Herriot estimator while ignoring measurement error may be worse than simply using the direct estimator. We propose a new small area estimator that accounts for sampling variability in the auxiliary information, and derive its properties, in particular showing that it is approximately unbiased. The estimator is applied to predict quantities measured in the U.S. National Health and Nutrition Examination Survey, with auxiliary information from the U.S. National Health Interview Survey.

18.
For the calculation of relative measures such as risk ratio (RR) and odds ratio (OR) in a single study, additional approaches are required for the case of zero events. In the case of zero events in one treatment arm, the Peto odds ratio (POR) can be calculated without continuity correction, and is currently the relative effect estimation method of choice for binary data with rare events. The aim of this simulation study is a variegated comparison of the estimated OR and estimated POR with the true OR in a single study with two parallel groups without confounders, in data situations where the POR is currently recommended. This comparison was performed by means of several performance measures, that is, the coverage, confidence interval (CI) width, mean squared error (MSE), and mean percentage error (MPE). We demonstrated that the estimator for the POR does not outperform the estimator for the OR for all the performance measures investigated. In the case of rare events, small treatment effects and similar group sizes, we demonstrated that the estimator for the POR performed better than the estimator for the OR only regarding the coverage and MPE, but not the CI width and MSE. For larger effects and unbalanced group size ratios, the coverage and MPE of the estimator for the POR were inappropriate. As in practice the true effect is unknown, the POR method should be applied only with the utmost caution.
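The two estimators compared above can be computed side by side for a table with zero events in one arm. The sketch below uses the standard Peto one-step formula and a 0.5 continuity-corrected conventional OR; the 0/100 vs 5/100 table is a made-up example, not data from the study.

```python
import math

def peto_or(a, n1, c, n2):
    # Peto one-step odds ratio for a 2x2 table: a/n1 events in treatment,
    # c/n2 events in control; needs no continuity correction when a == 0.
    N, m1 = n1 + n2, a + c                 # total subjects, total events
    O = a                                  # observed events in treatment
    E = n1 * m1 / N                        # expected events under the null
    V = n1 * n2 * m1 * (N - m1) / (N * N * (N - 1))  # hypergeometric variance
    return math.exp((O - E) / V)

def or_cc(a, b, c, d, cc=0.5):
    # conventional odds ratio with a continuity correction added to each cell
    return ((a + cc) * (d + cc)) / ((b + cc) * (c + cc))

# zero events in the treatment arm: 0/100 events vs 5/100 events
por = peto_or(0, 100, 5, 100)
orc = or_cc(0, 100, 5, 95)
```

Both estimators point the same way (fewer events under treatment), but they differ numerically even in this balanced, rare-event case, which is exactly the kind of discrepancy the simulation study quantifies.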

19.
We consider the estimation of the scaled mutation parameter θ, which is one of the parameters of key interest in population genetics. We provide a general result showing when estimators of θ can be improved using shrinkage when taking the mean squared error as the measure of performance. As a consequence, we show that Watterson’s estimator is inadmissible, and propose an alternative shrinkage-based estimator that is easy to calculate and has a smaller mean squared error than Watterson’s estimator for all possible parameter values 0 < θ < ∞. This estimator is admissible in the class of all linear estimators. We then derive improved versions for other estimators of θ, including the MLE. We also investigate how an improvement can be obtained both when combining information from several independent loci and when explicitly taking into account recombination. A simulation study provides information about the amount of improvement achieved by our alternative estimators.
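For reference, Watterson's estimator itself is a one-line calculation, and the paper's improvement is a linear (multiplicative) shrinkage of it. The sketch below shows the baseline estimator; the 0.9 shrinkage factor is a placeholder, not the optimal factor derived in the paper, and the site/sample counts are hypothetical.

```python
def watterson_theta(segregating_sites, n):
    # Watterson's estimator: theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / a_n

# hypothetical data: 12 segregating sites in a sample of n = 10 sequences
theta_w = watterson_theta(12, 10)

# an illustrative linear shrinkage c * theta_W with c < 1; the paper derives
# the factor that minimizes mean squared error, 0.9 here is arbitrary
theta_shrunk = 0.9 * theta_w
```

Shrinking toward zero trades a little bias for a variance reduction; the paper shows a factor exists that lowers the mean squared error for every θ, which is what makes Watterson's estimator inadmissible.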

20.
Gene diversity is sometimes estimated from samples that contain inbred or related individuals. If inbred or related individuals are included in a sample, then the standard estimator for gene diversity produces a downward bias caused by an inflation of the variance of estimated allele frequencies. We develop an unbiased estimator for gene diversity that relies on kinship coefficients for pairs of individuals with known relationship and that reduces to the standard estimator when all individuals are noninbred and unrelated. Applying our estimator to data simulated based on allele frequencies observed for microsatellite loci in human populations, we find that the new estimator performs favorably compared with the standard estimator in terms of bias and similarly in terms of mean squared error. For human population-genetic data, we find that a close linear relationship previously seen between gene diversity and distance from East Africa is preserved when adjusting for the inclusion of close relatives.
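The standard estimator that the kinship-adjusted estimator reduces to (for noninbred, unrelated individuals) is straightforward to compute from allele counts. The counts below are hypothetical; the paper's adjustment, which subtracts kinship terms, is not reproduced here.

```python
def gene_diversity(allele_counts):
    # Standard gene-diversity estimator at one locus:
    #   H = (m / (m - 1)) * (1 - sum_i p_i^2),
    # where m is the number of gene copies sampled and p_i are the sample
    # allele frequencies. Unbiased only for noninbred, unrelated individuals.
    m = sum(allele_counts)
    homozygosity = sum((c / m) ** 2 for c in allele_counts)
    return m / (m - 1) * (1.0 - homozygosity)

# hypothetical locus: three alleles observed 10, 6, and 4 times (m = 20)
h = gene_diversity([10, 6, 4])
```

When relatives are included, the sample allele frequencies have inflated variance, so this estimator is biased downward; that is the deficiency the kinship-coefficient correction in the abstract addresses.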
