Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Diagnostic plots in Cox's regression model.   Total citations: 3, self-citations: 0, citations by others: 3
C H Chen  P C Wang 《Biometrics》1991,47(3):841-850
Two diagnostic plots are presented for validating the fitting of a Cox proportional hazards model. The added variable plot is developed to assess the effect of adding a covariate to the model. The constructed variable plot is applied to detect nonlinearity of a fitted covariate. Both plots are also useful for identifying influential observations on the issues of interest. The methods are illustrated on examples of multiple myeloma and lung cancer data.

2.
Lin DY  Wei LJ  Ying Z 《Biometrics》2002,58(1):1-12
Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.

3.
Wang CY  Huang WT 《Biometrics》2000,56(1):98-105
We consider estimation in logistic regression where some covariate variables may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the summation of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that, when the validation is only on controls, a class of inverse selection probability weighted semiparametric estimators cannot be applied because selection probabilities on cases are zero. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis.

4.
In the analysis of repeated measurements, multivariate regression models that account for the correlations among the observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need some model diagnostic procedures. Though these models have been widely used, few studies have addressed model diagnostics for them. In this paper, we propose simple residual plots to investigate the goodness of model fit for repeated measures data. Here, we mainly focus on the mean model diagnostics. The proposed residual plots are based on the quantile‐quantile (Q–Q) plots of a χ² distribution and a normal distribution. In particular, the proposed model is useful in comparing several models simultaneously. The proposed method is illustrated using two examples. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

5.
Summary: A time‐specific log‐linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time‐to‐event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance–covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long‐term follow‐up to estimate the patients' median residual lifetimes, adjusting for important prognostic factors.

6.
Holcroft CA  Spiegelman D 《Biometrics》1999,55(4):1193-1201
We compared several validation study designs for estimating the odds ratio of disease with misclassified exposure. We assumed that the outcome and misclassified binary covariate are available and that the error-free binary covariate is measured in a subsample, the validation sample. We considered designs in which the total size of the validation sample is fixed and the probability of selection into the validation sample may depend on outcome and misclassified covariate values. Design comparisons were conducted for rare and common disease scenarios, where the optimal design is the one that minimizes the variance of the maximum likelihood estimator of the true log odds ratio relating the outcome to the exposure of interest. Misclassification rates were assumed to be independent of the outcome. We used a sensitivity analysis to assess the effect of misspecifying the misclassification rates. Under the scenarios considered, our results suggested that a balanced design, which allocates equal numbers of validation subjects into each of the four outcome/mismeasured covariate categories, is preferable for its simplicity and good performance. A user-friendly Fortran program is available from the second author, which calculates the optimal sampling fractions for all designs considered and the efficiencies of these designs relative to the optimal hybrid design for any scenario of interest.

7.
Body condition indices are widely used by ecologists, but many indices are used without empirical validation. To test the validity of a variety of indices, we compared how well a broad range of body condition indices predicted body fat mass, percent body fat and residual fat mass in mice (Mus musculus). We also compared the performance of these condition indices with the multiple regression of several morphometric variables on body fat mass, percent body fat and residual fat mass. In our study population, two ratio-based condition indices (body mass/body length and log body mass/log body length) predicted body fat mass as well as or better than other ratio and residual indices of condition in females. In males, one ratio-based condition index (log body mass/log body length) and one residual index (residuals from a regression of pelvic circumference on body length) were best at predicting body fat mass. All indices were better at estimating body fat mass and residual fat mass than at estimating percent body fat. The predictions of body fat were much better for females than for males. Multiple regressions incorporating pelvic circumference (i.e. girth at the iliac crests) were the best predictors of body fat mass, residual fat mass, and percent body fat, and these multiple regressions were better than any of the condition indices. We recommend 1) that condition be precisely defined, 2) that predictors of condition be empirically validated, 3) that pelvic circumference be considered as a potential predictor of fat content, and 4) that, in general, multiple regression be considered as an alternative to condition indices.

8.
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one‐step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster‐deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts.

9.
Combining diagnostic test results to increase accuracy   Total citations: 4, self-citations: 0, citations by others: 4
When multiple diagnostic tests are performed on an individual or multiple disease markers are available it may be possible to combine the information to diagnose disease. We consider how to choose linear combinations of markers in order to optimize diagnostic accuracy. The accuracy index to be maximized is the area or partial area under the receiver operating characteristic (ROC) curve. We propose a distribution-free rank-based approach for optimizing the area under the ROC curve and compare it with logistic regression and with classic linear discriminant analysis (LDA). It has been shown that the latter method optimizes the area under the ROC curve when test results have a multivariate normal distribution for diseased and non-diseased populations. Simulation studies suggest that the proposed non-parametric method is efficient when data are multivariate normal. The distribution-free method is generalized to a smooth distribution-free approach to: (i) accommodate some reasonable smoothness assumptions; (ii) incorporate covariate effects; and (iii) yield optimized partial areas under the ROC curve. This latter feature is particularly important since it allows one to focus on a region of the ROC curve which is of most relevance to clinical practice. Neither logistic regression nor LDA necessarily maximizes partial areas. The approaches are illustrated on two cancer datasets, one involving serum antigen markers for pancreatic cancer and the other involving longitudinal prostate specific antigen data.

10.
Question: The utility of beta (β‐) diversity measures that incorporate information about the degree of taxonomic (dis)similarity between species plots is becoming increasingly recognized. In this framework, the question for this study is: can we define an ecologically meaningful index of β‐diversity that, besides indicating simple species turnover, is able to account for taxonomic similarity amongst species in plots? Methods: First, the properties of existing measures of taxonomic similarity are briefly reviewed. Next, a new measure of plot‐to‐plot taxonomic similarity is presented that is based on the maximal common subgraph of two taxonomic trees. The proposed measure is computed from species presences and absences and includes information about the degree of higher‐level taxonomic similarity between species plots. The performance of the proposed measure with respect to existing coefficients of taxonomic similarity and the coefficient of Jaccard is discussed using a small data set of heath plant communities. Finally, a method to quantify β‐diversity from taxonomic dissimilarities is discussed. Results: The proposed measure of taxonomic β‐diversity incorporates not only species richness, but also information about the degree of higher‐order taxonomic structure between species plots. In this view, it comes closer to a modern notion of biological diversity than more traditional measures of β‐diversity. From regression analysis between the new coefficient and existing measures of taxonomic similarity it is shown that there is an evident nonlinearity between the coefficients. This nonlinearity demonstrates that the new coefficient measures similarity in a conceptually different way from previous indices. Also, in good agreement with the findings of previous authors, the regression between the new index and the Jaccard coefficient of similarity shows that more than 80% of the variance of the former is explained by the community structure at the species level, while only the residual variance is explained by differences in the higher‐order taxonomic structure of the species plots. This means that a genuine taxonomic approach to the quantification of plot‐to‐plot similarity is only needed if we are interested in the residual system's variation that is related to the higher‐order taxonomic structure of a pair of species plots.

11.
Kauermann G 《Biometrics》2000,56(3):692-698
This paper presents a smooth regression model for ordinal data with longitudinal dependence structure. A marginal model with cumulative logit link is applied to cope with the ordinal scale and the main and covariate effects in the model are allowed to vary with time. Local fitting is pursued and asymptotic properties of the estimates are discussed. In a second step, the longitudinal dependence of the observations is considered. Cumulative log odds ratios are fitted locally, which allows investigation of how the longitudinal dependence of the ordinal observations changes with time.

12.
We consider the estimation of the prevalence of a rare disease, and the log‐odds ratio for two specified groups of individuals from group testing data. For a low‐prevalence disease, the maximum likelihood estimate of the log‐odds ratio is severely biased. However, the Firth correction to the score function leads to a considerable improvement of the estimator. Also, for a low‐prevalence disease, if the diagnostic test is imperfect, group testing is found to yield a more precise estimate of the log‐odds ratio than individual testing.
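
As a rough illustration of the quantities named in this abstract, the sketch below computes the textbook maximum likelihood estimate of prevalence from pooled tests and the resulting log-odds ratio between two groups. It assumes a perfect test and equal pool sizes, which is a simplification not taken from the paper; the function names are hypothetical, and the Firth correction discussed in the abstract is not implemented here.

```python
import math


def prevalence_mle(positive_pools, n_pools, pool_size):
    """Textbook MLE of prevalence from group testing.

    Assumes (for illustration only) a perfect test and equal pool sizes:
    a pool is negative with probability (1 - p) ** pool_size.
    """
    if positive_pools <= 0 or positive_pools >= n_pools:
        raise ValueError("MLE of the log-odds is undefined at the boundary")
    frac_negative = 1.0 - positive_pools / n_pools
    return 1.0 - frac_negative ** (1.0 / pool_size)


def log_odds_ratio(p1, p2):
    """Log-odds ratio comparing prevalence p1 with prevalence p2."""
    return math.log(p1 / (1.0 - p1)) - math.log(p2 / (1.0 - p2))


# Hypothetical data: two groups, 100 pools of 2 individuals each.
p_group1 = prevalence_mle(positive_pools=19, n_pools=100, pool_size=2)
p_group2 = prevalence_mle(positive_pools=10, n_pools=100, pool_size=2)
print(p_group1, p_group2, log_odds_ratio(p_group1, p_group2))
```

The boundary check reflects one reason the plain estimate can behave badly at low prevalence: when no pool (or every pool) tests positive, the log-odds is not finite.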

13.
Partial AUC estimation and regression   Total citations: 2, self-citations: 0, citations by others: 2
Dodd LE  Pepe MS 《Biometrics》2003,59(3):614-623
Accurate diagnosis of disease is a critical part of health care. New diagnostic and screening tests must be evaluated based on their abilities to discriminate diseased from nondiseased states. The partial area under the receiver operating characteristic (ROC) curve is a measure of diagnostic test accuracy. We present an interpretation of the partial area under the curve (AUC), which gives rise to a nonparametric estimator. This estimator is more robust than existing estimators, which make parametric assumptions. We show that the robustness is gained with only a moderate loss in efficiency. We describe a regression modeling framework for making inference about covariate effects on the partial AUC. Such models can refine knowledge about test accuracy. Model parameters can be estimated using binary regression methods. We use the regression framework to compare two prostate-specific antigen biomarkers and to evaluate the dependence of biomarker accuracy on the time prior to clinical diagnosis of prostate cancer.
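
One common way to read a partial AUC nonparametrically is as the proportion of (diseased, nondiseased) pairs that are correctly ordered, counting only nondiseased values that fall in the false-positive-rate range of interest. The sketch below follows that reading for the range from 0 to `fpr_max`. It is an illustrative assumption rather than the authors' exact estimator; the function name and the quantile convention (NumPy's default linear interpolation) are choices made here.

```python
import numpy as np


def partial_auc(cases, controls, fpr_max):
    """Empirical partial AUC over false positive rates in [0, fpr_max].

    Counts case/control pairs where the case scores higher, restricted to
    controls at or above the empirical (1 - fpr_max) quantile of controls.
    Higher scores are assumed to indicate disease.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    threshold = np.quantile(controls, 1.0 - fpr_max)
    in_range = controls >= threshold               # controls in the FPR range
    wins = cases[:, None] > controls[None, :]      # pairwise comparisons
    return float((wins & in_range[None, :]).sum() / (cases.size * controls.size))


# Hypothetical marker values with perfect separation.
print(partial_auc([5, 6, 7, 8], [1, 2, 3, 4], fpr_max=0.5))
```

With perfectly separated groups the estimate equals `fpr_max`, its largest possible value, and setting `fpr_max=1.0` recovers the usual empirical AUC (ties between cases and controls are counted as losses in this sketch).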

14.
OBJECTIVE--To determine whether rectal examination provides any diagnostic information in patients admitted to hospital with pain in the right lower quadrant of the abdomen. DESIGN--Casualty officer or surgical registrar recorded symptoms and signs on admission on detailed forms. Final diagnosis was noted on discharge from hospital. SETTING--District general hospital. PATIENTS--1204 consecutive patients admitted to hospital with pain in the right lower quadrant of the abdomen as their major complaint; 1028 had a rectal examination on admission. MAIN OUTCOME MEASURES--Odds ratio for each symptom and sign related to final diagnosis. Results of multiple logistic regression analysis for acute appendicitis. RESULTS--Right sided rectal tenderness, present in 309 of those examined, was more common in patients with acute appendicitis (odds ratio 1.34, p less than 0.05). This odds ratio was considerably less than that for other clinical signs--namely, tenderness in the right lower quadrant (odds ratio 5.09), rebound tenderness (3.34), guarding (3.07), and muscular rigidity in the abdomen (5.03). In the logistic regression analysis of patients with acute appendicitis, when allowance was made for the presence or absence of rebound tenderness, rectal tenderness on the right lost its significance. Six patients had masses palpable rectally, of which three were palpable on abdominal examination; the other three patients had acute appendicitis. No other unexpected diagnoses were established, and no useful additional diagnostic information was obtained by routine rectal examination. CONCLUSION--If patients presenting with pain in the right lower quadrant of the abdomen are tested for rebound tenderness then rectal examination does not give any further diagnostic information.

15.
G Heller  J S Simonoff 《Biometrics》1992,48(1):101-115
Although the analysis of censored survival data using the proportional hazards and linear regression models is common, there has been little work examining the ability of these estimators to predict time to failure. This is unfortunate, since a predictive plot illustrating the relationship between time to failure and a continuous covariate can be far more informative regarding the risk associated with the covariate than a Kaplan-Meier plot obtained by discretizing the variable. In this paper the predictive power of the Cox (1972, Journal of the Royal Statistical Society, Series B 34, 187-202) proportional hazards estimator and that of the Buckley-James (1979, Biometrika 66, 429-436) censored regression estimator are compared. Using computer simulations and heuristic arguments, it is shown that the choice of method depends on the censoring proportion, strength of the regression, the form of the censoring distribution, and the form of the failure distribution. Several examples are provided to illustrate the usefulness of the methods.

16.
D P Byar  N Mantel 《Biometrics》1975,31(4):943-947
Interrelationships among three response-time models which incorporate covariate information are explored. The most general of these models is the logistic-exponential in which the log odds of the probability of responding in a fixed interval is assumed to be a linear function of the covariates; this model includes a parameter W for the width of discrete time intervals in which responses occur. As W tends to 0 this model is equivalent to a continuous time exponential model in which the log hazard is linear in the covariates. As W tends to infinity it is equivalent to a continuous time exponential model in which the hazard itself is a linear function of the covariates. This second model was fitted to the data used in an earlier publication describing the logistic-exponential model, and very close agreement of the estimates of the regression coefficients is demonstrated.
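
The two limits described in this abstract can be sketched with a short derivation. The parameterization below is an assumption made for illustration (exponential response times with constant hazard $\lambda$, covariate vector $x$, coefficients $\beta$) and is not quoted from the paper.

```latex
% Assumed setup: constant hazard \lambda, interval width W.
% Probability of responding within an interval of width W:
\[
  p = 1 - e^{-\lambda W},
  \qquad
  \frac{p}{1-p} = e^{\lambda W} - 1 .
\]
% Logistic-exponential model: log odds linear in the covariates.
\[
  \log\!\left(e^{\lambda W} - 1\right) = x^{\top}\beta .
\]
% Small W: e^{\lambda W} - 1 \approx \lambda W, so the log hazard is linear.
\[
  W \to 0:\quad \log\lambda \approx x^{\top}\beta - \log W .
\]
% Large W: \log(e^{\lambda W} - 1) \approx \lambda W, so the hazard is linear.
\[
  W \to \infty:\quad \lambda \approx \frac{x^{\top}\beta}{W} .
\]
```

Under these assumptions the width $W$ only shifts the intercept in the first limit and rescales the coefficients in the second, which is consistent with the two continuous time exponential models named in the abstract.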

17.
Historical ecological data are valuable for reconstructing early environmental and vegetation community conditions and examining change to vegetation communities and disturbance regimes over decadal and longer temporal scales, but these data are not free from error. We examine the spatial uncertainties associated with 18,000 vegetation plots in the decades-old California Vegetation Type Mapping (VTM) dataset that has been digitized for use in modern ecological analysis. We examine the relationship between plot location error and basemap year, basemap scale, plot elevation, plot slope, and general plot habitat type. Bivariate plots and classification and regression tree analysis (CART) confirm that basemap scale and age are the strongest explanation of total error. Total error in spatial location for all plots ranged from 126.9 m to 462.3 m; plots drawn on 15-min (1:62,500-scale) basemaps had total error ranging from 126 m to 199.7 m, and plots drawn on coarser-scale basemaps (1:125,000-scale) had total errors ranging from 241 m to 461.2 m. Relocation of individual VTM plots is considerably easier for plots originally marked on 1:62,500-scale maps produced after 1904, and more difficult for plots originally marked on 1:125,000-scale maps produced before 1898. Biogeographical analyses that rely less on relocating individual plots, such as environmental niche modeling or multivariate analyses, can alleviate some of these concerns, but all researchers using these kinds of data need to consider errors in spatial location of plots. The paper also discusses ways in which the differing spatial error might be reported and visualized by those using the dataset, and how the data might be used in modern environmental niche models.

18.
We propose a parametric regression model for the cumulative incidence functions (CIFs) commonly used for competing risks data. The model adopts a modified logistic model as the baseline CIF and a generalized odds‐rate model for covariate effects, and it explicitly takes into account the constraint that a subject with any given prognostic factors should eventually fail from one of the causes such that the asymptotes of the CIFs should add up to one. This constraint intrinsically holds in a nonparametric analysis without covariates, but is easily overlooked in a semiparametric or parametric regression setting. We hence model the CIF from the primary cause assuming the generalized odds‐rate transformation and the modified logistic function as the baseline CIF. Under the additivity constraint, the covariate effects on the competing cause are modeled by a function of the asymptote of the baseline distribution and the covariate effects on the primary cause. The inference procedure is straightforward by using the standard maximum likelihood theory. We demonstrate desirable finite‐sample performance of our model by simulation studies in comparison with existing methods. Its practical utility is illustrated in an analysis of a breast cancer dataset to assess the treatment effect of tamoxifen, adjusting for age and initial pathological tumor size, on breast cancer recurrence that is subject to dependent censoring by second primary cancers and deaths.

19.
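Improvement of the importance value and its application in the classification of Leymus chinensis (羊草) communities   Total citations: 12, self-citations: 0, citations by others: 12
The importance value is improved, and the concepts of theoretical mean importance value (TMIV), simple importance value (SIV), and sample plot index (SPI) are proposed for the first time. The theoretical mean importance value is the theoretical average importance value of each plant species in a sample plot, which varies with the number of species. The simple importance value is the theoretical mean importance value multiplied by the ratio of a given plant's biomass to the mean biomass of all plants in the plot, or by the ratio of a given plant's volume (cover multiplied by height) to the mean volume of all plants in the plot. The sample plot index is the simple importance value multiplied by the ratio of a given plant's biomass to the mean biomass of that plant across all plots, or by the ratio of a given plant's volume to the mean volume of that plant across all plots. Compared with the importance value, the simple importance value reduces the amount of field work. The sample plot index reflects both the dominance of a species within a plot and its dominance among plots, which makes importance values from different plots more comparable. The importance value, simple importance value, and sample plot index were each used in a cluster analysis of Leymus chinensis communities. The results show that the simple importance value can be used for the quantitative classification of plant communities, and that the sample plot index is better suited than the importance value to this purpose.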
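
The verbal definitions above can be turned into a small calculation. The sketch below uses the biomass form of both indices. Two points are assumptions made here, not statements from the abstract: the theoretical mean importance value is taken to be 1 divided by the number of species in the plot, and a species' mean biomass across plots counts plots where it is absent as zero. All function names and data are hypothetical.

```python
def simple_importance_values(plot):
    """SIV per species for one plot, given {species: biomass}.

    ASSUMPTION: TMIV = 1 / (number of species in the plot).
    SIV = TMIV * (species biomass / mean biomass of all species in the plot).
    """
    n_species = len(plot)
    tmiv = 1.0 / n_species
    mean_biomass = sum(plot.values()) / n_species
    return {sp: tmiv * b / mean_biomass for sp, b in plot.items()}


def sample_plot_indices(plots):
    """SPI per species for each plot in a list of {species: biomass} dicts.

    SPI = SIV * (species biomass in this plot / its mean biomass over plots).
    ASSUMPTION: the mean over plots treats absence as zero biomass.
    """
    totals = {}
    for plot in plots:
        for sp, b in plot.items():
            totals[sp] = totals.get(sp, 0.0) + b
    n_plots = len(plots)
    result = []
    for plot in plots:
        siv = simple_importance_values(plot)
        result.append(
            {sp: siv[sp] * b / (totals[sp] / n_plots) for sp, b in plot.items()}
        )
    return result


# Hypothetical biomass data for two plots and two species.
plots = [{"a": 3.0, "b": 1.0}, {"a": 1.0, "b": 1.0}]
print(simple_importance_values(plots[0]))
print(sample_plot_indices(plots))
```

Under the 1/S assumption the SIV reduces to a species' share of total plot biomass, which fits the abstract's remark that it needs less field work than the full importance value.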

20.
We present an approach for analyzing internal dependencies in counting processes. This covers the case with repeated events on each of a number of individuals, and more generally, the situation where several processes are observed for each individual. We define dynamic covariates, i.e., covariates depending on the past of the processes. The statistical analysis is performed mainly by the nonparametric additive approach. This yields a method for analyzing multivariate survival data, which is an alternative to the frailty approach. We present cumulative regression plots, statistical tests, residual plots, and a hat matrix plot for studying outliers. A program in R and S-PLUS for analyzing survival data with the additive regression model is available on the web site http://www.med.uio.no/imb/stat/addreg. The program has been developed to fit the counting process framework.
