Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Summary: A time‐specific log‐linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time‐to‐event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance–covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long‐term follow‐up to estimate the patients' median residual lifetimes, adjusting for important prognostic factors.

2.
Quantile regression methods have been used to estimate upper and lower quantile reference curves as functions of several covariates. In survival analysis in particular, median regression models for right‐censored data have been proposed under several assumptions. In this article, we consider a median regression model for interval‐censored data and construct an estimating equation based on weights derived from interval‐censored data. In a simulation study, the performance of the proposed method is evaluated for both symmetric and right‐skewed failure time distributions. A well‐known breast cancer data set is analyzed to illustrate the proposed method.

3.
Censored quantile regression models, which offer great flexibility in assessing covariate effects on event times, have attracted considerable research interest. In this study, we consider flexible estimation and inference procedures for competing risks quantile regression, which not only provides meaningful interpretations by using cumulative incidence quantiles but also extends the conventional accelerated failure time model by relaxing some of the stringent model assumptions, such as global linearity and unconditional independence. Current methods for censored quantile regression often involve minimizing an L1‐type convex function or solving nonsmooth estimating equations. This approach could lead to multiple roots in practical settings, particularly with multiple covariates. Moreover, variance estimation involves an unknown error distribution, and most methods rely on computationally intensive resampling techniques such as bootstrapping. We extend the induced smoothing procedure for censored quantile regression to the competing risks setting. The proposed procedure permits fast and accurate computation of quantile regression parameter estimates and their variance estimates by using conventional numerical methods such as the Newton–Raphson algorithm. Numerical studies show that the proposed estimators perform well and the resulting inference is reliable in practical settings. The method is finally applied to data from a soft tissue sarcoma study.
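For reference, the L1‐type objective mentioned in this abstract can be illustrated with the check (pinball) loss. The sketch below is not the induced smoothing procedure of the paper and ignores censoring and competing risks; it only shows, for an intercept‐only model, that minimizing the check loss recovers a sample quantile. The function names `check_loss`, `total_loss` and `best_constant` are illustrative assumptions.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(q, ys, tau):
    """Sum of check losses when every observation is predicted by the constant q."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys, tau):
    """Minimise the total check loss over constants.

    The objective is piecewise linear and convex, so a minimiser always
    lies at an observed value; scanning the observations is enough.
    Ties are resolved in favour of the smallest minimiser.
    """
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))


if __name__ == "__main__":
    print(best_constant([1, 2, 3, 4, 100], 0.5))      # sample median
    print(best_constant(list(range(1, 11)), 0.75))    # upper quartile
```

Because the objective has kinks at the data points, its derivative is a step function; this is the non-smoothness that, according to the abstract, makes gradient-based solvers such as Newton–Raphson inapplicable without some form of smoothing.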

4.
Xiong  Ying  Chen  Shuai  Tang  Buzhou  Chen  Qingcai  Wang  Xiaolong  Yan  Jun  Zhou  Yi 《BMC bioinformatics》2021,22(1):1-18
Background

For differential abundance analysis, zero-inflated generalized linear models, typically zero-inflated negative binomial (NB) models, have been increasingly used to model microbiome and other sequencing count data. A common assumption in estimating the false discovery rate (FDR) is that the p values are uniformly distributed under the null hypothesis, which demands that the postulated model fit the count data adequately. Mis-specification of the distribution of the count data may lead to excess false discoveries. Therefore, model checking is critical to control the FDR at a nominal level in differential abundance analysis. A growing number of studies show that the randomized quantile residual (RQR) method performs well in diagnosing count regression models. However, the performance of RQR in diagnosing zero-inflated generalized linear mixed models (GLMMs) for sequencing count data has not been extensively investigated in the literature.

Results

We conduct large-scale simulation studies to investigate the performance of the RQRs for zero-inflated GLMMs. The simulation studies show that the type I error rates of the goodness-of-fit (GOF) tests with RQRs are very close to the nominal level; in addition, the scatter plots and Q–Q plots of RQRs are useful in distinguishing good models from bad ones. We also apply the RQRs to diagnose six GLMMs fitted to a real microbiome dataset. The results show that the OTU counts at the genus level of this dataset (after a truncation treatment) can be modelled well by zero-inflated and zero-modified NB models.

Conclusion

RQR is an excellent tool for diagnosing GLMMs for zero-inflated count data, particularly the sequencing count data arising in microbiome studies. In the supplementary materials, we provided two generic R functions, called rqr.glmmtmb and rqr.hurdle.glmmtmb, for calculating the RQRs given fitting outputs of the R package glmmTMB.
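To make the idea of a randomized quantile residual concrete, here is a minimal sketch for a zero-inflated Poisson model without random effects. It does not reproduce the `rqr.glmmtmb` functions mentioned above; the zero-inflated Poisson choice, the parameter values and all function names are assumptions made for illustration. For a discrete response y with fitted CDF F, the residual is the standard normal quantile of a uniform draw between F(y − 1) and F(y).

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for j in range(1, k + 1):
        term *= mu / j
        total += term
    return min(total, 1.0)


def zip_cdf(k, mu, pi0):
    """CDF of a zero-inflated Poisson with extra zero mass pi0."""
    if k < 0:
        return 0.0
    return pi0 + (1.0 - pi0) * poisson_cdf(k, mu)


def randomized_quantile_residual(y, cdf, rng):
    """Draw u uniformly between F(y - 1) and F(y), then map to N(0, 1)."""
    lower, upper = cdf(y - 1), cdf(y)
    u = lower + rng.random() * (upper - lower)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    return NormalDist().inv_cdf(u)


def sample_poisson(mu, rng):
    """Knuth's multiplication method; adequate for small mu."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_residuals(n=2000, mu=3.0, pi0=0.3, seed=1):
    """Simulate ZIP counts and compute RQRs under the true model."""
    rng = random.Random(seed)
    residuals = []
    for _ in range(n):
        y = 0 if rng.random() < pi0 else sample_poisson(mu, rng)
        residuals.append(
            randomized_quantile_residual(y, lambda k: zip_cdf(k, mu, pi0), rng)
        )
    return residuals
```

When the assumed model is correct, these residuals behave like a standard normal sample, which is why Q–Q plots and normality tests on them can serve as model diagnostics.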


5.
Disease mapping of a single disease has been widely studied in the public health setting. Simultaneous modeling of related diseases can also be a valuable tool from both the epidemiological and the statistical point of view. In particular, when several measurements are recorded at each spatial location, we need to consider multivariate models in order to handle the dependence among the multivariate components as well as the spatial dependence between locations. It is then customary to use multivariate spatial models that assume the same distribution across the entire population. However, in many circumstances, assuming the same distribution for all areas is a very strong assumption. To overcome this issue, we propose a hierarchical multivariate mixture generalized linear model to simultaneously analyze spatial Normal and non‐Normal outcomes. As an application of our proposed approach, esophageal and lung cancer deaths in Minnesota are used to show that assuming different distributions for different counties of Minnesota outperforms assuming a single distribution for the whole population. Performance of the proposed approach is also evaluated through a simulation study.

6.
Serfling-type periodic regression models have been widely used to identify and analyse influenza epidemics. In these approaches, the baseline is traditionally determined using cleaned historical non-epidemic data. However, we found that the conventional exclusion of epidemic seasons was empirical, since year-to-year variations in the seasonal pattern of activity had been ignored. Therefore, excluding fixed ‘epidemic’ months did not seem reasonable. We adjusted the rule for removing epidemic periods to avoid a potentially subjective definition of their start and end, and fitted the baseline iteratively. First, we established a Serfling regression model based on the actual observations without any removals. Then, instead of manually excluding a predefined ‘epidemic’ period (the traditional method), we excluded observations that exceeded a calculated boundary. We then refitted the Serfling regression using the cleaned data and again excluded observations that exceeded a calculated boundary. We repeated this process until the R² value stopped increasing. In addition, definitions of the onset of an influenza epidemic were heterogeneous, which might make it impossible to accurately evaluate the performance of alternative approaches. We therefore used this modified model to detect the peak timing of influenza instead of the onset of the epidemic, and compared it with traditional Serfling models using observed weekly case counts of influenza-like illness (ILI), in terms of sensitivity, specificity and lead time. A better performance was observed. In summary, we provide an adjusted Serfling model which may have improved performance over traditional models in early warning of the arrival of the influenza peak.
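The iterative baseline fit described above can be sketched as follows. The abstract does not specify the regression terms, the boundary or the stopping tolerance, so several choices here are assumptions: a linear trend plus one annual harmonic with a 52-week period, an upper boundary of fitted value plus `z` residual standard deviations, R² computed on the retained observations, and a small tolerance `tol` to decide that R² has stopped increasing. All function names are illustrative.

```python
import math


def design_row(t, period=52.0):
    """Intercept, linear trend and one annual harmonic (assumed model form)."""
    w = 2.0 * math.pi * t / period
    return [1.0, float(t), math.cos(w), math.sin(w)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(ts, ys):
    """Ordinary least squares via the normal equations."""
    rows = [design_row(t) for t in ts]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(beta, t):
    return sum(b * x for b, x in zip(beta, design_row(t)))


def r_squared(beta, ts, ys):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(beta, t)) ** 2 for t, y in zip(ts, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def iterative_baseline(ts, ys, z=1.645, tol=1e-6, max_iter=50):
    """Refit after dropping points above the boundary until R^2 stops rising."""
    keep_t, keep_y = list(ts), list(ys)
    beta = fit_ols(keep_t, keep_y)
    best_r2 = r_squared(beta, keep_t, keep_y)
    for _ in range(max_iter):
        resid = [y - predict(beta, t) for t, y in zip(keep_t, keep_y)]
        sd = math.sqrt(sum(e * e for e in resid) / (len(resid) - 4))
        kept = [(t, y) for t, y, e in zip(keep_t, keep_y, resid) if e <= z * sd]
        if len(kept) == len(keep_t) or len(kept) <= 8:
            break  # nothing exceeded the boundary, or too few points remain
        new_t = [t for t, _ in kept]
        new_y = [y for _, y in kept]
        new_beta = fit_ols(new_t, new_y)
        new_r2 = r_squared(new_beta, new_t, new_y)
        if new_r2 - best_r2 <= tol:
            break  # R^2 no longer increases
        keep_t, keep_y, beta, best_r2 = new_t, new_y, new_beta, new_r2
    return beta, keep_t
```

The function returns the baseline coefficients and the time points that were retained; observations outside the retained set are the ones treated as epidemic activity under these assumptions.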

7.
Sufficient dimension reduction (SDR), which effectively reduces the predictor dimension in regression, has been popular in high‐dimensional data analysis. In the presence of censoring, however, most existing SDR methods suffer. In this article, we propose a new algorithm to perform SDR with censored responses based on the quantile‐slicing scheme recently proposed by Kim et al. First, we estimate the conditional quantile function of the true survival time via censored kernel quantile regression (Shin et al.) and then slice the data based on the estimated censored regression quantiles instead of the responses. Both simulated and real data analyses demonstrate the promising performance of the proposed method.

8.
Model checking for ROC regression analysis   Cited by: 1 (self-citations: 0, citations by others: 1)
Cai T  Zheng Y 《Biometrics》2007,63(1):152-163
Summary: The receiver operating characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may render invalid and misleading results. To date, practical model-checking techniques suitable for validating existing ROC regression models are not yet available. In this article, we develop cumulative residual-based procedures to graphically and numerically assess the goodness of fit for some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual processes and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the cystic fibrosis registry.

9.
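Crown width is an important variable for describing the growth status of individual trees and for building tree growth and yield models. This study examined 10- to 55-year-old Korean pine plantations at the Dabiangou Forest Farm in the mountainous region of eastern Liaoning. Using tree-by-tree measurement data for 2763 Korean pines from 66 permanent sample plots, we selected a base crown width model, introduced an individual-tree competition index (Rd) by reparameterization, and introduced stand density and canopy layer as dummy variables. We then built crown width quantile regression models at different quantiles (0.50, 0.90, 0.93, 0.95, 0.96, 0.99), compared them with the traditional method, and selected the optimal quantile for modeling the maximum crown width in a stand. To capture differences in crown width among individual trees within stands, we built a plot-level linear mixed-effects quantile regression crown width model at the optimal quantile and analyzed the effect of each variable on individual-tree crown width. The results showed that, based on F-tests, crown width models differed significantly among stand densities and canopy layers. After canopy layer, stand density and competition were introduced into the base model, the adjusted R² (Ra2) increased by 0.0104, the root mean square error decreased by 0.0115, and the mean square error decreased to 7.4%. Compared with least squares, the quantile regression model simulated the maximum crown width of individual trees under stand conditions well, and the 0.96 and 0.93 quantiles were selected as the optimal quantiles for the upper and lower canopy layers, respectively. The linear quantile regression model with mixed effects outperformed traditional quantile regression on evaluation criteria including the Akaike information criterion, the Bayesian information criterion and the HQ information criterion; the standard errors of the parameters decreased markedly, and the mixed effects explained the differences between plots well. For both the upper and lower canopy layers, the greater the stand density, the smaller the maximum crown width, and the larger the relative diameter, the larger the maximum crown width. Stand density affected crown width more strongly in the lower layer than in the upper layer, and when stand density was sufficiently high, crown width first increased and then decreased as diameter at breast height increased. The mixed-effects quantile regression model built in this study effectively improves goodness of fit. In the future, measures such as regulating stand density and moderate tending and thinning can support the scientific establishment and sustainable development of Korean pine plantations in the eastern Liaoning mountains.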

10.
M Tsujitani  G G Koch 《Biometrics》1991,47(3):1135-1141
This article describes graphical diagnostic methods for log odds ratio regression models. To study the effects of an additional covariate on log odds ratio regression analysis, three types of residual plots based on weighted least squares (WLS) are discussed: (i) added variable plot (partial regression plot), (ii) partial residual plot, and (iii) augmented partial residual plot. These plots provide diagnostic procedures for identifying heterogeneity of error variances, outliers, or nonlinearity of the model. They are especially useful for clarifying whether including a covariate as a linear term is appropriate, or whether quadratic or other nonlinear transformations are preferable. A well-known data set for case-control studies is analyzed to illustrate the residual plots.

11.
Aim: Species distribution models are increasingly used to predict the impacts of global change on whole ecological communities by modelling the individualistic niche responses of large numbers of species. However, it is not clear whether this single‐species ensemble approach is preferable to community‐wide strategies that represent interspecific associations or shared responses to environmental gradients. Here, we test the performance of two multi‐species modelling approaches against equivalent single‐species models. Location: Great Britain. Methods: Single‐ and multi‐species distribution models were fitted for 701 native British plant species at a 10‐km grid scale. Two machine learning methods were used: classification and regression trees (CARTs) and artificial neural networks (ANNs). The single‐species versions are widely used in ecology but their multivariate extensions are less well known and have not previously been evaluated against one another. We compared their abilities to predict species distributions, community compositions and species richness in an independent geographical region reserved from model‐fitting. Results: The single‐ and multi‐species models performed similarly, although the community models gave slightly poorer predictive accuracy by all measures. However, from the point of view of the whole community they were much simpler than the array of single‐species models, involving orders of magnitude fewer parameters. Multi‐species approaches also left greater residual spatial autocorrelation than the individualistic models and, contrary to expectation, were relatively less accurate for rarer species. However, the fitted multi‐species response curves had a lower tendency for pronounced discontinuities that are unlikely to be a feature of realized niche responses.
Main conclusions: Although community distribution models were slightly less accurate than single‐species models, they offered a highly simplified way of modelling spatial patterns in British plant diversity. Moreover, an advantage of the multi‐species approach was that the modelling of shared environmental responses resolved more realistic response curves. However, there was a slight tendency for community models to predict rare species less accurately, which is potentially disadvantageous for conservation applications. We conclude that multi‐species distribution models may have potential for understanding and predicting the structure of ecological communities, but were slightly inferior to single‐species ensembles for our data.

12.
Motivated by investigating the relationship between progesterone and the days in a menstrual cycle in a longitudinal study, we propose a multikink quantile regression model for longitudinal data analysis. It relaxes the linearity condition and assumes different regression forms in different regions of the domain of the threshold covariate. First, two estimation procedures are proposed to estimate the regression coefficients and the kink point locations: one is a computationally efficient profile estimator under the working independence framework, while the other accounts for within-subject correlations by using the unbiased generalized estimating equation approach. The selection consistency of the number of kink points and the asymptotic normality of the two proposed estimators are established. Second, we construct a rank score test based on partial subgradients for the existence of the kink effect in longitudinal studies. Both the null distribution and the local alternative distribution of the test statistic are derived. Simulation studies show that the proposed methods have excellent finite sample performance. In the application to the longitudinal progesterone data, we identify two kink points in the progesterone curves over different quantiles and observe that the progesterone level remains stable before the day of ovulation, increases quickly in the 5 to 6 days after ovulation, and then stabilizes again or drops slightly.

13.
Till now, multivariate reference regions have played only a marginal role in the practice of clinical chemistry and laboratory medicine. The major reason for this fact is that such regions are traditionally determined by means of concentration ellipsoids of multidimensional Gaussian distributions yielding reference limits which do not allow statements about possible outlyingness of measurements taken in specific diagnostic tests from a given patient or subject. As a promising way around this difficulty we propose to construct multivariate reference regions as p-dimensional rectangles or (in the one-sided case) rectangular half-spaces whose edges determine univariate percentile ranges of the same probability content in each marginal distribution. In a first step, the corresponding notion of a quantile of a p-dimensional probability distribution of any type and shape is made mathematically precise. Subsequently, both parametric and nonparametric procedures of estimating such a quantile are described. Furthermore, results on sample-size calculation for reference-centile studies based on the proposed definition of multivariate quantiles are presented generalizing the approach of Jennen-Steinmetz and Wellek.

14.
Although advances have recently been made in modeling multivariate count data, existing models still have several limitations: (i) the multivariate Poisson log‐normal model (Aitchison and Ho, 1989) cannot be used to fit multivariate count data with excess zero‐vectors; (ii) the multivariate zero‐inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero‐truncated/deflated count data and is difficult to apply to high‐dimensional cases; (iii) the Type I multivariate zero‐adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure in which the correlations among random components are all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure among components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties, and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods.

15.
Lin DY  Wei LJ  Ying Z 《Biometrics》2002,58(1):1-12
Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.

16.
Summary: Quantile regression, which models the conditional quantiles of the response variable given covariates, usually assumes a linear model. However, this kind of linearity is often unrealistic in real life. One situation where linear quantile regression is not appropriate is when the response variable is piecewise linear but still continuous in covariates. To analyze such data, we propose a bent line quantile regression model. We derive its parameter estimates, prove that they are asymptotically valid given the existence of a change‐point, and discuss several methods for testing the existence of a change‐point in bent line quantile regression together with a power comparison by simulation. An example of land mammal maximal running speeds is given to illustrate an application of bent line quantile regression in which this model is theoretically justified and its parameters are of direct biological interest.

17.
We consider the general case of probability prediction models having two or more outcomes and propose an adjusted χ² statistic which can be used to assess the goodness of fit of these models. We present a simulation study to show that our proposed statistic has an approximate χ² distribution under the null hypothesis. Two applications are provided to illustrate the use of the new statistic. The first application examines the fit of a logistic regression model using both the proposed statistic and the popular Hosmer-Lemeshow statistic, and we compare and contrast these two methods. The second application evaluates the goodness of fit of a polychotomous regression model.
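As background for the comparison in this abstract, the following is a minimal sketch of the Hosmer-Lemeshow statistic for a binary outcome. It is not the adjusted χ² statistic proposed in the paper, whose form the abstract does not give. The function name `hosmer_lemeshow` and the equal-size grouping rule are assumptions for illustration.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic for binary outcomes y and predicted probabilities p.

    Observations are sorted by predicted probability and split into
    `groups` bins of (nearly) equal size. Each bin contributes
    (observed - expected)^2 / (n_g * pbar * (1 - pbar)), where pbar is
    the mean predicted probability in the bin. Bins whose mean
    probability is exactly 0 or 1 would divide by zero and are not
    handled in this sketch.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pbar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * pbar * (1.0 - pbar))
    return stat
```

For a logistic model evaluated on the data used to fit it, the statistic is conventionally referred to a χ² distribution with `groups - 2` degrees of freedom.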

18.
The popularity of penalized regression in high‐dimensional data analysis has led to a demand for new inferential tools for these models. False discovery rate control is widely used in high‐dimensional hypothesis testing, but has only recently been considered in the context of penalized regression. Almost all of this work, however, has focused on lasso‐penalized linear regression. In this paper, we derive a general method for controlling the marginal false discovery rate that can be applied to any penalized likelihood‐based model, such as logistic regression and Cox regression. Our approach is fast, flexible and can be used with a variety of penalty functions including lasso, elastic net, MCP, and MNet. We derive theoretical results under which the proposed method is valid, and use simulation studies to demonstrate that the approach is reasonably robust, albeit slightly conservative, when these assumptions are violated. Despite being conservative, we show that our method often offers more power to select causally important features than existing approaches. Finally, the practical utility of the method is demonstrated on gene expression datasets with binary and time‐to‐event outcomes.

19.
Huiping Xu  Bruce A. Craig 《Biometrics》2009,65(4):1145-1155
Summary: Traditional latent class modeling has been widely applied to assess the accuracy of dichotomous diagnostic tests. These models, however, assume that the tests are independent conditional on the true disease status, which is rarely valid in practice. Alternative models using probit analysis have been proposed to incorporate dependence among tests, but these models consider restricted correlation structures. In this article, we propose a probit latent class (PLC) model that allows a general correlation structure. When combined with some helpful diagnostics, this model provides a more flexible framework from which to evaluate the correlation structure and model fit. Our model encompasses several other PLC models but uses a parameter‐expanded Monte Carlo EM algorithm to obtain the maximum‐likelihood estimates. The parameter‐expanded EM algorithm was designed to accelerate the convergence rate of the EM algorithm by expanding the complete‐data model to include a larger set of parameters, and it ensures a simple solution in fitting the PLC model. We demonstrate our estimation and model selection methods using a simulation study and two published medical studies.

20.
高慧淋  董利虎  李凤日 《生态学杂志》2016,27(11):3420-3426
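Based on survey data from 378 permanent sample plots and 415 temporary sample plots in Northeast China and on the Reineke equation, linear quantile regression was used to model the relationship between maximum stand density and mean tree diameter at breast height in Changbai larch plantations at different quantiles (τ = 0.90, 0.95, 0.99), and the best model for fitting the maximum density line of Changbai larch plantations was selected. Using manually selected maximum data points, maximum density line models were also built by ordinary least squares (OLS) and maximum likelihood (ML) regression. The generalized Pareto model from extreme value theory was used to estimate the limiting maximum number of trees for specific diameter classes in actual stands, and a limiting density line model was then built. The linear quantile regression model was compared with the other methods. The results showed that fitting the 5 largest data points selected across the full range of diameter classes yielded the maximum density line of actual stands, that selecting too many points made the fit deviate from the maximum density line, and that ML outperformed OLS. The linear quantile regression model at the 0.99 quantile gave a fit close to that of ML, but its parameter estimates were more stable. Because manual selection of the fitting data is somewhat subjective, the quantile regression model at the 0.99 quantile was finally chosen as the best model for fitting the maximum density line, with parameter estimates k = 11.790 and β = -1.586; the parameter estimates of the limiting density line model were k = 11.820 and β = -1.594. The limiting density line lay slightly above the maximum density line, but the difference between the two was not marked. Validation with the permanent plot data showed that the maximum stand density line and the limiting density line can predict the maximum and limiting densities of actual stands, providing a basis for the sound management of Changbai larch plantations.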
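The reported parameters can be turned into a density prediction with a short calculation. The sketch below assumes the Reineke line is written as ln N = k + β ln D with natural logarithms, N in stems per hectare and D the mean diameter at breast height in centimetres; the abstract states none of these conventions, so they are assumptions, as are the names `stems_per_hectare`, `MAX_LINE` and `LIMIT_LINE`.

```python
import math


def stems_per_hectare(mean_dbh_cm, k, beta):
    """Solve the Reineke-type line ln(N) = k + beta * ln(D) for N.

    Assumes natural logarithms, D in cm and N in stems per hectare.
    """
    return math.exp(k + beta * math.log(mean_dbh_cm))


# Parameter estimates quoted in the abstract.
MAX_LINE = {"k": 11.790, "beta": -1.586}    # 0.99-quantile maximum density line
LIMIT_LINE = {"k": 11.820, "beta": -1.594}  # limiting density line

if __name__ == "__main__":
    for dbh in (10.0, 20.0, 30.0):
        n_max = stems_per_hectare(dbh, **MAX_LINE)
        n_lim = stems_per_hectare(dbh, **LIMIT_LINE)
        print(f"D = {dbh:4.1f} cm: max {n_max:8.0f}, limit {n_lim:8.0f}")
```

Under these assumptions the two lines give similar values at a mean diameter of 20 cm, with the limiting line slightly higher, in line with the abstract's remark that the two lines differ only slightly.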
