Similar literature
 20 similar articles found (search time: 15 ms)
1.
Summary. Quantile regression, which models the conditional quantiles of the response variable given covariates, usually assumes a linear model. However, this kind of linearity is often unrealistic in real life. One situation where linear quantile regression is not appropriate is when the response variable is piecewise linear but still continuous in covariates. To analyze such data, we propose a bent line quantile regression model. We derive its parameter estimates, prove that they are asymptotically valid given the existence of a change‐point, and discuss several methods for testing the existence of a change‐point in bent line quantile regression together with a power comparison by simulation. An example of land mammal maximal running speeds is given to illustrate an application of bent line quantile regression in which this model is theoretically justified and its parameters are of direct biological interest.

2.
Ying Yuan  Guosheng Yin 《Biometrics》2010,66(1):105-114
Summary. We study quantile regression (QR) for longitudinal measurements with nonignorable intermittent missing data and dropout. Compared to conventional mean regression, quantile regression can characterize the entire conditional distribution of the outcome variable, and is more robust to outliers and misspecification of the error distribution. We account for the within-subject correlation by introducing an ℓ2 penalty in the usual QR check function to shrink the subject-specific intercepts and slopes toward the common population values. The informative missing data are assumed to be related to the longitudinal outcome process through the shared latent random effects. We assess the performance of the proposed method using simulation studies, and illustrate it with data from a pediatric AIDS clinical trial.
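
The check function mentioned in the abstract above can be made concrete with a minimal sketch in plain Python. It shows only the standard check (pinball) loss ρ_τ(u) = u(τ − 1{u < 0}) and confirms by brute force that the constant minimizing the summed loss is a sample quantile. It is not the ℓ2-penalized longitudinal estimator of the paper; the function names and the toy data are illustrative assumptions.

```python
def check_loss(u, tau):
    """Quantile check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(c, ys, tau):
    """Summed check loss when every observation is predicted by the constant c."""
    return sum(check_loss(y - c, tau) for y in ys)


def best_constant(ys, tau):
    """Brute-force minimizer over the data points.

    The summed check loss is convex and piecewise linear in c, so a
    minimizer is always attained at one of the observations.
    """
    return min(sorted(ys), key=lambda c: total_loss(c, ys, tau))


# Toy data (hypothetical) with one gross outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(ys, 0.5)   # tau = 0.5 recovers the sample median
upper_fit = best_constant(ys, 0.9)    # tau = 0.9 targets the upper tail
print(median_fit, upper_fit)          # 3.0 100.0
```

The median fit stays at 3.0 although the sample mean is 22.0, which illustrates the robustness to outliers that the abstract attributes to quantile regression.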

3.
Research has shown that high blood glucose levels are important predictors of incident diabetes. However, they are also strongly associated with other cardiometabolic risk factors such as high blood pressure, adiposity, and cholesterol, which are also highly correlated with one another. The aim of this analysis was to ascertain how these highly correlated cardiometabolic risk factors might be associated with high levels of blood glucose in adults aged 50 or older from wave 2 of the English Longitudinal Study of Ageing (ELSA). Due to the high collinearity of predictor variables and our interest in extreme values of blood glucose, we proposed a new method, called quantile profile regression, to answer this question. Profile regression, a Bayesian nonparametric model for clustering responses and covariates simultaneously, is a powerful tool to model the relationship between a response variable and covariates, but the standard approach of using a mixture of Gaussian distributions for the response model will not identify the underlying clusters correctly, particularly with outliers in the data or a heavy-tailed distribution of the response. Therefore, we propose quantile profile regression to model the response variable with an asymmetric Laplace distribution, allowing us to model more accurately clusters that are asymmetric and predict more accurately for extreme values of the response variable and/or outliers. In simulations, our new method performs more accurately than the Normal profile regression approach and remains robust when outliers are present in the data. We conclude with an analysis of the ELSA.

4.
Summary. A time‐specific log‐linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time‐to‐event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance–covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long‐term follow‐up to estimate the patients' median residual lifetimes, adjusting for important prognostic factors.

5.
Practitioners of current data analysis are regularly confronted with the situation where the heavy-tailed skewed response is related to both multiple functional predictors and high-dimensional scalar covariates. We propose a new class of partially functional penalized convolution-type smoothed quantile regression to characterize the conditional quantile level between a scalar response and predictors of both functional and scalar types. The new approach overcomes the lack of smoothness and severe convexity of the standard quantile empirical loss, considerably improving the computing efficiency of partially functional quantile regression. We investigate a folded concave penalized estimator for simultaneous variable selection and estimation by the modified local adaptive majorize-minimization (LAMM) algorithm. The functional predictors can be dense or sparse and are approximated by the principal component basis. Under mild conditions, the consistency and oracle properties of the resulting estimators are established. Simulation studies demonstrate a competitive performance against the partially functional standard penalized quantile regression. A real application using Alzheimer's Disease Neuroimaging Initiative data is utilized to illustrate the practicality of the proposed model.

6.
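Crown width is an important variable for describing the growth status of individual trees and for building forest growth and yield models. This study examined 10- to 55-year-old Korean pine plantations at the Dabiangou Forest Farm in the mountainous region of eastern Liaoning. Using tree-by-tree measurements of 2763 Korean pines from 66 permanent sample plots, we selected a base crown width model, introduced an individual-tree competition index (Rd) by reparameterization, and introduced stand density and canopy layer as dummy variables. We then built crown width quantile regression models at several quantiles (0.50, 0.90, 0.93, 0.95, 0.96, 0.99), compared them with the traditional method, and selected the optimal quantile for simulating the maximum crown width in a stand. To reflect differences in crown width among individual trees within stands, we built a plot-level linear mixed-effects quantile regression crown width model at the optimal quantile and analyzed the effect of each variable on individual-tree crown width. The results showed that, based on F tests, crown width models differed significantly among stand densities and canopy layers. After canopy layer, stand density, and competition were introduced into the base model, the model's adjusted R² (Ra²) increased by 0.0104, the root mean square error decreased by 0.0115, and the mean square error decreased to 7.4%. Compared with least squares, the quantile regression models better simulated the maximum crown width of individual trees under stand conditions, and the 0.96 and 0.93 quantiles were selected as the optimal quantiles for the upper and lower canopy layers, respectively. The linear quantile regression model with mixed effects outperformed conventional quantile regression on evaluation criteria including the Akaike information criterion, the Bayesian information criterion, and the HQ information criterion; parameter standard errors decreased markedly, and the mixed effects explained the differences among plots well. In both the upper and lower canopy layers, the greater the stand density, the smaller the maximum crown width, and the larger the relative diameter, the larger the maximum crown width. Stand density affected crown width more strongly in the lower layer than in the upper layer, and when stand density was sufficiently high, crown width first increased and then decreased with increasing diameter at breast height. The mixed-effects quantile regression model built in this study effectively improves goodness of fit. In the future, measures such as regulating stand density and moderate tending thinning can support the scientific establishment and sustainable development of Korean pine plantations in the mountainous region of eastern Liaoning.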

7.
Yin G  Cai J 《Biometrics》2005,61(1):151-161
As an alternative to the mean regression model, the quantile regression model has been studied extensively with independent failure time data. However, due to natural or artificial clustering, it is common to encounter multivariate failure time data in biomedical research where the intracluster correlation needs to be accounted for appropriately. For right-censored correlated survival data, we investigate the quantile regression model and adapt an estimating equation approach for parameter estimation under the working independence assumption, as well as a weighted version for enhancing the efficiency. We show that the parameter estimates are consistent and asymptotically follow normal distributions. The variance estimation using asymptotic approximation involves nonparametric functional density estimation. We employ the bootstrap and perturbation resampling methods for the estimation of the variance-covariance matrix. We examine the proposed method for finite sample sizes through simulation studies, and illustrate it with data from a clinical trial on otitis media.

8.
Summary. In studies involving functional data, it is commonly of interest to model the impact of predictors on the distribution of the curves, allowing flexible effects on not only the mean curve but also the distribution about the mean. Characterizing the curve for each subject as a linear combination of a high‐dimensional set of potential basis functions, we place a sparse latent factor regression model on the basis coefficients. We induce basis selection by choosing a shrinkage prior that allows many of the loadings to be close to zero. The number of latent factors is treated as unknown through a highly‐efficient, adaptive‐blocked Gibbs sampler. Predictors are included on the latent variables level, while allowing different predictors to impact different latent factors. This model induces a framework for functional response regression in which the distribution of the curves is allowed to change flexibly with predictors. The performance is assessed through simulation studies and the methods are applied to data on blood pressure trajectories during pregnancy.

9.
Dunson DB  Watson M  Taylor JA 《Biometrics》2003,59(2):296-304
Often a response of interest cannot be measured directly and it is necessary to rely on multiple surrogates, which can be assumed to be conditionally independent given the latent response and observed covariates. Latent response models typically assume that residual densities are Gaussian. This article proposes a Bayesian median regression modeling approach, which avoids parametric assumptions about residual densities by relying on an approximation based on quantiles. To accommodate within-subject dependency, the quantile response categories of the surrogate outcomes are related to underlying normal variables, which depend on a latent normal response. This underlying Gaussian covariance structure simplifies interpretation and model fitting, without restricting the marginal densities of the surrogate outcomes. A Markov chain Monte Carlo algorithm is proposed for posterior computation, and the methods are applied to single-cell electrophoresis (comet assay) data from a genetic toxicology study.

10.
Community resilience offers a conceptual framework for assessing a community's capacity for coping with environmental changes and emergency situations. It is perceived as a core element of a sustainable lifestyle, helping to mitigate the community's reaction to crises by facilitating purposeful and collective action on the part of its members. The conjoint community resilience assessment measure (CCRAM) provides a standard measure of community resilience including five factors: leadership, collective efficacy, preparedness, place attachment, and social trust. The mean scores of each of the factors portray a community resilience profile, and the overall CCRAM score is calculated as the average of the scores of the 21 survey items with an equal weight. Two regression models were employed: logistic regression, a commonly used tool in the field of applied statistics, and quantile regression, which is a non-parametric method that facilitates the detection of the effect of a regressor on various quantiles of the dependent variable. The study aims to demonstrate the innovative use of quantile regression modeling in community resilience analysis. The results demonstrate that the quantile regression was significantly more sensitive to sub-populations than the logistic regression. Having an income below average, which was negatively correlated with perceived community resilience in the logistic model, was found to be significant only in the lower (Q10, Q25) resilience quantiles. Age (per year) and previous involvement in emergency situations, which were not noted as significant in the logistic regression, were found to be positively associated with perceived community resilience in the lowest quantile. A difference between quantiles of perceived community resilience was noted in regard to size of community. The association between size of community and perceived community resilience, which was negative in the logistic regression (residents of larger towns had lower community resilience), was found to be such only up to quantile 75, but it reversed in the highest quantile. It was concluded that the utilization of quantile regression analysis in studies of community resilience can facilitate the creation of tailored response plans, adapted to the needs of sub-populations (such as weaker ones), and help enhance overall community resilience in crises.

11.
We propose a censored quantile regression model for the analysis of relative survival data. We create a hybrid data set consisting of the study observations and counterpart randomly sampled pseudopopulation observations imputed from population life tables that adjust for expected mortality. We then fit a censored quantile regression model to the hybrid data incorporating demographic variables (e.g., age, biologic sex, calendar time) corresponding to the population life tables of demographically-similar individuals, a population versus study covariate, and its interactions with the variables of interest. These latter variables can be interpreted as relative survival parameters that depict the differences in failure quantiles between the study participants and their population counterparts.

12.
Ecologists often estimate population trends of animals in time series of counts using linear regression to estimate parameters in a linear transformation of multiplicative growth models, where logarithms of rates of change in counts in time intervals are used as response variables. We present quantile regression estimates for the median (0.50) and interquartile (0.25, 0.75) relationships as an alternative to mean regression estimates for common density-dependent and density-independent population growth models. We demonstrate that the quantile regression estimates are more robust to outliers and require fewer distributional assumptions than conventional mean regression estimates and can provide information on heterogeneous rates of change ignored by mean regression. We provide quantile regression trend estimates for 2 populations of greater sage-grouse (Centrocercus urophasianus) in Wyoming, USA, and for the Crawford population of Gunnison sage-grouse (Centrocercus minimus) in southwestern Colorado, USA. Our selected Gompertz models of density dependence for both populations of greater sage-grouse had smaller negative estimates of density-dependence terms and less variation in corresponding predicted growth rates (λ) for quantile than mean regression models. In contrast, our selected Gompertz models of density dependence with piecewise linear effects of years for the Crawford population of Gunnison sage-grouse had predicted changes in λ across years from quantile regressions that varied more than those from mean regression because of heterogeneity in estimated λs that were both less and greater than mean estimates. Our results add to literature establishing that quantile regression provides better behaved estimates than mean regression when there are outlying growth rates, including those induced by adjustments for zeros in the time series of counts. The 0.25 and 0.75 quantiles bracketing the median provide robust estimates of population changes (λ) for the central 50% of time series data and provide a 50% prediction interval for a single new prediction without making parametric distributional assumptions or assuming homogeneous λs. Compared to mean estimates, our quantile regression trend estimates for greater sage-grouse indicated less variation in density-dependent λs by minimizing sensitivity to outlying values, and for Gunnison sage-grouse indicated greater variation in density-dependent λs associated with heterogeneity among quantiles.

13.
Censored quantile regression models, which offer great flexibility in assessing covariate effects on event times, have attracted considerable research interest. In this study, we consider flexible estimation and inference procedures for competing risks quantile regression, which not only provides meaningful interpretations by using cumulative incidence quantiles but also extends the conventional accelerated failure time model by relaxing some of the stringent model assumptions, such as global linearity and unconditional independence. Current methods for censored quantile regression often involve minimizing an L1‐type convex function or solving nonsmooth estimating equations. This approach could lead to multiple roots in practical settings, particularly with multiple covariates. Moreover, variance estimation involves an unknown error distribution and most methods rely on computationally intensive resampling techniques such as bootstrapping. We consider extending the induced smoothing procedure for censored quantile regression to the competing risks setting. The proposed procedure permits the fast and accurate computation of quantile regression parameter estimates and standard variances by using conventional numerical methods such as the Newton–Raphson algorithm. Numerical studies show that the proposed estimators perform well and the resulting inference is reliable in practical settings. The method is finally applied to data from a soft tissue sarcoma study.

14.
Flexible estimation of multiple conditional quantiles is of interest in numerous applications, such as studying the effect of pregnancy-related factors on low and high birth weight. We propose a Bayesian nonparametric method to simultaneously estimate noncrossing, nonlinear quantile curves. We expand the conditional distribution function of the response in I-spline basis functions where the covariate-dependent coefficients are modeled using neural networks. By leveraging the approximation power of splines and neural networks, our model can approximate any continuous quantile function. Compared to existing models, our model estimates all rather than a finite subset of quantiles, scales well to high dimensions, and accounts for estimation uncertainty. While the model is arbitrarily flexible, interpretable marginal quantile effects are estimated using accumulative local effect plots and variable importance measures. A simulation study shows that our model can better recover quantiles of the response distribution when the data are sparse, and an analysis of birth weight data is presented.

15.
MOTIVATION: The identification of DNA copy number changes provides insights that may advance our understanding of initiation and progression of cancer. Array-based comparative genomic hybridization (array-CGH) has emerged as a technique allowing high-throughput genome-wide scanning for chromosomal aberrations. A number of statistical methods have been proposed for the analysis of array-CGH data. In this article, we consider a fused quantile regression model based on three motivations: (1) quantile regression may provide a more comprehensive picture for the ratio profile of copy numbers than the standard mean regression approach; (2) for simplicity, most available methods assume uniform spacing between neighboring clones, while incorporating the information of physical locations of clones may be helpful; and (3) most current methods have a set of tuning parameters that must be carefully tuned, which introduces complexity to the implementation. RESULTS: We formulate the detection of regions of gains and losses in a fused regularized quantile regression framework, incorporating physical locations of clones. We derive an efficient algorithm that computes the entire solution path for the resulting optimization problem, and we propose a simple estimate for the complexity of the fitted model, which leads to convenient selection of the tuning parameter. Three published array-CGH datasets are used to demonstrate our approach. AVAILABILITY: R code is available at http://www.stat.lsa.umich.edu/~jizhu/code/cgh/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

16.
We present a graphical measure of assessing the explanatory power of regression models with a binary response. The binary regression quantile plot and an area defined by it are used for the visual comparison and ordering of nested binary response regression models. The plot shows how well various models explain the data. Two data sets are analyzed and the area representing the fit of a model is shown to agree with the usual likelihood ratio test.

17.
Tan M  Qu Y  Rao JS 《Biometrics》1999,55(1):258-263
The marginal regression model offers a useful alternative to conditional approaches to analyzing binary data (Liang, Zeger, and Qaqish, 1992, Journal of the Royal Statistical Society, Series B 54, 3-40). Instead of modelling the binary data directly as do Liang and Zeger (1986, Biometrika 73, 13-22), the parametric marginal regression model developed by Qu et al. (1992, Biometrics 48, 1095-1102) assumes that there is an underlying multivariate normal vector that gives rise to the observed correlated binary outcomes. Although this parametric approach provides a flexible way to model different within-cluster correlation structures and does not restrict the parameter space, it is of interest to know how robust the parameter estimates are with respect to choices of the latent distribution. We first extend the latent modelling to include multivariate t-distributed latent vectors and assess the robustness in this class of distributions. Then we show through a simulation that the parameter estimates are robust with respect to the latent distribution even if the latent distribution is skewed. In addition to this empirical evidence for robustness, we show through the iterative algorithm that the robustness of the regression coefficients with respect to misspecifications of covariance structure in Liang and Zeger's model in fact indicates robustness with respect to underlying distributional assumptions of the latent vector in the latent variable model.

18.
In this paper, we propose a frequentist model averaging method for quantile regression with high-dimensional covariates. Although research on these subjects has proliferated as separate approaches, no study has considered them in conjunction. Our method entails reducing the covariate dimensions through ranking the covariates based on marginal quantile utilities. The second step of our method implements model averaging on the models containing the covariates that survive the screening of the first step. We use a delete-one cross-validation method to select the model weights, and prove that the resultant estimator possesses an optimal asymptotic property uniformly over any compact (0,1) subset of the quantile indices. Our proof, which relies on empirical process theory, is arguably more challenging than proofs of similar results in other contexts owing to the high-dimensional nature of the problem and our relaxation of the conventional assumption of the weights summing to one. Our investigation of finite-sample performance demonstrates that the proposed method exhibits very favorable properties compared to the least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD) penalized regression methods. The method is applied to a microarray gene expression data set.

19.
Roy J  Lin X 《Biometrics》2000,56(4):1047-1054
Multiple outcomes are often used to properly characterize an effect of interest. This paper proposes a latent variable model for the situation where repeated measures over time are obtained on each outcome. These outcomes are assumed to measure an underlying quantity of main interest from different perspectives. We relate the observed outcomes using regression models to a latent variable, which is then modeled as a function of covariates by a separate regression model. Random effects are used to model the correlation due to repeated measures of the observed outcomes and the latent variable. An EM algorithm is developed to obtain maximum likelihood estimates of model parameters. Unit-specific predictions of the latent variables are also calculated. This method is illustrated using data from a national panel study on changes in methadone treatment practices.

20.
Gao Huilin  Dong Lihu  Li Fengri 《生态学杂志》2016,27(11):3420-3426
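Based on survey data from 378 permanent sample plots and 415 temporary sample plots in Northeast China and on the Reineke equation, linear quantile regression was used to model the relationship between maximum stand density and mean tree diameter at breast height in Changbai larch plantations at different quantiles (τ = 0.90, 0.95, 0.99), and the best model for fitting the maximum density line of Changbai larch plantations was selected. Using manually selected maximum data points, maximum density line models were also fitted by ordinary least squares (OLS) and maximum likelihood (ML) regression. The generalized Pareto model from extreme value theory was used to estimate the limiting maximum number of trees for specific diameter classes in actual stands, and a limit density line model was then built. The linear quantile regression model was compared with the other methods. The results showed that fitting to the five largest data points selected over the full range of diameter classes yielded the maximum density line of actual stands, that selecting too many points made the fitted line deviate from the maximum density line, and that ML outperformed OLS. The linear quantile regression model at the 0.99 quantile gave fits close to those of ML, but its parameter estimates were more stable. Because manual selection of the fitting data is somewhat subjective, the quantile regression model at the 0.99 quantile was finally chosen as the best model for fitting the maximum density line, with parameter estimates k = 11.790 and β = -1.586; the parameter estimates of the limit density line model were k = 11.820 and β = -1.594. The limit density line lay slightly above the maximum density line, but the difference between the two was not marked. Validation with the permanent plot data showed that the maximum stand density line and the limit density line can predict the maximum density and limit density of actual stands, providing a basis for the rational management of Changbai larch plantations.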
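
The two parameter pairs reported in the abstract above can be compared with a short Python sketch. It assumes the usual log-log form of the Reineke equation, ln N = k + β ln D, with natural logarithms; the abstract itself does not state the functional form, the log base, or the units of D and N, so all three are assumptions here. The function names and the diameters evaluated are hypothetical.

```python
import math

# Parameter estimates (k, beta) quoted in the abstract.
MAX_LINE = (11.790, -1.586)    # quantile regression line at tau = 0.99
LIMIT_LINE = (11.820, -1.594)  # limit density line (generalized Pareto)


def log_density(dbh, params):
    """ln N = k + beta * ln D (ASSUMED natural-log Reineke form)."""
    k, beta = params
    return k + beta * math.log(dbh)


def density(dbh, params):
    """Back-transform the assumed log-log line to a stem count N."""
    return math.exp(log_density(dbh, params))


# Hypothetical mean diameters, chosen only for illustration.
for dbh in (10.0, 20.0, 30.0):
    print(dbh, round(density(dbh, MAX_LINE)), round(density(dbh, LIMIT_LINE)))
```

Under these assumptions, both lines fall as mean diameter increases, and the limit density line sits slightly above the maximum density line at the diameters shown, in line with what the abstract reports.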
