20 similar references found
1.
Joanne C. Beer, Howard J. Aizenstein, Stewart J. Anderson, Robert T. Krafty. Biometrics, 2019, 75(4): 1299-1309
Predicting clinical variables from whole-brain neuroimages is a high-dimensional problem that can potentially benefit from feature selection or extraction. Penalized regression is a popular embedded feature selection method for high-dimensional data. For neuroimaging applications, spatial regularization using the ℓ1 or ℓ2 norm of the image gradient has shown good performance, yielding smooth solutions in spatially contiguous brain regions. Enormous resources have been devoted to establishing structural and functional brain connectivity networks that can be used to define spatially distributed yet related groups of voxels. We propose using the fused sparse group lasso (FSGL) penalty to encourage structured, sparse, and interpretable solutions by incorporating prior information about spatial and group structure among voxels. We present optimization steps for FSGL-penalized regression using the alternating direction method of multipliers (ADMM) algorithm. With simulation studies and an application to real functional magnetic resonance imaging data from the Autism Brain Imaging Data Exchange, we demonstrate conditions under which the fusion and group penalty terms together outperform either of them alone.
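For orientation, the FSGL objective can be sketched as a squared-error loss plus sparse group lasso and fusion terms (a schematic form only; the paper's exact weighting and group scaling may differ):

    \min_{\beta}\ \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\big[\alpha\|\beta\|_1 + (1-\alpha)\sum_g \|\beta_g\|_2\big] + \gamma \sum_{(i,j)\in E} |\beta_i - \beta_j|

Here the groups g come from a parcellation or connectivity network, E collects spatially adjacent voxel pairs, and \alpha, \lambda, \gamma trade off elementwise sparsity, group sparsity, and spatial smoothness; ADMM splits the objective so each nonsmooth penalty is handled by its own proximal update.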
2.
There has been continuing interest in approaches that analyze pairwise locus-by-locus (epistasis) interactions using multilocus association models in genome-wide data sets. In this paper, we suggest an approach that uses sure independence screening to first lower the dimension of the problem by considering the marginal importance of each interaction term within the huge loop over all locus pairs. Subsequent multilocus association steps are executed using an extended Bayesian least absolute shrinkage and selection operator (LASSO) model and fast generalized expectation-maximization estimation algorithms. The potential of this approach is illustrated and compared with the PLINK software using data examples where phenotypes have been simulated conditionally on marker data from the Quantitative Trait Loci Mapping and Marker Assisted Selection (QTLMAS) Workshop 2008 and from real pig data sets.
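As a rough illustration of the screening step only (a minimal sketch in Python; the paper's marginal importance measure and the extended Bayesian LASSO stage are more involved, and sis_interactions is a hypothetical helper):

    import numpy as np

    def sis_interactions(X, y, top_k=1000):
        """Rank pairwise (epistasis) interaction terms by marginal Pearson
        correlation with the phenotype and keep the top_k locus pairs."""
        n, p = X.shape
        yc = y - y.mean()
        scores = []
        for i in range(p):
            for j in range(i + 1, p):
                z = X[:, i] * X[:, j]            # interaction term for loci (i, j)
                zc = z - z.mean()
                denom = n * zc.std() * yc.std()
                r = 0.0 if denom == 0 else abs(np.dot(zc, yc)) / denom
                scores.append((r, i, j))
        scores.sort(reverse=True)                # strongest marginal signal first
        return scores[:top_k]                    # survivors enter the multilocus model

Only the pairs that survive this marginal screen are passed to the multilocus association model, which keeps the subsequent estimation problem tractable.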
3.
Summary. In Li and Yin (2008, Biometrics 64, 124–131), a ridge SIR estimator is introduced as the solution of a minimization problem and computed via an alternating least-squares algorithm. The methodology performs well in practice. In this note, we focus on the theoretical properties of the estimator. We show that the minimization problem is degenerate, in the sense that only two situations can occur: either the ridge SIR estimator does not exist or it is zero.
4.
5.
The distribution of health care payments to insurance plans has substantial consequences for social policy. Risk adjustment formulas predict spending in health insurance markets in order to provide fair benefits and health care coverage for all enrollees, regardless of their health status. Unfortunately, current risk adjustment formulas are known to underpredict spending for specific groups of enrollees, leading to undercompensated payments to health insurers. This incentivizes insurers to design their plans so that individuals in undercompensated groups are less likely to enroll, impairing these groups' access to health care. To improve risk adjustment formulas for undercompensated groups, we expand on concepts from the statistics, computer science, and health economics literature to develop new fair regression methods for continuous outcomes that build fairness considerations directly into the objective function. We additionally propose a novel measure of fairness, while arguing that a suite of metrics is necessary to evaluate risk adjustment formulas more fully. Our data application using the IBM MarketScan Research Databases and our simulation studies demonstrate that these new fair regression methods can lead to substantial improvements in group fairness (e.g., 98%) with only small reductions in overall fit (e.g., 4%).
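As a loose sketch of building fairness directly into the objective (the paper's actual penalty and fairness measure may differ; group, lam, and fair_ls_objective are illustrative names), one can penalize the gap in mean residuals between an undercompensated group and the rest:

    import numpy as np
    from scipy.optimize import minimize

    def fair_ls_objective(beta, X, y, group, lam):
        """Squared-error loss plus a penalty on the mean-residual gap for the
        boolean mask `group`; a sketch, not the paper's exact formulation."""
        resid = y - X @ beta
        mse = np.mean(resid ** 2)
        gap = resid[group].mean() - resid[~group].mean()  # under-/overprediction gap
        return mse + lam * gap ** 2

    # usage sketch:
    # beta0 = np.zeros(X.shape[1])
    # beta_hat = minimize(fair_ls_objective, beta0, args=(X, y, group, lam)).x

Setting lam = 0 recovers ordinary least squares; increasing it trades a little overall fit for closer-to-zero average residuals in the protected group, mirroring the fit-versus-fairness trade-off reported in the abstract.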
6.
Summary. In high-dimensional data analysis, sliced inverse regression (SIR) has proven to be an effective dimension reduction tool and has enjoyed wide application. The usual SIR, however, cannot handle problems where the number of predictors, p, exceeds the sample size, n, and can suffer when there is high collinearity among the predictors. In addition, the reduced dimensional space consists of linear combinations of all the original predictors, so no variable selection is achieved. In this article, we propose a regularized SIR approach based on the least-squares formulation of SIR. An L2 regularization is introduced, and an alternating least-squares algorithm is developed, to enable SIR to work with n < p and highly correlated predictors. An L1 regularization is further introduced to achieve simultaneous dimension reduction and predictor selection. Both simulations and the analysis of a microarray expression data set demonstrate the usefulness of the proposed method.
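Schematically (a sketch with simplified notation; the scaling in the actual least-squares formulation differs), the regularized criterion has the flavor of

    \min_{B,\,C}\ \sum_{h=1}^{H} n_h \big\| (\bar{x}_h - \bar{x}) - \hat{\Sigma}\, B\, c_h \big\|_2^2 + \tau_2 \|B\|_F^2 + \tau_1 \sum_{j,k} |B_{jk}|

where \bar{x}_h are the slice means of the predictors, the columns of B span the reduced subspace, the ridge term (\tau_2) keeps the problem well posed when n < p or predictors are collinear, and the lasso term (\tau_1) zeroes out rows of B so that predictors drop out; B and C are updated by alternating least squares.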
7.
Case-control studies of mobile phones are commonly based on retrospective, self-reported exposure information, which is often subject to substantial uncertainty about data validity. We assessed the validity of self-reported mobile phone use and developed a statistical model to account for over-reporting of exposure. We collected information on mobile phone use from 70 volunteers using two sources of data: self-report in an interview and network operator records. We used regression models to obtain bias-corrected estimates of exposure. A correlation coefficient of 0.71 was obtained between the self-reported and the network operators' data on average calling time (log-transformed minutes per month). A simple linear regression model, in which the duration of calls acquired from network operators is explained by the self-reported duration, fitted the data reasonably well (adjusted R² = 0.51). The constant term was 2.71 and the regression coefficient 0.49 (logarithmic scale). No significant improvement in model fit was achieved by including potential predictors of accuracy in self-reported exposure estimates, such as the pattern of mobile phone use, the modality of response to the questionnaire, or demographic characteristics. Overestimation of self-reported intensity of mobile phone use can be accounted for by regression calibration. The estimates obtained in our study may not be applicable in other contexts, but similar methods could be used to reduce bias in other studies.
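As a worked example of the reported calibration equation (coefficients taken from the abstract; the input value is made up):

    # Regression calibration with the reported estimates: operator-recorded
    # log-duration ~ 2.71 + 0.49 * self-reported log-duration (log minutes/month).
    a, b = 2.71, 0.49
    self_reported = 5.0                 # hypothetical self-reported log-duration
    calibrated = a + b * self_reported  # 2.71 + 0.49 * 5.0 = 5.165
    print(calibrated)

Because the slope is well below 1, large self-reports are shrunk substantially, which is how the calibration corrects over-reporting by heavy users.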
8.
In data analysis using dimension reduction methods, the main goal is to summarize how the response is related to the covariates through a few linear combinations. One key issue is to determine the number of independent, relevant covariate combinations, which is the dimension of the sufficient dimension reduction (SDR) subspace. In this work, we propose an easily applied approach to conducting inference for the dimension of the SDR subspace, based on augmenting the covariate set with simulated pseudo-covariates. Applying the partitioning principle to the possible dimensions, we use rigorous sequential testing to select the dimensionality, by comparing the strength of the signal arising from the actual covariates to that appearing to arise from the pseudo-covariates. We show that under a “uniform direction” condition, our approach can be used in conjunction with several popular SDR methods, including sliced inverse regression. In these settings, the test statistic asymptotically follows a beta distribution and is therefore easily calibrated. Moreover, the family-wise type I error rate of our sequential testing is rigorously controlled. Simulation studies and an analysis of newborn anthropometric data demonstrate the robustness of the proposed approach and indicate that its power is comparable to or greater than that of the alternatives.
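A bare-bones sketch of the augmentation idea (Python; the test statistic, its beta calibration, and the sequential scheme follow the paper, while everything below is simplified and augment_with_pseudo_covariates is a hypothetical helper):

    import numpy as np

    def augment_with_pseudo_covariates(X, n_pseudo, seed=None):
        """Append simulated pseudo-covariates that carry no signal about the
        response; any apparent signal they pick up calibrates the noise level."""
        rng = np.random.default_rng(seed)
        pseudo = rng.standard_normal((X.shape[0], n_pseudo))
        return np.hstack([X, pseudo])

An SDR method run on the augmented matrix should load only on the real covariates; estimated directions that place substantial weight on the pseudo-columns indicate a spurious dimension, at which point the sequential test stops.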
9.
Hengjian Cui, Yanyan Liu, Guangcai Mao, Jing Zhang. Biometrical Journal. Biometrische Zeitschrift, 2023, 65(3): 2200089
How to select the active variables that have a significant impact on the event of interest is an important and meaningful problem in the statistical analysis of ultrahigh-dimensional data. In many applications, researchers know from previous investigations and experience that a certain set of covariates is active. Incorporating this prior knowledge of active variables, we propose a model-free conditional screening procedure for ultrahigh-dimensional survival data based on conditional distance correlation. The proposed procedure can effectively detect hidden active variables that are jointly important but only weakly correlated with the response. Moreover, it performs well when covariates are strongly correlated with each other. We establish the sure screening property and the ranking consistency of the proposed method and conduct extensive simulation studies, which suggest that the proposed procedure works well in practical situations. We then illustrate the new approach using a real dataset from a diffuse large-B-cell lymphoma study.
10.
Dimension reduction is central to an analysis of data with many predictors. Sufficient dimension reduction aims to identify the smallest possible number of linear combinations of the predictors, called the sufficient predictors, that retain all of the information in the predictors about the response distribution. In this article, we propose a Bayesian solution for sufficient dimension reduction. We directly model the response density in terms of the sufficient predictors using a finite mixture model. This approach is computationally efficient and offers a unified framework to handle categorical predictors, missing predictors, and Bayesian variable selection. We illustrate the method using both a simulation study and an analysis of an HIV data set.
11.
Qiang Hu, Liang Zhu, Yanyan Liu, Jianguo Sun, Deo Kumar Srivastava, Leslie L. Robison. Biometrical Journal. Biometrische Zeitschrift, 2020, 62(8): 1909-1925
For the analysis of ultrahigh-dimensional data, the first step is often to perform screening and feature selection to effectively reduce the dimensionality while retaining all the active or relevant variables with high probability. Many methods have been developed for this under various frameworks, but most of them apply only to complete data. In this paper, we consider an incomplete data situation, case II interval-censored failure time data, for which no screening procedure seems to be available. Based on the idea of cumulative residuals, a model-free or nonparametric method is developed and shown to have the sure independence screening property. In particular, the approach is shown to rank the active variables above the inactive ones in terms of their association with the failure time of interest. A simulation study is conducted to demonstrate the usefulness of the proposed method; in particular, it indicates that the method works well with general survival models and is capable of capturing nonlinear covariate effects with interactions. The approach is also applied to the childhood cancer survivor study that motivated this investigation.
12.
J. Hoogland, T. P. A. Debray, M. J. Crowther, R. D. Riley, J. IntHout, J. B. Reitsma, A. H. Zwinderman. Biometrical Journal. Biometrische Zeitschrift, 2024, 66(1): 2200319
We propose to combine the benefits of flexible parametric survival modeling and regularization to improve risk prediction modeling in the context of time-to-event data. To that end, we introduce ridge, lasso, elastic net, and group lasso penalties for both log hazard and log cumulative hazard models. The log (cumulative) hazard in these models is represented by a flexible function of time that may depend on the covariates (i.e., covariate effects may be time-varying). We show that the optimization problem for the proposed models can be formulated as a convex optimization problem, and we provide a user-friendly R implementation for model fitting and penalty parameter selection based on cross-validation. Simulation results show the advantage of regularization in terms of increased out-of-sample prediction accuracy and improved calibration and discrimination of predicted survival probabilities, especially when the sample size is relatively small with respect to model complexity. An applied example illustrates the proposed methods. In summary, our work provides both a foundation for and an easily accessible implementation of regularized parametric survival modeling, and it suggests that regularization improves out-of-sample prediction performance.
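Schematically, the fitting problem is an elastic-net-type penalized maximum likelihood (a sketch; the paper's spline representation of the log (cumulative) hazard and its group lasso variant add more structure):

    \hat{\theta} = \arg\max_{\theta}\ \ell(\theta) - \lambda\Big(\alpha\|\theta\|_1 + \tfrac{1-\alpha}{2}\|\theta\|_2^2\Big)

where \ell(\theta) is the log likelihood of the flexible log hazard or log cumulative hazard model, \alpha interpolates between ridge (\alpha = 0) and lasso (\alpha = 1), and \lambda is selected by cross-validation, as in the authors' R implementation.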
13.
Yi Zhao, Bingkai Wang, Chin-Fu Liu, Andreia V. Faria, Michael I. Miller, Brian S. Caffo, Xi Luo. Biometrics, 2023, 79(3): 2333-2345
Brain segmentations at different levels are generally represented as hierarchical trees. Brain regional atrophy at specific levels has been found to be marginally associated with Alzheimer's disease outcomes. In this study, we propose an ℓ1-type regularization for predictors that follow a hierarchical tree structure. Considering a tree as a directed acyclic graph, we interpret the model parameters from a path-analysis perspective. Under this concept, the proposed penalty regulates the total effect of each predictor on the outcome. Under regularity conditions, it is shown that with the proposed regularization the estimator of the model coefficients is consistent in ℓ2-norm and that model selection is also consistent. When applied to a brain sMRI dataset acquired from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the proposed approach identifies brain regions where atrophy is associated with memory decline. With regularization on the total effects, the findings suggest that the impact of atrophy on memory deficits is localized to small brain regions, but at various levels of the brain segmentation. Data used in the preparation of this paper were obtained from the ADNI database.
14.
When it comes to fitting simple allometric slopes through measurement data, evolutionary biologists have been torn between regression methods. On the one hand, there is the ordinary least squares (OLS) regression, which is commonly used across many disciplines of biology to fit lines through data, but which has a reputation for underestimating slopes when measurement error is present. On the other hand, there is the reduced major axis (RMA) regression, which is often recommended as a substitute for OLS regression in studies of allometry, but which has several weaknesses of its own. Here, we review statistical theory as it applies to evolutionary biology and studies of allometry. We point out that the concerns that arise from measurement error for OLS regression are small and straightforward to deal with, whereas RMA has several key properties that make it unfit for use in the field of allometry. The recommended approach for researchers interested in allometry is to use OLS regression on measurements taken with low (but realistically achievable) measurement error. If measurement error is unavoidable and relatively large, it is preferable to correct for slope attenuation rather than to turn to RMA regression, or to take the expected amount of attenuation into account when interpreting the data.
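As a quick numerical illustration of the attenuation correction mentioned here (all numbers are made up): the OLS slope is attenuated by the reliability ratio, the share of observed-x variance that is true signal, so dividing by that ratio disattenuates it:

    # Disattenuating an OLS allometric slope (illustrative numbers only).
    var_true_x = 0.90   # variance of true trait values on the x-axis
    var_error  = 0.10   # measurement-error variance, e.g., from repeated measures
    kappa = var_true_x / (var_true_x + var_error)  # reliability ratio = 0.9
    ols_slope = 0.72                               # observed (attenuated) slope
    corrected_slope = ols_slope / kappa            # 0.72 / 0.9 = 0.80
    print(corrected_slope)

This is the correction the authors prefer over switching to RMA when measurement error is unavoidably large.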
15.
Sarit Agami, David M. Zucker, Donna Spiegelman. Biometrical Journal. Biometrische Zeitschrift, 2020, 62(5): 1139-1163
The Cox regression model is a popular model for analyzing the relationship between a covariate vector and a survival endpoint. The standard Cox model assumes a constant covariate effect across the entire covariate domain. In many epidemiological and other applications, however, the covariate of main interest is subject to a threshold effect: a change in the slope at a certain point within the covariate domain. Often, the covariate of interest is also measured with some degree of error. In this paper, we study measurement error correction in the case where the threshold is known. Several bias correction methods are examined: two versions of regression calibration (RC1 and RC2, the latter of which is new), two methods based on the induced relative risk under a rare event assumption (RR1 and RR2, the latter of which is new), a maximum pseudo-partial likelihood estimator (MPPLE), and simulation-extrapolation (SIMEX). We develop the theory, present simulations comparing the methods, and illustrate their use on data concerning the relationship between chronic exposure to PM10 particulate matter air pollution and fatal myocardial infarction (Nurses' Health Study, NHS), and on data concerning the effect of a subject's long-term underlying systolic blood pressure level on the risk of cardiovascular disease death (Framingham Heart Study, FHS). The simulations indicate that the best methods are RR2 and MPPLE.
16.
17.
Sufficient dimension reduction (SDR), which effectively reduces the predictor dimension in regression, has been popular in high-dimensional data analysis. In the presence of censoring, however, most existing SDR methods suffer. In this article, we propose a new algorithm to perform SDR with censored responses based on the quantile-slicing scheme recently proposed by Kim et al. First, we estimate the conditional quantile function of the true survival time via censored kernel quantile regression (Shin et al.), and then we slice the data based on the estimated censored regression quantiles instead of the responses. Analyses of both simulated and real data demonstrate the promising performance of the proposed method.
18.
The aim of this article is to develop optimal sufficient dimension reduction methodology for the conditional mean in multivariate regression. The context is roughly the same as that of a related method by Cook & Setodji (2003), but the new method has several advantages. It is asymptotically optimal in the sense described herein, and its test statistic for dimension always has a chi-squared distribution asymptotically under the null hypothesis. Additionally, the optimal method allows tests of predictor effects. A comparison of the two methods is provided.
19.
In nutritional epidemiology, dietary intake assessed with a food frequency questionnaire is prone to measurement error. Ignoring measurement error in covariates biases estimates and leads to a loss of power. In this paper, we consider an additive error model suited to the characteristics of the European Prospective Investigation into Cancer and Nutrition (EPIC)-InterAct Study data, and we derive an approximate maximum likelihood estimator (AMLE) for covariates with measurement error under logistic regression. This method can be regarded as an adjusted version of regression calibration and provides an approximately consistent estimator. Asymptotic normality of this estimator is established under regularity conditions, and simulation studies are conducted to empirically examine the finite sample performance of the proposed method. We apply the AMLE to deal with measurement errors in selected nutrients of interest in the EPIC-InterAct Study under a sensitivity analysis framework.
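Schematically, the setting is a classical additive error model attached to a logistic regression (a sketch of the generic setup; the paper's specification is tailored to the EPIC-InterAct data):

    W = X + U, \quad U \sim N(0, \Sigma_u), \qquad \mathrm{logit}\, P(Y = 1 \mid X, Z) = \beta_0 + \beta_x^{\top} X + \beta_z^{\top} Z

where X is the true dietary intake, W is its error-prone food frequency questionnaire measurement, and Z collects error-free covariates; the AMLE maximizes an approximation to the likelihood of the observed data (Y, W, Z), which is what makes it behave like an adjusted regression calibration.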
20.
Xinying Fang, Shouhao Zhou. Biometrical Journal. Biometrische Zeitschrift, 2024, 66(1): 2200092
Quantifying drug potency, which requires accurate estimation of the dose–response relationship, is essential for drug development in biomedical research and the life sciences. However, the standard estimation procedure for the median-effect equation used to describe the dose–response curve is vulnerable to extreme observations in common experimental data. To facilitate appropriate statistical inference, many powerful estimation tools have been developed in R, including various dose–response packages based on the nonlinear least squares method with different optimization strategies. Recently, beta regression-based methods have also been introduced for estimating the median-effect equation. In theory, they can overcome nonnormality, heteroscedasticity, and asymmetry, and they accommodate flexible robust frameworks and coefficient penalization. To identify reliable methods for estimating dose–response curves even in the presence of extreme observations, we conducted a comparative study reviewing 14 different tools in R and examined their robustness and efficiency via Monte Carlo simulation under a comprehensive list of scenarios. The simulation results demonstrate that penalized beta regression using the mgcv package outperforms the other methods in terms of stable and accurate estimation and reliable uncertainty quantification.
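For reference, the median-effect equation at the center of this comparison (standard form, with f_a the fraction affected, f_u = 1 - f_a, D the dose, D_m the median-effect dose, and m the slope) is

    \frac{f_a}{f_u} = \Big(\frac{D}{D_m}\Big)^{m} \iff \mathrm{logit}(f_a) = m\log D - m\log D_m

a linear-in-log-dose structure that beta regression can fit while keeping f_a in (0, 1), which is where the penalized beta regression of the mgcv package enters.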