Similar Literature (20 articles found)
1.
MOTIVATION: It is important to predict the outcome of patients with diffuse large-B-cell lymphoma after chemotherapy, since the survival rate after treatment of this common lymphoma is <50%. Both clinically based outcome predictors and gene expression-based molecular factors have been proposed independently for disease prognosis. However, combining high-dimensional genomic data with clinically relevant information to predict disease outcome is challenging. RESULTS: We describe an integrated clinicogenomic modeling approach that combines gene expression profiles and the clinically based International Prognostic Index (IPI) for personalized prediction of disease outcome. Dimension reduction methods are proposed to produce linear combinations of gene expressions while taking clinical IPI information into account. The extracted summary measures capture all the regression information of the censored survival phenotype given both genomic and clinical data, and are employed as covariates in the subsequent survival model formulation. A case study of diffuse large-B-cell lymphoma data, as well as Monte Carlo simulations, demonstrates that the proposed integrative modeling delivers predictions more accurate than those achieved by using either clinical data or molecular predictors alone.
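The dimension-reduction step described above (extracting linear combinations of gene expressions to use alongside IPI as survival-model covariates) can be sketched as follows. This is a generic principal-component reduction in numpy with made-up data shapes, not the authors' proposal, which additionally conditions the reduction on the clinical information:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 patients x 200 genes, plus a clinical IPI score
# per patient (all values are illustrative assumptions).
X_genes = rng.normal(size=(40, 200))
ipi = rng.integers(0, 5, size=40).astype(float)

# Dimension reduction: leading principal components of the centered
# gene-expression matrix via SVD.
Xc = X_genes - X_genes.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n_comp = 3
scores = Xc @ Vt[:n_comp].T          # (40, 3) gene-expression summaries

# Covariate matrix for a downstream survival model: genomic summaries + IPI.
Z = np.column_stack([scores, ipi])
print(Z.shape)  # (40, 4)
```

The summaries in `Z` would then enter a Cox-type survival model in place of the 200 raw expression values.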

2.
Summary. In functional data classification, functional observations are often contaminated by various systematic effects, such as random batch effects caused by device artifacts, or fixed effects caused by sample-related factors. These effects may lead to classification bias and thus should not be neglected. Another issue of concern is the selection of functions when predictors consist of multiple functions, some of which may be redundant. Both issues arise in a real data application in which we use fluorescence spectroscopy to detect cervical precancer. In this article, we propose a Bayesian hierarchical model that takes random batch effects into account and selects effective functions among multiple functional predictors. Fixed effects and predictors in nonfunctional form are also included in the model. The dimension of the functional data is reduced through orthonormal basis expansion or functional principal components. For posterior sampling, we use a hybrid Metropolis-Hastings/Gibbs sampler, which suffers from slow mixing; an evolutionary Monte Carlo algorithm is applied to improve the mixing. Simulation and the real data application show that the proposed model provides accurate selection of functional predictors as well as good classification.

3.
When modeling longitudinal biomedical data, dimensionality reduction is often needed together with dynamic modeling in the resulting latent representation. This can be achieved by artificial neural networks for dimension reduction and differential equations for dynamic modeling of individual-level trajectories. However, such approaches so far assume that the parameters of individual-level dynamics are constant throughout the observation period. Motivated by an application from psychological resilience research, we propose an extension in which different sets of differential equation parameters are allowed for observation subperiods. Estimation across intra-individual subperiods is nevertheless coupled, so that the model can be fitted even with a relatively small dataset. We subsequently derive prediction targets from individual dynamic models of resilience in the application. These serve as outcomes for predicting resilience from characteristics of individuals, measured at baseline and a follow-up time point, and for selecting a small set of important predictors. Our approach successfully identifies individual-level parameters of dynamic models that allow stable selection of predictors, that is, resilience factors. Furthermore, we can identify those characteristics of individuals that are the most promising candidates for updates at follow-up, which might inform future study design. This underlines the usefulness of the proposed deep dynamic modeling approach with changes in parameters between observation subperiods.

4.
It has been hypothesized that mechanical risk factors may be used to predict future atherosclerotic plaque rupture. Truly predictive methods for plaque rupture, and methods to identify the best predictor(s) among all candidates, are lacking in the literature. A novel combination of computational and statistical models based on serial magnetic resonance imaging (MRI) was introduced to quantify the sensitivity and specificity of mechanical predictors and to identify the best candidate for predicting the plaque rupture site. Serial in vivo MRI data of a carotid plaque from one patient were acquired, with the follow-up scan showing ulceration. 3D computational fluid-structure interaction (FSI) models using both baseline and follow-up data were constructed, and plaque wall stress (PWS), plaque wall strain (PWSn), and flow maximum shear stress (FSS) were extracted from all 600 matched nodal points (100 points per matched slice, baseline matched to follow-up) on the lumen surface for analysis. Each of the 600 points was marked "ulcer" or "nonulcer" using the follow-up scan. Predictive statistical models for each of the seven combinations of PWS, PWSn, and FSS were trained using the follow-up data and applied to the baseline data to assess their sensitivity and specificity for ulcer prediction over the 600 data points. Sensitivity is defined as the proportion of true positive outcomes that are predicted to be positive; specificity is defined as the proportion of true negative outcomes that are correctly predicted to be negative. Using a probability of 0.3 as the threshold to infer ulcer occurrence at the prediction stage, the combination of PWS and PWSn provided the best predictive accuracy, with (sensitivity, specificity) = (0.97, 0.958). Sensitivity and specificity given by PWS, PWSn, and FSS individually were (0.788, 0.968), (0.515, 0.968), and (0.758, 0.928), respectively.
The proposed computational-statistical process provides a novel method and framework to assess the sensitivity and specificity of various risk indicators, and offers the potential to identify the optimal predictor for plaque rupture using serial MRI, with the follow-up scan showing ulceration serving as the gold standard for method validation. While serial MRI data with actual rupture are hard to acquire, this single-case study suggests that a combination of multiple predictors may improve existing plaque assessment schemes. With large-scale patient studies, this predictive modeling process may provide more solid ground for rupture predictor selection strategies and for image-based plaque vulnerability assessment.
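The sensitivity and specificity definitions used above, together with the 0.3 probability threshold, translate directly into code; the data values below are illustrative, not the study's nodal points:

```python
import numpy as np

def sens_spec(prob, truth, threshold=0.3):
    """Sensitivity and specificity of thresholded probability predictions.

    Sensitivity: fraction of true positives predicted positive.
    Specificity: fraction of true negatives predicted negative.
    """
    pred = np.asarray(prob) >= threshold
    truth = np.asarray(truth).astype(bool)
    sensitivity = (pred & truth).sum() / truth.sum()
    specificity = (~pred & ~truth).sum() / (~truth).sum()
    return sensitivity, specificity

# Toy check: 4 "ulcer" points and 6 "nonulcer" points (made-up values).
prob = np.array([0.9, 0.4, 0.35, 0.2, 0.1, 0.25, 0.5, 0.05, 0.15, 0.1])
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
se, sp = sens_spec(prob, truth)
print(se, sp)  # 0.75 and 5/6
```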

5.
Cheung YK. Biometrics 2005, 61(2):524-531
When comparing follow-up measurements from two independent populations, missing records may arise due to censoring by events whose occurrence is associated with baseline covariates. In these situations, inferences based only on the completely followed observations may be biased if the follow-up measurements and the covariates are correlated. This article describes exact inference for a class of modified U-statistics under covariate-dependent dropout. The method involves weighting each permutation according to the retention probabilities, and thus requires estimation of the missing-data mechanism. The proposed procedure is nonparametric in that no distributional assumption is necessary for the outcome variables or the missingness patterns. Monte Carlo approximation by the Gibbs sampler is proposed and is shown to be fast and accurate via simulation. The method is illustrated on two small data sets for which asymptotic inferential procedures may not be appropriate.
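A minimal sketch of the underlying idea: weighting retained observations by inverse retention probabilities so that covariate-dependent dropout does not bias the estimate. This is a plain inverse-probability-weighted mean with simulated data and a known dropout mechanism, not the article's permutation-based U-statistic procedure:

```python
import numpy as np

# Hypothetical setup: outcome y observed only when the subject is retained;
# retention probability depends on a baseline covariate x.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)          # outcome correlated with x
p_retain = 1 / (1 + np.exp(-(0.5 + x)))   # dropout depends on x
retained = rng.random(n) < p_retain

# Complete-case mean is biased upward (high-x subjects are over-represented);
# weighting each retained subject by 1/p_retain corrects the bias.
cc_mean = y[retained].mean()
ipw_mean = (np.sum(y[retained] / p_retain[retained]) /
            np.sum(1 / p_retain[retained]))
print(cc_mean, ipw_mean)  # ipw_mean is close to the true mean 2.0
```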

6.
Wildfires are natural disasters with a significant impact on many rural communities. Predicting wildfire probability provides authorities with invaluable information for taking preventive measures at an early stage. This study establishes Bayesian modelling for predicting wildfire event probability based on a set of environmental predictors and forest vulnerability, represented by the normalized difference vegetation index. Prior information about the impact of these predictors on the likelihood of wildfire is available in reports on past major wildfire events; in that sense, the use of prior information in Bayesian models has the potential to provide accurate predictions of wildfire probability. Moreover, the relationships between the predictors create mediating effects on the likelihood of a wildfire event, which a multivariate prior distribution in the Bayesian model can capture. In this study, Bayesian models with informative and noninformative priors are considered, with independent and multivariate prior distributions, to utilize the available prior information and handle the mediating effects between the predictors, using normalized difference vegetation index data provided by Google Earth Engine. Nine years of data were gathered across 9841 sampled areas in forested land of Australia. The modelling results showed that forest vulnerability is the dominant predictor of wildfire probability. This modelling can help create a Wildfire Warning Index based on climate data and forest vulnerability measurements, enabling preventative action in high-risk and targeted areas.

7.
Sparse sufficient dimension reduction
Li Lexin. Biometrika 2007, 94(3):603-613
Existing sufficient dimension reduction methods suffer from the fact that each dimension reduction component is a linear combination of all the original predictors, so that it is difficult to interpret the resulting estimates. We propose a unified estimation strategy, which combines a regression-type formulation of sufficient dimension reduction methods and shrinkage estimation, to produce sparse and accurate solutions. The method can be applied to most existing sufficient dimension reduction methods such as sliced inverse regression, sliced average variance estimation and principal Hessian directions. We demonstrate the effectiveness of the proposed method by both simulations and real data analysis.
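For reference, the sliced-inverse-regression step that such shrinkage strategies build on can be sketched in a few lines. This is a plain (dense, unpenalized) SIR on simulated data, not the paper's sparse estimator:

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Basic sliced inverse regression: estimate sufficient dimension
    reduction directions (no shrinkage step; dense solution)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    # Standardize: Z = (X - mu) Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(Sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice on y and accumulate the weighted covariance of slice means.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for chunk in np.array_split(order, n_slices):
        zbar = Z[chunk].mean(axis=0)
        M += (len(chunk) / n) * np.outer(zbar, zbar)
    # Leading eigenvectors of M, mapped back to the original scale.
    w, v = np.linalg.eigh(M)
    return inv_sqrt @ v[:, ::-1][:, :n_dirs]

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 6))
y = X[:, 0] + 0.5 * X[:, 1] + 0.2 * rng.normal(size=1000)

b = sir_directions(X, y).ravel()
b /= np.linalg.norm(b)
print(np.round(b, 2))  # close to +/- (1, 0.5, 0, ..., 0) normalized
```

Note how every coordinate of `b` is generically nonzero; the sparsity the abstract targets would come from the added shrinkage penalty.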

8.
Gen Li, Yan Li, Kun Chen. Biometrics 2023, 79(2):1318-1329
Compositional data reside in a simplex and measure fractions or proportions of parts to a whole. Most existing regression methods for such data rely on log-ratio transformations that are inadequate or inappropriate in modeling high-dimensional data with excessive zeros and hierarchical structures. Moreover, such models usually lack a straightforward interpretation due to the interrelation between parts of a composition. We develop a novel relative-shift regression framework that directly uses proportions as predictors. The new framework provides a paradigm shift for regression analysis with compositional predictors and offers a superior interpretation of how shifting concentration between parts affects the response. New equi-sparsity and tree-guided regularization methods and an efficient smoothing proximal gradient algorithm are developed to facilitate feature aggregation and dimension reduction in regression. A unified finite-sample prediction error bound is derived for the proposed regularized estimators. We demonstrate the efficacy of the proposed methods in extensive simulation studies and a real gut microbiome study. Guided by the taxonomy of the microbiome data, the framework identifies important taxa at different taxonomic levels associated with the neurodevelopment of preterm infants.

9.
Dimension reduction methods have been proposed for regression analysis with high-dimensional predictors, but have received little attention for problems with censored data. In this article, we present an iterative imputed-spline approach based on principal Hessian directions (PHD) for censored survival data, in order to reduce the dimension of predictors without requiring a prespecified parametric model. Our proposal is to replace the right-censored survival time with its conditional expectation, adjusting for the censoring effect by using the Kaplan-Meier estimator and an adaptive polynomial spline regression in the residual imputation. A sparse estimation strategy is incorporated in our approach to enhance the interpretability of variable selection. This approach can be implemented not only in PHD, but also in other methods developed for estimating the central mean subspace. Simulation studies with right-censored data are conducted for the imputed-spline approach to PHD (IS-PHD), in comparison with two methods of sliced inverse regression, minimum average variance estimation, and naive PHD that ignores censoring. The results demonstrate that the proposed IS-PHD method is particularly useful for survival time responses with approximately symmetric or bending structures. Illustrative applications to two real data sets are also presented.
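The imputation idea, replacing a right-censored time with its conditional expectation under the Kaplan-Meier estimator, can be sketched as follows. This shows only the Kaplan-Meier restricted conditional mean with toy data, not the article's adaptive spline residual imputation:

```python
import numpy as np

def km_impute(time, event):
    """Replace each right-censored time c with the Kaplan-Meier estimate
    of E[T | T > c], restricted to the largest observed time."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    order = np.argsort(time)
    t, d = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)
    S = np.cumprod(1 - d / at_risk)      # KM survival just after each t
    imputed = time.copy()
    for i in np.where(event == 0)[0]:
        c = time[i]
        later = t > c
        if not later.any():
            continue                     # censored at the largest time
        S_c = S[t <= c][-1]              # S evaluated just after c
        ts = np.concatenate([[c], t[later]])
        Ss = np.concatenate([[S_c], S[later]])
        # E[T | T > c] = c + (1/S(c)) * integral_c^tmax S(u) du
        imputed[i] = c + np.sum(Ss[:-1] * np.diff(ts)) / S_c
    return imputed

times = np.array([1.0, 2.0, 3.0, 4.0])
events = np.array([1, 0, 1, 1])          # the time 2.0 is censored
print(km_impute(times, events))          # censored 2.0 -> 3.5
```

In this toy example, given T > 2 the KM estimate puts equal mass on 3 and 4, so the censored time is imputed as 3.5.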

10.
Functional Generalized Linear Models with Images as Predictors
Summary. Functional principal component regression (FPCR) is a promising new method for regressing scalar outcomes on functional predictors. In this article, we present a theoretical justification for the use of principal components in functional regression. FPCR is then extended in two directions: from linear to generalized linear modeling, and from univariate signal predictors to high-resolution image predictors. We show how to implement the method efficiently by adapting generalized additive model technology to the functional regression context. A technique is proposed for estimating simultaneous confidence bands for the coefficient function; in the neuroimaging setting, this yields a novel means of identifying brain regions that are associated with a clinical outcome. A new application of likelihood ratio testing is described for assessing the null hypothesis of a constant coefficient function. The performance of the methodology is illustrated via simulations and real data analyses with positron emission tomography images as predictors.
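The core FPCR recipe, projecting the functional predictors onto leading principal components, regressing on the scores, and mapping back to a coefficient function, can be sketched on simulated signals. All data and the choice of four components are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical functional data: 100 signal predictors on a grid of 50
# points; a scalar outcome generated from a smooth coefficient function.
grid = np.linspace(0.0, 1.0, 50)
X = np.array([np.sin(2 * np.pi * (grid + rng.uniform())) +
              0.1 * rng.normal(size=50) for _ in range(100)])
beta_true = np.exp(-((grid - 0.3) ** 2) / 0.02)
y = X @ beta_true / 50 + 0.01 * rng.normal(size=100)

# FPCR: PCA of the centered curves, least-squares regression on the
# scores, then map the fitted coefficients back to the grid.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 4
scores = Xc @ Vt[:k].T
design = np.column_stack([np.ones(len(y)), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = Vt[:k].T @ coef[1:]   # estimated coefficient function on the grid
fitted = design @ coef
print(beta_hat.shape)  # (50,)
```

Swapping the least-squares step for iteratively reweighted fitting of a GLM gives the generalized extension the abstract describes.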

11.
Large efforts have been deployed in developing methods to estimate methane emissions from cattle. For large-scale applications, accurate and inexpensive methane predictors are required. Within a livestock precision farming context, the objective of this work was to integrate real-time data on animal feeding behaviour with an in silico model for predicting the individual dynamic pattern of methane emission in cattle. The integration of real-time data with a mathematical model to predict variables that are not directly measured constitutes a software sensor. We developed a dynamic parsimonious grey-box model that uses as predictor variables either dry matter intake (DMI) or intake time (IT). The model is described by ordinary differential equations. Model building was supported by experimental data on methane emissions from respiration chambers. The data set comes from a study with finishing beef steers (cross-bred Charolais and purebred Luing). Dry matter intake and IT were recorded using feed bins. For research purposes, in this work our software sensor operated off-line; that is, the predictor variables (DMI, IT) were extracted from the recorded data rather than from an on-line sensor. A total of 37 individual dynamic patterns of methane production were analyzed. Model performance was assessed by concordance analysis between the predicted methane output and the methane measured in respiration chambers. The model predictors DMI and IT performed similarly, with a Lin's concordance correlation coefficient (CCC) of 0.78 on average. When predicting daily methane production, the CCC was 0.99 for both DMI and IT predictors. Consequently, on the basis of concordance analysis, our model performs very well compared with reported literature results for methane proxies and predictive models.
As IT measurements are easier to obtain than DMI measurements, this study suggests that a software sensor integrating our in silico model with a real-time sensor providing accurate IT measurements is a viable solution for predicting methane output at large scale.
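Lin's concordance correlation coefficient used above has a simple closed form; a small self-contained sketch (the example vectors are illustrative):

```python
import numpy as np

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient between predictions x
    and reference measurements y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population (1/n) variances
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Perfect agreement gives CCC = 1; a constant offset lowers CCC even
# though the Pearson correlation stays 1.
a = np.array([1.0, 2.0, 3.0, 4.0])
print(lin_ccc(a, a))        # 1.0
print(lin_ccc(a, a + 1.0))  # 2.5/3.5, about 0.714
```

Unlike Pearson correlation, CCC penalizes systematic bias between predictions and chamber measurements, which is why it is a natural choice for validating a software sensor.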

12.
Comparative genomic hybridizations (CGH) using microarrays are performed with bacteria in order to determine the level of genomic similarity between various strains. The microarrays applied in CGH experiments are constructed on the basis of the genome sequence of one strain, which is used as a control, or reference, in each experiment; a strain being compared with the known strain is called the unknown strain. The ratios of fluorescent intensities obtained from the spots on the microarrays can be used to determine which genes are divergent in the unknown strain, as well as to predict the copy number of genes actually present in the unknown strain. In this paper, we focus on the prediction of gene copy number based on data from CGH experiments. We assumed a linear relation between the log2 of the copy number and the observed log2-ratios, and proposed predictors based on a factor analysis model and a linear random-effects model to identify the copy numbers. These predictors were compared with using the ratio of the intensities directly. Simulations indicated that the proposed predictors improved the prediction of copy number in most situations. The predictors were applied to CGH data obtained from experiments with Enterococcus faecalis strains in order to determine the copy number of relevant genes in five different strains.
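The assumed linear connection between log2 copy number and observed log2-ratios suggests a simple baseline predictor: fit the line on genes with known copy number and invert it. The sketch below uses made-up values and plain least squares, not the paper's factor-analysis or random-effects predictors:

```python
import numpy as np

# Assumed linear relation: observed log2-ratio ~= a + b * log2(copy number).
# Calibration genes with known copy number (illustrative values only).
known_copies = np.array([1, 1, 2, 2, 4, 4])
log2_ratio = np.array([0.1, -0.1, 0.9, 1.1, 2.0, 2.1])

A = np.column_stack([np.ones(len(known_copies)), np.log2(known_copies)])
(a, b), *_ = np.linalg.lstsq(A, log2_ratio, rcond=None)

def predict_copies(r):
    """Invert the fitted line to predict copy number from a log2-ratio."""
    return 2 ** ((r - a) / b)

print(round(predict_copies(1.0)))  # a ratio near 1 maps to about 2 copies
```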

13.
Prediction of conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD) is of major interest in AD research. A large number of potential predictors have been proposed, with most investigations tending to examine one predictor or a set of related predictors. In this study, we simultaneously examined multiple features from different data modalities, including structural magnetic resonance imaging (MRI) morphometry, cerebrospinal fluid (CSF) biomarkers, and neuropsychological and functional measures (NMs), to explore an optimal set of predictors of conversion from MCI to AD in an Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. After extraction of FreeSurfer-derived MRI features and collection of CSF and NM features, feature selection was employed to choose optimal subsets of features from each modality. Support vector machine (SVM) classifiers were then trained on normal control (NC) and AD participants. Testing was conducted on MCIc (MCI individuals who converted to AD within 24 months) and MCInc (MCI individuals who did not convert to AD within 24 months) groups. Classification results demonstrated that NMs outperformed CSF and MRI features. The combination of selected NM, MRI, and CSF features attained an accuracy of 67.13%, a sensitivity of 96.43%, a specificity of 48.28%, and an AUC (area under the curve) of 0.796. Analysis of the predictive values for MCIc who converted at different follow-up evaluations showed that the predictive values differed significantly between individuals who converted within 12 months and those who converted after 12 months. This study establishes meaningful multivariate predictors composed of selected NM, MRI, and CSF measures, which may be useful and practical for clinical diagnosis.

14.
We propose a general class of nonlinear transformation models for analyzing censored survival data, of which the nonlinear proportional hazards and proportional odds models are special cases. A cubic smoothing spline-based component-wise boosting algorithm is derived to estimate covariate effects nonparametrically using the gradient of the marginal likelihood, which is computed using importance sampling. The proposed method can be applied to survival data with high-dimensional covariates, including the case where the sample size is smaller than the number of predictors. The empirical performance of the proposed method is evaluated via simulations and the analysis of a microarray survival dataset.

15.
Parental education and maternal intelligence are well-known predictors of child IQ. However, the literature regarding other factors that may contribute to individual differences in IQ is inconclusive. The aim of this study was to examine the contribution of a number of variables whose predictive status remains unclear, in a sample of basically healthy children with a low rate of pre- and postnatal complications. 1,782 5-year-old children sampled from the Danish National Birth Cohort (2003-2007) were assessed with a short form of the Wechsler Preschool and Primary Scale of Intelligence - Revised. Information on parental characteristics, pregnancy and birth factors, postnatal influences, and postnatal growth was collected during pregnancy and at follow-up. A model including study design variables and the child's sex explained 7% of the variance in IQ, while adding parental education and maternal IQ increased the explained variance to 24%. Other predictors were parity, maternal BMI, birth weight, breastfeeding, and the child's head circumference and height at follow-up; these variables, however, only increased the explained variance to 29%. The results suggest that parental education and maternal IQ are major predictors of IQ and should be included routinely in studies of cognitive development. Obstetrical and postnatal factors also predict IQ, but their contribution may be of comparatively limited magnitude.

16.
In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post-baseline time. A simple solution is the last-value-carry-forward method, but this may result in bias for the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high-dimensional integrals without a closed-form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time-to-event occurrence. The measurement schemes for multiple biomarkers are allowed to be different within subject and across subjects. Dynamic prediction can be performed in a real-time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal functions.

17.

Background

The aim of the study was to determine predictors that influence health-related quality of life (HRQOL) in a large cohort of elderly diabetes patients from primary care over a follow-up period of five years.

Methods and Results

At the baseline measurement of the ESTHER cohort study (2000-2002), 1375 of 9953 participants suffered from diabetes (13.8%); 1057 of these diabetes patients responded to the second follow-up (2005-2007). HRQOL at baseline and follow-up was measured using the SF-12; mental component scores (MCS) and physical component scores (PCS) were calculated, and multiple linear regression models were used to determine predictors of HRQOL at follow-up. The following baseline variables were examined as possible predictors of HRQOL: treatment with insulin, glycated hemoglobin (HbA1c), number of diabetes-related complications, number of comorbid diseases, Body Mass Index (BMI), depression, and HRQOL. Regression analyses were adjusted for sociodemographic variables and smoking status. 1034 patients (97.8%) responded to the SF-12 both at baseline and after five years and were therefore included in the study. Regression analyses indicated that significant predictors of decreased MCS were lower HRQOL, a higher number of diabetes-related complications, and a reported history of depression at baseline. Complications, BMI, smoking, and HRQOL at baseline significantly predicted PCS at the five-year follow-up.

Conclusions

Our findings expand evidence from previous cross-sectional data, indicating that in elderly diabetes patients, depression, diabetes-related complications, smoking, and BMI are temporally predictive of HRQOL.

18.
Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy, and can be inferred from underwater video footage at only a limited number of locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (hard90 and hard70). We developed optimal predictive models of seabed hardness using random forest (RF), based on point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta, and regularized RF (RRF), were tested on the basis of predictive accuracy. The effects of highly correlated, important, and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach of pre-selecting predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) rather than the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p, large n' problems in environmental sciences.
Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency, and caution should be taken when applying filter FS methods to select predictive models.

19.
Species data held in museums and herbaria, survey data, and opportunistically observed data are a substantial information resource. A key challenge in using these data is uncertainty about where an observation is located. This matters when the data are used for species distribution modelling (SDM), because the coordinates are used to extract the environmental variables, and thus positional error may lead to inaccurate estimation of the species-environment relationship. The magnitude of this effect is related to the level of spatial autocorrelation in the environmental variables. Local spatial association is relevant here because it can lead to the identification of the specific occurrence records that cause the largest drop in SDM accuracy. Therefore, in this study, we tested whether SDM predictions are more affected by positional uncertainty originating from locations with lower local spatial association in their predictors. We performed this experiment for Spain and the Netherlands, using simulated datasets derived from well-known species distribution models (SDMs). We used the K statistic to quantify the local spatial association in the predictors at each species occurrence location. A probabilistic approach using Monte Carlo simulations was employed to introduce error into the species locations. The results revealed that positional uncertainty in species occurrence data at locations with low local spatial association in the predictors reduced the prediction accuracy of the SDMs. We propose local spatial association as a way to identify the species occurrence records that require treatment for positional uncertainty. We also developed and present a tool in the R environment to target observations that are likely to introduce error into SDM output as a result of positional uncertainty.

20.
The eye-estimation method is widely used in practice; several agronomic and biological measures are currently estimated by this method. If a simple linear regression is the kernel model, a shrinkage technique can be used to correct the bias associated with this method. Two predictors of the population total are proposed and the corresponding model-based errors are derived. A simulation study examines the behaviour of the predictors.
