Similar Articles
20 similar articles found (search time: 875 ms).
1.
MOTIVATION: An important application of microarray technology is to relate gene expression profiles to clinical phenotypes of patients. Success has been demonstrated in the molecular classification of cancer, where gene expression data serve as predictors and cancer type serves as a categorical outcome variable. However, there has been less research linking gene expression profiles to censored survival data, such as patients' overall survival time or time to cancer relapse. Models with good prediction accuracy and parsimony are desirable. RESULTS: We propose using L1-penalized estimation for the Cox model to select genes relevant to patients' survival and to build a predictive model for future patients. The computational difficulty of estimation in high-dimensional, low-sample-size settings can be solved efficiently with the recently developed least-angle regression (LARS) method. Simulation studies and applications to real datasets on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call LARS-Cox, can identify important genes related to time to death from cancer and build a parsimonious model for predicting the survival of future patients. LARS-Cox gives better predictive performance than L2-penalized regression and several dimension-reduction-based methods. CONCLUSIONS: The proposed LARS-Cox procedure can be very useful for identifying genes relevant to survival phenotypes and for building a parsimonious predictive model that classifies future patients into clinically relevant high- and low-risk groups based on gene expression profiles and the survival times of previous patients.
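A minimal sketch of the core idea, lasso-penalized Cox regression for gene selection, on synthetic data. It uses the lifelines library (assuming a version whose CoxPHFitter accepts penalizer and l1_ratio arguments) rather than the authors' LARS-Cox implementation; all data sizes and penalty values are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, p = 100, 50                                # small n, moderate p
X = rng.standard_normal((n, p))
true_risk = 0.8 * X[:, 0] - 0.6 * X[:, 1]     # only two genes truly matter
df = pd.DataFrame(X, columns=[f"gene_{j}" for j in range(p)])
df["time"] = rng.exponential(np.exp(-true_risk))
df["event"] = rng.random(n) < 0.7             # roughly 30% censoring

# Lasso-penalized Cox: l1_ratio=1.0 gives a pure L1 penalty that shrinks
# most coefficients toward zero, performing gene selection.  lifelines
# smooths the absolute value, so near-zero coefficients are thresholded
# here rather than being exactly zero.
cph = CoxPHFitter(penalizer=0.5, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")
selected = cph.params_[cph.params_.abs() > 0.01]
print("selected genes:", list(selected.index))
```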

2.
An effective short-term load forecasting model plays a significant role in improving the management efficiency of an electric power system. This paper proposes a new forecasting model based on improved neural networks with random weights (INNRW). The key idea is to introduce a weighting technique for the model inputs and to use a novel neural network to forecast the daily maximum load. Eight factors are selected as inputs, and a mutual-information weighting algorithm allocates different weights to them. The neural network with random weights and kernels (KNNRW) is then applied to approximate the nonlinear function between the selected inputs and the daily maximum load, owing to its fast learning speed and good generalization performance. In an application to daily load data from Dalian, the proposed INNRW is compared with several previously developed forecasting models; the simulation experiments show that it performs best overall in short-term load forecasting.
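The two ingredients, mutual-information input weighting and a network whose hidden weights are drawn at random and fixed, can be sketched as follows. This is a toy illustration using scikit-learn's mutual_info_regression and a plain random-weight network; the paper's kernel variant (KNNRW) is not reproduced, and all sizes and data are made up.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)
n, d = 500, 8                          # eight input factors, as in the paper
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# 1) Mutual-information weighting: scale each input by its estimated
#    relevance to the target before feeding it to the network.
mi = mutual_info_regression(X, y, random_state=1)
w = mi / mi.sum()
Xw = X * w

# 2) Neural network with random weights: hidden-layer weights are drawn
#    at random and fixed; only the output weights are solved for, in
#    closed form, by least squares (hence the fast learning speed).
L = 100                                       # hidden nodes
W = rng.standard_normal((d, L))
b = rng.standard_normal(L)
H = np.tanh(Xw @ W + b)                       # random-feature matrix
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights

y_hat = H @ beta
print("training RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```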

3.
Ding J, Wang JL. Biometrics 2008;64(2):546-556.
In clinical studies, longitudinal biomarkers are often used to monitor disease progression and failure time. Joint modeling of longitudinal and survival data has certain advantages and has emerged as an effective way for each data type to inform the other. Typically, a parametric longitudinal model is assumed to facilitate the likelihood approach, but choosing a proper parametric model turns out to be more elusive than in standard longitudinal studies with no survival endpoint. In this article, we propose a nonparametric multiplicative random effects model for the longitudinal process that is flexible yet parsimonious and has many applications. A proportional hazards model is then used to link the biomarkers and the event time. We represent the nonparametric longitudinal process with B-splines and select the number of knots and the degree based on a version of the Akaike information criterion (AIC). Unknown model parameters are estimated by maximizing the observed joint likelihood via the Monte Carlo Expectation-Maximization (MCEM) algorithm. Owing to the simplicity of the model structure, the proposed approach has good numerical stability and compares well with competing parametric longitudinal approaches. The new approach is illustrated with primary biliary cirrhosis (PBC) data, aiming to capture nonlinear patterns of serum bilirubin time courses and their relationship with the survival time of PBC patients.
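The MCEM joint-likelihood machinery is too involved for a short example, but the longitudinal building block, a B-spline basis whose size and degree are chosen by AIC, can be sketched with SciPy as follows; the data and the equally spaced knot placement are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(x, n_basis, degree, lo, hi):
    """Design matrix of B-spline basis functions on [lo, hi]."""
    n_inner = n_basis - degree - 1              # interior knot count
    inner = np.linspace(lo, hi, n_inner + 2)[1:-1]
    knots = np.r_[[lo] * (degree + 1), inner, [hi] * (degree + 1)]
    return np.column_stack([
        BSpline(knots, np.eye(n_basis)[j], degree)(x)
        for j in range(n_basis)
    ])

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 10, 200))            # visit times
y = np.log1p(t) * np.cos(t / 3) + 0.1 * rng.standard_normal(t.size)

# Choose the number of basis functions (knots) and the degree by AIC.
best = None
for degree in (2, 3):
    for n_basis in range(degree + 2, 12):
        B = bspline_design(t, n_basis, degree, 0.0, 10.0)
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss = np.sum((y - B @ coef) ** 2)
        aic = t.size * np.log(rss / t.size) + 2 * n_basis
        if best is None or aic < best[0]:
            best = (aic, n_basis, degree)
print("AIC-selected (n_basis, degree):", best[1:])
```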

4.
The Buckley–James (BJ) model is a typical semiparametric accelerated failure time model; it is closely related to ordinary least squares and easy to construct. However, the traditional BJ model is built on a linearity assumption, so it captures only simple linear relationships and has difficulty with nonlinear problems. To overcome this difficulty, we develop a novel regression model for right-censored survival data within the BJ learning framework, based on random survival forests (RSF), the extreme learning machine (ELM), and the L2 boosting algorithm. The proposed method, referred to as the ELM-based BJ boosting model, first employs RSF for covariate imputation, then develops a new ensemble of ELMs, namely an ELM-based boosting algorithm for regression under the L2 boosting scheme, and finally uses the output function of the ELM-based boosting model to replace the linear combination of covariates in the BJ model. Because the logarithm of survival time is fit to the covariates by the nonparametric ELM-based boosting method instead of least squares, the ELM-based BJ boosting model can capture both linear and nonlinear covariate effects. In both simulation studies and real data applications, in terms of the concordance index and the integrated Brier score, the proposed model outperforms the traditional BJ model, the two BJ boosting models proposed by Wang et al., RSF, and the Cox proportional hazards model.
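The RSF imputation and BJ iteration are omitted here; the sketch below shows only the ELM-based L2 boosting core, repeatedly fitting a small random-weight network to the current residuals and adding a shrunken copy to the ensemble, on a toy regression target standing in for imputed log survival times. Learner sizes, the learning rate, and the number of rounds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_elm(X, y, L=20, seed=0):
    """Fit one small ELM base learner; returns a predict() closure."""
    r = np.random.default_rng(seed)
    W = r.standard_normal((X.shape[1], L))
    b = r.standard_normal(L)
    H = np.tanh(X @ W + b)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return lambda Z: np.tanh(Z @ W + b) @ beta

# Toy regression target standing in for imputed log survival times.
n, d = 300, 5
X = rng.standard_normal((n, d))
y = np.exp(-X[:, 0] ** 2) + 0.3 * X[:, 1] + 0.1 * rng.standard_normal(n)

# L2 boosting: repeatedly fit a weak ELM to the current residuals and
# add a shrunken version of it to the ensemble.
nu, M = 0.1, 50                     # learning rate, boosting rounds
F = np.zeros(n)                     # current ensemble prediction
learners = []
for m in range(M):
    resid = y - F                   # negative gradient of squared loss
    g = fit_elm(X, resid, L=20, seed=m)
    F += nu * g(X)
    learners.append(g)

predict = lambda Z: nu * sum(g(Z) for g in learners)
print("training RMSE:", np.sqrt(np.mean((y - predict(X)) ** 2)))
```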

5.
Zhou X, Yan L, Prows DR, Yang R. Genomics 2011;97(6):379-385.
Of the two most popular models in survival analysis, the accelerated failure time (AFT) model can fit survival data more easily than the Cox proportional hazards model (PHM). In this study, we develop a general parametric AFT model for identifying survival trait loci, in which the flexible generalized F distribution, which includes many commonly used distributions as special cases, is specified as the baseline survival distribution. An EM algorithm for maximum likelihood estimation of the model parameters is given. Simulations are conducted to validate the flexibility and utility of the proposed mapping procedure. In an analysis of survival time following hyperoxic acute lung injury (HALI) of mice in an F2 mating population, the generalized F distribution performed best among six competing survival distributions and detected four QTLs controlling differential HALI survival.
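Since the generalized F family nests the Weibull, log-normal, and log-logistic distributions (among others), the model-comparison step can be approximated by fitting such AFT baselines and comparing AICs. The sketch below uses the lifelines AFT fitters on synthetic data as a stand-in; the full generalized F likelihood and the EM algorithm are not implemented here.

```python
import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter

rng = np.random.default_rng(4)
n = 400
x = rng.standard_normal(n)                            # stand-in for a QTL genotype code
T = np.exp(0.5 * x + 0.8 * rng.standard_normal(n))    # log-normal AFT truth
C = rng.exponential(4.0, n)                           # censoring times
df = pd.DataFrame({
    "time": np.minimum(T, C),
    "event": (T <= C).astype(int),
    "x": x,
})

# Fit several AFT baselines and compare by AIC, mirroring the paper's
# comparison of distributions nested in the generalized F family.
for Fitter in (WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter):
    m = Fitter().fit(df, duration_col="time", event_col="event")
    print(f"{Fitter.__name__:22s} AIC = {m.AIC_:.1f}")
```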

6.
Recent interest in cancer research focuses on predicting patients' survival from gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, using the elastic-net penalty, a mixture of L1- and L2-norm penalties. As with the elastic-net method for linear regression with uncensored data, the proposed method performs automatic gene selection and parameter estimation, and highly correlated genes can be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.
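A heavily simplified sketch of the idea: alternate an elastic-net fit with Buckley-James imputation of the censored log times, replacing each censored outcome by its conditional expectation under a Kaplan-Meier estimate of the residual distribution. It uses scikit-learn's ElasticNet with a fixed tuning parameter instead of the paper's generalized cross-validation, and it is not the authors' doubly penalized implementation; data and constants are made up.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(5)
n, p = 200, 100
X = rng.standard_normal((n, p))
logT = X[:, 0] - 0.8 * X[:, 1] + 0.5 * rng.standard_normal(n)
logC = rng.normal(0.5, 1.0, n)                # censoring on the log scale
y = np.minimum(logT, logC)                    # observed log time
delta = (logT <= logC).astype(int)            # 1 = event observed

enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=5000)
y_star = y.copy()
for _ in range(10):                           # Buckley-James iterations
    enet.fit(X, y_star)
    resid = y - enet.predict(X)
    # KM estimate of the (censored) residual distribution; residuals are
    # shifted to be positive before fitting, then shifted back.
    shift = resid.min() - 1e-9
    km = KaplanMeierFitter().fit(resid - shift, event_observed=delta)
    s = km.survival_function_.iloc[:, 0].to_numpy()
    r = km.survival_function_.index.to_numpy() + shift
    # Impute each censored log time by the BJ conditional expectation
    # E[logT | logT > observed], computed from the KM residual law.
    y_star = y.copy()
    for i in np.where(delta == 0)[0]:
        idx = np.searchsorted(r, resid[i], side="right")
        if idx >= len(r):
            continue                          # beyond the KM support
        S_prev = s[idx - 1] if idx > 0 else 1.0
        if S_prev <= 0:
            continue
        pts, surv = r[idx:], np.r_[S_prev, s[idx:]]
        mass = -np.diff(surv)                 # KM jump mass at each point
        mass[-1] += s[-1]                     # tail mass put on last point
        e_res = np.sum(pts * mass) / S_prev
        y_star[i] = enet.predict(X[i:i + 1])[0] + e_res
print("nonzero coefficients:", int(np.sum(np.abs(enet.coef_) > 1e-8)))
```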

7.

Background  

Microarray technology is increasingly used to identify potential biomarkers for cancer prognosis and diagnosis. Previously, we developed the iterative Bayesian Model Averaging (BMA) algorithm for classification. Here, we extend the iterative BMA algorithm to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the strengths of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that the iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.
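Not the iterative BMA algorithm itself, but a schematic of the averaging step: approximate each candidate model's posterior probability from its BIC and average the models' risk predictions with those weights. The candidate gene subsets, the data, and the BIC-based approximation are illustrative assumptions, using lifelines for the Cox fits.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 150
df = pd.DataFrame(rng.standard_normal((n, 4)), columns=list("abcd"))
df["time"] = rng.exponential(np.exp(-df["a"] + 0.5 * df["b"]))
df["event"] = rng.random(n) < 0.8

# Candidate gene subsets play the role of the contending models.
subsets = [["a"], ["a", "b"], ["a", "b", "c"], ["c", "d"]]
fits, bics = [], []
for cols in subsets:
    m = CoxPHFitter().fit(df[cols + ["time", "event"]],
                          duration_col="time", event_col="event")
    # BIC = -2 log L + k log n; exp(-BIC/2) approximates an
    # (unnormalized) posterior model probability.
    bics.append(-2 * m.log_likelihood_ + len(cols) * np.log(n))
    fits.append(m)

w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()                                  # posterior model weights

# Model-averaged risk score: weighted sum of the log partial hazards.
risk = sum(wi * m.predict_log_partial_hazard(df).to_numpy().ravel()
           for wi, m in zip(w, fits))
print("model weights:", np.round(w, 3))
print("first five averaged risk scores:", np.round(risk[:5], 2))
```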

8.
Cancer survival is one of the most important measures for evaluating the effectiveness of treatment and early diagnosis, and the ultimate goal of cancer research and patient care is cure. As cancer treatments progress, cure becomes a reality for many cancers when patients are diagnosed early and treated effectively. If a cure does exist for a certain type of cancer, it is useful to estimate the time of cure. For cancers that impose excess mortality risk, it is informative to understand the difference in survival between cancer patients and the general cancer-free population. In population-based cancer survival studies, relative survival is the standard measure of excess mortality due to cancer, and cure is achieved when the survival of cancer patients becomes equivalent to that of the general population. This definition is usually called statistical cure, an important measure of the burden due to cancer. In this paper, a minimum version of the log-rank test is proposed to test this equivalence of survival using relative survival data. Performance of the proposed test is evaluated by simulation, and relative survival data from population-based cancer registries in the SEER Program are used to examine patients' survival after diagnosis for various major cancer sites.
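A rough sketch of the scanning idea behind such a test: restrict to survivors beyond each candidate cure time tau, compare patients against a reference population with a log-rank test, and track the statistic across tau. This toy version uses ordinary survival times rather than the paper's relative survival framework, and lifelines for the test; all numbers are made up.

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(7)
n = 500
# Toy "statistical cure": excess mortality before year 5, none after.
T_pat = np.where(rng.random(n) < 0.4,
                 rng.exponential(2.0, n),        # early excess mortality
                 5 + rng.exponential(10.0, n))   # background mortality
T_pop = rng.exponential(10.0, n)                 # cancer-free reference
ev = np.ones(n)                                  # all events observed

# Scan candidate cure times tau: among survivors beyond tau, test
# patients against the population; tracking the statistic over tau
# mimics a minimum-version log-rank scan for the cure point.
for tau in (0, 2, 4, 6, 8):
    keep_pat, keep_pop = T_pat > tau, T_pop > tau
    res = logrank_test(T_pat[keep_pat] - tau, T_pop[keep_pop] - tau,
                       event_observed_A=ev[keep_pat],
                       event_observed_B=ev[keep_pop])
    print(f"tau={tau}: chi2={res.test_statistic:6.2f}  p={res.p_value:.3f}")
```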

9.
Research on hydrologic extreme events is of great significance for reducing and avoiding the severe losses and impacts caused by natural disasters. When hydrologic design values for extreme events of interest are forecast with the conventional hydrologic frequency analysis (HFA) model, the results cannot account for uncertainty and risk. To overcome this disadvantage and improve the forecasts, an improved HFA model named AM-MCMC-HFA is proposed, which applies the adaptive Metropolis Markov chain Monte Carlo (AM-MCMC) algorithm to the HFA process. Unlike the conventional HFA model, which seeks a single optimal forecast, AM-MCMC-HFA yields both optimal and probabilistic forecasts of the hydrologic design values. The model's performance is verified on two clearly different hydrologic series. The analysis shows that four factors strongly influence the reliability of hydrologic design values, and that AM-MCMC-HFA can assess the uncertainties of both the parameters and the design values. With the AM-MCMC-HFA model, hydrologic design tasks can therefore be carried out more soundly, and governmental decision-makers and the public can make more rational decisions in practice.
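A compact sketch of the adaptive Metropolis core applied to a frequency-analysis posterior. The GEV distribution, a flat prior (so the log posterior equals the log likelihood), and all tuning constants are illustrative assumptions, since the abstract does not specify the distribution used; the covariance adaptation follows the usual Haario et al. (2001) recipe.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
data = stats.genextreme.rvs(c=-0.1, loc=100, scale=30,
                            size=60, random_state=8)   # toy annual maxima

def log_post(theta):
    # Flat prior assumed: log posterior = GEV log likelihood.
    c, loc, log_scale = theta
    return np.sum(stats.genextreme.logpdf(data, c=c, loc=loc,
                                          scale=np.exp(log_scale)))

# Adaptive Metropolis: after a burn-in, the proposal covariance is
# periodically rescaled from the chain's own history.
n_iter, d = 5000, 3
chain = np.empty((n_iter, d))
theta = np.array([-0.1, np.median(data), np.log(data.std())])
lp = log_post(theta)
cov = 0.01 * np.eye(d)
for t in range(n_iter):
    if t > 500 and t % 100 == 0:
        cov = (2.38 ** 2 / d) * np.cov(chain[:t].T) + 1e-8 * np.eye(d)
    prop = rng.multivariate_normal(theta, cov)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:    # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain[t] = theta

# Posterior of the 100-year design value (the 0.99 quantile): a
# probabilistic forecast rather than a single optimal value.
post = chain[2500:]
q100 = stats.genextreme.ppf(0.99, c=post[:, 0], loc=post[:, 1],
                            scale=np.exp(post[:, 2]))
print("100-year design value: mean %.1f, 95%% CI (%.1f, %.1f)"
      % (q100.mean(), *np.percentile(q100, [2.5, 97.5])))
```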

11.
Although survival analysis is a well-established mathematical discipline, there seem to be almost no attempts at survival modeling for experimentally virus-infected laboratory animals. We take up a stochastic approach originally developed by Shortley in the 1960s and apply it to three different types of experimental data: virus titer determination, the dose dependence of the mean survival time, and single survival curves. Experience with parameter estimation is reported, and new ways of working with the model parameters are proposed. A standard mean survival time is defined and suggested as a new quantitative measure of virulence. Moreover, for comparing two experiments in which the amount of inoculated virions is fixed but other parameters may vary, a new scheme for systematizing survival data from experimentally virus-infected laboratory animals is proposed. The model can very likely also be applied to cancer survival data or to other infectious pathogens.

12.
Semi-competing risks data include the time to a nonterminating event and the time to a terminating event, while competing risks data include the times to more than one terminating event. Our work is motivated by a prostate cancer study with one nonterminating event and two terminating events, in which both semi-competing and competing risks are present along with two censoring times. In this paper, we propose a new multi-risks survival (MRS) model for this type of data. The proposed MRS model also accommodates noninformative right-censoring times for the nonterminating and terminating events. Properties of the model are examined in detail: theoretical and empirical results show that estimates of the cumulative incidence function for a nonterminating event may be biased if information on a terminating event is ignored. A Markov chain Monte Carlo sampling algorithm is also developed. The methodology is further assessed via simulations and an analysis of the real data from the prostate cancer study. As a result, a prostate-specific antigen velocity greater than 2.0 ng/mL per year and higher biopsy Gleason scores are found to be positively associated with a shorter time to death due to prostate cancer.

13.
As a common vector-borne disease, dengue fever remains challenging to predict because epidemic size varies widely across seasons, driven by factors including population susceptibility, mosquito density, meteorological conditions, geography, and human mobility. We propose an ensemble forecast system for dengue fever that addresses the difficulty of predicting outbreaks with drastically different scales. The system couples a susceptible-infected-recovered (SIR) type of compartmental model with a data assimilation method, the ensemble adjustment Kalman filter (EAKF), to generate real-time forecasts of dengue spread dynamics. The model is informed by meteorological and mosquito-density information to depict transmission of dengue virus between the human and mosquito populations and to generate predictions. To account for the dramatic variation of outbreak size across seasons, we introduce an effective population size parameter that is sequentially updated to adjust the predicted outbreak scale: before optimizing the transmission model, the effective population size is updated using the most recent observations and historical records so that the predicted outbreak size is dynamically adjusted. In retrospective forecasts of dengue outbreaks in Guangzhou, China during the 2011-2017 seasons, the proposed model generates accurate projections of peak timing, peak intensity, and total incidence, outperforming a generalized additive model approach. The ensemble forecast system can be operated in real time and can inform control planning to reduce the burden of dengue fever.
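A schematic of the forecast-update cycle under strong simplifying assumptions: a deterministic SIR step, the effective population size carried as an augmented state variable, and an EAKF-style update that adjusts the observed variable and regresses the adjustment onto states and parameters. The observations and all ranges are toy values; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(9)

def sir_step(S, I, beta, gamma, N):
    """One day of a simple SIR model, per ensemble member."""
    new_inf = beta * S * I / N
    new_rec = gamma * I
    return S - new_inf, I + new_inf - new_rec, new_inf

# Ensemble state per member: S, I, beta, and N_eff, the effective
# population size that rescales the predicted outbreak.
m = 300
S = rng.uniform(5e5, 1e6, m); I = rng.uniform(1, 50, m)
beta = rng.uniform(0.3, 0.8, m); N = rng.uniform(5e5, 1e6, m)
gamma = 1 / 5.0
obs = [120, 340, 800, 1500, 2100]             # toy weekly case counts
obs_var = (0.2 * np.asarray(obs)) ** 2 + 10

for k, y in enumerate(obs):
    # Forecast: integrate each member one week, accumulating new cases.
    cases = np.zeros(m)
    for _ in range(7):
        S, I, ni = sir_step(S, I, beta, gamma, N)
        cases += ni
    # EAKF-style update of the observed variable (weekly cases):
    # combine the prior ensemble mean/variance with the observation.
    mu, var = cases.mean(), cases.var()
    var_post = 1 / (1 / var + 1 / obs_var[k])
    mu_post = var_post * (mu / var + y / obs_var[k])
    adj = mu_post + np.sqrt(var_post / var) * (cases - mu) - cases
    # Regress the adjustment onto each state/parameter (Kalman gain).
    for v in (S, I, beta, N):
        g = np.cov(v, cases)[0, 1] / var
        v += g * adj
    np.clip(S, 0, None, out=S); np.clip(I, 0, None, out=I)
    np.clip(beta, 0.05, None, out=beta)
    print(f"week {k + 1}: posterior mean cases = {mu_post:.0f}")
```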

14.
15.
MOTIVATION: Patient outcome prediction using microarray technologies is an important application in bioinformatics. Based on patients' microarray gene expression data, predictions are made to estimate survival time and the risk of tumor metastasis or recurrence, so accurate prediction can potentially help provide better treatment. RESULTS: We present a new computational method for patient outcome prediction. In the training phase, we use two types of extreme patient samples: short-term survivors, who experienced an unfavorable outcome within a short period, and long-term survivors, who maintained a favorable outcome after a long follow-up time. These extreme training samples yield a clear platform for identifying relevant genes whose expression is closely related to the outcome. The selected extreme samples and relevant genes are then integrated by a support vector machine to build a prediction model, which assigns each validation sample a risk score that places it in one of several pre-defined risk groups. We apply this method to several public datasets. In most cases, the high- and low-risk groups stratified by our method have clearly distinguishable outcomes, as seen in their Kaplan-Meier curves. We also show that training on extreme patient samples alone improves prediction accuracy across different gene selection methods.
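A minimal sketch of the training scheme: keep only the extreme samples, pick genes that separate the two extreme groups (a crude stand-in for the paper's gene selection), train an SVM, and score every patient into risk groups via the decision function. All thresholds and data are made up; scikit-learn's SVC is assumed.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(10)
n, p = 300, 40
X = rng.standard_normal((n, p))
surv = np.exp(0.8 * X[:, 0] + 0.5 * rng.standard_normal(n))  # years

# Extreme training samples only: short-term vs long-term survivors;
# patients with intermediate follow-up are excluded from training.
short, long_ = surv < 0.5, surv > 3.0
X_tr = np.vstack([X[short], X[long_]])
y_tr = np.r_[np.ones(short.sum()), np.zeros(long_.sum())]    # 1 = high risk

# Simple relevant-gene filter: keep the genes whose means differ most
# between the two extreme groups.
diff = np.abs(X[short].mean(0) - X[long_].mean(0))
genes = np.argsort(diff)[-10:]

clf = SVC(kernel="linear").fit(X_tr[:, genes], y_tr)
risk = clf.decision_function(X[:, genes])     # risk score for every patient
groups = np.digitize(risk, np.quantile(risk, [0.33, 0.66]))  # 3 risk groups
print("patients per risk group:", np.bincount(groups))
```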

16.
In biomedical and public health research, it is common for both survival time and longitudinal categorical outcomes to be collected for a subject, along with the subject's characteristics or risk factors. Investigators are often interested in finding variables important for predicting both survival time and the longitudinal outcomes, which may be correlated within a subject. Existing approaches for such joint analyses handle continuous longitudinal outcomes, so new statistical methods are needed for categorical ones. We propose to simultaneously model the survival time with a stratified Cox proportional hazards model and the longitudinal categorical outcomes with a generalized linear mixed model, introducing random effects to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. The Expectation-Maximization (EM) algorithm is used to derive point estimates of the model parameters, and the observed information matrix is used to estimate their asymptotic variances. Asymptotic properties of the proposed maximum likelihood estimators are established using the theory of empirical processes, and the method is shown via simulation to perform well in finite samples. We illustrate the approach with data from the Carolina Head and Neck Cancer Study (CHANCE), comparing the simultaneous analysis with separately conducted analyses using the generalized linear mixed model and the Cox proportional hazards model; the proposed method identifies more predictors than the separate analyses.

17.
Hydrological time series forecasting remains difficult owing to complicated nonlinear, non-stationary, and multi-scale characteristics. To address this difficulty and improve prediction accuracy, a novel four-stage hybrid model is proposed for hydrological time series forecasting based on the principle of 'denoising, decomposition and ensemble'. The model has four stages: denoising, decomposition, component prediction, and ensemble. In the denoising stage, the empirical mode decomposition (EMD) method reduces the noise in the hydrological time series. An improved variant of EMD, ensemble empirical mode decomposition (EEMD), then decomposes the denoised series into a number of intrinsic mode function (IMF) components plus one residual component. Next, a radial basis function neural network (RBFNN) predicts the trend of each component obtained in the decomposition stage. In the final ensemble stage, the forecasts of all IMF and residual components are combined into the final prediction using a linear neural network (LNN) model. For illustration and verification, six hydrological cases with different characteristics are used to test the model. The proposed hybrid model performs better than conventional single models, hybrid models without denoising or decomposition, and hybrid models based on other methods such as wavelet analysis (WA). In addition, the denoising and decomposition strategies decrease the complexity of the series and ease the forecasting task. With its effective denoising, accurate decomposition, high prediction precision, and wide applicability, the new model, an extension of nonlinear prediction models, is very promising for complex time series forecasting.
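A condensed sketch of the decomposition-prediction-ensemble pipeline. It assumes the PyEMD package (installed as EMD-signal) for EEMD, uses scikit-learn's KernelRidge with an RBF kernel as a stand-in for the paper's RBF neural network and LinearRegression for the ensemble stage, and folds denoising into the single EEMD pass; the lag structure and toy series are illustrative.

```python
import numpy as np
from PyEMD import EEMD                        # pip install EMD-signal
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)
t = np.arange(600)
flow = (50 + 20 * np.sin(2 * np.pi * t / 365) + 5 * np.sin(2 * np.pi * t / 30)
        + 3 * rng.standard_normal(t.size))    # toy daily runoff series

# Decompose into IMF components plus a residue with ensemble EMD.
imfs = EEMD(trials=50).eemd(flow)

def lagged(x, lags=5):
    """Lag-matrix design: predict x[t] from x[t-1..t-lags]."""
    Xs = np.column_stack([x[lags - j - 1:-j - 1 or None] for j in range(lags)])
    return Xs, x[lags:]

# Predict each component with an RBF-kernel model, then combine the
# component forecasts with a linear model, as in the ensemble stage.
preds = []
for comp in imfs:
    Xc, yc = lagged(comp)
    preds.append(KernelRidge(kernel="rbf", alpha=1.0).fit(Xc, yc).predict(Xc))
P = np.column_stack(preds)
y_true = flow[5:]
final = LinearRegression().fit(P, y_true).predict(P)
print("in-sample RMSE:", np.sqrt(np.mean((y_true - final) ** 2)))
```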

18.
Deadlock-free scheduling of parts is vital for increasing the utilization of an Automated Manufacturing System (AMS). The literature identifies the role of an effective AMS modeling methodology in ensuring appropriate scheduling of parts on the available resources. This paper presents a new modeling methodology, the Extended Color Time Net of a Set of Simple Sequential Processes with Resources (ECTS3PR), which efficiently handles the dynamic behavior of a manufacturing system. The model is then used to obtain a deadlock-free schedule with minimized makespan via a new Evolutionary Endosymbiotic Learning Automata (EELA) algorithm. The ECTS3PR model, which easily captures various relations and structural interactions, proves very helpful in measuring and managing system performance, while EELA combines the merits of endosymbiotic systems and learning automata. The proposed algorithm performs better than various benchmark strategies from the literature. Extensive experiments over data sets of varying dimensions confirm this performance claim; superiority of the approach is validated with a newly defined performance index, the 'makespan index' (MI), and an ANOVA reveals the robustness of the algorithm.

19.
Novel high-throughput in vivo measurement techniques are beginning to produce dense, high-quality time series that can be used to investigate the structure and regulation of biochemical networks. We propose an automated information-extraction procedure that takes advantage of the unique S-system structure and supports model building from time traces, curve fitting, model selection, and structure identification based on parameter estimation. The procedure comprises three modules: model Generation, parameter estimation or model Fitting, and model Selection (the GFS algorithm). The GFS algorithm has been implemented in MATLAB; it returns a list of candidate S-systems that adequately explain the data and guides the search toward the most plausible model for the time series under study. By combining two strategies (decoupling and limiting connectivity) with data-smoothing methods, the proposed algorithm scales up to realistic situations of moderate size. We illustrate the methodology with a didactic example.
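For context, the canonical S-system form that the GFS algorithm searches over is the standard power-law system (stated here from the general literature, not from the abstract itself):

\frac{dX_i}{dt} \;=\; \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}} \;-\; \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}}, \qquad i = 1, \ldots, n,

where the rate constants satisfy \alpha_i, \beta_i \ge 0 and the kinetic orders g_{ij}, h_{ij} are real; structure identification amounts to determining which kinetic orders are nonzero.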

20.
In many clinical trials and evaluations using medical care administrative databases, it is of interest to estimate not only the survival time under a given treatment modality but also the total associated cost. The most widely used estimator for data subject to censoring is the Kaplan-Meier (KM), or product-limit (PL), estimator, whose optimality properties for time-to-event data (consistency, etc.) under random censorship are well established. However, whenever the relationship between cost and survival time includes an error term to account for random differences among patients' costs, the dependence between cumulative treatment cost at the time of censoring and at the survival time causes KM to give biased estimates. A similar phenomenon has previously been noted in the context of estimating quality-adjusted survival time. We propose an estimator of mean cost that exploits the underlying relationship between total treatment cost and survival time; it uses either parametric or nonparametric regression to estimate this relationship and is consistent whenever the relationship is consistently estimated. We then present simulation results illustrating the gain in finite-sample efficiency over another recently proposed estimator. The methods are applied to the estimation of mean cost in two studies with right-censoring: the heart failure clinical trial Studies of Left Ventricular Dysfunction (SOLVD), and a Health Maintenance Organization (HMO) database study of the cost of ulcer treatment.
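A simplified linear version of the proposed idea on synthetic data: regress total cost on follow-up time among the uncensored patients, then average the fitted cost curve over the Kaplan-Meier distribution of the survival time. The linear cost model and all parameter values are illustrative assumptions; lifelines supplies the KM fit.

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(12)
n = 1000
T = rng.exponential(3.0, n)                   # true survival times (years)
C = rng.uniform(0, 6.0, n)                    # censoring times
obs = np.minimum(T, C)
delta = (T <= C).astype(int)
# Total cost grows with follow-up time plus patient-level random error.
total_cost = 10_000 + 4_000 * obs + 1_000 * rng.standard_normal(n)

# A KM-weighted mean of observed costs is biased because a patient's
# accumulated cost at censoring is correlated with the (unseen) cost at
# death.  Instead: (1) regress cost on time among the uncensored only,
# (2) average the fitted cost curve over the KM distribution of T.
unc = delta == 1
slope, intercept = np.polyfit(obs[unc], total_cost[unc], 1)

km = KaplanMeierFitter().fit(obs, event_observed=delta)
s = km.survival_function_.iloc[:, 0].to_numpy()
t_grid = km.survival_function_.index.to_numpy()
mass = -np.diff(np.r_[1.0, s])                # KM jump mass at each time
# Normalizing by mass.sum() restricts the mean to the observed support.
mean_cost = np.sum((intercept + slope * t_grid) * mass) / mass.sum()
print(f"estimated mean cost: {mean_cost:,.0f}")
```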
