Similar Articles
20 similar articles found (search time: 31 ms)
1.
Summary. Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley–James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, using the elastic-net penalty, a mixture of L1- and L2-norm penalties. As with the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, and highly correlated genes tend to be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.
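The grouping effect of the elastic-net penalty described above can be illustrated on uncensored data with scikit-learn. This is only a sketch of the penalty's behavior, not the authors' Buckley–James procedure: the censoring step is omitted, and the data, block structure, and tuning values (`alpha=0.1`, `l1_ratio=0.5`) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n, p = 100, 200  # more features than samples, as in microarray data

# Two blocks of highly correlated "genes"; only block 1 drives the outcome.
z1 = rng.normal(size=(n, 1))
z2 = rng.normal(size=(n, 1))
X = np.hstack([z1 + 0.01 * rng.normal(size=(n, 5)),   # block 1: columns 0-4
               z2 + 0.01 * rng.normal(size=(n, 5)),   # block 2: columns 5-9
               rng.normal(size=(n, p - 10))])          # noise features
y = X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=n)

# The elastic net (mixed L1/L2 penalty) tends to keep correlated predictors
# together, whereas the pure lasso typically concentrates on fewer columns
# of a correlated block.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=5000).fit(X, y)
lasso = Lasso(alpha=0.1, max_iter=5000).fit(X, y)
enet_block1 = int(np.sum(enet.coef_[:5] != 0))
lasso_block1 = int(np.sum(lasso.coef_[:5] != 0))
print(enet_block1, lasso_block1)
```

The L2 component of the penalty is what spreads weight across the near-duplicate columns of block 1.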

2.
MOTIVATION: An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer, where the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research linking gene expression profiles to censored survival data such as patients' overall survival time or time to cancer relapse. It would be desirable to have models with good prediction accuracy and parsimony. RESULTS: We propose using L1-penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with estimation in high-dimensional, low-sample-size settings can be efficiently solved by using the recently developed least-angle regression (LARS) method. Our simulation studies and application to real datasets on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Cox procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Cox regression gives better predictive performance than L2-penalized regression and several other dimension-reduction-based methods. CONCLUSIONS: We conclude that the proposed LARS-Cox procedure can be very useful in identifying genes relevant to survival phenotypes and in building a parsimonious predictive model that can be used for classifying future patients into clinically relevant high- and low-risk groups based on the gene expression profiles and survival times of previous patients.
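LARS itself is a linear-model algorithm; the LARS-Cox procedure embeds it in the Cox partial likelihood, which is not reproduced here. A minimal sketch of the sparse LARS path on synthetic uncensored data (sample sizes, coefficients, and noise level are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(1)
n, p = 100, 200  # p >> n, as in gene expression studies
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 2.0]           # only 3 "genes" truly matter
y = X @ beta + 0.1 * rng.normal(size=n)

# lars_path traces the full LARS solution path; variables enter one at a
# time in order of their correlation with the current residual, so the
# truly relevant features should appear at the start of the path.
alphas, active, coefs = lars_path(X, y, method="lar")
print(list(active[:5]))  # indices of the first variables to enter the path
```

In LARS-Cox the same entry-order idea drives gene selection, with the residual correlation replaced by the partial-likelihood score.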

3.
In this paper a generalization of the Poisson regression model indexed by a shape parameter is proposed for the analysis of life table and follow-up data with concomitant variables. The model is suitable for analysis of extra-Poisson variation data. The model is used to fit the survival data given in Holford (1980). The model parameters, the hazard and survival functions are estimated by the method of maximum likelihood. The results obtained from this study seem to be comparable to those obtained by Chen (1988). Approximate tests of the dispersion and goodness-of-fit of the data to the model are also discussed.
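The abstract does not give the model's exact functional form. Negative binomial regression is one standard model for extra-Poisson variation, and a maximum-likelihood fit of it can be sketched with scipy as a stand-in for the general idea of a Poisson family indexed by a dispersion/shape parameter (all data and parameter values are synthetic):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 2, size=n)
mu = np.exp(0.5 + 1.0 * x)            # log-linear mean, true beta = (0.5, 1.0)
alpha_true = 0.5                       # dispersion: Var = mu + alpha * mu^2
y = rng.negative_binomial(n=1 / alpha_true, p=1 / (1 + alpha_true * mu))

def negloglik(params):
    b0, b1, log_alpha = params
    alpha = np.exp(log_alpha)          # keep the dispersion positive
    m = np.exp(b0 + b1 * x)
    r = 1 / alpha
    # Negative binomial log-likelihood (NB2 parameterization)
    ll = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
          + r * np.log(r / (r + m)) + y * np.log(m / (r + m)))
    return -ll.sum()

# Nelder-Mead avoids overflow issues a gradient-based line search can hit.
fit = minimize(negloglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
b0_hat, b1_hat, alpha_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(b0_hat, b1_hat, alpha_hat)
```

A likelihood-ratio or score test of alpha = 0 against this fit is the kind of dispersion test the abstract alludes to.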

4.
Lam KF  Lee YW  Leung TL 《Biometrics》2002,58(2):316-323
In this article, the focus is on the analysis of multivariate survival time data with various types of dependence structures. Examples of multivariate survival data include clustered data and repeated measurements from the same subject, such as the interrecurrence times of cancer tumors. A random effect semiparametric proportional odds model is proposed as an alternative to the proportional hazards model. The distribution of the random effects is assumed to be multivariate normal and the random effect is assumed to act additively to the baseline log-odds function. This class of models, which includes the usual shared random effects model, the additive variance components model, and the dynamic random effects model as special cases, is highly flexible and is capable of modeling a wide range of multivariate survival data. A unified estimation procedure is proposed to estimate the regression and dependence parameters simultaneously by means of a marginal-likelihood approach. Unlike the fully parametric case, the regression parameter estimate is not sensitive to the choice of correlation structure of the random effects. The marginal likelihood is approximated by the Monte Carlo method. Simulation studies are carried out to investigate the performance of the proposed method. The proposed method is applied to two well-known data sets, including clustered data and recurrent event times data.
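The Monte Carlo approximation of a marginal likelihood, the computational device mentioned above, can be sketched for the simplest random-intercept case, where a closed form exists for checking. This illustrates only the integration technique, not the authors' proportional odds model; all values are synthetic.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(3)

# One cluster of m correlated observations: y_j = mu + b + e_j,
# with random effect b ~ N(0, tau^2) and noise e_j ~ N(0, sigma^2).
mu, tau, sigma, m = 1.0, 0.8, 1.0, 5
y = mu + tau * rng.normal() + sigma * rng.normal(size=m)

# Monte Carlo approximation of the marginal likelihood:
# L = E_b[ prod_j phi((y_j - mu - b) / sigma) / sigma ]
B = 200_000
b_draws = tau * rng.normal(size=B)
dens = norm.pdf(y[None, :], loc=mu + b_draws[:, None], scale=sigma)
L_mc = dens.prod(axis=1).mean()

# In this normal-normal case the marginal is available in closed form
# (y is multivariate normal with covariance sigma^2 I + tau^2 J), so the
# Monte Carlo estimate can be checked against the exact value.
cov = sigma**2 * np.eye(m) + tau**2 * np.ones((m, m))
L_exact = multivariate_normal.pdf(y, mean=np.full(m, mu), cov=cov)
print(L_mc, L_exact)
```

For the proportional odds model the inner density is not normal and no closed form exists, which is exactly why the Monte Carlo approximation is needed.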

5.
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of pathway activities from microarray gene expression data based on Sparse Probabilistic Principal Component Analysis (SPPCA). A pathway-activity logistic regression model is then formulated for the cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for parameter estimation and derive a model selection criterion for choosing the tuning parameters included in the model estimation. The proposed method can also reverse-engineer gene networks based on the identified pathways, enabling us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through an analysis of breast cancer gene expression data.

6.
Sangbum Choi  Xuelin Huang 《Biometrics》2012,68(4):1126-1135
Summary. We propose semiparametrically efficient estimation of a broad class of transformation regression models for nonproportional hazards data. Classical transformation models are viewed from a frailty-model paradigm, and the proposed method provides a unified approach that is valid for both continuous and discrete frailty models. The proposed models are shown to be flexible enough to model long-term follow-up survival data when the treatment effect diminishes over time, a case in which the proportional hazards (PH) or proportional odds assumption is violated, or a situation in which a substantial proportion of patients remains cured after treatment. Estimation of the link parameter in the frailty distribution, considered to be unknown and possibly dependent on time-independent covariates, is automatically included in the proposed methods. The observed information matrix is computed to evaluate the variances of all the parameter estimates. Our likelihood-based approach provides a natural way to construct simple statistics for testing the PH and proportional odds assumptions for usual survival data, or for testing the short- and long-term effects for survival data with a cure fraction. Simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. Applications to two medical studies are provided.

7.
New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well-established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in the planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms, leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large-scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than the sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where "omics" features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting, and our simulations corroborate well-established results concerning the conditions under which each one of these methods is expected to perform best while providing several novel insights.
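A stripped-down version of such a designed simulation, varying only true sparsity with a few replicates and comparing held-out MSE of the three penalized methods, might look like this (all design settings, sample sizes, and tuning values are arbitrary placeholders, far smaller than a real study):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)

def simulate(n=100, p=500, k=10, snr=3.0):
    """One synthetic p >> n dataset with k truly active features."""
    X = rng.normal(size=(2 * n, p))
    beta = np.zeros(p)
    beta[:k] = 1.0
    signal = X @ beta
    noise_sd = signal.std() / np.sqrt(snr)
    y = signal + noise_sd * rng.normal(size=2 * n)
    return X[:n], y[:n], X[n:], y[n:]          # train / test split

models = {"ridge": Ridge(alpha=10.0),
          "lasso": Lasso(alpha=0.1, max_iter=5000),
          "enet": ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=5000)}

# A tiny factorial design: vary true sparsity, replicate, record test MSE.
results = {name: [] for name in models}
for k in (5, 50):                               # sparse vs. dense truth
    for rep in range(3):
        Xtr, ytr, Xte, yte = simulate(k=k)
        for name, model in models.items():
            model.fit(Xtr, ytr)
            results[name].append(mean_squared_error(yte, model.predict(Xte)))

mean_mse = {name: float(np.mean(v)) for name, v in results.items()}
print(mean_mse)
```

A real study along the paper's lines would cross this sparsity factor with sample size, feature count, signal-to-noise ratio, and feature correlation, with many more replicates per cell.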

8.
We propose methods for Bayesian inference for a new class of semiparametric survival models with a cure fraction. Specifically, we propose a semiparametric cure rate model with a smoothing parameter that controls the degree of parametricity in the right tail of the survival distribution. We show that such a parameter is crucial for these kinds of models and can have an impact on the posterior estimates. Several novel properties of the proposed model are derived. In addition, we propose a class of improper noninformative priors based on this model and examine the properties of the implied posterior. Also, a class of informative priors based on historical data is proposed and its theoretical properties are investigated. A case study involving a melanoma clinical trial is discussed in detail to demonstrate the proposed methodology.
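The abstract does not state the model's functional form. A common cure rate formulation in this literature is the promotion-time model, S_pop(t) = exp(-theta * F(t)) for a proper latent CDF F, whose improper population survival function levels off at the cure fraction exp(-theta); this can be demonstrated numerically (parameter values are illustrative, not from the paper):

```python
import numpy as np

# Promotion-time cure rate model: S_pop(t) = exp(-theta * F(t)).
# Because F(t) -> 1 as t -> infinity, the population survival function is
# improper and plateaus at the cure fraction exp(-theta).
theta = 1.2
lam = 0.5                                # exponential latent event times
t = np.linspace(0, 50, 501)
F = 1 - np.exp(-lam * t)
S_pop = np.exp(-theta * F)

cure_fraction = np.exp(-theta)
print(S_pop[-1], cure_fraction)
```

The paper's smoothing parameter governs how parametric the right tail of this kind of survival curve is allowed to be, which is precisely the region the plateau lives in.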

9.
Liang Li  Bo Hu  Tom Greene 《Biometrics》2009,65(3):737-745
Summary. In many longitudinal clinical studies, the level and progression rate of repeatedly measured biomarkers on each subject quantify the severity of the disease and that subject's susceptibility to progression of the disease. It is of scientific and clinical interest to relate such quantities to a later time-to-event clinical endpoint such as patient survival. This is usually done with a shared parameter model. In such models, the longitudinal biomarker data and the survival outcome of each subject are assumed to be conditionally independent given subject-level severity or susceptibility (also called frailty in statistical terms). In this article, we study the case where the conditional distribution of longitudinal data is modeled by a linear mixed-effect model, and the conditional distribution of the survival data is given by a Cox proportional hazard model. We allow unknown regression coefficients and time-dependent covariates in both models. The proposed estimators are maximizers of an exact correction to the joint log likelihood with the frailties eliminated as nuisance parameters, an idea that originated from correction of covariate measurement error in measurement error models. The corrected joint log likelihood is shown to be asymptotically concave and leads to consistent and asymptotically normal estimators. Unlike most published methods for joint modeling, the proposed estimation procedure does not rely on distributional assumptions of the frailties. The proposed method was studied in simulations and applied to a data set from the Hemodialysis Study.

10.
Yin G  Cai J 《Biometrics》2005,61(1):151-161
As an alternative to the mean regression model, the quantile regression model has been studied extensively with independent failure time data. However, due to natural or artificial clustering, it is common to encounter multivariate failure time data in biomedical research where the intracluster correlation needs to be accounted for appropriately. For right-censored correlated survival data, we investigate the quantile regression model and adapt an estimating equation approach for parameter estimation under the working independence assumption, as well as a weighted version for enhancing the efficiency. We show that the parameter estimates are consistent and asymptotically follow normal distributions. The variance estimation using asymptotic approximation involves nonparametric functional density estimation. We employ the bootstrap and perturbation resampling methods for the estimation of the variance-covariance matrix. We examine the proposed method for finite sample sizes through simulation studies, and illustrate it with data from a clinical trial on otitis media.

11.
Association-based linkage disequilibrium (LD) mapping is an increasingly important tool for localizing genes that show potential influence on human aging and longevity. As haplotypes contain more LD information than single markers, a haplotype-based LD approach can have increased power in detecting associations as well as increased robustness in statistical testing. In this paper, we develop a new statistical model to estimate haplotype relative risks (HRRs) on human survival using unphased multilocus genotype data from unrelated individuals in cross-sectional studies. Based on the proportional hazard assumption, the model can estimate haplotype risk and frequency parameters, incorporate observed covariates, assess interactions between haplotypes and the covariates, and investigate the modes of gene function. By introducing population survival information available from population statistics, we are able to develop a procedure that carries out the parameter estimation using a nonparametric baseline hazard function and estimates sex-specific HRRs to infer gene-sex interaction. We also evaluate the haplotype effects on human survival while taking into account individual heterogeneity in the unobserved genetic and nongenetic factors or frailty by introducing the gamma-distributed frailty into the survival function. After model validation by computer simulation, we apply our method to an empirical data set to measure haplotype effects on human survival and to estimate haplotype frequencies at birth and over the observed ages. Results from both simulation and model application indicate that our survival analysis model is an efficient method for inferring haplotype effects on human survival in population-based association studies.

12.
Yin G  Ibrahim JG 《Biometrics》2005,61(2):403-412
We propose a new class of survival models which naturally links a family of proper and improper population survival functions. The models resulting in improper survival functions are often referred to as cure rate models. This class of regression models is formulated through the Box-Cox transformation on the population hazard function and a proper density function. By adding an extra transformation parameter into the cure rate model, we are able to generate models with a zero cure rate, thus leading to a proper population survival function. A graphical illustration of the behavior and the influence of the transformation parameter on the regression model is provided. We consider a Bayesian approach which is motivated by the complexity of the model. Prior specification needs to accommodate parameter constraints due to the non-negativity of the survival function. Moreover, the likelihood function involves a complicated integral on the survival function, which may not have an analytical closed form, and thus makes the implementation of Gibbs sampling more difficult. We propose an efficient Markov chain Monte Carlo computational scheme based on Gaussian quadrature. The proposed method is illustrated with an example involving a melanoma clinical trial.
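The way a transformation parameter bridges model families can be checked numerically with the generic Box-Cox-type family G_alpha(u) = (1 + alpha*u)^(-1/alpha), which tends to exp(-u) as alpha -> 0 (a generic illustration of the limiting behavior, not the paper's exact parameterization):

```python
import numpy as np

# Box-Cox-type transformation family: G_alpha(u) = (1 + alpha*u)^(-1/alpha).
# As alpha -> 0 it recovers exp(-u) (the promotion-time cure rate form),
# while alpha = 1 gives 1/(1 + u) (a proportional-odds-type form), so a
# single parameter interpolates between model classes.
def G(u, alpha):
    if alpha == 0.0:
        return np.exp(-u)
    return (1.0 + alpha * u) ** (-1.0 / alpha)

u = np.linspace(0, 5, 6)
for alpha in (1.0, 0.5, 0.1, 0.01):
    print(alpha, float(np.max(np.abs(G(u, alpha) - np.exp(-u)))))
```

The printed maximum gaps shrink as alpha decreases, which is the continuity in the transformation parameter that the paper's graphical illustration is about.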

13.
Z Jiang  L Wang  C Li  J Xia  H Jia 《PloS one》2012,7(9):e44013
Group sequential designs have been widely applied in clinical trials over the past few decades. Sample size estimation is a vital concern of sponsors and investigators, and in survival group sequential trials it is a thorny question because of the ambiguous distributional form, censored data, and differing definitions of information time. A practical and easy-to-use simulation-based method is proposed in this article for multi-stage, two-arm survival group sequential designs, and its SAS program is available. Besides the exponential distribution, which is usually assumed for survival data, the Weibull distribution is considered here. Incorporating the probability of discontinuation in the simulation leads to a more accurate estimate. The assessment indexes calculated in the simulation help determine the number and timing of the interim analyses. The use of the method in survival group sequential trials is illustrated, and the effects of varying the shape parameter on the sample size under the Weibull distribution are explored through an example. Based on the simulation results, a method to estimate the shape parameter of the Weibull distribution is proposed using the median survival time of the test drug and the hazard ratio, which are prespecified by the investigators and other participants. 10+ simulations are recommended to achieve a robust estimate of the sample size. Furthermore, the method remains applicable in adaptive designs if the strategy for sample size determination is adopted at the design stage or minor modifications are made to the program.
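A bare-bones version of such a simulation-based assessment, restricted to exponential survival, a single (final) analysis, and a hand-rolled logrank test rather than the paper's SAS program, might look like this; all trial parameters (arm size, hazard ratio, follow-up) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

def logrank_z(time, event, group):
    """Two-sample logrank Z statistic (group is 0/1, event is 1 if observed)."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / np.sqrt(var)

def simulate_power(n_per_arm=100, hr=0.6, lam0=0.1, follow_up=24, n_sim=200):
    """Empirical power of the logrank test under exponential survival with
    administrative censoring at the end of follow-up."""
    rejections = 0
    for _ in range(n_sim):
        group = np.repeat([0, 1], n_per_arm)
        lam = np.where(group == 1, lam0 * hr, lam0)
        t_event = rng.exponential(1 / lam)
        time = np.minimum(t_event, follow_up)
        event = (t_event <= follow_up).astype(int)
        if abs(logrank_z(time, event, group)) > 1.96:
            rejections += 1
    return rejections / n_sim

power = simulate_power()
print(power)
```

The paper's method layers Weibull event times, discontinuation, and multiple interim looks with spending boundaries on top of this same simulate-and-count skeleton.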

14.
Although survival analysis is a well-established mathematical discipline, there have been almost no attempts at survival modeling for experimentally virus-infected laboratory animals. We have taken up a stochastic approach originally developed by Shortley in the sixties and have applied it to three different types of experimental data: virus titer determination, the dose dependence of the mean survival time, and single survival curves. Experience concerning parameter estimation is reported, and new ways of working with the model parameters are proposed. A standard mean survival time is defined and suggested as a new quantitative measure of virulence. Moreover, for the comparison of two experiments in which the amount of virions inoculated is kept fixed but other parameters may vary, a new scheme for systematizing survival data from experimentally virus-infected laboratory animals is proposed. It is very likely that the model can also be applied to cancer survival data or to any other infectious pathogen.

15.
16.
A relatively simple method is proposed for the estimation of parameters of stage-structured populations from sample data for situations where (a) unit-time survival rates may vary with time, and (b) the distribution of entry times to stage 1 is too complicated to be fitted with a simple parametric model such as a normal or gamma distribution. The key aspects of this model are that the entry-time distribution is approximated by an exponential function with p parameters, the unit-time survival rates in stages are approximated by an r-parameter exponential polynomial in the stage number, and the durations of stages are assumed to be the same for all individuals. The new method is applied to four zooplankton data sets, with parametric bootstrapping used to assess the bias and variation in estimates. It is concluded that good estimates of demographic parameters from stage-frequency data from natural populations will usually be possible only if extra information, such as the durations of stages, is known.

17.
Predicting survival from microarray data--a comparative study
MOTIVATION: Survival prediction from gene expression data and other high-dimensional genomic data has been the subject of much research in recent years. These kinds of data are associated with the methodological problem of having many more gene expression values than individuals. In addition, the responses are censored survival times. Most of the proposed methods handle this by using Cox's proportional hazards model and obtain parameter estimates by some dimension reduction or parameter shrinkage estimation technique. Using three well-known microarray gene expression data sets, we compare the prediction performance of seven such methods: univariate selection, forward stepwise selection, principal components regression (PCR), supervised principal components regression, partial least squares regression (PLS), ridge regression and the lasso. RESULTS: Statistical learning from subsets should be repeated several times in order to get a fair comparison between methods. Methods using coefficient shrinkage or linear combinations of the gene expression values have much better performance than the simple variable selection methods. For our data sets, ridge regression has the overall best performance. AVAILABILITY: Matlab and R code for the prediction methods are available at http://www.med.uio.no/imb/stat/bmms/software/microsurv/.

18.
MOTIVATION: An important goal of microarray studies is to discover genes that are associated with clinical outcomes, such as disease status and patient survival. While a typical experiment surveys gene expressions on a global scale, there may be only a small number of genes that have significant influence on a clinical outcome. Moreover, expression data have cluster structures and the genes within a cluster have correlated expressions and coordinated functions, but the effects of individual genes in the same cluster may be different. Accordingly, we seek to build statistical models with the following properties. First, the model is sparse in the sense that only a subset of the parameter vector is non-zero. Second, the cluster structures of gene expressions are properly accounted for. RESULTS: For gene expression data without pathway information, we divide genes into clusters using commonly used methods, such as K-means or hierarchical approaches. The optimal number of clusters is determined using the Gap statistic. We propose a clustering threshold gradient descent regularization (CTGDR) method, for simultaneous cluster selection and within cluster gene selection. We apply this method to binary classification and censored survival analysis. Compared to the standard TGDR and other regularization methods, the CTGDR takes into account the cluster structure and carries out feature selection at both the cluster level and within-cluster gene level. We demonstrate the CTGDR on two studies of cancer classification and two studies correlating survival of lymphoma patients with microarray expressions. AVAILABILITY: R code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
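The plain TGDR core (without the cluster-level thresholding that the "C" in CTGDR adds) can be sketched for squared-error loss: at each step, only coordinates whose gradient is within a factor tau of the largest gradient are updated, which yields sparse solution paths. Data, tuning values, and the loss function here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:4] = [2.0, -1.5, 1.0, 1.0]   # 4 truly relevant "genes"
y = X @ beta_true + 0.3 * rng.normal(size=n)

def tgdr(X, y, tau=0.9, step=0.01, n_iter=500):
    """Threshold gradient descent: only coordinates whose gradient magnitude
    is at least tau times the largest gradient get updated each iteration."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (y - X @ beta) / n        # negative gradient of 0.5*MSE
        mask = np.abs(grad) >= tau * np.abs(grad).max()
        beta += step * grad * mask
    return beta

beta_hat = tgdr(X, y)
print(np.round(beta_hat[:6], 2))
```

Larger tau gives sparser fits (tau near 1 updates a single coordinate at a time, lasso-like behavior); CTGDR applies this thresholding twice, once across clusters and once within the selected clusters.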

19.
Cancer is a complex genetic disease resulting from defects in multiple genes. The development of microarray techniques makes it possible to survey the whole genome and detect genes that have an influential impact on the progression of cancer. Statistical analysis of cancer microarray data is challenging because of the high dimensionality and cluster structure of gene expressions; here, clusters are composed of genes with coordinated pathological functions and/or correlated expressions. In this article, we consider cancer studies where a censored survival endpoint is measured along with microarray gene expressions. We propose a hybrid clustering approach, which uses both pathological pathway information retrieved from KEGG and statistical correlations of gene expressions, to construct gene clusters. Cancer survival time is modeled as a linear function of gene expressions. We adopt the clustering threshold gradient directed regularization (CTGDR) method for simultaneous gene cluster selection, within-cluster gene selection, and predictive model building. Analysis of two lymphoma studies shows that the proposed approach, which combines the hybrid gene clustering, a linear regression model for survival, and clustering-regularized estimation with CTGDR, can effectively identify gene clusters, and genes within selected clusters, that have satisfactory predictive power for censored cancer survival outcomes.

20.
MOTIVATION: Recent research has shown that gene expression profiles can potentially be used for predicting various clinical phenotypes, such as tumor class, drug response and survival time. While there has been extensive studies on tumor classification, there has been less emphasis on other phenotypic features, in particular, patient survival time or time to cancer recurrence, which are subject to right censoring. We consider in this paper an analysis of censored survival time based on microarray gene expression profiles. RESULTS: We propose a dimension reduction strategy, which combines principal components analysis and sliced inverse regression, to identify linear combinations of genes, that both account for the variability in the gene expression levels and preserve the phenotypic information. The extracted gene combinations are then employed as covariates in a predictive survival model formulation. We apply the proposed method to a large diffuse large-B-cell lymphoma dataset, which consists of 240 patients and 7399 genes, and build a Cox proportional hazards model based on the derived gene expression components. The proposed method is shown to provide a good predictive performance for patient survival, as demonstrated by both the significant survival difference between the predicted risk groups and the receiver operator characteristics analysis. AVAILABILITY: R programs are available upon request from the authors. SUPPLEMENTARY INFORMATION: http://dna.ucdavis.edu/~hli/bioinfo-surv-supp.pdf.
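Sliced inverse regression (SIR), the second stage of the dimension reduction strategy above, can be sketched on low-dimensional synthetic data. In the paper, PCA is applied first precisely because SIR needs fewer predictors than samples; the sizes, single-index truth, and link function below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 1000, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0] = 1.0                                      # true single-index direction
y = np.exp(X @ beta) + 0.1 * rng.normal(size=n)    # nonlinear, monotone link

def sir(X, y, n_slices=10, n_dir=1):
    """Sliced inverse regression: eigen-decompose the covariance of the
    slice means of the whitened predictors."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # whitening matrix cov^{-1/2}
    Z = Xc @ W
    # Slice observations on the sorted response, collect slice means of Z
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)
    dirs = W @ vecs[:, -n_dir:]                    # map back to original scale
    return dirs / np.linalg.norm(dirs, axis=0)

d = sir(X, y)[:, 0]
print(np.round(d, 3))
```

The leading eigenvector recovers the index direction (up to sign) without modeling the link function, which is why the extracted combinations preserve the phenotypic information while remaining usable as Cox model covariates.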


Copyright©北京勤云科技发展有限公司  京ICP备09084417号