Similar Literature
20 similar records found (search time: 46 ms)
1.

Background  

Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.  相似文献   
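The model-averaging step described above — combining contending models by the weighted average of their posterior probabilities — can be sketched with the common BIC approximation exp(-BIC/2) to a model's posterior weight. This is an illustrative stand-in only; the iterative BMA algorithm itself selects genes and models in a more involved loop, and the function names here are assumptions:

```python
import math

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC scores,
    using p(M_k | D) proportional to exp(-BIC_k / 2)."""
    # Subtract the minimum for numerical stability before exponentiating.
    m = min(bics)
    raw = [math.exp(-(b - m) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def bma_predict(predictions, bics):
    """Model-averaged prediction: posterior-weighted sum of each
    candidate model's prediction."""
    w = bma_weights(bics)
    return sum(wi * pi for wi, pi in zip(w, predictions))
```

A model whose BIC is lower by 2 receives roughly e times the weight of its competitor, so poor models contribute little to the averaged prediction.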

2.
MOTIVATION: Oligonucleotide fingerprinting of ribosomal RNA genes (OFRG) is a procedure that sorts rRNA gene (rDNA) clones into taxonomic groups through a series of hybridization experiments. The hybridization signals are classified into three discrete values 0, 1 and N, where 0 and 1, respectively, specify negative and positive hybridization events and N designates an uncertain assignment. This study examined various approaches for classifying the values, including Bayesian classification with normally, exponentially, or gamma distributed signal data, along with tree-based classification. All classification data were clustered using the unweighted pair group method with arithmetic mean. RESULTS: The performance of each classification/clustering procedure was compared with results from known reference data. Comparisons indicated that the approach using Bayesian classification with normal densities followed by tree clustering outperformed all others. The paper includes a discussion of how this Bayesian approach may be useful for the analysis of gene expression data.
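A minimal sketch of classifying a signal into 0, 1, or N with normal class-conditional densities might look like the following. The parameter values and the posterior-threshold rule for assigning "N" are assumptions for illustration, not the published procedure:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def classify_signal(x, params, priors, margin=0.9):
    """Assign a hybridization signal to '0', '1', or 'N'.

    params: {'0': (mu, sigma), '1': (mu, sigma)} -- class-conditional normals.
    If neither posterior reaches `margin`, the uncertain label 'N' is returned.
    Both the parameters and the `margin` rule are illustrative assumptions."""
    post = {c: normal_pdf(x, *params[c]) * priors[c] for c in params}
    z = sum(post.values())
    post = {c: p / z for c, p in post.items()}
    best = max(post, key=post.get)
    return best if post[best] >= margin else 'N'
```

With well-separated class means, signals near either mean are labeled confidently, while signals between the two densities fall below the posterior threshold and map to "N".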

3.

Objectives

Hip fractures commonly result in permanent disability, institutionalization or death in the elderly. Existing hip-fracture prediction tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By adding a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach.

Setting

EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in whom fractures were assessed quarterly for 4 years. A causal Bayesian network and a logistic regression were performed on EPIDOS data to describe the major variables involved in hip fracture occurrences.

Results

Both models had similar association estimates and predictive performance. They identified gait speed and bone mineral density as the variables most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and seemed to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age and bone mineral density.

Conclusion

Both approaches retrieved similar variables as predictors of hip fractures. However, the Bayesian network highlighted the whole web of relations among the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, interventions focusing concomitantly on gait speed and bone mineral density may be necessary for optimal prevention of hip fracture occurrence in elderly people.

4.
Particle classification is an important component of multivariate statistical analysis methods that has been used extensively to extract information from electron micrographs of single particles. Here we describe a new Bayesian Gibbs sampling algorithm for the classification of such images. This algorithm, which is applied after dimension reduction by correspondence analysis or by principal components analysis, dynamically learns the parameters of the multivariate Gaussian distributions that characterize each class. These distributions describe tilted ellipsoidal clusters that adaptively adjust shape to capture differences in the variances of factors and the correlations of factors within classes. A novel Bayesian procedure to objectively select factors for inclusion in the classification models is a component of this procedure. A comparison of this algorithm with hierarchical ascendant classification of simulated data sets shows improved classification over a broad range of signal-to-noise ratios.  相似文献   

5.
The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.  相似文献   
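In the scalar case, the plug-in estimate of mutual information that the log Bayes factor is asymptotically proportional to reduces to -0.5·log(1 - r²), with r the empirical correlation. A small sketch of that quantity (illustrative only, without the prior-based covariance shrinkage or the BIC-style dimensionality correction the paper derives):

```python
import math

def gaussian_mi(x, y):
    """Plug-in mutual information between two scalar variables under a
    bivariate normal assumption: I(X;Y) = -0.5 * log(1 - r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy * sxy / (sxx * syy)          # squared empirical correlation
    return -0.5 * math.log(1.0 - r2)
```

Uncorrelated samples give a value near zero, and the measure grows without bound as the correlation approaches 1 — one reason a dimensionality-aware correction is needed before comparing merges of variable groups of different sizes.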

6.
MOTIVATION: A common task in microarray data analysis consists of identifying genes associated with a phenotype. When the outcomes of interest are censored time-to-event data, standard approaches assess the effect of genes by fitting univariate survival models. In this paper, we propose a Bayesian variable selection approach, which allows the identification of relevant markers by jointly assessing sets of genes. We consider accelerated failure time (AFT) models with log-normal and log-t distributional assumptions. A data augmentation approach is used to impute the failure times of censored observations and mixture priors are used for the regression coefficients to identify promising subsets of variables. The proposed method provides a unified procedure for the selection of relevant genes and the prediction of survivor functions. RESULTS: We demonstrate the performance of the method on simulated examples and on several microarray datasets. For the simulation study, we consider scenarios with large number of noisy variables and different degrees of correlation between the relevant and non-relevant (noisy) variables. We are able to identify the correct covariates and obtain good prediction of the survivor functions. For the microarray applications, some of our selected genes are known to be related to the diseases under study and a few are in agreement with findings from other researchers. AVAILABILITY: The Matlab code for implementing the Bayesian variable selection method may be obtained from the corresponding author. CONTACT: mvannucci@stat.tamu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

7.
《Process Biochemistry》2014,49(2):188-194
As the key precursor for L-ascorbic acid synthesis, 2-keto-L-gulonic acid (2-KGA) is widely produced by the mixed culture of Bacillus megaterium and Ketogulonicigenium vulgare. In this study, a Bayesian combination of multiple neural networks is developed to obtain accurate prediction of the product formation. The historical batches are classified into three categories with a batch classification algorithm based on the statistical analysis of the product formation profiles. For each category, an artificial neural network is constructed. The input vector of the neural network consists of a series of time-discretized process variables. The output of the neural network is the predicted product formation. The training database for each neural network is composed of both the input–output data pairs from the historical batches in the corresponding category, and all the available data pairs collected from the batch of present interest. The prediction of the product formation is performed through a Bayesian combination of the three trained neural networks. Validation was carried out in a Chinese pharmaceutical factory for 140 industrial batches in total, and the average root mean square error (RMSE) is 2.2% and 2.6% for 4 h and 8 h ahead prediction of product formation, respectively.
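The combination step — weighting each category's predictor by how well it explains the data observed so far from the current batch, via Bayes' rule — might be sketched as follows. Plain callables stand in for the trained neural networks, and the Gaussian error model with fixed sigma is an illustrative assumption, not the paper's formulation:

```python
import math

def bayesian_combination(models, priors, x_obs, y_obs, x_new, sigma=1.0):
    """Posterior-weighted combination of category-specific predictors.

    Each model's posterior weight is its prior times a Gaussian likelihood
    of the (x, y) pairs collected so far from the batch of present interest;
    the combined prediction is the posterior-weighted sum at x_new."""
    logpost = []
    for m, p in zip(models, priors):
        ll = sum(-((m(x) - y) ** 2) / (2 * sigma ** 2) for x, y in zip(x_obs, y_obs))
        logpost.append(math.log(p) + ll)
    mx = max(logpost)                       # log-sum-exp stabilization
    w = [math.exp(lp - mx) for lp in logpost]
    z = sum(w)
    w = [wi / z for wi in w]
    return sum(wi * m(x_new) for wi, m in zip(w, models))
```

As measurements accumulate within a batch, the weight concentrates on the category whose model best matches the running profile, so the combined forecast tracks the most plausible batch type.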

8.
The development of clinical prediction models requires the selection of suitable predictor variables. Techniques to perform objective Bayesian variable selection in the linear model are well developed and have been extended to the generalized linear model setting as well as to the Cox proportional hazards model. Here, we consider discrete time‐to‐event data with competing risks and propose methodology to develop a clinical prediction model for the daily risk of acquiring a ventilator‐associated pneumonia (VAP) attributed to P. aeruginosa (PA) in intensive care units. The competing events for a PA VAP are extubation, death, and VAP due to other bacteria. Baseline variables are potentially important to predict the outcome at the start of ventilation, but may lose some of their predictive power after a certain time. Therefore, we use a landmark approach for dynamic Bayesian variable selection where the set of relevant predictors depends on the time already spent at risk. We finally determine the direct impact of a variable on each competing event through cause‐specific variable selection.  相似文献   

9.
1. Early versions of the river invertebrate prediction and classification system (RIVPACS) used TWINSPAN to classify reference sites based on the macro-invertebrate fauna, followed by multiple discriminant analysis (MDA) for prediction of the fauna to be expected at new sites from environmental variables. This paper examines some alternative methods for the initial site classification and a different technique for prediction. 2. A data set of 410 sites from RIVPACS II was used for initial screening of seventeen alternative methods of site classification. Multiple discriminant analysis was used to predict classification group from environmental variables. 3. Five of the classification–prediction systems which showed promise were developed further to facilitate prediction of taxa at species and at Biological Monitoring Working Party (BMWP) family level. 4. The predictive capability of these new systems, plus RIVPACS II, was tested on an independent data set of 101 sites from locations throughout Great Britain. 5. Differences between the methods were often marginal but two gave the most consistently reliable outputs: the original TWINSPAN method, and the ordination method semi-strong hybrid multidimensional scaling (SSH) followed by K-means clustering. 6. Logistic regression, an alternative approach to prediction which does not require the prior development of a classification system, was also examined. Although its performance fell within the range offered by the other five systems tested, it conveyed no advantages over them. 7. This study demonstrated that several different multivariate methods were suitable for developing a reliable system for predicting expected probability of occurrence of taxa. This is because the prediction system involves a weighted average smoothing across site groupings. 8. Hence, the two most promising procedures for site classification, coupled to MDA, were both used in the exploratory analyses for RIVPACS III development, which utilized over 600 reference sites.

10.
Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com.
This is a PLOS Computational Biology Software Article

11.
To explore high-resolution digital soil mapping methods for hilly regions at the small-watershed scale, landscape-unit classification was investigated and combined with multi-scale Geomorphons (GM) microtopographic classification data to form a group of categorical variables for the predictive mapping of high-resolution soil pH, clay content, and cation exchange capacity (CEC); these were combined with, and compared against, conventional digital elevation model (DEM) derivatives and remote sensing variables. Three machine learning models (support vector machine, partial least squares regression, and random forest) were screened, and the best was combined with residual regression kriging to build and evaluate the prediction models. The results showed that the landscape and multi-scale microtopographic categorical variable group improved the prediction accuracy of pH, clay content, and CEC in the small-watershed hilly area by 18.8%, 8.2%, and 8.7%, respectively. The landscape-unit classification map, which incorporates vegetation information, contributed more to the models than land-use data, and the 5 m resolution GM microtopographic classification map was better suited to high-accuracy predictive mapping than lower-resolution maps. For clay content, the composite random forest model gave the highest prediction accuracy, whereas for pH and CEC adding residual regression kriging on top of the random forest model was not appropriate. Models combining the landscape/multi-scale microtopographic categorical variables, DEM derivatives, and remote sensing variables performed best, indicating that multi-source variables capture more effective soil information in areas of rolling terrain than any single data source. As the main variables, the landscape categorical variable group composed of GM data and land-surface landscape data explained about 40% of the spatial variation of some soil properties in the small-watershed hilly area. In similar soil predictive mapping studies, multi-resolution GM and landscape classification data have the potential to serve as environmental covariates in building prediction models.

12.
A Bayesian procedure is developed for the selection of concomitant variables in survival models. The variables are selected in a step-up procedure according to the criterion of maximum expected likelihood, where the expectation is over the prior parameter space. Prior knowledge of the influence of these covariates on patient prognosis is incorporated into the analysis. The step-up procedure is stopped when the Bayes factor in favor of omitting the variable selected in a particular step exceeds a specified value. The resulting model with the selected variables is fitted using Bayes estimates of the coefficients. This technique is applied to Hodgkin's disease data from a large Cooperative Clinical Trial Group and the results are compared to the results from the classical likelihood selection procedure.  相似文献   
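The step-up procedure with a Bayes-factor stopping rule can be sketched generically. Here `score` is a caller-supplied log marginal likelihood standing in for the paper's maximum-expected-likelihood criterion, and the threshold of 3 is an arbitrary illustrative choice (all names are hypothetical):

```python
import math

def step_up_select(candidates, score, bf_threshold=3.0):
    """Greedy step-up variable selection with a Bayes-factor stopping rule.

    At each step the variable that most improves `score` is considered;
    selection stops when the Bayes factor in favor of *omitting* that
    variable, exp(score(without) - score(with)), exceeds `bf_threshold`."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda v: score(selected + [v]))
        bf_omit = math.exp(score(selected) - score(selected + [best]))
        if bf_omit > bf_threshold:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

The stopping rule mirrors the abstract: the procedure halts as soon as the evidence favors leaving out the best remaining covariate rather than adding it.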

13.
Lung squamous cell carcinoma (LUSC) samples from The Cancer Genome Atlas database were taken as the dataset to study genome-wide changes in gene expression from normal tissue to stage I disease in LUSC patients, to search for early markers closely related to LUSC onset, and to build a tumor prediction model based on early marker genes. Methods: early LUSC markers were identified with a screening approach combining pattern-recognition classification with gene pathway and function analysis, and a tumor prediction model was built using Fisher discriminant analysis. Twelve early LUSC markers were obtained: CLDN18, CD34, ESAM, JAM2, CDH5, F11, F8, CFD, MRC1, MARCO, SFTPA2 and SFTPA1; after machine learning modeling, the classification accuracy for early LUSC cancer samples versus normal lung tissue samples exceeded 98%. An early LUSC prediction model built from the genes SFTPA1 and ESAM classified normal lung tissue and stage I LUSC samples with a sensitivity of 99.18% and a specificity of 100%, and classification accuracy on an independent validation set was also above 90%. Conclusion: the 12 early molecular markers identified are promising diagnostic markers for LUSC, and the highly accurate prediction model can support studies of LUSC pathogenesis and early tumor prediction.

14.
Cancer subtype classification and survival prediction both relate directly to patients' specific treatment plans, making them fundamental medical issues. Although the two factors are interrelated learning problems, most studies tackle each separately. In this paper, expression levels of genes are used for both cancer subtype classification and survival prediction. We considered 350 diffuse large B-cell lymphoma (DLBCL) subjects, taken from four groups of patients (activated B-cell-like subtype dead, activated B-cell-like subtype alive, germinal center B-cell-like subtype dead, and germinal center B-cell-like subtype alive). As classification features, we used 11,271 gene expression levels of each subject. The features were first ranked by the mRMR (Maximum Relevance Minimum Redundancy) principle and further selected by the IFS (Incremental Feature Selection) procedure. A 35-gene signature was selected after the IFS procedure, and the patients were divided into the above mentioned four groups. These four groups were combined in different ways for subtype prediction and survival prediction, specifically, the activated versus the germinal center and the alive versus the dead. Subtype prediction accuracy of the 35-gene signature was 98.6%. We calculated cumulative survival time of the high-risk and low-risk groups by the Kaplan-Meier method. The log-rank test p-value was 5.98e-08. Our methodology provides a way to study subtype classification and survival prediction simultaneously. Our results suggest that for some diseases, especially cancer, subtype classification may be used to predict survival, and, conversely, survival prediction features may shed light on subtype features.
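The mRMR ranking step — greedily picking the feature that maximizes relevance to the class label minus mean redundancy with the features already picked — can be sketched as follows. The toy score tables stand in for the mutual-information estimates actually used on expression data:

```python
def mrmr_rank(relevance, redundancy, k):
    """Greedy mRMR ordering of feature indices.

    relevance[i] scores feature i against the class label;
    redundancy[i][j] scores features i and j against each other
    (e.g. mutual information). Returns the first k picked indices."""
    picked = []
    remaining = list(range(len(relevance)))
    while remaining and len(picked) < k:
        def score(i):
            # Mean redundancy with already-picked features (0 for the first pick).
            red = sum(redundancy[i][j] for j in picked) / len(picked) if picked else 0.0
            return relevance[i] - red
        best = max(remaining, key=score)
        picked.append(best)
        remaining.remove(best)
    return picked
```

Note how a highly relevant but redundant feature is demoted: once its near-duplicate is picked, its redundancy penalty pushes a less relevant but complementary feature ahead of it.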

15.

Background

Accurate estimations of life expectancy are important in the management of patients with metastatic cancer affecting the extremities, and help set patient, family, and physician expectations. Clinically, the decision whether to operate on patients with skeletal metastases, as well as the choice of surgical procedure, are predicated on an individual patient's estimated survival. Currently, there are no reliable methods for estimating survival in this patient population. Bayesian classification, which includes Bayesian belief network (BBN) modeling, is a statistical method that explores conditional, probabilistic relationships between variables to estimate the likelihood of an outcome using observed data. Thus, BBN models are being used with increasing frequency in a variety of diagnoses to codify complex clinical data into prognostic models. The purpose of this study was to determine the feasibility of developing Bayesian classifiers to estimate survival in patients undergoing surgery for metastases of the axial and appendicular skeleton.

Methods

We searched an institution-owned patient management database for all patients who underwent surgery for skeletal metastases between 1999 and 2003. We then developed and trained a machine-learned BBN model to estimate survival in months using candidate features based on historical data. Ten-fold cross-validation and receiver operating characteristic (ROC) curve analysis were performed to evaluate the BBN model's accuracy and robustness.

Results

A total of 189 consecutive patients were included. First-degree predictors of survival differed between the 3-month and 12-month models. Following cross validation, the area under the ROC curve was 0.85 (95% CI: 0.80–0.93) for 3-month probability of survival and 0.83 (95% CI: 0.77–0.90) for 12-month probability of survival.

Conclusions

A robust, accurate, probabilistic naïve BBN model was successfully developed using observed clinical data to estimate individualized survival in patients with operable skeletal metastases. This method warrants further development and must be externally validated in other patient populations.  相似文献   

16.
Individualized approaches to prognosis are crucial to effective management of cancer patients. We developed a methodology to assign individualized 5-year disease-specific death probabilities to 1,222 patients with melanoma and to 1,225 patients with breast cancer. For each cancer, three risk subgroups were identified by stratifying patients according to initial stage, and prediction probabilities were generated based on the factors most closely related to 5-year disease-specific death. Separate subgroup probabilities were merged to form a single composite index, and its predictive efficacy was assessed by several measures, including the area (AUC) under its receiver operating characteristic (ROC) curve. The patient-centered methodology achieved an AUC of 0.867 in the prediction of 5-year disease-specific death, compared with 0.787 using the AJCC staging classification alone. When applied to breast cancer patients, it achieved an AUC of 0.907, compared with 0.802 using the AJCC staging classification alone. A prognostic algorithm produced from a randomly selected training subsample of 800 melanoma patients preserved 92.5% of its prognostic efficacy (as measured by AUC) when the same algorithm was applied to a validation subsample containing the remaining patients. Finally, the tailored prognostic approach enhanced the identification of high-risk candidates for adjuvant therapy in melanoma. These results describe a novel patient-centered prognostic methodology with improved predictive efficacy when compared with AJCC stage alone in two distinct malignancies drawn from two separate populations.  相似文献   

17.
MOTIVATION: An important area of research in the postgenomics era is to relate high-dimensional genetic or genomic data to various clinical phenotypes of patients. Due to large variability in time to certain clinical events among patients, studying possibly censored survival phenotypes can be more informative than treating the phenotypes as categorical variables. Due to high dimensionality and censoring, building a predictive model for time to event is more difficult than the classification/linear regression problem. We propose to develop a boosting procedure using smoothing splines for estimating the general proportional hazards models. Such a procedure can potentially be used for identifying non-linear effects of genes on the risk of developing an event. RESULTS: Our empirical simulation studies showed that the procedure can indeed recover the true functional forms of the covariates and can identify important variables that are related to the risk of an event. Results from predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed method can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. In addition, there is clear evidence of non-linear effects of some genes on survival time.  相似文献   

18.
Classification tree models are flexible analysis tools which have the ability to evaluate interactions among predictors as well as generate predictions for responses of interest. We describe Bayesian analysis of a specific class of tree models in which binary response data arise from a retrospective case-control design. We are also particularly interested in problems with potentially very many candidate predictors. This scenario is common in studies concerning gene expression data, which is a key motivating example context. Innovations here include the introduction of tree models that explicitly address and incorporate the retrospective design, and the use of nonparametric Bayesian models involving Dirichlet process priors on the distributions of predictor variables. The model specification influences the generation of trees through Bayes' factor based tests of association that determine significant binary partitions of nodes during a process of forward generation of trees. We describe this constructive process and discuss questions of generating and combining multiple trees via Bayesian model averaging for prediction. Additional discussion of parameter selection and sensitivity is given in the context of an example which concerns prediction of breast tumour status utilizing high-dimensional gene expression data; the example demonstrates the exploratory/explanatory uses of such models as well as their primary utility in prediction. Shortcomings of the approach and comparison with alternative tree modelling algorithms are also discussed, as are issues of modelling and computational extensions.  相似文献   

19.

Background

The goal of personalized medicine is to provide patients optimal drug screening and treatment based on individual genomic or proteomic profiles. Reverse-Phase Protein Array (RPPA) technology offers proteomic information of cancer patients which may be directly related to drug sensitivity. For cancer patients with different drug sensitivity, the proteomic profiling reveals important pathophysiologic information which can be used to predict chemotherapy responses.

Results

The goal of this paper is to present a framework for personalized medicine using both RPPA and drug sensitivity (drug resistance or intolerance). In the proposed personalized medicine system, the prediction of drug sensitivity is obtained by a proposed augmented naive Bayesian classifier (ANBC) whose edges between attributes are augmented in the network structure of naive Bayesian classifier. For discriminative structure learning of ANBC, local classification rate (LCR) is used to score augmented edges, and greedy search algorithm is used to find the discriminative structure that maximizes classification rate (CR). Once a classifier is trained by RPPA and drug sensitivity using cancer patient samples, the classifier is able to predict the drug sensitivity given RPPA information from a patient.
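The baseline that ANBC augments with extra edges between attributes is the ordinary naive Bayesian classifier. A minimal discrete version with Laplace smoothing is sketched below; the feature values and class labels are hypothetical, and the augmented-edge structure search scored by LCR is not shown:

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Minimal discrete naive Bayes with Laplace smoothing -- the
    baseline network structure that ANBC extends with edges between
    attributes."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = {c: y.count(c) / len(y) for c in self.classes}
        self.n_c = {c: y.count(c) for c in self.classes}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.values = defaultdict(set)
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(c, j)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        def logpost(c):
            lp = math.log(self.prior[c])
            for j, v in enumerate(row):
                num = self.counts[(c, j)][v] + 1          # Laplace smoothing
                den = self.n_c[c] + len(self.values[j])
                lp += math.log(num / den)
            return lp
        return max(self.classes, key=logpost)
```

ANBC's augmentation would add conditional dependencies between attribute nodes on top of this structure, keeping edges only when they improve the classification rate.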

Conclusion

In this paper we proposed a framework for personalized medicine where a patient is profiled by RPPA and drug sensitivity is predicted by ANBC and LCR. Experimental results with lung cancer data demonstrate that RPPA can be used to profile patients for drug sensitivity prediction by a Bayesian network classifier, and that the proposed ANBC for personalized cancer medicine achieves better prediction accuracy than the naive Bayes classifier on average in small-sample-size data and outperforms other state-of-the-art classifier methods in terms of classification accuracy.

20.

Background  

In high density arrays, the identification of relevant genes for disease classification is complicated by not only the curse of dimensionality but also the highly correlated nature of the array data. In this paper, we are interested in the question of how many and which genes should be selected for a disease class prediction. Our work consists of a Bayesian supervised statistical learning approach to refine gene signatures with a regularization which penalizes for the correlation between the variables selected.  相似文献   
