Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Prediction of medulloblastoma clinical outcome is crucial to personalizing treatment, both to identify high-risk patients for aggressive or alternative therapy and to spare those at low risk from excessive treatment. The best predictors [Pomeroy et al. (2002) Nature 415, 436–442], based on gene expression monitoring at diagnosis, have been much less accurate at recognizing patients with eventual failed outcomes (<50% for the predictor making the fewest total errors) than those who would survive, while a single-gene predictor exhibited the reverse asymmetry. Such inaccuracy in recognizing one of the outcomes is a problem for clinical use. We hypothesized that a non-linear model could be built to significantly improve prediction of medulloblastoma outcome, thereby promoting use of gene-expression-based predictors in a clinical setting. In fact, this approach resulted in fewer errors and much less asymmetry in prediction, and bidirectional accuracy of about 80% could be obtained via its combination with other methods. Indeed, three combinations of methods were identified that yielded significantly better predictions of clinical outcome than previously attained, making feasible predictors of medulloblastoma treatment response with the greatly improved bidirectional accuracy that clinical use requires.
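The asymmetry this abstract describes is easiest to see when accuracy is reported separately for each outcome class. The following is a minimal sketch, assuming outcomes are given as plain labels; the function name `per_class_accuracy` and the example labels are illustrative and are not taken from the paper.

```python
def per_class_accuracy(true_labels, predicted_labels):
    """Return the fraction of correct predictions within each true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    correct, total = {}, {}
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] = total.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == guess)
    return {label: correct[label] / total[label] for label in total}


# Hypothetical labels, for illustration only (not data from the study).
truth = ["fail", "fail", "survive", "survive", "survive", "survive"]
guess = ["fail", "survive", "survive", "survive", "survive", "fail"]
print(per_class_accuracy(truth, guess))
```

A predictor with high overall accuracy can still score poorly on one class, which is the situation the abstract calls a lack of bidirectional accuracy.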

2.
Accurately predicting clinical outcome or metastatic status from gene expression profiles remains one of the biggest hurdles facing the adoption of predictive medicine. Recently, MacDonald et al. (Nat. Genet. 2001, 29, 143-152) used gene expression profiles, from samples taken at diagnosis, to distinguish between clinically designated metastatic and nonmetastatic primary medulloblastomas, helping to elucidate the genetic mechanisms underlying metastasis and suggesting novel therapeutic targets. The obtained accuracy of predicting metastatic status does not, however, reach statistical significance on Fisher's exact test, although 22 training samples were used to make each prediction via leave-one-out testing. This paper introduces readily implemented nonlinear filters to transform sequences of gene expression levels into output signals that are significantly easier to classify and predict metastasis. It is shown that when only 3 exemplars each from the metastatic and nonmetastatic classes were assumed known, a predictor was constructed whose accuracy is statistically significant over the remaining profiles set aside as a test set. The predictor was as effective in recognizing metastatic as nonmetastatic medulloblastomas, and may be helpful in deciding which patients require more aggressive therapy. The same predictor was similarly effective on an independent set of 5 nonmetastatic tumors and 3 metastatic cell lines also used by MacDonald et al.
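The significance check mentioned in this abstract, Fisher's exact test on a table of true versus predicted classes, can be sketched with the standard library alone. This is a one-sided version written for illustration; the function name `fisher_exact_one_sided` and the counts are assumptions and do not reproduce the paper's tables.

```python
from math import comb


def fisher_exact_one_sided(tp, fn, fp, tn):
    """One-sided Fisher's exact p-value for a 2x2 prediction table.

    Rows are true classes and columns are predicted classes:
        [[tp, fn],
         [fp, tn]]
    Returns the probability of at least `tp` correct positive calls
    if predictions were unrelated to the true class (margins fixed).
    """
    true_pos_total = tp + fn          # size of the true positive class
    called_pos_total = tp + fp        # number of positive calls made
    n = tp + fn + fp + tn
    most_extreme = min(true_pos_total, called_pos_total)
    tail = sum(
        comb(true_pos_total, k) * comb(n - true_pos_total, called_pos_total - k)
        for k in range(tp, most_extreme + 1)
    )
    return tail / comb(n, called_pos_total)


# Hypothetical counts, for illustration only.
print(fisher_exact_one_sided(tp=2, fn=1, fp=1, tn=2))
```

With very small test sets, even a perfect table yields a p-value that is only borderline, which is why the number of held-out profiles matters as much as the raw accuracy.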

3.
Although it is increasingly evident that cancer is influenced by signals emanating from tumor stroma, little is known regarding how changes in stromal gene expression affect epithelial tumor progression. We used laser capture microdissection to compare gene expression profiles of tumor stroma from 53 primary breast tumors and derived signatures strongly associated with clinical outcome. We present a new stroma-derived prognostic predictor (SDPP) that stratifies disease outcome independently of standard clinical prognostic factors and published expression-based predictors. The SDPP predicts outcome in several published whole tumor-derived expression data sets, identifies poor-outcome individuals from multiple clinical subtypes, including lymph node-negative tumors, and shows increased accuracy with respect to previously published predictors, especially for HER2-positive tumors. Prognostic power increases substantially when the predictor is combined with existing outcome predictors. Genes represented in the SDPP reveal the strong prognostic capacity of differential immune responses as well as angiogenic and hypoxic responses, highlighting the importance of stromal biology in tumor progression.

4.
Background: Studies show that thousands of genes are associated with prognosis of breast cancer. Towards utilizing available genetic data, efforts have been made to predict outcomes using gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction. However, the RSF model has been shown to significantly outperform the Cox model. 2) Testing was not done to see if a complete set of clinical predictors could predict as well as the gene expression signatures. Methodology/Findings: We address these shortcomings. The METABRIC data set concerns 1981 breast cancer tumors. Features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using the clinical data and the gene expression data to their performance using only the clinical data. We obtain significantly better results when we use both clinical data and gene expression data for 5-year, 10-year, and 15-year survival prediction. When we replace the gene expression data with PAM50 subtype, our results are significant only for 5-year and 15-year prediction. We obtain significantly better results using the RSF model over the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival. Conclusions/Significance: Our results indicate that we can obtain improved survival prediction using clinical data and gene expression data compared to prediction using only clinical data. We further conclude that we can obtain improved survival prediction using the RSF model instead of the Cox model. These results are significant because by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making.

5.
Sequence-based prediction of protein secondary structure (SS) enjoys widespread and increasing use for the analysis and prediction of numerous structural and functional characteristics of proteins. The lack of a recent comprehensive and large-scale comparison of the numerous prediction methods results in an often arbitrary selection of an SS predictor. To address this void, we compare and analyze 12 popular, standalone and high-throughput predictors on a large set of 1975 proteins to provide in-depth, novel and practical insights. We show that there is no universally best predictor and thus detailed comparative studies are needed to support informed selection of SS predictors for a given application. Our study shows that the three-state accuracy (Q3) and segment overlap (SOV3) of the SS prediction currently reach 82% and 81%, respectively. We demonstrate that carefully designed consensus-based predictors improve the Q3 by an additional 2% and that homology modeling-based methods are significantly better by 1.5% Q3 than ab initio approaches. Our empirical analysis reveals that solvent-exposed and flexible coils are predicted with a higher quality than the buried and rigid coils, while the inverse is true for the strands and helices. We also show that longer helices are easier to predict, which is in contrast to longer strands that are harder to find. The current methods confuse 1-6% of strand residues with helical residues and vice versa, and they perform poorly for residues in the β-bridge and 3(10)-helix conformations. Finally, we compare predictions of the standalone implementations of four well-performing methods with their corresponding web servers.
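The three-state accuracy (Q3) quoted in this abstract is the percentage of residues whose predicted state matches the observed one. A minimal sketch follows, assuming each structure is a string over the letters H (helix), E (strand) and C (coil); the function name `q3_accuracy` and the example strings are illustrative.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical six-residue example: one coil residue is miscalled as strand.
print(q3_accuracy("HHHECC", "HHHCCC"))
```

Q3 treats every residue independently, which is why the abstract also reports the segment overlap measure (SOV3); that measure is not implemented here.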

6.
Measurements from microarrays and other high-throughput technologies are susceptible to non-biological artifacts like batch effects. It is known that batch effects can alter or obscure the set of significant results and biological conclusions in high-throughput studies. Here we examine the impact of batch effects on predictors built from genomic technologies. To investigate batch effects, we collected publicly available gene expression measurements with known outcomes, and estimated batches using date. Using these data we show (1) the impact of batch effects on prediction depends on the correlation between outcome and batch in the training data, and (2) removing expression measurements most affected by batch before building predictors may improve the accuracy of those predictors. These results suggest that (1) training sets should be designed to minimize correlation between batches and outcome, and (2) methods for identifying batch-affected probes should be developed to improve prediction results for studies with high correlation between batches and outcome.

7.
MOTIVATION: It is important to predict the outcome of patients with diffuse large-B-cell lymphoma after chemotherapy, since the survival rate after treatment of this common lymphoma is <50%. Both clinically based outcome predictors and gene expression-based molecular factors have been proposed independently in disease prognosis. However, combining the high-dimensional genomic data and the clinically relevant information to predict disease outcome is challenging. RESULTS: We describe an integrated clinicogenomic modeling approach that combines gene expression profiles and the clinically based International Prognostic Index (IPI) for personalized prediction of disease outcome. Dimension reduction methods are proposed to produce linear combinations of gene expressions, while taking into account clinical IPI information. The extracted summary measures capture all the regression information of the censored survival phenotype given both genomic and clinical data, and are employed as covariates in the subsequent survival model formulation. A case study of diffuse large-B-cell lymphoma data and Monte Carlo simulations both demonstrate that the proposed integrative modeling improves prediction accuracy, delivering predictions more accurate than those achieved by using either clinical data or molecular predictors alone.

8.
Ocean currents are expected to be the predominant environmental factor influencing the dispersal of planktonic larvae or spores; yet, their characterization as predictors of marine connectivity has been hindered by a lack of understanding of how best to use oceanographic data. We used a high-resolution oceanographic model output and Lagrangian particle simulations to derive oceanographic distances (hereafter called transport times) between sites studied for Macrocystis pyrifera genetic differentiation. We build upon the classical isolation-by-distance regression model by asking how much additional variability in genetic differentiation is explained when adding transport time as a predictor. We explored the extent to which gene flow is dependent upon seasonal changes in ocean circulation. Because oceanographic transport between two sites is inherently asymmetric, we also compare the explanatory power of models using the minimum or the mean transport times. Finally, we compare the direction of connectivity as estimated by the oceanographic model and genetic assignment tests. We show that the minimum transport time had higher explanatory power than the mean transport time, revealing the importance of considering asymmetry in ocean currents when modelling gene flow. Genetic assignment tests were much less effective in determining asymmetry in gene flow. Summer-derived transport times, in particular for the month of June, which had the strongest current speed, greatest asymmetry and highest spore production, resulted in the best-fit model explaining twice the variability in genetic differentiation relative to models that use geographic distance or habitat continuity. The best overall model also included habitat continuity and explained 65% of the variation in genetic differentiation among sites.
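To make the minimum-versus-mean comparison in this abstract concrete, the sketch below collapses an asymmetric matrix of transport times into one value per site pair. The function name `symmetrize`, the matrix layout (`times[i][j]` is the time from site i to site j) and the numbers are illustrative assumptions, not the authors' code or data.

```python
def symmetrize(times, how):
    """Collapse directed transport times into one symmetric value per site pair.

    times[i][j] is the transport time from site i to site j.
    how is "min" (faster direction) or "mean" (average of both directions).
    """
    combine = {"min": min, "mean": lambda a, b: (a + b) / 2}[how]
    n = len(times)
    return [
        [0.0 if i == j else combine(times[i][j], times[j][i]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical transport times (days) between three sites.
times = [
    [0, 2, 10],
    [8, 0, 4],
    [30, 6, 0],
]
print(symmetrize(times, "min"))
print(symmetrize(times, "mean"))
```

When currents are strongly one-directional, the mean is dominated by the slow return path, whereas the minimum keeps the route that spores could plausibly take.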

9.
BACKGROUND: Limited controlled data exist to guide treatment choices for clinicians caring for patients with major depressive disorder (MDD). Although many putative predictors of treatment response have been reported, most were identified through retrospective analyses of existing datasets and very few have been replicated in a manner that can impact clinical practice. One major confound in previous studies examining predictors of treatment response is the patient's treatment history, which may affect both the predictor of interest and treatment outcomes. Moreover, prior treatment history is an important source of selection bias, thereby limiting generalizability. Consequently, we initiated a randomized clinical trial designed to identify factors that moderate response to three treatments for MDD among patients never treated previously for the condition. METHODS: Treatment-naive adults aged 18-65 years with moderate-to-severe, non-psychotic MDD are randomized equally to one of three 12-week treatment arms: 1) cognitive behavior therapy (CBT, 16 sessions), 2) duloxetine (30-60 mg/d), or 3) escitalopram (10-20 mg/d). Prior to randomization, patients undergo multiple assessments, including resting state functional magnetic resonance imaging (fMRI), immune markers, DNA and gene expression products, and dexamethasone-corticotropin releasing hormone (Dex/CRH) testing. Prior to or shortly after randomization, patients also complete a comprehensive personality assessment. Repeat assessments of the biological measures (fMRI, immune markers, and gene expression products) occur at an early time-point in treatment and upon completion of the 12-week treatment, when a second Dex/CRH test is also conducted. Patients remitting by the end of this acute treatment phase are then eligible to enter a 21-month follow-up phase, with quarterly visits to monitor for recurrence. Non-remitters are offered augmentation treatment for a second 12-week course of treatment, during which they receive a combination of CBT and antidepressant medication. Predictors of the primary outcome, remission, will be identified for overall and treatment-specific effects, and a statistical model incorporating multiple predictors will be developed to predict outcomes. DISCUSSION: The PReDICT study's evaluation of biological, psychological, and clinical factors that may differentially impact treatment outcomes represents a sizeable step toward developing personalized treatments for MDD. Identified predictors should help guide the selection of initial treatments, and identify those patients most vulnerable to recurrence, who thus warrant maintenance or combination treatments to achieve and maintain wellness.

10.
Previous studies have reported conflicting assessments of the ability of cell line-derived multi-gene predictors (MGPs) to forecast clinical outcomes in cancer patients, thereby warranting an investigation into their suitability for this task. Here, 42 breast cancer cell lines were evaluated by chemoresponse tests after treatment with either TFAC or FEC, two widely used standard combination chemotherapies for breast cancer. We used two different training cell line sets and two independent prediction methods, superPC and COXEN, to develop cell line-based MGPs, which were then validated in five patient cohorts treated with these chemotherapies. This evaluation yielded high prediction performance by these MGPs, regardless of the training set, chemotherapy, or prediction method. The MGPs were also able to predict patient clinical outcomes for the subgroup of estrogen receptor (ER)-negative patients, which has proven difficult in the past. These results demonstrate the potential of using in vitro-based chemoresponse data as a model system in creating MGPs for stratifying patients' therapeutic responses. The clinical utility and applications of these MGPs will need to be carefully examined with relevant clinical outcome measurements and constraints in practical use.

11.
Biophysical Journal, 2021, 120(20): 4312-4319
Intrinsically disordered proteins and protein regions make up a substantial fraction of many proteomes in which they play a wide variety of essential roles. A critical first step in understanding the role of disordered protein regions in biological function is to identify those disordered regions correctly. Computational methods for disorder prediction have emerged as a core set of tools to guide experiments, interpret results, and develop hypotheses. Given the multiple different predictors available, consensus scores have emerged as a popular approach to mitigate biases or limitations of any single method. Consensus scores integrate the outcome of multiple independent disorder predictors and provide a per-residue value that reflects the number of tools that predict a residue to be disordered. Although consensus scores help mitigate the inherent problems of using any single disorder predictor, they are computationally expensive to generate. They also necessitate the installation of multiple different software tools, which can be prohibitively difficult. To address this challenge, we developed a deep-learning-based predictor of consensus disorder scores. Our predictor, metapredict, utilizes a bidirectional recurrent neural network trained on the consensus disorder scores from 12 proteomes. By benchmarking metapredict using two orthogonal approaches, we found that metapredict is among the most accurate disorder predictors currently available. Metapredict is also remarkably fast, enabling proteome-scale disorder prediction in minutes. Importantly, metapredict is fully open source and is distributed as a Python package, a collection of command-line tools, and a web server, maximizing the potential practical utility of the predictor. We believe metapredict offers a convenient, accessible, accurate, and high-performance predictor for single proteins and proteomes alike.
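The consensus score described in this abstract, a per-residue value reflecting how many tools call a residue disordered, can be sketched as a simple vote. This is an illustration of the idea only: the function name `consensus_disorder` and the choice to normalize by the number of tools are assumptions, and the code does not use or reproduce metapredict's own API.

```python
def consensus_disorder(calls):
    """Per-residue fraction of tools that call each residue disordered.

    calls is a list with one entry per tool; each entry is a sequence of
    0/1 values (1 = disordered), all of the same length as the protein.
    """
    if len({len(tool_calls) for tool_calls in calls}) != 1:
        raise ValueError("need at least one tool, and all tools must cover the same residues")
    n_tools = len(calls)
    return [sum(column) / n_tools for column in zip(*calls)]


# Hypothetical binary calls from three tools on a four-residue peptide.
calls = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
print(consensus_disorder(calls))
```

Producing the input for such a vote requires running every underlying tool, which is the cost that motivates training a single network to predict the consensus directly.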

12.
The concept of the reward prediction error (the difference between reward obtained and reward predicted) continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static "bandit" choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
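The update rule this abstract refers to can be written in a few lines: the prediction error is the reward obtained minus the reward predicted, and a different learning rate is applied depending on its sign. The sketch below is a generic illustration; the function name `update_value` and the parameter names `alpha_pos` and `alpha_neg` are assumptions, and the adaptive algorithm from the paper is not implemented.

```python
def update_value(value, reward, alpha_pos, alpha_neg):
    """One learning step with separate rates for positive and negative errors."""
    prediction_error = reward - value          # reward obtained minus reward predicted
    rate = alpha_pos if prediction_error >= 0 else alpha_neg
    return value + rate * prediction_error


# Hypothetical rates: learn faster from good surprises than from bad ones.
estimate = 0.5
estimate = update_value(estimate, reward=1.0, alpha_pos=0.4, alpha_neg=0.1)
estimate = update_value(estimate, reward=0.0, alpha_pos=0.4, alpha_neg=0.1)
print(estimate)
```

Setting `alpha_pos` equal to `alpha_neg` recovers the usual single-rate model; unequal rates bias each estimate, and the abstract's claim is that this bias can improve the separation between options.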

13.
Some of the factors that predict the outcome of contests between male tree lizards, Urosaurus ornatus, were determined using logistic regression modelling on matched-pair data. Two-day-long encounters were staged between pairs of males differing in size (snout-vent length and mass), previous contest status (previous winners and previous losers), and coloration (dorsal coloration during their previous contest and throat coloration, a fixed trait). Mass proved to be the best single predictor of contest outcome, resulting in an 80% correct classification rate for predicting winners and losers, far better than the less than 57% correct classification rate for snout-vent length. Previous social status (winner or loser) also was a powerful single predictor of contest outcome with a 79.3% correct classification rate, as was previous dorsal coloration (76.7%). When combined, mass and previous status produced the strongest combination of predictors with a better than 86% correct classification rate. Contrary to several previous studies, which implicated throat coloration as an important status signal of dominance, our results failed to show that throat coloration is a strong predictor of contest outcome. Possible reasons for this discrepancy with earlier findings are discussed. The logistic regression models also allow prediction of the magnitude of difference in mass between two contestants for there to be an equal chance of winning, given a second asymmetry in contest predictors.
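The final sentence of this abstract follows directly from the form of a logistic model. As a sketch, assume a two-predictor model without intercept, where p is the probability that the focal male wins, Δm is the difference in mass, and Δs is the difference in a second predictor such as previous status; the symbols and the no-intercept form are assumptions made for illustration, not the fitted model from the paper.

```latex
\begin{align}
\log\frac{p}{1-p} &= \beta_m\,\Delta m + \beta_s\,\Delta s \\
p = \tfrac{1}{2} \;\Rightarrow\; 0 &= \beta_m\,\Delta m^{*} + \beta_s\,\Delta s \\
\Delta m^{*} &= -\frac{\beta_s}{\beta_m}\,\Delta s
\end{align}
```

Under these assumptions, Δm* is the mass difference at which the two contestants have an equal chance of winning: the mass advantage needed to offset exactly the other male's advantage in the second predictor.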

14.
Previous work in predicting protein localization to the chloroplast organelle in plants led to the development of an artificial neural network-based approach capable of remarkable accuracy in its prediction (ChloroP). A common criticism against such neural network models is that it is difficult to interpret the criteria that are used in making predictions. We address this concern with several new prediction methods that base predictions explicitly on the abundance of different amino acid types in the N-terminal region of the protein. Our successful prediction accuracy suggests that ChloroP uses little positional information in its decision-making, an unexpected result given the elaborate ChloroP input scheme. By removing positional information, our simpler methods allow us to identify those amino acids that are useful for successful prediction. The identification of important sequence features, such as amino acid content, is advantageous if one of the goals of localization predictors is to gain an understanding of the biological process of chloroplast localization. Our most accurate predictor combines principal component analysis and logistic regression. Web-based prediction using this method is available online at http://apicoplast.cis.upenn.edu/pclr/.
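The position-free features this abstract describes, the abundance of each amino acid type in the N-terminal region, can be computed as below. This is a minimal sketch: the function name `n_terminal_composition`, the default window of 50 residues and the example sequence are assumptions, since the abstract does not state the region length, and the downstream principal component analysis and logistic regression are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def n_terminal_composition(sequence, window=50):
    """Fraction of each standard amino acid within the first `window` residues."""
    region = sequence[:window].upper()
    counted = [residue for residue in region if residue in AMINO_ACIDS]
    if not counted:
        raise ValueError("no standard amino acids found in the N-terminal region")
    return {aa: counted.count(aa) / len(counted) for aa in AMINO_ACIDS}


# Hypothetical sequence, for illustration only.
features = n_terminal_composition("MASSSLTTTA", window=4)
print(features["S"], features["M"])
```

Because the result is a fixed-length vector of 20 fractions with no positional information, each coefficient of a model fitted on it can be read as the contribution of one amino acid type.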

15.
In this paper, we propose a unified Bayesian joint modeling framework for studying association between a binary treatment outcome and a baseline matrix-valued predictor. Specifically, a joint modeling approach relating an outcome to a matrix-valued predictor through a probabilistic formulation of multilinear principal component analysis is developed. This framework establishes a theoretical relationship between the outcome and the matrix-valued predictor, although the predictor is not explicitly expressed in the model. Simulation studies are provided showing that the proposed method is superior or competitive to other methods, such as a two-stage approach and a classical principal component regression in terms of both prediction accuracy and estimation of association; its advantage is most notable when the sample size is small and the dimensionality in the imaging covariate is large. Finally, our proposed joint modeling approach is shown to be a very promising tool in an application exploring the association between baseline electroencephalography data and a favorable response to treatment in a depression treatment study by achieving a substantial improvement in prediction accuracy in comparison to competing methods.

16.
A significant step towards establishing the structure and function of a protein is the prediction of the local conformation of the polypeptide chain. In this article, we present systems for the prediction of three new alphabets of local structural motifs. The motifs are built by applying multidimensional scaling (MDS) and clustering to pair-wise angular distances for multiple phi-psi angle values collected from high-resolution protein structures. The predictive systems, based on ensembles of bidirectional recurrent neural network architectures, and trained on a large non-redundant set of protein structures, achieve 72%, 66%, and 60% correct motif prediction on an independent test set for di-peptides (six classes), tri-peptides (eight classes) and tetra-peptides (14 classes), respectively, 28-30% above baseline statistical predictors. We then build a further system, based on ensembles of two-layered bidirectional recurrent neural networks, to map structural motif predictions into a traditional 3-class (helix, strand, coil) secondary structure. This system achieves 79.5% correct prediction using the "hard" CASP 3-class assignment, and 81.4% with a more lenient assignment, outperforming a sophisticated state-of-the-art predictor (Porter) trained in the same experimental conditions. The structural motif predictor is publicly available at: http://distill.ucd.ie/porter+/.

17.
Prediction of patient-centered outcomes in hospitals is useful for performance benchmarking, resource allocation, and guidance regarding active treatment and withdrawal of care. Yet the use of prediction tools by clinicians is limited by the complexity of the available tools and the amount of data they require. We propose to use Disjunctive Normal Forms as a novel approach to predict hospital and 90-day mortality from instance-based patient data, comprising demographic, genetic, and physiologic information in a large cohort of patients admitted with severe community-acquired pneumonia. We develop two algorithms to efficiently learn Disjunctive Normal Forms, which yield easy-to-interpret rules that explicitly map data to the outcome of interest. Disjunctive Normal Forms achieve higher prediction performance than a set of state-of-the-art machine learning models, and unveil insights unavailable with standard methods. Disjunctive Normal Forms constitute an intuitive set of prediction rules that could be easily implemented to predict outcomes and guide criteria-based clinical decision making and clinical trial execution, and thus are of greater practical usefulness than currently available prediction tools. The Java implementation of the tool JavaDNF will be publicly available.
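A Disjunctive Normal Form rule, as used in this abstract, is an OR over clauses, each of which is an AND over simple conditions. The sketch below shows how such a rule could be evaluated on one patient record; the function name `dnf_predict`, the rule encoding and the feature names are hypothetical, and neither the learning algorithms nor JavaDNF itself are reproduced.

```python
def dnf_predict(rule, patient):
    """Evaluate a DNF rule: true if every literal of at least one clause holds.

    rule is a list of clauses; each clause is a list of
    (feature_name, required_value) literals over boolean features.
    """
    return any(
        all(patient[feature] == required for feature, required in clause)
        for clause in rule
    )


# Hypothetical rule: (age_over_75 AND low_blood_pressure) OR on_ventilator.
rule = [
    [("age_over_75", True), ("low_blood_pressure", True)],
    [("on_ventilator", True)],
]
patient = {"age_over_75": True, "low_blood_pressure": False, "on_ventilator": True}
print(dnf_predict(rule, patient))
```

Each clause can be read aloud as a clinical criterion, which is the interpretability advantage the abstract claims over less transparent models.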

18.
19.
20.
Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
