Similar literature
 20 similar articles found
1.
Fröhlich H. PLoS ONE, 2011, 6(10): e25364
Diagnostic and prognostic biomarkers for cancer based on gene expression profiles are viewed as a major step towards a better personalized medicine. Many studies using various computational approaches have been published in this direction during the last decade. However, when different gene signatures for related clinical questions are compared, often only a small overlap is observed. This can have various reasons, such as technical differences between platforms, differences in biological samples or their treatment in the lab, or statistical reasons: the high dimensionality of the data combined with small sample size leads to unstable selection of genes. As a consequence, retrieved gene signatures are often hard to interpret from a biological point of view. Here we demonstrate that it is possible to construct a consensus signature from a set of seemingly different gene signatures by mapping them on a protein interaction network. Common upstream proteins of close gene products, which we identified via our developed algorithm, show a very clear and significant functional interpretation in terms of overrepresented KEGG pathways, disease associated genes and known drug targets. Moreover, we show that such a consensus signature can serve as prior knowledge for predictive biomarker discovery in breast cancer. Evaluation on different datasets shows that signatures derived from the consensus signature reveal a much higher stability than signatures learned from all probesets on a microarray, while at the same time being at least as predictive. Furthermore, they are clearly interpretable in terms of enriched pathways, disease associated genes and known drug targets. In summary, we believe that network based consensus signatures are not only a way to relate seemingly different gene signatures to each other in a functional manner, but also a way to establish prior knowledge for highly stable and interpretable predictive biomarkers.
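The "small overlap" between gene signatures mentioned in the abstract above can be quantified in several ways; the abstract does not say which measure the author used. The following is a minimal sketch that assumes the Jaccard index as the overlap measure. The function names and the gene lists are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union.

    Defined here as 0.0 when both sets are empty.
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(signatures):
    """Return {(name1, name2): jaccard} for every pair of named signatures."""
    return {
        (n1, n2): jaccard(signatures[n1], signatures[n2])
        for n1, n2 in combinations(sorted(signatures), 2)
    }


# Hypothetical gene signatures (illustrative only, not taken from the paper).
example_signatures = {
    "sig_A": {"ESR1", "ERBB2", "MKI67", "AURKA"},
    "sig_B": {"ESR1", "CCNB1", "MYBL2", "BIRC5"},
    "sig_C": {"TOP2A", "CCNB1", "PLAU", "STAT1"},
}

overlaps = pairwise_overlap(example_signatures)
for pair, value in overlaps.items():
    print(pair, round(value, 3))
```

With these invented lists, two pairs share one gene out of seven distinct genes and one pair shares none, which mirrors the low-overlap situation the abstract describes.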

2.
Multivariate biomarkers that can predict the effectiveness of targeted therapy in individual patients are highly desired. Previous biomarker discovery studies have largely focused on the identification of single biomarker signatures, aimed at maximizing prediction accuracy. Here, we present a different approach that identifies multiple biomarkers by simultaneously optimizing their predictive power, number of features, and proximity to the drug target in a protein-protein interaction network. To this end, we incorporated NSGA-II, a fast and elitist multi-objective optimization algorithm that is based on the principle of Pareto optimality, into the biomarker discovery workflow. The method was applied to quantitative phosphoproteome data of 19 non-small cell lung cancer (NSCLC) cell lines from a previous biomarker study. The algorithm successfully identified a total of 77 candidate biomarker signatures predicting response to treatment with dasatinib. Through filtering and similarity clustering, this set was trimmed to four final biomarker signatures, which were then validated on an independent set of breast cancer cell lines. All four candidates reached the same good prediction accuracy (83%) as the originally published biomarker. Although the newly discovered signatures were diverse in their composition and in their size, the central protein of the originally published signature, integrin β4 (ITGB4), was also present in all four Pareto signatures, confirming its pivotal role in predicting dasatinib response in NSCLC cell lines. In summary, the method presented here allows for a robust and simultaneous identification of multiple multivariate biomarkers that are optimized for prediction performance, size, and relevance.
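The principle of Pareto optimality that underlies NSGA-II can be illustrated with a short sketch. This is not NSGA-II itself (it has no population evolution, crowding distance, or ranking into successive fronts); it only filters a fixed list of candidates down to the non-dominated ones under the three objectives named in the abstract. The `Candidate` fields and all numeric values are assumptions made for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float         # to be maximized
    n_features: int         # to be minimized
    target_distance: float  # network distance to the drug target, to be minimized


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least_as_good = (
        a.accuracy >= b.accuracy
        and a.n_features <= b.n_features
        and a.target_distance <= b.target_distance
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.n_features < b.n_features
        or a.target_distance < b.target_distance
    )
    return at_least_as_good and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical candidate signatures (values invented for illustration).
candidates = [
    Candidate("s1", 0.83, 12, 2.0),
    Candidate("s2", 0.83, 5, 3.0),
    Candidate("s3", 0.78, 5, 3.5),  # dominated by s2 on accuracy and distance
    Candidate("s4", 0.70, 2, 1.0),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

The front keeps several signatures that trade off the objectives differently, which is why a multi-objective search returns a set of candidates rather than a single best signature.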

3.
4.
Ensembles are a well established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to, and evaluate its performance on, a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.

5.
Because people age differently, age is not a sufficient marker of susceptibility to disabilities, morbidities, and mortality. We measured nineteen blood biomarkers that include constituents of standard hematological measures, lipid biomarkers, and markers of inflammation and frailty in 4704 participants of the Long Life Family Study (LLFS), age range 30–110 years, and used an agglomerative algorithm to group LLFS participants into clusters thus yielding 26 different biomarker signatures. To test whether these signatures were associated with differences in biological aging, we correlated them with longitudinal changes in physiological functions and incident risk of cancer, cardiovascular disease, type 2 diabetes, and mortality using longitudinal data collected in the LLFS. Signature 2 was associated with significantly lower mortality, morbidity, and better physical function relative to the most common biomarker signature in LLFS, while nine other signatures were associated with less successful aging, characterized by higher risks for frailty, morbidity, and mortality. The predictive values of seven signatures were replicated in an independent data set from the Framingham Heart Study with comparable significant effects, and an additional three signatures showed consistent effects. This analysis shows that various biomarker signatures exist, and their significant associations with physical function, morbidity, and mortality suggest that these patterns represent differences in biological aging. The signatures show that dysregulation of a single biomarker can change with patterns of other biomarkers, and age‐related changes of individual biomarkers alone do not necessarily indicate disease or functional decline.

6.
Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments considering samples with the same type of disease. The lack of a consensus is mostly due to the fact that sample sizes are far smaller than the numbers of candidate features to be considered, and therefore signature selection suffers from large variation. We propose a robust signature selection method that enhances the selection stability of penalized regression algorithms for predicting survival risk. Our method is based on an aggregation of multiple, possibly unstable, signatures obtained with the preconditioned lasso algorithm applied to random (internal) subsamples of a given cohort data, where the aggregated signature is shrunken by a simple thresholding strategy. The resulting method, RS-PL, is conceptually simple and easy to apply, relying on parameters automatically tuned by cross validation. Robust signature selection using RS-PL operates within an (external) subsampling framework to estimate the selection probabilities of features in multiple trials of RS-PL. These probabilities are used for identifying reliable features to be included in a signature. Our method was evaluated on microarray data sets from neuroblastoma, lung adenocarcinoma, and breast cancer patients, extracting robust and relevant signatures for predicting survival risk. Signatures obtained by our method achieved high prediction performance and robustness, consistently over the three data sets. Genes with high selection probability in our robust signatures have been reported as cancer-relevant. 
The ordering of predictor coefficients associated with signatures was well-preserved across multiple trials of RS-PL, demonstrating the capability of our method for identifying a transferable consensus signature. The software is available as an R package rsig at CRAN (http://cran.r-project.org).
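The aggregation step described in this abstract, estimating how often each feature is selected across subsamples and keeping the reliably selected ones, can be sketched as follows. This is only an illustration of the general idea: it does not run the preconditioned lasso and does not reproduce the rsig package or its thresholding rule. The function names, the feature names, and the threshold value are all assumptions.

```python
from collections import Counter


def selection_probabilities(signatures):
    """Fraction of subsample signatures in which each feature was selected."""
    if not signatures:
        return {}
    counts = Counter(f for sig in signatures for f in set(sig))
    return {f: n / len(signatures) for f, n in counts.items()}


def robust_signature(signatures, threshold=0.5):
    """Features whose selection probability reaches the threshold, sorted by name.

    The default threshold of 0.5 is an arbitrary illustrative choice.
    """
    probs = selection_probabilities(signatures)
    return sorted(f for f, p in probs.items() if p >= threshold)


# Hypothetical signatures obtained from four subsamples (feature names invented).
subsample_signatures = [
    {"g1", "g2", "g3"},
    {"g1", "g3", "g7"},
    {"g1", "g4"},
    {"g1", "g3", "g5"},
]
probs = selection_probabilities(subsample_signatures)
robust = robust_signature(subsample_signatures, threshold=0.75)
print(robust)
```

In practice each subsample signature would come from fitting a penalized regression model to a random subset of the cohort; here they are hard-coded so the aggregation logic can be read in isolation.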

7.
8.

Background  

There is an urgent need for new prognostic markers of breast cancer metastases to ensure that newly diagnosed patients receive appropriate therapy. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of developing distant metastases. However, due to the small sample sizes of individual studies, the overlap among signatures is almost zero and their predictive power is often limited. Integrating microarray data from multiple studies in order to increase sample size is therefore a promising approach to the development of more robust prognostic tests.

9.
In the area of omics profiling in toxicology, i.e. toxicogenomics, characteristic molecular profiles have previously been incorporated into prediction models for early assessment of carcinogenic potential and mechanism-based classification of compounds. Traditionally, the biomarker signatures used for model construction were derived from individual high-throughput techniques, such as microarrays designed for monitoring global mRNA expression. In this study, we built predictive models by integrating omics data across complementary microarray platforms and introduced new concepts for modeling of pathway alterations and molecular interactions between multiple biological layers. We trained and evaluated diverse machine learning-based models, differing in the incorporated features and learning algorithms, on a cross-omics dataset encompassing mRNA, miRNA, and protein expression profiles obtained from rat liver samples treated with a heterogeneous set of substances. Most of these compounds could be unambiguously classified as genotoxic carcinogens, non-genotoxic carcinogens, or non-hepatocarcinogens based on evidence from published studies. Since mixed characteristics were reported for the compounds Cyproterone acetate, Thioacetamide, and Wy-14643, we reclassified these compounds as either genotoxic or non-genotoxic carcinogens based on their molecular profiles. Evaluating our toxicogenomics models in a repeated external cross-validation procedure, we demonstrated that the prediction accuracy of our models could be increased by joining the biomarker signatures across multiple biological layers and by adding complex features derived from cross-platform integration of the omics data. Furthermore, we found that adding these features resulted in a better separation of the compound classes and a more confident reclassification of the three undefined compounds as non-genotoxic carcinogens.

10.
Colorectal cancer (CRC) is highly heterogeneous, leading to variable prognosis and treatment responses. Therefore, it is necessary to explore novel personalized and reproducible prognostic signatures to aid clinical decision‐making. The present study combined large‐scale gene expression profiles and clinical data of 1828 patients with CRC from multi‐centre studies and identified a personalized gene prognostic signature consisting of 46 unique genes (called function‐derived personalized gene signature [FunPGS]) from an integrated statistics and function‐derived perspective. In the meta‐training and multiple independent validation cohorts, the FunPGS effectively discriminated patients with CRC with significantly different prognosis at the individual level and remained an independent factor after adjusting for clinical covariates in multivariate analysis. Furthermore, the FunPGS demonstrated superior performance for risk stratification with respect to other recently reported signatures and clinical factors. The complementary value of the molecular signature and clinical factors was further explored, and it was observed that the composite signature called IMCPS greatly improved the predictive performance of survival estimation relative to molecular signatures or clinical factors alone. With further prospective validation in clinical trials, the FunPGS may become a promising and powerful personalized prognostic tool for stratifying patients with CRC in order to achieve an optimal systemic therapy.

11.
GATHER: a systems approach to interpreting genomic signatures   Total citations: 1, self-citations: 0, citations by others: 1
MOTIVATION: Understanding the full meaning of the biology captured in molecular profiles, within the context of the entire biological system, cannot be achieved with a simple examination of the individual genes in the signature. To facilitate such an understanding, we have developed GATHER, a tool that integrates various forms of available data to elucidate biological context within molecular signatures produced from high-throughput post-genomic assays. RESULTS: Analyzing the Rb/E2F tumor suppressor pathway, we show that GATHER identifies critical features of the pathway. We further show that GATHER identifies common biology in a series of otherwise unrelated gene expression signatures that each predict breast cancer outcome. We quantify the performance of GATHER and find that it successfully predicts 90% of the functions over a broad range of gene groups. We believe that GATHER provides an essential tool for extracting the full value from molecular signatures generated from genome-scale analyses. AVAILABILITY: GATHER is available at http://gather.genome.duke.edu/

12.
The success of reproduction depends greatly upon gamete quality, especially that of oocytes, which carry most of the molecular material necessary for early embryogenesis. However, it remains difficult to find relevant morphologic and/or biochemical parameters to assess oocyte quality and thus reliably predict reproductive performance. To understand which criteria are the most reliable to assess the reproductive success of the Eurasian perch (Perca fluviatilis), we measured 14 parameters characterizing female, spawn, oocyte, and embryonic or larval development on 20 independent spawn. A data analysis allowed the definition of two clusters of spawn with different larval characteristics: the first cluster was composed of spawn which led mainly to strong, large larvae presenting a low deformity rate, while the second cluster corresponded to spawn leading to smaller and weaker larvae with a higher deformity rate. Moreover, a third cluster (unfertilized spawn) was studied. Our analysis revealed that most of the prefertilization biological traits that we studied were poorly relevant for predicting larval features, proper embryonic development and deformity occurrences. We thus performed a large scale proteomic analysis to highlight proteins differently expressed in each spawn cluster. A 2D-DIGE study followed by MS/MS spectrometry allowed the identification of 32 proteins involved in several biological functions and differently expressed between spawn clusters. Among them, proteins involved in the cell response to oxidative stress, as well as in energy metabolism, heat shock proteins and vitellogenins are of particular interest. Several functions appear specific to a spawn cluster and could thus explain the corresponding reproductive performance. In the future, proteins involved in those cellular mechanisms may constitute molecular markers predictive of reproductive performance in Perca fluviatilis.

13.
Since the advent of the new proteomics era more than a decade ago, large-scale studies of protein profiling have been used to identify distinctive molecular signatures in a wide array of biological systems, spanning areas of basic biological research, clinical diagnostics, and biomarker discovery directed toward therapeutic applications. Recent advances in protein separation and identification techniques have significantly improved proteomic approaches, leading to enhancement of the depth and breadth of proteome coverage. Proteomic signatures, specific for multiple diseases, including cancer and pre-invasive lesions, are emerging. This article combines, in a simple manner, relevant proteomic and OMICS clues used in the discovery and development of diagnostic and prognostic biomarkers that are applicable to all clinical fields, thus helping to improve applications of clinical proteomic strategies for translational medicine research.

14.
The dawn of a new Proteomics era, just over a decade ago, allowed for large-scale protein profiling studies that have been applied in the identification of distinctive molecular cell signatures. Proteomics provides a powerful approach for identifying and studying these multiple molecular markers in a vast array of biological systems, whether focusing on basic biological research, diagnosis, therapeutics, or systems biology. This is a continuously expanding field that relies on the combination of different methodologies and current advances, both technological and analytical, which have led to an explosion of protein signatures and biomarker candidates. But how are these biological markers obtained? And, most importantly, what can we learn from them? Herein, we briefly overview the currently available approaches for obtaining relevant information at the proteome level, while noting the current and future roles of both traditional and modern proteomics. Moreover, we provide some considerations on how the development of powerful and robust bioinformatics tools will greatly benefit high-throughput proteomics. Such strategies are of the utmost importance in the rapidly emerging field of immunoproteomics, which may play a key role in the identification of antigens with diagnostic and/or therapeutic potential and in the development of new vaccines. Finally, we consider the present limitations in the discovery of new signatures and biomarkers and speculate on how such hurdles may be overcome, while also offering a prospect for the next few years in what could be one of the most significant strategies in translational medicine research.

15.
Fu LM, Fu-Liu CS. FEBS Letters, 2004, 561(1-3): 186-190
Differential diagnosis among a group of histologically similar cancers poses a challenging problem in clinical medicine. Constructing a classifier based on gene expression signatures comprising multiple discriminatory molecular markers derived from microarray data analysis is an emerging trend for cancer diagnosis. Despite the promise of this approach, identifying the best genes for classification using a small number of samples relative to the genome size remains its bottleneck. We have devised a new method of gene selection with reliability analysis, and demonstrated that this method can identify a more compact set of genes than other methods for constructing a classifier with optimum predictive performance for both small round blue cell tumors and leukemia. High consensus between our result and the results produced by methods based on artificial neural networks and statistical techniques provides additional evidence of the validity of our method. This study suggests a way of implementing a reliable molecular cancer classifier based on gene expression signatures.

16.
17.

Purpose

To evaluate the accuracy of the sub-classification of renal cortical neoplasms using molecular signatures.

Experimental Design

A search of publicly available databases was performed to identify microarray datasets with multiple histologic sub-types of renal cortical neoplasms. Meta-analytic techniques were utilized to identify differentially expressed genes for each histologic subtype. The lists of genes obtained from the meta-analysis were used to create predictive signatures through the use of a pair-based method. These signatures were organized into an algorithm to sub-classify renal neoplasms. The use of these signatures according to our algorithm was validated on several independent datasets.

Results

We identified three Gene Expression Omnibus datasets that fit our criteria to develop a training set. All of the datasets in our study utilized the Affymetrix platform. The final training dataset included 149 samples representing the four most common histologic subtypes of renal cortical neoplasms: 69 clear cell, 41 papillary, 16 chromophobe, and 23 oncocytomas. When validation of our signatures was performed on external datasets, we were able to correctly classify 68 of the 72 samples (94%). The correct classification by subtype was 19/20 (95%) for clear cell, 14/14 (100%) for papillary, 17/19 (89%) for chromophobe, and 18/19 (95%) for oncocytomas.

Conclusions

Through the use of meta-analytic techniques, we were able to create an algorithm that sub-classified renal neoplasms on a molecular level with 94% accuracy across multiple independent datasets. This algorithm may aid in selecting molecular therapies and may improve the accuracy of subtyping of renal cortical tumors.
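The validation figures quoted in the Results of this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the per-subtype counts stated there; the function name and the dictionary layout are illustrative choices.

```python
# (correct, total) per subtype, as reported in the Results of the abstract.
validation = {
    "clear cell": (19, 20),
    "papillary": (14, 14),
    "chromophobe": (17, 19),
    "oncocytoma": (18, 19),
}


def accuracy_summary(per_class):
    """Return (per-class accuracy dict, total correct, total samples, overall accuracy)."""
    per_class_acc = {k: c / t for k, (c, t) in per_class.items()}
    correct = sum(c for c, _ in per_class.values())
    total = sum(t for _, t in per_class.values())
    return per_class_acc, correct, total, correct / total


per_class_acc, correct, total, overall = accuracy_summary(validation)
print(correct, total, round(overall * 100))
for subtype, acc in per_class_acc.items():
    print(subtype, round(acc * 100))
```

The per-subtype counts sum to 68 correct out of 72 samples, consistent with the overall 94% reported.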

18.
In the last three decades, predictive models have been developed and applied worldwide for freshwater bioassessment. They consist of statistical tools that follow the concept of the Reference Condition Approach. Composed of several sequential steps, these tools assess the deviation of given site assemblages from the biological condition expected in the absence of human disturbance. The most common approaches (RIVPACS/AUSRIVAS and BEAST) are based on a posteriori classifications that use the biological composition of a community to classify reference sites in groups, and afterwards establish which environmental features best discriminate the biological groups obtained. Here, we review the predictive modeling procedures used in freshwater bioassessment (RIVPACS/AUSRIVAS, BEAST, ANNA, Artificial Neural Networks, Bayesian Belief Networks and others) as well as the biological elements to which they have been applied. We also review the Spanish and Portuguese experiences in the development and application of predictive models, with particular attention to regional environmental conditions, the different modeling approaches, and the available implementation tools. Moreover, considering the natural continuity within the Iberian Peninsula (which includes several transnational rivers), we discuss the possibilities of developing common predictive models across the region, considering all factors that may influence their performance, such as the target scale used to develop the models (regional or peninsular); common reference criteria; sampling and sorting procedures; the taxonomic resolution used in the models; the temporal variability (mainly in the Iberian Mediterranean region); and the biological elements to consider. 
We concluded that there are good technical conditions for the implementation of a common predictive approach throughout the Iberian Peninsula, which should allow a global biological assessment of streams with different biological elements and seasons that could be used by water managers in the context of the Water Framework Directive. (© 2011 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

19.
Species Distribution Models (SDMs) are a powerful tool to derive habitat suitability predictions relating species occurrence data with habitat features. Two of the most frequently applied algorithms to model species-habitat relationships are Generalised Linear Models (GLM) and Random Forest (RF). The former is a parametric regression model providing functional models with direct interpretability. The latter is a machine learning non-parametric algorithm, more tolerant than other approaches in its assumptions, which has often been shown to outperform parametric algorithms. Other approaches have been developed to produce robust SDMs, like training data bootstrapping and spatial scale optimisation. Using felid presence-absence data from three study regions in Southeast Asia (mainland, Borneo and Sumatra), we tested the performances of SDMs by implementing four modelling frameworks: GLM and RF with bootstrapped and non-bootstrapped training data. With Mantel and ANOVA tests we explored how the four combinations of algorithms and bootstrapping influenced SDMs and their predictive performances. Additionally, we tested how scale-optimisation responded to species' size, taxonomic associations (species and genus), study area and algorithm. We found that choice of algorithm had a strong effect in determining the differences between SDMs' spatial predictions, while bootstrapping had no effect. Additionally, algorithm, followed by study area and species, was the main factor driving differences in the spatial scales identified. SDMs trained with GLM showed higher predictive performance; however, ANOVA tests revealed that algorithm had a significant effect only in explaining the variance observed in sensitivity and specificity and, when interacting with bootstrapping, in Percent Correctly Classified (PCC). Bootstrapping significantly explained the variance in specificity, PCC and True Skill Statistic (TSS). 
Our results suggest that there are systematic differences in the scales identified and in the predictions produced by GLM vs. RF, but that neither approach was consistently better than the other. The divergent predictions and inconsistent predictive abilities suggest that analysts should not assume machine learning is inherently superior and should test multiple methods. Our results have strong implications for SDM development, revealing the inconsistencies introduced by the choice of algorithm on scale optimisation, with GLM selecting broader scales than RF.
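The performance measures named in this abstract (sensitivity, specificity, PCC and TSS) all derive from a presence-absence confusion matrix. The sketch below uses the standard definitions, with TSS computed as sensitivity plus specificity minus one; the abstract itself does not spell out the formulas, and the function name and the counts are hypothetical.

```python
def sdm_metrics(tp, fp, tn, fn):
    """Threshold-dependent accuracy metrics from a presence-absence confusion matrix."""
    sensitivity = tp / (tp + fn)           # share of true presences predicted as presence
    specificity = tn / (tn + fp)           # share of true absences predicted as absence
    pcc = (tp + tn) / (tp + fp + tn + fn)  # Percent Correctly Classified, as a fraction
    tss = sensitivity + specificity - 1    # True Skill Statistic, ranges from -1 to 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pcc": pcc,
        "tss": tss,
    }


# Hypothetical counts, for illustration only.
metrics = sdm_metrics(tp=40, fp=10, tn=30, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Because TSS combines sensitivity and specificity, two models with the same PCC can differ in TSS when their errors are distributed differently between presences and absences.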

20.
In the drug discovery process, the metabolic fate of drugs is crucially important to prevent drug-drug interactions. Therefore, P450 isozyme selectivity prediction is an important task for screening drugs with appropriate metabolism profiles. Recently, large-scale activity data of five P450 isozymes (CYP1A2, CYP2C9, CYP3A4, CYP2D6, and CYP2C19) have been obtained using quantitative high-throughput screening with a bioluminescence assay. Although some isozymes share similar selectivities, conventional supervised learning algorithms learn a prediction model for each P450 isozyme independently. They are unable to exploit the activity data of the other P450 isozymes to improve the predictive performance for each P450 isozyme's selectivity. To address this issue, we apply transfer learning that uses activity data of the other isozymes to learn a prediction model from multiple P450 isozymes. Using the large-scale P450 isozyme selectivity dataset for five P450 isozymes, we evaluate the model's predictive performance. Experimental results show that, overall, our algorithm outperforms conventional supervised learning algorithms such as support vector machine (SVM), weighted k-nearest neighbor classifier, bagging, AdaBoost, and latent semantic indexing (LSI). Moreover, our results show that the predictive performance of our algorithm is improved by exploiting the multiple P450 isozyme activity data in the learning process. Our algorithm can be an effective tool for P450 selectivity prediction for new chemical entities using multiple P450 isozyme activity data.
