Similar Articles
20 similar articles found.
1.
Fröhlich H. PLoS ONE, 2011, 6(10): e25364
Diagnostic and prognostic biomarkers for cancer based on gene expression profiles are viewed as a major step towards better personalized medicine. Many studies using various computational approaches have been published in this direction during the last decade. However, when different gene signatures for related clinical questions are compared, often only a small overlap is observed. This can have various causes, such as technical differences between platforms, differences in the biological samples or their handling in the lab, or statistical reasons: the high dimensionality of the data combined with small sample sizes leads to unstable gene selection. Consequently, the retrieved gene signatures are often hard to interpret from a biological point of view. Here we demonstrate that it is possible to construct a consensus signature from a set of seemingly different gene signatures by mapping them onto a protein interaction network. Common upstream proteins of gene products that lie close together in the network, which we identified via our newly developed algorithm, show a very clear and significant functional interpretation in terms of overrepresented KEGG pathways, disease-associated genes and known drug targets. Moreover, we show that such a consensus signature can serve as prior knowledge for predictive biomarker discovery in breast cancer. Evaluation on different datasets shows that signatures derived from the consensus signature are much more stable than signatures learned from all probesets on a microarray, while being at least as predictive. Furthermore, they are clearly interpretable in terms of enriched pathways, disease-associated genes and known drug targets. In summary, we believe that network-based consensus signatures are not only a way to relate seemingly different gene signatures to each other in a functional manner, but also a way to establish prior knowledge for highly stable and interpretable predictive biomarkers.

2.
Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework for gaining insights into biological systems. However, the inherent noise in experimental data, coupled with limited sample sizes, reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal-to-noise problem by biasing network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior for use in graphical model inference. Our first model, called the Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among the information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways, as well as on gene expression data from breast cancer and the yeast heat shock response, reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian networks compared with other competing methods and with the situation without any prior. Our framework allows diverse information sources, such as pathway databases, GO terms and protein domain data, to be used, and is flexible enough to integrate new sources as they become available.
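
As a rough illustration of the Noisy-OR idea, the sketch below assumes that each information source independently supports a candidate edge with some probability p_i; the edge is then unsupported only if every source fails, giving a combined prior of 1 − ∏(1 − p_i). The function name and the example probabilities are hypothetical, and the paper's model may differ in detail (for example, in how the per-source probabilities are estimated).

```python
from math import prod


def noisy_or(support_probs):
    """Combine per-source edge-support probabilities with a Noisy-OR.

    Each value is assumed to be the probability that one information source
    (e.g. a pathway database, GO similarity or protein domain data) supports
    the edge on its own. The edge lacks support only if every source fails.
    """
    probs = list(support_probs)  # allow any iterable; we traverse it twice
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical support for one candidate edge from three sources.
print(round(noisy_or([0.9, 0.2, 0.1]), 3))  # 0.928
```

Because the result is never smaller than the strongest single source, one confident source dominates the combined prior, which matches the description of the Noisy-OR as picking up the strongest support.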

3.
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, the state-of-the-art methods used so far have often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information, in a manner similar to Google's PageRank. We applied this method to gene expression profiles obtained from 30 patients with pancreatic cancer and identified seven candidate marker genes prognostic for outcome. Compared with genes found by state-of-the-art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data on individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploiting these data for personalized cancer therapy in clinical practice.
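
A PageRank-style ranking of this kind can be sketched under stated assumptions: each gene starts with a non-negative prognostic score s (for example, the absolute correlation of its expression with survival), and ranks are iterated as r ← (1 − d)·s + d·(rank passed along network edges, split evenly by degree). This is the generic personalized-PageRank form; the abstract does not give the exact scoring, normalization or damping factor d, and the gene names and scores below are hypothetical.

```python
def network_rank(adjacency, scores, damping=0.85, max_iter=1000, tol=1e-12):
    """PageRank-style gene ranking seeded with per-gene prognostic scores.

    adjacency: dict gene -> set of neighbouring genes (undirected network);
               every neighbour must also be a key of `scores`.
    scores:    dict gene -> non-negative prior score (e.g. |corr. with survival|).
    damping:   weight given to network propagation versus the prior.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    prior = {g: s / total for g, s in scores.items()}  # normalise to sum 1
    rank = dict(prior)
    for _ in range(max_iter):
        new_rank = {g: (1.0 - damping) * prior[g] for g in prior}
        for g, r in rank.items():
            neighbours = adjacency.get(g, set())
            if neighbours:
                # Pass rank to neighbours, split evenly by degree.
                for h in neighbours:
                    new_rank[h] += damping * r / len(neighbours)
            else:
                # Isolated gene keeps its propagated share, so ranks still sum to 1.
                new_rank[g] += damping * r
        delta = max(abs(new_rank[g] - rank[g]) for g in prior)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network: A is a hub linked to B and C; D is isolated.
adjacency = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": set()}
scores = {"A": 0.1, "B": 0.5, "C": 0.4, "D": 0.0}
ranks = network_rank(adjacency, scores, damping=0.85)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

In the toy network, gene A has the lowest score of its own but is connected to both B and C, so propagation lifts it to the top; with a smaller damping (e.g. 0.3) the expression evidence dominates and the order becomes B, C, A, D.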

4.
Cancer genomes often harbor hundreds of molecular aberrations. Such genetic variants can be drivers or passengers of tumorigenesis and create vulnerabilities for potential therapeutic exploitation. To identify genotype‐dependent vulnerabilities, forward genetic screens have been conducted in different genetic backgrounds. We devised MINGLE, a computational framework to integrate CRISPR/Cas9 screens originating from different libraries, building on approaches pioneered for genetic network discovery in model organisms. We applied this method to integrate and analyze data from 85 CRISPR/Cas9 screens in human cancer cells, combining functional data with information on genetic variants to explore more than 2.1 million gene‐background relationships. In addition to known dependencies, we identified new genotype‐specific vulnerabilities of cancer cells. Experimental validation of predicted vulnerabilities identified GANAB and PRKCSH as new positive regulators of Wnt/β‐catenin signaling. By clustering genes with similar genetic interaction profiles, we drew the largest genetic network in cancer cells to date. Our scalable approach highlights how diverse genetic screens can be integrated to systematically build informative maps of genetic interactions in cancer, which can grow dynamically as more data are included.

5.
Journal of Genetics and Genomics, 2021, 48(7): 540-551
The response rate of most anti-cancer drugs is limited because of the high heterogeneity of cancer and the complex mechanisms of drug action. Personalized treatment that stratifies patients into subgroups using molecular biomarkers is a promising way to improve clinical benefit. With the accumulation of preclinical models and advances in computational approaches to drug response prediction, pharmacogenomics has achieved great success over the last 20 years and is increasingly used in the clinical practice of personalized cancer medicine. In this article, we first summarize FDA-approved pharmacogenomic biomarkers and large-scale pharmacogenomic studies of preclinical cancer models such as patient-derived cell lines, organoids, and xenografts. We then comprehensively review recent developments in computational methods for drug response prediction, covering network, machine learning, and deep learning technologies, as well as strategies to evaluate immunotherapy response. Finally, we discuss challenges and propose possible solutions for further improvement.

6.

Background

One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for predicting disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and on individual genes' discriminatory power, often fail to identify biologically meaningful biomarkers, resulting in poor prediction performance across data sets. Ideally, however, the variables in multivariable classifiers should interact synergistically to produce more effective classifiers than individual biomarkers.

Results

We developed an integrated approach, the network-constrained support vector machine (netSVM), for cancer biomarker identification with improved prediction performance. The netSVM approach is specifically designed for network biomarker identification by integrating gene expression data and protein-protein interaction (PPI) data. We first evaluated the effectiveness of netSVM using simulation studies, demonstrating its improved performance over state-of-the-art network-based and gene-based methods for network biomarker identification. We then applied netSVM to two breast cancer data sets to identify prognostic signatures for predicting breast cancer metastasis. The experimental results show that (1) network biomarkers identified by netSVM are highly enriched in biological pathways associated with cancer progression, and (2) prediction performance is much improved when tested across different data sets. Specifically, netSVM identified many genes related to apoptosis, the cell cycle, and cell proliferation, which are hallmark signatures of breast cancer metastasis. More importantly, several novel hub genes, which are biologically important with many interactions in the PPI network but often show little change in expression compared with their downstream genes, were also identified as network biomarkers; these genes were enriched in signaling pathways such as the TGF-beta, MAPK, and JAK-STAT signaling pathways. These signaling pathways may provide new insight into the underlying mechanism of breast cancer metastasis.
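
To make the idea of a network constraint concrete, here is a minimal sketch assuming a linear SVM whose objective adds a graph-smoothness penalty λ_g Σ_{(i,j)∈E} (w_i − w_j)² to the usual hinge loss and ridge term, so that genes linked in the PPI network are pushed towards similar weights. The abstract does not state netSVM's exact objective or solver; this assumed formulation, the plain subgradient-descent trainer and all parameter names are illustrative only.

```python
def train_network_svm(X, y, edges, lam_graph=1.0, lam_ridge=0.1, lr=0.05, epochs=500):
    """Subgradient descent on an assumed network-constrained linear SVM objective:

        mean_i max(0, 1 - y_i * (w . x_i + b))
        + lam_graph * sum_{(i, j) in edges} (w_i - w_j) ** 2
        + lam_ridge * ||w||^2

    X: list of feature vectors (one expression value per gene).
    y: labels in {-1, +1}.
    edges: (i, j) index pairs of interacting genes.
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # Gradients of the ridge and graph-smoothness penalties.
        grad_w = [2.0 * lam_ridge * wk for wk in w]
        for i, j in edges:
            diff = 2.0 * lam_graph * (w[i] - w[j])
            grad_w[i] += diff
            grad_w[j] -= diff
        grad_b = 0.0
        # Subgradient of the hinge loss: only margin violators contribute.
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1.0:
                for k in range(n_features):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class label (+1 / -1) from the sign of the linear score."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1


# Toy data: gene 0 separates the classes, gene 1 interacts with gene 0 in the
# network but carries no signal itself, gene 2 is unconnected and uninformative.
X = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]
y = [1, 1, -1, -1]
w, b = train_network_svm(X, y, edges=[(0, 1)])
print([round(v, 2) for v in w])
```

Gene 1 ends up with a weight close to that of gene 0 purely through the network penalty, while the unconnected gene 2 stays at zero; a penalty of this shape is one way a hub gene with little expression change of its own could still enter a network biomarker.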

Conclusions

We have developed a network-based approach for cancer biomarker identification, netSVM, resulting in an improved prediction performance with network biomarkers. We have applied the netSVM approach to breast cancer gene expression data to predict metastasis in patients. Network biomarkers identified by netSVM reveal potential signaling pathways associated with breast cancer metastasis, and help improve the prediction performance across independent data sets.

7.
Gene expression signatures can predict the activation of oncogenic pathways and other phenotypes of interest via quantitative models that combine the expression levels of multiple genes. However, as the number of platforms to measure genome-wide gene expression proliferates, there is an increasing need to develop models that can be ported across diverse platforms. Because of the range of technologies that measure gene expression, the resulting signal values can vary greatly. To understand how this variation can affect the prediction of gene expression signatures, we have investigated the ability of gene expression signatures to predict pathway activation across Affymetrix and Illumina microarrays. We hybridized the same RNA samples to both platforms and compared the resultant gene expression readings, as well as the signature predictions. Using a new approach to map probes across platforms, we found that the genes in the signatures from the two platforms were highly similar, and that the predictions they generated were also strongly correlated. This demonstrates that our method can map probes from Affymetrix and Illumina microarrays, and that this mapping can be used to predict gene expression signatures across platforms.

8.
9.
The task of extracting the maximal amount of information from a biological network, for example predicting the function of a protein from a protein-protein interaction (PPI) network, has drawn much attention from researchers. It is well known that biological networks consist of modules/communities: sets of nodes that are more densely interconnected among themselves than with the rest of the network. However, practical applications that use community information have been rather limited. For protein function prediction on a network, it has been shown that none of the existing community-based protein function prediction methods outperforms a simple neighbor-based method. Recently, we showed that proper use of a community structure with highly optimal modularity can outperform neighbor-assisted methods for protein function prediction. In this study, we propose two function prediction approaches on bipartite networks that consider community structure information as well as neighbor information from the network: 1) a simple screening method and 2) a random forest-based method. We demonstrate that our community-assisted methods outperform neighbor-assisted methods and that the random forest method yields the best performance. In addition, we show that using the optimal community structure information is essential for more accurate function prediction on the protein-complex bipartite network of Saccharomyces cerevisiae. Community detection can be carried out either by using a modified modularity that handles the original bipartite network directly, or by first projecting the network into a single-mode network (i.e., a PPI network) and then applying community detection to the reduced network. We find that the projection leads to a significant loss of information. Since our prediction methods rely only on network topology, they can be applied in various fields where efficient network-based analysis is required.
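
The contrast between neighbor-based and community-assisted prediction can be illustrated with a toy voting scheme: each direct interaction partner votes for its annotated functions with weight 1, and, optionally, other members of the protein's community vote with a smaller weight. This is a hypothetical scoring rule for illustration only; the paper's actual methods are a screening rule and a random forest, and the fixed `community_weight` is an assumption.

```python
from collections import Counter


def predict_function(adjacency, annotations, protein, community=None, community_weight=0.5):
    """Return the highest-scoring function for `protein`, or None if there are no votes.

    adjacency:   dict protein -> set of interaction partners.
    annotations: dict protein -> set of known functions.
    community:   optional set of proteins in the same detected community.
    A partner that is also a community member contributes both kinds of evidence.
    """
    votes = Counter()
    for partner in adjacency.get(protein, ()):
        for func in annotations.get(partner, ()):
            votes[func] += 1.0  # neighbour evidence
    if community is not None:
        for member in community:
            if member != protein:
                for func in annotations.get(member, ()):
                    votes[func] += community_weight  # community evidence
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data.
adjacency = {"P1": {"P2", "P3"}}
annotations = {
    "P2": {"DNA repair"},
    "P3": {"transport"},
    "P5": {"DNA repair"},
    "P6": {"DNA repair"},
}
community = {"P1", "P2", "P5", "P6"}
print(predict_function(adjacency, annotations, "P1", community))  # DNA repair
```

With neighbors alone, P1's two functions tie 1:1; the community members break the tie in favor of "DNA repair", which is the kind of extra signal the abstract attributes to community structure.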

10.
Zhao Chengshuai, Qiu Yang, Zhou Shuang, Liu Shichao, Zhang Wen, Niu Yanqing. BMC Genomics, 2020, 21(13): 1-12
Background

Researchers have found that lncRNA–miRNA regulatory paradigms modulate gene expression patterns and drive major cellular processes. Identifying lncRNA–miRNA interactions (LMIs) is critical for revealing the mechanisms of biological processes and complex diseases. Because conventional wet-lab experiments are time-consuming, labor-intensive and costly, a few computational methods have been proposed to expedite the identification of lncRNA–miRNA interactions. However, little attention has been paid to fully exploiting the structural and topological information of the lncRNA–miRNA interaction network.

Results

In this paper, we propose novel lncRNA–miRNA interaction prediction methods using graph embedding and ensemble learning. First, we calculate lncRNA–lncRNA sequence similarity and miRNA–miRNA sequence similarity, and combine them with the known lncRNA–miRNA interactions to construct a heterogeneous network. Second, we adopt several graph embedding methods to learn embedded representations of lncRNAs and miRNAs from the heterogeneous network, and construct ensemble models using two ensemble strategies. In the first strategy, we treat the individual graph embedding-based models as base predictors and integrate their predictions, yielding a method named GEEL-PI. In the second, we construct a deep attention neural network (DANN) to integrate the various graph embeddings, yielding an ensemble method named GEEL-FI. The experimental results demonstrate that both GEEL-PI and GEEL-FI outperform other state-of-the-art methods, and further experiments validate the effectiveness of the two ensemble strategies. Moreover, the case studies show that GEEL-PI and GEEL-FI can find novel lncRNA–miRNA associations.

Conclusion

The study reveals that methods based on graph embedding and ensemble learning are effective at integrating heterogeneous information derived from the lncRNA–miRNA interaction network and can achieve better performance on the LMI prediction task. In conclusion, GEEL-PI and GEEL-FI are promising tools for lncRNA–miRNA interaction prediction.

11.
Chronic obstructive pulmonary disease (COPD) is a complex human disease with a high mortality rate. Despite the well-documented role of cigarette smoking in the genesis of COPD, studies of the disease have so far not been well organized. In recent years, microarray analyses have helped to identify some potential disease-related genes. However, the low reproducibility of many published gene signatures has been criticized, and it has therefore been suggested that incorporating network or pathway information into prognostic biomarker discovery might improve prediction performance. In this analysis, we combined protein-protein interaction (PPI) information with the support vector machine (SVM) method to identify potential COPD-related genes that would allow one to accurately distinguish severe emphysema from non-/mildly emphysematous lung tissue. We identified 8 COPD-related feature genes. Compared with an SVM method that did not use the prior PPI information, prediction accuracy was significantly enhanced (the AUC increased from 0.513 to 0.909). These results suggest that incorporating a network of prior knowledge into gene selection methods significantly improves classification accuracy. Consequently, gene expression profiles from human emphysematous lung tissue may provide insight into the pathogenesis of COPD, and a good classification algorithm based on prior biological knowledge can further strengthen this performance.

12.
The problems of gene identification and gene structure prediction in DNA sequences have received increasing attention over the past few years, as large-scale sequencing projects have generated an immense amount of genome data. A variety of prediction programs have been developed to address these problems. This paper reviews the computational approaches and gene finders commonly used for gene prediction in eukaryotic genomes. In general, two approaches have been adopted for this purpose: similarity-based and ab initio techniques. The information gleaned from these methods is then combined via a variety of algorithms, including dynamic programming (DP) and hidden Markov models (HMMs), and used to predict genes in genomic sequences.
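
The HMM-plus-dynamic-programming step can be illustrated with a deliberately tiny example: a two-state HMM (a hypothetical "exon" state that emits G/C more often and an "other" state that emits A/T more often) decoded with the Viterbi algorithm, which finds the single most probable state path. Real gene finders use far richer state spaces (exon phases, introns, splice sites, start and stop signals); every probability below is an assumption chosen for illustration.

```python
import math


def viterbi(sequence, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `sequence`, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at position t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][sequence[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(sequence)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: best[t - 1][r] + math.log(trans_p[r][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][sequence[t]]))
            back[t][s] = prev
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(sequence) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: "exon" is GC-rich, "other" is AT-rich, switches are rare.
STATES = ("other", "exon")
START = {"other": 0.5, "exon": 0.5}
TRANS = {"other": {"other": 0.9, "exon": 0.1}, "exon": {"other": 0.1, "exon": 0.9}}
EMIT = {
    "other": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
    "exon": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}

seq = "AAAAAAAA" + "GCGCGCGC" + "AAAAAAAA"
path = viterbi(seq, STATES, START, TRANS, EMIT)
print("".join("E" if s == "exon" else "." for s in path))  # ........EEEEEEEE........
```

Working in log space avoids numerical underflow on long sequences; the transition probabilities act as a penalty on switching, so short GC-rich runs stay labeled "other" while the eight-base run is long enough to be called "exon".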

13.
Predicting the biological function of all the genes of an organism is one of the fundamental goals of computational systems biology. In the last decade, high-throughput experimental methods for studying the functional interactions between gene products (GPs) have been combined with computational approaches based on Bayesian networks for data integration. The result of these computational approaches is an interaction network whose weighted links represent the likelihood that two GPs are functionally related. This weighted network can be used to predict annotations for functionally uncharacterized GPs. Here we introduce Weighted Network Predictor (WNP), a novel algorithm for predicting the function of biologically uncharacterized GPs. Tests on simulated data show that WNP outperforms five other state-of-the-art methods in terms of both specificity and sensitivity, and that it is better able to exploit and propagate the functional and topological information of the network. We apply our method to Saccharomyces cerevisiae and Arabidopsis thaliana networks and predict Gene Ontology functions for about 500 and 10,000 uncharacterized GPs, respectively.
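
The abstract does not spell out WNP's update rule. As a hedged, generic illustration of propagating a function label over a weighted network, the sketch below uses iterative label propagation: each gene product's score for one function mixes the weighted mean of its neighbors' scores with its own known annotation. The parameter `alpha`, the function names and the toy network are all assumptions.

```python
def propagate_function(weights, seeds, alpha=0.8, iterations=200):
    """Iterative label propagation of one function over a weighted network.

    weights: dict (u, v) -> link weight (undirected; list each pair once).
    seeds:   set of gene products already annotated with the function.
    score(n) = alpha * weighted mean of neighbour scores + (1 - alpha) * seed(n)
    """
    neighbours = {}
    for (u, v), w in weights.items():
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))
    nodes = set(neighbours) | set(seeds)
    prior = {n: 1.0 if n in seeds else 0.0 for n in nodes}
    score = dict(prior)
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            links = neighbours.get(n, [])
            total = sum(w for _, w in links)
            mean = sum(w * score[m] for m, w in links) / total if total > 0 else 0.0
            new_score[n] = alpha * mean + (1.0 - alpha) * prior[n]
        score = new_score
    return score


# Hypothetical chain: A (annotated) =0.9= B -0.1- C =0.9= D
weights = {("A", "B"): 0.9, ("B", "C"): 0.1, ("C", "D"): 0.9}
scores = propagate_function(weights, seeds={"A"})
for gp in sorted(scores, key=scores.get, reverse=True):
    print(gp, round(scores[gp], 3))
```

B, strongly linked to the annotated A, ends up with a much higher score than C and D, which are reached only through the weak B–C link; because `alpha` < 1 and neighbor weights are normalized, the iteration converges.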

14.
Inferring potential drug indications, for either novel or approved drugs, is a key step in drug development. Previous computational methods in this domain have focused on either drug repositioning or matching drug and disease gene expression profiles. Here, we present a novel method for the large‐scale prediction of drug indications (PREDICT) that can handle both approved drugs and novel molecules. Our method is based on the observation that similar drugs are indicated for similar diseases, and utilizes multiple drug–drug and disease–disease similarity measures for the prediction task. On cross‐validation, it obtains high specificity and sensitivity (AUC=0.9) in predicting drug indications, surpassing existing methods. We validate our predictions by their overlap with drug indications that are currently under clinical trials, and by their agreement with tissue‐specific expression information on the drug targets. We further show that disease‐specific genetic signatures can be used to accurately predict drug indications for new diseases (AUC=0.92). This lays the computational foundation for future personalized drug treatments, where gene expression signatures from individual patients would replace the disease‐specific signatures.
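
The observation that similar drugs are indicated for similar diseases can be turned into a simple score. The sketch below, assuming one drug–drug and one disease–disease similarity measure with values in [0, 1], scores a candidate pair by the best geometric mean of its similarities to any known indication. The abstract states that PREDICT uses multiple similarity measures; how it combines them is not shown here, and the lookup tables and drug/disease names are hypothetical.

```python
from math import sqrt


def table_similarity(table):
    """Build a symmetric similarity function from a dict of (a, b) -> score."""
    def sim(a, b):
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return sim


def indication_score(drug, disease, known_pairs, drug_sim, disease_sim):
    """Best geometric-mean similarity of (drug, disease) to a known indication."""
    best = 0.0
    for known_drug, known_disease in known_pairs:
        if (known_drug, known_disease) == (drug, disease):
            continue  # a known pair must not support itself
        score = sqrt(drug_sim(drug, known_drug) * disease_sim(disease, known_disease))
        best = max(best, score)
    return best


drug_sim = table_similarity({("drugA", "drugB"): 0.81})
disease_sim = table_similarity({("dis1", "dis2"): 0.64})
known = [("drugB", "dis2")]
print(round(indication_score("drugA", "dis1", known, drug_sim, disease_sim), 2))  # 0.72
```

The geometric mean is high only when both the drug and the disease resemble a known indication, so a strong similarity on one side cannot compensate for none on the other.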

15.
The current gold-standard method for the cancer safety assessment of drugs is a two-year rodent bioassay, which is associated with significant costs and requires testing a large number of animals over their lifetime. Because no comprehensive set of short-term assays predicting carcinogenicity exists, new approaches are currently being evaluated. One promising approach is toxicogenomics, which, by virtue of genome-wide molecular profiling after compound treatment, can lead to increased mechanistic understanding and potentially allow the prediction of carcinogenic potential via mathematical modeling. The latter typically involves extracting informative genes from omics datasets, which can be used to construct generalizable models that allow the early classification of compounds with unknown carcinogenic potential. Here we formally describe and compare two novel methodologies for the reproducible extraction of characteristic mRNA signatures, which were employed to capture specific gene expression changes observed for nongenotoxic carcinogens. The first method integrates multiple gene rankings, generated by diverse algorithms applied to data from different subsamplings of the training compounds, whereas the second approach employs a statistical ratio to identify informative genes. Both methods were evaluated on a dataset obtained from the toxicogenomics database TG-GATEs to predict the outcome of a two-year bioassay from the profiles of 14-day treatments. Additionally, we applied our methods to datasets from previous studies and showed that the derived prediction models are on average more accurate than those built from the original signatures. The selected genes were mostly related to p53 signaling and to specific changes in anabolic processes or energy metabolism, which are typically observed in tumor cells. Among the genes most frequently incorporated into prediction models were Phlda3, Cdkn1a, Akr7a3, Ccng1 and Abcb4.
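
Integrating several gene rankings can be done in many ways; as one hedged example, the sketch below uses simple mean-rank (Borda-style) aggregation, in which a gene absent from a ranking is placed just after that ranking's last entry. The abstract does not say which aggregation rule the authors used, and the gene identifiers are placeholders.

```python
def aggregate_rankings(rankings, top_n=None):
    """Order genes by their mean rank across several rankings (Borda-style).

    rankings: lists ordered from most to least informative gene.
    A gene missing from a ranking is given rank len(ranking) + 1 there.
    Ties in mean rank are broken alphabetically for reproducibility.
    """
    genes = set().union(*rankings)
    positions = [{g: i + 1 for i, g in enumerate(r)} for r in rankings]
    mean_rank = {
        g: sum(pos.get(g, len(r) + 1) for pos, r in zip(positions, rankings)) / len(rankings)
        for g in genes
    }
    ordered = sorted(genes, key=lambda g: (mean_rank[g], g))
    return ordered if top_n is None else ordered[:top_n]


# Hypothetical rankings, e.g. from different algorithms or subsamplings.
rankings = [["g1", "g2", "g3"], ["g2", "g1", "g4"], ["g1", "g3", "g2"]]
print(aggregate_rankings(rankings))           # ['g1', 'g2', 'g3', 'g4']
print(aggregate_rankings(rankings, top_n=2))  # ['g1', 'g2']
```

Genes that rank consistently well across subsamplings rise to the top even if no single ranking puts them first, which is the stability property such integration is meant to provide.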

16.

Background

Several gene sets for predicting breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combining the information from such sets may improve prediction of recurrence and breast cancer-specific death in early-stage breast cancer. Microarray data from two clinically similar cohorts of breast cancer patients are used as the training set (n = 123) and the test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study.

Principal Findings

To investigate the relationship between breast cancer survival and gene expression for a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting, with cross-validation used to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of their vectors of predicted risks yields two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER and PR status, TP53 mutation status and histological grade, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors, which are used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and Adjuvant! Online, for survival prediction.
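
The quantity maximized in this penalized fit can be written out explicitly. A minimal sketch, assuming no tied event times and a penalty of the form (λ/2)·‖β‖², computes the L2-penalized Cox partial log-likelihood for a coefficient vector β; the optimizer and the cross-validation loop that chooses λ are omitted, and the toy data are hypothetical.

```python
import math


def penalized_cox_loglik(beta, X, times, events, penalty):
    """L2-penalised Cox partial log-likelihood (assumes no tied event times).

    beta:    coefficient per gene in the gene set.
    X:       one expression vector per patient.
    times:   follow-up time per patient.
    events:  1 if the event (e.g. recurrence) was observed, 0 if censored.
    penalty: lambda in  loglik(beta) - lambda / 2 * ||beta||^2.
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    loglik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients enter only through risk sets
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk)  # log-sum-exp shift for numerical stability
        loglik += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return loglik - 0.5 * penalty * sum(b * b for b in beta)


times = [1.0, 2.0, 3.0]
events = [1, 0, 1]  # the second patient is censored
X = [[1.0], [0.0], [0.0]]
print(round(penalized_cox_loglik([0.0], X, times, events, penalty=1.0), 4))  # -1.0986
```

At β = 0 every patient in a risk set is equally likely to fail, so the value reduces to −log 3 for the first event and 0 for the last. For a test patient with covariates x, the linear predictor x·β (or exp(x·β)) is one common choice for the predicted risk that the abstract computes per individual and gene set.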

Conclusion

Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival. The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set.

17.
18.
19.
Pancreatic ductal adenocarcinoma (PDAC) has a poor prognosis, with a 5‐year survival rate of only 7.7%. To improve prognosis, a screening biomarker for the early diagnosis of pancreatic cancer is urgently needed. Long non‐coding RNAs (lncRNAs) play critical roles in tumorigenesis and cancer metastasis, and their expression profiles are potential prognostic biomarkers for cancer. However, lncRNA signatures for predicting the survival of patients with PDAC remain unknown. In the current study, we sought to identify potential lncRNA biomarkers and assess their prognostic value in PDAC. LncRNA expression profiles and the corresponding clinical information for 182 PDAC cases were acquired from The Cancer Genome Atlas (TCGA). A total of 14,470 lncRNAs were identified in the cohort, and 175 PDAC patients had clinical variables. We obtained 108 differentially expressed lncRNAs using R packages. Univariate and multivariate Cox proportional hazards regression and lasso regression were performed to screen for potential prognostic lncRNAs. Five lncRNAs were found to correlate significantly with overall survival (OS). We established a linear prognostic model of the five lncRNAs (C9orf139, MIR600HG, RP5‐965G21.4, RP11‐436K8.1, and CTC‐327F10.4) and divided patients into high‐ and low‐risk groups according to the prognostic index. The five lncRNAs served as independent prognostic biomarkers of OS in PDAC patients, and the AUC of the ROC curve of the five-lncRNA signature for predicting 5‐year survival was 0.742. In addition, target genes of MIR600HG, C9orf139, and CTC‐327F10.4 were explored and functional enrichment analysis was conducted. These results suggest that this five‐lncRNA signature could serve as a potential prognostic biomarker for predicting the survival of PDAC patients.
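
A linear prognostic index of this kind is usually computed as PI = Σ_k β_k·x_k, where x_k is the expression of signature lncRNA k and β_k its Cox regression coefficient. The abstract does not report the coefficients or the cut-off rule, so the median split, the placeholder lncRNA keys and the coefficient values below are assumptions for illustration only.

```python
from statistics import median


def prognostic_index(expression, coefficients):
    """Linear prognostic index: sum of coefficient * expression over signature genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def stratify(patients, coefficients):
    """Label each patient 'high' or 'low' risk relative to the median index."""
    index = {pid: prognostic_index(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(index.values())
    groups = {pid: "high" if pi > cutoff else "low" for pid, pi in index.items()}
    return groups, cutoff


# Placeholder coefficients and expression values (not from the study).
coefficients = {"lnc1": 0.8, "lnc2": -0.5}
patients = {
    "p1": {"lnc1": 2.0, "lnc2": 1.0},
    "p2": {"lnc1": 1.0, "lnc2": 2.0},
    "p3": {"lnc1": 3.0, "lnc2": 0.0},
    "p4": {"lnc1": 0.0, "lnc2": 1.0},
}
groups, cutoff = stratify(patients, coefficients)
print(groups)  # {'p1': 'high', 'p2': 'low', 'p3': 'high', 'p4': 'low'}
```

A positive coefficient marks a lncRNA whose higher expression raises the index (and therefore the estimated hazard); patients whose index equals the cut-off are assigned to the low-risk group here, which is an arbitrary choice.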

20.
Colorectal cancer screening is well established. Identifying high-risk populations is the key to implementing effective risk‐adjusted screening, yet good statistical approaches for risk prediction do not exist. A family's colorectal cancer history is used to identify high-risk families and is usually assessed by a questionnaire. This paper introduces a prediction algorithm for assessing a family's colorectal cancer risk and discusses its statistical properties. The new algorithm uses Bayesian reasoning and a detailed family history illustrated by a pedigree and a Lexis diagram. The algorithm is able to integrate different hereditary mechanisms that define complex latent class or random factor structures. These mechanisms are generic and do not reflect specific genetic models, which is comparable to strategies in complex segregation analysis. Furthermore, the algorithm can integrate different statistical penetrance models for right-censored event data. Computational challenges related to handling the likelihood are discussed. Simulation studies assess the predictive quality of the new algorithm in terms of ROC curves and the corresponding AUCs. The algorithm is applied to data from a recent study on familial colorectal cancer risk, and its predictive performance is compared with that of a questionnaire currently used in screening for familial colorectal cancer. The results of the proposed algorithm are robust across different inheritance models. Using the simplest hereditary mechanism, the simulation study provides evidence that the algorithm detects families with high cancer risk better than the currently used questionnaire. The applicability of the algorithm extends beyond the field of colorectal cancer.