Similar literature
20 similar documents found (search time: 31 ms)
1.
Identifying cancer driver genes and pathways among all somatic mutations detected in a cohort of tumors is a key challenge in cancer genomics. Traditionally, this is done by prioritizing genes according to the recurrence of alterations that they bear. However, this approach has some known limitations, such as the difficulty of correctly estimating the background mutation rate, and the fact that it cannot identify driver genes that are mutated at low recurrence. Here we present a novel approach, Oncodrive-fm, to detect candidate cancer drivers which does not rely on recurrence. First, we hypothesized that any bias toward the accumulation of variants with high functional impact observed in a gene or group of genes may be an indication of positive selection and can thus be used to detect candidate driver genes or gene modules. Next, we developed a method to measure this bias (FM bias) and applied it to three datasets of tumor somatic variants. As a proof of concept of our hypothesis we show that most of the highly recurrent and well-known cancer genes exhibit a clear FM bias. Moreover, this novel approach avoids some known limitations of recurrence-based approaches, and can successfully identify candidate cancer drivers with low recurrence.
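
The idea of a bias toward high-impact variants can be illustrated with a small permutation test. This is only a sketch under stated assumptions, not the Oncodrive-fm implementation: the abstract does not describe how FM bias is computed, so the function name `fm_bias_pvalue`, the score scale, and the example data below are all hypothetical.

```python
import random
from statistics import mean

def fm_bias_pvalue(gene_scores, all_scores, n_permutations=2000, seed=0):
    """Empirical p-value that a gene's mean impact score exceeds chance.

    Hypothetical helper: compares the observed mean with the means of
    random samples of the same size drawn from all observed variants.
    """
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(all_scores, k)
        if mean(sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical functional impact scores in [0, 1]; higher means more damaging.
background = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.4, 0.35,
              0.9, 0.95, 0.85, 0.2, 0.1, 0.3]
gene_a = [0.9, 0.95, 0.85]  # variants skewed toward high impact
gene_b = [0.1, 0.2, 0.15]   # variants with low impact

p_a = fm_bias_pvalue(gene_a, background)
p_b = fm_bias_pvalue(gene_b, background)
print(p_a, p_b)
```

Note that the test never uses how many samples carry a mutation in the gene, only how damaging the gene's variants are relative to the background, which is why a rarely mutated gene can still score well.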

2.
Groupwise functional analysis of gene variants is becoming standard in next-generation sequencing studies. As the function of many genes is unknown and their classification into pathways is scant, functional associations between genes are often inferred from large-scale omics data. Such data types, including protein–protein interactions and gene co-expression networks, are used to examine the interrelations of the implicated genes. Statistical significance is assessed by comparing the interconnectedness of the mutated genes with that of random gene sets. However, interconnectedness can be affected by confounding bias, potentially resulting in false positive findings. We show that genes implicated through de novo sequence variants are biased in their coding-sequence length and longer genes tend to cluster together, which leads to exaggerated p-values in functional studies; we present here an integrative method that addresses this bias. To discern molecular pathways relevant to complex disease, we have inferred functional associations between human genes from diverse data types and assessed them with a novel phenotype-based method. Examining the functional association between de novo gene variants, we control for the heretofore unexplored confounding bias in coding-sequence length. We test different data types and networks and find that the disease-associated genes cluster more significantly in an integrated phenotypic-linkage network than in other gene networks. We present a tool of superior power to identify functional associations among genes mutated in the same disease even after accounting for significant sequencing study bias and demonstrate the suitability of this method to functionally cluster variant genes underlying polygenic disorders.
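
One common way to control for a length bias when building random gene sets is to sample them from the same coding-length bins as the query genes. The sketch below shows only that sampling step. It is an assumption about how such a control could look, not the method of the paper, and the gene names, lengths, and the helper `length_matched_sets` are hypothetical.

```python
import random

def length_matched_sets(query_genes, gene_lengths, n_sets=100, n_bins=4, seed=0):
    """Draw random gene sets whose length-bin composition matches the query."""
    rng = random.Random(seed)
    # Rank genes by coding-sequence length and cut the ranking into equal bins.
    ranked = sorted(gene_lengths, key=gene_lengths.get)
    bin_size = -(-len(ranked) // n_bins)  # ceiling division
    bin_of = {gene: index // bin_size for index, gene in enumerate(ranked)}
    members = {}
    for gene, b in bin_of.items():
        members.setdefault(b, []).append(gene)
    query_bins = [bin_of[gene] for gene in query_genes]
    matched_sets = []
    for _ in range(n_sets):
        chosen = set()
        for b in query_bins:
            # Sample without replacement inside the bin of each query gene.
            candidates = [gene for gene in members[b] if gene not in chosen]
            chosen.add(rng.choice(candidates))
        matched_sets.append(chosen)
    return matched_sets, bin_of

# Hypothetical genes G1..G12 with increasing coding lengths (in base pairs).
gene_lengths = {f"G{i}": 500 * i for i in range(1, 13)}
query = ["G3", "G11", "G12"]  # one short gene and two long genes
matched_sets, bin_of = length_matched_sets(query, gene_lengths, n_sets=50)
```

The interconnectedness of the query set in a network would then be compared against the same statistic computed on each matched set, rather than on unconstrained random sets.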

3.

Background

Recently, a number of studies have performed genome or exome sequencing of hepatocellular carcinoma (HCC) and identified hundreds or even thousands of mutations in protein-coding genes. However, these studies have only focused on a limited number of candidate genes, and many important mutation resources remain to be explored.

Principal Findings

In this study, we integrated mutation data obtained from various sources and performed pathway and network analysis. We identified 113 pathways that were significantly mutated in HCC samples and found that the mutated genes included in these pathways contained high percentages of known cancer genes and damaging genes, and also showed high conservation scores, indicating their important roles in liver tumorigenesis. The five classes of pathways that were mutated most frequently were (a) proliferation- and apoptosis-related pathways, (b) tumor microenvironment-related pathways, (c) neural signaling-related pathways, (d) metabolism-related pathways, and (e) circadian-related pathways. Network analysis further revealed that the mutated genes with the highest betweenness coefficients, such as the well-known cancer genes TP53 and CTNNB1 and the recently identified mutated genes GNAL and the ADCY family, may play key roles in these significantly mutated pathways. Finally, we highlight several key genes (e.g., RPS6KA3 and PCLO) and pathways (e.g., axon guidance) in which the mutations were associated with clinical features.

Conclusions

Our workflow illustrates the increased statistical power of integrating multiple studies of the same subject, which can provide biological insights that would otherwise be masked under individual sample sets. This type of bioinformatics approach is consistent with the necessity of making the best use of the ever increasing data provided in valuable databases, such as TCGA, to enhance the speed of deciphering human cancers.

4.
ABSTRACT: BACKGROUND: Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; (2) the mutational heterogeneity in most cancers, which means that groups of genes (e.g. pathways), rather than single genes, are the primary target of mutations. RESULTS: We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful in the case when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail. CONCLUSIONS: Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.
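
The abstract does not state which measure is maximized. As an illustration only, the sketch below scores a gene set by the number of samples it covers minus the number of redundant (overlapping) mutations, and finds the best set by exhaustive search. The scoring function, the helper names, and the mutation data are assumptions made for this example; no background mutation rate appears anywhere in the calculation.

```python
from itertools import combinations

def coverage_overlap_weight(gene_set, mutations):
    """Samples covered by the set minus mutations beyond one per covered sample.

    mutations maps each gene to the set of sample ids in which it is mutated.
    """
    covered = set().union(*(mutations[g] for g in gene_set))
    total = sum(len(mutations[g]) for g in gene_set)
    overlap = total - len(covered)
    return len(covered) - overlap

def best_gene_set(mutations, k):
    """Exhaustive search over all k-gene sets; only feasible for tiny inputs."""
    return max(combinations(sorted(mutations), k),
               key=lambda genes: coverage_overlap_weight(genes, mutations))

# Hypothetical mutation data: genes A-D and samples s1-s6.
mutations = {
    "A": {"s1", "s2", "s3"},
    "B": {"s4", "s5"},
    "C": {"s1", "s2", "s4"},
    "D": {"s6"},
}

best = best_gene_set(mutations, 2)
weight = coverage_overlap_weight(best, mutations)
print(best, weight)  # ('A', 'B') 5
```

Genes A and B win because together they cover five samples with no sample mutated in both, whereas A and C cover only four samples and overlap in two of them. The number of candidate sets grows combinatorially with the number of genes, which is why the algorithmic complexity of the search matters.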

5.
Next-generation sequencing technologies have revolutionized our ability to identify genetic variants, either germline or somatic point mutations, that occur in cancer. Parallelization and miniaturization of DNA sequencing enable massive data throughput and, for the first time, large-scale, nucleotide resolution views of cancer genomes can be achieved. Systematic, large-scale sequencing surveys have revealed that the genetic spectrum of mutations in cancers appears to be highly complex with numerous low frequency bystander somatic variations, and a limited number of common, frequently mutated genes. Large sample sizes and deeper resequencing are much needed in resolving clinical and biological relevance of the mutations as well as in detecting somatic variants in heterogeneous samples and cancer cell sub-populations. However, even with the next-generation sequencing technologies, the overwhelming size of the human genome and need for very high fold coverage represents a major challenge for up-scaling cancer genome sequencing projects. Assays to target, capture, enrich or partition disease-specific regions of the genome offer immediate solutions for reducing the complexity of the sequencing libraries. Integration of targeted DNA capture assays and next-generation deep resequencing improves the ability to identify clinically and biologically relevant mutations.

6.
The heterogeneity of cancer genomes in terms of acquired mutations complicates the identification of genes whose modification may exert a driver role in tumorigenesis. In this study, we present a novel method that integrates expression profiles, mutation effects, and systemic properties of mutated genes to identify novel cancer drivers. We applied our method to ovarian cancer samples and were able to identify putative drivers in the majority of carcinomas without mutations in known cancer genes, thus suggesting that it can be used as a complementary approach to find rare driver mutations that cannot be detected using frequency-based approaches.

7.
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
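
The Random Walk with Restart algorithm named above can be sketched in a few lines. This is a generic textbook version, not VarWalker's code: the restart probability of 0.5, the function name, and the four-node path network are assumptions chosen to keep the example small.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adjacency maps each node to a list of its neighbours (undirected network).
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed node.
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                continue
            # Otherwise it moves to a uniformly chosen neighbour.
            share = (1.0 - restart) * prob[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob

# Hypothetical interaction network: a simple path A - B - C - D.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(network, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['A', 'B', 'C', 'D']
```

Nodes closer to the seed in the network receive higher scores, which is the property that lets a mutated gene pull in its close interactors as candidates.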

8.
Next-generation sequencing has allowed identification of millions of somatic mutations in human cancer cells. A key challenge in interpreting cancer genomes is to distinguish drivers of cancer development among available genetic mutations. To address this issue, we present the first web-based application, consensus cancer driver gene caller (C³), to identify the consensus driver genes using six different complementary strategies, i.e., frequency-based, machine learning-based, functional bias-based, clustering-based, statistics model-based, and network-based strategies. This application allows users to specify customized operations when calling driver genes, and provides solid statistical evaluations and interpretable visualizations on the integration results. C³ is implemented in Python and is freely available for public use at http://drivergene.rwebox.com/c~3.

9.
Cancer is a genetic disease that develops through a series of somatic mutations, a subset of which drive cancer progression. Although cancer genome sequencing studies are beginning to reveal the mutational patterns of genes in various cancers, identifying the small subset of “causative” mutations from the large subset of “non-causative” mutations, which accumulate as a consequence of the disease, is a challenge. In this article, we present an effective machine learning approach for identifying cancer-associated mutations in human protein kinases, a class of signaling proteins known to be frequently mutated in human cancers. We evaluate the performance of 11 well known supervised learners and show that a multiple-classifier approach, which combines the performances of individual learners, significantly improves the classification of known cancer-associated mutations. We introduce several novel features related specifically to structural and functional characteristics of protein kinases and find that the level of conservation of the mutated residue at specific evolutionary depths is an important predictor of oncogenic effect. We consolidate the novel features and the multiple-classifier approach to prioritize and experimentally test a set of rare unconfirmed mutations in the epidermal growth factor receptor tyrosine kinase (EGFR). Our studies identify T725M and L861R as rare cancer-associated mutations inasmuch as these mutations increase EGFR activity in the absence of the activating EGF ligand in cell-based assays.

10.
The central challenges in tumor sequencing studies are to identify driver genes and pathways, investigate their functional relationships, and nominate drug targets. The efficiency of these analyses, particularly for infrequently mutated genes, is compromised when subjects carry different combinations of driver mutations. Mutual exclusivity analysis helps address these challenges. To identify mutually exclusive gene sets (MEGS), we developed a powerful and flexible analytic framework based on a likelihood ratio test and a model selection procedure. Extensive simulations demonstrated that our method outperformed existing methods for both statistical power and the capability of identifying the exact MEGS, particularly for highly imbalanced MEGS. Our method can be used for de novo discovery, for pathway-guided searches, or for expanding established small MEGS. We applied our method to the whole-exome sequencing data for 13 cancer types from The Cancer Genome Atlas (TCGA). We identified multiple previously unreported non-pairwise MEGS in multiple cancer types. For acute myeloid leukemia, we identified a MEGS with five genes (FLT3, IDH2, NRAS, KIT, and TP53) and a MEGS (NPM1, TP53, and RUNX1) whose mutation status was strongly associated with survival (p = 6.7 × 10⁻⁴). For breast cancer, we identified a significant MEGS consisting of TP53 and four infrequently mutated genes (ARID1A, AKT1, MED23, and TBL1XR1), providing support for their role as cancer drivers.

11.
Integration of the many available sources of cancer gene information, such as large-scale tumour-resequencing studies, identifies the ‘usual suspect’ genes, mutated in many tumour types, as well as different sets of mutated genes according to the specific tumour type. Scaling up the analysis reveals that this large collection of mutated genes clusters into a smaller number of signalling pathways and processes. From this, we draw a map of the altered processes, and their combinations, in more than 10 tumour types. Literature searches identify pathways and processes that are covered sparsely in the literature, and invite the proposal of new hypotheses to investigate cancer initiation and progression.

12.
Prathima Iengar. Genomics, 2018, 110(5): 318-328.
Mutations in 15 cancers, sourced from the COSMIC Whole Genomes database, and 297 human pathways, arranged into pathway groups based on the processes they orchestrate, and sourced from the KEGG pathway database, have together been used to identify pathways affected by cancer mutations. Genes studied in ≥ 15, and mutated in ≥ 10 samples of a cancer have been considered recurrently mutated, and pathways with recurrently mutated genes have been considered affected in the cancer. Novel doughnut plots have been presented which enable visualization of the extent to which pathways and genes, in each pathway group, are targeted, in each cancer. The ‘organismal systems’ pathway group (including organism-level pathways; e.g., nervous system) is the most targeted, more than even the well-recognized signal transduction, cell-cycle and apoptosis, and DNA repair pathway groups. The important, yet poorly-recognized, role played by the group merits attention. Pathways affected in ≥ 7 cancers yielded insights into processes affected.
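
The two thresholds in this abstract translate directly into a filter. The sketch below applies them to made-up counts. The gene names, pathway names, data layout, and function names are hypothetical, and only the cut-offs (studied in at least 15 samples, mutated in at least 10) come from the text.

```python
def recurrently_mutated(gene_stats, min_studied=15, min_mutated=10):
    """Genes passing both thresholds for one cancer.

    gene_stats maps gene -> (samples_studied, samples_mutated).
    """
    return {gene for gene, (studied, mutated) in gene_stats.items()
            if studied >= min_studied and mutated >= min_mutated}

def affected_pathways(pathways, recurrent):
    """Pathways that contain at least one recurrently mutated gene."""
    return {name for name, genes in pathways.items() if recurrent & set(genes)}

# Hypothetical per-gene counts for a single cancer.
gene_stats = {
    "GENE1": (40, 12),  # passes both thresholds
    "GENE2": (12, 11),  # studied in too few samples
    "GENE3": (60, 4),   # mutated in too few samples
    "GENE4": (15, 10),  # exactly at both thresholds, so it passes
}
pathways = {
    "pathway_x": ["GENE1", "GENE3"],
    "pathway_y": ["GENE2", "GENE3"],
    "pathway_z": ["GENE4"],
}

recurrent = recurrently_mutated(gene_stats)
affected = affected_pathways(pathways, recurrent)
print(sorted(recurrent), sorted(affected))
```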

13.
Although the timing with which common epithelial malignancies arise and become established remains a matter of debate, it is clear that by the time they are detected these tumors harbor hundreds of deregulated, aberrantly expressed or mutated genes. This enormous complexity poses formidable challenges to identify gene pathways that are drivers of tumorigenesis, potentially suitable for therapeutic intervention. An alternative approach is to consider cancer pathways as interconnected networks, and search for potential nodal proteins capable of connecting multiple signaling networks of tumor maintenance. We have modeled this approach in advanced prostate cancer, a condition with current limited therapeutic options. We propose that the integration of three signaling networks, including chaperone‐mediated mitochondrial homeostasis, integrin‐dependent cell signaling, and Runx2‐regulated gene expression in the metastatic bone microenvironment plays a critical role in prostate cancer maintenance, and offers novel options for molecular therapy. J. Cell. Biochem. 107: 845–852, 2009. © 2009 Wiley‐Liss, Inc.

14.
Large-scale sequencing of human cancer genomes and mouse transposon-induced tumors has identified a vast number of genes mutated in different cancers. One of the outstanding challenges in this field is to determine which genes, when mutated, contribute to cellular transformation and tumor progression. To identify new and conserved genes that drive tumorigenesis we have developed a novel cancer model in a distantly related vertebrate species, the zebrafish, Danio rerio. The Sleeping Beauty (SB) T2/Onc transposon system was adapted for somatic mutagenesis in zebrafish. The carp β-actin promoter was cloned into T2/Onc to create T2/OncZ. Two transgenic zebrafish lines that contain large concatemers of T2/OncZ were isolated by injection of linear DNA into the zebrafish embryo. The T2/OncZ transposons were mobilized throughout the zebrafish genome from the transgene array by injecting SB11 transposase RNA at the 1-cell stage. Alternatively, the T2/OncZ zebrafish were crossed to a transgenic line that constitutively expresses SB11 transposase. T2/OncZ transposon integration sites were cloned by ligation-mediated PCR and sequenced on a Genome Analyzer II. Between 700 and 6,800 unique integration events in individual fish were mapped to the zebrafish genome. The data show that introduction of transposase by transgene expression or RNA injection results in an even distribution of transposon re-integration events across the zebrafish genome. SB11 mRNA injection resulted in neoplasms in 10% of adult fish at ∼10 months of age. T2/OncZ-induced zebrafish tumors contain many mutated genes in common with human and mouse cancer genes. These analyses validate our mutagenesis approach and provide additional support for the involvement of these genes in human cancers. The zebrafish T2/OncZ cancer model will be useful for identifying novel and conserved genetic drivers of human cancers.

15.
The identification of cancer drivers is a major goal of current cancer research. Finding driver genes within large chromosomal events is especially challenging because such alterations encompass many genes. Previously, we demonstrated that zebrafish malignant peripheral nerve sheath tumors (MPNSTs) are highly aneuploid, much like human tumors. In this study, we examined 147 zebrafish MPNSTs by massively parallel sequencing and identified both large and focal copy number alterations (CNAs). Given the low degree of conserved synteny between fish and mammals, we reasoned that comparative analyses of CNAs from fish versus human MPNSTs would enable elimination of a large proportion of passenger mutations, especially on large CNAs. We established a list of orthologous genes between human and zebrafish, which includes approximately two-thirds of human protein-coding genes. For the subset of these genes found in human MPNST CNAs, only one quarter of their orthologues were co-gained or co-lost in zebrafish, dramatically narrowing the list of candidate cancer drivers for both focal and large CNAs. We conclude that zebrafish-human comparative analysis represents a powerful, and broadly applicable, tool to enrich for evolutionarily conserved cancer drivers.

16.
Synthetic lethality is a combination of mutations that leads to cell death. Tumor-specific synthetic lethality has been targeted in research to improve cancer therapy. With the advances of techniques in molecular biology, such as RNAi and CRISPR/Cas9 gene editing, efforts have been made to systematically identify synthetic lethal interactions, especially for frequently mutated genes in cancers. However, elucidating the mechanism of synthetic lethality remains a challenge because of the complexity of its influencing conditions. In this study, we proposed a new computational method to identify critical functional features that can accurately predict synthetic lethal interactions. This method incorporates several machine learning algorithms and encodes protein-coding genes by an enrichment system derived from gene ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways to represent their functional features. We built a random forest-based prediction engine by using 2120 selected features and obtained a Matthews correlation coefficient of 0.532. We examined the top 15 features and found that most of them have potential roles in synthetic lethality according to previous studies. These results demonstrate the ability of our proposed method to predict synthetic lethal interactions and provide a basis for further characterization of these particular genetic combinations.
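
For readers unfamiliar with the Matthews correlation coefficient reported above, it can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula. The confusion-matrix counts are invented for illustration and are unrelated to the reported value of 0.532.

```python
from math import sqrt

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 when a row or column of the matrix is empty, a common convention.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts for a synthetic-lethality classifier.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(round(mcc, 3))
```

Unlike plain accuracy, the coefficient stays informative when positive and negative pairs are strongly imbalanced, which is a common reason for reporting it.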

17.
With the advent of whole-genome and whole-exome sequencing, high-quality catalogs of recurrently mutated cancer genes are becoming available for many cancer types. Increasing access to sequencing technology, including bench-top sequencers, provides the opportunity to re-sequence a limited set of cancer genes across a patient cohort with limited processing time. Here, we re-sequenced a set of cancer genes in T-cell acute lymphoblastic leukemia (T-ALL) using Nimblegen sequence capture coupled with Roche/454 technology. First, we investigated how a maximal sensitivity and specificity of mutation detection can be achieved through a benchmark study. We tested nine combinations of different mapping and variant-calling methods, varied the variant calling parameters, and compared the predicted mutations with a large independent validation set obtained by capillary re-sequencing. We found that the combination of two mapping algorithms, namely BWA-SW and SSAHA2, coupled with the variant calling algorithm Atlas-SNP2 yields the highest sensitivity (95%) and the highest specificity (93%). Next, we applied this analysis pipeline to identify mutations in a set of 58 cancer genes, in a panel of 18 T-ALL cell lines and 15 T-ALL patient samples. We confirmed mutations in known T-ALL drivers, including PHF6, NF1, FBXW7, NOTCH1, KRAS, NRAS, PIK3CA, and PTEN. Interestingly, we also found mutations in several cancer genes that had not been linked to T-ALL before, including JAK3. Finally, we re-sequenced a small set of 39 candidate genes and identified recurrent mutations in TET1, SPRY3 and SPRY4. In conclusion, we established an optimized analysis pipeline for Roche/454 data that can be applied to accurately detect gene mutations in cancer, which led to the identification of several new candidate T-ALL driver mutations.

18.
Cancer is a genetic disease that results from a variety of genomic alterations. Identification of some of these causal genetic events has enabled the development of targeted therapeutics and spurred efforts to discover the key genes that drive cancer formation. Rapidly improving sequencing and genotyping technology continues to generate increasingly large datasets that require analytical methods to identify functional alterations that deserve additional investigation. This review examines statistical and computational approaches for the identification of functional changes among sets of single-nucleotide substitutions. Frequency-based methods identify the most highly mutated genes in large-scale cancer sequencing efforts while bioinformatics approaches are effective for independent evaluation of both non-synonymous mutations and polymorphisms. We also review current knowledge and tools that can be utilized for analysis of alterations in non-protein-coding genomic sequence.

19.
Large‐scale cancer genome sequencing has uncovered thousands of gene mutations, but distinguishing tumor driver genes from functionally neutral passenger mutations is a major challenge. We analyzed 800 cancer genomes of eight types to find single‐nucleotide variants (SNVs) that precisely target phosphorylation machinery, important in cancer development and drug targeting. Assuming that cancer‐related biological systems involve unexpectedly frequent mutations, we used novel algorithms to identify genes with significant phosphorylation‐associated SNVs (pSNVs), phospho‐mutated pathways, kinase networks, drug targets, and clinically correlated signaling modules. We highlight increased survival of patients with TP53 pSNVs, hierarchically organized cancer kinase modules, a novel pSNV in EGFR, and an immune‐related network of pSNVs that correlates with prolonged survival in ovarian cancer. Our findings include multiple actionable cancer gene candidates (FLNB, GRM1, POU2F1), protein complexes (HCF1, ASF1), and kinases (PRKCZ). This study demonstrates new ways of interpreting cancer genomes and presents new leads for cancer research.

20.

Background

Chromatin regulatory factors are emerging as important genes in cancer development and are regarded as interesting candidates for novel targets for cancer treatment. However, we lack a comprehensive understanding of the role of this group of genes in different cancer types.

Results

We have analyzed 4,623 tumor samples from thirteen anatomical sites to determine which chromatin regulatory factors are candidate drivers in these different sites. We identify 34 chromatin regulatory factors that are likely drivers in tumors from at least one site, all with relatively low mutational frequency. We also analyze the relative importance of mutations in this group of genes for the development of tumorigenesis in each site, and in different tumor types from the same site.

Conclusions

We find that, although tumors from all thirteen sites show mutations in likely driver chromatin regulatory factors, these are more prevalent in tumors arising from certain tissues. With the exception of hematopoietic, liver and kidney tumors, as a median, the mutated factors are less than one fifth of all mutated drivers across all sites analyzed. We also show that mutations in two of these genes, MLL and EP300, correlate with broad expression changes across cancer cell lines, thus presenting at least one mechanism through which these mutations could contribute to tumorigenesis in cells of the corresponding tissues.
