Similar literature
 A total of 20 similar articles were found.
1.
We consider the joint modeling of microarray RNA expression and DNA copy number data. We propose Bayesian mixture models that define latent Gaussian probit scores for the DNA and RNA, and integrate between the two platforms via a regression of the RNA probit scores on the DNA probit scores. Such a regression conveniently allows us to include additional sample-specific covariates such as biological conditions and clinical outcomes. The two methods developed aim, respectively, to make inference on the differential behaviour of genes in patients with different subtypes of breast cancer and to predict the pathological complete response (pCR) of patients by borrowing strength across the genomic platforms. Posterior inference is carried out via MCMC simulations. We demonstrate the proposed methodology using a published data set consisting of 121 breast cancer patients.

2.
Genomic aberrations recurrent in a particular cancer type can be important prognostic markers for tumor progression. Typically in early tumorigenesis, cells incur a breakdown of the DNA replication machinery that results in an accumulation of genomic aberrations in the form of duplications, deletions, translocations, and other genomic alterations. Microarray methods allow for finer mapping of these aberrations than has previously been possible; however, data processing and analysis methods have not taken full advantage of this higher resolution. Attention has primarily been given to analysis on the single sample level, where multiple adjacent probes are necessarily used as replicates for the local region containing their target sequences. However, regions of concordant aberration can be short enough to be detected by only one, or very few, array elements. We describe a method called Multiple Sample Analysis for assessing the significance of concordant genomic aberrations across multiple experiments that does not require a-priori definition of aberration calls for each sample. If there are multiple samples, representing a class, then by exploiting the replication across samples our method can detect concordant aberrations at much higher resolution than can be derived from current single sample approaches. Additionally, this method provides a meaningful approach to addressing population-based questions such as determining important regions for a cancer subtype of interest or determining regions of copy number variation in a population. Multiple Sample Analysis also provides single sample aberration calls in the locations of significant concordance, producing high resolution calls per sample, in concordant regions. 
The approach is demonstrated on a dataset representing a challenging but important resource: breast tumors that have been formalin-fixed, paraffin-embedded, archived, and subsequently UV-laser capture microdissected and hybridized to two-channel BAC arrays using an amplification protocol. We demonstrate accurate detection on simulated data, and on real datasets involving known regions of aberration within subtypes of breast cancer at a resolution consistent with that of the array. Similarly, we apply our method to previously published datasets, including a 250K SNP array, and verify known results as well as detect novel regions of concordant aberration. The algorithm has been fully implemented and tested and is freely available as a Java application at http://www.cbil.upenn.edu/MSA.

3.

Introduction

In breast cancer, the basal-like subtype has high levels of genomic instability relative to other breast cancer subtypes with many basal-like-specific regions of aberration. There is evidence that this genomic instability extends to smaller scale genomic aberrations, as shown by a previously described micro-deletion event in the PTEN gene in the Basal-like SUM149 breast cancer cell line.

Methods

We sought to determine whether small regions of genomic DNA copy number change exist by using a high-density, gene-centric Comparative Genomic Hybridization (CGH) array on cell lines and primary tumors. A custom tiling array for CGH (244,000 probes, 200 bp tiling resolution) was created to identify small regions of genomic change, focused on previously identified basal-like-specific and general cancer genes. Tumor genomic DNA from 94 patients and 2 breast cancer cell lines was labeled and hybridized to these arrays. Aberrations were called using SWITCHdna, and the smallest 25% of SWITCHdna-defined genomic segments were called micro-aberrations (<64 contiguous probes, ∼ 15 kb).
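The "smallest 25% of segments" rule described above can be illustrated with a minimal sketch. It assumes segments have already been produced by a segmentation tool and are ranked by probe count; the function name, the `(segment_id, n_probes)` layout, and the toy values are hypothetical and are not taken from SWITCHdna.

```python
def call_micro_aberrations(segments, fraction=0.25):
    """Return the shortest `fraction` of segments, ranked by probe count.

    `segments` is a list of (segment_id, n_probes) pairs; this layout is
    an assumption standing in for a segmentation tool's real output.
    """
    ranked = sorted(segments, key=lambda seg: seg[1])
    n_keep = int(len(ranked) * fraction)
    return ranked[:n_keep]


# Hypothetical segments: (identifier, number of contiguous probes).
segments = [("s1", 500), ("s2", 12), ("s3", 64), ("s4", 30),
            ("s5", 900), ("s6", 210), ("s7", 75), ("s8", 1500)]
micro = call_micro_aberrations(segments)
```

With eight segments, the two shortest are kept. How ties at the cutoff are handled is left open here and would need to follow whatever convention the actual analysis used.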

Results

Our data showed that primary tumor breast cancer genomes frequently contained many small-scale copy number gains and losses, termed micro-aberrations, most of which are undetectable using typical-density genome-wide aCGH arrays. The basal-like subtype exhibited the highest incidence of these events. These micro-aberrations sometimes altered expression of the involved gene. We confirmed the presence of the PTEN micro-amplification in SUM149 and by mRNA-seq showed that this resulted in loss of expression of all exons downstream of this event. Micro-aberrations disproportionately affected the 5′ regions of the affected genes, including the promoter region, and high frequency of micro-aberrations was associated with poor survival.

Conclusion

Using a high-probe-density, gene-centric aCGH microarray, we present evidence of small-scale genomic aberrations that can contribute to gene inactivation. These events may contribute to tumor formation through mechanisms not detected using conventional DNA copy number analyses.

4.
Ovarian cancer (OC) is the most lethal gynaecological cancer with genomic complexity and extensive heterogeneity. This study aimed to characterize the molecular features of OC based on the gene expression profile of 2752 previously characterized metabolism-relevant genes and provide new strategies to improve the clinical status of patients with OC. Finally, three molecular subtypes (C1, C2 and C3) were identified. The C2 subtype displayed the worst prognosis, upregulated immune-cell infiltration status and expression level of immune checkpoint genes, lower burden of copy number gains and losses and suboptimal response to targeted drug bevacizumab. The C1 subtype showed downregulated immune-cell infiltration status and expression level of immune checkpoint genes, the lowest incidence of BRCA mutation and optimal response to targeted drug bevacizumab. The C3 subtype had an intermediate immune status, the highest incidence of BRCA mutation and a secondary optimal response to bevacizumab. Gene signatures of C1 and C2 subtypes with an opposite expression level were mainly enriched in proteolysis and immune-related biological process. The C3 subtype was mainly enriched in the T cell-related biological process. The prognostic and immune status of subtypes were validated in the Gene Expression Omnibus (GEO) dataset, which was predicted with a 45-gene classifier. These findings might improve the understanding of the diversity and therapeutic strategies for OC.

5.
Summary. The central dogma of molecular biology relates DNA with mRNA. Array CGH measures DNA copy number and gene expression microarrays measure the amount of mRNA. Methods that integrate data from these two platforms may uncover meaningful biological relationships that further our understanding of cancer. We develop nonparametric tests for the detection of copy number induced differential gene expression. The tests incorporate the uncertainty of the calling of genomic aberrations. The test is preceded by a "tuning algorithm" that discards certain genes to improve the overall power of the false discovery rate selection procedure. Moreover, the test statistics are "shrunken" to borrow information across neighboring genes that share the same array CGH signature. For each gene we also estimate its effect, its amount of differential expression due to copy number changes, and calculate the coefficient of determination. The method is illustrated on breast cancer data, in which it confirms previously reported findings, now with a more profound statistical underpinning.

6.

Background

Cancer is a heterogeneous disease caused by genomic aberrations and characterized by significant variability in clinical outcomes and response to therapies. Several subtypes of common cancers have been identified based on alterations of individual cancer genes, such as HER2, EGFR, and others. However, cancer is a complex disease driven by the interaction of multiple genes, so the copy number status of individual genes is not sufficient to define cancer subtypes and predict responses to treatments. A classification based on genome-wide copy number patterns would be better suited for this purpose.

Method

To develop a more comprehensive cancer taxonomy based on genome-wide patterns of copy number abnormalities, we designed an unsupervised classification algorithm that identifies genomic subgroups of tumors. This algorithm is based on a modified genomic Non-negative Matrix Factorization (gNMF) algorithm and includes several additional components, namely a pilot hierarchical clustering procedure to determine the number of clusters, a multiple random initiation scheme, a new stop criterion for the core gNMF, as well as a 10-fold cross-validation stability test for quality assessment.
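Two of the components named above, the non-negative factorization itself and the multiple random initiation scheme, can be sketched in a few lines. This is plain NMF with the standard multiplicative update rules for squared error, not the authors' modified gNMF; the pilot hierarchical clustering, the new stop criterion, and the cross-validation stability test are not reproduced. All function names and the toy matrix are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):  # fixed iteration count, an assumption
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def best_of_restarts(v, rank, n_restarts=5):
    """Run NMF from several random starts and keep the lowest-error fit."""
    fits = []
    for seed in range(n_restarts):
        w, h = nmf(v, rank, seed=seed)
        fits.append((squared_error(v, w, h), w, h))
    return min(fits, key=lambda fit: fit[0])


# Toy matrix: rows are genomic segments, columns are samples (made-up values).
v = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 3, 3], [0, 0, 3, 3]]
err, w, h = best_of_restarts(v, rank=2)
# Assign each sample (column) to the factor with the largest weight in H.
labels = [max(range(2), key=lambda k: h[k][j]) for j in range(4)]
```

Restarting from several seeds matters because the updates only reach a local optimum; keeping the fit with the lowest reconstruction error is one simple way to choose among the runs.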

Result

We applied our algorithm to identify genomic subgroups of three major cancer types: non-small cell lung carcinoma (NSCLC), colorectal cancer (CRC), and malignant melanoma. High-density SNP array datasets for patient tumors and established cell lines were used to define genomic subclasses of the diseases and identify cell lines representative of each genomic subtype. The algorithm was compared with several traditional clustering methods and showed improved performance. To validate our genomic taxonomy of NSCLC, we correlated the genomic classification with disease outcomes. Overall survival time and time to recurrence were shown to differ significantly between the genomic subtypes.

Conclusions

We developed an algorithm for cancer classification based on genome-wide patterns of copy number aberrations and demonstrated its superiority to existing clustering methods. The algorithm was applied to define genomic subgroups of three cancer types and identify cell lines representative of these subgroups. Our data enabled the assembly of representative cell line panels for testing drug candidates.

7.
8.
High-grade serous ovarian cancer (HGSOC) is the most aggressive histological type of epithelial ovarian cancer, which is characterized by a high frequency of somatic TP53 mutations. We performed exome analyses of tumors and matched normal tissues of 34 Japanese patients with HGSOC and observed a substantial number of patients without TP53 mutation (24%, 8/34). Combined with the results of copy number variation analyses, we subdivided the 34 patients with HGSOC into subtypes designated ST1 and ST2. ST1 showed intact p53 pathway and was characterized by fewer somatic mutations and copy number alterations. In contrast, the p53 pathway was impaired in ST2, which is characterized by abundant somatic mutations and copy number alterations. Gene expression profiles combined with analyses using the Gene Ontology resource indicate the involvement of specific biological processes (mitosis and DNA helicase) that are relevant to genomic stability and cancer etiology. In particular we demonstrate the presence of a novel subtype of patients with HGSOC that is characterized by an intact p53 pathway, with limited genomic alterations and specific gene expression profiles.

9.
DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we propose a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

10.
Basal-like breast cancer is a molecularly distinct subtype of breast cancer that is highly aggressive and has a poor prognosis. MicroRNA-29c (miR-29c) has been shown to be significantly down-regulated in basal-like breast tumors and to be involved in cell invasion and sensitivity to chemotherapy. However, little is known about the genetic and regulatory factors contributing to the altered expression of miR-29c in basal-like breast cancer. We here report that epigenetic modifications at the miR-29c promoter, rather than copy number variation of the gene, may drive the lower expression of miR-29c in basal-like breast cancer. Bisulfite sequencing of CpG sites in the miR-29c promoter region showed higher methylation in basal-like breast cancer cell lines compared to luminal subtype cells with a significant inverse correlation between expression and methylation of miR-29c. Analysis of primary breast tumors using The Cancer Genome Atlas (TCGA) dataset confirmed significantly higher levels of methylation of the promoter in basal-like breast tumors compared to all other subtypes. Furthermore, inhibition of CpG methylation with 5-aza-CdR increases miR-29c expression in basal-like breast cancer cells. Fluorescent In Situ Hybridization (FISH) revealed chromosomal abnormalities at miR-29c loci in breast cancer cell lines, but with no correlation between copy number variation and expression of miR-29c. Our data demonstrated that dysregulation of miR-29c in basal-like breast cancer cells may be in part driven by methylation at CpG sites. Epigenetic control of the miR-29c promoter by epigenetic modifiers may provide a potential therapeutic target to overcome the aggressive behavior of these cancers.

11.

Background  

Genomic copy number changes and regional alterations in epigenetic states have been linked to grade in breast cancer. However, the relative contribution of specific alterations to the pathology of different breast cancer subtypes remains unclear. The heterogeneity and interplay of genomic and epigenetic variations mean that large datasets and statistical data mining methods are required to uncover recurrent patterns that are likely to be important in cancer progression.

12.
13.
Yi Y, Mirosevich J, Shyr Y, Matusik R, George AL. Genomics, 2005, 85(3): 401-412.
Microarray technology can be used to assess simultaneously global changes in expression of mRNA or genomic DNA copy number among thousands of genes in different biological states. In many cases, it is desirable to determine if altered patterns of gene expression correlate with chromosomal abnormalities or assess expression of genes that are contiguous in the genome. We describe a method, differential gene locus mapping (DIGMAP), which aligns the known chromosomal location of a gene to its expression value deduced by microarray analysis. The method partitions microarray data into subsets by chromosomal location for each gene interrogated by an array. Microarray data in an individual subset can then be clustered by physical location of genes at a subchromosomal level based upon ordered alignment in genome sequence. A graphical display is generated by representing each genomic locus with a colored cell that quantitatively reflects its differential expression value. The clustered patterns can be viewed and compared based on their expression signatures as defined by differential values between control and experimental samples. In this study, DIGMAP was tested using previously published studies of breast cancer analyzed by comparative genomic hybridization (CGH) and prostate cancer gene expression profiles assessed by cDNA microarray experiments. Analysis of the breast cancer CGH data demonstrated the ability of DIGMAP to deduce gene amplifications and deletions. Application of the DIGMAP method to the prostate data revealed several carcinoma-related loci, including one at 16q13 with marked differential expression encompassing 19 known genes including 9 encoding metallothionein proteins. We conclude that DIGMAP is a powerful computational tool enabling the coupled analysis of microarray data with genome location.
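The partition-then-order idea described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the DIGMAP implementation: the record layout, the gene names, the positions, and the three-colour threshold scheme are all hypothetical, and the abstract describes a quantitative colour scale rather than the coarse one used here.

```python
from collections import defaultdict


def partition_by_chromosome(records):
    """Group (gene, chromosome, start, log_ratio) records by chromosome,
    ordering each subset by genomic start position."""
    subsets = defaultdict(list)
    for gene, chrom, start, log_ratio in records:
        subsets[chrom].append((start, gene, log_ratio))
    return {chrom: sorted(rows) for chrom, rows in subsets.items()}


def cell_color(log_ratio, threshold=1.0):
    """Map a differential value to a coarse display colour (assumed scheme)."""
    if log_ratio >= threshold:
        return "red"
    if log_ratio <= -threshold:
        return "green"
    return "black"


# Made-up records: gene name, chromosome, start position, differential value.
records = [("geneC", "16", 300, 1.8), ("geneA", "16", 100, -1.2),
           ("geneB", "16", 200, 0.1), ("geneD", "8", 50, 2.0)]
by_chrom = partition_by_chromosome(records)
row_16 = [(gene, cell_color(value)) for _, gene, value in by_chrom["16"]]
```

Ordering by position within each chromosome is what lets runs of neighbouring genes with similar differential values stand out as candidate amplified or deleted regions.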

14.
We introduce a nonparametric Bayesian model for a phase II clinical trial with patients presenting different subtypes of the disease under study. The objective is to estimate the success probability of an experimental therapy for each subtype. We consider the case when small sample sizes require extensive borrowing of information across subtypes, but the subtypes are not a priori exchangeable. The lack of a priori exchangeability hinders the straightforward use of traditional hierarchical models to implement borrowing of strength across disease subtypes. We introduce instead a random partition model for the set of disease subtypes. This is a variation of the product partition model that allows us to model a nonexchangeable prior structure. Like a hierarchical model, the proposed clustering approach considers all observations, across all disease subtypes, to estimate individual success probabilities. But in contrast to standard hierarchical models, the model considers disease subtypes a priori nonexchangeable. This implies that when assessing the success probability for a particular type our model borrows more information from the outcome of the patients sharing the same prognosis than from the others. Our data arise from a phase II clinical trial of patients with sarcoma, a rare type of cancer affecting connective or supportive tissues and soft tissue (e.g., cartilage and fat). Each patient presents one subtype of the disease and subtypes are grouped by good, intermediate, and poor prognosis. The prior model should respect the varying prognosis across disease subtypes. The practical motivation for the proposed approach is that the number of accrued patients within each disease subtype is small. Thus it is not possible to carry out a clinical study of possible new therapies for rare conditions, because it would be impossible to plan for sufficiently large sample size to achieve the desired power. 
We carry out a simulation study to compare the proposed model with a model that assumes similar success probabilities for all subtypes with the same prognosis, i.e., a fixed partition of subtypes by prognosis. When the assumption is satisfied the two models perform comparably. But the proposed model outperforms the competing model when the assumption is incorrect.

15.
IRBM, 2022, 43(1): 62-74.
Background

The prediction of breast cancer subtypes plays a key role in the diagnosis and prognosis of breast cancer. In recent years, deep learning (DL) has shown good performance in the intelligent prediction of breast cancer subtypes. However, most traditional DL models use single-modality data, from which only a few features can be extracted, so they cannot establish a stable relationship between patient characteristics and breast cancer subtypes.

Dataset

We used the TCGA-BRCA dataset as a sample set for molecular subtype prediction of breast cancer. It is a public dataset that can be obtained through the following link: https://portal.gdc.cancer.gov/projects/TCGA-BRCA

Methods

In this paper, a Hybrid DL model based on multimodal data is proposed. We combine the patient's gene modality data with image modality data to construct a multimodal fusion framework. According to the different forms and states, we set up a separate feature extraction network for each modality, and then we fuse the outputs of the two feature networks based on the idea of weighted linear aggregation. Finally, the fused features are used to predict breast cancer subtypes. In particular, we use principal component analysis to reduce the dimensionality of the high-dimensional gene modality data and filter the image modality data. We also improve the traditional feature extraction network to make it perform better.

Results

The results show that, compared with the traditional DL model, the Hybrid DL model proposed in this paper is more accurate and efficient in predicting breast cancer subtypes. Our model achieved a prediction accuracy of 88.07% in 10 repetitions of 10-fold cross-validation. We did a separate AUC test for each subtype, and the average AUC value obtained was 0.9427. In terms of subtype prediction accuracy, our model is about 7.45% higher than the previous average.
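The fusion step in this abstract, combining the outputs of the two feature networks by weighted linear aggregation, might look like the sketch below. The abstract does not give the weights or the feature dimensions, so the single `gene_weight` parameter, the requirement that both vectors have the same length, and the example values are assumptions made for illustration.

```python
def fuse_features(gene_features, image_features, gene_weight=0.5):
    """Weighted linear aggregation of two equal-length feature vectors.

    The image modality receives weight 1 - gene_weight; this convex
    combination is an assumed form, not one stated in the abstract.
    """
    if len(gene_features) != len(image_features):
        raise ValueError("feature vectors must have the same length")
    image_weight = 1.0 - gene_weight
    return [gene_weight * g + image_weight * i
            for g, i in zip(gene_features, image_features)]


# Made-up feature vectors standing in for the two networks' outputs.
fused = fuse_features([1.0, 2.0], [3.0, 4.0], gene_weight=0.25)
```

In a trained model the weight would more plausibly be learned or tuned on validation data than fixed by hand.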

16.

Background

The characterization of copy number alteration patterns in breast cancer requires high-resolution genome-wide profiling of a large panel of tumor specimens. To date, most genome-wide array comparative genomic hybridization studies have used tumor panels of relatively large tumor size and high Nottingham Prognostic Index (NPI) that are not as representative of breast cancer demographics.

Results

We performed an oligo-array-based high-resolution analysis of copy number alterations in 171 primary breast tumors of relatively small size and low NPI, which was therefore more representative of breast cancer demographics. Hierarchical clustering over the common regions of alteration identified a novel subtype of high-grade estrogen receptor (ER)-negative breast cancer, characterized by a low genomic instability index. We were able to validate the existence of this genomic subtype in one external breast cancer cohort. Using matched array expression data we also identified the genomic regions showing the strongest coordinate expression changes ('hotspots'). We show that several of these hotspots are located in the phosphatome, kinome and chromatinome, and harbor members of the 122-breast cancer CAN-list. Furthermore, we identify frequently amplified hotspots on 8q22.3 (EDD1, WDSOF1), 8q24.11-13 (THRAP6, DCC1, SQLE, SPG8) and 11q14.1 (NDUFC2, ALG8, USP35) associated with significantly worse prognosis. Amplification of any of these regions identified 37 samples with significantly worse overall survival (hazard ratio (HR) = 2.3 (1.3-1.4) p = 0.003) and time to distant metastasis (HR = 2.6 (1.4-5.1) p = 0.004) independently of NPI.

Conclusion

We present strong evidence for the existence of a novel subtype of high-grade ER-negative tumors that is characterized by a low genomic instability index. We also provide a genome-wide list of common copy number alteration regions in breast cancer that show strong coordinate aberrant expression, and further identify novel frequently amplified regions that correlate with poor prognosis. Many of the genes associated with these regions represent likely novel oncogenes or tumor suppressors.

17.

Background

Multiple breast cancer gene expression profiles have been developed that appear to provide similar abilities to predict outcome and may outperform clinical-pathologic criteria; however, the extent to which seemingly disparate profiles provide additive prognostic information is not known, nor do we know whether prognostic profiles perform equally across clinically defined breast cancer subtypes. We evaluated whether combining the prognostic powers of standard breast cancer clinical variables with a large set of gene expression signatures could improve on our ability to predict patient outcomes.

Methods

Using clinical-pathological variables and a collection of 323 gene expression "modules", including 115 previously published signatures, we build multivariate Cox proportional hazards models using a dataset of 550 node-negative systemically untreated breast cancer patients. Models predictive of pathological complete response (pCR) to neoadjuvant chemotherapy were also built using this approach.

Results

We identified statistically significant prognostic models for relapse-free survival (RFS) at 7 years for the entire population, and for the subgroups of patients with ER-positive, or Luminal tumors. Furthermore, we found that combined models that included both clinical and genomic parameters improved prognostication compared with models with either clinical or genomic variables alone. Finally, we were able to build statistically significant combined models for pathological complete response (pCR) predictions for the entire population.

Conclusions

Integration of gene expression signatures and clinical-pathological factors is an improved method over either variable type alone. Highly prognostic models could be created when using all patients, and for the subset of patients with lymph node-negative and ER-positive breast cancers. Other variables beyond gene expression and clinical-pathological variables, like gene mutation status or DNA copy number changes, will be needed to build robust prognostic models for ER-negative breast cancer patients. This combined clinical and genomics model approach can also be used to build predictors of therapy responsiveness, and could ultimately be applied to other tumor types.

18.
We evaluated the value of the ‘alternative slices mirror image method’ used in prostate tissue banking in terms of predicting the sampling of cancerous tissue while preserving the pathological prognostic information. The concordance of diagnosis between banked sections and their mirror image paraffin sections was studied using 50 cases corresponding to 400 H&E sections taken from 400 banked frozen blocks (two presumed benign and two presumed cancer for each case). The mean number of paraffin blocks in each case was 21. On average 29% of the prostate gland was banked and banked tissue contained cancer in 47 cases (94%). There was no difference between the concordant and discordant groups in terms of the final Gleason score, pathological stage, prostate size, number of banked blocks and the percentage of the prostate submitted for banking. However, concordant cases had larger foci of cancer in the mirror image paraffin block (P = 0.0088). In addition, the surgical margins sections which are not banked using this method provided important information about the pathological stage, surgical margins status and the final Gleason score in 2.6, 2.6, and 1.3% of cases, respectively. The ‘alternative slices mirror image method’ is a straightforward method that is highly efficient in banking prostatic cancerous tissue. Overall, tumor volume and especially size of tumor foci in the image paraffin block are the most important factors in dictating the success rate of banking frozen cancerous tissue. Including ‘surgical margins’ sections for histology provides additional important prognostic information in a minority of cases.

19.
20.
Breast‐cancer subtypes present with distinct clinical characteristics. Therefore, characterization of subtype‐specific proteins may augment the development of targeted therapies and prognostic biomarkers. To address this issue, MS‐based secretome analysis of eight breast cancer cell lines, corresponding to the three main breast cancer subtypes was performed. More than 5200 non‐redundant proteins were identified with 23, four, and four proteins identified uniquely in basal, HER2‐neu‐amplified, and luminal breast cancer cells, respectively. An in silico mRNA analysis using publicly available breast cancer tissue microarray data was carried out as a preliminary verification step. In particular, the expression profiles of 15 out of 28 proteins included in the microarray (from a total of 31 in our subtype‐specific signature) showed significant correlation with estrogen receptor (ER) expression. A MS‐based analysis of breast cancer tissues was undertaken to verify the results at the proteome level. Eighteen out of 31 proteins were quantified in the proteomes of ER‐positive and ER‐negative breast cancer tissues. Survival analysis using microarray data was performed to examine the prognostic potential of these selected candidates. Three proteins correlated with ER status at both mRNA and protein levels: ABAT, PDZK1, and PTX3, with the former showing significant prognostic potential.
