Similar articles
20 similar articles found.
1.

Background

Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, the biomarkers inferred from different datasets suffer from a lack of reproducibility due to the heterogeneity of the data generated by different platforms or laboratories. This motivates us to develop robust biomarker identification methods by integrating multiple datasets.

Methods

In this paper, we developed an integrative method for classification based on logistic regression. Different constant terms are set in the logistic regression model to measure the heterogeneity of the samples. By minimizing the differences of the constant terms within the same dataset, both the homogeneity within the same dataset and the heterogeneity across multiple datasets can be kept. The model is formulated as an optimization problem with a network penalty measuring the differences of the constant terms. The L1 penalty, the elastic net penalty and network-related penalties are added to the objective function for biomarker discovery. Algorithms based on the proximal Newton method are proposed to solve the optimization problem.
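The idea of separate constant terms combined with an L1 penalty can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the abstract: one intercept per dataset (rather than per sample with a network penalty), plain proximal gradient descent in place of the proximal Newton method, and hypothetical names (`fit`, `lam`, `step`).

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t * |v|, i.e. the L1 penalty."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, dataset_ids, lam=0.05, step=0.1, iters=500):
    """L1-penalized logistic regression with one intercept per dataset.

    X: list of feature rows, y: 0/1 labels,
    dataset_ids: index of the source dataset for each sample.
    Solved by proximal gradient descent (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = [0.0] * (max(dataset_ids) + 1)
    for _ in range(iters):
        grad_w = [0.0] * p
        grad_b = [0.0] * len(b)
        for xi, yi, di in zip(X, y, dataset_ids):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b[di]
            err = sigmoid(z) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b[di] += err / n
        # Gradient step followed by soft-thresholding on the shared weights;
        # the dataset-specific intercepts are left unpenalized.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
        b = [bk - step * gk for bk, gk in zip(b, grad_b)]
    return w, b
```

The shared weight vector `w` plays the role of the biomarker coefficients, while the intercepts in `b` absorb dataset-level shifts; the L1 step drives uninformative coefficients to exactly zero.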

Results

We first applied the proposed method to simulated datasets, where both the prediction AUC and the biomarker identification accuracy improved. We then applied the method to two breast cancer gene expression datasets. Integrating both datasets improved the prediction AUC over directly merging the datasets and over MetaLasso, and the result is comparable to the best AUC obtained when identifying biomarkers in an individual dataset. The biomarkers identified using the network-related penalty for variables were further analyzed, and meaningful subnetworks associated with breast cancer were identified.

Conclusion

A network-based integrative logistic regression model is proposed in the paper. It improves both the prediction and biomarker identification accuracy.

2.

Background

Numerous gene lists or "classifiers" have been derived from global gene expression data that assign breast cancers to good and poor prognosis groups. A remarkable feature of these molecular signatures is that they have few genes in common, prompting speculation that they may use distinct genes to measure the same pathophysiological process(es), such as proliferation. However, this supposition has not been rigorously tested. If gene-based classifiers function by measuring a minimal number of cellular processes, we hypothesized that the informative genes for these processes could be identified and the data sets could be adjusted for the predictive contributions of those genes. Such adjustment would then attenuate the predictive function of any signature measuring that same process.

Results

We tested this hypothesis directly using a novel iterative-subtractive approach. We evaluated five gene expression data sets that sample a broad range of breast cancer subtypes. In all data sets, the dominant cluster capable of predicting metastasis was heavily populated by genes that fluctuate in concert with the cell cycle. When six well-characterized classifiers were examined, all contained a higher than expected proportion of genes that correlate with this cluster. Furthermore, when the data sets were globally adjusted for the cell cycle cluster, each classifier lost its ability to assign tumors to appropriate high and low risk groups. In contrast, adjusting for other predictive gene clusters did not impact their performance.

Conclusion

These data indicate that the discriminative ability of breast cancer classifiers is dependent upon genes that correlate with cell cycle progression.

3.

Background

A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take the cluster structure into consideration.

Results

We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.
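The cluster-level selection in the second step relies on the group Lasso penalty, whose effect is easiest to see through its proximal operator (block soft-thresholding). The sketch below is illustrative only; the function name and the threshold `t` are hypothetical and the abstract does not say which solver the authors used.

```python
import math

def group_soft_threshold(group, t):
    """Block soft-thresholding: proximal operator of t * ||group||_2.

    The whole group of coefficients (e.g. the genes of one cluster) is
    either shrunk toward zero together or set to zero together.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= t:
        return [0.0] * len(group)
    scale = 1.0 - t / norm
    return [scale * v for v in group]
```

Because the threshold acts on the Euclidean norm of the whole group, a cluster with weak overall signal is removed as a unit, which is what allows selection of "important clusters" rather than isolated genes.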

Conclusion

We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

4.

Background

Tumor therapy mainly attacks the metabolism to interfere with the tumor's anabolism and the signaling of proliferative second messengers. However, the metabolic demands of different cancers are very heterogeneous and depend on their tissue of origin, age, gender and other clinical parameters. We investigated tumor-specific regulation in the metabolism of breast cancer.

Methods

For this, we mapped gene expression data from microarrays onto the corresponding enzymes and their metabolic reaction network. We used Haar Wavelet transforms on optimally arranged grid representations of metabolic pathways as a pattern recognition method to detect orchestrated regulation of neighboring enzymes in the network. Significant combined expression patterns were used to select metabolic pathways showing shifted regulation of the aggressive tumors.
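To make the wavelet step concrete, here is a minimal sketch of one level of a Haar transform. It is one-dimensional for brevity, whereas the abstract describes grid (two-dimensional) representations of pathways; the function name and the toy input are assumptions.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence.

    Returns (approximation, detail): pairwise scaled sums and differences
    of neighboring values.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail
```

If neighboring enzymes are regulated in concert, their pairwise differences (the detail coefficients) are near zero and the signal concentrates in the approximation coefficients, which is the kind of joint pattern a wavelet-based detector can pick up.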

Results

Besides up-regulation for energy production and nucleotide anabolism, we found an interesting cellular switch in the interplay of biosynthesis of steroids and bile acids. The biosynthesis of steroids was up-regulated for estrogen synthesis which is needed for proliferative signaling in breast cancer. In turn, the decomposition of steroid precursors was blocked by down-regulation of the bile acid pathway.

Conclusion

We applied an intelligent pattern recognition method for analyzing the regulation of metabolism and elucidated substantial regulation of human breast cancer at the interplay of cholesterol biosynthesis and bile acid metabolism, pointing to specific breast cancer treatment.

5.
6.

Background

Oestrogen receptor (ER) positive (luminal) tumours account for the largest proportion of females with breast cancer. Theirs is a heterogeneous disease presenting clinical challenges in managing their treatment. Three main biological luminal groups have been identified but clinically these can be distilled into two prognostic groups in which Luminal A are accorded good prognosis and Luminal B correlate with poor prognosis. Further biomarkers are needed to attain classification consensus. Machine learning approaches like Artificial Neural Networks (ANNs) have been used for classification and identification of biomarkers in breast cancer using high throughput data. In this study, we have used an artificial neural network (ANN) approach to identify DACH1 as a candidate luminal marker and its role in predicting clinical outcome in breast cancer is assessed.

Materials and methods

A reiterative ANN approach incorporating a network inferencing algorithm was used to identify ER-associated biomarkers in a publicly available cDNA microarray dataset. DACH1 was identified as having a strong influence on ER-associated markers and a positive association with ER. Its clinical relevance in predicting breast cancer specific survival was investigated by statistically assessing protein expression levels after immunohistochemistry in a series of unselected breast cancers, formatted as a tissue microarray.

Results

Strong nuclear DACH1 staining is more prevalent in tubular and lobular breast cancer. Its expression correlated with ER-alpha positive tumours expressing PgR, epithelial cytokeratins (CK)18/19 and ‘luminal-like’ markers of good prognosis including FOXA1 and RERG (p<0.05). DACH1 is increased in patients showing longer cancer specific survival and disease free interval and reduced metastasis formation (p<0.001). Nuclear DACH1 showed a negative association with markers of aggressive growth and poor prognosis.

Conclusion

Nuclear DACH1 expression appears to be a Luminal A biomarker predictive of good prognosis, but is not independent of clinical stage, tumour size, NPI status or systemic therapy.

7.

Aims

We will examine the latest advances in genomic and proteomic laboratory technology. Through an extensive literature review we aim to critically appraise those studies which have utilized these latest technologies and ascertain their potential to identify clinically useful biomarkers.

Methods

An extensive review of the literature was carried out in both online medical journals and through the Royal College of Surgeons in Ireland library.

Results

Laboratory technology has advanced in the fields of genomics and oncoproteomics. Gene expression profiling with DNA microarray technology has allowed us to begin genetic profiling of colorectal cancer tissue. The response to chemotherapy can differ amongst individual tumors. For the first time researchers have begun to isolate and identify the genes responsible. New laboratory techniques allow us to isolate proteins preferentially expressed in colorectal cancer tissue. This could potentially lead to identification of a clinically useful protein biomarker in colorectal cancer screening and treatment.

Conclusion

If a set of discriminating genes could be used for characterization and prediction of chemotherapeutic response, an individualized tailored therapeutic regime could become the standard of care for those undergoing systemic treatment for colorectal cancer. New laboratory techniques of protein identification may eventually allow identification of a clinically useful biomarker that could be used for screening and treatment. At present, however, both expression of different gene signatures and isolation of various protein peaks have been limited by study size. Independent multi-centre correlation of results with larger sample sizes is needed to allow translation into clinical practice.

8.

Background

Due to advances in next generation sequencing technologies and corresponding reductions in cost, it is now attainable to investigate genome-wide gene expression and variants at a patient-level, so as to better understand and anticipate heterogeneous responses to therapy. Consequently, it is feasible to inform personalized drug treatment decisions using personal genomics data. However, these efforts are limited due to a lack of reliable computational approaches for predicting effective drugs for individual patients. The reverse gene set enrichment analysis (i.e., connectivity mapping) approach and its variants have been widely and successfully used for drug prediction. However, the performance of these methods is limited by undefined mechanism of action (MoA) of drugs and reliance on cohorts of patients rather than personalized predictions for individual patients.

Results

In this study, we have developed and evaluated a network-based computational approach, known as Mechanism and Drug Miner (MD-Miner), to predict effective drugs and reveal potential drug mechanisms of action at the level of signaling pathways. Specifically, the patient-specific signaling network is constructed by integrating known disease-associated genes with patient-derived gene expression profiles. In parallel, a drug mechanism of action network is constructed by integrating drug targets and z-score profiles of drug-induced gene expression (pre- vs. post-drug treatment). Potentially effective candidate drugs are prioritized according to the number of common genes between the patient-specific dysfunctional signaling network and the drug MoA network. We evaluated the MD-Miner method on the PC-3 prostate cancer cell line, and showed that it significantly improved the success rate of discovering effective drugs compared with random selection, and could provide insight into potential mechanisms of action.
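The prioritization rule described here (rank drugs by the number of genes shared between the patient network and each drug MoA network) can be sketched in a few lines. This is not the MD-Miner implementation; `rank_drugs` and the drug and gene labels in the usage example are hypothetical.

```python
def rank_drugs(patient_genes, drug_networks):
    """Rank drugs by how many genes their MoA network shares with the
    patient-specific network.

    patient_genes: iterable of gene symbols.
    drug_networks: dict mapping a drug name to an iterable of gene symbols.
    Returns (drug, overlap) pairs, largest overlap first; ties are broken
    alphabetically so the ordering is deterministic.
    """
    patient = set(patient_genes)
    scores = {drug: len(patient & set(genes))
              for drug, genes in drug_networks.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `rank_drugs({"G1", "G2", "G3"}, {"drugA": ["G1", "G2"], "drugB": ["G3", "G9"]})` returns `[("drugA", 2), ("drugB", 1)]`.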

Conclusions

This work provides a signaling network-based drug repositioning approach. Compared with reverse gene signature-based drug repositioning approaches, the proposed method can provide clues about the mechanism of action in terms of signal transduction networks.

9.
Accurate molecular classification of cancer using simple rules (cited by: 1; self-citations: 0; citations by others: 1)

Background

One intractable problem with using microarray data analysis for cancer classification is how to reduce the extremely high-dimensionality gene feature data to remove the effects of noise. Feature selection is often used to address this problem by selecting informative genes from among thousands or tens of thousands of genes. However, most of the existing methods of microarray-based cancer classification utilize too many genes to achieve accurate classification, which often hampers the interpretability of the models. For a better understanding of the classification results, it is desirable to develop simpler rule-based models with as few marker genes as possible.

Methods

We screened a small number of informative single genes and gene pairs on the basis of their dependency degrees, as defined in rough set theory. Applying the decision rules induced by the selected genes or gene pairs, we constructed cancer classifiers. We tested the efficacy of the classifiers by leave-one-out cross-validation (LOOCV) of training sets and classification of independent test sets.
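In rough set theory, the dependency degree of a decision attribute on a condition attribute is the fraction of samples whose condition value determines the class unambiguously. The sketch below assumes expression values have already been discretized (for example into "high" and "low"); the discretization scheme and the function name are assumptions, not details from the abstract.

```python
from collections import defaultdict

def dependency_degree(condition_values, decisions):
    """Rough-set dependency degree |POS| / |U|.

    condition_values: discretized value per sample (use tuples for gene pairs).
    decisions: class label per sample.
    A sample lies in the positive region when every sample sharing its
    condition value has the same class label.
    """
    labels = defaultdict(set)
    counts = defaultdict(int)
    for cond, dec in zip(condition_values, decisions):
        labels[cond].add(dec)
        counts[cond] += 1
    positive = sum(counts[c] for c, seen in labels.items() if len(seen) == 1)
    return positive / len(decisions)
```

A gene with dependency degree 1.0 induces rules of the form "if the gene is high, then the class is X" that classify every training sample consistently, which is why such genes are attractive for very small rule-based classifiers.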

Results

We applied our methods to five cancerous gene expression datasets: leukemia (acute lymphoblastic leukemia [ALL] vs. acute myeloid leukemia [AML]), lung cancer, prostate cancer, breast cancer, and leukemia (ALL vs. mixed-lineage leukemia [MLL] vs. AML). Accurate classification outcomes were obtained by utilizing just one or two genes. Some genes that correlated closely with the pathogenesis of relevant cancers were identified. In terms of both classification performance and algorithm simplicity, our approach outperformed or at least matched existing methods.

Conclusion

In cancerous gene expression datasets, a small number of genes, even one or two if selected correctly, are capable of achieving an ideal cancer classification effect. This finding also means that very simple rules may perform well for cancerous class prediction.

10.

Background

In human breast cancer normal mammary cells typically develop into hyperplasia, ductal carcinoma in situ, invasive cancer, and metastasis. The changes in gene expression associated with this stepwise progression are unclear. Mice transgenic for mouse mammary tumor virus (MMTV)-Wnt-1 exhibit discrete steps of mammary tumorigenesis, including hyperplasia, invasive ductal carcinoma, and distant metastasis. These mice might therefore be useful models for discovering changes in gene expression during cancer development.

Results

We used cDNA microarrays to determine the expression profiles of five normal mammary glands, seven hyperplastic mammary glands and 23 mammary tumors from MMTV-Wnt-1 transgenic mice, and 12 mammary tumors from MMTV-Neu transgenic mice. Adipose tissues were used to control for fat cells in the vicinity of the mammary glands. In these analyses, we found that the progression of normal virgin mammary glands to hyperplastic tissues and to mammary tumors is accompanied by differences in the expression of several hundred genes at each step. Some of these differences appear to be unique to the effects of Wnt signaling; others seem to be common to tumors induced by both Neu and Wnt-1 oncogenes.

Conclusion

We described gene-expression patterns associated with breast-cancer development in mice, and identified genes that may be significant targets for oncogenic events. The expression data developed provide a resource for illuminating the molecular mechanisms involved in breast cancer development, especially through the identification of genes that are critical in cancer initiation and progression.

11.

Purpose

This study aims to explore gene expression signatures and serum biomarkers to predict intrinsic chemoresistance in epithelial ovarian cancer (EOC).

Patients and Methods

Gene expression profiling data of 322 high-grade EOC cases between 2009 and 2010 in The Cancer Genome Atlas project (TCGA) were used to develop and validate gene expression signatures that could discriminate different responses to first-line platinum/paclitaxel-based treatments. A gene regulation network was then built to further identify hub genes responsible for differential gene expression between the complete response (CR) group and the progressive disease (PD) group. Further, to find more robust serum biomarkers for clinical application, we integrated our gene signatures and gene signatures reported previously to identify secretory protein-encoding genes by searching the DAVID database. Finally, a gene-drug interaction network was constructed by searching the Comparative Toxicogenomics Database (CTD) and the literature.

Results

A 349-gene predictive model and an 18-gene model independent of key clinical features with high accuracy were developed for prediction of chemoresistance in EOC. Among them, ten important hub genes and six critical signaling pathways were identified to have important implications in chemotherapeutic response. Further, ten potential serum biomarkers were identified for predicting chemoresistance in EOC. Finally, we suggested some drugs for individualized treatment.

Conclusion

We have developed the predictive models and serum biomarkers for platinum/paclitaxel response and established the new approach to discover potential serum biomarkers from gene expression profiles. The potential drugs that target hub genes are also suggested.

12.

Background

The use of biological annotation such as genes and pathways in the analysis of gene expression data has aided the identification of genes for follow-up studies and suggested functional information to uncharacterized genes. Several studies have applied similar methods to genome wide association studies and identified a number of disease related pathways. However, many questions remain on how to best approach this problem, such as whether there is a need to obtain a score to summarize association evidence at the gene level, and whether a pathway, dominated by just a few highly significant genes, is of interest.

Methods

We evaluated the performance of two pathway-based methods (Random Set, and Binomial approximation to the hypergeometric test) based on their applications to three data sets of Crohn's disease. We consider both the disease status as a phenotype as well as the residuals after conditioning on IL23R, a known Crohn's related gene, as a phenotype.
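For orientation, the sketch below computes an exact hypergeometric enrichment p-value and its binomial approximation for a single pathway. The parameterization (N genes in total, K in the pathway, n significant genes, k of them in the pathway) is a common convention and an assumption here; the abstract does not specify how the authors set up the test.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """Exact P(X >= k) when n genes are drawn without replacement from N,
    of which K belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def binomial_pvalue(N, K, n, k):
    """Binomial approximation: each of the n draws hits the pathway
    independently with probability K / N."""
    p = K / N
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))
```

The two values are close when n is small relative to N; with a large pathway or a long list of significant genes, the approximation becomes noticeably less accurate.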

Results

Our results show that the Random Set method has the most power to identify disease-related pathways. We confirm previously reported disease-related pathways and provide evidence for IL-2 Receptor Beta Chain in T cell Activation and IL-9 signaling as Crohn's disease-associated pathways.

Conclusions

Our results highlight the need to apply powerful gene score methods prior to pathway enrichment tests, and show that controlling for genes that attain genome-wide significance enables further biological insight.

13.

Background

Glioblastoma is the most aggressive form of brain tumors showing resistance to treatment with various chemotherapeutic agents. The most effective way to eradicate glioblastoma requires the concurrent inhibition of multiple signaling pathways and target molecules involved in the progression of glioblastoma. Recently, we obtained a series of 1,2,3,4-tetrahydroisoquinoline alkaloids with potent anti-cancer activities, including ecteinascidin-770 (ET-770; the compound 1a) and renieramycin M (RM; the compound 2a) from Thai marine invertebrates, together with a 2’-N-4”-pyridinecarbonyl derivative of ET-770 (the compound 3). We attempted to characterize the molecular pathways responsible for cytotoxic effects of these compounds on a human glioblastoma cell line U373MG.

Methods

We studied the genome-wide gene expression profile on microarrays and molecular networks by using pathway analysis tools of bioinformatics.

Results

All of these compounds induced apoptosis of U373MG cells at nanomolar concentrations. The compound 3 reduced the expression of 417 genes and elevated the levels of 84 genes, while ET-770 downregulated 426 genes and upregulated 45 genes. RM decreased the expression of 274 genes and increased the expression of 9 genes. The set of 196 downregulated genes and 6 upregulated genes showed an overlap among all the compounds, suggesting an existence of the common pathways involved in induction of apoptosis. We identified the ErbB (EGFR) signaling pathway as one of the common pathways enriched in the set of downregulated genes, composed of PTK2, AKT3, and GSK3B serving as key molecules that regulate cell movement and the nervous system development. Furthermore, a GSK3B-specific inhibitor induced apoptosis of U373MG cells, supporting an anti-apoptotic role of GSK3B.

Conclusion

Molecular network analysis is a useful approach not only to characterize the glioma-relevant pathways but also to identify the network-based effective drug targets.

14.

Background

Integrative analysis on multi-omics data has gained much attention recently. To investigate the interactive effect of gene expression and DNA methylation on cancer, we propose a directed random walk-based approach on an integrated gene-gene graph that is guided by pathway information.

Methods

Our approach first extracts a single pathway profile matrix out of the gene expression and DNA methylation data by performing the random walk over the integrated graph. We then apply a denoising autoencoder to the pathway profile to further identify important pathway features and genes. The extracted features are validated in the survival prediction task for breast cancer patients.
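The random walk step can be illustrated with a generic random walk with restart on a directed graph. This is a sketch under stated assumptions, not the authors' DRW procedure: the restart probability, the seed weights, and the handling of nodes without outgoing edges are all choices made for this example.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, iters=100):
    """Random walk with restart on a directed graph.

    edges: list of (source, target) pairs.
    seeds: dict mapping node -> initial weight (e.g. a gene-level score).
    Returns a dict node -> visiting probability after `iters` updates.
    """
    nodes = sorted({n for edge in edges for n in edge} | set(seeds))
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seed weights must sum to a positive value")
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for src in nodes:
            # Nodes without outgoing edges keep their own probability mass.
            targets = out[src] or [src]
            share = (1.0 - restart) * p[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        p = nxt
    return p
```

Genes that are reachable from highly weighted seeds along edge directions accumulate probability, while genes upstream of the seeds receive none, which is what makes the direction of the graph matter.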

Results

The results show that the proposed method substantially improves the survival prediction performance compared to that of other pathway-based prediction methods, revealing that the combined effect of gene expression and methylation data is well reflected in the integrated gene-gene graph combined with pathway information. Furthermore, we show that our joint analysis on the methylation features and gene expression profile identifies cancer-specific pathways with genes related to breast cancer.

Conclusions

In this study, we proposed a DRW-based method on an integrated gene-gene graph with expression and methylation profiles in order to utilize the interactions between them. The results showed that the constructed integrated gene-gene graph can successfully reflect the combined effect of methylation features on gene expression profiles. We also found that the selected features by DA can effectively extract topologically important pathways and genes specifically related to breast cancer.

15.
16.

Background

The Signal-to-Noise-Ratio (SNR) is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs). These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems.
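For reference, the two-class SNR that the proposed indices generalize is commonly computed as the difference of class means divided by the sum of class standard deviations. The sketch below uses that common definition as an assumption; the abstract does not state the exact formula, and the multiclass GDIs themselves are not reproduced here.

```python
from statistics import mean, stdev

def snr(class_a, class_b):
    """Two-class signal-to-noise ratio for one gene:
    (mean_a - mean_b) / (sd_a + sd_b), using sample standard deviations."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))
```

A large positive or negative value indicates a gene whose expression separates the two classes well relative to its within-class spread.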

Results and discussion

To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer). For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC), Nearest Mean Classifier (NMC), Support Vector Machine (SVM) classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA) and one-vs-one (OVO) strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the same computational protocols. The dominant genes are usually easy to find while good dormant genes may not always be available as dormant genes require stronger constraints to be satisfied; but when they are available, they can be used for authentication of diagnosis.

Conclusion

Since GDI based schemes can find a small set of dominant/dormant biomarkers that is adequate to design diagnostic prediction systems, it opens up the possibility of using real-time qPCR assays or antibody based methods such as ELISA for an easy and low cost diagnosis of diseases. The dominant and dormant genes found by GDIs can be used in different ways to design more reliable diagnostic prediction systems.

17.

Background

Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. It is well known that somatic mutation plays an essential role in cancer development. Hence, we propose that a prognostic prediction model that integrates somatic mutations with gene expression can improve survival prediction for cancer patients and also reveal the genetic mutations associated with survival.

Method

Differential expression analysis was used to identify breast cancer-related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to filter out survival-related genes. DAVID was used for enrichment analysis of the somatically mutated gene set. The performance of the survival predictors was assessed by a Cox regression model and the concordance index (C-index).
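The concordance index used for evaluation can be computed directly from its definition. The sketch below follows Harrell's C-index with ties in predicted risk counted as one half; it is a simple quadratic-time illustration with hypothetical argument names, not the authors' code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    times: observed follow-up times.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    risks: predicted risk scores (higher means earlier expected event).
    A pair (i, j) is comparable when i had an observed event before time j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.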

Results

We investigated the genome-wide gene expression profile and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as breast cancer survival risk gene candidates (log rank p < 0.0001 and c-index = 0.636). Multiple breast cancer survival-related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B and SDC1. A further genetic algorithm (GA) step revealed an optimal gene set consisting of 88 genes with a higher c-index (log rank p < 0.0001 and c-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved a similar performance (log rank p < 0.0001 and c-index = 0.614). Moreover, we revealed 25 functional annotations, 15 gene ontology terms and 14 pathways that were significantly enriched in the genes that showed distinct mutation patterns in the different survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway had an important role in breast cancer prognosis.

Conclusions

Our study indicated that the expression levels of the gene signatures remain the effective indicators for breast cancer survival prediction. Combining the gene expression information with other types of features derived from somatic mutations can further improve the performance of survival prediction. The pathways that were associated with survival risk suggested by our study can be further investigated for improving cancer patient survival.

18.
Developing accurate prognostic models is one of the biggest challenges in “omics”-based cancer research. Here, we propose a novel computational method for identifying dysregulated gene subnetworks as biomarkers to predict cancer recurrence. Applying our method to the DNA methylome of endometrial cancer patients, we identified a subnetwork consisting of differentially methylated (DM) genes and non-differentially methylated genes, termed Epigenetic Connectors (EC), that are topologically important for connecting the DM genes in a protein-protein interaction network. The ECs are statistically significantly enriched in well-known tumorigenesis and metastasis pathways, and include known epigenetic regulators. Importantly, combining the DMs and ECs as features using a novel random walk procedure, we constructed a support vector machine classifier that significantly improved the prediction accuracy of cancer recurrence and outperformed several alternative methods, demonstrating the effectiveness of our network-based approach.

19.
20.

Background

Breast cancer is the most common malignancy among women worldwide in terms of incidence and mortality. About 10% of North American women will be diagnosed with breast cancer during their lifetime and 20% of those will die of the disease. Breast cancer is a heterogeneous disease and biomarkers able to correctly classify patients into prognostic groups are needed to better tailor treatment options and improve outcomes. One powerful method used for biomarker discovery is sample screening with mass spectrometry, as it allows direct comparison of protein expression between normal and pathological states. The purpose of this study was to use a systematic and objective method to identify biomarkers with possible prognostic value in breast cancer patients, particularly in identifying cases most likely to have lymph node metastasis and to validate their prognostic ability using breast cancer tissue microarrays.

Methods and Findings

Differential proteomic analyses were employed to identify candidate biomarkers in primary breast cancer patients. These analyses identified decorin (DCN) and endoplasmin (HSP90B1), which play important roles in regulating the tumour microenvironment and in pathways related to tumorigenesis. This study indicates that high expression of decorin is associated with lymph node metastasis (p<0.001), a higher number of positive lymph nodes (p<0.0001) and worse overall survival (p = 0.01). High expression of HSP90B1 is associated with distant metastasis (p<0.0001) and decreased overall survival (p<0.0001); these patients also appear to benefit significantly from hormonal treatment.

Conclusions

Using quantitative proteomic profiling of primary breast cancers, two new promising prognostic and predictive markers were found to identify patients with worse survival. In addition, HSP90B1 appears to identify a group of patients with distant metastasis with otherwise good prognostic features.
