Found 20 similar articles.
1.
Differential diagnosis among a group of histologically similar cancers poses a challenging problem in clinical medicine. Constructing a classifier based on gene expression signatures comprising multiple discriminatory molecular markers derived from microarray data analysis is an emerging trend for cancer diagnosis. Identifying the best genes for classification from a small number of samples relative to the genome size remains the bottleneck of this approach, despite its promise. We have devised a new method of gene selection with reliability analysis, and demonstrated that this method can identify a more compact set of genes than other methods for constructing a classifier with optimum predictive performance for both small round blue cell tumors and leukemia. High consensus between our result and the results produced by methods based on artificial neural networks and statistical techniques provides additional evidence for the validity of our method. This study suggests a way to implement a reliable molecular cancer classifier based on gene expression signatures.
2.
Many examples highlight the power of gene expression profiles, or signatures, to inform an understanding of biological phenotypes. This is perhaps best seen in the context of cancer, where expression signatures have tremendous power to identify new subtypes and to predict clinical outcomes. Although the ability to interpret the meaning of the individual genes in these signatures remains a challenge, this does not diminish the power of the signature to characterize biological states. The use of these signatures as surrogate phenotypes has been particularly important, linking diverse experimental systems that dissect the complexity of biological systems with the in vivo setting in a way that was not previously feasible.
3.
Mingyang Bao Lihua Zhang Yueqing Hu 《Journal of cellular and molecular medicine》2020,24(17):9972-9984
Ovarian cancer (OV) is one of the leading causes of cancer deaths in women worldwide. Late diagnosis and heterogeneous treatment result in poor survival outcomes for patients with OV. Therefore, we aimed to develop novel biomarkers for prognosis prediction based on the potential molecular mechanisms of tumorigenesis. Eight eligible data sets related to OV in the GEO database were integrated to identify differentially expressed genes (DEGs) between tumour and normal tissues. Enrichment analyses showed that the DEGs were most significantly enriched in the G2/M checkpoint signalling pathway. Subsequently, we constructed a multi‐gene signature based on the LASSO Cox regression model in the TCGA database, and time‐dependent ROC curves showed good predictive accuracy for 1‐, 3‐ and 5‐year overall survival. Utility in various types of OV was validated through subgroup survival analysis. Risk scores derived from the multi‐gene signature stratified patients into high‐risk and low‐risk groups, and the former tended to have worse overall survival than the latter. By combining this signature with age and pathological tumour stage, a visual predictive nomogram was established that clinicians can use to predict patient survival outcomes. Furthermore, SNRPD1 and EFNA5 were selected from the multi‐gene signature as simplified prognostic indicators. Higher EFNA5 expression or lower SNRPD1 expression indicated poorer outcome. The correlation between signature gene expression and clinical characteristics was examined through WGCNA. Drug‐gene interaction analysis was used to identify 16 potentially targeted drugs for OV treatment. In conclusion, we established novel gene signatures as independent prognostic factors to stratify the risk of OV patients and facilitate the implementation of personalized therapies.
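The risk-score stratification described in this abstract can be illustrated with a minimal sketch. The abstract does not report the fitted coefficients or the cutoff, so the weights, the expression values and the median cutoff below are hypothetical placeholders, not the authors' values. Only the signs follow the abstract (higher EFNA5 or lower SNRPD1 indicating poorer outcome).

```python
from statistics import median

# Hypothetical weights: the abstract does not report the fitted LASSO Cox coefficients.
# Only the signs follow the abstract (EFNA5 up or SNRPD1 down means higher risk).
COEFFICIENTS = {"SNRPD1": -0.30, "EFNA5": 0.45}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor of a Cox model: sum of coefficient * expression."""
    return sum(weight * expression[gene] for gene, weight in coefficients.items())


def stratify(patients, coefficients=COEFFICIENTS):
    """Label each patient high- or low-risk relative to the median score (assumed cutoff)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Made-up expression values for four patients.
patients = {
    "P1": {"SNRPD1": 2.0, "EFNA5": 5.0},
    "P2": {"SNRPD1": 6.0, "EFNA5": 1.0},
    "P3": {"SNRPD1": 3.0, "EFNA5": 3.0},
    "P4": {"SNRPD1": 5.0, "EFNA5": 2.0},
}
groups = stratify(patients)
print(groups)
```

With these placeholder numbers, P1 and P3 fall in the high-risk group and P2 and P4 in the low-risk group. A real analysis would fit the coefficients on survival data and choose the cutoff on a training cohort.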
4.
5.
6.
Background
Ovarian cancer is the fifth leading cause of cancer deaths among women. Early stage disease often remains undetected due to the lack of symptoms and reliable biomarkers. The identification of early genetic changes could provide insights into novel signaling pathways that may be exploited for early detection and treatment.
Methodology/Principal Findings
Mouse ovarian surface epithelial (MOSE) cells were used to identify stage-dependent changes in gene expression levels and signal transduction pathways by mouse whole genome microarray analyses and gene ontology. These cells have undergone spontaneous transformation in cell culture and transitioned from non-tumorigenic to intermediate and aggressive, malignant phenotypes. Significantly changed genes were overrepresented in a number of pathways, most notably the cytoskeleton functional category. Concurrent with gene expression changes, the cytoskeletal architecture became progressively disorganized, resulting in aberrant expression or subcellular distribution of key cytoskeletal regulatory proteins (focal adhesion kinase, α-actinin, and vinculin). The cytoskeletal disorganization was accompanied by altered patterns of serine and tyrosine phosphorylation as well as changed expression and subcellular localization of integral signaling intermediates APC and PKCβII.
Conclusions/Significance
Our studies have identified genes that are aberrantly expressed during MOSE cell neoplastic progression. We show that early stage dysregulation of actin microfilaments is followed by progressive disorganization of microtubules and intermediate filaments at later stages. These stage-specific, step-wise changes provide further insights into the temporal and spatial sequence of events that lead to the fully transformed state, since these changes are also observed in aggressive human ovarian cancer cell lines independent of their histological type. Moreover, our studies support a link between aberrant cytoskeleton organization and regulation of important downstream signaling events that may be involved in cancer progression. Thus, our MOSE-derived cell model represents a unique model for in-depth mechanistic studies of ovarian cancer progression.
7.
8.
MOTIVATION: DNA microarray technologies make it possible to simultaneously monitor thousands of genes' expression levels. A topic of great interest is to study the different expression profiles between microarray samples from cancer patients and normal subjects, by classifying them at gene expression levels. Currently, various clustering methods have been proposed in the literature to classify cancer and normal samples based on microarray data, and they are predominantly data-driven approaches. In this paper, we propose an alternative approach, a model-driven approach, which can reveal the relationship between the global gene expression profile and the subject's health status, and thus is promising in predicting the early development of cancer. RESULTS: In this work, we propose an ensemble dependence model, aimed at exploring the group dependence relationship of gene clusters. Under the framework of hypothesis-testing, we employ genes' dependence relationship as a feature to model and classify cancer and normal samples. The proposed classification scheme is applied to several real cancer datasets, including cDNA, Affymetrix microarray and proteomic data. It is noted that the proposed method yields very promising performance. We further investigate the eigenvalue pattern of the proposed method, and we discover different patterns between cancer and normal samples. Moreover, the transition between cancer and normal patterns suggests that the eigenvalue pattern of the proposed models may have potential to predict the early stage of cancer development. In addition, we examine the effects of possible model mismatch on the proposed scheme.
9.
Bagirov AM Ferguson B Ivkovic S Saunders G Yearwood J 《Bioinformatics (Oxford, England)》2003,19(14):1800-1807
MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16 000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.
10.
Background
The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been shown that the deep forest model has competitive or even better performance than deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample size and high-dimensional biology data.
Results
In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage diversity of the ensemble. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, and thus to propagate the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in cancer subtype classification.
Conclusions
The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.
11.
Andries E Hagstrom T Atlas SR Willman C 《Journal of bioinformatics and computational biology》2007,5(1):79-104
Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
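For readers unfamiliar with the term, the standard textbook form of a filter factor representation is sketched below. The abstract does not say which regularization schemes the authors analyse, so the notation and the two example filters (Tikhonov and truncated SVD) are assumptions chosen for illustration.

```latex
% Notation assumed for illustration: X is the samples-by-genes expression matrix
% of rank r, y the vector of class labels, w the hyperplane weights.
X = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad
w_{\text{reg}} = \sum_{i=1}^{r} f_i \, \frac{u_i^{\top} y}{\sigma_i} \, v_i

% Tikhonov (ridge) regularization with parameter \lambda:
f_i = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda^{2}}

% Truncated SVD keeping the k largest singular values:
f_i = \begin{cases} 1, & i \le k \\ 0, & i > k \end{cases}
```

Setting every f_i to 1 recovers the unregularized minimum-norm least-squares solution, in which noise along small singular values is amplified by the factor 1/σ_i. The filter factors damp exactly those terms.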
12.
13.
Both microRNA (miRNA) and mRNA expression profiles are important methods for cancer type classification. A comparative study of their classification performance will be helpful in choosing the means of classification. Here we evaluated the classification performance of miRNA and mRNA profiles using a new data mining approach based on a novel SVM (Support Vector Machines) based recursive feature elimination (nRFE) algorithm. Computational experiments showed that information encoded in miRNAs is not sufficient to classify cancers; gut-derived samples cluster more accurately when using mRNA expression profiles compared with using miRNA profiles; and poorly differentiated tumors (PDT) could be classified by mRNA expression profiles at the accuracy of 100% versus 93.8% when using miRNA profiles. Furthermore, we showed that mRNA expression profiles have higher capacity in normal tissue classifications than miRNA. We concluded that classification performance using mRNA profiles is superior to that of miRNA profiles in multiple-class cancer classifications.
14.
The clinical and pathological heterogeneity of breast cancer, partly responsible for therapeutic failures, reflects complex and combinatorial molecular alterations that until now have been poorly documented by classical investigation tools. Thorough molecular typing is crucial. The advent of DNA microarray-based gene expression profiling has enabled consistent progress in this direction. A novel molecular taxonomy of breast cancer has been defined, and signatures that predict clinical outcome or therapeutic response have been identified, some of which are being tested in ongoing prospective clinical trials. In this review, we present the main results and their potential clinical applications. We also discuss their current limits and future hopes in the therapeutic management of patients.
15.
Davis CA Gerick F Hintermair V Friedel CC Fundel K Küffner R Zimmer R 《Bioinformatics (Oxford, England)》2006,22(19):2356-2363
MOTIVATION: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. METHODS: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. RESULTS: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. 
We compared our approach to others based on a publicly available dataset on breast cancer. AVAILABILITY: R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
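The consensus-signature idea in this abstract (keep the genes that are selected frequently over many random subsets of arrays) can be sketched as follows. This is not the authors' R package. The selection rule (absolute difference in class means), the stratified subsampling, the parameter defaults and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean


def select_genes(samples, labels, k):
    """Toy gene selection: rank genes by absolute difference in class means."""
    def score(gene):
        class0 = [s[gene] for s, y in zip(samples, labels) if y == 0]
        class1 = [s[gene] for s, y in zip(samples, labels) if y == 1]
        return abs(mean(class0) - mean(class1))
    return sorted(samples[0].keys(), key=score, reverse=True)[:k]


def consensus_signature(samples, labels, k=2, rounds=200,
                        fraction=0.7, min_freq=0.8, seed=0):
    """Return genes selected in at least `min_freq` of the random subsamples."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for index, label in enumerate(labels):
        by_class[label].append(index)
    counts = Counter()
    for _ in range(rounds):
        chosen = []
        for members in by_class.values():
            # Sample each class separately so both classes are always present.
            size = max(1, round(fraction * len(members)))
            chosen.extend(rng.sample(members, size))
        counts.update(select_genes([samples[i] for i in chosen],
                                   [labels[i] for i in chosen], k))
    return sorted(g for g, c in counts.items() if c / rounds >= min_freq)


# Made-up data: g1 and g2 separate the two classes, g3 and g4 are noise.
samples = [
    {"g1": 1.0, "g2": 7.1, "g3": 3.0, "g4": 2.2},
    {"g1": 1.1, "g2": 6.9, "g3": 3.4, "g4": 2.0},
    {"g1": 0.9, "g2": 7.0, "g3": 3.1, "g4": 2.3},
    {"g1": 1.2, "g2": 7.2, "g3": 3.3, "g4": 2.1},
    {"g1": 0.8, "g2": 6.8, "g3": 3.2, "g4": 2.4},
    {"g1": 5.0, "g2": 2.0, "g3": 3.2, "g4": 2.1},
    {"g1": 5.1, "g2": 2.2, "g3": 3.0, "g4": 2.4},
    {"g1": 4.9, "g2": 1.9, "g3": 3.4, "g4": 2.0},
    {"g1": 5.2, "g2": 2.1, "g3": 3.1, "g4": 2.3},
    {"g1": 4.8, "g2": 1.8, "g3": 3.3, "g4": 2.2},
]
labels = [0] * 5 + [1] * 5
signature = consensus_signature(samples, labels)
print(signature)
```

The per-round selection frequencies in `counts` could also serve as a rough stability measure. The abstract additionally evaluates classifier performance on each subsample, which this sketch leaves out.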
16.
Bontempi G 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(2):293-300
Because of high dimensionality, machine learning algorithms typically rely on feature selection techniques in order to perform effective classification in microarray gene expression data sets. However, the large number of features compared to the number of samples makes the task of feature selection computationally hard and prone to errors. This paper interprets feature selection as a task of stochastic optimization, where the goal is to select among an exponential number of alternative gene subsets the one expected to return the highest generalization in classification. Blocking is an experimental design strategy which produces similar experimental conditions to compare alternative stochastic configurations in order to be confident that observed differences in accuracy are due to actual differences rather than to fluctuations and noise effects. We propose an original blocking strategy for improving feature selection which aggregates in a paired way the validation outcomes of several learning algorithms to assess a gene subset and compare it to others. This is a novelty with respect to conventional wrappers, which commonly adopt a sole learning algorithm to evaluate the relevance of a given set of variables. The rationale of the approach is that, by increasing the amount of experimental conditions under which we validate a feature subset, we can lessen the problems related to the scarcity of samples and consequently come up with a better selection. The paper shows that the blocking strategy significantly improves the performance of a conventional forward selection for a set of 16 publicly available cancer expression data sets. The experiments involve six different classifiers and show that improvements take place independent of the classification algorithm used after the selection step. 
Two further validations based on available biological annotation support the claim that blocking strategies in feature selection may improve the accuracy and the quality of the solution. The first validation is based on retrieving PubMed abstracts associated with the selected genes and matching them to regular expressions describing the biological phenomenon underlying the expression data sets. The biological validation that follows is based on the use of the Bioconductor package GOstats in order to perform Gene Ontology statistical analysis.
17.
18.
A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis (total citations: 8; self-citations: 0; citations by others: 8)
Statnikov A Aliferis CF Tsamardinos I Hardin D Levy S 《Bioinformatics (Oxford, England)》2005,21(5):631-643
MOTIVATION: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs using 11 datasets spanning 74 diagnostic categories and 41 cancer types and 12 normal tissue types. RESULTS: Multicategory support vector machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and other non-SVM learning algorithms. Ensemble classifiers do not generally improve performance of the best non-ensemble models. These results guided the construction of a software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets. AVAILABILITY: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use. CONTACT: alexander.statnikov@vanderbilt.edu.
19.
Graph-based identification of cancer signaling pathways from published gene expression signatures using PubLiME
Gene expression technology has become a routine application in many laboratories and has provided large amounts of gene expression signatures that have been identified in a variety of cancer types. Interpretation of gene expression signatures would profit from the availability of a procedure capable of assigning differentially regulated genes or entire gene signatures to defined cancer signaling pathways. Here we describe a graph-based approach that identifies cancer signaling pathways from published gene expression signatures. Published gene expression signatures are collected in a database (PubLiME: Published Lists of Microarray Experiments) enabled for cross-platform gene annotation. Significant co-occurrence modules composed of up to 10 genes in different gene expression signatures are identified. Significantly co-occurring genes are linked by an edge in an undirected graph. Edge-betweenness and k-clique clustering combined with graph modularity as a quality measure are used to identify communities in the resulting graph. The identified communities consist of cell cycle, apoptosis, phosphorylation cascade, extra cellular matrix, interferon and immune response regulators as well as communities of unknown function. The genes constituting different communities are characterized by common genomic features and strongly enriched cis-regulatory modules in their upstream regulatory regions that are consistent with pathway assignment of those genes.
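The graph construction step in this abstract (link genes that co-occur across published signatures, then look for communities) can be sketched in a few lines. Two simplifications are assumed here and are not the authors' procedure: a plain count threshold (`min_shared`) replaces the statistical significance test, and connected components replace edge-betweenness and k-clique clustering. The signature contents are made up for illustration and are not drawn from PubLiME.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(signatures, min_shared=2):
    """Link two genes when they appear together in at least `min_shared` signatures."""
    pair_counts = defaultdict(int)
    for signature in signatures:
        for a, b in combinations(sorted(set(signature)), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Group linked genes; a crude stand-in for the community detection in the abstract."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Made-up signatures: each is a list of gene symbols from one hypothetical study.
signatures = [
    ["CCNB1", "CDK1", "TP53"],
    ["CCNB1", "CDK1", "MYC"],
    ["STAT1", "IRF7", "EGFR"],
    ["STAT1", "IRF7", "KRAS"],
]
graph = cooccurrence_graph(signatures)
communities = connected_components(graph)
print(communities)
```

In this toy input, only the pairs CCNB1/CDK1 and IRF7/STAT1 co-occur twice, so the graph splits into two two-gene groups. Genes that never reach the threshold do not appear in the graph at all.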