1.

Multi-class cancer subtype classification based on gene expression signatures with reliability analysis

Fu LM Fu-Liu CS 《FEBS letters》2004,561(1-3):186-190

Differential diagnosis among a group of histologically similar cancers poses a challenging problem in clinical medicine. Constructing a classifier based on gene expression signatures comprising multiple discriminatory molecular markers derived from microarray data analysis is an emerging trend for cancer diagnosis. Identifying the best genes for classification using a small number of samples relative to the genome size remains the bottleneck of this approach, despite its promise. We have devised a new method of gene selection with reliability analysis, and demonstrated that this method can identify a more compact set of genes than other methods for constructing a classifier with optimum predictive performance for both small round blue cell tumors and leukemia. High consensus between our result and the results produced by methods based on artificial neural networks and statistical techniques provides additional evidence of the validity of our method. This study suggests a way to implement a reliable molecular cancer classifier based on gene expression signatures.

2.

Mining gene expression profiles: expression signatures as cancer phenotypes (total citations: 6; self-citations: 0; citations by others: 6)

Nevins JR Potti A 《Nature reviews. Genetics》2007,8(8):601-609

Many examples highlight the power of gene expression profiles, or signatures, to inform an understanding of biological phenotypes. This is perhaps best seen in the context of cancer, where expression signatures have tremendous power to identify new subtypes and to predict clinical outcomes. Although the ability to interpret the meaning of the individual genes in these signatures remains a challenge, this does not diminish the power of the signature to characterize biological states. The use of these signatures as surrogate phenotypes has been particularly important, linking diverse experimental systems that dissect the complexity of biological systems with the in vivo setting in a way that was not previously feasible.

3.

Changes in gene expression and cellular architecture in an ovarian cancer progression model

Creekmore AL Silkworth WT Cimini D Jensen RV Roberts PC Schmelz EM 《PloS one》2011,6(3):e17676

Background

Ovarian cancer is the fifth leading cause of cancer deaths among women. Early stage disease often remains undetected due to the lack of symptoms and reliable biomarkers. The identification of early genetic changes could provide insights into novel signaling pathways that may be exploited for early detection and treatment.

Methodology/Principal Findings

Mouse ovarian surface epithelial (MOSE) cells were used to identify stage-dependent changes in gene expression levels and signal transduction pathways by mouse whole genome microarray analyses and gene ontology. These cells have undergone spontaneous transformation in cell culture and transitioned from non-tumorigenic to intermediate and aggressive, malignant phenotypes. Significantly changed genes were overrepresented in a number of pathways, most notably the cytoskeleton functional category. Concurrent with gene expression changes, the cytoskeletal architecture became progressively disorganized, resulting in aberrant expression or subcellular distribution of key cytoskeletal regulatory proteins (focal adhesion kinase, α-actinin, and vinculin). The cytoskeletal disorganization was accompanied by altered patterns of serine and tyrosine phosphorylation as well as changed expression and subcellular localization of integral signaling intermediates APC and PKCβII.

Conclusions/Significance

Our studies have identified genes that are aberrantly expressed during MOSE cell neoplastic progression. We show that early stage dysregulation of actin microfilaments is followed by progressive disorganization of microtubules and intermediate filaments at later stages. These stage-specific, step-wise changes provide further insights into the temporal and spatial sequence of events that lead to the fully transformed state, since these changes are also observed in aggressive human ovarian cancer cell lines independent of their histological type. Moreover, our studies support a link between aberrant cytoskeleton organization and regulation of important downstream signaling events that may be involved in cancer progression. Thus, our MOSE-derived cell model represents a unique model for in-depth mechanistic studies of ovarian cancer progression.

4.

Ensemble dependence model for classification and prediction of cancer and normal gene expression data

Qiu P Wang ZJ Liu KJ 《Bioinformatics (Oxford, England)》2005,21(14):3114-3121

MOTIVATION: DNA microarray technologies make it possible to simultaneously monitor thousands of genes' expression levels. A topic of great interest is to study the different expression profiles between microarray samples from cancer patients and normal subjects, by classifying them at gene expression levels. Currently, various clustering methods have been proposed in the literature to classify cancer and normal samples based on microarray data, and they are predominantly data-driven approaches. In this paper, we propose an alternative approach, a model-driven approach, which can reveal the relationship between the global gene expression profile and the subject's health status, and thus is promising in predicting the early development of cancer. RESULTS: In this work, we propose an ensemble dependence model, aimed at exploring the group dependence relationship of gene clusters. Under the framework of hypothesis-testing, we employ genes' dependence relationship as a feature to model and classify cancer and normal samples. The proposed classification scheme is applied to several real cancer datasets, including cDNA, Affymetrix microarray and proteomic data. It is noted that the proposed method yields very promising performance. We further investigate the eigenvalue pattern of the proposed method, and we discover different patterns between cancer and normal samples. Moreover, the transition between cancer and normal patterns suggests that the eigenvalue pattern of the proposed models may have potential to predict the early stage of cancer development. In addition, we examine the effects of possible model mismatch on the proposed scheme.

5.

Circadian signatures in rat liver: from gene expression to pathways

Meric A Ovacik Siddharth Sukumaran Richard R Almon Debra C DuBois William J Jusko Ioannis P Androulakis 《BMC bioinformatics》2010,11(1):540

6.

BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data

Yang Guo Shuhui Liu Zhanhuai Li Xuequn Shang 《BMC bioinformatics》2018,19(5):118

Background

The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been shown that the deep forest model has competitive or even better performance than deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample sizes and high-dimensional biology data.

Results

In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets; it can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage ensemble diversity. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thereby propagating the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in cancer subtype classification.

Conclusions

The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.

7.

New algorithms for multi-class cancer diagnosis using tumor gene expression signatures

Bagirov AM Ferguson B Ivkovic S Saunders G Yearwood J 《Bioinformatics (Oxford, England)》2003,19(14):1800-1807

MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16 000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.

8.

Regularization strategies for hyperplane classifiers: application to cancer classification with gene expression data

Andries E Hagstrom T Atlas SR Willman C 《Journal of bioinformatics and computational biology》2007,5(1):79-104

Linear discrimination, from the point of view of numerical linear algebra, can be treated as solving an ill-posed system of linear equations. In order to generate a solution that is robust in the presence of noise, these problems require regularization. Here, we examine the ill-posedness involved in the linear discrimination of cancer gene expression data with respect to outcome and tumor subclasses. We show that a filter factor representation, based upon Singular Value Decomposition, yields insight into the numerical ill-posedness of the hyperplane-based separation when applied to gene expression data. We also show that this representation yields useful diagnostic tools for guiding the selection of classifier parameters, thus leading to improved performance.
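
As a reminder of what a filter factor representation looks like, the standard textbook form for a linear system $Xw = y$ is sketched below. The symbols $X$, $y$, $w$, $\lambda$ and $k$ are notation assumed here for illustration and are not taken from the paper; Tikhonov regularization and truncated SVD are shown only as two common choices of filter, not necessarily the ones the authors studied.

```latex
X = \sum_{i=1}^{r} \sigma_i\, u_i v_i^{\top}, \qquad
w_{\mathrm{reg}} = \sum_{i=1}^{r} f_i\, \frac{u_i^{\top} y}{\sigma_i}\, v_i
\\[4pt]
\text{Tikhonov: } f_i = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda^{2}}, \qquad
\text{truncated SVD: } f_i = \begin{cases} 1, & i \le k \\ 0, & i > k \end{cases}
```

Small singular values amplify noise through the $1/\sigma_i$ term, and a filter factor near zero damps exactly those components, which is why inspecting the $f_i$ can guide the choice of regularization parameter.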

9.

Multi-class cancer classification through gene expression profiles: microRNA versus mRNA

Sihua Peng Xiaomin Zeng Xiaobo Li Xiaoning Peng Liangbiao Chen 《遗传学报》2009,36(7):409-416

Both microRNA (miRNA) and mRNA expression profiles are important methods for cancer type classification. A comparative study of their classification performance will be helpful in choosing the means of classification. Here we evaluated the classification performance of miRNA and mRNA profiles using a new data mining approach based on a novel SVM (Support Vector Machines) based recursive feature elimination (nRFE) algorithm. Computational experiments showed that information encoded in miRNAs is not sufficient to classify cancers; gut-derived samples cluster more accurately when using mRNA expression profiles compared with using miRNA profiles; and poorly differentiated tumors (PDT) could be classified by mRNA expression profiles at the accuracy of 100% versus 93.8% when using miRNA profiles. Furthermore, we showed that mRNA expression profiles have higher capacity in normal tissue classifications than miRNA. We concluded that classification performance using mRNA profiles is superior to that of miRNA profiles in multiple-class cancer classifications.

10.

Prognostic classification of breast cancer and gene expression profiling

Bertucci F Finetti P Cervera N Birnbaum D 《Médecine sciences : M/S》2008,24(6-7):599-606

Clinical and pathological heterogeneity of breast cancer, partly responsible for therapeutic failures, reflects complex and combinatory molecular alterations until now poorly documented by classical investigation tools. Thorough molecular typing is crucial. The advent of DNA microarray-based gene expression profiling allowed consistent progress in this direction. A novel molecular taxonomy of breast cancer has been defined, and signatures that predict clinical outcome or therapeutic response have been identified, some of them being tested in ongoing prospective clinical trials. In this review, we present the main results and their potential clinical applications. We also discuss their current limits and future hopes in the therapeutic management of patients.

11.

Transfer of clinically relevant gene expression signatures in breast cancer: from Affymetrix microarray to Illumina RNA-Sequencing technology

Debora Fumagalli Alexis Blanchet-Cohen David Brown Christine Desmedt David Gacquer Stefan Michiels Françoise Rothé Samira Majjaj Roberto Salgado Denis Larsimont Michail Ignatiadis Marion Maetens Martine Piccart Vincent Detours Christos Sotiriou Benjamin Haibe-Kains 《BMC genomics》2014,15(1)

12.

Reliable gene signatures for microarray classification: assessment of stability and performance

Davis CA Gerick F Hintermair V Friedel CC Fundel K Küffner R Zimmer R 《Bioinformatics (Oxford, England)》2006,22(19):2356-2363

MOTIVATION: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. METHODS: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. RESULTS: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. 
We compared our approach to others based on a publicly available dataset on breast cancer. AVAILABILITY: R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
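
The sampling idea in the METHODS paragraph can be illustrated with a small sketch. This is not the authors' R package: the gene ranking (absolute difference of class means), the function names, and the parameters `frac` and `min_freq` are assumptions chosen to keep the example short and dependency-free.

```python
import random
from collections import Counter


def top_genes(expr, labels, k):
    """Rank genes by absolute difference of class means; return the top k indices.

    This simple ranking is a stand-in for any gene selection method.
    """
    class0 = [row for row, y in zip(expr, labels) if y == 0]
    class1 = [row for row, y in zip(expr, labels) if y == 1]

    def mean(rows, g):
        return sum(r[g] for r in rows) / len(rows)

    n_genes = len(expr[0])
    ranked = sorted(
        range(n_genes),
        key=lambda g: -abs(mean(class0, g) - mean(class1, g)),
    )
    return ranked[:k]


def consensus_signature(expr, labels, k=2, n_samplings=100,
                        frac=0.8, min_freq=0.9, seed=0):
    """Keep genes selected in at least `min_freq` of the random subsamples."""
    rng = random.Random(seed)
    counts = Counter()
    indices = list(range(len(expr)))
    subset_size = max(2, int(frac * len(indices)))
    valid = 0
    for _ in range(n_samplings):
        subset = rng.sample(indices, subset_size)
        sub_labels = [labels[i] for i in subset]
        if len(set(sub_labels)) < 2:
            continue  # skip subsamples that miss one of the two classes
        valid += 1
        counts.update(top_genes([expr[i] for i in subset], sub_labels, k))
    return sorted(g for g, c in counts.items() if c / valid >= min_freq)


if __name__ == "__main__":
    # Toy data (hypothetical): genes 0 and 2 separate the classes, 1 and 3 do not.
    expr = [
        [0.1, 1.0, 0.2, 3.0],
        [0.0, 1.2, 0.1, 3.1],
        [0.2, 0.9, 0.0, 2.9],
        [10.0, 1.1, 8.0, 3.0],
        [10.1, 1.0, 8.2, 3.2],
        [9.9, 1.1, 7.9, 2.8],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(consensus_signature(expr, labels, k=2, n_samplings=20))
```

The selection frequency of each gene across subsamples doubles as a rough stability measure: genes that appear only in a minority of samplings are the ones most likely to vanish when the sample set changes.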

13.

A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis (total citations: 8; self-citations: 0; citations by others: 8)

Statnikov A Aliferis CF Tsamardinos I Hardin D Levy S 《Bioinformatics (Oxford, England)》2005,21(5):631-643

MOTIVATION: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs using 11 datasets spanning 74 diagnostic categories and 41 cancer types and 12 normal tissue types. RESULTS: Multicategory support vector machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and other non-SVM learning algorithms. Ensemble classifiers do not generally improve performance of the best non-ensemble models. These results guided the construction of a software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets. AVAILABILITY: The software system GEMS is available for download from http://www.gems-system.org for non-commercial use. CONTACT: alexander.statnikov@vanderbilt.edu.

14.

Most random gene expression signatures are significantly associated with breast cancer outcome

Venet D Dumont JE Detours V 《PLoS computational biology》2011,7(10):e1002240

15.

A blocking strategy to improve gene selection for classification of gene expression data

Bontempi G 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(2):293-300

Because of high dimensionality, machine learning algorithms typically rely on feature selection techniques in order to perform effective classification in microarray gene expression data sets. However, the large number of features compared to the number of samples makes the task of feature selection computationally hard and prone to errors. This paper interprets feature selection as a task of stochastic optimization, where the goal is to select among an exponential number of alternative gene subsets the one expected to return the highest generalization in classification. Blocking is an experimental design strategy which produces similar experimental conditions to compare alternative stochastic configurations in order to be confident that observed differences in accuracy are due to actual differences rather than to fluctuations and noise effects. We propose an original blocking strategy for improving feature selection which aggregates in a paired way the validation outcomes of several learning algorithms to assess a gene subset and compare it to others. This is a novelty with respect to conventional wrappers, which commonly adopt a sole learning algorithm to evaluate the relevance of a given set of variables. The rationale of the approach is that, by increasing the amount of experimental conditions under which we validate a feature subset, we can lessen the problems related to the scarcity of samples and consequently come up with a better selection. The paper shows that the blocking strategy significantly improves the performance of a conventional forward selection for a set of 16 publicly available cancer expression data sets. The experiments involve six different classifiers and show that improvements take place independent of the classification algorithm used after the selection step. 
Two further validations based on available biological annotation support the claim that blocking strategies in feature selection may improve the accuracy and the quality of the solution. The first validation is based on retrieving PubMed abstracts associated with the selected genes and matching them to regular expressions describing the biological phenomenon underlying the expression data sets. The biological validation that follows is based on the use of the Bioconductor package GOstats in order to perform Gene Ontology statistical analysis.

16.

Graph-based identification of cancer signaling pathways from published gene expression signatures using PubLiME

Finocchiaro G Mancuso FM Cittaro D Muller H 《Nucleic acids research》2007,35(7):2343-2355

Gene expression technology has become a routine application in many laboratories and has provided large amounts of gene expression signatures that have been identified in a variety of cancer types. Interpretation of gene expression signatures would profit from the availability of a procedure capable of assigning differentially regulated genes or entire gene signatures to defined cancer signaling pathways. Here we describe a graph-based approach that identifies cancer signaling pathways from published gene expression signatures. Published gene expression signatures are collected in a database (PubLiME: Published Lists of Microarray Experiments) enabled for cross-platform gene annotation. Significant co-occurrence modules composed of up to 10 genes in different gene expression signatures are identified. Significantly co-occurring genes are linked by an edge in an undirected graph. Edge-betweenness and k-clique clustering combined with graph modularity as a quality measure are used to identify communities in the resulting graph. The identified communities consist of cell cycle, apoptosis, phosphorylation cascade, extracellular matrix, interferon and immune response regulators as well as communities of unknown function. The genes constituting different communities are characterized by common genomic features and strongly enriched cis-regulatory modules in their upstream regulatory regions that are consistent with pathway assignment of those genes.
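
A rough sketch of the graph construction step follows. It deliberately simplifies the abstract's method: a raw count threshold (`min_shared`, an assumed parameter) replaces the significance test for co-occurrence, and plain connected components replace edge-betweenness and k-clique community detection. The function names and the toy gene lists are hypothetical and are not PubLiME data.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(signatures, min_shared=2):
    """Link two genes if they appear together in at least `min_shared` signatures."""
    pair_counts = defaultdict(int)
    for signature in signatures:
        for a, b in combinations(sorted(set(signature)), 2):
            pair_counts[(a, b)] += 1

    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Return connected components as sorted gene lists (a crude community proxy)."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    # Toy signatures for illustration only.
    signatures = [
        ["CCNB1", "CDK1", "MKI67"],
        ["CCNB1", "CDK1", "STAT1"],
        ["STAT1", "IRF7"],
        ["STAT1", "IRF7", "MKI67"],
    ]
    print(components(cooccurrence_graph(signatures)))
```

On real data a count threshold would be too blunt, because frequently reported genes co-occur by chance; that is presumably why the authors test co-occurrence for significance before adding an edge.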

17.

A simple but highly effective approach to evaluate the prognostic performance of gene expression signatures

Starmans MH Fung G Steck H Wouters BG Lambin P 《PloS one》2011,6(12):e28320

Background

Highly parallel analysis of gene expression has recently been used to identify gene sets or ‘signatures’ to improve patient diagnosis and risk stratification. Once a signature is generated, traditional statistical testing is used to evaluate its prognostic performance. However, due to the dimensionality of microarrays, this can lead to false interpretation of these signatures.

Principal Findings

A method was developed to test batches of a user-specified number of randomly chosen signatures in patient microarray datasets. The percentage of randomly generated signatures yielding prognostic value was assessed using ROC analysis by calculating the area under the curve (AUC) in six publicly available cancer patient microarray datasets. We found that a signature consisting of randomly selected genes has an average 10% chance of reaching significance when assessed in a single dataset, but this chance can range from 1% to ∼40% depending on the dataset in question. Increasing the number of validation datasets markedly reduces this number.

Conclusions

We have shown that the use of an arbitrary cut-off value for evaluation of signature significance is not suitable for this type of research; the cut-off should instead be defined for each dataset separately. Our method can be used to establish and evaluate signature performance of any derived gene signature in a dataset by comparing its performance to thousands of randomly generated signatures. It will be of most interest for cases where few data are available and testing in multiple datasets is limited.
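
The comparison against random signatures can be sketched as an empirical null distribution. The abstract does not say how a signature is turned into a per-patient score, so the mean-expression score, the direction-agnostic AUC, and all function names below are assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """ROC AUC by pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def signature_auc(expr, labels, genes):
    """Score each patient by mean expression over `genes` (an assumed scoring rule)."""
    scores = [sum(row[g] for g in genes) / len(genes) for row in expr]
    value = auc(scores, labels)
    return max(value, 1 - value)  # ignore the direction of the association


def empirical_p(expr, labels, signature, n_random=1000, seed=0):
    """Fraction of same-size random signatures that perform at least as well."""
    rng = random.Random(seed)
    n_genes = len(expr[0])
    observed = signature_auc(expr, labels, signature)
    hits = 0
    for _ in range(n_random):
        random_genes = rng.sample(range(n_genes), len(signature))
        if signature_auc(expr, labels, random_genes) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_random + 1)


if __name__ == "__main__":
    # Toy data (hypothetical): only gene 0 tracks the outcome label.
    expr = [
        [0.0, 5.0, 1.0],
        [1.0, 4.0, 2.0],
        [10.0, 5.0, 1.0],
        [11.0, 4.0, 2.0],
    ]
    labels = [0, 0, 1, 1]
    print(empirical_p(expr, labels, [0], n_random=50))
```

Because the null distribution is rebuilt from the dataset at hand, the resulting threshold adapts to each dataset, which matches the conclusion that a single arbitrary cut-off is unsuitable.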

18.

Web-based interrogation of gene expression signatures using EXALT

Jun Wu Qingchao Qiu Lu Xie Joseph Fullerton Jian Yu Yu Shyr Alfred L George Yajun Yi 《BMC bioinformatics》2009,10(1):420

Background

Widespread use of high-throughput techniques such as microarrays to monitor gene expression levels has resulted in an explosive growth of data sets in public domains. Integration and exploration of these complex and heterogeneous data have become a major challenge.

19.

A meta-analysis of caloric restriction gene expression profiles to infer common signatures and regulatory mechanisms

Plank M Wuttke D van Dam S Clarke SA de Magalhães JP 《Molecular bioSystems》2012,8(4):1339-1349

20.

Global gene expression analysis of early response to chemotherapy treatment in ovarian cancer spheroids

Sylvain L'Espérance Magdalena Bachvarova Bernard Tetu Anne-Marie Mes-Masson Dimcho Bachvarov 《BMC genomics》2008,9(1):1-21
