Similar literature
Found 20 similar articles (search time: 15 ms)
1.
We study statistical methods to detect cancer genes that are over- or down-expressed in some but not all samples in a disease group. This has proven useful in cancer studies where oncogenes are activated only in a small subset of samples. We propose the outlier robust t-statistic (ORT), which is intuitively motivated from the t-statistic, the most commonly used differential gene expression detection method. Using real and simulation studies, we compare the ORT to the recently proposed cancer outlier profile analysis (Tomlins and others, 2005) and the outlier sum statistic of Tibshirani and Hastie (2006). The proposed method often has more detection power and smaller false discovery rates. Supplementary information can be found at http://www.biostat.umn.edu/~baolin/research/ort.html.
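
The general idea of scoring a gene by its outlying disease samples can be sketched as below. This is a minimal illustration of an outlier-sum-style statistic, not the published ORT definition: the function name `outlier_score`, the threshold (upper quartile plus interquartile range of the normal group), and the scaling by a pooled median absolute deviation are all assumptions made for the example.

```python
from statistics import median, quantiles


def outlier_score(normal, disease):
    """Outlier-sum-style score for one gene (illustrative assumption,
    not the exact statistic defined in the paper)."""
    center = median(normal)
    # Quartiles of the normal group define the outlier threshold.
    q1, _, q3 = quantiles(normal, n=4, method="inclusive")
    threshold = q3 + (q3 - q1)
    # Robust scale: median absolute deviation of each group around its own median.
    disease_center = median(disease)
    deviations = [abs(x - center) for x in normal]
    deviations += [abs(x - disease_center) for x in disease]
    scale = median(deviations)
    if scale == 0:
        return 0.0
    # Sum the centered values of the disease samples that exceed the threshold.
    outliers = [x for x in disease if x > threshold]
    return sum(x - center for x in outliers) / scale


# Hypothetical expression values: two of six disease samples are activated.
normal_group = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]
disease_group = [1.0, 0.9, 1.1, 5.0, 6.0, 1.0]
activated = outlier_score(normal_group, disease_group)
unchanged = outlier_score(normal_group, normal_group)
```

In this toy data the gene with a few strongly over-expressed disease samples receives a larger score than a gene whose disease samples look like the normal group, which is the behaviour a subset-activation detector is meant to have.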

2.
We present a new class discovery method for microarray gene expression data. Based on a collection of gene expression profiles from different tissue samples, the method searches for binary class distinctions in the set of samples that show clear separation in the expression levels of specific subsets of genes. Several mutually independent class distinctions may be found, which is difficult to obtain from most commonly used clustering algorithms. Each class distinction can be biologically interpreted in terms of its supporting genes. The mathematical characterization of the favored class distinctions is based on statistical concepts. By analyzing three data sets from cancer gene expression studies, we demonstrate that our method is able to detect biologically relevant structures, for example cancer subtypes, in an unsupervised fashion.

3.
Breast cancer outcome can be predicted using models derived from gene expression data or clinical data. Only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. We rigorously compare three different integration strategies (early, intermediate, and late integration) as well as classifiers employing no integration (only one data type) using five classifiers of varying complexity. We perform our analysis on a set of 295 breast cancer samples, for which gene expression data and an extensive set of clinical parameters are available, as well as four breast cancer datasets containing 521 samples that we used as independent validation. On the 295 samples, a nearest mean classifier employing a logical OR operation (late integration) on clinical and expression classifiers significantly outperforms all other classifiers. Moreover, regardless of the integration strategy, the nearest mean classifier achieves the best performance. All five classifiers achieve their best performance when integrating clinical and expression data. Repeating the experiments using the 521 samples from the four independent validation datasets also indicated a significant performance improvement when integrating clinical and gene expression data. Whether integration also improves performance on other datasets (e.g. other tumor types) has not been investigated, but seems worth pursuing. Our work suggests that future models for predicting breast cancer outcome should exploit both data types by employing a late OR or intermediate integration strategy based on nearest mean classifiers.

4.
5.
Meta-analysis of gene expression has enabled numerous insights into biological systems, but current methods have several limitations. We developed a method to perform a meta-analysis using the elastic net, a powerful and versatile approach for classification and regression. To demonstrate the utility of our method, we conducted a meta-analysis of lung cancer gene expression based on publicly available data. Using 629 samples from five data sets, we trained a multinomial classifier to distinguish between four lung cancer subtypes. Our meta-analysis-derived classifier included 58 genes and achieved 91% accuracy on leave-one-study-out cross-validation and on three independent data sets. Our method makes meta-analysis of gene expression more systematic and expands the range of questions that a meta-analysis can be used to address. As the amount of publicly available gene expression data continues to grow, our method will be an effective tool to help distill these data into knowledge.

6.
Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.

7.
Human diseases are often accompanied by histological changes that confound interpretation of molecular analyses and identification of disease-related effects. We developed population-specific expression analysis (PSEA), a computational method of analyzing gene expression in samples of varying composition that can improve analyses of quantitative molecular data in many biological contexts. PSEA of brains from individuals with Huntington's disease revealed myelin-related abnormalities that were undetected using standard differential expression analysis.

8.
MOTIVATION: DNA microarray technologies make it possible to simultaneously monitor thousands of genes' expression levels. A topic of great interest is to study the different expression profiles between microarray samples from cancer patients and normal subjects, by classifying them at gene expression levels. Currently, various clustering methods have been proposed in the literature to classify cancer and normal samples based on microarray data, and they are predominantly data-driven approaches. In this paper, we propose an alternative approach, a model-driven approach, which can reveal the relationship between the global gene expression profile and the subject's health status, and thus is promising in predicting the early development of cancer. RESULTS: In this work, we propose an ensemble dependence model, aimed at exploring the group dependence relationship of gene clusters. Under the framework of hypothesis-testing, we employ genes' dependence relationship as a feature to model and classify cancer and normal samples. The proposed classification scheme is applied to several real cancer datasets, including cDNA, Affymetrix microarray and proteomic data. It is noted that the proposed method yields very promising performance. We further investigate the eigenvalue pattern of the proposed method, and we discover different patterns between cancer and normal samples. Moreover, the transition between cancer and normal patterns suggests that the eigenvalue pattern of the proposed models may have potential to predict the early stage of cancer development. In addition, we examine the effects of possible model mismatch on the proposed scheme.

9.
Yi Y, Mirosevich J, Shyr Y, Matusik R, George AL. Genomics, 2005, 85(3): 401-412
Microarray technology can be used to assess simultaneously global changes in expression of mRNA or genomic DNA copy number among thousands of genes in different biological states. In many cases, it is desirable to determine if altered patterns of gene expression correlate with chromosomal abnormalities or assess expression of genes that are contiguous in the genome. We describe a method, differential gene locus mapping (DIGMAP), which aligns the known chromosomal location of a gene to its expression value deduced by microarray analysis. The method partitions microarray data into subsets by chromosomal location for each gene interrogated by an array. Microarray data in an individual subset can then be clustered by physical location of genes at a subchromosomal level based upon ordered alignment in genome sequence. A graphical display is generated by representing each genomic locus with a colored cell that quantitatively reflects its differential expression value. The clustered patterns can be viewed and compared based on their expression signatures as defined by differential values between control and experimental samples. In this study, DIGMAP was tested using previously published studies of breast cancer analyzed by comparative genomic hybridization (CGH) and prostate cancer gene expression profiles assessed by cDNA microarray experiments. Analysis of the breast cancer CGH data demonstrated the ability of DIGMAP to deduce gene amplifications and deletions. Application of the DIGMAP method to the prostate data revealed several carcinoma-related loci, including one at 16q13 with marked differential expression encompassing 19 known genes including 9 encoding metallothionein proteins. We conclude that DIGMAP is a powerful computational tool enabling the coupled analysis of microarray data with genome location.

10.
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice.
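
A PageRank-like ranking that combines per-gene expression evidence with a gene network might look like the sketch below. It is a generic personalized PageRank, offered only to illustrate the idea; it is not the authors' algorithm. The function `rank_genes`, the damping value, the toy network, and the relevance values are all hypothetical.

```python
def rank_genes(network, relevance, damping=0.85, iterations=100):
    """Personalized PageRank over a gene network (illustrative sketch).

    network:   dict mapping each gene to a list of linked genes; every
               linked gene must itself be a key of the dict.
    relevance: dict of positive per-gene scores, e.g. the absolute
               correlation of expression with outcome (an assumption here).
    """
    genes = sorted(network)
    total = sum(relevance[g] for g in genes)
    # The restart distribution carries the expression information.
    restart = {g: relevance[g] / total for g in genes}
    score = dict(restart)
    for _ in range(iterations):
        updated = {g: (1 - damping) * restart[g] for g in genes}
        for g in genes:
            neighbors = network[g]
            if neighbors:
                # Spread this gene's score evenly over its network neighbors.
                share = damping * score[g] / len(neighbors)
                for n in neighbors:
                    updated[n] += share
            else:
                # A gene without links returns its score via the restart distribution.
                for n in genes:
                    updated[n] += damping * score[g] * restart[n]
        score = updated
    return score


# Hypothetical four-gene network: A is a hub with low relevance of its own.
toy_network = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
toy_relevance = {"A": 0.1, "B": 0.5, "C": 0.3, "D": 0.1}
toy_scores = rank_genes(toy_network, toy_relevance)
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In this toy example the hub gene A ends up ranked first even though its own relevance is low, because it collects score from its well-supported neighbors; that is the kind of effect network information adds over ranking by expression correlation alone.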

11.
Cancer diagnosis based on microarray technology has drawn more and more attention in the past few years. Because it yields accurate and fast diagnoses, microarray gene expression profiling is widely used by researchers. Much research work highlights the importance of gene selection and reports good results. However, the minimum sets of genes derived from different methods seldom overlap and are often inconsistent even for the same set of data, partially because of the complexity of cancer. In this paper, we instead attempted cancer classification using the whole gene expression profile of every sample rather than partial gene sets. Three commonly used data sets, for acute leukemia, prostate cancer, and lung cancer respectively, were tested with the NIPALS-KPLS method. Compared to other conventional methods, the results showed a broad improvement in classification accuracy. This paper indicates that the sample profile of gene expression may be explored as a better indicator for cancer classification, which deserves further investigation.

12.
DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we proposed a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

13.
Wen L, Li W, Sobel M, Feng JA. Proteins, 2006, 65(1): 103-110
Molecular signaling events regulate cellular activity. Cancer stimulating signals trigger cellular responses that evade the regulatory control of cell development. To understand the mechanism of signaling regulation in cancer, it is necessary to identify the activated pathways in cancer. We have developed RepairPATH, a computational algorithm that explores the activated signaling pathways in cancer. The RepairPATH integrates RepairNET, an assembled protein interaction network associated with DNA damage response, with the gene expression profiles derived from the microarray data. Based on the observation that cofunctional proteins often exhibit correlated gene expression profiles, it identifies the activated signaling pathways in cancer by systematically searching the RepairNET for proteins with significantly correlated gene expression profiles. Analyzing the gene expression profiles of breast cancer, we found distinct similarities and differences in the activated signaling pathways between the samples from the patients who developed metastases and the samples from the patients who were disease free within 5 years. The cellular pathways associated with the various DNA repair mechanisms and the cell-cycle checkpoint controls are found to be activated in both sample groups. One of the most intriguing findings is that the pathways associated with different cellular processes are functionally coordinated through BRCA1 in the disease-free sample group, whereas such functional coordination is absent in the samples from patients who developed metastases. Our analysis revealed the potential cellular pathways that regulate the signaling events in breast cancer.

14.
Microarrays can provide genome-wide expression patterns for various cancers, especially for tumor sub-types that may exhibit substantially different patient prognosis. Using such gene expression data, several approaches have been proposed to classify tumor sub-types accurately. These classification methods are not robust and are often dependent on a particular training sample for modelling, which raises issues in utilizing these methods to administer proper treatment for a future patient. We propose to construct an optimal, robust prediction model for classifying cancer sub-types using gene expression data. Our model is constructed in a step-wise fashion implementing cross-validated quadratic discriminant analysis. At each step, all identified models are validated by an independent sample of patients to develop a robust model for future data. We apply the proposed methods to two microarray data sets of cancer: the acute leukemia data by Golub et al. and the colon cancer data by Alon et al. We have found that the dimensionality of our optimal prediction models is relatively small for these cases and that our prediction models with one or two gene factors outperform, or perform comparably to, other methods based on 50 or more predictive gene factors, especially for independent samples. The methodology is implemented in R and Splus. The source code can be obtained at http://hesweb1.med.virginia.edu/bioinformatics.

15.
MOTIVATION: The classification of samples using gene expression profiles is an important application in areas such as cancer research and environmental health studies. However, the classification is usually based on a small number of samples, and each sample is a long vector of thousands of gene expression levels. An important issue in parametric modeling for so many gene expression levels is the control of the number of nuisance parameters in the model. Large models often lead to intensive or even intractable computation, while small models may be inadequate for complex data. METHODOLOGY: We propose a two-step empirical Bayes classification method as a solution to this issue. At the first step, we use the model-based cluster algorithm with a non-traditional purpose of assigning gene expression levels to form abundance groups. At the second step, by assuming the same variance for all the genes in the same group, we substantially reduce the number of nuisance parameters in our statistical model. RESULTS: The proposed model is more parsimonious, which leads to efficient computation under an empirical Bayes estimation procedure. We consider two real examples and simulate data using our method. Desired low classification error rates are obtained even when a large number of genes are pre-selected for class prediction.

16.
Allele-specific gene expression (ASE) is an important aspect of gene regulation. We developed a novel method, MBASED (meta-analysis based allele-specific expression detection), for ASE detection using RNA-seq data. MBASED aggregates information across multiple single nucleotide variation loci to obtain a gene-level measure of ASE, even when prior phasing information is unavailable. MBASED is capable of one-sample and two-sample analyses and performs well in simulations. We applied MBASED to a panel of cancer cell lines and paired tumor-normal tissue samples, and observed extensive ASE in cancer, but not normal, samples, mainly driven by genomic copy number alterations.

17.
Kudo Y, Okada Y. Bioinformation, 2011, 6(5): 200-203
We apply a combined method of heuristic attribute reduction and evaluation of relative reducts in rough set theory to gene expression data analysis. Our method extracts as many relative reducts as possible from the gene-expression data and selects the best relative reduct from the viewpoint of constructing useful decision rules. Using a breast cancer dataset and a leukemia dataset, we evaluated the classification accuracy for the test samples and biological meanings of the rules. As a result, our method presented superior classification accuracy comparable to existing salient classifiers. Moreover, our method extracted interesting rules including a novel biomarker gene identified in recent studies. These results indicate the possibility that our method can serve as a useful tool for gene expression data analysis.

18.
19.
Global gene expression profiles of thousands of cancer samples have been completed, giving rise to hundreds of gene expression signatures (GES). Although many expression signatures show promise in predicting patient prognosis or response to therapies, the usefulness of the signatures in understanding the underlying mechanisms of cancer has not been fully exploited. While “reverse genomic” methods can test specific hypotheses of gene regulation, they fare less well in deciphering novel or combinatorial mechanisms of gene regulation. Recently we described SLAMS (stepwise linkage analysis of microarray signatures), a novel method that can prospectively identify genetic regulators of gene expression signatures in cancer. Applying SLAMS on a poor-prognosis wound signature in human breast cancer, we identified CSN5-mediated ubiquitination of MYC as a novel mechanism to activate a biological program favoring metastasis.

20.
Gene set analysis methods are popular tools for identifying differentially expressed gene sets in microarray data. Most existing methods use a permutation test to assess significance for each gene set. The permutation test's assumption of exchangeable samples is often not satisfied for time‐series data and complex experimental designs, and in addition it requires a certain number of samples to compute p‐values accurately. The method presented here uses a rotation test rather than a permutation test to assess significance. The rotation test can compute accurate p‐values also for very small sample sizes. The method can handle complex designs and is particularly suited for longitudinal microarray data where the samples may have complex correlation structures. Dependencies between genes, modeled with the use of gene networks, are incorporated in the estimation of correlations between samples. In addition, the method can test for both gene sets that are differentially expressed and gene sets that show strong time trends. We show on simulated longitudinal data that the ability to identify important gene sets may be improved by taking the correlation structure between samples into account. Applied to real data, the method identifies both gene sets with constant expression and gene sets with strong time trends.
