Similar articles
20 similar articles found; search took 29 ms
1.

Background

Numerous gene lists or "classifiers" have been derived from global gene expression data that assign breast cancers to good and poor prognosis groups. A remarkable feature of these molecular signatures is that they have few genes in common, prompting speculation that they may use distinct genes to measure the same pathophysiological process(es), such as proliferation. However, this supposition has not been rigorously tested. If gene-based classifiers function by measuring a minimal number of cellular processes, we hypothesized that the informative genes for these processes could be identified and the data sets could be adjusted for the predictive contributions of those genes. Such adjustment would then attenuate the predictive function of any signature measuring that same process.

Results

We tested this hypothesis directly using a novel iterative-subtractive approach. We evaluated five gene expression data sets that sample a broad range of breast cancer subtypes. In all data sets, the dominant cluster capable of predicting metastasis was heavily populated by genes that fluctuate in concert with the cell cycle. When six well-characterized classifiers were examined, all contained a higher than expected proportion of genes that correlate with this cluster. Furthermore, when the data sets were globally adjusted for the cell cycle cluster, each classifier lost its ability to assign tumors to appropriate high and low risk groups. In contrast, adjusting for other predictive gene clusters did not impact their performance.
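The abstract does not say how the data sets were "globally adjusted" for a gene cluster. As a minimal sketch of one possible reading, the code below assumes the adjustment is a per-gene least-squares regression on the cluster's mean profile (a "metagene"), keeping only the residuals. The gene names, the toy values, and the helper functions `cluster_metagene` and `adjust_for_metagene` are hypothetical and not taken from the paper.

```python
from statistics import mean

def cluster_metagene(expr, cluster_genes):
    """Average expression of the cluster's genes in each sample."""
    n_samples = len(next(iter(expr.values())))
    return [mean(expr[g][s] for g in cluster_genes) for s in range(n_samples)]

def adjust_for_metagene(expr, metagene):
    """Replace each gene's profile by its residual after a
    least-squares fit of that gene on the metagene (assumed method)."""
    m_bar = mean(metagene)
    sxx = sum((m - m_bar) ** 2 for m in metagene)
    adjusted = {}
    for gene, values in expr.items():
        v_bar = mean(values)
        sxy = sum((m - m_bar) * (v - v_bar) for m, v in zip(metagene, values))
        slope = sxy / sxx if sxx else 0.0
        adjusted[gene] = [
            (v - v_bar) - slope * (m - m_bar) for m, v in zip(metagene, values)
        ]
    return adjusted

# Toy data: hypothetical gene names, four samples each.
expr = {
    "cc1": [1.0, 2.0, 3.0, 4.0],    # tracks the cluster
    "cc2": [2.0, 4.0, 6.0, 8.0],    # tracks the cluster
    "other": [5.0, 1.0, 4.0, 2.0],  # largely unrelated to the cluster
}
metagene = cluster_metagene(expr, ["cc1", "cc2"])
adjusted = adjust_for_metagene(expr, metagene)
```

After this adjustment, genes that merely track the cluster carry no remaining signal (their residuals are zero), which is the mechanism by which a classifier built on such genes would lose its predictive ability.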

Conclusion

These data indicate that the discriminative ability of breast cancer classifiers is dependent upon genes that correlate with cell cycle progression.

2.

Background

Gene set analysis based on Gene Ontology (GO) can be a promising method for the analysis of differential expression patterns. However, current studies that focus on individual GO terms have limited analytical power, because the complex structure of GO introduces strong dependencies among the terms, and some genes that are annotated to a GO term cannot be found by statistically significant enrichment.

Results

We proposed a method for enriching clustered GO terms based on semantic similarity, namely cluster enrichment analysis based on GO (CeaGO), to extend the individual term analysis method. Using an Affymetrix HGU95aV2 chip dataset with simulated gene sets, we illustrated that CeaGO was sensitive enough to detect moderate expression changes. When compared to parent-based individual term analysis methods, the results showed that CeaGO may provide more accurate differentiation of gene expression results. When used with two acute leukemia (ALL and ALL/AML) microarray expression datasets, CeaGO correctly identified specifically enriched GO groups that were overlooked by other individual test methods.

Conclusion

By applying CeaGO to both simulated and real microarray data, we showed that this approach could enhance the interpretation of microarray experiments. CeaGO is currently available at http://chgc.sh.cn/en/software/CeaGO/.

3.

Background

Acute lymphoblastic leukemia (ALL) is a common form of cancer in children. Currently, bone marrow biopsy is used for diagnosis. Noninvasive biomarkers for the early diagnosis of pediatric ALL are urgently needed. The aim of this study was to discover potential protein biomarkers for pediatric ALL.

Methods

Ninety-four pediatric ALL patients and 84 controls were randomly divided into a "training" set (45 ALL patients, 34 healthy controls) and a test set (49 ALL patients, 30 healthy controls and 30 pediatric acute myeloid leukemia (AML) patients). Serum proteomic profiles were measured using surface-enhanced laser desorption/ionization-time-of-flight mass spectroscopy (SELDI-TOF-MS). A classification model was established by Biomarker Pattern Software (BPS). Candidate protein biomarkers were purified by HPLC, identified by LC-MS/MS and validated using ProteinChip immunoassays.

Results

A total of 7 protein peaks (9290 m/z, 7769 m/z, 15110 m/z, 7564 m/z, 4469 m/z, 8937 m/z, 8137 m/z) were found with differential expression levels in the sera of pediatric ALL patients and controls using SELDI-TOF-MS and then analyzed by BPS to construct a classification model in the "training" set. The sensitivity and specificity of the model were found to be 91.8% and 90.0%, respectively, in the test set. Two candidate protein peaks (7769 and 9290 m/z) were found to be down-regulated in ALL patients and were identified as platelet factor 4 (PF4) and pro-platelet basic protein precursor (PBP). Two other candidate protein peaks (8137 and 8937 m/z) were found to be up-regulated in the sera of ALL patients and were identified as fragments of the complement component 3a (C3a).
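As a brief worked example of how the two reported figures are defined, the sketch below computes sensitivity and specificity from a confusion matrix. The abstract does not report the underlying counts, so the numbers here are hypothetical: they were chosen only because they reproduce the reported percentages, under the assumption that specificity was measured against all 60 non-ALL subjects in the test set (30 healthy controls plus 30 AML patients).

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives (ALL patients) that the model flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of actual negatives (non-ALL subjects) that the model clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts, NOT reported in the abstract.
tp, fn = 45, 4   # 49 ALL patients in the test set
tn, fp = 54, 6   # 60 non-ALL subjects (assumed denominator)

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# prints "sensitivity = 91.8%, specificity = 90.0%"
```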

Conclusion

Platelet factor 4 (PF4), connective tissue activating peptide III (CTAP-III) and two fragments of C3a may be potential protein biomarkers of pediatric ALL and could be used to distinguish pediatric ALL patients from healthy controls and pediatric AML patients. Further studies with additional populations or using pre-diagnostic sera are needed to confirm the importance of these findings as diagnostic markers of pediatric ALL.

4.

Background

One of the major goals in gene and protein expression profiling of cancer is to identify biomarkers and build classification models for prediction of disease prognosis or treatment response. Many traditional statistical methods, based on microarray gene expression data alone and individual genes' discriminatory power, often fail to identify biologically meaningful biomarkers thus resulting in poor prediction performance across data sets. Nonetheless, the variables in multivariable classifiers should synergistically interact to produce more effective classifiers than individual biomarkers.

Results

We developed an integrated approach, namely network-constrained support vector machine (netSVM), for cancer biomarker identification with an improved prediction performance. The netSVM approach is specifically designed for network biomarker identification by integrating gene expression data and protein-protein interaction data. We first evaluated the effectiveness of netSVM using simulation studies, demonstrating its improved performance over state-of-the-art network-based methods and gene-based methods for network biomarker identification. We then applied the netSVM approach to two breast cancer data sets to identify prognostic signatures for prediction of breast cancer metastasis. The experimental results show that: (1) network biomarkers identified by netSVM are highly enriched in biological pathways associated with cancer progression; (2) prediction performance is much improved when tested across different data sets. Specifically, many genes related to apoptosis, cell cycle, and cell proliferation, which are hallmark signatures of breast cancer metastasis, were identified by the netSVM approach. More importantly, several novel hub genes, biologically important with many interactions in PPI network but often showing little change in expression as compared with their downstream genes, were also identified as network biomarkers; the genes were enriched in signaling pathways such as TGF-beta signaling pathway, MAPK signaling pathway, and JAK-STAT signaling pathway. These signaling pathways may provide new insight to the underlying mechanism of breast cancer metastasis.

Conclusions

We have developed a network-based approach for cancer biomarker identification, netSVM, resulting in an improved prediction performance with network biomarkers. We have applied the netSVM approach to breast cancer gene expression data to predict metastasis in patients. Network biomarkers identified by netSVM reveal potential signaling pathways associated with breast cancer metastasis, and help improve the prediction performance across independent data sets.

5.
6.

Background

Signaling studies in cell lines are hampered by non-physiological alterations obtained in vitro. Physiologic primary tumor cells from patients with leukemia require passaging through immune-compromised mice for amplification. The aim was to enable molecular work in patients' ALL cells by establishing siRNA transfection into cells amplified in mice.

Results

We established siRNA delivery into these cells without affecting cell viability. Knockdown of single or multiple genes reduced constitutive or induced protein expression and was accompanied by marked signaling alterations.

Conclusion

Our novel technique allows using patient-derived tumor cells instead of cell lines for signaling studies in leukemia.

7.

Background

A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into consideration.

Results

We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.

Conclusion

We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

8.
Yan X, Zheng T. BMC Genomics, 2008, 9(Suppl 2): S14

Background

Gene expression data extracted from microarray experiments have been used to study the difference between mRNA abundance of genes under different conditions. In one such experiment, thousands of genes are measured simultaneously, which provides a high-dimensional feature space for discriminating between different sample classes. However, most of these dimensions are not informative about the between-class difference, and add noise to the discriminant analysis.

Results

In this paper we propose and study feature selection methods that evaluate the "informativeness" of a set of genes. Two measures of information based on multigene expression profiles are considered for a backward information-driven screening approach for selecting important gene features. By considering multigene expression profiles, we are able to utilize interaction information among these genes. Using a breast cancer data set, we illustrate our methods and compare their performance to that of existing methods.

Conclusion

We illustrate in this paper that methods considering gene-gene interactions have better classification power in gene expression analysis. In our results, we identify important genes with relatively large p-values from single gene tests. This indicates that these are genes with weak marginal information but strong interaction information, which will be overlooked by strategies that only examine individual genes.

9.

Background

Gene regulatory networks have an essential role in every process of life. Genome-wide time series data are becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes.

Results

This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations.

Conclusions

A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation.

10.

Background

The production of cell-based cancer vaccines by gene vectors encoding proteins that stimulate the immune system has advanced rapidly in model systems. We sought to develop non-viral transfection methods that could transform patient tumor cells into cancer vaccines, paving the way for rapid production of autologous cell-based vaccines.

Methods

As the extended culture and expansion of most patient tumor cells is not possible, we sought to first evaluate a new technology that combines electroporation and chemical transfection in order to determine if plasmid-based gene vectors could be instantaneously delivered to the nucleus, and to determine if gene expression was possible in a cell-cycle independent manner. We tested cultured cell lines, a primary murine tumor, and primary human leukemia cells from diagnostic work-up for transgene expression, using both RFP and CD137L expression vectors.

Results

Combined electroporation-transfection directly delivered plasmid DNA to the nucleus of transfected cells, as demonstrated by confocal microscopy and real-time PCR analysis of isolated nuclei. Expression of protein from plasmid vectors could be detected as early as two hours post transfection. However, the kinetics of gene expression from plasmid-based vectors in tumor cell lines indicated that optimal gene expression was still dependent on cell division. We then tested whether pediatric acute lymphocytic leukemia (ALL) cells would also display the rapid gene expression kinetics of tumor cell lines, determining gene expression 24 hours after transfection. Six of 12 specimens showed greater than 17% transgene expression, and all samples showed at least some transgene expression.

Conclusion

Given that transgene expression could be detected in a majority of primary tumor samples analyzed within hours, direct electroporation-based transfection of primary leukemia holds the potential to generate patient-specific cancer vaccines. Plasmid-based gene therapy represents a simple means to generate cell-based cancer vaccines and does not require the extensive infrastructure of a virus-based vector system.

11.
Cluster-Rasch models for microarray gene expression data   Cited by: 1 (self-citations: 0, citations by others: 1)
Li H, Hong F. Genome Biology, 2001, 2(8): research0031.1-research0031.13

Background

We propose two different formulations of the Rasch statistical models to the problem of relating gene expression profiles to the phenotypes. One formulation allows us to investigate whether a cluster of genes with similar expression profiles is related to the observed phenotypes; this model can also be used for future prediction. The other formulation provides an alternative way of identifying genes that are over- or underexpressed from their expression levels in tissue or cell samples of a given tissue or cell type.

Results

We illustrate the methods on available datasets of a classification of acute leukemias and of 60 cancer cell lines. For tumor classification, the results are comparable to those previously obtained. For the cancer cell lines dataset, we found four clusters of genes that are related to drug response for many of the 90 drugs that we considered. In addition, for each type of cell line, we identified genes that are over- or underexpressed relative to other genes.

Conclusions

The cluster-Rasch model provides a probabilistic model for describing gene expression patterns across samples and can be used to relate gene expression profiles to phenotypes.

12.

Background

The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set.

Results

In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods, which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results.
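The scoring idea described above (a mean of absolute, weighted gene t-scores, with larger weights for genes that occur in few gene sets) can be sketched as follows. The abstract does not give the weighting formula, so the square-root rescaling of gene frequencies used here is an assumption for illustration, as are the gene set names, gene names, and t-score values. Any further steps of the published method, such as assessing significance of the scores, are not shown.

```python
from collections import Counter
from math import sqrt

def gene_weights(gene_sets):
    """Weight each gene between 1 and 2: genes found in few gene sets get
    weights near 2, genes found in many sets get weights near 1.
    The exact formula is an assumption, not taken from the abstract."""
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    lo, hi = min(freq.values()), max(freq.values())
    if hi == lo:
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((hi - f) / (hi - lo)) for g, f in freq.items()}

def gene_set_score(genes, t_scores, weights):
    """Mean of absolute weighted t-scores over the genes in one set."""
    return sum(abs(t_scores[g]) * weights[g] for g in genes) / len(genes)

# Hypothetical gene sets and t-scores.
gene_sets = {
    "setA": ["g1", "g2"],
    "setB": ["g2", "g3"],
    "setC": ["g2", "g4"],
}
t_scores = {"g1": 2.0, "g2": -3.0, "g3": 0.5, "g4": -1.0}

weights = gene_weights(gene_sets)
scores = {
    name: gene_set_score(genes, t_scores, weights)
    for name, genes in gene_sets.items()
}
```

In this toy example `g2` belongs to all three sets and receives the minimum weight of 1, so its large t-score contributes less to each set's score than it would under equal weighting, while the set-specific genes `g1`, `g3`, and `g4` receive weight 2.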

Conclusions

PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org.

13.
14.

Background

An important use of data obtained from microarray measurements is the classification of tumor types with respect to genes that are either up or down regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications. These algorithms usually require parameter optimization to obtain accurate results depending on the type of data. Additionally, it is highly critical to find an optimal set of markers among those up or down regulated genes that can be clinically utilized to build assays for the diagnosis or to follow progression of specific cancer types. In this paper, we employ a mixed integer programming based classification algorithm named hyper-box enclosure method (HBE) for the classification of some cancer types with a minimal set of predictor genes. This optimization based method which is a user friendly and efficient classifier may allow the clinicians to diagnose and follow progression of certain cancer types.

Methodology/Principal Findings

We apply the HBE algorithm to some well-known data sets such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), and small round blue cell tumors (SRBCT) to find predictor genes that can be utilized for diagnosis and prognosis in a robust manner with high accuracy. Our approach does not require any modification or parameter optimization for each data set. Additionally, information gain attribute evaluator, relief attribute evaluator and correlation-based feature selection methods are employed for gene selection. The results are compared with those from other studies, and the biological roles of the selected genes in the corresponding cancer types are described.

Conclusions/Significance

The overall performance of our algorithm was better than that of the other algorithms reported in the literature and of the classifiers found in the WEKA data-mining package. Since it does not require parameter optimization and consistently achieves very high prediction rates on different types of data sets, the HBE method is an effective and consistent tool for cancer type prediction with a small number of gene markers.

15.

Background

In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer, such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data.

Results

We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. It then degrades progressively, with the error rate reaching e = 16% (p < 0.05) when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis are discussed.
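The general idea of attaching a p-value to a cross-validated error rate with a permutation test can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: it uses leave-one-out cross validation (the K = 1 case of leave-K-out), a nearest-class-mean classifier on a single feature instead of WVA, RLS, or SVM, and hypothetical toy values. The function names `loo_error` and `permutation_p_value` are likewise hypothetical.

```python
import random
from statistics import mean

def loo_error(values, labels):
    """Leave-one-out error of a nearest-class-mean classifier on one feature."""
    errors = 0
    for i in range(len(values)):
        train = [(v, y) for j, (v, y) in enumerate(zip(values, labels)) if j != i]
        classes = sorted({y for _, y in train})
        centroids = {c: mean(v for v, y in train if y == c) for c in classes}
        predicted = min(centroids, key=lambda c: abs(values[i] - centroids[c]))
        errors += predicted != labels[i]
    return errors / len(values)

def permutation_p_value(values, labels, n_perm=200, seed=0):
    """Share of label shufflings whose error is at least as low as the
    observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = loo_error(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error(values, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical expression values of a single gene in 10 specimens.
values = [0.1, 0.3, 0.2, 0.4, 0.5, 1.6, 1.8, 1.7, 2.0, 1.9]
labels = ["normal"] * 5 + ["tumor"] * 5

error_rate, p_value = permutation_p_value(values, labels)
```

A small p-value indicates that an error rate as low as the observed one is rarely obtained when the class labels carry no information, which is the sense in which the reported error rates are described as statistically significant.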

Conclusions

The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required.

16.

Background

Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few common identified marker genes among different studies involving patients with the same disease. Therefore, it is of great interest, and a challenge, to merge data sets from multiple studies to increase the sample size, which may in turn increase the power of statistical inferences. In this study, we combined two lung cancer studies using microarray GeneChip®, employed two gene shaving methods and a two-step survival test to identify genes with expression patterns that can distinguish diseased from normal samples, and to indicate patient survival, respectively.

Results

In addition to common data transformation and normalization procedures, we applied a distribution transformation method to integrate the two data sets. Gene shaving (GS) methods based on Random Forests (RF) and Fisher's Linear Discrimination (FLD) were then applied separately to the joint data set for cancer gene selection. The two methods discovered 13 and 10 marker genes (5 in common), respectively, with expression patterns differentiating diseased from normal samples. Among these marker genes, 8 and 7 were found to be cancer-related in other published reports. Furthermore, based on these marker genes, the classifiers we built from one data set predicted the other data set with more than 98% accuracy. Using the univariate Cox proportional hazard regression model, the expression patterns of 36 genes were found to be significantly correlated with patient survival (p < 0.05). Twenty-six of these 36 genes were reported as survival-related genes from the literature, including 7 known tumor-suppressor genes and 9 oncogenes. Additional principal component regression analysis further reduced the gene list from 36 to 16.

Conclusion

This study provided a valuable method of integrating microarray data sets with different origins, and new methods of selecting a minimum number of marker genes to aid in cancer diagnosis. After careful data integration, the classification method developed from one data set can be applied to the other with high prediction accuracy.

17.
18.

Background

Previous research suggested that the expression of single genes might be correlated with acute myeloid leukemia (AML) survival. Therefore, we conducted a systematic analysis of prognostic gene expression in AML.

Methods

We performed a microarray-based analysis for correlations between gene expression and adult AML overall survival (OS) using datasets GSE12417 and GSE8970. Positive findings were validated in an independent cohort of 50 newly diagnosed, non-acute promyelocytic leukemia (APL) AML patients by quantitative RT-PCR and survival analysis.

Results

Microarray-based analysis suggested that expression of eight genes was each associated with 1-year and 3-year AML OS in both GSE12417 and GSE8970 datasets (p < 0.05). Next, we validated our findings in an independent cohort of AML samples collected in our hospital. We found that ubiquitin-conjugating enzyme E2E1 (UBE2E1) expression was adversely correlated with AML survival (p = 0.04). Multivariable analysis showed that patients with high UBE2E1 expression had a significantly shorter OS and shorter progression-free survival after adjusting for other known prognostic factors (p = 0.03). Finally, we found that UBE2E1 expression was negatively correlated with patients' response to induction chemotherapy (p < 0.05).

Conclusions

In summary, we demonstrated that UBE2E1 expression was a novel prognostic factor in adult, non-APL AML patients.

19.
20.

Introduction

The development of Philadelphia chromosome (Ph) negative acute leukemia/myelodysplastic syndrome (MDS) in patients with Ph-positive chronic myeloid leukemia (CML) is very rare. The features of restrictive usage and absence of partial T cell clones have been found in patients with CML. However, the T-cell clonal evolution of Ph-negative malignancies during treatment for CML is still unknown.

Objective

To investigate the dynamic change of clonal proliferation of T cell receptor (TCR) Vα and Vβ subfamilies in one CML patient who developed Ph-negative acute lymphoblastic leukemia (ALL) after interferon and imatinib therapy.

Methods

Peripheral blood mononuclear cell (PBMC) samples were collected at three time points (diagnosis of Ph-positive chronic phase (CP) CML, development of Ph-negative ALL, and after induction chemotherapy (CT) for Ph-negative ALL). The CDR3 sizes of the TCR Vα and Vβ repertoires were detected by RT-PCR. The PCR products were further analyzed by genescan to identify T cell clonality.

Results

The CML patient, who achieved complete cytogenetic remission (CCR) after 5 years of IFN-α therapy, suddenly developed Ph-negative ALL 6 months after switching to imatinib therapy. The expression pattern and clonality of TCR Vα/Vβ T cells changed in different disease stages. Restrictive expression of Vα/Vβ subfamilies could be found in all three stages, and some T cell subfamilies showed clonal proliferation. Additionally, there were obvious differences in the Vα/Vβ subfamilies of T cells between the stages of Ph-positive CML-CP and Ph-negative ALL. The Vα10 and Vβ3 T cells evolved from oligoclonality to polyclonality, and the Vβ13 T cells changed from biclonality to polyclonality, when Ph-negative ALL developed.

Conclusions

Restrictive usage and clonal proliferation of different Vα/Vβ subfamily T cells between the stages of Ph-positive CP and Ph-negative ALL were detected in one patient. These changes may play a role in Ph-negative leukemogenesis.
