Similar articles
1.
Given recent advances in DNA microarray technology and its benefits, many gene filtering approaches have been employed for the diagnosis and prognosis of diseases. In our previous study, we developed a new filtering method, the projective adaptive resonance theory (PART) filtering method, which was effective in subclass discrimination. In the PART algorithm, genes with low variance in expression in either class, but not in both classes, were selected as important genes for modeling. Based on this concept, in the present study we developed novel simple filtering methods such as modified signal-to-noise (S2N'). Discrimination models constructed using these methods showed higher accuracy and higher reproducibility than those built with many conventional filtering methods, including the t-test, S2N, NSC and SAM. The reproducibility of prediction was evaluated based on the correlation between the sets of U-test p-values on randomly divided datasets. For leukemia, lymphoma and breast cancer, the correlation was high; a difference of >0.13 was obtained by the model constructed using <50 genes selected by S2N'. The improvement was greater for smaller gene sets, and such higher correlation was observed when the t-test, NSC and SAM were used. These results suggest that modified methods such as S2N' have high potential as new methods for marker gene selection in cancer diagnosis using DNA microarray data. AVAILABILITY: Software is available upon request.
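The abstract presents S2N' as a modification of the conventional signal-to-noise (S2N) filter but does not give its formula. As a reference point only, the sketch below computes the standard S2N score, commonly defined for a gene as (mu1 - mu0) / (sd1 + sd0) from the class means and standard deviations, and ranks genes by its absolute value. The function names and toy expression matrix are hypothetical, and S2N' itself is not reproduced.

```python
import statistics

def s2n(class1, class0):
    """Standard signal-to-noise score for one gene: (mu1 - mu0) / (sd1 + sd0)."""
    mu1, mu0 = statistics.mean(class1), statistics.mean(class0)
    sd1, sd0 = statistics.stdev(class1), statistics.stdev(class0)
    # Guard against zero variance in both classes (assumption: tiny floor)
    return (mu1 - mu0) / max(sd1 + sd0, 1e-12)

def top_genes_by_s2n(expr, labels, k):
    """Return indices of the k genes with the largest |S2N|.

    expr: one list of expression values per gene (one value per sample)
    labels: 0/1 class label per sample
    """
    scored = []
    for g, values in enumerate(expr):
        c1 = [v for v, y in zip(values, labels) if y == 1]
        c0 = [v for v, y in zip(values, labels) if y == 0]
        scored.append((abs(s2n(c1, c0)), g))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest |S2N| first, ties by index
    return [g for _, g in scored[:k]]

# Hypothetical toy data: 3 genes x 6 samples (first three samples are class 1)
expr = [
    [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # higher in class 1
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no mean difference
    [0.5, 0.7, 0.6, 3.0, 3.2, 2.9],  # higher in class 0
]
labels = [1, 1, 1, 0, 0, 0]
top = top_genes_by_s2n(expr, labels, 2)  # [0, 2]
```

Ranking by the absolute value keeps genes that are informative in either direction, as gene 2 (higher in class 0) shows.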

2.
Discriminating disease patients based on gene expression data is a crucial problem in the clinical domain. An important step toward solving it is finding a discriminative subset of genes among the thousands on a microarray or DNA chip. To find informative genes for disease classification on microarrays, we present a gene selection method based on the forward variable (gene) selection method (FSM) and show, using typical public microarray datasets, that our method can extract a small set of genes crucial for discriminating different classes, with accuracy close to perfect classification.
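The abstract does not state which criterion FSM uses when adding genes. The sketch below shows generic greedy forward selection under two hypothetical choices: the score is leave-one-out accuracy, and the classifier is a nearest-centroid rule. Genes are added one at a time until no candidate improves the score.

```python
def loo_nearest_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`.

    X: list of samples, each a list of gene expression values; y: class labels.
    Assumes every class has at least two samples.
    """
    classes = sorted(set(y))
    correct = 0
    for i, xi in enumerate(X):
        centroids = {}
        for c in classes:
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
        # Predict the class whose centroid is closest (squared Euclidean distance)
        pred = min(classes, key=lambda c: sum((xi[g] - m) ** 2
                                              for g, m in zip(genes, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

def forward_select(X, y, max_genes):
    """Greedy forward selection: add the gene that most improves the score; stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_genes:
        scores = {g: loo_nearest_centroid_accuracy(X, y, selected + [g]) for g in remaining}
        gene = max(remaining, key=lambda g: (scores[g], -g))  # ties -> lowest index
        if scores[gene] <= best_score:
            break
        selected.append(gene)
        remaining.remove(gene)
        best_score = scores[gene]
    return selected, best_score

# Hypothetical toy data: 6 samples x 3 genes; only gene 1 separates the classes
X = [
    [0.1, 5.0, 2.0],
    [0.3, 5.2, 1.0],
    [0.2, 4.9, 3.0],
    [0.2, 1.0, 2.0],
    [0.1, 1.2, 3.0],
    [0.3, 0.8, 1.0],
]
y = [1, 1, 1, 0, 0, 0]
selected, score = forward_select(X, y, max_genes=2)  # ([1], 1.0)
```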

3.
MOTIVATION: The development of gene expression microarray technology has enabled the identification, from a large pool of candidate genes, of genes differentially expressed between clinical phenotypic classes of cancer. Although many class comparisons have concerned only a single phenotype, simultaneous assessment of the relationship between gene expression and multiple phenotypes is warranted to better understand the underlying biological structure. RESULTS: We develop a method to select genes related to multiple clinical phenotypes based on a set of multivariate linear regression models. For each gene, we perform model selection based on the doubly-adjusted R-square statistic and use the maximum of this statistic for gene selection. The method can substantially improve power in gene selection compared with a conventional method that uses a single model exclusively. We apply it to a bladder cancer study, correlating pre-treatment gene expression with pathological stage and grade. The method should be useful for screening genes related to multiple clinical phenotypes. AVAILABILITY: SAS and MATLAB code is available from the author upon request.

4.
Paul TK, Iba H. Bio Systems, 2005, 82(3): 208-225
Recently, DNA microarray-based gene expression profiles have been used to correlate the clinical behavior of cancers with differential gene expression levels in cancerous and normal tissues. To this end, after some predictive genes are selected based on the signal-to-noise (S2N) ratio, unsupervised learning such as clustering and supervised learning such as the k-nearest neighbor (kNN) classifier are widely used. Instead of the S2N ratio, adaptive searches such as the Probabilistic Model Building Genetic Algorithm (PMBGA) can be applied to select a smaller gene subset that classifies patient samples more accurately. In this paper, we propose a new PMBGA-based method for identifying informative genes from microarray data. By applying the proposed method to the classification of three microarray datasets of binary and multi-type tumors, we demonstrate that the gene subsets selected with our technique yield better classification accuracy.
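The abstract does not describe the authors' PMBGA variant or its fitness function. As a hedged illustration of the general idea, the sketch below uses UMDA (Univariate Marginal Distribution Algorithm), one of the simplest PMBGAs: it samples binary gene-inclusion masks from per-gene probabilities, keeps the better half, and re-estimates the probabilities. The fitness is a hypothetical toy stand-in for, e.g., kNN classification accuracy with a penalty on subset size.

```python
import random

def umda_select(fitness, n_genes, pop_size=50, generations=30, seed=0):
    """UMDA, a simple PMBGA, searching over binary gene-inclusion masks."""
    rng = random.Random(seed)
    p = [0.5] * n_genes  # marginal probability that each gene is included
    best_mask, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population of masks from the current probability model
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) > best_fit:
            best_mask, best_fit = pop[0], fitness(pop[0])
        elite = pop[: pop_size // 2]  # truncation selection
        # Re-estimate each marginal from the elite; clamp to keep some diversity
        p = [min(0.95, max(0.05, sum(ind[i] for ind in elite) / len(elite)))
             for i in range(n_genes)]
    return best_mask, best_fit

INFORMATIVE = {1, 4, 6}  # hypothetical "truly informative" genes

def toy_fitness(mask):
    """Stand-in for classifier accuracy: +1 per informative gene kept, -1 per other gene kept."""
    return sum(1 if i in INFORMATIVE else -1 for i, bit in enumerate(mask) if bit)

best_mask, best_fit = umda_select(toy_fitness, n_genes=8)
```

In a real gene selection setting, `toy_fitness` would be replaced by a cross-validated classifier score computed on the genes switched on in the mask.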

5.
Classification methods used in microarray gene expression studies differ both in how they handle the underlying complexity of the data and in the technique used to build the classification model. The MAQC II study on cancer classification problems found that performance was affected by factors such as the classification algorithm, cross-validation method, number of genes and gene selection method. In this paper, we test the hypothesis that the disease under study significantly determines which method is optimal, and that sample size, class imbalance, type of medical question (diagnostic, prognostic or treatment response) and microarray platform are also potentially influential. A systematic literature review was used to extract information from 48 published articles on non-cancer microarray classification studies. The impact of the various factors on the reported classification accuracy was analyzed with random-intercept logistic regression. The type of medical question and the cross-validation method dominated the explained variation in accuracy among studies, followed by disease category and microarray platform. In total, all the study-specific and problem-specific factors studied together explained 42% of the between-study variation.

6.
MOTIVATION: DNA microarray data analysis has been used to identify marker genes that discriminate cancer from normal samples. However, because each study has a limited sample size, different studies of the same cancer share few markers. With the rapid accumulation of microarray data, there is great interest in integrating microarray data across studies to increase sample size, which could lead to the discovery of more reliable markers. RESULTS: We present a novel, simple method of integrating different microarray datasets to identify marker genes and apply it to prostate cancer datasets. By applying a new statistical method, the top-scoring pair (TSP) classifier, we identified a pair of robust marker genes (HPN and STAT6) by integrating microarray datasets from three different prostate cancer studies. Cross-platform validation shows that the TSP classifier built from this marker gene pair, which simply compares relative expression values, achieves high accuracy, sensitivity and specificity on independent datasets generated with various array platforms. Our findings suggest a new model for discovering marker genes from accumulated microarray data and demonstrate how the wealth of existing microarray data can be exploited to increase the power of statistical analysis. CONTACT: leixu@jhu.edu.
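A TSP classifier is commonly described as choosing the gene pair (i, j) that maximizes |P(X_i < X_j | class 1) - P(X_i < X_j | class 0)| and then classifying a new sample only by whether X_i < X_j. The sketch below implements that core rule on toy data; it omits the secondary tie-breaking score used in published TSP implementations, and the function names and data are hypothetical.

```python
from itertools import combinations

def top_scoring_pair(X, y):
    """Find the gene pair maximising |P(X_i < X_j | class 1) - P(X_i < X_j | class 0)|.

    X: samples x genes; y: 0/1 labels. Returns (i, j, delta, less_means_class1).
    """
    members = {c: [k for k, lab in enumerate(y) if lab == c] for c in (0, 1)}
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Fraction of samples in each class where gene i is expressed below gene j
        p = {c: sum(X[k][i] < X[k][j] for k in members[c]) / len(members[c]) for c in (0, 1)}
        delta = abs(p[1] - p[0])
        if best is None or delta > best[2]:
            best = (i, j, delta, p[1] > p[0])
    return best

def tsp_predict(x, rule):
    """Classify one sample by checking only the relative order of the two genes."""
    i, j, _, less_means_class1 = rule
    return 1 if (x[i] < x[j]) == less_means_class1 else 0

# Hypothetical toy data: in class 1 gene 0 < gene 1; in class 0 the order flips
X = [
    [1.0, 3.0, 2.0],
    [2.0, 4.0, 1.0],
    [0.5, 2.5, 3.0],
    [3.0, 1.0, 2.0],
    [4.0, 2.0, 1.5],
    [2.5, 0.5, 3.0],
]
y = [1, 1, 1, 0, 0, 0]
rule = top_scoring_pair(X, y)  # (0, 1, 1.0, True)
```

Because the rule depends only on within-sample ranks, it is insensitive to monotone per-array normalization differences, which is one reason such rules can transfer across platforms.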

7.
Genome-wide profiling of gene amplification and deletion in cancer (cited 3 times: 0 self-citations, 3 by others)
Kashiwagi H, Uchida K. Human Cell, 2000, 13(3): 135-141
Accumulated genetic changes in somatic cells induce phenotypic transformations leading to cancer. Among these changes, gene amplification and deletion are the most frequently observed in several kinds of cancer. Amplification of oncogenes and/or deletion of tumor suppressor genes, together with gene dysfunction caused by point mutation, are the main causes of cancer. Genome-wide analysis of gene amplification and deletion in cancers is fundamental to resolving the mechanisms of carcinogenesis. Comparative genomic hybridization (CGH), developed in 1992, has been used to identify DNA copy number abnormalities in various kinds of cancer, and several reports have shown its usefulness in screening for genes involved in carcinogenesis and in identifying prognostic factors in cancer. We have shown that 1q23 gain is associated with neuroblastomas that are resistant to aggressive treatment and have poor prognosis, and that 1q and 13q gains are possibly related to drug resistance in ovarian cancers. Recently, the "rough draft" of the human genome was reported, and we are ready to use the vast information on genomic sequences in cancer research. Moreover, microarray technology enables us to analyze more than ten thousand genes at a time and has revealed genetic abnormalities in cancers at a genome-wide level. By combining microarrays with CGH, several groups have developed a powerful screening method for oncogenes and tumor suppressor genes in cancers, called array-CGH. In this article, we review these genome-wide analytical methods, CGH and array-CGH, and discuss their potential for the molecular characterization of cancers.

8.
MOTIVATION: Current Self-Organizing Map (SOM) approaches to clustering gene expression patterns require the user to predefine the expected number of clusters. Hierarchical clustering methods used in this area do not provide a unique partitioning of the data. We describe an unsupervised dynamic hierarchical self-organizing approach that suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. During class discovery, the proposed algorithm identifies the corresponding sets of predictor genes that best distinguish one class from the others. The approach combines the merits of hierarchical clustering with the robustness against noise of self-organizing approaches. RESULTS: Applied to DNA microarray data sets from two types of cancer, the proposed algorithm demonstrated its ability to produce the most suitable number of clusters. Furthermore, the marker genes identified by the unsupervised algorithm have a strong biological relationship to the specific cancer class. Tested on leukemia microarray data containing three leukemia types, the algorithm determined three major clusters and one minor cluster. Prediction models built for the four clusters indicate that prediction strength for the smaller cluster is generally low, so it was labelled an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided, and that the subdivisions are related to two of the original clusters. Another test using colon cancer microarray data automatically derived two clusters, consistent with the number of classes in the data (cancerous and normal). AVAILABILITY: Java software for the dynamic SOM tree algorithm is available upon request for academic use. SUPPLEMENTARY INFORMATION: A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
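The dynamic SOM tree described above builds on the classic SOM update; its growing and hierarchical behavior is not reproduced here. For readers unfamiliar with the base algorithm, the sketch below shows one standard SOM training step on a 1-D chain of nodes: find the best-matching unit (BMU), then pull each node toward the input by an amount scaled by a Gaussian neighbourhood on grid distance. The learning rate `lr` and neighbourhood width `sigma` are hypothetical parameters.

```python
import math

def som_step(weights, x, lr, sigma):
    """One classic SOM update on a 1-D chain of nodes (updates `weights` in place)."""
    # Best-matching unit: node whose weight vector is closest to x
    bmu = min(range(len(weights)),
              key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))
    for k, w in enumerate(weights):
        # Gaussian neighbourhood on grid distance |k - bmu|
        h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
        weights[k] = [wi + lr * h * (vi - wi) for wi, vi in zip(w, x)]
    return bmu

# Hypothetical 3-node map with 2-D weights (e.g. expression of two genes)
weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
bmu = som_step(weights, [2.0, 2.0], lr=0.5, sigma=1.0)
```

Training repeats this step over many inputs while shrinking `lr` and `sigma`. The fixed number of nodes is exactly the predefined cluster count that the abstract's dynamic approach aims to avoid.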

9.
MOTIVATION: DNA microarray technology has been increasingly used in cancer research. In the literature, discovery of putative classes and classification into known classes based on gene expression data have largely been treated as separate problems. This paper offers a unified approach to class discovery and classification, which we believe is more appropriate and more widely applicable in practical situations. RESULTS: We model the gene expression profile of a tumor sample as drawn from a finite mixture distribution, with each component characterizing the gene expression levels in one class. The proposed method was applied to a leukemia dataset with good results. With appropriate choices of genes and preprocessing method, the number of leukemia types and subtypes is correctly inferred, and all tumor samples are correctly classified into their respective type/subtype. The method was further evaluated on other variants of the leukemia data and on a colon dataset.
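The abstract models each sample as drawn from a finite mixture but does not give the fitting procedure. As a hedged, much-simplified illustration, the sketch below fits a 1-D Gaussian mixture with a fixed number of components by expectation-maximization (EM) and returns hard class assignments. The authors' model is multivariate over selected genes, and choosing the number of components (which the abstract says is inferred) is not shown. The initial means are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gaussian_mixture(xs, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM; returns weights, means, variances, hard labels."""
    k = len(init_means)
    mus, vars_, weights = list(init_means), [1.0] * k, [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, vars_)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing weights, means and variances
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(xs)
            mus[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            vars_[c] = max(min_var, sum(r[c] * (x - mus[c]) ** 2
                                        for r, x in zip(resp, xs)) / nc)
    # Hard assignment from the final E-step responsibilities
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, mus, vars_, labels

# Hypothetical expression values of one gene in eight samples
xs = [0.1, -0.2, 0.3, 0.0, 9.8, 10.1, 10.3, 9.9]
weights, mus, vars_, labels = em_gaussian_mixture(xs, init_means=[1.0, 8.0])
```

The same fitted model serves both purposes the abstract unifies: responsibilities give classifications, and the components themselves are the discovered classes.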

10.
MOTIVATION: Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present a Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for uncertainty about which set is best by averaging over multiple models (sets of potentially overlapping relevant genes). RESULTS: We show that BMA selects fewer relevant genes than other methods and achieves high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data. AVAILABILITY: The source code and datasets used are available from our supplementary website.
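The averaging step can be sketched as follows, assuming the candidate models have already been fitted. A common approximation sets each model's posterior probability proportional to exp(-BIC/2). The averaged prediction is then the weighted sum of the models' predictions, and a gene's posterior inclusion probability is the total weight of the models that contain it. Model search (for example, which gene sets to consider) is not shown, and the gene names, BIC values and per-model predictions below are hypothetical.

```python
import math

def bma_weights(bics):
    """Approximate posterior model probabilities: w_k proportional to exp(-BIC_k / 2)."""
    best = min(bics)
    # Subtract the smallest BIC before exponentiating to avoid underflow
    raw = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def bma_predict(model_probs, bics):
    """Model-averaged P(class = 1): each model's prediction weighted by its posterior weight."""
    return sum(w * p for w, p in zip(bma_weights(bics), model_probs))

def gene_posteriors(model_genes, bics):
    """Posterior inclusion probability of a gene: total weight of the models that contain it."""
    probs = {}
    for genes, w in zip(model_genes, bma_weights(bics)):
        for g in genes:
            probs[g] = probs.get(g, 0.0) + w
    return probs

# Hypothetical fitted models: gene sets, BIC values and each model's P(class = 1) for one sample
model_genes = [{"geneA", "geneB"}, {"geneA"}, {"geneC"}]
bics = [100.0, 102.0, 110.0]
model_probs = [0.9, 0.6, 0.2]

weights = bma_weights(bics)
p_hat = bma_predict(model_probs, bics)  # roughly 0.82
post = gene_posteriors(model_genes, bics)
```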

11.
12.
The pathologic and clinical heterogeneity of breast cancer reflects the poorly documented, complex and combinatorial molecular basis of the disease and is partly responsible for therapeutic failures. The DNA microarray technique allows the RNA expression of several thousand genes to be analyzed simultaneously in a sample. The technique has multiple potential applications in cancer research. A number of recent studies have shown the promising role of gene expression profiling in breast cancer by identifying new prognostic subclasses not identifiable by conventional parameters, and new prognostic and/or predictive gene signatures whose predictive impact is superior to that of conventional histoclinical prognostic factors. In this review we describe the current use of DNA microarrays in the prognosis of breast cancer. We also discuss issues that need to be addressed in the near future for the method to reach its full potential.

13.
14.
MOTIVATION: Analyzing gene expression data in its chromosomal context is a recent development in cancer research. However, currently available methods fail to account for variation in the distance between genes, gene density and genomic features (e.g. GC content) when identifying chromosomal regions of increased or decreased gene expression. RESULTS: We have developed a model-based scan statistic that accounts for these aspects of the complex landscape of the human genome when identifying extreme chromosomal regions of gene expression. The method can be applied to gene expression data regardless of the microarray platform used to generate it. To demonstrate its accuracy and utility, we applied it to a breast cancer gene expression dataset and tested its ability to predict regions containing medium-to-high-level DNA amplification (DNA ratio values >2). A classifier developed from the scan statistic results had a 10-fold cross-validated classification rate of 93% and a positive predictive value of 88%. This result strongly suggests that the model-based scan statistic and the expression characteristics of an increased chromosomal region of gene expression can be used to accurately predict chromosomal regions containing amplified genes. AVAILABILITY: Functions in the R language are available from the author upon request. CONTACT: fcouples@umich.edu.
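The two reported figures, a 10-fold cross-validated classification rate and a positive predictive value (PPV), are commonly computed as sketched below. Samples are split into k folds, each fold is predicted by a model trained on the others, and the pooled held-out predictions are scored. PPV is TP / (TP + FP). The contiguous, unshuffled folds are a simplification, since indices are usually shuffled (often stratified) first. The labels below are hypothetical and not taken from the study.

```python
def kfold_indices(n, k):
    """Split sample indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def rate_and_ppv(y_true, y_pred):
    """Classification rate (accuracy) and positive predictive value TP / (TP + FP) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    rate = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return rate, ppv

folds = kfold_indices(23, 10)  # first three folds hold 3 samples, the rest 2
# Hypothetical pooled held-out predictions (1 = region predicted to contain amplification)
rate, ppv = rate_and_ppv([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])  # (0.6, 0.666...)
```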

15.
During industrial production processes using yeast, cells are exposed to stress caused by the accumulation of ethanol, which affects cell growth activity and the productivity of target products; ethanol stress-tolerant yeast strains are therefore highly desired. To identify target gene(s) for constructing ethanol stress-tolerant yeast strains, we used DNA microarrays to obtain gene expression profiles of two strains of Saccharomyces cerevisiae, a laboratory strain and a strain used for brewing Japanese rice wine (sake), in the presence of 5% (v/v) ethanol. To select target genes for breeding ethanol stress-tolerant strains, the DNA microarray data were clustered. For further selection, we also investigated the ethanol sensitivity of knockout mutants, each lacking one of the genes selected by DNA microarray analysis. Integrating the DNA microarray data with the ethanol sensitivity data of the knockout strains suggested that enhanced expression of genes related to tryptophan biosynthesis might confer ethanol stress tolerance on yeast cells. Indeed, strains overexpressing tryptophan biosynthesis genes showed tolerance to 5% ethanol. Moreover, adding tryptophan to the culture medium and overexpressing the tryptophan permease gene also conferred ethanol stress tolerance on yeast cells. These results indicate that overexpression of tryptophan biosynthesis genes increases ethanol stress tolerance, and that tryptophan supplementation of the culture and overexpression of the tryptophan permease gene are also effective. Our methodology for selecting target genes for constructing ethanol stress-tolerant strains, based on DNA microarray data and the phenotypes of knockout mutants, was thereby validated.

16.
Mutations in the tumor suppressor gene TP53 are associated with a wide range of cancers and may have prognostic and therapeutic implications. Methods for rapid and sensitive detection of mutations in this gene are therefore required. To make screening more effective, Asper Biotech has constructed a commercially available TP53 genotyping microarray based on arrayed primer extension (APEX). The present study is the first report to blindly evaluate the efficiency of the second-generation APEX TP53 genotyping chip outside the Asper laboratory and to compare it with temporal temperature gradient electrophoresis (TTGE) and sequencing of TP53 for mutation detection in ovarian and breast cancer samples. All nucleotides in exons 2-9 of the TP53 gene are represented on the chip through the synthesis and application of sequence-specific oligonucleotides. The chip was validated by screening 48 breast and 11 ovarian cancer cases, all previously analyzed by TTGE and sequencing. APEX detected 17 of 20 sequence variants, missing one deletion, one insertion and one missense mutation. Resequencing efficiency using APEX was 92% for both DNA strands and 99.5% for the sense and/or antisense strand. We conclude that the APEX TP53 microarray is a robust, rapid and comprehensive screening tool for sequence alterations in tumors.

17.
MOTIVATION: Two important questions in analyzing gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) that capture the differences between classes and sample subsets. Answers to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method must be chosen specifically for a given dataset. The selected gene signatures can be unstable, and the resulting classification accuracy unreliable, particularly when different subsets of samples are considered. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. METHODS: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, on random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting the genes selected frequently over many samplings. Sampling also makes it possible to measure the stability of each model's classification performance, which serves as a measure of model reliability. RESULTS: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition, we determined stable consensus signatures (i.e. gene lists) for the sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage.
We compared our approach with others on a publicly available breast cancer dataset. AVAILABILITY: R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
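The consensus-signature idea can be sketched as follows: run a gene selector on many random sample subsets and keep the genes chosen in at least a given fraction of runs. In this sketch the selector is a simple absolute mean-difference ranking, subsets are drawn per class (stratified) so both classes are always present, and the parameters `frac` and `min_freq` are hypothetical. None of these choices is taken from the authors' R package.

```python
import random
from collections import Counter

def top_k_by_mean_diff(X, y, rows, k):
    """Hypothetical selector: the k genes with the largest |mean(class 1) - mean(class 0)| on `rows`."""
    def score(g):
        c1 = [X[r][g] for r in rows if y[r] == 1]
        c0 = [X[r][g] for r in rows if y[r] == 0]
        return abs(sum(c1) / len(c1) - sum(c0) / len(c0))
    return sorted(range(len(X[0])), key=lambda g: (-score(g), g))[:k]

def consensus_signature(X, y, k, n_samplings=100, frac=0.7, min_freq=0.8, seed=0):
    """Repeat gene selection on stratified random subsets; keep genes chosen in >= min_freq of runs."""
    rng = random.Random(seed)
    counts = Counter()
    by_class = {c: [i for i, lab in enumerate(y) if lab == c] for c in set(y)}
    for _ in range(n_samplings):
        rows = []
        for members in by_class.values():
            rows += rng.sample(members, max(1, int(frac * len(members))))
        counts.update(top_k_by_mean_diff(X, y, rows, k))
    signature = sorted(g for g, c in counts.items() if c / n_samplings >= min_freq)
    return signature, counts

# Hypothetical toy data: 10 samples x 4 genes; genes 0 and 1 differ strongly between classes
X = [
    [10.0, 5.0, 0.5, 0.2], [11.0, 6.0, 0.1, 0.9], [9.0, 5.5, 0.7, 0.4],
    [10.5, 4.5, 0.3, 0.6], [9.5, 5.0, 0.9, 0.1],
    [0.0, 0.0, 0.4, 0.8], [1.0, 0.5, 0.6, 0.3], [0.5, 0.2, 0.2, 0.7],
    [0.2, 0.1, 0.8, 0.5], [0.8, 0.4, 0.1, 0.2],
]
y = [1] * 5 + [0] * 5
signature, counts = consensus_signature(X, y, k=2)  # signature == [0, 1]
```

The selection counts also give a direct stability measure. A gene chosen in only a few runs would be flagged as unstable rather than silently included.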

18.
MOTIVATION: High-density DNA microarrays measure the activities of several thousand genes simultaneously, and gene expression profiles have recently been used for cancer classification. This approach promises better therapeutic measures for cancer patients by diagnosing cancer types with improved accuracy. The Support Vector Machine (SVM) is one of the classification methods successfully applied to cancer diagnosis problems. However, its optimal extension to more than two classes is not obvious, which may limit its application to multiple tumor types. We briefly introduce the Multicategory SVM (MSVM), a recently proposed extension of the binary SVM, and apply it to multiclass cancer diagnosis problems. RESULTS: Its applicability is demonstrated on the leukemia data (Golub et al., 1999) and the small round blue cell tumors of childhood data (Khan et al., 2001). The comparable classification accuracy shown in these applications, together with its flexibility, makes the MSVM a viable alternative to other classification methods. SUPPLEMENTARY INFORMATION: http://www.stat.ohio-state.edu/~yklee/msvm.htm

19.
High-throughput method for detecting DNA methylation (cited 4 times: 0 self-citations, 4 by others)
Aberrant DNA methylation of CpG sites is among the earliest and most frequent alterations in cancer. Detecting promoter hypermethylation of cancer-related genes may be useful for cancer diagnosis or for detecting recurrence. However, most studies have focused on a single gene and give little information about the concurrent methylation status of multiple genes. In this study, we developed a microarray method coupled with linker-PCR for detecting the methylation status of multiple genes in tumor tissue. A series of oligonucleotides was synthesized and purified to match the 16 investigated targets completely. These were immobilized on an aldehyde-coated glass slide to fabricate a DNA microarray for detecting the methylation status of these genes. The results indicated that all of these genes were methylated in the positive control, whereas none was methylated in the negative control. In the gastric tumor tissue, only the p16 and p15 genes were methylated among the investigated genes. These results were validated by bisulfite DNA sequencing. Our experiments demonstrate that the DNA microarray can be applied as a high-throughput tool to determine the methylation status of the investigated genes.

20.
In this research, we developed a multiplex polymerase chain reaction (multiplex PCR) coupled with a DNA microarray system that targets many sequences simultaneously in a single consecutive reaction to detect genetically modified organisms (GMOs). The DNA microarray carries a total of 20 probes for GMO detection, which fall into three categories according to purpose: the first for screening GMOs from non-transgenic plants based on common elements such as promoter, reporter and terminator genes; the second for specific gene confirmation based on target gene sequences such as herbicide-resistance or insect-resistance genes; and the third for species-specific genes whose sequences are unique to different plant species. To ensure the reliability of the method, various positive and negative controls were included on the DNA microarray. Commercial GM soybean, maize, rapeseed and cotton were identified by this method and further confirmed by PCR analysis and sequencing. The results indicate that the method discriminates between GMOs quickly and cost-efficiently. It can detect more than 95% of currently commercialized GMO plants, with limits of detection of 0.5% for soybean and 1% for maize. The method thus offers a new approach for routine analysis of GMOs.
