Similar literature
 20 similar documents found; search time: 31 ms
1.
MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16,000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.

2.
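DNA methylation and the growth and development of adipose tissue   Total citations: 1, self-citations: 0, citations by others: 1
As an important epigenetic modification, DNA methylation plays an important role in maintaining normal cell function, genomic imprinting, embryonic development, and human tumorigenesis. Its most important function is the regulation of gene expression; it is one of the key epigenetic mechanisms by which cells regulate gene expression. Recent studies have found that DNA methylation plays an important role in the growth and development of adipose tissue and in the development of obesity. DNA methylation regulates adipose tissue growth and development by controlling the expression of adipocyte differentiation transcription factors, transcriptional cofactors, and other genes related to lipid metabolism. This article reviews recent research progress on DNA methylation during adipose tissue growth and development and discusses research trends and future directions for DNA methylation in adipose tissue.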

3.
Most conventional feature selection algorithms share a drawback: a weakly ranked gene that could perform well in terms of classification accuracy when combined with an appropriate subset of genes is left out of the selection. To address this shortcoming, we propose a feature selection algorithm for sample classification in gene expression data analysis. The proposed algorithm first divides genes into relatively small subsets (roughly of size h), then selects an informative smaller subset of genes (of size r < h) from a subset and merges the chosen genes with another gene subset (of size r) to update the gene subset. We repeat this process until all subsets are merged into one informative subset. We illustrate the effectiveness of the proposed algorithm by analyzing three distinct gene expression data sets. Our method shows promising classification accuracy for all the test data sets. We also show the relevance of the selected genes in terms of their biological functions.
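
The divide, select and merge loop described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exhaustive search inside each pool, the reading of "merge" as pooling the genes kept so far with the next chunk, and the names `merge_select`, `best_subset`, `toy_score` and `WEIGHTS` are all assumptions made for illustration. In practice `score_subset` would be something like a cross-validated classification accuracy.

```python
from itertools import combinations


def best_subset(pool, r, score_subset):
    """Exhaustively pick the r-gene subset of `pool` with the highest score."""
    if len(pool) <= r:
        return tuple(pool)
    return max(combinations(pool, r), key=score_subset)


def merge_select(genes, h, r, score_subset):
    """Fold chunks of about h genes into one informative subset of size r."""
    if not 0 < r < h:
        raise ValueError("need 0 < r < h")
    chunks = [genes[i:i + h] for i in range(0, len(genes), h)]
    selected = ()
    for chunk in chunks:
        # Pool the genes kept so far with the next chunk, then re-select r genes.
        pool = list(selected) + list(chunk)
        selected = best_subset(pool, r, score_subset)
    return selected


# Hypothetical toy score: g7 is weak on its own but strong together with g2.
WEIGHTS = {"g0": 0.3, "g2": 0.2, "g5": 0.1, "g7": 0.1}


def toy_score(subset):
    bonus = 1.0 if {"g2", "g7"} <= set(subset) else 0.0
    return sum(WEIGHTS.get(g, 0.0) for g in subset) + bonus


genes = [f"g{i}" for i in range(9)]
chosen = merge_select(genes, h=3, r=2, score_subset=toy_score)
print(chosen)
```

Because subsets are scored jointly rather than gene by gene, a weakly ranked gene can still be kept when it pairs well with a gene that survived earlier rounds. The search remains greedy across chunks, so a useful gene dropped early cannot be recovered later.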

4.
DNA microarray technology, originally developed to measure the level of gene expression, has become one of the most widely used tools in genomic study. The crux of microarray design lies in how to select a unique probe that distinguishes a given genomic sequence from other sequences. Due to its significance, probe selection attracts a lot of attention. Various probe selection algorithms have been developed in recent years. Good probe selection algorithms should produce a small number of candidate probes. Efficiency is also crucial because the data involved are usually huge. Most existing algorithms are not sufficiently selective and return quite a large number of probes. We propose a new direction to tackle the problem and give an efficient algorithm based on randomization to select a small set of probes and demonstrate that such a small set of probes is sufficient to distinguish each sequence from all the other sequences. Based on the algorithm, we have developed probe selection software RandPS, which runs efficiently in practice. The software is available on our website (http://www.csc.liv.ac.uk/~cindy/RandPS/RandPS.htm). We test our algorithm via experiments on different genomes (Escherichia coli, Saccharomyces cerevisiae, etc.) and our algorithm is able to output unique probes for most of the genes efficiently. The other genes can be identified by a combination of at most two probes.

5.
6.
Because of high dimensionality, machine learning algorithms typically rely on feature selection techniques in order to perform effective classification in microarray gene expression data sets. However, the large number of features compared to the number of samples makes the task of feature selection computationally hard and prone to errors. This paper interprets feature selection as a task of stochastic optimization, where the goal is to select among an exponential number of alternative gene subsets the one expected to return the highest generalization in classification. Blocking is an experimental design strategy which produces similar experimental conditions to compare alternative stochastic configurations in order to be confident that observed differences in accuracy are due to actual differences rather than to fluctuations and noise effects. We propose an original blocking strategy for improving feature selection which aggregates in a paired way the validation outcomes of several learning algorithms to assess a gene subset and compare it to others. This is a novelty with respect to conventional wrappers, which commonly adopt a sole learning algorithm to evaluate the relevance of a given set of variables. The rationale of the approach is that, by increasing the number of experimental conditions under which we validate a feature subset, we can lessen the problems related to the scarcity of samples and consequently come up with a better selection. The paper shows that the blocking strategy significantly improves the performance of a conventional forward selection for a set of 16 publicly available cancer expression data sets. The experiments involve six different classifiers and show that improvements take place independent of the classification algorithm used after the selection step.
Two further validations based on available biological annotation support the claim that blocking strategies in feature selection may improve the accuracy and the quality of the solution. The first validation is based on retrieving PubMed abstracts associated with the selected genes and matching them to regular expressions describing the biological phenomenon underlying the expression data sets. The biological validation that follows is based on the use of the Bioconductor package GOstats in order to perform Gene Ontology statistical analysis.
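
The paired comparison at the heart of blocking can be sketched as follows. This is only an illustration of the general idea, not the paper's procedure: two candidate gene subsets are validated under the same blocks, here assumed to be (learning algorithm, cross-validation fold) pairs, and only the per-block differences are aggregated. The function name, the use of a paired t statistic, and the accuracy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_block_comparison(outcomes_a, outcomes_b):
    """Compare two gene subsets using accuracies measured on identical blocks.

    Each argument maps a block, e.g. (learner, fold), to a validation accuracy.
    Returns the mean paired difference (A minus B) and a paired t statistic.
    """
    if outcomes_a.keys() != outcomes_b.keys():
        raise ValueError("both subsets must be validated on the same blocks")
    diffs = [outcomes_a[k] - outcomes_b[k] for k in outcomes_a]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # needs at least two blocks
    if d_sd == 0:
        # All blocks agree exactly; the t statistic is degenerate.
        t_stat = 0.0 if d_mean == 0 else float("inf") * (1 if d_mean > 0 else -1)
    else:
        t_stat = d_mean / (d_sd / sqrt(len(diffs)))
    return d_mean, t_stat


# Made-up accuracies for two candidate subsets, two learners, two folds.
subset_a = {("knn", 0): 0.80, ("knn", 1): 0.85, ("svm", 0): 0.90, ("svm", 1): 0.88}
subset_b = {("knn", 0): 0.75, ("knn", 1): 0.83, ("svm", 0): 0.86, ("svm", 1): 0.85}
d_mean, t_stat = paired_block_comparison(subset_a, subset_b)
print(d_mean, t_stat)
```

Pairing removes the variation shared by both subsets within a block (an easy fold, a strong learner), so a small but consistent advantage stands out more clearly than it would when comparing two unpaired averages.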

7.
It has been shown that gene body DNA methylation is associated with gene expression. However, whether and how deviation of gene body DNA methylation between duplicate genes can influence their divergence remains largely unexplored. Here, we aim to elucidate the potential role of gene body DNA methylation in the fate of duplicate genes. We identified paralogous gene pairs from Arabidopsis and rice (Oryza sativa ssp. japonica) genomes and reprocessed their single-base resolution methylome data. We show that methylation in paralogous genes nonlinearly correlates with several gene properties including exon number/gene length, expression level and mutation rate. Further, we demonstrated that divergence of methylation level and pattern in paralogs indeed positively correlates with their sequence and expression divergences. This result held even after controlling for other confounding factors known to influence the divergence of paralogs. We observed that methylation level divergence might be more relevant to the expression divergence of paralogs than methylation pattern divergence. Finally, we explored the mechanisms that might give rise to the divergence of gene body methylation in paralogs. We found that exonic methylation divergence more closely correlates with expression divergence than intronic methylation divergence. We show that genomic environments (e.g., flanked by transposable elements and repetitive sequences) of paralogs generated by various duplication mechanisms are associated with the methylation divergence of paralogs. Overall, our results suggest that the changes in gene body DNA methylation could provide another avenue for duplicate genes to develop differential expression patterns and undergo different evolutionary fates in plant genomes.

8.
9.
10.
MOTIVATION: We investigate two new Bayesian classification algorithms incorporating feature selection. These algorithms are applied to the classification of gene expression data derived from cDNA microarrays. RESULTS: We demonstrate the effectiveness of the algorithms on three gene expression datasets for cancer, showing they compare well with alternative kernel-based techniques. By automatically incorporating feature selection, accurate classifiers can be constructed utilizing very few features and with minimal hand-tuning. We argue that the feature selection is meaningful and some of the highlighted genes appear to be medically important.

11.
Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of samples to variables poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call “relative Signal-to-Noise ratio” (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.
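
The ranking idea behind the rSNR (low variation within the condition of interest, high variation across conditions) can be sketched in Python. The abstract does not give the rSNR formula, so the ratio below is an assumed stand-in rather than the published definition: extrinsic variation is taken as the standard deviation of the per-condition means, intrinsic variation as the standard deviation of the replicates in the chosen condition. The function names and expression values are hypothetical.

```python
from statistics import mean, pstdev


def specificity_score(expr_by_condition, condition, eps=1e-9):
    """Assumed score: across-condition spread over within-condition spread."""
    intrinsic = pstdev(expr_by_condition[condition])
    extrinsic = pstdev([mean(values) for values in expr_by_condition.values()])
    # eps avoids division by zero when the replicates are identical.
    return extrinsic / (intrinsic + eps)


def rank_genes(data, condition):
    """Return gene names sorted from most to least condition-specific."""
    scores = {gene: specificity_score(conds, condition) for gene, conds in data.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up expression values: three replicates per condition.
data = {
    "geneA": {"heat": [9.0, 9.1, 8.9], "cold": [2.0, 2.2, 1.8], "ctrl": [2.1, 1.9, 2.0]},
    "geneB": {"heat": [5.0, 9.0, 1.0], "cold": [5.1, 4.9, 5.0], "ctrl": [5.0, 5.2, 4.8]},
}
ranking = rank_genes(data, "heat")
print(ranking)
```

In this toy input, geneA is tightly reproducible under "heat" and clearly different from the other conditions, so it ranks above geneB, whose "heat" replicates are noisy and whose condition means barely differ.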

12.
Breast cancer has various molecular subtypes and displays high heterogeneity. Aberrant DNA methylation is involved in tumor origin, development and progression. Moreover, distinct DNA methylation patterns are associated with specific breast cancer subtypes. We explored DNA methylation patterns in association with gene expression to assess their impact on the prognosis of breast cancer based on Infinium 450K arrays (training set) from The Cancer Genome Atlas (TCGA). The DNA methylation patterns of 12 featured genes that had a high correlation with gene expression were identified through univariate and multivariable Cox proportional hazards models and used to define the methylation risk score (MRS). The DNA methylation pattern of the 12 featured genes showed better discriminative power (p = 0.00103) than the average methylation levels (p = 0.956) or gene expression (p = 0.909). Furthermore, MRS provided a good prognostic value for breast cancers even when the patients had the same receptor status. We found that ER-, PR- or Her2- samples with high MRS had the worst 5-year survival rate and overall survival time. An independent test set including 28 patients with death as an outcome was used to test the validity of the MRS of the 12 featured genes; this analysis obtained a prognostic value equivalent to that of the training set. The predictive power was validated through two independent datasets from the GEO database. The DNA methylation pattern is a powerful predictor of breast cancer survival, and can predict outcomes within the same breast cancer molecular subtype.

13.
Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-associated mortality worldwide. However, the role of epigenetic changes such as aberrant DNA methylation in hepatocarcinogenesis remains largely unclear. In this study, we examined the methylation profiles of 59 HCC patients. Using consensus hierarchical clustering with feature selection, we identified three tumor subgroups based on their methylation profiles and correlated these subgroups with clinicopathological parameters. Interestingly, one tumor subgroup differs from the other two subgroups, and its methylation profile is the most distinct from that of the non-tumorous liver tissues. Significantly, this subgroup of patients was found to be associated with poor overall as well as disease-free survival. To further understand the pathways modulated by the deregulation of methylation in HCC patients, we integrated data from both the methylation and the gene expression profiles of these 59 HCC patients. In these patients, while 4416 CpG sites were differentially methylated between the tumors and the adjacent non-tumorous tissues, only 536 of these CpG sites were associated with differences in the expression of their associated genes. Pathway analysis revealed that forty-four percent of the most significant upstream regulators of these 536 genes were involved in the inflammation-related NFκB pathway. These data suggest that inflammation via the NFκB pathway plays an important role in modulating gene expression of HCC patients through methylation. Overall, our analysis provides an understanding of the aberrant methylation profile in HCC patients.

14.
15.
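DNA methylation is an important epigenetic modification that occurs mainly at CpG islands in DNA. It is carried out by DNA methyltransferases (DNMTs). DNA methylation participates in many cellular processes, including cell differentiation, genome stability, X-chromosome inactivation, and genomic imprinting. Changes in DNA methylation at the single-gene level and genome-wide also play important roles in tumor initiation and progression. Silencing of tumor suppressor genes through aberrant methylation can lead to uncontrolled proliferation, invasion, and metastasis of tumor cells, and contributes to angiogenesis in tumor tissue. Chromosomal instability caused by global genomic DNA hypomethylation has been found in studies of many tumors. This article discusses, from the perspectives of aberrant DNA hypermethylation and hypomethylation, the changes in DNA methylation during malignant transformation and their effects, and describes the role of DNA methylation changes in tumor diagnosis and therapy.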

16.
17.
18.
Integrated analysis of DNA methylation and gene expression can reveal specific epigenetic patterns that are important during carcinogenesis. We built an integrated database of DNA methylation and gene expression termed MENT (Methylation and Expression database of Normal and Tumor tissues) to provide researchers information on both DNA methylation and gene expression in diverse cancers. It contains integrated data of DNA methylation, gene expression, correlation of DNA methylation and gene expression in paired samples, and clinicopathological conditions gathered from the GEO (Gene Expression Omnibus) and TCGA (The Cancer Genome Atlas). A user-friendly interface allows users to search for differential DNA methylation by either ‘gene search’ or ‘dataset search’. The ‘gene search’ returns which conditions are differentially methylated in a gene of interest, while ‘dataset search’ returns which genes are differentially methylated in a condition of interest based on filtering options such as direction, DM (differential methylation value), and p-value. MENT is the first database which provides both DNA methylation and gene expression information in diverse normal and tumor tissues. Its user-friendly interface allows users to easily search and view both DNA methylation and gene expression patterns. MENT is freely available at http://mgrc.kribb.re.kr:8080/MENT/.

19.
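RNA-directed DNA methylation (RdDM) is an epigenetic genome modification first discovered in plants; RdDM leads directly to DNA methylation through RNA-DNA sequence interactions. In plants, both RdDM and siRNA-mediated mRNA degradation silence genes in a sequence-specific manner through RNA, and both play very important roles in chromosome rearrangement, defense against viral infection, regulation of gene expression, and many developmental processes. Although RdDM has been widely reported in plants, its specific regulatory mechanism remains unclear. This review briefly outlines the basic features of RNA-directed DNA methylation in plants and focuses on research progress on the mechanism of RdDM, including the types of DNA methyltransferases involved in RdDM and their mechanisms of action, the relationship between DNA methylation and chromatin modification, and studies of important RdDM-related proteins. In plants, RdDM can occur at both the transcriptional and post-transcriptional levels and induce gene silencing; the former usually involves methylation of the target gene's promoter, whereas the latter involves methylation of the coding region. RdDM depends on siRNAs and enzymes similar to those of the RNAi pathway, such as DCL3, RdR2, SDE4, and AGO4. Plants contain at least three classes of DNA methyltransferases, DRM1/2, MET1, and CMT3, which act on all cytosines in DNA regions homologous to the RNA, and methylation of lysine 9 of histone H3 affects cytosine methylation.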

20.
We have observed extensive interindividual differences in DNA methylation of 8590 CpG sites of 6229 genes in 153 human adult cerebellum samples, enriched in CpG island “shores” and at further distances from CpG islands. To search for genetic factors that regulate this variation, we performed a genome-wide association study (GWAS) mapping of methylation quantitative trait loci (mQTLs) for the 8590 testable CpG sites. cis association refers to correlation of methylation with SNPs within 1 Mb of a CpG site. 736 CpG sites showed phenotype-wide significant cis association with 2878 SNPs (after permutation correction for all tested markers and methylation phenotypes). In trans analysis of methylation, which tests for distant regulation effects, associations of 12 CpG sites and 38 SNPs remained significant after phenotype-wide correction. To examine the functional effects of mQTLs, we analyzed 85 genes for which we observed genetically regulated methylation and had quality gene expression data. Ten genes showed SNP-methylation-expression three-way associations: the same SNP simultaneously showed significant association with both DNA methylation and gene expression, while DNA methylation was significantly correlated with gene expression. Thus, we demonstrated that DNA methylation is frequently a heritable continuous quantitatively variable trait in human brain. Unlike allele-specific methylation, genetic polymorphisms mark both cis- and trans-regulatory genetic sites at measurable distances from their CpG sites. Some of the genetically regulated DNA methylation is directly connected with genetically regulated gene expression variation.
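
The cis definition used in this abstract (SNPs within 1 Mb of a CpG site) can be made concrete with a short Python sketch. Only the 1 Mb window comes from the abstract; the `(chromosome, position)` tuple representation, the requirement that both lie on the same chromosome, the inclusive boundary, and the coordinates themselves are assumptions for illustration.

```python
CIS_WINDOW_BP = 1_000_000  # "within 1 Mb of a CpG site", as stated in the abstract


def classify_pair(snp, cpg, window=CIS_WINDOW_BP):
    """Label a SNP-CpG pair as 'cis' or 'trans'.

    `snp` and `cpg` are (chromosome, position) tuples with positions in base pairs.
    A pair is 'cis' when both lie on the same chromosome no more than `window`
    bp apart (inclusive, by assumption); every other pair is 'trans'.
    """
    same_chromosome = snp[0] == cpg[0]
    if same_chromosome and abs(snp[1] - cpg[1]) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates, not taken from the study.
cpg_site = ("chr7", 5_000_000)
print(classify_pair(("chr7", 5_400_000), cpg_site))
print(classify_pair(("chr7", 9_000_000), cpg_site))
print(classify_pair(("chr2", 5_000_100), cpg_site))
```

Restricting the cis scan to a fixed window keeps the number of tested SNP-CpG pairs manageable, whereas the trans analysis considers distant pairs and therefore faces a much heavier multiple-testing burden.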
