首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
基于基因表达谱的肿瘤分型和特征基因选取   总被引:20,自引:0,他引:20  
在分析基因表达谱数据特性的基础上,提出了一个将之用于肿瘤分子分型和选型和选取相应亚型特征基因的策略。该策略包括三个步骤:首先采用一个无监督的基因过滤算法以降低用于分型计算的数据的噪声,其次提出了一个概率模型对样本中的分类结构进行建模,最后基于聚类的结果采用相对熵的方法获得对分类贡献大的基因作为特征基因,应用该策略对两个公开发表的数据集进行了再挖掘,结果表明不但获得了其他方法可以得到的信息,而且还提供了更精细、更具有显著生物学意义的信息,具有明显的优越性。  相似文献   

2.
基因表达聚类分析技术的现状与发展   总被引:5,自引:0,他引:5  
随着多个生物基因组测序的完成、DNA芯片技术的广泛应用,基因表达数据分析已成为后基因组时代的研究热点.聚类分析能将功能相关的基因按表达谱的相似程度归纳成类,有助于对未知功能的基因进行研究,是目前基因表达分析研究的主要计算技术之一.已有多种聚类分析算法用于基因表达数据分析,各种算法因其着眼点、原理等方面的差异,而各有其优缺点.如何对各种聚类算法的有效性进行分析、并开发新型的、适合于基因表达数据分析的方法已是当务之急.  相似文献   

3.
系统发育谱算法作为一种有效的大规模基因组功能注释方法,已经被成功的应用到原核生物基因组的功能注释中去。通过对系统发育谱方法中的一个关键环节——相似谱的聚类进行分析,提出了一种基于统计建模的方法来对相似的系统发育谱进行聚类。实验表明,该方法在保证较高的覆盖率的同时,还有效的提高了算法的整体速度,且当参与建模的系统发育谱的数目越大时,算法的精确度越高。  相似文献   

4.
基于遗传算法的基因表达数据的K-均值聚类分析   总被引:1,自引:0,他引:1  
聚类算法在基因表达数据的分析处理过程中得到日益广泛的应用。本文通过把K-均值聚类算法引入到遗传算法中,结合基因微阵列的特点,来讨论一种基于遗传算法的K-均值聚类模型,目的是利用遗传算法的全局性来提高聚类算法找到全局最优的可能性,实验结果证明,该算法可以很好地解决某些基因表达数据的聚类分析问题。  相似文献   

5.
系统发育谱方法是目前研究较多的一种基于非同源性的生物大分子功能注释方法。针对现有算法存在的一些缺陷,从两个方面对该方法做了改进:一是构造基于权重的系统发育谱;二是采用改进的聚类算法对发育谱的相似性进行分析。从NCBI上下载100条Escherichia coli K12蛋白质作为实验数据,分别使用改进的算法和经典的层次聚类算法、K均值聚类算法对相似谱进行分析。结果显示,提出的改进算法在对相似谱聚类的精确度上明显优于后两种聚类算法。  相似文献   

6.
DNA微阵列技术的发展为基因表达研究提供更有效的工具。分析这些大规模基因数据主要应用聚类方法。最近,提出双聚类技术来发现子矩阵以揭示各种生物模式。多目标优化算法可以同时优化多个相互冲突的目标,因而是求解基因表达矩阵的双聚类的一种很好的方法。本文基于克隆选择原理提出了一个新奇的多目标免疫优化双聚类算法,来挖掘微阵列数据的双聚类。在两个真实数据集上的实验结果表明该方法比其他多目标进化双聚娄算法表现出更优越的性能。  相似文献   

7.
与实验条件相关的基因功能模块聚类分析方法   总被引:2,自引:0,他引:2  
喻辉  郭政  李霞  屠康 《生物物理学报》2004,20(3):225-232
针对细胞内基因功能模块化的现象,定义了“基因功能模块”和“特征功能模块”两个概念,并基于这两个概念提出一种“与实验条件相关的基因功能模块聚类算法”。该算法综合利用基因功能知识与基因表达谱信息,将基因聚类为与实验条件相关的基因功能模块。向基因表达谱中加入水平逐渐升高的数据噪音,根据基因功能模块对数据噪音的抵抗力,确定最稳定的基因功能模块,即特征功能模块。加噪音实验显示,在基因芯片技术可能发生的噪音范围内,该算法对噪音的稳健性优于层次聚类和模糊C均值聚类。将模块聚类算法应用在NCI60数据集上,发现了8个与实验条件高度相关的特征功能模块。  相似文献   

8.
基因表达谱聚类/分类技术研究及展望   总被引:3,自引:0,他引:3       下载免费PDF全文
随着人类及多种模式生物全基因组测序基本完成,人类基因组计划的研究进入后基因组时代.后基因组时代研究的焦点已经从测序转向功能研究。聚类/分类技术作为分析基因表达谱和识别基因功能的重要工具之一,近年来获得很大的发展。对目前基因表达谱聚类/分类技术及它们的发展,进行了综述性的研究,分析了它们的优缺点,结合我们的研究,提出了解决问题的思路和方法,为基因表达谱的进一步研究提供了新的途径。  相似文献   

9.
单细胞转录组测序技术提供单个细胞分辨率的基因表达谱,有助于更准确地揭示细胞异质性。聚类是识别生物组织中细胞类型的主要方法,选择合适的聚类算法可以提升单细胞转录组测序数据分析的性能。本文阐述了k-means、层次聚类(hierarchical clustering, HC)、 Leiden、 SC3、 SCENA、 LAK、 SIMLR和dropClust等8种典型的单细胞聚类算法,在12个带有真实标签的单细胞转录组测序数据集上进行聚类比较分析。采用轮廓系数、 Calinski-Harabasz指数、调整兰德指数、调整互信息、 FMI指数、 V-measure、 Jaccard系数和变异系数等8个评价指标,对8种聚类算法的性能进行分析评价。根据实验结果,发现HC、 SC3、k-means、 SCENA的聚类泛用性与鲁棒性最佳,在大规模数据集上SIMLR算法表现最好;在小规模数据集上Leiden算法表现最好,但是存在依赖邻居节点参数和稳定性低的问题;dropClust算法在泛用性和鲁棒性上最差。此外,8种聚类方法的性能都与数据质量有关,当数据的变异系数较低时,聚类算法的评分指标普遍增高,反...  相似文献   

10.
基于基因表达谱的疾病亚型特征基因挖掘方法   总被引:1,自引:0,他引:1  
在本研究中,提出了一种基于基因表达谱的疾病亚型特征基因挖掘方法,该方法基于过滤后基因表达谱,融合无监督聚类识别疾病亚型技术和提出的衡量特征基因对疾病亚型鉴别能力的模式质量测度,以嵌入的方式实现特征基因挖掘。最后将提出的方法应用于40例结肠癌组织与22例正常结肠组织中2000个基因的表达谱实验数据,结果显示:提出的方法是一种可行的疾病亚型特征基因挖掘方法,方法的优势在于可并行实现疾病亚型划分和特征基因识别。  相似文献   

11.
Following sequence alignment, clustering algorithms are among the most utilized techniques in gene expression data analysis. Clustering gene expression patterns allows researchers to determine which gene expression patterns are alike and most likely to participate in the same biological process being investigated. Gene expression data also allow the clustering of whole samples of data, which makes it possible to find which samples are similar and, consequently, which sampled biological conditions are alike. Here, a novel similarity measure calculation and the resulting rank-based clustering algorithm are presented. The clustering was applied in 418 gene expression samples from 13 data series spanning three model organisms: Homo sapiens, Mus musculus, and Arabidopsis thaliana. The initial results are striking: more than 91% of the samples were clustered as expected. The MESs (most expressed sequences) approach outperformed some of the most used clustering algorithms applied to this kind of data such as hierarchical clustering and K-means. The clustering performance suggests that the new similarity measure is an alternative to the traditional correlation/distance measures typically used in clustering algorithms.  相似文献   

12.
Scoring clustering solutions by their biological relevance   总被引:1,自引:0,他引:1  
MOTIVATION: A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. RESULTS: We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. AVAILABILITY: The software is available from the authors upon request.  相似文献   

13.
14.
MOTIVATION: Arrays allow measurements of the expression levels of thousands of mRNAs to be made simultaneously. The resulting data sets are information rich but require extensive mining to enhance their usefulness. Information theoretic methods are capable of assessing similarities and dissimilarities between data distributions and may be suited to the analysis of gene expression experiments. The purpose of this study was to investigate information theoretic data mining approaches to discover temporal patterns of gene expression from array-derived gene expression data. RESULTS: The Kullback-Leibler divergence, an information-theoretic distance that measures the relative dissimilarity between two data distribution profiles, was used in conjunction with an unsupervised self-organizing map algorithm. Two published, array-derived gene expression data sets were analyzed. The patterns obtained with the KL clustering method were found to be superior to those obtained with the hierarchical clustering algorithm using the Pearson correlation distance measure. The biological significance of the results was also examined. AVAILABILITY: Software code is available by request from the authors. All programs were written in ANSI C and Matlab (Mathworks Inc., Natick, MA).  相似文献   

15.
MOTIVATION: Cluster analysis of gene expression profiles has been widely applied to clustering genes for gene function discovery. Many approaches have been proposed. The rationale is that the genes with the same biological function or involved in the same biological process are more likely to co-express, hence they are more likely to form a cluster with similar gene expression patterns. However, most existing methods, including model-based clustering, ignore known gene functions in clustering. RESULTS: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions as prior probabilities in model-based clustering. In contrast to a global mixture model applicable to all the genes in the standard model-based clustering, we use a stratified mixture model: one stratum corresponds to the genes of unknown function while each of the other ones corresponding to the genes sharing the same biological function or pathway; the genes from the same stratum are assumed to have the same prior probability of coming from a cluster while those from different strata are allowed to have different prior probabilities of coming from the same cluster. We derive a simple EM algorithm that can be used to fit the stratified model. A simulation study and an application to gene function prediction demonstrate the advantage of our proposal over the standard method. CONTACT: weip@biostat.umn.edu  相似文献   

16.
SUMMARY: Besides classical clustering methods such as hierarchical clustering, in recent years biclustering has become a popular approach to analyze biological data sets, e.g. gene expression data. The Biclustering Analysis Toolbox (BicAT) is a software platform for clustering-based data analysis that integrates various biclustering and clustering techniques in terms of a common graphical user interface. Furthermore, BicAT provides different facilities for data preparation, inspection and postprocessing such as discretization, filtering of biclusters according to specific criteria or gene pair analysis for constructing gene interconnection graphs. The possibility to use different biclustering algorithms inside a single graphical tool allows the user to compare clustering results and choose the algorithm that best fits a specific biological scenario. The toolbox is described in the context of gene expression analysis, but is also applicable to other types of data, e.g. data from proteomics or synthetic lethal experiments. AVAILABILITY: The BicAT toolbox is freely available at http://www.tik.ee.ethz.ch/sop/bicat and runs on all operating systems. The Java source code of the program and a developer's guide is provided on the website as well. Therefore, users may modify the program and add further algorithms or extensions.  相似文献   

17.
Ji X  Li-Ling J  Sun Z 《FEBS letters》2003,542(1-3):125-131
In this work we have developed a new framework for microarray gene expression data analysis. This framework is based on hidden Markov models. We have benchmarked the performance of this probability model-based clustering algorithm on several gene expression datasets for which external evaluation criteria were available. The results showed that this approach could produce clusters of quality comparable to two prevalent clustering algorithms, but with the major advantage of determining the number of clusters. We have also applied this algorithm to analyze published data of yeast cell cycle gene expression and found it able to successfully dig out biologically meaningful gene groups. In addition, this algorithm can also find correlation between different functional groups and distinguish between function genes and regulation genes, which is helpful to construct a network describing particular biological associations. Currently, this method is limited to time series data. Supplementary materials are available at http://www.bioinfo.tsinghua.edu.cn/~rich/hmmgep_supp/.  相似文献   

18.
The availability of a great range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data to make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering for the identification of groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.  相似文献   

19.
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号