Similar articles
20 similar articles found; search took 106 ms
1.
Xia Yao, Kong Wei. Biomagnetism (生物磁学), 2011, (Z1): 4742-4747
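Objective: To investigate new, effective methods for preprocessing microarray gene expression data, using Alzheimer's disease microarray gene expression data. Methods: The microarray gene expression data were first preprocessed with standard deviation filtering, FSC (feature score criterion) and WPT-SAM (wavelet packet transform combined with significance analysis of microarrays), and the resulting numbers of genes and FDR values were compared. The processed data were then subjected to classification clustering and hierarchical decision clustering, and the clustering results were compared. Results: Standard deviation filtering and FSC retained more genes in the initial screening than WPT-SAM, but they also had higher FDR values, and their subsequent clustering results were worse than those of WPT-SAM. Conclusion: WPT-SAM is a relatively flexible and suitable method for preprocessing microarray gene expression data.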
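The abstract compares preprocessing methods by how many genes they retain and by their FDR values. As a rough illustration only, the sketch below shows a standard deviation filter and Benjamini-Hochberg FDR adjustment. The paper does not say which FDR procedure it used, so Benjamini-Hochberg, the threshold `min_sd` and the helper names are assumptions; FSC and WPT-SAM are not reproduced.

```python
import statistics

def sd_filter(expr, min_sd):
    """Keep genes whose sample standard deviation across arrays is at least min_sd.

    expr maps a (hypothetical) gene name to its list of expression values, one per array.
    """
    return {g: v for g, v in expr.items() if statistics.stdev(v) >= min_sd}

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

expr = {"g1": [1.0, 1.1, 0.9], "g2": [0.2, 3.5, 6.1], "g3": [2.0, 2.0, 2.1]}
print(sorted(sd_filter(expr, min_sd=1.0)))                            # ['g2']
print([round(q, 4) for q in bh_adjust([0.01, 0.04, 0.03, 0.2])])      # [0.04, 0.0533, 0.0533, 0.2]
```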

3.
K-means clustering analysis of gene expression data based on a genetic algorithm   Cited by: 1 (self-citations: 0, by others: 1)
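Clustering algorithms are increasingly widely used in the analysis of gene expression data. This paper embeds the K-means clustering algorithm in a genetic algorithm and, taking the characteristics of gene microarrays into account, discusses a genetic-algorithm-based K-means clustering model. The aim is to use the global search of the genetic algorithm to increase the chance that the clustering reaches the global optimum. Experimental results show that the algorithm solves certain clustering problems for gene expression data well.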
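A minimal sketch of the hybrid idea, assuming a simple design that may differ from the paper's model: each chromosome is a list of k cluster centers, one Lloyd (K-means) step refines every chromosome, fitness is the within-cluster sum of squares (SSE, lower is better), and selection, one-point crossover and mutation follow a textbook genetic algorithm. All names and parameter values (`ga_kmeans`, `pop_size`, `mutation_rate`) are hypothetical.

```python
import random

def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def sse(points, centers):
    """Within-cluster sum of squared distances (lower is fitter)."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans_step(points, centers):
    """One Lloyd iteration: assign points, then move each center to its cluster mean."""
    labels = assign(points, centers)
    new_centers = []
    for j, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else c)
    return new_centers

def ga_kmeans(points, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Chromosome = list of k centers, initialized from randomly chosen data points.
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        pop = [kmeans_step(points, ch) for ch in pop]   # local K-means refinement
        pop.sort(key=lambda ch: sse(points, ch))
        elite = pop[: pop_size // 2]                      # truncation selection keeps the fitter half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(a[:cut]) + list(b[cut:])         # one-point crossover on the center lists
            if rng.random() < mutation_rate:              # mutation: reseed one center at a data point
                child[rng.randrange(k)] = rng.choice(points)
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda ch: sse(points, ch))
    return best, assign(points, best)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
centers, labels = ga_kmeans(points, k=3)
print(labels)
```

Because the elite half is carried over unchanged, the best solution found so far is never lost, while crossover and mutation let the population escape the local optima that a single K-means run can get stuck in.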

4.
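The number of clusters is a key parameter that determines clustering quality, and it usually has to be set manually; for complex biological datasets, where such prior knowledge is hard to obtain, cluster analysis is therefore limited. To address this problem, this article proposes a method that automatically determines the optimal number of clusters. The method performs the main computation with an optimized clustering algorithm embodying the principle of "compact within clusters, well separated between clusters" and, combined with a decision criterion based on the second-order difference of the objective function, determines the optimal number of clusters through the self-learning of the clustering algorithm. Experimental results show that the method automatically obtains a reasonable number of clusters on complex datasets.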
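The second-difference criterion can be illustrated as follows. As an assumption, the K-means within-cluster SSE stands in for the paper's compactness/separation objective J(k); the sketch picks the k that maximizes J(k-1) - 2J(k) + J(k+1), i.e. the sharpest bend in the J(k) curve. Function names and restart counts are hypothetical.

```python
import random

def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def kmeans_sse(points, k, restarts=30, iters=50, seed=0):
    """Lowest within-cluster SSE found by several random-restart Lloyd runs."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(restarts):
        centers = rng.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                clusters[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
            # Move each center to its cluster mean; keep the old center if its cluster is empty.
            centers = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        best = min(best, sum(min(sq_dist(p, c) for c in centers) for p in points))
    return best

def best_k_by_second_difference(points, k_max):
    """Choose k in 2..k_max-1 maximizing J(k-1) - 2*J(k) + J(k+1)."""
    J = {k: kmeans_sse(points, k) for k in range(1, k_max + 1)}
    scores = {k: J[k - 1] - 2 * J[k] + J[k + 1] for k in range(2, k_max)}
    return max(scores, key=scores.get), scores

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_k_by_second_difference(points, k_max=6)
print(k, scores)
```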

5.
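This paper analyzes why commonly used normalization methods cause misclassification in tumor gene chip data and proposes a normalization method based on class means. The method normalizes gene expression profiles in both directions and intertwines the normalization process with the clustering process, using the clustering results to revise the reference expression levels. Five tumor gene chip datasets were selected, and hierarchical clustering and K-means clustering were applied, at different variance levels, to gene expression data processed by conventional normalization and by class-mean-based normalization, and the results were compared. Experimental results show that class-mean-based normalization effectively improves the quality of clustering results for tumor gene expression profiles.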

6.
A clustering method for gene functional modules associated with experimental conditions   Cited by: 2 (self-citations: 0, by others: 2)
Yu Hui, Guo Zheng, Li Xia, Tu Kang. Acta Biophysica Sinica (生物物理学报), 2004, 20(3): 225-232
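Motivated by the modular organization of gene function within cells, this paper defines two concepts, the "gene functional module" and the "characteristic functional module", and based on them proposes a "clustering algorithm for gene functional modules associated with experimental conditions". The algorithm combines knowledge of gene function with gene expression profile information to cluster genes into functional modules associated with experimental conditions. Noise of gradually increasing levels is added to the expression profiles, and the most stable gene functional modules, i.e. the characteristic functional modules, are identified according to how well the modules resist the noise. The noise-addition experiments show that, within the range of noise that gene chip technology is likely to produce, the algorithm is more robust to noise than hierarchical clustering and fuzzy C-means clustering. Applying the module clustering algorithm to the NCI60 dataset revealed 8 characteristic functional modules highly associated with experimental conditions.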

7.
Liu Wanlin, Li Dong, Zhu Yunping, He Fuchu. Hereditas (遗传), 2007, 29(12): 1434-1442
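With the rapid growth of microarray data, microarray gene expression data have become an increasingly important data source for bioinformatics research, and using them to construct gene regulatory networks has become a research hotspot. Constructing gene regulatory networks makes it possible to decipher complex regulatory relationships, discover regulatory patterns within cells, and thereby understand biological processes at the systems level. In recent years, many algorithms have been introduced for constructing gene regulatory networks from gene chip data. This article reviews the history of these algorithms, especially their theoretical and methodological improvements, lists some related software platforms, and predicts likely development trends in the field.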

8.
Classification of gene expression profile data based on the Fiedler vector   Cited by: 1 (self-citations: 0, by others: 1)
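This paper introduces a graph-based clustering algorithm that uses the Fiedler vector into tumor classification from gene expression profile data. The method connects all samples, which belong to different classes, in a complete graph with Gaussian weights and forms its Laplacian, obtains the Fiedler vector by SVD, and classifies the gene expression profiles according to the signs of the Fiedler vector components corresponding to each sample. Simulation experiments on synthetic data and experiments on real data for two leukemia subtypes (ALL and AML) and for colon cancer demonstrate the effectiveness of the method.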
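A minimal sketch of Fiedler-vector bipartitioning, assuming a Gaussian kernel width `sigma` chosen by hand. Instead of the SVD used in the paper, it finds the Fiedler vector by power iteration on cI - L with the constant vector projected out, which avoids a numerical library; for the symmetric positive semidefinite Laplacian the singular vectors coincide with the eigenvectors up to sign. The function name and toy samples are hypothetical.

```python
import math

def fiedler_bipartition(samples, sigma=1.0, iters=500):
    n = len(samples)
    sq = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    # Complete graph with Gaussian weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), no self-loops.
    W = [[0.0 if i == j else math.exp(-sq(samples[i], samples[j]) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]  # L = D - W
    # Laplacian eigenvalues lie in [0, 2 * max degree], so M = c*I - L is positive semidefinite;
    # with the constant eigenvector projected out, M's dominant eigenvector is the Fiedler vector.
    c = 2 * max(deg)
    x = [float(i) for i in range(n)]  # arbitrary start vector with a non-constant component
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]     # remove the constant-vector component
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    labels = [1 if v > 0 else 0 for v in x]  # class = sign of each sample's Fiedler component
    return labels, x

samples = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4), (5, 5), (5.5, 5), (5, 5.5), (5.4, 5.4)]
labels, fiedler = fiedler_bipartition(samples)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```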

9.
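Microarray DNA chip technology can analyze the expression of thousands of genes in parallel, providing a new, efficient technical platform for studying the mechanisms of action of drugs. Yeast cells were treated with nine antifungal compounds, some with known and some with unknown mechanisms of action, whole-genome expression profiles of the yeast cells were obtained, and these profiles were clustered. The results show that compounds with similar mechanisms of action cluster closely together. Amphotericin B and nystatin, and likewise ketoconazole and clotrimazole, are known pairs of antifungal drugs with similar mechanisms of action; cluster analysis of the expression profiles grouped each pair together. In addition, solasodine is known to inhibit the synthesis of ergosterol in the cell membrane, and cluster analysis placed it close to ketoconazole and clotrimazole. Because drugs with similar mechanisms of action cluster together when expression profiles from microarray DNA chips are clustered, the mechanism of an unknown drug can be inferred from how it clusters with known drugs, which is highly significant for accelerating new drug development.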
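The mechanism-inference idea can be sketched with average-linkage hierarchical clustering on 1 - Pearson correlation. The profiles below are made-up toy numbers, not data from the study, and the compound labels (A1, A2, B1, B2, X) are hypothetical; X plays the role of a compound of unknown mechanism, and the cluster it joins suggests which known group it resembles.

```python
import math
from itertools import combinations

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - Pearson r; returns the merges in order."""
    clusters = {name: [name] for name in profiles}

    def dist(A, B):  # average pairwise distance between the members of two clusters
        return sum(1 - pearson(profiles[a], profiles[b]) for a in A for b in B) / (len(A) * len(B))

    merges = []
    while len(clusters) > 1:
        p, q = min(combinations(clusters, 2), key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(p) + clusters.pop(q)
        clusters[f"({p},{q})"] = merged
        merges.append((p, q))
    return merges

# Toy log-ratio profiles over five genes (invented for illustration).
profiles = {
    "A1": [2.0, 1.5, -1.0, -0.5, 0.1],
    "A2": [1.8, 1.6, -0.9, -0.6, 0.0],
    "B1": [-1.0, 0.2, 2.1, 1.9, -0.3],
    "B2": [-1.1, 0.1, 2.0, 2.2, -0.2],
    "X":  [-0.9, 0.3, 1.8, 2.0, -0.1],
}
for p, q in average_linkage(profiles):
    print("merge", p, "+", q)
```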

10.
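Single-cell RNA sequencing (scRNA-seq) provides gene expression profiles at single-cell resolution and helps reveal cellular heterogeneity more accurately. Clustering is the main approach for identifying cell types in biological tissues, and choosing an appropriate clustering algorithm can improve the analysis of scRNA-seq data. This paper describes eight representative single-cell clustering algorithms, k-means, hierarchical clustering (HC), Leiden, SC3, SCENA, LAK, SIMLR and dropClust, and compares them on 12 scRNA-seq datasets with ground-truth labels. Eight evaluation metrics, the silhouette coefficient, Calinski-Harabasz index, adjusted Rand index, adjusted mutual information, Fowlkes-Mallows index (FMI), V-measure, Jaccard coefficient and coefficient of variation, were used to assess the performance of the eight algorithms. The results show that HC, SC3, k-means and SCENA have the best generality and robustness; SIMLR performs best on large datasets; Leiden performs best on small datasets but depends on the neighbor-node parameter and has low stability; and dropClust is the worst in generality and robustness. In addition, the performance of all eight methods depends on data quality: when the coefficient of variation of the data is low, the clustering evaluation scores are generally higher, …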
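Of the eight evaluation metrics, the adjusted Rand index (ARI) is easy to compute from the contingency table of two labelings. The sketch follows the standard Hubert-Arabie formula and assumes at least two items; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table of two flat clusterings of the same items."""
    n = len(labels_true)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # expected index under random labeling
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. every item in its own cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (label names do not matter)
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # about 0.571 (= 4/7)
```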

11.
Biclustering is an important tool in microarray analysis when only a subset of genes is co-regulated under a subset of conditions. Unlike standard clustering analyses, biclustering classifies genes and conditions simultaneously in a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on a geometric view of coherent gene expression profiles. In this method, we identify patterns using the Hough transform in a column-pair space. The algorithm is especially suitable for biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters even as the noise level and regulatory complexity increase. Furthermore, we test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.
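A simplified sketch of Hough-transform voting in a single column pair, assuming the standard line parametrization rho = x cos(theta) + y sin(theta): each gene contributes one point (its values in the two conditions), and genes whose points share an accumulator cell lie close to a common line, which is the signature of a coherent pattern in that pair. The bin sizes and the function name are assumptions, and combining column pairs into full biclusters is not shown.

```python
import math
from collections import defaultdict

def hough_column_pair(matrix, col_a, col_b, n_theta=180, rho_step=0.5):
    """Return the genes whose (col_a, col_b) points fall in the most-voted Hough cell."""
    acc = defaultdict(list)  # (theta index, rho bin) -> genes voting for that line
    for gene, row in enumerate(matrix):
        x, y = row[col_a], row[col_b]
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(k, math.floor(rho / rho_step))].append(gene)
    return max(acc.values(), key=len)

# Genes 0-4 follow y = x + 1 in columns 0 and 1 (an additive shift); genes 5-8 do not.
matrix = [[0, 1, 9], [1, 2, 4], [2, 3, 7], [3, 4, 1], [4, 5, 3],
          [0, 6, 2], [5, -3, 8], [7, 1, 0], [-4, 2, 5]]
print(sorted(hough_column_pair(matrix, 0, 1)))  # [0, 1, 2, 3, 4]
```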

12.
13.
Summary: Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering, or identifying interpretable row-column associations within high-dimensional data matrices. SSVD seeks a low-rank, checkerboard-structured matrix approximation to data matrices. The desired checkerboard structure is achieved by forcing both the left- and right-singular vectors to be sparse, that is, to have many zero entries. By interpreting singular vectors as regression coefficient vectors for certain linear regressions, sparsity-inducing regularization penalties are imposed on the least squares regression to produce sparse singular vectors. An efficient iterative algorithm is proposed for computing the sparse singular vectors, along with some discussion of penalty parameter selection. A lung cancer microarray dataset and a food nutrition dataset are used to illustrate SSVD as a biclustering method. SSVD is also compared with some existing biclustering methods using simulated datasets.
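A minimal rank-one sketch of the alternating idea, assuming plain soft-thresholding (lasso-type) penalties at fixed levels `lam_u` and `lam_v`; the paper's specific penalties and its penalty-parameter selection are not reproduced. Each half-step regresses the data on the current other singular vector, shrinks the coefficients (essentially the closed-form lasso update), and renormalizes. Names and the toy matrix are hypothetical.

```python
import math

def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam; entries with |x| <= lam become exactly 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v

def sparse_rank1(X, lam_u, lam_v, iters=100):
    """Alternate sparse updates of the left (u) and right (v) singular vectors of X."""
    n, m = len(X), len(X[0])
    v = unit([1.0] * m)
    for _ in range(iters):
        u = unit(soft_threshold([sum(X[i][j] * v[j] for j in range(m)) for i in range(n)], lam_u))
        v = unit(soft_threshold([sum(X[i][j] * u[i] for i in range(n)) for j in range(m)], lam_v))
    s = sum(u[i] * X[i][j] * v[j] for i in range(n) for j in range(m))
    return s, u, v

# A 3 x 2 block of 5s (rows 0-2, columns 0-1) embedded in small deterministic "noise".
X = [[5.0 if i < 3 and j < 2 else 0.1 * ((3 * i + j) % 4) - 0.15 for j in range(5)] for i in range(6)]
s, u, v = sparse_rank1(X, lam_u=1.0, lam_v=1.0)
print([i for i, x in enumerate(u) if x != 0], [j for j, x in enumerate(v) if x != 0])  # [0, 1, 2] [0, 1]
```

The nonzero entries of u and v mark the rows and columns of one bicluster; further layers would be extracted from the residual X - s u v^T.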

14.
Biclustering microarray data by Gibbs sampling   Cited by: 1 (self-citations: 0, by others: 1)
MOTIVATION: Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data. RESULTS: In contrast with standard clustering that reveals genes that behave similarly over all the conditions, biclustering groups genes over only a subset of conditions for which those genes have a sharp probability distribution. We have opted for a simple probabilistic model of the biclusters because it has the key advantage of providing a transparent probabilistic interpretation of the biclusters in the form of an easily interpretable fingerprint. Furthermore, Gibbs sampling does not suffer from the problem of local minima that often characterizes Expectation-Maximization. We demonstrate the effectiveness of our approach on two synthetic data sets as well as a data set from leukemia patients.
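A minimal Gibbs sampler for one bicluster in a binary (discretized) matrix, under an assumed model that is simpler than the paper's: cells inside the bicluster are 1 with probability `p_in`, all other cells with probability `p_bg`, and each row and column joins with prior probability `prior`. Row memberships are resampled given the columns, then columns given the rows, and memberships present in more than half of the post-burn-in sweeps are reported. All names and parameter values are hypothetical.

```python
import math
import random

def gibbs_bicluster(M, p_in=0.9, p_bg=0.2, prior=0.5, sweeps=200, burn_in=100, seed=0):
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    ll1 = math.log(p_in / p_bg)               # log-likelihood ratio (inside vs outside) for a 1
    ll0 = math.log((1 - p_in) / (1 - p_bg))   # ... and for a 0
    prior_lo = math.log(prior / (1 - prior))
    sigmoid = lambda z: 1 / (1 + math.exp(-z))
    rows = [rng.random() < prior for _ in range(n)]
    cols = [rng.random() < prior for _ in range(m)]
    row_freq, col_freq = [0] * n, [0] * m
    for sweep in range(sweeps):
        # Full conditional of each row membership depends only on the current column set.
        for i in range(n):
            z = prior_lo + sum(ll1 if M[i][j] else ll0 for j in range(m) if cols[j])
            rows[i] = rng.random() < sigmoid(z)
        for j in range(m):
            z = prior_lo + sum(ll1 if M[i][j] else ll0 for i in range(n) if rows[i])
            cols[j] = rng.random() < sigmoid(z)
        if sweep >= burn_in:
            for i in range(n):
                row_freq[i] += rows[i]
            for j in range(m):
                col_freq[j] += cols[j]
    kept = sweeps - burn_in
    return ([i for i in range(n) if row_freq[i] / kept > 0.5],
            [j for j in range(m) if col_freq[j] / kept > 0.5])

# Planted block of 1s in rows 0-4, columns 0-3, plus two stray background 1s.
M = [[1 if (i < 5 and j < 4) else 0 for j in range(8)] for i in range(10)]
M[6][5] = 1
M[8][1] = 1
print(gibbs_bicluster(M))
```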

15.
cluML     
cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format was designed to address some limitations of traditional formats, such as the inability to store multiple clustering (including biclustering) and validation results within a single dataset. cluML is an effective tool for supporting biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis, it can also be used, without restriction, to represent clustering and validation results for other biomedical and physical data.

16.
MOTIVATION: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, often referred to as biclustering, allows one to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters or to other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of biclustering method are currently available. RESULTS: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns in all considered settings.
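The binary reference model asks for inclusion-maximal submatrices consisting only of 1s. The sketch below is not Bimax: it is a brute-force enumeration over column subsets, exponential in the number of columns, shown only to make the target of Bimax concrete on a tiny matrix. The function name is hypothetical.

```python
from itertools import combinations

def maximal_ones_biclusters(M):
    """Enumerate all inclusion-maximal all-ones submatrices of a small binary matrix."""
    n, m = len(M), len(M[0])
    found = set()
    for size in range(1, m + 1):
        for cols in combinations(range(m), size):
            rows = frozenset(i for i in range(n) if all(M[i][j] for j in cols))
            if not rows:
                continue
            # Close the column set: add every column that is all ones on these rows.
            closed = frozenset(j for j in range(m) if all(M[i][j] for i in rows))
            found.add((rows, closed))
    return found

M = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]
for rows, cols in sorted(maximal_ones_biclusters(M), key=lambda rc: (sorted(rc[0]), sorted(rc[1]))):
    print(sorted(rows), sorted(cols))
```

Each reported pair is closed in both directions (no row or column can be added), which is exactly inclusion-maximality for all-ones submatrices.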

17.

Background  

The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering which clusters genes and conditions simultaneously is preferred over the traditional clustering technique in discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of biclusters called additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis.
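To test whether a candidate submatrix follows the additive model a_ij = mu + alpha_i + beta_j, one common score (not mentioned in this abstract) is the Cheng-Church mean squared residue, which is 0 exactly when the submatrix is additive. In a parallel coordinate plot, the rows of such a bicluster appear as parallel, vertically shifted polylines. The function name and toy values are hypothetical.

```python
def mean_squared_residue(A):
    """Cheng-Church mean squared residue; exactly 0 when a_ij = mu + alpha_i + beta_j."""
    n, m = len(A), len(A[0])
    row_mean = [sum(r) / m for r in A]
    col_mean = [sum(A[i][j] for i in range(n)) / n for j in range(m)]
    overall = sum(row_mean) / n
    return sum((A[i][j] - row_mean[i] - col_mean[j] + overall) ** 2
               for i in range(n) for j in range(m)) / (n * m)

alpha, beta = [0.0, 1.0, 3.0], [0.0, 2.0, 5.0, 1.0]
additive = [[2.0 + a + b for b in beta] for a in alpha]
print(mean_squared_residue(additive))                  # ~0 (up to rounding)
print(mean_squared_residue([[1.0, 2.0], [2.0, 1.0]]))  # 0.25
```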

18.
Biclustering extends traditional clustering techniques by attempting to find (all) subgroups of genes with similar expression patterns under to-be-identified subsets of experimental conditions when applied to gene expression data. Still, the real power of this clustering strategy is yet to be fully realized due to the lack of effective and efficient algorithms for reliably solving the general biclustering problem. We report a QUalitative BIClustering algorithm (QUBIC) that can solve the biclustering problem in a more general form than existing algorithms, by employing a combination of qualitative (or semi-quantitative) measures of gene expression data and a combinatorial optimization technique. One key unique feature of the QUBIC algorithm is that it can identify all statistically significant biclusters, including biclusters with so-called ‘scaling patterns’, a problem considered rather challenging; another is that the algorithm solves such general biclustering problems very efficiently, handling tens of thousands of genes under up to thousands of conditions in a few minutes of CPU time on a desktop computer. We have demonstrated considerably improved biclustering performance by our algorithm compared with existing algorithms on various benchmark sets and data sets of our own. QUBIC was written in ANSI C and tested using GCC (version 4.1.2) on Linux. Its source code is available at: http://csbl.bmb.uga.edu/∼maqin/bicluster. A server version of QUBIC is also available upon request.
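QUBIC starts from a qualitative representation of expression. As a rough sketch, assuming a simple rank-based rule: in each gene's profile the lowest fraction `q` of conditions becomes -1, the highest fraction `q` becomes +1 and the rest 0, and two genes are linked with a weight equal to the number of conditions where they share the same non-zero label. QUBIC's actual discretization and its bicluster expansion step differ in detail; the function names are hypothetical.

```python
def discretize(row, q=0.1):
    """Map one gene's profile to -1/0/+1: lowest q fraction -> -1, highest q fraction -> +1."""
    n = len(row)
    k = max(1, int(q * n))  # assumes q < 0.5 so the two tails do not overlap
    order = sorted(range(n), key=lambda j: row[j])
    labels = [0] * n
    for j in order[:k]:
        labels[j] = -1
    for j in order[n - k:]:
        labels[j] = 1
    return labels

def edge_weight(a, b):
    """Number of conditions where two discretized genes share the same non-zero label."""
    return sum(1 for x, y in zip(a, b) if x == y != 0)

g1 = discretize([5, 1, 3, 9, 7], q=0.2)
g2 = discretize([6, 0, 2, 8, 4], q=0.2)
print(g1, g2, edge_weight(g1, g2))  # [0, -1, 0, 1, 0] [0, -1, 0, 1, 0] 2
```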

19.

Background  

The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters.
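With the contiguous-column restriction, once each gene's time series is discretized into symbols, a bicluster is a set of genes sharing the same substring over a column range. The brute-force sketch below enumerates such maximal biclusters for small inputs: grouping by substring makes each bicluster row-maximal, and a range is kept only if its genes do not all agree one step further left or right. The U/D/N alphabet and the names are assumptions, and an efficient algorithm would avoid enumerating every column range.

```python
from collections import defaultdict

def ccc_biclusters(patterns, min_rows=2):
    """Maximal contiguous-column coherent biclusters as (rows, first_col, last_col) triples.

    patterns: one discretized string per gene, all of equal length.
    """
    n_cols = len(patterns[0])
    result = []
    for s in range(n_cols):
        for e in range(s, n_cols):
            groups = defaultdict(list)
            for g, p in enumerate(patterns):
                groups[p[s:e + 1]].append(g)   # all genes with this substring: row-maximal
            for rows in groups.values():
                if len(rows) < min_rows:
                    continue
                # Column-maximal: the same genes must not all agree at column s-1 or e+1.
                left_ext = s > 0 and len({patterns[g][s - 1] for g in rows}) == 1
                right_ext = e + 1 < n_cols and len({patterns[g][e + 1] for g in rows}) == 1
                if not (left_ext or right_ext):
                    result.append((frozenset(rows), s, e))
    return result

for rows, s, e in ccc_biclusters(["UUDN", "UUDD", "NUDN"]):
    print(sorted(rows), s, e)
```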

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417