Similar articles
18 similar articles found (search time: 328 ms)
1.
唐羽  李敏 《生物信息学》2014,12(1):38-45
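Clustering of protein networks is an important means of identifying functional modules; it helps in understanding the organization of biological systems and is also significant for predicting protein function. Visual analysis of clustering results is an effective way to carry out protein network clustering. Based on the open-source Cytoscape platform, this paper designs and implements CytoCluster, a plugin for clustering analysis and visualization of protein networks. The plugin integrates six typical clustering algorithms: MCODE, FAG-EC, HC-PIN, OH-PIN, IPCA, and EAGLE. It visualizes clustering results by displaying the resulting clusters as a list of thumbnails; for an individual cluster, it can show the cluster's position in the original network and generate the corresponding subgraph for separate display. Clustering results can be exported together with the algorithm name, parameters, and cluster contents. The plugin is extensible: it provides a unified algorithm interface through which new clustering algorithms can be added.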

2.
蔡娟  王建新  李敏  陈钢 《生物信息学》2011,9(3):185-188
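Cluster analysis of biological networks is an important method for identifying functional modules and predicting protein function, and visualization of clustering results also plays an important role in analyzing biological network structure quickly and effectively. By analyzing the architecture of Cytoscape, a platform for displaying and analyzing biological networks, we designed ClusterViz, an easy-to-use plugin for cluster analysis and display. It is an extensible integration platform for clustering algorithms: new algorithms can be added over time, and the results of different algorithms can be compared. Three typical algorithms have been implemented so far. The plugin can serve as an effective tool for studying the mechanisms of protein interaction networks.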

3.
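Essential proteins are nodes that play an important role in a protein-protein interaction network and whose removal causes protein complexes to lose their function and renders the organism inviable. With the continuing improvement of protein databases and the development of high-throughput techniques, computational prediction of essential proteins has become widely used. Because most current software consists of desktop applications that users find hard to adapt to quickly, this paper designs and implements a web-based essential protein prediction platform, EssentialProtein Finder (EP Finder). The platform integrates seven essential protein prediction algorithms (DC, BC, CC, EC, LAC, SC, and NC) and provides seven evaluation methods: SN, SP, PPV, NPV, ACC, F, and jackknife curves. It provides visualization of the protein network, of algorithm runs, and of evaluation results. The platform is designed to be extensible.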
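The six scalar evaluation measures named in this abstract (SN, SP, PPV, NPV, ACC, F) are conventionally derived from a confusion matrix. The sketch below is only an illustration under that assumption: the function name `evaluate_prediction` and its arguments are hypothetical, and the abstract does not state the exact definitions EP Finder uses (in particular, F is assumed here to be the harmonic mean of SN and PPV).

```python
def evaluate_prediction(predicted, essential, universe):
    """Confusion-matrix statistics for a predicted set of essential proteins.

    predicted: proteins the algorithm labels essential
    essential: proteins known to be essential (gold standard)
    universe:  all proteins in the network
    """
    predicted, essential, universe = set(predicted), set(essential), set(universe)
    tp = len(predicted & essential)             # correctly predicted essential
    fp = len(predicted - essential)             # predicted essential, but not
    fn = len(essential - predicted)             # essential, but missed
    tn = len(universe - predicted - essential)  # correctly left out

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    sn = ratio(tp, tp + fn)    # sensitivity
    sp = ratio(tn, tn + fp)    # specificity
    ppv = ratio(tp, tp + fp)   # positive predictive value
    npv = ratio(tn, tn + fn)   # negative predictive value
    acc = ratio(tp + tn, tp + tn + fp + fn)
    f = ratio(2 * sn * ppv, sn + ppv)  # assumed: harmonic mean of SN and PPV
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv, "ACC": acc, "F": f}
```

A call such as `evaluate_prediction({"a", "b", "d"}, {"a", "b", "c"}, set("abcdef"))` returns all six measures in one dictionary, which makes it easy to compare several ranking methods on the same gold standard.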

4.
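Protein function prediction based on functional modules in protein networks   Total citations: 1; self-citations: 0; citations by others: 1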
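In the post-genomic era that followed the deciphering of gene sequences, rapid progress in systems biology experiments has produced large amounts of protein interaction data; using these data to find functional modules and predict protein function is important in functional genomics research. Departing from the traditional clustering approach based on similarity between proteins, this work starts directly from protein functional groups, considers first-order and second-order interactions between functional groups, and proposes a modular clustering method (MCM) that clusters experimental data to predict the functions of unknown proteins within modules. The predictive power and stability of the clustering results were analyzed using hypergeometric-distribution P values and by adding, deleting, and altering interactions. The results show that the modular clustering method has high prediction accuracy and coverage, as well as good fault tolerance and stability. In addition, the modular clustering analysis yielded predictions for some unknown proteins with high prediction accuracy, which can guide biological experiments, and the algorithm is also generally applicable to other networks with similar structure.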
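The hypergeometric P value mentioned in this abstract is commonly computed as the probability of seeing at least as many annotated proteins in a module as were observed, if the module had been drawn at random. The following is a minimal sketch of that standard enrichment test; the function and parameter names are hypothetical, and the abstract does not specify the exact variant the authors used.

```python
from math import comb


def hypergeom_pvalue(population_size, annotated_total, module_size, annotated_in_module):
    """Upper-tail hypergeometric probability P(X >= annotated_in_module).

    population_size:     number of proteins in the whole network
    annotated_total:     proteins in the network carrying the function
    module_size:         number of proteins in the module
    annotated_in_module: proteins in the module carrying the function
    """
    total_draws = comb(population_size, module_size)
    upper = min(annotated_total, module_size)
    favourable = sum(
        comb(annotated_total, i)
        * comb(population_size - annotated_total, module_size - i)
        for i in range(annotated_in_module, upper + 1)
    )
    return favourable / total_draws
```

A small P value suggests that the function is over-represented in the module, which is the usual basis for transferring that function to unannotated members of the module.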

5.
Application of cluster analysis to flavomycin fermentation   Total citations: 2; self-citations: 0; citations by others: 2
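[Objective] To apply cluster analysis to the optimization of shake-flask fermentation conditions for flavomycin. [Methods] Hierarchical clustering, K-means clustering, and fuzzy C-means clustering were compared on shake-flask data from different batches of flavomycin fermentation; fuzzy C-means clustering was found to outperform the other algorithms and was chosen for the cluster analysis of the shake-flask fermentation data. [Results] Fuzzy C-means clustering was then used to select a high-quality group of samples, and these samples were used to optimize the ranges of the control parameters for flavomycin shake-flask fermentation. [Conclusion] This demonstrates that cluster analysis is of practical use in optimizing fermentation processes.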

6.
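Gene chips can measure the expression of different genes in different biological processes, so they have attracted attention in medical diagnosis and pathological analysis and are beginning to be widely applied. Measurements show that different genes are expressed differently at different stages of a disease process, from which the expression characteristics of genes during disease progression can be obtained. In this paper we present a cluster analysis of gene expression characteristics during breast cancer metastasis and improve the k-means clustering algorithm so that it automatically searches for the number of clusters and is less prone to the local minima that trap k-means. A comparison of the average clustering error shows that kr-means outperforms k-means. The results can serve as a reference for breast cancer diagnosis and pathological analysis, can be applied to the preparation of small gene detection chips, and can also be used to construct gene regulatory network maps.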

7.
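Cellular biological processes are dynamic in time, and protein functional modules are the functional units that drive them. To identify protein functional modules, this paper models cellular biological processes as a dynamic temporal expression-correlated protein interaction network (DTEPIN); constructs a sub-block matrix to represent the DTEPIN; uses the special structure of the sub-block matrix to analyze time and space complexity and parallelism; and optimizes the design of the Markov clustering algorithm to identify protein functional modules in the DTEPIN. To support Markov clustering on the sub-block matrix, matrix products are computed in parallel on a graphics processing unit. Experimental results show that, compared with existing algorithms of the same kind, the protein functional modules identified by the designed algorithm have higher statistical matching quality and a larger number of exact matches.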
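For orientation, the basic Markov clustering loop that this abstract builds on alternates expansion (a matrix product) with inflation (element-wise powers followed by column normalization). The sketch below is a plain, dense, single-threaded version of that textbook loop, not the sub-block or GPU variant the paper describes; the function names, the default inflation of 2.0, the self-loop convention, and the 1e-6 membership cutoff are all assumptions made for illustration.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def multiply(a, b):
    """Dense matrix product; this is the step a GPU version would parallelise."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as a square 0/1 adjacency matrix."""
    n = len(adjacency)
    # Self-loops are added before normalising, a common MCL convention.
    flow = normalise_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        expanded = multiply(flow, flow)                       # expansion
        inflated = normalise_columns(
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[i][j] - flow[i][j])
                     for i in range(n) for j in range(n))
        flow = inflated
        if change < tol:                                      # converged
            break
    # Each row with non-negligible entries lists the members of one cluster.
    clusters = {frozenset(j for j in range(n) if flow[i][j] > 1e-6)
                for i in range(n)}
    return sorted(sorted(cluster) for cluster in clusters if cluster)
```

The matrix product dominates the cost at roughly cubic time in the number of nodes for dense input, which is why exploiting block structure and parallel hardware, as the abstract proposes, is attractive.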

8.
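This paper introduces the data clustering functions of the MATLAB Bioinformatics Toolbox, which are mainly used to analyze gene chip (microarray) data. The data to be analyzed are first converted to an XLS file and read into the MATLAB workspace with the function xlsread, where they are stored as two variables. Missing values are estimated to reduce error in the results. The function clustergram performs hierarchical clustering and produces a heat map and a dendrogram of the data. By changing the relevant parameters, the color scheme and distance metric can be changed, and two-way clustering can be performed.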

9.
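K-means cluster analysis of gene expression data based on a genetic algorithm   Total citations: 1; self-citations: 0; citations by others: 1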
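Clustering algorithms are increasingly widely used in the analysis and processing of gene expression data. This paper introduces the K-means clustering algorithm into a genetic algorithm and, taking the characteristics of gene microarrays into account, discusses a K-means clustering model based on a genetic algorithm. The aim is to use the global search of the genetic algorithm to raise the likelihood that the clustering algorithm finds the global optimum. Experimental results show that the algorithm handles cluster analysis of certain gene expression data well.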

10.
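Microarray DNA chip technology can analyze the expression of thousands of genes in parallel, providing a new and efficient technical platform for studying the mechanisms of drug action. Yeast cells were treated with nine antifungal compounds whose mechanisms of action were either known or unknown, genome-wide expression profiles of the yeast cells were obtained, and the profiles were then clustered. The results show that compounds with similar mechanisms of action cluster close together. Amphotericin B and nystatin, and ketoconazole and clotrimazole, are pairs of antifungal drugs known to have similar mechanisms of action; clustering of the expression profiles grouped the former pair together and the latter pair together. In addition, solasodine is known to inhibit ergosterol synthesis in the cell membrane, and the cluster analysis placed it close to ketoconazole and clotrimazole. Because drugs with similar mechanisms are clustered together when expression profiles generated by microarray DNA chips are analyzed, the mechanism of an unknown drug can be inferred from its clustering relationship to known drugs, which is of great importance for accelerating new drug development.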

11.
Choi H 《Proteomics》2012,12(10):1663-1668
Protein complex identification is an important goal of protein-protein interaction analysis. To date, development of computational methods for detecting protein complexes has been largely motivated by genome-scale interaction data sets from high-throughput assays such as yeast two-hybrid or tandem affinity purification coupled with mass spectrometry (TAP-MS). However, due to the popularity of small to intermediate-scale affinity purification-mass spectrometry (AP-MS) experiments, protein complex detection is increasingly discussed in local network analysis. In such data sets, protein complexes cannot be detected using binary interaction data alone because the data contain interactions with tagged proteins only and, as a result, interactions between all other proteins remain unobserved, limiting the scope of existing algorithms. In this article, we provide a pragmatic review of network graph-based computational algorithms for protein complex analysis in global interactome data, without requiring any computational background. We discuss the practical gap in applying these algorithms to recently surging small to intermediate-scale AP-MS data sets, and review alternative clustering algorithms using quantitative proteomics data and their limitations.

12.
Genesis: cluster analysis of microarray data   Total citations: 26; self-citations: 0; citations by others: 26

13.

Background  

Genome-scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly executed using clustering procedures, which aim at detecting densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interaction graphs. A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.

14.
We introduce a new algorithm, called ClusFCM, which combines techniques of clustering and fuzzy cognitive maps (FCM) for prediction of protein functions. ClusFCM takes advantage of protein homologies and protein interaction network topology to improve the low recall associated with existing prediction methods. ClusFCM exploits the fact that proteins of known function tend to cluster together, and deduces functions not only through a protein's direct interactions with other proteins but also from other proteins in the network. We use ClusFCM to annotate protein functions for Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), and Drosophila melanogaster (fly) using protein-protein interaction data from the General Repository for Interaction Datasets (GRID) database and functional labels from Gene Ontology (GO) terms. The algorithm's performance is compared with four state-of-the-art methods for function prediction (Majority, chi-square statistics, Markov random field (MRF), and FunctionalFlow) using the Matthews correlation coefficient, the harmonic mean, and the area under the receiver operating characteristic (ROC) curve. The results indicate that ClusFCM predicts protein functions with high recall while not lowering precision. Supplementary information is available at http://www.egr.vcu.edu/cs/dmb/ClusFCM/.
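Two of the measures named in this abstract have standard closed forms that can be computed directly from confusion-matrix counts. The sketch below shows the usual definitions; the function names are hypothetical, and reading "harmonic mean" as the harmonic mean of precision and recall is an assumption, since the abstract does not say which quantities are averaged.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is reported as 0 when a margin is empty.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def harmonic_mean(precision, recall):
    """Harmonic mean of precision and recall (assumed meaning of the term)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

Unlike accuracy, the Matthews correlation coefficient stays informative when positive and negative classes are very unbalanced, which is typical of function annotation where most protein-function pairs are negatives.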

15.
Protein interactions play a vital part in the function of a cell. As experimental techniques for detection and validation of protein interactions are time-consuming, there is a need for computational methods for this task. Protein interactions appear to form a network with a relatively high degree of local clustering. In this paper we exploit this clustering by suggesting a score based on triplets of observed protein interactions. The score utilises both protein characteristics and network properties. Our score based on triplets is shown to complement existing techniques for predicting protein interactions, outperforming them on data sets which display a high degree of clustering. The predicted interactions score highly against test measures for accuracy. Compared to a similar score derived from pairwise interactions only, the triplet score displays higher sensitivity and specificity. By looking at specific examples, we show how an experimental set of interactions can be enriched and validated. As part of this work we also examine the effect of different prior databases upon the accuracy of prediction and find that interactions from the same kingdom give better results than interactions from across kingdoms, suggesting that there may be fundamental differences between the networks. These results all emphasize that network structure is important and helps in the accurate prediction of protein interactions. The protein interaction data set and the program used in our analysis, and a list of predictions and validations, are available at http://www.stats.ox.ac.uk/bioinfo/resources/PredictingInteractions.

16.
Detecting protein complexes from protein interaction networks is one major task in the postgenome era. Previously developed computational algorithms for identifying complexes mainly focus on graph partitioning or dense region finding. Most of these traditional algorithms cannot discover the overlapping complexes that really exist in protein-protein interaction (PPI) networks. Even though some density-based methods have been developed to identify overlapping complexes, they are not able to discover complexes that include peripheral proteins. In this study, motivated by the recent successful application of generative network models to describe the generation process of PPI networks and to detect communities in social networks, we develop a regularized sparse generative network model (RSGNM) for protein complex identification, by adding another process that generates propensities using an exponential distribution and by incorporating a Laplacian regularizer into an existing generative network model. By assuming that the propensities are generated using an exponential distribution, the estimators of propensities will be sparse, which not only has a good biological interpretation but also helps to control the overlapping rate among detected complexes. The Laplacian regularizer makes the estimators of propensities smoother over the interaction network. Experimental results on three yeast PPI networks show that RSGNM outperforms six previous competing algorithms in terms of the quality of detected complexes. In addition, RSGNM is able to detect overlapping complexes and complexes including peripheral proteins simultaneously. These results give new insights into the importance of generative network models in protein complex identification.

17.
MOTIVATION: A promising and reliable approach to annotating gene function is to cluster genes using not only gene expression data but also literature information, especially gene networks. RESULTS: We present a systematic method for gene clustering that combines these two totally different types of data, focusing in particular on network modularity, a global feature of gene networks. Our method is based on learning a probabilistic model, which we call a hidden modular random field, in which the relation between hidden variables directly represents a given gene network. Our learning algorithm, which minimizes an energy function that takes network modularity into account, is time-efficient in practice even though it uses a global network property. We evaluated our method using a metabolic network and microarray expression data, varying the microarray datasets, the parameters of our model, and the gold standard clusters. Experimental results showed that our method outperformed four other competing methods, including k-means and existing graph partitioning methods, with statistical significance in all cases. Further detailed analysis showed that our method could group a set of genes into a cluster corresponding to the folate metabolic pathway, while other methods could not. From these results, we can say that our method is highly effective for gene clustering and annotating gene function.
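Network modularity is usually quantified with Newman's Q, which compares the fraction of edges inside each cluster with the fraction expected if edges were placed at random while preserving node degrees. The sketch below computes that standard quantity for an unweighted, undirected edge list; it is an assumption that the abstract refers to this definition, and the function and argument names are hypothetical. It does not implement the hidden modular random field itself.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to its cluster label
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in the cluster
    degree_sum = defaultdict(int)  # total degree of the cluster's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over clusters of (internal edge share - expected share).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

Higher values of Q indicate that a clustering keeps more edges inside clusters than chance would predict; putting every gene in a single cluster always gives Q = 0.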

18.
Using indirect protein-protein interactions for protein complex prediction   Total citations: 1; self-citations: 0; citations by others: 1
Protein complexes are fundamental for understanding principles of cellular organization. As the sizes of protein-protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using a topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network and merges cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein-protein interactions can improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no information other than the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.
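The idea of level-2 interactions can be illustrated by listing every pair of proteins that do not interact directly but share at least one partner, and scoring each pair by how much their neighborhoods overlap. In the sketch below the Jaccard index is used purely as a stand-in score: it is not FS-Weight, whose definition is not given in the abstract, and the function name `level2_pairs` is hypothetical.

```python
from collections import defaultdict


def level2_pairs(edges):
    """Score pairs of non-interacting proteins that share a partner.

    edges: iterable of (u, v) undirected interactions.
    Returns {(u, v): score} with u < v, where score is the Jaccard index
    of the two neighbourhoods (a stand-in, NOT the paper's FS-Weight).
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-interactions
            neighbours[u].add(v)
            neighbours[v].add(u)
    scores = {}
    for u in neighbours:
        for partner in neighbours[u]:
            for v in neighbours[partner]:
                # Keep each unordered pair once, skipping direct interactions.
                if u < v and v not in neighbours[u]:
                    shared = neighbours[u] & neighbours[v]
                    union = neighbours[u] | neighbours[v]
                    scores[(u, v)] = len(shared) / len(union)
    return scores
```

Pairs whose score exceeds a chosen threshold could then be added to the network as indirect edges before clustering, mirroring the augmentation step the abstract describes.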
