Similar Articles
19 similar articles found.
1.
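This paper briefly introduces the principle of the phylogenetic profile method and focuses on improvements to the K-means clustering algorithm for analyzing gene phylogenetic profiles, comparing the improved algorithm with conventional K-means. Experimental results show that the improved K-means algorithm is fast and effective for gene function annotation with the phylogenetic profile method and converges quickly to an approximately optimal solution.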
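The abstract does not describe the improvements made to K-means, so the sketch below shows only the conventional baseline it was compared against: Lloyd's K-means applied to gene phylogenetic profiles encoded as 0/1 vectors (1 = a homologue is present in that genome). The function name `kmeans`, the random initialisation and the toy profiles are illustrative assumptions, not the paper's code.

```python
import random


def kmeans(profiles, k, max_iter=100, seed=0):
    """Lloyd's K-means on equal-length 0/1 phylogenetic profiles.

    Returns (labels, centroids). Centroids are real-valued means of the
    member profiles, so they need not be 0/1 vectors.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct profiles chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # leave an emptied cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles of four genes over six genomes.
profiles = [
    [1, 1, 1, 0, 0, 0],  # genes 1 and 2: mostly in the first three genomes
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],  # genes 3 and 4: mostly in the last three genomes
    [1, 0, 0, 1, 1, 1],
]
labels, _ = kmeans(profiles, k=2)
```

Genes that end up with the same label share a similar presence/absence pattern, which is the signal the phylogenetic profile method uses to suggest a shared function.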

2.
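The phylogenetic profile algorithm, an effective method for large-scale functional annotation of genomes, has been successfully applied to the functional annotation of prokaryotic genomes. By analyzing a key step of the phylogenetic profile method, the clustering of similar profiles, we propose a statistical-modeling-based method for clustering similar phylogenetic profiles. Experiments show that the method maintains high coverage while effectively improving the overall speed of the algorithm, and that its accuracy increases with the number of phylogenetic profiles used for modeling.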

3.
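Population typing is an effective approach to better understanding complex biological problems such as human physical and mental health. Clustering, which groups samples to reduce complexity, is one way of defining enterotypes; however, the value of K in the conventional K-means algorithm cannot be determined in advance. This paper improves on the conventional K-means algorithm and validates the improvement on public datasets. Experiments show that the improved algorithm resolves the problem of choosing K, and that the stability, accuracy and overall quality of the clustering results are significantly improved. Applied to gut microbiota OTU data, the improved model not only effectively distinguishes similarities among samples from patients with type 2 diabetes but also identifies the OTUs contributing most to heterogeneity in community structure, offering a new direction for addressing type 2 diabetes clinically.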
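The abstract does not say how the improved algorithm fixes K. One common, generic strategy, shown here purely as an illustration and not as the authors' method, is to cluster with several candidate K values and keep the one with the highest mean silhouette width. The partitions in `candidates` are hypothetical stand-ins for K-means outputs.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points (needs at least two clusters).

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b); singletons get s(i) = 0 by convention.
    """
    members = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)
    scores = []
    for i, p in enumerate(points):
        own = [j for j in members[labels[i]] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in idx) / len(idx)
            for lab, idx in members.items()
            if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: three well-separated pairs of 2-D points.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
candidates = {  # hypothetical partitions produced for K = 2 and K = 3
    2: [0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
best_k = max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
```

Here the K = 3 partition keeps each tight pair together, so it has the larger average silhouette and is selected.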

4.
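This paper analyzes why commonly used normalization methods cause misclassification in tumor gene microarrays and proposes a class-mean-based normalization method. The method normalizes gene expression profiles in both directions (two-way normalization) and interleaves normalization with clustering, using the clustering results to correct the reference expression level. Five tumor microarray datasets were selected; hierarchical clustering and K-means clustering were applied, at different variance levels, to gene expression data processed with common normalization and with class-mean-based normalization, and the results were compared. Experimental results show that class-mean-based normalization effectively improves the quality of clustering results for tumor gene expression profiles.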
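The abstract gives no formula for the class-mean-based normalization. One plausible reading, labelled here as an assumption, is that each gene's reference level is the mean of its per-class means rather than the overall mean, so that a large class cannot dominate the reference; in the iterative scheme the class labels would come from the previous clustering round. The sketch covers only this gene-wise step, not the full two-way procedure.

```python
def overall_mean_center(row):
    """Conventional gene-wise centering: subtract the mean over all samples."""
    ref = sum(row) / len(row)
    return [x - ref for x in row]


def class_mean_center(row, labels):
    """Center one gene on the mean of its per-class means (assumed reading).

    labels: current cluster label of each sample; in an iterative scheme these
    would come from the previous clustering round.
    """
    classes = sorted(set(labels))
    class_means = [
        sum(x for x, lab in zip(row, labels) if lab == c) / labels.count(c)
        for c in classes
    ]
    ref = sum(class_means) / len(class_means)
    return [x - ref for x in row]


# One gene measured in five samples: four in class 0, one in class 1.
row = [10.0, 10.0, 10.0, 10.0, 2.0]
labels = [0, 0, 0, 0, 1]
conventional = overall_mean_center(row)  # reference about 8.4, pulled toward class 0
balanced = class_mean_center(row, labels)  # reference 6.0, midway between class means
```

With unbalanced classes the conventional reference sits close to the majority class, which shrinks that class's deviations; the class-mean reference treats both classes symmetrically.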

5.
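Application of Bayesian clustering to knowledge mining of gene expression profiles   Total citations: 1, self-citations: 0, citations by others: 1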
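This paper introduces a novel clustering algorithm based on a Bayesian model into the analysis of large-scale gene expression profile data. Starting from the biological background, it examines the theoretical basis for applying the algorithm to large-scale gene expression profiles and the algorithm's advantages, and uses it to re-mine knowledge from two public gene expression datasets. Results show that, compared with other clustering algorithms, this algorithm has significant advantages in knowledge discovery. The biological knowledge mined also offers some guidance for the experimental design of researchers in this field.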

6.
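A clustering method for gene functional modules related to experimental conditions   Total citations: 2, self-citations: 0, citations by others: 2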
Yu Hui, Guo Zheng, Li Xia, Tu Kang. Acta Biophysica Sinica, 2004, 20(3): 225-232
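Addressing the modularity of gene function within cells, we define two concepts, "gene functional module" and "characteristic functional module", and on this basis propose a "clustering algorithm for gene functional modules related to experimental conditions". The algorithm integrates gene function knowledge with gene expression profile information to cluster genes into functional modules related to experimental conditions. Noise of gradually increasing level is added to the gene expression profiles, and the most stable gene functional modules, i.e. the characteristic functional modules, are identified according to their resistance to the noise. Noise-addition experiments show that, within the range of noise likely to occur in microarray technology, the algorithm is more robust to noise than hierarchical clustering and fuzzy C-means clustering. Applying the module clustering algorithm to the NCI60 dataset revealed 8 characteristic functional modules highly correlated with experimental conditions.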

7.
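Application of hierarchical cluster analysis to pattern recognition of bacterial whole-cell fatty acids   Total citations: 2, self-citations: 1, citations by others: 1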
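Using the Euclidean distance coefficient and the exponential correlation coefficient combined with 8 commonly used hierarchical clustering algorithms, we performed cluster analysis on whole-cell fatty acid gas chromatograms, obtained by capillary-column gas chromatography, of 34 strains of Moraxella and related genera and 13 strains of Legionella pneumophila. The dendrograms produced by the 8 hierarchical clustering algorithms with the Euclidean distance coefficient were compared. Results show that Moraxella and L. pneumophila can be clearly distinguished. Within Moraxella, two new species isolated in China can also be clearly distinguished from the main type strains of the genus. Of the two similarity coefficients, the Euclidean distance coefficient gave better clustering results; of the 8 hierarchical clustering algorithms, the furthest neighbour (complete linkage) method and the group average method gave the best results.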
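The two linkage rules the study found to work best, complete linkage (furthest neighbour) and group average, differ only in how the distance between two clusters is derived from member-to-member Euclidean distances. The naive sketch below makes that difference explicit; it runs in roughly cubic time and is meant for illustration, and the 2-D `profiles` are hypothetical rather than real chromatogram data. For real work a library routine such as SciPy's `linkage` would normally be used.

```python
import math
from itertools import combinations


def agglomerate(points, linkage="average"):
    """Naive agglomerative clustering on Euclidean distance.

    linkage: "complete" (furthest neighbour) or "average" (group average).
    Returns the merge history as (members_a, members_b, merge_height) tuples.
    """
    if linkage not in ("complete", "average"):
        raise ValueError("linkage must be 'complete' or 'average'")
    clusters = {i: [i] for i in range(len(points))}

    def link(a, b):
        # All pairwise distances between the two clusters' members.
        ds = [math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b]]
        return max(ds) if linkage == "complete" else sum(ds) / len(ds)

    merges, next_id = [], len(points)
    while len(clusters) > 1:
        # Merge the closest pair of clusters under the chosen linkage rule.
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merges.append((tuple(clusters[a]), tuple(clusters[b]), link(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical two-component fatty-acid profiles for four strains.
profiles = [(0, 0), (0, 1), (4, 0), (4, 1)]
complete = agglomerate(profiles, "complete")
average = agglomerate(profiles, "average")
```

Both rules merge the same pairs first here; they differ in the height of the final merge (the furthest pair for complete linkage, the mean of all cross-pairs for group average), which is what changes the shape of the dendrogram on real data.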

8.
Classification and identification of Paecilomyces strains using RAPD   Total citations: 13, self-citations: 0, citations by others: 13
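Sixteen strains of 12 Paecilomyces (Paecilomyces Bainier) species from Anhui and Zhejiang were analyzed by RAPD using 9 primers. The average similarity coefficients obtained showed that interspecific similarity ranged from 21% to 46%. RAPD fingerprints were clearly species-specific among the different Paecilomyces species and could distinguish all morphologically similar species, making RAPD an effective approach to strain identification. The RAPD results also suggested that RCEF197, isolated from a grey muscardine silkworm, may be a new species. Comparison of the RAPD dendrogram with a molecular phylogeny constructed from ITS sequences indicated that the RAPD dendrogram is not suitable for analyzing interspecific relationships.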
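The abstract does not name the similarity coefficient used. For RAPD band data the Dice (Nei-Li) coefficient is a common choice, so the sketch below assumes it: each strain is scored as a 0/1 vector over band positions, and similarity is twice the number of shared bands divided by the total number of bands scored in both strains. Pooling the bands from all primers into one vector per strain is also an assumption.

```python
def dice_similarity(bands_a, bands_b):
    """Dice (Nei-Li) similarity of two 0/1 band profiles: 2*shared / (n_a + n_b)."""
    if len(bands_a) != len(bands_b):
        raise ValueError("profiles must score the same band positions")
    shared = sum(1 for x, y in zip(bands_a, bands_b) if x and y)
    total = sum(bands_a) + sum(bands_b)
    return 2 * shared / total if total else 0.0


# Hypothetical pooled band scores for two strains (1 = band present).
strain_1 = [1, 1, 0, 1, 0]
strain_2 = [1, 0, 0, 1, 1]
similarity = dice_similarity(strain_1, strain_2)  # 2 shared of 3 + 3 bands
```

Multiplying by 100 gives the percentage form used in the abstract; a value of about 67% here would lie above the interspecific range reported there.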

9.
Classification and identification of Paecilomyces strains using RAPD   Total citations: 1, self-citations: 0, citations by others: 1
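Sixteen strains of 12 Paecilomyces (Paecilomyces Bainier) species from Anhui and Zhejiang were analyzed by RAPD using 9 primers. The average similarity coefficients obtained showed that interspecific similarity ranged from 21% to 46%. RAPD fingerprints were clearly species-specific among the different Paecilomyces species and could distinguish all morphologically similar species, making RAPD an effective approach to strain identification. The RAPD results also suggested that RCEF197, isolated from a grey muscardine silkworm, may be a new species. Comparison of the RAPD dendrogram with a molecular phylogeny constructed from ITS sequences indicated that the RAPD dendrogram is not suitable for analyzing interspecific relationships.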

10.
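Sixteen strains of 12 Paecilomyces (Paecilomyces Bainier) species from Anhui and Zhejiang were analyzed by RAPD using 9 primers. The average similarity coefficients obtained showed that interspecific similarity ranged from 21% to 46%. RAPD fingerprints were clearly species-specific among the different Paecilomyces species and could distinguish all morphologically similar species, making RAPD an effective approach to strain identification. The RAPD results also suggested that RCEF197, isolated from a grey muscardine silkworm, may be a new species. Comparison of the RAPD dendrogram with a molecular phylogeny constructed from ITS sequences indicated that the RAPD dendrogram is not suitable for analyzing interspecific relationships.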

11.
Onychophorans (velvet worms) use an adhesive, protein-based slime secretion for prey capture and defence. The glue-like slime is ejected via a pair of modified limbs, and the sticky threads entangle the victim. In this study, we analysed the protein composition of slime in twelve species of Onychophora from different parts of the world, including two species of Peripatidae from Costa Rica and Brazil and ten species of Peripatopsidae from Australia, using sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE). Our results revealed high intraspecific conservation in the protein composition of slime in each species studied. In contrast, the protein profiles differ considerably between species in both the number and position of bands. We observed the highest number of differences (in 20 of 33 considered band positions) between a peripatid and a peripatopsid species, whereas the lowest number of differences (in four band positions) occurs between two closely related egg-laying species. The maximum parsimony cladogram reconstructed from the electrophoretic characters largely reflects the phylogenetic relationships of the species studied, suggesting that slime protein profiles contain useful phylogenetic information. Based on our findings, we suggest that slime protein profiling is a valuable, non-invasive method for identifying onychophoran species. Moreover, this method might help to discover potentially new species of Onychophora, given that the ~200 described species most likely underrepresent the actual diversity of the group.
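Maximum parsimony compares candidate trees by the minimum number of character-state changes each one requires. The sketch below scores a fixed rooted tree with Fitch's algorithm, treating each band position as a binary character (1 = band present); a real analysis would also search over tree topologies, which is omitted here. The species labels and band matrix are hypothetical, and `band_differences` simply counts the differing band positions between two species, as in the pairwise comparisons described above.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.

    tree: nested tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: dict leaf -> state (1 = band present, 0 = absent).
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def tree_length(tree, matrix):
    """Total parsimony length: sum of Fitch changes over all band positions."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


def band_differences(a, b):
    """Number of band positions at which two species differ."""
    return sum(x != y for x, y in zip(a, b))


matrix = {  # hypothetical band scores at three positions
    "A": [1, 1, 0],
    "B": [1, 1, 1],
    "C": [0, 0, 1],
    "D": [0, 0, 0],
}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
```

The tree with the lower total length, here the one grouping A with B, is the more parsimonious explanation of the band data.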

12.
Co-conservation (phylogenetic profiles) is a well-established method for predicting functional relationships between proteins. Several publicly available databases use this method together with additional clustering strategies to build networks of protein interactions (cluster co-conservation, CCC). CCC has previously been limited to interactions within a single target species. We have extended CCC to build protein interaction networks based on co-conservation between protein pairs across multiple species, an approach termed cross-species cluster co-conservation.
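The abstract does not specify how co-conservation is scored. A common measure for presence/absence profiles is mutual information, sketched below as an assumption rather than as CCC's actual scoring; higher values mean that the presence pattern of one protein across genomes is more predictive of the other's.

```python
import math
from collections import Counter


def mutual_information(p, q):
    """Mutual information, in bits, between two equal-length 0/1 profiles."""
    n = len(p)
    joint = Counter(zip(p, q))
    px, qy = Counter(p), Counter(q)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (qy[y] / n)))
    return mi


same = mutual_information([1, 0, 1, 0], [1, 0, 1, 0])       # fully co-conserved
unrelated = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])  # independent pattern
```

Only observed joint states contribute, so the logarithm is never taken of zero; with real profiles the score would be computed for every protein pair and then thresholded or clustered.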

13.
A domain interaction map based on phylogenetic profiling   Total citations: 2, self-citations: 0, citations by others: 2
Phylogenetic profiling is a well-established method for predicting functional relations and physical interactions between proteins. We present a new method for finding such relations based on phylogenetic profiling of conserved domains rather than whole proteins, which avoids computationally expensive all-versus-all sequence comparisons among genomes. The resulting domain interaction map (DIMA) can be explored directly or mapped onto a genome of interest. We demonstrate that the performance of DIMA is comparable to that of classical phylogenetic profiling and that its predictions often yield information that cannot be detected by profiling entire protein chains. We provide a list of novel domain associations predicted by our method.
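To illustrate why domain-level profiling avoids all-versus-all sequence comparisons, the sketch below assumes per-genome domain assignments are already available (for example from a domain database), builds one presence/absence profile per domain, and links domains whose profiles match. The annotation data and the mismatch criterion (`max_mismatch=0`, i.e. identical profiles) are hypothetical simplifications, not the DIMA scoring.

```python
from itertools import combinations


def domain_profiles(genome_domains):
    """Presence/absence profile of each domain across genomes.

    genome_domains: dict genome -> set of domain identifiers assigned to it.
    Returns (ordered genome list, dict domain -> tuple of 0/1 per genome).
    """
    genomes = sorted(genome_domains)
    domains = sorted(set().union(*genome_domains.values()))
    profiles = {d: tuple(int(d in genome_domains[g]) for g in genomes) for d in domains}
    return genomes, profiles


def linked_domains(profiles, max_mismatch=0):
    """Domain pairs whose profiles differ in at most `max_mismatch` genomes."""
    return [
        (d1, d2)
        for d1, d2 in combinations(sorted(profiles), 2)
        if sum(x != y for x, y in zip(profiles[d1], profiles[d2])) <= max_mismatch
    ]


# Hypothetical domain annotations for three genomes.
annotations = {"g1": {"D1", "D2"}, "g2": {"D1", "D2", "D3"}, "g3": {"D3"}}
genomes, profiles = domain_profiles(annotations)
pairs = linked_domains(profiles)
```

The only per-genome input is a set of domain labels, so the cost scales with the number of domains rather than with sequence comparisons between every pair of proteomes.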

14.
The minimal set of proteins necessary to maintain a vertebrate cell forms an interesting core of cellular machinery. The known proteome of the human red blood cell consists of about 1400 proteins. We treated this protein complement of one of the simplest human cells as a model and asked questions about its function and origins. The proteome was mapped onto phylogenetic profiles, i.e. vectors of species possessing homologues of human proteins. A novel clustering approach was devised that uses similarity in the phylogenetic spread of homologues as the distance measure. Clustering based on phylogenetic profiles yielded several distinct protein classes differing in taxonomic spread, presumed evolutionary history and functional properties. Notably, small clusters of proteins common to vertebrates, or to Metazoa and other multicellular eukaryotes, involve biological functions specific to multicellular organisms, such as apoptosis or cell-cell signaling, respectively. A eukaryote-specific cluster is also identified, featuring GTPase signalling and ubiquitination. Another cluster, made up of proteins found in most organisms, including bacteria and archaea, involves basic molecular functions such as oxidation-reduction and glycolysis. Approximately one third of erythrocyte proteins do not fall into any of the clusters, reflecting the complexity of protein evolution compared with our simple model. Broadly, the clustering divides the proteome into old and new parts, the former originating from bacterial ancestors, the latter from inventions within multicellular eukaryotes. Thus, the model human cell proteome appears to be made up of protein sets distinct in their history and biological roles. The current work shows that the phylogenetic profile concept allows proteins to be clustered in a way relevant to both biological function and evolutionary history.
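The abstract describes a distance based on the phylogenetic spread of homologues but not its exact form. As a stand-in assumption, the sketch below represents each protein by the set of species in which it has a homologue, uses the Jaccard distance between those sets, and forms groups by single linkage at a distance threshold. Protein and species names are placeholders.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of species."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def single_linkage_groups(profiles, threshold):
    """Group proteins linked by any chain of pairs within `threshold` distance."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Placeholder proteins P1-P4 and the species S1-S5 in which each has a homologue.
profiles = {
    "P1": {"S1", "S2", "S3", "S4"},
    "P2": {"S1", "S2", "S3", "S4"},
    "P3": {"S1", "S2"},
    "P4": {"S1", "S2", "S5"},
}
groups = single_linkage_groups(profiles, threshold=0.34)
```

Proteins left in singleton groups at a given threshold correspond to the unclustered fraction mentioned in the abstract; the threshold itself is a free parameter here.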

15.
“Phylogenetic profiling” is based on the hypothesis that, during evolution, functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step, and it largely explains why previous work in this area has mainly focused on presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses the genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by conventional presence–absence profile comparisons, and it does not require any fixed a priori criteria to define orthologous genes.
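The abstract says copy-number profiles are compared with normalised Euclidean distances but does not give the normalisation. The sketch assumes each profile is z-scored before computing the Euclidean distance, which is then divided by the square root of the profile length; under this choice, profiles whose copy numbers rise and fall together score near 0 regardless of absolute copy number.

```python
import math
import statistics


def zscore(v):
    """Standardise a copy-number profile; a constant profile maps to zeros."""
    mu, sd = statistics.mean(v), statistics.pstdev(v)
    return [0.0] * len(v) if sd == 0 else [(x - mu) / sd for x in v]


def normalised_euclidean(a, b):
    """Euclidean distance between z-scored profiles, divided by sqrt(length)."""
    za, zb = zscore(a), zscore(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)) / len(za))


# Hypothetical domain copy numbers of three superfamilies across four genomes.
fam_a = [1, 2, 3, 4]
fam_b = [2, 4, 6, 8]  # same trend, twice the copies
fam_c = [4, 3, 2, 1]  # opposite trend
```

With population standard deviation the distance ranges from 0 (perfectly correlated changes) to 2 (perfectly anti-correlated), which makes a fixed cut-off comparable across profiles of different lengths.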

16.
Zhou Y, Wang R, Li L, Xia X, Sun Z. Journal of Molecular Biology, 2006, 359(4): 1150-1159
Identifying potential protein interactions is of great importance for understanding the topologies of cellular networks, which are much needed and valued in current systems biology studies. The development of computational methods to predict protein-protein interactions has been spurred on by the massive sequencing efforts of the genomic revolution. Among these methods is phylogenetic profiling, which assumes that proteins with similar phylogenetic profiles, having been under similar evolutionary pressures, might be functionally related. Here, we introduce a method for inferring functional linkages between proteins from their evolutionary scenarios. The term evolutionary scenario refers to a series of events that occurred during speciation over time, which can be reconstructed given a phylogenetic profile and a species tree. Common evolutionary pressures on two proteins can then be inferred by comparing their evolutionary scenarios, which is a direct indication of their functional linkage. The scenario method performed better than the classical phylogenetic profile method when both were applied to the same test set. In addition, the predictions of the two methods differ considerably, suggesting that merging them might achieve better performance. We analyzed the influence of the phylogenetic tree topology on the performance of this method and found it to be robust to perturbations in the topology of the tree. However, if a completely random tree is used, performance declines significantly. The evolutionary scenario method was used to infer functional linkages in 67 species, and 40,006 linkages were predicted. We examined our predictions for budding yeast and found that almost all predicted linkages are supported by further evidence.
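The abstract does not say how evolutionary scenarios are reconstructed. As one simple, explicitly assumed model, the sketch below uses Dollo parsimony: a gene is gained once, at the smallest subtree containing every species that has it, and lost once on the stem of each maximal subtree that lacks it. Two proteins' scenarios are then compared as sets of gain/loss events with a Jaccard score. The tree and species names are hypothetical.

```python
def leaves(tree):
    """Set of leaf names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def dollo_scenario(tree, present):
    """Single gain plus minimal losses for one presence/absence profile.

    tree: rooted binary species tree as nested tuples of leaf names.
    present: set of species that have the gene (must be non-empty).
    Returns (gain_node, loss_nodes); each loss sits on the stem of a node.
    """
    if not present or not present <= leaves(tree):
        raise ValueError("present must be a non-empty subset of the tree's leaves")
    # Descend to the smallest subtree that still contains every carrier.
    node = tree
    while not isinstance(node, str):
        left, right = node
        if present <= leaves(left):
            node = left
        elif present <= leaves(right):
            node = right
        else:
            break
    gain, losses = node, []

    def collect(t):
        if not (leaves(t) & present):
            losses.append(t)  # whole subtree lacks the gene: one loss on its stem
        elif not isinstance(t, str):
            collect(t[0])
            collect(t[1])

    collect(gain)
    return gain, losses


def scenario_similarity(tree, present_a, present_b):
    """Jaccard similarity of two proteins' gain/loss event sets."""
    def events(present):
        gain, losses = dollo_scenario(tree, present)
        return {("gain", gain)} | {("loss", node) for node in losses}

    ea, eb = events(present_a), events(present_b)
    return len(ea & eb) / len(ea | eb)


species_tree = ((("A", "B"), "C"), ("D", "E"))  # hypothetical species tree
```

Unlike a plain profile comparison, two proteins absent from the same species score as similar only if the absences are explained by the same loss events on the tree.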

17.
Phylogenetic profiles have been widely applied in functional genomics research, especially in the prediction of protein-protein interactions (PPIs). A key issue in phylogenetic profiling is how to effectively select reference organisms from the hundreds of available genomes. In this study, we assessed reference organism selection based on the genetic distance between the target organism and 167 reference organisms. We found that including reference organisms from all distance levels gave better PPI prediction performance than using those from any single distance level. PPI prediction reached an optimal level when 70% of the reference organisms at all distance levels were selected, and this performance was similar to that under the optimal condition based on the taxonomy tree in our previous study. Because measuring genetic distance is direct and simple compared with using the topology of the taxonomy tree, we suggest selecting reference organisms based on genetic distance when constructing phylogenetic profiles.
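The abstract does not define how distance levels are formed. The sketch assumes equal-width bins between the smallest and largest genetic distance (`n_levels=5` is arbitrary) and randomly keeps 70% of the organisms in each bin, so that every distance level stays represented.

```python
import random


def select_references(distances, n_levels=5, fraction=0.7, seed=0):
    """Keep `fraction` of the reference organisms at every genetic-distance level.

    distances: dict organism -> genetic distance to the target organism.
    Levels are equal-width bins between the smallest and largest distance.
    """
    lo, hi = min(distances.values()), max(distances.values())
    width = (hi - lo) / n_levels or 1.0  # all distances equal: one bin
    bins = {}
    for org, d in distances.items():
        level = min(int((d - lo) / width), n_levels - 1)  # clamp the maximum
        bins.setdefault(level, []).append(org)
    rng = random.Random(seed)
    chosen = []
    for level in sorted(bins):
        members = sorted(bins[level])
        k = max(1, round(fraction * len(members)))  # keep at least one per level
        chosen.extend(rng.sample(members, k))
    return chosen


# Hypothetical distances for ten reference genomes.
distances = {f"org{i}": i / 10 for i in range(10)}
selected = select_references(distances)
```

Sampling within each bin, rather than from the pooled list, is what guarantees that near and distant genomes are both present in every profile.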

18.
In this paper, we introduce a probabilistic measure for computing the similarity between two biological sequences without alignment. The similarity measure is computed from the Kullback-Leibler divergence between two constructed Markov models. We first validate the method by clustering nine chromosomes from three species. Second, we present the results of a similarity search based on the new method. Finally, we apply the measure to the construction of a phylogenetic tree of 48 HEV genome sequences. Our results indicate that the weighted relative entropy is an efficient and powerful alignment-free measure for sequence analysis at the genomic scale.
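The exact weighting of the relative entropy is not given in the abstract. The sketch below makes an explicit assumption: it fits a first-order Markov chain with pseudocounts to each DNA sequence, computes the Kullback-Leibler divergence between the transition distributions weighted by each state's frequency in the first sequence, and then symmetrises the result. It expects uppercase sequences over ACGT of length at least 2.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def markov_model(seq, pseudo=1.0):
    """First-order transition probabilities P[a][b] with additive pseudocounts."""
    counts = Counter(zip(seq, seq[1:]))
    model = {}
    for a in ALPHABET:
        row = {b: counts[(a, b)] + pseudo for b in ALPHABET}
        total = sum(row.values())
        model[a] = {b: c / total for b, c in row.items()}
    return model


def state_weights(seq):
    """Frequency of each nucleotide as the source of a transition."""
    counts = Counter(seq[:-1])
    n = len(seq) - 1
    return {a: counts[a] / n for a in ALPHABET}


def kl_rate(seq_p, seq_q, pseudo=1.0):
    """KL divergence D(P || Q) between the two chains, weighted by seq_p's state frequencies (nats)."""
    P, Q = markov_model(seq_p, pseudo), markov_model(seq_q, pseudo)
    w = state_weights(seq_p)
    return sum(
        w[a] * P[a][b] * math.log(P[a][b] / Q[a][b])
        for a in ALPHABET
        for b in ALPHABET
    )


def symmetric_kl(s1, s2):
    """Symmetrised divergence used as a dissimilarity (0 for identical models)."""
    return kl_rate(s1, s2) + kl_rate(s2, s1)


s1 = "ACGTACGTACGT"  # hypothetical short sequences
s2 = "AAAAAAAATTTT"
```

Pseudocounts keep every transition probability positive, so the logarithm is always defined even for transitions never seen in one sequence; a matrix of pairwise `symmetric_kl` values could then feed a tree-building or clustering step.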

19.
Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements at various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method that approaches the problem from the feature extraction perspective, to further propel gene expression profiling technologies from bench to bedside. We define a ‘correlation feature space’ for samples based on their gene expression profiles through iterative application of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus substantially improve the robustness and accuracy of currently available algorithms for disease diagnosis and classification based on gene expression profiles.
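The core idea of iPcc, as stated in the abstract, is to apply Pearson's correlation iteratively: samples are first re-described by their correlations with all other samples, then by the correlations between those correlation vectors, and so on. The sketch assumes a fixed number of iterations (two) and plain Pearson correlation; the paper's iteration count, stopping rule and downstream classifier are not reproduced. Samples with constant expression would make the correlation undefined and are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_features(samples, iterations=2):
    """Re-describe each sample by its correlations with every sample, repeatedly."""
    feats = samples
    for _ in range(iterations):
        feats = [[pearson(a, b) for b in feats] for a in feats]
    return feats


# Hypothetical expression of five genes in four samples (two opposite groups).
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 2.9, 4.2, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 2.0, 0.9],
]
features = correlation_features(samples)
```

After each round the small measurement differences within a group matter less, and the feature matrix moves toward a clean block pattern of values near +1 within groups and near -1 between them, which is the kind of latent structure the abstract says iPcc highlights.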


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417