Similar literature
 Found 18 similar articles; search took 937 ms
1.
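The phylogenetic profile method is a widely studied, non-homology-based approach to the functional annotation of biological macromolecules. To address shortcomings of existing algorithms, the method is improved in two respects: first, weighted phylogenetic profiles are constructed; second, an improved clustering algorithm is used to analyze profile similarity. One hundred Escherichia coli K12 proteins downloaded from NCBI served as experimental data, and the similar profiles were analyzed with the improved algorithm, classical hierarchical clustering, and K-means clustering. The results show that the improved algorithm clusters similar profiles markedly more accurately than the other two algorithms.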

2.
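The principle of the phylogenetic profile method is briefly introduced, with emphasis on an improvement to the K-means clustering algorithm for analyzing gene phylogenetic profiles and a comparison with the traditional K-means algorithm. Experimental results show that the improved K-means algorithm is fast and effective for gene function annotation by phylogenetic profiling and converges quickly to a near-optimal solution.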

3.
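With the arrival of the post-genomic era, the phylogenetic profile method, a non-homology-based functional annotation method, has been applied successfully in important research areas such as genome function prediction and protein interaction prediction. This paper describes the basic principle of the method, reviews in detail several existing approaches to constructing phylogenetic profiles, and proposes using orthologs to construct gene phylogenetic profiles.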

4.
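Current status and development of clustering techniques for gene expression analysis   Cited 5 times (self-citations: 0; by others: 5)
With genome sequencing completed for many organisms and DNA microarray technology in wide use, gene expression data analysis has become a research focus of the post-genomic era. Cluster analysis groups functionally related genes by the similarity of their expression profiles, which helps in studying genes of unknown function, and it is one of the main computational techniques in gene expression analysis. Many clustering algorithms have been applied to gene expression data; because they differ in focus and underlying principle, each has its own strengths and weaknesses. Assessing the validity of these algorithms and developing new methods suited to gene expression data are now pressing tasks.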

5.
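As metagenomic data continue to grow, it has become possible to build an integrated analysis platform containing high-quality metagenomic samples (also called "microbial communities"), so that such samples can be analyzed, compared, and searched effectively and deeper biological meaning can be uncovered. However, most current metagenomic databases provide only simple data storage, lack good sample annotation, or offer very few analysis functions. In addition, methods for computing the similarity of microbial community data can handle only a very limited volume of samples. Researchers have long sought effective ways to compute similarity among massive numbers of microbial communities, in order to study how similar samples are and to discover correlations in metagenomic data. Meta-Mesh is a new online metagenomic analysis system comprising a metagenomic database and an analysis platform; it supports systematic and effective analysis of metagenomic samples, community structure comparison, and accurate search. The database has collected more than 7,000 high-quality, validly annotated samples from the public domain and in-house laboratories. The analysis platform provides a range of online tools for community structure analysis and annotation and for multi-angle comparison of metagenomic samples, and it uses a fast indexing strategy and a community structure similarity algorithm to search the database efficiently for similar samples. A series of metagenomic case studies, including "database search and identification of human microbial community samples" and "similarity-matrix-based clustering of samples", demonstrates its analytical performance. As an online metagenomic database and analysis system, Meta-Mesh will serve the rapid analysis, identification, comparison, and search of metagenomic samples and related fields.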

6.
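Orthology is the relationship between genes that share a common ancestor as a result of a speciation event; orthologous genes usually have similar structures and biological functions. With the rapid accumulation of genome and transcriptome sequences, accurate identification of orthologous genes aids functional gene annotation and comparative and evolutionary genomics. This review surveys the main existing methods for identifying orthologous genes and lists the databases built with them. The methods fall into three categories: sequence-similarity-based methods, which are fast and highly sensitive; phylogenetic-tree-based methods, which are highly accurate and information-rich; and hybrid methods that combine the two and strike a better balance between sensitivity and accuracy. Finally, the problems facing ortholog identification are summarized.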

7.
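Functional annotation of genes with computational and mathematical methods has become both a research focus and a challenge, and machine learning methods are the most widely applied. Bioinformaticians continue to propose effective, fast, and accurate machine learning methods for gene function annotation, which has greatly advanced biomedicine. This article reviews the application and progress of machine learning in gene function annotation. It introduces several commonly used methods, including support vector machines, the k-nearest neighbor algorithm, decision trees, random forests, neural networks, Markov random fields, logistic regression, clustering algorithms, and Bayesian classifiers, and it discusses how to select data sources, improve algorithms, and raise prediction performance when applying machine learning to gene function annotation.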

8.
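This study aimed to determine the structure and composition of the chloroplast genome of Castanopsis hystrix, establish its evolutionary position within Castanopsis, and identify differences from the chloroplast genomes of other Castanopsis species, thereby providing a basis for species identification, genetic diversity analysis, and resource conservation in the genus. The chloroplast genome was sequenced on the Illumina HiSeq 2500 platform and then assembled, annotated, and characterized with bioinformatics methods. Genome structure and gene number, codon usage bias, sequence repeats, simple sequence repeat (SSR) loci, and phylogeny were analyzed with bioinformatics software including R, Python, MISA, CodonW, and MEGA 6. The C. hystrix chloroplast genome is 153,754 bp long and has a quadripartite structure. It contains 130 genes: 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Codon usage analysis gave a mean effective number of codons of 55.5, indicating highly random codon usage and weak bias. SSR and long-repeat analysis detected 45 repeat sequences and 111 SSR loci. Comparison with closely related species showed that the chloroplast genome sequence is highly conserved, with especially high similarity in protein-coding sequences. Phylogenetic analysis placed C. hystrix in the same clade as Castanopsis hainanensis (海南锥), indicating a close relationship. This study establishes the basic features and phylogenetic position of the C. hystrix chloroplast genome and lays groundwork for species discrimination, studies of genetic diversity in natural populations, and functional genomics.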
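The SSR detection step mentioned in this abstract can be illustrated with a short regular-expression scan. This is only a minimal sketch and not the MISA algorithm: the function name `find_ssrs`, the motif length limit of 1 to 6 bp, and the single minimum of 5 repeats for every motif length are all assumptions chosen for illustration, since the abstract does not state the thresholds that were used.

```python
import re


def find_ssrs(sequence, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for each simple sequence repeat.

    Hypothetical thresholds: any motif of 1..max_motif bases repeated at
    least min_repeats times in tandem counts as an SSR.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # Group 2 is the motif (shortest candidate tried first); the
    # backreference \2 requires it to repeat min_repeats - 1 more times.
    pattern = re.compile(
        r"(([ACGT]{1,%d}?)\2{%d,})" % (max_motif, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(2)
        repeats = len(match.group(1)) // len(motif)
        results.append((match.start(), motif, repeats))
    return results


if __name__ == "__main__":
    # A toy sequence with one dinucleotide repeat.
    print(find_ssrs("GGC" + "AT" * 6 + "CCG"))
```

A dedicated tool normally applies a different minimum repeat count per motif length and can merge neighboring repeats into compound SSRs; neither refinement is attempted here.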

9.
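This paper analyzes why commonly used normalization methods cause misclassification in tumor gene microarray data and proposes a normalization method based on class means. The method normalizes gene expression profiles in both directions and interleaves normalization with clustering, using the clustering results to correct the reference expression level. Five tumor microarray data sets were selected, and hierarchical clustering and K-means clustering were applied at different variance levels to expression data processed with the common normalization and with the class-mean-based normalization. The experimental results show that class-mean-based normalization effectively improves the quality of clustering results for tumor gene expression profiles.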

10.
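Wu Qiong  Li Weicheng  Li Min  Li Yu  Sun Tiansong 《Acta Microbiologica Sinica》2022,62(4):1438-1451
[Objective] Limosilactobacillus fermentum has multiple functional properties, such as enhancing immunity and producing exopolysaccharide (EPS); it is widely used in the food industry and has high economic value. From a population genetics perspective, this paper characterizes the genetic background and functional genes of L. fermentum F-6 to provide a genetic basis for its development and use. [Methods] Comparative genomic analysis was performed on the whole-genome sequences of 23 L. fermentum strains publicly available from NCBI and the genome sequence of the type strain ATCC 14931T. Roary was used to identify the core and pan gene sets, and the rapid annotation using subsystem technology (RAST) website was used for functional annotation to explore the genomic features of F-6. [Results] A phylogenetic tree built from the 997 identified core genes showed that clustering was unrelated to isolation source, although F-6 fell in the same branch as three food isolates. Functional annotation showed that, among the 24 L. fermentum strains, only F-6 carries genes involved in the branched-chain amino acid synthesis pathway (ilvD, leuA, etc.), which can supply essential amino acids to the host. F-6 contains many genes encoding glycosyltransferases and UDP-glucose 4-epimerase, as well as one complete eps gene cluster. Compared with other L...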

11.
Kim Y  Subramaniam S 《Proteins》2006,62(4):1115-1124
Phylogenetic profiles encode patterns of presence or absence of genes across genomes, and these profiles can be used to assign functional relationships to nonhomologous pairs of proteins (Pellegrini et al., Proc Natl Acad Sci USA 1999;96:4284-4288). Although it is well known that many proteins were created from combinations of domains, most of the existing implementations of phylogenetic profiles do not consider this fact. Here, we introduce an extension that considers the multidomain nature of proteins and test the method against the known interaction data sets. Whereas earlier implementations associated one entire sequence with one protein phylogenetic profile (Single-Profile), our method instead breaks the sequence into a set of segments of predetermined size and constructs a separate profile for each segment (Multiple-Profile). The results show that the Multiple-Profile method performs as well as the Single-Profile method. However, the two methods share, surprisingly, a small fraction of their predictions, indicating that the Multiple-Profile method can detect known interactions missed by the Single-Profile method. Thus, the Multiple-Profile method can be used with other methods to determine functional relationships on a genome scale with wider coverage.
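The contrast between one profile per sequence and one profile per fixed-size segment can be sketched as follows. This is a hedged illustration, not the authors' implementation: the helper names (`split_into_segments`, `build_profile`, `hamming_distance`) are hypothetical, the homology search that decides which genomes contain a hit is assumed to have been run elsewhere, and keeping a shorter trailing segment is an assumption the abstract does not confirm.

```python
from typing import Iterable, List, Sequence


def split_into_segments(sequence: str, size: int) -> List[str]:
    """Cut a protein sequence into consecutive segments of a fixed size.

    The last segment may be shorter than `size` (an assumption).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def build_profile(genomes_with_hit: Iterable[str],
                  reference_genomes: Sequence[str]) -> List[int]:
    """Binary profile: 1 if a homolog was found in the genome, else 0."""
    hits = set(genomes_with_hit)
    return [1 if genome in hits else 0 for genome in reference_genomes]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of genomes in which two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    genomes = ["g1", "g2", "g3", "g4"]          # hypothetical genome labels
    single = build_profile({"g1", "g3"}, genomes)
    print(single)
    # Multiple-Profile idea: each segment would get its own profile.
    for segment in split_into_segments("MKTAYIAKQR", 4):
        print(segment)
```

Hamming distance is used here only because it is the simplest way to compare two binary profiles; the abstract does not say which similarity measure the study applied.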

12.
Zhou Y  Wang R  Li L  Xia X  Sun Z 《Journal of molecular biology》2006,359(4):1150-1159
Identifying potential protein interactions is of great importance in understanding the topologies of cellular networks, which is much needed and valued in current systematic biological studies. The development of computational methods to predict protein-protein interactions has been spurred on by the massive sequencing efforts of the genomic revolution. Among these methods is phylogenetic profiling, which assumes that proteins under similar evolutionary pressures with similar phylogenetic profiles might be functionally related. Here, we introduce a method for inferring functional linkages between proteins from their evolutionary scenarios. The term evolutionary scenario refers to a series of events that occurred in speciation over time, which can be reconstructed given a phylogenetic profile and a species tree. Common evolutionary pressures on two proteins can then be inferred by comparing their evolutionary scenarios, which is a direct indication of their functional linkage. This scenario method has proven to perform better than the classical phylogenetic profile method when applied to the same test set. In addition, the predictions of the two methods are found to be fairly different, suggesting the possibility of merging them to achieve better performance. We analyzed the influence of the topology of the phylogenetic tree on the performance of this method, and found it to be robust to perturbations in the topology of the tree. However, if a completely random tree is incorporated, performance declines significantly. The evolutionary scenario method was used for inferring functional linkages in 67 species, and 40,006 linkages were predicted. We examine our prediction for budding yeast and find that almost all predicted linkages are supported by further evidence.

13.
MOTIVATION: The phylogenetic profile of a protein is a string that encodes the presence or absence of the protein in every fully sequenced genome. Because proteins that participate in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion, the phylogenetic profiles of such proteins are often 'similar' or at least 'related' to each other. The question we address in this paper is the following: how to measure the 'similarity' between two profiles, in an evolutionarily relevant way, in order to develop efficient function prediction methods? RESULTS: We show how the profiles can be mapped to a high-dimensional vector space which incorporates evolutionarily relevant information, and we provide an algorithm to compute efficiently the inner product in that space, which we call the tree kernel. The tree kernel can be used by any kernel-based analysis method for classification or data mining of phylogenetic profiles. As an application a Support Vector Machine (SVM) trained to predict the functional class of a gene from its phylogenetic profile is shown to perform better with the tree kernel than with a naive kernel that does not include any information about the phylogenetic relationships among species. Moreover a kernel principal component analysis (KPCA) of the phylogenetic profiles illustrates the sensitivity of the tree kernel to evolutionarily relevant variations.

14.
Wu H  Su Z  Mao F  Olman V  Xu Y 《Nucleic acids research》2005,33(9):2822-2837
We present a computational method for the prediction of functional modules encoded in microbial genomes. In this work, we have also developed a formal measure to quantify the degree of consistency between the predicted and the known modules, and have carried out statistical significance analysis of consistency measures. We first evaluate the functional relationship between two genes from three different perspectives: phylogenetic profile analysis, gene neighborhood analysis and Gene Ontology assignments. We then combine the three different sources of information in the framework of Bayesian inference, and we use the combined information to measure the strength of gene functional relationship. Finally, we apply a threshold-based method to predict functional modules. By applying this method to Escherichia coli K12, we have predicted 185 functional modules. Our predictions are highly consistent with the previously known functional modules in E. coli. The application results have demonstrated that our approach is highly promising for the prediction of functional modules encoded in a microbial genome.
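The two steps described here, combining several evidence sources in a Bayesian framework and then thresholding the combined score, can be sketched in a few lines. Everything specific below is an assumption made for illustration: the abstract does not give the model, so the sketch multiplies per-source likelihood ratios under a naive independence assumption, and it treats a "module" as a connected component of the graph of gene pairs whose score reaches the threshold. The function names are hypothetical.

```python
def combined_posterior(prior, likelihood_ratios):
    """Posterior probability of a functional link.

    Assumes the evidence sources are conditionally independent, so the
    prior odds are simply multiplied by each source's likelihood ratio.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


def predict_modules(genes, pair_scores, threshold):
    """Group genes linked by pairs scoring at or above the threshold.

    Uses a small union-find structure; singletons are not reported.
    """
    parent = {gene: gene for gene in genes}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for gene in genes:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


if __name__ == "__main__":
    scores = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.1}
    print(predict_modules(["a", "b", "c", "d"], scores, 0.5))
```

Connected components are the loosest reasonable reading of "threshold-based"; a stricter definition, such as requiring every pair in a module to pass the threshold, would yield smaller modules.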

15.
Sun J  Xu J  Liu Z  Liu Q  Zhao A  Shi T  Li Y 《Bioinformatics (Oxford, England)》2005,21(16):3409-3415
MOTIVATION: The increasing availability of complete genome sequences provides an excellent opportunity for the further development of tools for functional studies in proteomics. Several experimental approaches and in silico algorithms have been developed to cluster proteins into networks of biological significance that may provide new biological insights, especially into understanding the functions of many uncharacterized proteins. Among these methods, the phylogenetic profiles method has been widely used to predict protein-protein interactions. It involves the selection of reference organisms and identification of homologous proteins. Up to now, no published report has systematically studied the effects of the reference genome selection and the identification of homologous proteins upon the accuracy of this method. RESULTS: In this study, we optimized the phylogenetic profiles method by integrating phylogenetic relationships among reference organisms and sequence homology information to improve prediction accuracy. Our results revealed that the selection of the reference organism set and the criteria for homology identification are two critical factors for the prediction accuracy of this method. Our refined phylogenetic profiles method shows greater performance and potentially provides more reliable functional linkages compared with previous methods.

16.
“Phylogenetic profiling” is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence–absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence–absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence–absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will “auto-tune” with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence–absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes.
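Comparing copy-number profiles with a normalised Euclidean distance can be sketched as below. The abstract does not specify the normalisation, so this example assumes each profile is scaled to unit Euclidean length before the distance is taken; the function names are hypothetical. Under that assumption, two superfamilies whose copy numbers rise and fall in proportion across genomes end up at distance zero even if one is far more abundant than the other.

```python
import math
from typing import List, Sequence


def normalise(profile: Sequence[float]) -> List[float]:
    """Scale a copy-number profile to unit Euclidean length (assumed scheme)."""
    norm = math.sqrt(sum(x * x for x in profile))
    if norm == 0:
        raise ValueError("profile contains only zeros")
    return [x / norm for x in profile]


def normalised_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two unit-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))


if __name__ == "__main__":
    # Copy numbers of two hypothetical superfamilies across three genomes.
    print(normalised_euclidean([1, 2, 3], [2, 4, 6]))  # proportional profiles
    print(normalised_euclidean([1, 0, 0], [0, 1, 0]))  # no shared genomes
```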

17.
MOTIVATION: Functional linkages imply pairwise relationships between proteins that work together to implement biological tasks. During evolution, functionally linked proteins are likely to be preserved or eliminated across a range of genomes in a correlated fashion. Based on this hypothesis, phylogenetic profiling-based approaches try to detect pairs of protein families that show similar evolutionary patterns. Traditionally, the evolutionary pattern of a protein is encoded by either a binary profile of presence and absence of this protein across species or an occurrence profile that indicates the distribution of copies of this protein across species. RESULTS: In our study, we characterize each protein by its enhanced phylogenetic tree, a novel graphical model of the evolution of a protein family explicitly marked with speciation and duplication events. By topological comparison between enhanced phylogenetic trees, we are able to detect the functionally associated protein pairs. Because the enhanced phylogenetic trees contain more evolutionary information about proteins, our method shows greater performance and discovers functional linkages among proteins more reliably compared with the conventional approaches.
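The two traditional encodings named in this abstract, the binary profile and the occurrence profile, differ only in whether copy counts are kept. The sketch below shows the difference on toy data; the function names are hypothetical, and Pearson correlation is an assumed similarity measure chosen for illustration, since the abstract does not say how conventional profiles were compared.

```python
import math
from typing import List, Sequence


def to_binary(occurrence: Sequence[int]) -> List[int]:
    """Collapse an occurrence profile (copy counts) to presence/absence."""
    return [1 if count > 0 else 0 for count in occurrence]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two profiles over the same species."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Copy counts of two hypothetical protein families in four species.
    family_x = [2, 1, 1, 0]
    family_y = [1, 5, 1, 0]
    # The binary profiles coincide, but the occurrence profiles do not.
    print(to_binary(family_x) == to_binary(family_y))
    print(pearson(family_x, family_y))
```

The toy data show why the occurrence profile carries more information: two families can be indistinguishable by presence and absence while their copy numbers vary quite differently across species.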

18.
Automated DNA sequencing technology is so rapid that analysis has become the rate-limiting step. Hundreds of prokaryotic genome sequences are publicly available, with new genomes uploaded at the rate of approximately 20 per month. As a result, this growing body of genome sequences will include microorganisms not previously identified, isolated, or observed. We hypothesize that evolutionary pressure exerted by an ecological niche selects for a similar genetic repertoire in those prokaryotes that occupy the same niche, and that this is due to both vertical and horizontal transmission. To test this, we have developed a novel method to classify prokaryotes, by calculating their Pfam protein domain distributions and clustering them with all other sequenced prokaryotic species. Clusters of organisms are visualized in two dimensions as 'mountains' on a topological map. When compared to a phylogenetic map constructed using 16S rRNA, this map more accurately clusters prokaryotes according to functional and environmental attributes. We demonstrate the ability of this map, which we term a "niche map", to cluster according to ecological niche both quantitatively and qualitatively, and propose that this method be used to associate uncharacterized prokaryotes with their ecological niche as a means of predicting their functional role directly from their genome sequence.
