Similar Articles
9.
Functionally related genes often appear near one another on the genome, although not necessarily in the same order. Such groups, or clusters, of genes may have an ancient evolutionary origin or may signal some other important phenomenon, and they can also aid in predicting gene function. Gene clusters also help solve the problem of local alignment of genes. Similarly, clusters of protein domains that appear in different orders in different protein sequences suggest common functionality even when the proteins are nonhomologous. In this paper, we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the π-pattern problem and give an algorithm that automatically discovers clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns without any loss of information. We demonstrate the automatic pattern discovery tool on motifs in E. coli protein sequences.
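The idea of a π-pattern (a set of elements that occurs contiguously, in any order, in several sequences) can be illustrated with a naive enumerator. This is a minimal sketch, not the paper's algorithm: it assumes a pattern is the multiset of a window, counts a pattern when it occurs in at least a hypothetical `quorum` of distinct sequences, and does not apply the paper's maximality notation (roughly, suppressing a pattern such as {a, b} whose occurrences are always covered by a larger one such as {a, b, c}).

```python
from collections import defaultdict


def pi_patterns(sequences, min_len=2, max_len=None, quorum=2):
    """Naive pi-pattern enumeration (illustrative sketch).

    A window's pattern is its multiset of elements, stored as a sorted tuple,
    so "abc" and "cba" map to the same key. Elements may be characters or
    gene/domain names. Returns pattern -> sorted list of (seq_id, start).
    Maximality filtering is NOT performed.
    """
    where = defaultdict(set)  # canonical multiset -> {(seq_id, start), ...}
    for sid, seq in enumerate(sequences):
        upper = len(seq) if max_len is None else min(max_len, len(seq))
        for k in range(min_len, upper + 1):
            for start in range(len(seq) - k + 1):
                key = tuple(sorted(seq[start:start + k]))
                where[key].add((sid, start))

    result = {}
    for key, locs in where.items():
        # Keep only patterns seen in at least `quorum` distinct sequences.
        if len({sid for sid, _ in locs}) >= quorum:
            result[key] = sorted(locs)
    return result
```

For example, `pi_patterns(["abcde", "xcbay"])` reports `('a', 'b', 'c')` at `(0, 0)` and `(1, 1)`, because "abc" and "cba" are the same cluster in different orders. The enumeration is quadratic in sequence length per sequence, which is exactly why a compact maximal-pattern representation matters in practice.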

10.
Zhou Rongge, Zhang Jing. Chinese Journal of Bioinformatics (《生物信息学》), 2011, 9(2): 120-124, 130
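Identifying the transcription factor binding sites (motifs) of eukaryotic genes is a major task in the post-genomic era, and analyzing co-expressed or co-regulated genes together can improve the accuracy of motif identification. Using a log-linear model of 2×2 contingency tables, with counts given by the number of genes in which a motif occurs, this paper analyzes the transcriptional regulatory motifs commonly used by yeast ribosomal protein (RP) genes, and then applies a U-test to further select motifs that are over-represented relative to background sequences. These motifs are potential transcriptional regulatory elements of yeast RP genes, and they agree with experimentally determined transcription factor binding sites at a rate of 90%. The advantage of this method is that it searches a set of gene promoters for commonly used motifs under strict statistical criteria, overcoming the vague judgments of how widely a motif is used that characterized earlier analyses. The method can also efficiently search for combinatorial regulatory motif pairs in families of co-expressed genes. The study also observed a clear correlation between the Pearson correlation coefficient, which measures the degree of association between the two attributes of a 2×2 contingency table, and the interaction effect of the log-linear model. This result suggests that the interaction effect of the log-linear model can be used to assess the association between two attributes.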
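The relationship the abstract reports between the log-linear interaction effect and the Pearson correlation of a 2×2 table can be made concrete. This is a sketch under stated assumptions: for a saturated log-linear model with dummy (reference) coding, the interaction term equals the log odds ratio ln(ad/bc), and with effect (sum-to-zero) coding it is one quarter of that; the Pearson correlation of two binary attributes is the phi coefficient. The cell layout and the 0.5 pseudocount are illustrative choices, not taken from the paper.

```python
import math

# Assumed 2x2 layout (illustrative):
#                      attribute B yes   attribute B no
#   attribute A yes          a                 b
#   attribute A no           c                 d
# e.g. A = "gene is in the RP set", B = "motif occurs in the promoter".


def log_odds_ratio(a, b, c, d, pseudo=0.5):
    """Interaction term of the saturated log-linear model (dummy coding).

    A pseudocount is added to every cell only when some cell is zero,
    so the logarithm stays finite.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + pseudo for x in (a, b, c, d))
    return math.log(a * d / (b * c))


def effect_coded_interaction(a, b, c, d, pseudo=0.5):
    """Same interaction under sum-to-zero coding: one quarter of ln(OR)."""
    return log_odds_ratio(a, b, c, d, pseudo) / 4.0


def phi_coefficient(a, b, c, d):
    """Pearson correlation between the two binary attributes."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0
```

For the table a=20, b=5, c=5, d=20, phi is 0.6 and the dummy-coded interaction is ln 16. Both quantities are zero at independence and share the sign of ad - bc, which is one reason they tend to move together, although they are not linearly related in general.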

12.
We developed an algorithm, Lever, that systematically maps metazoan DNA regulatory motifs or motif combinations to sets of genes. Lever assesses whether the motifs are enriched in cis-regulatory modules (CRMs), predicted by our PhylCRM algorithm, in the noncoding sequences surrounding the genes. Lever analysis allows unbiased inference of functional annotations for regulatory motifs and candidate CRMs. We used human myogenic differentiation as a model system to statistically assess more than 25,000 pairings of gene sets with motifs or motif combinations. We assigned functional annotations to previously predicted candidate regulatory motifs and identified gene sets that are likely to be co-regulated via shared regulatory motifs. Lever makes it possible to move beyond the identification of putative regulatory motifs in mammalian genomes toward understanding their biological roles. This approach is general and can readily be applied to any cell type, gene expression pattern or organism of interest.
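Scoring a motif against a gene set amounts to asking whether the gene set's CRM scores are systematically higher than those of background genes. The sketch below assumes an ROC AUC-style enrichment score (equivalently, the Mann-Whitney U statistic divided by the number of foreground-background pairs); check the Lever publication for its exact statistic. The input format `scores[motif][gene]` (best CRM score near each gene) and the function names are hypothetical.

```python
def roc_auc(foreground, background):
    """Probability that a random foreground score exceeds a random background
    score, counting ties as 1/2 (Mann-Whitney U / (n_fg * n_bg))."""
    if not foreground or not background:
        raise ValueError("both gene sets must be non-empty")
    wins = 0.0
    for f in foreground:
        for b in background:
            if f > b:
                wins += 1.0
            elif f == b:
                wins += 0.5
    return wins / (len(foreground) * len(background))


def rank_motif_gene_set_pairs(scores, gene_sets, background):
    """Score every (motif, gene set) pairing and sort by AUC, highest first.

    scores: motif -> {gene: CRM score}; genes without a score count as 0.0.
    gene_sets: set name -> list of genes; background: list of genes
    (typically excluding the foreground genes).
    """
    results = []
    for motif, by_gene in scores.items():
        bg = [by_gene.get(g, 0.0) for g in background]
        for set_name, genes in gene_sets.items():
            fg = [by_gene.get(g, 0.0) for g in genes]
            results.append((motif, set_name, roc_auc(fg, bg)))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

An AUC near 1 means the motif's CRMs score higher around the gene set than around background genes, 0.5 means no difference, and the pairwise loop makes this quadratic, which is acceptable for a sketch but not for tens of thousands of pairings.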

15.
MOTIVATION: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expression switches between up- and down-regulation from one condition to another. To discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. RESULTS: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden to traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and is competitive with bi-clustering techniques.
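The definition of a co-regulated gene profile (two gene sets, identical up/down behavior within a set, opposite behavior across sets) can be checked directly for one fixed set of conditions. This minimal sketch assumes expression has already been discretized to +1 (up) and -1 (down); unlike MAP, it does not search over condition subsets and does not implement the paper's similarity measure for clustering. Function and parameter names are hypothetical.

```python
from collections import defaultdict


def coregulated_profiles(patterns, conditions, min_genes=2):
    """Find co-regulated gene profiles over a fixed list of conditions.

    patterns: gene -> {condition: +1 (up) or -1 (down)}.
    Returns a list of (set_a, set_b): genes in set_a share one up/down
    vector over `conditions`; genes in set_b show the exact opposite vector.
    """
    # Each canonical vector maps to (genes matching it, genes opposing it).
    groups = defaultdict(lambda: (set(), set()))
    for gene, calls in patterns.items():
        if not all(c in calls for c in conditions):
            continue  # skip genes lacking a call for some condition
        vec = tuple(calls[c] for c in conditions)
        neg = tuple(-v for v in vec)
        canon = max(vec, neg)  # one orientation represents the pair
        same, opposite = groups[canon]
        (same if vec == canon else opposite).add(gene)
    # A profile needs both sets non-empty, i.e. genuinely contrary behavior.
    return [(a, b) for a, b in groups.values()
            if a and b and len(a) + len(b) >= min_genes]
```

Canonicalizing each vector together with its negation is what lets up/down switching genes land in the same group, which plain APD grouping by identical vectors would split apart.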

16.
Hua Lin, Zheng Weiying, Liu Hong, Lin Hui, Gao Lei. Chinese Journal of Biotechnology (《生物工程学报》), 2008, 24(9): 1643-1648
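Using a random forest-pathway analysis approach, characteristic metabolic pathways were selected by their out-of-bag (OOB) classification error rate. Gene expression correlation was then studied on the characteristic pathways, the MAP (Mining Attribute Profile) algorithm was applied to the genes on these pathways to mine co-regulated expression patterns across different experimental conditions, and the resulting co-regulated expression patterns were clustered. The results show that genes on the same characteristic metabolic pathway tend to have similar expression, and that two characteristic metabolic pathways exhibit co-expression patterns. One of these pathways contains 108 expression patterns; when these patterns were clustered, even the lowest clustering similarity coefficient reached 0.623. This indicates that the co-expression patterns of genes on the same characteristic metabolic pathway remain highly similar across different experimental conditions, which is informative for studies of complex diseases that treat pathways as gene modules.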
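Ranking pathways by out-of-bag (OOB) error means training a bagged classifier on each pathway's genes alone and keeping the pathways whose genes best separate the sample classes. To stay standard-library only, this sketch swaps the random forest for bagged nearest-centroid classifiers; the OOB logic (each sample is scored only by models whose bootstrap sample did not contain it) is the same idea. All names and the input layout are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Nearest-centroid model: mean feature vector per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(model, x):
    """Class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((m - v) ** 2 for m, v in zip(model[c], x)))


def oob_error(X, y, n_models=50, seed=0):
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_models):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        model = fit_centroids([X[i] for i in boot], [y[i] for i in boot])
        for i in range(n):
            if i not in in_bag:  # only models that never saw sample i vote
                votes[i][predict(model, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    if not scored:
        return float("nan")
    return sum(p != t for p, t in scored) / len(scored)


def rank_pathways(expr, labels, pathways, **kw):
    """expr: gene -> per-sample values; pathways: name -> list of genes.

    Returns (pathway, OOB error) pairs, lowest error (most informative) first.
    """
    errors = {}
    for name, genes in pathways.items():
        X = [[expr[g][s] for g in genes] for s in range(len(labels))]
        errors[name] = oob_error(X, labels, **kw)
    return sorted(errors.items(), key=lambda kv: kv[1])
```

A real analysis would use an actual random forest (for example, a library implementation with OOB scoring enabled), but the ranking step on top of it would look the same.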

18.
For the RNA-binding protein Pasilla, which has been shown to play a role in regulating alternative splicing, binding sites and clusters of binding sites were identified in silico across the whole genome of D. melanogaster. The current study analyzes the occurrence of splice sites within binding site clusters. Several hundred thousand binding site motifs and thousands of significant motif clusters were identified. Exon-intron borders in D. melanogaster genes were found within Pasilla binding motif clusters more often than expected under a random model. Additionally, donor splice sites are found in Pasilla clusters twice as often as acceptor sites. This phenomenon is observed both for exons annotated as alternatively spliced and for exons annotated as constitutive. These observations support the hypothesis that Pasilla plays a functional role in splicing regulation in D. melanogaster.

20.
MOTIVATION: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs in a set of DNA sequences. The first can be called a 'multiple genes, single species' approach. It assumes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences and tries to describe a consensus motif and identify its occurrences. It is often used for co-regulated genes identified through experimental approaches. The second approach can be called 'single gene, multiple species'. It requires orthologous input sequences and tries to identify unusually well-conserved regions by phylogenetic footprinting. Both approaches perform well, but each has limitations. It is tempting to combine knowledge of co-regulation among different genes with conservation among orthologous genes to improve our ability to identify motifs. RESULTS: Based on the Consensus algorithm previously established by our group, we introduce a new algorithm called PhyloCon (Phylogenetic Consensus) that takes into account both conservation among orthologous genes and co-regulation of genes within a species. The algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, and then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles. Here we present a novel statistic for comparing profiles of DNA sequences and a greedy approach to searching for common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data. AVAILABILITY: Software available upon request from the authors. http://ural.wustl.edu/softwares.html
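Comparing two DNA profiles column by column is the core operation behind finding common subprofiles. The abstract does not give the statistic, so this sketch assumes an average log-likelihood ratio (ALLR) form: each column's counts are scored against the other column's frequencies relative to a background, and the sum is averaged over all counts. The subprofile search here is exhaustive over ungapped window pairs, not the paper's greedy search; pseudocounts, the uniform background and all names are assumptions.

```python
import math

BASES = "ACGT"


def allr(col_x, col_y, background=None, pseudo=1.0):
    """Assumed ALLR score between two profile columns (dicts base -> count):

    (sum_b n_x[b]*ln(f_y[b]/p[b]) + sum_b n_y[b]*ln(f_x[b]/p[b])) / (N_x + N_y)
    Positive when the columns favor the same bases, negative when they conflict.
    """
    p = background or {b: 0.25 for b in BASES}

    def freqs(col):
        total = sum(col.get(b, 0) for b in BASES) + pseudo * len(BASES)
        return {b: (col.get(b, 0) + pseudo) / total for b in BASES}

    fx, fy = freqs(col_x), freqs(col_y)
    num = sum(col_x.get(b, 0) * math.log(fy[b] / p[b]) for b in BASES)
    num += sum(col_y.get(b, 0) * math.log(fx[b] / p[b]) for b in BASES)
    n = sum(col_x.get(b, 0) for b in BASES) + sum(col_y.get(b, 0) for b in BASES)
    return num / n if n else 0.0


def best_common_subprofile(prof_x, prof_y, width):
    """Exhaustive ungapped search: returns (score, offset_x, offset_y) of the
    window pair of length `width` with the highest summed column ALLR."""
    best = (float("-inf"), None, None)
    for i in range(len(prof_x) - width + 1):
        for j in range(len(prof_y) - width + 1):
            s = sum(allr(prof_x[i + k], prof_y[j + k]) for k in range(width))
            if s > best[0]:
                best = (s, i, j)
    return best
```

A greedy variant would extend only the best-scoring seed windows instead of scoring every offset pair, trading completeness for speed on long profiles.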

