Large bioinformatics data sets make cluster identification time-consuming, which motivates a parallel algorithm for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed and identifies clusters by finding densely intraconnected subgraphs. We employ a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of the graph, for which a parallel algorithm is employed. Our high-level strategy for the parallel MST construction is to first partition the graph, then construct MSTs for the partitioned subgraphs and for auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that, when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that the program can handle very large data clustering problems efficiently. We have implemented the clustering algorithm as the software CLUMP.
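The idea of finding clusters through an MST can be illustrated with a small serial sketch. It is not the CLUMP implementation and does not reproduce the parallel partition-and-merge construction; it builds an MST with Kruskal's algorithm over all pairwise Euclidean distances and then drops MST edges longer than a threshold. The function names and the `cut` and `min_size` parameters are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


class UnionFind:
    """Disjoint-set structure used for Kruskal's algorithm and for components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def kruskal_mst(n, edges):
    """Return the MST edges of an n-vertex graph; edges are (weight, u, v)."""
    uf = UnionFind(n)
    return [(w, u, v) for w, u, v in sorted(edges) if uf.union(u, v)]


def mst_clusters(points, cut, min_size=2):
    """Cluster points by removing MST edges longer than `cut` (hypothetical rule).

    Components smaller than `min_size` are treated as background noise.
    """
    n = len(points)
    edges = [(dist(points[i], points[j]), i, j)
             for i, j in combinations(range(n), 2)]
    uf = UnionFind(n)
    for w, u, v in kruskal_mst(n, edges):
        if w <= cut:
            uf.union(u, v)
    groups = {}
    for i in range(n):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_size)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
    print(mst_clusters(pts, cut=2.0))
```

The complete graph used here costs quadratic memory, which is exactly the kind of bottleneck that motivates partitioning the graph and merging partial MSTs for large inputs.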

7.

Discovery of sequence motifs related to coexpression of genes using evolutionary computation (Total citations: 3; self-citations: 0; cited by others: 3)

Fogel GB, Weekes DG, Varga G, Dow ER, Harlow HB, Onyia JE, Su C 《Nucleic acids research》2004,32(13):3826-3835

8.

STIF: Identification of stress-upregulated transcription factor binding sites in Arabidopsis thaliana

Sundar AS, Varghese SM, Shameer K, Karaba N, Udayakumar M, Sowdhamini R 《Bioinformation》2008,2(10):431-437

9.

Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites

Qin ZS, McCue LA, Thompson W, Mayerhofer L, Lawrence CE, Liu JS 《Nature biotechnology》2003,21(4):435-439

10.

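An information-content-based method for predicting regulatory elements (Total citations: 3; self-citations: 0; cited by others: 3)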

Xie Xueying, Sun Xiao, Xie Jianming, Lu Zuhong 《Acta Biophysica Sinica (生物物理学报)》2003,19(4):424-430

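We designed an information-content-based algorithm for identifying regulatory elements and applied it to the clustering results of yeast gene expression data, aiming to predict transcription factor binding sites that may exist in the upstream non-coding regions of co-expressed genes. Analysis of the upstream sequences of genes known to be controlled by the same regulatory factor shows that the algorithm correctly identifies regulatory elements with a single conserved core sequence as well as conserved sequences containing a spacer. In the analysis of co-expressed genes, some of the candidate regulatory elements extracted by the algorithm may be biologically meaningful, which remains to be verified by further biological experiments.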
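The abstract does not state which information measure is used. As a hedged illustration only, the sketch below computes the widely used per-column information content (relative entropy against a background distribution, in bits) for a set of equal-length aligned DNA sites. The function name, the uniform background, and the absence of pseudocounts are all assumptions, not details taken from the paper.

```python
from collections import Counter
from math import log2


def information_content(sites, background=None):
    """Per-column information content (bits) of equal-length aligned DNA sites.

    Uses sum_b f_b * log2(f_b / p_b), where f_b is the observed frequency of
    base b in the column and p_b its background frequency (assumed uniform
    unless given). No pseudocounts are applied in this sketch.
    """
    if not sites:
        raise ValueError("at least one site is required")
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites)
    scores = []
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        column_ic = 0.0
        for base, count in counts.items():
            freq = count / total
            column_ic += freq * log2(freq / background[base])
        scores.append(column_ic)
    return scores


if __name__ == "__main__":
    aligned = ["TGACT", "TGAGT", "TGACA", "TGATT"]
    for position, bits in enumerate(information_content(aligned)):
        print(position, round(bits, 3))
```

Fully conserved columns reach the maximum of 2 bits under a uniform background, so a conserved core flanking a low-scoring stretch is the kind of profile one would expect for a motif with a spacer.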

11.

A new systematic computational approach to predicting target genes of transcription factors

Dai X, He J, Zhao X 《Nucleic acids research》2007,35(13):4433-4440

12.

Combining genome and mouse knockout expression data to highlight binding sites for the transcription factor HNF1alpha

Lockwood CR, Frayling TM 《In silico biology》2003,3(1-2):57-70

13.

Nucleotide sequence of an Escherichia coli tRNA (Leu 1) operon and identification of the transcription promoter signal (Total citations: 13; self-citations: 3; cited by others: 10)

G Duester, R K Campen, W M Holmes 《Nucleic acids research》1981,9(9):2121-2139

14.

Definition of transcriptional promoters in the human beta globin locus control region

Routledge SJ, Proudfoot NJ 《Journal of molecular biology》2002,323(4):601-611

15.

ABF1 binding sites in yeast RNA polymerase genes (Total citations: 18; self-citations: 0; cited by others: 18)

F Della Seta, I Treich, J M Buhler, A Sentenac 《The Journal of biological chemistry》1990,265(25):15168-15175

16.

Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae (Total citations: 12; self-citations: 0; cited by others: 12)

Hughes JD, Estep PW, Tavazoie S, Church GM 《Journal of molecular biology》2000,296(5):1205-1214

17.

Genome-wide prediction and analysis of function-specific transcription factor binding sites

Long F, Liu H, Hahn C, Sumazin P, Zhang MQ, Zilberstein A 《In silico biology》2004,4(4):395-410

18.

Horizontal transfer of a large and highly toxic secondary metabolic gene cluster between fungi

Slot JC, Rokas A 《Current biology : CB》2011,21(2):134-139

19.

Permutation pattern discovery in biosequences.

Revital Eres, Gad M Landau, Laxmi Parida 《Journal of computational biology》2004,11(6):1050-1060

Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon, and may also aid in function prediction of genes. Such gene clusters also aid toward solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In this paper, we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the πpattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns without any loss of information. We demonstrate the automatic pattern discovery tool on motifs in E. coli protein sequences.
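The core notion of a cluster that recurs with its members in any order can be sketched naively as follows. This is not the algorithm from the paper and it omits the maximal-pattern notation; it slides a fixed-size window over each sequence, ignores order inside the window by sorting its contents, and reports the multisets seen in at least `quorum` sequences. The function name and the `k` and `quorum` parameters are assumptions made for illustration.

```python
from collections import defaultdict


def permutation_patterns(sequences, k, quorum):
    """Find size-k multisets of symbols that occur contiguously, in any order,
    in at least `quorum` of the input sequences.

    Each sequence may be a string or a list of hashable, sortable symbols
    (for example gene or domain identifiers). Returns a dict mapping each
    pattern (a sorted tuple) to the sorted indices of supporting sequences.
    """
    support = defaultdict(set)
    for index, sequence in enumerate(sequences):
        for start in range(len(sequence) - k + 1):
            window = sequence[start:start + k]
            key = tuple(sorted(window))  # order-independent signature
            support[key].add(index)
    return {
        pattern: sorted(indices)
        for pattern, indices in support.items()
        if len(indices) >= quorum
    }


if __name__ == "__main__":
    genomes = ["abcde", "xcbay", "qbac"]
    print(permutation_patterns(genomes, k=3, quorum=3))
```

Enumerating every window size separately produces many overlapping, redundant patterns, which is the redundancy a maximal-pattern notation is meant to remove.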

20.

A discriminative model for identifying spatial cis-regulatory modules.

Eran Segal, Roded Sharan 《Journal of computational biology》2005,12(6):822-834
