共查询到20条相似文献,搜索用时 46 毫秒
1.
2.
3.
4.
5.
6.
Olman Victor Mao Fenglou Wu Hongwei Xu Ying 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(2):344-352
Large sets of bioinformatical data provide a challenge in time consumption while solving the cluster identification problem, and that is why a parallel algorithm is so needed for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed. It identifies clusters through the identification of densely intraconnected subgraphs. We have employed a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of a graph, for which a parallel algorithm is employed. Our high-level strategy for the parallel MST construction algorithm is to first partition the graph, then construct MSTs for the partitioned subgraphs and auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on single CPU, indicating that this program is capable of handling very large data clustering problems in an efficient manner. We have implemented the clustering algorithm as the software CLUMP. 相似文献
7.
8.
9.
10.
11.
12.
13.
14.
15.
ABF1 binding sites in yeast RNA polymerase genes 总被引:18,自引:0,他引:18
F Della Seta I Treich J M Buhler A Sentenac 《The Journal of biological chemistry》1990,265(25):15168-15175
16.
17.
18.
19.
Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon and may also aid in function prediction of genes. Such gene clusters also aid toward solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In the paper, we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the (pi)pattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns, without any loss of information, We demonstrate the automatic pattern discovery tool on motifs on E. Coli protein sequences. 相似文献