Similar literature
1.

Background

The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and much longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck on large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between pairs of data points, a similarity matrix must be constructed before affinity propagation can run, and constructing it is itself time-consuming.

Methods

Two types of parallel architecture are proposed in this paper to accelerate the construction of the similarity matrix and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for the affinity propagation algorithm because of its large memory size and great computing capacity. Our method includes a data partition and reduction scheme designed to minimize the global communication cost among processes.
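The row-wise partitioning idea behind a shared-memory similarity-matrix build can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not state the similarity measure, so the negative squared Euclidean distance (a common choice for affinity propagation) is assumed, and the names `neg_sq_dist`, `row_block`, and `similarity_matrix` are hypothetical. Threads are used purely to show how the work is split; in CPython they would not give a real speedup for this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def neg_sq_dist(a, b):
    """Assumed similarity: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def row_block(points, start, stop):
    """Compute rows start..stop-1 of the similarity matrix."""
    return [[neg_sq_dist(points[i], q) for q in points]
            for i in range(start, stop)]


def similarity_matrix(points, n_workers=4):
    """Split the rows into contiguous blocks, one task per block."""
    n = len(points)
    block = max(1, -(-n // n_workers))  # ceiling division
    ranges = [(s, min(s + block, n)) for s in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves the order of the row blocks
        blocks = list(pool.map(lambda r: row_block(points, *r), ranges))
    return [row for b in blocks for row in b]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
    for row in similarity_matrix(pts, n_workers=2):
        print(row)
```

Each worker reads the shared `points` list and writes only its own block of rows, so no synchronization between workers is needed until the blocks are concatenated.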

Results

A speedup of 100 is obtained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.

2.
Colak R  Moser F  Chu JS  Schönhuth A  Chen N  Ester M 《PloS one》2010,5(10):e13348

Background

Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well-studied large-scale data sources, available for many not yet exhaustively annotated organisms. It has been well established that, when these two data sources are analyzed jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear, and methods for exhaustively searching for such constellations had not been presented.

Methodology/Principal Findings

We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), by which the aforementioned search problem becomes tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automatized filtering procedure reduces our output which results in smaller collections of modules, comparable to state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition which adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search, by using the unreduced output to more quickly perform GO term related function prediction tasks. We point out the advantages of our exhaustive output by predicting functional relationships using two examples.

Conclusion/Significance

We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and largely available large-scale datasets.

Availability

Software and data sets are available at http://www.sfu.ca/~ester/software/DECOB.zip.

3.
4.
5.
6.
7.
8.
9.

Background

De novo genome assembly of next-generation sequencing data is one of the most important current problems in bioinformatics, essential in many biological applications. Despite a significant amount of work in this area, better solutions are still very much needed.

Results

We present a new program, SAGE, for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.

Conclusions

SAGE benefits from innovations in almost every aspect of the assembly process: error correction of input reads, string-overlap graph construction, read copy counts estimation, overlap graph analysis and reduction, contig extraction, and scaffolding. We hope that these new ideas will help advance the current state-of-the-art in an essential area of research in genomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-302) contains supplementary material, which is available to authorized users.

10.
Valor LM  Grant SG 《PloS one》2007,2(12):e1303

Background

Gene expression profiling using microarrays is a powerful technology widely used to study regulatory networks. Profiling of mRNA levels in mutant organisms has the potential to identify genes regulated by the mutated protein.

Methodology/Principal Findings

Using tissues from multiple lines of knockout mice we have examined genome-wide changes in gene expression. We report that a significant proportion of changed genes were found near the targeted gene.

Conclusions/Significance

The apparent clustering of these genes was explained by the presence of flanking DNA from the parental ES cell. We provide recommendations for the analysis and reporting of microarray data from knockout mice.

11.
12.

Background

Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.

Principal Findings

We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.
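For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how it is commonly computed: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembled length. The function name `n50` and the example lengths are illustrative only and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Total is 54; 10 + 9 + 8 = 27 reaches half of it, so N50 is 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```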

Conclusions

These algorithms extend the utility of short read only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.

13.
14.

Background

Despite recent discoveries of new molecular targets and pathways, the search for an effective therapy for Glioblastoma Multiforme (GBM) continues. A newly emerged field, radiogenomics, links gene expression profiles with MRI phenotypes. MRI-FLAIR is a noninvasive diagnostic modality and was previously found to correlate with cellular invasion in GBM. Thus, our radiogenomic screen has the potential to reveal novel molecular determinants of invasion. Here, we present the first comprehensive radiogenomic analysis using quantitative MRI volumetrics and large-scale gene- and microRNA expression profiling in GBM.

Methods

Based on The Cancer Genome Atlas (TCGA), discovery and validation sets with gene, microRNA, and quantitative MR-imaging data were created. Top concordant genes and microRNAs correlated with high FLAIR volumes from both sets were further characterized by Kaplan Meier survival statistics, microRNA-gene correlation analyses, and GBM molecular subtype-specific distribution.

Results

The top upregulated gene in both the discovery (4 fold) and validation (11 fold) sets was PERIOSTIN (POSTN). The top downregulated microRNA in both sets was miR-219, which is predicted to bind to POSTN. Kaplan Meier analysis demonstrated that above median expression of POSTN resulted in significantly decreased survival and shorter time to disease progression (P<0.001). High POSTN and low miR-219 expression were significantly associated with the mesenchymal GBM subtype (P<0.0001).
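The Kaplan Meier analysis mentioned above rests on the product-limit estimator, which multiplies the survival probability by (1 - deaths / at-risk) at each observed event time. The sketch below is a generic illustration under that standard definition, not the authors' analysis pipeline; the function name `kaplan_meier` and the toy follow-up data are hypothetical. In a study like this one, the estimator would be run separately on the above-median and below-median expression groups and the two curves compared.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy data: the patient at time 2 is censored.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```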

Conclusion

Here, we propose a novel diagnostic method to screen for molecular cancer subtypes and genomic correlates of cellular invasion. Our findings also have potential therapeutic significance since successful molecular inhibition of invasion will improve therapy and patient survival in GBM.

15.
16.
17.

Background

Previous studies using a hierarchical clustering approach to analyze resting-state fMRI data were limited to a few slices or regions-of-interest (ROIs) after substantial data reduction.

Purpose

To develop a framework that can perform voxel-wise hierarchical clustering of whole-brain resting-state fMRI data from a group of subjects.

Materials and Methods

Resting-state fMRI measurements were conducted for 86 adult subjects using a single-shot echo-planar imaging (EPI) technique. After pre-processing and co-registration to a standard template, pair-wise cross-correlation coefficients (CC) were calculated for all voxels inside the brain and translated into absolute Pearson's distances after imposing a threshold CC≥0.3. The group averages of the Pearson's distances were then used to perform hierarchical clustering with the developed framework, which entails gray matter masking and an iterative scheme to analyze the dendrogram.
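The conversion from correlation coefficients to thresholded distances can be sketched as follows. The abstract does not give the exact formula, so this sketch makes two assumptions: Pearson's distance is taken as 1 - CC, and voxel pairs with CC below the 0.3 threshold are assigned the maximum distance of 1. The function names `pearson` and `thresholded_distance` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def thresholded_distance(x, y, threshold=0.3):
    """Assumed mapping: distance 1 - CC if CC >= threshold, else 1."""
    cc = pearson(x, y)
    if cc < threshold:
        return 1.0
    return 1.0 - cc


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    print(thresholded_distance(a, [2.0, 4.0, 6.0, 8.0]))  # perfectly correlated
    print(thresholded_distance(a, [8.0, 6.0, 4.0, 2.0]))  # anti-correlated
```

Averaging these distances over subjects would then yield the group-level distance matrix that feeds the hierarchical clustering.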

Results

With the hierarchical clustering framework, we identified most of the functional connectivity networks reported previously in the literature, such as the motor, sensory, visual, memory, and the default-mode functional networks (DMN). Furthermore, the DMN and visual system were split into their corresponding hierarchical sub-networks.

Conclusion

It is feasible to use the proposed hierarchical clustering scheme for voxel-wise analysis of whole-brain resting-state fMRI data. The hierarchical clustering results not only broadly confirmed the functional connectivity networks identified previously with other data processing techniques, such as ICA, but also directly revealed the hierarchical structure within those networks.

18.
19.

Background

Recently, CD4+IL-17A+ T helper 17 (Th17) cells were identified and reported in several diseased states, including autoimmunity, infection and various peripheral nervous system tumors. However, the presence of Th17 in glia-derived tumors of the central nervous system has not been studied.

Methodology/Principal Findings

In this report, we demonstrate that mRNA expression for the Th17 cell cytokine IL-17A, as well as Th17 cells, are present in human glioma. The mRNA expression for IL-17A in glioma was recapitulated in an immunocompetent mouse model of malignant glioma. Furthermore, the presence of Th17 cells was confirmed in both human and mouse glioma. Interestingly, some Th17 cells present in mouse glioma co-expressed the Th1 and Th2 lineage markers, IFN-γ and IL-4, respectively, but predominantly co-expressed the Treg lineage marker FoxP3.

Conclusions

These data confirm the presence of Th17 cells in glia-derived CNS tumors and provide the rationale for further investigation into the role of Th17 cells in malignant glioma.

20.

Background

Cancer cells typically exhibit large-scale aberrant methylation of gene promoters. Some of the genes with promoter methylation alterations play “driver” roles in tumorigenesis, whereas others are only “passengers”.

Results

Based on the assumption that promoter methylation alteration of a driver gene may lead to expression alteration of a set of genes associated with cancer pathways, we developed a computational framework for integrating promoter methylation and gene expression data to identify driver methylation aberrations of cancer. Applying this approach to breast cancer data, we identified many novel cancer driver genes and found that some of the identified driver genes were subtype-specific for basal-like, luminal-A and HER2+ subtypes of breast cancer.

Conclusion

The proposed framework proved effective in identifying cancer driver genes from genome-wide gene methylation and expression data of cancer. These results may provide new molecular targets for potential targeted and selective epigenetic therapy.
