1.

Parallel Clustering Algorithm for Large-Scale Biological Data Sets

Minchao Wang Wu Zhang Wang Ding Dongbo Dai Huiran Zhang Hao Xie Luonan Chen Yike Guo Jiang Xie 《PloS one》2014,9(4)

Background

The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and much longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between pairs of data points, a similarity matrix must be constructed before affinity propagation can run, and this construction step itself takes a long time.

Methods

Two types of parallel architecture are proposed in this paper to accelerate the construction of the similarity matrix and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system, chosen for its large memory and high computing capacity, is used for the affinity propagation algorithm. An appropriate data partitioning and reduction scheme is designed to minimize the global communication cost among processes.
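
For readers unfamiliar with affinity propagation, the following is a minimal serial sketch in Python (standard library only) of the two steps the paper parallelizes: building a dense similarity matrix and running the responsibility/availability message updates of Frey and Dueck's algorithm. It is not the authors' parallel implementation. The negative squared Euclidean similarity, the median preference, the damping factor of 0.5 and the fixed 200 iterations are assumptions chosen for illustration. Each row of the similarity matrix, each row of the responsibility update and each column of the availability update can be computed independently, which is what makes partitioning the work across cores or processes natural.

```python
from statistics import median


def similarity_matrix(points):
    """Negative squared Euclidean distance for every pair of points (O(n^2) work)."""
    return [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


def affinity_propagation(S, damping=0.5, iterations=200, preference=None):
    """Serial affinity propagation on a dense similarity matrix S (list of lists)."""
    n = len(S)
    S = [row[:] for row in S]  # work on a copy; the diagonal is overwritten below
    if preference is None:
        # Assumed default: median of the off-diagonal similarities.
        preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]; rows are independent.
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max((vals[k] for k in range(n) if k != best), default=float("-inf"))
            for k in range(n):
                competitor = second if k == best else vals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))); columns are independent.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], []  # no exemplar emerged within the iteration budget
    # Each point is labelled with its most similar exemplar (exemplars label themselves).
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels
```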

Results

A speedup of 100 is achieved with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene (microarray) data and detecting families in large protein superfamilies.

2.

Module discovery by exhaustive search for densely connected, co-expressed regions in biomolecular interaction networks

Colak R Moser F Chu JS Schönhuth A Chen N Ester M 《PloS one》2010,5(10):e13348

Background

Computational prediction of functionally related groups of genes (functional modules) from large-scale data is an important issue in computational biology. Gene expression experiments and interaction networks are well-studied large-scale data sources, available for many organisms that are not yet exhaustively annotated. It has been well established that, when these two data sources are analyzed jointly, modules are often reflected by highly interconnected (dense) regions in the interaction networks whose participating genes are co-expressed. However, the tractability of the problem had remained unclear, and no methods to exhaustively search for such constellations had been presented.

Methodology/Principal Findings

We provide an algorithmic framework, referred to as Densely Connected Biclustering (DECOB), that makes the aforementioned search problem tractable. To benchmark the predictive power inherent to the approach, we computed all co-expressed, dense regions in physical protein and genetic interaction networks from human and yeast. An automated filtering procedure reduces our output, resulting in smaller collections of modules comparable to those of state-of-the-art approaches. Our results performed favorably in a fair benchmarking competition that adheres to standard criteria. We demonstrate the usefulness of an exhaustive module search by using the unreduced output to perform GO term related function prediction tasks more quickly. We point out the advantages of our exhaustive output by predicting functional relationships in two examples.
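
To make the search problem concrete, here is a brute-force Python sketch that enumerates every gene subset of a small network and keeps those that induce a connected, dense subgraph and whose genes are pairwise co-expressed. Its runtime is exponential in the number of genes, which is exactly what DECOB is designed to avoid; the sketch only illustrates the definition and is not the DECOB algorithm. The density threshold, the use of pairwise Pearson correlation across all conditions as the co-expression criterion (DECOB is a biclustering approach, so co-expression may be required only on a subset of conditions), and all function names are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def is_connected(nodes, edge_set):
    """Depth-first search restricted to the subgraph induced by `nodes`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes - seen:
            if frozenset((u, v)) in edge_set:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def dense_coexpressed_modules(edges, expr, min_size=3, min_density=0.6, min_corr=0.8):
    """Hypothetical helper: list gene sets that are connected, dense and pairwise co-expressed.

    edges: iterable of (gene, gene) interactions; expr: dict gene -> expression profile.
    min_size must be at least 2. Exponential runtime: toy inputs only.
    """
    edge_set = {frozenset(e) for e in edges}
    genes = sorted(expr)
    modules = []
    for size in range(min_size, len(genes) + 1):
        for subset in combinations(genes, size):
            pairs = list(combinations(subset, 2))
            density = sum(frozenset(p) in edge_set for p in pairs) / len(pairs)
            if density < min_density or not is_connected(subset, edge_set):
                continue
            if all(pearson(expr[a], expr[b]) >= min_corr for a, b in pairs):
                modules.append(set(subset))
    return modules
```

In a toy network where genes A, B and C form a triangle with proportional expression profiles, while D is anti-correlated with them, only {A, B, C} is reported.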

Conclusion/Significance

We demonstrate that the computation of all densely connected and co-expressed regions in interaction networks is an approach to module discovery of considerable value. Beyond confirming the well-settled hypothesis that such co-expressed, densely connected interaction network regions reflect functional modules, we open up novel computational ways to comprehensively analyze the modular organization of an organism based on prevalent and widely available large-scale datasets.

Availability

Software and data sets are available at http://www.sfu.ca/~ester/software/DECOB.zip.

3.

Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and ceRNA potential

Joshua D Welch Jeanette Baran-Gale Charles M Perou Praveen Sethupathy Jan F Prins 《BMC genomics》2015,16(1)

4.

hSAGEing: an improved SAGE-based software for identification of human tissue-specific or common tumor markers and suppressors

Yang CH Chuang LY Shih TM Chang HW 《PloS one》2010,5(12):e14369

5.

QServer: a biclustering server for prediction and assessment of co-expressed gene clusters

Zhou F Ma Q Li G Xu Y 《PloS one》2012,7(3):e32660

6.

Comprehensive analysis of forty yeast microarray datasets reveals a novel subset of genes (APha-RiB) consistently negatively associated with ribosome biogenesis

Basel Abu-Jamous Rui Fa David J Roberts Asoke K Nandi 《BMC bioinformatics》2014,15(1)

7.

Patterns of gene expression during Arabidopsis flower development from the time of initiation to maturation

Patrick T. Ryan Diarmuid S. ó’Maoiléidigh Hajk-Georg Drost Kamila Kwaśniewska Alexander Gabel Ivo Grosse Emmanuelle Graciet Marcel Quint Frank Wellmer 《BMC genomics》2015,16(1)

8.

Functional Clustering of Periodic Transcriptional Profiles through ARMA(p,q)

Ning Li Timothy McMurry Arthur Berg Zhong Wang Scott A. Berceli Rongling Wu 《PloS one》2010,5(4)

9.

SAGE: String-overlap Assembly of GEnomes

Lucian Ilie Bahlul Haider Michael Molnar Roberto Solis-Oba 《BMC bioinformatics》2014,15(1)

Background

De novo genome assembly of next-generation sequencing data is one of the most important current problems in bioinformatics, essential in many biological applications. In spite of a significant amount of work in this area, better solutions are still very much needed.

Results

We present a new program, SAGE, for de novo genome assembly. Unlike most assemblers, which are based on de Bruijn graphs, SAGE uses the string-overlap graph. SAGE builds on existing work on string-overlap graphs and maximum-likelihood assembly and introduces a number of new ideas, such as the efficient computation of the transitive reduction of the string-overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.
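
The string-overlap graph and its transitive reduction can be illustrated with a naive Python sketch: reads are nodes, an edge u→v records the longest proper suffix of u that matches a prefix of v, and an edge u→w is dropped when a path u→v→w already exists. This is not SAGE's efficient algorithm; it ignores reverse complements, contained reads and sequencing errors, and the minimum overlap length of 3 is an arbitrary assumption.

```python
def overlap(a, b, min_len):
    """Longest proper suffix of a equal to a prefix of b; 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len=3):
    """Directed string-overlap graph as {(i, j): overlap length} for reads i -> j."""
    graph = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = overlap(a, b, min_len)
                if olen:
                    graph[(i, j)] = olen
    return graph


def transitive_reduction(graph):
    """Drop edge u -> w whenever some v gives u -> v -> w (naive quadratic check)."""
    successors = {}
    for u, v in graph:
        successors.setdefault(u, set()).add(v)
    return {
        (u, w): olen
        for (u, w), olen in graph.items()
        if not any(w in successors.get(v, ()) for v in successors[u] if v != w)
    }
```

On four error-free reads tiled along a short sequence (ATGGCGTA, GGCGTACG, CGTACGTT, TACGTTAG), the full graph also contains the shorter "skip" overlaps 0→2 and 1→3; the reduced graph keeps only the chain 0→1→2→3, from which the sequence can be spelled by appending the non-overlapping tail of each successive read.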

Conclusions

SAGE benefits from innovations in almost every aspect of the assembly process: error correction of input reads, string-overlap graph construction, read copy count estimation, overlap graph analysis and reduction, contig extraction, and scaffolding. We hope that these new ideas will help advance the current state of the art in an essential area of research in genomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-302) contains supplementary material, which is available to authorized users.

10.

Clustered gene expression changes flank targeted gene loci in knockout mice

Valor LM Grant SG 《PloS one》2007,2(12):e1303

Background

Gene expression profiling using microarrays is a powerful technology widely used to study regulatory networks. Profiling of mRNA levels in mutant organisms has the potential to identify genes regulated by the mutated protein.

Methodology/Principal Findings

Using tissues from multiple lines of knockout mice we have examined genome-wide changes in gene expression. We report that a significant proportion of changed genes were found near the targeted gene.

Conclusions/Significance

The apparent clustering of these genes was explained by the presence of flanking DNA from the parental ES cell. We provide recommendations for the analysis and reporting of microarray data from knockout mice.

11.

A SAGE based approach to human glomerular endothelium: defining the transcriptome, finding a novel molecule and highlighting endothelial diversity

Guerkan Sengoelge Wolfgang Winnicki Anne Kupczok Arndt von Haeseler Michael Schuster Walter Pfaller Paul Jennings Ansgar Weltermann Sophia Blake Gere Sunder-Plassmann 《BMC genomics》2014,15(1)

12.

Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler

Daniel R. Zerbino Gayle K. McEwen Elliott H. Margulies Ewan Birney 《PloS one》2009,4(12)

Background

Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.

Principal Findings

We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.
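
The N50 statistic quoted above can be computed by sorting scaffold lengths in decreasing order and returning the first length at which the running total reaches at least half of the total assembly length. This Python sketch uses one common convention; tools differ slightly in how they treat the exact halfway point.

```python
def n50(lengths):
    """Length-weighted median: the largest L such that scaffolds >= L cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding at the halfway point
            return length
    return 0  # empty assembly
```

For scaffold lengths 2 to 10 (total 54), the running sum 10 + 9 + 8 = 27 first reaches half the total, so the N50 is 8.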

Conclusions

These algorithms extend the utility of short read only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler.

13.

deGPS is a powerful tool for detecting differential expression in RNA-sequencing studies

Chen Chu Zhaoben Fang Xing Hua Yaning Yang Enguo Chen Allen W. Cowley Jr. Mingyu Liang Pengyuan Liu Yan Lu 《BMC genomics》2015,16(1)

14.

Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme

Zinn PO Mahajan B Majadan B Sathyan P Singh SK Majumder S Jolesz FA Colen RR 《PloS one》2011,6(10):e25451

Background

Despite recent discoveries of new molecular targets and pathways, the search for an effective therapy for Glioblastoma Multiforme (GBM) continues. A newly emerged field, radiogenomics, links gene expression profiles with MRI phenotypes. MRI-FLAIR is a noninvasive diagnostic modality and was previously found to correlate with cellular invasion in GBM. Thus, our radiogenomic screen has the potential to reveal novel molecular determinants of invasion. Here, we present the first comprehensive radiogenomic analysis using quantitative MRI volumetrics and large-scale gene- and microRNA expression profiling in GBM.

Methods

Based on The Cancer Genome Atlas (TCGA), discovery and validation sets with gene, microRNA, and quantitative MR-imaging data were created. The top concordant genes and microRNAs correlated with high FLAIR volumes in both sets were further characterized by Kaplan-Meier survival statistics, microRNA-gene correlation analyses, and GBM molecular subtype-specific distribution.

Results

The top upregulated gene in both the discovery (4-fold) and validation (11-fold) sets was PERIOSTIN (POSTN). The top downregulated microRNA in both sets was miR-219, which is predicted to bind to POSTN. Kaplan-Meier analysis demonstrated that above-median expression of POSTN resulted in significantly decreased survival and shorter time to disease progression (P<0.001). High POSTN and low miR-219 expression were significantly associated with the mesenchymal GBM subtype (P<0.0001).
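
The survival comparison described above (splitting patients at the median expression of a gene such as POSTN and comparing Kaplan-Meier curves) can be sketched in Python as below. This is a textbook product-limit estimator, not the authors' code. The data layout (parallel lists of expression values, follow-up times and event indicators, with 1 meaning the event was observed and 0 meaning censored) and the function names are assumptions, and the significance test (such as a log-rank test) that would supply a P value is omitted.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit survival estimate as [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored subjects leave the risk set after time t
    return curve


def split_by_median(expression, times, events):
    """Hypothetical helper: (time, event) pairs for above-median vs. the remaining patients."""
    cut = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cut]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cut]
    return high, low
```

Each group returned by split_by_median can be unzipped into times and events and passed to kaplan_meier to obtain one curve per expression group.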

Conclusion

Here, we propose a novel diagnostic method to screen for molecular cancer subtypes and genomic correlates of cellular invasion. Our findings also have potential therapeutic significance since successful molecular inhibition of invasion will improve therapy and patient survival in GBM.

15.

Transcriptome Profiles of Carcinoma-in-Situ and Invasive Non-Small Cell Lung Cancer as Revealed by SAGE

Kim M. Lonergan Raj Chari Bradley P. Coe Ian M. Wilson Ming-Sound Tsao Raymond T. Ng Calum MacAulay Stephen Lam Wan L. Lam 《PloS one》2010,5(2)

16.

Microarray expression profiles of 20.000 genes across 23 healthy porcine tissues (total citations: 1, self-citations: 0, citations by others: 1)

Hornshøj H Conley LN Hedegaard J Sørensen P Panitz F Bendixen C 《PloS one》2007,2(11):e1203

17.

Analysis of Whole-Brain Resting-State fMRI Data Using Hierarchical Clustering Approach

Yanlu Wang Tie-Qiang Li 《PloS one》2013,8(10)

Background

Previous studies using a hierarchical clustering approach to analyze resting-state fMRI data were limited to a few slices or regions of interest (ROIs) after substantial data reduction.

Purpose

To develop a framework that can perform voxel-wise hierarchical clustering of whole-brain resting-state fMRI data from a group of subjects.

Materials and Methods

Resting-state fMRI measurements were conducted for 86 adult subjects using a single-shot echo-planar imaging (EPI) technique. After pre-processing and co-registration to a standard template, pair-wise cross-correlation coefficients (CC) were calculated for all voxels inside the brain and translated into absolute Pearson's distances after imposing a threshold CC≥0.3. The group averages of the Pearson's distances were then used to perform hierarchical clustering with the developed framework, which entails gray matter masking and an iterative scheme to analyze the dendrogram.
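
A toy version of the distance construction and clustering step is sketched below in Python: pairwise Pearson correlation coefficients between voxel time series, conversion to a Pearson distance (taken here as 1 − CC) for pairs with CC ≥ 0.3, and agglomerative clustering. Several choices are assumptions rather than details from the abstract: pairs below the threshold are assigned the maximal distance 1.0, average linkage is used, and merging stops at an arbitrary distance cutoff. The gray matter masking, group averaging and iterative dendrogram analysis of the actual framework are not shown, and a whole-brain analysis would need far more memory-efficient data structures than these O(n²) lists.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def pearson_distance_matrix(series, cc_threshold=0.3):
    """Symmetric distance matrix: 1 - CC if CC >= threshold, else 1.0 (assumed)."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        cc = pearson(series[i], series[j])
        d[i][j] = d[j][i] = 1.0 - cc if cc >= cc_threshold else 1.0
    return d


def average_linkage(d, cutoff):
    """Merge the closest pair of clusters (average linkage) until the gap reaches cutoff."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pair_sum = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            avg = pair_sum / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        if best[0] >= cutoff:
            break
        _, a, b = best  # combinations guarantees a < b, so deleting b keeps index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

With two rising and two falling toy time series, the rising pair and the falling pair each merge at a distance well below the cutoff, while the anti-correlated cross pairs stay at distance 1.0, so two clusters remain.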

Results

With the hierarchical clustering framework, we identified most of the functional connectivity networks reported previously in the literature, such as the motor, sensory, visual, memory, and the default-mode functional networks (DMN). Furthermore, the DMN and visual system were split into their corresponding hierarchical sub-networks.

Conclusion

It is feasible to use the proposed hierarchical clustering scheme for voxel-wise analysis of whole-brain resting-state fMRI data. The hierarchical clustering result not only generally confirmed the functional connectivity networks identified previously using other data processing techniques, such as ICA, but also directly revealed the hierarchical structure within these networks.

18.

An integrated genomic and metabolomic framework for cell wall biology in rice

Kai Guo Weihua Zou Yongqing Feng Mingliang Zhang Jing Zhang Fen Tu Guosheng Xie Lingqiang Wang Yangting Wang Sebastian Klie Staffan Persson Liangcai Peng 《BMC genomics》2014,15(1)

19.

The presence of IL-17A and T helper 17 cells in experimental mouse brain tumors and human glioma

Wainwright DA Sengupta S Han Y Ulasov IV Lesniak MS 《PloS one》2010,5(10):e15390

Background

Recently, CD4⁺IL-17A⁺ T helper 17 (Th17) cells were identified and reported in several diseased states, including autoimmunity, infection and various peripheral nervous system tumors. However, the presence of Th17 in glia-derived tumors of the central nervous system has not been studied.

Methodology/Principal Findings

In this report, we demonstrate that mRNA expression for the Th17 cell cytokine IL-17A, as well as Th17 cells, are present in human glioma. The mRNA expression for IL-17A in glioma was recapitulated in an immunocompetent mouse model of malignant glioma. Furthermore, the presence of Th17 cells was confirmed in both human and mouse glioma. Interestingly, some Th17 cells present in mouse glioma co-expressed the Th1 and Th2 lineage markers, IFN-γ and IL-4, respectively, but predominantly co-expressed the Treg lineage marker FoxP3.

Conclusions

These data confirm the presence of Th17 cells in glia-derived CNS tumors and provide the rationale for further investigation into the role of Th17 cells in malignant glioma.

20.

An Integrated Approach to Uncover Driver Genes in Breast Cancer Methylation Genomes

Xiaopei Shen Shan Li Lin Zhang Hongdong Li Guini Hong XianXiao Zhou Tingting Zheng Wenjing Zhang Chunxiang Hao Tongwei Shi Chunyang Liu Zheng Guo 《PloS one》2013,8(4)

Background

Cancer cells typically exhibit large-scale aberrant methylation of gene promoters. Some of the genes with promoter methylation alterations play “driver” roles in tumorigenesis, whereas others are only “passengers”.

Results

Based on the assumption that a promoter methylation alteration of a driver gene may lead to expression alterations in a set of genes associated with cancer pathways, we developed a computational framework that integrates promoter methylation and gene expression data to identify driver methylation aberrations in cancer. Applying this approach to breast cancer data, we identified many novel cancer driver genes and found that some of them were subtype-specific for the basal-like, luminal-A and HER2+ subtypes of breast cancer.

Conclusion

The proposed framework proved effective in identifying cancer driver genes from genome-wide gene methylation and expression data of cancer. These results may provide new molecular targets for potential targeted and selective epigenetic therapy.